CNBC Cluster

The Carnegie Mellon University (CMU) Neuroscience Institute (NI) and the CNBC at CMU maintain their own computer facilities, which provide faculty, staff and students with support, state-of-the-art equipment and high-speed network access.

We maintain multiple servers in a machine room in Mellon Institute. All servers have redundant power supplies and uninterruptible power supplies (UPS), which keep them connected for several hours without primary power. A Dell PowerEdge R740XD server running Ubuntu Linux serves a 160TB, enterprise-grade, RAID 6 ZFS disk storage system that provides space for research labs to store their data. Lab members can access the files from their local computers over CMU’s network. The ZFS space provides snapshots and is replicated to a separate server in another building on campus as protection against catastrophic failure. The same server also hosts a 100TB, enterprise-grade, RAID 6 ZFS disk storage system, which was built to store data from the MICrONS project, funded by the Intelligence Advanced Research Projects Activity (IARPA). This storage pool is also configured with ZFS snapshots and replication to the backup server.
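
As a general illustration of how ZFS snapshots and replication fit together, the following Python sketch builds the standard `zfs snapshot`, `zfs send -i` and `zfs receive` command lines for an incremental copy to a backup host. It is not the facility's actual procedure: the dataset names, snapshot names and host name are hypothetical placeholders, and the functions only construct the commands without running them.

```python
import shlex


def snapshot_cmd(dataset, snap):
    """Command (as an argument list) that creates a snapshot dataset@snap."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def replicate_cmd(dataset, prev_snap, new_snap, backup_host, backup_dataset):
    """Shell pipeline that sends the changes between two snapshots to a backup host.

    `zfs send -i` emits only the blocks that changed between prev_snap and
    new_snap; `zfs receive` applies that stream on the receiving side.
    """
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", backup_host, "zfs", "receive", "-F", backup_dataset]
    return shlex.join(send) + " | " + shlex.join(recv)


# Hypothetical names, for illustration only.
example_snapshot = snapshot_cmd("tank/labdata", "2024-01-02")
example_pipeline = replicate_cmd(
    "tank/labdata", "2024-01-01", "2024-01-02",
    "backup.example.org", "backup/labdata",
)
```

Because each incremental stream contains only the changes since the previous snapshot, regular replication of this kind stays inexpensive even for large storage pools.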

A server with 40TB of enterprise-grade RAID 5 disk storage was installed in October 2014 to provide incremental file backup for desktops and laptops using Code42 software, a multi-platform enterprise solution used by core staff and some of the NI/CNBC faculty and lab members in the Mellon Institute location.

The HPC Cluster is located in the CMU School of Computer Science (SCS) machine room. The SCS Facility has 24/7 support and a team of HPC experts that works closely with the NI computing administrator. Currently the cluster is a two-rack system, with power distribution units and an uninterruptible power supply (UPS) protecting the essential components from loss of the primary power source. The cluster consists of 21 CPU nodes and 12 GPU nodes. The combined 33-node compute cluster includes a total of 700 CPUs, 46 GPU cards, 280TB of shared disk space and over 2.8TB of RAM. Components are connected via a high-speed InfiniBand network, and the cluster is accessible through the Carnegie Mellon University Ethernet network. The CentOS operating system provides a flexible, stable working environment for the machine’s core applications. The SLURM Workload Manager is used for job scheduling and to allocate computational tasks (i.e., batch jobs) among the available CPU and GPU computing nodes. The individual users’ home directories and data space are incrementally backed up to the SCS tape backup system. The two data partitions are configured using RAID 6 and the ZFS file system. The data partitions have snapshots enabled, and each is replicated onto an identical RAID 6 volume for additional protection.
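
To illustrate what a SLURM batch job looks like, the following Python sketch assembles a small batch script from standard `#SBATCH` directives and can hand it to `sbatch`, which reads the script from standard input when no file is given. The job name, resource values and command are hypothetical, and the cluster's actual partitions, limits and recommended settings are not described here, so treat this as a starting point and not as site policy.

```python
import subprocess


def build_sbatch_script(job_name, command, cpus=4, gpus=0, mem_gb=16,
                        time_limit="01:00:00"):
    """Return the text of a SLURM batch script for a single-task job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",   # CPU cores for the task
        f"#SBATCH --mem={mem_gb}G",          # memory per node
        f"#SBATCH --time={time_limit}",      # wall-clock limit, HH:MM:SS
    ]
    if gpus > 0:
        # Request GPUs only when needed so CPU-only jobs can run on CPU nodes.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines += ["", command, ""]
    return "\n".join(lines)


def submit(script_text):
    """Submit the script with sbatch (requires SLURM) and return its reply."""
    result = subprocess.run(["sbatch"], input=script_text, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()
```

For example, `build_sbatch_script("train-model", "python train.py", cpus=8, gpus=1)` produces a script requesting eight cores and one GPU; passing that text to `submit` on a cluster login node would queue the job.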

Examples of some software packages that are used on the cluster include:
• Anaconda
• DSI Studio
• FreeSurfer
• Jupyter Notebook/Lab
• MATLAB
• R
• SPM8 and SPM12
• TensorFlow

Faculty and their lab members affiliated with the CNBC at CMU and Pitt use the HPC Cluster for a fee to cover support. The Computing Facility is a 100% recharge center, meaning that the costs we recover from users are expected to support the ongoing costs of operation. We therefore charge annual fees to recover the costs of providing these services, which include power, cluster maintenance and staff salary. The cluster is kept up to date and maintained by the SCS Facility. David Pane, Manager of Computational Resources for the NI/CNBC, is available to our users. Contact David if you are interested in using the cluster, have questions or problems, or would like to request software.

David maintains a website with basic information about the cluster services, a description of the cluster, and its policies and procedures. The fees assessed for access to computational support generally come close to recovering the ongoing costs of operating the Computing Center; however, they do not support any capital investment in the Cluster. Many of the computational nodes and much of the storage making up the Cluster were purchased by PIs using grant or startup funds, as there is no budget specifically for upgrading the Cluster infrastructure. Given that these resources age rapidly, we rely on users to invest in hardware regularly to keep the Cluster operational.