The Carnegie Mellon University (CMU) Neuroscience Institute (NI) and the CNBC at CMU maintain their own computer facilities that provide faculty, staff, and students with support, access to state-of-the-art equipment, and high-speed network access. We maintain multiple servers located in a machine room in Mellon Institute. All servers have redundant power supplies and Uninterruptible Power Supplies (UPS), allowing them to run for several hours without primary power. A 70 TB, enterprise-grade, RAID6 disk storage system provides space for users to store their data; users can access these files from their local computers over CMU's network. Files are incrementally backed up to a separate server and retained for three months. A server with 40 TB of RAID5 enterprise-grade disk storage was installed in October 2014 to provide incremental desktop/laptop file backup using Code42 (formerly CrashPlan PROe) software, a multi-platform (macOS, Windows, Linux, Solaris) enterprise solution. In June 2016, a 100 TB, enterprise-grade disk storage system using ZFS RAID6 was built to store data for the MICrONS project, which is funded by the Intelligence Advanced Research Projects Activity (IARPA).
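The ZFS RAID6 storage and incremental backups described above can be sketched with standard ZFS commands. This is only an illustration: the pool, dataset, device, and host names below are hypothetical placeholders, not the actual production configuration.

```shell
# Hypothetical sketch of a double-parity (raidz2, i.e. RAID6-equivalent) ZFS
# pool; all device, pool, and host names are placeholders.

# Create a raidz2 pool from six disks:
zpool create tank raidz2 sda sdb sdc sdd sde sdf

# Create a dataset for user data with compression enabled:
zfs create -o compression=lz4 tank/data

# Take a dated snapshot (the basis of incremental backups):
zfs snapshot tank/data@2016-06-01

# Send only the changes since the previous snapshot to a backup server:
zfs send -i tank/data@2016-05-01 tank/data@2016-06-01 | \
    ssh backup zfs receive backuppool/data
```

Because `zfs send -i` transmits only the blocks changed between two snapshots, each backup pass is incremental rather than a full copy.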
The cluster is located in the CMU School of Computer Science (SCS) machine room. The SCS facility has 24/7 support and a team of HPC experts who work closely with the NI computing administrator. The cluster currently occupies two racks, with power distribution units and an Uninterruptible Power Supply (UPS) protecting the essential components from loss of the primary power source. It consists of 22 CPU nodes and 10 GPU nodes with four GPU cards each. The combined 32-node system includes a total of 568 CPUs, 40 GPU cards, 85 terabytes of shared disk space, and 2.4 terabytes of RAM. Components are connected by a high-speed InfiniBand network, and the cluster is accessible through the Carnegie Mellon University Ethernet network. The CentOS operating system provides a flexible, stable working environment for the machine's core applications. We use the SLURM Workload Manager for job scheduling and for allocating computational tasks (i.e., batch jobs) among the available CPU and GPU compute nodes. Users' home directories are incrementally backed up to SCS's tape backup system. The two data partitions are configured using RAID6 and the ZFS file system; they have snapshots enabled, and each is replicated onto an identical RAID6 volume for additional protection.
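Batch jobs are submitted to the SLURM scheduler described above. A minimal batch script might look like the following sketch; the partition name and the command being run are hypothetical placeholders, and actual cluster settings may differ.

```shell
#!/bin/bash
# Minimal SLURM batch script sketch; the partition name and the final
# command are hypothetical placeholders for this cluster.
#SBATCH --job-name=example
#SBATCH --partition=gpu          # hypothetical GPU partition name
#SBATCH --gres=gpu:1             # request one GPU card
#SBATCH --cpus-per-task=4        # CPU cores for this task
#SBATCH --mem=16G                # memory per node
#SBATCH --time=02:00:00          # wall-clock limit (hh:mm:ss)
#SBATCH --output=%x-%j.out       # log named after job name and job ID

# Launch the task under SLURM (placeholder command):
srun python analyze.py
```

The script would be submitted with `sbatch job.sh`, and its queue status checked with `squeue -u $USER`.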
Software packages used on the cluster include:
The computing facilities are a 100% recharge center, meaning that the fees recovered from users are expected to support the ongoing costs of operation. We therefore charge annual fees to recover the costs of providing these services, including power, cluster maintenance, and staff salaries. The cluster is kept up to date and maintained by SCS Facilities. David Pane, Manager of Computational Resources for the NI, is on hand to answer questions, handle problems and software requests, and support our users.
David maintains a website with basic information about the cluster services, a description of the cluster, policies and procedures, and other computing information. The fees assessed for access to computational support generally come close to recovering the ongoing costs of operating the computing center; however, they do not support any capital investment in the cluster. Many of the compute nodes and much of the storage making up the cluster were purchased by PIs using grant or startup funds, as there is no budget specifically for upgrading the cluster infrastructure. Because these resources age rapidly, we rely on users to invest in hardware regularly to keep the cluster operational.