CSL: Facilities: Cluster Computers

The C. S. Department maintains a small cluster of 67 machines (2,336 cores) for general computing. The machines use Slurm for batch job submission. You will need your NetID and NetID password to access the login host (sbatch.cs.duke.edu). The compute cluster machines are on a 3-4 year replacement cycle, so no machine in the cluster should be more than four years old. You can use the sinfo command to monitor cluster usage. For more information, see the Slurm documentation.
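A minimal Python sketch of submitting and monitoring work from sbatch.cs.duke.edu might look like the following. The helper names (`make_batch_script`, `submit`, `parse_idle_cpus`, `idle_cpus`) are hypothetical, and the resource defaults (1 CPU, 4G of memory, one hour) are illustrative assumptions rather than this cluster's actual limits. No partition is named, on the assumption that the cluster's default partition is acceptable. The `#SBATCH` options, `sbatch --parsable`, and the `sinfo` format string `"%N %C"` (CPUs as allocated/idle/other/total) are standard Slurm features.

```python
import subprocess
from pathlib import Path


def make_batch_script(job_name: str, command: str, cpus: int = 1,
                      mem: str = "4G", time_limit: str = "01:00:00") -> str:
    """Return the text of a minimal Slurm batch script.

    The default resources are illustrative assumptions, not cluster limits.
    No --partition is set, so Slurm's default partition would be used.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_name}-%j.out",  # %j expands to the job ID
        "",
        command,
        "",
    ]
    return "\n".join(lines)


def submit(script: Path) -> str:
    """Submit a batch script with sbatch and return the job ID."""
    # --parsable prints "jobid" (or "jobid;cluster") instead of a sentence.
    result = subprocess.run(["sbatch", "--parsable", str(script)],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip().split(";")[0]


def parse_idle_cpus(sinfo_output: str) -> dict[str, int]:
    """Map node name -> idle CPUs from `sinfo -h -N -o "%N %C"` output."""
    idle = {}
    for line in sinfo_output.splitlines():
        if not line.strip():
            continue
        node, counts = line.split()
        # %C prints CPU counts as allocated/idle/other/total.
        _allocated, idle_cpus, _other, _total = counts.split("/")
        # A node in several partitions appears once per partition;
        # the dict keeps a single entry per node.
        idle[node] = int(idle_cpus)
    return idle


def idle_cpus() -> dict[str, int]:
    """Run sinfo (node-oriented, no header) and summarize idle CPUs."""
    out = subprocess.run(["sinfo", "-h", "-N", "-o", "%N %C"],
                         check=True, capture_output=True, text=True).stdout
    return parse_idle_cpus(out)
```

For example, on the login host, `Path("hello.sh").write_text(make_batch_script("hello", "hostname"))` followed by `submit(Path("hello.sh"))` would queue a one-line job, and `idle_cpus()` gives a quick per-node view of free CPUs.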

N.B. As of July 1st, 2020, you will need to use multi-factor authentication (MFA) to access all publicly accessible ssh hosts (sbatch.cs.duke.edu, login.cs.duke.edu, and dukelogin.cs.duke.edu).

Logging in for a terminal session

The terminal session looks like this:

macbook-pro $ ssh sbatch.cs.duke.edu
netid@sbatch.cs.duke.edu's password: 
Duo two-factor login for netid

Enter a passcode or select one of the following options:

   1. Duo Push to XXX-XXX-1234
   2. Phone call to XXX-XXX-1234
   3. SMS passcodes to XXX-XXX-1234 (next code starts with: 1)

Passcode or option (1-3): 1
Success. Logging you in...
Last login: Wed May 20 12:21:38 2020 from 174.247.16.115
netid@sbatch ~

The rest of your session will proceed as normal. Transferring data with scp works the same way: you are prompted for your password and then for Duo authentication, just as with an ssh login.
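Because each new ssh or scp connection triggers its own Duo prompt, one way to reduce prompts during a batch of transfers is OpenSSH connection multiplexing: the first connection becomes a master that later ones reuse. The Python sketch below only assembles and runs scp command lines. `ControlMaster`, `ControlPath`, and `ControlPersist` are standard OpenSSH client options; the socket location and the ten-minute persistence are arbitrary choices, the helper names `scp_args` and `transfer` are hypothetical, and it assumes the login host permits multiplexed sessions.

```python
import subprocess
from pathlib import Path

LOGIN_HOST = "sbatch.cs.duke.edu"
# Socket for the shared connection; %r, %h, %p expand to remote user,
# host, and port. The location is an arbitrary choice (~/.ssh must exist).
CONTROL_PATH = str(Path.home() / ".ssh" / "cm-%r@%h:%p")


def scp_args(netid: str, local: str, remote: str, upload: bool = True,
             recursive: bool = False) -> list[str]:
    """Build an scp command line that reuses one authenticated connection."""
    args = [
        "scp",
        "-o", "ControlMaster=auto",        # first call becomes the master
        "-o", f"ControlPath={CONTROL_PATH}",
        "-o", "ControlPersist=10m",        # keep the master open 10 min idle
    ]
    if recursive:
        args.append("-r")
    remote_spec = f"{netid}@{LOGIN_HOST}:{remote}"
    # Source comes first, destination second.
    args += [local, remote_spec] if upload else [remote_spec, local]
    return args


def transfer(netid: str, local: str, remote: str, **kwargs) -> None:
    """Run scp interactively so the password and Duo prompts reach the terminal."""
    subprocess.run(scp_args(netid, local, remote, **kwargs), check=True)
```

Only the first transfer in a ten-minute window should ask for the password and Duo; later calls ride on the open master connection.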

The cluster consists of the following machine configurations:

GPU Resources:

GPU model      Count  CUDA cores  Tensor cores  VRAM  Hosts
V100              10        5120           640  32GB  gpu-compute[5-7]
P100              26        3584                16GB  linux[41-50], gpu-compute[4-5]
K80               24        4992                12GB  gpu-compute[1-3]
RTX 2080 Ti       30        4352                11GB  linux[41-60]
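Host ranges such as `gpu-compute[5-7]` use Slurm's hostlist notation. On the cluster, `scontrol show hostnames "linux[41-50]"` expands one for you. The Python sketch below does the same offline for the forms used in this table, which can help when building a `--nodelist` or `--exclude` argument. `expand_hostlist` and `split_top_level` are hypothetical helpers, and they handle only one bracket group per host pattern.

```python
import re

# prefix[ranges]suffix, with exactly one bracket group per pattern
_GROUP = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<ranges>[^\]]+)\](?P<suffix>[^\[\]]*)$")


def split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside [...] brackets."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def expand_hostlist(expr: str) -> list[str]:
    """Expand e.g. 'linux[41-50], gpu-compute[4-5]' into host names.

    Inside brackets, accepts comma-separated numbers or lo-hi ranges;
    zero padding is taken from the width of each range's start.
    """
    hosts = []
    for pattern in split_top_level(expr):
        m = _GROUP.match(pattern)
        if m is None:
            hosts.append(pattern)  # plain host name, no brackets
            continue
        for item in m["ranges"].split(","):
            lo, _, hi = item.strip().partition("-")
            hi = hi or lo  # a single number is a one-element range
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{m['prefix']}{str(n).zfill(len(lo))}{m['suffix']}")
    return hosts
```

For instance, `",".join(expand_hostlist("gpu-compute[5-7]"))` yields the comma-separated V100 hosts.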

10x TensorEX TS2-673917-DPN with Intel Xeon Gold 6226 Processor, 2.7GHz (768GB RAM - 48 cores). Each machine has 2 NVIDIA GeForce RTX 2080 Ti GPUs.

  • linux51
  • linux52
  • linux53
  • linux54
  • linux55
  • linux56
  • linux57
  • linux58
  • linux59
  • linux60

10x Tensor TXR231-1000R D126 with Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (512GB RAM - 40 cores). Each machine has 2 NVIDIA Tesla P100s and 1 NVIDIA GeForce RTX 2080 Ti.

  • linux41
  • linux42
  • linux43
  • linux44
  • linux45
  • linux46
  • linux47
  • linux48
  • linux49
  • linux50
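
N.B. If you need double-precision floating point (FP64), use the K80, P100, or V100 GPUs; the RTX 2080 Ti's FP64 throughput is only a small fraction of its single-precision rate.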
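Slurm jobs usually ask for a particular GPU model through generic resources, as `--gres=gpu:<type>:<count>`. The GRES type names configured on this cluster are not listed here, so the names in the sketch below (`v100`, `p100`, `k80`, `2080rtx`) are hypothetical placeholders. Run `sinfo -o "%N %G"` on sbatch.cs.duke.edu to see the real ones. The hypothetical helper `gres_request` encodes the FP64 note: double-precision work goes to the K80, P100, or V100, and anything else may use any GPU.

```python
from typing import Optional

# HYPOTHETICAL GRES type names; the real ones on this cluster may differ.
# Check with: sinfo -o "%N %G"
GPU_TYPES = {
    # model: (assumed gres type name, good double-precision (FP64) throughput)
    "V100": ("v100", True),
    "P100": ("p100", True),
    "K80": ("k80", True),
    "RTX 2080 Ti": ("2080rtx", False),
}


def gres_request(count: int = 1, model: Optional[str] = None,
                 need_fp64: bool = False) -> str:
    """Return an sbatch --gres option asking for `count` GPUs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if model is None:
        if not need_fp64:
            return f"--gres=gpu:{count}"  # any GPU model will do
        model = "V100"  # arbitrary pick among the FP64-capable models
    gres_type, good_fp64 = GPU_TYPES[model]
    if need_fp64 and not good_fp64:
        raise ValueError(f"{model} has weak FP64 throughput; "
                         "use a K80, P100, or V100 instead")
    return f"--gres=gpu:{gres_type}:{count}"
```

The returned string is passed to sbatch as an ordinary option, e.g. `sbatch --gres=gpu:v100:1 job.sh` (again with the placeholder type name).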

3x Quantum TXR430-0512R with Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz (256GB RAM - 32 cores) and 10Gb interconnects. Each machine has 4 NVIDIA Tesla K80s (each K80 card carries two GPUs).

  • gpu-compute1
  • gpu-compute2
  • gpu-compute3

4x Quantum TXR113-1000R with Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (256GB RAM - 40 cores) and 10Gb interconnects. Each machine has four NVIDIA Tesla GPUs (P100s, V100s, or both):

  • gpu-compute4 (4x P100s)
  • gpu-compute5 (2x P100s, 2x V100s)
  • gpu-compute6 (4x V100s)
  • gpu-compute7 (4x V100s)

2x Dell R610 with 2 Xeon E5540 processors, 2.53GHz, 8MB cache (96GB RAM - 16 cores)

  • linux29.cs.duke.edu
  • linux30.cs.duke.edu

10x Dell R730 with 2 Intel Xeon E5-2640 v4 processors, 2.4GHz, 25MB cache (256GB RAM - 40 cores)

  • linux31.cs.duke.edu
  • linux32.cs.duke.edu
  • linux33.cs.duke.edu
  • linux34.cs.duke.edu
  • linux35.cs.duke.edu
  • linux36.cs.duke.edu
  • linux37.cs.duke.edu
  • linux38.cs.duke.edu
  • linux39.cs.duke.edu
  • linux40.cs.duke.edu

10x Dell R610 with 2 Xeon E5640 processors, 2.66GHz, 12MB cache (64GB RAM - 16 cores)

  • linux1.cs.duke.edu
  • linux2.cs.duke.edu
  • linux3.cs.duke.edu
  • linux4.cs.duke.edu
  • linux5.cs.duke.edu
  • linux6.cs.duke.edu
  • linux7.cs.duke.edu
  • linux8.cs.duke.edu
  • linux9.cs.duke.edu
  • linux10.cs.duke.edu

10x Dell R620 with 2 Intel Xeon E5-2695 v2 processors, 2.40GHz, 30MB cache (256GB RAM - 48 hyperthreaded cores)

  • linux11.cs.duke.edu
  • linux12.cs.duke.edu
  • linux13.cs.duke.edu
  • linux14.cs.duke.edu
  • linux15.cs.duke.edu
  • linux16.cs.duke.edu
  • linux17.cs.duke.edu
  • linux18.cs.duke.edu
  • linux19.cs.duke.edu
  • linux20.cs.duke.edu

8x Dell R610 with 2 Xeon E5540 processors, 2.53GHz, 8MB cache (48GB RAM - 16 cores)

  • linux21.cs.duke.edu
  • linux22.cs.duke.edu
  • linux23.cs.duke.edu
  • linux24.cs.duke.edu
  • linux25.cs.duke.edu
  • linux26.cs.duke.edu
  • linux27.cs.duke.edu
  • linux28.cs.duke.edu
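The totals in the introduction (67 machines, 2,336 cores) can be checked against the configurations listed above. This short Python tally counts each configuration's cores as the page lists them, including the hyperthreaded cores on the Dell R620s:

```python
# (machine count, cores per machine) for each configuration listed above
CONFIGS = {
    "TensorEX TS2-673917-DPN (linux51-60)": (10, 48),
    "Tensor TXR231-1000R (linux41-50)": (10, 40),
    "Quantum TXR430-0512R (gpu-compute1-3)": (3, 32),
    "Quantum TXR113-1000R (gpu-compute4-7)": (4, 40),
    "Dell R610 (linux29-30)": (2, 16),
    "Dell R730 (linux31-40)": (10, 40),
    "Dell R610 (linux1-10)": (10, 16),
    "Dell R620 (linux11-20)": (10, 48),
    "Dell R610 (linux21-28)": (8, 16),
}

machines = sum(n for n, _ in CONFIGS.values())
cores = sum(n * c for n, c in CONFIGS.values())
print(machines, cores)  # prints: 67 2336
```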

Please be aware that compute cluster machines are not backed up. Copy any important data to a filesystem that is backed up to avoid losing it. Also keep in mind that this is a shared resource: minimize network traffic to shared resources such as networked disk space. If you need to read and write lots of data, copy it to a local disk, compute the results there, and then store the results on longer-term storage.
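The copy-in, compute, copy-out pattern described above might look like the following Python sketch inside a job. It assumes the node's local disk is reachable through the directory Python's `tempfile` module picks (`$TMPDIR` if set, otherwise usually `/tmp`), which you should confirm for these machines. `run_with_local_scratch` is a hypothetical helper, and `process` is a placeholder for your own computation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_local_scratch(inputs: Path, results_dir: Path,
                           process: Callable[[Path, Path], None]) -> Path:
    """Copy `inputs` to local scratch, run `process`, copy results back.

    `process(local_input, local_output_dir)` is a placeholder for your code.
    Assumes tempfile's default directory ($TMPDIR or /tmp) is on local disk.
    Returns the directory the results were copied into.
    """
    # The scratch directory and everything in it is deleted on exit,
    # so only the copied-back results survive.
    with tempfile.TemporaryDirectory(prefix="job-") as scratch:
        scratch_dir = Path(scratch)

        # One bulk copy in from shared storage.
        local_input = scratch_dir / inputs.name
        if inputs.is_dir():
            shutil.copytree(inputs, local_input)
        else:
            shutil.copy2(inputs, local_input)

        local_output = scratch_dir / "output"
        local_output.mkdir()
        process(local_input, local_output)  # heavy I/O stays on local disk

        # One bulk copy back to backed-up, longer-term storage
        # (copytree creates results_dir and any missing parents).
        shutil.copytree(local_output, results_dir, dirs_exist_ok=True)
    return results_dir
```

Keeping the many small reads and writes on the local disk and moving data across the network only twice is what keeps the load on shared storage low.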