Usage
The master node is proj1. Log in to proj1 with your Unix username and password to submit jobs:
ssh $USER@proj1
Important: DO NOT run any jobs on the master node proj1.
Job Submission
We use Slurm to manage our cluster. Guidance can be found at https://corner.cse.cuhk.edu.hk/fac/slurm-qug.html and https://slurm.schedmd.com/
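As a rough illustration of what a submission can look like, the sketch below writes a small batch script and submits it with `sbatch`. The file name `train_job.sh`, the job name, and every resource value are placeholders, and the `--gres=gpu:1` line assumes GPUs are exposed as a `gpu` generic resource on this cluster; check the guides above for the settings that actually apply here.

```shell
# Write a minimal batch script. All values are placeholders, not cluster policy.
cat > train_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo          # hypothetical job name
#SBATCH --gres=gpu:1             # one GPU (assumes GPUs are configured as a "gpu" GRES)
#SBATCH --cpus-per-task=2        # CPU cores for the task
#SBATCH --mem=16G                # host memory
#SBATCH --time=01:00:00          # wall-clock limit
#SBATCH --output=demo-%j.out     # %j expands to the job ID

# These commands run on the allocated compute node, not on proj1.
hostname
nvidia-smi
EOF

# Submit and list your jobs, but only if Slurm is available in this shell.
if command -v sbatch >/dev/null 2>&1; then
    sbatch train_job.sh
    squeue -u "$USER"
fi
```

A queued or running job can be cancelled with `scancel <jobid>`, using the job ID that `sbatch` prints.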
Resources
We have 88 GPUs across 14 nodes. Your home folder $HOME is /mnt/home/xxx.
- Backup folder: the backup disk (shared across all nodes) has 48T of storage and will be expanded to 80T in total (in Dec 2019). This disk is NOT an SSD, so it is slow for frequent reads and writes. You can store your code and environments (e.g., Conda) here, but storing frequently used datasets here is not recommended. As the name indicates, it is for backup. The quota for each user on this disk is 3T.
- Work folder: the local disk in each node is an SSD (also shared across all nodes) and is suitable for frequent reads and writes. An overview of the resources is given in the table below. Two or three users share one node's local disk (as specified in the table). The quota for each user on the local disk is 3T. The remaining space on a local disk is reserved for other users' temporary usage.
- The GPUs assigned to you are not tied to a particular node. You can still access your work folder from other nodes (NFS is already set up), and loading over the network is fast. If the loading speed is not satisfactory, you can temporarily copy your data to the assigned node and delete it after your job has finished. Other users can request deletion of your data on a node when its local disk is full.
- Some commonly used shared datasets can be found under the folder /mnt/backup/project/shareddataset. You can put your datasets here if they would otherwise occupy much of your quota on the backup disk.
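The temporary-copy workflow mentioned above (copy data to the assigned node, run, then delete) can be sketched as a small shell helper. The function name `stage_and_run` and the `DATA_DIR` variable are hypothetical and not part of the cluster setup; the destination would be a directory on the assigned node's local disk, whose path is not given here. The helper deletes the copy even when the job fails, so stale data is not left on a shared disk. A call inside a job script might look like `stage_and_run /path/to/dataset /path/on/local/disk python train.py`, with both paths as placeholders.

```shell
#!/bin/bash
# stage_and_run SRC DST CMD...
# Copy the contents of SRC into DST, run CMD with DATA_DIR set to DST,
# then delete DST whether or not CMD succeeded.
# Hypothetical helper for illustration only.
stage_and_run() {
    local src="$1" dst="$2"
    shift 2

    mkdir -p "$dst"
    cp -r "$src"/. "$dst"/          # copy contents, including hidden files

    local status=0
    DATA_DIR="$dst" "$@" || status=$?   # remember the job's exit code

    rm -rf "$dst"                   # free the local disk for other users
    return "$status"
}
```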
Name | IP | GPU | CPU | Memory | Disk | Disk User |
---|---|---|---|---|---|---|
proj2 | 50.2 | 8 x Titan Xp | 16 | 256G | 7.3T | sjqian, ztyang |
proj3 | 50.3 | 8 x Titan Xp | 16 | 256G | 7.3T | xgxu, lylu |
proj44 | 50.44 | 8 x 2080Ti | 16 | 256G | 7T | hszhao, zttian |
proj45 | 50.45 | 8 x 2080Ti | 16 | 256G | 7T | ycchen, linhj |
proj20 | 50.20 | 8 x Titan X(Pascal) | 16 | 256G | 9.1T | luqi, wenboli |
proj21 | 50.21 | 8 x 2080Ti | 16 | 256G | 12T | yiwang, pgchen |
proj22 | 50.22 | 8 x Titan X(Pascal) | 16 | 256G | 11T | rzwu, ylchen |
proj23 | 50.23 | 8 x Titan X(Pascal) | 16 | 256G | 9.1T | rxwang, lijiang |
proj56 | 50.56 | 4 x Titan V | 40 | 256G | 5.5T | jqcui |
proj57 | 50.57 | 4 x Titan V | 40 | 256G | 5.5T | none |
proj58 | 50.58 | 4 x Titan V | 40 | 256G | 5.5T | none |
proj59 | 50.59 | 4 x Titan V | 40 | 256G | 5.5T | taohu |
proj111 | 50.111 | 3 x 1080 Ti + 1 x Titan X | 12 | 32G | 3.6T | none |
proj112 | 50.112 | 4 x Titan X | 12 | 64G | 1T | none |