Tips for Good Practice
Avoid too many small or fragmented files
Reason: Many small files force the disk to spend more time seeking data. Ideally, the data you need is stored contiguously.
Solutions:
Save more instances into one larger file. Note that for some file formats the file should not be too big either. For example, opening one huge file in MATLAB takes much longer than opening several smaller ones.
Design an appropriate structure for storing a set of small files. For example, in instance segmentation one image often has many instance masks (more than 20 on some datasets). Instead of saving each instance mask in a separate file, it is much better to save them all in one file and encode each mask with its instance ID.
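The instance-ID encoding described above can be sketched as follows. This is a minimal illustration, not a prescribed format: the function names encode_masks and decode_mask are hypothetical, plain nested lists are used only to keep the example dependency-free, and it assumes that instance masks do not overlap.

```python
# Sketch: pack per-instance binary masks into a single ID map.
# Assumption: masks share the same size and do not overlap.

def encode_masks(masks):
    """Merge binary masks into one map: 0 = background, k = k-th instance."""
    height, width = len(masks[0]), len(masks[0][0])
    id_map = [[0] * width for _ in range(height)]
    for instance_id, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    id_map[y][x] = instance_id
    return id_map


def decode_mask(id_map, instance_id):
    """Recover the binary mask of one instance from the ID map."""
    return [[1 if value == instance_id else 0 for value in row] for row in id_map]


mask_a = [[1, 1, 0],
          [0, 0, 0]]
mask_b = [[0, 0, 0],
          [0, 1, 1]]

id_map = encode_masks([mask_a, mask_b])
print(id_map)  # [[1, 1, 0], [0, 2, 2]]
```

The whole image then needs one file (the ID map) instead of one file per instance, and any single mask can still be recovered with decode_mask.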
Avoid redundant IO reads/writes
Optimize your code to reduce the number of IO operations. This not only speeds up your code but also keeps our servers running smoothly.
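One possible way to cut repeated reads is to cache file contents in memory. The sketch below uses functools.lru_cache from the Python standard library. The function load_text, the counter read_count, and the file name labels.txt are hypothetical, and the approach assumes the file does not change while the program runs.

```python
import functools
import os
import tempfile

read_count = 0  # counts real disk reads, for illustration only


@functools.lru_cache(maxsize=32)
def load_text(path):
    """Read a file from disk; repeated calls with the same path hit the cache."""
    global read_count
    read_count += 1
    with open(path, "r") as f:
        return f.read()


# Demo with a temporary file: 100 requests, one actual read.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "labels.txt")
    with open(path, "w") as f:
        f.write("cat\ndog\n")
    for _ in range(100):
        labels = load_text(path).splitlines()

print(read_count)  # 1
```

Cached data stays in memory, so keep maxsize small when the files are large.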
Always check memory usage
Reason: If memory fills up, the server stops responding to any operation and we can only reboot it. For the same reason, it is worth saving intermediate results in case the server crashes.
Solutions:
If you want to use parfor in MATLAB, first run with a small number of workers and check your program's memory usage. Then increase the number of workers gradually.
Before running experiments, check how much free memory is left. Do not start new experiments when less than 1 GB of memory is free. Memory consumption of some programs varies over time, so leave some headroom for them.
Clear intermediate temporary variables to free memory.
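The free-memory check above could be automated along these lines. This is a sketch under assumptions: it presumes a Linux server that exposes /proc/meminfo with a MemAvailable field, the helper names are hypothetical, and treating the kernel's kB values as KiB to compare against a 1 GB threshold is an interpretation of the rule above.

```python
def available_memory_gb(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found")


def safe_to_start(meminfo_text, min_free_gb=1.0):
    """True if at least min_free_gb of memory is available."""
    return available_memory_gb(meminfo_text) >= min_free_gb


# Sample text in /proc/meminfo format (values made up for the example).
sample = (
    "MemTotal:       65843212 kB\n"
    "MemFree:         1203456 kB\n"
    "MemAvailable:     786432 kB\n"
)
print(safe_to_start(sample))  # False: only 0.75 GiB is available
```

On a real machine the text would come from open("/proc/meminfo").read(), and a launcher script could refuse to start a new experiment when safe_to_start returns False.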
Try a tiny dataset first to check the correctness of your code
When you finish your implementation, try to make your model overfit a tiny dataset to check whether there are bugs in your code. If it cannot overfit, go back and check your code.
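As a toy stand-in for this sanity check, the sketch below trains a one-variable linear model on four points with plain gradient descent and verifies that the training loss drops to nearly zero. The model, data, learning rate, and threshold are all illustrative assumptions. With a real network the idea is the same: train on a handful of samples and expect a near-perfect fit.

```python
def train_tiny(xs, ys, lr=0.1, steps=1000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, loss


# Tiny dataset generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b, final_loss = train_tiny(xs, ys)
# If the loss does not get close to zero here, suspect the training code.
assert final_loss < 1e-6, "cannot fit four points: check the training code"
```

If the assertion fails even on such a small dataset, the problem is more likely a bug (wrong gradient, wrong labels, wrong loss) than a lack of model capacity.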
Be aware of overfitting and underfitting
When you work on a new task, new dataset, or new network structure, there may not be many papers to refer to. You then need to tune the learning rate, batch size, step size, etc. yourself. When training finishes, remember to check the trend of the validation performance to detect overfitting or underfitting.
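A rough way to read the validation trend is sketched below. It works on per-epoch losses rather than accuracy, and the function name diagnose and the thresholds gap_ratio and high_loss are hypothetical values chosen for illustration. Treat its output as a hint to look at the curves, not as a verdict.

```python
def diagnose(train_losses, val_losses, gap_ratio=1.5, high_loss=1.0):
    """Return a rough label and the epoch with the lowest validation loss.

    Thresholds are illustrative assumptions, not recommended values.
    """
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    final_train, final_val = train_losses[-1], val_losses[-1]
    # Validation loss got worse after its best epoch.
    val_rising = final_val > val_losses[best_epoch]
    if val_rising and final_val > gap_ratio * final_train:
        return "overfitting", best_epoch
    if final_train > high_loss:
        return "underfitting", best_epoch
    return "ok", best_epoch


# Training loss keeps falling while validation loss turns upward after epoch 2.
overfit_result = diagnose([1.0, 0.6, 0.3, 0.1, 0.05], [1.1, 0.8, 0.7, 0.9, 1.2])
# Both losses stay high.
underfit_result = diagnose([2.0, 1.9, 1.8], [2.1, 2.0, 1.9])

print(overfit_result)   # ('overfitting', 2)
print(underfit_result)  # ('underfitting', 2)
```

In the overfitting case the returned epoch (2 here, counting from 0) is a reasonable point for early stopping or for picking a checkpoint.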