Requesting Resources
TLDR
Remember, Turing is a shared resource, so only use what you absolutely need!
Determining What You Need
The first step to requesting resources is getting a good estimate of what you need. If you plan to run an LLM locally, consider the size of the model and the size of your data. If you are running other software, check the minimum specifications in its documentation. From there, increase your requested resources gradually until you are satisfied with Turing's performance.
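As a starting point for the estimate above, you can approximate an LLM's GPU memory footprint from its parameter count. The formula and overhead factor below are rough rules of thumb, not official figures; real usage also depends on batch size, sequence length, and the KV cache.

```python
# Rough sketch: estimate GPU memory needed to load an LLM for inference.
# Assumptions (ours, not Turing's): fp16 weights (2 bytes/param) and a
# ~20% overhead factor for activations and the KV cache.

def estimate_gpu_mem_gb(n_params_billions: float,
                        bytes_per_param: int = 2,
                        overhead: float = 1.2) -> float:
    """Return an approximate GPU memory requirement in GB."""
    weight_bytes = n_params_billions * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1e9

# A 7B-parameter model in fp16 comes out to roughly 17 GB,
# so a single 24 GB GPU is a reasonable first request.
print(f"{estimate_gpu_mem_gb(7):.1f} GB")
```

If the estimate fits on one GPU, request one GPU; only scale up when the model genuinely cannot fit.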
Remember! Over-requesting resources will result in longer wait times in the queue!
Common Pitfalls
- Requesting many more GPUs than necessary to run your model
- This can actually slow your model in addition to increasing your wait time. By default, many frameworks shard your model across all visible GPUs. This is great for fitting a large model that no single GPU can hold, but it also introduces communication bottlenecks as data is passed from one GPU to another.
- Requesting more CPUs than necessary
- Some programs do need many CPUs to parallelize computation, but unless your code is CPU-bound and actually parallelized, you won't benefit from the spare cores.
- Requesting large amounts of RAM
- This is arguably the most difficult resource to balance. If you hit out-of-memory (OOM) errors and your code itself is not at fault, requesting more RAM is the right fix. Our advice: start small and increase gradually until the errors stop.
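Putting the advice above together: assuming Turing uses a Slurm-style scheduler, a conservative first request might look like the sketch below. The partition name, time limit, and script name are placeholders, so check Turing's own documentation for the real values.

```shell
#!/bin/bash
# Hedged example: a modest starting request for a single-GPU job.
# Partition and script names are placeholders, not Turing's actual values.
#SBATCH --job-name=llm-test
#SBATCH --partition=gpu        # placeholder partition name
#SBATCH --gres=gpu:1           # one GPU avoids cross-GPU communication overhead
#SBATCH --cpus-per-task=4      # a few cores for data loading is usually enough
#SBATCH --mem=32G              # start small; raise only if you hit OOM errors
#SBATCH --time=01:00:00        # shorter jobs also tend to queue faster

python run_model.py            # placeholder for your actual workload
```

If a run fails with OOM or runs too slowly, bump the single resource that was the bottleneck and resubmit, rather than raising everything at once.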