SkyPilot
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud. It aims to lower cloud costs, improve GPU availability, and manage job execution for you. It is especially useful for training an ML model in the cloud.
- SkyPilot abstracts away cloud infra burdens
- SkyPilot maximizes GPU availability for your jobs
- SkyPilot cuts your cloud costs
- Repository: https://github.com/skypilot-org/skypilot
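
To make the cost and availability points concrete, here is a toy Python sketch of the general idea: given GPU offers from several clouds, skip the ones with no capacity and pick the cheapest of the rest. This is not SkyPilot's API or its actual optimizer. The `Offer` class, the `pick_offer` function, the cloud and region names, and the prices are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    """One hypothetical GPU offer; all values below are invented."""
    cloud: str
    region: str
    gpu: str
    hourly_usd: float
    available: bool


def pick_offer(offers: List[Offer], gpu: str) -> Optional[Offer]:
    """Return the cheapest offer for `gpu` that currently has capacity.

    Returns None when no cloud can provide the requested GPU.
    """
    candidates = [o for o in offers if o.gpu == gpu and o.available]
    return min(candidates, key=lambda o: o.hourly_usd, default=None)


# Illustrative data only: the cheapest offer is out of capacity,
# so the selection falls over to the next cheapest one.
OFFERS = [
    Offer("cloud-a", "region-1", "A100", 3.10, available=False),
    Offer("cloud-b", "region-2", "A100", 3.60, available=True),
    Offer("cloud-c", "region-3", "A100", 4.20, available=True),
]

if __name__ == "__main__":
    choice = pick_offer(OFFERS, "A100")
    if choice is None:
        print("No capacity for the requested GPU")
    else:
        print(f"Launch on {choice.cloud}/{choice.region} at ${choice.hourly_usd:.2f}/h")
```

Searching across several clouds instead of one is what makes both benefits possible in this sketch: a larger pool of offers raises the chance that some region has capacity, and picking the minimum price among those keeps the cost down.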