Together AI
Staff Engineer, Distributed Storage and HPC & AI Infrastructure at Together AI — San Francisco. Staff-level engineering role on the Engineering team.
As published by Together AI on their official careers page.
In this role, you will design and deliver multi-petabyte storage systems purpose-built for the world’s largest AI training and inference workloads. You’ll architect high-performance parallel filesystems and object stores, evaluate and integrate cutting-edge technologies such as WekaFS, Ceph, and Lustre, and drive aggressive cost optimization, routinely achieving 30-50% savings through intelligent tiering, lifecycle policies, capacity forecasting, and right-sizing.
You will also build Kubernetes-native storage operators and self-service platforms that provide automated provisioning, strict multi-tenancy, performance isolation, and quota enforcement at cluster scale. Day-to-day, you’ll optimize end-to-end data paths for 10-50 GB/s per node, design multi-tier caching architectures, implement intelligent prefetching and model-weight distribution, and tune parallel filesystems for AI workloads.
Responsibilities
Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.
We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $250,000 - $300,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.
Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.
Please see our privacy policy at https://www.together.ai/privacy
OpenAI
Software Engineer, Infrastructure Reliability at OpenAI — London, UK. Mid-level engineering role on the Applied AI Infrastructure team.
OpenAI
Solutions Engineering, Ads Solutions at OpenAI — San Francisco. Mid-level engineering role on the Go To Market team.
OpenAI
Software Engineer, Internal Applications - Enterprise at OpenAI — Remote · San Francisco. Intern-level engineering role on the Security team.
OpenAI
AI Deployment Engineer, Startups at OpenAI — San Francisco. Mid-level engineering role on the Technical Success team.
OpenAI
Value Engineer, AI Success - San Francisco at OpenAI — San Francisco. Mid-level engineering role on the AI Success team.
OpenAI
Engineering Manager, Premium at OpenAI — San Francisco. Lead-level engineering role on the Applied AI Engineering team.