Software Engineer, Inference

Pika

Palo Alto HQ · Engineering · Mid-level · Full-time · $185,000 – $250,000

Role at a glance

Software Engineer, Inference at Pika — Palo Alto HQ. Mid-level engineering role on the Engineering team.

→Based in Palo Alto HQ
→Engineering team
→$185,000 – $250,000
→Full-time

Who should apply

Skills & technologies

go, cuda, llm, deep learning, scala

Full job description

As published by Pika on their official careers page.

About the Role

What You’ll Do

Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art videogen and large language models into production.
Improve Training Efficiency: (Bonus) Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.
Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

What We’re Looking For

Experience: 3+ years engineering experience, with a strong track record in inference acceleration and model deployment at scale.
Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
GPU & Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
AI Domain Knowledge: Familiarity with video generation (videogen) models and large language models (LLMs).
Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.
Bonus: Experience in enhancing training efficiency, stability, or resource optimization for large models.

Nice to Have

Experience with high-throughput video or real-time streaming model deployment
Familiarity with distributed training and optimization toolkits
Contributions to open source projects in AI infrastructure or deep learning compilers
Startup or rapid prototyping experience

What We Offer

Competitive salary in the AI industry
Equity in a fast-growing startup shaping the future of AI
Comprehensive health benefits, monthly stipends, company retreats
A supportive and collaborative office culture—we’re all building and launching together

About Pika

We work from our Palo Alto office 3–5 days a week and welcome applicants who are eager to contribute onsite.

Related roles

Platform Operations Program Manager

OpenAI

Remote · San Francisco · Engineering · Lead / Manager · $252,000 – $280,000

Lead operational processes for OpenAI's developer ecosystem, partnering cross-functionally to scale platform growth while upholding trust and safety.

program management, developer operations, apis, oauth

Forward Deployed Engineer - Stockholm

OpenAI

Remote · Stockholm, Sweden · Engineering · Mid-level

Forward Deployed Engineer - Stockholm at OpenAI — Remote · Stockholm, Sweden. Mid-level engineering role on the Model Deployment for Business team.

Engineering Manager, Identity & Access Platform

OpenAI

4d ago

Remote · San Francisco · Engineering · Lead / Manager · $293,000 – $490,000

Engineering Manager, Identity & Access Platform at OpenAI — Remote · San Francisco. Lead-level engineering role on the Security team.

go, rust, aws, rag

System Performance Engineer, Consumer Devices

OpenAI

Remote · San Francisco · Engineering · Mid-level · $293,000 – $325,000

System Performance Engineer, Consumer Devices at OpenAI — Remote · San Francisco. Mid-level engineering role on the Software team.

python, rust, c++, aws

Manager, Forward Deployed Engineering

OpenAI

3d ago

San Francisco · Engineering · Lead / Manager · $280,000 – $335,000

Manager, Forward Deployed Engineering at OpenAI — San Francisco. Lead-level engineering role on the Model Deployment for Business team.

Full Stack Software Engineer, API Experience

OpenAI