Senior Software Engineer, Inference

Pika

Palo Alto HQ · Engineering · Senior · Full-time · $185,000 – $250,000

Role at a glance

Pika seeks a senior inference engineer to optimize AI model performance using GPU parallelism and advanced deployment techniques in Palo Alto.

→Lead advanced inference acceleration including quantization and attention optimization
→Engineer GPU parallelism strategies across TP, SP, and PP
→Develop high-performance kernels using CUDA and NCCL
→Collaborate on videogen and LLM deployment into production
→Drive code reviews and mentor engineers on inference best practices

Who should apply

Candidates need 5+ years of engineering experience with a proven record in inference acceleration and large-scale model deployment. They must demonstrate expertise in quantization, attention optimization, CUDA, NCCL, and distributed parallelism strategies including TP, SP, and PP. Ideal applicants also bring familiarity with videogen models and LLMs plus strong cross-team collaboration skills.

Skills & technologies

inference optimization, quantization, CUDA, NCCL, tensor parallelism, sequence parallelism, pipeline parallelism, videogen models

Full job description

As published by Pika on their official careers page.

About the Role

We are seeking a Senior Inference Engineer to accelerate the performance of Pika's AI-driven products. In this highly technical role, you will operate at the intersection of cutting-edge inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale.

You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what’s possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of Pika’s video and language models.

What You’ll Do

Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art videogen and large language models into production.
Improve Training Efficiency: (Bonus) Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.
Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

What We’re Looking For

Experience: 5+ years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.
Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
GPU & Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
AI Domain Knowledge: Familiarity with video generation (videogen) models and large language models (LLMs).
Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.
Bonus: Experience in enhancing training efficiency, stability, or resource optimization for large models.

Nice to Have

Experience with high-throughput video or real-time streaming model deployment
Familiarity with distributed training and optimization toolkits
Contributions to open source projects in AI infrastructure or deep learning compilers
Startup or rapid prototyping experience

What We Offer

Competitive salary in the AI industry
Equity in a fast-growing startup shaping the future of AI
Comprehensive health benefits, monthly stipends, company retreats
A supportive and collaborative office culture—we’re all building and launching together

About Pika

At Pika, we're crafting a future where video creation is seamless, intuitive, and universally accessible. Our mission is to empower creativity by breaking down technical barriers using the transformative power of AI. We’re a tight-knit, energetic team based in Palo Alto, CA, valuing efficiency, curiosity, and the ambition to make a meaningful impact on the world.

We work from our Palo Alto office 3–5 days a week and welcome applicants who are eager to contribute onsite.

