Researcher, Recursive Self-Improvement Safety

San Francisco · Research · Mid-level · Full-time · $295,000 – $445,000

Role at a glance

Mid-level researcher role on OpenAI's Preparedness team focused on mitigating risks from recursive self-improvement in frontier AI systems.

→ Scalable oversight for superhuman model regimes
→ Automated auditing of production traffic and tail risks
→ Rigorous monitorability testing including chain-of-thought
→ Model behavior science via experiments and evaluations
→ Maintaining RSI safety cases and addressing blindspots

Who should apply

Strong technical executors at the mid level who can reason strategically about future problems that may not yet exist. Candidates should demonstrate taste in designing mitigations for loss-of-control risks and turning technical work into institutional practices.

Skills & technologies

model evaluation, risk assessment, red-teaming, experiment design, misalignment monitoring, training interventions, automated auditing, safety case development

Full job description

As published by OpenAI on their official careers page.

About the team

Preparedness is a critical Safety Research team at OpenAI, which is focused on mitigating AI threats that could scale to an extreme level of severity.

Our work involves:

Tracking and prediction. Monitoring and predicting the evolving misalignment propensities and capabilities of frontier AI systems.
Mitigation. Keeping misuse safeguards, alignment tools, and security measures on track to adequately address extreme threats that might arise in the future.
Coordination. Setting mitigation targets by maintaining OpenAI’s preparedness framework, and partnering with other staff to achieve these targets.

This is urgent, fast-paced work that has far-reaching implications for the company and for society.

About the role

Preparedness is hiring strong technical executors to support preparations for recursive self-improvement. This work relies on reasoning about problems that might exist in the future but do not exist yet, so it’s especially important that people in this role have good taste and think strategically.

The role is wide-ranging, covering any mitigation for loss-of-control risk: designing and implementing better pre-deployment risk assessment, control measures, and RSI-relevant training interventions, and turning one’s technical work into established institutional practices.

Below is a subset of our focus areas:

Scalable oversight: Establishing practices for model misbehavior monitoring and oversight which remain effective in superhuman model capability regimes, with a focus on bridging from today’s monitoring approaches to future-proof ones.
Automated auditing: As model capabilities increase, we’ll increasingly rely on automated approaches for finding the most severe forms of model misalignments. We’ll both need to sift through large swaths of production traffic to find the most egregious misalignments, and reliably elicit tail risks before deployment.
Rigorous monitorability: Rigorous testing and red-teaming of our measurements of model misbehavior related to loss-of-control (e.g. reward hacking, sandbagging, scheming). This includes better understanding monitorability and, for example, preparing for potential losses of Chain-of-Thought monitorability.
Model behavior science: Design experiments and evaluations to understand the extent to which models are problematically misaligned, or to which their safety-relevant capabilities lag behind dangerous capabilities. This may include training model organisms of misbehavior for behaviors not currently present in production, or training interventions to increase safety-relevant capabilities.
Coordination and verification: Prototype technical mechanisms for verifying compliance with potential future AI safety agreements.
AI R&D risk measurement: Track progress toward automation of technical staff to inform OpenAI’s near-term investments in alignment and security.
Maintaining and strengthening RSI safety cases: We’re especially interested in identifying and addressing blindspots of mitigation areas which we may have missed.

Generally, our team alternates between performing rigorous hypothesis-driven research and turning our insights into interventions or control systems which impact production models, with occasional support of engineering teams.

In this role, you will:

Carefully consider the problems OpenAI might face in the future and how to prepare for them.
Turn an open-ended objective like “prepare for future security threats” into a much more concrete direction (e.g. “implement monitors for data poisoning”) – prioritizing the work that is most useful to start right now.
Execute quickly, building scrappy prototypes, and then improving them iteratively until they become established components of our safety pipelines.
Secure buy-in from other staff at OpenAI when necessary, and communicate your work clearly.
Collaborate with or manage other staff as needed, since we might need to scale rapidly to tackle these problems.

You might thrive in this role if you:

Are an exceptional technical executor.
Have strong strategic and research taste: you can prioritize effectively in domains with weak feedback loops.
Are passionate about mitigating the risks associated with recursive self-improvement.
Are driven by a desire to do whatever work most positively impacts the future of AI development.
Bonus: you have already done work in one of the domains listed above (ML research, AI alignment, AI verification, etc.).

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
