Research Engineer (Machine Translation)
Sanas
Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language communication, and more.
Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.
Sanas is currently one of the fastest-growing startups in Silicon Valley, growing from $16M to $50M ARR in 2025. The company's core business is profitable and is on track to end 2026 with >$120M ARR. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering culture to build and ship cutting-edge AI models and experiences, entirely in-house.
Sanas is a 180-strong team, established in 2020. In this short span, we've secured over $100 million in funding. Our innovation has been supported by the industry's leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and others. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you're not just adopting a product; you're investing in the future of communication.
If you're looking to play a significant role in roadmapping and driving technical direction, to ship big, challenging ideas without much overhead or slowness, and to leave your mark on an ambitious, generational mission to change how the world thinks about speech + AI, then Sanas is a well-suited place for you.
About the Role
Language Translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and get deep in the modeling work — someone who owns translation quality end-to-end across language pairs and drives the fundamental research challenges unique to real-time simultaneous interpretation.
Job Description
Translation quality & modeling
- Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
- Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
- Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
- Identify the highest-leverage research bets — data augmentation, domain adaptation, quality estimation, terminology consistency — and execute on them with measurable quality gains.
Simultaneous interpretation & delimiter modeling
- Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
- Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
- Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.
Research direction & technical leadership
- Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
- Stay at the frontier of MT research: track and evaluate new work, and translate (haha) relevant advances into practical improvements at Sanas.
- Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.
Data & infrastructure
- Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
- Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.
Qualifications
- 3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
- Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
- Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
- Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
- Proficiency in Python and deep learning frameworks (PyTorch preferred).
- Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.
- Fluency in English; working proficiency in at least one non-English language is a strong plus.
Bonus
- Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
- Familiarity with on-device or edge-optimized model deployment for low-latency inference.
- Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
- Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.