Description
The opportunity
We are building the next generation of AI-driven game experiences — generative world models, neural rendering, and multi-modal understanding that turn images, text, and 3D primitives into interactive worlds. As our Staff Machine Learning Engineer, you will be a core technical leader bringing state-of-the-art computer vision and multi-modal models — transformers, diffusion networks, vision-language models (VLMs), and JEPA-style architectures — from research into robust, production-grade systems.
This is a deeply hands-on, high-impact role. You will help define the modeling and deployment strategy, drive architectural decisions across the ML stack, and mentor a team of senior and mid-level engineers. Your work will directly shape the quality, capability, and performance of AI features experienced by billions of players — across cloud, server, and on-device targets.
What you'll be doing
Technical Leadership
- Help set the technical vision and roadmap for computer vision and multi-modal AI models, spanning transformers, diffusion models, vision-language models, and JEPA-style predictive architectures.
- Drive design and implementation of models for image and video understanding, generation, segmentation, detection, and dense prediction, as well as multi-modal reasoning over images, text, and 3D inputs.
- Make sound decisions on model architecture, training strategy, data pipelines, and evaluation — balancing quality, capability, latency, and cost across deployment targets.
- Own the path from research prototype to production: training, fine-tuning, distillation, export, and serving, with deployment spanning cloud GPUs through to efficient on-device inference where the product requires it.
Architecture & Research Translation
- Collaborate directly with research scientists to translate novel CV and multi-modal model architectures into deployable, well-engineered implementations.
- Design scalable systems for multi-modal inference that process diverse inputs (images, video, text, primitives, and metadata) and produce rich outputs ranging from semantic predictions to pixel-level generation.
- Track and rapidly adopt breakthroughs across the field: vision-language pretraining and alignment, efficient diffusion (e.g., consistency models, flow matching), efficient attention (e.g., FlashAttention, linear-attention variants), and tokenization/representation learning for vision.
- Where latency or device constraints demand it, apply compression, quantization, pruning, and knowledge distillation, and work with appropriate runtimes (e.g., TensorRT, ONNX Runtime, CoreML, TFLite) to meet performance budgets.
Team & Cross-Functional Leadership
- Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and rigorous benchmarking and evaluation methodology.
- Partner with research scientists, platform engineers, product managers, and runtime teams to align ML capabilities with product roadmaps and target-platform constraints.
- Champion a culture of measurement: define KPIs for model quality, accuracy, latency, memory, and cost, and ensure the team tracks them rigorously.
What we're looking for
- 6+ years in ML engineering, with significant depth in computer vision and/or multi-modal modeling.
- Proven production experience with transformer-based and diffusion-based vision models (e.g., ViT, CLIP/SigLIP-style encoders, Stable Diffusion, DETR/SAM-style architectures).
- Strong command of the full model lifecycle: data curation, training and fine-tuning, evaluation, and serving at scale.
- Familiarity with efficient attention, diffusion samplers, multi-modal fusion, and vision-language alignment techniques.
- Strong proficiency in Python and modern deep-learning tooling (e.g., PyTorch), plus solid software engineering fundamentals.
- Track record of technical leadership: setting direction, influencing cross-functional partners, and growing engineers.
You might also have
- Experience with world-model, video-generation, or neural rendering pipelines (NeRF, 3DGS, or similar).
- Experience deploying models to constrained or on-device targets, including quantization (INT8/INT4/FP16), pruning, distillation, and runtimes such as CoreML, TFLite, and ONNX Runtime.
- Familiarity with mobile SoC accelerators (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) or compiler stacks such as MLIR, TVM, or XLA.
- Contributions to open-source ML frameworks or peer-reviewed CV/ML research publications.
- Background in real-time graphics APIs (Metal, Vulkan, OpenGL ES) or game engine pipelines.