ML Research Engineer
We’re partnering with a venture-backed AI startup building next-generation visual conversational systems – enabling users to interact with AI through real-time video experiences that feel genuinely human. Their work sits at the intersection of multimodal ML, high-performance infrastructure, and real-time systems. They’re now hiring a Machine Learning Research Engineer to help bring cutting-edge models from research into highly optimized production environments.
This is a hands-on role focused on GPU performance, model acceleration, and scaling multimodal systems in real-world deployments.
Key Responsibilities:
- Collaborate with research teams to productionize experimental models and transition them into reliable, scalable systems.
- Own model performance optimization, profiling inference for latency and throughput improvements using techniques such as quantization, pruning, and distillation.
- Debug and optimize GPU workloads, including CUDA-level performance issues.
- Apply acceleration frameworks (e.g., TensorRT, ONNX, vLLM) to improve multimodal model efficiency across video, speech, and LLM systems.
- Design and implement high-throughput data pipelines for large-scale video and multimedia datasets (petabyte-scale).
- Develop evaluation frameworks to measure model quality and support continuous iteration.
- Work closely with infrastructure teams to build scalable media processing and training workflows.
Key Qualifications:
- 2+ years of full-time experience in ML engineering, ideally in production ML environments.
- Proven experience operationalizing research models into production systems with a focus on inference optimization (latency and throughput improvements).
- Strong proficiency in PyTorch for training and deploying large-scale models.
- Experience running models across large datasets for feature extraction or batch inference workloads.
- Hands-on experience working with GPUs and debugging CUDA-related performance issues.
- Experience with video, audio, or multimodal models preferred.
- Background in a VC-backed startup or high-performing established company is advantageous.
- Willingness to work onsite 5 days per week in Seattle.
- Strong hands-on engineering focus (not a management-leaning or director-level profile).
- Candidates whose ML experience is limited to fine-tuning models without deeper systems or performance work are unlikely to be a fit.
By applying to this role, you acknowledge that we may collect, store, and process your personal data on our systems.
For more information, please refer to our Privacy Notice.