ML Pipeline Engineer (m/f)
We are looking for an ML Pipeline Engineer to join our Data Science team and support machine learning pipelines for computer vision and LLM-based projects.
The role focuses on building, optimizing, and maintaining training, validation, data processing, and deployment workflows. The ideal candidate is a hands-on engineer who can make ML development faster, more reliable, and more scalable across both local environments and AWS.
About us:
SAFR from RealNetworks is a unified ecosystem specializing in face-based computer vision solutions optimized for real-world performance. Building on a legacy of digital media expertise and innovation, RealNetworks has created a new generation of products that employ best-in-class artificial intelligence and machine learning to enhance and secure our daily lives.
Key Responsibilities:
- Build, maintain, and optimize ML training and validation pipelines.
- Support acquisition, preparation, labeling, organization, and quality control of training data.
- Streamline data processing workflows for training, validation, testing, and experimentation.
- Collaborate with data scientists to turn experimental workflows into repeatable, automated pipelines.
- Optimize pipelines for both local development environments and AWS infrastructure.
- Maintain and improve automated deployment workflows for ML models and related services.
- Develop Python scripts, utilities, and automation tools to improve data science productivity.
- Troubleshoot pipeline failures, data quality issues, infrastructure bottlenecks, and deployment problems.
- Contribute to CI/CD workflows for model training, testing, validation, and release.
- Maintain documentation for pipelines, deployment processes, and operational best practices.
Required Qualifications:
- 3+ years of experience in machine learning engineering, data engineering, MLOps, software engineering, or a related technical role.
- Bachelor’s degree in Engineering, Mathematics, Computer Science, Data Science, or a related field, or equivalent academic/practical experience.
- Strong programming skills in Python.
- Experience working in Linux-based environments, including shell scripting, command-line tools, and automation.
- Practical understanding of ML workflows, including data preparation, training, validation, and deployment.
- Familiarity with ML and data science libraries such as NumPy, pandas, scikit-learn, PyTorch, or LangChain.
- Understanding of the AWS ecosystem, including services such as EC2, Lambda, S3, IAM, and CloudWatch, as well as CI/CD tools.
- Experience building or maintaining automated pipelines for data processing, model training, validation, or deployment.
- Strong debugging, problem-solving, communication, and documentation skills.
Preferred Qualifications:
- Experience with computer vision pipelines, including image or video data processing.
- Experience working with LLM applications, embeddings, retrieval-augmented generation, or LangChain-based workflows.
- Familiarity with lower-level programming languages such as C or C++.
- Experience with Docker, containerized training environments, or reproducible ML workflows.
- Familiarity with CI/CD systems such as GitHub Actions, GitLab CI, Jenkins, or AWS CodePipeline.
- Experience managing GPU-based workloads locally or in the cloud.
- Familiarity with dataset versioning, experiment tracking, data validation, and model evaluation best practices.
- Understanding of MLOps principles and production ML system design.