Research Data Engineer (26A102)

FreshieHire Author
Salary: Not Disclosed
Location: Bengaluru

Highlights

Transform messy data to gold. Build scalable ML pipelines. Impact real-time speech AI.


Description

Job Summary

Join Smallest.ai as a Research Data Engineer and revolutionize the way data is processed for cutting-edge speech, language, and real-time systems. You will transform messy, noisy data into high-quality datasets that power our models.

Responsibilities

  • Build high-throughput pipelines for audio, text, and multimodal data
  • Design heuristic and ML-based data filtering systems
  • Clean, filter, deduplicate, and normalize multilingual data
  • Create scalable evaluation datasets across languages and domains
  • Develop training data pipelines that continuously improve model performance

Required Skills

  • Data processing at scale (audio/text preferred)
  • Coding skills in Python (systems experience a plus)
  • Multilingual data handling and normalization
  • Experience with ML/data pipelines
  • Understanding of active learning loops and sampling strategies

Required Skills Explained

  • Strong fundamentals in data structures, systems, and pipelines
  • Experience with large-scale data processing (audio/text preferred)
  • Comfortable working with messy, unstructured, real-world data
  • Strong coding skills with Python required; experience with systems is a plus
  • Understanding of ML/data pipelines including training, evaluation, and data curation

Who is this for

If you thrive on working with raw, chaotic data and are passionate about turning it into a competitive advantage, this role is perfect for you. You should enjoy building systems that directly impact model performance.

Why This Job is a Good Opportunity

  • Potential to significantly impact model performance by improving data quality
  • Challenging yet rewarding role that involves transforming raw data into valuable assets for AI models
  • Opportunity to work on cutting-edge, real-time multilingual voice AI systems with global applications

Interview Preparation Tips

  • Prepare examples of how you have improved data quality in previous roles
  • Demonstrate your understanding of ML/data pipelines and their importance
  • Showcase your experience with large-scale data processing, particularly audio/text data
  • Discuss any relevant projects or personal initiatives related to data curation and pipeline optimization

Career Growth in This Role

This role offers a pathway to becoming an expert in data engineering for AI systems. With the increasing importance of high-quality data in machine learning, there are numerous opportunities to expand your skill set and contribute to groundbreaking projects.

As you progress, you might move into more strategic roles within data science or even lead teams focused on improving the overall data ecosystem. The continuous demand for skilled professionals who can handle complex data challenges ensures long-term career growth potential.

Frequently Asked Questions

What kind of experience is required?

Experience with large-scale data processing, Python coding, and understanding of ML/data pipelines is essential.

Is this role suitable for beginners?

No, this position requires a strong foundation in data structures and systems, along with practical experience in data processing.

What benefits can I expect from joining Smallest.ai?

Joining Smallest.ai offers the opportunity to work on groundbreaking projects and contribute directly to model performance improvements.

About the Author

FreshieHire Author
Hi, this is KD. On my blog, you will find the best jobs for freshers, all in one place. We curate jobs from various sources and bring them together for you. Hope you got some value. : )