
Ensure data accuracy and reliability; design and execute test cases; monitor and trigger pipelines; perform root cause analysis (RCA); develop automated testing frameworks
As a Data Quality Engineer, you will be responsible for ensuring the accuracy, reliability, and integrity of our data pipelines and workflows. This role calls for hands-on data engineering experience, with a strong focus on quality testing, validation, and pipeline orchestration.
Your primary responsibilities include designing, developing, and executing data quality test cases that validate data pipelines and ETL/ELT processes. You will also monitor and trigger data pipelines to ensure smooth execution and timely data delivery, and you will run and maintain data quality scripts that identify anomalies, inconsistencies, and data integrity issues.
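To make this concrete, the short sketch below illustrates the kind of check such a script might run; the rows and the order_id/customer_id/amount columns are hypothetical sample data rather than any specific pipeline's schema.

```python
"""Illustrative data quality checks on a handful of hypothetical rows."""

from collections import Counter

# Hypothetical records, as they might come back from a warehouse query.
rows = [
    {"order_id": 1, "customer_id": 101, "amount": 25.0},
    {"order_id": 2, "customer_id": None, "amount": 40.0},  # null foreign key
    {"order_id": 2, "customer_id": 103, "amount": 40.0},   # duplicate primary key
]

def check_not_null(records, column):
    """Return the records where a required column is missing or null."""
    return [r for r in records if r.get(column) is None]

def check_unique(records, column):
    """Return the values of `column` that appear more than once."""
    counts = Counter(r[column] for r in records)
    return [value for value, n in counts.items() if n > 1]

violations = {
    "null_customer_id": check_not_null(rows, "customer_id"),
    "duplicate_order_id": check_unique(rows, "order_id"),
}

for name, result in violations.items():
    print(f"{name}: {'FAIL ' + str(result) if result else 'PASS'}")
```

In practice the records would come from a query against the warehouse, and failures would feed a report or alert rather than being printed.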
You will collaborate with data engineers to implement data quality checks at various stages of the pipeline and perform root cause analysis (RCA) for data anomalies and pipeline failures. Troubleshooting pipeline failures and data quality issues efficiently will be another critical aspect of this position. Documenting data quality standards, testing procedures, and validation results is essential.
Generating data quality reports and communicating findings to engineering teams is an important part of your role, as is developing automated testing frameworks to improve the efficiency of data quality validation. You will focus primarily on validating and assuring the quality of existing pipelines rather than building full pipelines.
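One possible shape for such a framework, sketched purely as an illustration, is a set of parameterized pytest tests; the in-memory SQLite table below only stands in for a real warehouse so the example stays self-contained, and the rule names and SQL are assumptions.

```python
"""Hypothetical pytest-based data quality suite; SQLite stands in for the warehouse."""

import sqlite3

import pytest

@pytest.fixture(scope="module")
def conn():
    # Tiny in-memory table with illustrative sample data.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 101, 25.0), (2, 102, 40.0)])
    yield db
    db.close()

# Each rule pairs a name with SQL that counts violating rows; both are hypothetical.
QUALITY_RULES = [
    ("no_null_customer_id", "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"),
    ("amount_non_negative", "SELECT COUNT(*) FROM orders WHERE amount < 0"),
]

@pytest.mark.parametrize("name,sql", QUALITY_RULES, ids=[r[0] for r in QUALITY_RULES])
def test_quality_rule(conn, name, sql):
    violations = conn.execute(sql).fetchone()[0]
    assert violations == 0, f"{name}: {violations} violating rows"
```

Keeping the rules as data means adding a new check is a one-line change, while the test runner handles reporting and CI integration.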
To succeed in this role, you should have a strong understanding of data engineering concepts, including ETL/ELT processes, data warehousing, and data modeling. Proficiency in SQL for complex data validation and querying is essential, as is experience with a scripting language such as Python or shell for automation.
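As an example of the kind of SQL-driven validation this implies, the sketch below pairs a source-to-target row-count reconciliation with a referential integrity check; the schema and table names (source.orders, warehouse.orders, warehouse.customers) and the injected run_query helper are hypothetical.

```python
"""Illustrative validation queries; table names and the query runner are hypothetical."""

# Row-count reconciliation between the source system and the warehouse copy.
ROW_COUNT_SQL = """
    SELECT
        (SELECT COUNT(*) FROM source.orders)    AS source_rows,
        (SELECT COUNT(*) FROM warehouse.orders) AS target_rows
"""

# Referential integrity: orders that point at customers that do not exist.
ORPHANED_ORDERS_SQL = """
    SELECT COUNT(*)
    FROM warehouse.orders o
    LEFT JOIN warehouse.customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
"""

def validate(run_query):
    """Run both checks with an injected function that returns the first result row."""
    source_rows, target_rows = run_query(ROW_COUNT_SQL)
    assert source_rows == target_rows, f"row count mismatch: {source_rows} vs {target_rows}"

    (orphans,) = run_query(ORPHANED_ORDERS_SQL)
    assert orphans == 0, f"{orphans} orders reference missing customers"
```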
Hands-on experience with data pipeline orchestration tools (e.g., Apache Airflow, Azure Data Factory, AWS Glue) is required, along with knowledge of data quality frameworks and tools. Familiarity with cloud platforms (AWS, Azure, or GCP) and their data services, as well as an understanding of data formats such as JSON, Parquet, Avro, and CSV, is a must.
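To show where a quality gate could sit inside an orchestrated pipeline, here is a minimal sketch assuming Apache Airflow 2.4 or later; the DAG name, task names, and placeholder callables are invented for illustration rather than taken from any existing pipeline.

```python
"""Hypothetical Airflow DAG with a data quality gate between load and publish."""

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_quality_checks():
    """Placeholder: would run the validation queries and raise on any failure,
    which fails this task and blocks the downstream publish step."""
    failures = []  # populate from real checks
    if failures:
        raise ValueError(f"data quality checks failed: {failures}")

with DAG(
    dag_id="orders_quality_gate",  # hypothetical name
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    load = PythonOperator(task_id="load_orders", python_callable=lambda: None)
    gate = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)
    publish = PythonOperator(task_id="publish_orders", python_callable=lambda: None)

    load >> gate >> publish  # publish runs only if the checks pass
```

Because a failing check raises an exception, the gate task fails and downstream publication never happens, which is one common way to keep bad data out of consumer-facing tables.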
Experience with big data technologies such as Spark, Hadoop, and Kafka, knowledge of CI/CD practices for data pipelines, familiarity with version control systems (Git), and an understanding of data governance and compliance requirements are all preferred. Experience with data visualization tools for quality reporting will also be beneficial.
This is a fantastic opportunity to join a dynamic team that values collaboration and innovation. If you possess the skills and experience outlined above, we encourage you to apply for this position today!