As a Data Engineer, you will work closely with our Data Scientists and Software Engineers to support the design, development, and maintenance of our data infrastructure and machine learning backend. You will have the opportunity to work with cutting-edge, open-source technologies and contribute to the growth of our data capabilities while being mentored in all aspects of the data engineering life cycle. You will also gain hands-on experience in driving and leading project delivery in the data engineering field.
Responsibilities:
Data Platform / Data Product
Identify, design, and implement internal automated processes, including but not limited to billing, alerts, query usage analysis, and storage/compute resource usage analysis
Ideate, iterate, design, and implement features from scratch or based on existing commercial projects that could provide additional value to platform users
Contribute directly to our platform features, or raise feedback to the Data Infrastructure & Products team, to improve our platform's efficiency and capabilities
Commercial Projects
Lead the end-to-end delivery of a commercial project
Collaborate with cross-functional teams to understand data requirements and develop solutions to meet business needs
Participate in and drive client-facing meetings to finalise details, push back on requirements, or provide status updates
Design and implement various data pipelines in a config-driven approach (see the sketch after this list), including but not limited to ingestion, transformation, and reverse ETL, based on client functional requirements
Design and implement data quality checks at various checkpoints
Collaborate with Machine Learning Engineers to design, create, and maintain various ML data pipelines in a config-driven approach, including but not limited to training, batch inference, deployment, validation, and monitoring
Assist in troubleshooting and resolving data-related issues
Write data quality checks to validate the accuracy, completeness, and consistency of data (a minimal example follows this list)
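To illustrate the config-driven approach mentioned above, here is a minimal sketch of an ingestion/transformation pipeline driven entirely by a declarative config. The config schema, file paths, and column names are hypothetical, not our actual platform conventions:

```python
import yaml        # pip install pyyaml
import pandas as pd

# Hypothetical pipeline config; real schemas are project-specific.
CONFIG = yaml.safe_load("""
source:
  path: raw/orders.csv
transform:
  rename: {order_ts: order_timestamp}
  drop_nulls: [order_id]
sink:
  path: curated/orders.parquet
""")

def run_pipeline(cfg: dict) -> None:
    # Ingest: the source is declared in config, never hard-coded.
    df = pd.read_csv(cfg["source"]["path"])

    # Transform: each step is driven purely by the config contents,
    # so a new pipeline is onboarded by writing config, not code.
    df = df.rename(columns=cfg["transform"]["rename"])
    df = df.dropna(subset=cfg["transform"]["drop_nulls"])

    # Load: write to the configured sink.
    df.to_parquet(cfg["sink"]["path"], index=False)

if __name__ == "__main__":
    run_pipeline(CONFIG)
```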
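And a minimal sketch of the kind of data quality checks the role involves; the column names (order_id, amount) and the non-negativity rule are illustrative assumptions only:

```python
import pandas as pd

def check_quality(df: pd.DataFrame) -> list[str]:
    """Return descriptions of failed checks; an empty list means the batch passes."""
    failures = []

    # Completeness: key columns must not contain nulls.
    if df["order_id"].isna().any():
        failures.append("completeness: null order_id values")

    # Accuracy: values must fall within a plausible range
    # (assumed rule: amounts are non-negative).
    if (df["amount"] < 0).any():
        failures.append("accuracy: negative amount values")

    # Consistency: primary keys must be unique within the batch.
    if df["order_id"].duplicated().any():
        failures.append("consistency: duplicate order_id values")

    return failures

# Usage: run the checks after ingestion and fail the pipeline on any violation.
issues = check_quality(pd.read_parquet("curated/orders.parquet"))
if issues:
    raise ValueError("; ".join(issues))
```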
Requirements:
Bachelor's degree in Computer Science, Engineering, or a related field
Solid foundation in SQL and Python
Understanding of conventional database concepts
Solid understanding of modern data warehouse, lakehouse, or data lake architectures
Hands-on experience with cloud platforms (e.g., AWS, Azure, GCP)
Strong analytical and problem-solving skills
Experience with implementing data quality checks
Experience with data processing frameworks (e.g., dbt)
Experience with data integration and ETL tools (e.g., Airflow)
Experience with one of the modern warehouse/lakehouse solutions (e.g., Delta Lake, Redshift, Snowflake)
Experience with Databricks (on either Azure or AWS) and a solid understanding of Spark
Excellent communication and collaboration abilities
Self-motivated with a passion for learning and continuous improvement
Knowledge / Experience in the following areas is a plus:
One of the MLOps platforms (e.g., MLflow, SageMaker)
Container technology (e.g., Docker, AWS Fargate) and shell scripting
Machine learning, data warehousing (analytical engineering) design, or infrastructure implementation