The Data Engineer will work alongside data scientists and domain experts to enable teams to answer scientific questions using multi-modal data on the data42 platform. They will be involved in gathering use-case requirements, performing engineering activities for data services, and building ETL processes/data pipelines in quick iterations to deliver data ready for analysis. The Data Engineer will integrate data engineering best practices and data quality checks and seek to continuously optimize efficiency.
Job Description
Your responsibilities will include, but are not limited to:
Collaborates with domain experts, data scientists and other stakeholders to fulfil use-case-specific data needs.
Designs, develops, tests, and maintains ETL processes/data pipelines to extract, prepare and iterate on data for analysis in close alignment with TA / DA scientific leads and data scientists.
Implements and maintains data checks to ensure accurate, high-quality data in close collaboration with domain experts.
Identifies and rectifies data inconsistencies and irregularities.
Promotes a culture of transparency and communication regarding data modifications and lineage to all stakeholders.
Implements and advocates for data engineering best practices, ensuring ETL processes/data pipelines are efficient, well-documented and well-tested.
Plays a role in knowledge sharing across data42 and the wider data engineering community at Novartis.
Ensures compliance with Security and Governance Principles.
Minimum Requirements:
Bachelor's degree in computer science or other quantitative field (Mathematics, Statistics, Physics, Engineering, etc.) or equivalent practical experience.
Proven experience as a data engineer, data wrangler or a similar role.
Exceptional programming skills with expertise in Python, R and Spark.
Experience with a variety of data types, including but not limited to images, tabular data, unstructured data, and text.
Experience with scalable data processing engines, data ingestion, extraction and modeling.
Proficiency in statistics, with the ability to assess data quality and identify errors and inconsistencies.
Good knowledge of data engineering best practices.
Excellent communication and stakeholder management skills.
Demonstrated ability to work independently and as part of global Agile teams.
Desirable additional skills in two or more of the following areas:
Hands-on experience with Palantir Foundry (Code Repository, Code Workbook, Contour, Data Lineage, etc.).
Knowledge of CDISC data standards (SDTM, ADaM).
Experience using AI (e.g., GenAI/LLMs) for data wrangling.
Experience with pooling of clinical trial data.
High-level understanding of the drug discovery and development process.
Skills Desired
Algorithms, Computer Programming, Computer Science, Computer Vision, Data Science, People Management, R&D (Research And Development), Waterfall Project Management