Develop and implement a reusable architecture of data pipelines to make data available for various purposes including Machine Learning (ML), Analytics and Reporting
Work collaboratively as part of a team, engaging with system architects, data scientists and business stakeholders in a healthcare context
Define hardware, tools and software to enable the reusable framework for data sharing and ML model productionization
Work comfortably with structured and unstructured data in a variety of programming languages such as SQL, R, Python and Java
Understand distributed programming and advise data scientists on how to structure program code for maximum efficiency
Build data solutions that leverage controls to ensure privacy, security, compliance and data quality
Understand metadata management systems and orchestration architecture when designing ML/AI pipelines
Maintain a deep understanding of cutting-edge cloud technology and frameworks to enable Data Science
System integration skills between Business Intelligence and source transactional systems
Improve the overall production landscape as required
Define strategies with Data Scientists to monitor models post-production
Write unit tests and participate in code reviews
Skill Requirements:
Expert in programming languages such as R, Python, Scala and Java
Expert database knowledge in SQL and experience with MS Azure tools such as Data Factory, Synapse Analytics, Data Lake, Databricks, Azure Stream Analytics and Power BI