As a Data Engineer within our healthcare practice, you will be responsible for designing, implementing, and maintaining data pipelines, infrastructure, and platforms to support the customer's data-driven initiatives. You will collaborate with cross-functional teams to ensure the availability, reliability, and integrity of healthcare data for analytics, reporting, and decision-making.
Responsibilities
Data Pipeline Development: Design, build, and maintain scalable data pipelines to ingest, transform, and load structured and unstructured healthcare data from various sources such as electronic health records (EHR), medical devices, billing systems, and external APIs.
Data Integration: Collaborate with data architects and analysts to integrate disparate data sources, ensuring data consistency, integrity, and quality for reporting, analytics, and machine learning applications.
Data Modeling and Storage: Develop and optimize data models and storage solutions, leveraging database management systems (e.g., SQL Server, Oracle, MongoDB) and big data technologies (e.g., Hadoop, Spark) to support efficient data retrieval and analysis.
Data Quality Management: Implement data quality checks, validation rules, and monitoring processes to ensure the accuracy, completeness, and reliability of healthcare data, collaborating with stakeholders to address issues and improve data quality over time.
Infrastructure Management: Manage and optimize data infrastructure, including servers, storage, and cloud-based services (e.g., AWS, Azure, GCP), to ensure scalability, performance, and cost-effectiveness of data storage and processing capabilities.
ETL/ELT Development: Develop and maintain ETL/ELT processes to extract, transform, and load data between systems, ensuring efficient data movement and transformation while adhering to best practices and performance standards.
Data Security and Compliance: Implement security controls, encryption, and access management policies to protect sensitive healthcare data and ensure compliance with regulatory requirements such as HIPAA, GDPR, and HITRUST.
Monitoring and Optimization: Monitor data pipelines, job performance, and system health metrics to identify bottlenecks, optimize performance, and troubleshoot issues in a timely manner to minimize downtime and ensure data availability.
Documentation and Collaboration: Document data engineering processes, workflows, and technical specifications, and collaborate with cross-functional teams, including data scientists, analysts, and software engineers, to ensure alignment and knowledge sharing.
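The pipeline-development and data-quality responsibilities above can be pictured, in highly simplified form, as an extract → validate → transform → load flow. The sketch below is illustrative only: the record shape, field names, and validation thresholds are assumptions, not a real EHR schema, and in a production setting each step would typically run as an orchestrated task (e.g., in Apache Airflow) writing to an actual warehouse.

```python
# Minimal ETL sketch with an inline data-quality gate.
# All record shapes and thresholds below are hypothetical.

RAW_RECORDS = [
    {"patient_id": "P001", "heart_rate": 72, "source": "ehr"},
    {"patient_id": "P002", "heart_rate": 310, "source": "device"},   # implausible vital
    {"patient_id": None, "heart_rate": 65, "source": "billing"},     # missing identifier
]

def extract():
    """Stand-in for pulling records from an EHR export, device feed, or API."""
    return RAW_RECORDS

def validate(records):
    """Data-quality checks: required fields present, vitals in a plausible range.
    Rejected rows are kept so they can be reported back to stakeholders."""
    valid, rejected = [], []
    for r in records:
        if r.get("patient_id") and 20 <= r.get("heart_rate", -1) <= 250:
            valid.append(r)
        else:
            rejected.append(r)
    return valid, rejected

def transform(records):
    """Normalize field names toward a (hypothetical) warehouse schema."""
    return [{"patientId": r["patient_id"], "hr_bpm": r["heart_rate"]} for r in records]

def load(rows):
    """Stand-in for a warehouse write (e.g., a bulk INSERT); returns rows written."""
    return len(rows)

valid, rejected = validate(extract())
loaded = load(transform(valid))
print(loaded, len(rejected))  # → 1 2
```

In practice the validation step is where most of the data-quality responsibility lives: quarantining rejects rather than silently dropping them is what enables the collaborative issue-resolution described above.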
Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
At least 5 years of experience as a data engineer, ETL developer, or similar role, preferably in a healthcare or life sciences environment.
Proficiency in programming languages such as Python, Java, or Scala, as well as SQL for data manipulation and scripting tasks.
Hands-on experience with data pipeline orchestration tools (e.g., Apache Airflow, Luigi) and ETL/ELT frameworks (e.g., Apache Spark, Apache Beam).
Strong understanding of database systems, data modeling principles, and SQL query optimization techniques.
Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and big data technologies (e.g., Hadoop, Kafka) preferred.
Excellent problem-solving skills, attention to detail, and ability to work independently and collaboratively in a dynamic environment.
Knowledge of a master data management (MDM) tool such as Informatica MDM or Semarchy is a strong plus.