Apollo HealthAxis - Data Engineer - Google Cloud Platform (3-6 yrs)
Apollo Health Axis
Role Overview
We are looking for a highly skilled and experienced GCP Data Engineer to design, build, and maintain robust and scalable data solutions using Google Cloud Platform (GCP).
The ideal candidate will have in-depth knowledge of GCP data services, hands-on experience with advanced data engineering practices, and the ability to optimize large-scale data systems for performance and efficiency.
Key Responsibilities :
Data Architecture & Modeling :
- Design and implement scalable data architectures for both structured and unstructured datasets.
- Develop and maintain data models for OLAP/OLTP systems and data warehouses using best practices.
- Build end-to-end data pipelines, from ingestion to visualization, ensuring reliability and performance.
GCP Expertise :
Expert-level understanding of GCP services including :
- BigQuery : Schema design, performance tuning, partitioning, clustering, query optimization.
- Cloud Storage : Designing storage solutions, lifecycle management, and integration with other GCP services.
- Cloud Pub/Sub : Streaming data ingestion and real-time messaging.
- Cloud Dataflow : Batch and stream data processing with Apache Beam.
- Cloud Dataproc : Managing Hadoop/Spark clusters for advanced data processing.
- Cloud Composer : Workflow orchestration using Apache Airflow.
- Experience with Vertex AI or other ML services on GCP is a plus.
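For illustration, a minimal sketch of the kind of workflow Cloud Composer orchestrates: an Airflow DAG with a single BigQuery task. It assumes Airflow 2.x with the Google provider package installed; the DAG id, project, dataset, and table names are placeholders, not part of any actual stack.

```python
# Minimal Airflow DAG sketch for Cloud Composer; all names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_sales_refresh",          # hypothetical DAG name
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Rebuild an aggregate table from a raw staging table (placeholder names).
    refresh_daily_sales = BigQueryInsertJobOperator(
        task_id="refresh_daily_sales",
        configuration={
            "query": {
                "query": """
                    CREATE OR REPLACE TABLE `my-project.analytics.daily_sales` AS
                    SELECT sale_date, SUM(amount) AS total_amount
                    FROM `my-project.staging.sales`
                    GROUP BY sale_date
                """,
                "useLegacySql": False,
            }
        },
    )
```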
Data Integration :
- Build, optimize, and maintain robust ETL/ELT pipelines for data ingestion and transformation.
- Use Apache Beam, Google Cloud Dataflow, or third-party ETL tools (e.g., Informatica, Talend, or Fivetran); see the Beam sketch after this list.
- Integrate on-premise data systems with GCP using secure and efficient methods.
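The Beam sketch referenced above, assuming the Apache Beam Python SDK and placeholder bucket, table, and column names: a batch pipeline that reads CSV files from Cloud Storage, parses them, and appends the rows to a BigQuery staging table. It runs locally on the DirectRunner or on Dataflow with --runner=DataflowRunner.

```python
# Minimal Apache Beam batch pipeline sketch (Python SDK); names are placeholders.
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_row(line: str) -> dict:
    """Turn one CSV line into a BigQuery-ready dict (hypothetical two-column schema)."""
    order_id, amount = next(csv.reader([line]))
    return {"order_id": order_id, "amount": float(amount)}


def run() -> None:
    options = PipelineOptions()  # pass --project/--region/--runner on the command line
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/orders/*.csv", skip_header_lines=1)
            | "Parse" >> beam.Map(parse_row)
            | "Load" >> beam.io.WriteToBigQuery(
                "my-project:staging.orders",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```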
Data Warehousing :
- Develop and optimize enterprise-grade data warehouses in BigQuery.
- Leverage BigQuery features like partitioning, clustering, UDFs, materialized views, and BI Engine for performance optimization.
- Build reusable data marts and manage metadata for efficient querying.
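As a sketch of the BigQuery features listed above, the snippet below creates a partitioned, clustered table and a materialized view through the Python client library; project, dataset, and column names are placeholders.

```python
# BigQuery partitioning, clustering, and materialized-view sketch; names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Partition by event date and cluster by the most common filter columns,
# so queries prune partitions and scan less data.
client.query(
    """
    CREATE TABLE IF NOT EXISTS `my-project.analytics.events`
    (
      event_ts   TIMESTAMP,
      user_id    STRING,
      event_type STRING
    )
    PARTITION BY DATE(event_ts)
    CLUSTER BY user_id, event_type
    """
).result()

# Materialized view that pre-aggregates a hot query path.
client.query(
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS `my-project.analytics.daily_event_counts` AS
    SELECT DATE(event_ts) AS event_date, event_type, COUNT(*) AS events
    FROM `my-project.analytics.events`
    GROUP BY event_date, event_type
    """
).result()
```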
Programming & Scripting :
- Strong programming skills in Python, Java, or Scala for data processing and pipeline development.
- Proficiency in SQL with the ability to write complex queries, window functions, CTEs, and optimize for performance.
- Familiarity with version control systems like Git for collaborative development.
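By way of example for the SQL expectations above, a query combining a CTE and window functions, kept as a Python constant so it can be passed to the BigQuery client; the table and column names are placeholders and Standard SQL is assumed.

```python
# CTE plus window functions: rank each customer's recent orders and compute a running total.
RECENT_ORDER_RANKS = """
WITH recent_orders AS (
  SELECT customer_id, order_id, order_ts, amount
  FROM `my-project.sales.orders`
  WHERE order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
)
SELECT
  customer_id,
  order_id,
  amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS order_rank,
  SUM(amount) OVER (PARTITION BY customer_id) AS customer_total_90d
FROM recent_orders
"""
```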
Streaming Data Processing :
- Design and implement real-time data pipelines for event-driven architectures.
- Work with Apache Kafka, Google Cloud Pub/Sub, or similar tools for streaming data ingestion and processing.
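A minimal Pub/Sub sketch for the streaming items above, assuming the google-cloud-pubsub Python client; the project, topic, and subscription IDs are placeholders.

```python
# Publish one JSON event, then consume events with a streaming pull subscriber.
import json
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"  # placeholder

# Publish a single JSON-encoded event.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, "order-events")
publisher.publish(
    topic_path, json.dumps({"order_id": "o-123", "amount": 42.0}).encode("utf-8")
).result()

# Consume events via streaming pull.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, "order-events-sub")


def handle(message: pubsub_v1.subscriber.message.Message) -> None:
    print(json.loads(message.data))
    message.ack()  # acknowledge so the message is not redelivered


streaming_pull = subscriber.subscribe(subscription_path, callback=handle)
try:
    streaming_pull.result(timeout=30)  # block briefly for the sketch
except TimeoutError:
    streaming_pull.cancel()
```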
Infrastructure as Code (IaC) :
- Deploy and manage GCP resources using IaC tools like Terraform, Google Cloud Deployment Manager, or Ansible.
Containerization & Orchestration :
- Develop containerized applications using Docker and deploy them on Kubernetes or Google Kubernetes Engine (GKE).
- Monitor and manage distributed systems using tools like Prometheus and Grafana.
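As one way a containerized worker can satisfy the Prometheus item above, it can expose metrics with the prometheus_client Python library for Prometheus to scrape; the metric names and port below are placeholders.

```python
# Expose Prometheus metrics from a long-running worker; names and port are placeholders.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

RECORDS_PROCESSED = Counter("records_processed_total", "Records processed by the worker")
QUEUE_DEPTH = Gauge("input_queue_depth", "Records currently waiting to be processed")


def process_batch() -> None:
    """Stand-in for real work; updates the metrics Prometheus will scrape."""
    batch = random.randint(1, 100)
    QUEUE_DEPTH.set(batch)
    RECORDS_PROCESSED.inc(batch)


if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    while True:
        process_batch()
        time.sleep(5)
```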
Data Security & Compliance :
- Implement secure data practices including encryption (at rest/in transit), IAM policies, and VPC configurations.
- Ensure compliance with GDPR, CCPA, or other relevant data privacy regulations.
Monitoring & Optimization :
- Use GCP-native tools like Cloud Monitoring and Cloud Logging to monitor performance and troubleshoot issues.
- Implement data quality checks, logging, and alerts to ensure system reliability.
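A sketch of a simple data quality check in the spirit of the items above: count today's rows in a BigQuery table and emit a log entry that a Cloud Monitoring log-based alert can act on. The threshold, project, and table names are placeholders.

```python
# Freshness/row-count check with logging routed to Cloud Logging; names are placeholders.
import logging

import google.cloud.logging
from google.cloud import bigquery

MIN_EXPECTED_ROWS = 1_000  # assumed freshness threshold

google.cloud.logging.Client().setup_logging()  # route stdlib logging to Cloud Logging
client = bigquery.Client(project="my-project")

row = next(iter(client.query(
    """
    SELECT COUNT(*) AS row_count
    FROM `my-project.analytics.events`
    WHERE DATE(event_ts) = CURRENT_DATE()
    """
).result()))

if row.row_count < MIN_EXPECTED_ROWS:
    # A log-based alerting policy can page on this message.
    logging.error("Data quality check failed: only %d rows loaded today", row.row_count)
else:
    logging.info("Data quality check passed: %d rows loaded today", row.row_count)
```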
Shift Flexibility :
- Willingness to work in shifts during the initial two months to support project setup and delivery.
Required Skills & Qualifications :
Core Skills :
- Advanced expertise in BigQuery query design, optimization, and performance tuning.
- Strong programming skills in Python (Pandas, NumPy, PySpark), Java, or Scala.
- Extensive experience with ETL/ELT pipelines and tools (Apache Beam, Dataflow, Informatica).
- Hands-on knowledge of batch and stream processing frameworks.
GCP Data Services :
- BigQuery, Cloud Storage, Pub/Sub, Dataflow, Dataproc, Composer, Dataprep, and Vertex AI.
Development & Automation :
- Terraform, Deployment Manager, and CI/CD pipelines (e.g., Jenkins, GitLab CI/CD).
Databases & Querying :
- Proficiency in SQL, NoSQL databases (Firestore, Bigtable), and distributed databases.
Other Tools :
- Docker, Kubernetes (GKE), Apache Airflow, Apache Kafka.
- Monitoring tools like Prometheus, Grafana, and Cloud Monitoring.
Soft Skills :
- Strong analytical and problem-solving skills.
- Excellent communication skills and the ability to work collaboratively in a team.
Preferred Qualifications :
- GCP certifications such as Professional Data Engineer or Cloud Architect.
- Knowledge of advanced machine learning and AI on GCP.
- Familiarity with data visualization tools like Looker or Tableau integrated with GCP.
Functional Areas: Software/Testing/Networking