Design, implement, and maintain a scalable Data Lake on GCP to centralize structured and unstructured data from various sources (databases, APIs, cloud storage).
Utilize GCP services including BigQuery, Dataflow, Pub/Sub, and Cloud Storage to optimize and manage data workflows, ensuring scalability, performance, and security.
Collaborate closely with data analytics and data science teams to understand data needs, ensuring data is properly prepared for consumption by various systems (e.g. DOMO, Looker, Databricks).
Implement best practices for data quality, consistency, and governance across all data pipelines and systems, ensuring compliance with internal and external standards.
Continuously monitor, test, and optimize data workflows to improve performance, cost efficiency, and reliability.
Maintain comprehensive technical documentation of data pipelines, systems, and architecture for knowledge sharing and future development.
Requirements
Bachelor's degree in Computer Science, Data Engineering, Data Science, or a related quantitative field (e.g. Mathematics, Statistics, Engineering).
3+ years of experience with GCP data lake and storage services. GCP certifications are preferred (e.g. Professional Cloud Developer, Professional Cloud Database Engineer).
Advanced proficiency in SQL, including writing complex queries, optimizing them for performance, and applying SQL in large-scale data processing workflows.
Proficiency in programming languages such as Python, Java, or Scala, with practical experience building data pipelines, automating data workflows, and integrating APIs for data ingestion.