Design and implement data pipelines that efficiently extract, transform, and load (ETL) data from a variety of sources into BigQuery
Create and manage workflows using Apache Airflow
Build DAGs (Directed Acyclic Graphs) to organize and schedule data processing tasks, ensuring data pipelines run reliably and on time (see the DAG sketch after this list)
Write Python scripts to accelerate the addition of new features to the datasets
Drive the architecture, operations, and execution of relevant technical and product strategies for one of the largest sites in the world
Create and maintain data models and schemas within BigQuery to ensure data is structured for efficient querying and analysis
Identify and apply performance optimizations such as partitioning, clustering, and caching for both BigQuery queries and Airflow workflows (see the partitioning sketch after this list)
Implement data quality controls to ensure the reliability and accuracy of the data
Set up monitoring and alerting systems to proactively identify and address pipeline or job issues (see the alerting sketch after this list)
Put access controls and security mechanisms in place to safeguard sensitive data in BigQuery
Establish permissions and roles for different teams and users
Ingest data into the pipelines from a range of sources and platforms, such as APIs, databases, logs, and third-party services
Automate ETL processes with Airflow and other automation tools to boost productivity and reduce manual intervention
Maintain data governance and compliance with relevant data standards and regulations, including data anonymization, data retention policies, and data privacy protection
Collaborate closely with data analysts, data scientists, and other stakeholders to understand their data requirements and support data-related projects
Identify and resolve data pipeline issues, performance bottlenecks, and other technical problems
Maintain documentation for data pipelines, ETL procedures, and workflow configurations to support knowledge sharing and the onboarding of new team members
Stay current with the latest BigQuery and Airflow developments and continually improve data engineering practices
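
For illustration, a minimal sketch of the kind of Airflow DAG this role involves. The DAG ID, task names, and callables are hypothetical placeholders, not an actual pipeline:

```python
# A minimal two-step ETL DAG sketch; names are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_source_data(**context):
    # Placeholder: pull rows from an upstream API or database.
    pass


def load_to_bigquery(**context):
    # Placeholder: write the transformed rows into a BigQuery table.
    pass


with DAG(
    dag_id="example_etl_pipeline",  # hypothetical name
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
    load = PythonOperator(task_id="load", python_callable=load_to_bigquery)

    extract >> load  # load runs only after extract succeeds
```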
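A sketch of the partitioning work mentioned above, using the google-cloud-bigquery client. The project, dataset, table, and column names are assumptions for illustration:

```python
# Create a day-partitioned, clustered BigQuery table (illustrative names).
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",  # hypothetical table reference
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Partition by day on the event timestamp so queries that filter on a date
# range scan only the relevant partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
# Cluster within each partition so filters on user_id prune blocks further.
table.clustering_fields = ["user_id"]

client.create_table(table)
```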
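And a sketch of the alerting work, wired through an Airflow on_failure_callback. The webhook URL is a hypothetical stand-in; email or PagerDuty notifications would slot in the same way:

```python
# Post a short alert when any task fails; the webhook target is an assumption.
import json
import urllib.request


def alert_on_failure(context):
    """Called by Airflow when a task fails; sends a short alert message."""
    ti = context["task_instance"]
    message = f"Task {ti.task_id} in DAG {ti.dag_id} failed on {context['ds']}"
    req = urllib.request.Request(
        "https://hooks.example.com/alerts",  # hypothetical webhook URL
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


# Set via default_args so every task in the DAG inherits the callback:
default_args = {"on_failure_callback": alert_on_failure}
```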
Job Requirements:
Bachelor's/Master's degree in Engineering or Computer Science (or equivalent experience)
5+ years of relevant experience as a data engineer
Extensive experience working with BigQuery, Airflow, and Python
Familiarity with Spark is desirable
Strong understanding of data sources, data transformation requirements, and data modeling
Deep familiarity with BigQuery, including its architecture, best practices, and optimization techniques
Experience writing complex SQL queries to efficiently process and analyze large datasets
Excellent spoken and written English communication skills