Design and implement data pipelines to extract, transform, and load (ETL) data from various sources (databases, APIs, cloud storage, streaming platforms).
Develop and maintain data pipelines using technologies such as Apache Spark, Apache Kafka, and other big data tools.
Ensure data quality and integrity throughout the data ingestion process.
Design and implement data warehousing and data lake architectures.
Develop and maintain data models and schemas for data warehousing and data lakes.
Optimize data storage and retrieval for efficient data access and analysis.
Leverage cloud platforms (AWS, Azure, GCP) for data storage, processing, and analysis.
Utilize cloud-native services (Amazon S3, Azure Data Lake Storage, Google Cloud Storage) for data storage and retrieval.
Implement cloud-based data pipelines using managed orchestration and processing services (AWS Glue, Azure Data Factory, Google Cloud Dataflow).
Develop and automate data pipelines using programming languages (Python, Scala) and orchestration tools (Apache Airflow, Luigi).
Monitor and maintain data pipelines to ensure data quality and timely delivery.
Troubleshoot and resolve data pipeline issues.
Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and translate them into technical solutions.
Communicate technical concepts effectively to both technical and non-technical audiences.
Participate in code reviews and provide constructive feedback to other engineers.
Qualifications & Experience
Education: Bachelor's degree in Computer Science, Computer Engineering, or a related field.
Experience: 3-6 years of professional data engineering experience.
Skills
Essential:
Strong proficiency in Python and SQL.
Experience with big data technologies (Hadoop, Spark, Hive).
Experience with cloud computing platforms (AWS, Azure, GCP).
Experience with data warehousing and data lake concepts.
Experience with ETL/ELT processes and tools.
Experience with data modeling and schema design.
Strong analytical and problem-solving skills.
Excellent communication and collaboration skills.
Desirable:
Experience with stream processing technologies (Kafka, Kinesis).
Experience with NoSQL databases (MongoDB, Cassandra).
Experience with containerization technologies (Docker, Kubernetes).
Experience with data governance and data security best practices.