Data Engineer - Spark/Hadoop (3-8 yrs)
Bluebyte Technologies
Job Description :
Main Skills : Apache Airflow, Java, Maven, SQL, and GCP services such as BigQuery, Cloud Composer, Dataproc, and Dataflow
Design & Implement Data Pipelines :
- Develop, implement, and maintain scalable data pipelines using Google Cloud Dataflow and Apache Beam.
- Ensure the pipelines can process large-scale data efficiently with proper data validation, transformation, and loading.
Cloud Infrastructure & GCP Services :
- Leverage a variety of GCP services including BigQuery, Cloud Storage, Pub/Sub, Cloud Functions, and Cloud Composer to build, deploy, and manage data workflows.
- Utilize Google Cloud SDK and other cloud tools for managing cloud resources and automating workflows.
Optimize Data Flow & Performance :
- Monitor and optimize pipeline performance to ensure that data processing is cost-effective and efficient, meeting service-level agreements (SLAs).
- Troubleshoot and resolve issues related to data quality, pipeline execution failures, and performance bottlenecks.
Data Quality & Transformation :
- Implement data validation and cleaning techniques to ensure the accuracy and consistency of data throughout the pipeline.
- Develop transformation logic to process structured, semi-structured, and unstructured data from various sources.
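The validation and transformation duties above can be sketched in plain Python (the field names and rules here are hypothetical; in production this logic would typically run inside a Dataflow/Beam transform):

```python
# Minimal sketch of record-level validation and normalization.
# Field names ("user_id", "event_time", "amount") are invented for
# illustration, not taken from any real schema.
from datetime import datetime

REQUIRED_FIELDS = {"user_id", "event_time", "amount"}

def validate(record: dict) -> list:
    """Return a list of validation errors; an empty list means clean."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "amount" in record:
        try:
            if float(record["amount"]) < 0:
                errors.append("amount must be non-negative")
        except (TypeError, ValueError):
            errors.append("amount is not numeric")
    return errors

def transform(record: dict) -> dict:
    """Normalize a clean record: trim ids, parse timestamps, cast amounts."""
    return {
        "user_id": str(record["user_id"]).strip(),
        "event_time": datetime.fromisoformat(record["event_time"]),
        "amount": float(record["amount"]),
    }

good = {"user_id": " u1 ", "event_time": "2024-01-15T10:30:00", "amount": "19.99"}
bad = {"user_id": "u2", "amount": "-5"}
```

Separating validation (which only reports problems) from transformation (which assumes a clean record) keeps dirty records routable to a dead-letter sink instead of failing the whole pipeline.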
Collaboration & Documentation :
- Collaborate with data scientists, analysts, and other stakeholders to ensure data flows meet the analytical needs of the business.
- Maintain clear documentation for data pipeline designs, architecture, and operational procedures.
Automation & CI/CD :
- Implement automation strategies for pipeline deployment, testing, and monitoring using CI/CD tools such as Cloud Build, Jenkins, or GitLab CI.
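A sketch of what such a CI/CD setup might look like with Cloud Build (the step images and config fields are real; the repository layout, requirements file, and deploy script are assumptions):

```yaml
# Hypothetical cloudbuild.yaml: install dependencies, run the test
# suite, then hand off to a project-specific deploy script.
steps:
  - name: "python:3.11"
    entrypoint: "pip"
    args: ["install", "-r", "requirements.txt", "--user"]  # assumed requirements file
  - name: "python:3.11"
    entrypoint: "python"
    args: ["-m", "pytest", "tests/"]                       # assumed test directory
  - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
    entrypoint: "bash"
    args: ["./deploy.sh"]                                  # hypothetical deploy script
```

Gating the deploy step on a green test step is what makes the pipeline deployment itself automated and repeatable rather than a manual hand-off.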
Security & Compliance :
- Follow best practices for securing data and ensuring compliance with industry regulations, including encryption, access control, and auditing.
Reporting & Monitoring :
- Implement monitoring and alerting for data pipelines using Cloud Monitoring and Cloud Logging (formerly Stackdriver).
- Generate reports on pipeline health, data quality, and performance for internal stakeholders.
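As an illustration of the reporting duty, a run-history summary can be computed with nothing but the standard library (the run-record schema is made up for this sketch):

```python
# Hypothetical pipeline-health summary: given a list of run records
# (schema invented for illustration), compute the success rate and
# average runtime for a stakeholder report.
def summarize(runs: list) -> dict:
    """Aggregate run records into simple health metrics."""
    total = len(runs)
    ok = sum(1 for r in runs if r["status"] == "ok")
    return {
        "runs": total,
        "success_rate": ok / total,
        "avg_seconds": sum(r["seconds"] for r in runs) / total,
    }

history = [
    {"status": "ok", "seconds": 120},
    {"status": "ok", "seconds": 180},
    {"status": "failed", "seconds": 60},
    {"status": "ok", "seconds": 240},
]
report = summarize(history)
print(f"success rate: {report['success_rate']:.0%}, "
      f"avg runtime: {report['avg_seconds']:.0f}s")
# prints: success rate: 75%, avg runtime: 150s
```

In practice these numbers would come from Cloud Monitoring metrics rather than an in-memory list, but the aggregation shape is the same.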
Required Skills and Qualifications :
Experience :
- 3+ years of experience in data engineering or cloud engineering, specifically working with Google Cloud Platform (GCP).
- Proficiency in building data pipelines using Google Dataflow, Apache Beam, or similar tools.
- Strong experience with BigQuery, Cloud Storage, Pub/Sub, and Cloud Functions for data processing and management.
Technical Skills :
- Expertise in SQL and programming languages such as Python, Java, or Scala.
- Experience with distributed data processing and big data technologies such as Apache Hadoop, Spark, or Kafka.
- Understanding of data modeling, ETL processes, and data warehousing.
- Familiarity with cloud security concepts, including IAM roles, encryption, and network security in GCP.
Soft Skills :
- Strong analytical and problem-solving abilities.
- Excellent communication skills for collaborating with cross-functional teams.
- Ability to manage multiple projects and priorities in a fast-paced environment.
Education :
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- Relevant certifications, such as Google Cloud Professional Data Engineer, are a plus.
Functional Areas: Software/Testing/Networking