Big Data Engineer - Python/Spark (2-10 yrs)
HerKey
posted 4d ago
Flexible timing
"This position is open only for female candidates as part of our diversity hiring initiative."
Job Summary:
We are seeking a highly skilled Big Data Engineer to design, develop, and manage scalable data pipelines and architectures. The ideal candidate will have expertise in big data technologies, distributed computing, data processing frameworks, and cloud platforms. You will collaborate with data scientists, analysts, and software engineers to build efficient, real-time, and batch data processing solutions.
Key Responsibilities:
- Design, develop, and maintain scalable big data architectures and ETL/ELT pipelines.
- Build real-time and batch data processing pipelines using tools such as Apache Spark, Hadoop, Kafka, Flink, or Snowflake (a batch sketch follows this list).
- Work with structured and unstructured data, ensuring high data quality and availability.
- Optimize data storage and retrieval using NoSQL, SQL, and cloud data warehouses (e.g., AWS Redshift, Google BigQuery, Azure Synapse).
- Implement data security, governance, and compliance best practices.
- Collaborate with Data Scientists, Analysts, and DevOps to enhance data accessibility and usability.
- Automate data workflows using Airflow, Prefect, or similar orchestration tools (an orchestration sketch follows this list).
- Monitor and troubleshoot data pipeline performance and scalability issues.
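To make the batch-pipeline responsibility above concrete, here is a minimal PySpark sketch of a daily ETL job. It assumes PySpark is installed; the S3 paths, column names, and aggregation logic are hypothetical placeholders, not details from this posting.

```python
# A minimal batch ETL sketch: paths, columns, and metrics are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

# Extract: read raw JSON events (hypothetical path).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: drop malformed rows, derive a date column, aggregate per day.
daily = (
    raw.dropna(subset=["order_id", "amount", "event_time"])
       .withColumn("event_date", F.to_date("event_time"))
       .groupBy("event_date")
       .agg(F.count("order_id").alias("orders"),
            F.sum("amount").alias("revenue"))
)

# Load: write partitioned Parquet for downstream warehouse ingestion.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/orders_daily/"
)

spark.stop()
```

A job like this would typically be triggered on a schedule by an orchestrator, as sketched next.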
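For the workflow-automation responsibility, here is a minimal orchestration sketch, assuming Apache Airflow 2.4+ (where the `schedule` argument replaced `schedule_interval`). The DAG id, task ids, and script paths are illustrative only.

```python
# A minimal daily DAG sketch: dag_id, task_ids, and paths are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+ keyword
    catchup=False,
) as dag:
    # Run the Spark batch job sketched above (placeholder path).
    run_spark_job = BashOperator(
        task_id="run_spark_etl",
        bash_command="spark-submit /opt/jobs/orders_daily_etl.py",
    )
    # Simple downstream data-quality check (placeholder script).
    validate = BashOperator(
        task_id="validate_output",
        bash_command="python /opt/jobs/validate_orders_daily.py",
    )
    run_spark_job >> validate
```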
Required Skills & Qualifications:
- Bachelor's/Master's degree in Computer Science, Data Engineering, or a related field.
- 2+ years of experience in Big Data Engineering or a related role.
- Strong proficiency in Python, Scala, Java, or SQL.
- Hands-on experience with big data processing frameworks (e.g., Spark, Hadoop, Hive, Flink).
- Expertise in cloud platforms (AWS/GCP/Azure) and cloud-native data solutions.
- Knowledge of data warehousing technologies (Snowflake, Redshift, BigQuery, or Synapse).
- Experience with Kafka, Kinesis, or Pulsar for real-time data streaming (a streaming sketch follows this list).
- Familiarity with containerization and orchestration tools (Docker, Kubernetes).
- Strong understanding of data modeling, ETL, and database performance tuning.
- Experience with CI/CD pipelines and Infrastructure-as-Code (IaC) tools such as Terraform or CloudFormation.
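As an illustration of the real-time streaming skill above, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic. It assumes the spark-sql-kafka connector package is on the classpath; the broker address, topic name, schema, and output paths are placeholders.

```python
# A minimal streaming sketch: broker, topic, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Expected shape of each Kafka message value (illustrative schema).
schema = (StructType()
          .add("order_id", StringType())
          .add("amount", DoubleType()))

# Read from Kafka, then parse the binary value column as JSON.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Sink to Parquet; a checkpoint location is required for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/stream/orders/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
         .start())

query.awaitTermination()
```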
Preferred Qualifications:
- Experience with graph databases (Neo4j, JanusGraph).
- Hands-on experience with machine learning workflows and feature engineering.
- Familiarity with DataOps & MLOps best practices.
- Certifications in AWS Big Data, Google Data Engineering, or Azure Data Engineering.
Functional Areas: Software/Testing/Networking