Senior Data Engineer
Blue Ridge
posted 6hr ago
POSITION SENIOR DATA ENGINEER
If you are passionate about building state-of-the-art data platforms and powering the next generation of compute
and AI applications, we'd love to hear from you. This is an exciting opportunity to leverage your expertise in
distributed computing frameworks to make a significant impact as we push the boundaries of supply chain planning
and, eventually, adopt GenAI.
JOB BRIEF
We are seeking an experienced Data Engineer to join our team and lead the development of a cutting-edge data
platform. The platform will be built on distributed computing technologies such as Apache Spark, Databricks, and
Snowflake to enable near-real-time supply chain planning and, eventually, advanced analytics and data insights
through the adoption of Generative AI (GenAI) technologies across our product base.
KEY RESPONSIBILITIES
As a Lead Data Engineer, you will:
• Design and build a highly scalable, fault-tolerant data platform optimized for distributed computing and
large-scale data processing.
• Implement data pipelines and ETL/ELT processes using distributed computing frameworks to efficiently
ingest, transform, and load massive datasets from various sources.
• Leverage cloud data platforms to enable seamless data sharing, near-zero maintenance, and fast analytics
on structured and semi-structured data.
• Collaborate with data scientists, machine learning engineers, and software developers to understand data
requirements and build solutions to power GenAI applications.
• Optimize distributed computing jobs and queries for maximum performance and cost efficiency.
• Implement data governance, security, and compliance best practices.
• Provide guidance on distributed computing architecture and mentor junior data engineers.
QUALIFICATIONS
• 5+ years of experience as a Data Engineer building large-scale data pipelines with big data technologies
(e.g., Apache Spark, Kafka, Flink, Storm, Airflow, Hadoop, MapReduce, Redshift, Presto).
• Strong proficiency in SQL, object-oriented programming in Python, and data modelling
techniques.
• Deep expertise in distributed computing principles and frameworks (e.g., Apache Spark), including SQL,
streaming, and optimizing jobs for scale and efficiency.
• Hands-on experience with developing and deploying distributed computing applications using cloud-based
platforms (e.g., AWS EMR, Azure HDInsight, GCP Dataproc, or equivalent).
• Strong understanding of cloud data platform architectures and best practices for ETL/ELT, data
warehousing, data sharing, and query optimization (e.g., AWS Redshift/Athena, AWS Glue, Azure Synapse
Analytics, or equivalent).
• Experience enabling application engineers to build applications leveraging the data platform through APIs
and abstractions.
• Experience with orchestration frameworks like Apache Airflow and data streaming technologies like Kafka, Flink and Apache Storm.
• Knowledge of data lake and lakehouse concepts.
• Experience building and optimizing data pipelines for machine learning applications.
• Good knowledge of performance tuning and troubleshooting of batch and streaming jobs.
• Knowledge of data modelling, data warehousing, and schema design.
• Familiarity with public cloud platforms such as AWS, Azure, or GCP.
• Strong computer science fundamentals in data structures and algorithms.
• Good understanding of metadata-driven development.
• Excellent problem-solving and communication skills.
• Bachelor's or Master's degree in Computer Science (preferred), Engineering, or a related field.
EXPERIENCE
5+ years
EMPLOYMENT TYPE
Full-time, permanent
LOCATION
ICC Trade Tower, Shivaji Nagar, Pune