Lead and mentor a team of data engineers in developing and managing scalable, secure, and high-performance data pipelines.
Define best practices for data ingestion, transformation, and processing in a Lakehouse architecture.
Drive automation, performance tuning, and cost optimization in cloud data solutions.
Cloud Data Infrastructure & Processing
Architect and manage AWS-based big data solutions (EMR, EKS, Glue, Redshift).
Design and maintain Apache Airflow workflows for data orchestration.
Optimize Spark and distributed data processing frameworks for large-scale workloads.
Implement streaming solutions (Kafka, Kinesis, Flink) for real-time data processing.
AI/ML & Advanced Analytics
Collaborate with Data Scientists and AI/ML teams to build and deploy machine learning models using AWS SageMaker.
Support feature engineering, model training, and inference pipelines at scale.
Enable AI-driven analytics by integrating structured and unstructured data sources.
Business Intelligence & Visualization
Support BI and reporting teams with optimized data models for Amazon QuickSight and other visualization tools.
Ensure efficient data aggregation and pre-processing for interactive dashboards and self-service analytics.
Design, develop, and maintain middleware components that facilitate seamless communication between data platforms, applications, and analytics layers.
Master Data Management (MDM) & Governance
Implement MDM strategies to ensure clean, consistent, and deduplicated data.
Establish data governance policies for security, privacy, and compliance (e.g., GDPR, HIPAA).
Ensure adherence to data quality frameworks across structured and unstructured datasets.
Collaboration & Strategy
Partner with business teams, AI/ML teams, and analysts to deliver high-value data products.
Define and maintain data architecture strategies aligned with business goals.
Enable real-time and batch processing for analytics, reporting, and AI-driven insights.
Technical Expertise:
Extensive AWS experience with services such as EMR, EKS, Glue, Redshift, S3, Lambda, and SageMaker.
Proficient in big data processing frameworks (e.g., Spark, Hive, Presto) and Lakehouse architectures.
Skilled in designing and managing Apache Airflow workflows and other orchestration tools.
Solid understanding of Master Data Management (MDM) and data governance best practices.
Middleware development: proven expertise in building middleware components, such as REST APIs, that integrate data pipelines with applications, analytics platforms, and real-time systems.
Hands-on experience with GitLab CI/CD, Terraform, AWS CloudFormation templates (CFT), and Infrastructure-as-Code (IaC) methodologies.
Familiarity with AI/ML pipelines, model deployment, and monitoring using SageMaker.
Experience with data visualization tools, particularly Amazon QuickSight, for business intelligence.
Qualifications
Experience with Lakehouse frameworks (Glue Catalog, Iceberg, Delta Lake).
Expertise in streaming data solutions (Kafka, Kinesis, Flink).
In-depth understanding of security best practices in AWS data architectures.
Demonstrated success in driving AI/ML initiatives from ideation to production.
Educational Qualification:
Bachelor's degree or higher (UG+) in Computer Science, Data Engineering, Aerospace Engineering, or a related field.
Advanced degrees (Master's, PhD) in Data Science or AI/ML are a plus.