Data Pipeline Development: Design, build, and maintain scalable data pipelines using Databricks on AWS.
Data Integration: Ingest data from multiple sources (structured and unstructured) using AWS services such as S3, Lambda, Glue, and Kinesis.
Big Data Processing: Develop and optimize Spark-based ETL jobs to process large datasets on Databricks (see the sketch after this list).
Data Architecture: Collaborate with the architecture team to implement best practices for cloud-based data lakes and data warehouses.
Optimization: Tune Spark jobs for performance improvements and efficient resource usage.
Strong experience with Databricks, including Apache Spark development.
Proficiency in Python or Scala for Spark job development.
Experience with SQL for querying and managing data.
Familiarity with CI/CD tools (e.g., Jenkins, GitLab) for automating deployment and testing of data pipelines.
Knowledge of big data technologies (Hadoop, Hive) and ETL frameworks.
Familiarity with AWS security best practices (IAM, VPC, security groups).
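To make the Spark ETL responsibility concrete, here is a minimal PySpark sketch of the kind of job described above: read raw data from S3, clean it, and write a partitioned Delta table on Databricks. All names (the S3 bucket, the output table, and the columns order_id, amount, order_ts) are hypothetical placeholders, not part of the posting.

```python
# Minimal PySpark ETL sketch for Databricks on AWS (assumed names throughout).
from pyspark.sql import SparkSession, functions as F

# On Databricks a SparkSession named `spark` is already provided;
# getOrCreate() returns it, and also lets the script run elsewhere.
spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw JSON files landed in S3 (read access via an IAM
# role / instance profile is assumed).
raw = spark.read.json("s3://example-raw-bucket/orders/")

# Transform: drop malformed rows, normalize types, derive a partition column.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write a partitioned Delta table for downstream consumers.
(clean.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.orders_clean"))
```

The optimization responsibility would then typically involve sizing partitions, choosing the partition column, and adjusting cluster and shuffle settings (for example spark.sql.shuffle.partitions) for the actual data volumes.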
Minimum Skills Required: Data Engineer; Databricks; cloud migration experience of 1+ year (at least one project).