- Design, construct, install, test, and maintain highly scalable data management systems using big data technologies such as Apache Spark (with a focus on Spark SQL) and Hive. Manage and optimize our data warehousing solutions, with a strong emphasis on SQL performance tuning. Implement ETL/ELT processes using tools like Talend or custom scripts, ensuring efficient data flow and transformation across our systems (a minimal sketch of such a pipeline follows this list).
- Utilize AWS services including S3, EC2, and EMR to build and manage scalable, secure, and reliable cloud-based solutions.
- Develop and deploy scripts in Linux environments, demonstrating proficiency in shell scripting. Utilize scheduling tools such as Airflow or Control-M to automate data processes and workflows.
- Implement and maintain metadata-driven frameworks, promoting reusability, efficiency, and data governance. Collaborate closely with DevOps teams utilizing SDLC tools such as Bamboo, JIRA, Bitbucket, and Confluence to ensure seamless integration of data systems into the software development lifecycle.
- Communicate effectively with both technical and non-technical stakeholders for handovers, incident management, reporting, and similar activities.
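As a rough illustration of the Spark SQL, S3, and ETL responsibilities above (not this team's actual code), a minimal PySpark sketch might look like the following; the application name, bucket, paths, and column names are hypothetical placeholders.

```python
# Minimal PySpark sketch: read Parquet from S3, run a Spark SQL aggregation,
# and write the result back. All names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("daily-orders-aggregate")  # hypothetical job name
    .getOrCreate()
)

# Register raw data (assumed to live under this S3 prefix) as a temp view.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
orders.createOrReplaceTempView("orders")

# Spark SQL transformation: daily revenue per customer.
daily_revenue = spark.sql("""
    SELECT customer_id,
           CAST(order_ts AS DATE) AS order_date,
           SUM(amount)            AS revenue
    FROM orders
    GROUP BY customer_id, CAST(order_ts AS DATE)
""")

# Write partitioned output for downstream warehouse loads.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))

spark.stop()
```

A job like this would typically be packaged and submitted to an EMR cluster with spark-submit, with S3 acting as both the raw landing zone and the curated output layer.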
Required education
Bachelor's Degree
Preferred education
Master's Degree
Required technical and professional expertise
- Demonstrated expertise in Big Data Technologies, specifically Apache Spark (focus on Spark SQL) and Hive.
- Extensive experience with AWS services, including S3, EC2, and EMR.
- Strong expertise in Data Warehousing and SQL, with experience in performance optimization (see the tuning sketch after this list).
- Experience with ETL/ELT implementation using tools such as Talend.
- Proficiency in Linux, with a strong background in shell scripting.
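For the SQL performance optimization point above, a hedged Spark SQL sketch is shown below; the tables, columns, and date filter are hypothetical, and the broadcast hint and partition filter are generic Spark tuning techniques rather than this team's specific approach.

```python
# Illustrative Spark SQL tuning sketch (hypothetical tables): a broadcast
# join hint avoids a shuffle when one side is small, and filtering on the
# partition column prunes the Parquet files that get scanned.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

spark.read.parquet("s3://example-bucket/curated/daily_revenue/") \
    .createOrReplaceTempView("daily_revenue")
spark.read.parquet("s3://example-bucket/reference/customers/") \
    .createOrReplaceTempView("customers")

top_segments = spark.sql("""
    SELECT /*+ BROADCAST(c) */
           c.segment,
           SUM(r.revenue) AS segment_revenue
    FROM daily_revenue r
    JOIN customers c
      ON r.customer_id = c.customer_id
    WHERE r.order_date >= DATE '2024-01-01'  -- prunes partitions on order_date
    GROUP BY c.segment
    ORDER BY segment_revenue DESC
""")

top_segments.show()
```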
Preferred technical and professional experience
- Familiarity with scheduling tools like Airflow or Control-M (see the scheduling sketch after this list).
- Experience with metadata-driven frameworks.
- Knowledge of DevOps tools such as Bamboo, JIRA, Bitbucket, and Confluence.
- Excellent communication skills and a willingness to learn.
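For the scheduling tools mentioned above, a minimal Airflow 2.x sketch is shown below; the DAG id, schedule, and script paths are hypothetical placeholders, and Control-M or another scheduler could fill the same role.

```python
# Minimal Airflow sketch (hypothetical DAG id, schedule, and script paths):
# a daily workflow that submits a Spark job and then runs a data quality check.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_revenue_pipeline",   # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    submit_spark_job = BashOperator(
        task_id="submit_spark_job",
        # Assumes the PySpark script is deployed on the worker or edge node.
        bash_command="spark-submit /opt/jobs/daily_revenue.py",
    )

    quality_check = BashOperator(
        task_id="quality_check",
        # Hypothetical post-load data quality script.
        bash_command="python /opt/jobs/check_row_counts.py",
    )

    submit_spark_job >> quality_check
```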
Employment Type: Full Time, Permanent