- Design, implement, and manage large-scale data processing systems using Big Data Technologies such as Hadoop, Apache Spark, and Hive.
- Develop and manage our database infrastructure based on Relational Database Management Systems (RDBMS), applying strong SQL expertise.
- Use scheduling tools such as Airflow or Control-M, or shell scripting, to automate data pipelines and workflows.
- Write efficient code in Python and/or Scala for data manipulation and processing tasks.
- Leverage AWS services including S3, Redshift, and EMR to create scalable, cost-effective data storage and processing solutions.
Required education
Bachelor's Degree
Preferred education
Master's Degree
Required technical and professional expertise
- Proficiency in Big Data Technologies, including Hadoop, Apache Spark, and Hive.
- Strong understanding of AWS services, particularly S3, Redshift, and EMR.
- Deep expertise in RDBMS and SQL, with a proven track record in database management and query optimization.
- Experience with scheduling tools such as Airflow or Control-M, or with shell scripting.
- Practical experience in Python and/or Scala programming languages.
Preferred technical and professional experience
- Knowledge of Core Java (1.8 preferred) is highly desired.
- Excellent communication skills and a willingness to learn.
- Solid experience in Linux and shell scripting.
- Experience with PySpark or Spark is nice to have.
- Familiarity with DevOps tools including Bamboo, JIRA, Git, Confluence, and Bitbucket is nice to have.
- Experience in data modelling, data quality assurance, and load assurance is a nice-to-have.
Employment Type: Full Time, Permanent