Primary Responsibilities:
- Design and build data pipelines to process terabytes of data
- Orchestrate data tasks in Airflow to run on Kubernetes/Hadoop for the ingestion, processing, and cleaning of data (see the illustrative sketch after this list)
- Create Docker images for various applications and deploy them on Kubernetes
- Design and build best-in-class processes to clean and standardize data
- Troubleshoot production issues in our Elastic environment
- Tune and optimize data processes
- Advance the team's DataOps culture (CI/CD, orchestration, testing, monitoring) and build out standard development patterns
- Drive innovation by testing new technologies and approaches to continually advance the capability of the data engineering function
- Drive efficiencies in current engineering processes via standardization and migration of existing on-premises processes to the cloud
- Ensure data quality by building best-in-class data quality monitoring so that all data products exceed customer expectations
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or reassignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary, or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.
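For context, the following is a minimal, illustrative sketch of the kind of Airflow orchestration described above: a DAG that runs containerized ingestion and cleaning steps on Kubernetes. The DAG id, image names, commands, and schedule are hypothetical placeholders, not this company's actual pipeline; the operator import path and the `schedule` argument assume a recent Airflow 2.x with the cncf-kubernetes provider installed.

```python
# Illustrative sketch only: hypothetical DAG id, images, and commands.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="daily_ingest_and_clean",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                      # 'schedule_interval' on older Airflow versions
    catchup=False,
) as dag:
    # Run the ingestion step as a pod on the Kubernetes cluster
    ingest = KubernetesPodOperator(
        task_id="ingest_raw_data",
        name="ingest-raw-data",
        image="registry.example.com/ingest:latest",   # hypothetical image
        cmds=["python", "ingest.py"],
        get_logs=True,
    )

    # Run the cleaning/standardization step as a separate pod
    clean = KubernetesPodOperator(
        task_id="clean_and_standardize",
        name="clean-and-standardize",
        image="registry.example.com/clean:latest",    # hypothetical image
        cmds=["python", "clean.py"],
        get_logs=True,
    )

    ingest >> clean   # cleaning runs only after ingestion succeeds
```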
Required Qualifications:
Bachelor"s degree in Computer Science or similar Hands-on experience on the following technologies: Developing processes in Spark Writing complex SQL queries Building ETL/data pipelines Related/complementary open-source software platforms and languages (e.g. Scala, Python, Java, Linux) Experience building cloud-native data pipelines on either AWS, Azure or GCP following best practices in cloud deployments Solid DataOps experience (CI/CD, Orchestration, Testing, Monitoring) Good experience handling real-time, near real-time and batch data ingestions Good understanding of Data Modelling techniques i.e. DataVault, Kimble Star Proven excellent understanding of Column-Store RDBMS (DataBricks, Snowflake, Redshift, Vertica, Clickhouse) Proven track record of designing effective data strategies and leveraging modern data architectures that resulted in business value Demonstrated effective interpersonal, influence, collaboration and listening skills Demonstrated solid stakeholder management skills Employment Type: Full Time, Permanent