Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that handle tens of terabytes of data ingested daily and a petabyte-scale data warehouse
Build high-quality data solutions and refine diverse existing datasets into simplified models that encourage self-service
Build data pipelines that optimise for data quality and are resilient to poor-quality data sources
Own the data mapping, business logic, transformations and data quality
Perform low-level systems debugging, performance measurement, and optimization on large production clusters
Participate in architecture discussions, influence the product roadmap, and take ownership of new projects
Maintain and support existing platforms and evolve them to newer technology stacks and architectures
We’re excited if you have
Extensive SQL skills
Proficiency in at least one scripting language; Python is required
Proficiency in at least one object-oriented language; Java is preferred
Experience with big data technologies such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, and Presto
Experience with AWS, GCP, or Looker is a plus
Ability to collaborate with cross-functional teams such as developers, analysts, and operations to execute deliverables
5+ years of professional experience as a data or software engineer
BS in Computer Science; MS in Computer Science preferred