Develop data products, infrastructure, and data pipelines on AWS and the Databricks/Snowflake data platform ecosystem, drawing on experience with services such as Redshift, Kinesis, EMR, and Lambda, and tools such as Glue, Apache Spark, and job schedulers.
Design and support ETL / ELT / file movement of data using Databricks or Snowflake with PySpark, Python, and Spark SQL (a minimal PySpark sketch follows this list).
Develop new data models and end-to-end data pipelines.
Work with leads and architects to develop robust, scalable data pipelines that ingest, transform, and analyse large volumes of structured and unstructured data from diverse sources.
Optimise pipelines for performance, reliability, and scalability.
Contribute to initiatives that enhance data quality, governance, and security across the organisation, ensuring compliance with guidelines and industry best practices.
Build innovative solutions for acquiring and enriching data from a variety of sources.
Conduct logical and physical database design, including key and indexing schemes and partitioning strategies (a partitioning sketch also follows this list).
Participate in building and testing business continuity and disaster recovery procedures per requirements.
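For context, a minimal sketch of the kind of ETL work described above, assuming a Databricks/PySpark environment; the bucket, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: ingest raw order events, clean them, and publish a curated table.
spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw JSON landed in object storage (path is illustrative).
raw = spark.read.json("s3://example-landing-zone/orders/2024/")

# Transform: basic cleansing and typing with PySpark functions.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("order_id").isNotNull())
)

# Load: write a managed Delta table (format assumes a Databricks lakehouse).
orders.write.format("delta").mode("overwrite").saveAsTable("curated.orders")
```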
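And a sketch of the partitioning side of physical design, again assuming Delta tables on Databricks; table and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_partitioning").getOrCreate()

# Read the curated table produced by the previous sketch (name is hypothetical).
orders = spark.table("curated.orders")

# Partition by event date so date-filtered queries prune partitions
# instead of scanning the whole table.
(
    orders.withColumn("order_date", F.to_date("order_ts"))
          .write.format("delta")
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("curated.orders_by_date")
)

# On Databricks, Delta's OPTIMIZE ... ZORDER BY co-locates rows on a frequently
# filtered key, serving a purpose similar to an index in a traditional RDBMS.
spark.sql("OPTIMIZE curated.orders_by_date ZORDER BY (customer_id)")
```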
What we are looking for
Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
6-8 years of experience in the Databricks/Snowflake data platform or lakehouse ecosystem
Prior experience with Apache Spark performance tuning and debugging (an illustrative tuning sketch follows this list)
Experience with workflow schedulers, e.g. Kubeflow, Airflow, Oozie (a minimal Airflow DAG sketch also follows this list)
SQL experience, e.g. Spark SQL, Impala, BigQuery, Presto/Trino, StarRocks
Experience debugging and reasoning about production issues is desirable
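By way of illustration, a small sketch of common Spark tuning levers relevant to the tuning experience above; the tables, partition counts, and settings are hypothetical and would be driven by the actual workload.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("tuning_sketch")
    # Let adaptive query execution coalesce shuffle partitions and handle skewed joins.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "200")  # starting point; tune per data volume
    .getOrCreate()
)

facts = spark.table("curated.orders")       # large fact table (hypothetical)
dims = spark.table("curated.customers")     # small dimension table (hypothetical)

# Broadcast the small dimension to avoid shuffling the large side of the join.
joined = facts.join(F.broadcast(dims), "customer_id")

# Cache only if the result is reused several times downstream.
joined.cache()
joined.count()  # materialise the cache

# explain() prints the physical plan, usually the first place to look when tuning.
joined.explain()
```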
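And a minimal Airflow DAG as an example of the scheduler experience above, assuming Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder for pulling raw data from a source system.
    print("extracting")


def transform_and_load():
    # Placeholder for a Spark/Databricks job submission or SQL transformation.
    print("transforming and loading")


with DAG(
    dag_id="orders_daily_etl",       # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",      # run once per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)

    extract_task >> load_task
```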