Design, implement, test, deploy, and maintain stable, secure, and scalable data engineering solutions in support of data and analytics projects, including integrating new sources of data into our Analytics data warehouse, and moving data out to applications and affiliates.
Use Python, PySpark, and Spark SQL to develop and optimize scalable data processing pipelines, performing distributed data transformations, aggregations, and analysis to support advanced analytics.
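To illustrate the kind of transformation such pipelines perform, here is a minimal single-machine sketch in plain Python of a grouped aggregation (the shape of `df.groupBy('region').sum('revenue')` in PySpark); the records and column names are hypothetical:

```python
from collections import defaultdict

# Hypothetical event records; in production these would sit in a Spark DataFrame.
events = [
    {"region": "east", "revenue": 120.0},
    {"region": "west", "revenue": 80.0},
    {"region": "east", "revenue": 50.0},
]

def total_revenue_by_region(rows):
    """Group-and-aggregate: sum revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)

print(total_revenue_by_region(events))  # {'east': 170.0, 'west': 80.0}
```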
Proficient in writing complex SQL queries, including subqueries, window functions, CTEs (Common Table Expressions), and advanced joins for robust data extraction and manipulation; experienced in implementing stored procedures to encapsulate business logic and triggers for automated data validation and integrity checks; skilled in query optimization and performance tuning to ensure efficient data retrieval.
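As a small worked example of the CTE and window-function pattern described above, run against an in-memory SQLite database (the `orders` table and its columns are hypothetical; SQLite supports window functions from version 3.25):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 20);
""")

# A CTE feeding a window function: keep each customer's largest order.
query = """
WITH ranked AS (
    SELECT customer, amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn
    FROM orders
)
SELECT customer, amount FROM ranked WHERE rn = 1 ORDER BY customer;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('a', 30.0), ('b', 20.0)]
```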
Experience in developing and maintaining end-to-end data pipelines with ETL and ELT processes.
Build reports and data visualizations, using data from the data warehouse and other sources.
Help define what needs to be tracked in Google Analytics to deliver the dashboards.
Produce scalable, replicable code and engineering solutions that help automate repetitive data management tasks.
Perform one-off data manipulation/munging and analysis on a wide variety of organizational data.
Implement and monitor best-in-class security measures in our data warehouse and analytics environment, with an eye toward the evolving threat landscape.
Preferred candidate profile
Strong command of relational databases and SQL. Extract, Transform, and Load (ETL) data into a relational database.
Proficiency with Python or R, especially for data manipulation and analysis, and ability to build, maintain and deploy sequences of automated processes with these tools.
General data manipulation skills: read in data, process and clean it, transform and recode it, merge different data sets together, reformat data between wide and long, etc.
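One of the reshapes listed above, wide to long, can be sketched in plain Python (the record layout and the `sales_` column prefix are hypothetical; in practice a library such as pandas would do this with `melt`):

```python
# Hypothetical wide-format records: one column per year.
wide = [
    {"id": 1, "sales_2022": 100, "sales_2023": 150},
    {"id": 2, "sales_2022": 90, "sales_2023": 120},
]

def wide_to_long(rows, id_key, prefix):
    """Melt per-year columns into one (id, year, value) row each."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                long_rows.append(
                    {"id": row[id_key], "year": int(key[len(prefix):]), "value": value}
                )
    return long_rows

long_data = wide_to_long(wide, "id", "sales_")
print(long_data[0])  # {'id': 1, 'year': 2022, 'value': 100}
```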
Demonstrated ability to learn new techniques and troubleshoot code without support.
Demonstrated ability to write clear code that is well-documented and stored in a version control system
Demonstrated ability to work independently and be a self-starter.
Excellent listening, interpersonal, communication, and problem-solving skills.
Demonstrated ability to work effectively in teams, in both a lead and support role.
Use APIs to push and pull data from various data systems and platforms.
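A minimal sketch of the "push" side using only the standard library; the endpoint URL and token are hypothetical, and the request is only constructed here (calling `urllib.request.urlopen(req)` would actually send it):

```python
import json
import urllib.request

def build_push_request(url, payload, token):
    """Construct an authenticated JSON POST for a data platform's API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

# Hypothetical endpoint and token, for illustration only.
req = build_push_request("https://api.example.com/v1/records", {"id": 7}, "TOKEN")
print(req.get_method())  # POST
```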
Experience working with cloud infrastructure services such as Google Cloud, especially BigQuery.
Effective time management skills, including demonstrated ability to manage and prioritize multiple tasks and projects.
Experience with advanced data visualization and mapping is helpful, but not required.
Ability to work long and extended hours as needed.