Job Description: We are looking for a Data Scientist who is proficient in data manipulation, ETL processes, and analytics. The ideal candidate will be responsible for managing and optimizing ETL pipelines and ensuring data accuracy and integrity. This role involves working with a variety of tools and technologies including Pandas, PostgreSQL, BigQuery, and AWS Glue.
Responsibilities:
- Design, develop, and maintain robust ETL pipelines using AWS Glue and other relevant technologies.
- Perform extraction, transformation, and loading of large, complex data sets.
- Analyze data using statistical techniques and produce reports with Pandas and BigQuery.
- Work collaboratively with other teams to integrate systems and data effectively.
- Optimize data retrieval and develop dashboards for data visualization and analysis.
- Implement data validation and testing to ensure data accuracy and quality.
- Document all processes, models, and activities to ensure alignment with company policy and compliance with regulations.

Requirements:
- Proven experience as a Data Scientist or in a similar role.
- Strong proficiency in Python, especially Pandas for data manipulation.
- Experience with relational databases such as PostgreSQL and cloud data warehouses such as BigQuery.
- Proficiency in designing and implementing ETL processes, preferably with AWS Glue.
- Knowledge of PySpark and Apache Airflow is highly desirable.
- Ability to work independently on complex data challenges.
- Excellent problem-solving skills and attention to detail.

Preferred Qualifications:
- Experience in machine learning and advanced analytics.
- Familiarity with data modeling and data warehousing concepts.
- Strong communication and teamwork skills.