PivotRoots is looking for a Backend Engineer for its CSA team.
Job Description
Developing ETL Pipelines : Designing, developing, and maintaining scalable and adaptable data pipelines using Python or PySpark to facilitate the smooth migration of data from diverse data sources. Host these ETL pipelines on AWS EC2, AWS Glue, or AWS EMR, and store the data in cloud data services such as Google BigQuery, AWS S3, Redshift, RDS, or Delta Lake. This includes managing significant data migrations and ensuring seamless transitions between systems.
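A minimal sketch of the kind of PySpark pipeline this responsibility describes, assuming an illustrative CSV source on S3, hypothetical bucket paths and column names, and a Parquet target (a Redshift, BigQuery, or Delta Lake target would use its own connector instead):

```python
# Illustrative ETL sketch; paths, columns, and job name are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-migration")   # hypothetical job name
    .getOrCreate()
)

# Extract: read raw data from an assumed CSV source on S3.
raw = spark.read.option("header", True).csv("s3://source-bucket/orders/")

# Transform: basic de-duplication, typing, and filtering.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .filter(F.col("amount").isNotNull())
)

# Load: write to an assumed curated zone on S3.
orders.write.mode("overwrite").parquet("s3://target-bucket/curated/orders/")
```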
Implementing a Data Quality Check Framework : Establishing and executing data quality checks and validation pipelines using tools such as Python, PySpark, Athena, BigQuery, S3, and Delta Lake to uphold the integrity and accuracy of our datasets.
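A minimal sketch of such a quality check in PySpark, assuming a hypothetical curated table and illustrative rules (non-empty, no null or duplicate keys, non-negative amounts):

```python
# Illustrative data quality checks; table path, columns, and rules are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("s3://target-bucket/curated/orders/")

checks = {
    "row_count_nonzero": df.count() > 0,
    "no_null_order_ids": df.filter(F.col("order_id").isNull()).count() == 0,
    "no_duplicate_order_ids": df.count() == df.dropDuplicates(["order_id"]).count(),
    "amounts_non_negative": df.filter(F.col("amount").cast("double") < 0).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # A real framework might write results to a report table or trigger an alert.
    raise ValueError(f"Data quality checks failed: {failed}")
```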
Creating Mechanisms for Generating ETL Migration Status Reports : Devising a framework to generate concise summary reports detailing data migration progress, alongside promptly alerting stakeholders to any failures within ETL pipelines. This ensures swift resolution of data discrepancies arising from pipeline failures. Implement this using services such as standard SMTP, Python, AWS SNS, AWS SES, AWS S3, and Delta Lake.
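A minimal sketch of the alerting side of this, assuming boto3 with AWS SNS; the topic ARN, region, and message fields are hypothetical, and SES or plain SMTP could be used instead for e-mailed summaries:

```python
# Illustrative failure/status alert via SNS; ARN and region are assumptions.
import json
import boto3

sns = boto3.client("sns", region_name="us-east-1")  # assumed region

def notify_pipeline_status(pipeline: str, status: str, details: str) -> None:
    """Publish a short migration status message to an SNS topic."""
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:etl-alerts",  # hypothetical ARN
        Subject=f"ETL {pipeline}: {status}",
        Message=json.dumps({"pipeline": pipeline, "status": status, "details": details}),
    )

# Example usage after a pipeline run:
# notify_pipeline_status("orders-migration", "FAILED", "row-count check failed")
```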
Data Transformations and Processing : Implementing various data encryption and decryption techniques using Python and PySpark libraries, in addition to generating insightful reports and analyses derived from processed data to aid in informed business decision-making.
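A minimal sketch of value-level encryption and decryption in Python, assuming the `cryptography` package (symmetric Fernet); key handling and the sample value are illustrative assumptions:

```python
# Illustrative encryption/decryption helpers; key management is simplified.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load the key from a secrets manager
fernet = Fernet(key)

def encrypt_value(value: str) -> bytes:
    return fernet.encrypt(value.encode("utf-8"))

def decrypt_value(token: bytes) -> str:
    return fernet.decrypt(token).decode("utf-8")

token = encrypt_value("customer@example.com")
assert decrypt_value(token) == "customer@example.com"
```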
Development of APIs : Building APIs using frameworks such as Flask or Django, incorporating diverse authentication and authorization techniques to safeguard the exchange of data. Host these APIs on an EC2 server using services like Gearman, or write the API logic in AWS Lambda and expose it through the cloud provider's API Gateway service.
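A minimal sketch of a token-protected endpoint of this kind, assuming Flask; the route, token, and response payload are illustrative, and the same handler logic could instead live in a Lambda function behind API Gateway:

```python
# Illustrative Flask API with simple bearer-token authorization; values are assumptions.
from functools import wraps
from flask import Flask, jsonify, request, abort

app = Flask(__name__)
API_TOKEN = "change-me"   # hypothetical; load from config or a secrets manager in practice

def require_token(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        if request.headers.get("Authorization") != f"Bearer {API_TOKEN}":
            abort(401)
        return view(*args, **kwargs)
    return wrapper

@app.route("/api/v1/orders/<order_id>", methods=["GET"])
@require_token
def get_order(order_id):
    # Placeholder response; a real handler would query the backing database.
    return jsonify({"order_id": order_id, "status": "MIGRATED"})

if __name__ == "__main__":
    app.run(port=8000)
```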
Code Versioning and Deployment : Leveraging GitHub extensively for robust code versioning, deployment of the latest code iterations, seamless transitioning between different code versions, and merging various branches to streamline development and code release processes.
Automation : Designing and implementing code automation solutions to streamline and automate manual tasks effectively.
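A minimal sketch of the kind of task automation this covers, assuming a hypothetical manual chore (archiving stale staging files); paths and the age threshold are illustrative:

```python
# Illustrative automation of a manual cleanup task; paths and threshold are assumptions.
import shutil
import time
from pathlib import Path

STAGING_DIR = Path("/data/staging")   # hypothetical paths
ARCHIVE_DIR = Path("/data/archive")
MAX_AGE_SECONDS = 7 * 24 * 3600       # archive anything older than a week

def archive_old_files() -> int:
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    moved = 0
    for path in STAGING_DIR.glob("*.csv"):
        if time.time() - path.stat().st_mtime > MAX_AGE_SECONDS:
            shutil.move(str(path), str(ARCHIVE_DIR / path.name))
            moved += 1
    return moved

if __name__ == "__main__":
    print(f"Archived {archive_old_files()} file(s)")
```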
Required Candidate profile
Soft Skills
Must Have :
Demonstrates adept problem-solving skills to efficiently address complex challenges encountered during data engineering tasks.
Exhibits clear and effective communication skills, facilitating seamless collaboration and comprehension across diverse teams and stakeholders.
Displays proficiency in both independent and collaborative work dynamics, fostering productivity and synergy within a fast-paced team environment.
Demonstrates a high level of adaptability to changing requirements, customer dynamics, and work demands.
Self-motivated and responsible individual who takes ownership and initiative in tasks.
Good to Have :
Demonstrates project management experience, offering valuable insights and contributions towards efficient project execution and delivery.
Good presentation skills.
Excellent customer-handling skills.
Technical Skills
Proficiency in SQL (Structured Query Language) for querying and manipulating databases. Experience with relational database systems like MySQL, PostgreSQL, or Oracle and NoSQL databases like MongoDB.
Proficiency in object-oriented programming concepts such as encapsulation, inheritance, and polymorphism.
Knowledge of data warehousing concepts and experience with data warehousing solutions like Amazon Redshift, Google BigQuery, or Snowflake.
Experienced in developing ETL pipelines using Python, PySpark.
Knowledge of Python libraries/frameworks like Pandas, NumPy, or Spark for data processing and analysis.
Familiarity with big data processing frameworks like Apache Hadoop and Apache Spark for handling large-scale datasets and performing distributed computing.
Knowledge of cloud-based services like AWS S3, AWS Glue, AWS EMR, AWS Lambda, Athena, Azure Data Lake, Google BigQuery, etc.
Familiarity with version control systems like Git for managing codebase changes, collaborating with team members, and maintaining code quality.
Experience with web scraping libraries and frameworks like BeautifulSoup, Scrapy, Puppeteer, Selenium, etc., is highly beneficial. Knowledge of regular expressions is useful for pattern matching and extracting specific data formats from text. Understanding of HTTP protocols and how web servers respond to requests, how to send requests to web servers, handle responses, and manage sessions and cookies is essential. Familiarity with XPath expressions or CSS selectors is important for targeting specific elements within the HTML structure.
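A minimal sketch tying together the scraping pieces listed above (HTTP requests, sessions, CSS selectors, and a regular expression), assuming requests and BeautifulSoup; the URL, selectors, and price format are illustrative assumptions:

```python
# Illustrative scraping workflow; URL, selectors, and data format are assumptions.
import re
import requests
from bs4 import BeautifulSoup

session = requests.Session()                      # keeps cookies across requests
session.headers.update({"User-Agent": "example-scraper/0.1"})

response = session.get("https://example.com/products", timeout=10)
response.raise_for_status()                       # fail fast on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")
price_pattern = re.compile(r"\d+(?:\.\d{2})?")    # e.g. "1499.00"

for card in soup.select("div.product-card"):      # hypothetical CSS selector
    name = card.select_one("h2.title")
    price = card.select_one("span.price")
    if name and price:
        match = price_pattern.search(price.get_text())
        print(name.get_text(strip=True), match.group() if match else "n/a")
```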
Required Experience
The ideal candidate will have 1-3 years of relevant experience in data engineering roles, with a demonstrated history of successfully developing and maintaining ETL pipelines, handling big data migrations, and ensuring data quality and validation.
Must have excellent knowledge of and programming capability in Python and PySpark, working on any of the cloud platforms such as AWS, Azure, or Google Cloud.
Role
Industry Type: Engineering
Functional Area: Data Engineering, Software Development, Automation
Employment Type: Full Time, Permanent
Role Category: System Design/Implementation
Education: The minimum educational requirement is graduation.
Here at Havas, across the group, we pride ourselves on our commitment to offering equal opportunities to all potential employees and have zero tolerance for discrimination. We are an equal opportunity employer and welcome applicants irrespective of age, sex, race, ethnicity, disability, and other factors that have no bearing on an individual's ability to perform their job.