Build and manage reliable data pipelines to extract, transform, and load (ETL) data into our data warehouse from a variety of sources
Create and manage data models that ensure performance, correctness, and consistency to meet business intelligence and analytics demands
Ensure data integrity, availability, and security by managing and optimizing data warehouses and other data storage solutions
Identify and resolve performance bottlenecks in databases and data pipelines to ensure efficient query performance and data processing
Design, develop, and maintain the data architecture, ensuring the availability, reliability, and scalability of data pipelines
Collaborate closely with data scientists, analysts, and other stakeholders to empower data-driven decision-making across the organization
Automate data quality checks, data extraction, and pre-processing
Ensure that data flows through the pipelines seamlessly, enabling Data Analysts and Data Scientists to extract insights from the data
Implement data quality checks and validation procedures to guarantee the accuracy of data used for reporting and analysis
Create and maintain documentation for data processes, pipelines, and infrastructure to support knowledge sharing and troubleshooting
Work with cross-functional teams, including data scientists, analysts, and software developers, to understand data requirements and deliver solutions that meet them
Implement monitoring and alerting systems to detect and resolve data pipeline and infrastructure issues early
Follow compliance protocols in line with legal requirements, guidelines, and standards
Job Requirements:
Bachelor's/Master's degree in Engineering or Computer Science (or equivalent experience)
At least 3 years of relevant experience as a data engineer
Demonstrated expertise in data pipeline management, ETL creation, and data engineering
Competence with big data processing frameworks such as Spark and programming languages such as Python, Java, or Scala
Solid SQL skills and familiarity with relational databases (PostgreSQL, MySQL, etc.)
Practical experience with cloud platforms (Azure, Google Cloud, AWS, etc.) and big data technologies (Hadoop, Spark, Databricks, etc.)
Comprehensive understanding of data warehousing principles and methods
Proficiency with DevOps practices and version control tools (such as Git)
Results-oriented with excellent critical thinking, analytical, and numerical abilities
Ability to collaborate with others in cross-functional teams to deliver on project requirements
Ability to work independently and productively in a fast-paced, flexible environment with strict deadlines
Be a cooperative, adaptable, and pragmatic team member with the natural ability to interact with stakeholders at all organizational levels
Careful attention to detail to ensure data correctness and cleanliness
Ability to use data-driven methods to solve real-world business problems
When working with sensitive data, be aware of data ethics and privacy laws (such as the IT Act 2000 and DPDP)
Excellent English communication skills, both spoken and written