Data Engineer - ETL/Python (5-7 yrs)
InOpTra Digital
posted 15d ago
Flexible timing
About the Job :
We are seeking a highly skilled Data Engineer with certification in Databricks to join our team and help drive cloud data engineering initiatives.
The ideal candidate will have extensive experience in building and optimizing scalable data architectures, developing and managing ETL/ELT pipelines, and leveraging Databricks for large-scale data processing.
As a subject matter expert in Databricks, you will work closely with cross-functional teams, stakeholders, and clients to design and implement data solutions that enable better data access, security, and governance.
Key Responsibilities :
- Serve as the point of contact and subject matter expert for all Databricks-related activities within the organization, guiding architectural decisions, development processes, and operational best practices.
- Architect and implement scalable data pipelines, leveraging Databricks and other cloud technologies to support business operations and analytics initiatives.
- Work closely with the sales team to understand client needs, propose a data roadmap, and provide technical expertise for prospects looking to migrate to the cloud.
- Create and present Proof of Concepts (PoCs) to demonstrate the capabilities of Databricks and how it can meet specific client requirements.
- Design, develop, and manage efficient ETL/ELT pipelines in Databricks using Python (PySpark) and Spark for high-volume data processing, integrating various data sources to ensure the smooth flow of data across business operations.
- Ensure data pipelines are optimized for performance, reliability, and scalability, using best practices in data engineering.
- Leverage Unity Catalog within Databricks to manage data governance, security, and lineage across the entire Databricks environment.
- Implement data access controls, ensuring compliance with security protocols and industry regulations (e.g., GDPR, HIPAA).
- Develop and maintain Continuous Integration / Continuous Deployment (CI/CD) pipelines for Databricks workflows, enabling automated testing, version control, and efficient code deployments.
- Use Git and other DevOps tools to automate workflows and ensure seamless integration across cloud environments.
- Design and implement large-scale data architectures, including Data Lakes, Lakehouses, and Data Warehouses, ensuring that data is organized, stored, and managed efficiently.
- Build a flexible, high-performance data environment that supports both structured and unstructured data.
- Configure, monitor, and optimize Databricks clusters and jobs for both batch and streaming data processing, ensuring that the system can handle large and dynamic datasets with minimal latency.
- Continuously tune Databricks workflows to improve overall performance, resource utilization, and cost efficiency.
- Stay up-to-date with the latest developments, features, and tools in Databricks and the broader data engineering ecosystem, bringing new solutions and best practices into our workflows.
- Continuously assess and adopt emerging technologies to enhance our data engineering capabilities.
- Collaborate with cross-functional teams, including security and compliance specialists, to enforce data governance practices and ensure that the data solutions meet regulatory requirements and organizational policies.
- Implement robust security practices to protect data across the entire pipeline.
- Continuously monitor and tune Databricks workloads and resources to ensure high performance, scalability, and availability.
- Adapt data processing systems and architecture to meet evolving business needs, handling increasingly large datasets and complex processing requirements.
- Provide training, mentorship, and guidance to junior cloud engineers, fostering a culture of knowledge sharing, continuous learning, and adherence to best practices.
- Contribute to team development by promoting a collaborative and positive work environment.
Qualifications :
- 5-7 years of experience in data engineering, with a strong background in cloud data architecture and building ETL/ELT pipelines.
- Databricks Certification (Data Engineer) required.
- Hands-on experience with the Databricks platform, Python (PySpark), and Spark, including managing large-scale data processing workflows.
- Experience in designing and managing Data Lakes, Lakehouses, and Data Warehouses on the cloud.
- Proven experience with CI/CD pipelines, version control (Git), and DevOps tools.
- Strong knowledge of Unity Catalog for managing data governance, security, and lineage.
Technical Skills :
- Proficiency in Python and PySpark for data processing.
- Strong experience with Apache Spark and related technologies for distributed computing.
- Familiarity with cloud platforms (e.g., AWS, Azure, Google Cloud) and Databricks cluster management.
- Experience with database systems (SQL and NoSQL), data warehousing, and data pipeline orchestration tools (e.g., Apache Airflow, dbt).
- Experience with data integration tools and APIs for connecting various data sources and destinations.
Functional Areas: Software/Testing/Networking