Data Engineer - Python/Spark (4-8 yrs)
O2f info Solutions
Flexible timing
Job Summary :
We are seeking a highly skilled Senior Data Engineer to join our data engineering team, with 4 to 8 years of experience building robust data pipelines and working extensively with PySpark.
Key Responsibilities :
Data Pipeline Development :
- Design, build, and maintain scalable data pipelines using PySpark to process large datasets and support data-driven applications and analytics.
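For illustration, a minimal sketch of the kind of batch pipeline this responsibility describes; all paths and column names below are hypothetical:

```python
# Illustrative batch pipeline: read, transform, write (names are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_pipeline").getOrCreate()

# Ingest raw events from a hypothetical Parquet landing zone.
orders = spark.read.parquet("s3://raw-zone/orders/")

# Transform: drop bad records and derive business columns.
clean = (
    orders
    .filter(F.col("order_id").isNotNull())
    .withColumn("order_date", F.to_date("created_at"))
    .withColumn("net_amount", F.col("gross_amount") - F.col("discount"))
)

# Load: write partitioned output for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet("s3://curated-zone/orders/")
```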
ETL Process Automation :
- Develop and automate ETL (Extract, Transform, Load) processes using PySpark, ensuring efficient data processing, transformation, and loading from diverse sources into data lakes, warehouses, or databases.
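A hedged example of one such ETL step, extracting from a relational source and a flat file, then loading into a lake table; connection details and schemas are placeholders:

```python
# Hypothetical ETL: extract from JDBC + CSV, conform, load to a lake table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer_etl").getOrCreate()

# Extract: relational source via JDBC (credentials would come from a secret store).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/crm")
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "****")
    .load()
)

# Extract: flat-file source from object storage.
signups = spark.read.option("header", True).csv("s3://raw-zone/signups/")

# Transform: enrich customers with signup data and stamp the load time.
merged = (
    customers.join(signups, on="email", how="left")
    .withColumn("loaded_at", F.current_timestamp())
)

# Load: append into the lake table (Parquet shown; a warehouse JDBC sink also works).
merged.write.mode("append").parquet("s3://warehouse/customers/")
```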
Distributed Computing with PySpark :
- Leverage Apache Spark and PySpark to process large-scale data in a distributed computing environment, optimizing for performance and scalability.
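Two common distributed-processing levers in PySpark, shown as a sketch with illustrative table names: keyed repartitioning to balance work across executors, and broadcasting a small dimension to avoid a shuffle:

```python
# Sketch of distributed-processing levers; all table names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("distributed_example").getOrCreate()

events = spark.read.parquet("s3://raw-zone/events/")    # large fact table
countries = spark.read.parquet("s3://ref/countries/")   # small dimension

# Repartition by the join key so work is spread evenly across executors.
events = events.repartition(200, "country_code")

# Broadcast the small dimension so the join skips a full shuffle.
enriched = events.join(broadcast(countries), "country_code")

daily = enriched.groupBy("country_name", F.to_date("ts").alias("day")).count()
daily.write.mode("overwrite").parquet("s3://curated-zone/events_by_country/")
```

Broadcasting is only appropriate when the dimension fits comfortably in executor memory; otherwise a shuffle join is the safer default.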
Cloud Data Solutions :
- Develop and deploy data pipelines and processing frameworks on cloud platforms (AWS, Azure, GCP) using native tools like AWS Glue, Azure Databricks, or Google Dataproc.
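As one cloud example, a minimal AWS Glue job skeleton (Glue being one of the tools named above); the catalog database and table names are placeholders:

```python
# Minimal AWS Glue job skeleton; database/table names are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read via the Glue Data Catalog, transform with plain PySpark, write back out.
dyf = glue_context.create_dynamic_frame.from_catalog(database="raw", table_name="orders")
df = dyf.toDF().filter("order_id IS NOT NULL")
df.write.mode("overwrite").parquet("s3://curated-zone/orders/")

job.commit()
```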
Data Integration & Transformation :
- Integrate data from various internal and external sources, ensuring data consistency, quality, and reliability throughout the pipeline.
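A sketch of the kind of consistency gate this implies, assuming a hypothetical internal table and vendor feed with the column names shown:

```python
# Illustrative merge of internal and external sources with basic quality checks.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customer_integration").getOrCreate()

internal = spark.read.parquet("s3://curated-zone/customers/")
external = spark.read.option("header", True).csv("s3://vendor-feed/customers/")

# Conform the external feed to the internal schema before merging.
external_conf = external.select(
    F.col("cust_id").cast("long").alias("customer_id"),
    F.col("email_addr").alias("email"),
)
combined = internal.select("customer_id", "email").unionByName(external_conf)

# Enforce consistency rules before publishing downstream.
deduped = combined.dropDuplicates(["customer_id"])
bad = deduped.filter(F.col("customer_id").isNull()).count()
if bad:
    raise ValueError(f"{bad} merged records are missing customer_id")

deduped.write.mode("overwrite").parquet("s3://curated-zone/customers_merged/")
```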
Performance Optimization :
- Optimize PySpark jobs and pipelines for faster data processing, handling large volumes of data efficiently with minimal latency.
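A few typical tuning levers, shown with illustrative values rather than prescriptions:

```python
# Common PySpark tuning levers; the config values here are examples, not defaults.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("tuning_example")
    # Adaptive query execution re-plans shuffles and joins at runtime (Spark 3+).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://raw-zone/events/")

# Partition pruning: filter on the partition column before heavy work.
recent = events.filter(F.col("event_date") >= "2024-01-01")

# Cache only when the same DataFrame feeds multiple actions.
recent.cache()
print(recent.count())
recent.groupBy("event_type").count().show()
recent.unpersist()
```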
Skills & Experience :
- Proven experience as a Data Engineer or in a similar role, with a strong background in database development, ETL processes, and software development.
- Proficiency in SQL and scripting languages such as Python, with experience working with relational databases.
- Proficiency in Dataproc (PySpark), Pandas, or other data processing libraries.
- Experience with data modeling, schema design, and optimization techniques for scalability (see the schema sketch after this list).
- Strong analytical and problem-solving skills, with the ability to troubleshoot complex data issues and optimize data processing pipelines at scale.
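As referenced in the data modeling bullet above, a small hypothetical example of enforcing an explicit schema at read time rather than relying on inference:

```python
# Hypothetical explicit schema: types enforced at read time, not inferred.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, LongType, StringType, TimestampType, DecimalType,
)

order_schema = StructType([
    StructField("order_id",    LongType(),         nullable=False),
    StructField("customer_id", LongType(),         nullable=False),
    StructField("status",      StringType(),       nullable=True),
    StructField("amount",      DecimalType(12, 2), nullable=True),
    StructField("created_at",  TimestampType(),    nullable=True),
])

spark = SparkSession.builder.appName("schema_example").getOrCreate()
orders = spark.read.schema(order_schema).json("s3://raw-zone/orders/")
```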
Required Qualifications :
Experience :
- 4-8 years of experience in data engineering, with a strong focus on PySpark and large-scale data processing.
Technical Skills :
- Expertise in PySpark for distributed data processing, data transformation, and job optimization.
- Strong proficiency in Python and SQL for data manipulation and pipeline creation.
- Hands-on experience with Apache Spark and its ecosystem, including Spark SQL, Spark Streaming, and PySpark MLlib.
- Solid experience working with ETL tools and frameworks, such as Apache Airflow or similar orchestration tools.
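To make the Airflow expectation concrete, a hedged sketch of a DAG submitting a PySpark job; it assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed, and every name here is a placeholder:

```python
# Illustrative Airflow DAG that submits a PySpark job on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # `schedule` requires Airflow 2.4+
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_orders_etl",
        application="/opt/jobs/orders_etl.py",  # the PySpark script to submit
        conn_id="spark_default",
        conf={"spark.sql.shuffle.partitions": "400"},
    )
```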
Functional Areas: Software/Testing/Networking