Data Engineer - Python/PySpark (4-6 yrs)
Maiora
What You Will Do:
- Own the documentation, design, development, and architecture of Hadoop applications.
- Apply at least 3 years of hands-on experience with Big Data technologies such as Impala, Hive, Hadoop, Spark, Spark Streaming, and Kafka.
- Demonstrate excellent programming skills in Python.
- Work with stream-processing systems such as Storm and Spark Streaming.
- Work with relational SQL and NoSQL databases, including Vertica.
- Work with cloud services.
- Use the Cloudera Hadoop distribution, shell scripting, and Superset, with hands-on cluster management.
- Development: Create and maintain scalable big data applications using Python, Spark, Hive, and Impala.
- Data Pipelines: Develop and optimize data processing pipelines to handle large datasets.
- Integration: Implement data ingestion, transformation, and loading processes.
- Collaboration: Work with data scientists and analysts to meet data requirements.
- Quality Control: Ensure data quality, integrity, and security.
- Performance: Monitor and troubleshoot performance issues to improve efficiency.
- Documentation: Participate in code reviews, testing, and documentation.
- Learning: Stay updated with industry trends and advancements in big data technologies.
Requirements:
- Bachelor's or Master's degree in Computer Science, IT, or related field.
- At least 3 years in a Big Data Developer role.
- Proficiency in Python.
- Strong experience with Apache Spark.
- Hands-on experience with Hive and Impala.
- Familiarity with Hadoop, HDFS, Kafka, and other big data tools.
- Knowledge of data modeling, ETL processes, and data warehousing concepts.
Soft Skills:
- Excellent problem-solving, communication, and teamwork skills.
Responsibilities:
- Collaborate with a dynamic team in a fast-paced environment to develop and maintain Python-based applications.
- Write clean, scalable, and well-documented code.
- Design and implement software solutions, ensuring high performance and responsiveness.
- Optimize code for maximum efficiency and maintainability.
- Collaborate with cross-functional teams to define, design, and ship new features.
- Contribute to the entire software development lifecycle, from concept to deployment.
- Troubleshoot, debug, and address software defects and issues.
- Stay updated on industry best practices and emerging technologies.
Required Skills:
- Strong proficiency in Python and PySpark.
- Experience writing SQL queries and scripts.
- Experience creating ETL flows and data orchestration.
- Experience working with file formats: CSV, Excel, Parquet.
- Good to have: working experience with Databricks and Spark Server.
- Good to have: working experience with Power BI and Tableau.
- Knowledge of database systems: MySQL, PostgreSQL, Oracle DB, MSSQL.
- Familiarity with version control systems, particularly Git.
- Exposure to DevOps practices and tools.
- Exposure to cloud services, particularly AWS.
- Experience managing Apache Airflow.
Qualifications:
- Bachelor's degree in Computer Science or a related field.
- Strong problem-solving and algorithmic thinking.
- Ability to work collaboratively in a team-oriented environment.
Functional Areas: Software/Testing/Networking