Big Data Developer - Apache Spark/Flink (8-10 yrs)
Talent On Lease
Flexible timing
Key Responsibilities :
Data System Development & Optimization :
- Design, develop, and implement efficient data pipelines and data storage solutions using Big Data technologies (see the sketch after this list).
- Build robust, scalable, and high-performance data processing systems using distributed frameworks such as Apache Spark (Core, Streaming, SQL), Flink, or Storm.
- Optimize and troubleshoot data processing jobs to ensure high throughput, low latency, and resource efficiency.
- Work on ClickHouse or similar OLAP (Online Analytical Processing) systems for high-performance analytics on large datasets.
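For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of batch pipeline described above. The bucket paths and the user_id/event_ts columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-pipeline").getOrCreate()

# Read raw events; a production job would declare an explicit schema
# rather than relying on inference. Path is hypothetical.
events = spark.read.json("s3://example-bucket/raw/events/")

# Aggregate events per user per day -- a typical transformation step.
summary = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Write partitioned Parquet so downstream queries can prune by date.
(summary.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/events_summary/"))
```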
Cloud Data Engineering :
- Develop and manage data solutions using AWS Cloud services such as S3, Redshift, Glue, Kinesis, EMR, and other cloud-based data services.
- Integrate cloud data solutions with on-premises data infrastructure, ensuring seamless data movement and access.
- Implement and optimize cloud-based ETL (Extract, Transform, Load) processes in AWS (see the sketch after this list).
- Ensure data security, integrity, and compliance with industry standards in cloud environments.
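A minimal sketch of the orchestration side of cloud ETL, using boto3 to start an AWS Glue job run and wait for it to finish. The job name ("nightly-etl"), region, and argument are hypothetical, and real code would add timeouts and error handling.

```python
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Start an existing Glue ETL job with a runtime argument.
run = glue.start_job_run(
    JobName="nightly-etl",
    Arguments={"--target_date": "2025-01-01"},
)

# Poll until the run reaches a terminal state.
while True:
    job_run = glue.get_job_run(JobName="nightly-etl", RunId=run["JobRunId"])
    state = job_run["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)

print(f"Glue job finished with state: {state}")
```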
Programming & Development :
- Utilize Java, Scala, and Python for developing data processing applications, algorithms, and custom solutions.
- Build reusable code and libraries for future use, ensuring modularity and scalability of data solutions.
- Write complex queries and data transformation logic to meet business and analytical needs (see the sketch after this list).
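One concrete flavor of such query logic, sketched for illustration: keeping only the latest version of each record with a Spark SQL window function. The orders table and its order_id/updated_at columns are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup-example").getOrCreate()

# Register the raw table for SQL access. Path is hypothetical.
spark.read.parquet("s3://example-bucket/raw/orders/") \
    .createOrReplaceTempView("orders")

# Rank rows per order by recency and keep only the newest one.
latest_orders = spark.sql("""
    SELECT *
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id
                   ORDER BY updated_at DESC
               ) AS rn
        FROM orders o
    ) t
    WHERE rn = 1
""").drop("rn")

latest_orders.write.mode("overwrite") \
    .parquet("s3://example-bucket/curated/orders/")
```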
Collaboration & Solution Design :
- Collaborate with cross-functional teams, including Data Scientists, Data Analysts, and Business Analysts, to understand requirements and design data architectures.
- Work with the product and engineering teams to define and implement data solutions that are aligned with business goals.
- Provide expertise in the integration of data systems with other business applications.
Performance & Scalability :
- Monitor and improve the performance of data systems and queries, ensuring the scalability of solutions to handle growing data volumes.
- Conduct performance tuning and troubleshooting of large-scale distributed data processing jobs to optimize speed and cost-effectiveness (see the sketch after this list).
- Leverage cloud-native tools and frameworks to ensure efficient data storage, processing, and access.
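Two common Spark tuning moves of the kind referenced above, sketched with hypothetical table paths and join key: broadcasting a small dimension table to avoid shuffling the large side, and controlling output file counts.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning-example").getOrCreate()

facts = spark.read.parquet("s3://example-bucket/curated/events_summary/")
users = spark.read.parquet("s3://example-bucket/reference/users/")

# Broadcast the small dimension table so the join happens locally on
# each executor instead of shuffling the large fact table.
enriched = facts.join(F.broadcast(users), on="user_id", how="left")

# Coalesce before writing to avoid producing thousands of tiny files.
(enriched.coalesce(64)
    .write.mode("overwrite")
    .parquet("s3://example-bucket/curated/enriched_events/"))
```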
Documentation & Reporting :
- Create and maintain detailed technical documentation for data systems, including architecture, processes, and workflows.
- Report on the status of data processing projects, providing insights into data trends, system performance, and areas for improvement.
Technical Qualifications :
Core Skills :
Big Data Technologies :
- Proficient in Apache Spark (Core, Streaming, SQL), Flink, Storm, or other distributed data processing frameworks.
- Strong knowledge of ClickHouse or other OLAP systems for large-scale data analytics and querying (a query sketch follows this list).
- Experience in designing and building high-performance data pipelines.
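For flavor, a minimal sketch of querying ClickHouse over its default HTTP interface (port 8123) with the requests library; the host, events table, and columns are hypothetical. Dedicated client libraries exist, but HTTP keeps the example dependency-light.

```python
import requests

# Top users by event volume over the last week, returned as
# newline-delimited JSON.
query = """
    SELECT user_id, count() AS event_count
    FROM events
    WHERE event_date >= today() - 7
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT 10
    FORMAT JSONEachRow
"""

resp = requests.post("http://localhost:8123/", data=query)
resp.raise_for_status()
print(resp.text)
```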
Cloud Data Services (AWS) :
- Expertise in AWS services: S3, Redshift, EMR, Glue, Kinesis, Lambda, and CloudFormation.
- Experience with cloud-based ETL processes and data storage/management in AWS.
Programming Languages :
- Expertise in Java, Scala, and Python for developing data-driven applications and algorithms.
- Proficient in building distributed systems and data processing frameworks in these languages.
Data Storage & Management :
- Familiar with HDFS, NoSQL databases (e.g., Cassandra, HBase), and data warehouses like Redshift and BigQuery.
- Knowledge of data lake architectures and cloud storage solutions.
Data Integration & Transformation :
- Skilled in ETL processes and frameworks for ingesting, transforming, and storing large volumes of data.
- Proficient in SQL and NoSQL query languages for data retrieval and processing.
Performance & Optimization :
- Strong understanding of performance tuning and optimization techniques for large-scale distributed systems.
- Experience with data partitioning, sharding, and optimizing big data storage and access.
Version Control & Automation :
- Familiar with Git and CI/CD pipelines for version control and automated deployment.
Preferred Qualifications :
- Experience with additional Big Data technologies such as Kafka, Hive, Presto, or Druid.
- Familiarity with containerization tools like Docker and orchestration platforms like Kubernetes.
- Experience working with Data Lakes and Machine Learning models in production environments.
- Familiar with DevOps practices and cloud-native development tools.
Education & Work Experience :
- Bachelor's/Master's degree in Computer Science, Engineering, or a related technical field.
- 8+ years of IT experience, including at least 5 years in data-related technologies.
- At least 3 years of experience with AWS Cloud services for data engineering.
- Proven experience working with distributed systems and Big Data technologies such as Apache Spark and ClickHouse.
Soft Skills :
- Strong problem-solving skills and the ability to work under pressure.
- Ability to work independently and as part of a collaborative team.
- Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Highly organized, detail-oriented, and results-driven.
Functional Areas: Software/Testing/Networking