We are seeking a highly skilled Data Scientist with 5+ years of experience to design, develop, and maintain large-scale data processing systems using technologies such as Spark Streaming and Kafka.
The ideal candidate will have a strong background in object-oriented programming, data architecture, and cloud-based data management, and will work closely with cross-functional teams to deliver complex data projects.
Your role:
Implement Real-time Data Processing Solutions: Design, develop, and deploy real-time data processing solutions using Spark Streaming and/or Kafka to handle high-volume, high-velocity data streams.
Stream Processing Frameworks: Develop and maintain near real-time stream processing frameworks that handle high-volume, high-velocity data streams, leveraging Spark Streaming and/or Kafka.
Advanced Data Storage: Apply advanced table formats such as Apache Hudi or Apache Iceberg to optimize data storage, management, and retrieval.
Code Quality and Maintainability: Write clean, maintainable code using object-oriented programming principles, ensuring scalability, efficiency, and reliability.
ETL Process Design and Implementation: Design and implement ETL (Extract, Transform, Load) processes, including data pipelines, data transformation, and data loading.
Collaboration and Communication: Collaborate with cross-functional teams, including data engineers, data analysts, software developers, testers, cloud architects, and business stakeholders, to deliver complex data projects aligned with business objectives and technical requirements.
Data Architecture and Security: Participate in designing and implementing data architectures, ensuring data security, governance, and compliance with organizational standards.
What we need:
Master's or Ph.D. in Computer Science, Statistics, Mathematics, or a related field.
5+ years of experience in data science, data engineering, or related fields.
Strong expertise in Spark Streaming and/or Kafka.
Hands-on knowledge of Apache Hudi, Apache Iceberg, and AWS Glue.
Strong proficiency in object-oriented programming languages (Java, Scala, Python).
Experience with cloud-based data management (AWS, GCP, Azure).
Strong understanding of data architecture, data modeling, and data governance.
Excellent problem-solving skills, with the ability to work in a fast-paced environment.
Strong communication and collaboration skills, with experience working in cross-functional teams.
Experience with Agile development methodologies and version control systems (e.g., Git).
Nice to have:
Experience with AWS Glue Data Catalog, AWS Glue ETL, and AWS Lake Formation.
Knowledge of containerization (Docker) and orchestration (Kubernetes).
Certification in data science, data engineering, or a related field.