Lead Big Data Engineer - PySpark/Python (5-10 yrs)
TalentXO
posted 11hr ago
Flexible timing
What You'll Be Doing:
- Build highly scalable, highly available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day, along with a petabyte-sized data warehouse and Elasticsearch cluster.
- Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
- Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
- Own the data mapping, business logic, transformations, and data quality.
- Perform low-level systems debugging, performance measurement, and optimization on large production clusters.
- Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
- Maintain and support existing platforms and evolve them toward newer technology stacks and architectures.
Ideal Candidate:
- Proficiency in Python and PySpark.
- Deep understanding of Apache Spark, including Spark tuning, creating RDDs, and building DataFrames.
- Experience with big data technologies such as HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, and Presto.
- Experience building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
- Good understanding of the architecture and functioning of distributed database systems.
- Experience working with file formats such as Parquet and Avro for large volumes of data.
- Experience with one or more NoSQL databases.
- Experience with cloud platforms such as AWS and GCP.
- 5+ years of professional experience as a data or software engineer.
Functional Areas: Other