Lead Data Engineer - Python (5-9 yrs)
TalentXO
posted 18hr ago
Flexible timing
About Lifesight :
Lifesight is a fast-growing SaaS company focused on helping businesses leverage data & AI to improve customer acquisition and retention. We have a team of 130 serving 300+ customers across 5 offices in the US, Singapore, India, Australia, and the UK.
Our mission is to make it easy for non-technical marketers to leverage advanced data activation and marketing measurement tools powered by AI, to improve their performance and achieve their KPIs. Our product is being adopted rapidly globally, and we need the best people onboard to accelerate our growth.
We handle petabytes of data and process more than 400 TB every day to power our attribution and measurement platforms, so building scalable, highly available, fault-tolerant big data platforms is critical to our success.
From your first day at Lifesight, you'll make a valuable - and valued - contribution. We offer you the opportunity to delight customers around the world while gaining meaningful experience across a variety of disciplines.
Working Days : 5 days/week
Office Location : HSR Layout, Bengaluru
Role & Responsibilities :
Lifesight is growing rapidly and is seeking a strong Data Engineer to be a key member of the Data and Business Intelligence organization, with a focus on deep data engineering projects. You will join as one of the first few data engineers on the data platform team in our Bengaluru office. You will have an opportunity to help define our technical strategy and data engineering team culture in India.
You will design and build data platforms and services while managing our data infrastructure in cloud environments that fuel strategic business decisions across Lifesight products.
A successful candidate will be a self-starter who drives excellence, is ready to jump into a variety of big data technologies and frameworks, and can coordinate and collaborate with other engineers as well as mentor them.
What You'll Be Doing :
- Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day, along with a petabyte-sized data warehouse and Elasticsearch cluster.
- Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
- Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
- Own the data mapping, business logic, transformations, and data quality.
- Low-level systems debugging, performance measurement & optimization on large production clusters.
- Participate in architecture discussions, influence product roadmap, and take ownership and responsibility over new projects.
- Maintain and support existing platforms and evolve to newer technology stacks and architectures.
Ideal Candidate :
- Proficiency in Python and PySpark.
- Deep understanding of Apache Spark, including Spark tuning and working with RDDs and DataFrames.
- Experience in big data technologies like HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
- Experience in building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
- Good understanding of the architecture and functioning of distributed database systems.
- Experience working with various file formats like Parquet, Avro, etc., for large volumes of data.
- Experience with one or more NoSQL databases.
- Experience with AWS or GCP.
- 5+ years of professional experience as a data or software engineer.
Functional Areas: Software/Testing/Networking