LTIMindtree
Proud winner of ABECA 2025 - AmbitionBox Employee Choice Awards
I admire LTIMindtree's innovative approach and commitment to data-driven solutions, making it an ideal place for my growth as a Data Engineer.
LTIMindtree's focus on cutting-edge technologies aligns with my passion for data engineering and analytics.
The company's diverse portfolio offers opportunities to work on various projects, enhancing my skills and experience.
I appreciate LTIMindtree's emphasis on collaboratio...
I have extensive experience in data engineering, focusing on ETL processes, data warehousing, and big data technologies.
Developed ETL pipelines using Apache Spark to process large datasets for real-time analytics.
Designed and implemented a data warehouse using Amazon Redshift, improving query performance by 40%.
Worked with data modeling techniques to optimize database structures for better data retrieval.
Utilized ...
List is mutable, tuple is immutable in Python.
List can be modified after creation, tuple cannot.
List is defined using square brackets [], tuple using parentheses ().
Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)
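The mutability difference above can be checked directly. This is a minimal sketch that reuses the example values from the answer; the `lookup` dictionary and `tuple_error` variable are illustrative additions.

```python
# Lists can be changed in place; tuples cannot.
list_example = [1, 2, 3]
tuple_example = (4, 5, 6)

list_example.append(4)      # grows the same list object
list_example[0] = 10        # item assignment is allowed on a list

try:
    tuple_example[0] = 40   # item assignment on a tuple raises TypeError
    tuple_error = None
except TypeError as exc:
    tuple_error = exc

# A tuple of hashable items is itself hashable, so it can be a dict key.
lookup = {tuple_example: "ok"}

print(list_example)
print(type(tuple_error).__name__)
```

Because a tuple cannot change after creation, it is a reasonable choice for fixed records and dictionary keys, while a list suits collections that grow or get reordered.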
Python is a high-level programming language known for its simplicity, readability, and versatility.
Python is preferred for data engineering due to its ease of use and readability, making it easier to write and maintain code.
It has a large number of libraries and frameworks specifically designed for data processing and analysis, such as Pandas, NumPy, and SciPy.
Python's flexibility allows for seamless integration w...
Spark join syntax allows combining DataFrames based on common keys using various join types like inner, outer, left, and right.
Use `df1.join(df2, 'key')` for an inner join.
For a left join, use `df1.join(df2, 'key', 'left')`.
Outer join syntax: `df1.join(df2, 'key', 'outer')`.
Right join example: `df1.join(df2, 'key', 'right')`.
Cross join can be done using `df1.crossJoin(df2)`.
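To see which rows each join type keeps, here is a plain-Python sketch of the join semantics. It is not Spark code: the `join` and `keys` helpers and the column names `l` and `r` are hypothetical, chosen only to mirror what inner, left, right, and outer joins on `'key'` return.

```python
# Plain-Python sketch of join semantics on a shared 'key' column (not Spark).
left = [{"key": 1, "l": "a"}, {"key": 2, "l": "b"}]
right = [{"key": 2, "r": "x"}, {"key": 3, "r": "y"}]


def join(left_rows, right_rows, how="inner"):
    """Join two lists of dicts on 'key'; a missing side is filled with None."""
    left_keys = {row["key"] for row in left_rows}
    out = []
    for l_row in left_rows:
        matches = [r for r in right_rows if r["key"] == l_row["key"]]
        for r_row in matches:
            out.append({"key": l_row["key"], "l": l_row["l"], "r": r_row["r"]})
        # Left and outer joins keep left rows that have no match.
        if not matches and how in ("left", "outer"):
            out.append({"key": l_row["key"], "l": l_row["l"], "r": None})
    # Right and outer joins keep right rows that have no match.
    if how in ("right", "outer"):
        for r_row in right_rows:
            if r_row["key"] not in left_keys:
                out.append({"key": r_row["key"], "l": None, "r": r_row["r"]})
    return out


def keys(rows):
    """Sorted join keys of a result, for quick comparison."""
    return sorted(row["key"] for row in rows)


# A cross join pairs every left row with every right row.
cross = [(l_row, r_row) for l_row in left for r_row in right]

for how in ("inner", "left", "right", "outer"):
    print(how, keys(join(left, right, how)))
```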
SparkContext is the entry point for Spark's low-level RDD API, while SparkSession (introduced in Spark 2.0) is the unified entry point for the DataFrame, Dataset, and Spark SQL APIs.
SparkContext is the entry point for low-level API functionality in Spark.
SparkSession is the entry point for DataFrame and Spark SQL functionality, and it wraps a SparkContext (available as `spark.sparkContext`).
SparkContext is used to create RDDs (Resilient Distributed Datasets) in Spark.
SparkSession provides a unified entry point for reading data from various sources and performing ...
When a Spark job is submitted, several steps run in the background to process it.
The job is submitted to the Spark driver program.
The driver program communicates with the cluster manager to request resources.
The cluster manager allocates resources (CPU, memory) to the job.
The driver program builds a DAG (Directed Acyclic Graph) of the job's stages and tasks.
Tasks are then scheduled and executed on worker nodes ...
Performance optimization in Spark involves tuning configurations, optimizing code, and utilizing caching.
Tune Spark configurations such as executor memory, number of executors, and shuffle partitions.
Optimize code by reducing unnecessary shuffles, using efficient transformations, and avoiding unnecessary data movements.
Utilize caching to store intermediate results in memory and avoid recomputation.
Example: In my p...
Optimizing SQL queries involves using indexes, avoiding unnecessary joins, and optimizing the query structure.
Use indexes on columns frequently used in WHERE clauses.
Avoid SELECT * and retrieve only the columns you need.
Prefer INNER JOIN over OUTER JOIN when unmatched rows are not needed.
Use EXPLAIN to inspect the query plan and adjust the query or indexes accordingly.
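The effect of an index on a WHERE clause can be observed with a query plan. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`; the `employees` table, its rows, and the index name `idx_emp_dept` are assumptions made for illustration, and other databases print their plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "data", 90), ("Ravi", "data", 80), ("Meena", "web", 70)],
)

# Select only the needed columns instead of SELECT *, filtering on dept.
query = "SELECT name, salary FROM employees WHERE dept = ?"


def plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("data",))   # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
plan_after = plan(query, ("data",))    # the plan now references the index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the planner switching from a table scan to a lookup through the new index, which is the kind of check EXPLAIN is meant for.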
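Client mode gives low latency only when the submitting machine sits close to the cluster, because the driver runs on that machine.
In client mode the driver runs inside the spark-submit process on the submitting machine, so output and logs come back directly; this suits interactive work and debugging.
In cluster mode the driver runs inside the cluster, which reduces driver-to-executor network latency when the job is submitted from a remote machine.
Client mode is preferred for interactive sessions launched from a gateway node co-located with the workers; cluster mode is preferred for production jobs.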
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
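Calculate the second highest salary using SQL and PySpark.
In SQL, select the distinct salaries, ORDER BY salary DESC, and use LIMIT 1 OFFSET 1 to get the second highest salary.
In PySpark, apply distinct() and orderBy() in descending order, then take(2) and read the second row; dense_rank() over a window ordered by salary is an alternative.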
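A runnable sketch of the SQL side, using Python's built-in `sqlite3` module. The `employees` table and its rows are made up for illustration, and the tie at the top salary is deliberate: it shows why duplicates have to be collapsed before skipping the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 300), ("Ravi", 300), ("Meena", 200), ("Kiran", 100)],
)

# DISTINCT collapses the tie at 300, so OFFSET 1 lands on the next salary.
limit_sql = """
    SELECT DISTINCT salary FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
"""
second_by_limit = conn.execute(limit_sql).fetchone()[0]

# Alternative: the largest salary strictly below the maximum.
subquery_sql = """
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
"""
second_by_subquery = conn.execute(subquery_sql).fetchone()[0]

# Without DISTINCT, the tied top salary is returned again.
naive_sql = "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
naive_result = conn.execute(naive_sql).fetchone()[0]

print(second_by_limit, second_by_subquery, naive_result)
```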
Spark can run in local mode or on a cluster.
Local mode: Spark runs on a single machine in a single JVM and is suitable for development and testing.
On a cluster: Spark runs on a cluster of machines managed by a cluster manager such as Spark's standalone manager, YARN, Kubernetes, or Mesos, and is suited to production workloads.
SQL and PySpark code examples for data manipulation and analysis.
Use SQL for structured queries: SELECT, JOIN, GROUP BY.
Example SQL: SELECT name, COUNT(*) FROM patients GROUP BY name;
Use PySpark for big data processing: DataFrame API, RDDs.
Example PySpark: df.groupBy('name').count().show()
Optimize queries with indexing in SQL and caching in PySpark.
I applied via Campus Placement
It included questions related to aptitude and computer science.
It was amazing, fabulous, fantastic, and mind-blowing.
I applied via Naukri.com and was interviewed in Jul 2024. There were 2 interview rounds.
I am a data engineer with a strong background in programming and data analysis.
Experienced in designing and implementing data pipelines
Proficient in programming languages like Python, SQL, and Java
Skilled in working with big data technologies such as Hadoop and Spark
I rate myself highly in SQL and Snowflake, with extensive experience in both technologies.
Proficient in writing complex SQL queries for data manipulation and analysis
Skilled in optimizing queries for performance and efficiency
Experienced in working with Snowflake for data warehousing and analytics
Familiar with Snowflake's unique features such as virtual warehouses and data sharing
I was approached by the company and was interviewed in Aug 2024. There were 3 interview rounds.
My strengths include strong analytical skills, attention to detail, and the ability to work well under pressure.
Strong analytical skills - able to analyze complex data sets and identify trends
Attention to detail - meticulous in ensuring data accuracy and completeness
Ability to work well under pressure - can meet tight deadlines and handle high-pressure situations
Very nice, easy peasy.
Very nice, super.
I applied via Campus Placement
Reasoning and vocabulary.
Time Travel and Fail-safe are data-protection features (the names come from Snowflake) for reading historical data and recovering it after failures.
Time Travel is the ability to query or restore earlier versions of data within a retention period, which helps track changes over time and undo mistakes.
Fail-safe is a further recovery window after the Time Travel retention period ends, so data can still be recovered after system failures or data corruption.
Examples of fail safe practices include regular backups, redundancy in storage system...
I appeared for an interview in Mar 2025, where I was asked the following questions.
Explaining joins in Python, PySpark, and SQL with advanced window functions.
In Python, use pandas: df1.merge(df2, on='key', how='inner') for joins.
In PySpark, use DataFrame API: df1.join(df2, 'key', 'inner').
In SQL, use JOIN clause: SELECT * FROM table1 INNER JOIN table2 ON table1.key = table2.key.
Advanced window functions in SQL: SELECT *, ROW_NUMBER() OVER (PARTITION BY key ORDER BY value) AS row_num FROM table.
In Py...
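The SQL join and window-function bullets above can be tried end to end with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `label` column and the sample rows are invented for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key INTEGER, value INTEGER);
    CREATE TABLE table2 (key INTEGER, label TEXT);
    INSERT INTO table1 VALUES (1, 30), (1, 10), (2, 20), (3, 5);
    INSERT INTO table2 VALUES (1, 'a'), (2, 'b');
""")

# The inner join keeps only keys present in both tables; ROW_NUMBER()
# then numbers the joined rows within each key, ordered by value.
sql = """
    SELECT t1.key, t1.value, t2.label,
           ROW_NUMBER() OVER (PARTITION BY t1.key ORDER BY t1.value) AS row_num
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.key = t2.key
    ORDER BY t1.key, row_num
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Key 3 exists only in `table1`, so the inner join drops it; within key 1 the row with the smaller value gets `row_num` 1.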
I applied via Naukri.com and was interviewed in Jul 2024. There was 1 interview round.
Which Python, SQL, AWS, and ETL tools do you have hands-on experience with?
Related to recent events
The duration of the LTIMindtree Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete.
Difficulty level and duration: based on 71 interview experiences
Rating in categories: based on 374 reviews
Senior Software Engineer | 22k salaries | ₹6 L/yr - ₹23 L/yr
Software Engineer | 16.3k salaries | ₹2 L/yr - ₹10 L/yr
Technical Lead | 6.4k salaries | ₹9.5 L/yr - ₹37.5 L/yr
Module Lead | 5.7k salaries | ₹7 L/yr - ₹28 L/yr
Senior Engineer | 4.4k salaries | ₹4.2 L/yr - ₹16 L/yr