LTIMindtree
10+ DSP Investment Managers Interview Questions and Answers
Q1. If you want very low latency, which is better: standalone or client mode?
Client mode is better for very low latency because the driver communicates directly with the cluster.
In client mode the driver runs on the machine that submits the job, so results flow back to it directly, reducing latency.
Standalone (cluster-side) deployment requires an additional layer of communication between the submitting machine and the driver, increasing latency.
Client mode is therefore preferred for real-time and interactive applications where low latency is crucial, as in the sketch below.
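A minimal sketch of submitting in client mode; the master URL and script name are placeholders, and a running standalone cluster is assumed.

```python
# Submit with the driver kept on the local machine (client mode), e.g.:
#   spark-submit --master spark://cluster-host:7077 --deploy-mode client low_latency_job.py
# (master URL and file name are placeholders for illustration)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("low-latency-job").getOrCreate()

# Because the driver runs in this local process, collect() returns rows
# directly here, with no extra hop through a driver living inside the
# cluster as cluster deployment would add.
print(spark.range(5).collect())
spark.stop()
```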
Q2. When a Spark job is submitted, what happens at the backend? Explain the flow.
When a Spark job is submitted, several steps run at the backend to process it.
The job is submitted to the Spark driver program.
The driver program communicates with the cluster manager to request resources.
The cluster manager allocates resources (CPU, memory) to the job.
The driver program builds a DAG (Directed Acyclic Graph) of the job's stages and tasks.
Tasks are then scheduled and executed on worker nodes in the cluster.
Intermediate results are stored in memory or spilled to disk between stages.
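A minimal PySpark sketch of this flow, assuming a local master for illustration: transformations only extend the DAG, and the action triggers stage and task scheduling.

```python
from pyspark.sql import SparkSession

# Minimal sketch; local[*] master and the 1g executor memory are placeholder values.
spark = (
    SparkSession.builder
    .appName("job-flow-demo")
    .master("local[*]")
    .config("spark.executor.memory", "1g")   # resources requested from the cluster manager
    .getOrCreate()
)

df = spark.range(1_000_000)                   # transformation: nothing executes yet
doubled = df.selectExpr("id * 2 AS doubled")  # transformations only extend the DAG

# The action below makes the driver split the DAG into stages, schedule
# tasks on executors, and return the result.
print(doubled.count())
spark.stop()
```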
Q3. How do you do performance optimization in Spark? Tell how you did it in your project.
Performance optimization in Spark involves tuning configurations, optimizing code, and utilizing caching.
Tune Spark configurations such as executor memory, number of executors, and shuffle partitions.
Optimize code by reducing unnecessary shuffles, using efficient transformations, and avoiding unnecessary data movements.
Utilize caching to store intermediate results in memory and avoid recomputation.
Example: In my project, I optimized Spark performance by increasing executor memory.
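A hedged sketch of these levers; the config values, column names, and the /data/events input path are made up for illustration, not taken from the project.

```python
from pyspark.sql import SparkSession

# Illustrative values only: executor sizes and partition counts depend on the
# cluster and data volume.
spark = (
    SparkSession.builder
    .appName("perf-tuning-demo")
    .config("spark.executor.memory", "8g")          # larger executors
    .config("spark.executor.instances", "10")       # number of executors
    .config("spark.sql.shuffle.partitions", "200")  # shuffle partition count
    .getOrCreate()
)

events = spark.read.parquet("/data/events")         # hypothetical input path

# Cache a DataFrame reused by several downstream queries so it is not
# recomputed for every action.
filtered = events.where("event_type = 'purchase'").cache()

filtered.groupBy("event_date").count().show()
filtered.groupBy("user_id").count().show()
```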
Q4. How do you optimize SQL queries?
Optimizing SQL queries involves using indexes, avoiding unnecessary joins, and optimizing the query structure.
Use indexes on columns frequently used in WHERE clauses
Avoid using SELECT * and only retrieve necessary columns
Optimize joins by using INNER JOIN instead of OUTER JOIN when possible
Use EXPLAIN to analyze query performance and make necessary adjustments
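A small self-contained sketch of these points using SQLite (the orders table and its columns are made up): index the WHERE column, select only the needed columns, and inspect the plan with EXPLAIN.

```python
import sqlite3

# In-memory toy database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL, status TEXT)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, i % 100, i * 1.5, "open" if i % 2 else "closed") for i in range(1000)],
)

# Index the column used in the WHERE clause instead of scanning the whole table.
cur.execute("CREATE INDEX idx_orders_status ON orders(status)")

# Retrieve only the needed columns rather than SELECT *.
query = "SELECT id, amount FROM orders WHERE status = 'open'"

# SQLite's EXPLAIN QUERY PLAN shows whether the index is actually used.
for row in cur.execute("EXPLAIN QUERY PLAN " + query):
    print(row)

conn.close()
```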
Q5. Calculate the second highest salary using SQL as well as PySpark.
Calculate the second highest salary using SQL and PySpark.
Use a SQL query with ORDER BY and LIMIT to get the second highest salary.
In PySpark, use the orderBy() and take() functions to achieve the same result.
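A sketch of both approaches on a small made-up employees dataset, assuming a local PySpark session:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("second-highest-salary").master("local[*]").getOrCreate()

emp = spark.createDataFrame(
    [("a", 1000), ("b", 3000), ("c", 2000), ("d", 3000)],
    ["name", "salary"],
)
emp.createOrReplaceTempView("employees")

# SQL: keep the top two distinct salaries with ORDER BY ... LIMIT,
# then take the smaller of the two.
spark.sql("""
    SELECT salary
    FROM (SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 2) t
    ORDER BY salary ASC
    LIMIT 1
""").show()

# PySpark: same idea with orderBy() and take().
top_two = (
    emp.select("salary").distinct()
       .orderBy(F.col("salary").desc())
       .take(2)
)
print(top_two[1]["salary"] if len(top_two) > 1 else None)
```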
Q6. What are the 2 types of modes for Spark architecture?
The two types of modes for Spark architecture are standalone mode and cluster mode.
Standalone (local) mode: Spark runs on a single machine within a single JVM and is suitable for development and testing.
Cluster mode: Spark runs on a cluster of machines managed by a cluster manager like YARN or Mesos for production workloads.
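A minimal sketch of the two setups; the local run below works on one machine, while the cluster master URLs in the comments are placeholders that need a real cluster manager.

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("mode-demo")

# Standalone / local development: driver and executors share one machine.
spark = builder.master("local[*]").getOrCreate()
print(spark.range(3).collect())
spark.stop()

# Cluster mode (production): point the master at a cluster manager instead,
# e.g. --master yarn or --master spark://host:7077 passed to spark-submit,
# rather than hard-coding it in the application.
```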
Q7. What factors should be considered when designing a road curve?
Factors to consider when designing a road curve
Radius of the curve
Speed limit of the road
Banking (superelevation) of the curve (see the relation sketched after this list)
Visibility around the curve
Traffic volume on the road
Road surface conditions
Presence of obstacles or hazards
Environmental factors such as weather conditions
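Several of these factors are tied together by a standard relation between design speed, banking, and minimum radius; a sketch of that relation (symbols and units as commonly used, not from the original answer):

```latex
% R_min = minimum curve radius (m), V = design speed (km/h),
% e = superelevation (banking) rate, f = side friction factor
R_{\min} = \frac{V^{2}}{127\,(e + f)}
```

For example, with V = 80 km/h, e = 0.06, and f = 0.14, the minimum radius works out to roughly 80² / (127 × 0.20) ≈ 252 m.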
Q8. What is SparkContext and SparkSession?
SparkContext is the main entry point for Spark functionality, while SparkSession is the entry point for Spark SQL.
SparkContext is the entry point for low-level API functionality in Spark.
SparkSession is the entry point for Spark SQL functionality.
SparkContext is used to create RDDs (Resilient Distributed Datasets) in Spark.
SparkSession provides a unified entry point for reading data from various sources and performing SQL queries.
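A brief sketch showing both entry points in one program, assuming a local session:

```python
from pyspark.sql import SparkSession

# Since Spark 2.x the SparkSession is created first and the underlying
# SparkContext is reached through it.
spark = SparkSession.builder.appName("context-vs-session").master("local[*]").getOrCreate()
sc = spark.sparkContext                        # low-level entry point (RDD API)

rdd = sc.parallelize([1, 2, 3, 4])             # RDDs are created via the SparkContext
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])  # DataFrames via the SparkSession

print(rdd.sum())
df.createOrReplaceTempView("t")
spark.sql("SELECT COUNT(*) AS n FROM t").show()  # SQL via the SparkSession
spark.stop()
```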
Q9. Write the code to sort the array.
Code to sort an array of strings
Use the built-in sort() function in the programming language of your choice
If case-insensitive sorting is required, use a custom comparator
Consider the time complexity of the sorting algorithm used
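A short Python sketch of these points; the sample words are made up.

```python
words = ["banana", "Apple", "cherry", "apple"]

# Built-in sort (Timsort, O(n log n)); uppercase letters sort before lowercase.
print(sorted(words))

# Case-insensitive ordering via a key function acting as the custom comparator.
print(sorted(words, key=str.lower))
```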
Q10. Explain your experience.
I have 5 years of experience working as a Data Engineer in various industries.
Developed ETL pipelines to extract, transform, and load data from multiple sources into a data warehouse
Optimized database performance by tuning queries and indexes
Implemented data quality checks to ensure accuracy and consistency of data
Worked with cross-functional teams to design and implement data solutions for business needs