Impetus Technologies
I applied via Naukri.com and was interviewed in Jun 2022. There were 4 interview rounds.
Questions covered partitioning, bucketing, joins, optimization, broadcast variables, ORC vs Parquet, RDD vs DataFrame, and project architecture and responsibilities for the Big Data Engineer role.
Partitioning is dividing data into smaller chunks for parallel processing, while bucketing is organizing data into buckets based on a hash function.
Types of joins in Spark include inner join, left join, right join, and full outer join; a few are shown in the sketch below.
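A hedged PySpark sketch of partitioning, bucketing, and the common join types; the DataFrames, paths, and table name are all made up.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical employee and department data.
emp = spark.createDataFrame([(1, "a", 10), (2, "b", 20)], ["id", "name", "dept_id"])
dept = spark.createDataFrame([(10, "Sales"), (30, "HR")], ["dept_id", "dept"])

# Partitioning: output is split into directories by column value for parallel reads.
emp.write.mode("overwrite").partitionBy("dept_id").parquet("/tmp/emp_partitioned")

# Bucketing: rows are hashed into a fixed number of buckets on a key (managed table).
emp.write.mode("overwrite").bucketBy(8, "id").sortBy("id").saveAsTable("emp_bucketed")

# Join types, plus a broadcast join that ships the small side to every executor.
emp.join(dept, "dept_id", "inner").show()   # only matching rows
emp.join(dept, "dept_id", "left").show()    # all left rows, nulls where no match
emp.join(dept, "dept_id", "full").show()    # union of both sides
emp.join(F.broadcast(dept), "dept_id").show()
```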
Remove duplicate records and find 5th highest salary department wise using SQL.
Use the DISTINCT keyword to remove duplicate records.
Rank salaries within each department with DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC).
Filter on rank = 5 to pick the 5th highest salary per department.
A plain ORDER BY with LIMIT only works for a single group, not department wise; see the sketch below.
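A hedged sketch of this approach as Spark SQL run from PySpark; the employees table, its columns, and the sample rows are all assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical employee data, including one duplicate row.
spark.createDataFrame(
    [(1, "Sales", 90), (1, "Sales", 90), (2, "Sales", 85), (3, "Sales", 80),
     (4, "Sales", 75), (5, "Sales", 70), (6, "HR", 60)],
    ["id", "dept", "salary"],
).createOrReplaceTempView("employees")

spark.sql("""
    SELECT dept, salary
    FROM (
        SELECT dept, salary,
               DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
        FROM (SELECT DISTINCT * FROM employees) deduped  -- remove duplicates
    ) ranked
    WHERE rnk = 5                                        -- 5th highest per department
""").show()
```

With this sample data the query returns (Sales, 70); a department with fewer than five distinct salaries simply produces no row.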
I applied via Naukri.com and was interviewed in Jul 2021. There was 1 interview round.
Spark handles upserts through a MERGE operation, most commonly via Delta Lake (plain DataFrames have no merge() method).
Use DeltaTable.merge() or the MERGE INTO SQL statement to handle upserts in Spark.
Specify a merge condition on the primary key column(s) to identify matching rows.
Use a whenMatched clause to update existing rows.
Use a whenNotMatched clause to insert new rows.
Example: see the Delta Lake sketch below.
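A minimal sketch of this pattern with Delta Lake, assuming a SparkSession `spark` configured with the delta-spark package and an existing Delta table at the hypothetical path below.

```python
from delta.tables import DeltaTable

# Hypothetical incoming changes; `id` acts as the primary key.
updates = spark.createDataFrame([(1, "new"), (3, "fresh")], ["id", "val"])

target = DeltaTable.forPath(spark, "/tmp/target_delta")  # assumed existing table
(target.alias("t")
       .merge(updates.alias("s"), "t.id = s.id")  # match on the primary key
       .whenMatchedUpdateAll()                    # update rows whose id matches
       .whenNotMatchedInsertAll()                 # insert rows with new ids
       .execute())
```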
I applied via Naukri.com and was interviewed in Jun 2021. There were 4 interview rounds.
In Java, Collection is a single interface while Collections is a utility class.
Collection is an interface that provides a unified architecture for storing and manipulating groups of objects.
Collections is a utility class that provides static methods (such as sort, reverse, and unmodifiableList) for working with collections.
Collection is the root interface of the container hierarchy, while Collections is only a companion helper class.
Examples of Java collections include List, Set, and Map, with implementations such as ArrayList, HashSet, and HashMap.
Spark memory optimisation techniques
Use broadcast variables to reduce memory usage
Use persist() or cache() to store RDDs in memory
Use partitioning to reduce shuffling and memory usage
Use off-heap memory to avoid garbage collection overhead
Tune memory settings such as spark.driver.memory and spark.executor.memory
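A brief sketch combining several of the techniques above; the app name, memory sizes, and data volumes are assumptions, not recommendations.

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("memory-demo")                      # hypothetical app name
         .config("spark.driver.memory", "4g")         # assumed sizing
         .config("spark.executor.memory", "8g")
         .config("spark.memory.offHeap.enabled", "true")
         .config("spark.memory.offHeap.size", "2g")   # off-heap eases GC pressure
         .getOrCreate())

big = spark.range(10_000_000).repartition(64, "id")   # control partition count
big.persist(StorageLevel.MEMORY_AND_DISK)             # cache with spill-to-disk

small = spark.range(100).withColumnRenamed("id", "key")
joined = big.join(F.broadcast(small), big["id"] == small["key"], "left")
print(joined.count())
```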
Hadoop serialisation techniques are used to convert data into a format that can be stored and processed in Hadoop.
Hadoop uses Writable interface for serialisation and deserialisation of data
Avro, Thrift, and Protocol Buffers are popular serialisation frameworks used in Hadoop
Serialisation can be customised using custom Writable classes or external libraries
Serialisation plays a crucial role in Hadoop performance and efficiency.
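The Writable interface itself is Java-side; as a language-neutral illustration, here is a hedged sketch of Avro serialisation and deserialisation in Python using the third-party fastavro package (the schema and record are made up).

```python
from io import BytesIO
from fastavro import parse_schema, reader, writer

# Hypothetical record schema.
schema = parse_schema({
    "type": "record",
    "name": "Employee",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
})

buf = BytesIO()
writer(buf, schema, [{"id": 1, "name": "Asha"}])  # serialise to Avro container format
buf.seek(0)
for record in reader(buf):                        # deserialise back to dicts
    print(record)
```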
I applied via Naukri.com and was interviewed in Jan 2021. There were 5 interview rounds.
Very good; easy questions were asked.
Moderate questions were asked.
I applied via Naukri.com and was interviewed in Oct 2023. There were 2 interview rounds.
Coding round: the word count program's test cases had to pass.
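A minimal PySpark word count sketch of the kind such a test expects; the input path is hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

counts = (spark.sparkContext.textFile("hdfs:///tmp/input.txt")  # hypothetical path
          .flatMap(lambda line: line.split())   # split lines into words
          .map(lambda word: (word, 1))          # pair each word with a count of 1
          .reduceByKey(lambda a, b: a + b))     # sum counts per word

for word, n in counts.take(10):
    print(word, n)
```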
I applied via Referral and was interviewed in Dec 2020. There were 3 interview rounds.
Parquet is a columnar storage format, while Avro is a row-based storage format.
Parquet is optimized for analytics and is efficient for reading large datasets.
Avro is optimized for serialization and is efficient for writing data to disk.
Parquet supports compression and encoding schemes while Avro supports schema evolution.
Parquet is used in Hadoop ecosystem while Avro is used in Kafka and Hadoop ecosystem.
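A small sketch writing the same DataFrame both ways; the paths are made up, and the spark-avro coordinate below is one assumed example version.

```python
from pyspark.sql import SparkSession

# format("avro") needs the external spark-avro package on the classpath.
spark = (SparkSession.builder
         .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.0")
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Columnar: efficient when scanning a few columns across many rows.
df.write.mode("overwrite").parquet("/tmp/demo_parquet")

# Row-based: efficient whole-record writes, common in Kafka pipelines.
df.write.mode("overwrite").format("avro").save("/tmp/demo_avro")
```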
I applied via a recruitment consultant and was interviewed in Jul 2024. There was 1 interview round.
Display in Databricks is used to visualize data in a tabular format or as charts/graphs.
Display function is used to show data in a tabular format in Databricks notebooks.
It can also be used to create visualizations like charts and graphs.
Display can be customized with different options like title, labels, and chart types.
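A hedged sketch; `display` and `spark` are builtins inside a Databricks notebook, and the sample data is made up.

```python
# Inside a Databricks notebook cell:
df = spark.range(100).selectExpr("id", "id * id AS squared")
display(df)  # renders an interactive table; switch to a chart via the UI controls
```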
To create a workflow in Databricks, use Databricks Jobs or Databricks Notebooks with scheduling capabilities.
Use Databricks Jobs to create and schedule workflows in Databricks.
Utilize Databricks Notebooks to define the workflow steps and dependencies.
Leverage Databricks Jobs API for programmatic workflow creation and management.
Use Databricks Jobs UI to visually design and schedule workflows.
Integrate with Databricks D…
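A hedged sketch of programmatic job creation via the Jobs API 2.1; the workspace URL, token, job name, notebook path, and cluster sizing are all placeholders.

```python
import requests

DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                             # placeholder

job_spec = {
    "name": "nightly-etl",  # hypothetical job name
    "tasks": [{
        "task_key": "etl",
        "notebook_task": {"notebook_path": "/Repos/etl/main"},  # hypothetical path
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
    }],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())  # returns the new job_id on success
```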
Coding questions on SQL, plus multiple-choice questions on PySpark and Python.
Impetus Technologies reported salaries:

Designation | Reported salaries | Salary range
Senior Software Engineer | 715 salaries | ₹8.2 L/yr - ₹30 L/yr
Software Engineer | 537 salaries | ₹5 L/yr - ₹20.5 L/yr
Module Lead Software Engineer | 277 salaries | ₹11.2 L/yr - ₹37.5 L/yr
Module Lead | 251 salaries | ₹12 L/yr - ₹35 L/yr
Lead Software Engineer | 196 salaries | ₹15.2 L/yr - ₹39.8 L/yr