Marelli Talbros Chassis Systems Interview Questions and Answers
Q1. What are the common file formats used in data storage? Which one is best for compression?
Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC. Parquet is generally the best for compression.
CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets
JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage
Parquet - columnar storage format that is highly efficient for compression and query performance
Avro - efficient binary format with schema support, good for data serialization and schema evolution
ORC - columnar format similar to Parquet, well suited to compression and Hive workloads
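A quick way to see why format choice matters is to serialize the same rows in different formats and compare sizes. Parquet itself needs a library such as pyarrow, so this is a stdlib-only sketch comparing CSV and JSON, raw and gzip-compressed; a columnar format like Parquet would typically compress further because values of one column are stored together.

```python
import csv
import gzip
import io
import json

# The same 1,000 rows serialized two ways
rows = [{"id": i, "city": "delhi"} for i in range(1000)]

# Row-oriented CSV: one header line, then compact delimited rows
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "city"])
writer.writeheader()
writer.writerows(rows)
csv_bytes = buf.getvalue().encode()

# JSON repeats every key in every record, so it is larger
json_bytes = json.dumps(rows).encode()

# Generic compression shrinks both; columnar formats do better still
csv_gz = gzip.compress(csv_bytes)
json_gz = gzip.compress(json_bytes)

print(len(csv_bytes), len(json_bytes), len(csv_gz), len(json_gz))
```

On repetitive data like this, CSV is smaller than JSON, and gzip shrinks both substantially; Parquet adds per-column encodings (dictionary, run-length) on top of such compression.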
Q2. Given a list of words, write a Python program to print the most repeating substring across all the words.
Python program to find the most repeating substring in a list of words.
Iterate through each word in the list
Generate all possible substrings for each word
Count the occurrences of each substring using a dictionary
Find the substring with the highest count
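The steps above can be sketched as follows. One assumption not stated in the question: substrings shorter than two characters are skipped (otherwise a single letter would almost always win), and repeats within the same word are counted.

```python
from collections import Counter

def most_repeating_substring(words, min_len=2):
    """Return (substring, count) for the substring of length >= min_len
    that occurs most often across all words, or None if there are none."""
    counts = Counter()
    for word in words:
        # Generate every substring of the word and tally it
        for i in range(len(word)):
            for j in range(i + min_len, len(word) + 1):
                counts[word[i:j]] += 1
    if not counts:
        return None
    # most_common(1) returns [(substring, count)] for the highest count
    return counts.most_common(1)[0]

print(most_repeating_substring(["banana", "bandana"]))  # → ('an', 4)
```

This brute-force approach is O(n·L²) substrings per word of length L, which is fine for interview-sized inputs; a suffix automaton or suffix array would scale better.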
Q3. What is the Application Master in Spark?
When Spark runs on YARN, the Application Master is the per-application process that negotiates resources with the ResourceManager and coordinates task execution on the cluster.
Responsible for negotiating resources with the ResourceManager
Manages the execution of tasks on the cluster
Monitors the progress of tasks and reports back to the driver program
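The Application Master comes into play when submitting to YARN. A hedged sketch of such a submission (the script name, executor counts, and memory sizes are illustrative, not from the source):

```shell
# Submit a Spark application to YARN in cluster mode.
# In cluster mode the driver runs inside the Application Master container;
# in client mode the AM only requests executors while the driver stays local.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  my_job.py   # hypothetical application script
```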
Q4. Architecture of Spark
Spark is a distributed computing framework that provides in-memory processing capabilities for big data analytics.
Spark has a driver/executor architecture: a central coordinator (the driver) schedules work, and distributed executors run it; in standalone mode these are managed by a Spark Master and Spark Workers.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various data sources like HDFS, Cassandra, and S3 for input/output operations.
It includes components like Spark SQL for structured data processing, along with Spark Streaming, MLlib, and GraphX.
Q5. Optimization in Spark
Optimizing Spark involves tuning configurations, partitioning data, using efficient transformations, and caching intermediate results.
Tune Spark configurations for optimal performance
Partition data to distribute workload evenly
Prefer narrow transformations like map and filter, and minimize wide transformations that trigger shuffles
Cache intermediate results to avoid recomputation
Q6. Azure services experience
I have extensive experience working with various Azure services such as Azure Data Factory, Azure Databricks, Azure SQL Database, and Azure Blob Storage.
Experience with Azure Data Factory for ETL processes
Proficiency in using Azure Databricks for big data processing
Knowledge of Azure SQL Database for data storage and querying
Familiarity with Azure Blob Storage for storing unstructured data