Marelli Talbros Chassis Systems Interview Questions and Answers
Q1. What are the common file formats used in data storage? Which one is best for compression?
Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC. Parquet is generally the best for compression.
CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets
JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage
Parquet - columnar storage format that is highly efficient for compression and query performance
Avro - efficient binary format with schema support, good for data serialization and schema evolution
ORC - columnar format similar to Parquet, well suited to compression and Hive workloads
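A quick way to see why format choice matters is to serialize the same rows in different formats and compare sizes. Parquet itself needs a library such as pyarrow, so this is a stdlib-only sketch comparing CSV and JSON, raw and gzip-compressed; a columnar format like Parquet would typically compress further because values of one column are stored together.

```python
import csv
import gzip
import io
import json

# The same 1,000 rows serialized two ways
rows = [{"id": i, "city": "delhi"} for i in range(1000)]

# Row-oriented CSV: one header line, then compact delimited rows
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "city"])
writer.writeheader()
writer.writerows(rows)
csv_bytes = buf.getvalue().encode()

# JSON repeats every key in every record, so it is larger
json_bytes = json.dumps(rows).encode()

# Generic compression shrinks both; columnar formats do better still
csv_gz = gzip.compress(csv_bytes)
json_gz = gzip.compress(json_bytes)

print(len(csv_bytes), len(json_bytes), len(csv_gz), len(json_gz))
```

On repetitive data like this, CSV is smaller than JSON, and gzip shrinks both substantially; Parquet adds per-column encodings (dictionary, run-length) on top of such compression.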
Q2. Given a list of words, write a Python program to print the most repeating substring across all the words.
Python program to find the most repeating substring in a list of words.
Iterate through each word in the list
Generate all possible substrings for each word
Count the occurrences of each substring using a dictionary
Find the substring with the highest count
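The steps above can be sketched as follows. One assumption not stated in the question: substrings shorter than two characters are skipped (otherwise a single letter would almost always win), and repeats within the same word are counted.

```python
from collections import Counter

def most_repeating_substring(words, min_len=2):
    """Return (substring, count) for the substring of length >= min_len
    that occurs most often across all words, or None if there are none."""
    counts = Counter()
    for word in words:
        # Generate every substring of the word and tally it
        for i in range(len(word)):
            for j in range(i + min_len, len(word) + 1):
                counts[word[i:j]] += 1
    if not counts:
        return None
    # most_common(1) returns [(substring, count)] for the highest count
    return counts.most_common(1)[0]

print(most_repeating_substring(["banana", "bandana"]))  # → ('an', 4)
```

This brute-force approach is O(n·L²) substrings per word of length L, which is fine for interview-sized inputs; a suffix automaton or suffix array would scale better.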
Q3. What is the Application Master in Spark?
When Spark runs on YARN, the Application Master is the per-application process that negotiates resources with the ResourceManager and coordinates task execution on the cluster.
Responsible for negotiating resources with the ResourceManager
Manages the execution of tasks on the cluster
Monitors the progress of tasks and reports back to the driver program
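The Application Master comes into play when submitting to YARN. A hedged sketch of such a submission (the script name, executor counts, and memory sizes are illustrative, not from the source):

```shell
# Submit a Spark application to YARN in cluster mode.
# In cluster mode the driver runs inside the Application Master container;
# in client mode the AM only requests executors while the driver stays local.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  my_job.py   # hypothetical application script
```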
Q4. Architecture of Spark
Spark is a distributed computing framework that provides in-memory processing capabilities for big data analytics.
Spark has a driver/executor architecture: a central coordinator (the driver) schedules work, and distributed executors run it; in standalone mode these are managed by a Spark Master and Spark Workers.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various data sources like HDFS, Cassandra, and S3 for input/output operations.
It includes components like Spark SQL for structured data processing, along with Spark Streaming, MLlib, and GraphX.
Q5. Optimization in Spark
Optimizing Spark involves tuning configurations, partitioning data, using efficient transformations, and caching intermediate results.
Tune Spark configurations for optimal performance
Partition data to distribute workload evenly
Prefer narrow transformations like map and filter, and minimize wide transformations that trigger shuffles
Cache intermediate results to avoid recomputation
Q6. Azure services experience
I have extensive experience working with various Azure services such as Azure Data Factory, Azure Databricks, Azure SQL Database, and Azure Blob Storage.
Experience with Azure Data Factory for ETL processes
Proficiency in using Azure Databricks for big data processing
Knowledge of Azure SQL Database for data storage and querying
Familiarity with Azure Blob Storage for storing unstructured data