I applied via Naukri.com and was interviewed in Sep 2024. There were 3 interview rounds.
The test included some multiple-choice questions along with 2 SQL and 2 Python questions.
Developed a real-time data processing system for analyzing customer behavior
Used Apache Kafka for streaming data ingestion
Implemented data pipelines using Apache Spark for processing and analysis
Utilized Elasticsearch for storing and querying large volumes of data
Developed custom machine learning models for predictive analytics
I have used partitioning and indexing to optimize query performance.
Implemented partitioning on large tables to improve query performance by limiting the data scanned
Created indexes on frequently queried columns to speed up data retrieval
Utilized clustering keys to physically organize data on disk for faster access
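A minimal PySpark sketch of the partitioning idea above, assuming a sales table partitioned by a date column; the paths and column names are hypothetical. Index and clustering-key creation are engine-specific (e.g. SQL Server or Snowflake) and are not shown here.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()

# Hypothetical sales dataset; write it partitioned by a date column so that
# queries filtering on that column only scan the matching partitions.
sales = spark.read.parquet("/data/raw/sales")          # path is an assumption
(sales.write
      .mode("overwrite")
      .partitionBy("sale_date")                        # partition column is hypothetical
      .parquet("/data/curated/sales"))

# Partition pruning: this filter reads only the 2024-09-01 partition,
# not the whole table.
df = spark.read.parquet("/data/curated/sales").filter("sale_date = '2024-09-01'")
df.show()
```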
I applied via Naukri.com and was interviewed in Sep 2024. There were 2 interview rounds.
Spark optimization techniques involve partitioning, caching, and tuning resources for efficient data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data to avoid recomputation
Tuning resources like memory allocation and parallelism
Using broadcast variables for small lookup tables
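A hedged PySpark sketch combining these techniques; the input paths, column names, and configuration values are illustrative assumptions, not taken from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Resource tuning happens at session build time; the values here are illustrative.
spark = (SparkSession.builder
         .appName("spark-optimization-demo")
         .config("spark.executor.memory", "4g")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

events = spark.read.parquet("/data/events")            # hypothetical input path
lookup = spark.read.parquet("/data/country_codes")     # small lookup table

# Repartition on the join key to spread the workload evenly across executors.
events = events.repartition(200, "country_code")

# Cache a DataFrame that is reused several times to avoid recomputation.
events.cache()

# Broadcast the small lookup table so the join avoids a shuffle.
enriched = events.join(broadcast(lookup), "country_code")
enriched.show(10)
```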
Developed a data pipeline to ingest, process, and analyze real-time streaming data from IoT devices.
Designed and implemented data ingestion process using Apache Kafka
Utilized Apache Spark for real-time data processing and analysis
Developed data models and algorithms to extract insights from the data
Worked with stakeholders to understand requirements and deliver actionable insights
Some challenges faced include data quality issues, scalability issues, and keeping up with evolving technologies.
Data quality issues such as missing values, inconsistencies, and errors in data sources.
Scalability issues when dealing with large volumes of data and ensuring efficient processing.
Keeping up with evolving technologies and tools in the field of data engineering.
Collaborating with cross-functional teams and stakeholders.
I applied via Naukri.com and was interviewed in Jan 2024. There was 1 interview round.
Columnar storage is a data storage format that stores data in columns rather than rows, improving query performance.
Columnar storage stores data in a column-wise manner instead of row-wise.
It improves query performance by reducing the amount of data that needs to be read from disk.
Parquet is a columnar storage file format that is optimized for big data workloads.
It is used in Apache Spark and other big data processing frameworks.
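For illustration, a small PySpark sketch of writing and reading Parquet; the file paths and column names are assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Hypothetical CSV input converted to Parquet (columnar) storage.
orders = spark.read.csv("/data/orders.csv", header=True, inferSchema=True)
orders.write.mode("overwrite").parquet("/data/orders_parquet")

# Because Parquet is columnar, selecting two columns reads only those
# columns from disk instead of entire rows.
spark.read.parquet("/data/orders_parquet").select("order_id", "amount").show(10)
```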
The project architecture involves the design and organization of data pipelines and systems for efficient data processing and storage.
The architecture includes components such as data sources, data processing frameworks, storage systems, and data delivery mechanisms.
It focuses on scalability, reliability, and performance to handle large volumes of data.
Example: A project architecture may involve using Apache Kafka for streaming data ingestion.
To connect SQL server to Databricks, use JDBC/ODBC drivers and configure the connection settings.
Install the appropriate JDBC/ODBC driver for SQL server
Configure the connection settings in Databricks
Use the JDBC/ODBC driver to establish the connection
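A rough sketch of the JDBC read from a Databricks notebook, where spark and dbutils are predefined; the server, database, table, and secret-scope names are placeholders.

```python
# Minimal sketch, assuming a Databricks notebook and an existing secret scope.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb"

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.customers")                                  # hypothetical table
      .option("user", dbutils.secrets.get("my-scope", "sql-user"))         # assumed secret scope/keys
      .option("password", dbutils.secrets.get("my-scope", "sql-pass"))
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

df.show(10)
```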
Optimisation techniques used in data engineering
Partitioning data to improve query performance
Using indexing to speed up data retrieval
Implementing caching mechanisms to reduce data access time
Optimizing data storage formats for efficient storage and processing
Parallel processing and distributed computing for faster data processing
Using compression techniques to reduce storage space and improve data transfer
Applying qu...
I applied via a recruitment consultant and was interviewed in May 2024. There was 1 interview round.
I was approached by the company and was interviewed in May 2024. There was 1 interview round.
I was approached by the company and was interviewed in May 2023. There were 2 interview rounds.
I was approached by the company and was interviewed in Mar 2023. There were 2 interview rounds.
Function to check if a number is an Armstrong Number
An Armstrong Number is a number that is equal to the sum of its own digits raised to the power of the number of digits
To check if a number is an Armstrong Number, we need to calculate the sum of each digit raised to the power of the number of digits
If the sum is equal to the original number, then it is an Armstrong Number
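A simple Python implementation of that check (the function name is my own):

```python
def is_armstrong(n: int) -> bool:
    """Return True if n equals the sum of its digits raised to the number of digits."""
    digits = str(abs(n))
    power = len(digits)
    return n == sum(int(d) ** power for d in digits)

# 153 = 1**3 + 5**3 + 3**3, so it is an Armstrong number; 154 is not.
print(is_armstrong(153))  # True
print(is_armstrong(154))  # False
```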
To initiate Sparkcontext, create a SparkConf object and pass it to SparkContext constructor.
Create a SparkConf object with app name and master URL
Pass the SparkConf object to SparkContext constructor
Example: conf = SparkConf().setAppName('myApp').setMaster('local[*]'); sc = SparkContext(conf=conf)
Stop SparkContext using sc.stop()
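Putting those steps together, a minimal runnable sketch:

```python
from pyspark import SparkConf, SparkContext

# Build the configuration, then pass it to the SparkContext constructor.
conf = SparkConf().setAppName("myApp").setMaster("local[*]")
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.sum())  # 10

# Stop the context when finished so resources are released.
sc.stop()
```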
DataFrames are better than RDDs due to their optimized performance and ease of use.
DataFrames are optimized for better performance than RDDs.
DataFrames have a schema, making it easier to work with structured data.
DataFrames support SQL queries and can be used with Spark SQL.
RDDs are more low-level and require more manual optimization.
RDDs are useful for unstructured data or when fine-grained control is needed.
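A short sketch contrasting the two APIs on the same toy data (names and values are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-vs-rdd").getOrCreate()

data = [("alice", 34), ("bob", 29)]

# DataFrame: schema-aware, optimized by Catalyst, and queryable with Spark SQL.
df = spark.createDataFrame(data, ["name", "age"])
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

# RDD: lower-level API with no schema; transformations are plain Python functions.
rdd = spark.sparkContext.parallelize(data)
print(rdd.filter(lambda row: row[1] > 30).collect())
```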
I applied via Approached by Company and was interviewed in Sep 2022. There were 5 interview rounds.
It was an MCQ test on interpreting code and its output.
Find the greatest number for same key in a Python dictionary.
Use max() function with key parameter to find the maximum value for each key in the dictionary.
Iterate through the dictionary and apply max() function on each key.
If the dictionary is nested, use recursion to iterate through all the keys.
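The question is ambiguous as written; one plausible reading, sketched below with made-up data, is keeping the greatest value per key when keys repeat, or applying max() per key when each key maps to a list.

```python
# Reading 1: (key, value) pairs where keys repeat; keep the greatest value per key.
pairs = [("a", 3), ("b", 7), ("a", 9), ("b", 2), ("c", 5)]

greatest = {}
for key, value in pairs:
    greatest[key] = max(greatest.get(key, value), value)

print(greatest)  # {'a': 9, 'b': 7, 'c': 5}

# Reading 2: each key maps to a list of numbers; apply max() per key.
scores = {"a": [3, 9], "b": [7, 2], "c": [5]}
print({k: max(v) for k, v in scores.items()})  # {'a': 9, 'b': 7, 'c': 5}
```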
Pyspark code to read csv file and show top 10 records.
Import the necessary libraries
Create a SparkSession
Read the CSV file using the SparkSession
Display the top 10 records using the show() method
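A minimal version of that code, assuming a hypothetical file path:

```python
from pyspark.sql import SparkSession

# Create a SparkSession, read the CSV file, and show the first 10 records.
spark = SparkSession.builder.appName("read-csv-demo").getOrCreate()

df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)
df.show(10)
```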
Pyspark code to change column name and divide one column by another column.
Use 'withColumnRenamed' method to change column name
Use 'withColumn' method to divide one column by another column
Example: df = df.withColumnRenamed('old_col_name', 'new_col_name'); df = df.withColumn('ratio', df['col1'] / df['col2'])
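A slightly fuller sketch of the same two operations on made-up data (the column names are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("rename-divide-demo").getOrCreate()

df = spark.createDataFrame([(100.0, 4.0), (90.0, 3.0)], ["revenue", "units"])

df = (df.withColumnRenamed("revenue", "total_revenue")                        # rename a column
        .withColumn("price_per_unit", col("total_revenue") / col("units")))   # divide one column by another

df.show()
```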
Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.
Partitioning data based on key columns to optimize join operations
Caching frequently accessed data in memory to avoid recomputation
Using broadcast variables to efficiently share small data across nodes
Using appropriate data types and avoiding unnecessary type conversions
Avoiding shuffling of data by using appropriate transformations
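To illustrate the broadcast-variable point specifically, a small sketch using an RDD and an invented lookup dictionary:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-variable-demo").getOrCreate()
sc = spark.sparkContext

# A small lookup dictionary shipped to every executor once via a broadcast
# variable, instead of being serialized with each task; data is illustrative.
country_names = sc.broadcast({"IN": "India", "US": "United States"})

codes = sc.parallelize(["IN", "US", "IN"])
named = codes.map(lambda c: country_names.value.get(c, "Unknown"))
print(named.collect())  # ['India', 'United States', 'India']
```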
Handling changing schema from source in Hadoop
Use schema evolution techniques like Avro or Parquet to handle schema changes
Implement a flexible ETL pipeline that can handle schema changes
Use tools like Apache NiFi to dynamically adjust schema during ingestion
Common issues include data loss, data corruption, and performance degradation
Resolve issues by implementing proper testing, monitoring, and backup strategies
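One way to demonstrate the Parquet schema-evolution point is Spark's mergeSchema option; the paths and columns below are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution-demo").getOrCreate()

# Two batches written by a hypothetical pipeline; the second batch adds a new column.
spark.createDataFrame([(1, "a")], ["id", "name"]) \
     .write.mode("overwrite").parquet("/data/evolving/batch1")
spark.createDataFrame([(2, "b", "2024-09-01")], ["id", "name", "load_date"]) \
     .write.mode("overwrite").parquet("/data/evolving/batch2")

# mergeSchema reconciles the old and new Parquet schemas; rows from the first
# batch get NULL for the new load_date column instead of failing the read.
df = (spark.read
      .option("mergeSchema", "true")
      .parquet("/data/evolving/batch1", "/data/evolving/batch2"))
df.printSchema()
df.show()
```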
I applied via Naukri.com and was interviewed before Aug 2023. There were 2 interview rounds.
Integration run time in ADF is a compute infrastructure used to run activities in Azure Data Factory pipelines.
Integration run time is a managed compute infrastructure in Azure Data Factory.
It is used to run activities within pipelines, such as data movement or data transformation tasks.
Integration run time can be auto-scaled based on the workload requirements.
It supports various data integration scenarios, including b...
Data can be copied from on-premise to Azure cloud using various methods like Azure Data Factory, Azure Storage Explorer, Azure Data Migration Service, etc.
Use Azure Data Factory to create data pipelines for moving data from on-premise to Azure cloud
Utilize Azure Storage Explorer to manually copy data from on-premise to Azure Blob Storage
Leverage Azure Data Migration Service for migrating large volumes of data from on-premise to the cloud
I have 4 members in my family including my parents, my sibling, and myself.
I have 2 parents
I have 1 sibling
I am included in the count
Yes, I am open to relocating for the right opportunity.
I am willing to relocate for the right job opportunity
I am open to exploring new locations and experiences
I understand the importance of being flexible in the job market
The KPMG India Data Engineer interview process typically takes less than 2 weeks to complete and usually involves 2 interview rounds (based on 10 interviews).
KPMG India salaries by designation:
Consultant | 7.7k salaries | ₹6.5 L/yr - ₹27 L/yr
Assistant Manager | 6.9k salaries | ₹10.3 L/yr - ₹35.1 L/yr
Associate Consultant | 4.6k salaries | ₹4.5 L/yr - ₹16 L/yr
Analyst | 3.5k salaries | ₹1 L/yr - ₹9.7 L/yr
Manager | 2.9k salaries | ₹15.9 L/yr - ₹50 L/yr