Handling changing schema from source in Hadoop
Use schema evolution techniques like Avro or Parquet to handle schema changes
Implement a flexible ETL pipeline that can handle schema changes
Use tools like Apache NiFi to dynamically adjust schema during ingestion
Common issues include data loss, data corruption, and performance degradation
Resolve issues by implementing proper testing, monitoring, and backup strategies
Spark optimization techniques involve partitioning, caching, and tuning resources for efficient data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data to avoid recomputation
Tuning resources like memory allocation and parallelism
Using broadcast variables for small lookup tables
I applied via Naukri.com and was interviewed in Sep 2024. There were 2 interview rounds.
Spark optimization techniques involve partitioning, caching, and tuning resources for efficient data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data to avoid recomputation
Tuning resources like memory allocation and parallelism
Using broadcast variables for small lookup tables
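As a rough illustration of these techniques, here is a minimal PySpark sketch; the session config, paths, and column names are assumptions for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

# Hypothetical session; the shuffle partition count is a tuning knob.
spark = (SparkSession.builder
         .appName("optimization-sketch")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

events = spark.read.parquet("/data/events")         # large fact table (assumed path)
lookup = spark.read.parquet("/data/country_codes")  # small dimension table (assumed path)

# Partition on the join key so the workload is spread evenly across executors.
events = events.repartition(200, "country_code")

# Cache a DataFrame that is reused downstream to avoid recomputation.
events.cache()

# Broadcast the small lookup table so the join avoids a full shuffle.
joined = events.join(broadcast(lookup), on="country_code")
```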
Developed a data pipeline to ingest, process, and analyze real-time streaming data from IoT devices.
Designed and implemented data ingestion process using Apache Kafka
Utilized Apache Spark for real-time data processing and analysis
Developed data models and algorithms to extract insights from the data
Worked with stakeholders to understand requirements and deliver actionable insights
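A minimal Structured Streaming sketch of the ingestion path described above; the broker address, topic name, schema, and output paths are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("iot-pipeline-sketch").getOrCreate()

# Assumed shape of the IoT messages.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the stream from a hypothetical Kafka topic (requires the
# spark-sql-kafka connector on the classpath).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "iot-readings")
       .load())

# Kafka values arrive as bytes; parse the JSON payload into columns.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("r"))
             .select("r.*"))

# Persist the parsed stream, checkpointing for fault tolerance.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/iot/parsed")
         .option("checkpointLocation", "/chk/iot")
         .start())
```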
Some challenges faced include data quality issues, scalability issues, and keeping up with evolving technologies.
Data quality issues such as missing values, inconsistencies, and errors in data sources.
Scalability issues when dealing with large volumes of data and ensuring efficient processing.
Keeping up with evolving technologies and tools in the field of data engineering.
Collaborating with cross-functional teams and s...
I applied via Approached by Company and was interviewed in Sep 2022. There were 5 interview rounds.
It was an MCQ test on interpreting code and its outcomes.
Find the greatest number for same key in a Python dictionary.
Use max() function with key parameter to find the maximum value for each key in the dictionary.
Iterate through the dictionary and apply max() function on each key.
If the dictionary is nested, use recursion to iterate through all the keys.
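A minimal sketch of the above, assuming the dictionary maps each key to a list of numbers collected under that key:

```python
scores = {"a": [3, 9, 1], "b": [7, 2], "c": [5]}

# max() applied per key via a dict comprehension.
greatest = {k: max(v) for k, v in scores.items()}
print(greatest)  # {'a': 9, 'b': 7, 'c': 5}

# For a nested dictionary, recurse until a list of numbers is reached.
def max_per_key(d):
    return {k: max_per_key(v) if isinstance(v, dict) else max(v)
            for k, v in d.items()}

print(max_per_key({"x": {"a": [3, 9]}, "y": [4, 8]}))  # {'x': {'a': 9}, 'y': 8}
```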
Pyspark code to read csv file and show top 10 records.
Import the necessary libraries
Create a SparkSession
Read the CSV file using the SparkSession
Display the top 10 records using the show() method
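Putting those steps together, a minimal sketch (the file path is hypothetical):

```python
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession.
spark = SparkSession.builder.appName("read-csv-sketch").getOrCreate()

# Read the CSV with a header row, letting Spark infer column types.
df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)

# Display the top 10 records.
df.show(10)
```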
Pyspark code to change column name and divide one column by another column.
Use 'withColumnRenamed' method to change column name
Use 'withColumn' method to divide one column by another column
Example: df = df.withColumnRenamed('old_col_name', 'new_col_name').withColumn('new_col_name', df['col1']/df['col2'])
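Expanding the one-liner into a runnable sketch with toy data (column names are hypothetical); writing the ratio to its own column keeps the renamed column intact:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("rename-divide-sketch").getOrCreate()

# Toy data standing in for the real DataFrame.
df = spark.createDataFrame([(10.0, 2.0), (9.0, 3.0)], ["col1", "col2"])

# Rename col1, then add a new column holding the element-wise ratio.
df = df.withColumnRenamed("col1", "numerator")
df = df.withColumn("ratio", col("numerator") / col("col2"))
df.show()
```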
Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.
Partitioning data based on key columns to optimize join operations
Caching frequently accessed data in memory to avoid recomputation
Using broadcast variables to efficiently share small data across nodes
Using appropriate data types and avoiding unnecessary type conversions
Avoiding shuffling of data by using appropriate transformations
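A minimal sketch of one of these points, broadcast variables shared across nodes, using a hypothetical currency lookup:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-sketch").getOrCreate()
sc = spark.sparkContext

# Small lookup shipped once per executor instead of once per task.
rates = sc.broadcast({"INR": 1.0, "USD": 83.0})

rdd = sc.parallelize([("USD", 5.0), ("INR", 100.0)])

# Tasks read the broadcast value locally; the lookup is never shuffled.
converted = rdd.map(lambda x: (x[0], x[1] * rates.value[x[0]]))
print(converted.collect())
```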
Handling changing schema from source in Hadoop
Use schema evolution techniques like Avro or Parquet to handle schema changes
Implement a flexible ETL pipeline that can handle schema changes
Use tools like Apache NiFi to dynamically adjust schema during ingestion
Common issues include data loss, data corruption, and performance degradation
Resolve issues by implementing proper testing, monitoring, and backup strategies
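As one concrete instance of schema evolution, Parquet can merge schemas at read time; a sketch with assumed paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution-sketch").getOrCreate()

# Two batches written with different schemas; the second adds a column.
spark.createDataFrame([(1, "a")], ["id", "name"]) \
    .write.parquet("/data/evt/batch1")
spark.createDataFrame([(2, "b", "IN")], ["id", "name", "country"]) \
    .write.parquet("/data/evt/batch2")

# mergeSchema reconciles the evolving schemas at read time;
# older rows get NULL for the newly added column.
df = spark.read.option("mergeSchema", "true").parquet("/data/evt/batch*")
df.printSchema()
```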
I applied via Recruitment Consultant and was interviewed in Mar 2021. There were 4 interview rounds.
I applied via Job Portal and was interviewed before Jan 2021. There were 2 interview rounds.
What people are saying about KPMG India
I applied via Naukri.com and was interviewed before Oct 2020. There were 3 interview rounds.
Query to generate state-wise sales summary report with ranking using multiple table joins.
Use SQL JOINs to combine sales, states, and products tables.
Aggregate sales data using SUM() function grouped by state.
Use RANK() or DENSE_RANK() to rank states based on total sales.
Example SQL: SELECT state, SUM(sales) AS total_sales, RANK() OVER (ORDER BY SUM(sales) DESC) AS sales_rank FROM sales_data JOIN states ON sales_data.s...
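A complete, runnable version of that query in PySpark; the table shapes and column names are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-rank-sketch").getOrCreate()

# Register tiny stand-in tables; in practice these come from real sources.
spark.createDataFrame(
    [(1, 100.0), (1, 50.0), (2, 120.0)], ["state_id", "sales"]
).createOrReplaceTempView("sales_data")
spark.createDataFrame(
    [(1, "Maharashtra"), (2, "Karnataka")], ["state_id", "state"]
).createOrReplaceTempView("states")

# Aggregate sales per state and rank states by the total.
spark.sql("""
    SELECT s.state,
           SUM(sd.sales) AS total_sales,
           RANK() OVER (ORDER BY SUM(sd.sales) DESC) AS sales_rank
    FROM sales_data sd
    JOIN states s ON sd.state_id = s.state_id
    GROUP BY s.state
""").show()
```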
I appeared for an interview before Jan 2021.
Round duration - 60 minutes
Round difficulty - Easy
This was an online aptitude round on the Talview platform, which we could take from home. We were given a 12-24 hour window to attempt this proctored online test. The round consisted of 60 aptitude questions in 60 minutes, with no negative marking, covering Logical Reasoning, Mental Ability, Data Interpretation, and Numerical Reasoning.
Given a matrix MAT, your task is to return the transpose of the matrix. The transpose of a matrix is obtained by converting rows into columns and vice versa. Specificall...
Transpose a given matrix by switching rows and columns.
Copy the element at [i][j] into position [j][i] of the result (swapping in place works only for square matrices).
Create a new matrix to store the transposed values.
Ensure the dimensions of the transposed matrix are reversed from the original matrix.
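A minimal sketch, building a new matrix so non-square inputs also work:

```python
def transpose(mat):
    # Build a new cols x rows matrix from the rows x cols input.
    rows, cols = len(mat), len(mat[0])
    return [[mat[i][j] for i in range(rows)] for j in range(cols)]

print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```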
Round duration - 30 minutes
Round difficulty - Medium
This was a proctored video interview round on the Talview platform, taken immediately after the aptitude test. It consisted of 8 questions; we had 10 seconds to read each question and 3 minutes to answer it, with our audio and video recorded throughout. Our soft skills were judged in this round, and the questions mainly consisted of HR questions and guesstimates.
Round duration - 120 minutes
Round difficulty - Hard
This was the case study round. We were given a case study to solve in 1 hour using pen and paper, and had to share the answers to the case study questions with the HR in PDF format. The questions related to market share and advertisement strategies and could be solved using basic mathematics. After this, there was a one-to-one interview on Zoom in which the interviewer asked me to describe my approach to every question I had solved in the case study; they did not ask for the exact answers, only the approach used to solve each problem and the way I explained it. They may or may not ask some guesstimates or puzzles as well in this round. The interviewer was very friendly and gave me hints whenever I was stuck.
Round duration - 60 minutes
Round difficulty - Hard
The next round was the behavioral interview round, in which the interviewer asked me about my projects along with some guesstimates and puzzles. This round lasted 1 hour. The interviewer was very friendly and helpful; she helped me whenever I was stuck and kept me calm during the interview.
Round duration - 40 minutes
Round difficulty - Hard
The fit round was my final interview round. In this round you meet a manager/Principal from the company, who checks whether you are a fit for the organization. They asked about the projects I had done in college and internships, some basic questions like my strengths and weaknesses and why I wanted to join the company, and some guesstimates and puzzles as well.
On-campus, the behavioral and fit interview rounds were combined into one.
This round lasted 40 minutes and was conducted on Zoom; the interviewer was helpful and sweet.
The number of pizzas sold in Pune in one day varies depending on factors like day of the week, weather, events, etc.
The number of pizzas sold in Pune can range from hundreds to thousands in a day.
Factors like day of the week (weekend vs weekday), weather (rainy vs sunny), events (festivals, holidays) can impact the sales.
Popular pizza outlets in Pune like Domino's, Pizza Hut, etc. contribute to the overall sales.
Data f...
Tip 1 : Prepare well for guesstimates and puzzles.
Tip 2 : Practice Data Interpretation Questions.
Tip 3 : Be well versed with your projects.
Tip 1 : The resume should be very precise and concise
Tip 2 : Do not add skills to your resume that you are not comfortable with.
I applied via Company Website and was interviewed in Aug 2024. There were 2 interview rounds.
Uber data model design for efficient storage and retrieval of ride-related information.
Create tables for users, drivers, rides, payments, and ratings
Include attributes like user_id, driver_id, ride_id, payment_id, rating_id, timestamp, location, fare, etc.
Establish relationships between tables using foreign keys
Implement indexing for faster query performance
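A minimal relational sketch of that model, using SQLite for self-containment; the exact columns are illustrative, not Uber's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE users   (user_id   INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, name TEXT);

    CREATE TABLE rides (
        ride_id   INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        driver_id INTEGER NOT NULL REFERENCES drivers(driver_id),
        pickup_ts TEXT,
        location  TEXT,
        fare      REAL
    );

    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        ride_id    INTEGER NOT NULL REFERENCES rides(ride_id),
        amount     REAL
    );

    CREATE TABLE ratings (
        rating_id INTEGER PRIMARY KEY,
        ride_id   INTEGER NOT NULL REFERENCES rides(ride_id),
        stars     INTEGER CHECK (stars BETWEEN 1 AND 5)
    );

    -- Index the foreign keys that back the most common lookups.
    CREATE INDEX idx_rides_user   ON rides(user_id);
    CREATE INDEX idx_rides_driver ON rides(driver_id);
""")
```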
I applied via Company Website and was interviewed in Dec 2021. There were 11 interview rounds.
A test designed to determine a person's ability in a particular skill or field of knowledge
Coding and decoding
Development of a particular person, group, or situation over a period of time
A situation in which individuals collectively make a choice from the alternatives before them
A task or piece of work allocated to someone as part of a job or course of study, e.g. a homework assignment
A coding interview tests a candidate's technical knowledge, coding ability, and creativity, typically on a whiteboard
The forms are checked and returned to the census officer for coding
I applied via Indeed and was interviewed in Jan 2024. There were 2 interview rounds.
To add a date table in Power BI, you can create a new table with a list of dates and relationships with other tables.
Create a new table in Power BI with a list of dates
Add columns for day, month, year, etc. for additional analysis
Establish relationships between the date table and other tables in the data model
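In Power BI itself this is usually done with DAX's CALENDAR() or CALENDARAUTO(); as a language-consistent sketch, the same date dimension built in pandas (the date range is an assumption):

```python
import pandas as pd

# Contiguous date dimension covering an assumed reporting range.
dates = pd.DataFrame({"date": pd.date_range("2020-01-01", "2025-12-31", freq="D")})

# Breakdown columns mirroring those described above.
dates["day"] = dates["date"].dt.day
dates["month"] = dates["date"].dt.month
dates["month_name"] = dates["date"].dt.month_name()
dates["year"] = dates["date"].dt.year
dates["quarter"] = dates["date"].dt.quarter
```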
I applied via Naukri.com and was interviewed in Mar 2024. There were 3 interview rounds.
Basic aptitude questions like time and distance, percentage, ratio, and time and work
Excel, Numpy, Pandas, basic ML
Case study talking about how to optimise the supply chain using data analytics
The duration of the KPMG India Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete.
Consultant (8.7k salaries): ₹11.3 L/yr - ₹20 L/yr
Assistant Manager (7.9k salaries): ₹15.9 L/yr - ₹27 L/yr
Associate Consultant (5.1k salaries): ₹7.7 L/yr - ₹13 L/yr
Analyst (3.8k salaries): ₹2.5 L/yr - ₹8 L/yr
Manager (3.5k salaries): ₹22 L/yr - ₹38 L/yr