KPI Partners
I applied via Naukri.com and was interviewed in Nov 2024. There were 3 interview rounds.
I applied via Company Website and was interviewed before Feb 2021. There were 5 interview rounds.
It was a technical MCQ round with 60 questions, based on Spark, Hive, Python, and ML.
Coding scenarios such as reading XML and JSON using PySpark, flattening nested XML/JSON, and basic data transformation tasks.
I applied via Naukri.com and was interviewed before Jul 2021. There were 2 interview rounds.
MDM (Master Data Management) typically consists of three layers: operational, analytical, and data governance.
Operational layer: manages the day-to-day data operations and transactions.
Analytical layer: focuses on data analysis and reporting for decision-making.
Data governance layer: ensures data quality, security, and compliance.
Example: In a retail company, the operational layer manages customer transactions, the ana...
I applied via Naukri.com and was interviewed in May 2024. There was 1 interview round.
PySpark architecture is a distributed computing framework that combines Python and Spark to process big data.
PySpark architecture consists of a driver program, cluster manager, and worker nodes.
The driver program is responsible for creating SparkContext, which connects to the cluster manager.
Cluster manager allocates resources and schedules tasks on worker nodes.
Worker nodes execute the tasks and return results to the driver program.
Skewed partitioning is when data is not evenly distributed across partitions, leading to performance issues.
Skewed partitioning can occur when a key column has a few values that are much more common than others.
It can lead to uneven processing and resource utilization in distributed systems like Hadoop or Spark.
To address skewed partitioning, techniques like data skew detection, data skew handling, and data skew prevention can be used.
Spark architecture refers to the structure of Apache Spark, a distributed computing framework.
Spark architecture consists of a cluster manager, worker nodes, and a driver program.
The cluster manager allocates resources and schedules tasks across worker nodes.
Worker nodes execute tasks in parallel and store data in memory or disk.
The driver program coordinates the execution of tasks and manages the overall workflow.
Optimizing Spark jobs involves tuning configurations, partitioning data, using appropriate data structures, and leveraging caching.
Tune Spark configurations for optimal performance
Partition data to distribute workload evenly
Use appropriate data structures like DataFrames or Datasets
Leverage caching to avoid recomputation
Optimize shuffle operations to reduce data movement
It was also very easy.
I applied via Recruitment Consultant and was interviewed before Jul 2022. There were 4 interview rounds.
Mostly coding questions on Python, PySpark, and SQL of medium difficulty.
A dynamic data frame in an AWS Glue job is generated dynamically at runtime based on the data source and schema, while a Spark data frame is explicitly created using Spark APIs.
A dynamic data frame is more flexible but may have performance implications compared to a Spark data frame.
You can convert a dynamic data frame to a Spark data frame with the toDF() method, and back with DynamicFrame.fromDF().
I applied via Naukri.com and was interviewed in Aug 2020. There were 4 interview rounds.
These are some of the top questions asked at the KPI Partners Senior Data Engineer interview.
Data Engineer: 85 salaries reported
Senior Data Engineer: 56 salaries reported
Lead Data Engineer: 49 salaries reported
Senior Consultant: 45 salaries reported
Senior Data Analyst: 24 salaries reported