I applied via Company Website and was interviewed in Jul 2024. There were 3 interview rounds.
I had a few questions regarding statistics and probability, along with one question each on Python and SQL.
I applied via Naukri.com and was interviewed in Sep 2021. There were 5 interview rounds.
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in PySpark refers to loading only new or updated data into a dataset instead of reloading the entire dataset.
Use the Delta Lake format with PySpark to perform incremental loads, for example via MERGE (upsert) operations; the 'mergeSchema' option handles schema evolution on write.
Utilize 'partitionBy' when writing so that an incremental load only rewrites the affected partitions.
Implement logic to identify new or updated records based on timestamps or unique keys, as in the sketch below.
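A minimal sketch of a timestamp-based incremental load, assuming a source table with a `last_updated` column; the paths, the `load_date` partition column, and the hard-coded watermark are all placeholders (a real job would persist the watermark, or use a Delta Lake MERGE for upserts).

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Watermark from the previous run (normally persisted in a metadata store).
last_watermark = "2024-01-01 00:00:00"

# Hypothetical source path; any batch source works the same way.
source = spark.read.parquet("/data/source/orders")

# Keep only rows created or updated since the last run.
delta = source.filter(F.col("last_updated") > F.lit(last_watermark))

# Append just the delta, partitioned so future loads touch few files.
(delta
    .withColumn("load_date", F.to_date("last_updated"))
    .write
    .mode("append")
    .partitionBy("load_date")
    .parquet("/data/target/orders"))
```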
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
PySpark is a Python API for Apache Spark, a powerful open-source distributed computing system.
PySpark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
PySpark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using PySpark to perform data analysis and machine learning tasks on big data sets.
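A small, self-contained example of the points above, assuming a local Spark installation with pandas available; the data and column names are made up.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

# A tiny DataFrame; in practice this would be read from distributed storage.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 29), ("carol", 41)],
    ["name", "age"],
)

# Transformations run in parallel across the cluster.
avg_age = df.agg(F.avg("age").alias("avg_age"))

# Hand the (small) result to pandas for further analysis or plotting.
print(avg_age.toPandas())
```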
PySpark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
PySpark SQL allows users to run SQL queries on Spark DataFrames.
It provides a more concise and user-friendly way to interact with data compared to traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
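A short sketch of this interface: register a DataFrame as a temporary view, then query it with plain SQL. The table and column names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-sql").getOrCreate()

sales = spark.createDataFrame(
    [("north", 100), ("south", 250), ("north", 300)],
    ["region", "amount"],
)
sales.createOrReplaceTempView("sales")  # expose the DataFrame to SQL

result = spark.sql("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
""")
result.show()
```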
To merge two DataFrames with different schemas, use join operations or align the schemas first.
Use join operations such as inner, outer, left, or right join, depending on the requirement.
Alternatively, transform the data to align the schemas (add missing columns, cast types) before a union.
Tools like Apache Spark, Pandas, or SQL all support merging DataFrames with different schemas, as in the sketch below.
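One hedged way to do this in PySpark: align the columns by name and let the missing ones become nulls. `unionByName(..., allowMissingColumns=True)` requires Spark 3.1+; the DataFrames below are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-schemas").getOrCreate()

df_a = spark.createDataFrame([(1, "alice")], ["id", "name"])
df_b = spark.createDataFrame([(2, "bob@example.com")], ["id", "email"])

# Columns are matched by name; each side's missing columns are null-filled.
merged = df_a.unionByName(df_b, allowMissingColumns=True)
merged.show()  # columns: id, name, email
```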
PySpark Streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
PySpark Streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
PySpark Streaming supports various data sources like Kafka, Flume, Kinesis, etc.
It enables windowed computations and stateful processing for handling streaming data.
Example: counting words over a live stream, as in the sketch below.
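A minimal word-count sketch using Structured Streaming (the newer engine that supersedes the DStream-based Spark Streaming API); the socket host and port are placeholders, and a Kafka source would swap in via `format("kafka")`.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

# Read lines from a TCP socket (placeholder source for local testing).
lines = (spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load())

# Split lines into words and keep a running count per word.
words = lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the full updated counts table to the console on every trigger.
query = (counts.writeStream
    .outputMode("complete")
    .format("console")
    .start())
query.awaitTermination()
```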
I applied via Referral and was interviewed in Feb 2024. There was 1 interview round.
Just focus on the basics of PySpark.
posted on 9 May 2022
I applied via Approached by Company and was interviewed in Nov 2021. There was 1 interview round.
Normalization is a process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization involves breaking down a table into smaller tables and defining relationships between them.
It helps in reducing data redundancy and inconsistencies.
Views are virtual tables that are created based on the result of a query. They can be used to simplify complex queries.
Stored procedures are precompiled sets of SQL statements stored in the database that can be executed repeatedly as a single unit.
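An illustrative sketch of normalization and views using PySpark: a denormalized table is split into two normalized tables, and a view recreates the joined shape for queries. All table, view, and column names are made up, and a production design would use surrogate keys rather than joining on the customer name.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("normalization-demo").getOrCreate()

# Denormalized: customer details repeat on every order row.
denormalized = spark.createDataFrame(
    [(1, "alice", "alice@example.com", "laptop"),
     (2, "alice", "alice@example.com", "mouse"),
     (3, "bob", "bob@example.com", "keyboard")],
    ["order_id", "customer", "email", "product"],
)

# Normalize: store each customer once, reference them from orders.
customers = denormalized.select("customer", "email").dropDuplicates()
orders = denormalized.select("order_id", "customer", "product")
customers.createOrReplaceTempView("customers")
orders.createOrReplaceTempView("orders")

# A view hides the join behind a simple, reusable query surface.
spark.sql("""
    CREATE OR REPLACE TEMP VIEW order_details AS
    SELECT o.order_id, o.product, c.customer, c.email
    FROM orders o JOIN customers c ON o.customer = c.customer
""")
spark.sql("SELECT * FROM order_details").show()
```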
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
Role | Salaries reported | Salary range
Data Analyst | 32 | ₹3 L/yr - ₹10.5 L/yr
Senior Software Engineer | 20 | ₹12.3 L/yr - ₹21 L/yr
Software Engineer | 19 | ₹6 L/yr - ₹10.6 L/yr
Data Scientist | 19 | ₹4 L/yr - ₹14 L/yr
Senior Consultant | 14 | ₹8 L/yr - ₹22.9 L/yr
Crisil
ICRA
Genpact
TCS