Basics of SQL and joins
I applied via Campus Placement and was interviewed before Aug 2023. There were 4 interview rounds.
Generic aptitude test
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.
Inefficient code slows jobs down; for example, calling collect() on a large dataset pulls every row to the driver.
Data skew, an uneven distribution of data across partitions, leaves a few tasks doing most of the work and lengthens processing time.
Resource constraints like insufficient memory or CPU can result in slow Spark jobs.
Improper configuration settings, su...
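The data skew point can be illustrated without a cluster. This is a minimal plain-Python sketch, not Spark code: `partition_of` is a hypothetical stand-in for a hash partitioner, and the key names and sizes are made up. It shows how one hot key overloads a single partition and how salting the key (a common workaround) spreads the load.

```python
from collections import Counter
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (assumption, not Spark's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt(keys, num_salts):
    """Append a rotating salt so one hot key becomes several distinct keys."""
    return [f"{k}#{i % num_salts}" for i, k in enumerate(keys)]


# Hypothetical workload: one "hot" key dominates the dataset.
keys = ["hot"] * 9_000 + [f"user{i}" for i in range(1_000)]

skewed = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt(keys, num_salts=32), num_partitions=8)

print("largest partition before salting:", max(skewed))
print("largest partition after salting: ", max(salted))
```

In a real Spark job the salt would typically be added as an extra column before the join or aggregation and removed afterwards.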
posted on 28 Aug 2024
I have experience working on projects involving data pipeline development, ETL processes, and data warehousing.
Developed ETL processes to extract, transform, and load data from various sources into a data warehouse
Built data pipelines to automate the flow of data between systems and ensure data quality and consistency
Optimized database performance and implemented data modeling best practices
Worked on real-time data pro...
I was interviewed in Aug 2024.
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in PySpark means loading only new or updated data into a dataset without reloading the entire dataset.
Store the target as a Delta Lake table and use its MERGE operation to upsert new or changed records. The 'mergeSchema' write option only enables schema evolution; it does not by itself make a load incremental.
Use 'partitionBy' when writing to partition the data on specific columns (for example a load date), so each incremental run touches only the affected partitions.
Implement logic to identify new or updated records based on timestamps or uni...
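The timestamp-based approach can be sketched in plain Python. This is only an illustration under stated assumptions: the `id` and `updated_at` field names, the in-memory `target` dict, and the `incremental_load` helper are all hypothetical, and a real pipeline would do the same filter and upsert on DataFrames or a table.

```python
from datetime import datetime


def incremental_load(target, source_rows, last_watermark):
    """Upsert rows changed after last_watermark into target, keyed by 'id'.

    Returns the new watermark (the latest 'updated_at' seen so far).
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:   # only new or changed rows
            target[row["id"]] = row              # insert or overwrite (upsert)
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Hypothetical data: the target already holds id 1 from a previous run.
target = {1: {"id": 1, "name": "old", "updated_at": datetime(2024, 1, 1)}}
source = [
    {"id": 1, "name": "changed", "updated_at": datetime(2024, 1, 3)},
    {"id": 2, "name": "new", "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "name": "stale", "updated_at": datetime(2023, 12, 31)},
]

watermark = incremental_load(target, source, last_watermark=datetime(2024, 1, 1))
print(sorted(target), watermark)
```

The returned watermark would be persisted and passed in as `last_watermark` on the next run, so rows already processed are skipped.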
My strengths include strong analytical skills, attention to detail, and problem-solving abilities.
Strong analytical skills - able to analyze complex data sets and derive meaningful insights
Attention to detail - meticulous in ensuring data accuracy and quality
Problem-solving abilities - adept at identifying and resolving data-related issues
Experience with data manipulation tools like SQL, Python, and Spark
Seeking new challenges and growth opportunities in a different environment.
Looking for new challenges to enhance my skills and knowledge
Seeking growth opportunities that align with my career goals
Interested in exploring different technologies and industries
Want to work in a more collaborative team environment
Seeking better work-life balance or location proximity
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
PySpark is the Python API for Apache Spark, an open-source distributed computing engine.
PySpark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
PySpark interoperates with other Python libraries such as Pandas and NumPy.
Example: Using PySpark to perform data analysis and machine learning tasks on big data sets.
PySpark SQL is the Spark module that provides a SQL interface for working with structured data.
PySpark SQL lets users run SQL queries on Spark DataFrames, typically after registering them as temporary views.
It is a more concise and user-friendly way to work with data than the lower-level RDD API.
Users can apply SQL for data manipulation and analysis within the Spark ecosystem.
To merge two DataFrames with different schemas, either join them on a shared key or align their columns and then union them.
Use an inner, outer, left, or right join, depending on the requirement.
Transform the data to align the schemas before merging, for example by adding the missing columns as nulls; in PySpark 3.1+, unionByName with allowMissingColumns=True does this.
Tools such as Apache Spark, Pandas, or SQL can all merge DataFrames with different schemas.
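The "align the schemas, then merge" idea can be sketched in plain Python. The `union_by_name` helper and the sample rows below are hypothetical; the sketch mimics, on lists of dicts, what a union by column name with missing columns filled as nulls does on DataFrames.

```python
def union_by_name(left_rows, right_rows):
    """Union two lists of dict rows, filling columns a row lacks with None."""
    columns = []
    for row in left_rows + right_rows:
        for name in row:
            if name not in columns:
                columns.append(name)   # keep first-seen column order
    return [{name: row.get(name) for name in columns}
            for row in left_rows + right_rows]


# Hypothetical rows with different schemas.
customers = [{"id": 1, "name": "Asha"}]
orders = [{"id": 1, "amount": 250.0}]

merged = union_by_name(customers, orders)
for row in merged:
    print(row)
```

If the goal is to combine columns for the same entity rather than stack rows, a join on the shared key (here `id`) is the better fit.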
PySpark Streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
PySpark Streaming allows near-real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
PySpark Streaming supports data sources such as Kafka and Kinesis; the Flume connector exists only in older Spark releases (it was removed in Spark 3.0).
It enables windowed computations and stateful processing for handling streaming data.
Example: C...
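As a separate illustration of the windowed computation mentioned above, here is a minimal plain-Python sketch of a tumbling (fixed, non-overlapping) window count. It is not Spark code; the `tumbling_window_counts` helper and the event tuples are hypothetical, and the dictionary plays the role of the state a streaming engine keeps between events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1   # state carried across events
    return dict(counts)


# Hypothetical (epoch_seconds, key) events.
events = [(0, "a"), (3, "a"), (9, "b"), (10, "a"), (14, "b"), (19, "b")]
result = tumbling_window_counts(events, window_seconds=10)
for (start, key), n in sorted(result.items()):
    print(f"[{start}, {start + 10}) {key}: {n}")
```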
I applied via Referral and was interviewed in Feb 2024. There was 1 interview round.
Just focus on the basics of PySpark.
| Designation | Salaries reported | Salary range |
|---|---|---|
| Senior Engineer | 884 | ₹6.2 L/yr - ₹22.9 L/yr |
| Senior Software Engineer | 562 | ₹6.8 L/yr - ₹25.9 L/yr |
| Software Engineer | 259 | ₹3.5 L/yr - ₹14 L/yr |
| Technical Specialist | 210 | ₹10.9 L/yr - ₹38.5 L/yr |
| Software Development Engineer | 188 | ₹4 L/yr - ₹12 L/yr |