OSI Digital
I was approached by the company and was interviewed in Nov 2023. There was 1 interview round.
Coalesce reduces the number of partitions in a DataFrame, while repartition reshuffles the data across a specified number of partitions in Spark.
Coalesce is used to reduce the number of partitions in a DataFrame without shuffling the data
Repartition is used to increase or decrease the number of partitions in a DataFrame by shuffling the data across the specified number of partitions
Coalesce is more efficient than repartition when reducing partitions because it avoids a full shuffle (see the sketch below)
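A quick sketch of the difference; the DataFrame and the partition counts are illustrative:

```python
# Comparing coalesce() and repartition() on an example DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("coalesce-vs-repartition").getOrCreate()

df = spark.range(1_000_000)               # example data
print(df.rdd.getNumPartitions())          # initial partition count

# repartition() triggers a full shuffle; can increase or decrease partitions
df_repart = df.repartition(8)

# coalesce() merges existing partitions without a full shuffle; decrease only
df_coal = df.coalesce(2)

print(df_repart.rdd.getNumPartitions())   # 8
print(df_coal.rdd.getNumPartitions())     # 2
```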
I was interviewed in Oct 2024.
Designing an Azure Data Factory (ADF) pipeline for data processing
Identify data sources and destinations
Define data transformations and processing steps
Consider scheduling and monitoring requirements
Utilize ADF activities like Copy Data, Data Flow, and Databricks (a Copy Data activity is sketched after this list)
Implement error handling and logging mechanisms
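A hedged sketch of publishing such a pipeline with the azure-mgmt-datafactory Python SDK; the subscription ID, resource group, factory, dataset, and pipeline names below are all placeholders, not values from the interview:

```python
# Publish a minimal ADF pipeline with a single Copy Data activity.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Copy Data activity: move data from a source dataset to a sink dataset
copy_activity = CopyActivity(
    name="CopyRawToStaging",
    inputs=[DatasetReference(reference_name="RawBlobDataset")],
    outputs=[DatasetReference(reference_name="StagingBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Publish the pipeline; scheduling would be attached separately via a trigger
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "ProcessDailyData",
    PipelineResource(activities=[copy_activity]),
)
```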
Discussing expected and current salary for negotiation purposes.
Be honest about your current salary and provide a realistic expectation for your desired salary.
Highlight your skills and experience that justify your desired salary.
Be open to negotiation and willing to discuss other benefits besides salary.
Research industry standards and salary ranges for similar positions to support your negotiation.
Focus on the value you bring to the role rather than only the number.
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
I applied via Campus Placement
Based on SQL, statistics, Python, and cognitive ability
Address a toxic work culture through open communication, setting boundaries, seeking support, and considering leaving if necessary.
Open communication with colleagues and management about issues
Set boundaries to protect your mental and emotional well-being
Seek support from HR, a mentor, or a therapist if needed
Consider leaving the toxic work environment if the situation does not improve
I was interviewed in Aug 2024.
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in PySpark refers to loading only new or updated data into a dataset instead of reloading the entire dataset.
Use Delta Lake's MERGE (upsert) to apply incremental changes; the 'mergeSchema' write option handles schema evolution.
Utilize 'partitionBy' to optimize incremental loads by partitioning the data on columns used for filtering.
Implement logic to identify new or updated records based on timestamps or unique keys (see the sketch below).
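A minimal sketch of a timestamp-based incremental load in PySpark, assuming the source carries an 'updated_at' column; the paths and column names are illustrative:

```python
# Timestamp-watermark incremental load: read only rows changed since last run.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Last successful load time, normally read from a control/watermark table
last_load_ts = "2024-10-01 00:00:00"

# Keep only rows created or updated since the watermark
incoming = (
    spark.read.parquet("/data/source/orders")
         .filter(F.col("updated_at") > F.lit(last_load_ts))
         .withColumn("load_date", F.current_date())
)

# Append only the new/changed rows; partitioning speeds up later reads
(incoming.write
         .mode("append")
         .partitionBy("load_date")
         .parquet("/data/target/orders"))
```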
I applied via Campus Placement and was interviewed in Aug 2024. There were 2 interview rounds.
Java and SQL questions
A simple Java program to find the factorial of a number and check whether a number is prime
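The round itself asked for Java; a minimal sketch of the same two exercises in Python (function names are illustrative):

```python
def factorial(n: int) -> int:
    """Iterative factorial: n! = 1 * 2 * ... * n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n); numbers below 2 are not prime."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

print(factorial(5))   # 120
print(is_prime(13))   # True
```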
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
PySpark is the Python API for Apache Spark, a powerful open-source distributed computing system.
PySpark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
PySpark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using PySpark to perform data analysis and machine learning tasks on big datasets.
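A minimal PySpark sketch illustrating the DataFrame API; the data and column names are made up:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)], ["name", "age"]
)

# Transformations are lazy; show() triggers execution across the cluster
df.filter(df.age > 40).show()
```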
PySpark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
PySpark SQL allows users to run SQL queries on Spark DataFrames.
It provides a more concise and user-friendly way to interact with data compared to traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
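A short sketch of PySpark SQL; the 'people' view name and the data are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-sql").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)], ["name", "age"]
)
df.createOrReplaceTempView("people")   # register the DataFrame as a SQL view

# Query with standard SQL instead of the RDD/DataFrame API
spark.sql("SELECT name FROM people WHERE age > 40").show()
```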
To merge two DataFrames with different schemas, use join operations or data transformation techniques.
Use join operations like inner join, outer join, left join, or right join based on the requirement.
Perform data transformation to align the schemas before merging.
Use tools like Apache Spark, Pandas, or SQL to merge DataFrames with different schemas.
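One way to do this in PySpark is unionByName with allowMissingColumns (Spark 3.1+), which fills absent columns with nulls; a sketch with made-up data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-schemas").getOrCreate()

df1 = spark.createDataFrame([(1, "Alice")], ["id", "name"])
df2 = spark.createDataFrame([(2, "bob@example.com")], ["id", "email"])

# Columns are matched by name; missing ones are null on the other side
merged = df1.unionByName(df2, allowMissingColumns=True)
merged.show()
```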
PySpark Streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
PySpark Streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
PySpark Streaming supports various data sources like Kafka, Flume, and Kinesis.
It enables windowed computations and stateful processing for handling streaming data.
Example: Consuming a stream of events and computing windowed counts in near real time (see the sketch below).
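A minimal Structured Streaming sketch using Spark's built-in 'rate' source so it runs without a Kafka cluster; the window size and event rate are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# The 'rate' source emits (timestamp, value) rows continuously
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Windowed aggregation: count events per 10-second window
counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

query = (counts.writeStream
               .outputMode("complete")
               .format("console")
               .start())
query.awaitTermination()
```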
| Role | Salaries reported | Salary range |
|---|---|---|
| Software Engineer | 161 | ₹3.2 L/yr - ₹13 L/yr |
| Senior Software Engineer | 154 | ₹6.2 L/yr - ₹23 L/yr |
| Associate Software Engineer | 126 | ₹3 L/yr - ₹7.5 L/yr |
| Associate Technical Leader | 61 | ₹10 L/yr - ₹23.2 L/yr |
| Technical Lead | 54 | ₹12.4 L/yr - ₹25.5 L/yr |