OSI Digital
I applied via Approached by Company and was interviewed in Nov 2023. There was 1 interview round.
Coalesce reduces the number of partitions in a DataFrame, while repartition reshuffles the data across a specified number of partitions in Spark.
Coalesce is used to reduce the number of partitions in a DataFrame without shuffling the data
Repartition is used to increase or decrease the number of partitions in a DataFrame by shuffling the data across the specified number of partitions
Coalesce is more efficient than repartition for reducing partitions because it avoids a full shuffle of the data
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
I applied via Campus Placement
Based on SQL, statistics, Python, and cognitive ability
Address toxic work culture by open communication, setting boundaries, seeking support, and considering leaving if necessary.
Open communication with colleagues and management about issues
Set boundaries to protect your mental and emotional well-being
Seek support from HR, a mentor, or a therapist if needed
Consider leaving the toxic work environment if the situation does not improve
I was interviewed in Aug 2024.
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in pyspark refers to loading only new or updated data into a dataset without reloading the entire dataset.
Use Delta Lake tables with pyspark to perform incremental loads via the MERGE operation, enabling the 'mergeSchema' option if the schema can evolve.
Utilize the 'partitionBy' function to optimize incremental loads by partitioning the data based on specific columns.
Implement logic to identify new or updated records based on timestamps or unique keys
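The timestamp-based idea can be sketched in plain Python, independent of Spark. This is a minimal sketch with hypothetical record and field names: keep a watermark (the last processed timestamp) and append only records newer than it.

```python
# Minimal sketch of incremental loading via a watermark.
# Record shape and field names ('id', 'ts') are illustrative.

def incremental_load(target, source, watermark):
    """Append source records with ts > watermark; return the new watermark."""
    new_rows = [r for r in source if r["ts"] > watermark]
    target.extend(new_rows)
    return max((r["ts"] for r in new_rows), default=watermark)

target = [{"id": 1, "ts": 100}]
source = [{"id": 1, "ts": 100},   # already loaded, skipped
          {"id": 2, "ts": 150},   # new
          {"id": 3, "ts": 200}]   # new
watermark = incremental_load(target, source, watermark=100)
```

After the call, only the two new records are appended and the watermark advances to 200, so the next run skips everything already loaded.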
I applied via Naukri.com and was interviewed in Mar 2024. There were 2 interview rounds.
Technical Assessment Test (MCQs) - 30 mins.
Use the splitlines() method (or split() with '\n') to convert a string with multiple lines into a list of strings.
splitlines() handles both '\n' and '\r\n' and drops the trailing line break; split('\n') leaves a trailing empty string when the text ends with a newline.
Example: 'Hello\nWorld\n'.splitlines() -> ['Hello', 'World']
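A short sketch of the two approaches, showing the trailing-empty-string difference:

```python
text = "Hello\nWorld\n"

# split('\n') keeps a trailing empty string when the text ends with a newline
as_split = text.split("\n")       # ['Hello', 'World', '']

# splitlines() drops the trailing line break and also handles '\r\n'
as_lines = text.splitlines()      # ['Hello', 'World']
```

For interview purposes, splitlines() is usually the safer default for line-oriented parsing.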
Convert a string of multiple lines with 'n' words to multiple arrays of fixed size without overlap.
Split the string into individual words
Create arrays of fixed size 'k' and distribute words evenly
Handle cases where the number of words is not divisible by 'k'
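The steps above can be sketched with list slicing; the function name is illustrative. The last chunk is simply shorter when the word count is not divisible by k.

```python
def chunk_words(text, k):
    """Split text into words, then group into lists of size k (last may be shorter)."""
    words = text.split()
    return [words[i:i + k] for i in range(0, len(words), k)]

chunks = chunk_words("one two three four five", 2)
# [['one', 'two'], ['three', 'four'], ['five']]
```

Slicing past the end of a list is safe in Python, so no special case is needed for the final partial chunk.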
I applied via Campus Placement and was interviewed in Aug 2024. There were 2 interview rounds.
Java and SQL questions
Simple Java program to find the factorial of a number and check whether a number is prime
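The question asked for Java; for consistency with the rest of this page, here is an equivalent sketch of the same two routines in Python.

```python
def factorial(n):
    """Iterative factorial: n! = 1 * 2 * ... * n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def is_prime(n):
    """Trial division up to sqrt(n); numbers below 2 are not prime."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True
```

The same logic translates directly to Java with a `for` loop and a `long` accumulator for the factorial.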
I applied via Referral and was interviewed in Apr 2024. There were 2 interview rounds.
Power BI offers different types of licenses for data modeling, including Power BI Pro and Power BI Premium.
Power BI Pro license allows users to create and share reports and dashboards with others.
Power BI Premium license offers additional features such as larger data capacity and advanced AI capabilities.
Power BI Embedded license is designed for embedding reports and dashboards into custom applications.
Power BI Report Server license supports publishing reports on-premises.
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
Pyspark is a Python API for Apache Spark, a powerful open-source distributed computing system.
Pyspark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
Pyspark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using Pyspark to perform data analysis and machine learning tasks on big data sets.
Pyspark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
Pyspark SQL allows users to run SQL queries on Spark dataframes.
It provides a more concise and user-friendly way to interact with data compared to traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
To merge 2 dataframes of different schema, use join operations or data transformation techniques.
Use join operations like inner join, outer join, left join, or right join based on the requirement.
Perform data transformation to align the schemas before merging.
Use tools like Apache Spark, Pandas, or SQL to merge dataframes with different schemas.
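The merge-vs-align choice above can be sketched with pandas (column names here are illustrative): an outer join combines on a shared key, while concat stacks rows after the schemas are implicitly aligned with NaN for missing columns.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
right = pd.DataFrame({"id": [2, 3], "score": [0.5, 0.9]})

# Outer join on the shared key keeps rows from both sides;
# columns missing on one side become NaN
merged = pd.merge(left, right, on="id", how="outer")

# Alternatively, stack the rows; the union of columns is taken
# and missing values are filled with NaN
stacked = pd.concat([left, right], ignore_index=True)
```

Which approach fits depends on whether the two frames describe the same entities (join) or the same kind of record from two sources (concat).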
Pyspark streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
Pyspark streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
Pyspark streaming supports various data sources like Kafka, Flume, Kinesis, etc.
It enables windowed computations and stateful processing for handling streaming data.
Example: C...
Senior Software Engineer | 155 salaries | ₹6.2 L/yr - ₹19.4 L/yr
Software Engineer | 154 salaries | ₹3.2 L/yr - ₹12 L/yr
Associate Software Engineer | 124 salaries | ₹3 L/yr - ₹7.5 L/yr
Associate Technical Leader | 61 salaries | ₹10 L/yr - ₹23.2 L/yr
Technical Lead | 58 salaries | ₹12.4 L/yr - ₹26 L/yr
TCS
Infosys
Wipro
HCLTech