Sigmoid
Find indices of an element in a non-decreasing array
Iterate through the array and keep track of the indices where the element X is found
Return the list of indices or [-1, -1] if element X is not found
Handle edge cases like an empty array or X not present in the array (see the sketch below)
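One common reading of the `[-1, -1]` return value is that the task asks for the first and last index of X; a minimal sketch of the linear scan under that reading, using the sorted order to stop early (the function name is mine):

```python
def find_indices(arr, x):
    """Return [first_index, last_index] of x in a non-decreasing arr,
    or [-1, -1] if x is absent (this also covers the empty-array case)."""
    first = last = -1
    for i, value in enumerate(arr):
        if value == x:
            if first == -1:
                first = i
            last = i
        elif value > x:  # the array is sorted, so x cannot appear later
            break
    return [first, last]

print(find_indices([1, 2, 2, 2, 5], 2))  # [1, 3]
print(find_indices([1, 3, 5], 2))        # [-1, -1]
print(find_indices([], 7))               # [-1, -1]
```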
inferSchema in PySpark is used to automatically infer the schema of a file when reading it.
inferSchema is a read option in PySpark that can be set to true when reading a file so that Spark samples the data and infers the column types
It is useful when the schema of the file is not known beforehand
Example: `df = spark.read.csv('file.csv', header=True, inferSchema=True)`
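To show the effect, a minimal sketch (assuming a hypothetical `file.csv` with a header row and numeric columns) comparing the default and inferred schemas:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('infer-schema-demo').getOrCreate()

# Without inferSchema, every column is read as string.
df_raw = spark.read.csv('file.csv', header=True)
df_raw.printSchema()    # e.g. all columns typed as string

# With inferSchema, Spark samples the data and assigns real types.
df_typed = spark.read.csv('file.csv', header=True, inferSchema=True)
df_typed.printSchema()  # e.g. id: integer, amount: double
```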
SCD stands for Slowly Changing Dimension in Data Warehousing.
SCD is a technique used in data warehousing to track changes to dimension data over time.
There are different types of SCDs - Type 1, Type 2, and Type 3.
Type 1 SCD overwrites old data with new data, Type 2 creates new records for changes, and Type 3 maintains both old and new values in separate columns.
Example: In a customer dimension table, if a customer changes their address, Type 1 would overwrite the old address, while Type 2 would add a new record and keep the old one.
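To make the Type 1 vs Type 2 behaviour concrete, here is a minimal plain-Python sketch; the table layout and field names (`current`, `valid_from`, `valid_to`) are illustrative, not from the original answer:

```python
from datetime import date

# Toy customer dimension: one row per customer version.
dim = [
    {'customer_id': 1, 'address': 'Old Street', 'current': True,
     'valid_from': date(2020, 1, 1), 'valid_to': None},
]

def scd_type1(dim, customer_id, new_address):
    """Type 1: overwrite in place; history is lost."""
    for row in dim:
        if row['customer_id'] == customer_id:
            row['address'] = new_address

def scd_type2(dim, customer_id, new_address, change_date):
    """Type 2: expire the current row and append a new version."""
    for row in dim:
        if row['customer_id'] == customer_id and row['current']:
            row['current'] = False
            row['valid_to'] = change_date
    dim.append({'customer_id': customer_id, 'address': new_address,
                'current': True, 'valid_from': change_date, 'valid_to': None})

scd_type2(dim, 1, 'New Avenue', date(2024, 6, 1))
# dim now holds the expired 'Old Street' row and the current 'New Avenue' row.
```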
Repartition is used to increase the number of partitions in a DataFrame, while coalesce is used to decrease the number of partitions.
Repartition involves shuffling data across the network, which can be expensive in terms of performance and resources.
Coalesce is a more efficient operation as it minimizes data movement by only merging existing partitions.
Repartition is typically used when there is a need for more parallelism, while coalesce is typically used to cut the partition count, for example before writing output.
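A short PySpark sketch of the two calls (the partition counts are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('partition-demo').getOrCreate()
df = spark.range(1_000_000)

# repartition triggers a full shuffle and can increase the partition count.
wide = df.repartition(200)
print(wide.rdd.getNumPartitions())    # 200

# coalesce merges existing partitions without a full shuffle;
# it can only reduce the partition count.
narrow = wide.coalesce(10)
print(narrow.rdd.getNumPartitions())  # 10
```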
Rank assigns the same rank to tied values and then skips the following ranks, while dense rank assigns ranks without gaps.
The rank function gives tied rows in a result set the same rank and then jumps past the ties (1, 2, 2, 4).
The dense rank function gives tied rows the same rank without skipping any ranks (1, 2, 2, 3).
When there are no ties, the two functions produce identical output.
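A small PySpark window-function sketch showing the difference on tied scores (the data is invented):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName('rank-demo').getOrCreate()
df = spark.createDataFrame(
    [('a', 10), ('b', 20), ('c', 20), ('d', 30)], ['name', 'score'])

w = Window.orderBy('score')
df.select('name', 'score',
          F.rank().over(w).alias('rank'),              # 1, 2, 2, 4
          F.dense_rank().over(w).alias('dense_rank'),  # 1, 2, 2, 3
          ).show()
```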
Read a parquet file using PySpark and remove duplicates based on specified columns.
Use PySpark to read the parquet file: `df = spark.read.parquet('path/to/file.parquet')`.
Identify the columns to check for duplicates, e.g., `['column1', 'column2']`.
Use the `dropDuplicates()` method: `df_unique = df.dropDuplicates(['column1', 'column2'])`.
Write the cleaned DataFrame back to a parquet file: `df_unique.write.parquet('path/to/output.parquet')`.
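Putting the steps together (the paths and column names are the placeholders used in the steps above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('dedup-demo').getOrCreate()

df = spark.read.parquet('path/to/file.parquet')
# Keep one row per (column1, column2) combination.
df_unique = df.dropDuplicates(['column1', 'column2'])
df_unique.write.mode('overwrite').parquet('path/to/output.parquet')
```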
Understanding the number of rows returned by different types of SQL joins is crucial for data analysis.
Inner Join: Returns rows with matching values in both tables. Example: 10 rows from Table A and 15 from Table B may yield 5 rows.
Left Join: Returns all rows from the left table and matched rows from the right. Example: 10 rows from A and 5 matches in B yield 10 rows.
Right Join: Returns all rows from the right table and matched rows from the left. Example: 10 rows from A and 15 rows in B yield 15 rows.
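A tiny PySpark sketch verifying the counts on two three-row tables (the data is invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('join-demo').getOrCreate()

a = spark.createDataFrame([(1,), (2,), (3,)], ['id'])  # ids 1, 2, 3
b = spark.createDataFrame([(2,), (3,), (4,)], ['id'])  # ids 2, 3, 4

print(a.join(b, 'id', 'inner').count())  # 2 -> only the matching ids 2 and 3
print(a.join(b, 'id', 'left').count())   # 3 -> every row of a is kept
print(a.join(b, 'id', 'right').count())  # 3 -> every row of b is kept
```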
Optimization techniques in Spark include partitioning, caching, and resource tuning for efficient data processing.
Use partitioning to distribute data evenly across nodes for parallel processing
Cache frequently accessed data in memory to avoid recomputation
Tune resources such as memory allocation and parallelism settings for optimal performance
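A brief PySpark sketch touching all three points; the config values, file path, and partition column are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('tuning-demo')
         .config('spark.sql.shuffle.partitions', '200')  # parallelism tuning
         .config('spark.executor.memory', '4g')          # memory tuning
         .getOrCreate())

df = spark.read.parquet('path/to/file.parquet')

# Partition by a frequently used key so work spreads evenly across executors.
df = df.repartition('column1')

# Cache a DataFrame that is reused several times to avoid recomputation.
df.cache()
df.count()  # the first action materializes the cache
```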
Normalization in databases is the process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization is used to eliminate redundant data and ensure data integrity.
It involves breaking down a table into smaller tables and defining relationships between them.
There are different normal forms such as 1NF, 2NF, 3NF, and BCNF.
Normalization helps in reducing data redundancy and improving the quality of data.
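As a small illustration of the decomposition, a plain-Python sketch splitting a denormalized orders table into customers and orders (the columns and values are invented):

```python
# Denormalized: customer details are repeated on every order row.
orders_denorm = [
    {'order_id': 1, 'customer_id': 10, 'customer_city': 'Pune',   'amount': 250},
    {'order_id': 2, 'customer_id': 10, 'customer_city': 'Pune',   'amount': 400},
    {'order_id': 3, 'customer_id': 11, 'customer_city': 'Mumbai', 'amount': 150},
]

# Normalized: customer attributes live once, keyed by customer_id.
customers = {r['customer_id']: {'city': r['customer_city']} for r in orders_denorm}
orders = [{'order_id': r['order_id'], 'customer_id': r['customer_id'],
           'amount': r['amount']} for r in orders_denorm]

# Updating a city now touches one row instead of every matching order.
customers[10]['city'] = 'Bengaluru'
```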
A transformation lazily defines a new dataset, while an action triggers computation on the data and returns a result.
Transformations build up the lineage of a dataset without executing any computation
Actions perform a computation on the data and trigger execution of the pending transformations
Examples of transformations include map, filter, and flatMap in Spark (reduce, by contrast, is an action)
Examples of actions include count, collect, reduce, and saveAsTextFile in Spark
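A minimal PySpark sketch of the lazy-evaluation behaviour:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('lazy-demo').getOrCreate()
rdd = spark.sparkContext.parallelize(range(10))

# Transformations: nothing runs yet; Spark only records the lineage.
evens = rdd.filter(lambda x: x % 2 == 0)
squares = evens.map(lambda x: x * x)

# Actions: trigger execution of the whole pipeline.
print(squares.collect())  # [0, 4, 16, 36, 64]
print(squares.count())    # 5
```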
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
A share-price question: given a list of daily prices, find the maximum profit from one buy followed by one sell.
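The question is only summarized above, but it matches the classic "best time to buy and sell a stock" problem; a single-pass sketch under that assumption:

```python
def max_profit(prices):
    """Track the cheapest price so far and the best profit seen."""
    best, min_price = 0, float('inf')
    for p in prices:
        min_price = min(min_price, p)
        best = max(best, p - min_price)
    return best

print(max_profit([7, 1, 5, 3, 6, 4]))  # 5 (buy at 1, sell at 6)
```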
I was approached by the company and interviewed before Jul 2021. There was 1 interview round.
To write a REST API from scratch, I would follow these steps:
Define the resources and endpoints
Choose a programming language and framework
Implement CRUD operations for each resource
Use HTTP methods and status codes correctly
Add authentication and authorization
Test the API using tools like Postman
Document the API using tools like Swagger
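A minimal sketch of such an API using Flask (one possible framework choice; the `items` resource and its fields are invented for illustration, and the authentication, testing, and Swagger steps are left out for brevity):

```python
from flask import Flask, jsonify, request, abort

app = Flask(__name__)
items = {}    # in-memory store standing in for a database
next_id = 1

@app.route('/items', methods=['GET'])
def list_items():
    return jsonify(list(items.values()))  # 200 OK

@app.route('/items', methods=['POST'])
def create_item():
    global next_id
    data = request.get_json()
    item = {'id': next_id, 'name': data.get('name')}
    items[next_id] = item
    next_id += 1
    return jsonify(item), 201             # 201 Created

@app.route('/items/<int:item_id>', methods=['DELETE'])
def delete_item(item_id):
    if item_id not in items:
        abort(404)                        # 404 Not Found
    del items[item_id]
    return '', 204                        # 204 No Content

if __name__ == '__main__':
    app.run()
```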
I value higher education for its role in personal growth and professional development in the tech industry.
Higher education provides in-depth knowledge in specialized areas, such as machine learning or cybersecurity.
It offers networking opportunities with peers and industry professionals, which can lead to collaborations and job opportunities.
Pursuing advanced degrees, like a Master's in Computer Science, can enhance career prospects and deepen expertise in a chosen specialization.
str.swapcase() returns a new string with all uppercase letters converted to lowercase and vice versa.
Usage: 'Hello World'.swapcase() returns 'hELLO wORLD'.
It affects only alphabetic characters; numbers and symbols remain unchanged.
This method does not modify the original string; it returns a new one.
Example: 'Python 3.8'.swapcase() results in 'pYTHON 3.8'.
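A two-line demonstration that the original string is left untouched:

```python
s = 'Hello World'
print(s.swapcase())  # 'hELLO wORLD'
print(s)             # 'Hello World' -- the original string is unchanged
```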
FIFO stands for First In, First Out. LIFO stands for Last In, First Out.
FIFO is a method for organizing and manipulating a data buffer, where the first element added is the first to be removed.
LIFO is a method where the last element added is the first to be removed.
FIFO is like a queue, while LIFO is like a stack.
Example: In a FIFO queue, if elements A, B, and C are added in that order, they will be removed in the same order: A, then B, then C; in a LIFO stack, C would come out first.
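A small Python sketch of both orderings, using `collections.deque` as the queue and a plain list as the stack:

```python
from collections import deque

# FIFO: a queue -- the first element in is the first out.
queue = deque()
for item in ['A', 'B', 'C']:
    queue.append(item)
print(queue.popleft())  # 'A'

# LIFO: a stack -- the last element in is the first out.
stack = []
for item in ['A', 'B', 'C']:
    stack.append(item)
print(stack.pop())      # 'C'
```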
The first round was an online coding round, the second was coding in front of a panel, and the third covered DSA plus basic database questions.
I appeared for an interview in Apr 2024.
I applied via LinkedIn and was interviewed in Oct 2021. There were 4 interview rounds.
| Role | Salaries reported | Salary range |
|---|---|---|
| Software Development Engineer II | 112 | ₹14 L/yr - ₹23 L/yr |
| Data Engineer | 102 | ₹8.2 L/yr - ₹20 L/yr |
| Senior Data Scientist | 68 | ₹17 L/yr - ₹28.5 L/yr |
| Data Scientist | 63 | ₹8 L/yr - ₹24 L/yr |
| Senior Data Analyst | 52 | ₹13.2 L/yr - ₹25 L/yr |
IKS Health
Crisil
CorroHealth infotech
Indegene