I was interviewed in Nov 2024.
Use the 'hdfs diskbalancer' command to check disk utilisation and health in Hadoop (example commands follow the list below)
Run 'hdfs diskbalancer -report' to get a report on disk utilisation
Use 'hdfs diskbalancer -plan <path>' to generate a plan for balancing disk usage
Check the Hadoop logs for any disk health issues
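As an illustration, a single-DataNode balancing session could look like the commands below; the hostname and plan-file path are placeholders, and the exact plan-file name is whatever the -plan step prints.

    # Report disk utilisation for one DataNode (hostname is a placeholder)
    hdfs diskbalancer -report -node dn1.example.com
    # Generate a plan describing how blocks should move between that node's disks
    hdfs diskbalancer -plan dn1.example.com
    # Execute the plan produced by the previous step, then check its progress
    hdfs diskbalancer -execute <path-to>/dn1.example.com.plan.json
    hdfs diskbalancer -query dn1.example.com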
Spark's architecture consists of a Driver, a Cluster Manager, and Executors; the Driver manages the execution of Spark jobs (a minimal PySpark sketch follows the bullets below).
Driver: Manages the execution of Spark jobs, converts user code into tasks, and coordinates with Cluster Manager.
Cluster Manager: Manages resources across the cluster and allocates resources to Spark applications.
Executors: Execute tasks assigned by the Driver and store data in memory or on disk for further processing.
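A minimal PySpark sketch of these roles, assuming a YARN cluster manager and purely illustrative resource settings:

    from pyspark.sql import SparkSession

    # This script runs in the Driver process: it builds the SparkSession,
    # turns the user code into tasks, and coordinates with the cluster manager.
    spark = (
        SparkSession.builder
        .appName("architecture-demo")
        .master("yarn")  # cluster manager; could also be standalone or Kubernetes
        .config("spark.executor.instances", "4")  # executors that run the tasks
        .config("spark.executor.memory", "4g")
        .getOrCreate()
    )

    # The action below makes the Driver schedule tasks on the Executors,
    # which compute partial counts and send results back to the Driver.
    df = spark.range(1_000_000)
    print(df.count())

    spark.stop()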
Optimization techniques in Spark improve performance and efficiency of data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data in memory
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
Tuning memory settings and garbage collection parameters
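A hedged PySpark sketch combining several of these techniques; the input paths, column names, and numbers are placeholders, not recommendations:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder
        .appName("optimization-demo")
        # Memory-related settings (values are illustrative; tune per workload)
        .config("spark.executor.memory", "8g")
        .config("spark.memory.fraction", "0.6")
        .getOrCreate()
    )

    events = spark.read.parquet("s3://bucket/events/")         # hypothetical large dataset
    lookup = spark.read.parquet("s3://bucket/country_codes/")  # hypothetical small lookup table

    # Partition the large dataset so work is spread evenly across executors
    events = events.repartition(200, "country_code")

    # Cache data that is reused by several downstream queries
    events.cache()

    # Broadcast the small lookup table to avoid a shuffle during the join
    joined = events.join(broadcast(lookup), "country_code")

    joined.groupBy("country_code").count().show()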
I am unable to provide this information as it is confidential.
Confidential information about salaries in previous organizations should not be disclosed.
It is important to respect the privacy and confidentiality of past employers.
Discussing specific salary details may not be appropriate in a professional setting.
To create a pivot table in SQL from a non-pivot table, you can use the CASE statement with aggregate functions.
Use the CASE statement to categorize data into columns
Apply aggregate functions like SUM, COUNT, AVG, etc. to calculate values for each category
Group the data by the columns you want to pivot on
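A minimal, self-contained sketch of the CASE-based pivot, using SQLite from Python only so the query can be run locally; the table and column names are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL);
    INSERT INTO sales VALUES
      ('North', 'Q1', 100), ('North', 'Q2', 150),
      ('South', 'Q1', 80),  ('South', 'Q2', 120);
    """)

    # Pivot quarters into columns with CASE + SUM, grouping by region
    pivot_sql = """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_amount,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_amount
    FROM sales
    GROUP BY region;
    """
    for row in conn.execute(pivot_sql):
        print(row)  # e.g. ('North', 100.0, 150.0) and ('South', 80.0, 120.0)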
Creating triggers in a database involves defining the trigger, specifying the event that will activate it, and writing the code to be executed.
Define the trigger using the CREATE TRIGGER statement
Specify the event that will activate the trigger (e.g. INSERT, UPDATE, DELETE)
Write the code or actions to be executed when the trigger is activated
Test the trigger to ensure it functions as intended
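A small runnable sketch of these steps, again using SQLite from Python for illustration; the audit-table design is an assumption:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE salary_audit (employee_id INTEGER, old_salary REAL, new_salary REAL);

    -- Define the trigger and the event (UPDATE of salary) that activates it
    CREATE TRIGGER trg_salary_audit
    AFTER UPDATE OF salary ON employees
    BEGIN
        INSERT INTO salary_audit (employee_id, old_salary, new_salary)
        VALUES (OLD.id, OLD.salary, NEW.salary);
    END;
    """)

    # Test the trigger: the UPDATE below should produce exactly one audit row
    conn.execute("INSERT INTO employees VALUES (1, 50000)")
    conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")
    print(conn.execute("SELECT * FROM salary_audit").fetchall())  # [(1, 50000.0, 55000.0)]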
I applied via Referral and was interviewed in Mar 2022. There was 1 interview round.
Spark optimization techniques improve performance and efficiency of Spark applications.
Partitioning data to reduce shuffling
Caching frequently used data
Using broadcast variables for small data
Using efficient data formats like Parquet
Tuning memory and CPU usage
Using appropriate cluster size
Avoiding unnecessary data shuffling
Using appropriate serialization formats
Using appropriate join strategies
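Another hedged PySpark sketch, focusing on the points not shown earlier (columnar Parquet storage, Kryo serialization, and the broadcast-join size threshold); paths and column names are placeholders:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("format-and-join-demo")
        # Efficient serialization (Kryo) and a broadcast-join size threshold (10 MB here, illustrative)
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))
        .getOrCreate()
    )

    # Columnar formats like Parquet compress well and support predicate pushdown
    df = spark.read.csv("s3://bucket/raw/", header=True, inferSchema=True)  # hypothetical path
    df.write.mode("overwrite").parquet("s3://bucket/curated/")

    curated = spark.read.parquet("s3://bucket/curated/")
    # Assumes a 'status' column exists; only the needed columns/row groups are read
    curated.filter(curated.status == "active").count()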
We use Hadoop Distributed File System (HDFS) for our project.
HDFS is a distributed file system designed to run on commodity hardware.
It provides high-throughput access to application data and is fault-tolerant.
HDFS is used by many big data processing frameworks like Hadoop, Spark, etc.
It stores data in a distributed manner across multiple nodes in a cluster.
HDFS is optimized for large files and sequential reads and writes.
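For example, the basic commands below (with placeholder paths) show a file being stored in HDFS and its block placement and replication being inspected:

    # Copy a local file into HDFS; it is split into blocks and replicated across DataNodes
    hdfs dfs -put events.csv /data/events.csv
    hdfs dfs -ls /data
    # Show which DataNodes hold each block of the file
    hdfs fsck /data/events.csv -files -blocks -locations
    # Change the replication factor for the file
    hdfs dfs -setrep -w 3 /data/events.csv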
I applied via Referral and was interviewed in Dec 2024. There were 2 interview rounds.
30 Questions in 20 Minutes
I applied via Naukri.com and was interviewed in Jul 2023. There were 2 interview rounds.
Spark's internal workings and optimization techniques
Spark uses Directed Acyclic Graph (DAG) for optimizing workflows
Lazy evaluation helps in optimizing transformations by combining them into a single stage
Caching and persistence of intermediate results can improve performance
Partitioning data can help in parallel processing and reducing shuffle operations
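A short PySpark sketch of lazy evaluation and the DAG/physical plan; the data is synthetic:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

    df = spark.range(1_000_000)

    # Transformations are lazy: nothing executes yet, Spark only records them in the DAG
    filtered = df.filter(col("id") % 2 == 0)
    doubled = filtered.withColumn("double_id", col("id") * 2)

    # explain() prints the physical plan Spark derived from the DAG
    doubled.explain()

    # Only an action such as count() triggers execution; the chained transformations
    # are pipelined into a single stage because no shuffle is required
    print(doubled.count())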
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in AWS Glue job settings
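The behaviour described above appears to correspond to AWS Glue's automatic resource adjustment (Auto Scaling). A hedged boto3 sketch of enabling it when defining a job follows; the job name, IAM role, script location, and worker settings are placeholders, and the --enable-auto-scaling argument assumes Glue 3.0 or later:

    import boto3

    glue = boto3.client("glue")

    # Hypothetical job definition; all values below are illustrative
    glue.create_job(
        Name="nightly-etl",
        Role="arn:aws:iam::123456789012:role/GlueJobRole",
        Command={"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/etl.py"},
        GlueVersion="4.0",
        WorkerType="G.1X",
        NumberOfWorkers=10,  # upper bound; Glue can scale down from here based on workload
        DefaultArguments={"--enable-auto-scaling": "true"},  # lets Glue adjust workers/DPUs automatically
    )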
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use ANALYZE to keep table statistics up to date for the query planner.
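A hedged sketch of the table-design side, assuming an existing Redshift cluster reachable with psycopg2; the connection details, table, and column names are placeholders:

    import psycopg2

    # Placeholder connection details for an existing Redshift cluster
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="admin", password="...",
    )
    cur = conn.cursor()

    # DISTKEY co-locates rows that join on customer_id across nodes;
    # SORTKEY orders rows by sale_date so range filters scan fewer blocks
    cur.execute("""
        CREATE TABLE sales (
            sale_id     BIGINT,
            customer_id BIGINT,
            sale_date   DATE,
            amount      DECIMAL(12,2)
        )
        DISTSTYLE KEY
        DISTKEY (customer_id)
        COMPOUND SORTKEY (sale_date);
    """)
    conn.commit()

    # Select only the needed columns instead of SELECT *
    cur.execute(
        "SELECT customer_id, SUM(amount) FROM sales "
        "WHERE sale_date >= '2024-01-01' GROUP BY customer_id"
    )
    rows = cur.fetchall()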
I applied via Campus Placement and was interviewed in Oct 2024. There was 1 interview round.
Use a regular expression to remove special characters from a string (a short Python example follows the steps below)
Use the regex pattern [^a-zA-Z0-9\s] to match any character that is not a letter, digit, or whitespace
Use the replace() function in your programming language to replace the matched special characters with an empty string
Example: input string 'Hello! How are you?' will become 'Hello How are you' after removing special characters
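A small Python sketch of exactly this approach:

    import re

    def remove_special_characters(text: str) -> str:
        # Keep letters, digits, and whitespace; drop everything else
        return re.sub(r"[^a-zA-Z0-9\s]", "", text)

    print(remove_special_characters("Hello! How are you?"))  # -> "Hello How are you"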
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD type 2 is a method used in data warehousing to track historical changes by creating a new record for each change.
SCD type 2 stands for Slowly Changing Dimension type 2
It involves creating a new record in the dimension table whenever there is a change in the data
The old record is marked as inactive and the new record is marked as current
It allows for historical tracking of changes in data over time
Example: if a customer's address changes, a new row with the new address is inserted and the previous row is marked as no longer current.
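A minimal sketch of the type 2 update pattern, using SQLite from Python for illustration; the dimension-table layout (start_date, end_date, is_current) is one common convention, not the only one:

    import sqlite3
    from datetime import date

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER,
        city        TEXT,
        start_date  TEXT,
        end_date    TEXT,
        is_current  INTEGER
    );
    INSERT INTO dim_customer VALUES (42, 'Pune', '2023-01-01', NULL, 1);
    """)

    today = date.today().isoformat()

    # Step 1: expire the currently active record for the changed customer
    conn.execute(
        "UPDATE dim_customer SET end_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (today, 42),
    )
    # Step 2: insert a new current record carrying the changed attribute
    conn.execute("INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)", (42, "Mumbai", today))

    # Both versions are kept, so history can be queried by date range
    print(conn.execute("SELECT * FROM dim_customer ORDER BY start_date").fetchall())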
posted on 16 May 2024
I applied via Company Website and was interviewed in Apr 2024. There was 1 interview round.
posted on 9 Jan 2025
I applied via Campus Placement and was interviewed in Jul 2024. There was 1 interview round.