Virtusa Consulting Services
Interview Questions and Answers
Q1. What type of file system is used in your project?
We use Hadoop Distributed File System (HDFS) for our project.
HDFS is a distributed file system designed to run on commodity hardware.
It provides high-throughput access to application data and is fault-tolerant.
HDFS is used by many big data processing frameworks such as MapReduce, Spark, and Hive.
It splits files into blocks and stores replicated copies of each block across multiple nodes in a cluster, which is what makes it fault-tolerant.
HDFS is optimized for large files and sequential reads and writes.
Q2. Command to check disk utilisation and health in Hadoop
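Use 'hdfs dfsadmin -report' to check cluster-wide capacity, usage, and the status of each DataNode
Run 'hdfs fsck /' to check file system health, including missing, corrupt, or under-replicated blocks
Use 'hdfs dfs -df -h' to see overall used and free space in a human-readable form
For uneven usage across the disks of a single DataNode, run 'hdfs diskbalancer -report' to see which nodes would benefit from balancing
Use 'hdfs diskbalancer -plan' followed by the DataNode hostname to generate a plan for balancing disk usage on that node
Check the Hadoop logs for any disk health issues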
Q3. Creating a pivot table in SQL from a non-pivoted table
To create a pivot table in SQL from a non-pivoted table, you can use CASE expressions inside aggregate functions.
Use one CASE expression per output column to pick out the rows that belong to that category
Wrap each CASE expression in an aggregate function like SUM, COUNT, AVG, etc. to calculate the value for that category
Group the data by the columns that should remain as rows, not by the column whose values are being turned into columns
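The CASE-with-aggregate approach can be sketched with Python's built-in sqlite3 module. The sales table, its columns, and the sample rows are hypothetical and exist only for illustration; some databases also offer a dedicated PIVOT clause, which is not shown here.

```python
import sqlite3

# Hypothetical sample data: one row per sale, tagged with region and quarter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Q1", 100),
        ("North", "Q2", 150),
        ("South", "Q1", 80),
        ("South", "Q2", 120),
        ("South", "Q2", 30),
    ],
)

# Each CASE expression keeps only the amounts for one quarter; SUM collapses
# them into a single column. GROUP BY region gives one output row per region.
pivot_sql = """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_total,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

Note that the query groups by region (the row label), while quarter (the pivoted column) appears only inside the CASE expressions.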
Q4. How to create triggers
Creating triggers in a database involves defining the trigger, specifying the event that will activate it, and writing the code to be executed.
Define the trigger using the CREATE TRIGGER statement
Specify the event that will activate the trigger (e.g. INSERT, UPDATE, DELETE)
Write the code or actions to be executed when the trigger is activated
Test the trigger to ensure it functions as intended
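The steps above can be sketched using SQLite through Python's built-in sqlite3 module. The accounts and audit_log tables and the trigger name are hypothetical, and trigger syntax differs between databases (PostgreSQL, for example, needs a separate trigger function), so treat this as an illustration rather than portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Define the trigger, name the event (UPDATE of balance), and give the
    -- action to run for each affected row.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log (account_id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")

# Test the trigger: the INSERT should not fire it, the UPDATE should.
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 500)")
conn.execute("UPDATE accounts SET balance = 350 WHERE id = 1")

audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
```

Querying audit_log after the UPDATE is the "test the trigger" step: exactly one row should be present, holding the old and new balance.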
Q5. Spark optimization techniques
Optimization techniques in Spark improve performance and efficiency of data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data in memory
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
Tuning memory settings and garbage collection parameters
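The idea behind broadcasting a small lookup table can be shown in plain Python, with no Spark APIs involved. The data, the partition layout, and the join_partition helper are all hypothetical; the sketch only illustrates why a broadcast join needs no shuffle.

```python
# Plain-Python illustration only; this does not use or imitate Spark's API.

# Small lookup table: cheap to copy to every worker (the "broadcast" side).
country_names = {"IN": "India", "US": "United States", "DE": "Germany"}

# Large dataset, already split into partitions that would live on different workers.
order_partitions = [
    [("o1", "IN"), ("o2", "US")],
    [("o3", "DE"), ("o4", "IN")],
]


def join_partition(partition, lookup):
    """Join one partition locally against a full copy of the lookup table."""
    return [(order_id, lookup.get(code, "unknown")) for order_id, code in partition]


# Each partition is joined independently; no rows move between partitions,
# which is the data movement (shuffle) a regular join would require.
joined = [join_partition(p, country_names) for p in order_partitions]
print(joined)
```

In PySpark, the same effect is usually requested by wrapping the small DataFrame in pyspark.sql.functions.broadcast before joining it to the large one.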
Q6. Spark optimization techniques
Spark optimization techniques improve performance and efficiency of Spark applications.
Partitioning data to reduce shuffling
Caching frequently used data
Using broadcast variables for small data
Using efficient data formats like Parquet
Tuning memory and CPU usage
Using appropriate cluster size
Avoiding unnecessary data shuffling
Using an efficient serializer such as Kryo
Using appropriate join strategies, such as a broadcast join when one side is small