SAVE Housing Finance Interview Questions and Answers
Q1. What is the main advantage of delta lake?
Delta Lake provides ACID transactions, schema enforcement, and time travel capabilities for data lakes.
ACID transactions ensure data consistency and reliability.
Schema enforcement helps maintain data quality and prevent data corruption.
Time travel allows users to access and revert to previous versions of data for auditing or analysis purposes.
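The time-travel idea can be sketched in plain Python. The `VersionedTable` class below is a hypothetical illustration, not the Delta API; in real Delta Lake you would query an older snapshot with `SELECT * FROM tbl VERSION AS OF n` or `spark.read.format("delta").option("versionAsOf", n)`.

```python
# Conceptual sketch (plain Python, no Spark): Delta Lake retains every
# committed version of a table, so readers can "time travel" to a snapshot.
class VersionedTable:
    def __init__(self):
        self._versions = []  # list of immutable snapshots, one per commit

    def commit(self, rows):
        # Each write produces a new version, like a Delta transaction commit.
        self._versions.append(tuple(rows))

    def read(self, version=None):
        # version=None reads the latest snapshot; an integer pins an older
        # one, analogous to VERSION AS OF n in Delta SQL.
        if version is None:
            version = len(self._versions) - 1
        return list(self._versions[version])

t = VersionedTable()
t.commit([("a", 1)])
t.commit([("a", 1), ("b", 2)])
print(t.read())           # latest: [('a', 1), ('b', 2)]
print(t.read(version=0))  # time travel: [('a', 1)]
```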
Q2. Find the students with marks greater than 80 in all subjects
Filter out the students whose marks exceed 80 in every subject.
Iterate through each student's marks across all subjects.
Check whether every mark is greater than 80 (in Python, all() expresses this directly).
Return only the students for whom the condition holds; in SQL, GROUP BY student HAVING MIN(marks) > 80 achieves the same.
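The steps above can be sketched in a few lines of Python; the student names and marks are made-up sample data.

```python
# Hypothetical sample data: each student's marks per subject.
marks = {
    "Asha":  {"math": 85, "science": 91, "english": 88},
    "Ravi":  {"math": 78, "science": 95, "english": 82},
    "Meena": {"math": 90, "science": 84, "english": 81},
}

# Keep only the students whose every mark exceeds 80.
toppers = [s for s, subj in marks.items() if all(m > 80 for m in subj.values())]
print(toppers)  # ['Asha', 'Meena']
```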
Q3. Write the syntax to define the schema of a file for loading.
Declare the column names and data types up front so the engine validates the file against them while loading.
In Spark SQL / Databricks, use a CREATE TABLE ... USING statement with an explicit column list.
Specify each column's name and data type in the schema definition.
Example: CREATE TABLE MyTable (col1 INT, col2 STRING) USING CSV
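The same idea outside SQL — declare the columns and types up front, then enforce them while loading — can be sketched in plain Python. `load_with_schema` and the sample data are illustrative only; in PySpark you would instead pass a StructType to `spark.read.schema(...)`.

```python
import csv
import io

# Hypothetical schema: column name -> Python type, mirroring
# (col1 INT, col2 STRING) from the SQL example above.
schema = {"col1": int, "col2": str}

raw = "col1,col2\n1,alpha\n2,beta\n"

def load_with_schema(text, schema):
    reader = csv.DictReader(io.StringIO(text))
    # Enforce that the file's header matches the declared columns.
    if reader.fieldnames != list(schema):
        raise ValueError(f"header {reader.fieldnames} != schema {list(schema)}")
    # Apply the declared type to each column while loading.
    return [{col: conv(row[col]) for col, conv in schema.items()} for row in reader]

rows = load_with_schema(raw, schema)
print(rows)  # [{'col1': 1, 'col2': 'alpha'}, {'col1': 2, 'col2': 'beta'}]
```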
Q4. How to do performance tuning in ADF?
Performance tuning in Azure Data Factory involves optimizing data flows and activities to improve efficiency and reduce processing time.
Identify bottlenecks using pipeline run history and activity duration metrics.
Optimize data partitioning and distribution (e.g., source and sink partitioning in mapping data flows).
Use appropriate data integration patterns, such as staged copy for large cross-region loads.
Increase parallelism: raise Data Integration Units (DIUs) and the parallel copies setting on the Copy activity.
Monitor and analyze performance metrics after each change to verify the improvement.
Q5. What is Azure synapse architecture?
Azure Synapse is a cloud-based analytics service that brings together big data and data warehousing.
Azure Synapse integrates big data and data warehousing capabilities in a single service
It allows for data ingestion, preparation, management, and serving for BI and machine learning
Supports both serverless and provisioned resources for data processing
Offers integration with Azure Machine Learning, Power BI, and Azure Data Factory
Q6. What are the types of clusters in Databricks?
Types of clusters in Databricks include Standard, High Concurrency, and Single Node clusters.
Standard cluster: Suitable for running single jobs or workflows.
High Concurrency cluster: Designed for multiple users running concurrent jobs.
Single Node cluster: Used for development and testing purposes.
Q7. What is a catalyst optimizer?
The catalyst optimizer is a query optimization engine in Apache Spark that improves performance by generating optimized query plans.
It is a query optimization engine in Apache Spark.
It improves performance by generating optimized query plans.
It uses rule-based and cost-based optimization techniques.
It leverages advanced techniques like code generation and adaptive query execution.
Example: in Spark SQL, the Catalyst optimizer parses the query, applies logical optimizations, and generates an optimized physical plan before execution.
Q8. What is Catalyst optimizer
Catalyst optimizer is a query optimization framework in Apache Spark.
It is primarily a rule-based optimization framework (with cost-based extensions) used to optimize logical and physical query plans.
It leverages advanced Scala language features, such as pattern matching, to build an extensible query optimizer.
Catalyst performs optimizations such as constant folding, predicate pushdown, and projection pruning.
It improves the performance of Spark SQL queries by generating efficient query execution plans.
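The rule-based idea can be illustrated with a toy constant-folding pass in plain Python. This is not Catalyst's actual API — expressions here are nested tuples — but the pattern (rewrite rules applied recursively over a tree) is the same one Catalyst applies to query plans.

```python
# Toy rule-based optimizer sketch: fold ("+", literal, literal) into a
# single literal, recursing bottom-up through the expression tree.
def constant_fold(expr):
    if isinstance(expr, tuple):
        op, left, right = expr
        left, right = constant_fold(left), constant_fold(right)
        # Rule: addition of two integer literals is computed at plan time.
        if op == "+" and isinstance(left, int) and isinstance(right, int):
            return left + right
        return (op, left, right)
    return expr  # literals and column references pass through unchanged

# col1 + (2 + 3)  ->  col1 + 5
print(constant_fold(("+", "col1", ("+", 2, 3))))  # ('+', 'col1', 5)
```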
Q9. Find the duplicate rows?
Use SQL query with GROUP BY and HAVING clause to find duplicate rows.
Use GROUP BY to group rows with same values
Use HAVING COUNT(*) > 1 to filter out duplicate rows
Example: SELECT column1, column2, COUNT(*) FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1
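The query pattern above can be run end to end with Python's built-in sqlite3 module; the `orders` table and its rows are made-up sample data.

```python
import sqlite3

# Build a tiny in-memory table containing one duplicated row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", "book"), ("bob", "pen"), ("alice", "book"), ("carol", "pen")],
)

# GROUP BY the full key and keep only groups that occur more than once.
dupes = conn.execute(
    """SELECT customer, product, COUNT(*) AS n
       FROM orders
       GROUP BY customer, product
       HAVING COUNT(*) > 1"""
).fetchall()
print(dupes)  # [('alice', 'book', 2)]
conn.close()
```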