Tech Mahindra
Interview Questions and Answers
Q1. Let's say table 1 has the values 1, 2, 3, 5, null, null, 0 and table 2 has null, 2, 4, 7, 3, 5. What would be the output after an inner join?
An inner join of table 1 and table 2 on this column returns 2, 3, 5.
An inner join returns only the rows whose values match in both tables.
The values 2, 3, and 5 appear in both tables, so they are included; 1 and 0 appear only in table 1, and 4 and 7 only in table 2, so they are dropped.
Nulls never match in an inner join, because NULL = NULL evaluates to unknown rather than true, so the null rows contribute nothing (a quick PySpark check follows below).
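Here the two tables from the question are rebuilt as single-column DataFrames; the column name val is made up for the example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inner-join-nulls").getOrCreate()

# The two tables from the question, as single-column DataFrames ("val" is invented).
t1 = spark.createDataFrame([(1,), (2,), (3,), (5,), (None,), (None,), (0,)], ["val"])
t2 = spark.createDataFrame([(None,), (2,), (4,), (7,), (3,), (5,)], ["val"])

# Inner join: null rows never satisfy t1.val = t2.val, so only 2, 3, 5 survive.
t1.join(t2, on="val", how="inner").show()
```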
Q2. How do you design an effective ADF pipeline, and what metrics and considerations should you keep in mind while designing it?
Designing an effective ADF pipeline involves considering various metrics and factors.
Understand the data sources and destinations
Identify the dependencies between activities
Optimize data movement and processing for performance
Monitor and track pipeline execution for troubleshooting
Consider security and compliance requirements
Use parameterization and dynamic content for flexibility
Implement error handling and retries for robustness (see the JSON fragment after this list)
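As an illustration of the parameterization and retry items above, here is a hedged fragment of a Copy activity in ADF pipeline JSON; the activity, dataset, and parameter names are invented for the example:

```json
{
  "name": "CopyDailySales",
  "type": "Copy",
  "policy": {
    "timeout": "0.02:00:00",
    "retry": 3,
    "retryIntervalInSeconds": 60
  },
  "inputs": [
    {
      "referenceName": "SourceSalesDataset",
      "type": "DatasetReference",
      "parameters": {
        "folderPath": "@concat('sales/', formatDateTime(pipeline().parameters.windowStart, 'yyyy/MM/dd'))"
      }
    }
  ]
}
```

The policy block gives the activity a timeout plus automatic retries, and the folderPath expression resolves a pipeline parameter at run time, so one pipeline can serve many date windows.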
Q3. What is incremental load? What are partitioning and bucketing? Explain Spark architecture.
Incremental load is the process of loading only new or updated data into a data warehouse, rather than reloading all data each time.
Incremental load helps in reducing the time and resources required for data processing.
It involves identifying new or updated data since the last load and merging it with the existing data.
Common techniques for incremental load include using timestamps or change data capture (CDC) mechanisms.
Example: loading only new sales transactions into a data warehouse, rather than the full history (sketched below).
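A minimal timestamp-based sketch in PySpark; the source and warehouse table names, the last_modified column, and the stored watermark are all assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Watermark from the previous run (assumed to be persisted somewhere durable).
last_watermark = "2024-01-31 23:59:59"

# Read only the rows created or updated since the last load.
incremental = (
    spark.table("source_db.sales")  # hypothetical source table
         .filter(F.col("last_modified") > F.lit(last_watermark))
)

# Append the delta to the warehouse table (a MERGE would also handle updates).
incremental.write.mode("append").saveAsTable("dwh.sales")

# Compute the new watermark to persist for the next run.
new_watermark = incremental.agg(F.max("last_modified")).first()[0]
```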
Q4. Advanced SQL questions - highest sales from each city
Use a window function like ROW_NUMBER() to find the highest sales from each city in SQL.
Use the PARTITION BY clause in ROW_NUMBER() to partition the data by city
Order the data by sales in descending order
Filter the results to keep only rows with row number 1 (the full query is sketched below)
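Putting those steps together, run here through spark.sql in PySpark; the sales table and its city and sale_amount columns are assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("top-sales-per-city").getOrCreate()

# Rank sales within each city and keep the top row per partition.
top_sales = spark.sql("""
    SELECT city, sale_amount
    FROM (
        SELECT city,
               sale_amount,
               ROW_NUMBER() OVER (PARTITION BY city
                                  ORDER BY sale_amount DESC) AS rn
        FROM sales
    ) ranked
    WHERE rn = 1
""")
top_sales.show()
```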
Q5. Project architecture: which Spark transformations were used?
The project architecture uses Spark transformations to process large volumes of data.
Spark transformations manipulate distributed datasets; they are lazy, building a lineage that executes only when an action (such as collect or count) runs.
Examples of Spark transformations include map, filter, reduceByKey, and join (see the sketch below).
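A small RDD sketch chaining those four transformations; the data is invented, and nothing executes until the collect action at the end:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-transformations").getOrCreate()
sc = spark.sparkContext

# Invented order lines of the form "customer_id,amount".
raw = sc.parallelize(["c1,100", "c2,50", "c1,30", "c3,20"])

# map: parse each line into a (customer_id, amount) pair.
pairs = raw.map(lambda s: (s.split(",")[0], int(s.split(",")[1])))

# filter: keep orders of 30 or more.
large = pairs.filter(lambda kv: kv[1] >= 30)

# reduceByKey: total amount per customer.
totals = large.reduceByKey(lambda a, b: a + b)

# join: attach customer names from a second pair RDD.
names = sc.parallelize([("c1", "Asha"), ("c2", "Ravi")])

# collect() is the action that triggers the whole lazy lineage.
print(totals.join(names).collect())  # [('c1', (130, 'Asha')), ('c2', (50, 'Ravi'))]
```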
Q6. Databricks - how to mount?
Storage is mounted in Databricks with the dbutils.fs.mount() utility, run from a notebook.
Pass the source URI of the storage (for example, an abfss:// path for an ADLS Gen2 container), a mount point under /mnt, and authentication settings in extra_configs.
Once mounted, the storage is visible to every cluster in the workspace under the mount point; dbutils.fs.unmount() removes it (see the sketch below).
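A minimal sketch, assuming an ADLS Gen2 container and a service principal whose credentials are stored in a Databricks secret scope; every name below (container, account, scope, keys, tenant id) is a placeholder:

```python
# Runs in a Databricks notebook, where dbutils is available.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="my-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://mycontainer@myaccount.dfs.core.windows.net/",
    mount_point="/mnt/mycontainer",
    extra_configs=configs,
)

# Verify the mount, and unmount when no longer needed.
display(dbutils.fs.ls("/mnt/mycontainer"))
# dbutils.fs.unmount("/mnt/mycontainer")
```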