D Imagination Photography Interview Questions and Answers
Q1. Let's say table 1 has the values 1, 2, 3, 5, NULL, NULL, 0 and table 2 has NULL, 2, 4, 7, 3, 5. What would be the output after an inner join?
The output after inner join of table 1 and table 2 will be 2,3,5.
Inner join only includes rows that have matching values in both tables.
Values 2, 3, and 5 are present in both tables, so they will be included in the output.
NULL never compares equal to anything, including another NULL, so the NULL rows are excluded from an inner join.
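The result above can be reproduced with a quick SQLite sketch (table and column names are illustrative):

```python
import sqlite3

# Toy reproduction of Q1: NULLs never satisfy an equality join
# condition, so only values present in both tables survive.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (val INTEGER)")
cur.execute("CREATE TABLE t2 (val INTEGER)")
cur.executemany("INSERT INTO t1 VALUES (?)",
                [(1,), (2,), (3,), (5,), (None,), (None,), (0,)])
cur.executemany("INSERT INTO t2 VALUES (?)",
                [(None,), (2,), (4,), (7,), (3,), (5,)])
rows = cur.execute(
    "SELECT t1.val FROM t1 INNER JOIN t2 ON t1.val = t2.val ORDER BY t1.val"
).fetchall()
print([r[0] for r in rows])  # -> [2, 3, 5]
```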
Q2. How do you design an effective ADF pipeline and what all metrics and considerations you should keep in mind while designing?
Designing an effective ADF pipeline involves considering various metrics and factors.
Understand the data sources and destinations
Identify the dependencies between activities
Optimize data movement and processing for performance
Monitor and track pipeline execution for troubleshooting
Consider security and compliance requirements
Use parameterization and dynamic content for flexibility
Implement error handling and retries for robustness
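Several of the points above (parameterization, retries for robustness) show up directly in a pipeline's JSON definition. A minimal, illustrative sketch of a Copy activity with a pipeline parameter and a retry policy (all names are hypothetical, not from a real project):

```json
{
  "name": "CopySalesData",
  "properties": {
    "parameters": {
      "sourceFolder": { "type": "String" }
    },
    "activities": [
      {
        "name": "CopyFromOnPrem",
        "type": "Copy",
        "policy": {
          "retry": 2,
          "retryIntervalInSeconds": 30,
          "timeout": "0.02:00:00"
        },
        "inputs": [ { "referenceName": "OnPremDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AdlsDataset", "type": "DatasetReference" } ]
      }
    ]
  }
}
```

Dynamic content such as `@pipeline().parameters.sourceFolder` can then be passed into the dataset, so one pipeline serves many folders.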
Q3. What methods do you use to transfer data from on-premises storage to Azure Data Lake Storage Gen2?
Methods to transfer data from on-premises storage to Azure Data Lake Storage Gen2
Use Azure Data Factory to create pipelines for data transfer
Utilize Azure Data Box for offline data transfer
Leverage Azure Storage Explorer for manual data transfer
Implement Azure Database Migration Service for large-scale database migrations
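For ad hoc or scripted transfers, AzCopy is another common option. A hedged command-line sketch (account, container, and paths are placeholders):

```shell
# Illustrative AzCopy upload from an on-prem folder to ADLS Gen2.
# Authenticate first (azcopy login), or append a SAS token to the URL.
azcopy copy "/data/onprem/sales" \
  "https://<account>.dfs.core.windows.net/<container>/raw/sales" \
  --recursive
```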
Q4. What are the optimization techniques used in Spark?
Optimization techniques in Spark improve performance and efficiency of data processing.
Partitioning data to distribute workload evenly
Caching frequently accessed data in memory
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
Tuning configuration settings like memory allocation and parallelism
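To make the first point concrete, here is a toy hash-partitioning sketch in plain Python (not Spark; Spark's real partitioner is more involved, but the idea is the same: equal keys land in the same partition, so per-key work needs no further shuffle):

```python
# Toy model of hash partitioning: spread records across a fixed number
# of partitions so the workload is distributed and equal keys co-locate.
def hash_partition(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parts = hash_partition(records, 4)
# every record lands in exactly one partition
assert sum(len(p) for p in parts) == len(records)
```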
Q5. What is incremental load? What are partitioning and bucketing? Explain Spark architecture.
Incremental load is the process of loading only new or updated data into a data warehouse, rather than reloading all data each time.
Incremental load helps in reducing the time and resources required for data processing.
It involves identifying new or updated data since the last load and merging it with the existing data.
Common techniques for incremental load include using timestamps or change data capture (CDC) mechanisms.
Example: Loading only the sales transactions recorded since the last load into the data warehouse.
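The timestamp-watermark technique can be sketched with SQLite (table and column names are illustrative):

```python
import sqlite3

# Toy incremental load: copy only rows newer than the last-loaded
# watermark from a source table into a warehouse table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source_sales (id INTEGER, loaded_at INTEGER)")
cur.execute("CREATE TABLE warehouse_sales (id INTEGER, loaded_at INTEGER)")
cur.executemany("INSERT INTO source_sales VALUES (?, ?)",
                [(1, 100), (2, 150), (3, 200), (4, 250)])

def incremental_load(cur, watermark):
    """Copy rows with loaded_at > watermark; return the new watermark."""
    cur.execute(
        "INSERT INTO warehouse_sales "
        "SELECT id, loaded_at FROM source_sales WHERE loaded_at > ?",
        (watermark,))
    row = cur.execute("SELECT MAX(loaded_at) FROM warehouse_sales").fetchone()
    return row[0] if row[0] is not None else watermark

wm = incremental_load(cur, 150)   # picks up only ids 3 and 4
count = cur.execute("SELECT COUNT(*) FROM warehouse_sales").fetchone()[0]
print(wm, count)  # -> 250 2
```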
Q6. Advanced SQL questions - highest sales from each city
Use window functions like ROW_NUMBER() to find highest sales from each city in SQL.
Use PARTITION BY clause in ROW_NUMBER() to partition data by city
Order the data by sales in descending order
Filter the results to only include rows with row number 1
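The steps above can be sketched end to end with SQLite's window-function support (3.25+; table and data are illustrative):

```python
import sqlite3

# Highest sale per city: ROW_NUMBER() partitioned by city, ordered by
# amount descending, then keep only row number 1 per partition.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("Pune", 100), ("Pune", 300), ("Delhi", 250), ("Delhi", 50)])
top = cur.execute("""
    SELECT city, amount FROM (
        SELECT city, amount,
               ROW_NUMBER() OVER (PARTITION BY city ORDER BY amount DESC) AS rn
        FROM sales
    ) WHERE rn = 1
    ORDER BY city
""").fetchall()
print(top)  # -> [('Delhi', 250), ('Pune', 300)]
```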
Q7. Project Architecture, spark transformations used?
The project architecture includes Spark transformations for processing large volumes of data.
Spark transformations are used to manipulate data in distributed computing environments.
Examples of Spark transformations include map, filter, reduceByKey, join, etc.
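The semantics of those transformations can be sketched in plain Python (Spark evaluates them lazily and in parallel across partitions, which this toy version does not model):

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

data = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]

mapped = [(k, v * 10) for k, v in data]            # map
filtered = [(k, v) for k, v in mapped if v > 10]   # filter
# reduceByKey: group by key, then fold each group's values with an op
by_key = sorted(filtered, key=itemgetter(0))
reduced = [(k, reduce(lambda a, b: a + b, (v for _, v in grp)))
           for k, grp in groupby(by_key, key=itemgetter(0))]
print(reduced)  # -> [('a', 30), ('b', 60), ('c', 50)]
```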
Q8. Databricks - how to mount?
Storage is mounted in Databricks using the dbutils.fs.mount utility from a notebook.
Call dbutils.fs.mount with the storage source URI, a mount point under /mnt, and authentication settings such as a service principal or access key.
Once mounted, the storage appears in the Databricks file system and can be read like a local path.
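A sketch of mounting ADLS Gen2 with a service principal, runnable only inside a Databricks notebook (account, container, tenant, and secret-scope names are placeholders):

```python
# OAuth configs for ADLS Gen2 (values are placeholders).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
dbutils.fs.mount(
    source="abfss://<container>@<account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
```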
Q9. Types of joins and spark queries
Types of joins include inner, outer, left, right, and full joins in Spark queries.
Inner join: Returns rows that have matching values in both tables
Outer join: An umbrella term for left, right, and full outer joins; unmatched rows are kept and padded with NULLs
Left join: Returns all rows from the left table and the matched rows from the right table
Right join: Returns all rows from the right table and the matched rows from the left table
Full join: Returns all rows from both tables, with NULLs on whichever side has no match
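The inner/left distinction can be demonstrated with SQLite (Spark's DataFrame API exposes the same join types via `df.join(other, on, how="left")` and so on; table names here are illustrative):

```python
import sqlite3

# Inner vs. left join on the same pair of tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (id INTEGER, dept_id INTEGER)")
cur.execute("CREATE TABLE dept (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 10), (2, 20), (3, 99)])
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(10, "HR"), (20, "Eng")])

inner = cur.execute(
    "SELECT emp.id, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id "
    "ORDER BY emp.id").fetchall()
left = cur.execute(
    "SELECT emp.id, dept.name FROM emp LEFT JOIN dept ON emp.dept_id = dept.id "
    "ORDER BY emp.id").fetchall()
print(inner)  # -> [(1, 'HR'), (2, 'Eng')]
print(left)   # -> [(1, 'HR'), (2, 'Eng'), (3, None)]
```

Employee 3 has no matching department, so it is dropped by the inner join but kept (with NULL) by the left join.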