Azure Data Engineer Interview Questions and Answers for Freshers
Q1. How do you design an effective ADF pipeline, and what metrics and considerations should you keep in mind while designing it?
Designing an effective ADF pipeline involves considering various metrics and factors.
Understand the data sources and destinations
Identify the dependencies between activities
Optimize data movement and processing for performance
Monitor and track pipeline execution for troubleshooting
Consider security and compliance requirements
Use parameterization and dynamic content for flexibility
Implement error handling and retries for robustness
Q2. Let's say table 1 has the values 1, 2, 3, 5, NULL, NULL, 0 and table 2 has NULL, 2, 4, 7, 3, 5. What would be the output after an inner join?
The output after an inner join of table 1 and table 2 will be 2, 3, 5.
Inner join only includes rows that have matching values in both tables.
Values 2, 3, and 5 are present in both tables, so they will be included in the output.
Null values never match in an inner join, because NULL = NULL evaluates to unknown rather than true, so the NULL rows are dropped.
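The result can be verified with SQLite from the Python standard library; the column name `v` and table names `t1`/`t2` are assumptions, since the question does not name them:

```python
import sqlite3

# Reproduce the question's data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (5), (NULL), (NULL), (0);
    INSERT INTO t2 VALUES (NULL), (2), (4), (7), (3), (5);
""")

rows = conn.execute(
    "SELECT t1.v FROM t1 INNER JOIN t2 ON t1.v = t2.v ORDER BY t1.v"
).fetchall()
result = [r[0] for r in rows]
print(result)  # → [2, 3, 5]; the NULL rows never match because NULL = NULL is unknown
```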
Q3. Which IR should we use if we want to copy data from an on-premises database to Azure
We should use the Self-hosted Integration Runtime (IR) to copy data from an on-premises database to Azure.
Self-hosted IR allows data movement between on-premises networks and Azure
It is installed on a local machine or virtual machine inside the on-premises network
Self-hosted IR securely connects to the on-premises data source and transfers data to Azure
It supports various data sources like SQL Server, Oracle, MySQL, etc.
Self-hosted IR can be managed and monitored through Azure Data Factory
Q4. What is the difference between a schedule trigger and a tumbling window trigger
Both are time-based triggers in ADF, but they differ in windowing, backfill, and pipeline binding.
A schedule trigger fires on a wall-clock schedule, such as every hour or at 6 AM daily, and only for future times.
A tumbling window trigger also fires at a periodic interval, but each run is tied to a fixed-size, contiguous, non-overlapping window of time, and the window start and end are passed to the pipeline.
Tumbling window triggers support backfilling past windows, retry policies, concurrency limits, and dependencies on other tumbling window triggers; schedule triggers do not.
A schedule trigger can start many pipelines, while a tumbling window trigger is bound to exactly one pipeline.
Schedule triggers suit simple recurring jobs like nightly ETL; tumbling window triggers suit incremental loads over fixed time slices.
Q5. What are the control flow activities in ADF
Control flow activities in Azure Data Factory (ADF) are used to define the workflow and execution order of activities.
Control flow activities are used to manage the flow of data and control the execution order of activities in ADF.
They allow you to define dependencies between activities and specify conditions for their execution.
Some commonly used control flow activities in ADF are If Condition, For Each, Until, and Switch.
If Condition activity allows you to define conditional branching: an expression is evaluated at runtime, and either the true or the false branch of activities executes.
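The ForEach and If Condition semantics above can be sketched as a plain-Python analogy; this is not ADF code, and the file names and the `get_metadata` stand-in are illustrative:

```python
# Conceptual analogy: ForEach iterates over items, If Condition branches
# on an expression over an activity's output.
files = ["sales.csv", "inventory.csv", "empty.csv"]

def get_metadata(name):
    # Stands in for a Get Metadata activity returning file properties.
    return {"name": name, "size": 0 if name == "empty.csv" else 1024}

processed = []
for f in files:                      # ForEach activity: loop over the item list
    meta = get_metadata(f)
    if meta["size"] > 0:             # If Condition activity: expression on output
        processed.append(meta["name"])   # true branch, e.g. a Copy activity

print(processed)  # → ['sales.csv', 'inventory.csv']
```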
Q6. What are the types of IR
IR stands for Integration Runtime. There are three types of IR: Azure, Self-hosted, and Azure-SSIS.
Azure IR is fully managed and handles data movement and transformation between cloud data stores.
Self-hosted IR is used to connect to on-premises or private-network data sources.
Azure-SSIS IR is used to lift and shift existing SSIS packages to run in Azure Data Factory.
Self-hosted IR requires an on-premises machine or VM on which it is installed and configured.
All three types enable data movement and transformation in Azure Data Factory.
Q7. What is the data flow of databricks
Data flow in Databricks involves reading data from various sources, processing it using Spark, and storing the results in different formats.
Data is read from sources like Azure Data Lake Storage, Azure Blob Storage, or databases
Data is processed using Apache Spark clusters in Databricks
Results can be stored in various formats like Parquet, Delta Lake, or SQL tables
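The read → process → write flow can be sketched in PySpark; this is a minimal illustration that needs a Databricks or Spark runtime, and the storage paths and column names are assumptions:

```python
# PySpark sketch of a typical Databricks data flow (illustrative paths/columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Read from cloud storage (here an assumed ADLS Gen2 URI).
raw = (spark.read.option("header", "true")
            .csv("abfss://container@account.dfs.core.windows.net/sales/"))

# Process with Spark: filter bad rows, then aggregate per day.
daily = (raw.filter(F.col("amount") > 0)
            .groupBy("order_date")
            .agg(F.sum("amount").alias("total_amount")))

# Write the results as Delta for downstream consumption.
daily.write.format("delta").mode("overwrite").save("/mnt/curated/daily_sales")
```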
Q8. What are linked services in ADF
Linked services in ADF are connections to external data sources or destinations that allow data movement and transformation.
Linked services are used to connect to various data sources such as databases, file systems, and cloud services.
They provide the necessary information and credentials to establish a connection.
Linked services enable data movement activities like copying data from one source to another or transforming data during the movement process.
Examples of linked services include Azure SQL Database, Azure Blob Storage, and an on-premises SQL Server reached through a self-hosted IR.
Q9. What is a tumbling window trigger
Tumbling window trigger is a type of trigger in Azure Data Factory that defines a fixed-size window of time for data processing.
Tumbling window trigger divides data into fixed-size time intervals for processing
It is useful for scenarios where data needs to be processed in regular intervals
Example: Triggering a pipeline every hour to process data for the past hour
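The fixed-size, gap-free, non-overlapping window semantics can be sketched in Python; the function name and the example date range are illustrative:

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    cur = start
    while cur < end:
        yield cur, min(cur + size, end)
        cur += size

# Four one-hour windows covering 00:00-04:00 with no gaps or overlaps,
# like the (windowStart, windowEnd) pair ADF passes to each pipeline run.
windows = list(tumbling_windows(datetime(2024, 1, 1, 0),
                                datetime(2024, 1, 1, 4),
                                timedelta(hours=1)))
for ws, we in windows:
    print(ws, "→", we)
```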
Q10. What are the types of triggers
Types of triggers include DDL triggers, DML triggers, and logon triggers.
DDL triggers are fired in response to DDL events like CREATE, ALTER, DROP
DML triggers are fired in response to DML events like INSERT, UPDATE, DELETE
Logon triggers are fired in response to logon events
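A DML trigger can be demonstrated with SQLite from the Python standard library (SQLite supports only DML triggers; DDL and logon triggers are SQL Server features). The table and trigger names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE audit  (order_id INTEGER, action TEXT);

    -- DML trigger: fires after each INSERT on orders
    CREATE TRIGGER trg_orders_insert AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit VALUES (NEW.id, 'INSERT');
    END;
""")

conn.execute("INSERT INTO orders VALUES (1, 99.0)")
audit_rows = conn.execute("SELECT order_id, action FROM audit").fetchall()
print(audit_rows)  # → [(1, 'INSERT')]
```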