Azure Data Engineer Interview Questions and Answers for Freshers
Q1. How do you design an effective ADF pipeline, and what metrics and considerations should you keep in mind while designing it?
Designing an effective ADF pipeline involves considering various metrics and factors.
Understand the data sources and destinations
Identify the dependencies between activities
Optimize data movement and processing for performance
Monitor and track pipeline execution for troubleshooting
Consider security and compliance requirements
Use parameterization and dynamic content for flexibility
Implement error handling and retries for robustness
Q2. Let's say table 1 has the values 1, 2, 3, 5, NULL, NULL, 0 and table 2 has NULL, 2, 4, 7, 3, 5. What would be the output after an inner join?
The output after an inner join of table 1 and table 2 will be 2, 3, 5.
Inner join only includes rows that have matching values in both tables.
Values 2, 3, and 5 are present in both tables, so they will be included in the output.
Null values never match in an inner join, because NULL = NULL evaluates to unknown rather than true, so the NULL rows are dropped.
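The result can be verified with SQLite from the Python standard library; the column name `v` and table names `t1`/`t2` are assumptions, since the question does not name them:

```python
import sqlite3

# Reproduce the question's data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (5), (NULL), (NULL), (0);
    INSERT INTO t2 VALUES (NULL), (2), (4), (7), (3), (5);
""")

rows = conn.execute(
    "SELECT t1.v FROM t1 INNER JOIN t2 ON t1.v = t2.v ORDER BY t1.v"
).fetchall()
result = [r[0] for r in rows]
print(result)  # → [2, 3, 5]; the NULL rows never match because NULL = NULL is unknown
```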
Q3. Which IR should we use if we want to copy data from an on-premises database to Azure
We should use the Self-hosted Integration Runtime (IR) to copy data from an on-premises database to Azure.
Self-hosted IR allows data movement between on-premises networks and Azure
It is installed on a local machine or virtual machine inside the on-premises network
Self-hosted IR securely connects to the on-premises data source and transfers data to Azure
It supports various data sources like SQL Server, Oracle, MySQL, etc.
Self-hosted IR can be managed and monitored through Azure Data Factory
Q4. What is the difference between a schedule trigger and a tumbling window trigger
Both are time-based triggers in ADF, but they differ in windowing, backfill, and pipeline binding.
A schedule trigger fires on a wall-clock schedule, such as every hour or at 6 AM daily, and only for future times.
A tumbling window trigger also fires at a periodic interval, but each run is tied to a fixed-size, contiguous, non-overlapping window of time, and the window start and end are passed to the pipeline.
Tumbling window triggers support backfilling past windows, retry policies, concurrency limits, and dependencies on other tumbling window triggers; schedule triggers do not.
A schedule trigger can start many pipelines, while a tumbling window trigger is bound to exactly one pipeline.
Schedule triggers suit simple recurring jobs like nightly ETL; tumbling window triggers suit incremental loads over fixed time slices.
Q5. What are the control flow activities in ADF
Control flow activities in Azure Data Factory (ADF) are used to define the workflow and execution order of activities.
Control flow activities are used to manage the flow of data and control the execution order of activities in ADF.
They allow you to define dependencies between activities and specify conditions for their execution.
Some commonly used control flow activities in ADF are If Condition, For Each, Until, and Switch.
If Condition activity allows you to define conditional branching: an expression is evaluated at runtime, and either the true or the false branch of activities executes.
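The ForEach and If Condition semantics above can be sketched as a plain-Python analogy; this is not ADF code, and the file names and the `get_metadata` stand-in are illustrative:

```python
# Conceptual analogy: ForEach iterates over items, If Condition branches
# on an expression over an activity's output.
files = ["sales.csv", "inventory.csv", "empty.csv"]

def get_metadata(name):
    # Stands in for a Get Metadata activity returning file properties.
    return {"name": name, "size": 0 if name == "empty.csv" else 1024}

processed = []
for f in files:                      # ForEach activity: loop over the item list
    meta = get_metadata(f)
    if meta["size"] > 0:             # If Condition activity: expression on output
        processed.append(meta["name"])   # true branch, e.g. a Copy activity

print(processed)  # → ['sales.csv', 'inventory.csv']
```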
Q6. What are the types of IR
IR stands for Integration Runtime. There are three types of IR: Azure, Self-hosted, and Azure-SSIS.
Azure IR is fully managed and handles data movement and transformation between cloud data stores.
Self-hosted IR is used to connect to on-premises or private-network data sources.
Azure-SSIS IR is used to lift and shift existing SSIS packages to run in Azure Data Factory.
Self-hosted IR requires an on-premises machine or VM on which it is installed and configured.
All three types enable data movement and transformation in Azure Data Factory.
Q7. What is the data flow of databricks
Data flow in Databricks involves reading data from various sources, processing it using Spark, and storing the results in different formats.
Data is read from sources like Azure Data Lake Storage, Azure Blob Storage, or databases
Data is processed using Apache Spark clusters in Databricks
Results can be stored in various formats like Parquet, Delta Lake, or SQL tables
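The read → process → write flow can be sketched in PySpark; this is a minimal illustration that needs a Databricks or Spark runtime, and the storage paths and column names are assumptions:

```python
# PySpark sketch of a typical Databricks data flow (illustrative paths/columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Read from cloud storage (here an assumed ADLS Gen2 URI).
raw = (spark.read.option("header", "true")
            .csv("abfss://container@account.dfs.core.windows.net/sales/"))

# Process with Spark: filter bad rows, then aggregate per day.
daily = (raw.filter(F.col("amount") > 0)
            .groupBy("order_date")
            .agg(F.sum("amount").alias("total_amount")))

# Write the results as Delta for downstream consumption.
daily.write.format("delta").mode("overwrite").save("/mnt/curated/daily_sales")
```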
Q8. What are linked services in ADF
Linked services in ADF are connections to external data sources or destinations that allow data movement and transformation.
Linked services are used to connect to various data sources such as databases, file systems, and cloud services.
They provide the necessary information and credentials to establish a connection.
Linked services enable data movement activities like copying data from one source to another or transforming data during the movement process.
Examples of linked services include Azure SQL Database, Azure Blob Storage, and an on-premises SQL Server reached through a self-hosted IR.
Q9. What is a tumbling window trigger
Tumbling window trigger is a type of trigger in Azure Data Factory that defines a fixed-size window of time for data processing.
Tumbling window trigger divides data into fixed-size time intervals for processing
It is useful for scenarios where data needs to be processed in regular intervals
Example: Triggering a pipeline every hour to process data for the past hour
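The fixed-size, gap-free, non-overlapping window semantics can be sketched in Python; the function name and the example date range are illustrative:

```python
from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    cur = start
    while cur < end:
        yield cur, min(cur + size, end)
        cur += size

# Four one-hour windows covering 00:00-04:00 with no gaps or overlaps,
# like the (windowStart, windowEnd) pair ADF passes to each pipeline run.
windows = list(tumbling_windows(datetime(2024, 1, 1, 0),
                                datetime(2024, 1, 1, 4),
                                timedelta(hours=1)))
for ws, we in windows:
    print(ws, "→", we)
```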
Q10. What are the types of triggers
Types of triggers include DDL triggers, DML triggers, and logon triggers.
DDL triggers are fired in response to DDL events like CREATE, ALTER, DROP
DML triggers are fired in response to DML events like INSERT, UPDATE, DELETE
Logon triggers are fired in response to logon events
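A DML trigger can be demonstrated with SQLite from the Python standard library (SQLite supports only DML triggers; DDL and logon triggers are SQL Server features). The table and trigger names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE audit  (order_id INTEGER, action TEXT);

    -- DML trigger: fires after each INSERT on orders
    CREATE TRIGGER trg_orders_insert AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit VALUES (NEW.id, 'INSERT');
    END;
""")

conn.execute("INSERT INTO orders VALUES (1, 99.0)")
audit_rows = conn.execute("SELECT order_id, action FROM audit").fetchall()
print(audit_rows)  # → [(1, 'INSERT')]
```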