Infovision
10+ Brotherhood Vikas Samiti Interview Questions and Answers
Q1. What is IR - integration Runtime? what are the types of IR
Integration Runtime (IR) is a compute infrastructure that provides data integration capabilities across different network environments.
IR is used in Azure Data Factory to provide data integration capabilities
There are three types of IR: Azure, Self-hosted, and Azure-SSIS
Azure IR is fully managed by Microsoft and is used for data movement in the cloud
Self-hosted IR allows data movement between on-premises and cloud data stores
Azure-SSIS IR is used for running SQL Server Integr...read more
Q2. ETL- how to do the incremental load in ADF and in SSIS
Incremental load in ADF and SSIS involves identifying new or updated data and loading only those changes.
In ADF, use watermark columns to track the last loaded value and filter data based on this value
In SSIS, use CDC (Change Data Capture) components or custom scripts to identify new or updated data
Both ADF and SSIS support incremental loading by comparing source and target data to determine changes
Q3. how to insert non-duplicate data into target table. how many ways we can do.
To insert non-duplicate data into a target table, you can use methods like using a unique constraint, using a merge statement, or using a temporary table.
Use a unique constraint on the target table to prevent duplicate entries.
Use a merge statement to insert data into the target table only if it does not already exist.
Use a temporary table to store the new data, then insert only the non-duplicate records into the target table.
Q4. ETL - How to do full load in SSIS, mention the steps
To perform a full load in SSIS, you can use the Data Flow Task with a source and destination component.
Create a Data Flow Task in the Control Flow tab of the SSIS package.
Add a source component to extract data from the source system.
Add a destination component to load data into the destination system.
Map the columns from the source to the destination.
Run the package to execute the full load.
Q5. how to copy data without using multiple activities. Dynamically using loops/ parameterization.
Use a single activity with dynamic parameterization and loops to copy data.
Use a loop to iterate through the data source and destination locations.
Parameterize the source and destination locations to dynamically copy data.
Utilize a scripting language like Python or PowerShell to implement the logic.
Example: Use a Python script with a loop to copy files from one folder to another.
Example: Use PowerShell script with dynamic parameters to copy data from one database to another.
Q6. SQL - how do you identify the long running queries.
Identifying long running queries in SQL
Monitor query execution times using tools like SQL Server Profiler or Performance Monitor
Check system views like sys.dm_exec_requests or sys.dm_exec_query_stats for query durations
Use query hints like OPTION (RECOMPILE) to force recompilation of queries for better performance
Q7. What are window functions in SQL
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
Window functions are used to calculate values based on a set of rows related to the current row.
They allow for ranking, aggregation, and other calculations within a specific window of rows.
Common window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), and SUM() OVER().
Q8. SQL - Delete Vs Truncate ?
Delete removes rows one by one and can be rolled back, while Truncate removes all rows at once and cannot be rolled back.
Delete is a DML command, while Truncate is a DDL command.
Delete can be rolled back using a transaction, while Truncate cannot be rolled back.
Delete fires triggers on each row deletion, while Truncate does not fire triggers.
Delete is slower as it removes rows one by one, while Truncate is faster as it removes all rows at once.
Delete maintains the table struc...read more
Q9. What is snowflake?
Snowflake is a cloud-based data warehousing platform that allows for easy and scalable data storage and analysis.
Snowflake is a fully managed service that works on a pay-as-you-go model.
It separates storage and compute resources, allowing for better scalability and cost-effectiveness.
Snowflake supports SQL queries and has built-in support for semi-structured data like JSON and XML.
It provides features like automatic scaling, data sharing, and data replication for high availab...read more
Q10. Linked Service Vs Dataset
Linked Service connects to external data sources, while Dataset represents the data within the data store.
Linked Service is used to connect to external data sources like databases, APIs, and file systems.
Dataset represents the data within the data store and can be used for data processing and analysis.
Linked Service defines the connection information and credentials needed to access external data sources.
Dataset defines the schema and structure of the data stored within the d...read more
Top Data Engineer Interview Questions from Similar Companies
Reviews
Interviews
Salaries
Users/Month