10+ Brightcom Group Interview Questions and Answers
Q1. Which IR should we use if we want to copy data from an on-premises database to Azure?
We should use the Self-hosted Integration Runtime (IR) to copy data from an on-premises database to Azure.
Self-hosted IR allows data movement between on-premises systems and Azure.
It is installed on a local machine or virtual machine inside the on-premises network.
Self-hosted IR securely connects to the on-premises data source and transfers data to Azure.
It supports various data sources like SQL Server, Oracle, MySQL, etc.
Self-hosted IR can be managed and monitored through Azure Data Factory
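A minimal sketch of registering a self-hosted IR with the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and IR names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Register a self-hosted IR resource in the factory.
client.integration_runtimes.create_or_update(
    resource_group_name="my-rg",               # placeholder
    factory_name="my-data-factory",            # placeholder
    integration_runtime_name="SelfHostedIR",
    integration_runtime=IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(
            description="Connects on-premises sources to Azure"
        )
    ),
)
# Next, install the self-hosted IR agent on an on-premises machine and
# register it with the authentication key generated for this IR.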
Q2. What is the difference between a scheduled trigger and a tumbling window trigger?
Both are time-based, but a tumbling window trigger fires over a series of fixed-size, non-overlapping time windows and keeps state about each window, while a scheduled trigger simply fires on a wall-clock schedule.
A scheduled trigger runs at a specific time or interval, such as every hour or every day, and is fire-and-forget: it keeps no state about past runs.
A tumbling window trigger fires for contiguous, fixed-size windows from a start time and passes each window's start and end times to the pipeline.
Scheduled triggers are useful for regular processing tasks, like ETL jobs.
Tumbling window triggers are useful for aggregating data over fixed time intervals, and they support backfilling past windows, retries, and dependencies on other tumbling window triggers.
A scheduled trigger can start many pipelines, while a tumbling window trigger is bound to exactly one pipeline.
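To make the windowing idea concrete, here is a small, purely illustrative Python sketch (not ADF code) of how tumbling windows partition a time range into fixed, non-overlapping slices:

from datetime import datetime, timedelta

def tumbling_windows(start, end, size):
    """Yield fixed-size, non-overlapping (window_start, window_end) pairs."""
    current = start
    while current < end:
        yield current, min(current + size, end)
        current += size

# Three one-hour windows covering 00:00-03:00; each window is processed
# exactly once, which is what lets ADF backfill and retry per window.
for ws, we in tumbling_windows(datetime(2024, 1, 1, 0),
                               datetime(2024, 1, 1, 3),
                               timedelta(hours=1)):
    print(ws, "->", we)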
Q3. What are the control flow activities in ADF?
Control flow activities in Azure Data Factory (ADF) define the workflow and the execution order of the activities in a pipeline.
They let you define dependencies between activities and specify conditions for their execution.
Commonly used control flow activities include If Condition, ForEach, Until, Switch, Wait, and Execute Pipeline.
The If Condition activity evaluates an expression and runs one of two sets of activities depending on the result, as in the sketch below.
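A minimal sketch of a pipeline containing one control flow activity, using the azure-mgmt-datafactory Python SDK; all resource names, the parameter, and the expression are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    Expression,
    IfConditionActivity,
    ParameterSpecification,
    PipelineResource,
    WaitActivity,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

pipeline = PipelineResource(
    parameters={"rowCount": ParameterSpecification(type="Int")},
    activities=[
        IfConditionActivity(
            name="CheckRowCount",
            # Branch on a pipeline parameter; replace with your own logic.
            expression=Expression(value="@greater(pipeline().parameters.rowCount, 0)"),
            if_true_activities=[WaitActivity(name="ProceedWait", wait_time_in_seconds=1)],
            if_false_activities=[WaitActivity(name="SkipWait", wait_time_in_seconds=1)],
        )
    ],
)
client.pipelines.create_or_update("my-rg", "my-data-factory", "ControlFlowDemo", pipeline)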
Q4. What are the types of IR?
IR stands for Integration Runtime. Azure Data Factory has three types of IR: Azure, Self-hosted, and Azure-SSIS.
Azure IR is a fully managed, serverless compute used for data movement and transformations between cloud data stores.
Self-hosted IR is used to connect to on-premises data sources and requires installing the IR agent on an on-premises machine.
Azure-SSIS IR is a fully managed cluster for running existing SSIS packages in Azure Data Factory.
All three types enable data movement and transformation in Azure Data Factory.
Q5. How do you mask data in Azure?
Data masking in Azure helps protect sensitive information by replacing original data with fictitious or obfuscated values.
Use Dynamic Data Masking in Azure SQL Database to obfuscate sensitive columns in query results in real time (see the sketch below).
Use Microsoft Purview (formerly Azure Purview) to discover and classify sensitive data across data sources.
Use Azure Data Factory data flows to transform and mask data during ETL processes.
Utilize Azure Information Protection to apply encryption and access controls to sensitive data
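A minimal sketch of enabling Dynamic Data Masking on one Azure SQL column with pyodbc; the connection string, table, and column are placeholders.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=myuser;PWD=<password>"  # placeholder credentials
)
cursor = conn.cursor()

# Apply the built-in email() masking function; non-privileged users then
# see masked values such as aXXX@XXXX.com instead of the real address.
cursor.execute(
    "ALTER TABLE dbo.Customers "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')"
)
conn.commit()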
Q6. What are linked services in ADF?
Linked services in ADF are connections to external data sources or destinations that allow data movement and transformation.
Linked services are used to connect to various data sources such as databases, file systems, and cloud services.
They provide the necessary information and credentials to establish a connection.
Linked services enable data movement activities like copying data from one source to another or transforming data during the movement process.
Examples of linked services include Azure Blob Storage, Azure SQL Database, Amazon S3, and on-premises SQL Server (see the sketch below).
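A minimal sketch of creating a storage linked service with the azure-mgmt-datafactory Python SDK, following the pattern from the ADF Python quickstart; all names and the connection string are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The connection string carries the credentials the linked service needs.
storage_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(value="<storage-connection-string>")
    )
)
client.linked_services.create_or_update(
    "my-rg", "my-data-factory", "StorageLinkedService", storage_ls
)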
Q7. What is Azure Data Factory?
Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines.
Azure Data Factory is used to move and transform data from various sources to destinations.
It supports data integration and orchestration of workflows.
You can monitor and manage data pipelines using Azure Data Factory.
It provides a visual interface for designing and monitoring data pipelines.
Azure Data Factory can be used for data migration, data warehousing, and data integration scenarios.
Q8. What is Azure Data Lake?
Azure Data Lake is a scalable data storage and analytics service provided by Microsoft Azure.
Azure Data Lake Storage is a secure data repository for storing and analyzing petabytes of data; Gen2 builds on Blob Storage and adds a hierarchical namespace.
Azure Data Lake Analytics is an on-demand distributed analytics service that runs U-SQL jobs over data in the lake.
It is designed for big data processing and analytics workloads, providing high performance and scalability.
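A minimal PySpark sketch of reading from Azure Data Lake Storage Gen2; the account, container, and path are placeholders, and an existing SparkSession named spark with access to the storage account is assumed.

# abfss:// is the ADLS Gen2 URI scheme: container@account/path.
df = spark.read.parquet(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/raw/events/"
)
df.show(5)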
Q9. What is an index in a table?
An index in a table is a data structure that improves the speed of data retrieval operations on a database table.
Indexes are used to quickly locate data without having to search every row in a table.
They can be created on one or more columns in a table.
Examples include clustered indexes (such as primary keys), unique indexes, and non-unique nonclustered indexes; indexes speed up reads at the cost of extra storage and slower writes.
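A minimal sketch of creating a nonclustered index on SQL Server with pyodbc; the connection, table, and column names are placeholders.

import pyodbc

conn = pyodbc.connect("DSN=mydb")  # placeholder connection
cursor = conn.cursor()

# Lookups filtered on CustomerId can now use the index instead of
# scanning every row of dbo.Orders.
cursor.execute(
    "CREATE NONCLUSTERED INDEX IX_Orders_CustomerId "
    "ON dbo.Orders (CustomerId)"
)
conn.commit()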
Q10. What is the Copy activity?
Copy activity is the Azure Data Factory activity used to move data between supported data stores.
It supports a wide range of sources and sinks, such as Azure Blob Storage, Azure SQL Database, and many more.
You define Copy activities inside pipelines and monitor their progress and throughput from the ADF monitoring view.
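A minimal sketch of a pipeline with one Copy activity, following the azure-mgmt-datafactory Python SDK quickstart pattern; the datasets "InputBlobDS" and "OutputBlobDS" are placeholders that must already exist in the factory.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy = CopyActivity(
    name="CopyBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputBlobDS")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputBlobDS")],
    source=BlobSource(),   # how rows are read from the input dataset
    sink=BlobSink(),       # how rows are written to the output dataset
)
client.pipelines.create_or_update(
    "my-rg", "my-data-factory", "CopyPipeline", PipelineResource(activities=[copy])
)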
Q11. What is Azure IR?
Azure IR stands for Azure Integration Runtime, the fully managed, serverless compute infrastructure in Azure Data Factory.
Azure IR provides data integration capabilities between publicly accessible cloud data stores and services.
Moving data from on-premises sources requires the self-hosted IR instead (see Q1 and Q4).
Azure IR runs data integration activities in Azure Data Factory pipelines.
It supports activities such as copying data, executing mapping data flows, and dispatching transformation activities to compute services.
Q12. What is SCD Type 1?
SCD Type 1 is a method of updating data in a data warehouse by overwriting existing data with new information.
Overwrites existing data with new information
No historical data is kept, unlike SCD Type 2, which preserves history by adding new rows.
Simplest and fastest method of updating dimension data (see the sketch below).
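A minimal sketch of SCD Type 1 with PySpark and the Delta Lake merge API: matching rows are overwritten in place, so no history is retained. The paths and the key column customer_id are placeholders, and a SparkSession named spark with the delta-spark package is assumed.

from delta.tables import DeltaTable

dim = DeltaTable.forPath(spark, "/mnt/warehouse/dim_customer")   # placeholder path
updates = spark.read.parquet("/mnt/staging/customer_updates")    # placeholder source

(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # overwrite existing rows (Type 1 behavior)
    .whenNotMatchedInsertAll()   # insert rows that are new to the dimension
    .execute())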
Q13. Activities used in ADF
Activities in Azure Data Factory (ADF) are the building blocks of a pipeline and perform various tasks like data movement, data transformation, and data orchestration.
Activities can be used to copy data from one location to another (Copy Activity)
Activities can be used to transform data using mapping data flows (Data Flow Activity)
Activities can be used to run custom code or scripts (Custom Activity)
Activities can be used to control the flow of execution within a pipeline (control flow activities such as If Condition, ForEach, and Until; see Q3).
Q14. DataFrames in PySpark
DataFrames in PySpark are distributed collections of data organized into named columns.
DataFrames are similar to tables in a relational database, with rows and columns.
They can be created from various data sources like CSV, JSON, Parquet, etc.
DataFrames support SQL queries and transformations using PySpark functions.
Example: df = spark.read.csv('file.csv', header=True, inferSchema=True)
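A short illustrative sketch of common DataFrame operations; the file name and the amount/category columns are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

df = spark.read.csv("file.csv", header=True, inferSchema=True)
df.filter(F.col("amount") > 100).groupBy("category").count().show()

# DataFrames also support SQL queries through temporary views.
df.createOrReplaceTempView("sales")
spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category").show()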