UST
Oswaal Books and Learning Private Limited Interview Questions and Answers
Q1. What is IR, and what is the difference between a dataset and a linked service?
IR stands for Integration Runtime. A dataset is a named representation of data, while a linked service holds the connection information for the data store.
IR is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities
A dataset is a structured representation of the data that pipeline activities read or write
A linked service defines the connection to a data source, much like a connection string
IR enables data movement and transformation between different data stores
A dataset defines the schema and structure of the ...
Q2. Write a regular expression to remove special characters
Use a regular expression to remove special characters from a string
Use the regex pattern [^a-zA-Z0-9\s] to match any character that is not a letter, digit, or whitespace
Use the replace() function in your programming language to replace the matched special characters with an empty string
Example: input string 'Hello! How are you?' will become 'Hello How are you' after removing special characters
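The steps above can be sketched in Python, where `re.sub` plays the role of the replace() function. The function name `remove_special_characters` is only illustrative, and the pattern treats ASCII letters, digits, and whitespace as the characters to keep.

```python
import re

# Matches any single character that is not an ASCII letter, digit, or whitespace.
SPECIAL_CHARS = re.compile(r"[^a-zA-Z0-9\s]")


def remove_special_characters(text: str) -> str:
    """Return text with every matched special character removed."""
    return SPECIAL_CHARS.sub("", text)


print(remove_special_characters("Hello! How are you?"))  # Hello How are you
```

Note that this pattern also removes underscores and accented letters; widen the character class if those should be kept.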
Q3. How do you load data from a CSV file into BigQuery?
Use Google Cloud Storage to load CSV data into BigQuery
Upload the CSV file to Google Cloud Storage
Create a BigQuery table with the appropriate schema
Use the 'bq load' command to load the data from the CSV file into the BigQuery table
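As a rough sketch of the last step, the Python helper below only builds the `bq load` command line; it does not run it. The helper name, the table `mydataset.mytable`, the bucket path, and the schema string are all made-up placeholders, and the CSV is assumed to be already uploaded to Cloud Storage with one header row.

```python
import shlex


def build_bq_load_command(table, gcs_uri, schema=None, skip_header_rows=1):
    """Build the argument list for loading a CSV from Cloud Storage with `bq load`."""
    args = [
        "bq",
        "load",
        "--source_format=CSV",
        f"--skip_leading_rows={skip_header_rows}",
    ]
    if schema is None:
        # No schema given: ask BigQuery to infer column names and types.
        args.append("--autodetect")
    args += [table, gcs_uri]
    if schema is not None:
        # Inline schema in the form "column:TYPE,column:TYPE".
        args.append(schema)
    return args


# Hypothetical table, bucket, and schema, for illustration only.
cmd = build_bq_load_command(
    "mydataset.mytable",
    "gs://my-bucket/data.csv",
    schema="name:STRING,age:INTEGER",
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell where the Google Cloud SDK is installed and authenticated, or the list can be passed to `subprocess.run`.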
Q4. What is the difference between RANK and DENSE_RANK?
Both functions give tied rows the same rank; RANK leaves gaps after ties, while DENSE_RANK assigns consecutive ranks without gaps.
RANK leaves gaps in the rank sequence if there are ties, while DENSE_RANK does not
RANK assigns each row a rank equal to one plus the number of rows ordered ahead of it, so tied rows share a rank
DENSE_RANK assigns consecutive ranks to the distinct values, so no rank number is skipped
Example: If two rows tie for rank 1, both functions assign 1 to each of them; the next row gets rank 3 with RANK and rank 2 with DENSE_RANK
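To make the tie behaviour concrete, here is a small plain-Python sketch that mimics what SQL `RANK()` and `DENSE_RANK()` return when ordering by score descending. The function names and the sample scores are made up for illustration; this is not how a database implements window functions.

```python
def rank(scores):
    """Mimic SQL RANK() ordered by score descending: ties share a rank, gaps follow."""
    ordered = sorted(scores, reverse=True)
    # A value's rank is its first position in the sorted list, counted from 1.
    return [ordered.index(s) + 1 for s in scores]


def dense_rank(scores):
    """Mimic SQL DENSE_RANK() ordered by score descending: ties share a rank, no gaps."""
    distinct = sorted(set(scores), reverse=True)
    # A value's dense rank is its position among the distinct values.
    return [distinct.index(s) + 1 for s in scores]


scores = [90, 90, 80, 70]
print(rank(scores))        # [1, 1, 3, 4]
print(dense_rank(scores))  # [1, 1, 2, 3]
```

The two rows with score 90 get rank 1 under both functions; only the rows after the tie differ.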
Q5. What are some optimization techniques in Spark?
Common Spark optimization techniques include:
Partitioning data to optimize data locality
Caching frequently accessed data
Using broadcast variables for small data sets
Using appropriate data structures and algorithms
Avoiding unnecessary shuffling of data
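The broadcast and shuffle-avoidance points can be illustrated without a cluster. The sketch below is plain Python, not the Spark API: it shows the idea behind a broadcast join, where a small lookup table is copied to every partition of a large dataset so the large rows never have to move. The function name and sample data are hypothetical.

```python
def broadcast_join(large_partitions, small_table):
    """Join each partition of a large dataset against a small lookup table locally."""
    # The small table becomes a dict once; in Spark, a copy of it would be sent
    # to every executor instead of shuffling the large dataset by key.
    lookup = dict(small_table)
    joined = []
    for partition in large_partitions:
        # Each partition is joined in place; rows never change partition.
        joined.append(
            [(key, value, lookup[key]) for key, value in partition if key in lookup]
        )
    return joined


# Two partitions of (country_code, amount) rows and a small code-to-name table.
orders = [[("IN", 10), ("US", 20)], [("IN", 30), ("FR", 40)]]
countries = [("IN", "India"), ("US", "United States")]
print(broadcast_join(orders, countries))
```

This behaves like an inner join: the "FR" row is dropped because it has no match in the small table.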
Q6. What are ADF triggers?
ADF triggers are used in Azure Data Factory to determine when a pipeline run starts.
ADF triggers enable the automation of data movement and data transformation activities.
Triggers can fire on a schedule or in response to an event.
A trigger starts a pipeline run and can pass parameters to the pipeline; tumbling window triggers can also depend on other tumbling window triggers.
Examples of triggers include schedule triggers, tumbling window triggers, and event-based triggers such as file arrival in storage; a pipeline can also be run manually (on demand).
Triggers can...