Microsoft Corporation Interview Questions and Answers

Question 1

Q1. How do you remove duplicate rows in BigQuery? How do you find the month of a given date in BigQuery?

Answer

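To return rows without duplicates in BigQuery, use the DISTINCT keyword; to actually remove them from a table, rewrite the table from the deduplicated result. To find the month of a given date, use the EXTRACT function.

To return deduplicated rows, use SELECT DISTINCT * FROM table_name; to persist the result, run CREATE OR REPLACE TABLE table_name AS SELECT DISTINCT * FROM table_name;
If rows count as duplicates only when a key matches, number the rows with ROW_NUMBER() OVER (PARTITION BY key_column) and keep only row number 1 for each key.
To find the month of a given date, use SELECT EXTRACT(MONTH FROM date_column) AS month_number FROM table_name; this returns an integer from 1 to 12. For the month name, use FORMAT_DATE('%B', date_column).
Replace 'table_name', 'key_column', and 'date_column' with the appropriate values in your query.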
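The queries in this answer can be collected as Python string constants, ready to pass to whichever BigQuery client you use. This is a minimal sketch: the table name `my_project.my_dataset.events` and the column `event_date` are hypothetical placeholders, and the column is assumed to be of type DATE.

```python
# Hypothetical names; replace with your own project, dataset, table, and column.
TABLE = "my_project.my_dataset.events"
DATE_COLUMN = "event_date"  # assumed to be a DATE column

# Rewrite the table in place, keeping one copy of each fully identical row.
DEDUP_SQL = f"""
CREATE OR REPLACE TABLE `{TABLE}` AS
SELECT DISTINCT *
FROM `{TABLE}`
"""

# Month as an integer (1-12) and as a full name such as 'January'.
MONTH_SQL = f"""
SELECT
  EXTRACT(MONTH FROM {DATE_COLUMN}) AS month_number,
  FORMAT_DATE('%B', {DATE_COLUMN}) AS month_name
FROM `{TABLE}`
"""

print(DEDUP_SQL)
print(MONTH_SQL)
```

Rewriting a table this way scans and rewrites all of its data, so it is worth trying the SELECT on its own first.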

Question 2

Asked in

Data Engineer Interview

Q2. Which operator is used in Cloud Composer to move data from GCS to BigQuery?

Answer

The operator used in Composer to load data from GCS into BigQuery is GCSToBigQueryOperator.

GCSToBigQueryOperator ships with the Google provider package for Apache Airflow, which is the underlying technology of Cloud Composer.
It loads files from Google Cloud Storage (GCS) into a BigQuery table.
The source is set with the bucket and source_objects parameters and the destination with destination_project_dataset_table; options such as source_format and write_disposition control how the files are loaded.
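A minimal sketch of how such a task might be configured follows. The bucket, object path, table, and task names are hypothetical, and the operator import is deferred into a function so the snippet can be read and run without Airflow installed; in a real DAG file the import would normally sit at the top.

```python
# Keyword arguments for GCSToBigQueryOperator. Bucket, object path, and table
# names are hypothetical placeholders.
GCS_TO_BQ_KWARGS = {
    "task_id": "load_orders_to_bq",
    "bucket": "my-landing-bucket",
    "source_objects": ["orders/2024-01-01/*.csv"],
    "destination_project_dataset_table": "my_project.my_dataset.orders",
    "source_format": "CSV",
    "skip_leading_rows": 1,                 # skip the CSV header row
    "autodetect": True,                     # let BigQuery infer the schema
    "write_disposition": "WRITE_TRUNCATE",  # replace the table contents
}


def build_load_task(dag):
    """Create the load task inside the given DAG (needs the Google provider)."""
    # Imported lazily so this module can be imported without Airflow installed.
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
        GCSToBigQueryOperator,
    )

    return GCSToBigQueryOperator(dag=dag, **GCS_TO_BQ_KWARGS)
```

Calling `build_load_task(dag)` inside a DAG definition would register the task; WRITE_APPEND is the alternative write disposition if existing rows should be kept.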

Question 3

Asked in

Data Engineer Interview

Q3. Write code for this: input = [1,2,3,4], output = [1,4,9,16]

Answer

The task is to square each element of the input list.

Iterate through the input list and square each element.
Store the squared values in a new list to get the desired output, leaving the input unchanged.
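The two steps above can be sketched in Python with a list comprehension. The function name `square_all` is just an illustrative choice.

```python
def square_all(values):
    """Return a new list containing the square of each element of `values`."""
    return [v * v for v in values]


numbers = [1, 2, 3, 4]
print(square_all(numbers))  # [1, 4, 9, 16]
```

The comprehension builds a new list, so `numbers` itself is not modified.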

Question 4

Asked in

Data Engineer Interview

Q4. Explain the architecture of BigQuery and its query optimization techniques.

Answer

BigQuery separates storage from compute: data lives in columnar storage, and queries run on a distributed execution engine.

BigQuery stores data in the Capacitor columnar format on the Colossus distributed file system.
Queries are executed by the Dremel engine, which distributes the work across many workers for parallel processing.
Query optimization techniques include partitioning tables, clustering tables, selecting only the columns you need, and relying on cached query results.
Filtering on the partitioning column of a partitioned table lets BigQuery skip partitions that cannot match, which reduces the data scanned.
Clustering tables on certain columns, such as those used in filters, can improve query performance.
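Partitioning and clustering can be illustrated with a pair of BigQuery statements, held here as Python string constants. This is a sketch under assumptions: the dataset, table, and column names are hypothetical, and the date range and customer ID are made-up filter values.

```python
# Hypothetical table and column names, used only for illustration.
CREATE_TABLE_SQL = """
CREATE TABLE my_dataset.orders
(
  order_id STRING,
  customer_id STRING,
  order_ts TIMESTAMP,
  amount NUMERIC
)
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id
"""

# Filtering on the partitioning column limits the partitions that are read;
# filtering on the clustering column narrows the blocks read within them.
PRUNED_QUERY_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM my_dataset.orders
WHERE order_ts >= TIMESTAMP '2024-01-01'
  AND order_ts < TIMESTAMP '2024-02-01'
  AND customer_id = 'C123'
GROUP BY customer_id
"""

print(CREATE_TABLE_SQL)
print(PRUNED_QUERY_SQL)
```

The query also names only the columns it needs instead of using SELECT *, which matters because BigQuery storage is columnar.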

Question 5

Asked in

Data Engineer Interview

Q5. What is the difference between Bigtable and BigQuery?

Answer

Bigtable is a NoSQL wide-column database for low-latency, high-throughput reads and writes, while BigQuery is a fully managed data warehouse for running analytical SQL queries.

Bigtable serves operational workloads, such as time-series or IoT data, where rows are looked up by key with low latency; BigQuery is built for scans and aggregations over large datasets.
Bigtable stores large amounts of sparse or semi-structured data indexed by a single row key, while BigQuery analyzes structured data using SQL queries.
Bigtable is suitable for real-time data processing and high-throughput workloads.

Question 6

Asked in

Data Engineer Interview

Q6. RDD vs DataFrame vs Dataset in PySpark

Answer

RDDs are Spark's low-level API and DataFrames add named columns and query optimization; typed Datasets exist only in Scala and Java, not in PySpark.

RDD (Resilient Distributed Dataset) is the basic abstraction in Spark, representing an immutable distributed collection of objects.
A DataFrame is a distributed collection of data organized into named columns, similar to a table in a relational database; its queries are planned by the Catalyst optimizer.
A Dataset is a distributed collection of data that uses custom classes for compile-time type safety. The Dataset API is available in Scala and Java only; PySpark exposes RDDs and DataFrames.
DataFrames and Datasets are built on top of RDDs, providing a more structured, higher-level API.

Question 7

Asked in

Data Engineer Interview

Q7. Dataflow vs Dataproc

Answer

Dataflow is a fully managed stream and batch processing service based on Apache Beam, while Dataproc is a managed Apache Spark and Hadoop service.

Dataflow is a serverless data processing service that automatically scales workers to handle your data, while Dataproc is a managed Spark and Hadoop service in which you typically provision and size clusters yourself.
Dataflow is designed for both batch and stream processing with a single programming model, while Dataproc is most often used to run existing Spark or Hadoop jobs, usually batch, although Spark streaming jobs can run on it too.
