Capgemini
APECO Infrastructure Interview Questions and Answers
Q1. How do you read a Parquet file, call a notebook from ADF, handle the Azure DevOps CI/CD process, and use system variables in ADF?
Answers to common Azure Data Engineer interview topics
To read a Parquet file, use the PyArrow or pandas library (pd.read_parquet), or spark.read.parquet in PySpark
To call a notebook from ADF, use the Databricks Notebook activity in an ADF pipeline
For the Azure DevOps CI/CD process, use Azure Pipelines (build and release stages)
System variables in ADF can be accessed through expressions such as @pipeline().RunId or @pipeline().TriggerTime
Q2. What is the difference between persist and cache in PySpark?
persist() and cache() both mark a DataFrame or RDD for reuse, but cache() is simply persist() called with the default storage level.
persist() lets you specify a storage level (MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY, etc.), while cache() takes no arguments
The default level is MEMORY_ONLY for RDDs and MEMORY_AND_DISK for DataFrames
Use persist() when you need control over where the data is stored; use cache() for the common default case
Q3. How did you migrate Oracle data into Azure?
I migrated Oracle data into Azure using Azure Data Factory and Azure Database Migration Service.
Used Azure Data Factory to create pipelines for data migration
Utilized Azure Database Migration Service for schema and data migration
Ensured data consistency and integrity during the migration process
Q4. Explain SCD Type 1 and SCD Type 2 in ADF with an example
SCD Type 1 and SCD Type 2 are Slowly Changing Dimension patterns for handling changes in dimension data, typically implemented in ADF mapping data flows.
SCD Type 1 overwrites the existing row with the new value, keeping no history (e.g. a customer's city changes from Pune to Mumbai and the row is simply updated).
SCD Type 2 preserves history by expiring the current row (setting an end date or is_current flag) and inserting a new row with the changed value.
In ADF, both patterns are built in mapping data flows using lookup, derived column, and Alter Row (insert/update) transformations against the sink table.
Q5. How do you read a CSV file in PySpark?
Use SparkSession's DataFrameReader to load a CSV file into a Spark DataFrame.
Create a SparkSession and call spark.read.csv (or spark.read.format("csv").load)
Specify the file path when reading the CSV file
Use options such as header and inferSchema so the file is parsed correctly
Q6. Remove duplicates
Use DISTINCT keyword in SQL to remove duplicates from a dataset.
Use SELECT DISTINCT column_name FROM table_name to retrieve unique values from a specific column.
Use SELECT DISTINCT * FROM table_name to retrieve unique rows from the entire table.
Use GROUP BY with COUNT(*) and HAVING to identify duplicate rows before deciding how to remove them.
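The DISTINCT and GROUP BY patterns above can be demonstrated on an in-memory SQLite table; the table name and data are made up for illustration:

```python
# SQL deduplication patterns on an in-memory SQLite database
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("A", "Pune"), ("A", "Pune"), ("B", "Delhi")])

# SELECT DISTINCT * removes duplicate rows
unique_rows = con.execute("SELECT DISTINCT * FROM orders").fetchall()

# GROUP BY ... HAVING COUNT(*) > 1 identifies which rows are duplicated
duplicates = con.execute(
    "SELECT customer, city, COUNT(*) FROM orders "
    "GROUP BY customer, city HAVING COUNT(*) > 1").fetchall()

print(sorted(unique_rows))  # [('A', 'Pune'), ('B', 'Delhi')]
print(duplicates)           # [('A', 'Pune', 2)]
```

In PySpark the equivalent DataFrame call is `df.dropDuplicates()`.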