
Capgemini


APECO Infrastructure Interview Questions and Answers

Updated 21 Jan 2025

Q1. How do you read a Parquet file, how do you call a notebook from ADF, what is the Azure DevOps CI/CD process, and what are system variables in ADF?

Ans.

These topics cover common Azure Data Engineer tasks across ADF, PySpark, and Azure DevOps; a short Parquet-reading sketch follows the list below.

  • To read a Parquet file, use the PyArrow or pandas library (or spark.read.parquet in PySpark)

  • To call a Databricks notebook from ADF, add a Notebook activity to the ADF pipeline

  • For CI/CD in Azure DevOps, use Azure Pipelines to build and release the ADF artifacts

  • System variables in ADF are accessed through expressions such as @pipeline().RunId or @trigger().startTime
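
A minimal sketch of the Parquet-reading part, assuming a local file (the path data/events.parquet is a placeholder) and, for the Spark variant, an available Spark session:

```python
# Minimal sketch: three common ways to read a Parquet file.
# The path "data/events.parquet" is a placeholder.
import pandas as pd
import pyarrow.parquet as pq
from pyspark.sql import SparkSession

# 1) pandas (uses PyArrow or fastparquet under the hood)
df_pd = pd.read_parquet("data/events.parquet")

# 2) PyArrow directly, converting to pandas when needed
table = pq.read_table("data/events.parquet")
df_arrow = table.to_pandas()

# 3) PySpark, for distributed reads
spark = SparkSession.builder.appName("read-parquet").getOrCreate()
df_spark = spark.read.parquet("data/events.parquet")
df_spark.show(5)
```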


Q2. What is the difference between persist and cache in PySpark?

Ans.

persist and cache both keep a DataFrame or RDD around so it is not recomputed on reuse; cache uses the default storage level, while persist lets you choose the storage level explicitly.

  • cache() is shorthand for persist() with the default storage level (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames)

  • persist() accepts an explicit StorageLevel, e.g. memory only, memory and disk, or disk only, with optional serialization and replication

  • Use persist() when you need control over where the data is stored; use cache() when the default is good enough (see the sketch below)
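
A minimal sketch contrasting the two, assuming a local Spark session; the DataFrame is synthetic:

```python
# Minimal sketch of cache() vs persist() with an explicit StorageLevel.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-vs-cache").getOrCreate()
df = spark.range(1_000_000)

# cache(): default storage level (MEMORY_AND_DISK for DataFrames)
df.cache()
df.count()                       # first action materializes the cache

# persist(): pick the storage level explicitly, e.g. disk only
doubled = df.selectExpr("id * 2 AS doubled")
doubled.persist(StorageLevel.DISK_ONLY)
doubled.count()

# Release the stored data when it is no longer needed
df.unpersist()
doubled.unpersist()
```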


Q3. How did you migrate Oracle data into Azure?

Ans.

I migrated Oracle data into Azure using Azure Data Factory and Azure Database Migration Service.

  • Used Azure Data Factory to create pipelines for data migration

  • Utilized Azure Database Migration Service for schema and data migration

  • Validated data consistency and integrity after each load, e.g. by comparing source and target row counts (a hedged PySpark alternative for the bulk copy step is sketched below)
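
The answer above describes an ADF/DMS process rather than code; as a hedged, programmatic alternative for the bulk copy step only, here is a Spark JDBC sketch. The JDBC URL, schema/table, credentials, and ADLS path are all placeholders, and the Oracle JDBC driver is assumed to be available on the cluster:

```python
# Hedged sketch (not the ADF/DMS approach above): pull an Oracle table with
# Spark's JDBC reader and land it in ADLS Gen2 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-to-adls").getOrCreate()

oracle_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")  # placeholder
    .option("dbtable", "SALES.ORDERS")                               # placeholder
    .option("user", "etl_user")                                      # placeholder
    .option("password", "<secret-from-key-vault>")                   # placeholder
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Write to ADLS Gen2 as Parquet (path is a placeholder)
oracle_df.write.mode("overwrite").parquet(
    "abfss://raw@mydatalake.dfs.core.windows.net/oracle/orders/"
)
```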


Q4. SCD Type 1 and SCD Type 2 in ADF, explained with an example

Ans.

SCD Type 1 and SCD Type 2 ("SDC" in the question is a common typo for SCD) are slowly changing dimension patterns; in ADF they are usually implemented with mapping data flows.

  • SCD Type 1 overwrites the old attribute value in place, so no history is kept

  • SCD Type 2 keeps history: the current row is closed (end date set, current flag cleared) and a new row is inserted with the new value and a new effective date

  • Example: when a customer changes address, Type 1 simply updates the address column, while Type 2 expires the old row and appends a new row with the new address

  • In an ADF mapping data flow this is built with a lookup against the dimension, derived columns for the dates and flags, an Alter Row transformation, and an upsert to the sink; a PySpark sketch of the Type 2 logic follows
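
A hedged PySpark sketch of the SCD Type 2 logic described above; the dimension rows, column names, and dates are illustrative, and a production version would more likely use a Delta Lake MERGE:

```python
# Hedged sketch: SCD Type 2 on a tiny customer dimension (illustrative data).
# SCD Type 1 would simply overwrite `address` in place instead.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

dim = spark.createDataFrame(
    [(1, "Alice", "Old Street 1", "2020-01-01", "9999-12-31", True)],
    ["customer_id", "name", "address", "start_date", "end_date", "is_current"],
)
updates = spark.createDataFrame(
    [(1, "New Avenue 9")],
    ["customer_id", "new_address"],
)

joined = dim.join(updates, "customer_id", "left")
changed = F.col("new_address").isNotNull() & (F.col("address") != F.col("new_address"))
today = F.current_date().cast("string")

# Close the current row for changed customers
expired = (
    joined.filter(changed)
    .withColumn("end_date", today)
    .withColumn("is_current", F.lit(False))
    .drop("new_address")
)

# Append a new current row carrying the new value
new_rows = (
    joined.filter(changed)
    .withColumn("address", F.col("new_address"))
    .withColumn("start_date", today)
    .withColumn("end_date", F.lit("9999-12-31"))
    .withColumn("is_current", F.lit(True))
    .drop("new_address")
)

# Rows that did not change are kept as-is
unchanged = joined.filter(~changed).drop("new_address")

result = unchanged.unionByName(expired).unionByName(new_rows)
result.orderBy("customer_id", "start_date").show()
```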


Q5. Read a CSV file in PySpark

Ans.

Read a CSV file in PySpark

  • Use SparkSession to create a Spark DataFrame from the CSV file

  • Specify the file path and format when reading the CSV file

  • Use options like 'header' and 'inferSchema' so column names and types are read correctly (see the sketch below)
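
A minimal sketch, assuming a local Spark session; the path data/customers.csv is a placeholder:

```python
# Minimal sketch: reading a CSV file into a PySpark DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-csv").getOrCreate()

df = (
    spark.read
    .option("header", True)        # first line contains column names
    .option("inferSchema", True)   # let Spark infer column types
    .csv("data/customers.csv")     # placeholder path
)

df.printSchema()
df.show(5)
```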


Q6. Remove duplicates

Ans.

Use DISTINCT keyword in SQL to remove duplicates from a dataset.

  • Use SELECT DISTINCT column_name FROM table_name to retrieve unique values from a specific column.

  • Use SELECT DISTINCT * FROM table_name to retrieve unique rows from the entire table.

  • Use GROUP BY to collapse duplicates on specific columns, or GROUP BY ... HAVING COUNT(*) > 1 to find them; in PySpark, dropDuplicates() gives the same result (see the sketch below)
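
A minimal sketch showing both the SQL DISTINCT approach and the PySpark equivalent; the sample rows and column names are illustrative:

```python
# Minimal sketch: removing duplicates with SQL DISTINCT and with dropDuplicates().
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedupe").getOrCreate()

df = spark.createDataFrame(
    [(1, "alice@example.com"), (1, "alice@example.com"), (2, "bob@example.com")],
    ["user_id", "email"],
)
df.createOrReplaceTempView("users")

# SQL: unique rows, and unique values of one column
spark.sql("SELECT DISTINCT * FROM users").show()
spark.sql("SELECT DISTINCT email FROM users").show()

# DataFrame API equivalents
df.dropDuplicates().show()                 # whole-row duplicates
df.dropDuplicates(["user_id"]).show()      # duplicates on specific columns
```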

Interview Process at APECO Infrastructure

Based on 7 interviews in the last year: 1 interview round (Technical Round).