Diggibyte Technologies Azure Data Engineer Interview Questions, Process, and Tips

Updated 15 Nov 2022

Diggibyte Technologies Azure Data Engineer Interview Experiences

1 interview found

I applied via Naukri.com and was interviewed in May 2022. There were 2 interview rounds.

Round 1 - Resume Shortlist 
Round 2 - One-on-one 

(7 Questions)

  • Q1. What is the Spark architecture? What is Azure SQL?
  • Ans. 

    Spark architecture is a distributed computing framework that processes large datasets in parallel across a cluster of nodes.

    • Spark has a master-slave architecture with a driver program that communicates with the cluster manager to allocate resources and tasks to worker nodes.

    • Worker nodes execute tasks in parallel and store data in memory or disk.

    • Spark supports various data sources and APIs for batch processing, streaming, and more.

  • Answered by AI
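A minimal sketch of that driver/worker split, assuming a plain local PySpark install; the master URL, app name, and numbers are illustrative:

    from pyspark.sql import SparkSession

    # The driver program: creates the SparkSession and talks to the cluster manager.
    # "local[4]" is an illustrative master URL that runs 4 worker threads in-process.
    spark = (SparkSession.builder
             .master("local[4]")
             .appName("architecture-demo")
             .getOrCreate())

    # The driver defines the job; tasks execute in parallel on the workers.
    rdd = spark.sparkContext.parallelize(range(1_000_000), numSlices=4)
    print(rdd.map(lambda x: x * x).sum())  # work is split across the 4 partitions

    spark.stop()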
  • Q2. What is a DAG? What is an RDD?
  • Ans. 

    DAG stands for Directed Acyclic Graph and is a way to represent dependencies between tasks. RDD stands for Resilient Distributed Datasets and is a fundamental data structure in Apache Spark.

    • DAG is used to represent a series of tasks or operations where each task depends on the output of the previous task.

    • RDD is a distributed collection of data that can be processed in parallel across multiple nodes in a cluster.

    • RDDs are immutable and fault-tolerant, and can be recomputed from their lineage if a partition is lost.

  • Answered by AI
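A hedged sketch of the two ideas together, assuming an existing SparkSession named spark: each transformation only extends the DAG, and nothing runs until the action at the end.

    rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])

    # Each transformation adds a node to the DAG; no computation happens yet.
    squared = rdd.map(lambda x: x * x)
    evens = squared.filter(lambda x: x % 2 == 0)

    # The action makes the scheduler turn the DAG into stages and tasks.
    print(evens.collect())  # [4, 16]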
  • Q3. What is serialization? What is a broadcast join?
  • Ans. 

    Serialization is the process of converting an object into a stream of bytes for storage or transmission.

    • Serialization is used to transfer objects between different applications or systems.

    • It allows objects to be stored in a file or database.

    • Serialization can be used for caching and improving performance.

    • Examples of serialization formats include JSON, XML, and binary formats like Protocol Buffers and Apache Avro.

  • Answered by AI
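The answer above covers only the serialization half. For the broadcast-join half, here is a hedged DataFrame sketch with illustrative tables: the small table is serialized once and shipped to every executor, so the large side is joined without a shuffle.

    from pyspark.sql import functions as F

    large = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "val"])
    small = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "tag"])

    # broadcast() hints Spark to replicate the small table to all executors.
    joined = large.join(F.broadcast(small), on="id", how="inner")
    joined.show()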
  • Q4. What are your roles and responsibilities in your current project?
  • Q5. What are accumulators? What are groupByKey and reduceByKey?
  • Ans. 

    Accumulators are variables used for aggregating data in Spark. GroupByKey and ReduceByKey are operations used for data transformation.

    • Accumulators are used to accumulate values across multiple tasks in a distributed environment.

    • GroupByKey groups all values that share a key into one collection per key.

    • ReduceByKey aggregates values per key with a reduce function, combining partially on each node before the shuffle.

    • GroupByKey is less efficient than ReduceByKey because it shuffles every key-value pair across the network.

  • Answered by AI
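A hedged sketch of both, assuming an existing SparkSession named spark; the keys and numbers are illustrative:

    pairs = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])

    # Accumulator: tasks can only add to it; the driver reads the total.
    seen = spark.sparkContext.accumulator(0)

    def count_and_pass(kv):
        seen.add(1)  # each record processed bumps the shared counter
        return kv

    # reduceByKey combines values per key map-side before shuffling.
    totals = pairs.map(count_and_pass).reduceByKey(lambda a, b: a + b)
    print(totals.collect())  # [('a', 4), ('b', 2)] (order may vary)
    print(seen.value)        # 3, reliable only after the action has run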
  • Q6. How do you choose a cluster to process the data? What are Azure services?
  • Ans. 

    Choose a cluster based on data size, complexity, and processing requirements.

    • Consider the size and complexity of the data to be processed.

    • Determine the processing requirements, such as batch or real-time processing.

    • Choose a cluster with appropriate resources, such as CPU, memory, and storage.

    • Examples of Azure clusters include HDInsight, Databricks, and Synapse Analytics.

  • Answered by AI
  • Q7. How do you create mount points? How do you load a data source into ADLS?
  • Ans. 

    To create mount points for ADLS in Databricks, use dbutils.fs.mount; to load a data source into ADLS, use Azure Data Factory or Azure Databricks.

    • Mount points are created in a Databricks workspace with dbutils.fs.mount, typically authenticating through a service principal

    • Mount points let you access data in ADLS as if it were a local file system

    • Data can be loaded into ADLS using various tools such as Azure Data Factory or Azure Databricks

  • Answered by AI
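A hedged Databricks sketch of mounting an ADLS Gen2 container with a service-principal OAuth configuration; every name below (storage account, container, secret scope, IDs) is a placeholder:

    # All names below are illustrative placeholders.
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": "<application-id>",
        "fs.azure.account.oauth2.client.secret":
            dbutils.secrets.get(scope="my-scope", key="sp-secret"),
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    dbutils.fs.mount(
        source="abfss://mycontainer@mystorageacct.dfs.core.windows.net/",
        mount_point="/mnt/raw",
        extra_configs=configs,
    )

    # After mounting, the container reads like a local path.
    df = spark.read.parquet("/mnt/raw/sales/")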

Interview Preparation Tips

Interview preparation tips for other job seekers - Keep learning until you get the job. The focus is more on practical knowledge.

Skills evaluated in this interview

Interview questions from similar companies

Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.

Round 1 - Technical 

(5 Questions)

  • Q1. How would you create a pipeline for ADLS to SQL data movement?
  • Q2. How would you create a pipeline from a REST API to ADLS? What if there are 8 million rows of records?
  • Q3. If data needs filtering, joining, and aggregation, how would you do it with ADF?
  • Q4. Explain the medallion architecture.
  • Q5. Explain the medallion architecture with Databricks (a sketch follows after this list).
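For Q4/Q5, a hedged Delta Lake sketch of the bronze/silver/gold flow on Databricks; paths, columns, and transformations are illustrative:

    from pyspark.sql import functions as F

    # Bronze: raw data ingested as-is.
    bronze = spark.read.json("/mnt/raw/orders/")
    bronze.write.format("delta").mode("append").save("/mnt/bronze/orders")

    # Silver: cleaned and conformed (dedup, typed columns).
    silver = (spark.read.format("delta").load("/mnt/bronze/orders")
              .dropDuplicates(["order_id"])
              .withColumn("order_date", F.to_date("order_date")))
    silver.write.format("delta").mode("overwrite").save("/mnt/silver/orders")

    # Gold: business-level aggregates ready for BI.
    gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_spend"))
    gold.write.format("delta").mode("overwrite").save("/mnt/gold/customer_spend")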
Round 2 - HR 

(1 Question)

  • Q1. Basic questions and salary expectations.

Interview Preparation Tips

Topics to prepare for Capgemini Azure Data Engineer interview:
  • ADF
  • Databricks
Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Selected

I applied via Recruitment Consultant and was interviewed in Aug 2024. There were 3 interview rounds.

Round 1 - Technical 

(4 Questions)

  • Q1. Let's say table 1 has the values 1, 2, 3, 5, null, null, 0 and table 2 has null, 2, 4, 7, 3, 5. What would be the output after an inner join?
  • Ans. 

    The output after inner join of table 1 and table 2 will be 2,3,5.

    • Inner join only includes rows that have matching values in both tables.

    • Values 2, 3, and 5 are present in both tables, so they will be included in the output.

    • Null values are not considered as matching values in inner join.

  • Answered by AI
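A hedged PySpark check of that answer; the column name is illustrative. NULL never equals NULL, so null-keyed rows drop out of an inner join:

    t1 = spark.createDataFrame([(1,), (2,), (3,), (5,), (None,), (None,), (0,)], ["id"])
    t2 = spark.createDataFrame([(None,), (2,), (4,), (7,), (3,), (5,)], ["id"])

    # NULL = NULL evaluates to NULL (not true), so those rows never match.
    t1.join(t2, on="id", how="inner").show()  # rows 2, 3, 5 only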
  • Q2. Let's say you have a Customers table with CustomerID and customer name, and an Orders table with OrderID and CustomerID. Write a query to find the customer name who placed the maximum orders. If more than one person... (a sketch follows after this list)
  • Q3. Spark Architecture, Optimisation techniques
  • Q4. Some personal questions.
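For Q2, a hedged Spark SQL sketch; table and column names follow the question, and RANK() keeps every tied customer (the truncated part of the question presumably asks about ties):

    top_customers = spark.sql("""
        WITH order_counts AS (
            SELECT c.CustomerName,
                   COUNT(o.OrderID) AS n_orders,
                   RANK() OVER (ORDER BY COUNT(o.OrderID) DESC) AS rnk
            FROM Customers c
            JOIN Orders o ON o.CustomerID = c.CustomerID
            GROUP BY c.CustomerName
        )
        SELECT CustomerName, n_orders
        FROM order_counts
        WHERE rnk = 1  -- RANK keeps all customers tied for the maximum
    """)
    top_customers.show()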
Round 2 - Technical 

(5 Questions)

  • Q1. Explain the entire architecture of a recent project you are working on in your organisation.
  • Ans. 

    The project involves building a data pipeline to ingest, process, and analyze large volumes of data from various sources in Azure.

    • Utilizing Azure Data Factory for data ingestion and orchestration

    • Implementing Azure Databricks for data processing and transformation

    • Storing processed data in Azure Data Lake Storage

    • Using Azure Synapse Analytics for data warehousing and analytics

    • Leveraging Azure DevOps for CI/CD pipeline automation

  • Answered by AI
  • Q2. How do you design an effective ADF pipeline and what all metrics and considerations you should keep in mind while designing?
  • Ans. 

    Designing an effective ADF pipeline involves considering various metrics and factors.

    • Understand the data sources and destinations

    • Identify the dependencies between activities

    • Optimize data movement and processing for performance

    • Monitor and track pipeline execution for troubleshooting

    • Consider security and compliance requirements

    • Use parameterization and dynamic content for flexibility

    • Implement error handling and retries for reliability

  • Answered by AI
  • Q3. Let's say you have a very large data volume; in terms of performance, how would you slice and dice the data to boost performance?
  • Q4. Let's say you have to reconstruct a table while preserving the historical data. (I couldn't answer this, but please refer to SCD; a sketch follows after this list.)
  • Q5. We have both ADF and Databricks; I can achieve transformation, fetching the data, and loading the dimension layer with ADF too, so why do we use Databricks if both have similar functionality for a few ...
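For Q4, a hedged SCD Type 2 sketch using a Delta Lake MERGE on Databricks; table names, keys, and columns are illustrative. Changed rows are closed out and re-inserted rather than overwritten, which preserves history:

    from delta.tables import DeltaTable
    from pyspark.sql import functions as F

    dim = DeltaTable.forPath(spark, "/mnt/gold/dim_customer")
    # For brevity, assume `updates` holds only new or genuinely changed customers.
    updates = spark.read.format("delta").load("/mnt/silver/customer_changes")

    # Step 1: close out the current rows whose attributes changed.
    (dim.alias("d")
     .merge(updates.alias("u"),
            "d.customer_id = u.customer_id AND d.is_current = true")
     .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
     .execute())

    # Step 2: append the incoming versions as the new current rows.
    (updates
     .withColumn("is_current", F.lit(True))
     .withColumn("start_date", F.current_date())
     .withColumn("end_date", F.lit(None).cast("date"))
     .write.format("delta").mode("append").save("/mnt/gold/dim_customer"))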
Round 3 - HR 

(1 Question)

  • Q1. Basic HR questions

Interview Preparation Tips

Topics to prepare for Tech Mahindra Azure Data Engineer interview:
  • SQL
  • Databricks
  • Azure Data Factory
  • Pyspark
  • Spark
Interview preparation tips for other job seekers - The interviewers were really nice.

Skills evaluated in this interview

Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
-

I was interviewed in Dec 2024.

Round 1 - Technical 

(4 Questions)

  • Q1. Two Python questions based on string and list manipulation.
  • Q2. Two SQL questions.
  • Q3. One PySpark question.
  • Q4. What is a service principal? ADF questions...
Round 2 - HR 

(2 Questions)

  • Q1. Introduction, and a question on why I left my last company.
  • Q2. Salary Negotiation.
Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Not Selected

I applied via Company Website and was interviewed in Dec 2024. There was 1 interview round.

Round 1 - One-on-one 

(2 Questions)

  • Q1. SCD Type 1 and SCD Type 2 in Databricks
  • Q2. How to pass parameters from ADF to ADB (a sketch follows below)
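For Q2, a hedged sketch of the Databricks side: the ADF Databricks Notebook activity passes its baseParameters as notebook widgets, which the notebook reads with dbutils.widgets (the parameter names here are illustrative):

    # Inside the Databricks notebook invoked by the ADF Notebook activity.
    dbutils.widgets.text("run_date", "")     # default used if ADF passes nothing
    dbutils.widgets.text("source_path", "")

    run_date = dbutils.widgets.get("run_date")
    source_path = dbutils.widgets.get("source_path")

    df = spark.read.parquet(source_path)
    print(f"Processing {df.count()} rows for {run_date}")

    # Optionally return a value to ADF, visible in the activity's output.
    dbutils.notebook.exit(run_date)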

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare well on the basics of data engineering.
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Selected

I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. RDD, ADF Integration Runtime, Databricks, Spark architecture
  • Q2. Agile and project-related questions
Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Activities used in ADF
  • Ans. 

    Activities in Azure Data Factory (ADF) are the building blocks of a pipeline and perform various tasks like data movement, data transformation, and data orchestration.

    • Activities can be used to copy data from one location to another (Copy Activity)

    • Activities can be used to transform data using mapping data flows (Data Flow Activity)

    • Activities can be used to run custom code or scripts (Custom Activity)

    • Activities can be u...

  • Answered by AI
  • Q2. DataFrames in PySpark
  • Ans. 

    DataFrames in PySpark are distributed collections of data organized into named columns.

    • Dataframes are similar to tables in a relational database, with rows and columns.

    • They can be created from various data sources like CSV, JSON, Parquet, etc.

    • Dataframes support SQL queries and transformations using PySpark functions.

    • Example: df = spark.read.csv('file.csv')

  • Answered by AI
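A slightly fuller hedged sketch expanding the df example above; the file name and columns are illustrative:

    from pyspark.sql import functions as F

    # header/inferSchema make the CSV read yield named, typed columns.
    df = spark.read.csv("file.csv", header=True, inferSchema=True)

    # Column-oriented transformations, much like SQL over a table.
    result = (df.filter(F.col("amount") > 0)
                .groupBy("category")
                .agg(F.avg("amount").alias("avg_amount")))
    result.show()

    # The same query via SQL after registering a temp view.
    df.createOrReplaceTempView("sales")
    spark.sql(
        "SELECT category, AVG(amount) FROM sales WHERE amount > 0 GROUP BY category"
    ).show()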
Round 2 - HR 

(2 Questions)

  • Q1. Managerial Questions
  • Q2. About project roles and responsibilities

Skills evaluated in this interview

Interview experience
3
Average
Difficulty level
Easy
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Recruitment Consultant and was interviewed in Mar 2024. There was 1 interview round.

Round 1 - Technical 

(4 Questions)

  • Q1. How are you connecting your on-prem environment to Azure?
  • Ans. 

    I connect on-prem environments to Azure using Azure ExpressRoute or VPN Gateway.

    • Use Azure ExpressRoute for a private connection over a dedicated circuit.

    • Set up a VPN Gateway for a secure connection over the internet.

    • Ensure proper network configurations and security settings.

    • Use an Azure Virtual Network Gateway to establish the connection.

    • Consider an Azure Site-to-Site VPN for connecting an on-premises network to an Azure Virtual Network.

  • Answered by AI
  • Q2. What is Autoloader in Databricks?
  • Ans. 

    Autoloader in Databricks is a feature that automatically loads new data files as they arrive in a specified directory.

    • Autoloader monitors a specified directory for new data files and loads them into a Databricks table.

    • It supports various file formats such as CSV, JSON, Parquet, Avro, and ORC.

    • Autoloader simplifies the process of ingesting streaming data into Databricks without the need for manual intervention.

    • It can be ...

  • Answered by AI
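A hedged Auto Loader sketch using the documented cloudFiles source; the paths are placeholders, and the availableNow trigger assumes a reasonably recent Databricks runtime:

    # Auto Loader incrementally picks up new files landing in the input path.
    stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", "/mnt/chk/orders_schema")
              .load("/mnt/raw/orders/"))

    (stream.writeStream
     .format("delta")
     .option("checkpointLocation", "/mnt/chk/orders")
     .trigger(availableNow=True)  # process whatever is new, then stop
     .start("/mnt/bronze/orders"))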
  • Q3. How do you normalize your JSON data?
  • Ans. 

    Json data normalization involves structuring data to eliminate redundancy and improve efficiency.

    • Identify repeating groups of data

    • Create separate tables for each group

    • Establish relationships between tables using foreign keys

    • Eliminate redundant data by referencing shared values

  • Answered by AI
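In PySpark terms, normalizing nested JSON usually means flattening structs and exploding arrays; a hedged sketch with an illustrative path and schema:

    from pyspark.sql import functions as F

    raw = spark.read.json("/mnt/raw/customers.json")  # illustrative path

    # Flatten a nested struct and explode a repeated group into child rows.
    flat = (raw
            .select("id",
                    F.col("address.city").alias("city"),
                    F.explode("orders").alias("order"))
            .select("id", "city",
                    F.col("order.order_id").alias("order_id"),
                    F.col("order.amount").alias("amount")))
    flat.show()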
  • Q4. How do you read from Kafka?
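Q4 went unanswered above; a hedged Structured Streaming sketch of reading from Kafka, with the broker address and topic name as placeholders:

    from pyspark.sql import functions as F

    kafka_df = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
                .option("subscribe", "events")                      # placeholder topic
                .option("startingOffsets", "latest")
                .load())

    # Kafka delivers key/value as binary; cast before parsing.
    events = kafka_df.select(F.col("value").cast("string").alias("json"))

    (events.writeStream
     .format("console")
     .option("checkpointLocation", "/tmp/chk/events")
     .start())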

Interview Preparation Tips

Interview preparation tips for other job seekers - Focus on core technical skills.

Skills evaluated in this interview

Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
4-6 weeks
Result
No response

I applied via Referral and was interviewed in May 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. What is PolyBase?
  • Ans. 

    Polybase is a feature in Azure SQL Data Warehouse that allows users to query data stored in Hadoop or Azure Blob Storage.

    • Polybase enables users to access and query external data sources without moving the data into the database.

    • It provides a virtualization layer that allows SQL queries to seamlessly integrate with data stored in Hadoop or Azure Blob Storage.

    • PolyBase can significantly improve query performance by leveraging parallel data processing.

  • Answered by AI
  • Q2. Explain your current project architecture.
Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(1 Question)

  • Q1. Spark Architecture
Round 2 - Technical 

(2 Questions)

  • Q1. Explain your project
  • Q2. Remove duplicates
  • Ans. 

    Use DISTINCT keyword in SQL to remove duplicates from a dataset.

    • Use SELECT DISTINCT column_name FROM table_name to retrieve unique values from a specific column.

    • Use SELECT DISTINCT * FROM table_name to retrieve unique rows from the entire table.

    • Use a GROUP BY clause to collapse duplicate rows based on specific columns; COUNT() helps identify them.

  • Answered by AI
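Alongside the SQL DISTINCT approach above, a hedged PySpark equivalent; the column names are illustrative:

    df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "val"])

    df.distinct().show()              # unique full rows
    df.dropDuplicates(["id"]).show()  # one row per id, arbitrary survivor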
Round 3 - HR 

(1 Question)

  • Q1. Salary expectations

Skills evaluated in this interview

Diggibyte Technologies Interview FAQs

How many rounds are there in Diggibyte Technologies Azure Data Engineer interview?
The Diggibyte Technologies interview process usually has 2 rounds. The most common rounds are Resume Shortlist and One-on-one.
What are the top questions asked in Diggibyte Technologies Azure Data Engineer interview?

Some of the top questions asked at the Diggibyte Technologies Azure Data Engineer interview -

  1. How to choose a cluster to process the data? What are Azure services?
  2. How to create mount points? How to load a data source into ADLS?
  3. What are accumulators? What are groupByKey and reduceByKey?


Data Engineer (27 salaries): ₹3 L/yr - ₹10 L/yr

Scrum Master (4 salaries): ₹11 L/yr - ₹19 L/yr

Front end Developer (4 salaries): ₹3 L/yr - ₹12.5 L/yr

Qliksense Developer (4 salaries): ₹5 L/yr - ₹7.7 L/yr

Data Scientist (3 salaries): ₹3.7 L/yr - ₹10 L/yr
Compare Diggibyte Technologies with: Infosys (3.6), TCS (3.7), Wipro (3.7), HCLTech (3.5)