Diggibyte Technologies
I applied via Naukri.com and was interviewed in May 2022. There were 2 interview rounds.
Spark is a distributed computing framework that processes large datasets in parallel across a cluster of nodes.
Spark follows a master-worker architecture: a driver program requests resources from the cluster manager and schedules tasks on executors running on the worker nodes.
Executors on the worker nodes run tasks in parallel and hold data in memory or on disk.
Spark supports various data sources and APIs for batch processing, streamin...
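The driver/worker split described above can be sketched in plain Python. This is only a toy model, not Spark code: threads stand in for executors, and the names split_into_partitions, run_task, and driver are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal partitions, one per task."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(partition):
    """Work done by one task: sum the squares of the values in its partition."""
    return sum(x * x for x in partition)


def driver(data, num_workers=4):
    """Toy driver: create tasks, run them in parallel, combine the results."""
    partitions = split_into_partitions(data, num_workers)
    # Each partition becomes one task; threads stand in for executors here.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(run_task, partitions))
    # The driver merges the per-task results into the final answer.
    return sum(partial_results)


total = driver(list(range(10)))
print(total)
```

In real Spark the cluster manager decides where executors run; the sketch only shows how work is divided into tasks and how the results are merged.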
DAG stands for Directed Acyclic Graph and is a way to represent dependencies between tasks. RDD stands for Resilient Distributed Dataset and is a fundamental data structure in Apache Spark.
In Spark, the DAG records the operations in a job: each step depends on the output of earlier steps, and no step depends on itself, directly or indirectly.
An RDD is a distributed collection of data that can be processed in parallel across multiple nodes in a cluster.
RDDs ar...
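As a rough illustration of the DAG idea, the sketch below uses Python's standard graphlib module to order a set of dependent steps. The step names are hypothetical, and this is not how Spark builds its DAG internally; it only shows that an acyclic dependency graph always has a valid execution order.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
# The step names are invented for this example.
lineage = {
    "read_orders": set(),
    "read_customers": set(),
    "filter_orders": {"read_orders"},
    "join": {"filter_orders", "read_customers"},
    "aggregate": {"join"},
}

# static_order() yields every step after all of its dependencies.
# It raises graphlib.CycleError if the graph contains a cycle.
execution_order = list(TopologicalSorter(lineage).static_order())
print(execution_order)
```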
Serialization is the process of converting an object into a stream of bytes for storage or transmission.
Serialization is used to transfer objects between different applications or systems.
It allows objects to be stored in a file or database.
Serialization can be used for caching and improving performance.
Examples of serialization formats include JSON, XML, and binary formats like Protocol Buffers and Apache Avro.
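A minimal sketch of a serialization round trip, using only Python's standard library. The record contents are made up; json produces a text format that other languages can read, while pickle produces a Python-specific binary format.

```python
import json
import pickle

# A made-up record to serialize.
event = {"user_id": 42, "action": "login", "tags": ["web", "mobile"]}

# JSON: text-based and language-neutral; encode to bytes for storage or transfer.
json_bytes = json.dumps(event).encode("utf-8")
restored_from_json = json.loads(json_bytes.decode("utf-8"))

# pickle: binary and Python-only; only unpickle data from a trusted source.
pickle_bytes = pickle.dumps(event)
restored_from_pickle = pickle.loads(pickle_bytes)

print(restored_from_json == event, restored_from_pickle == event)
```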
Accumulators are shared variables used for aggregating values in Spark. groupByKey and reduceByKey are transformations on key-value (pair) RDDs.
Accumulators let tasks running across the cluster add to a shared value, such as a counter or a sum, which the driver then reads.
groupByKey groups all the values for each key, producing one (key, values) pair per key.
reduceByKey merges the values for each key with a reduce function, producing a single value per key.
GroupByKey is less...
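The difference between the two operations can be modeled in plain Python. This is a sketch of the semantics only, not the Spark API: group_by_key, reduce_by_key, and the Accumulator class are hypothetical stand-ins for rdd.groupByKey(), rdd.reduceByKey(func), and a Spark accumulator.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]


def group_by_key(pairs):
    """Collect every value for each key into a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_by_key(pairs, func):
    """Fold the values for each key into one value as they arrive."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


class Accumulator:
    """Toy accumulator: tasks only add to it, the driver reads .value."""

    def __init__(self, initial=0):
        self.value = initial

    def add(self, amount):
        self.value += amount


grouped = group_by_key(pairs)
summed = reduce_by_key(pairs, lambda x, y: x + y)

# Count records with a value above 3 as a side effect of processing.
large_values = Accumulator()
for _, value in pairs:
    if value > 3:
        large_values.add(1)

print(grouped, summed, large_values.value)
```

Note that group_by_key has to hold every value for a key before anything can be computed, while reduce_by_key keeps only one running value per key.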
Choose a cluster based on data size, complexity, and processing requirements.
Consider the size and complexity of the data to be processed.
Determine the processing requirements, such as batch or real-time processing.
Choose a cluster with appropriate resources, such as CPU, memory, and storage.
Azure services that provide such clusters include HDInsight, Azure Databricks, and Azure Synapse Analytics.
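One rough way to relate data size to cluster resources is to estimate how many tasks a job will produce and how many rounds ("waves") of tasks the available cores need to run them. The sketch below is a back-of-the-envelope heuristic, not an official sizing rule; the function name and the 128 MB target partition size are assumptions chosen for illustration.

```python
import math


def estimate_tasks_and_waves(data_size_gb, total_cores, partition_mb=128):
    """Estimate input partitions (tasks) and the waves needed to run them.

    partition_mb is an assumed target partition size, not a fixed rule.
    """
    partitions = math.ceil(data_size_gb * 1024 / partition_mb)
    waves = math.ceil(partitions / total_cores)
    return partitions, waves


# Hypothetical workload: 100 GB of input on a cluster with 64 cores in total.
partitions, waves = estimate_tasks_and_waves(data_size_gb=100, total_cores=64)
print(partitions, waves)
```

More cores reduce the number of waves; memory and storage still have to be checked separately against the workload.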
In Azure Databricks, ADLS mount points are created from a notebook with dbutils.fs.mount. To load source data, use Azure Data Factory or Azure Databricks.
Mount points are created with dbutils.fs.mount, typically authenticating with a service principal; Azure Storage Explorer and the Azure Portal manage the storage account itself but do not create mounts
To load source data, use Azure Data Factory or Azure Databricks
Mount points let you access data in ADLS through a path such as /mnt/<name>, as if it were a local file system
Data can be loaded into ADLS using various tools such as Azure D...
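In Azure Databricks, a mount is typically created from a notebook with dbutils.fs.mount. The sketch below only builds the arguments for that call, so it runs anywhere; the helper name build_adls_mount_args and all account, container, and credential values are placeholders. It assumes ADLS Gen2 with a service principal (OAuth), and in practice the secret would come from a secret scope rather than a literal string.

```python
def build_adls_mount_args(storage_account, container, tenant_id,
                          client_id, client_secret, mount_name):
    """Build source, mount_point and extra_configs for dbutils.fs.mount."""
    extra_configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return {
        "source": f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        "mount_point": f"/mnt/{mount_name}",
        "extra_configs": extra_configs,
    }


# Placeholder values for illustration only.
mount_args = build_adls_mount_args(
    storage_account="examplestore",
    container="raw",
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="example-client-id",
    client_secret="example-secret",
    mount_name="raw",
)

# Inside a Databricks notebook, where dbutils is available:
# dbutils.fs.mount(**mount_args)
print(mount_args["source"], mount_args["mount_point"])
```

Once mounted, the data can be read through the mount path, for example /mnt/raw.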
I applied via Referral and was interviewed in Jul 2021. There was 1 interview round.
Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning, and graph processing.
Spark is designed for speed: it keeps intermediate data in memory, which often makes it faster than Hadoop MapReduce, especially for iterative workloads.
It supports multiple programming languages, including Scala, Java, Python, and R, allowing flexibility in development.
Spark can handle both batch and real-tim...
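SMB join is a join strategy used in Apache Hive for joining two large bucketed tables.
SMB join stands for Sort Merge Bucket join.
It is used when joining large tables.
Both tables must be bucketed and sorted on the join key, so that matching buckets can be merged directly on the map side.
It is an efficient join method for large tables that are already bucketed and sorted on the join key.
Example (Hive): SET hive.auto.convert.sortmerge.join=true; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SELECT * FROM table1 JOIN table2 ON table1.column = table2.column;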
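The idea behind a sort merge bucket join can be sketched in plain Python: both inputs are split into the same number of buckets by join key, each bucket is sorted, and matching buckets are merged with a two-pointer scan. This is an illustrative model with made-up data and function names, not the implementation used by any query engine.

```python
def bucket_and_sort(rows, num_buckets):
    """Assign (key, value) rows to buckets by key hash, then sort each bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return [sorted(bucket, key=lambda row: row[0]) for bucket in buckets]


def merge_join(left, right):
    """Join two key-sorted lists with a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for _, right_value in right[j:j_end]:
                    out.append((left_key, left[i][1], right_value))
                i += 1
            j = j_end
    return out


def smb_join(left_rows, right_rows, num_buckets=4):
    """Join bucket n of the left input only with bucket n of the right input."""
    left_buckets = bucket_and_sort(left_rows, num_buckets)
    right_buckets = bucket_and_sort(right_rows, num_buckets)
    result = []
    for left, right in zip(left_buckets, right_buckets):
        result.extend(merge_join(left, right))
    return result


orders = [(1, "book"), (2, "pen"), (1, "lamp"), (5, "desk")]
customers = [(1, "Asha"), (2, "Ravi"), (3, "Meera")]
joined = sorted(smb_join(orders, customers))
print(joined)
```

Because equal keys always land in the same bucket number on both sides, no bucket ever needs to be compared with a different bucket. In a real engine the bucketing and sorting are done once when the tables are written, so the join itself only pays for the merge.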
I applied via Naukri.com and was interviewed before Aug 2021. There were 3 interview rounds.
MCQ-based online test on the technology being interviewed for
I appeared for an interview in Nov 2020.
Design a scalable and efficient platform for booking accommodations, flights, and experiences.
User Interface: Create a user-friendly interface for searching and booking accommodations.
Database Design: Use a relational database for storing user data, bookings, and property details.
Search Functionality: Implement a robust search algorithm to filter results based on user preferences.
Scalability: Use microservices architec...
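The search and booking points above can be illustrated with a small in-memory sketch. Everything here is hypothetical (field names, sample listings, the search function); a real platform would run this logic against a database. It filters listings by city and price, then drops any listing with a booking that overlaps the requested dates.

```python
from datetime import date


def overlaps(a_start, a_end, b_start, b_end):
    """True if two stays overlap; the check-out day itself is treated as free."""
    return a_start < b_end and b_start < a_end


def search(listings, bookings, city, max_price, check_in, check_out):
    """Return ids of listings matching the filters and free for the dates."""
    results = []
    for listing in listings:
        if listing["city"] != city or listing["price"] > max_price:
            continue
        booked = any(
            listing_id == listing["id"] and overlaps(check_in, check_out, start, end)
            for listing_id, start, end in bookings
        )
        if not booked:
            results.append(listing["id"])
    return results


listings = [
    {"id": 1, "city": "Goa", "price": 80},
    {"id": 2, "city": "Goa", "price": 120},
    {"id": 3, "city": "Goa", "price": 60},
    {"id": 4, "city": "Pune", "price": 50},
]
bookings = [(3, date(2024, 1, 10), date(2024, 1, 15))]

available = search(listings, bookings, "Goa", 100,
                   date(2024, 1, 12), date(2024, 1, 14))
print(available)
```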
I applied via Company Website and was interviewed before Apr 2022. There were 3 interview rounds.
I joined as a fresher straight from college, so this round was an aptitude test.
I applied via Job Portal and was interviewed before Jul 2021. There were 2 interview rounds.
An easy coding test on HackerRank.
I applied via Campus Placement and was interviewed before Jul 2021. There were 3 interview rounds.
This round had aptitude and coding MCQs.
Here we had to write full-fledged code; there were 2 questions, and both were easy.