Deloitte
I applied via LinkedIn and was interviewed in Aug 2024. There were 2 interview rounds.
Medallion Architecture is a data design pattern that organizes lakehouse data into progressively refined layers.
Data flows through Bronze (raw), Silver (cleansed and validated), and Gold (business-level aggregate) layers
Each layer incrementally improves data quality, so downstream consumers work from curated, trusted tables
Commonly used on lakehouse platforms such as Databricks and Azure Synapse; a minimal sketch follows
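To make the layering concrete, here is a hedged PySpark sketch of the Bronze/Silver/Gold flow. The sample rows, column names (order_id, region, amount), and /tmp paths are illustrative assumptions, not from the interview; on Databricks each layer would normally be a Delta table rather than plain Parquet.

```python
# A minimal medallion sketch; data, column names, and paths are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("medallion").getOrCreate()

# Bronze: land the raw data as-is (here, a small inline sample)
raw = spark.createDataFrame(
    [("o1", "IN", "100"), ("o2", "US", "250"), ("o3", None, "80")],
    ["order_id", "region", "amount"],
)
raw.write.mode("overwrite").parquet("/tmp/bronze/sales/")

# Silver: cleanse and conform (drop incomplete rows, enforce types)
silver = (
    spark.read.parquet("/tmp/bronze/sales/")
    .dropna(subset=["region"])
    .withColumn("amount", F.col("amount").cast("double"))
)
silver.write.mode("overwrite").parquet("/tmp/silver/sales/")

# Gold: business-level aggregates ready for reporting
gold = silver.groupBy("region").agg(F.sum("amount").alias("total_sales"))
gold.write.mode("overwrite").parquet("/tmp/gold/sales_by_region/")
gold.show()
```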
Apache Spark is a distributed computing framework that provides an efficient way to process large datasets.
Spark Architecture consists of a driver program, cluster manager, and worker nodes.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark supports various programming languages like Scala, Java, Python, and SQL.
It includes components like Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX (see the sketch below)
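A small local demo of the pieces named above: the driver program creates a SparkSession, and work on the RDD/DataFrame is split across executor threads. master("local[*]") is an assumption for a single-machine demo; on a cluster a cluster manager such as YARN or Kubernetes would allocate the worker nodes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("arch-demo").getOrCreate()
sc = spark.sparkContext  # entry point for low-level RDD operations

# An RDD: a fault-tolerant, partitioned collection processed in parallel
rdd = sc.parallelize(range(1_000_000), numSlices=8)
print(rdd.map(lambda x: x * 2).sum())

# The same computation through Spark SQL, which runs on Spark Core
spark.range(1_000_000).selectExpr("sum(id * 2)").show()

spark.stop()
```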
Use SQL query to find the second highest salary in employee table
Order by salary descending and skip the top row; DISTINCT guards against ties at the highest salary
Example: SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 1 OFFSET 1 (MySQL shorthand: LIMIT 1, 1; T-SQL uses OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY). A runnable check follows
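A self-contained way to verify the query, using Python's built-in sqlite3; the employee rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("a", 100), ("b", 300), ("c", 200), ("d", 300)],
)

# DISTINCT collapses the tie at 300, so OFFSET 1 lands on the second
# highest distinct salary.
row = conn.execute(
    "SELECT DISTINCT salary FROM employee "
    "ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()
print(row)  # (200,)
```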
Partitioning in Azure data engineering involves dividing data into smaller chunks for better performance and manageability.
Partitioning can be done based on a specific column or key in the dataset
It helps in distributing data across multiple nodes for parallel processing
Partitioning can improve query performance by reducing the amount of data that needs to be scanned
In Azure Synapse Analytics dedicated SQL pools, you can choose ROUND_ROBIN, HASH, or REPLICATE table distributions; a file-level example follows
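A hedged sketch of column-based partitioning on write with PySpark (as on Azure Databricks); the sample rows, column names, and /tmp output path are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("partition-demo").getOrCreate()

df = spark.createDataFrame(
    [("2024-01-01", "IN", 10), ("2024-01-01", "US", 20), ("2024-01-02", "IN", 30)],
    ["order_date", "country", "amount"],
)

# Each distinct (order_date, country) pair becomes its own directory, so
# queries filtered on these columns scan only the matching partitions.
df.write.mode("overwrite").partitionBy("order_date", "country").parquet("/tmp/orders/")
```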
As an Azure Data Engineer, my current responsibilities include designing and implementing data solutions on Azure, optimizing data storage and processing, and ensuring data security and compliance.
Designing and implementing data solutions on Azure
Optimizing data storage and processing for performance and cost efficiency
Ensuring data security and compliance with regulations
Collaborating with data scientists and analysts
I applied via Recruitment Consultant and was interviewed in Nov 2024. There was 1 interview round.
Partition key is a field used to distribute data across multiple partitions in a database for scalability and performance.
Partition key determines the partition in which a row will be stored in a database.
It helps in distributing data evenly across multiple partitions to improve query performance.
Choosing the right partition key is crucial for efficient data storage and retrieval.
For example, in Azure Cosmos DB the partition key path (such as /userId) determines the logical partition each item is stored in; a sketch follows
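A sketch using the azure-cosmos Python SDK; the endpoint, key, database and container names, and the /userId partition key path are all placeholder assumptions.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    url="https://<account>.documents.azure.com:443/",  # placeholder endpoint
    credential="<account-key>",                        # placeholder key
)
db = client.create_database_if_not_exists("appdb")

# Every item's userId value decides which logical partition it lands in,
# so a high-cardinality, evenly accessed field makes a good key.
container = db.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/userId"),
)
container.upsert_item({"id": "1", "userId": "u42", "total": 99})
```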
Databricks is a unified analytics platform for big data and machine learning, while ADF (Azure Data Factory) is a cloud-based data integration service.
Databricks is a unified analytics platform that provides a collaborative environment for big data and machine learning projects.
ADF is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines.
Databricks supports multiple programming languages such as Python, Scala, R, and SQL.
Implemented Azure-based data analytics solution for a retail company
Designed and implemented data pipelines using Azure Data Factory
Utilized Azure Databricks for data processing and analysis
Developed Power BI dashboards for visualizing insights
Implemented Azure SQL Database for storing structured data
Worked closely with stakeholders to gather requirements and ensure solution met business needs
I have worked on a project to migrate on-premises infrastructure to Azure Cloud for a large enterprise.
Designed and implemented Azure Virtual Networks, Subnets, and Security Groups.
Utilized Azure Site Recovery for disaster recovery planning.
Implemented Azure Active Directory for user authentication and access control.
Utilized Azure DevOps for continuous integration and deployment.
Optimized costs by implementing Azure Reservations for long-running resources
Designing a data pipeline to process and analyze large volumes of real-time data from multiple sources.
Identify the sources of data and their formats
Design a scalable data ingestion process
Implement data transformation and cleansing steps
Utilize Azure Data Factory, Azure Databricks, and Azure Synapse Analytics for processing and analysis, as sketched below
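A minimal structured-streaming sketch of the ingest-transform-analyze flow just described, runnable locally with Spark's built-in "rate" source standing in for a real-time feed (Event Hubs or Kafka would feed a production pipeline). The bucket column and trigger interval are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("pipeline-sketch").getOrCreate()

# Ingest: a streaming source emitting (timestamp, value) rows
events = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Transform/cleanse: derive fields, filter malformed rows as needed
clean = events.withColumn("bucket", F.col("value") % 10)

# Analyze: a running aggregation, written out continuously
query = (
    clean.groupBy("bucket").count()
    .writeStream.outputMode("complete")
    .format("console")
    .trigger(processingTime="10 seconds")
    .start()
)
query.awaitTermination(30)  # let the demo run for ~30 seconds
query.stop()
```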
I applied via LinkedIn and was interviewed in Feb 2023. There were 4 interview rounds.
Delta load in ADF is achieved by comparing source and target data and only loading the changed data.
Use a Lookup activity to retrieve the latest watermark or timestamp from the target table
Use a Source activity to extract data from the source system based on the watermark or timestamp
Use a Join activity to compare the source and target data and identify the changed records
Use a Sink activity to load only the changed records into the target and then update the stored watermark; the same pattern is sketched below
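The watermark pattern above, sketched in PySpark for clarity; in ADF itself this is expressed with Lookup/Copy activities or a mapping data flow. The table paths and the last_modified column are assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("delta-load").getOrCreate()

target = spark.read.parquet("/data/target/orders/")  # assumed existing target
source = spark.read.parquet("/data/source/orders/")  # assumed source extract

# 1) Look up the current watermark from the target
watermark = target.agg(F.max("last_modified")).first()[0]

# 2) Keep only the source rows changed since that watermark
changed = source.where(F.col("last_modified") > F.lit(watermark))

# 3) Load just the changed records (append shown here; a Delta Lake MERGE
#    would also handle updates to existing rows)
changed.write.mode("append").parquet("/data/target/orders/")
```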
Blob is a storage service for unstructured data, while ADLS is optimized for big data analytics workloads.
Blob is a general-purpose object storage service for unstructured data, while ADLS is optimized for big data analytics workloads.
ADLS offers features like file system semantics, file-level security, and scalability for big data analytics, while Blob storage is simpler and more cost-effective for general storage needs.
There are three types of triggers available in Azure Data Factory: Schedule, Tumbling Window, and Event.
Schedule trigger: Runs pipelines on a wall-clock schedule (e.g., daily at 6 AM).
Tumbling Window trigger: Fires for a series of fixed-size, non-overlapping time windows and supports retries and dependencies between windows.
Event trigger: Runs pipelines in response to events such as a blob being created or deleted in a storage account.
I applied via Naukri.com and was interviewed before Jun 2020. There were 4 interview rounds.
ADF key components include pipelines, activities, datasets, triggers, and linked services.
Pipelines - logical grouping of activities
Activities - individual tasks within a pipeline
Datasets - data sources and destinations
Triggers - event-based or time-based execution of pipelines
Linked Services - connections to external data sources
Examples: Copy Data activity, Lookup activity, Blob Storage dataset
Yes, encryption keys can be created in Databricks. Cluster size can be adjusted based on workload.
Encryption keys can be created using Azure Key Vault or Databricks secrets
Cluster size can be adjusted manually or using autoscaling based on workload
Encryption at rest can also be enabled for data stored in Databricks
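A sketch of reading a secret from a Key Vault-backed Databricks secret scope; dbutils is available only inside a Databricks runtime, and the scope, key, and storage account names here are placeholders, not values from the interview.

```python
# Fetch the key from a secret scope instead of hard-coding it in a notebook
storage_key = dbutils.secrets.get(scope="kv-scope", key="storage-account-key")

# Use the secret to configure storage access for this Spark session
spark.conf.set(
    "fs.azure.account.key.<storageaccount>.dfs.core.windows.net",
    storage_key,
)
```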
ADLS gen 2 is an upgrade to gen 1 with improved performance, scalability, and security features.
ADLS gen 2 is built on top of Azure Blob Storage, while gen 1 is a standalone service.
ADLS gen 2 supports hierarchical namespace, which allows for better organization and management of data.
ADLS gen 2 has better performance for large-scale analytics workloads, with faster read and write speeds.
ADLS gen 2 has improved security, with POSIX-style ACLs on files and directories and integration with Azure role-based access control
Semantic layer is a virtual layer that provides a simplified view of complex data.
It acts as a bridge between the physical data and the end-user.
It provides a common business language for users to access data.
It simplifies data access by hiding the complexity of the underlying data sources.
Examples include OLAP cubes, data marts, and virtual tables.
RDD, Dataframe and Dataset are data structures in Spark. RDD is a low-level structure, Dataframe is tabular and Dataset is a combination of both.
RDD stands for Resilient Distributed Datasets and is a low-level structure in Spark that is immutable and fault-tolerant.
Dataframe is a tabular structure with named columns and is similar to a table in a relational database.
Dataset is a combination of RDD and Dataframe and provides compile-time type safety along with Dataframe optimizations; typed Datasets are available in Scala and Java (Python exposes only DataFrames), as the sketch below notes
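A short contrast of RDD and DataFrame in PySpark; the sample names and ages are made up, and typed Datasets are omitted since they exist only in the Scala/Java APIs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("rdd-vs-df").getOrCreate()
sc = spark.sparkContext

# RDD: low-level, schema-free, functional transformations
rdd = sc.parallelize([("alice", 30), ("bob", 25)])
print(rdd.map(lambda t: t[1]).mean())  # 27.5

# DataFrame: named columns, SQL-style operations, Catalyst-optimized
df = spark.createDataFrame(rdd, ["name", "age"])
df.groupBy().avg("age").show()

# The two interconvert: df.rdd exposes the underlying RDD of Rows
print(df.rdd.take(1))
```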