
10+ Hindustan Media Ventures Interview Questions and Answers

Updated 17 Jan 2025

Q1. Difference between RDD, DataFrame and Dataset. How and what have you used in your Databricks work for data analysis?

Ans.

RDD, Dataframe and Dataset are data structures in Spark. RDD is a low-level structure, Dataframe is tabular and Dataset is a combination of both.

  • RDD stands for Resilient Distributed Datasets and is a low-level structure in Spark that is immutable and fault-tolerant.

  • Dataframe is a tabular structure with named columns and is similar to a table in a relational database.

  • Dataset is a combination of RDD and Dataframe and provides type-safety and object-oriented programming features.
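
Since a live Spark session is not available here, the following is a rough pure-Python analogy of the three abstractions (in PySpark these would be `sc.parallelize(...)` for RDDs and `spark.createDataFrame(...)` for DataFrames; the typed Dataset API exists only in Scala/Java). The data and names are made up:

```python
from dataclasses import dataclass

# RDD-like: raw tuples, no schema; fields are accessed by position
rdd_like = [("alice", 34), ("bob", 29)]
adults_rdd = [r for r in rdd_like if r[1] >= 30]

# DataFrame-like: named columns, schema known only at runtime
df_like = [{"name": "alice", "age": 34}, {"name": "bob", "age": 29}]
adults_df = [r for r in df_like if r["age"] >= 30]

# Dataset-like: typed objects, so field access can be checked by tooling
@dataclass
class Person:
    name: str
    age: int

ds_like = [Person("alice", 34), Person("bob", 29)]
adults_ds = [p for p in ds_like if p.age >= 30]

print(adults_ds)  # [Person(name='alice', age=34)]
```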


Q2. What are the key components in ADF? Which ones have you used in your pipelines?

Ans.

ADF key components include pipelines, activities, datasets, triggers, and linked services.

  • Pipelines - logical grouping of activities

  • Activities - individual tasks within a pipeline

  • Datasets - data sources and destinations

  • Triggers - event-based or time-based execution of pipelines

  • Linked Services - connections to external data sources

  • Examples: Copy Data activity, Lookup activity, Blob Storage dataset
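
A hedged sketch of how these components relate, using simplified Python dicts rather than the exact ADF JSON schema (all names are illustrative):

```python
import json

# A linked service is a connection; a dataset points at a linked service;
# activities reference datasets; a trigger starts a pipeline.
linked_service = {"name": "ls_blob", "type": "AzureBlobStorage"}
dataset = {"name": "ds_sales_csv", "linkedServiceName": "ls_blob"}
pipeline = {
    "name": "pl_copy_sales",
    "activities": [
        {"name": "LookupConfig", "type": "Lookup", "dataset": "ds_sales_csv"},
        {"name": "CopySales", "type": "Copy", "inputs": ["ds_sales_csv"]},
    ],
}
trigger = {"name": "tr_daily", "type": "ScheduleTrigger",
           "pipeline": "pl_copy_sales"}

# The references line up end to end
assert dataset["linkedServiceName"] == linked_service["name"]
assert trigger["pipeline"] == pipeline["name"]
print(json.dumps(pipeline, indent=2))
```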


Q3. Do you create any encryption keys in Databricks? How do you choose cluster size in Databricks?

Ans.

Yes, encryption keys can be created in Databricks. Cluster size can be adjusted based on workload.

  • Encryption keys can be created using Azure Key Vault or Databricks secrets

  • Cluster size can be adjusted manually or using autoscaling based on workload

  • Encryption at rest can also be enabled for data stored in Databricks
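
A minimal sketch of the first bullet, assuming a notebook context where `dbutils` exists; outside Databricks the helper falls back to an environment variable so it stays runnable locally (the scope and key names are made up):

```python
import os

def get_secret(scope: str, key: str, dbutils=None) -> str:
    """Fetch a secret from a Databricks secret scope when running in a
    notebook; fall back to an environment variable elsewhere."""
    if dbutils is not None:
        # Real Databricks call; the scope can be backed by Azure Key Vault
        return dbutils.secrets.get(scope=scope, key=key)
    return os.environ[f"{scope}_{key}"]

# Local demo using the env-var fallback
os.environ["kv-scope_storage-key"] = "s3cret"
print(get_secret("kv-scope", "storage-key"))  # s3cret
```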


Q4. What steps are involved in fetching data from an on-premises Unix server?

Ans.

Steps involved in fetching data from an on-premises Unix server:

  • Establish a secure connection to the Unix server using SSH or other protocols

  • Identify the data source on the Unix server and determine the data extraction method

  • Use tools like SCP, SFTP, or rsync to transfer the data from the Unix server to Azure storage

  • Transform the data as needed before loading it into Azure Data Lake or Azure SQL Database
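
The transfer step above might be sketched like this; the user, host, and paths are hypothetical, and in practice you would execute the command with `subprocess.run(cmd, check=True)` before loading the file into Azure storage:

```python
import shlex

def build_transfer_cmd(user: str, host: str, remote_path: str,
                       local_path: str, tool: str = "scp") -> list[str]:
    """Build the command used to pull a file from an on-prem Unix server
    over SSH, using scp or rsync."""
    if tool == "scp":
        return ["scp", "-p", f"{user}@{host}:{remote_path}", local_path]
    if tool == "rsync":
        return ["rsync", "-az", f"{user}@{host}:{remote_path}", local_path]
    raise ValueError(f"unsupported tool: {tool}")

cmd = build_transfer_cmd("etl", "onprem01", "/data/export.csv", "./export.csv")
print(shlex.join(cmd))  # scp -p etl@onprem01:/data/export.csv ./export.csv
```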


Q5. Difference between ADLS gen 1 and gen 2?

Ans.

ADLS gen 2 is an upgrade to gen 1 with improved performance, scalability, and security features.

  • ADLS gen 2 is built on top of Azure Blob Storage, while gen 1 is a standalone service.

  • ADLS gen 2 supports hierarchical namespace, which allows for better organization and management of data.

  • ADLS gen 2 has better performance for large-scale analytics workloads, with faster read and write speeds.

  • ADLS gen 2 has improved security features, including encryption at rest and in transit.


Q6. What are your current responsibilities as an Azure Data Engineer?

Ans.

As an Azure Data Engineer, my current responsibilities include designing and implementing data solutions on Azure, optimizing data storage and processing, and ensuring data security and compliance.

  • Designing and implementing data solutions on Azure

  • Optimizing data storage and processing for performance and cost efficiency

  • Ensuring data security and compliance with regulations

  • Collaborating with data scientists and analysts to support their data needs


Q7. What is a semantic layer?

Ans.

Semantic layer is a virtual layer that provides a simplified view of complex data.

  • It acts as a bridge between the physical data and the end-user.

  • It provides a common business language for users to access data.

  • It simplifies data access by hiding the complexity of the underlying data sources.

  • Examples include OLAP cubes, data marts, and virtual tables.
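
A small illustration using SQLite as a stand-in for a real warehouse: the view plays the role of the semantic layer, hiding the join and exposing business-friendly names (table names and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_id INT, qty INT, unit_price REAL);
CREATE TABLE dim_product (product_id INT, category TEXT);
INSERT INTO fact_sales VALUES (1, 2, 10.0), (2, 1, 5.0);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
-- The view is the "semantic layer": users query revenue_by_category
-- without knowing the underlying join or column arithmetic
CREATE VIEW revenue_by_category AS
SELECT p.category, SUM(s.qty * s.unit_price) AS revenue
FROM fact_sales s JOIN dim_product p USING (product_id)
GROUP BY p.category;
""")

rows = conn.execute(
    "SELECT * FROM revenue_by_category ORDER BY category").fetchall()
print(rows)  # [('books', 20.0), ('toys', 5.0)]
```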


Q8. How do you perform partitioning?

Ans.

Partitioning in Azure data engineering involves dividing data into smaller chunks for better performance and manageability.

  • Partitioning can be done based on a specific column or key in the dataset

  • It helps in distributing data across multiple nodes for parallel processing

  • Partitioning can improve query performance by reducing the amount of data that needs to be scanned

  • In Azure Synapse Analytics, you can use ROUND_ROBIN or HASH distribution for partitioning
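
In PySpark this is typically `df.repartition(n, "col")` or `partitionBy("col")` on write; the underlying hash-partitioning idea can be sketched in pure Python (data is illustrative):

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash partitioning: the same key always lands in the same
    partition, so related rows are processed together."""
    return zlib.crc32(key.encode()) % num_partitions

rows = [("IN", 1), ("US", 2), ("IN", 3), ("DE", 4)]
partitions = defaultdict(list)
for country, value in rows:
    partitions[partition_for(country, 4)].append((country, value))

# Every row lands in exactly one partition, and equal keys co-locate
assert sum(len(p) for p in partitions.values()) == len(rows)
print(dict(partitions))
```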


Q9. What is Medallion Architecture?

Ans.

Medallion Architecture is a data design pattern that organizes data in a lakehouse into progressively refined layers: Bronze, Silver, and Gold.

  • Bronze layer stores raw data ingested as-is from source systems

  • Silver layer stores cleansed, validated, and conformed data

  • Gold layer stores aggregated, business-level data ready for reporting and analytics

  • Commonly used with Delta Lake on Databricks
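
A minimal pure-Python sketch of the Bronze, Silver, and Gold stages (the data and cleansing rule are made up):

```python
# Bronze = raw ingested lines, kept as-is (including a bad record)
bronze = ["IN,100", "US,250", "IN,", "US,50"]

# Silver = parsed and validated rows; malformed records are dropped
silver = []
for line in bronze:
    country, _, amount = line.partition(",")
    if amount.isdigit():
        silver.append({"country": country, "amount": int(amount)})

# Gold = business-level aggregate (revenue per country)
gold = {}
for row in silver:
    gold[row["country"]] = gold.get(row["country"], 0) + row["amount"]

print(gold)  # {'IN': 100, 'US': 300}
```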


Q10. What is Spark Architecture?

Ans.

Spark's architecture follows a master/worker model for processing large datasets efficiently across a cluster.

  • Spark Architecture consists of a driver program, cluster manager, and worker nodes.

  • It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.

  • Spark supports various programming languages like Scala, Java, Python, and SQL.

  • It includes components like Spark Core, Spark SQL, Spark Streaming, and MLlib for different data processing tasks.
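
A toy model of the driver/worker split, using threads as stand-ins for worker nodes (real Spark distributes tasks across machines, not threads):

```python
from concurrent.futures import ThreadPoolExecutor

# The "driver" partitions the data and plans the work
data = list(range(1, 101))
num_partitions = 4
partitions = [data[i::num_partitions] for i in range(num_partitions)]

def task(partition):
    """Runs on a "worker": compute a partial result for one partition."""
    return sum(x * x for x in partition)

# "Workers" execute tasks in parallel; the driver collects the partials
with ThreadPoolExecutor(max_workers=num_partitions) as workers:
    partials = list(workers.map(task, partitions))

result = sum(partials)  # driver-side reduce
print(result)  # 338350 == sum of squares 1..100
```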


Q11. Types of triggers in Azure Data Factory

Ans.

Types of triggers in Azure Data Factory include schedule, tumbling window, event, and manual triggers.

  • Schedule trigger: Runs pipelines on a specified schedule.

  • Tumbling window trigger: Runs pipelines at regular intervals based on a time window.

  • Event trigger: Starts pipelines in response to storage events such as blob creation or deletion.

  • Manual trigger: Allows manual execution of pipelines.
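
A tumbling-window trigger fires once per fixed, contiguous, non-overlapping interval; the windows such a trigger would produce can be sketched as:

```python
from datetime import datetime, timedelta

def tumbling_windows(start: datetime, end: datetime, size: timedelta):
    """Yield the contiguous, non-overlapping (window_start, window_end)
    pairs a tumbling-window trigger would fire for between start and end."""
    t = start
    while t < end:
        yield (t, min(t + size, end))
        t += size

wins = list(tumbling_windows(datetime(2024, 1, 1, 0),
                             datetime(2024, 1, 1, 6),
                             timedelta(hours=2)))
print(wins)  # three 2-hour windows: 00-02, 02-04, 04-06
```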


Q12. Describe a challenging problem you have worked on

Ans.

Designing a data pipeline to process and analyze large volumes of real-time data from multiple sources.

  • Identify the sources of data and their formats

  • Design a scalable data ingestion process

  • Implement data transformation and cleansing steps

  • Utilize Azure Data Factory, Azure Databricks, and Azure Synapse Analytics for processing and analysis


Interview Process at Hindustan Media Ventures

Based on 3 interviews: 1 interview round (Technical Round)