ZingMind Technologies Data Engineer Interview Questions and Answers

Updated 17 Oct 2024

ZingMind Technologies Data Engineer Interview Experiences

1 interview found

Data Engineer Interview Questions & Answers

Anonymous

posted on 17 Oct 2024

Interview experience
3
Average
Difficulty level
Moderate
Process Duration
2-4 weeks
Result
Not Selected

I applied via a Recruitment Consultant and was interviewed in Sep 2024. There was 1 interview round.

Round 1 - Technical 

(2 Questions)

  • Q1. Structure of PySpark
  • Ans. 

    PySpark is the Python API for Apache Spark, a distributed computing framework.

    • PySpark is the Python API for Apache Spark

    • It allows for easy integration of Python code with Spark's distributed computing capabilities

    • PySpark provides high-level APIs in Python for Spark programming (see the sketch after this round's questions)

  • Answered by AI
  • Q2. What is Hadoop and explain its types
  • Ans. 

    Hadoop is an open-source framework for distributed storage and processing of large data sets.

    • Hadoop consists of two main components: Hadoop Distributed File System (HDFS) for storage and MapReduce for processing.

    • Types of Hadoop include Apache Hadoop, Cloudera Distribution including Apache Hadoop (CDH), Hortonworks Data Platform (HDP), and MapR.

    • Hadoop can be deployed in different modes such as standalone mode, pseudo-distributed mode, and fully distributed mode.

  • Answered by AI
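To make Q1 (structure of PySpark) concrete, here is a minimal sketch of how the pieces fit together; the app name and sample data are made up for illustration.

    from pyspark.sql import SparkSession

    # SparkSession is the single entry point to the DataFrame and SQL APIs;
    # it wraps the lower-level SparkContext used for RDDs.
    spark = SparkSession.builder.appName("pyspark-structure-demo").getOrCreate()

    # DataFrame API: distributed, lazily evaluated transformations.
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    filtered = df.filter(df.id > 1)      # transformation (lazy)
    print(filtered.count())              # action (triggers execution on the cluster)

    # The underlying SparkContext is still available for RDD work.
    rdd = spark.sparkContext.parallelize([1, 2, 3])
    print(rdd.map(lambda x: x * 2).collect())

    spark.stop()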

Skills evaluated in this interview

Interview questions from similar companies

Interview experience
3
Average
Difficulty level
Easy
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Company Website and was interviewed in Nov 2024. There were 2 interview rounds.

Round 1 - Technical 

(2 Questions)

  • Q1. SQL constraints, star schema, DML and DCL commands
  • Q2. About current project and responsibilities
Round 2 - Technical 

(2 Questions)

  • Q1. Current projects and responsibilities
  • Q2. WHERE vs HAVING, reason for job change

Interview Preparation Tips

Interview preparation tips for other job seekers - 1. Technical - about your current project and responsibilities, plus basic SQL questions: constraints, star schema, DML/DCL commands, and writing one SQL query.
2. Technical with senior manager - about the project, WHERE vs HAVING, and the reason for the job change.
Interview experience
3
Average
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
No response

I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.

Round 1 - Technical 

(7 Questions)

  • Q1. How do you optimize SQL queries?
  • Ans. 

    Optimizing SQL queries involves using indexes, avoiding unnecessary joins, and optimizing the query structure.

    • Use indexes on columns frequently used in WHERE clauses

    • Avoid using SELECT * and only retrieve necessary columns

    • Optimize joins by using INNER JOIN instead of OUTER JOIN when possible

    • Use EXPLAIN to analyze query performance and make necessary adjustments

  • Answered by AI
  • Q2. How do you do performance optimization in Spark? Tell how you did it in your project.
  • Ans. 

    Performance optimization in Spark involves tuning configurations, optimizing code, and utilizing caching.

    • Tune Spark configurations such as executor memory, number of executors, and shuffle partitions.

    • Optimize code by reducing unnecessary shuffles, using efficient transformations, and avoiding unnecessary data movements.

    • Utilize caching to store intermediate results in memory and avoid recomputation.

    • Example: In my projec...

  • Answered by AI
  • Q3. What is SparkContext and SparkSession?
  • Ans. 

    SparkContext is the main entry point for Spark functionality, while SparkSession is the entry point for Spark SQL.

    • SparkContext is the entry point for low-level API functionality in Spark.

    • SparkSession is the entry point for Spark SQL functionality.

    • SparkContext is used to create RDDs (Resilient Distributed Datasets) in Spark.

    • SparkSession provides a unified entry point for reading data from various sources and performing SQL queries on DataFrames.

  • Answered by AI
  • Q4. When a spark job is submitted, what happens at backend. Explain the flow.
  • Ans. 

    When a spark job is submitted, various steps are executed at the backend to process the job.

    • The job is submitted to the Spark driver program.

    • The driver program communicates with the cluster manager to request resources.

    • The cluster manager allocates resources (CPU, memory) to the job.

    • The driver program creates DAG (Directed Acyclic Graph) of the job stages and tasks.

    • Tasks are then scheduled and executed on worker nodes ...

  • Answered by AI
  • Q5. Calculate second highest salary using SQL as well as pyspark.
  • Ans. 

    Calculate second highest salary using SQL and pyspark

    • Use SQL query with ORDER BY and LIMIT to get the second highest salary

    • In PySpark, use orderBy() and take() functions to achieve the same result (see the sketch after this round's questions)

  • Answered by AI
  • Q6. 2 types of modes for Spark architecture ?
  • Ans. 

    The two types of modes for Spark architecture are standalone mode and cluster mode.

    • Standalone mode: Spark runs on a single machine with a single JVM and is suitable for development and testing.

    • Cluster mode: Spark runs on a cluster of machines managed by a cluster manager like YARN or Mesos for production workloads.

  • Answered by AI
  • Q7. If you want very low latency, which is better: standalone or client mode?
  • Ans. 

    Client mode is better for very low latency due to direct communication with the cluster.

    • Client mode allows direct communication with the cluster, reducing latency.

    • Standalone mode requires an additional layer of communication, increasing latency.

    • Client mode is preferred for real-time applications where low latency is crucial.

  • Answered by AI
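Relating to Q5 above (second highest salary), a minimal sketch showing both the SQL and the DataFrame approach; the employees table and its columns are assumptions for illustration.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("second-highest-salary").getOrCreate()
    emp = spark.createDataFrame(
        [("a", 100), ("b", 300), ("c", 200), ("d", 300)], ["name", "salary"]
    )
    emp.createOrReplaceTempView("employees")

    # SQL: the highest salary that is strictly below the overall maximum
    # (robust to duplicates of the top salary).
    spark.sql("""
        SELECT MAX(salary) AS second_highest
        FROM employees
        WHERE salary < (SELECT MAX(salary) FROM employees)
    """).show()

    # DataFrame API: distinct salaries, ordered descending, keep the second one.
    (emp.select("salary").distinct()
        .orderBy(F.col("salary").desc())
        .limit(2)
        .orderBy(F.col("salary").asc())
        .limit(1)
        .show())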
Round 2 - Technical 

(2 Questions)

  • Q1. Scenario based: write SQL and PySpark code for a given dataset.
  • Q2. If you have to find the latest record based on the latest timestamp in a table for a particular customer (the table holds history), how will you do it? A self join or nested query will be expensive. Optimized query...
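For the second question above, a common way to avoid a self join or nested query is a window function partitioned by customer; the orders table and its column names (customer_id, event_ts) are assumptions for illustration.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("latest-record-per-customer").getOrCreate()
    orders = spark.read.table("orders")   # assumed history table

    # Rank rows within each customer by timestamp, newest first, then keep
    # only the top row per customer: one shuffle, no self join or subquery.
    w = Window.partitionBy("customer_id").orderBy(F.col("event_ts").desc())
    latest = (orders
              .withColumn("rn", F.row_number().over(w))
              .filter(F.col("rn") == 1)
              .drop("rn"))

    # Equivalent Spark SQL:
    # SELECT * FROM (
    #   SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_ts DESC) AS rn
    #   FROM orders
    # ) WHERE rn = 1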

Interview Preparation Tips

Topics to prepare for LTIMindtree Data Engineer interview:
  • SQL
  • pyspark
  • ETL
Interview preparation tips for other job seekers - L2 was scheduled the day after L1, so the process is fast. Brush up on your practical knowledge.

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

Genpact - Sashikanta Parida

posted on 17 Dec 2024

Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via a Recruitment Consultant and was interviewed in Nov 2024. There were 2 interview rounds.

Round 1 - Technical 

(3 Questions)

  • Q1. What are different type of joins available in Databricks?
  • Ans. 

    Different types of joins available in Databricks include inner join, outer join, left join, right join, and cross join.

    • Inner join: Returns only the rows that have matching values in both tables.

    • Outer join: Returns all rows when there is a match in either table.

    • Left join: Returns all rows from the left table and the matched rows from the right table.

    • Right join: Returns all rows from the right table and the matched rows ...

  • Answered by AI
  • Q2. How do you make your data pipeline fault tolerant?
  • Ans. 

    Implementing fault tolerance in a data pipeline involves redundancy, monitoring, and error handling.

    • Use redundant components to ensure continuous data flow

    • Implement monitoring tools to detect failures and bottlenecks

    • Set up automated alerts for immediate response to issues

    • Design error handling mechanisms to gracefully handle failures

    • Use checkpoints and retries to ensure data integrity

  • Answered by AI
  • Q3. What is AutoLoader?
  • Ans. 

    Auto Loader is a Databricks feature that incrementally and efficiently ingests new data files as they arrive in cloud storage.

    • Exposed as the cloudFiles structured streaming source

    • Automatically discovers and processes only new files, so no manual bookkeeping is needed

    • Supports schema inference and schema evolution

    • Can run continuously or as a triggered batch job (a minimal sketch follows this round's questions)

  • Answered by AI
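To make the Auto Loader answer concrete, a minimal Databricks-style sketch using the cloudFiles streaming source; it assumes it runs in a Databricks environment where spark is already defined, and the input path, checkpoint locations and target table are placeholders.

    # Auto Loader = the cloudFiles source: it incrementally discovers and
    # ingests new files that land in cloud storage.
    raw = (spark.readStream
                .format("cloudFiles")
                .option("cloudFiles.format", "json")                          # source file format
                .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")  # schema tracking
                .load("/mnt/raw/events/"))                                    # placeholder input path

    (raw.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/events")  # incremental-progress bookkeeping
        .trigger(availableNow=True)                                # process new files, then stop
        .toTable("bronze.events"))                                 # placeholder target table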
Round 2 - Technical 

(2 Questions)

  • Q1. How do you connect to different services in Azure?
  • Ans. 

    To connect to different services in Azure, you can use Azure SDKs, REST APIs, Azure Portal, Azure CLI, and Azure PowerShell.

    • Use Azure SDKs for programming languages like Python, Java, C#, etc.

    • Utilize REST APIs to interact with Azure services programmatically.

    • Access and manage services through the Azure Portal.

    • Leverage Azure CLI for command-line interface interactions.

    • Automate tasks using Azure PowerShell scripts.

  • Answered by AI
  • Q2. What are linked Services?
  • Ans. 

    Linked Services are connections to external data sources or destinations in Azure Data Factory.

    • Linked Services define the connection information needed to connect to external data sources or destinations.

    • They can be used in Data Factory pipelines to read from or write to external systems.

    • Examples of Linked Services include Azure Blob Storage, Azure SQL Database, and Amazon S3.

  • Answered by AI
Interview experience
4
Good
Difficulty level
Easy
Process Duration
-
Result
-

I applied via a Recruitment Consultant and was interviewed in Nov 2024. There was 1 interview round.

Round 1 - Technical 

(7 Questions)

  • Q1. Difference between Bigtable and BigQuery.
  • Ans. 

    Bigtable is a NoSQL database for real-time analytics, while BigQuery is a fully managed data warehouse for running SQL queries.

    • Bigtable is a NoSQL database designed for real-time analytics and high throughput, while BigQuery is a fully managed data warehouse for running SQL queries.

    • Bigtable is used for storing large amounts of semi-structured data, while BigQuery is used for analyzing structured data using SQL queries.

    • ...

  • Answered by AI
  • Q2. How to remove duplicate rows in BigQuery? Find the month of a given date in BigQuery.
  • Ans. 

    To remove duplicate rows from BigQuery, use the DISTINCT keyword. To find the month of a given date, use the EXTRACT function.

    • To remove duplicate rows, use SELECT DISTINCT * FROM table_name;

    • To find the month of a given date, use SELECT EXTRACT(MONTH FROM date_column) AS month FROM table_name;

    • Make sure to replace 'table_name' and 'date_column' with the appropriate values in your query.

  • Answered by AI
  • Q3. What operator is used in Composer to move data from GCS to BigQuery?
  • Ans. 

    The operator used in Composer to move data from GCS to BigQuery is the GCS to BigQuery operator.

    • The GCS to BigQuery operator is used in Apache Airflow, which is the underlying technology of Composer.

    • This operator allows you to transfer data from Google Cloud Storage (GCS) to BigQuery.

    • You can specify the source and destination parameters in the operator to define the data transfer process.

  • Answered by AI
  • Q4. Write a code for this - input = [1,2,3,4] output = [1,4,9,16]
  • Ans. 

    Code to square each element in the input array.

    • Iterate through the input array and square each element.

    • Store the squared values in a new array to get the desired output (a runnable sketch follows after this round's questions).

  • Answered by AI
  • Q5. Dataflow vs dataproc.
  • Ans. 

    Dataflow is a fully managed stream and batch processing service, while Dataproc is a managed Apache Spark and Hadoop service.

    • Dataflow is a serverless data processing service that automatically scales to handle your data, while Dataproc is a managed Spark and Hadoop service that requires you to provision and manage clusters.

    • Dataflow is designed for both batch and stream processing, allowing you to process data in real-t...

  • Answered by AI
  • Q6. Architecture of bq. Query optimization techniques in bigquery.
  • Ans. 

    BigQuery architecture includes storage, execution, and optimization components for efficient query processing.

    • BigQuery stores data in Capacitor storage system for fast access.

    • Query execution is distributed across multiple nodes for parallel processing.

    • Query optimization techniques include partitioning tables, clustering tables, and using query cache.

    • Using partitioned tables can help eliminate scanning unnecessary data.

    • ...

  • Answered by AI
  • Q7. RDD vs dataframe vs dataset in pyspark
  • Ans. 

    RDD vs dataframe vs dataset in PySpark

    • RDD (Resilient Distributed Dataset) is the basic abstraction in PySpark, representing a distributed collection of objects

    • Dataframe is a distributed collection of data organized into named columns, similar to a table in a relational database

    • Dataset is a distributed collection of data with the ability to use custom classes for type safety and user-defined functions

    • Dataframes and Data...

  • Answered by AI
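Two short sketches for the coding questions above. First, for Q4 (square each element), idiomatic Python:

    nums = [1, 2, 3, 4]
    squares = [x ** 2 for x in nums]                 # list comprehension
    squares_map = list(map(lambda x: x ** 2, nums))  # same result via map
    print(squares)        # [1, 4, 9, 16]

Second, for Q3 (moving data from GCS to BigQuery in Cloud Composer), a hedged Airflow sketch; the DAG id, bucket, objects and destination table are placeholders, and the exact provider import path and DAG arguments can vary with the Airflow/provider version.

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG("gcs_to_bq_demo", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
        load_events = GCSToBigQueryOperator(
            task_id="load_events",
            bucket="my-bucket",                                        # placeholder bucket
            source_objects=["events/*.csv"],                           # placeholder objects
            destination_project_dataset_table="proj.dataset.events",   # placeholder table
            source_format="CSV",
            write_disposition="WRITE_APPEND",
        )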

Data Engineer Interview Questions & Answers

Wipro - Lakshmi Narayana

posted on 27 Nov 2024

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Explain ADF questions in detail
  • Ans. 

    ADF questions refer to Azure Data Factory questions which are related to data integration and data transformation processes.

    • ADF questions are related to Azure Data Factory, a cloud-based data integration service.

    • These questions may involve data pipelines, data flows, activities, triggers, and data movement.

    • Candidates may be asked about their experience with designing, monitoring, and managing data pipelines in ADF.

    • Exam...

  • Answered by AI
  • Q2. Project related questions
Round 2 - Technical 

(2 Questions)

  • Q1. Project data related questions
  • Q2. Databricks and SQL interview questions
Interview experience
2
Poor
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Coding Test 

OOPs, DSA, SQL, networking

Round 2 - Technical 

(2 Questions)

  • Q1. DSA-based Python question
  • Q2. DSA-based Python question on trees
Round 3 - Technical 

(2 Questions)

  • Q1. SQL query questions
  • Q2. SQL query question
Interview experience
5
Excellent
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Selected

I applied via a Recruitment Consultant and was interviewed in Jun 2024. There was 1 interview round.

Round 1 - One-on-one 

(20 Questions)

  • Q1. Tell me about yourself
  • Q2. Project Architecture
  • Q3. Rate yourself out of 5 in PySpark, Python and SQL
  • Ans. 

    I would rate myself 4 in Pyspark, 5 in Python, and 4 in SQL.

    • Strong proficiency in Python programming language

    • Experience in working with Pyspark for big data processing

    • Proficient in writing complex SQL queries for data manipulation

    • Familiarity with optimizing queries for performance

    • Hands-on experience in data engineering projects

  • Answered by AI
  • Q4. How to handle duplicates in Python?
  • Ans. 

    Use Python's built-in data structures like sets or dictionaries to handle duplicates.

    • Use a set to remove duplicates from a list: unique_list = list(set(original_list))

    • Use a dictionary to remove duplicates from a list while preserving order: unique_list = list(dict.fromkeys(original_list))

  • Answered by AI
  • Q5. Methods of migrating Hive metastore to Unity Catalog in Databricks?
  • Ans. 

    Use Databricks provided tools like databricks-connect and databricks-cli to migrate Hive metadata to Unity catalog.

    • Use databricks-connect to connect to the Databricks workspace from your local development environment.

    • Use databricks-cli to export the Hive metadata from the existing Hive metastore.

    • Create a new Unity catalog in Databricks and import the exported metadata using databricks-cli.

    • Validate the migration by chec...

  • Answered by AI
  • Q6. Read a CSV file from an ADLS path?
  • Ans. 

    To read a CSV file from an ADLS path, you can use libraries like pandas or pyspark.

    • Use pandas library in Python to read a CSV file from ADLS path

    • Use pyspark library in Python to read a CSV file from ADLS path

    • Ensure you have the necessary permissions to access the ADLS path

  • Answered by AI
  • Q7. There was a table provided on the coding screen, and I was asked to write different programs and SQL queries from the table and explain the approach I was taking. Like age greater than 30 then sum the age how would y...
  • Q8. How many stages will be created from the above code that I have written?
  • Ans. 

    The number of stages created from the code provided depends on the specific code and its functionality.

    • The number of stages can vary based on the complexity of the code and the specific tasks being performed.

    • Stages may include data extraction, transformation, loading, and processing.

    • It is important to analyze the code and identify distinct stages to determine the total number.

  • Answered by AI
  • Q9. Narrow vs Wide Transformation ?
  • Ans. 

    A narrow transformation needs no shuffle: each output partition depends on a single input partition. A wide transformation requires shuffling data across partitions.

    • Narrow transformations can be pipelined within a stage and are easy to parallelize and optimize.

    • Wide transformations redistribute data across the cluster, which introduces a shuffle and can cause performance issues.

    • Examples of narrow transformations include map and filter; examples of wide transformations include groupByKey, reduceByKey and join.

  • Answered by AI
  • Q10. What are actions and transformations?
  • Ans. 

    Actions and transformations are key concepts in data engineering, involving the manipulation and processing of data.

    • Actions are operations that trigger the execution of a data transformation job in a distributed computing environment.

    • Transformations are functions that take an input dataset and produce an output dataset, often involving filtering, aggregating, or joining data.

    • Examples of actions include 'saveAsTextFile'...

  • Answered by AI
  • Q11. What happens when we enforce the schema and when we manually define the schema in the code ?
  • Ans. 

    Enforcing the schema ensures data consistency and validation, while manually defining the schema in code allows for more flexibility and customization.

    • Enforcing the schema ensures that all data conforms to a predefined structure and format, preventing errors and inconsistencies.

    • Manually defining the schema in code allows for more flexibility in handling different data types and structures.

    • Enforcing the schema can be do...

  • Answered by AI
  • Q12. What optimisations are possible to reduce the overhead of reading large datasets in Spark?
  • Ans. 

    Optimizations like partitioning, caching, and using efficient file formats can reduce overhead in reading large datasets in Spark.

    • Partitioning data based on key can reduce the amount of data shuffled during joins and aggregations

    • Caching frequently accessed datasets in memory can avoid recomputation

    • Using efficient file formats like Parquet or ORC can reduce disk I/O and improve read performance

  • Answered by AI
  • Q13. Write a SQL query to find the name of the person who logged in last within each country from the Person table.
  • Ans. 

    SQL query to find the name of person who logged in last within each country from Person Table

    • Use a subquery to find the max login time for each country

    • Join the Person table with the subquery on country and login time to get the name of the person

  • Answered by AI
  • Q14. Difference between List and Tuple ?
  • Ans. 

    List is mutable, Tuple is immutable in Python.

    • List can be modified after creation, Tuple cannot be modified.

    • List is defined using square brackets [], Tuple is defined using parentheses ().

    • Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)

  • Answered by AI
  • Q15. Difference between Rank, Dense Rank and Row Number, and when do we use each of them?
  • Ans. 

    RANK and DENSE_RANK give tied rows the same rank; RANK leaves gaps after ties, DENSE_RANK does not, and ROW_NUMBER gives every row a distinct sequential number.

    • Rank assigns the same rank to rows with the same value, leaving gaps in the ranking if there are ties.

    • Dense rank also assigns the same rank to ties but leaves no gaps in the ranking.

    • Row number assigns a unique sequential number to each row regardless of ties (see the ranking sketch after this round's questions).

  • Answered by AI
  • Q16. What is List Comprehension ?
  • Ans. 

    List comprehension is a concise way to create lists in Python by applying an expression to each item in an iterable.

    • Syntax: [expression for item in iterable]

    • Can include conditions: [expression for item in iterable if condition]

    • Example: squares = [x**2 for x in range(10)]

  • Answered by AI
  • Q17. Tell me about the performance optimization done in your project ?
  • Q18. Difference between the interactive cluster and job cluster ?
  • Ans. 

    Interactive (all-purpose) clusters are created for exploratory, notebook-driven work, while job clusters are created for automated job runs.

    • Interactive clusters stay running between commands and can be shared by several users.

    • Job clusters are created automatically for a scheduled or triggered job run.

    • Job clusters terminate when the job finishes, which keeps costs down.

    • Interactive clusters suit development and exploration; job clusters suit production batch workloads.

  • Answered by AI
  • Q19. How to add a column in a dataframe? How to rename a column in a dataframe?
  • Ans. 

    To add a column in a dataframe, use the 'withColumn' method. To rename a column, use the 'withColumnRenamed' method.

    • To add a column, use the 'withColumn' method with the new column name and the expression to compute the values for that column.

    • Example: df.withColumn('new_column', df['existing_column'] * 2)

    • To rename a column, use the 'withColumnRenamed' method with the current column name and the new column name.

    • Example: df.withColumnRenamed('existing_name', 'new_name') (a combined sketch follows after this round's questions)

  • Answered by AI
  • Q20. Difference between Coalesce and Repartition, and in which cases do we use each?
  • Ans. 

    Coalesce is used to combine multiple small partitions into a larger one, while Repartition is used to increase or decrease the number of partitions in a DataFrame.

    • Coalesce reduces the number of partitions in a DataFrame by combining small partitions into larger ones.

    • Repartition increases or decreases the number of partitions in a DataFrame by shuffling the data across partitions.

    • Coalesce is more efficient than repartition when reducing the number of partitions because it avoids a full shuffle (see the sketch after this round's questions).

  • Answered by AI
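Two hedged PySpark sketches tying together several of the questions above; all table, column and path names are made up for illustration. First, ranking functions and the last-login-per-country query (Q13 and Q15):

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("ranking-demo").getOrCreate()
    person = spark.createDataFrame(
        [("IN", "asha", "2024-06-01 10:00:00"),
         ("IN", "ravi", "2024-06-02 09:30:00"),
         ("US", "joan", "2024-06-01 18:45:00")],
        ["country", "name", "login_ts"],
    ).withColumn("login_ts", F.to_timestamp("login_ts"))

    # Q15: rank vs dense_rank vs row_number over the same window.
    w = Window.partitionBy("country").orderBy(F.col("login_ts").desc())
    ranked = (person
              .withColumn("rank", F.rank().over(w))
              .withColumn("dense_rank", F.dense_rank().over(w))
              .withColumn("row_number", F.row_number().over(w)))
    ranked.show()

    # Q13: the last person to log in within each country = row_number == 1.
    ranked.filter(F.col("row_number") == 1).select("country", "name").show()

Second, reading a CSV from an ADLS path with a manually defined schema, adding and renaming columns, and coalesce vs repartition (Q6, Q11, Q19 and Q20); this continues with the same spark session and assumes the storage account credentials are already configured:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Q6/Q11: manual schema means Spark skips the inference pass over the file
    # (the abfss URI, container and account names are placeholders).
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ])
    df = (spark.read
               .schema(schema)
               .option("header", "true")
               .csv("abfss://container@account.dfs.core.windows.net/data/people.csv"))

    # Q19: add and rename columns.
    df2 = (df.withColumn("id_times_two", F.col("id") * 2)
             .withColumnRenamed("name", "full_name"))

    # Q20: repartition does a full shuffle (and can increase partitions);
    # coalesce only merges existing partitions, so it is cheaper for reducing them.
    wide = df2.repartition(8)
    narrowed = wide.coalesce(2)
    print(wide.rdd.getNumPartitions(), narrowed.rdd.getNumPartitions())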

Interview Preparation Tips

Topics to prepare for Accenture Data Engineer interview:
  • Spark
  • Databricks
  • SQL
  • Python
  • ETL
Interview preparation tips for other job seekers - Focus on basics and definitions, understand Spark internals, and write SQL queries efficiently.

Skills evaluated in this interview

Interview experience
3
Average
Difficulty level
Moderate
Process Duration
-
Result
Not Selected

I applied via Walk-in

Round 1 - Technical 

(2 Questions)

  • Q1. Difference between rank and dense_rank, Left vs Left anti join
  • Ans. 

    Both rank and dense_rank give tied rows the same rank; rank leaves gaps after ties while dense_rank does not. A left join keeps all rows from the left table plus matching rows from the right table, while a left anti join keeps only the left-table rows that have no match in the right table.

    • Rank assigns ranks based on the specified order and skips ranks after ties, while dense_rank assigns consecutive ranks with no gaps (a join sketch follows this round's questions).

  • Answered by AI
  • Q2. Python list comprehension, SQL query
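For Q1 above, a small sketch contrasting a left join with a left anti join; the sample data is made up for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("join-demo").getOrCreate()
    customers = spark.createDataFrame([(1, "asha"), (2, "ravi"), (3, "joan")], ["id", "name"])
    orders = spark.createDataFrame([(1, 250), (1, 90), (3, 40)], ["customer_id", "amount"])

    # Left join: every customer, with order columns where a match exists (null otherwise).
    customers.join(orders, customers.id == orders.customer_id, "left").show()

    # Left anti join: only customers with NO matching order (id 2 here).
    customers.join(orders, customers.id == orders.customer_id, "left_anti").show()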
Round 2 - Behavioral 

(1 Question)

  • Q1. Project related questions

Interview Preparation Tips

Interview preparation tips for other job seekers - No response from HR, even after clearing technical and managerial rounds

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

Cognizant - Abhishek Paithankar

posted on 16 Nov 2024

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Aptitude Test 

The aptitude test involved quantitative aptitude, logical reasoning and reading comprehension.

Round 2 - Technical 

(2 Questions)

  • Q1. Tell me your introduction.
  • Q2. Tell me about your skills.
  • Ans. 

    I have strong skills in data processing, ETL, data modeling, and programming languages like Python and SQL.

    • Proficient in data processing and ETL techniques

    • Strong knowledge of data modeling and database design

    • Experience with programming languages like Python and SQL

    • Familiarity with big data technologies such as Hadoop and Spark

  • Answered by AI
Round 3 - HR 

(2 Questions)

  • Q1. Are you ready to relocate?
  • Ans. 

    Yes, I am open to relocating for the right opportunity.

    • I am willing to relocate for the right job opportunity.

    • I have experience moving for previous roles.

    • I am flexible and adaptable to new locations.

    • I am excited about the possibility of exploring a new city or country.

  • Answered by AI
  • Q2. Document verification

Interview Preparation Tips

Interview preparation tips for other job seekers - If you are a fresher, prepare for the aptitude test first; once you clear it you stand out from the large competition. Then focus on your technical knowledge and managerial skills relevant to the company.

ZingMind Technologies Interview FAQs

How many rounds are there in ZingMind Technologies Data Engineer interview?
ZingMind Technologies' interview process usually has 1 round. The most common round in the ZingMind Technologies interview process is Technical.
How to prepare for ZingMind Technologies Data Engineer interview?
Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at ZingMind Technologies. The most common topics and skills that interviewers at ZingMind Technologies expect are Data Warehousing, Java, Python, Azure Databricks and Data Engineering.
What are the top questions asked in ZingMind Technologies Data Engineer interview?

Some of the top questions asked at the ZingMind Technologies Data Engineer interview -

  1. What is Hadoop and explain its types
  2. Structure of PySpark

ZingMind Technologies Data Engineer Salary

Based on 5 salaries: ₹3 L/yr - ₹10.5 L/yr (44% less than the average Data Engineer salary in India)

Salaries reported at ZingMind Technologies:
  • Associate Software Engineer (6 salaries): ₹2 L/yr - ₹2.5 L/yr
  • Data Engineer (5 salaries): ₹3 L/yr - ₹10.5 L/yr
  • Associate Data Engineer (4 salaries): ₹3 L/yr - ₹6 L/yr
  • Software Engineer (3 salaries): ₹2.5 L/yr - ₹4.5 L/yr