Upload Button Icon Add office photos

Filter interviews by

CivicDataLab Data Engineer Interview Questions and Answers for Freshers

Updated 13 Nov 2024

CivicDataLab Data Engineer Interview Experiences for Freshers

1 interview found

Data Engineer Interview Questions & Answers

user image Prajna Prayas

posted on 13 Nov 2024

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Assignment 

Prefect library, code examples

Interview questions from similar companies

Interview experience
3
Average
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
No response

I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.

Round 1 - Technical 

(7 Questions)

  • Q1. How do you optimize SQL queries?
  • Ans. 

    Optimizing SQL queries involves using indexes, avoiding unnecessary joins, and optimizing the query structure.

    • Use indexes on columns frequently used in WHERE clauses

    • Avoid using SELECT * and only retrieve necessary columns

    • Optimize joins by using INNER JOIN instead of OUTER JOIN when possible

    • Use EXPLAIN to analyze query performance and make necessary adjustments

  • Answered by AI
  • Q2. How do you do performance optimization in Spark. Tell how you did it in you project.
  • Ans. 

    Performance optimization in Spark involves tuning configurations, optimizing code, and utilizing caching.

    • Tune Spark configurations such as executor memory, number of executors, and shuffle partitions.

    • Optimize code by reducing unnecessary shuffles, using efficient transformations, and avoiding unnecessary data movements.

    • Utilize caching to store intermediate results in memory and avoid recomputation.

    • Example: In my projec...

  • Answered by AI
  • Q3. What is SparkContext and SparkSession?
  • Ans. 

    SparkContext is the main entry point for Spark functionality, while SparkSession is the entry point for Spark SQL.

    • SparkContext is the entry point for low-level API functionality in Spark.

    • SparkSession is the entry point for Spark SQL functionality.

    • SparkContext is used to create RDDs (Resilient Distributed Datasets) in Spark.

    • SparkSession provides a unified entry point for reading data from various sources and performing

  • Answered by AI
  • Q4. When a spark job is submitted, what happens at backend. Explain the flow.
  • Ans. 

    When a spark job is submitted, various steps are executed at the backend to process the job.

    • The job is submitted to the Spark driver program.

    • The driver program communicates with the cluster manager to request resources.

    • The cluster manager allocates resources (CPU, memory) to the job.

    • The driver program creates DAG (Directed Acyclic Graph) of the job stages and tasks.

    • Tasks are then scheduled and executed on worker nodes ...

  • Answered by AI
  • Q5. Calculate second highest salary using SQL as well as pyspark.
  • Ans. 

    Calculate second highest salary using SQL and pyspark

    • Use SQL query with ORDER BY and LIMIT to get the second highest salary

    • In pyspark, use orderBy() and take() functions to achieve the same result

  • Answered by AI
  • Q6. 2 types of modes for Spark architecture ?
  • Ans. 

    The two types of modes for Spark architecture are standalone mode and cluster mode.

    • Standalone mode: Spark runs on a single machine with a single JVM and is suitable for development and testing.

    • Cluster mode: Spark runs on a cluster of machines managed by a cluster manager like YARN or Mesos for production workloads.

  • Answered by AI
  • Q7. If you want very less latency - which is better standalone or client mode?
  • Ans. 

    Client mode is better for very less latency due to direct communication with the cluster.

    • Client mode allows direct communication with the cluster, reducing latency.

    • Standalone mode requires an additional layer of communication, increasing latency.

    • Client mode is preferred for real-time applications where low latency is crucial.

  • Answered by AI
Round 2 - Technical 

(2 Questions)

  • Q1. Scenario based. Write SQL and pyspark code for a dataset.
  • Q2. If you have to find latest record based on latest timestamp in a table for a particular customer(table is having history) , how will you do it. Self join and nested query will be expensive. Optimized query...

Interview Preparation Tips

Topics to prepare for LTIMindtree Data Engineer interview:
  • SQL
  • pyspark
  • ETL
Interview preparation tips for other job seekers - L2 was scheduled next day to L1 so the process is fast. Brush up your practical knowledge more.

Skills evaluated in this interview

Interview experience
4
Good
Difficulty level
Easy
Process Duration
-
Result
-

I applied via Recruitment Consulltant and was interviewed in Nov 2024. There was 1 interview round.

Round 1 - Technical 

(7 Questions)

  • Q1. Difference between bigtable and bigquery.
  • Ans. 

    Bigtable is a NoSQL database for real-time analytics, while BigQuery is a fully managed data warehouse for running SQL queries.

    • Bigtable is a NoSQL database designed for real-time analytics and high throughput, while BigQuery is a fully managed data warehouse for running SQL queries.

    • Bigtable is used for storing large amounts of semi-structured data, while BigQuery is used for analyzing structured data using SQL queries.

    • ...

  • Answered by AI
  • Q2. How to remove duplicate rows from bigquery? find the month of a given date in bigquery.
  • Ans. 

    To remove duplicate rows from BigQuery, use the DISTINCT keyword. To find the month of a given date, use the EXTRACT function.

    • To remove duplicate rows, use SELECT DISTINCT * FROM table_name;

    • To find the month of a given date, use SELECT EXTRACT(MONTH FROM date_column) AS month_name FROM table_name;

    • Make sure to replace 'table_name' and 'date_column' with the appropriate values in your query.

  • Answered by AI
  • Q3. What operator is used in composer to move data from gcs to bq
  • Ans. 

    The operator used in Composer to move data from GCS to BigQuery is the GCS to BigQuery operator.

    • The GCS to BigQuery operator is used in Apache Airflow, which is the underlying technology of Composer.

    • This operator allows you to transfer data from Google Cloud Storage (GCS) to BigQuery.

    • You can specify the source and destination parameters in the operator to define the data transfer process.

  • Answered by AI
  • Q4. Write a code for this - input = [1,2,3,4] output = [1,4,9,16]
  • Ans. 

    Code to square each element in the input array.

    • Iterate through the input array and square each element.

    • Store the squared values in a new array to get the desired output.

  • Answered by AI
  • Q5. Dataflow vs dataproc.
  • Ans. 

    Dataflow is a fully managed stream and batch processing service, while Dataproc is a managed Apache Spark and Hadoop service.

    • Dataflow is a serverless data processing service that automatically scales to handle your data, while Dataproc is a managed Spark and Hadoop service that requires you to provision and manage clusters.

    • Dataflow is designed for both batch and stream processing, allowing you to process data in real-t...

  • Answered by AI
  • Q6. Architecture of bq. Query optimization techniques in bigquery.
  • Ans. 

    BigQuery architecture includes storage, execution, and optimization components for efficient query processing.

    • BigQuery stores data in Capacitor storage system for fast access.

    • Query execution is distributed across multiple nodes for parallel processing.

    • Query optimization techniques include partitioning tables, clustering tables, and using query cache.

    • Using partitioned tables can help eliminate scanning unnecessary data.

    • ...

  • Answered by AI
  • Q7. RDD vs dataframe vs dataset in pyspark
  • Ans. 

    RDD vs dataframe vs dataset in PySpark

    • RDD (Resilient Distributed Dataset) is the basic abstraction in PySpark, representing a distributed collection of objects

    • Dataframe is a distributed collection of data organized into named columns, similar to a table in a relational database

    • Dataset is a distributed collection of data with the ability to use custom classes for type safety and user-defined functions

    • Dataframes and Data...

  • Answered by AI

Data Engineer Interview Questions & Answers

Cognizant user image Abhishek Paithankar

posted on 16 Nov 2024

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Aptitude Test 

Aptitude test involved with quantative aptitude, logical reasoning and reading comprehensions.

Round 2 - Technical 

(2 Questions)

  • Q1. Tell me your introduction.
  • Q2. Tell me about your skills.
  • Ans. 

    I have strong skills in data processing, ETL, data modeling, and programming languages like Python and SQL.

    • Proficient in data processing and ETL techniques

    • Strong knowledge of data modeling and database design

    • Experience with programming languages like Python and SQL

    • Familiarity with big data technologies such as Hadoop and Spark

  • Answered by AI
Round 3 - HR 

(2 Questions)

  • Q1. Are you ready relocate,?
  • Ans. 

    Yes, I am open to relocating for the right opportunity.

    • I am willing to relocate for the right job opportunity.

    • I have experience moving for previous roles.

    • I am flexible and adaptable to new locations.

    • I am excited about the possibility of exploring a new city or country.

  • Answered by AI
  • Q2. Document verification

Interview Preparation Tips

Interview preparation tips for other job seekers - If you are fresher first prepare for aptitude, because once aptitude get cleared you will get selected from the large compitition and then focus on your technical knowledge and managerial skills about the company.
Interview experience
4
Good
Difficulty level
Moderate
Process Duration
Less than 2 weeks
Result
Not Selected

I applied via Naukri.com and was interviewed in Sep 2024. There were 2 interview rounds.

Round 1 - Aptitude Test 

Round 1 was online Test where one basic coding question was there and few aptitiude , verbal ability and python based question.

Round 2 - One-on-one 

(2 Questions)

  • Q1. What is Data WareHouse..?
  • Ans. 

    A Data Warehouse is a centralized repository that stores integrated data from multiple sources for analysis and reporting.

    • Data Warehouses are designed for query and analysis rather than transaction processing.

    • They often contain historical data and are used for decision-making purposes.

    • Data Warehouses typically use a dimensional model with facts and dimensions.

    • Examples of Data Warehouse tools include Amazon Redshift, Sn

  • Answered by AI
  • Q2. Nested Queries in Bigquery..?
  • Ans. 

    Nested queries in BigQuery allow for querying data from within another query, enabling complex data analysis.

    • Nested queries are queries that are embedded within another query

    • They can be used to perform subqueries to filter, aggregate, or manipulate data

    • Nested queries can be used in SELECT, FROM, WHERE, and HAVING clauses

  • Answered by AI

Interview Preparation Tips

Topics to prepare for Deloitte Data Engineer interview:
  • Python
  • SQL
  • Cloud
Interview experience
3
Average
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Difference between select and withcolumn in pyspark
  • Ans. 

    select is used to select specific columns from a DataFrame, while withColumn is used to add or update columns in a DataFrame.

    • select is used to select specific columns from a DataFrame

    • withColumn is used to add or update columns in a DataFrame

    • select does not modify the original DataFrame, while withColumn returns a new DataFrame with the added/updated column

    • Example: df.select('col1', 'col2') - selects columns col1 and co...

  • Answered by AI
  • Q2. Difference between variables and parameters in ADF
  • Ans. 

    Variables are used to store values that can be changed, while parameters are used to pass values into activities in ADF.

    • Variables can be modified within a pipeline, while parameters are set at runtime and cannot be changed within the pipeline.

    • Variables are defined within a pipeline, while parameters are defined at the pipeline level.

    • Variables can be used to store intermediate values or results, while parameters are use...

  • Answered by AI

Skills evaluated in this interview

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. SQL - Given 2 tables with some nullls and asked to output the count of rows we get for all types of joins
  • Q2. Types of joins and explain cross join
  • Ans. 

    Types of joins in SQL include inner, outer, left, right, and cross join.

    • Inner join: Returns rows when there is a match in both tables

    • Outer join: Returns all rows when there is a match in one of the tables

    • Left join: Returns all rows from the left table and the matched rows from the right table

    • Right join: Returns all rows from the right table and the matched rows from the left table

    • Cross join: Returns the Cartesian produ

  • Answered by AI
Round 2 - HR 

(1 Question)

  • Q1. Projects and impact

Interview Preparation Tips

Interview preparation tips for other job seekers - For Data Engineer Internship, they ask easy to medium DSA and sql questions. That is it. Be good with basics

Skills evaluated in this interview

Interview experience
3
Average
Difficulty level
Easy
Process Duration
-
Result
Selected Selected

I applied via Walk-in

Round 1 - Technical 

(2 Questions)

  • Q1. Spark architecture
  • Q2. Spark Optimisation techniques
  • Ans. 

    Spark optimization techniques aim to improve performance and efficiency of Spark jobs.

    • Use partitioning to distribute data evenly

    • Cache intermediate results to avoid recomputation

    • Optimize shuffle operations by reducing data shuffling

    • Use broadcast variables for small lookup tables

    • Tune memory and executor settings for better performance

  • Answered by AI

Skills evaluated in this interview

Data Engineer Interview Questions & Answers

TCS user image Atharva Bhangre

posted on 16 Jul 2024

Interview experience
5
Excellent
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(2 Questions)

  • Q1. Spark related questions
  • Q2. Hadoop architecture and ecosystem questions

Data Engineer Interview Questions & Answers

Infosys user image Rajamanickam S

posted on 30 Apr 2024

Interview experience
4
Good
Difficulty level
-
Process Duration
-
Result
-
Round 1 - Technical 

(1 Question)

  • Q1. About the data structures

CivicDataLab Interview FAQs

How many rounds are there in CivicDataLab Data Engineer interview for freshers?
CivicDataLab interview process for freshers usually has 1 rounds. The most common rounds in the CivicDataLab interview process for freshers are Assignment.

Tell us how to improve this page.

Data Engineer Interview Questions from Similar Companies

View all
Compare CivicDataLab with

TCS

3.7
Compare

Accenture

3.9
Compare

Wipro

3.7
Compare

Cognizant

3.8
Compare

Calculate your in-hand salary

Confused about how your in-hand salary is calculated? Enter your annual salary (CTC) and get your in-hand salary
Did you find this page helpful?
Yes No
write
Share an Interview