CitiusTech

3.3

based on 1.8k Reviews

CitiusTech Data Engineer Interview Questions and Answers

Updated 26 Nov 2024

17 Interview questions

A Data Engineer was asked 7mo ago

Q. What is an SCD Type 2 table?

Ans.

An SCD Type 2 table tracks historical changes in dimension data by inserting a new record for each change instead of overwriting the old one.

Contains both current and historical versions of each record
A new record is created for each change
Includes effective start and end dates for each record
Typically needs additional columns such as a surrogate key and a version number or current-row flag
Used for slowly changing dimensions in data warehousing
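
The points above can be sketched in plain Python. This is only an illustration under stated assumptions: the CustomerRow fields, the apply_scd2_change helper, and the sample values are hypothetical and not taken from any particular warehouse or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int             # warehouse-generated key, unique per version
    customer_id: str               # natural (business) key
    city: str                      # the tracked attribute
    effective_from: date
    effective_to: Optional[date]   # None means the row is still current
    is_current: bool


def apply_scd2_change(rows: List[CustomerRow], customer_id: str,
                      new_city: str, change_date: date) -> List[CustomerRow]:
    """Close the current version (if the value changed) and append a new one."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.is_current), None
    )
    if current is not None and current.city == new_city:
        return rows  # nothing changed, so history stays as it is
    if current is not None:
        current.effective_to = change_date
        current.is_current = False
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date, None, True))
    return rows


# Hypothetical sample data: one customer who moves city once.
history = [CustomerRow(1, "C001", "Pune", date(2023, 1, 1), None, True)]
apply_scd2_change(history, "C001", "Mumbai", date(2024, 6, 1))
```

After the call, the old row is kept with an end date and the new row carries the next surrogate key, so both versions remain queryable.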

A Data Engineer was asked 7mo ago

Q. How can you improve query performance?

Ans.

Query performance can be improved by optimizing indexes, using appropriate data types, and minimizing the amount of data retrieved.

Optimize indexes on frequently queried columns
Use proper data types to reduce storage space and improve query speed
Minimize data retrieval by only selecting necessary columns
Avoid using SELECT * in queries
Use query execution plans to identify bottlenecks and optimize accordingly
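
As a rough illustration of the indexing and execution-plan points, the sketch below uses Python's built-in sqlite3 module. The orders table, its columns, and the index name are made up for the example, and other database engines expose plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount, notes) VALUES (?, ?, ?)",
    [(i % 100, i * 1.5, "n/a") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Select only the needed column instead of SELECT *.
query = "SELECT amount FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # without an index the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # with the index the plan switches to an index search

print(plan_before)
print(plan_after)
```

Comparing the two plans is the same habit the last bullet describes: check the plan first, then add or adjust indexes where a full scan shows up.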

A Data Engineer was asked 8mo ago

Q. What are the differences between lists and tuples in Python?

Ans.

In Python, a list is mutable and a tuple is immutable.

A list can be modified after creation; a tuple cannot.
A list uses square brackets [], while a tuple uses parentheses ().
Lists suit collections of items that may need to change; tuples suit fixed collections of items.
Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)
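
A short sketch of the difference, reusing the names from the example above (the coordinates dictionary is an extra illustration, not part of the original answer):

```python
list_example = [1, 2, 3]
tuple_example = (4, 5, 6)

list_example.append(4)   # lists can grow in place
list_example[0] = 10     # and their items can be reassigned

try:
    tuple_example[0] = 40    # tuples reject item assignment
    tuple_modified = True
except TypeError:
    tuple_modified = False

# A tuple of hashable items is itself hashable, so it can be a dict key;
# a list cannot be used this way.
coordinates = {(18.52, 73.86): "Pune"}
```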

A Data Engineer was asked 8mo ago

Q. What are the differences between repartition and coalesce?

Ans.

repartition can increase or decrease the number of partitions in Spark, while coalesce can only decrease it

repartition performs a full shuffle and is typically used to increase partitions for parallelism or to rebalance skewed data
coalesce reduces partitions by merging existing ones and avoids a full shuffle, which is useful for reducing overhead
repartition is more expensive than coalesce because it moves data across the cluster
Example: df.repartition(10) vs df.coalesce(5)
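
The behavioural difference can be modelled without a Spark cluster. The sketch below is a simplified pure-Python simulation, not the Spark API: partitions are plain lists, and the two functions are hypothetical stand-ins that only mimic how rows move.

```python
def repartition(partitions, n):
    """Model of a full shuffle: every row may move; rows are dealt
    round-robin into n new partitions, so the result is balanced."""
    rows = [row for part in partitions for row in part]
    new_parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_parts[i % n].append(row)
    return new_parts


def coalesce(partitions, n):
    """Model of coalesce: whole partitions are merged into at most n groups.
    The partition count never increases and rows are not rebalanced."""
    if n >= len(partitions):
        return [list(p) for p in partitions]
    new_parts = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        new_parts[i % n].extend(part)  # each partition moves as a unit
    return new_parts


# Four unevenly sized partitions (skewed data).
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

balanced = repartition(parts, 2)   # evenly sized, but every row was reshuffled
merged = coalesce(parts, 2)        # cheaper, but the skew is preserved
unchanged = coalesce(parts, 8)     # asking for more partitions has no effect
```

In this model coalesce leaves the skew in place, which is why repartition is usually the choice when the goal is to rebalance data rather than just reduce the partition count.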

A Data Engineer was asked 8mo ago

Q. How do you create a pipeline in ADF?

Ans.

To create a pipeline in ADF, you can use the Azure Data Factory UI or code-based approach.

Use Azure Data Factory UI to visually create and manage pipelines
Use code-based approach with JSON to define pipelines and activities
Add activities such as data movement, data transformation, and data processing to the pipeline
Set up triggers and schedules for the pipeline to run automatically
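
For the code-based approach, a pipeline is described as a JSON document. The sketch below builds one such definition in Python so it can be inspected or serialized. It follows the general shape of an ADF pipeline with a single Copy activity, but the pipeline, activity, and dataset names are hypothetical, and the exact source and sink types should be checked against the current ADF schema for the connectors in use.

```python
import json

# Hypothetical pipeline with one Copy activity (names are placeholders).
pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyBlobToSql",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "SalesBlobDataset", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SalesSqlDataset", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},  # assumed source type
                    "sink": {"type": "SqlSink"},       # assumed sink type
                },
            }
        ]
    },
}

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

The same structure is what the ADF UI shows in its code view, so a pipeline built visually can be exported and kept in source control in this form.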

A Data Engineer was asked 8mo ago

Q. What is the use of getMetadata?

Ans.

In Azure Data Factory, the Get Metadata activity retrieves metadata about a dataset, such as a file, folder, or table.

It can return properties such as item name, item type, size, last modified time, child items, structure, and column count.
It can also check whether a file or folder exists.
The output is typically passed to later activities, for example to loop over the files in a folder or to validate a file before copying it.
For example, getmetadata can be ...

A Data Engineer was asked 8mo ago

Q. What are slowly changing dimensions?

Ans.

Slowly changing dimensions (SCDs) are data warehouse dimensions whose attribute values change infrequently over time.

SCD techniques control how historical changes in dimension data are tracked.
The most commonly used types are Type 1, Type 2, and Type 3; other variants, such as Type 0, Type 4, and Type 6, also exist.
Type 1 overwrites old data with new data, Type 2 creates a new record for each change, and Type 3 keeps both old and new values in separate columns.
Example: A customer's address changing would be a Type 2 S...

A Data Engineer was asked 8mo ago

Q. What are the use cases for Parquet files?

Ans.

Parquet file format is a columnar storage format used for efficient data storage and processing.

Parquet files store data in a columnar format, which allows for efficient querying and processing of specific columns without reading the entire file.
It supports complex nested data structures like arrays and maps.
Parquet files are highly compressed, reducing storage space and improving query performance.
It is commonly ...

A Data Engineer was asked 8mo ago

Q. How do you read a file in Databricks?

Ans.

To read a file in Databricks, you can use the Databricks File System (DBFS) or Spark APIs.

Use dbutils.fs.ls('dbfs:/path/to/file') to list files in DBFS
Use spark.read.format('csv').load('dbfs:/path/to/file') to read a CSV file
Use spark.read.format('parquet').load('dbfs:/path/to/file') to read a Parquet file

A Data Engineer was asked 8mo ago

Q. What is the difference between a normal cluster and a job cluster in Databricks?

Ans.

A normal (all-purpose) cluster is used for interactive workloads, while a job cluster is created to run an automated job in Databricks.

An all-purpose cluster is used for ad-hoc queries, notebook development, and exploratory data analysis.
A job cluster is used for running scheduled jobs and batch processing tasks.
An all-purpose cluster runs until it is terminated manually or by auto-termination after a period of inactivity, while a job cluster is created when the job starts and terminated when the job completes.
Normal cluster is more cost-effective for shor...

CitiusTech Data Engineer Interview Experiences

5 interviews found

Data Engineer Interview Questions & Answers

Anonymous

posted on 21 Nov 2024

Interview experience

Excellent

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

Not Selected

I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.

Round 1 - One-on-one

(2 Questions)

Q1. Azure scenario-based questions

Q2. PySpark coding-based questions

Round 2 - One-on-one

(2 Questions)

Q1. ADF and Databricks related questions

Q2. Spark performance problems and scenarios

Ans.

Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.

Inefficient code can lead to slow performance, such as using collect() on large datasets.
Data skew can cause uneven distribution of data across partitions, impacting processing time.
Resource constraints like insufficient memory or CPU can result in slow Spark jobs.
Improper configuration settings, su...

Answered by AI

Data Engineer Interview Questions & Answers

Anonymous

posted on 3 Oct 2024

Interview experience

Good

Difficulty level

Moderate

Process Duration

2-4 weeks

Result

Not Selected

I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.

Round 1 - Technical

(14 Questions)

Q1. How to create a pipeline in ADF?

Q2. Different types of activities in pipelines

Ans.

Activities in pipelines include data extraction, transformation, loading, and monitoring.

Data extraction: Retrieving data from various sources such as databases, APIs, and files.
Data transformation: Cleaning, filtering, and structuring data for analysis.
Data loading: Loading processed data into a data warehouse or database.
Monitoring: Tracking the performance and health of the pipeline to ensure data quality and reliab...

Answered by AI

Q3. What is the use of Get Metadata?

Ans.

In Azure Data Factory, the Get Metadata activity retrieves metadata about a dataset, such as a file, folder, or table.

It can return properties such as item name, item type, size, last modified time, child items, structure, and column count.
It can also check whether a file or folder exists.
The output is typically passed to later activities, for example to loop over the files in a folder or to validate a file before copying it.
For example, getmetadata can be used ...

Answered by AI

Q4. Different types of triggers

Ans.

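In databases, triggers are special stored procedures that execute automatically when certain events occur; in Azure Data Factory, the context of the surrounding questions, the trigger types are schedule, tumbling window, and event-based.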

Types of triggers include: DML triggers (for INSERT, UPDATE, DELETE operations), DDL triggers (for CREATE, ALTER, DROP operations), and logon triggers.
Triggers can be classified as row-level triggers (executed once for each row affected by the triggering event) or statement-level triggers (executed once for eac...

Answered by AI

Q5. Difference between a normal cluster and a job cluster in Databricks

Ans.

A normal (all-purpose) cluster is used for interactive workloads, while a job cluster is created to run an automated job in Databricks.

An all-purpose cluster is used for ad-hoc queries, notebook development, and exploratory data analysis.
A job cluster is used for running scheduled jobs and batch processing tasks.
An all-purpose cluster runs until it is terminated manually or by auto-termination after a period of inactivity, while a job cluster is created when the job starts and terminated when the job completes.
Normal cluster is more cost-effective for short-liv...

Answered by AI

Q6. What are slowly changing dimensions?

Ans.

Slowly changing dimensions (SCDs) are data warehouse dimensions whose attribute values change infrequently over time.

SCD techniques control how historical changes in dimension data are tracked.
The most commonly used types are Type 1, Type 2, and Type 3; other variants, such as Type 0, Type 4, and Type 6, also exist.
Type 1 overwrites old data with new data, Type 2 creates a new record for each change, and Type 3 keeps both old and new values in separate columns.
Example: A customer's address changing would be a Type 2 SCD.
Ex...

Answered by AI

Q7. Incremental load

Q8. Use of 'with' in Python

Ans.

Use Python's 'with' statement to ensure proper resource management and exception handling.

Use 'with' statement to automatically close files after use
Helps in managing resources like database connections
Ensures proper cleanup even in case of exceptions
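
A minimal sketch of both points, using only the standard library. The managed_resource helper and the "db-connection" name are hypothetical stand-ins for a real resource such as a database connection.

```python
import os
import tempfile
from contextlib import contextmanager

events = []


@contextmanager
def managed_resource(name):
    """Record when the resource is opened and closed."""
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")  # runs even if the with-body raises


# Cleanup still happens when the body fails.
try:
    with managed_resource("db-connection"):
        raise ValueError("simulated failure")
except ValueError:
    pass

# Files opened with 'with' are closed automatically when the block exits.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as fh:
    fh.write("hello")
file_closed = fh.closed
```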

Answered by AI

Q9. List vs tuple in Python

Q10. Data Lake Gen1 vs Data Lake Gen2

Ans.

Data Lake 1 and Data Lake 2 most likely refer to Azure Data Lake Storage Gen1 and Gen2; both are storage services for big data, but they differ in architecture and integration.

Gen1 is a standalone, HDFS-compatible storage service, while Gen2 is built on Azure Blob Storage with a hierarchical namespace added.
Datalake 1 may be more suitable for on-premise data storage and processing, while Datalake 2 may offer better scalability and flexibility for clou...

Answered by AI

Q11. How to read a file in Databricks

Q12. Star vs snowflake schema

Ans.

Star schema is denormalized with one central fact table surrounded by dimension tables, while snowflake schema is normalized with multiple related dimension tables.

Star schema is easier to understand and query due to denormalization.
Snowflake schema saves storage space by normalizing data.
Star schema is better for data warehousing and OLAP applications.
Snowflake schema is better for OLTP systems with complex relationsh...

Answered by AI

Q13. Repartition vs coalesce

Q14. Uses of Parquet files

Ans.

Parquet file format is a columnar storage format used for efficient data storage and processing.

Parquet files store data in a columnar format, which allows for efficient querying and processing of specific columns without reading the entire file.
It supports complex nested data structures like arrays and maps.
Parquet files are highly compressed, reducing storage space and improving query performance.
It is commonly used ...

Answered by AI

Data Engineer Interview Questions & Answers

Sanjay Deo

posted on 26 Nov 2024

Interview experience

Excellent

Round 1 - Technical

(2 Questions)

Q1. How can you improve query performance?

Q2. What is an SCD Type 2 table?

Data Engineer Interview Questions & Answers

Anonymous

posted on 15 Jul 2024

Interview experience

Excellent

Round 1 - Technical

(2 Questions)

Q1. Use of display in Databricks

Ans.

Display in Databricks is used to visualize data in a tabular format or as charts/graphs.

Display function is used to show data in a tabular format in Databricks notebooks.
It can also be used to create visualizations like charts and graphs.
Display can be customized with different options like title, labels, and chart types.

Answered by AI

Q2. How to create a workflow in Databricks

Ans.

To create a workflow in Databricks, use Databricks Jobs or Databricks Notebooks with scheduling capabilities.

Use Databricks Jobs to create and schedule workflows in Databricks.
Utilize Databricks Notebooks to define the workflow steps and dependencies.
Leverage Databricks Jobs API for programmatic workflow creation and management.
Use Databricks Jobs UI to visually design and schedule workflows.
Integrate with Databricks D...

Answered by AI

Data Engineer Interview Questions & Answers

Bb Stan

posted on 24 May 2024

Interview experience

Bad

Round 1 - Technical

(1 Question)

Q1. All about SQL and Databricks, then some questions on ADF

Interview questions from similar companies

Software Engineer Interview Questions & Answers

Xoriant

Anonymous

posted on 4 May 2019

I applied via Naukri.com and was interviewed before May 2018. There were 5 interview rounds.

Interview Questionnaire

4 Questions

Q1. Telephonic technical round

Q2. Core Java: exception handling, design patterns, OOP and SOLID design principles, REST APIs, and Spring and JPA annotations

Q3. The same questions as the telephonic round but in more detail, plus a simple problem statement for which we had to justify the time and space complexity. Also REST API questions and Hibernate ORM usage.

Q4. Manager round, to check whether you have actually worked on the project: stress testing, performance questions, and scenario-based questions

Interview Preparation Tips

General Tips: Quite easy, just go in prepared
Skills: Core Java, Servlet, JSP, Hibernate, Spring, REST API, Communication, Body Language, Problem Solving, Analytical Skills, Decision Making Skills
Duration: 1-4 weeks

Software Engineer Interview Questions & Answers

Globant

Anonymous

posted on 13 Jun 2021

I applied via LinkedIn and was interviewed before Jun 2020. There were 3 interview rounds.

Interview Questionnaire

1 Question

Q1. Basics of JS

Software Engineer Interview Questions & Answers

Apexon

Anonymous

posted on 5 Feb 2022

I applied via LinkedIn and was interviewed before Feb 2021. There were 3 interview rounds.

Round 1 - Technical

(1 Question)

Q1. Basic questions such as method overloading and overriding, abstract classes and interfaces, use of static, etc.

Round 2 - Coding Test

Basic Java programs related to string and array manipulation

Round 3 - Client Round

(1 Question)

Q1. Basic questions related to work culture and privacy

Interview Preparation Tips

Topics to prepare for Apexon Software Engineer interview:

Java

Interview preparation tips for other job seekers - Study Java from basics to advanced

Software Engineer Interview Questions & Answers

HTC Global Services

ARUN PRASATH JAYABAL

posted on 15 Jul 2022

I applied via Approached by Company and was interviewed before Jul 2021. There were 2 interview rounds.

Round 1 - Aptitude Test

Basic programming questions

Round 2 - HR

(1 Question)

Q1. Salary and self-introduction discussion

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare basic interview questions and a self-introduction

Software Engineer Interview Questions & Answers

Xoriant

Anonymous

posted on 25 Apr 2022

I applied via Naukri.com and was interviewed before Apr 2021. There was 1 interview round.

Round 1 - Technical

(2 Questions)

Q1. Basic Python questions on lists, tuples, sets, and dictionaries

Q2. Decorators, generators, and Django REST framework

Interview Preparation Tips

Interview preparation tips for other job seekers - Focus on logic and basic Python fundamentals

CitiusTech Interview FAQs

How many rounds are there in CitiusTech Data Engineer interview?

CitiusTech interview process usually has 1-2 rounds. The most common rounds in the CitiusTech interview process are Technical and One-on-one Round.

How to prepare for CitiusTech Data Engineer interview?

Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at CitiusTech. The most common topics and skills that interviewers at CitiusTech expect are Python, AWS, AWS Glue, Airflow and Amazon Redshift.

What are the top questions asked in CitiusTech Data Engineer interview?

Some of the top questions asked at the CitiusTech Data Engineer interview:

How can you improve the query performan...
Difference between normal cluster and job cluster in Databri...
How to read a file in Databri...

4/5

based on 5 interview experiences

Difficulty level

Moderate 100%

Duration

Less than 2 weeks 50%

2-4 weeks 50%

CitiusTech Data Engineer Salary

based on 69 salaries

₹9.6 L/yr - ₹18.1 L/yr

11% more than the average Data Engineer Salary in India

Data Engineer Jobs at CitiusTech

GenAI - Data Engineer

Pune

3-5 Yrs

₹ 6-15.9 LPA

CitiusTech Salaries in India

Senior Software Engineer (2.7k salaries): ₹8.4 L/yr - ₹15.8 L/yr
Technical Lead (2.1k salaries): ₹12.2 L/yr - ₹21.9 L/yr
Software Engineer (1.3k salaries): ₹4.2 L/yr - ₹9.2 L/yr
Technical Lead 1 (407 salaries): ₹11.9 L/yr - ₹21.5 L/yr
Technical Lead 2 (347 salaries): ₹14.1 L/yr - ₹25.8 L/yr