CitiusTech
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.
Inefficient code, such as calling collect() on large datasets, can lead to slow performance.
Data skew can cause uneven distribution of data across partitions, impacting processing time.
Resource constraints like insufficient memory or CPU can result in slow Spark jobs.
Improper configuration settings, such as the number of shuffle partitions or executor memory, can also degrade performance; a brief PySpark sketch follows.
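A minimal PySpark sketch of these points, assuming hypothetical paths and column names (events, customer_id) and purely illustrative configuration values:

```python
from pyspark.sql import SparkSession, functions as F

# Configuration values and paths below are illustrative, not recommendations.
spark = (
    SparkSession.builder
    .appName("perf-sketch")
    .config("spark.sql.shuffle.partitions", "200")  # tune rather than accept the default blindly
    .getOrCreate()
)

events = spark.read.parquet("/data/events")  # hypothetical dataset

# Prefer cluster-side aggregation over collect() on large data
daily_counts = events.groupBy("event_date").count()

# Repartition on a well-distributed key before wide operations to reduce skew
balanced = events.repartition(200, "customer_id")

# Persist results with a write rather than pulling everything to the driver
daily_counts.write.mode("overwrite").parquet("/data/daily_counts")
```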
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
To create a pipeline in ADF, you can use the Azure Data Factory UI or a code-based approach.
Use Azure Data Factory UI to visually create and manage pipelines
Use a code-based approach with JSON to define pipelines and activities (a sketch of the JSON shape follows this list)
Add activities such as data movement, data transformation, and data processing to the pipeline
Set up triggers and schedules for the pipeline to run automatically
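A minimal sketch of the JSON shape such a code-based pipeline definition takes, written here as a Python dict; the pipeline, activity, and dataset names are placeholders:

```python
import json

# Rough shape of an ADF pipeline with a single Copy activity (all names are placeholders).
copy_pipeline = {
    "name": "CopySalesPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySalesData",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceSalesDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkSalesDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "ParquetSink"},
                },
            }
        ]
    },
}

# The JSON can then be deployed through the ADF UI, ARM templates, the REST API, or an SDK.
print(json.dumps(copy_pipeline, indent=2))
```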
Activities in pipelines include data extraction, transformation, loading, and monitoring.
Data extraction: Retrieving data from various sources such as databases, APIs, and files.
Data transformation: Cleaning, filtering, and structuring data for analysis.
Data loading: Loading processed data into a data warehouse or database.
Monitoring: Tracking the performance and health of the pipeline to ensure data quality and reliability.
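A compact PySpark sketch mapping these stages onto code; the paths and column names are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extraction: read raw data from a source (path is a placeholder)
raw = spark.read.option("header", "true").csv("/raw/orders.csv")

# Transformation: clean, filter, and structure the data
clean = (
    raw.dropna(subset=["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_date"))
)

# Loading: write the processed data to the curated/warehouse layer
clean.write.mode("append").parquet("/curated/orders")

# Monitoring: record simple row counts that a scheduler or alerting job could check
print(f"rows in: {raw.count()}, rows out: {clean.count()}")
```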
getmetadata is used to retrieve metadata information about a dataset or data source.
getmetadata can provide information about the structure, format, and properties of the data.
It can be used to understand the data schema, column names, data types, and any constraints or relationships.
This information is helpful for data engineers to properly process, transform, and analyze the data.
For example, getmetadata can be used to list the child items of a folder before iterating over its files; an illustrative activity definition follows.
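An illustrative Get Metadata activity definition, again sketched as a Python dict; the dataset name and requested fields are assumptions and depend on the connector:

```python
# Illustrative Get Metadata activity (dataset name is a placeholder).
get_metadata_activity = {
    "name": "GetFolderMetadata",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {"referenceName": "LandingFolderDataset", "type": "DatasetReference"},
        # Commonly requested fields; adjust to what the dataset/connector supports.
        "fieldList": ["childItems", "lastModified", "columnCount", "structure"],
    },
}
```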
Triggers in databases are special stored procedures that are automatically executed when certain events occur.
Types of triggers include: DML triggers (for INSERT, UPDATE, DELETE operations), DDL triggers (for CREATE, ALTER, DROP operations), and logon triggers.
Triggers can be classified as row-level triggers (executed once for each row affected by the triggering event) or statement-level triggers (executed once per triggering statement).
Normal cluster is used for interactive workloads while job cluster is used for batch processing in Databricks.
Normal cluster is used for ad-hoc queries and exploratory data analysis.
Job cluster is used for running scheduled jobs and batch processing tasks.
Normal cluster is terminated after a period of inactivity, while job cluster is terminated after the job completes.
Normal clusters are convenient for short-lived interactive sessions, though job clusters are generally more cost-effective for automated workloads.
Slowly changing dimensions refer to data warehouse dimensions that change slowly over time.
SCDs are used to track historical changes in data over time.
There are three types of SCDs - Type 1, Type 2, and Type 3.
Type 1 SCDs overwrite old data with new data, Type 2 creates new records for changes, and Type 3 maintains both old and new data in separate columns.
Example: A customer's address changing would be a Type 2 SCD.
Example: keeping a customer's previous and current address in separate columns would be a Type 3 SCD.
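One common way to implement a Type 2 dimension in a Spark/Databricks environment is a Delta Lake MERGE followed by an append; the sketch below assumes a hypothetical dim_customer table with customer_id, is_current, start_date, and end_date columns:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet("/staging/customer_updates")   # hypothetical incoming changes
dim = DeltaTable.forName(spark, "dim_customer")             # hypothetical Type 2 dimension

# Step 1: close out the current version of any changed customer
(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={
        "is_current": F.lit(False),
        "end_date": F.current_date(),
    })
    .execute())

# Step 2: append the new versions as fresh current rows (schemas assumed to align)
new_rows = (updates
            .withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```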
Use Python's 'with' statement to ensure proper resource management and exception handling.
Use 'with' statement to automatically close files after use
Helps in managing resources like database connections
Ensures proper cleanup even in case of exceptions
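A small, self-contained example of both cases; the file name is a placeholder:

```python
import sqlite3

# The file handle is closed automatically, even if an exception is raised inside the block.
with open("data.csv", "w", encoding="utf-8") as f:   # placeholder file name
    f.write("id,name\n")

# sqlite3 connections are also context managers: the transaction is committed on success
# or rolled back on error when the block exits.
with sqlite3.connect("example.db") as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER)")
```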
List is mutable, tuple is immutable in Python.
List can be modified after creation, tuple cannot be modified.
List uses square brackets [], tuple uses parentheses ().
Lists are used for collections of items that may need to change, while tuples are used for fixed collections of items.
Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)
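Extending that example to show the difference in behaviour:

```python
list_example = [1, 2, 3]
tuple_example = (4, 5, 6)

list_example.append(4)       # lists can grow and change in place
list_example[0] = 10         # element assignment is allowed

try:
    tuple_example[0] = 10    # tuples reject modification
except TypeError as err:
    print(err)               # 'tuple' object does not support item assignment
```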
Datalake 1 and Datalake 2 are both storage systems for big data, but they may differ in terms of architecture, scalability, and use cases.
Datalake 1 may use a Hadoop-based architecture while Datalake 2 may use a cloud-based architecture like AWS S3 or Azure Data Lake Storage.
Datalake 1 may be more suitable for on-premise data storage and processing, while Datalake 2 may offer better scalability and flexibility for cloud-based analytics workloads.
To read a file in Databricks, you can use the Databricks File System (DBFS) or Spark APIs.
Use dbutils.fs.ls('dbfs:/path/to/file') to list files in DBFS
Use spark.read.format('csv').load('dbfs:/path/to/file') to read a CSV file
Use spark.read.format('parquet').load('dbfs:/path/to/file') to read a Parquet file
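Putting those together in a notebook-style sketch; it assumes a Databricks notebook where spark and dbutils are predefined, and the paths are placeholders:

```python
# List the folder contents first, then read the files (paths are placeholders).
files = dbutils.fs.ls("dbfs:/path/to/folder")

csv_df = (spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("dbfs:/path/to/file.csv"))

parquet_df = spark.read.format("parquet").load("dbfs:/path/to/file.parquet")
csv_df.show(5)
```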
Star schema is denormalized with one central fact table surrounded by dimension tables, while snowflake schema is normalized with multiple related dimension tables.
Star schema is easier to understand and query due to denormalization.
Snowflake schema saves storage space by normalizing data.
Star schema is better for data warehousing and OLAP applications.
Snowflake schema suits large dimensions with complex relationships, where normalization reduces redundancy.
repartition increases partitions while coalesce decreases partitions in Spark
repartition shuffles data and can be used for increasing partitions for parallelism
coalesce reduces partitions without shuffling data, useful for reducing overhead
repartition is more expensive than coalesce as it involves data movement
example: df.repartition(10) vs df.coalesce(5)
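A runnable illustration of the partition counts (the numbers are arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000)
print(df.rdd.getNumPartitions())      # default parallelism

wide = df.repartition(10)             # full shuffle; can increase the partition count
narrow = wide.coalesce(5)             # merges partitions without a shuffle

print(wide.rdd.getNumPartitions())    # 10
print(narrow.rdd.getNumPartitions())  # 5
```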
Parquet file format is a columnar storage format used for efficient data storage and processing.
Parquet files store data in a columnar format, which allows for efficient querying and processing of specific columns without reading the entire file.
It supports complex nested data structures like arrays and maps.
Parquet files are highly compressed, reducing storage space and improving query performance.
It is commonly used ...
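A short PySpark example writing and reading Parquet, including a nested (array) column; the output path is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "alice", ["a", "b"]), (2, "bob", ["c"])],
    ["id", "name", "tags"],            # 'tags' is a nested array column, supported by Parquet
)

# Columnar and compressed on disk (snappy is the usual default codec)
df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/users_parquet")

# Reading back only the needed columns avoids scanning the whole file
spark.read.parquet("/tmp/users_parquet").select("id", "name").show()
```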
Improving query performance by optimizing indexes, using proper data types, and minimizing data retrieval.
Optimize indexes on frequently queried columns
Use proper data types to reduce storage space and improve query speed
Minimize data retrieval by only selecting necessary columns
Avoid using SELECT * in queries
Use query execution plans to identify bottlenecks and optimize accordingly
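In a Spark context, the same ideas look roughly like the sketch below (the table path and columns are assumptions); explain() exposes the execution plan mentioned above:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.parquet("/data/orders")   # hypothetical table

# Select only the needed columns and filter early instead of SELECT *
slim = (orders
        .select("order_id", "customer_id", "amount")
        .filter(F.col("amount") > 100))

# Inspect the physical plan to spot full scans, shuffles, and other bottlenecks
slim.explain(mode="formatted")
```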
SCD type2 table is used to track historical changes in data by creating new records for each change.
Contains current and historical data
New records are created for each change
Includes effective start and end dates for each record
Requires additional columns like surrogate keys and version numbers
Used for slowly changing dimensions in data warehousing
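A sketch of what such a table's schema might look like in PySpark; the column names are illustrative, not a standard:

```python
from pyspark.sql.types import (StructType, StructField, LongType,
                               StringType, DateType, BooleanType)

# Columns commonly seen in a Type 2 dimension (names are illustrative).
scd2_schema = StructType([
    StructField("customer_sk", LongType(), False),        # surrogate key
    StructField("customer_id", StringType(), False),      # business/natural key
    StructField("name", StringType(), True),
    StructField("address", StringType(), True),
    StructField("effective_start_date", DateType(), False),
    StructField("effective_end_date", DateType(), True),  # null for the current row
    StructField("is_current", BooleanType(), False),
    StructField("version", LongType(), False),
])
```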
Display in Databricks is used to visualize data in a tabular format or as charts/graphs.
Display function is used to show data in a tabular format in Databricks notebooks.
It can also be used to create visualizations like charts and graphs.
Display can be customized with different options like title, labels, and chart types.
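A minimal notebook example, assuming display and spark are available as they are in Databricks notebooks:

```python
# 'display' renders an interactive table with built-in chart/graph options.
df = spark.range(10).toDF("value")
display(df)
```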
To create a workflow in Databricks, use Databricks Jobs or Databricks Notebooks with scheduling capabilities.
Use Databricks Jobs to create and schedule workflows in Databricks.
Utilize Databricks Notebooks to define the workflow steps and dependencies.
Leverage Databricks Jobs API for programmatic workflow creation and management.
Use Databricks Jobs UI to visually design and schedule workflows.
Integrate with Databricks Delta Live Tables for declarative, managed pipeline orchestration; a sketch of the Jobs API call follows.
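A hedged sketch of programmatic job creation via the Jobs REST API; the workspace URL, token, notebook path, cluster id, and cron expression are all placeholders:

```python
import requests

HOST = "https://<workspace>.cloud.databricks.com"   # placeholder workspace URL
TOKEN = "<personal-access-token>"                   # placeholder token

job_spec = {
    "name": "nightly-ingest",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workflows/ingest"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",    # 2 AM daily
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())   # expected to contain the new job_id
```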
I applied via Naukri.com and was interviewed in Dec 2024. There was 1 interview round.
I got several invitation calls from three different people for the same interview at Xebia's Bangalore Brigade office. I attended the interview at Xebia on January 11, 2025, and the experience was disappointing. Despite reading several negative reviews beforehand, I chose to give the company a fair chance, but unfortunately, the concerns expressed in those reviews turned out to be valid.
From the very beginning, the process was poorly managed. I waited for over three hours before being called, while candidates who arrived after me were invited for their interviews earlier. This inconsistency immediately raised questions about the fairness of their process.
When my turn finally came, the interview began with a moderately challenging SQL question: I was asked to fetch all invalid December transaction IDs (those falling outside office hours) from a dataset, applying conditions such as working hours of 9 AM to 4 PM, Monday to Friday, excluding weekends and specific holidays (24th and 25th December). While I attempted to solve this, the interviewer interrupted repeatedly with casual, unrelated remarks. These interruptions disrupted my concentration and added unnecessary pressure, making it difficult to focus on solving the query effectively.
Following this, the interviewer moved to a Python question, which involved determining whether a given number was a perfect square. Although the problem itself was simple, it included irrelevant details, such as pre-imported libraries in a web-based IDE. This added an unnecessary layer of complexity and confusion. Again, the interviewer’s interruptions and casual talk distracted me further. Instead of focusing on assessing my logic and problem-solving skills, he seemed more interested in making irrelevant comments.
What stood out most negatively was the interviewer’s unprofessional behavior. At one point, he made an inappropriate remark about my name, comparing it to his own, which he claimed was not as "weighted."
I politely asked his name, and he replied, "Vaibhav Gupta".
While I attempted to steer the conversation back to technical discussions, his attitude remained dismissive and unfocused. He even questioned my leadership skills but turned it into an argument instead of allowing me to explain.
I also noticed disparities in how candidates were treated. For instance, a female candidate before me was given over an hour for her interview, while mine felt rushed and dismissive. While this is my personal observation, it raised concerns about bias in their evaluation process.
The interview ended abruptly and on a negative note. When I tried to discuss architectural patterns for data pipelines, the interviewer dismissed my points outright, stating that they did not need data architects. Without providing proper closure, he left the room, leaving me feeling disrespected and undervalued.
Overall, the experience was frustrating and insulting. The interviewer’s behavior was unprofessional and dismissive, and the process lacked the basic respect and fairness expected in a professional setting. Based on my experience, I strongly believe that Xebia needs to overhaul their interview practices, ensuring a more structured, unbiased, and respectful approach toward candidates.
I am relieved I was not selected, as this experience highlighted what could likely be a toxic work environment. I would not recommend Xebia to anyone, as their lack of professionalism and courtesy reflects poorly on their organizational culture.
I was interviewed in Aug 2024.
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in pyspark refers to loading only new or updated data into a dataset without reloading the entire dataset.
Use the Delta Lake format in PySpark to perform incremental loads, for example with a MERGE that upserts only new or changed records (the 'mergeSchema' option handles schema evolution, not incrementality).
Utilize the 'partitionBy' function to optimize incremental loads by partitioning the data based on specific columns.
Implement logic to identify new or updated records based on timestamps or unique keys; a watermark-based sketch follows.
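A watermark-based sketch of an incremental load; the source path, target table, and updated_at column are assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Last successfully loaded timestamp, normally read from a control table (hard-coded here).
last_loaded = "2024-10-01 00:00:00"

source = spark.read.parquet("/raw/transactions")                    # hypothetical source
increment = source.filter(F.col("updated_at") > F.lit(last_loaded))

# Append only the new/changed rows to the target Delta table.
increment.write.format("delta").mode("append").saveAsTable("bronze.transactions")
```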
I applied via Approached by Company and was interviewed in Mar 2024. There were 4 interview rounds.
Designation | Salaries reported | Salary range
Senior Software Engineer | 2.6k | ₹5.6 L/yr - ₹20 L/yr
Technical Lead | 2k | ₹7.3 L/yr - ₹25 L/yr
Software Engineer | 1.2k | ₹3.3 L/yr - ₹12.2 L/yr
Technical Lead 1 | 376 | ₹7 L/yr - ₹25.5 L/yr
Technical Lead 2 | 295 | ₹8 L/yr - ₹28 L/yr