10+ Atal Indore City Transport Services Interview Questions and Answers

Updated 26 Nov 2024

Q1. How can you improve query performance?

Ans.

Query performance can be improved by optimizing indexes, using appropriate data types, and minimizing data retrieval.

  • Optimize indexes on frequently queried columns

  • Use proper data types to reduce storage space and improve query speed

  • Minimize data retrieval by only selecting necessary columns

  • Avoid using SELECT * in queries

  • Use query execution plans to identify bottlenecks and optimize accordingly
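As a quick illustration on the Spark side, a minimal PySpark sketch (the table and column names are hypothetical) showing column pruning instead of SELECT * and inspecting the execution plan:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("query-perf").getOrCreate()

    # Select only the columns you need instead of SELECT *
    orders = spark.table("sales.orders").select("order_id", "customer_id", "amount")

    # Filter early so less data flows through later stages
    recent = orders.filter(orders.amount > 100)

    # Inspect the physical plan to spot full scans, shuffles, and other bottlenecks
    recent.explain(mode="formatted")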


Q2. Difference between a normal (all-purpose) cluster and a job cluster in Databricks

Ans.

Normal cluster is used for interactive workloads while job cluster is used for batch processing in Databricks.

  • Normal cluster is used for ad-hoc queries and exploratory data analysis.

  • Job cluster is used for running scheduled jobs and batch processing tasks.

  • Normal cluster can auto-terminate after a period of inactivity, while job cluster is terminated as soon as the job completes.

  • Normal cluster is more cost-effective for short-lived interactive workloads, while job cluster is more cost-effective for recurring scheduled jobs.
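To make the distinction concrete, here is a sketch of a job definition with an embedded job cluster, loosely following Databricks Jobs API 2.1 payload conventions (all values are placeholders; verify field names against your workspace's API docs):

    # Sketch: a job cluster is declared inline in the job definition and
    # exists only for the duration of the run.
    job_payload = {
        "name": "nightly-etl",
        "tasks": [
            {
                "task_key": "etl",
                "notebook_task": {"notebook_path": "/Repos/etl/main"},  # hypothetical path
                "new_cluster": {  # job cluster: created for this run, terminated after it
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "num_workers": 2,
                },
            }
        ],
    }
    # An interactive (all-purpose) cluster would instead be referenced via
    # "existing_cluster_id" and kept alive between runs.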


Q3. How to read a file in Databricks

Ans.

To read a file in Databricks, you can use the Databricks File System (DBFS) or Spark APIs.

  • Use dbutils.fs.ls('dbfs:/path/to/file') to list files in DBFS

  • Use spark.read.format('csv').load('dbfs:/path/to/file') to read a CSV file

  • Use spark.read.format('parquet').load('dbfs:/path/to/file') to read a Parquet file
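A slightly fuller sketch with common read options (the paths are placeholders; `spark` is the ambient session in a Databricks notebook):

    # CSV needs options: a header row, and schema inference for column types
    df_csv = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("dbfs:/path/to/file.csv"))

    # Parquet files carry their own schema, so no options are required
    df_parquet = spark.read.parquet("dbfs:/path/to/file.parquet")

    df_csv.show(5)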


Q4. How to create a pipeline in ADF?

Ans.

To create a pipeline in ADF, you can use the Azure Data Factory UI or a code-based approach.

  • Use Azure Data Factory UI to visually create and manage pipelines

  • Use code-based approach with JSON to define pipelines and activities

  • Add activities such as data movement, data transformation, and data processing to the pipeline

  • Set up triggers and schedules for the pipeline to run automatically
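As a sketch of the code-based approach, assuming the azure-mgmt-datafactory Python SDK (all resource names below are placeholders, and the exact model classes may vary by SDK version):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

    # Placeholder identifiers -- substitute your own subscription and resources
    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # A minimal pipeline with a single Wait activity, just to show the shape
    pipeline = PipelineResource(
        activities=[WaitActivity(name="wait1", wait_time_in_seconds=5)]
    )
    client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "demo_pipeline", pipeline
    )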


Q5. Different types of activities in pipelines

Ans.

Activities in pipelines include data extraction, transformation, loading, and monitoring.

  • Data extraction: Retrieving data from various sources such as databases, APIs, and files.

  • Data transformation: Cleaning, filtering, and structuring data for analysis.

  • Data loading: Loading processed data into a data warehouse or database.

  • Monitoring: Tracking the performance and health of the pipeline to ensure data quality and reliability.
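These stages map naturally onto small functions. A minimal, generic PySpark sketch (the source path, `amount` column, and table name are hypothetical; `spark` is the ambient session of a Databricks notebook):

    def extract(spark, path):
        # Extraction: pull raw data from a source (file, API dump, database export)
        return spark.read.option("header", "true").csv(path)

    def transform(df):
        # Transformation: clean, filter, and structure the data
        return df.dropna().filter(df.amount > 0)

    def load(df, table_name):
        # Loading: persist the processed data to a warehouse table
        df.write.mode("overwrite").saveAsTable(table_name)

    # Monitoring would wrap these calls with logging and metrics in a real pipeline
    load(transform(extract(spark, "dbfs:/raw/orders.csv")), "analytics.orders")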


Q6. What are slowly changing dimensions?

Ans.

Slowly changing dimensions refer to data warehouse dimensions that change slowly over time.

  • SCDs are used to track historical changes in data over time.

  • The most common types are Type 1, Type 2, and Type 3.

  • Type 1 SCDs overwrite old data with new data, Type 2 creates new records for changes, and Type 3 maintains both old and new data in separate columns.

  • Example: A customer's address changing would be a Type 2 SCD.

  • Example: A product's price changing would be a Type 1 SCD.
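To make the difference concrete, a plain-Python illustration of how the same address change is stored under Type 1 vs Type 2 (column names are illustrative):

    # Type 1: overwrite in place -- history is lost
    customer_t1 = {"customer_id": 42, "city": "Indore"}
    customer_t1["city"] = "Pune"  # the old value "Indore" is gone

    # Type 2: close out the old row and append a new one -- history is preserved
    customer_t2 = [
        {"customer_id": 42, "city": "Indore", "valid_from": "2020-01-01",
         "valid_to": "2024-06-30", "is_current": False},
        {"customer_id": 42, "city": "Pune", "valid_from": "2024-07-01",
         "valid_to": None, "is_current": True},
    ]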


Q7. How to create a workflow in Databricks

Ans.

To create a workflow in Databricks, use Databricks Jobs or Databricks Notebooks with scheduling capabilities.

  • Use Databricks Jobs to create and schedule workflows in Databricks.

  • Utilize Databricks Notebooks to define the workflow steps and dependencies.

  • Leverage Databricks Jobs API for programmatic workflow creation and management.

  • Use Databricks Jobs UI to visually design and schedule workflows.

  • Integrate with Databricks Delta for optimized data processing and storage.
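One lightweight pattern is a driver notebook that orchestrates the steps with dbutils.notebook.run (available only inside Databricks notebooks; the paths and parameters here are hypothetical):

    # Run each step in order; the child notebook's dbutils.notebook.exit() value
    # is returned as a string and can be passed to the next step.
    raw_path = dbutils.notebook.run("/Workspace/etl/ingest", 3600,
                                    {"date": "2024-11-26"})
    dbutils.notebook.run("/Workspace/etl/transform", 3600,
                         {"input_path": raw_path})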


Q8. What is the use of the Get Metadata activity?

Ans.

The Get Metadata activity is used to retrieve metadata about a dataset or data source.

  • It can provide information about the structure, format, and properties of the data.

  • It can be used to understand the data schema, column names, data types, and any constraints or relationships.

  • This information is helpful for data engineers to properly process, transform, and analyze the data.

  • For example, Get Metadata can be used in data pipelines to dynamically adjust processing logic based on the dataset's structure.
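In ADF itself this is an activity whose output feeds pipeline expressions; the same idea expressed in PySpark terms is inspecting the schema at runtime and branching on it (the path and the optional `discount` column are hypothetical):

    # Read the data, then adapt processing to whatever columns actually arrived
    df = spark.read.parquet("dbfs:/landing/events")
    print(df.schema.simpleString())  # inspect column names and types

    # Branch the logic based on the discovered metadata
    if "discount" in df.columns:  # hypothetical optional column
        df = df.withColumn("net", df["amount"] - df["discount"])
    else:
        df = df.withColumn("net", df["amount"])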


Q9. What is an SCD Type 2 table?

Ans.

SCD type2 table is used to track historical changes in data by creating new records for each change.

  • Contains current and historical data

  • New records are created for each change

  • Includes effective start and end dates for each record

  • Requires additional columns like surrogate keys and version numbers

  • Used for slowly changing dimensions in data warehousing
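A sketch of a Type 2 load using a Delta Lake MERGE via spark.sql (table and column names are illustrative; a full implementation would also restrict the INSERT to changed rows):

    # Close out changed rows, then insert the new versions as current.
    # Assumes Delta tables dim_customer (target) and stg_customer (staged changes).
    spark.sql("""
        MERGE INTO dim_customer AS t
        USING stg_customer AS s
          ON t.customer_id = s.customer_id AND t.is_current = true
        WHEN MATCHED AND t.city <> s.city THEN
          UPDATE SET is_current = false, valid_to = current_date()
    """)
    spark.sql("""
        INSERT INTO dim_customer
        SELECT s.customer_id, s.city, current_date() AS valid_from,
               NULL AS valid_to, true AS is_current
        FROM stg_customer AS s
    """)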


Q10. Spark performance problems and scenarios

Ans.

Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.

  • Inefficient code can lead to slow performance, such as using collect() on large datasets.

  • Data skew can cause uneven distribution of data across partitions, impacting processing time.

  • Resource constraints like insufficient memory or CPU can result in slow Spark jobs.

  • Improper configuration settings, such as too few executors or poor memory allocation, can hinder performance.
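A small diagnostic sketch covering two of these points: counting rows per partition to spot skew, and keeping results bounded instead of collecting everything to the driver (the table name is hypothetical):

    from pyspark.sql.functions import spark_partition_id

    df = spark.table("sales.orders")  # hypothetical table

    # Rows per partition: a few huge partitions indicate data skew
    df.groupBy(spark_partition_id().alias("pid")).count().show()

    # Avoid collect() on large datasets; aggregate or bound the result instead
    total = df.count()                # fine: returns a single number
    preview = df.limit(20).collect()  # fine: bounded result
    # rows = df.collect()             # bad: pulls the whole dataset to the driver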


Q11. Different types of triggers

Ans.

Triggers in databases are special stored procedures that are automatically executed when certain events occur.

  • Types of triggers include: DML triggers (for INSERT, UPDATE, DELETE operations), DDL triggers (for CREATE, ALTER, DROP operations), and logon triggers.

  • Triggers can be classified as row-level triggers (executed once for each row affected by the triggering event) or statement-level triggers (executed once for each triggering event).

  • Examples of triggers include BEFORE INSERT, AFTER UPDATE, and INSTEAD OF DELETE.
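A self-contained example using Python's built-in sqlite3, with an AFTER INSERT row-level DML trigger that audits new rows:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE audit_log (order_id INTEGER, note TEXT);

        -- DML trigger, row-level: fires once per inserted row
        CREATE TRIGGER trg_order_insert AFTER INSERT ON orders
        FOR EACH ROW
        BEGIN
            INSERT INTO audit_log VALUES (NEW.id, 'order created');
        END;
    """)
    conn.execute("INSERT INTO orders (amount) VALUES (99.5)")
    print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 'order created')]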


Q12. List vs tuple in Python

Ans.

List is mutable, tuple is immutable in Python.

  • List can be modified after creation, tuple cannot be modified.

  • List uses square brackets [], tuple uses parentheses ().

  • Lists are used for collections of items that may need to be changed, tuples are used for fixed collections of items.

  • Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)
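A runnable illustration of the mutability difference:

    nums_list = [1, 2, 3]
    nums_list[0] = 99   # fine: lists are mutable
    nums_list.append(4)

    nums_tuple = (4, 5, 6)
    try:
        nums_tuple[0] = 99  # tuples are immutable
    except TypeError as e:
        print(e)  # 'tuple' object does not support item assignment

    # Immutability also makes tuples hashable, so they can serve as dict keys
    coords = {(10, 20): "depot"}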


Q13. Data Lake 1 vs Data Lake 2

Ans.

Data Lake 1 and Data Lake 2 are both storage systems for big data, but they may differ in architecture, scalability, and use cases.

  • Data Lake 1 may use a Hadoop-based architecture, while Data Lake 2 may use a cloud-based architecture like AWS S3 or Azure Data Lake Storage.

  • Data Lake 1 may be more suitable for on-premise data storage and processing, while Data Lake 2 may offer better scalability and flexibility for cloud-based environments.

  • Data Lake 1 may be more cost-effective for steady, predictable workloads, while Data Lake 2 may follow a pay-as-you-go pricing model suited to variable demand.


Q14. Star vs snowflake schema

Ans.

Star schema is denormalized with one central fact table surrounded by dimension tables, while snowflake schema is normalized with multiple related dimension tables.

  • Star schema is easier to understand and query due to denormalization.

  • Snowflake schema saves storage space by normalizing data.

  • Star schema is better for data warehousing and OLAP applications.

  • Snowflake schema suits dimensions with deep hierarchies and complex relationships, at the cost of extra joins.
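A sketch of the query-shape difference in PySpark (the tables are hypothetical): star needs one join per dimension, snowflake adds joins between dimension tables.

    fact = spark.table("fact_sales")
    dim_product = spark.table("dim_product")
    dim_category = spark.table("dim_category")  # exists only in the snowflake design

    # Star schema: the fact table joins a denormalized dimension directly
    star = fact.join(dim_product, "product_id")

    # Snowflake schema: the dimension is normalized, so an extra join is needed
    snowflake = (fact.join(dim_product, "product_id")
                     .join(dim_category, "category_id"))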


Q15. repartition vs coalesce

Ans.

repartition can increase or decrease partitions via a full shuffle, while coalesce only decreases partitions in Spark.

  • repartition shuffles data and can be used for increasing partitions for parallelism

  • coalesce reduces partitions without shuffling data, useful for reducing overhead

  • repartition is more expensive than coalesce as it involves data movement

  • example: df.repartition(10) vs df.coalesce(5)
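A runnable sketch showing the partition counts before and after (assumes the ambient spark session of a Databricks notebook):

    df = spark.range(1_000_000)        # built-in demo dataset
    print(df.rdd.getNumPartitions())   # depends on cluster defaults

    up = df.repartition(10)            # full shuffle: can increase or decrease
    print(up.rdd.getNumPartitions())   # 10

    down = up.coalesce(5)              # merges partitions without a full shuffle
    print(down.rdd.getNumPartitions()) # 5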


Q16. Use of display in Databricks

Ans.

Display in Databricks is used to visualize data in a tabular format or as charts/graphs.

  • Display function is used to show data in a tabular format in Databricks notebooks.

  • It can also be used to create visualizations like charts and graphs.

  • Display can be customized with different options like title, labels, and chart types.
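A two-line contrast (display is available only in Databricks notebooks; df.show is the portable Spark equivalent):

    df = spark.range(100)

    display(df)  # Databricks notebooks: interactive table with built-in chart options
    df.show(5)   # plain Spark: fixed-width text output, works anywhere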


Q17. Use of 'with' in Python

Ans.

Use Python's 'with' statement to ensure proper resource management and exception handling.

  • Use 'with' statement to automatically close files after use

  • Helps in managing resources like database connections

  • Ensures proper cleanup even in case of exceptions
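A minimal example: the file is closed automatically when the block exits, even if an exception is raised.

    # Without 'with', a forgotten f.close() can leak the file handle on an exception
    with open("notes.txt", "w") as f:
        f.write("hello\n")
    # f is guaranteed closed here, whether the block succeeded or raised

    # The same protocol works for other resources, e.g. locks
    import threading
    lock = threading.Lock()
    with lock:  # acquired on entry, released on exit
        pass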


Q18. Parquet file uses

Ans.

Parquet file format is a columnar storage format used for efficient data storage and processing.

  • Parquet files store data in a columnar format, which allows for efficient querying and processing of specific columns without reading the entire file.

  • It supports complex nested data structures like arrays and maps.

  • Parquet files are highly compressed, reducing storage space and improving query performance.

  • It is commonly used in big data processing frameworks like Apache Spark and Hadoop.
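A short PySpark sketch showing the columnar benefit: only the selected columns need to be read from disk (the path is a placeholder):

    # Write a small dataset as Parquet (compressed, columnar)
    df = spark.createDataFrame([(1, "a", 10.0), (2, "b", 20.0)],
                               ["id", "code", "amount"])
    df.write.mode("overwrite").parquet("dbfs:/tmp/demo.parquet")

    # Read back only two columns: the columnar layout lets Spark skip the rest
    spark.read.parquet("dbfs:/tmp/demo.parquet").select("id", "amount").show()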


Interview Process at Atal Indore City Transport Services

Based on 5 interviews in the last year: 1 interview round (Technical Round).