Jana Small Finance Bank Interview Questions and Answers
Q1. What are the types of transformation?
Types of transformations include filtering, sorting, aggregating, joining, and pivoting.
Filtering: Selecting a subset of rows based on certain criteria.
Sorting: Arranging rows in a specific order based on one or more columns.
Aggregating: Combining multiple rows into a single result, such as summing or averaging values.
Joining: Combining data from multiple sources based on a common key.
Pivoting: Restructuring data from rows to columns or vice versa.
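The five transformation types above can be illustrated on a small in-memory dataset. This is a minimal sketch in plain Python, not any particular engine's API. The sample rows, the `regions` lookup table and all column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: sales rows plus a lookup table keyed by region_id
sales = [
    {"region_id": 1, "month": "Jan", "amount": 100},
    {"region_id": 2, "month": "Jan", "amount": 250},
    {"region_id": 1, "month": "Feb", "amount": 300},
    {"region_id": 2, "month": "Feb", "amount": 50},
]
regions = {1: "North", 2: "South"}

# Filtering: keep only the rows that meet a condition
big_sales = [r for r in sales if r["amount"] >= 100]

# Sorting: order rows by a column (amount, highest first)
by_amount = sorted(sales, key=lambda r: r["amount"], reverse=True)

# Aggregating: collapse many rows into one total per region
totals = defaultdict(int)
for r in sales:
    totals[r["region_id"]] += r["amount"]

# Joining: attach the region name using the common key region_id
joined = [{**r, "region": regions[r["region_id"]]} for r in sales]

# Pivoting: turn month values (rows) into columns, one row per region
pivot = defaultdict(dict)
for r in joined:
    pivot[r["region"]][r["month"]] = r["amount"]

print(dict(totals))  # {1: 400, 2: 300}
print(dict(pivot))
```

In Spark or SQL these map to `filter`/`WHERE`, `orderBy`/`ORDER BY`, `groupBy` with an aggregate, `join`, and `pivot`. The loops here only make each step explicit.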
Q2. What is SCD and what are its types?
SCD stands for Slowly Changing Dimension. The three most commonly used types are Type 1, Type 2, and Type 3.
SCD is used in data warehousing to track changes in dimension data over time.
Type 1 SCD overwrites old data with new data, losing historical information.
Type 2 SCD creates new records for each change, preserving historical data.
Type 3 SCD keeps both the old and new values in the same record, using separate columns (for example, a current value and a previous value), so only limited history is preserved.
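The difference between the three SCD types shows up when a customer's city changes. The following is a minimal sketch over a list of dicts standing in for a dimension table. The column names `is_current`, `start_date`, `end_date` and `previous_city` are hypothetical conventions, not a required schema.

```python
from datetime import date

def scd_type1(dim, key, new_city):
    """Type 1: overwrite in place; the old city is lost."""
    for row in dim:
        if row["customer_id"] == key:
            row["city"] = new_city
    return dim

def scd_type2(dim, key, new_city, change_date):
    """Type 2: expire the current row, then append a new version (full history)."""
    for row in dim:
        if row["customer_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    dim.append({"customer_id": key, "city": new_city,
                "start_date": change_date, "end_date": None, "is_current": True})
    return dim

def scd_type3(dim, key, new_city):
    """Type 3: move the current value into a 'previous' column (one level of history)."""
    for row in dim:
        if row["customer_id"] == key:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return dim

# Hypothetical customer 42 moves from Pune to Mumbai
t1 = scd_type1([{"customer_id": 42, "city": "Pune"}], 42, "Mumbai")
t2 = scd_type2([{"customer_id": 42, "city": "Pune", "start_date": date(2020, 1, 1),
                 "end_date": None, "is_current": True}], 42, "Mumbai", date(2024, 6, 1))
t3 = scd_type3([{"customer_id": 42, "city": "Pune", "previous_city": None}], 42, "Mumbai")

print(t1)  # one row, city is now Mumbai
print(t2)  # two rows: expired Pune row and current Mumbai row
print(t3)  # one row with city=Mumbai, previous_city=Pune
```

Type 2 is the only variant here that can answer "where did this customer live on a given date?", at the cost of one extra row per change.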
Q3. Why does Spark use lazy execution?
Spark uses lazy evaluation: transformations are only recorded, and nothing is computed until an action is called.
Because Spark sees the whole chain of operations before running it, it can optimize the execution plan and skip unnecessary computations.
Lazy evaluation also helps reduce unnecessary data shuffling and processing.
Example: Transformations like map, filter, and reduceByKey are not executed until an action such as collect, count, or saveAsTextFile is called.
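The idea of "record transformations, run on action" can be shown with a toy model. This is a hypothetical `LazyDataset` class written for illustration, not Spark's implementation. In PySpark the equivalent chain would be `rdd.map(...).filter(...)` followed by `.collect()`.

```python
class LazyDataset:
    """Toy model of lazy evaluation: transformations are recorded, actions execute them."""

    def __init__(self, source, steps=()):
        self._source = source        # underlying data, not touched until an action
        self._steps = tuple(steps)   # the recorded plan of transformations

    # Transformations: return a new dataset with one more recorded step; nothing runs
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Action: execute the whole plan in a single pass over the data
    def collect(self):
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):   # filter step rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []

def double(x):
    calls.append(x)  # record every time the map function actually runs
    return x * 2

pipeline = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(len(calls))          # 0: no action has been called yet
print(pipeline.collect())  # [6, 8]
```

Because the plan is known before execution, `collect` can apply map and filter together in one pass instead of materializing an intermediate list. That is a simplified analogue of how Spark pipelines narrow transformations within a stage.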
Q4. What is a linked service?
A linked service is a connection to an external data source or destination in Azure Data Factory.
Linked services define the connection information needed to connect to external data sources or destinations.
They can be used in pipelines to read from or write to the linked data source.
Examples of linked services include Azure Blob Storage, Azure SQL Database, and Salesforce.
Linked services can store connection strings, authentication details, and other configuration settings.
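In Azure Data Factory a linked service is defined as JSON. The sketch below builds such a definition for Azure Blob Storage as a Python dict and prints it. The name `BlobStorageLS` and the `<account-name>`/`<account-key>` values are hypothetical placeholders. The property layout (`type`, `typeProperties`, `connectionString`) reflects the documented ADF format as we understand it, so check it against the current ADF connector documentation before use.

```python
import json

# Hypothetical linked service name; account name and key are placeholders, not real values
linked_service = {
    "name": "BlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",  # the kind of external store being connected to
        "typeProperties": {
            # Connection information; secrets like this can instead be referenced from Azure Key Vault
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>"
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

Pipelines and datasets refer to this definition by name, so the connection details live in one place.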
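Q5. What is a dataset?
A dataset is a collection of data organized for easy access and analysis. In Azure Data Factory specifically, a dataset is a named reference to data (such as a table or a set of files) in a store defined by a linked service.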
A dataset can consist of tables, files, or other types of data sources.
It is used for storing and managing data for analysis and reporting purposes.
Examples of datasets include customer information, sales data, and sensor readings.
Datasets can be structured, semi-structured, or unstructured depending on the type of data they contain.
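For the Azure Data Factory meaning of a dataset, the sketch below builds a delimited-text (CSV) dataset definition as a Python dict. It assumes a linked service named `BlobStorageLS` already exists. That name, the dataset name `SalesCsv`, container `raw` and folder `sales/2024` are all hypothetical. The property names follow the ADF JSON format as we understand it and should be verified against the ADF documentation.

```python
import json

# Hypothetical dataset: CSV files in container "raw", folder "sales/2024"
dataset = {
    "name": "SalesCsv",
    "properties": {
        "type": "DelimitedText",
        # Points at an existing linked service by name instead of repeating connection details
        "linkedServiceName": {
            "referenceName": "BlobStorageLS",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "folderPath": "sales/2024",
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": True,
        },
    },
}

print(json.dumps(dataset, indent=2))  # Python True is written as JSON true
```

The split is the point: the linked service says how to connect, and the dataset says which data to read or write through that connection.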
Q6. What is partition pruning?
Partition pruning is a query optimization technique that reduces the amount of data scanned by excluding irrelevant partitions.
Partition pruning is used in partitioned tables to skip scanning partitions that do not contain data relevant to the query.
It helps improve query performance by reducing the amount of data that needs to be processed.
For example, if a query filters data on the partition key, partition pruning scans only the matching partitions instead of the entire table.
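The effect of partition pruning can be measured by counting how many partitions a query touches. This is a minimal sketch in which a dict keyed by a hypothetical `date` partition value stands in for a partitioned table (similar to the `date=...` folder layout that Spark's `DataFrameWriter.partitionBy` produces). The `query` helper is hypothetical and only models the idea.

```python
# Hypothetical table partitioned by date: one list of rows per partition value
partitions = {
    "2024-01-01": [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}],
    "2024-01-02": [{"id": 3, "amount": 30}],
    "2024-01-03": [{"id": 4, "amount": 40}, {"id": 5, "amount": 50}],
}

def query(partitions, date_filter, prune=True):
    """Return matching rows and the number of partitions actually scanned."""
    scanned = 0
    rows = []
    for part_value, part_rows in partitions.items():
        # Pruning: decide from the partition value alone, without reading any rows
        if prune and not date_filter(part_value):
            continue
        scanned += 1
        for row in part_rows:
            # Row-level filter: what a full scan has to apply to every row it reads
            if date_filter(part_value):
                rows.append({**row, "date": part_value})
    return rows, scanned

pruned_rows, pruned_scans = query(partitions, lambda d: d == "2024-01-02")
full_rows, full_scans = query(partitions, lambda d: d == "2024-01-02", prune=False)
print(pruned_scans, full_scans)  # 1 3
```

Both runs return the same rows, but the pruned query reads one partition instead of three. The saving only applies when the filter is on the partition column itself.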