I applied via Company Website and was interviewed in Nov 2024. There was 1 interview round.
Window functions perform calculations across a set of table rows that are related to the current row.
Unlike aggregate functions used with GROUP BY, window functions do not collapse rows; each row is kept and receives a value computed over its window of related rows.
They are often used with aggregate functions like SUM, AVG, and COUNT to calculate running totals, moving averages, and rankings.
Examples of window functions include ROW_NUMBER(), RANK(), LEAD(), and LAG(), as sketched below.
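To make this concrete, here is a minimal PySpark sketch of ROW_NUMBER, RANK, LEAD, and a windowed SUM; the sales DataFrame and its columns (region, employee, amount) are made up purely for illustration.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("window-demo").getOrCreate()

# Hypothetical sales data: (region, employee, amount)
sales = spark.createDataFrame(
    [("North", "A", 100), ("North", "B", 250), ("South", "C", 180), ("South", "D", 180)],
    ["region", "employee", "amount"],
)

# Window per region, ordered by amount descending
w = Window.partitionBy("region").orderBy(F.desc("amount"))

result = sales.select(
    "region", "employee", "amount",
    F.row_number().over(w).alias("row_number"),    # unique sequence within each region
    F.rank().over(w).alias("rank"),                # ties share a rank, gaps follow
    F.lead("amount").over(w).alias("next_amount"), # value from the following row
    F.sum("amount").over(Window.partitionBy("region")).alias("region_total"),
)
result.show()
```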
Use PySpark to connect to ADLS, read the data, and write it back with partitioning
Use SparkSession to create a Spark application
Set the configuration for ADLS storage account and container
Read data from ADLS using Spark DataFrame API
Partition the data based on a specific column while writing back to ADLS
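A hedged sketch of those steps, assuming an ADLS Gen2 account accessed with an account key; the storage account, container, key, paths, and the partition column transaction_date are all placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-partition-demo").getOrCreate()

# Placeholder ADLS Gen2 settings -- replace with real values or a secret store
storage_account = "mystorageaccount"
container = "mycontainer"
account_key = "<storage-account-key>"

# Authenticate to ADLS Gen2 with an account key (service principal / OAuth also works)
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key,
)

base_path = f"abfss://{container}@{storage_account}.dfs.core.windows.net"

# Read source data from ADLS
df = spark.read.parquet(f"{base_path}/raw/transactions")

# Write it back partitioned by a column (assumed here: 'transaction_date')
(df.write
   .mode("overwrite")
   .partitionBy("transaction_date")
   .parquet(f"{base_path}/curated/transactions"))
```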
I applied via Naukri.com and was interviewed in Aug 2024. There was 1 interview round.
I am a Data Engineer with experience in designing and implementing project architectures. My day-to-day responsibilities include data processing, ETL tasks, and ensuring data quality.
Designing and implementing project architectures for data processing
Performing ETL tasks to extract, transform, and load data into the system
Ensuring data quality and integrity through data validation and cleansing
Collaborating with cross-functional teams to deliver reliable data solutions
Use SQL to calculate the difference in marks for each student ID across different years.
Use a self join on the table to compare marks for the same student ID across different years.
Calculate the difference in marks by subtracting the marks from different years.
Group the results by student ID to get the difference in marks for each student.
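A small sketch of the self-join approach using Spark SQL; the table student_marks and its columns (student_id, year, marks) are assumed names for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("marks-diff").getOrCreate()

# Hypothetical table: student_marks(student_id, year, marks)
spark.createDataFrame(
    [(1, 2022, 70), (1, 2023, 85), (2, 2022, 60), (2, 2023, 55)],
    ["student_id", "year", "marks"],
).createOrReplaceTempView("student_marks")

# Self join: pair each student's rows from consecutive years and subtract the marks
diff = spark.sql("""
    SELECT cur.student_id,
           cur.year,
           cur.marks - prev.marks AS marks_diff
    FROM student_marks cur
    JOIN student_marks prev
      ON cur.student_id = prev.student_id
     AND cur.year = prev.year + 1
""")
diff.show()
```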
The task is to find, for each state, which gender makes the most purchases.
Aggregate the data by state and gender to calculate the total purchases made by each gender in each state.
Identify the gender with the highest total purchases in each state.
Present the results in a table or chart for easy visualization.
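One possible PySpark sketch of the aggregate-then-rank approach; the purchases DataFrame and its columns (state, gender, amount) are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("top-gender-by-state").getOrCreate()

# Hypothetical purchases data: (state, gender, amount)
purchases = spark.createDataFrame(
    [("KA", "F", 120.0), ("KA", "M", 90.0), ("MH", "M", 300.0), ("MH", "F", 150.0)],
    ["state", "gender", "amount"],
)

# Step 1: total purchases per state and gender
totals = purchases.groupBy("state", "gender").agg(F.sum("amount").alias("total_amount"))

# Step 2: rank genders within each state and keep the top one
w = Window.partitionBy("state").orderBy(F.desc("total_amount"))
top_gender = (totals
              .withColumn("rn", F.row_number().over(w))
              .filter(F.col("rn") == 1)
              .drop("rn"))
top_gender.show()
```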
ADF stands for Azure Data Factory, a cloud-based data integration service that allows you to create, schedule, and manage data pipelines.
ADF is used for building, scheduling, and monitoring data pipelines to move and transform data from various sources to destinations.
It supports data integration between various data stores such as Azure SQL Database, Azure Blob Storage, and on-premises data sources.
ADF provides a visual, low-code interface for designing, deploying, and monitoring pipelines.
DAG stands for Directed Acyclic Graph, a data structure used to represent dependencies between tasks in a workflow.
DAG is a collection of nodes connected by edges, where each edge has a direction and there are no cycles.
It is commonly used in data engineering for representing data pipelines and workflows.
DAGs help in visualizing and optimizing the order of tasks to be executed in a workflow.
Popular tools like Apache Airflow use DAGs to define and schedule workflows, as in the sketch below.
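As a rough illustration, a minimal Airflow 2.x DAG with three tasks and directed, acyclic dependencies; the dag_id, task names, and callables are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extracting data")


def transform():
    print("transforming data")


def load():
    print("loading data")


# A three-step ETL workflow expressed as a DAG: extract -> transform -> load
with DAG(
    dag_id="example_etl_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Edges are directed and acyclic: each task runs only after its upstream finishes
    t_extract >> t_transform >> t_load
```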
Lineage refers to the history and origin of data, including its source, transformations, and dependencies.
Lineage helps in understanding how data is generated, processed, and transformed throughout its lifecycle.
It tracks the flow of data from its source to its destination, including any intermediate steps or transformations.
Lineage is important for data governance, data quality, and troubleshooting data issues.
Example: a column in a sales report can be traced back through the aggregation job that produced it to the raw transactions table it came from.
Spark handles fault tolerance through resilient distributed datasets (RDDs) and lineage tracking.
RDDs are immutable, distributed collections of objects; if a partition is lost, only that partition needs to be rebuilt rather than the whole dataset.
RDDs track the lineage of transformations applied to the data, allowing lost partitions to be recomputed based on the original data and transformations.
Spark can also replicate cached partitions (e.g., the MEMORY_ONLY_2 storage level) or checkpoint long lineage chains for extra resilience.
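A short PySpark sketch of these ideas: toDebugString() prints the lineage Spark would replay to rebuild a lost partition, and a replicated storage level keeps a second copy of cached partitions; the data itself is arbitrary.

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lineage-demo").getOrCreate()
sc = spark.sparkContext

# Build an RDD through a chain of transformations
numbers = sc.parallelize(range(1, 1001), numSlices=8)
evens = numbers.filter(lambda x: x % 2 == 0)
squared = evens.map(lambda x: x * x)

# Lineage: the recipe Spark keeps so lost partitions can be recomputed
lineage = squared.toDebugString()
print(lineage.decode("utf-8") if isinstance(lineage, bytes) else lineage)

# Optional replication of cached partitions (two in-memory copies) for extra resilience
squared.persist(StorageLevel.MEMORY_ONLY_2)
print(squared.count())
```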
By default a Spark application runs its jobs sequentially, so with four worker nodes of four cores each only one job runs at a time, while that job's tasks run in parallel across all 16 cores.
Each core executes one task at a time, so four worker nodes with four cores each provide 4 × 4 = 16 task slots.
Within a single job, Spark schedules the tasks of the current stage across all 16 slots in parallel.
Multiple jobs run concurrently only when actions are submitted from separate threads (optionally with the FAIR scheduler); otherwise jobs execute one after another.
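A hedged sketch of how the slot math surfaces in configuration; the executor and core counts are illustrative, and in local mode these settings are largely ignored (they normally come from spark-submit or the cluster manager).

```python
from pyspark.sql import SparkSession

# Request 4 executors with 4 cores each (illustrative values; on YARN/Kubernetes
# they come from --num-executors / --executor-cores or dynamic allocation)
spark = (SparkSession.builder
         .appName("parallelism-demo")
         .config("spark.executor.instances", "4")
         .config("spark.executor.cores", "4")
         .getOrCreate())

sc = spark.sparkContext

# 4 executors x 4 cores = 16 task slots; defaultParallelism usually reflects this
print("default parallelism:", sc.defaultParallelism)

# A single action launches one job whose tasks fill the available slots in parallel
print(sc.parallelize(range(1_000_000), 16).map(lambda x: x * 2).count())
```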
I have used techniques like indexing, query optimization, and parallel processing in my projects.
Indexing: Used to improve the speed of data retrieval by creating indexes on columns frequently used in queries.
Query optimization: Rewriting queries to improve efficiency and reduce execution time.
Parallel processing: Distributing tasks across multiple processors to speed up data processing.
Caching: Storing frequently accessed data in memory to avoid recomputation and repeated reads.
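A brief PySpark sketch combining caching, repartitioning, and a broadcast join hint; the file paths, tables, and column names are placeholders.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()

# Placeholder inputs -- paths and columns are illustrative
orders = spark.read.parquet("/data/orders")        # large fact table
countries = spark.read.parquet("/data/countries")  # small dimension table

# Caching: keep a frequently reused DataFrame in memory
orders_2024 = orders.filter(F.col("order_year") == 2024).cache()

# Parallel processing: control the number of partitions (and hence tasks)
orders_repart = orders_2024.repartition(64, "country_code")

# Broadcast join: ship the small table to every executor instead of shuffling the large one
enriched = orders_repart.join(F.broadcast(countries), on="country_code", how="left")

enriched.groupBy("country_name").agg(F.sum("amount").alias("total_amount")).show()
```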
I was interviewed before Oct 2022.
1. ETL Pipeline
2. PySpark Code
3. SQL
I applied via Cutshort.io and was interviewed in Nov 2021. There were 3 interview rounds.
I applied via Internshala and was interviewed before Sep 2021. There were 5 interview rounds.
I was given a PDF file containing 3 problem statements along with the expected output, for which I had to write SQL queries.
Joins are used to combine data from two or more tables based on a related column between them.
Inner Join: returns only the matching rows from both tables
Left Join: returns all rows from the left table and matching rows from the right table
Right Join: returns all rows from the right table and matching rows from the left table
Full Outer Join: returns all rows from both tables
Cross Join: returns the Cartesian product of both tables
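A compact PySpark sketch of the join types above; the employees and departments DataFrames are invented for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-demo").getOrCreate()

# Hypothetical tables
employees = spark.createDataFrame(
    [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meera", 40)],
    ["emp_id", "name", "dept_id"],
)
departments = spark.createDataFrame(
    [(10, "Engineering"), (20, "Sales"), (30, "HR")],
    ["dept_id", "dept_name"],
)

employees.join(departments, "dept_id", "inner").show()   # only matching rows
employees.join(departments, "dept_id", "left").show()    # all employees, matched departments
employees.join(departments, "dept_id", "right").show()   # all departments, matched employees
employees.join(departments, "dept_id", "full").show()    # everything from both sides
employees.crossJoin(departments).show()                  # Cartesian product
```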
Joins combine rows from two or more tables based on a related column, while unions combine rows from two or more tables with the same structure.
Joins are used to combine data from different tables based on a related column
Unions are used to combine data from tables with the same structure
Joins can be inner, left, right, or full, while a union simply stacks the rows of both result sets (UNION removes duplicates, UNION ALL keeps them)
Joins can have multiple join conditions, while unions only require the result sets to have the same number and types of columns
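A short PySpark sketch of union semantics; note that DataFrame.union() behaves like SQL UNION ALL, so distinct() is needed for UNION-style deduplication. The sales DataFrames are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union-demo").getOrCreate()

# Two tables with the same structure (same columns, same types)
sales_2023 = spark.createDataFrame([(1, 100), (2, 200)], ["order_id", "amount"])
sales_2024 = spark.createDataFrame([(3, 300), (2, 200)], ["order_id", "amount"])

# union() stacks the rows and keeps duplicates (SQL UNION ALL)
all_sales = sales_2023.union(sales_2024)

# distinct() removes duplicates, giving SQL UNION semantics
deduped = all_sales.distinct()

all_sales.show()
deduped.show()
```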
I was interviewed in Jun 2021.