Fragma Data Systems

4.4 (based on 112 reviews)

10+ Xora Software Systems Interview Questions and Answers

Updated 16 Jan 2025

Q1. There are four cores and four worker nodes in Spark. How many jobs will run in parallel?

Ans.

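By default, a single Spark application runs one job at a time; the parallelism within that job happens at the task level.

  • Each core runs one task at a time (with the default spark.task.cpus=1), so the number of cores sets the number of concurrent tasks.

  • If the cluster has four cores in total, up to four tasks run concurrently; if each of the four worker nodes has four cores, up to 16 tasks run concurrently.

  • Jobs submitted from a single driver thread run one after another, so only one job runs at a time in this scenario unless jobs are submitted from separate threads.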
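The question does not say whether the four cores are per worker or in total, so the task-slot arithmetic can be sketched under both readings. The helper name `max_parallel_tasks` is hypothetical; `cpus_per_task` stands in for Spark's `spark.task.cpus` setting, which defaults to 1.

```python
def max_parallel_tasks(workers: int, cores_per_worker: int, cpus_per_task: int = 1) -> int:
    """Upper bound on the number of tasks that can run at once across the cluster."""
    return workers * (cores_per_worker // cpus_per_task)


# Reading 1: four workers sharing four cores in total (one core each).
print(max_parallel_tasks(workers=4, cores_per_worker=1))  # 4

# Reading 2: four workers with four cores each.
print(max_parallel_tasks(workers=4, cores_per_worker=4))  # 16
```

This counts task slots only; how many jobs run at once depends on how the driver submits them.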

Q2. What are the optimisation techniques you have used in your project ?

Ans.

I have used indexing, query optimization, parallel processing, and caching in my projects.

  • Indexing: Used to improve the speed of data retrieval by creating indexes on columns frequently used in queries.

  • Query optimization: Rewriting queries to improve efficiency and reduce execution time.

  • Parallel processing: Distributing tasks across multiple processors to speed up data processing.

  • Caching: Storing frequently accessed data in memory to reduce the need for repeated retrie...

Q3. SQL: Calculate the difference in marks for each student ID across different years.

Ans.

Use a self join to pair each student's marks with the same student's marks from another year, then subtract.

  • Use a self join on the table to match rows for the same student ID across different years.

  • Calculate the difference by subtracting the earlier year's marks from the later year's marks.

  • Order the results by student ID and year so each student's differences appear together; a window function such as LAG() is an alternative to the self join.
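A minimal sketch of the self-join approach, run against an in-memory SQLite database so it is self-contained. The table `marks` and its columns (`student_id`, `year`, `marks`) are assumptions, as is the choice to compare each year with the year immediately before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student_id INTEGER, year INTEGER, marks INTEGER)")
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [(1, 2022, 70), (1, 2023, 85), (2, 2022, 90), (2, 2023, 80)],
)

# Pair each row with the same student's row from the previous year.
query = """
SELECT cur.student_id,
       prev.year AS from_year,
       cur.year  AS to_year,
       cur.marks - prev.marks AS marks_diff
FROM marks AS cur
JOIN marks AS prev
  ON prev.student_id = cur.student_id
 AND prev.year = cur.year - 1
ORDER BY cur.student_id, cur.year
"""
diff_rows = conn.execute(query).fetchall()
print(diff_rows)  # [(1, 2022, 2023, 15), (2, 2022, 2023, -10)]
```

Students with only one year of data produce no row here, because the inner join finds no previous year to match.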

Q4. SQL: For each state, which gender makes the most purchases?

Ans.

Find, for each state, the gender with the highest total purchases.

  • Aggregate the data by state and gender to calculate the total purchases made by each gender in each state.

  • Identify the gender with the highest total purchases in each state.

  • Present the results in a table or chart for easy visualization.
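One way to sketch the aggregation, again using in-memory SQLite. The table `purchases` and its columns (`state`, `gender`, `amount`) are assumptions, and "most purchases" is read here as the highest total amount rather than the highest count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (state TEXT, gender TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Karnataka", "F", 500.0), ("Karnataka", "M", 300.0), ("Karnataka", "F", 100.0),
        ("Kerala", "M", 400.0), ("Kerala", "F", 250.0),
    ],
)

# Step 1: total per (state, gender). Step 2: keep the row(s) with the state's maximum total.
query = """
WITH totals AS (
    SELECT state, gender, SUM(amount) AS total
    FROM purchases
    GROUP BY state, gender
)
SELECT t.state, t.gender, t.total
FROM totals AS t
WHERE t.total = (SELECT MAX(t2.total) FROM totals AS t2 WHERE t2.state = t.state)
ORDER BY t.state
"""
top_gender_by_state = conn.execute(query).fetchall()
print(top_gender_by_state)  # [('Karnataka', 'F', 600.0), ('Kerala', 'M', 400.0)]
```

If two genders tie within a state, this query returns both rows for that state.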

Q5. How does Spark handle fault tolerance?

Ans.

Spark handles fault tolerance through resilient distributed datasets (RDDs) and lineage tracking.

  • RDDs are immutable distributed collections of objects that can be rebuilt if a partition is lost.

  • RDDs track the lineage of transformations applied to the data, allowing lost partitions to be recomputed based on the original data and transformations.

  • Spark can also replicate cached data partitions across multiple nodes (for example, with the MEMORY_ONLY_2 storage level) to ensure availability in ca...
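The idea of rebuilding lost data from lineage can be illustrated with a toy model in plain Python. This is not Spark's API: `ToyRDD` and its methods are hypothetical names that only mimic how a dataset can remember its parent and the transformation that produced it.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: remembers its parent and the transformation applied."""

    def __init__(self, source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None):
        self.source = source  # set only for the root dataset
        self.parent = parent  # lineage: where this dataset came from
        self.fn = fn          # lineage: how it was derived
        self.cached: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=fn)

    def compute(self) -> List[int]:
        if self.cached is None:  # lost or never materialised: rebuild from lineage
            if self.parent is None:
                self.cached = list(self.source)
            else:
                self.cached = [self.fn(x) for x in self.parent.compute()]
        return self.cached


base = ToyRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)
plus_one = doubled.map(lambda x: x + 1)

first = plus_one.compute()
plus_one.cached = None  # simulate losing the computed data
doubled.cached = None
recovered = plus_one.compute()
print(first == recovered)  # True
```

Because the inputs are immutable and the transformations are recorded, replaying them gives the same result as the first computation.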

Q6. What is ADF?

Ans.

ADF stands for Azure Data Factory, a cloud-based data integration service that allows you to create, schedule, and manage data pipelines.

  • ADF is used for building, scheduling, and monitoring data pipelines to move and transform data from various sources to destinations.

  • It supports data integration between various data stores such as Azure SQL Database, Azure Blob Storage, and on-premises data sources.

  • ADF provides a visual interface for designing and monitoring data pipelines, ...

Q7. What is a DAG?

Ans.

DAG stands for Directed Acyclic Graph, a data structure used to represent dependencies between tasks in a workflow.

  • A DAG is a collection of nodes connected by edges, where each edge has a direction and no path leads from a node back to itself.

  • It is commonly used in data engineering for representing data pipelines and workflows.

  • DAGs help in visualizing and optimizing the order of tasks to be executed in a workflow.

  • Popular tools like Apache Airflow use DAGs to define and schedule data pipelines.
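A small sketch of how a DAG fixes the execution order of tasks, using Python's standard-library `graphlib` (Python 3.9+). The task names are hypothetical, and this is an illustration of topological ordering rather than how Airflow or Spark schedule work internally.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the tasks in its set (hypothetical pipeline task names).
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "load": {"aggregate"},
}
run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)  # ['extract', 'clean', 'aggregate', 'load']

# A cycle means no valid order exists, so the graph is not a DAG.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)  # True
```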

Q8. What is lineage?

Ans.

Lineage refers to the history and origin of data, including its source, transformations, and dependencies.

  • Lineage helps in understanding how data is generated, processed, and transformed throughout its lifecycle.

  • It tracks the flow of data from its source to its destination, including any intermediate steps or transformations.

  • Lineage is important for data governance, data quality, and troubleshooting data issues.

  • Examples of lineage tools include Apache Atlas, Informatica Metad...

Q9. Find duplicate rows in the given table

Ans.

Group by the columns that define a duplicate and keep the groups that occur more than once.

  • Use a SQL query with GROUP BY and a HAVING clause to identify duplicate rows based on specific columns.

  • Example: SELECT column1, column2, COUNT(*) FROM table_name GROUP BY column1, column2 HAVING COUNT(*) > 1;
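The GROUP BY and HAVING pattern can be tried end to end with in-memory SQLite. The table `employees` and its columns are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", "asha@example.com"), ("Ravi", "ravi@example.com"),
     ("Asha", "asha@example.com"), ("Asha", "asha@example.com")],
)

# Groups with more than one member are the duplicated (name, email) combinations.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM employees
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)  # [('Asha', 'asha@example.com', 3)]
```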

Q10. What are different types of Joins?

Ans.

Joins are used to combine data from two or more tables based on a related column between them.

  • Inner Join: returns only the matching rows from both tables

  • Left Join: returns all rows from the left table and matching rows from the right table, with NULLs where there is no match

  • Right Join: returns all rows from the right table and matching rows from the left table, with NULLs where there is no match

  • Full Outer Join: returns all rows from both tables, with NULLs on whichever side has no match

  • Cross Join: returns the Cartesian product of both tables
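A short sketch comparing three of these join types on in-memory SQLite. The tables `customers` and `orders` are assumptions. RIGHT and FULL OUTER joins are left out only because older SQLite versions (before 3.39) do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES (1, 'book'), (3, 'pen');
""")

# Inner join: only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Left join: every customer, with NULL (None) where no order matches.
left_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Cross join: every customer paired with every order.
cross_rows = conn.execute(
    "SELECT c.name, o.item FROM customers c CROSS JOIN orders o"
).fetchall()

print(inner_rows)       # [('Asha', 'book')]
print(left_rows)        # [('Asha', 'book'), ('Ravi', None)]
print(len(cross_rows))  # 4
```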

Q11. What is the difference between Union and Joins?

Ans.

Joins combine columns from two or more tables side by side based on a related column, while unions stack the rows of two or more queries that have the same structure.

  • Joins are used to combine data from different tables based on a related column

  • Unions are used to combine the results of queries with the same structure

  • Joins can be inner, left, right, or full; UNION removes duplicate rows, while UNION ALL keeps every row

  • Joins can have multiple conditions, while unions require the same number of columns with compatible data types
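The contrast can be sketched with two small tables in in-memory SQLite. The table names `sales_2023` and `sales_2024` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_2023 (product TEXT);
CREATE TABLE sales_2024 (product TEXT);
INSERT INTO sales_2023 VALUES ('pen'), ('book');
INSERT INTO sales_2024 VALUES ('book'), ('lamp');
""")

# UNION stacks rows and removes duplicates.
union_rows = conn.execute(
    "SELECT product FROM sales_2023 UNION SELECT product FROM sales_2024 ORDER BY product"
).fetchall()

# UNION ALL stacks rows and keeps duplicates.
union_all_rows = conn.execute(
    "SELECT product FROM sales_2023 UNION ALL SELECT product FROM sales_2024"
).fetchall()

# A join places matching rows side by side, adding columns instead of rows.
join_rows = conn.execute(
    "SELECT a.product, b.product FROM sales_2023 a "
    "JOIN sales_2024 b ON a.product = b.product"
).fetchall()

print(union_rows)           # [('book',), ('lamp',), ('pen',)]
print(len(union_all_rows))  # 4
print(join_rows)            # [('book', 'book')]
```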

Q12. PySpark code to connect to ADLS and write data with partitioning

Ans.

Use PySpark to read data from ADLS and write it back partitioned by a column.

  • Use SparkSession to create a Spark application

  • Set the configuration for ADLS storage account and container

  • Read data from ADLS using Spark DataFrame API

  • Partition the data based on a specific column while writing back to ADLS
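The steps above can be sketched as follows. Everything specific here is an assumption: ADLS Gen2 accessed through `abfss://` URIs, account-key authentication, Parquet as the file format, and the hypothetical helper names `adls_path`, `account_key_conf`, and `copy_partitioned`. Service principals or managed identities are common alternatives to an account key, and the key would normally come from a secret store rather than source code.

```python
def adls_path(container: str, account: str, folder: str) -> str:
    """Build an abfss:// URI for an ADLS Gen2 location."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder.lstrip('/')}"


def account_key_conf(account: str) -> str:
    """Name of the Spark/Hadoop setting that holds the storage account key."""
    return f"fs.azure.account.key.{account}.dfs.core.windows.net"


def copy_partitioned(spark, account, container, key, src, dst, partition_col):
    """Read Parquet from ADLS and write it back partitioned by one column.

    `spark` is an existing pyspark.sql.SparkSession; the Azure storage
    connector must be available to the cluster.
    """
    spark.conf.set(account_key_conf(account), key)
    df = spark.read.parquet(adls_path(container, account, src))
    (df.write
       .mode("overwrite")
       .partitionBy(partition_col)  # one sub-folder per distinct column value
       .parquet(adls_path(container, account, dst)))


# Usage (placeholder values, requires pyspark and a reachable storage account):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("adls-partition-demo").getOrCreate()
# copy_partitioned(spark, "mystorageacct", "raw", "<account-key>",
#                  "sales/in", "sales/out", "year")
```

Partitioning by a column that queries commonly filter on lets later reads skip the folders they do not need.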

Q13. Window functions and how you have used them

Ans.

Window functions perform calculations across a set of table rows that are related to the current row.

  • Unlike GROUP BY, a window function keeps every input row and adds the calculated value alongside it; the OVER clause defines the partition and ordering of the related rows.

  • Aggregate functions like SUM, AVG, and COUNT used with OVER calculate running totals and moving averages, while ranking functions assign positions within a partition.

  • Examples of window functions include ROW_NUMBER(), RANK(), LEAD(), and LAG().
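A minimal sketch of a running total, a previous-row lookup, and a per-group ranking, run on in-memory SQLite (window functions need SQLite 3.25 or newer). The table `sales` and its columns are assumptions made up for the example.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 1, 100), ("North", 2, 150), ("South", 1, 200), ("South", 2, 120)],
)

window_rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total,
           LAG(amount) OVER (PARTITION BY region ORDER BY month) AS prev_amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, month
    """
).fetchall()
for row in window_rows:
    print(row)
# ('North', 1, 100, 100, None, 2)
# ('North', 2, 150, 250, 100, 1)
# ('South', 1, 200, 200, None, 1)
# ('South', 2, 120, 320, 200, 2)
```

Every input row appears in the output, which is the main difference from a GROUP BY aggregate.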

Interview Process at Xora Software Systems

Interview experience: 4.3 (Good), based on 6 interviews in the last year