Fractal Analytics
Bajaj Auto Interview Questions and Answers
Q1. In a word count Spark program, which commands run on the driver and which run on the executors?
Which parts of a word count Spark program run on the driver and which run on the executors.
The code that reads the input file and creates the RDD is declared on the driver, which only builds the lineage; the executors actually read the file partitions.
The transformations that split lines into words and count them (flatMap, map, reduceByKey) execute on the executors.
Writing the output (e.g. saveAsTextFile) is also done by the executors in parallel; results come back to the driver only if an action such as collect() is called.
The driver builds the execution plan, sends tasks to executors, and coordinates the overall job.
The executors process the tasks assigned to them by the driver.
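A minimal PySpark word count sketch, annotated with where each step runs; the input and output paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

# Driver: this only builds the RDD lineage; no data is read yet.
lines = sc.textFile("hdfs:///path/to/input.txt")  # placeholder path

# Declared on the driver, but the lambdas run on the executors, one task per partition.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Action: executors write their partitions in parallel; the driver only schedules
# and coordinates the tasks. collect() would instead pull results back to the driver.
counts.saveAsTextFile("hdfs:///path/to/output")  # placeholder path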
Q2. What are the key features and functionalities of Snowflake?
Snowflake is a cloud-based data warehousing platform known for its scalability, performance, and ease of use.
Snowflake uses a multi-cluster, shared-data architecture that separates storage and compute resources for better scalability and performance.
It supports both structured and semi-structured data, allowing users to work with various data types (illustrated in the sketch after this list).
Snowflake offers features like automatic scaling, data sharing, and built-in support for SQL queries.
It provides a web interface for running SQL queries, managing warehouses, and monitoring usage.
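A hedged sketch of querying Snowflake from Python with the snowflake-connector-python package; the connection details and the events table with a VARIANT column named payload are assumptions for illustration.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials
    user="my_user",
    password="***",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()

# Standard SQL plus Snowflake's colon syntax for semi-structured (VARIANT) data.
cur.execute("""
    SELECT payload:user_id::string AS user_id,
           COUNT(*)                AS events
    FROM events
    GROUP BY 1
    ORDER BY events DESC
""")
for user_id, events in cur.fetchall():
    print(user_id, events)

cur.close()
conn.close()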
Q3. Joins in SQL, and the modelling and visualization parts of Power BI
Joins in SQL combine related tables; Power BI handles data modelling and visualization.
Joins in SQL are used to combine data from two or more tables based on a related column
There are different types of joins such as inner join, left join, right join, and full outer join (a small sketch follows this list)
PowerBI is a data visualization tool that allows users to create interactive reports and dashboards
Data modeling in PowerBI involves creating relationships between tables and defining measures and calculated columns
Visualization in Power BI is done through charts, graphs, maps, and other visuals that can be combined into interactive reports and dashboards
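A small Spark SQL sketch of the four join types; the customers and orders tables are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("joins-sketch").getOrCreate()

spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["cust_id", "name"]) \
     .createOrReplaceTempView("customers")
spark.createDataFrame([(1, 250), (3, 100)], ["cust_id", "amount"]) \
     .createOrReplaceTempView("orders")

# INNER keeps only matching rows; LEFT/RIGHT keep all rows from one side;
# FULL OUTER keeps all rows from both sides, filling gaps with NULLs.
for join_type in ["INNER", "LEFT", "RIGHT", "FULL OUTER"]:
    print(join_type)
    spark.sql(f"""
        SELECT c.cust_id, c.name, o.amount
        FROM customers c
        {join_type} JOIN orders o ON c.cust_id = o.cust_id
    """).show()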
Q4. Cumulative sum and rank functions in Spark
Explanation of cumulative sum and rank functions in Spark
Cumulative sum function calculates the running total of a column
Rank function assigns a rank to each row based on the order of values in a column
Both functions can be used with window functions in Spark
Example: df.withColumn('cumulative_sum', F.sum('column').over(Window.orderBy('order_column').rowsBetween(Window.unboundedPreceding, Window.currentRow)))
Example: df.withColumn('rank', F.rank().over(Window.orderBy('column')))
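A runnable version of these examples, assuming a toy DataFrame with columns named category and amount.

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("window-sketch").getOrCreate()

df = spark.createDataFrame(
    [("a", 10), ("a", 20), ("b", 5), ("b", 15)],
    ["category", "amount"],
)

# Rank orders rows within each category; the running frame turns SUM into a cumulative sum.
w_order = Window.partitionBy("category").orderBy("amount")
w_running = w_order.rowsBetween(Window.unboundedPreceding, Window.currentRow)

(df.withColumn("cumulative_sum", F.sum("amount").over(w_running))
   .withColumn("rank", F.rank().over(w_order))
   .show())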
Q5. Slowly changing data handling in Spark
Slowly changing data handling in Spark involves updating data over time.
Slowly changing dimensions (SCD) are used to track changes in data over time.
SCD Type 1 updates the data in place, overwriting the old values.
SCD Type 2 creates a new record for each change, with a start and end date (a sketch follows this list).
SCD Type 3 adds a new column to the existing record to track changes.
Spark provides functions like `from_unixtime` and `unix_timestamp` to handle timestamps.
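A minimal SCD Type 2 sketch in PySpark; the dimension columns (id, attr, start_date, end_date, is_current) and the use of the current date as the effective date are assumptions for illustration. In practice the same merge is often done with Delta Lake's MERGE INTO.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Current state of the dimension (end_date NULL marks the current version).
dim = spark.createDataFrame(
    [(1, "A", "2023-01-01", None, True), (2, "X", "2023-01-01", None, True)],
    "id INT, attr STRING, start_date STRING, end_date STRING, is_current BOOLEAN",
)
# Incoming records with the latest attribute values.
updates = spark.createDataFrame([(1, "B")], "id INT, attr STRING")

d, u = dim.alias("d"), updates.alias("u")
changed = d.join(u, F.col("d.id") == F.col("u.id")).where(F.col("d.attr") != F.col("u.attr"))
today = F.date_format(F.current_date(), "yyyy-MM-dd")

# Expire the old version of each changed row.
closed = changed.select(
    F.col("d.id").alias("id"), F.col("d.attr").alias("attr"),
    F.col("d.start_date").alias("start_date"),
    today.alias("end_date"), F.lit(False).alias("is_current"),
)

# Add the new version, effective today and open-ended.
new_rows = changed.select(
    F.col("u.id").alias("id"), F.col("u.attr").alias("attr"),
    today.alias("start_date"),
    F.lit(None).cast("string").alias("end_date"), F.lit(True).alias("is_current"),
)

# Carry over unchanged rows, then combine everything.
unchanged = d.join(changed.select(F.col("d.id").alias("id")), "id", "left_anti")
unchanged.unionByName(closed).unionByName(new_rows).orderBy("id", "start_date").show()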
Q6. What are window functions in SQL?
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
They are written with an OVER() clause, optionally containing PARTITION BY and ORDER BY, so each row keeps its identity while gaining an aggregated or ranking value.
They allow you to perform calculations without grouping the rows into a single output row.
Examples of window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().
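A short sketch running a ranking and an aggregate window function through Spark SQL; the sales view and its columns are made up for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-window-sketch").getOrCreate()

spark.createDataFrame(
    [("east", 10), ("east", 30), ("west", 20)],
    ["region", "amount"],
).createOrReplaceTempView("sales")

# Each row keeps its identity and gains a per-region rank and total.
spark.sql("""
    SELECT region,
           amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn,
           SUM(amount)  OVER (PARTITION BY region)                      AS region_total
    FROM sales
""").show()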
Q7. Difference between group by and distinct
Group by is used to group rows that have the same values into summary rows, while distinct is used to remove duplicate rows from a result set.
Group by is used with aggregate functions like COUNT, SUM, AVG, etc.
Distinct is used to retrieve unique values from a column or set of columns.
Group by operates on groups of rows, usually with aggregates, while distinct simply removes duplicate rows.
Group by is a clause of the SELECT statement, while distinct is used as a keyword immediately after SELECT.
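A small Spark SQL sketch contrasting the two; the orders view and its columns are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("groupby-vs-distinct").getOrCreate()

spark.createDataFrame(
    [("alice", 100), ("alice", 50), ("bob", 75)],
    ["customer", "amount"],
).createOrReplaceTempView("orders")

# DISTINCT only removes duplicate rows from the projected columns.
spark.sql("SELECT DISTINCT customer FROM orders").show()

# GROUP BY forms groups and allows aggregates within each group.
spark.sql("""
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""").show()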
Q8. Explain Sorting algorithms
Sorting algorithms are methods used to arrange elements in a specific order.
They rearrange elements into numerical, alphabetical, or another defined order by repeatedly comparing and moving elements.
Common sorting algorithms include Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, Quick Sort, and Heap Sort.
Each sorting algorithm has its own time complexity and efficiency based on the size of the input data.
Sorting algorithms can be stable (maintaining the relative order of equal elements) or unstable.
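A short merge sort sketch in Python (O(n log n), stable), as one concrete instance of the algorithms listed above.

def merge_sort(items):
    # Return a new sorted list; stable because ties keep their left-first order.
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= preserves the order of equal elements
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 2, 1]))  # [1, 2, 2, 5, 9]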
Q9. Explain project
Developed a data pipeline to ingest, process, and analyze customer behavior data for targeted marketing campaigns.
Designed and implemented ETL processes to extract data from various sources
Utilized Apache Spark for data processing and analysis
Built machine learning models to predict customer behavior
Collaborated with marketing team to optimize campaign strategies