Red Parks Interview Questions and Answers

Question 1

Asked in

Senior Data Engineer Interview

Q1. Write a query to get the customer with the highest total order value for each year, month. [Note: Order table is different and Customer table is different. Order_ID and Customer_ID are the PK of the table with...

Q2. There are 10 million records in the table and the schema does not contain the ModifiedDate column. One cell was modified the next day in the table. How will you fetch that particular information that needs to b...

Q3. How do you handle data pipeline when the schema information keeps changing at the source?

Add your answer

Answer

Handle changing schema by using schema evolution techniques and version control.

Use schema evolution techniques like adding new fields, renaming fields, and changing data types.
Implement version control to track changes and ensure backward compatibility.
Use tools like Apache Avro or Apache Parquet to store data in a self-describing format.
Implement automated testing to ensure data quality and consistency.
Collaborate with data producers to establish clear communication and doc...read more

Question 4

Asked in

Senior Data Engineer Interview

Q4. What is Normalisation and Denormalisation? When do we use them? Give a real-time example that is implemented in your project.

Add your answer

Answer

Normalisation is the process of organizing data in a database to reduce redundancy and improve data integrity. Denormalisation is the opposite process.

Normalisation is used to eliminate data redundancy and improve data integrity.
Denormalisation is used to improve query performance by reducing the number of joins required.
A real-time example of normalisation is breaking down a customer's information into separate tables such as customer details, order details, and payment deta...read more

Question 5

Asked in

Senior Data Engineer Interview

Q5. Difference between Parquet and ORC file. Why industry uses parquet over ORC? Can schema evolution happen in ORC?

Add your answer

Answer

Parquet and ORC are columnar storage formats. Parquet is preferred due to its cross-platform support and better compression. ORC supports schema evolution.

Parquet is a columnar storage format that is widely used in the industry due to its cross-platform support and better compression.
ORC is another columnar storage format that supports schema evolution.
Parquet is preferred over ORC due to its better compression and support for a wider range of programming languages.
ORC is pre...read more

Question 6

Asked in

Senior Data Engineer Interview

Q6. What are the different types of schema you know in Data Warehousing?

Add your answer

Answer

There are three types of schema in Data Warehousing: Star Schema, Snowflake Schema, and Fact Constellation Schema.

Star Schema: central fact table connected to dimension tables in a star shape
Snowflake Schema: extension of star schema with normalized dimension tables
Fact Constellation Schema: multiple fact tables connected to dimension tables in a complex structure

Question 7

Asked in

Senior Data Engineer Interview

Q7. Difference between Broadcast variable and accumulator variable

Add your answer

Answer

Broadcast variables are read-only variables that are cached on each worker node while accumulator variables are write-only variables that are used to accumulate values across multiple tasks.

Broadcast variables are used to give every node a copy of a large input dataset or a small lookup table.
Accumulator variables are used to keep a running total of values across multiple tasks.
Broadcast variables are used for read-only operations while accumulator variables are used for writ...read more

Question 8

Asked in

Senior Data Engineer Interview

Q8. Different SCD Types

Add your answer

Answer

SCD (Slowly Changing Dimensions) types are used to track changes in data over time.

Type 1: Overwrite the old data with new data
Type 2: Create a new record with a new primary key
Type 3: Create a new column to store the old data
Type 4: Create a separate table to store the old data
Type 6: Hybrid of Type 1 and Type 2

Red Parks Interview Questions and Answers

Q1. Write a query to get the customer with the highest total order value for each year, month. [Note: Order table is different and Customer table is different. Order_ID and Customer_ID are the PK of the table with...

Q2. There are 10 million records in the table and the schema does not contain the ModifiedDate column. One cell was modified the next day in the table. How will you fetch that particular information that needs to b...

Q3. How do you handle data pipeline when the schema information keeps changing at the source?

Q4. What is Normalisation and Denormalisation? When do we use them? Give a real-time example that is implemented in your project.

Q5. Difference between Parquet and ORC file. Why industry uses parquet over ORC? Can schema evolution happen in ORC?

Q6. What are the different types of schema you know in Data Warehousing?

Q7. Difference between Broadcast variable and accumulator variable

Q8. Different SCD Types

More about working at 7 Eleven

Top Senior Data Engineer Interview Questions from Similar Companies