Fractal Analytics
Bajaj Auto Interview Questions and Answers
Q1. In a word count Spark program, which commands run on the driver and which run on the executors?
Which parts of a word count Spark program run on the driver and which run on the executors.
The code that reads the input file and creates the RDD is declared on the driver, which only builds the lineage; the executors actually read the file partitions.
The transformations that split lines into words and count them (flatMap, map, reduceByKey) execute on the executors.
Writing the output (e.g. saveAsTextFile) is also done by the executors in parallel; results come back to the driver only if an action such as collect() is called.
The driver builds the execution plan, sends tasks to executors, and coordinates the overall job.
The executors process the tasks assigned to them by the driver.
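A minimal PySpark word count sketch, annotated with where each step runs; the input and output paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

# Driver: this only builds the RDD lineage; no data is read yet.
lines = sc.textFile("hdfs:///path/to/input.txt")  # placeholder path

# Declared on the driver, but the lambdas run on the executors, one task per partition.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Action: executors write their partitions in parallel; the driver only schedules
# and coordinates the tasks. collect() would instead pull results back to the driver.
counts.saveAsTextFile("hdfs:///path/to/output")  # placeholder path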
Q2. What are the key features and functionalities of Snowflake?
Snowflake is a cloud-based data warehousing platform known for its scalability, performance, and ease of use.
Snowflake uses a multi-cluster, shared-data architecture that separates storage and compute resources for better scalability and performance.
It supports both structured and semi-structured data, allowing users to work with various data types (illustrated in the sketch after this list).
Snowflake offers features like automatic scaling, data sharing, and built-in support for SQL queries.
It provides a web interface for running SQL queries, managing warehouses, and monitoring usage.
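A hedged sketch of querying Snowflake from Python with the snowflake-connector-python package; the connection details and the events table with a VARIANT column named payload are assumptions for illustration.

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials
    user="my_user",
    password="***",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()

# Standard SQL plus Snowflake's colon syntax for semi-structured (VARIANT) data.
cur.execute("""
    SELECT payload:user_id::string AS user_id,
           COUNT(*)                AS events
    FROM events
    GROUP BY 1
    ORDER BY events DESC
""")
for user_id, events in cur.fetchall():
    print(user_id, events)

cur.close()
conn.close()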
Q3. Joins in SQL, and the modelling and visualization parts of Power BI
Joins in SQL combine related tables; Power BI handles data modelling and visualization.
Joins in SQL are used to combine data from two or more tables based on a related column
There are different types of joins such as inner join, left join, right join, and full outer join (a small sketch follows this list)
PowerBI is a data visualization tool that allows users to create interactive reports and dashboards
Data modeling in PowerBI involves creating relationships between tables and defining measures and calculated columns
Visualization in Power BI is done through charts, graphs, maps, and other visuals that can be combined into interactive reports and dashboards
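A small Spark SQL sketch of the four join types; the customers and orders tables are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("joins-sketch").getOrCreate()

spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["cust_id", "name"]) \
     .createOrReplaceTempView("customers")
spark.createDataFrame([(1, 250), (3, 100)], ["cust_id", "amount"]) \
     .createOrReplaceTempView("orders")

# INNER keeps only matching rows; LEFT/RIGHT keep all rows from one side;
# FULL OUTER keeps all rows from both sides, filling gaps with NULLs.
for join_type in ["INNER", "LEFT", "RIGHT", "FULL OUTER"]:
    print(join_type)
    spark.sql(f"""
        SELECT c.cust_id, c.name, o.amount
        FROM customers c
        {join_type} JOIN orders o ON c.cust_id = o.cust_id
    """).show()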
Q4. Cumulative sum and rank functions in Spark
Explanation of cumulative sum and rank functions in Spark
Cumulative sum function calculates the running total of a column
Rank function assigns a rank to each row based on the order of values in a column
Both functions can be used with window functions in Spark
Example: df.withColumn('cumulative_sum', F.sum('column').over(Window.orderBy('order_column').rowsBetween(Window.unboundedPreceding, Window.currentRow)))
Example: df.withColumn('rank', F.rank().over(Window.orderBy('column')))
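A runnable version of these examples, assuming a toy DataFrame with columns named category and amount.

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("window-sketch").getOrCreate()

df = spark.createDataFrame(
    [("a", 10), ("a", 20), ("b", 5), ("b", 15)],
    ["category", "amount"],
)

# Rank orders rows within each category; the running frame turns SUM into a cumulative sum.
w_order = Window.partitionBy("category").orderBy("amount")
w_running = w_order.rowsBetween(Window.unboundedPreceding, Window.currentRow)

(df.withColumn("cumulative_sum", F.sum("amount").over(w_running))
   .withColumn("rank", F.rank().over(w_order))
   .show())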
Q5. Slowly changing data handling in Spark
Slowly changing data handling in Spark involves updating data over time.
Slowly changing dimensions (SCD) are used to track changes in data over time.
SCD Type 1 updates the data in place, overwriting the old values.
SCD Type 2 creates a new record for each change, with a start and end date (a sketch follows this list).
SCD Type 3 adds a new column to the existing record to track changes.
Spark provides functions like `from_unixtime` and `unix_timestamp` to handle timestamps.
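A minimal SCD Type 2 sketch in PySpark; the dimension columns (id, attr, start_date, end_date, is_current) and the use of the current date as the effective date are assumptions for illustration. In practice the same merge is often done with Delta Lake's MERGE INTO.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

# Current state of the dimension (end_date NULL marks the current version).
dim = spark.createDataFrame(
    [(1, "A", "2023-01-01", None, True), (2, "X", "2023-01-01", None, True)],
    "id INT, attr STRING, start_date STRING, end_date STRING, is_current BOOLEAN",
)
# Incoming records with the latest attribute values.
updates = spark.createDataFrame([(1, "B")], "id INT, attr STRING")

d, u = dim.alias("d"), updates.alias("u")
changed = d.join(u, F.col("d.id") == F.col("u.id")).where(F.col("d.attr") != F.col("u.attr"))
today = F.date_format(F.current_date(), "yyyy-MM-dd")

# Expire the old version of each changed row.
closed = changed.select(
    F.col("d.id").alias("id"), F.col("d.attr").alias("attr"),
    F.col("d.start_date").alias("start_date"),
    today.alias("end_date"), F.lit(False).alias("is_current"),
)

# Add the new version, effective today and open-ended.
new_rows = changed.select(
    F.col("u.id").alias("id"), F.col("u.attr").alias("attr"),
    today.alias("start_date"),
    F.lit(None).cast("string").alias("end_date"), F.lit(True).alias("is_current"),
)

# Carry over unchanged rows, then combine everything.
unchanged = d.join(changed.select(F.col("d.id").alias("id")), "id", "left_anti")
unchanged.unionByName(closed).unionByName(new_rows).orderBy("id", "start_date").show()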
Q6. What are window functions in SQL?
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
They are written with an OVER() clause, optionally containing PARTITION BY and ORDER BY, so each row keeps its identity while gaining an aggregated or ranking value.
They allow you to perform calculations without grouping the rows into a single output row.
Examples of window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().
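A short sketch running a ranking and an aggregate window function through Spark SQL; the sales view and its columns are made up for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-window-sketch").getOrCreate()

spark.createDataFrame(
    [("east", 10), ("east", 30), ("west", 20)],
    ["region", "amount"],
).createOrReplaceTempView("sales")

# Each row keeps its identity and gains a per-region rank and total.
spark.sql("""
    SELECT region,
           amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn,
           SUM(amount)  OVER (PARTITION BY region)                      AS region_total
    FROM sales
""").show()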
Q7. Difference between group by and distinct
Group by is used to group rows that have the same values into summary rows, while distinct is used to remove duplicate rows from a result set.
Group by is used with aggregate functions like COUNT, SUM, AVG, etc.
Distinct is used to retrieve unique values from a column or set of columns.
Group by operates on groups of rows, usually with aggregates, while distinct simply removes duplicate rows.
Group by is a clause of the SELECT statement, while distinct is used as a keyword immediately after SELECT.
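A small Spark SQL sketch contrasting the two; the orders view and its columns are invented for illustration.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("groupby-vs-distinct").getOrCreate()

spark.createDataFrame(
    [("alice", 100), ("alice", 50), ("bob", 75)],
    ["customer", "amount"],
).createOrReplaceTempView("orders")

# DISTINCT only removes duplicate rows from the projected columns.
spark.sql("SELECT DISTINCT customer FROM orders").show()

# GROUP BY forms groups and allows aggregates within each group.
spark.sql("""
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
""").show()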
Q8. Explain Sorting algorithms
Sorting algorithms are methods used to arrange elements in a specific order.
They rearrange elements into numerical, alphabetical, or another defined order by repeatedly comparing and moving elements.
Common sorting algorithms include Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, Quick Sort, and Heap Sort.
Each sorting algorithm has its own time complexity and efficiency based on the size of the input data.
Sorting algorithms can be stable (maintaining the relative order of equal elements) or unstable.
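A short merge sort sketch in Python (O(n log n), stable), as one concrete instance of the algorithms listed above.

def merge_sort(items):
    # Return a new sorted list; stable because ties keep their left-first order.
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= preserves the order of equal elements
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 2, 1]))  # [1, 2, 2, 5, 9]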
Q9. Explain project
Developed a data pipeline to ingest, process, and analyze customer behavior data for targeted marketing campaigns.
Designed and implemented ETL processes to extract data from various sources
Utilized Apache Spark for data processing and analysis
Built machine learning models to predict customer behavior
Collaborated with marketing team to optimize campaign strategies