Jio Platforms Big Data Engineer Interview Questions and Answers

Updated 27 Oct 2023

Jio Platforms Big Data Engineer Interview Experiences

1 interview found

Interview experience: 5 (Excellent)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Selected

I applied via Campus Placement and was interviewed before Oct 2022. There were 3 interview rounds.

Round 1 - Resume Shortlist 
Pro Tip by AmbitionBox:
Keep your resume crisp and to the point. A recruiter looks at your resume for an average of 6 seconds, make sure to leave the best impression.
Round 2 - Coding Test 

Basic coding was asked

Round 3 - Technical 

(5 Questions)

  • Q1. Technical questions on data structures and project-related technologies
  • Q2. Machine learning basics
  • Q3. Python coding question
  • Q4. Computer fundamentals
  • Q5. OOP concepts were asked

Interview Preparation Tips

Interview preparation tips for other job seekers - Just prepare well

Interview questions from similar companies

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: No response

I was interviewed in Nov 2024.

Round 1 - One-on-one 

(7 Questions)

  • Q1. Command to check disk utilisation and health in Hadoop
  • Ans. 

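    Use 'hdfs dfsadmin -report' for cluster capacity and DataNode status, and 'hdfs fsck' for file system health.

    • Run 'hdfs dfsadmin -report' to see configured capacity, used and remaining space, and live or dead DataNodes

    • Run 'hdfs dfs -df -h' for overall usage and 'hdfs dfs -du -h <path>' for usage under a path

    • Run 'hdfs fsck /' to check for missing, corrupt, or under-replicated blocks

    • Use 'hdfs diskbalancer' to even out data across the disks of a single DataNode; 'hdfs diskbalancer -plan <datanode>' generates the plan

    • Check the Hadoop logs for any disk health issues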

  • Answered by AI
  • Q2. Spark architecture and the significance of each component
  • Ans. 

    Spark architecture consists of the Driver, the Cluster Manager, and the Executors.

    • Driver: Manages the execution of Spark jobs, converts user code into tasks, and coordinates with Cluster Manager.

    • Cluster Manager: Manages resources across the cluster and allocates resources to Spark applications.

    • Executors: Execute tasks assigned by the Driver and store data in memory or disk for further pr...

  • Answered by AI
  • Q3. Partitioning and bucketing
  • Q4. Spark optimization techniques
  • Ans. 

    Optimization techniques in Spark improve performance and efficiency of data processing.

    • Partitioning data to distribute workload evenly

    • Caching frequently accessed data in memory

    • Using broadcast variables for small lookup tables

    • Avoiding shuffling operations whenever possible

    • Tuning memory settings and garbage collection parameters

  • Answered by AI
  • Q5. Second highest salary
  • Ans. 

    This is a standard SQL exercise: return the second-highest distinct salary from an employee table.

    • Take the maximum salary that is lower than the overall maximum, using a subquery.

    • Alternatively, rank salaries with DENSE_RANK() and pick the rows with rank 2.

    • Decide how to handle ties and the case where no second salary exists.

  • Answered by AI
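"Second highest salary" is usually asked as a SQL exercise rather than a question about compensation. A minimal sketch of the subquery approach, run through Python's built-in sqlite3 module; the `employee` table, its columns, and the function name are hypothetical and chosen only for illustration:

```python
import sqlite3


def second_highest_salary(rows):
    """Return the second-highest distinct salary, or None if there is none."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema used only for this example.
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    # The subquery finds the top salary; the outer MAX picks the best of the rest.
    query = """
        SELECT MAX(salary)
        FROM employee
        WHERE salary < (SELECT MAX(salary) FROM employee)
    """
    (result,) = conn.execute(query).fetchone()
    conn.close()
    return result


print(second_highest_salary([("a", 100), ("b", 300), ("c", 300), ("d", 200)]))  # 200
```

Because the comparison is against the maximum value, duplicate top salaries are treated as one, and a table with a single distinct salary yields NULL (None in Python).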
  • Q6. Creating a pivot table in SQL from a non-pivoted table
  • Ans. 

    To create a pivot table in SQL from a non-pivot table, you can use the CASE statement with aggregate functions.

    • Use the CASE statement to categorize data into columns

    • Apply aggregate functions like SUM, COUNT, AVG, etc. to calculate values for each category

    • Group the data by the columns you want to pivot on

  • Answered by AI
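The CASE-plus-aggregate approach described above can be sketched as follows, using Python's built-in sqlite3 module. The `sales` table, its columns, and the two quarter values are assumptions made for the example:

```python
import sqlite3


def pivot_sales(rows):
    """Pivot (region, quarter, amount) rows into one row per region."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical long-format table: one row per region and quarter.
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Each CASE expression routes a row's amount into the column for its quarter;
    # SUM with GROUP BY then collapses the rows for each region.
    query = """
        SELECT region,
               SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
               SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [("north", "Q1", 10), ("north", "Q2", 20), ("south", "Q1", 5), ("north", "Q1", 1)]
print(pivot_sales(rows))  # [('north', 11, 20), ('south', 5, 0)]
```

This form requires the pivot columns to be known in advance; some databases also offer a dedicated PIVOT clause, which is not shown here.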
  • Q7. How to create triggers
  • Ans. 

    Creating triggers in a database involves defining the trigger, specifying the event that will activate it, and writing the code to be executed.

    • Define the trigger using the CREATE TRIGGER statement

    • Specify the event that will activate the trigger (e.g. INSERT, UPDATE, DELETE)

    • Write the code or actions to be executed when the trigger is activated

    • Test the trigger to ensure it functions as intended

  • Answered by AI
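As an illustration of those steps, here is a small sketch in SQLite syntax, run through Python's sqlite3 module. Trigger syntax differs between database systems, and the `account` and `account_audit` tables and the trigger name are hypothetical:

```python
import sqlite3


def demo_audit_trigger():
    """Create an AFTER UPDATE trigger and return the audit rows it writes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE account_audit (
            account_id INTEGER, old_balance INTEGER, new_balance INTEGER
        );
        -- Define the trigger, name the activating event, then give the action.
        CREATE TRIGGER trg_account_update
        AFTER UPDATE OF balance ON account
        FOR EACH ROW
        BEGIN
            INSERT INTO account_audit VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
    """)
    conn.execute("INSERT INTO account VALUES (1, 100)")
    # This UPDATE fires the trigger once for the affected row.
    conn.execute("UPDATE account SET balance = 150 WHERE id = 1")
    audit = conn.execute("SELECT * FROM account_audit").fetchall()
    conn.close()
    return audit


print(demo_audit_trigger())  # [(1, 100, 150)]
```

Querying the audit table after the update doubles as the "test the trigger" step.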

Interview Preparation Tips

Interview preparation tips for other job seekers - Easy to medium questions were asked.
They focus mainly on concepts.

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Job Portal and was interviewed in May 2024. There was 1 interview round.

Round 1 - Technical 

(1 Question)

  • Q1. Explain PySpark architecture
  • Ans. 

    PySpark architecture is based on the Apache Spark architecture, with additional components for Python integration.

    • PySpark architecture includes Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.

    • It allows Python developers to interact with Spark using PySpark API.

    • PySpark architecture enables distributed processing of large datasets using RDDs and DataFrames.

    • It leverages the power of in-memory processing for fast...

  • Answered by AI

Interview experience: 1 (Bad)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Jul 2023. There were 2 interview rounds.

Round 1 - Resume Shortlist 
Round 2 - Technical 

(2 Questions)

  • Q1. Basic Questions of Scala Functional Programming concepts.
  • Q2. Spark internal working and optimization techniques
  • Ans. 

    Spark builds a DAG of transformations, evaluates it lazily, and splits it into stages at shuffle boundaries.

    • Spark uses a Directed Acyclic Graph (DAG) to plan and optimize workflows

    • Lazy evaluation lets Spark pipeline consecutive narrow transformations into a single stage

    • Caching and persistence of intermediate results can improve performance

    • Partitioning data can help in parallel processing and reducing shuffle operations

  • Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - The interview call was abruptly terminated 20 minutes in because the HR had a conflicting call. The HR called me on my cellphone and said that she would let me know if the interview panel asked to extend the call, but she never called back and the interview was not extended. They rejected me after the panel had spoken with me for just 20 minutes in the first round. This shows how unprofessional they are in scheduling interview calls; how can any panel decide within 20 minutes of discussion? I definitely do not recommend attending Big Data Engineering interviews here.

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Selected

I applied via LinkedIn and was interviewed in Jun 2023. There were 4 interview rounds.

Round 1 - Resume Shortlist 
Pro Tip by AmbitionBox:
Don’t add your photo or details such as gender, age, and address in your resume. These details do not add any value.
Round 2 - Technical 

(1 Question)

  • Q1. Project overview, Spark architecture questions, and Scala coding questions
Round 3 - Technical 

(1 Question)

  • Q1. Project overview, Scala, and AWS services questions
Round 4 - HR 

(1 Question)

  • Q1. Asked about multiple job switches and experience

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare more on spark internals
Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: -

I applied via Referral and was interviewed in Dec 2024. There were 2 interview rounds.

Round 1 - Aptitude Test 

30 Questions in 20 Minutes

Round 2 - Technical 

(1 Question)

  • Q1. Basics of SQL, Python, and AWS, plus in-depth Spark questions
Interview experience: 3 (Average)
Difficulty level: -
Process duration: -
Result: -
Round 1 - Aptitude Test 

The aptitude test lasts 30 minutes and focuses on topics relevant to data engineering, including Spark, SQL, Azure, and PySpark.

Round 2 - Coding Test 

The coding test is a one-hour examination on PySpark.

Round 3 - Technical 

(3 Questions)

  • Q1. What is the difference between cache() and persist()?
  • Q2. What is the purpose of the spark-submit command in Apache Spark?
  • Q3. What are window functions in SQL?
Round 4 - HR 

(2 Questions)

  • Q1. Could you provide more details about the daily responsibilities associated with this role?
  • Q2. How would you describe your work culture?
Interview experience: 5 (Excellent)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Selected

I applied via AmbitionBox and was interviewed in Nov 2024. There were 4 interview rounds.

Round 1 - HR 

(2 Questions)

  • Q1. About yourself
  • Q2. Communication skills
Round 2 - Technical 

(3 Questions)

  • Q1. Programming language
  • Q2. What tools do you utilize for data analysis?
  • Ans. 

    I utilize tools such as Excel, Python, SQL, and Tableau for data analysis.

    • Excel for basic data manipulation and visualization

    • Python for advanced data analysis and machine learning

    • SQL for querying databases

    • Tableau for creating interactive visualizations

  • Answered by AI
  • Q3. Pandas, NumPy, Seaborn, Matplotlib
Round 3 - Coding Test 

Analysis of code in the context of data analysis.

Round 4 - Aptitude Test 

Question paper with coding and logical questions.

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: No response

I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.

Round 1 - Technical 

(12 Questions)

  • Q1. Tell me about yourself and your project
  • Ans. 

    I am a Senior Data Engineer with experience in developing data pipelines and optimizing data storage for various projects.

    • Developed data pipelines using Apache Spark for real-time data processing

    • Optimized data storage using technologies like Hadoop and AWS S3

    • Worked on a project to analyze customer behavior and improve marketing strategies

  • Answered by AI
  • Q2. What was your day-to-day job in your project?
  • Ans. 

    My day-to-day job in the project involved designing and implementing data pipelines, optimizing data workflows, and collaborating with cross-functional teams.

    • Designing and implementing data pipelines to extract, transform, and load data from various sources

    • Optimizing data workflows to improve efficiency and performance

    • Collaborating with cross-functional teams including data scientists, analysts, and business stakeholde...

  • Answered by AI
  • Q3. Spark Architecture
  • Q4. How does the DAG handle fault tolerance?
  • Ans. 

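    In Spark, the DAG records the lineage of transformations, so lost or failed work can be recomputed rather than restored from replicas.

    • Failed tasks are rerun automatically, up to a configurable number of attempts (spark.task.maxFailures) before the job is marked as failed.

    • If a partition is lost, Spark recomputes it from its parent data by replaying the dependencies recorded in the DAG.

    • Stage boundaries and task dependencies in the DAG ensure that work is rerun in the correct order.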

  • Answered by AI
  • Q5. What is shuffling? How to Handle Shuffling?
  • Ans. 

    Shuffling is the process of redistributing data across partitions in a distributed computing environment.

    • Shuffling is necessary when data needs to be grouped or aggregated across different partitions.

    • It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.

    • Techniques like partitioning, combiners, and reducers can help reduce the amount of shuffling in MapRed

  • Answered by AI
  • Q6. What is the difference between repartition and coalesce?
  • Ans. 

    Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.

    • Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.

    • Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.

    • Repartition is typically...

  • Answered by AI
  • Q7. How do you handle Incremental data?
  • Ans. 

    Incremental data is handled by identifying new data since the last update and merging it with existing data.

    • Identify new data since last update

    • Merge new data with existing data

    • Update data warehouse or database with incremental changes

  • Answered by AI
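One common way to identify new data is a "last updated" watermark. The following is a minimal in-memory sketch under that assumption; the `updated_at` and `id` fields and the function name are hypothetical, and a real pipeline would read from and write to actual storage:

```python
from datetime import datetime


def incremental_merge(target, source_rows, last_watermark):
    """Upsert rows newer than last_watermark into target; return the new watermark."""
    new_watermark = last_watermark
    for row in source_rows:
        # Step 1: identify rows changed since the previous load.
        if row["updated_at"] > last_watermark:
            # Step 2: merge by key, inserting new ids and overwriting existing ones.
            target[row["id"]] = row
            new_watermark = max(new_watermark, row["updated_at"])
    # Step 3: the caller stores the watermark for the next run.
    return new_watermark


target = {1: {"id": 1, "value": "a", "updated_at": datetime(2024, 1, 1)}}
source = [
    {"id": 1, "value": "a2", "updated_at": datetime(2024, 1, 3)},
    {"id": 2, "value": "b", "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "value": "old", "updated_at": datetime(2023, 12, 31)},
]
watermark = incremental_merge(target, source, datetime(2024, 1, 1))
print(sorted(target), watermark)
```

Rows at or before the stored watermark are skipped, so re-running the load with the returned watermark does not reprocess old data.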
  • Q8. What is SCD?
  • Ans. 

    SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.

    • SCD is used to maintain historical data in a data warehouse.

    • The three most commonly used types of SCD are Type 1, Type 2, and Type 3.

    • Type 1 SCD overwrites old data with new data.

    • Type 2 SCD creates a new record for each change, preserving history.

    • Type 3 SCD maintains both old and new values in the same record.

    • SCD is important for...

  • Answered by AI
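To make Type 2 concrete, here is a small in-memory sketch that closes the current row and appends a new one when an attribute changes. The record layout (`key`, `attrs`, `start_date`, `end_date`) and the function name are assumptions made for illustration, not a standard schema:

```python
from datetime import date


def apply_scd2(history, key, new_attrs, effective):
    """Apply a Type 2 change: expire the current row for key and add a new one."""
    # The current version of a key is the row whose end_date is still open.
    current = next(
        (r for r in history if r["key"] == key and r["end_date"] is None), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, so no new version is needed
        current["end_date"] = effective  # close the old version, keeping it as history
    history.append(
        {"key": key, "attrs": new_attrs, "start_date": effective, "end_date": None}
    )
    return history


history = []
apply_scd2(history, 1, {"city": "Pune"}, date(2024, 1, 1))
apply_scd2(history, 1, {"city": "Mumbai"}, date(2024, 6, 1))
for row in history:
    print(row)
```

A Type 1 change would instead overwrite `attrs` on the existing row, and a Type 3 change would keep a "previous value" field alongside the current one.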
  • Q9. Scenario-based questions related to Spark
  • Q10. Two SQL and two Python coding questions, such as reversing a string
  • Ans. 

    Reverse a string using SQL and Python.

    • In SQL, use the REVERSE function where the dialect provides it (for example MySQL, SQL Server, and PostgreSQL).

    • In Python, use slicing with a step of -1 to reverse a string.

  • Answered by AI
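The Python half of that answer can be sketched as below. The second function is an explicit loop in case the interviewer asks for a version without slicing; both function names are illustrative:

```python
def reverse_string(text):
    """Reverse a string with slicing: a step of -1 walks it back to front."""
    return text[::-1]


def reverse_string_loop(text):
    """Reverse a string without slicing, by building it character by character."""
    chars = []
    for ch in text:
        chars.insert(0, ch)  # each new character goes to the front
    return "".join(chars)


print(reverse_string("spark"))       # kraps
print(reverse_string_loop("spark"))  # kraps
```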
  • Q11. Find top 5 countries with highest population in Spark and SQL
  • Ans. 

    Use Spark and SQL to find the top 5 countries with the highest population.

    • Use Spark to load the data and perform data processing.

    • Use SQL queries to group by country and sum the population.

    • Order the results in descending order and limit to top 5.

    • Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5

  • Answered by AI
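The SQL half of this answer can be tried out with Python's built-in sqlite3 module, as in the sketch below. The `city_population` table and the sample figures are made up for the example; in Spark the same statement could be issued through Spark SQL against a registered view:

```python
import sqlite3


def top_countries(rows, limit=5):
    """Return the `limit` countries with the largest summed population."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table with one row per city.
    conn.execute(
        "CREATE TABLE city_population (country TEXT, city TEXT, population INTEGER)"
    )
    conn.executemany("INSERT INTO city_population VALUES (?, ?, ?)", rows)
    # Group by country, sum the population, sort descending, and keep the top rows.
    query = """
        SELECT country, SUM(population) AS total_population
        FROM city_population
        GROUP BY country
        ORDER BY total_population DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


# Made-up sample data, not real population figures.
rows = [
    ("A", "a1", 10), ("A", "a2", 5), ("B", "b1", 12), ("C", "c1", 9),
    ("D", "d1", 20), ("E", "e1", 1), ("F", "f1", 7),
]
print(top_countries(rows))
```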
  • Q12. Using two tables find the different records for different joins
  • Ans. 

    Run each join type on the two tables and compare which rows each one returns.

    • Use INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN on the same key and compare the results

    • Identify the key columns in both tables to join on

    • To isolate rows that exist in only one table, use an outer join and add a WHERE clause that keeps rows where the other table's key is NULL

  • Answered by AI
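A small sketch of how join results differ on the same two tables, using Python's sqlite3 module. The tables `a` and `b` and their contents are hypothetical; RIGHT and FULL joins are left out because older SQLite versions do not support them:

```python
import sqlite3


def compare_joins():
    """Run several joins on the same two tables and return each result set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (id INTEGER);
        CREATE TABLE b (id INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (2), (3), (4);
    """)
    results = {
        # Only keys present in both tables.
        "inner": conn.execute(
            "SELECT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
        ).fetchall(),
        # Every row of a; b.id is NULL where there is no match.
        "left": conn.execute(
            "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
        ).fetchall(),
        # Anti-join: rows of a that have no partner in b.
        "only_in_a": conn.execute(
            "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL"
        ).fetchall(),
    }
    conn.close()
    return results


for name, rows in compare_joins().items():
    print(name, rows)
```

Swapping the roles of `a` and `b` in the last query gives the rows that exist only in `b`.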
Round 2 - One-on-one 

(7 Questions)

  • Q1. What is the Catalyst optimizer? How does it work?
  • Ans. 

    The Catalyst optimizer is the query optimization framework in Spark SQL; it improves performance by turning a query into an optimized execution plan.

    • Catalyst is an extensible optimizer in Apache Spark that applies rule-based and cost-based optimizations.

    • It applies rules to optimize the logical query plan and then selects a physical plan for execution.

    • The optimizer applies various optimization techniques like predicate pushdown, constant folding, and join reordering.

    • By o...

  • Answered by AI
  • Q2. Tell me about the optimization you used in your project.
  • Ans. 

    Used query optimization techniques to improve performance in database queries.

    • Utilized indexing to speed up search queries.

    • Implemented query caching to reduce redundant database calls.

    • Optimized SQL queries by restructuring joins and subqueries.

    • Utilized database partitioning to improve query performance.

    • Used query profiling tools to identify and optimize slow queries.

  • Answered by AI
  • Q3. PySpark question related to merging two schemas
  • Q4. What is the best approach to finding whether the data frame is empty or not?
  • Ans. 

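    For a PySpark DataFrame, check whether at least one row exists instead of counting all rows; len() works only on pandas DataFrames.

    • In PySpark, use df.isEmpty() (Spark 3.3 and later), df.rdd.isEmpty(), or len(df.head(1)) == 0.

    • Avoid df.count() == 0 on large data, since it has to process every partition.

    • In pandas, use df.empty or len(df) == 0.

    • Example: if df.isEmpty(): print('Data frame is empty')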

  • Answered by AI
  • Q5. Spark Architecture
  • Q6. How do you decide on cores and worker nodes?
  • Ans. 

    Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.

    • Consider the size and complexity of the data being processed

    • Evaluate the processing speed and memory requirements of the tasks

    • Take into account the parallelism and concurrency needed for efficient data processing

    • Monitor the system performance and adjust cores and worker nodes as needed

  • Answered by AI
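One widely quoted rule of thumb, used here purely as an assumption and not taken from the answer above, reserves one core and about 1 GB per node for the operating system, gives each executor around five cores, and holds back roughly 10% of executor memory for overhead. The arithmetic can be sketched as follows; the function name and all default values are illustrative starting points to be tuned against real workloads:

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Estimate executor count and heap size from a rule-of-thumb heuristic."""
    usable_cores = cores_per_node - 1      # leave one core for OS and daemons
    usable_mem_gb = mem_gb_per_node - 1    # leave about 1 GB for OS and daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for even one executor")
    # One executor slot is set aside for the driver or application master.
    total_executors = executors_per_node * nodes - 1
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    # Hold back a fraction of each executor's share for off-heap overhead.
    heap_gb = int(mem_per_executor_gb * (1 - overhead_fraction))
    return {
        "executors_per_node": executors_per_node,
        "total_executors": total_executors,
        "cores_per_executor": cores_per_executor,
        "executor_memory_gb": heap_gb,
    }


# Hypothetical cluster: 10 nodes, 16 cores and 64 GB each.
print(size_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64))
```

For this hypothetical cluster the heuristic gives 3 executors per node, 29 executors in total, and about 18 GB of heap each; monitoring the running job should then drive any adjustment.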
  • Q7. What happens when we enforce a schema?
  • Ans. 

    Enforcing schema ensures that data conforms to a predefined structure and rules.

    • Ensures data integrity by validating incoming data against predefined schema

    • Helps in maintaining consistency and accuracy of data

    • Prevents data corruption and errors in data processing

    • Can lead to rejection of data that does not adhere to the schema

  • Answered by AI

Interview Preparation Tips

Topics to prepare for Persistent Systems Senior Data Engineer interview:
  • SQL
  • PySpark
  • Python
  • Spark
  • Database
Interview preparation tips for other job seekers - Be prepared with Spark core concepts and SQL Coding

Interview experience: 1 (Bad)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: No response

I applied via Job Fair and was interviewed in Nov 2024. There were 2 interview rounds.

Round 1 - Technical 

(2 Questions)

  • Q1. DAX Related Syntax and Codes
  • Q2. Data Modelling, SQL, Python
Round 2 - Technical 

(1 Question)

  • Q1. No response from HR after being told of selection following Round 1

Jio Platforms Interview FAQs

How many rounds are there in Jio Platforms Big Data Engineer interview?
Jio Platforms interview process usually has 3 rounds. The most common rounds in the Jio Platforms interview process are Resume Shortlist, Coding Test and Technical.
What are the top questions asked in Jio Platforms Big Data Engineer interview?

Some of the top questions asked at the Jio Platforms Big Data Engineer interview -

  1. technical questions around data structures and projects related technolog...
  2. Machine learning bas...
  3. python coding quest...

Jio Platforms Big Data Engineer Salary
based on 43 salaries
₹4 L/yr - ₹12.4 L/yr
37% less than the average Big Data Engineer Salary in India

Jio Platforms Big Data Engineer Reviews and Ratings

3.0/5, based on 5 reviews

Rating in categories

  • Skill development: 3.8
  • Work-life balance: 2.6
  • Salary: 2.3
  • Job security: 4.4
  • Company culture: 3.3
  • Promotions: 2.1
  • Work satisfaction: 3.5

  • Software Developer (552 salaries): ₹4.1 L/yr - ₹16.1 L/yr
  • Software Development Engineer (511 salaries): ₹3 L/yr - ₹13.2 L/yr
  • Assistant Manager (452 salaries): ₹3 L/yr - ₹10 L/yr
  • Product Manager (417 salaries): ₹10.2 L/yr - ₹40 L/yr
  • Senior Manager (358 salaries): ₹13.5 L/yr - ₹44.3 L/yr

Compare Jio Platforms with

  • Jio: 3.9
  • Reliance Industries: 4.0
  • Bharti Airtel: 4.0
  • Vodafone Idea: 4.1