Celebal Technologies
I applied via Job Fair and was interviewed in Jul 2023. There were 2 interview rounds.
I applied via Campus Placement and was interviewed in May 2021. There were 4 interview rounds.
I applied via Other and was interviewed before Nov 2020. There was 1 interview round.
Big data refers to large and complex data sets that cannot be processed using traditional data processing tools.
Big data is characterized by the 3Vs - volume, velocity, and variety.
It requires specialized tools and technologies such as Hadoop, Spark, and NoSQL databases.
Big data is used in various industries such as healthcare, finance, and retail to gain insights and make data-driven decisions.
Spark is an open-source distributed computing system used for big data processing and analytics.
Spark can run on top of Hadoop (YARN and HDFS) and processes data faster than MapReduce because of in-memory computing.
It supports multiple programming languages like Java, Scala, Python, and R.
Spark has various components like Spark SQL, Spark Streaming, MLlib, and GraphX for different use cases.
It can be used for batch processing and real-time stream processing; a minimal sketch follows below.
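For illustration, a minimal PySpark batch job might look like the sketch below; the file path, app name, and column handling are assumptions for this example, not part of the original answer.

```python
# Hypothetical minimal PySpark batch job (a sketch; "input.txt" is a placeholder path).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()

# Read a text file into a DataFrame with a single "value" column.
lines = spark.read.text("input.txt")

# Split each line into words, explode into rows, and count occurrences.
counts = (lines
          .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
          .groupBy("word")
          .count()
          .orderBy(F.desc("count")))

counts.show(10)
spark.stop()
```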
I applied via Naukri.com and was interviewed in Sep 2024. There were 2 interview rounds.
It was a WeCP-based test.
Spark is a distributed computing framework whose architecture provides high-level APIs for in-memory computing.
Spark Architecture consists of a cluster manager, worker nodes, and a driver program.
It uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object.
It supports various data ...
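A hedged driver-side sketch of these pieces, assuming a local master and toy data; in a real deployment the master URL would point at YARN, Kubernetes, or a standalone cluster manager.

```python
# A sketch of the driver-side view of Spark's architecture (master URL and data are assumptions).
from pyspark import SparkConf, SparkContext

# The driver program creates a SparkContext, which connects to a cluster manager
# (local[*] here; in production this could be YARN, Kubernetes, or standalone).
conf = SparkConf().setAppName("architecture-sketch").setMaster("local[*]")
sc = SparkContext(conf=conf)

# An RDD is partitioned across worker nodes; transformations are lazy,
# and the action collect() triggers tasks that the driver schedules on executors.
rdd = sc.parallelize(range(1, 11), numSlices=4)
squares = rdd.map(lambda x: x * x)
print(squares.collect())

sc.stop()
```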
Higher-order functions, closures, anonymous functions, map, flatMap, and tail recursion are key concepts in functional programming.
Higher-order function: a function that takes other functions as arguments or returns a function as its result.
Closure: a function that captures variables from its lexical scope, even when it is called outside that scope.
Anonymous function: a function without a name, often used as an inline argument to a higher-order function (a short sketch follows below).
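A plain-Python sketch of these concepts; all names and values below are made up for illustration, and note that flatMap is shown as a hand-rolled helper because plain Python has no built-in equivalent.

```python
# Plain-Python illustrations of the listed functional-programming concepts.

# Higher-order function: takes another function as an argument.
def apply_twice(f, x):
    return f(f(x))

# Closure: make_adder returns a function that captures n from its enclosing scope.
def make_adder(n):
    def add(x):
        return x + n
    return add

add5 = make_adder(5)

# Anonymous function (lambda) used with map.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

# flatMap equivalent: map each element to a list, then flatten one level.
def flat_map(f, xs):
    return [y for x in xs for y in f(x)]

# Tail-recursive style factorial; note that CPython does not optimize tail calls.
def fact(n, acc=1):
    return acc if n <= 1 else fact(n - 1, acc * n)

print(apply_twice(add5, 0), doubled, flat_map(lambda w: w.split(), ["a b", "c"]), fact(5))
```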
posted on 11 Jun 2024
I applied via Job Portal and was interviewed in May 2024. There was 1 interview round.
PySpark architecture is based on the Apache Spark architecture, with additional components for Python integration.
PySpark architecture includes Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
It allows Python developers to interact with Spark using PySpark API.
PySpark architecture enables distributed processing of large datasets using RDDs and DataFrames.
It leverages the power of in-memory processing for fast computation; a brief sketch follows below.
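A small hedged example of the PySpark API surface described above, assuming a local SparkSession and invented column names and data.

```python
# A sketch of the PySpark API (column names and data are made up for this example).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-sketch").getOrCreate()

# DataFrame API: Python code is translated into Spark's distributed execution plan.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 29), ("carol", 41)],
    schema=["name", "age"],
)
df.filter(F.col("age") > 30).select("name").show()

# The lower-level RDD API is still reachable through the SparkContext.
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 10).collect())

spark.stop()
```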
I applied via Approached by Company and was interviewed before Oct 2022. There were 2 interview rounds.
I applied via Company Website and was interviewed in Apr 2024. There were 3 interview rounds.
Questions on software and system design
It helps the employer identify personality traits such as leadership, confidence, and interpersonal and teamwork skills in potential employees.
I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.
Different types of joins in SQL, with examples; a PySpark sketch follows the list below.
Inner Join: Returns rows when there is a match in both tables
Left Join: Returns all rows from the left table and the matched rows from the right table
Right Join: Returns all rows from the right table and the matched rows from the left table
Full Outer Join: Returns all rows from both tables, with NULLs where there is no match
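A hedged PySpark sketch of the four join types; the table names, column names, and sample rows are illustrative, not from the original answer.

```python
# Illustrates INNER, LEFT, RIGHT, and FULL OUTER joins over two small, invented tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("joins-sketch").getOrCreate()

emp = spark.createDataFrame([(1, "alice"), (2, "bob"), (3, "carol")], ["dept_id", "name"])
dept = spark.createDataFrame([(1, "Sales"), (2, "IT"), (4, "HR")], ["dept_id", "dept_name"])

emp.createOrReplaceTempView("emp")
dept.createOrReplaceTempView("dept")

for join_type in ["INNER", "LEFT", "RIGHT", "FULL OUTER"]:
    print(join_type)
    spark.sql(f"""
        SELECT e.name, d.dept_name
        FROM emp e {join_type} JOIN dept d ON e.dept_id = d.dept_id
    """).show()

spark.stop()
```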
Large Spark datasets can be handled by partitioning, caching, optimizing transformations, and tuning resources (see the sketch after this list).
Partitioning data to distribute workload evenly across nodes
Caching frequently accessed data to avoid recomputation
Optimizing transformations to reduce unnecessary processing
Tuning resources like memory allocation and parallelism for optimal performance
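A sketch combining these four tactics; the paths, column names, partition counts, and memory figures below are assumptions and would need to be sized for the actual cluster and data.

```python
# Handling a large dataset: tune resources, repartition by key, cache reused data,
# and keep transformations lean (filter/project early).
from pyspark import StorageLevel
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("large-dataset-sketch")
         .config("spark.sql.shuffle.partitions", "400")   # parallelism for shuffles
         .config("spark.executor.memory", "8g")            # resource tuning (cluster-dependent)
         .getOrCreate())

events = spark.read.parquet("s3://bucket/events/")         # placeholder path

# Partitioning: repartition by a join/group key so work is spread evenly across nodes.
events = events.repartition(400, "user_id")

# Caching: persist a DataFrame that several downstream queries reuse.
events.persist(StorageLevel.MEMORY_AND_DISK)

# Optimized transformations: filter and project early to shrink the data that gets shuffled.
daily = (events
         .filter(F.col("event_date") >= "2024-01-01")
         .select("user_id", "event_date")
         .groupBy("event_date")
         .count())
daily.show()
```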
Spark configuration settings can be tuned to optimize query performance by adjusting parameters like memory allocation, parallelism, and caching; an illustrative configuration follows this list.
Increase executor memory and cores to allow for more parallel processing
Adjust shuffle partitions to optimize data shuffling during joins and aggregations
Enable dynamic allocation to scale resources based on workload demands
Utilize caching to store intermediate results and avoid recomputation
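An illustrative, not prescriptive, configuration sketch: the property names are standard Spark settings, but the values and the parquet path are assumptions that depend on the cluster and workload.

```python
# Example tuning knobs set at session build time (values are placeholders).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.executor.memory", "8g")                 # more memory per executor
         .config("spark.executor.cores", "4")                   # more parallel tasks per executor
         .config("spark.sql.shuffle.partitions", "400")         # shuffle partitions for joins/aggregations
         .config("spark.dynamicAllocation.enabled", "true")     # scale executors with the workload
         .config("spark.shuffle.service.enabled", "true")       # needed for dynamic allocation on YARN
         .getOrCreate())

df = spark.read.parquet("warehouse/orders")   # placeholder path
df.cache()                                    # keep an intermediate result for repeated queries
df.groupBy("status").count().show()
```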
To handle data skew and partition imbalance in Spark, strategies include salting, bucketing, repartitioning, and optimizing join operations (a sketch follows this list).
Use salting to evenly distribute skewed keys across partitions
Implement bucketing to pre-partition data based on a specific column
Repartition data based on a specific key to balance partitions
Optimize join operations by broadcasting small tables or using partitioning strategies
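A hedged sketch of two of these tactics, broadcasting and salting; the table names, join key, and salt factor are assumptions made for the example.

```python
# Skew handling: broadcast the small side of a join, and salt a hot key so its rows
# spread across many partitions instead of landing on one executor.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-sketch").getOrCreate()

facts = spark.read.parquet("facts/")      # large table, skewed on customer_id (placeholder path)
dims = spark.read.parquet("dims/")        # small lookup table keyed by customer_id (placeholder path)

# Broadcast join: ship the small table to every executor, avoiding a shuffle of the big one.
joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

# Salting: append a random suffix to the key on the big side, and replicate the small side
# once per salt value, so a single hot key is split across SALT partitions.
SALT = 10
salted_facts = facts.withColumn("salt", (F.rand() * SALT).cast("int"))
salted_dims = dims.crossJoin(spark.range(SALT).withColumnRenamed("id", "salt"))

salted_join = salted_facts.join(salted_dims, on=["customer_id", "salt"], how="left")
salted_join.groupBy("customer_id").count().show()
```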
Salaries at Celebal Technologies:
Data Engineer: 391 salaries
Associate: 250 salaries
Associate Data Engineer: 159 salaries
Associate Consultant: 158 salaries
Data Scientist: 125 salaries