I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.
Sample data and its transformations
Sample data can be in the form of CSV, JSON, or database tables
Transformations include cleaning, filtering, aggregating, and joining data
Examples: converting date formats, removing duplicates, calculating averages (a pandas sketch of these follows below)
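A minimal pandas sketch of the transformations listed above; the column names (customer_id, order_date, amount) are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_date": ["01/05/2024", "01/05/2024", "02/06/2024", "03/06/2024", "04/06/2024"],
    "amount": [100.0, 100.0, 250.0, 300.0, 50.0],
})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Asha", "Ravi"]})

# Cleaning: convert date strings into a proper datetime format.
orders["order_date"] = pd.to_datetime(orders["order_date"], format="%d/%m/%Y")

# Cleaning: remove exact duplicate rows.
orders = orders.drop_duplicates()

# Filtering: keep only orders above a threshold.
large = orders[orders["amount"] > 75]

# Aggregating: average order amount per customer.
avg_amount = large.groupby("customer_id", as_index=False)["amount"].mean()

# Joining: enrich the aggregate with customer names.
result = avg_amount.merge(customers, on="customer_id", how="left")
print(result)
```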
Seeking new challenges and opportunities for growth in a more dynamic environment.
Looking for new challenges and opportunities for growth
Seeking a more dynamic work environment
Interested in expanding skill set and knowledge
Want to work on more innovative projects
I applied via LinkedIn and was interviewed in Apr 2024. There was 1 interview round.
Spark follows a master-slave architecture in which a cluster manager coordinates tasks.
Spark architecture consists of a driver program that communicates with a cluster manager to coordinate tasks.
The cluster manager allocates resources and schedules tasks on worker nodes.
Worker nodes execute the tasks and return results to the driver program.
Spark supports various cluster managers like YARN, Mesos, and standalone mode (see the minimal driver sketch below)
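A minimal PySpark sketch of the driver side of this architecture, using local[*] as a stand-in for a real cluster manager:

```python
from pyspark.sql import SparkSession

# The driver program starts here: it creates a SparkSession, which
# talks to the cluster manager named in the master URL.
spark = (
    SparkSession.builder
    .appName("architecture-demo")
    # "local[*]" runs driver and executors in one JVM for testing;
    # on a real cluster this would be e.g. "yarn" or "spark://host:7077".
    .master("local[*]")
    .getOrCreate()
)

# The driver builds the plan; the cluster manager assigns the resulting
# tasks to executors on worker nodes, which return results to the driver.
rdd = spark.sparkContext.parallelize(range(10))
print(rdd.map(lambda x: x * x).sum())  # computed on the workers

spark.stop()
```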
There will be 4 stages created in total for the spark job.
Wide transformations trigger a shuffle and create a new stage.
Narrow transformations do not trigger a shuffle and do not create a new stage.
In this case, the 3 wide transformations each trigger a shuffle and add a new stage, while the 2 narrow transformations stay within their existing stages.
Therefore, a total of 4 stages (1 initial stage + 3 shuffle boundaries) will be created, as illustrated in the sketch below.
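A minimal PySpark sketch matching this scenario, assuming two narrow and three wide transformations on an RDD; the stage breakdown can be confirmed in the Spark UI:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("stages").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(100))

# Narrow transformations: each output partition depends on a single
# input partition, so no shuffle and no new stage.
narrow = rdd.map(lambda x: (x % 10, x)).filter(lambda kv: kv[1] > 5)

# Wide transformations: data must move across partitions, so each one
# ends the current stage at a shuffle boundary.
wide1 = narrow.reduceByKey(lambda a, b: a + b)  # shuffle 1
wide2 = wide1.sortByKey()                       # shuffle 2
wide3 = wide2.groupByKey()                      # shuffle 3

wide3.collect()  # 1 initial stage + 3 shuffle boundaries = 4 stages
spark.stop()
```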
I was interviewed in Dec 2024.
I applied via Company Website and was interviewed in Sep 2024. There were 2 interview rounds.
Platform - HackerRank
Duration - 2 Hours
Topics - Spark and SQL
Common file formats used in data storage include CSV, JSON, Parquet, Avro, and ORC; columnar formats like Parquet typically compress best.
CSV (Comma-Separated Values) - simple and widely used, but not efficient for large datasets
JSON (JavaScript Object Notation) - human-readable and easy to parse, but can be inefficient for storage
Parquet - columnar storage format that is highly efficient for compression and query performance
Avro - row-based binary format, efficient for data serialization and schema evolution (see the format comparison sketch below)
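A short pandas sketch contrasting a row-oriented text format (CSV) with a columnar one (Parquet); the file names are illustrative, and the Parquet calls assume pyarrow or fastparquet is installed:

```python
import pandas as pd

df = pd.DataFrame({"id": range(1_000), "category": ["a", "b"] * 500})

# Row-oriented text format: simple and widely supported, but large on
# disk and slow to scan for analytics.
df.to_csv("events.csv", index=False)

# Columnar binary format: compressed per column, so analytical queries
# can read only the columns they need.
df.to_parquet("events.parquet", index=False)

# Reading a single column from Parquet avoids scanning the whole file.
categories = pd.read_parquet("events.parquet", columns=["category"])
print(categories.head())
```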
Python program to find the most repeating substring in a list of words.
Iterate through each word in the list
Generate all possible substrings for each word
Count the occurrences of each substring using a dictionary
Find the substring with the highest count (a full implementation is sketched below)
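A sketch of the approach described above; the minimum substring length of 2 is an assumption (single characters would trivially dominate the counts):

```python
from collections import Counter

def most_repeating_substring(words, min_len=2):
    """Return the substring (length >= min_len) occurring most often
    across all words in the list, or None if there is none."""
    counts = Counter()
    for word in words:                        # iterate through each word
        n = len(word)
        for i in range(n):                    # generate all substrings
            for j in range(i + min_len, n + 1):
                counts[word[i:j]] += 1        # count each occurrence
    if not counts:
        return None
    # Substring with the highest count.
    return counts.most_common(1)[0][0]

print(most_repeating_substring(["banana", "bandana", "cabana"]))  # "an"
```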
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in AWS Glue job settings (see the sketch below)
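A hedged boto3 sketch, assuming "enhanced optimization" here refers to AWS Glue Auto Scaling, which adjusts the worker/DPU count to the workload. The job name, IAM role ARN, and script path are placeholders, and running it requires valid AWS credentials:

```python
import boto3

glue = boto3.client("glue")

# Define a Glue job with Auto Scaling enabled via the
# --enable-auto-scaling job argument (Glue 3.0+).
glue.create_job(
    Name="example-etl-job",                          # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueRole",  # placeholder role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,  # upper bound; Auto Scaling scales within it
    DefaultArguments={"--enable-auto-scaling": "true"},
)
```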
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use ANALYZE to keep table statistics up to date so the query planner can choose efficient plans (sketched below).
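A hedged sketch of these points; the table and column names are hypothetical, and the statements are only printed here since executing them requires a live Redshift connection (e.g. via the redshift_connector package):

```python
# Table design: a distribution key to spread and co-locate rows,
# and a sort key to enable range-restricted scans.
CREATE_SALES = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTKEY (customer_id)   -- even distribution; co-locates join keys
SORTKEY (sale_date);    -- date-filtered queries skip disk blocks
"""

# Query: name only the needed columns instead of SELECT *.
QUERY = """
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY customer_id;
"""

# Maintenance: refresh statistics for the query planner.
MAINTENANCE = "ANALYZE sales;"

for stmt in (CREATE_SALES, QUERY, MAINTENANCE):
    print(stmt.strip())
```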
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD type 2 is a method used in data warehousing to track historical changes by creating a new record for each change.
SCD type 2 stands for Slowly Changing Dimension type 2
It involves creating a new record in the dimension table whenever there is a change in the data
The old record is marked as inactive and the new record is marked as current
It allows for historical tracking of changes in data over time
Example: if a customer changes their address, the old record is expired and a new current record is inserted with the updated address (see the sketch below)
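A minimal pandas sketch of SCD type 2; the column names (valid_from, valid_to, is_current) are one common convention, not the only one:

```python
import pandas as pd

# Dimension table with one current record for customer 42.
dim_customer = pd.DataFrame([
    {"customer_id": 42, "city": "Pune", "valid_from": "2023-01-01",
     "valid_to": None, "is_current": True},
])

def scd2_update(dim, customer_id, new_city, change_date):
    """Expire the current record and append a new current one."""
    mask = (dim["customer_id"] == customer_id) & dim["is_current"]
    # Mark the old record inactive instead of overwriting it.
    dim.loc[mask, ["valid_to", "is_current"]] = [change_date, False]
    new_row = {"customer_id": customer_id, "city": new_city,
               "valid_from": change_date, "valid_to": None,
               "is_current": True}
    return pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

# Customer 42 moves from Pune to Mumbai: both rows are retained,
# preserving the full history of the change.
dim_customer = scd2_update(dim_customer, 42, "Mumbai", "2024-06-01")
print(dim_customer)
```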
posted on 4 Aug 2024
I am a Senior Data Engineer with 5+ years of experience in designing and implementing data pipelines for large-scale projects.
Experienced in ETL processes and data warehousing
Proficient in programming languages like Python, SQL, and Java
Skilled in working with big data technologies such as Hadoop, Spark, and Kafka
Strong understanding of data modeling and database management
Excellent problem-solving and communication skills
Developing a real-time data processing system for analyzing customer behavior on e-commerce platform.
Utilizing Apache Kafka for real-time data streaming
Implementing Spark for data processing and analysis
Creating machine learning models for customer segmentation
Integrating with Elasticsearch for data indexing and search functionality (a streaming-ingest sketch follows below)
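A hedged sketch of the ingestion step of such a pipeline, using Spark Structured Streaming to read from Kafka. The broker address and topic are placeholders, the spark-sql-kafka package must be on the classpath, and a console sink stands in for the Elasticsearch connector to keep the sketch self-contained:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("clickstream").getOrCreate()

# Read customer-behavior events from Kafka as an unbounded stream.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "clickstream")                # placeholder topic
    .load()
    .select(col("value").cast("string").alias("event"))
)

# In the real system the sink would be Elasticsearch via the
# elasticsearch-spark connector; console output is used here instead.
query = events.writeStream.format("console").start()
query.awaitTermination()  # blocks while the stream runs
```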
I applied via Job Portal and was interviewed in Jul 2024. There was 1 interview round.
I have over 5 years of experience in data engineering, working with large datasets and implementing data pipelines.
Developed and maintained ETL processes to extract, transform, and load data from various sources
Optimized database performance and implemented data quality checks
Worked with cross-functional teams to design and implement data solutions
Utilized tools such as Apache Spark, Hadoop, and SQL for data processing
...
I would start by understanding the requirements, breaking down the task into smaller steps, researching if needed, and then creating a plan to execute the task efficiently.
Understand the requirements of the task
Break down the task into smaller steps
Research if needed to gather necessary information
Create a plan to execute the task efficiently
Communicate with stakeholders for clarification or updates
Regularly track progress
I applied via Recruitment Consultant and was interviewed in Nov 2021. There were 4 interview rounds.
Normal aptitude test