I applied via Recruitment Consultant and was interviewed before Jun 2021. There were 3 interview rounds.
Data skewness in Spark can be handled by partitioning, bucketing, or using salting techniques.
Partitioning the data based on a key column can distribute the data evenly across the nodes.
Bucketing can group the data into buckets based on a key column, which can improve join performance.
Salting involves adding a random prefix or extra salt column to the skewed key so that a single hot key is spread across multiple partitions.
Using broadcast joins for small tables can avoid shuffling the skewed side entirely (see the sketch below).
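A minimal PySpark sketch of the salting idea, assuming two hypothetical DataFrames (a large, skewed events table and a smaller users table) joined on user_id; all names and paths are illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("salting-demo").getOrCreate()

NUM_SALTS = 8  # number of shards to split each hot key into

events = spark.read.parquet("/data/events")  # large table, skewed on user_id (hypothetical path)
users = spark.read.parquet("/data/users")    # smaller table (hypothetical path)

# Add a random salt to the skewed side so one hot user_id lands in several partitions.
events_salted = events.withColumn("salt", (F.rand() * NUM_SALTS).cast("int"))

# Replicate each row of the other side once per possible salt value.
users_salted = users.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(NUM_SALTS)]))
)

# Join on the original key plus the salt, then drop the helper column.
joined = events_salted.join(users_salted, ["user_id", "salt"]).drop("salt")
```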
Partitioning is dividing data into smaller chunks based on a column value. Bucketing is dividing data into equal-sized buckets based on a hash function.
Partitioning is used for organizing data for efficient querying and processing.
Bucketing is used for evenly distributing data across nodes in a cluster.
Partitioning is done based on a column value, such as date or region.
Bucketing is done based on a hash of a column value, such as hashing a customer ID into a fixed number of buckets (see the sketch below).
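A brief PySpark sketch contrasting the two on a hypothetical sales DataFrame with region and customer_id columns (names and paths are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-vs-bucket").getOrCreate()

sales = spark.read.parquet("/data/sales")  # hypothetical input path

# Partitioning: one directory per distinct column value (good for pruning by region or date).
sales.write.partitionBy("region").parquet("/data/sales_by_region")

# Bucketing: rows are hashed on customer_id into a fixed number of buckets;
# requires saving as a table in the session catalog rather than a plain path.
sales.write.bucketBy(16, "customer_id").sortBy("customer_id").saveAsTable("sales_bucketed")
```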
Cache is temporary storage used to speed up access to frequently accessed data. Persistent storage is permanent storage used to store data even after power loss.
Cache is faster but smaller than persistent storage
Cache is volatile and data is lost when power is lost
Persistent storage is non-volatile and data is retained even after power loss
Examples of cache include CPU cache, browser cache, and CDN cache
Examples of persistent storage include hard disk drives, SSDs, databases, and object stores such as S3 (see the sketch below).
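In a Spark job the same distinction shows up as caching a DataFrame in executor memory versus writing it out to durable storage; a minimal sketch with hypothetical paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-vs-persist-demo").getOrCreate()

df = spark.read.parquet("/data/transactions")  # hypothetical input path

# Cache: keeps the DataFrame in executor memory for reuse within this application;
# it is volatile and disappears when the job or cluster stops.
df.cache()
df.count()  # action that materializes the cache

# Persistent storage: write the data out so it survives after the application ends.
df.write.mode("overwrite").parquet("/data/transactions_snapshot")
```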
To read JSON data using Spark, use the SparkSession.read.json() method.
Create a SparkSession object
Use the read.json() method to read the JSON data
Specify the path to the JSON file or directory containing JSON files
The resulting DataFrame can be manipulated using Spark's DataFrame API
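A minimal sketch of the steps above, assuming a hypothetical path to line-delimited JSON records:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-json-demo").getOrCreate()

# Reads line-delimited JSON by default; pass multiLine=True for pretty-printed files.
df = spark.read.json("/data/events.json")

df.printSchema()                 # schema is inferred from the data
df.select("event_type").show()   # "event_type" is a hypothetical column name
```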
To create a Kafka topic with replication factor 2, use the command line tool or Kafka API.
Use the command line tool 'kafka-topics.sh' with the '--replication-factor' flag set to 2.
Alternatively, use the Kafka API to create a topic with a replication factor of 2.
Ensure that the number of brokers in the Kafka cluster is greater than or equal to the replication factor.
Consider setting the 'min.insync.replicas' configuration so that writes are acknowledged by more than one replica (see the sketch below).
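For the 'Kafka API' route, a minimal sketch using the kafka-python admin client (topic name, partition count, and broker address are placeholders):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the cluster (broker address is a placeholder).
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Create a topic with 3 partitions, each replicated on 2 brokers.
topic = NewTopic(name="orders", num_partitions=3, replication_factor=2)
admin.create_topics([topic])
```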
I applied via Naukri.com and was interviewed before Apr 2022. There were 4 interview rounds.
I applied via Referral and was interviewed before Apr 2022. There were 2 interview rounds.
I applied via Naukri.com and was interviewed in Aug 2021. There were 3 interview rounds.
Union combines rows and removes duplicates; Union All combines all rows, including duplicates.
Union merges two tables and removes duplicates
Union All merges two tables and includes duplicates
Union is slower than Union All as it removes duplicates
Syntax: SELECT column1, column2 FROM table1 UNION/UNION ALL SELECT column1, column2 FROM table2
Example: SELECT name FROM table1 UNION SELECT name FROM table2
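A small self-contained illustration of the difference using Python's built-in sqlite3 module (table names and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('alice'), ('bob');
    INSERT INTO table2 VALUES ('bob'), ('carol');
""")

# UNION removes the duplicate 'bob'
print(conn.execute(
    "SELECT name FROM table1 UNION SELECT name FROM table2 ORDER BY name"
).fetchall())
# -> [('alice',), ('bob',), ('carol',)]

# UNION ALL keeps every row, so 'bob' appears twice
print(conn.execute(
    "SELECT name FROM table1 UNION ALL SELECT name FROM table2 ORDER BY name"
).fetchall())
# -> [('alice',), ('bob',), ('bob',), ('carol',)]
```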
Difference between delete, truncate and drop in SQL
DELETE removes specific rows from a table
TRUNCATE removes all rows from a table
DROP removes the entire table from the database
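A quick sketch of all three statements, assuming a reachable PostgreSQL database and the psycopg2 driver (the connection string and table are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=demo user=demo password=demo host=localhost")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS employees (id INT, dept TEXT);")

# DELETE: removes only the rows matching the WHERE clause; the table and other rows remain.
cur.execute("DELETE FROM employees WHERE dept = 'HR';")

# TRUNCATE: removes every row at once, but the empty table definition stays.
cur.execute("TRUNCATE TABLE employees;")

# DROP: removes the table itself, including its definition.
cur.execute("DROP TABLE employees;")
```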
I applied via Recruitment Consultant and was interviewed in Jun 2021. There were 3 interview rounds.
Many Azure resources provide two access keys (a primary and a secondary key) for security and availability purposes.
Both keys grant the same level of access; either one can be used to authenticate requests to the resource.
Having two keys lets you regenerate (rotate) one key while applications keep running on the other, so credentials can be rotated without downtime.
Keys should be stored securely, for example in Azure Key Vault, and rotated regularly.
Examples of Azure resources that use two keys include Storage accounts, Cosmos DB, and Event Hubs (see the sketch below).
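A minimal sketch with the azure-storage-blob SDK showing that either key authenticates the same client (account name and keys are placeholders):

```python
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://myaccount.blob.core.windows.net"  # hypothetical account
KEY_1 = "<primary-access-key>"    # placeholder
KEY_2 = "<secondary-access-key>"  # placeholder

# Either key works; applications can switch to KEY_2 while KEY_1 is being regenerated.
client = BlobServiceClient(account_url=ACCOUNT_URL, credential=KEY_1)
# client = BlobServiceClient(account_url=ACCOUNT_URL, credential=KEY_2)

for container in client.list_containers():
    print(container.name)
```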
I applied via Walk-in and was interviewed in May 2021. There were 4 interview rounds.
I applied via Naukri.com and was interviewed before Dec 2020. There were 3 interview rounds.
I have worked with tech stacks such as Hadoop, Spark, Kafka, AWS, and SQL.
Experience with Hadoop ecosystem including HDFS, MapReduce, Hive, and Pig
Proficient in Spark for data processing and analysis
Worked with Kafka for real-time data streaming
Familiar with AWS services such as EC2, S3, and EMR
Strong SQL skills for data querying and manipulation
I applied via Approached by Company and was interviewed before Sep 2021. There were 4 interview rounds.
HackerRank questions; practice more of them.
The duration of the IBM Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete (based on 40 interviews).
Most candidates report 3 interview rounds.
Designation | Salaries reported | Salary range
Application Developer | 11.7k | ₹0 L/yr - ₹0 L/yr
Software Engineer | 5.5k | ₹0 L/yr - ₹0 L/yr
Advisory System Analyst | 5.2k | ₹0 L/yr - ₹0 L/yr
Senior Software Engineer | 5k | ₹0 L/yr - ₹0 L/yr
Senior Systems Engineer | 4.5k | ₹0 L/yr - ₹0 L/yr