IBM
I was interviewed in Aug 2024.
Python and SQL tasks
I was approached by the company and interviewed in Jun 2024. There were 2 interview rounds.
A Python coding question and a couple of SQL questions
Spark optimization techniques focus on improving performance and efficiency of Spark jobs.
Partitioning data to optimize parallelism
Caching frequently accessed data
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
Tuning memory settings for optimal performance
I have faced difficulties in handling large volumes of data, ensuring data quality, and managing dependencies in ETL pipelines.
Handling large volumes of data can lead to performance issues and scalability challenges.
Ensuring data quality involves dealing with data inconsistencies, errors, and missing values.
Managing dependencies between different stages of the ETL process can be complex and prone to failures.
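The data-quality point above can be made concrete with a small, framework-free validation step of the kind an ETL stage might run before loading: flag missing and inconsistent values. The record layout is hypothetical:

```python
# Minimal data-quality check sketch for an ETL stage: collect per-row
# errors for missing or non-numeric values before loading downstream.
rows = [
    {"id": 1, "amount": "100.5"},
    {"id": 2, "amount": None},   # missing value
    {"id": 3, "amount": "abc"},  # inconsistent (non-numeric) value
]

def validate(row):
    errors = []
    if row["amount"] is None:
        errors.append("missing amount")
    else:
        try:
            float(row["amount"])
        except ValueError:
            errors.append("non-numeric amount")
    return errors

bad = {}
for r in rows:
    errs = validate(r)
    if errs:
        bad[r["id"]] = errs
# bad maps offending ids to their error lists; clean rows pass through.
```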
Bigquery Architecture, Project Discussion
Python and SQL questions
I applied via Naukri.com and was interviewed in Oct 2022. There were 2 interview rounds.
Questions on big data, Hadoop, Spark, Scala, Git, project and Agile.
Hadoop architecture and HDFS commands for copying and listing files in HDFS
Spark architecture, and questions on transformations and actions
What happens when you submit a Spark program?
Spark DataFrame coding question
Scala basic program on List
Git and GitHub
Project-related questions
Agile-related questions
I applied via Naukri.com and was interviewed in May 2022. There was 1 interview round.
I was asked to write some Python code provided by the interviewer, answer some scenario-based SQL queries, and discuss a lot of job-processing theory and the optimization techniques used in my project.
Optimization techniques used in project
Caching
Parallel processing
Compression
Indexing
Query optimization
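Two of the items above, indexing and query optimization, can be demonstrated with the standard-library sqlite3 module: adding an index on the filtered column changes the query plan from a full table scan to an index search. The table and column names are made up for the demo:

```python
# sqlite3 sketch: compare query plans before and after creating an index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, f"cust{i % 100}") for i in range(1000)],
)

# Plan without an index: a full scan of the table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'"
).fetchall()

conn.execute("CREATE INDEX idx_customer ON orders (customer)")

# Plan with the index: the planner searches the index instead.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'"
).fetchall()
```

The same principle carries over to warehouse engines: indexes (or partitioning/clustering keys) let the planner skip data instead of scanning it.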
I was interviewed in Sep 2024.
PySpark is the Python API for big data processing with the Spark framework.
PySpark is used to process large datasets in parallel.
It provides APIs for data manipulation, querying, and analysis.
Example: Using PySpark to read a CSV file and perform data transformations.
Databricks optimization techniques improve the performance and efficiency of data processing on the Databricks platform.
Use cluster sizing and autoscaling to optimize resource allocation based on workload
Leverage Databricks Delta for optimized data storage and processing
Utilize caching and persisting data to reduce computation time
Optimize queries by using appropriate indexing and partitioning strategies
Databricks is a unified data analytics platform that provides a collaborative environment for data engineers.
Databricks is built on top of Apache Spark and provides a workspace for data engineering tasks.
It allows for easy integration with various data sources and tools for data processing.
Databricks provides features like notebooks, clusters, and libraries for efficient data engineering workflows.
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD type 2 is a method used in data warehousing to track historical changes by creating a new record for each change.
SCD type 2 stands for Slowly Changing Dimension type 2
It involves creating a new record in the dimension table whenever there is a change in the data
The old record is marked as inactive and the new record is marked as current
It allows for historical tracking of changes in data over time
Example: If a cust...
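The mechanics described above, closing out the old record and inserting a new current one, can be sketched in plain Python. Field names (`start`, `end`, `current`) are illustrative, not a prescribed schema:

```python
# Minimal SCD type 2 sketch: on a change, mark the current dimension row
# inactive with an end date, and append a new current row, preserving history.
from datetime import date

dim = [
    {"cust_id": 1, "city": "Pune", "start": date(2020, 1, 1),
     "end": None, "current": True},
]

def apply_change(dim, cust_id, new_city, as_of):
    for row in dim:
        if row["cust_id"] == cust_id and row["current"]:
            row["end"] = as_of       # close out the old version
            row["current"] = False
    dim.append({"cust_id": cust_id, "city": new_city,
                "start": as_of, "end": None, "current": True})

apply_change(dim, 1, "Mumbai", date(2024, 6, 1))
# dim now holds two rows for cust_id 1: the inactive Pune row and the
# current Mumbai row, so point-in-time queries can recover either state.
```

In a warehouse this is typically done with a MERGE statement rather than row-by-row updates, but the bookkeeping is the same.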
Application Developer: 11.7k salaries | ₹5.5 L/yr - ₹24 L/yr
Software Engineer: 5.5k salaries | ₹5.5 L/yr - ₹22.5 L/yr
Advisory System Analyst: 5.2k salaries | ₹9.4 L/yr - ₹29.8 L/yr
Senior Software Engineer: 4.8k salaries | ₹8 L/yr - ₹30 L/yr
Senior Systems Engineer: 4.5k salaries | ₹5.7 L/yr - ₹20.8 L/yr