Ingram Micro
A data warehouse is a centralized repository that stores integrated, structured data from various sources for analysis and reporting.
Data warehouses are designed for query and analysis rather than transaction processing.
They typically store historical data and are used for creating reports, dashboards, and data visualizations.
Data warehouses often use ETL (extract, transform, load) processes to integrate data from multiple operational sources, as sketched below.
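A minimal ETL sketch in pandas, assuming a hypothetical raw orders export and using SQLite to stand in for the warehouse; column names and values are illustrative:

```python
import sqlite3
import pandas as pd

# Extract: in practice this would be pd.read_csv(...) or a query against a
# source system; an inline frame stands in for the raw export here.
raw = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-06"],
    "quantity": [2, 5],
    "unit_price": [9.99, 4.50],
})

# Transform: fix types and derive an analysis-friendly measure
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw["revenue"] = raw["quantity"] * raw["unit_price"]

# Load: append into a warehouse fact table (SQLite stands in for the warehouse)
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("fact_orders", conn, if_exists="append", index=False)
```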
Different kinds of views that can be created include materialized views, virtual views, and dynamic views.
Materialized views store the result set of a query physically and are updated periodically.
Virtual views are based on SQL queries and do not store data physically.
Dynamic views are created on the fly based on user input or system conditions.
Other types of views include read-only views, updatable views, and recursive views.
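A short sketch of the virtual case in Spark SQL via PySpark; the `people` data and query are illustrative. Materialized views are engine-specific (e.g., `CREATE MATERIALIZED VIEW` in PostgreSQL or Databricks SQL), so only the virtual view is shown:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("views-demo").getOrCreate()

# Hypothetical source data
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])

# A virtual (temporary) view: no data is stored; the query runs on demand
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```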
Seeking new challenges and growth opportunities in a dynamic environment.
Desire to work on more advanced projects
Opportunity for career advancement
Seeking a more collaborative team environment
In five years, I see myself as a senior data engineer leading a team of talented individuals, implementing cutting-edge technologies to drive business growth.
Leading a team of data engineers and collaborating with other departments to drive business growth
Implementing advanced technologies and tools to optimize data processing and analysis
Continuously learning and staying updated with the latest trends in data engineering
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
posted on 25 Sep 2024
I applied via Walk-in and was interviewed in Aug 2024. There were 5 interview rounds.
Aptitude test covering maths, grammar & communication
Why do you like this job opportunity?
posted on 29 Jul 2024
Handling imbalanced data involves techniques like resampling, using different algorithms, and adjusting class weights.
Use resampling techniques like oversampling or undersampling to balance the data
Utilize algorithms that are robust to imbalanced data, such as Random Forest or XGBoost
Adjust class weights in the model to give more importance to minority class
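A minimal sketch of the class-weight approach with scikit-learn; the synthetic 90/10 dataset and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic imbalanced binary dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" upweights the minority class during training
clf = RandomForestClassifier(class_weight="balanced", random_state=42)
clf.fit(X_tr, y_tr)

# Per-class precision/recall makes minority-class performance visible
print(classification_report(y_te, clf.predict(X_te)))
```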
Python code to calculate correlation between two features
Import pandas library
Use Series.corr() to compare two specific features, e.g. df['a'].corr(df['b'])
Calling df.corr() with no arguments returns the pairwise correlation matrix for all numeric columns
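Putting it together (the DataFrame and column names are illustrative):

```python
import pandas as pd

# Hypothetical DataFrame with two numeric features
df = pd.DataFrame({"height": [150, 160, 170, 180],
                   "weight": [50, 60, 65, 80]})

# Pearson correlation between two specific features
print(df["height"].corr(df["weight"]))

# Full pairwise correlation matrix for all numeric columns
print(df.corr())
```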
Outliers can be handled by removing, transforming, or imputing them based on the context of the data.
Identify outliers using statistical methods like Z-score, IQR, or visualization techniques.
Remove outliers if they are due to data entry errors or measurement errors.
Transform skewed data using log transformation or winsorization to reduce the impact of outliers.
Impute outliers with the median or mean if they are valid but extreme data points
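A minimal sketch of IQR-based detection followed by capping (winsorization) in pandas; the data and the 1.5×IQR fences are illustrative:

```python
import pandas as pd

# Hypothetical feature with one obvious outlier
s = pd.Series([10, 12, 11, 13, 12, 95])

# IQR fences: points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] count as outliers
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(s[(s < lower) | (s > upper)])  # the detected outliers

# Winsorize: cap values at the fences instead of dropping rows
capped = s.clip(lower=lower, upper=upper)
print(capped)
```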
I would communicate openly with the client, provide updates on the progress, and discuss potential solutions to meet the deadline.
Communicate proactively with the client about the delay
Provide regular updates on the progress of the task
Discuss potential solutions to meet the deadline, such as reallocating resources or extending the timeline
Apologize for the delay and take responsibility for the situation
Ensure that the...
I applied via LinkedIn and was interviewed in Jul 2024. There were 2 interview rounds.
It was a pair programming round where we needed to work through a couple of Spark scenarios along with the interviewer. You are given boilerplate code with some functionality to be filled in, and you are assessed on writing clean, extensible code and test cases.
Types of clusters in Databricks include Standard, High Concurrency, and Single Node clusters.
Standard clusters are used for general-purpose workloads
High Concurrency clusters are optimized for concurrent workloads
Single Node clusters are used for development and testing purposes
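A hedged sketch of what a Standard cluster definition might look like for the Databricks Clusters API; the runtime label and node type are assumptions that vary by workspace and cloud:

```python
# Illustrative payload for the Databricks Clusters API (clusters/create);
# spark_version and node_type_id values depend on your workspace and cloud.
cluster_spec = {
    "cluster_name": "etl-standard",
    "spark_version": "13.3.x-scala2.12",  # assumed runtime label
    "node_type_id": "i3.xlarge",          # assumed AWS node type
    "num_workers": 2,                     # fixed size; an autoscale block is the elastic alternative
}
```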
Catalyst optimizer is a query optimizer in Apache Spark that leverages advanced techniques to optimize and improve the performance of Spark SQL queries.
Catalyst optimizer uses a rule-based and cost-based optimization approach to generate an optimized query plan.
It performs various optimizations such as constant folding, predicate pushdown, and projection pruning to improve query performance.
Catalyst optimizer also leverages Scala's pattern matching to build an extensible framework of tree transformations over query plans.
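The effect is visible with DataFrame.explain(); a minimal PySpark sketch (the data and query are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("catalyst-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])

# Catalyst constant-folds 1 + 1 - 1 at plan time and pushes the filter
# as close to the data source as the plan allows.
query = df.select("id").filter(df.id > 1 + 1 - 1)
query.explain(extended=True)  # parsed, analyzed, optimized, and physical plans
```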
Explode function is used in Apache Spark to split an array into multiple rows.
Used in Apache Spark to split an array into multiple rows
Creates a new row for each element in the array
Commonly used in data processing and transformation tasks
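A minimal PySpark sketch (the data is illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("explode-demo").getOrCreate()

df = spark.createDataFrame(
    [("order1", ["pen", "book"]), ("order2", ["mug"])],
    ["order_id", "items"],
)

# explode() emits one output row per element of the array column
df.select("order_id", F.explode("items").alias("item")).show()
# order1 -> pen, order1 -> book, order2 -> mug
```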
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Lake provides ACID transactions, schema enforcement, and time travel capabilities on top of data lakes.
A data lake, by contrast, is a storage repository that holds a vast amount of raw data in its native format until it is needed.
Delta Lake is optimized for big data workloads and provides reliability and performance improvements on top of existing data lakes.
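A short sketch of a Delta write followed by time travel; the path is illustrative and it assumes a Spark session configured with the delta-spark package:

```python
from pyspark.sql import SparkSession

# Assumes the session has the Delta Lake extensions on the classpath
spark = SparkSession.builder.appName("delta-demo").getOrCreate()

path = "/tmp/events_delta"  # illustrative location

df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])
df.write.format("delta").mode("overwrite").save(path)  # ACID write

# Time travel: read the table as of an earlier version
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
v0.show()
```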
RDD stands for Resilient Distributed Dataset, a fundamental data structure in Apache Spark.
RDD is a fault-tolerant collection of elements that can be operated on in parallel.
RDDs are immutable, meaning they cannot be changed once created.
RDDs support two types of operations: transformations (creating a new RDD from an existing one) and actions (returning a value to the driver program).
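A minimal sketch of both operation types:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 2, 3, 4])          # distributed, immutable collection

squared = rdd.map(lambda x: x * x)          # transformation: lazy, returns a new RDD
print(squared.reduce(lambda a, b: a + b))   # action: triggers computation -> 30
```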
I applied via Job Fair and was interviewed in Dec 2024. There was 1 interview round.
Role | Salaries reported | Salary range
Software Engineer | 142 | ₹3 L/yr - ₹13.3 L/yr
Senior Software Engineer | 127 | ₹7 L/yr - ₹21.2 L/yr
DEP Manager, Sales | 103 | ₹5 L/yr - ₹13 L/yr
Product Manager | 73 | ₹7.1 L/yr - ₹25 L/yr
Senior Associate | 57 | ₹3 L/yr - ₹7.3 L/yr
Tech Data
Redington
Tech Data Corporation
SYNNEX Corporation