Optum Global Solutions
I am a Senior Data Engineer with 5+ years of experience in designing and implementing data pipelines for large-scale projects.
Experienced in ETL processes and data warehousing
Proficient in programming languages like Python, SQL, and Java
Skilled in working with big data technologies such as Hadoop, Spark, and Kafka
Strong understanding of data modeling and database management
Excellent problem-solving and communication skills
Developing a real-time data processing system for analyzing customer behavior on e-commerce platform.
Utilizing Apache Kafka for real-time data streaming
Implementing Spark for data processing and analysis
Creating machine learning models for customer segmentation
Integrating with Elasticsearch for data indexing and search functionality
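A minimal sketch of the streaming leg of such a pipeline, assuming a Kafka topic named "clickstream", a local broker, and the spark-sql-kafka connector on the classpath; the topic name, broker address, and event schema are placeholders, not details from the original project.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("clickstream-demo").getOrCreate()

# Assumed shape of a customer-behaviour event.
event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw Kafka stream and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "clickstream")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Count events per type in one-minute windows and print them to the console.
counts = events.groupBy(window("event_time", "1 minute"), "event_type").count()
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()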
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in AWS Glue job settings
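The answer does not say which Glue feature "enhanced optimization" refers to; the closest documented mechanism is Glue Auto Scaling (Glue 3.0 and later), which adjusts the number of workers to the workload. A hedged boto3 sketch under that assumption; the job name, IAM role, and script path are placeholders.

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="customer-events-etl",                              # hypothetical job name
    Role="arn:aws:iam::123456789012:role/GlueJobRole",       # placeholder IAM role
    Command={"Name": "glueetl",
             "ScriptLocation": "s3://my-bucket/scripts/etl.py"},  # placeholder script
    GlueVersion="3.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,                                      # upper bound; Glue scales below this
    DefaultArguments={"--enable-auto-scaling": "true"},      # let Glue adjust workers to the load
)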
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use AN...
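A hedged sketch of the table-design advice above as Redshift DDL plus a column-pruned query, run through psycopg2; the cluster endpoint, credentials, and the sales table are placeholders.

import psycopg2

DDL = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12, 2)
)
DISTKEY (customer_id)   -- co-locates rows for joins and spreads data across nodes
SORTKEY (sale_date);    -- range filters on sale_date can skip sorted blocks
"""

QUERY = """
-- Name only the needed columns instead of SELECT * to cut data transfer.
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY customer_id;
"""

conn = psycopg2.connect(
    host="my-cluster.abc123.ap-south-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="analytics", user="admin", password="***",
)
with conn, conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute(QUERY)
    print(cur.fetchall())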
I was interviewed in Sep 2024.
PySpark is a Python library for big data processing using the Spark framework.
PySpark is used for processing large datasets in parallel.
It provides APIs for data manipulation, querying, and analysis.
Example: Using PySpark to read a CSV file and perform data transformations.
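A minimal sketch of the CSV example mentioned above; the file path, column names, and conversion rate are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as spark_sum

spark = SparkSession.builder.appName("csv-demo").getOrCreate()

# Read a CSV file with a header row, letting Spark infer column types.
df = spark.read.csv("orders.csv", header=True, inferSchema=True)

# Simple transformations: filter, derive a column, aggregate.
result = (
    df.filter(col("amount") > 0)
      .withColumn("amount_usd", col("amount") / 83.0)   # assumed conversion rate
      .groupBy("customer_id")
      .agg(spark_sum("amount_usd").alias("total_usd"))
)
result.show()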
Databricks optimisation techniques improve performance and efficiency of data processing on the Databricks platform.
Use cluster sizing and autoscaling to optimize resource allocation based on workload
Leverage Databricks Delta for optimized data storage and processing
Utilize caching and persisting data to reduce computation time
Optimize queries by using appropriate indexing and partitioning strategies
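A hedged sketch of two of these techniques on Databricks: caching a DataFrame that several queries reuse, and writing partitioned Delta output so later queries can prune partitions. The bronze/silver table names and the order_date column are assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders = spark.table("bronze.orders")   # hypothetical source table

# Cache a DataFrame that is reused by several downstream queries.
orders.cache()
orders.count()   # materialise the cache

# Write to Delta, partitioned by a commonly filtered column.
(orders.write.format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .saveAsTable("silver.orders"))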
Databricks is a unified data analytics platform that provides a collaborative environment for data engineers.
Databricks is built on top of Apache Spark and provides a workspace for data engineering tasks.
It allows for easy integration with various data sources and tools for data processing.
Databricks provides features like notebooks, clusters, and libraries for efficient data engineering workflows.
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SCD type 2 is a method used in data warehousing to track historical changes by creating a new record for each change.
SCD type 2 stands for Slowly Changing Dimension type 2
It involves creating a new record in the dimension table whenever there is a change in the data
The old record is marked as inactive and the new record is marked as current
It allows for historical tracking of changes in data over time
Example: If a cust...
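The original example is cut off, so here is a purely hypothetical illustration of the dimension rows when customer 101 moves from Pune to Mumbai:

customer_key | customer_id | city   | start_date | end_date   | is_current
1            | 101         | Pune   | 2022-04-01 | 2024-09-15 | false
2            | 101         | Mumbai | 2024-09-16 | 9999-12-31 | true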
posted on 26 Oct 2024
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
Round topics: Spark optimization, transformation, DLT, DL, data governance, Python, SQL
I applied via Job Portal and was interviewed in Jul 2024. There was 1 interview round.
I have over 5 years of experience in data engineering, working with large datasets and implementing data pipelines.
Developed and maintained ETL processes to extract, transform, and load data from various sources
Optimized database performance and implemented data quality checks
Worked with cross-functional teams to design and implement data solutions
Utilized tools such as Apache Spark, Hadoop, and SQL for data processing
...
I would start by understanding the requirements, breaking down the task into smaller steps, researching if needed, and then creating a plan to execute the task efficiently.
Understand the requirements of the task
Break down the task into smaller steps
Research if needed to gather necessary information
Create a plan to execute the task efficiently
Communicate with stakeholders for clarification or updates
Regularly track progress
I applied via LinkedIn and was interviewed in Feb 2024. There were 3 interview rounds.
Working with nested JSON in PySpark involves defining the schema with the StructType and StructField classes and then accessing nested fields with the select function.
Define the schema using StructType and StructField classes
Use the select function to access nested fields
Use dot notation to access nested fields, for example df.select('nested_field.sub_field')
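A minimal sketch of the approach described above: define the schema with StructType/StructField, then reach into the nested struct with dot notation. The file path and field names are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# Schema for a JSON document with a nested customer -> address struct.
schema = StructType([
    StructField("order_id", IntegerType()),
    StructField("customer", StructType([
        StructField("name", StringType()),
        StructField("address", StructType([
            StructField("city", StringType()),
        ])),
    ])),
])

df = spark.read.schema(schema).json("orders.json")

# Dot notation selects fields inside the nested structs.
df.select("order_id", "customer.name", "customer.address.city").show()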
Implementing SCD2 involves tracking historical changes in data over time.
Identify the business key that uniquely identifies each record
Add effective start and end dates to track when the record was valid
Insert new records with updated data and end date of '9999-12-31'
Update end date of previous record when a change occurs
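A minimal SCD2 sketch following the steps above, assuming Delta tables dim_customer (customer_id, name, address, start_date, end_date, is_current) and a staging table stg_customer holding the incoming changes; both table layouts are assumptions, not details from the original answer.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Expire the current version of any customer whose tracked attribute changed.
spark.sql("""
    MERGE INTO dim_customer AS d
    USING stg_customer AS s
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHEN MATCHED AND s.address <> d.address THEN
      UPDATE SET end_date = current_date(), is_current = false
""")

# 2. Insert the new version with the open-ended end date '9999-12-31'.
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.name, s.address,
           current_date()    AS start_date,
           DATE '9999-12-31' AS end_date,
           true              AS is_current
    FROM stg_customer s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL OR s.address <> d.address
""")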
Use a SQL query to select data from table 2 where data exists in table 1
Use a JOIN statement to link the two tables based on a common column
Specify the columns you want to select from table 2
Use a WHERE clause to check for existence of data in table 1
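A hedged sketch of the two variants described above, assuming table1 and table2 are registered as Spark SQL views sharing a column named id (hypothetical names).

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Variant 1: WHERE EXISTS against table1 (never duplicates table2 rows).
exists_variant = spark.sql("""
    SELECT t2.*
    FROM table2 t2
    WHERE EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.id)
""")

# Variant 2: an inner JOIN on the common column (can duplicate rows
# if table1 contains repeated ids).
join_variant = spark.sql("""
    SELECT t2.*
    FROM table2 t2
    JOIN table1 t1 ON t1.id = t2.id
""")

exists_variant.show()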
The number of records retrieved after performing joins depends on the type of join - inner, left, right, or outer.
Inner join retrieves only the matching records from both tables
Left join retrieves all records from the left table and matching records from the right table
Right join retrieves all records from the right table and matching records from the left table
Outer join retrieves all records from both tables, filling in NULLs where there is no match on either side
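A small worked example of how the row counts differ by join type, using two tiny made-up DataFrames.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

left = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "l"])
right = spark.createDataFrame([(2, "x"), (3, "y"), (4, "z")], ["id", "r"])

print(left.join(right, "id", "inner").count())   # 2: only ids 2 and 3 match
print(left.join(right, "id", "left").count())    # 3: every left row is kept
print(left.join(right, "id", "right").count())   # 3: every right row is kept
print(left.join(right, "id", "outer").count())   # 4: ids 1, 2, 3 and 4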
I applied via Naukri.com and was interviewed in Dec 2024. There were 2 interview rounds.
A basic aptitude test was given
Interview experience: based on 30 reviews
Claims Associate | 4.3k salaries | ₹1.6 L/yr - ₹5.6 L/yr
Senior Software Engineer | 2.8k salaries | ₹9.4 L/yr - ₹29.6 L/yr
Software Engineer | 2.6k salaries | ₹6.2 L/yr - ₹22 L/yr
Senior Claims Associate | 1.2k salaries | ₹2.1 L/yr - ₹5.8 L/yr
Medical Coder | 1.1k salaries | ₹1.5 L/yr - ₹8 L/yr