I applied via Job Portal and was interviewed in Jan 2024. There were 3 interview rounds.
Implement Slowly Changing Dimension (SCD) using pyspark
Use pyspark to read the source and target tables
Identify the changes in the source data compared to the target data
Update the existing records in the target table with the new values
Insert new records for the new data in the source table
Handle historical data by maintaining effective start and end dates
I was interviewed before Mar 2023.
I applied via Recruitment Consultant and was interviewed in Nov 2024. There were 2 interview rounds.
Different types of joins available in Databricks include inner join, outer join, left join, right join, and cross join.
Inner join: Returns only the rows that have matching values in both tables.
Outer join (full outer): Returns all rows from both tables, with NULLs where there is no match.
Left join: Returns all rows from the left table and the matched rows from the right table.
Right join: Returns all rows from the right table and the matched rows from the left table.
Cross join: Returns the Cartesian product of the two tables (every row paired with every row).
Implementing fault tolerance in a data pipeline involves redundancy, monitoring, and error handling.
Use redundant components to ensure continuous data flow
Implement monitoring tools to detect failures and bottlenecks
Set up automated alerts for immediate response to issues
Design error handling mechanisms to gracefully handle failures
Use checkpoints and retries to ensure data integrity
Auto Loader is a Databricks feature that incrementally and efficiently ingests new data files from cloud storage as they arrive.
Automatically detects and loads new files without manual bookkeeping of what has been processed
Reduces manual effort and human error
Can run continuously as a stream or be triggered on a schedule
Works with cloud storage such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage
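In code, Auto Loader is invoked through the `cloudFiles` streaming source. This sketch only runs on a Databricks cluster, and the paths and table name are placeholders:

```python
# Requires a Databricks runtime; paths and table names are placeholders
df = (spark.readStream
          .format("cloudFiles")                            # Auto Loader source
          .option("cloudFiles.format", "json")             # format of incoming files
          .option("cloudFiles.schemaLocation", "/tmp/schema")  # schema tracking
          .load("/mnt/raw/events"))

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints")  # exactly-once bookkeeping
   .trigger(availableNow=True)                        # process the backlog, then stop
   .toTable("bronze.events"))
```

The checkpoint location is what lets Auto Loader track which files have already been ingested across runs.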
To connect to different services in Azure, you can use Azure SDKs, REST APIs, Azure Portal, Azure CLI, and Azure PowerShell.
Use Azure SDKs for programming languages like Python, Java, C#, etc.
Utilize REST APIs to interact with Azure services programmatically.
Access and manage services through the Azure Portal.
Leverage Azure CLI for command-line interface interactions.
Automate tasks using Azure PowerShell scripts.
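As a small illustration of the CLI option, a few real `az` commands are shown below; they require an Azure subscription and a prior `az login`, and the resource names are placeholders:

```shell
# Authenticate interactively (opens a browser)
az login

# List storage accounts in the subscription as a table
az storage account list --output table

# List databases on a (placeholder) SQL server
az sql db list --server myserver --resource-group myrg
```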
Linked Services are connections to external data sources or destinations in Azure Data Factory.
Linked Services define the connection information needed to connect to external data sources or destinations.
They can be used in Data Factory pipelines to read from or write to external systems.
Examples of Linked Services include Azure Blob Storage, Azure SQL Database, and Amazon S3.
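A Linked Service is defined as JSON. The sketch below shows the general shape for an Azure SQL Database connection; the names and connection string are placeholders, and in practice the secret would be referenced from Azure Key Vault rather than stored inline:

```json
{
  "name": "MyAzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:myserver.database.windows.net;Database=mydb;..."
    }
  }
}
```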
I applied via Recruitment Consultant and was interviewed in Oct 2024. There was 1 interview round.
Broadcast and accumulator are used in Spark for efficient data sharing and aggregation across tasks.
Broadcast variables are used to efficiently distribute large read-only data to all tasks in a Spark job.
Accumulators are used for aggregating values from all tasks in a Spark job to a shared variable.
Broadcast variables help in reducing data transfer costs and improving performance.
Accumulators are used for tasks like counters and sums, e.g. counting malformed records across a job.
Coalesce and repartition are operations in Spark used to control the number of partitions in a DataFrame.
Coalesce reduces the number of partitions without shuffling data, while repartition reshuffles data to create a specified number of partitions.
Coalesce is more efficient when reducing partitions, as it minimizes data movement.
Repartition is useful for evenly distributing data across a specified number of partitions.
BigQuery is a fully managed, serverless data warehouse by Google Cloud for analyzing large datasets using SQL queries.
BigQuery is a cloud-based data warehouse that allows for fast SQL queries on large datasets.
It is fully managed and serverless, meaning users do not have to worry about infrastructure management.
BigQuery can handle petabytes of data and allows for real-time analytics with its streaming capabilities.
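A typical BigQuery query is just standard SQL over a fully qualified table; the project, dataset, and column names below are hypothetical:

```sql
-- Top 10 most active users since the start of 2024 (placeholder table)
SELECT user_id,
       COUNT(*) AS events,
       DATE(event_timestamp) AS day
FROM `my_project.analytics.events`
WHERE event_timestamp >= TIMESTAMP('2024-01-01')
GROUP BY user_id, day
ORDER BY events DESC
LIMIT 10;
```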
I applied via Amazon jobs and was interviewed in Sep 2023. There were 3 interview rounds.
A basic round covering Python, SQL, and data models.
Developing a data pipeline to analyze customer behavior for an e-commerce company
Collecting and storing customer data from website interactions
Cleaning and transforming data to identify patterns and trends
Building machine learning models to predict customer behavior
Visualizing insights for stakeholders to make data-driven decisions
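The collect-clean-aggregate steps above can be sketched end to end with stdlib Python; the event records and field names are invented, and a real pipeline would use Spark, a warehouse, and ML libraries for each stage.

```python
# Toy raw events as they might arrive from website interaction logging
raw_events = [
    {"user": "u1", "page": "checkout", "price": "19.99"},
    {"user": "u2", "page": "home",     "price": ""},
    {"user": "u1", "page": "home",     "price": None},
]

def clean(events):
    # Transform: normalise types and fill unusable price fields with 0.0
    return [{"user": e["user"], "page": e["page"],
             "price": float(e["price"]) if e["price"] else 0.0}
            for e in events]

def features(events):
    # Aggregate per-user signals that a behaviour model could consume
    out = {}
    for e in events:
        f = out.setdefault(e["user"], {"views": 0, "spend": 0.0})
        f["views"] += 1
        f["spend"] += e["price"]
    return out

profiles = features(clean(raw_events))
```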
Salaries by designation:
Software Engineer: 26 salaries, ₹5 L/yr to ₹8.5 L/yr
Data Engineer: 6 salaries, ₹6.5 L/yr to ₹7.2 L/yr
Software Developer: 4 salaries, ₹5 L/yr to ₹10 L/yr
Senior Engineer: 4 salaries, ₹12 L/yr to ₹15.2 L/yr
Senior Software Engineer: 4 salaries, ₹8.3 L/yr to ₹22.5 L/yr