TCS
RDD is a basic abstraction in Spark representing data as a distributed collection of objects, while DataFrame is a distributed collection of data organized into named columns.
RDDs are lower-level and less optimized than DataFrames
DataFrames are easier to use for data manipulation and analysis
DataFrames provide a more structured way to work with data compared to RDDs
RDDs are suitable for unstructured data process...
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
I applied via Approached by Company and was interviewed in Apr 2024. There was 1 interview round.
IaaS provides virtualized infrastructure resources, while PaaS offers a platform for developing, testing, and managing applications.
IaaS allows users to rent virtualized hardware resources like virtual machines, storage, and networking, while PaaS provides a platform for developers to build, deploy, and manage applications without worrying about the underlying infrastructure.
In IaaS, users have more control over the op...
I applied via Naukri.com and was interviewed in Apr 2024. There was 1 interview round.
Various performance optimization techniques in Databricks
I applied via Approached by Company and was interviewed in Mar 2024. There was 1 interview round.
IR in ADF pipeline stands for Integration Runtime, which is a compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments.
IR in ADF pipeline is responsible for executing activities within the pipeline.
It can be configured to run in different modes such as Azure, Self-hosted, and SSIS.
Integration Runtime allows data movement between on-premises and clo...
I applied via Naukri.com and was interviewed in Oct 2023. There was 1 interview round.
I was interviewed in Mar 2024.
Reading files in notebook, configuring data, using ADF trigger, parquet format, window functions vs group by, reading CSV file and storing in parquet, dataset vs dataframe, transformations, delta lake
To read files in notebook, use libraries like pandas or pyspark
Configuration needed includes specifying file path, format, and any additional options
ADF trigger can be used for automated data processing, but may not be nec...
Encountered a data corruption issue in Azure Data Lake Storage and resolved it by restoring from a backup.
Identified the corrupted files by analyzing error logs and data inconsistencies
Restored the affected data from the latest backup available
Implemented preventive measures such as regular data integrity checks and backups
Collaborated with the Azure support team to investigate the root cause
I applied via Naukri.com and was interviewed in Jan 2023. There were 4 interview rounds.
Blob is a storage service for unstructured data, while ADLS is a distributed file system for big data analytics.
Blob is a general-purpose object storage service for unstructured data, while ADLS is optimized for big data analytics workloads.
Blob storage is suitable for storing large amounts of data, such as images, videos, and logs, while ADLS is designed for processing large datasets in parallel.
ADLS offers features l...
Get metadata activity is used to retrieve metadata of a specified data store or dataset in Azure Data Factory.
Parameters to pass include dataset, linked service, and optional folder path.
The output of the activity includes information like schema, size, last modified timestamp, etc.
Example: Get metadata of a SQ...
You can monitor the child pipeline in the master pipeline by using Azure Monitor or Azure Data Factory monitoring tools.
Use Azure Monitor to track the performance and health of the child pipeline within the master pipeline.
Leverage Azure Data Factory monitoring tools to view detailed logs and metrics for the child pipeline execution.
Set up alerts and notifications to be informed of any issues or failures in the child pipeline
Delta is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads, while Parquet is a columnar storage format optimized for reading and writing data in large volumes.
Delta is designed for use with big data workloads and provides ACID transactions, while Parquet is optimized for reading and writing large volumes of data efficiently.
Delta allows for updates and deletes of data, wh...
SQL INNER JOIN and LEFT JOIN can combine tables that contain duplicate values, based on the join condition.
Use INNER JOIN to return rows from both tables that have matching values
Use LEFT JOIN to return all rows from the left table and the matched rows from the right table
Handle duplicate values by using DISTINCT or GROUP BY clauses
You can load multiple tables at a time using Azure Data Factory by creating a single pipeline with multiple copy activities.
Create a pipeline in Azure Data Factory
Add multiple copy activities to the pipeline, each copy activity for loading data from one table
Configure each copy activity to load data from a different table
Run the pipeline to load data from all tables simultaneously
Yes, I am familiar with Databricks and have been working on it for the past 2 years.
I have been using Databricks for data engineering tasks such as data processing, data transformation, and data visualization.
I have experience in building and optimizing data pipelines using Databricks.
I have worked on collaborative projects with team members using Databricks notebooks.
I have utilized Databricks for big data processing ...
Yes, pyspark is a Python API for Apache Spark, used for big data processing and analytics.
pyspark is a Python API for Apache Spark, allowing users to write Spark applications using Python.
It provides high-level APIs in Python for Spark's functionality, making it easier to work with big data.
pyspark is commonly used for data processing, machine learning, and analytics tasks.
Example: Using pyspark to read data from a CSV...
TCS salaries by designation:

| Designation | Salaries reported | Range |
| --- | --- | --- |
| System Engineer | 1.1L | ₹1 L/yr - ₹9 L/yr |
| IT Analyst | 67.7k | ₹5.1 L/yr - ₹16 L/yr |
| AST Consultant | 51.2k | ₹8 L/yr - ₹25 L/yr |
| Assistant System Engineer | 29.9k | ₹2.2 L/yr - ₹5.6 L/yr |
| Associate Consultant | 28.8k | ₹8.9 L/yr - ₹32 L/yr |