TCS
Magic Tables in SQL Server are temporary, system-managed tables that are automatically created and populated during trigger execution.
Inside a trigger they are exposed as the 'inserted' and 'deleted' pseudo-tables.
It is used in triggers to access the data that was inserted, updated, or deleted in a table.
For example, in an 'AFTER INSERT' trigger, the Magic Table contains the rows that were just inserted.
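A minimal T-SQL sketch of reading the 'inserted' magic table; the Orders and OrdersAudit tables and their columns are hypothetical:

```sql
-- Audit newly inserted orders by copying them from the 'inserted'
-- pseudo-table (the "magic table") into an audit table.
CREATE TRIGGER trg_audit_orders
ON Orders
AFTER INSERT
AS
BEGIN
    INSERT INTO OrdersAudit (OrderId, Amount, AuditedAt)
    SELECT OrderId, Amount, GETDATE()
    FROM inserted;
END;
```

An AFTER DELETE trigger would read the 'deleted' pseudo-table in the same way.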
Use SQL query to select unique customers in last 3 months sales
Filter sales data for the last 3 months
Use DISTINCT keyword to select unique customers
Join with customer table if necessary
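The steps above can be sketched as one query; the sales and customers tables are hypothetical, and date arithmetic varies by dialect (SQL Server syntax shown):

```sql
-- Unique customers with at least one sale in the last 3 months
SELECT DISTINCT c.customer_id, c.customer_name
FROM sales s
JOIN customers c ON c.customer_id = s.customer_id
WHERE s.sale_date >= DATEADD(MONTH, -3, GETDATE());
```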
CTE stands for Common Table Expression in SQL, used to create temporary result sets that can be referenced within a query.
CTEs improve readability and maintainability of complex queries
They can be recursive, allowing for hierarchical data querying
CTEs are defined using the WITH keyword followed by the CTE name and query
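A minimal CTE sketch, assuming a hypothetical sales table:

```sql
-- The CTE names an intermediate result so the outer query stays readable
WITH customer_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM sales
    GROUP BY customer_id
)
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent > 1000;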
MERGE statement is used to perform insert, update, or delete operations in a single statement based on a condition.
Combines INSERT, UPDATE, and DELETE operations into a single statement
Helps to avoid multiple separate statements for different operations
Useful for synchronizing data between two tables based on a condition
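A short MERGE sketch in SQL Server style, with hypothetical target and staging tables:

```sql
-- Upsert: update matching rows, insert new ones, in a single statement
MERGE INTO customers AS t
USING staging_customers AS s
    ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name, t.city = s.city
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, city)
    VALUES (s.customer_id, s.name, s.city);
```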
Improving performance in data engineering involves optimizing code, utilizing efficient algorithms, and scaling infrastructure.
Optimize code by reducing unnecessary computations and improving data processing efficiency.
Utilize efficient algorithms and data structures to minimize time and space complexity.
Scale infrastructure by leveraging cloud services, parallel processing, and distributed computing.
Monitor perfo...
To migrate data from a local server to AWS Redshift, you can use various methods such as AWS Database Migration Service, AWS Glue, or manual ETL processes.
Use AWS Database Migration Service (DMS) to replicate data from the local server to Redshift
Create a DMS replication instance and endpoints for the source and target databases
Configure the replication task to specify the source and target endpoints, table mappin...
Yes, I have experience in AWS Glue and can use it for data migration.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analytics.
To use Glue for data migration, I would start by creating a Glue job that defines the source and target data sources, as well as any transformations needed.
I would then configure the job to run on a schedule or trigger ...
Data can be migrated from a local server to AWS Redshift using tools like AWS Database Migration Service or manual ETL processes.
Use AWS Database Migration Service for automated migration
Export data from local server to S3 and then load into Redshift using COPY command
Use ETL tools like AWS Glue for data transformation and loading into Redshift
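The S3-plus-COPY route can be sketched as below; the bucket path and IAM role ARN are placeholders:

```sql
-- Load CSV files exported to S3 into an existing Redshift table
COPY sales
FROM 's3://my-bucket/exports/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;
```

COPY loads from S3 in parallel across Redshift slices, which is why it is preferred over row-by-row INSERTs for bulk loads.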
Data pipelines are designed by identifying data sources, defining data transformations, and selecting appropriate tools and technologies.
Identify data sources and understand their structure and format
Define data transformations and processing steps
Select appropriate tools and technologies for data ingestion, processing, and storage
Consider scalability, reliability, and performance requirements
Implement error handl...
Spark submit is a command-line tool used to submit Spark applications to a cluster.
Spark submit is used to launch Spark applications on a cluster.
It is a command-line interface that allows users to specify the application's main class or JAR file, along with other configuration options.
Spark submit handles the deployment of the application code and resources to the cluster, and manages the execution of the applica...
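A typical invocation looks like the sketch below; the class name, JAR, and resource settings are placeholders to adjust for your application and cluster:

```sh
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 4g \
  --num-executors 10 \
  myapp.jar input_path output_path
```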
I appeared for an interview in Apr 2025, where I was asked the following questions.
I applied via Walk-in
RANK and DENSE_RANK both assign the same rank to tied rows; the difference is that RANK leaves gaps after ties while DENSE_RANK does not. A left join includes all rows from the left table plus matching rows from the right table, while a left anti join returns only the left-table rows that have no match in the right table.
Rank assigns unique ranks to rows based on the specified order, while dense_rank handles ties by assigning the same rank to ...
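Both halves of the answer can be demonstrated with an in-memory SQLite database (window functions need SQLite >= 3.25; the tables and data are made up, and the anti join is emulated with LEFT JOIN ... IS NULL since SQL lacks a literal ANTI JOIN keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("a", 100), ("b", 90), ("c", 90), ("d", 80)])

# RANK leaves a gap after the tie at 90 (next rank is 4);
# DENSE_RANK does not (next rank is 3).
ranked = conn.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()
print(ranked)
# [('a', 1, 1), ('b', 2, 2), ('c', 2, 2), ('d', 4, 3)]

# Left anti join: left-table rows with no match on the right.
conn.execute("CREATE TABLE l (id INTEGER)")
conn.execute("CREATE TABLE r (id INTEGER)")
conn.executemany("INSERT INTO l VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO r VALUES (?)", [(2,), (3,)])
anti = conn.execute(
    "SELECT l.id FROM l LEFT JOIN r ON l.id = r.id WHERE r.id IS NULL"
).fetchall()
print(anti)  # [(1,)]
```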
I applied via Recruitment Consultant and was interviewed in Aug 2024. There were 2 interview rounds.
Focus on quantitative maths and aptitude a bit more
I applied via LinkedIn and was interviewed in Oct 2024. There was 1 interview round.
Reverse strings in a Python list
Use list comprehension to iterate through the list and reverse each string
Use the slice notation [::-1] to reverse each string
Example: strings = ['hello', 'world'], reversed_strings = [s[::-1] for s in strings]
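Putting the example together as a runnable snippet:

```python
# Reverse each string in a list: slice notation [::-1] reverses a
# sequence, and the comprehension applies it to every element.
strings = ["hello", "world"]
reversed_strings = [s[::-1] for s in strings]
print(reversed_strings)  # ['olleh', 'dlrow']
```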
To find the 2nd highest salary in SQL, use the 'SELECT' statement with 'ORDER BY' and 'LIMIT' clauses.
Use the 'SELECT' statement to retrieve the salary column from the table.
Use the 'ORDER BY' clause to sort the salaries in descending order.
Use 'LIMIT 1 OFFSET 1' to skip the highest salary and return the second row; combine with DISTINCT so ties on the top salary are handled.
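A runnable demo using an in-memory SQLite table (the employees data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 50000), ("b", 70000), ("c", 70000), ("d", 60000)])

# DISTINCT collapses the tie at 70000, so OFFSET 1 lands on the true
# second-highest value rather than on the duplicate top salary.
second_highest = conn.execute("""
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]
print(second_highest)  # 60000
```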
I appeared for an interview in Sep 2024.
I applied via Approached by Company and was interviewed in Sep 2024. There was 1 interview round.
SCD 1 overwrites old data with new data, while SCD 2 keeps track of historical changes.
SCD 1 updates existing records with new data, losing historical information.
SCD 2 creates new records for each change, preserving historical data.
SCD 1 is simpler and faster, but can lead to data loss.
SCD 2 is more complex and slower, but maintains a full history of changes.
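The two strategies can be sketched in SQL; the dim_customer table and its columns are hypothetical:

```sql
-- SCD 1: overwrite in place (history is lost)
UPDATE dim_customer
SET city = 'Pune'
WHERE customer_id = 42;

-- SCD 2: close the current row, then insert a new version
UPDATE dim_customer
SET end_date = CURRENT_DATE, is_current = 0
WHERE customer_id = 42 AND is_current = 1;

INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_current)
VALUES (42, 'Pune', CURRENT_DATE, NULL, 1);
```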
Corrupt record handling in Spark involves identifying and handling data that does not conform to expected formats.
Use the DataFrameReader option("badRecordsPath", "path/to/bad/records") (available on Databricks) to save corrupt records to a separate location for further analysis; open-source Spark offers mode("PERMISSIVE") with a _corrupt_record column instead.
Use DataFrame.na.drop() or DataFrame.na.fill() to handle corrupt records by dropping or filling missing values.
Implement custom logic to identify and handle corrupt records...
Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which can contain data in the form of fields and code in the form of procedures.
OOP focuses on creating objects that interact with each other to solve a problem
Key concepts include encapsulation, inheritance, polymorphism, and abstraction
Encapsulation involves bundling data and methods that operate on the data into a single uni...
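A minimal sketch of those concepts; the class names and area formulas are illustrative:

```python
class Shape:                      # abstraction: a general concept
    def __init__(self, name):
        self._name = name         # encapsulation: "private" field by convention

    def area(self):
        raise NotImplementedError

class Square(Shape):              # inheritance: Square reuses Shape
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):               # polymorphism: same call, different behavior
        return self._side ** 2

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return 3.14159 * self._radius ** 2

shapes = [Square(2), Circle(1)]
areas = [s.area() for s in shapes]
print(areas)  # [4, 3.14159]
```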
Data engineer life cycle involves collecting, storing, processing, and analyzing data using various tools.
Data collection: Gathering data from various sources such as databases, APIs, and logs.
Data storage: Storing data in databases, data lakes, or data warehouses.
Data processing: Cleaning, transforming, and enriching data using tools like Apache Spark or Hadoop.
Data analysis: Analyzing data to extract insights and mak...
Spark join strategies include broadcast join, shuffle hash join, and shuffle sort merge join.
Broadcast join is used when one of the DataFrames is small enough to fit in memory on all nodes.
Shuffle hash join is used when joining two large DataFrames by partitioning and shuffling the data based on the join key.
Shuffle sort merge join is used when joining two large DataFrames by sorting and merging the data based on the j...
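In Spark SQL a broadcast join can be requested explicitly with a hint; the fact and dimension table names below are hypothetical:

```sql
-- Ship the small dimension table to every executor instead of shuffling
SELECT /*+ BROADCAST(d) */ f.*, d.country
FROM fact_sales f
JOIN dim_customer d
  ON f.customer_id = d.customer_id;
```

Without a hint, Spark picks the strategy itself, broadcasting any side smaller than spark.sql.autoBroadcastJoinThreshold.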
Spark is a fast and general-purpose cluster computing system for big data processing.
Spark is popular for its speed and ease of use in processing large datasets.
It provides in-memory processing capabilities, making it faster than traditional disk-based processing systems.
Spark supports multiple programming languages like Java, Scala, Python, and R.
It offers a wide range of libraries for diverse tasks such as SQL, strea...
Clustering is the process of grouping similar data points together. Pods are groups of one or more containers, while nodes are individual machines in a cluster.
Clustering is a technique used in machine learning to group similar data points together based on certain features or characteristics.
Pods in a cluster are groups of one or more containers that share resources and are scheduled together on the same node.
Nodes ar...
The duration of the TCS Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete (based on 101 interview experiences).
| Role | Salaries reported | Salary range |
| --- | --- | --- |
| System Engineer | 1.1L | ₹3.9 L/yr - ₹8.3 L/yr |
| IT Analyst | 65.5k | ₹7.7 L/yr - ₹12.7 L/yr |
| AST Consultant | 53.6k | ₹12 L/yr - ₹20.6 L/yr |
| Assistant System Engineer | 33.2k | ₹2.5 L/yr - ₹6.4 L/yr |
| Associate Consultant | 33k | ₹16.2 L/yr - ₹28 L/yr |