TCS
To call a notebook from another notebook in Databricks, use the %run command followed by the path of the notebook.
Use the %run command followed by the path of the notebook to call it from another notebook.
Make sure the notebook you want to call is in the same workspace or accessible to the notebook you are calling it from.
To pass parameters to the called notebook, use dbutils.notebook.run with an arguments dictionary; %run itself shares the called notebook's variables and functions with the caller rather than taking parameters.
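In a Databricks notebook this looks like the sketch below (paths and parameter names are illustrative; %run is a notebook magic, so it is shown as a comment and must sit alone in its own cell):

```python
# Cell 1 -- inline include: functions and variables defined in
# ./shared_utils become available in this notebook.
# %run ./shared_utils

# Cell 2 -- run another notebook as a separate job and pass parameters;
# the child reads them with dbutils.widgets.get("run_date").
result = dbutils.notebook.run(
    "/Workspace/etl/daily_load",   # hypothetical notebook path
    600,                           # timeout in seconds
    {"run_date": "2024-01-01"},    # arguments passed to the child
)
```

dbutils.notebook.run returns whatever the child passes to dbutils.notebook.exit, which makes it convenient for chaining pipeline stages.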
Yes, I am open to relocating for the right opportunity.
I am willing to relocate for the right job opportunity
I have relocated in the past for work
I am flexible and open to new experiences
I am flexible and willing to work all shifts to meet the team's needs and project deadlines.
I understand that data engineering often requires collaboration across different time zones.
For example, I can adjust my schedule to align with team members in other regions.
I have previously worked night shifts during critical project phases to ensure timely delivery.
I believe that flexibility in shifts can enhance team productivity.
Datastage and Informatica are both ETL tools used for data integration, but they have differences in terms of features and capabilities.
Datastage is developed by IBM and is known for its parallel processing capabilities, while Informatica is developed by Informatica Corporation and is known for its strong data quality features.
Datastage has a more user-friendly interface compared to Informatica, making it easier for new users to work with.
Optimizing Spark jobs involves tuning configurations, optimizing code, and utilizing resources efficiently.
Tune Spark configurations such as executor memory, cores, and parallelism
Optimize code by reducing unnecessary shuffles, caching intermediate results, and using efficient transformations
Utilize resources efficiently by monitoring job performance, scaling cluster resources as needed, and optimizing data storage.
I ingest data in the pipeline using tools like Apache Kafka and Apache NiFi.
Use Apache Kafka for real-time data streaming
Utilize Apache NiFi for data ingestion and transformation
Implement data pipelines using tools like Apache Spark or Apache Flink
To add a column to a DataFrame, use the df['new_column'] = value syntax.
Use the df['new_column'] = value syntax to add a new column to a DataFrame.
Value can be a single value, a list, or a Series.
Example: df['new_column'] = 10
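The three forms above can be sketched with pandas (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"]})

# Scalar: broadcast to every row.
df["score"] = 10

# List: must match the DataFrame length.
df["rank"] = [1, 2, 3]

# Series: aligned on the index, not by position.
df["bonus"] = pd.Series([5, 6, 7])

print(df["score"].tolist())  # [10, 10, 10]
```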
You can join tables without identity columns using other unique columns or composite keys.
Use other unique columns or composite keys to join the tables
Consider using a combination of columns to create a unique identifier for joining
If no unique columns are available, consider using a combination of non-unique columns with additional logic to ensure accurate joins
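A minimal sketch of a composite-key join, using sqlite3 with illustrative tables (two non-identity columns together act as the unique join key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE orders (customer_name TEXT, order_date TEXT, amount REAL);
CREATE TABLE shipments (customer_name TEXT, order_date TEXT, carrier TEXT);
INSERT INTO orders VALUES ('alice', '2024-01-01', 50.0);
INSERT INTO shipments VALUES ('alice', '2024-01-01', 'ups');
""")

# No surrogate key: join on a combination of columns that is unique together.
rows = cur.execute("""
    SELECT o.customer_name, o.amount, s.carrier
    FROM orders o
    JOIN shipments s
      ON o.customer_name = s.customer_name
     AND o.order_date = s.order_date
""").fetchall()
print(rows)  # [('alice', 50.0, 'ups')]
```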
Avoid data skewness by partitioning data, using sampling techniques, and optimizing queries.
Partition data to distribute evenly across nodes
Use sampling techniques to analyze data distribution
Optimize queries to prevent skewed data distribution
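One common remedy for a single hot key, key salting, can be sketched in plain Python (the record layout and salt count are illustrative): the hot key is rewritten with a rotating suffix so its records spread over several partitions, and a second pass merges the partial results.

```python
from collections import Counter

# 90% of the records share one hot key, which would overload one partition.
records = [("hot", 1)] * 90 + [(f"k{i}", 1) for i in range(10)]

NUM_SALTS = 3
salted = [(f"{k}_{i % NUM_SALTS}" if k == "hot" else k, v)
          for i, (k, v) in enumerate(records)]

before = Counter(k for k, _ in records)
after = Counter(k for k, _ in salted)

# The hot key's 90 records now split evenly across 3 salted keys.
print(before["hot"], max(after[f"hot_{s}"] for s in range(NUM_SALTS)))  # 90 30
```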
Optimizing stored procedures involves improving performance by reducing execution time and resource usage.
Identify and eliminate unnecessary or redundant code
Use appropriate indexing to speed up data retrieval
Avoid using cursors and loops for better performance
Update statistics regularly to help the query optimizer make better decisions
Consider partitioning large tables to improve query performance
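The indexing point can be illustrated with sqlite3 (the table and index names are illustrative): after creating an index on the filter column, the query plan switches from a full scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [(f"r{i % 50}", float(i)) for i in range(1000)])

# Index the column used in the WHERE clause so lookups avoid a full scan.
cur.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = 'r1'"
).fetchall()
print(plan[0][-1])  # the index name appears in the plan
```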
I appeared for an interview in Apr 2025, where I was asked the following questions.
I applied via Walk-in
Rank and dense_rank both assign the same rank to tied rows; rank then skips the following ranks, while dense_rank keeps them consecutive. Left join includes all rows from the left table and matching rows from the right table, while left anti join returns only the left-table rows that have no match in the right table.
Rank skips ranks after a tie (e.g. 1, 1, 3), whereas dense_rank assigns consecutive ranks (e.g. 1, 1, 2).
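The rank behaviour shows up directly in a window query; a minimal sqlite3 sketch with illustrative data (requires SQLite 3.25+ for window functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
cur.executemany("INSERT INTO scores VALUES (?, ?)",
                [("a", 90), ("b", 90), ("c", 80)])

# Two rows tie at 90: RANK gives 1, 1, 3; DENSE_RANK gives 1, 1, 2.
rows = cur.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS drnk
    FROM scores
    ORDER BY score DESC, name
""").fetchall()
print(rows)  # [('a', 1, 1), ('b', 1, 1), ('c', 3, 2)]
```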
I applied via Recruitment Consultant and was interviewed in Aug 2024. There were 2 interview rounds.
Focus on quantitative maths and aptitude a bit more
I applied via LinkedIn and was interviewed in Oct 2024. There was 1 interview round.
Reverse strings in a Python list
Use list comprehension to iterate through the list and reverse each string
Use the slice notation [::-1] to reverse each string
Example: strings = ['hello', 'world'], reversed_strings = [s[::-1] for s in strings]
To find the 2nd highest salary in SQL, use a 'SELECT' statement with 'ORDER BY', 'LIMIT', and 'OFFSET' clauses.
Use 'SELECT DISTINCT' on the salary column so duplicate top salaries do not shift the result.
Use the 'ORDER BY' clause to sort the salaries in descending order.
Use 'LIMIT 1 OFFSET 1' to skip the highest salary and return the second row.
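A minimal sqlite3 sketch of the LIMIT/OFFSET approach (the table contents are illustrative; DISTINCT guards against a duplicated top salary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
cur.executemany("INSERT INTO employees VALUES (?, ?)",
                [("a", 100), ("b", 200), ("c", 200), ("d", 150)])

# Distinct salaries sorted descending: 200, 150, 100 -> offset 1 gives 150.
second = cur.execute("""
    SELECT DISTINCT salary FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]
print(second)  # 150
```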
I appeared for an interview in Sep 2024.
I applied via Approached by Company and was interviewed in Sep 2024. There was 1 interview round.
SCD 1 overwrites old data with new data, while SCD 2 keeps track of historical changes.
SCD 1 updates existing records with new data, losing historical information.
SCD 2 creates new records for each change, preserving historical data.
SCD 1 is simpler and faster, but can lead to data loss.
SCD 2 is more complex and slower, but maintains a full history of changes.
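The two behaviours can be simulated in plain Python (the dimension layout and dates are illustrative): SCD 1 mutates the row in place, while SCD 2 closes the old row and appends a new current one.

```python
from datetime import date

# SCD Type 1: overwrite in place -- the old value is lost.
dim_scd1 = {"cust_1": {"city": "Pune"}}
dim_scd1["cust_1"]["city"] = "Mumbai"

# SCD Type 2: expire the old row, append a new current row.
dim_scd2 = [
    {"key": "cust_1", "city": "Pune", "valid_from": date(2023, 1, 1),
     "valid_to": date(2024, 1, 1), "current": False},
    {"key": "cust_1", "city": "Mumbai", "valid_from": date(2024, 1, 1),
     "valid_to": None, "current": True},
]

# History is preserved: both rows remain, only one is current.
current = [r["city"] for r in dim_scd2 if r["current"]]
print(dim_scd1["cust_1"]["city"], current)  # Mumbai ['Mumbai']
```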
Corrupt record handling in Spark involves identifying and handling data that does not conform to expected formats.
Use DataFrameReader option("badRecordsPath", "path/to/bad/records") to save corrupt records to a separate location for further analysis.
Use DataFrame.na.drop() or DataFrame.na.fill() to handle corrupt records by dropping or filling missing values.
Implement custom logic to identify and handle corrupt records.
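Outside Spark, the same quarantine idea can be sketched in plain Python: keep parseable records and divert malformed ones to a side list, analogous to what badRecordsPath does (the sample records are illustrative).

```python
import json

raw_lines = ['{"id": 1}', 'not-json', '{"id": 2}']

good, bad = [], []
for line in raw_lines:
    try:
        good.append(json.loads(line))
    except json.JSONDecodeError:
        bad.append(line)   # quarantined for later inspection

print(len(good), bad)  # 2 ['not-json']
```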
Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which can contain data in the form of fields and code in the form of procedures.
OOP focuses on creating objects that interact with each other to solve a problem
Key concepts include encapsulation, inheritance, polymorphism, and abstraction
Encapsulation involves bundling data and methods that operate on the data into a single unit, such as a class.
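A small Python sketch of these concepts (the class names and bonus rule are illustrative):

```python
class Account:
    """Encapsulation: balance is read via a property, changed via methods."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self._balance = balance   # internal state

    @property
    def balance(self):
        return self._balance

    def deposit(self, amount):
        self._balance += amount


class SavingsAccount(Account):
    """Inheritance: reuses Account; overriding deposit is polymorphism."""

    def deposit(self, amount):
        super().deposit(amount + 10)   # toy fixed bonus


acct = SavingsAccount("dana")
acct.deposit(100)
print(acct.balance)  # 110.0
```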
Data engineer life cycle involves collecting, storing, processing, and analyzing data using various tools.
Data collection: Gathering data from various sources such as databases, APIs, and logs.
Data storage: Storing data in databases, data lakes, or data warehouses.
Data processing: Cleaning, transforming, and enriching data using tools like Apache Spark or Hadoop.
Data analysis: Analyzing data to extract insights and make data-driven decisions.
Spark join strategies include broadcast join, shuffle hash join, and shuffle sort merge join.
Broadcast join is used when one of the DataFrames is small enough to fit in memory on all nodes.
Shuffle hash join is used when joining two large DataFrames by partitioning and shuffling the data based on the join key.
Shuffle sort merge join is used when joining two large DataFrames by sorting and merging the data based on the join key.
Spark is a fast and general-purpose cluster computing system for big data processing.
Spark is popular for its speed and ease of use in processing large datasets.
It provides in-memory processing capabilities, making it faster than traditional disk-based processing systems.
Spark supports multiple programming languages like Java, Scala, Python, and R.
It offers a wide range of libraries for diverse tasks such as SQL, streaming, machine learning, and graph processing.
Clustering is the process of grouping similar data points together. Pods are groups of one or more containers, while nodes are individual machines in a cluster.
Clustering is a technique used in machine learning to group similar data points together based on certain features or characteristics.
Pods in a cluster are groups of one or more containers that share resources and are scheduled together on the same node.
Nodes are the individual machines in a cluster that provide the resources on which pods run.
The duration of the TCS Data Engineer interview process can vary, but it typically takes less than 2 weeks to complete.
based on 101 interview experiences
| Role | Salaries reported | Salary range |
|---|---|---|
| System Engineer | 1.1L | ₹3.9 L/yr - ₹8.3 L/yr |
| IT Analyst | 65.5k | ₹7.7 L/yr - ₹12.7 L/yr |
| AST Consultant | 53.6k | ₹12 L/yr - ₹20.6 L/yr |
| Assistant System Engineer | 33.2k | ₹2.5 L/yr - ₹6.4 L/yr |
| Associate Consultant | 33k | ₹16.2 L/yr - ₹28 L/yr |