TCS
I applied via Naukri.com and was interviewed in Jul 2022. There was 1 interview round.
Internal tables store data within Hive's warehouse directory while external tables store data outside of it.
Internal tables are managed by Hive and are deleted when the table is dropped
External tables are not managed by Hive and data is not deleted when the table is dropped
Internal tables are faster for querying as data is stored within Hive's warehouse directory
External tables are useful for sharing data between different tools and applications outside Hive
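The difference shows up directly in the DDL. A minimal sketch (the table names, columns, and path below are invented for illustration):

```python
# Managed (internal) table: Hive owns both metadata and data.
# Dropping it deletes the files under the warehouse directory.
managed_ddl = """
CREATE TABLE sales (id INT, amount DOUBLE)
STORED AS PARQUET
"""

# External table: Hive owns only the metadata. Dropping it leaves
# the files at LOCATION untouched, so other tools can share them.
external_ddl = """
CREATE EXTERNAL TABLE sales_ext (id INT, amount DOUBLE)
STORED AS PARQUET
LOCATION '/data/raw/sales'
"""
```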
Partitioning is dividing a large dataset into smaller, manageable parts. Coalescing is merging small partitions into larger ones.
Partitioning is useful for parallel processing and optimizing query performance.
Coalescing reduces the number of partitions and can improve query performance.
In Spark, partitioning can be done based on a specific column or by specifying the number of partitions.
Coalescing can be used to reduce the number of partitions without triggering a full shuffle
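The idea behind coalesce can be sketched in plain Python: adjacent partitions are merged locally rather than redistributing every record (partition contents here are invented):

```python
def coalesce(partitions, n):
    """Merge a list of partitions down to about n by concatenating
    adjacent ones, mimicking Spark's shuffle-free coalesce."""
    if n >= len(partitions):
        return partitions
    size = -(-len(partitions) // n)  # ceiling division
    return [sum(partitions[i:i + size], [])
            for i in range(0, len(partitions), size)]

parts = [[1], [2], [3], [4], [5], [6]]
print(coalesce(parts, 3))  # → [[1, 2], [3, 4], [5, 6]]
```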
Repartitioning and bucketing are techniques used in Apache Spark to optimize data processing.
Repartitioning is the process of redistributing data across partitions to optimize parallelism and improve performance.
Bucketing is a technique used to organize data into more manageable and efficient groups based on a specific column or set of columns.
Repartitioning and bucketing can be used together to further optimize data processing
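Bucketing assigns each row to a fixed bucket from a hash of the bucketing column, so equal keys always land in the same bucket. A stdlib sketch (the column names and rows are invented):

```python
import hashlib

def bucket_for(key, num_buckets):
    # Stable hash so equal keys map to the same bucket across runs
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % num_buckets

rows = [("alice", 1), ("bob", 2), ("alice", 3)]
buckets = {}
for name, value in rows:
    buckets.setdefault(bucket_for(name, 4), []).append((name, value))
# Both "alice" rows are guaranteed to share a bucket.
```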
Window function is a SQL function that performs a calculation across a set of rows that are related to the current row.
Window functions are used to calculate running totals, moving averages, and other calculations that depend on the order of rows.
They allow you to perform calculations on a subset of rows within a larger result set.
Examples of window functions include ROW_NUMBER, RANK, DENSE_RANK, and NTILE.
Window functions are defined with the OVER clause, which specifies how rows are partitioned and ordered
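A running total, the classic window-function example, can be reproduced in plain Python to show the idea (the sales figures are invented):

```python
import itertools

sales = [100, 50, 200]
# Equivalent in spirit to: SUM(amount) OVER (ORDER BY sale_date)
running = list(itertools.accumulate(sales))
print(running)  # → [100, 150, 350]
```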
An anonymous function is a function without a name.
Also known as lambda functions or closures
Can be used as arguments to higher-order functions
Can be defined inline without a separate declaration
Example: lambda x: x**2 defines a function that squares its input
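Both points above in runnable form, passing an inline anonymous function to a higher-order function:

```python
# Sort words by length using a lambda as the key function
words = ["spark", "hive", "sql"]
print(sorted(words, key=lambda w: len(w)))  # → ['sql', 'hive', 'spark']

# The squaring example from above
square = lambda x: x**2
print(square(4))  # → 16
```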
View is a virtual table created from a SQL query. Dense rank assigns a unique rank to each row in a result set.
A view is a saved SQL query that can be used as a table
Dense rank assigns a unique rank to each row in a result set, with no gaps between the ranks
Dense rank is used to rank rows based on a specific column or set of columns
Example: SELECT * FROM my_view WHERE column_name = 'value'
Example: SELECT column_name, DENSE_RANK() OVER (ORDER BY column_name) AS rnk FROM my_table
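Both ideas can be tried with the stdlib sqlite3 module, which supports views and (since SQLite 3.25) window functions; the table and data below are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, salary INT)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("a", 100), ("b", 200), ("c", 200), ("d", 300)])

# A view is just a saved query that can then be used like a table
con.execute("CREATE VIEW high_paid AS SELECT * FROM emp WHERE salary >= 200")

rows = con.execute("""
    SELECT name, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM high_paid
""").fetchall()
print(rows)  # d ranks 1; b and c tie at rank 2 with no gap after
```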
I applied via Walk-in
Rank and dense_rank both assign the same rank to tied rows, but rank leaves gaps in the sequence after ties while dense_rank does not. Left join includes all rows from the left table and matching rows from the right table, while left anti join includes only rows from the left table that have no match in the right table.
Example: for the values 10, 10, 20, rank yields 1, 1, 3 while dense_rank yields 1, 1, 2.
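The difference is easy to see with the stdlib sqlite3 module (data invented; since SQLite has no LEFT ANTI JOIN keyword, the anti join is written as LEFT JOIN ... WHERE right-key IS NULL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (v INT)")
con.executemany("INSERT INTO scores VALUES (?)", [(10,), (10,), (20,)])

rows = con.execute("""
    SELECT v,
           RANK()       OVER (ORDER BY v) AS rnk,
           DENSE_RANK() OVER (ORDER BY v) AS drnk
    FROM scores
""").fetchall()
print(rows)  # ties: rank gives 1, 1, 3; dense_rank gives 1, 1, 2

# Left anti join: left-table rows with no match on the right
con.execute("CREATE TABLE l (k INT)")
con.execute("CREATE TABLE r (k INT)")
con.executemany("INSERT INTO l VALUES (?)", [(1,), (2,)])
con.execute("INSERT INTO r VALUES (1)")
anti = con.execute("""
    SELECT l.k FROM l LEFT JOIN r ON l.k = r.k WHERE r.k IS NULL
""").fetchall()
print(anti)  # → [(2,)]
```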
I applied via Recruitment Consultant and was interviewed in Aug 2024. There were 2 interview rounds.
Focus a bit more on quantitative maths and aptitude.
I applied via LinkedIn and was interviewed in Oct 2024. There was 1 interview round.
Reverse strings in a Python list
Use list comprehension to iterate through the list and reverse each string
Use the slice notation [::-1] to reverse each string
Example: strings = ['hello', 'world'], reversed_strings = [s[::-1] for s in strings]
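The example above, runnable as written:

```python
strings = ['hello', 'world']
# Slice with a step of -1 to reverse each string in place of a loop
reversed_strings = [s[::-1] for s in strings]
print(reversed_strings)  # → ['olleh', 'dlrow']
```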
To find the 2nd highest salary in SQL, use the 'SELECT' statement with 'ORDER BY' and 'LIMIT ... OFFSET' clauses.
Use the 'SELECT' statement to retrieve the salary column from the table (with DISTINCT if duplicate salaries exist).
Use the 'ORDER BY' clause to sort the salaries in descending order.
Use 'LIMIT 1 OFFSET 1' to skip the highest salary and return only the second row.
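The steps above as a runnable sketch using the stdlib sqlite3 module (the table and values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (salary INT)")
con.executemany("INSERT INTO emp VALUES (?)",
                [(100,), (300,), (300,), (200,)])

# DISTINCT guards against duplicate top salaries
second = con.execute("""
    SELECT DISTINCT salary FROM emp
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]
print(second)  # → 200
```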
I was interviewed in Sep 2024.
I was approached by the company and interviewed in Sep 2024. There was 1 interview round.
SCD 1 overwrites old data with new data, while SCD 2 keeps track of historical changes.
SCD 1 updates existing records with new data, losing historical information.
SCD 2 creates new records for each change, preserving historical data.
SCD 1 is simpler and faster, but can lead to data loss.
SCD 2 is more complex and slower, but maintains a full history of changes.
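The two strategies can be sketched with plain dicts (the customer record and dates below are invented):

```python
# SCD Type 1: overwrite in place; the old value is lost
dim = {"cust_1": {"city": "Pune"}}
dim["cust_1"]["city"] = "Mumbai"          # history gone

# SCD Type 2: append a new row and close out the old one,
# so every historical value survives with its validity window
dim2 = [
    {"key": "cust_1", "city": "Pune",
     "valid_from": "2020-01-01", "valid_to": "2024-06-01"},
    {"key": "cust_1", "city": "Mumbai",
     "valid_from": "2024-06-01", "valid_to": None},  # current row
]

current = [r for r in dim2 if r["valid_to"] is None]
print(current[0]["city"])  # → Mumbai
```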
Corrupt record handling in Spark involves identifying and handling data that does not conform to expected formats.
Use the DataFrameReader option("badRecordsPath", "path/to/bad/records") (a Databricks-specific option) to save corrupt records to a separate location for further analysis.
Use DataFrame.na.drop() or DataFrame.na.fill() to handle corrupt records by dropping or filling missing values.
Implement custom logic to identify and handle corrupt records
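The same pattern in stdlib Python, mimicking the spirit of Spark's permissive parsing: good records are parsed, bad ones routed aside for inspection (the records are invented):

```python
import json

lines = ['{"id": 1}', 'not json', '{"id": 2}']
good, corrupt = [], []
for line in lines:
    try:
        good.append(json.loads(line))
    except json.JSONDecodeError:
        corrupt.append(line)   # kept for later analysis

print(good)     # → [{'id': 1}, {'id': 2}]
print(corrupt)  # → ['not json']
```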
Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which can contain data in the form of fields and code in the form of procedures.
OOP focuses on creating objects that interact with each other to solve a problem
Key concepts include encapsulation, inheritance, polymorphism, and abstraction
Encapsulation involves bundling data and the methods that operate on that data into a single unit
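Encapsulation in a few lines (the class and fields are invented for illustration):

```python
class Account:
    def __init__(self, balance):
        self._balance = balance          # data bundled with behaviour

    def deposit(self, amount):
        if amount <= 0:                  # the method guards the state
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):                   # read-only access to the state
        return self._balance

acct = Account(100)
acct.deposit(50)
print(acct.balance)  # → 150
```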
Data engineer life cycle involves collecting, storing, processing, and analyzing data using various tools.
Data collection: Gathering data from various sources such as databases, APIs, and logs.
Data storage: Storing data in databases, data lakes, or data warehouses.
Data processing: Cleaning, transforming, and enriching data using tools like Apache Spark or Hadoop.
Data analysis: Analyzing data to extract insights and support data-driven decisions.
Spark join strategies include broadcast join, shuffle hash join, and shuffle sort merge join.
Broadcast join is used when one of the DataFrames is small enough to fit in memory on all nodes.
Shuffle hash join is used when joining two large DataFrames by partitioning and shuffling the data based on the join key.
Shuffle sort merge join is used when joining two large DataFrames by sorting and merging the data based on the join key.
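The intuition behind a broadcast join can be shown in plain Python: the small side becomes a dict "broadcast" to every task, and the large side is streamed against it with local lookups, so nothing is shuffled (the tables are invented):

```python
small = {1: "electronics", 2: "clothing"}     # broadcast lookup table
large = [(101, 1), (102, 2), (103, 1)]        # (order_id, category_id)

# Each task joins its rows via a local dict lookup, no shuffle needed
joined = [(oid, small[cid]) for oid, cid in large if cid in small]
print(joined)  # → [(101, 'electronics'), (102, 'clothing'), (103, 'electronics')]
```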
I was approached by the company and interviewed in Jul 2024. There were 3 interview rounds.
Spark is a fast and general-purpose cluster computing system for big data processing.
Spark is popular for its speed and ease of use in processing large datasets.
It provides in-memory processing capabilities, making it faster than traditional disk-based processing systems.
Spark supports multiple programming languages like Java, Scala, Python, and R.
It offers a wide range of libraries for diverse tasks such as SQL, streaming, machine learning, and graph processing.
Clustering is the process of grouping similar data points together. Pods are groups of one or more containers, while nodes are individual machines in a cluster.
Clustering is a technique used in machine learning to group similar data points together based on certain features or characteristics.
Pods in a cluster are groups of one or more containers that share resources and are scheduled together on the same node.
Nodes are the individual machines (physical or virtual) in a cluster that run the scheduled pods.