I applied via Naukri.com and was interviewed in Jul 2024. There were 4 interview rounds.
The aptitude round was fine, but the time given was short.
Dataframes in Pyspark are distributed collections of data organized into named columns.
Dataframes are similar to tables in a relational database.
They can be created from various data sources like CSV, JSON, Parquet, etc.
Dataframes support SQL queries and transformations using PySpark functions.
Yes, I am ready to travel on site for data engineering projects.
I am willing to travel for client meetings, project kick-offs, and on-site troubleshooting.
I understand the importance of face-to-face interactions in project delivery.
I have previous experience traveling for work, such as attending conferences or training sessions.
I am flexible with my schedule and can accommodate last-minute travel if needed.
I applied via Company Website and was interviewed in Jun 2024. There were 2 interview rounds.
Functional testing checks if the software functions as expected, while non-functional testing checks the performance, usability, security, etc.
Functional testing focuses on the specific functionality of the software
Non-functional testing focuses on aspects like performance, usability, security, etc.
Examples of functional testing include unit testing, integration testing, and system testing
Examples of non-functional tes...
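The distinction above can be sketched with a minimal functional (unit) test in Python; `apply_discount` is a hypothetical function invented purely for illustration:

```python
# Minimal sketch of a functional test: verify that a hypothetical
# discount function behaves exactly as its specification says.

def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Functional testing checks behaviour against expected outputs:
assert apply_discount(100.0, 10) == 90.0
assert apply_discount(50.0, 0) == 50.0
```

A non-functional test of the same function would instead measure things like how fast it runs under load or how it behaves with malicious input, rather than whether the arithmetic is correct.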
Performance testing focuses on evaluating the speed, responsiveness, and stability of a system under a specific workload, while capacity testing assesses the system's ability to handle a certain level of traffic or data volume over time.
Performance testing measures the speed, responsiveness, and stability of a system under a specific workload.
Capacity testing evaluates the system's ability to handle a certain level of ...
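A rough sketch of the difference in Python, using a made-up `process` function as the system under test:

```python
import time

def process(records):
    # Hypothetical workload: normalise a batch of strings.
    return [r.strip().lower() for r in records]

# Performance testing: time a fixed workload and compare against a target.
workload = ["  Item-%d  " % i for i in range(10_000)]
start = time.perf_counter()
result = process(workload)
elapsed = time.perf_counter() - start
print(f"processed {len(result)} records in {elapsed:.4f}s")

# Capacity testing would instead grow the workload (10k, 100k, 1M, ...)
# until the system stops meeting its service-level target.
assert len(result) == 10_000
```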
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
SDLC stands for Software Development Life Cycle. There are various models followed during a project like Waterfall, Agile, Iterative, etc.
SDLC is a process used by software development teams to design, develop, and test high-quality software.
Waterfall model follows a linear and sequential approach, where each phase must be completed before moving on to the next.
Agile model emphasizes flexibility and customer collaborat...
Inventory management process involves tracking, storing, and managing inventory to ensure efficient operations.
Tracking inventory levels to know when to reorder
Organizing inventory for easy access and retrieval
Implementing barcode or RFID technology for accurate tracking
Regularly conducting inventory audits to prevent stockouts or overstocking
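The tracking step above can be sketched as a simple reorder-point check; the item names and thresholds here are invented for illustration:

```python
# Sketch of a reorder-point check, one piece of inventory tracking.
# Item names and threshold values are made up for illustration.

REORDER_POINT = {"widget": 20, "gadget": 50}   # minimum stock before reordering
stock_levels  = {"widget": 12, "gadget": 75}   # current on-hand quantities

def items_to_reorder(stock, thresholds):
    """Return items whose stock is at or below their reorder point."""
    return [item for item, qty in stock.items() if qty <= thresholds.get(item, 0)]

print(items_to_reorder(stock_levels, REORDER_POINT))  # ['widget']
```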
Amortization is the process of spreading out loan payments over time, with a portion of each payment going towards both the principal and interest.
Amortization helps borrowers pay off a loan gradually over a set period of time.
Each payment is typically divided into two parts: one portion goes towards reducing the loan balance (principal), and the other portion covers the interest accrued.
Common examples of amortization...
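The payment split described above follows the standard amortization formula, payment = P·r / (1 − (1 + r)⁻ⁿ), where P is the principal, r the per-period rate, and n the number of payments. A small sketch:

```python
# Standard amortization formula:
#   payment = P * r / (1 - (1 + r) ** -n)
# P = principal, r = per-period rate, n = number of payments.

def monthly_payment(principal, annual_rate, years):
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

def first_payment_split(principal, annual_rate, years):
    """How the first payment divides between interest and principal."""
    pay = monthly_payment(principal, annual_rate, years)
    interest = principal * annual_rate / 12
    return interest, pay - interest

# A 1-year, 100,000 loan at 12% annual interest:
pay = monthly_payment(100_000, 0.12, 1)
print(round(pay, 2))  # 8884.88
```

Early payments are mostly interest; as the balance falls, more of each payment goes to principal, which is exactly the gradual pay-down the answer describes.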
Different types of joins in SQL are inner join, left join, right join, and full outer join.
Inner join: Returns rows when there is a match in both tables
Left join: Returns all rows from the left table and the matched rows from the right table
Right join: Returns all rows from the right table and the matched rows from the left table
Full outer join: Returns all rows from both tables, with NULLs where there is no match
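The first two join types can be demonstrated with an in-memory SQLite database; the table names and data below are made up for illustration:

```python
import sqlite3

# Sketch of inner vs left join; tables and rows invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, name TEXT);
    INSERT INTO employees VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Mia', NULL);
    INSERT INTO departments VALUES (10, 'Data'), (30, 'HR');
""")

# Inner join: only rows with a match in both tables.
inner = con.execute("""
    SELECT e.name, d.name FROM employees e
    JOIN departments d ON e.dept_id = d.id
""").fetchall()
print(inner)      # [('Asha', 'Data')]

# Left join: all employees, NULL where no matching department.
left = con.execute("""
    SELECT e.name, d.name FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
""").fetchall()
print(len(left))  # 3
```

Note that SQLite only added RIGHT and FULL OUTER JOIN in version 3.39, so the sketch sticks to the inner and left variants; the semantics in other databases are as described above.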
The main difference between 'having' and 'where' in SQL is that 'having' is used with aggregate functions to filter groups, while 'where' is used to filter rows.
HAVING is used with GROUP BY to filter groups based on aggregate functions results
WHERE is used to filter rows based on conditions
HAVING is applied after GROUP BY, while WHERE is applied before GROUP BY
Example: SELECT department, AVG(salary) FROM employees GROU...
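A runnable sketch of the WHERE-before, HAVING-after ordering, using SQLite with invented sample data:

```python
import sqlite3

# WHERE filters rows before grouping; HAVING filters groups after
# aggregation. Sample data is made up for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Asha', 'Data', 90000), ('Ravi', 'Data', 70000),
        ('Mia',  'HR',   50000), ('Tom',  'HR',   45000);
""")

rows = con.execute("""
    SELECT department, AVG(salary)
    FROM employees
    WHERE salary > 40000              -- row filter, applied before GROUP BY
    GROUP BY department
    HAVING AVG(salary) > 60000        -- group filter, applied after aggregation
""").fetchall()
print(rows)   # [('Data', 80000.0)]
```

HR survives the WHERE filter but its average (47,500) fails the HAVING condition, so only the Data group is returned.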
I applied via Naukri.com and was interviewed in Jun 2024. There was 1 interview round.
Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Apache Spark is designed for speed and ease of use in processing large amounts of data.
It can run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Spark provides high-level APIs in Java, Scala, Python, and R, and an opt...
Core components of Spark include Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
Spark Core: foundation of the Spark platform, provides basic functionality for distributed data processing
Spark SQL: module for working with structured data using SQL and DataFrame API
Spark Streaming: extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams
MLlib...
Implement fault tolerance by using checkpointing, replication, and monitoring mechanisms.
Enable checkpointing in Spark Streaming to save the state of the computation periodically to a reliable storage like HDFS or S3.
Use replication in Kafka to ensure that data is not lost in case of node failures.
Monitor the health of the Kafka and Spark clusters using tools like Prometheus and Grafana to detect and address issues proactively.
Hive Architecture is a data warehousing infrastructure built on top of Hadoop for querying and analyzing large datasets.
Hive uses a language called HiveQL which is similar to SQL for querying data stored in Hadoop.
It organizes data into tables, partitions, and buckets to optimize queries and improve performance.
Hive metastore stores metadata about tables, columns, partitions, and their locations.
Hive queries are conver...
Vectorization is the process of converting data into a format that can be easily processed by a computer's CPU or GPU.
Vectorization allows for parallel processing of data, improving computational efficiency.
It involves performing operations on entire arrays or matrices at once, rather than on individual elements.
Examples include using libraries like NumPy in Python to perform vectorized operations on arrays.
Vectorizati...
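Since the answer mentions NumPy, here is a minimal sketch contrasting an element-by-element Python loop with the equivalent whole-array (vectorized) operation:

```python
import numpy as np

# The same computation two ways: per-element loop vs whole-array operation.
xs = list(range(1_000))

# Scalar approach: Python iterates one element at a time.
squared_loop = [x * x for x in xs]

# Vectorized approach: the multiplication is applied to the entire array
# in one call, dispatched to optimised native (often SIMD) code.
arr = np.array(xs)
squared_vec = arr * arr

assert squared_vec.tolist() == squared_loop
```

For large arrays the vectorized form is typically much faster, which is the computational-efficiency point made above.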
Partition in Hive is a way to organize data in a table into multiple directories based on the values of one or more columns.
Partitions help in improving query performance by allowing Hive to only read the relevant data directories.
Partitions are defined when creating a table in Hive using the PARTITIONED BY clause.
Example: CREATE TABLE table_name (column1 INT, column2 STRING) PARTITIONED BY (column3 STRING);
Functions in SQL are built-in operations that can be used to manipulate data or perform calculations within a database.
Functions in SQL can be used to perform operations on data, such as mathematical calculations, string manipulation, date/time functions, and more.
Examples of SQL functions include SUM(), AVG(), CONCAT(), UPPER(), LOWER(), DATE_FORMAT(), and many others.
Functions can be used in SELECT statements, WHERE ...
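A few of the functions named above, run against SQLite with invented sample data (function names vary slightly across databases, e.g. DATE_FORMAT is MySQL-specific):

```python
import sqlite3

# Sketch of built-in SQL functions in SELECT; data is made up.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (item TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("pen", 10.0), ("book", 25.5), ("pen", 14.5)])

total, average, upper = con.execute(
    "SELECT SUM(amount), AVG(amount), UPPER(MIN(item)) FROM sales"
).fetchone()
print(total, upper)   # 50.0 BOOK
```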
Rank, Dense_rank, and row_number are window functions used in SQL to assign a rank to each row based on a specified order.
Rank function assigns a unique rank to each row based on the specified order.
Dense_rank function assigns a unique rank to each row without any gaps based on the specified order.
Row_number function assigns a unique sequential integer to each row based on the specified order.
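The three behaviours are easiest to see on tied values. A sketch using SQLite (window functions require SQLite >= 3.25, bundled with Python 3.8+), with invented scores:

```python
import sqlite3

# RANK vs DENSE_RANK vs ROW_NUMBER on tied scores (sample data made up).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('A', 90), ('B', 90), ('C', 80);
""")

rows = con.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS drnk,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
    FROM scores
""").fetchall()

ranks = {name: (rnk, drnk) for name, rnk, drnk, _ in rows}
# A and B tie at 90: both get RANK 1 and DENSE_RANK 1.
assert ranks['A'] == (1, 1) and ranks['B'] == (1, 1)
# C: RANK skips to 3 (leaving a gap), DENSE_RANK continues at 2.
assert ranks['C'] == (3, 2)
# ROW_NUMBER is always a gap-free sequence, even across ties.
assert sorted(rn for *_, rn in rows) == [1, 2, 3]
```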
I applied via Recruitment Consultant and was interviewed in Sep 2024. There was 1 interview round.
I applied via Job Portal and was interviewed in Oct 2024. There were 2 interview rounds.
The exam was good.
I applied via LinkedIn and was interviewed in Jun 2024. There was 1 interview round.
findElements returns a list of web elements matching the locator, while findElement returns the first web element matching the locator.
findElements returns a list of web elements, findElement returns the first element
findElements returns an empty list if no elements are found, findElement throws NoSuchElementException
findElements is useful for finding multiple elements, findElement is useful for finding a single element
The code takes a name as input, prints each letter, and checks if any vowels repeat in the name.
Create a function that takes a string input for the name
Iterate through each letter in the name and print them individually
Check for vowel repetition by keeping track of vowels encountered
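The steps above can be sketched as follows; the function name and sample input are invented for illustration:

```python
# Sketch of the described exercise: print each letter of a name and
# report whether any vowel appears more than once.

def check_name(name: str) -> bool:
    """Print each letter; return True if any vowel repeats."""
    seen = set()
    repeated = False
    for letter in name:
        print(letter)
        if letter.lower() in "aeiou":
            if letter.lower() in seen:
                repeated = True
            seen.add(letter.lower())
    return repeated

print(check_name("Anita"))  # 'a' appears twice -> True
```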
Fluent wait is a dynamic wait mechanism in Selenium WebDriver that waits for a condition to be true before proceeding.
Fluent wait is used to handle dynamic web elements that may load at different times.
It can define the maximum amount of time to wait for a condition, as well as the frequency of checking.
Example: WebDriverWait wait = new WebDriverWait(driver, 10); wait.until(ExpectedConditions.visibilityOfElementLocated(locator));
I am an Automation Engineer with experience in designing and implementing automated test scripts for web applications.
Designed and implemented automated test scripts using Selenium WebDriver for regression testing
Collaborated with developers to integrate automated tests into continuous integration pipeline
Performed manual testing to identify bugs and validate automated test results
The duration of the PwC interview process can vary, but it typically takes less than 2 weeks to complete.
Designation        Salaries reported   Salary range
Senior Associate   14.5k               ₹8 L/yr - ₹30 L/yr
Associate          12.6k               ₹4.6 L/yr - ₹16 L/yr
Manager            6.6k                ₹13.4 L/yr - ₹50 L/yr
Senior Consultant  4.5k                ₹8.9 L/yr - ₹32 L/yr
Associate2         4.1k                ₹4.5 L/yr - ₹16.5 L/yr