I applied via Company Website and was interviewed before May 2023. There were 3 interview rounds.
There was a one-hour exam.
Advanced SQL and Python skills are essential for a Data Engineer role.
Strong understanding of SQL queries, joins, subqueries, and optimization techniques.
Proficiency in writing complex Python scripts for data manipulation, analysis, and automation.
Experience with data modeling, ETL processes, and working with large datasets.
Knowledge of data warehousing concepts and tools like SQL Server, PostgreSQL, or Snowflake.
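To make these expectations concrete, here is a minimal sketch using Python's built-in sqlite3 module; the customers/orders tables and their columns are illustrative, not from the interview.

```python
import sqlite3

# In-memory database with two small illustrative tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.0), (12, 2, 40.0);
""")

# A join combined with a subquery: customers whose total spend
# exceeds the average order amount.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
""").fetchall()
print(rows)  # [('Asha', 340.0)]
```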
I was interviewed in Aug 2024.
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
Pyspark is a Python API for Apache Spark, a powerful open-source distributed computing system.
Pyspark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
Pyspark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using Pyspark to perform data analysis and machine learning tasks on big data sets.
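A minimal sketch of what that looks like in practice (the column names and the local master URL are illustrative; a real job would point at a cluster):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# The driver starts a session; "local[*]" runs Spark inside this process.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# A small DataFrame and a parallel aggregation over it.
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])
df.groupBy("key").agg(F.sum("value").alias("total")).show()

# Hand-off to Pandas, as mentioned above (requires pandas to be installed).
pdf = df.toPandas()
```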
Pyspark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
Pyspark SQL allows users to run SQL queries on Spark dataframes.
It provides a more concise and user-friendly way to interact with data compared to traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
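A short sketch of the SQL interface (the view name `events` is hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("sql-demo").getOrCreate()
df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

# Expose the DataFrame to the SQL engine under a view name.
df.createOrReplaceTempView("events")

# Plain SQL instead of low-level RDD transformations.
spark.sql("SELECT key, SUM(value) AS total FROM events GROUP BY key").show()
```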
To merge 2 dataframes of different schema, use join operations or data transformation techniques.
Use join operations like inner join, outer join, left join, or right join based on the requirement.
Perform data transformation to align the schemas before merging.
Use tools like Apache Spark, Pandas, or SQL to merge dataframes with different schemas.
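One concrete way to do this in PySpark is `unionByName` with `allowMissingColumns=True` (available from Spark 3.1); the DataFrames below are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("merge-demo").getOrCreate()

# Two DataFrames with partially overlapping schemas.
df1 = spark.createDataFrame([(1, "Asha")], ["id", "name"])
df2 = spark.createDataFrame([(2, "Pune")], ["id", "city"])

# Align columns by name; missing columns are filled with nulls.
merged = df1.unionByName(df2, allowMissingColumns=True)
merged.show()

# Alternatively, join on the shared key to keep one row per id.
df1.join(df2, on="id", how="outer").show()
```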
Pyspark streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
Pyspark streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
Pyspark streaming supports various data sources like Kafka, Flume, Kinesis, etc.
It enables windowed computations and stateful processing for handling streaming data.
Example: C...
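The original example is cut off above; as a stand-in, here is a minimal Structured Streaming sketch using the built-in `rate` source (a real job would read from Kafka, Kinesis, etc.):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.master("local[*]").appName("stream-demo").getOrCreate()

# The "rate" source emits (timestamp, value) rows, which is handy for demos.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Windowed computation: count rows per 10-second event-time window.
counts = stream.groupBy(window(col("timestamp"), "10 seconds")).count()

query = counts.writeStream.outputMode("update").format("console").start()
query.awaitTermination(30)  # run briefly for the demo, then stop
query.stop()
```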
I was interviewed in Jan 2024.
I have used various types of joins including inner join, left join, right join, and full outer join.
Used inner join to retrieve records that have matching values in both tables
Utilized left join to retrieve all records from the left table and matching records from the right table
Employed right join to retrieve all records from the right table and matching records from the left table
Utilized full outer join to retrieve all records from both tables, with NULLs where no match exists
Query for joins in SQL to combine data from multiple tables
Use JOIN keyword to combine data from two or more tables based on a related column
Types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id
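The same four join types in the DataFrame API, on two illustrative tables:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("join-demo").getOrCreate()

left = spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["id", "name"])
right = spark.createDataFrame([(1, 250.0), (3, 40.0)], ["id", "amount"])

left.join(right, "id", "inner").show()  # ids in both tables: 1
left.join(right, "id", "left").show()   # all left ids: 1, 2
left.join(right, "id", "right").show()  # all right ids: 1, 3
left.join(right, "id", "full").show()   # union of ids: 1, 2, 3
```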
I applied via Referral and was interviewed in Feb 2024. There was 1 interview round.
Just focus on the basics of pyspark.
I applied via Approached by Company and was interviewed in Nov 2021. There was 1 interview round.
Normalization is a process of organizing data in a database to reduce redundancy and improve data integrity.
Normalization involves breaking down a table into smaller tables and defining relationships between them.
It helps in reducing data redundancy and inconsistencies.
Views are virtual tables that are created based on the result of a query. They can be used to simplify complex queries.
Stored procedures are precompiled sets of SQL statements stored in the database that can be executed repeatedly, often with parameters.
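A small sketch with Python's sqlite3 module, showing a normalized pair of tables and a view over their join (SQLite does not support stored procedures, so that part stays prose; table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalization: department names live in their own table and are
# referenced by id, instead of repeating on every employee row.
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Platform');
    INSERT INTO employees VALUES (1, 'Asha', 1), (2, 'Ravi', 2);

    -- A view hides the join behind a simple queryable name.
    CREATE VIEW employee_details AS
        SELECT e.name AS employee, d.name AS department
        FROM employees e JOIN departments d ON e.dept_id = d.id;
""")

print(conn.execute("SELECT * FROM employee_details").fetchall())
# [('Asha', 'Data'), ('Ravi', 'Platform')]
```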
I applied via LinkedIn and was interviewed before Jun 2023. There were 3 interview rounds.
Easy questions related to DSA, Python, and SQL
Query to join two tables using different joins
Use INNER JOIN to return rows that have matching values in both tables
Use LEFT JOIN to return all rows from the left table and the matched rows from the right table
Use RIGHT JOIN to return all rows from the right table and the matched rows from the left table
Use FULL JOIN to return all rows from both tables, with NULLs where there is no match
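A sketch of the same joins in SQL syntax, run through Spark's SQL interface on two hypothetical temp views:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("sql-join-demo").getOrCreate()

spark.createDataFrame([(1, "Asha"), (2, "Ravi")], ["id", "name"]) \
    .createOrReplaceTempView("t1")
spark.createDataFrame([(1, 250.0), (3, 40.0)], ["id", "amount"]) \
    .createOrReplaceTempView("t2")

# Run the same query with each join type to compare the results.
for how in ("INNER", "LEFT", "RIGHT", "FULL"):
    print(how)
    spark.sql(f"SELECT * FROM t1 {how} JOIN t2 ON t1.id = t2.id").show()
```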
Delete removes selected rows from a table (optionally filtered by a WHERE clause), while truncate removes all rows at once.
Delete is a DML command while truncate is a DDL command.
Delete operation can be rolled back while truncate operation cannot be rolled back.
Delete operation is slower than truncate operation.
Example: DELETE FROM table_name WHERE condition; TRUNCATE TABLE table_name;
I applied via Company Website and was interviewed in Jan 2024. There was 1 interview round.
Spark architecture includes driver, cluster manager, and worker nodes for distributed processing.
Spark architecture consists of a driver program that manages the execution of tasks on worker nodes.
Cluster manager is responsible for allocating resources and scheduling tasks across worker nodes.
Worker nodes execute the tasks and store data in memory or disk for processing.
Example: In a Spark application, the driver program builds the execution plan and coordinates tasks that the worker nodes execute in parallel.
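A tiny sketch of where the pieces sit; the script below is the driver program, and `local[2]` stands in for a cluster manager with two executor threads:

```python
from pyspark.sql import SparkSession

# This script is the driver program. The master URL picks the cluster
# manager; "local[2]" simulates a cluster with two worker threads.
spark = SparkSession.builder.master("local[2]").appName("arch-demo").getOrCreate()
sc = spark.sparkContext

# The driver splits this job into tasks that the workers run in parallel.
rdd = sc.parallelize(range(100), numSlices=4)
print(rdd.map(lambda x: x * x).sum())  # 328350

spark.stop()
```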