I applied via Referral and was interviewed in Nov 2020. There were 3 interview rounds.
I applied via LinkedIn and was interviewed in Nov 2024. There were 3 interview rounds.
This was a CoderPad round where I solved a Python coding problem and SQL queries.
Data pipeline implementations involve the process of moving and transforming data from source to destination.
Data pipeline is a series of processes that extract data from sources, transform it, and load it into a destination.
Common tools for data pipeline implementations include Apache NiFi, Apache Airflow, and AWS Glue.
Data pipelines can be batch-oriented or real-time, depending on the requirements of the use case.
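The extract, transform, load steps described above can be sketched as a tiny batch pipeline. This is only an illustration using the Python standard library; the CSV data, the `orders` table, and the function names are hypothetical and not taken from the interview.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, then normalize types and casing."""
    return [
        (int(r["order_id"]), float(r["amount"]), r["currency"].upper())
        for r in rows
        if r["amount"]
    ]

def load(records, conn):
    """Load: write the cleaned records into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory destination for the sketch
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

A real pipeline would usually run steps like these under a scheduler such as Airflow rather than as one script.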
Test: 45 minutes, 30 questions
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
Linear regression is used to predict the value of a dependent variable based on the value of one or more independent variables.
It assumes a linear relationship between the independent and dependent variables.
The goal of linear regression is to find the best-fitting line that minimi...
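As one concrete way to fit a line, here is a sketch of simple linear regression using ordinary least squares. The choice of least squares is an assumption, since the answer above is cut off, and the data and the `fit_line` helper are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for a new x
print(slope, intercept, prediction)
```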
Random forest is an ensemble learning method used for classification and regression tasks.
Random forest is a collection of decision trees that are trained on random subsets of the data.
Each tree in the random forest predicts the target variable independently; the final prediction averages the trees' predictions for regression and takes a majority vote for classification.
Random forest is robust to overfitting and noisy data, and it can handle large datasets...
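The two ideas above, random subsets of the data and combining per-tree predictions, can be sketched without training real trees. The sketch assumes bootstrap sampling (sampling with replacement) for the subsets, averaging for regression, and a majority vote for classification; the helper names and the per-tree outputs are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw a sample of the same size as rows, with replacement."""
    return rng.choices(rows, k=len(rows))

def aggregate_regression(tree_predictions):
    """Regression: average the numeric predictions of all trees."""
    return mean(tree_predictions)

def aggregate_classification(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [10, 20, 30, 40, 50]
sample = bootstrap_sample(data, rng)  # each tree would train on a sample like this

# Hypothetical outputs from three already-trained trees.
reg_prediction = aggregate_regression([2.0, 3.0, 4.0])
cls_prediction = aggregate_classification(["spam", "ham", "spam"])
print(sample, reg_prediction, cls_prediction)
```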
XGBoost is an optimized distributed gradient boosting library designed for efficient and accurate large-scale machine learning.
XGBoost stands for eXtreme Gradient Boosting.
It is a popular machine learning algorithm known for its speed and performance.
XGBoost is used for regression, classification, ranking, and user-defined prediction problems.
It is based on the gradient boosting framework and uses decision trees as bas...
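To make the gradient boosting idea concrete, here is a minimal sketch for squared-error regression on one feature, where each round fits a one-split tree (a stump) to the current residuals. This is plain gradient boosting written for illustration, not XGBoost's implementation, which adds regularization and many optimizations; all names and data here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on x that minimizes squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * predict_stump(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 2.0, 5.0, 5.0, 6.0]
model = fit_boosted(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The learning rate shrinks each stump's contribution, so many small corrections are added up instead of one large one.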
The round lasted 5 minutes and covered important topics only.
Prepare well for DSA.
DeltaX is used to calculate the change in a variable over a period of time.
DeltaX is used in mathematics and science to measure the difference between two values of a variable.
It is commonly used in calculus to find the rate of change of a function.
DeltaX is represented by the symbol Δx, where Δ denotes change and x represents the variable.
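A short numeric sketch of the two uses mentioned above: Δx as a plain difference, and Δx as the step in a difference quotient that approximates a rate of change. The function names and the example f(x) = x² are chosen for illustration only.

```python
def delta(x_start, x_end):
    """Change in x between two values: Δx = x_end - x_start."""
    return x_end - x_start

def average_rate_of_change(f, x, dx):
    """Difference quotient (f(x + Δx) - f(x)) / Δx."""
    return (f(x + dx) - f(x)) / dx

def square(x):
    return x ** 2

change = delta(2, 5)
# As Δx shrinks, the quotient approaches the derivative of x**2 at x = 3, which is 6.
rate = average_rate_of_change(square, 3.0, 1e-6)
print(change, rate)
```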
There were 2 coding questions, and they were really easy.
Python has various data structures like lists, tuples, dictionaries, sets, etc.
Lists: Ordered, mutable, allows duplicate elements. Example: [1, 2, 3]
Tuples: Ordered, immutable, allows duplicate elements. Example: (1, 2, 3)
Dictionaries: Mutable key-value pairs; insertion-ordered since Python 3.7. Example: {'key': 'value'}
Sets: Unordered, mutable, unique elements. Example: {1, 2, 3}
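The properties listed above can be checked directly. A small sketch with made-up values:

```python
# List: ordered, mutable, duplicates allowed.
nums = [1, 2, 2, 3]
nums.append(4)

# Tuple: ordered and immutable; item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 9
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# Dictionary: mutable key-value pairs; keeps insertion order on Python 3.7+.
ages = {"ann": 30}
ages["bob"] = 25

# Set: mutable, unique elements only, no guaranteed order.
unique = set(nums)

print(nums, tuple_is_immutable, list(ages), unique)
```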
Hadoop is a framework for distributed storage and processing of large data sets across clusters of computers.
Hadoop consists of HDFS (Hadoop Distributed File System) for storage, YARN for resource management, and MapReduce for processing.
Hadoop uses a master-slave architecture with a single NameNode (master) and multiple DataNodes (slaves).
Hadoop ecosystem includes tools like Hive, Pig, Spark, and HBase for different data processing tasks.
Hadoop is ...
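The MapReduce processing model mentioned above can be illustrated with a single-process word count. This sketch only mimics the map, shuffle, and reduce phases in plain Python; it does not use Hadoop, and the function names and input lines are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

lines = ["big data big clusters", "data nodes store data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce calls run in parallel across nodes, with HDFS holding the input and output.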
posted on 8 Aug 2024
I applied via Approached by Company and was interviewed in Dec 2023. There were 4 interview rounds.
The question asks about the types of transformations and the number of jobs, tasks, and actions, in the context of a Senior Data Engineer role.
Types of transformations: ETL, MapReduce, SQL, and Spark transformations; in Spark these are either narrow (e.g. map, filter) or wide (e.g. groupByKey, join), and wide transformations require a shuffle
Number of jobs: Depends on the complexity and scale of the data engineering projects; in Spark, each action typically triggers a job
Number of tasks: Varies based on the number of data sources, data transformations, and data...
Spark is a distributed processing engine, Airflow is a workflow management system, and BigQuery is a fully managed data warehouse.
Spark is designed for big data processing and provides in-memory computation capabilities.
Airflow is used for orchestrating and scheduling data pipelines.
BigQuery is a serverless data warehouse that allows for fast and scalable analytics.
Spark can be integrated with Airflow to schedule and m...
Optimization techniques in Spark, SQL, BigQuery, and Airflow.
Use partitioning and bucketing in Spark to optimize data processing.
Optimize SQL queries by using indexes, selecting only the columns you need, and filtering rows as early as possible.
In BigQuery, use partitioning and clustering to improve query performance.
Leverage Airflow's task parallelism and resource allocation to optimize workflow execution.
To remove duplicates from a table, you can use the DISTINCT keyword in SQL or the dropDuplicates() function in Spark.
In SQL, you can use the DISTINCT keyword in a SELECT statement to retrieve unique rows from a table; this returns deduplicated results without deleting rows from the table itself.
In Spark, you can use the dropDuplicates() function on a DataFrame to remove duplicate rows.
dropDuplicates() compares all columns by default and accepts a list of columns to consider; DISTINCT applies to the columns in the SELECT list.
You...
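The SQL side of the answer above can be tried with Python's built-in `sqlite3` module. The sketch shows DISTINCT in a SELECT, then one way to actually delete duplicate rows by keeping the lowest `rowid` per group (a SQLite-specific column). The `drop_duplicates` helper is a hypothetical plain-Python stand-in for deduplicating on a subset of columns; it is not Spark's API, and it always keeps the first row seen, which Spark does not guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ann", "eng"), ("ann", "eng"), ("bob", "ops"), ("bob", "eng")],
)

# DISTINCT returns unique rows but leaves the table unchanged.
distinct_rows = conn.execute(
    "SELECT DISTINCT name, dept FROM emp ORDER BY name, dept"
).fetchall()

# Delete duplicates in place: keep the first inserted row of each (name, dept) group.
conn.execute(
    "DELETE FROM emp WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM emp GROUP BY name, dept)"
)
remaining = conn.execute("SELECT name, dept FROM emp ORDER BY rowid").fetchall()

def drop_duplicates(rows, subset):
    """Keep the first row for each distinct combination of the subset columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

by_name = drop_duplicates(
    [
        {"name": "ann", "dept": "eng"},
        {"name": "bob", "dept": "ops"},
        {"name": "bob", "dept": "eng"},
    ],
    subset=["name"],
)
print(distinct_rows, remaining, by_name)
```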
Dataflow and Dataproc are both processing services in GCP, but with different approaches and use cases.
Dataflow is a fully managed service for executing batch and streaming data processing pipelines.
Dataproc is a managed Spark and Hadoop service for running big data processing and analytics workloads.
Dataflow provides a serverless and auto-scaling environment, while Dataproc offers more control and flexibility.
Dataflow...
This was the final round, with the client.
They asked questions based on my PySpark work.
The questions were like:
What kinds of transformations have you used?
Broadcast join internals.
Spark internal joins.
Spark Catalyst optimizer: what joins happen in the Catalyst optimizer?
Window function question: 3rd highest salary of an employee.
Discussion on Airflow architecture and how to deploy an Airflow DAG in GCP.
Discussion on BigQuery and the kind of work I have done so far.
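For the window function question in the list above (3rd highest salary), one common approach is DENSE_RANK over salaries in descending order, so that tied salaries share a rank. The sketch below runs the query through Python's built-in `sqlite3` module and assumes a SQLite build with window function support (3.25 or later); the `emp` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ann", 900), ("bob", 800), ("cat", 800), ("dan", 700), ("eve", 600)],
)

# DENSE_RANK gives tied salaries the same rank and leaves no gaps,
# so rank 3 is the third distinct salary value.
query = """
SELECT name, salary
FROM (
    SELECT name, salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM emp
) AS ranked
WHERE salary_rank = 3
"""
third_highest = conn.execute(query).fetchall()
print(third_highest)
```

Using ROW_NUMBER or RANK instead would treat the tie at 800 differently, which is usually the follow-up discussion in this question.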
I applied via Recruitment Consultant and was interviewed in Nov 2023. There were 4 interview rounds.
Python and Spark code