10+ Relaxo Footwear Interview Questions and Answers
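Q1. How do you remove duplicate rows in BigQuery? How do you find the month of a given date in BigQuery?
Use SELECT DISTINCT to return de-duplicated rows, and use the EXTRACT function to get the month of a date.
SELECT DISTINCT * FROM table_name; returns each fully identical row once, but it does not change the stored table.
To remove the duplicates from the table itself, rewrite it from the de-duplicated result, for example CREATE OR REPLACE TABLE table_name AS SELECT DISTINCT * FROM table_name;
To find the month of a given date, use SELECT EXTRACT(MONTH FROM date_column) AS month_number FROM table_name; which returns the month as a number from 1 to 12.
Replace table_name and date_column with the appropriate values in your query.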
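The two ideas above can be tried locally with a small sketch. It assumes an in-memory SQLite database as a stand-in for BigQuery, and the orders table and its columns are hypothetical. SQLite has no EXTRACT, so strftime('%m', ...) plays the role of BigQuery's EXTRACT(MONTH FROM ...) here.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2024-01-15"), (1, "2024-01-15"), (2, "2024-03-02")],
)

# SELECT DISTINCT returns each fully identical row once; the table is unchanged.
deduped = conn.execute(
    "SELECT DISTINCT order_id, order_date FROM orders ORDER BY order_id"
).fetchall()

# strftime('%m', ...) is SQLite's analogue of EXTRACT(MONTH FROM ...).
months = [
    int(row[0])
    for row in conn.execute(
        "SELECT DISTINCT strftime('%m', order_date) FROM orders ORDER BY 1"
    )
]

print(deduped)
print(months)
```

In BigQuery itself the same queries would use EXTRACT(MONTH FROM order_date) rather than strftime.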
Q2. Which operator is used in Cloud Composer to move data from GCS to BigQuery?
The operator used in Composer to load data from GCS into BigQuery is GCSToBigQueryOperator.
GCSToBigQueryOperator is part of the Google provider package for Apache Airflow, which is the underlying technology of Composer.
It loads files from a Google Cloud Storage (GCS) bucket into a BigQuery table.
You specify the source with the bucket and source_objects parameters and the destination with destination_project_dataset_table.
Q3. What is the SQL query to calculate the average sales over a period of 7 days?
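Use AVG() over the sales rows that fall within the 7-day period.
Use the AVG() function to calculate the average sales.
Filter the data to the 7-day date range with a WHERE clause.
If there are several sales rows per day, first group by date and SUM() the sales to get daily totals, then average those totals.
For a rolling 7-day average on every date, use a window function instead, such as AVG(daily_sales) OVER (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), where daily_sales and sale_date are placeholder column names.
Join other tables if necessary to get the sales data.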
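A minimal sketch of the filter-then-average approach, run against an in-memory SQLite database so it can be tried without a warehouse. The sales table, its sale_date and amount columns, and the sample data are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")

# Hypothetical data: one sale per day for ten days, amounts 10, 20, ..., 100.
rows = [(f"2024-01-{day:02d}", float(day * 10)) for day in range(1, 11)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Sum sales per day inside the 7-day window, then average the daily totals.
query = """
SELECT AVG(daily_total)
FROM (
    SELECT sale_date, SUM(amount) AS daily_total
    FROM sales
    WHERE sale_date BETWEEN ? AND ?
    GROUP BY sale_date
)
"""
avg_7_day = conn.execute(query, ("2024-01-01", "2024-01-07")).fetchone()[0]
print(avg_7_day)
```

Averaging daily totals rather than raw rows keeps a day with many small sales from being weighted more heavily than a day with one large sale.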
Q4. Write code for this: input = [1,2,3,4], output = [1,4,9,16]
Square each element of the input array.
Iterate through the input array and square each element.
Store the squared values in a new array to get the desired output.
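One way to write this in Python is a list comprehension. The question does not name a language, so Python and the function name square_all are assumptions.

```python
def square_all(values):
    """Return a new list containing the square of each element."""
    return [v * v for v in values]


result = square_all([1, 2, 3, 4])
print(result)  # [1, 4, 9, 16]
```

The input list is left unmodified because the comprehension builds a new list.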
Q5. Explain the architecture of BigQuery and query optimization techniques in BigQuery.
BigQuery separates storage from compute and distributes query execution for efficient processing.
BigQuery stores data in the columnar Capacitor format on Google's Colossus distributed file system.
Queries are executed by the Dremel engine, which distributes work across many workers for parallel processing.
Query optimization techniques include partitioning tables, clustering tables, and relying on cached query results.
Partitioned tables let a query skip the partitions that its filter excludes, which reduces the data scanned.
Clustering tables based on certain columns can improve que...
Q6. Transformations in PySpark: rank and dense rank
rank() and dense_rank() are PySpark window functions that assign a rank to each row based on the ordering of a column.
rank() gives tied rows the same rank and leaves gaps after ties, for example 1, 2, 2, 4.
dense_rank() also gives tied rows the same rank but leaves no gaps, for example 1, 2, 2, 3.
Both are applied with .over() on a window specification, such as Window.orderBy('score'), which defines the ordering and any partitioning.
Example: df.select('name', 'score', rank().over(Window.orderBy('score')).a...
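The difference between the two ranking rules is easiest to see on tied values. The sketch below is plain Python that mimics the semantics of rank and dense rank over an ascending ordering. It is not the PySpark API, and the function name rank_and_dense_rank is hypothetical.

```python
def rank_and_dense_rank(scores):
    """Return (score, rank, dense_rank) tuples for scores in ascending order."""
    ordered = sorted(scores)
    results = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real score
    for position, score in enumerate(ordered, start=1):
        if score != previous:
            rank = position  # jumps to the row position, leaving gaps after ties
            dense += 1       # increases by exactly one per distinct value
            previous = score
        results.append((score, rank, dense))
    return results


ranked = rank_and_dense_rank([50, 70, 70, 90])
print(ranked)  # [(50, 1, 1), (70, 2, 2), (70, 2, 2), (90, 4, 3)]
```

The two rows with score 70 share rank 2 under both rules. The next row gets rank 4 under rank but 3 under dense rank.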
Q7. Difference between Bigtable and BigQuery
Bigtable is a NoSQL database for low-latency, high-throughput reads and writes, while BigQuery is a fully managed data warehouse for running analytical SQL queries.
Bigtable is a wide-column store in which rows are looked up by row key, whereas BigQuery scans columnar data to answer SQL queries.
Bigtable is used for storing large amounts of semi-structured data, while BigQuery is used for analyzing structured data using SQL queries.
Bigtable is suitable for real-time data processing and hig...
Q8. RDD vs DataFrame vs Dataset in PySpark
An RDD (Resilient Distributed Dataset) is the basic abstraction in Spark, representing a distributed collection of objects.
A DataFrame is a distributed collection of data organized into named columns, similar to a table in a relational database.
A Dataset is a distributed collection of typed objects that offers compile-time type safety; the Dataset API is available in Scala and Java but not in PySpark, where you work with DataFrames instead.
DataFrames and Datasets are built on top of RDDs, providing a more structured...
Q9. What is the cloud? What is PySpark?
The cloud is a network of remote servers hosted on the internet to store, manage, and process data.
Cloud computing allows users to access data and applications from any device with an internet connection.
It provides scalability, flexibility, and cost-effectiveness for businesses.
Examples of cloud platforms include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.
PySpark is the Python API for Apache Spark, used to write distributed data processing jobs in Python.
Q10. Explain the Databricks architecture.
Databricks is a cloud-based big data processing platform that combines Apache Spark and Delta Lake.
It uses Apache Spark for processing big data in a distributed environment.
It also incorporates Delta Lake for reliable data lakes and data warehousing.
Databricks provides a collaborative workspace for data engineers, data scientists, and analysts.
It offers automated cluster management and scaling for efficient data processing.
Databricks ...
Q11. Dataflow vs Dataproc
Dataflow is a fully managed stream and batch processing service, while Dataproc is a managed Apache Spark and Hadoop service.
Dataflow is a serverless service that runs Apache Beam pipelines and scales automatically, while Dataproc runs Spark and Hadoop on clusters that you typically provision and size yourself.
Dataflow is designed for both batch and stream processing, allowing you to process data in real time, while Dataproc is most often used for batch jobs, although it can also run Spark streaming workloads.
Da...
Q12. How does BigQuery work?
BigQuery is a fully managed, serverless data warehouse by Google Cloud for analyzing large datasets using SQL queries.
Because it is fully managed and serverless, users do not have to worry about infrastructure management.
BigQuery can handle petabytes of data and supports near real-time analytics through streaming ingestion.
It supports standard SQL queries and integrates with other Goo...
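Q13. Difference between DELETE and TRUNCATE
DELETE removes the rows that match an optional WHERE clause, while TRUNCATE removes all rows from a table.
DELETE is a DML command, while TRUNCATE is a DDL command.
DELETE can be rolled back inside a transaction; whether TRUNCATE can be rolled back depends on the database (for example, Oracle and MySQL commit it implicitly, while PostgreSQL and SQL Server allow rolling it back within a transaction).
DELETE fires row-level delete triggers, while TRUNCATE does not fire DELETE triggers.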