Deloitte
Magazine3 Technologies Interview Questions and Answers
Q1. What is difference between Cloud Run and Cloud Functions
Cloud Run is a fully managed compute platform that automatically scales your stateless containers, while Cloud Functions is a serverless execution environment for building and connecting cloud services.
Cloud Run is designed for running containerized applications, while Cloud Functions is designed for running single-purpose functions.
Cloud Run allows you to run any stateless container, while Cloud Functions only supports specific programming languages like Node.js, Python, and...
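To make the container-versus-function distinction concrete, here is a minimal sketch in Python using only the standard library. The first part is the kind of HTTP service one might package into a container for Cloud Run, which tells the container which port to listen on through the PORT environment variable. The second part is a single-purpose handler in the style of an HTTP-triggered function. The names HelloHandler, make_server, and hello_http are hypothetical, and the request object is assumed to expose an args mapping the way Flask requests do.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Container-style service: the application owns the whole HTTP server."""

    def do_GET(self):
        body = b"Hello from a container"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=None):
    # Cloud Run passes the port to listen on via the PORT environment variable.
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), HelloHandler)


def hello_http(request):
    """Function-style handler: the platform owns the server and calls this."""
    # Assumption: request.args behaves like a dict of query parameters.
    name = request.args.get("name", "world")
    return f"Hello, {name}"
```

As a container entry point, the service would be started with make_server().serve_forever(). The function-style handler has no server code at all, which is the practical difference the answer above describes.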
Q2. What is a data warehouse?
A Data Warehouse is a centralized repository that stores integrated data from multiple sources for analysis and reporting.
Data Warehouses are designed for query and analysis rather than transaction processing.
They often contain historical data and are used for decision-making purposes.
Data Warehouses typically use a dimensional model with facts and dimensions.
Examples of Data Warehouse tools include Amazon Redshift, Snowflake, and Google BigQuery.
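The dimensional model mentioned above can be sketched as a small star schema. The example below uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse. The table and column names (fact_sales, dim_product, dim_date) are hypothetical and the data is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds measures plus keys into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")

# A typical analytical query: join facts to dimensions, then aggregate.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(rows)
```

The fact table stays narrow and grows quickly, while the dimension tables stay small and carry the attributes that reports group and filter by.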
Q3. What is the underlying architecture of BigQuery?
BigQuery is a fully-managed, serverless data warehouse that uses a distributed architecture to process and analyze large datasets.
Both storage and query execution are spread across many machines, so queries run in parallel and capacity scales with data volume.
It separates storage and compute, enabling users to scale each independently based on their needs.
BigQuery leverages Google's infrastructure to automatically handle tasks like replication, sharding, and load balancing.
It uses a columnar...
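The answer above is cut off at "columnar", so the following is only a toy illustration of the general idea behind column-oriented storage, not a description of BigQuery's internal format. All names and values are made up.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"user": "a", "country": "IN", "amount": 10},
    {"user": "b", "country": "US", "amount": 25},
    {"user": "c", "country": "IN", "amount": 5},
]

# Column-oriented layout: the values of each column are stored together.
columns = {
    "user": [r["user"] for r in rows],
    "country": [r["country"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# An aggregate over one column only has to read that column,
# instead of touching every field of every record.
total_from_rows = sum(r["amount"] for r in rows)
total_from_columns = sum(columns["amount"])
```

Both totals are the same. The difference is how much data has to be read to compute them, which matters when a table has many columns and a query touches only a few.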
Q4. Tell me about Azure Databricks
Azure Databricks is a unified analytics platform that provides a collaborative environment for big data and machine learning.
Azure Databricks is built on Apache Spark and provides a collaborative workspace for data engineers, data scientists, and machine learning engineers.
It offers integrated notebooks for interactive data exploration and visualization.
Azure Databricks allows for seamless integration with other Azure services like Azure Data Lake Storage, Azure SQL Data Wareho...
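Q5. Nth highest salary and word count in PySpark
To find the nth highest salary in PySpark, rank the salaries with a window function and filter on the desired rank.
Use a window ordered by salary descending to assign a rank to each row. row_number gives every row a unique position, so with tied salaries it returns the nth row rather than the nth distinct salary; dense_rank gives ties the same rank.
Filter the result to keep the rows with the desired rank
Example (with F as pyspark.sql.functions and Window from pyspark.sql): df.withColumn('rank', F.dense_rank().over(Window.orderBy(F.col('salary').desc()))).filter(F.col('rank') == n).select('salary').distinct()
For word count, split each line into words, explode the result to one row per word, then group by word and count.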
Q6. What are nested queries in BigQuery?
Nested queries in BigQuery allow for querying data from within another query, enabling complex data analysis.
Nested queries are queries that are embedded within another query
They can be used to perform subqueries to filter, aggregate, or manipulate data
Nested queries can be used in SELECT, FROM, WHERE, and HAVING clauses
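The four placements listed above can be sketched with ordinary SQL. The example below runs against Python's built-in sqlite3 module as a stand-in, since these simple subquery shapes are standard SQL. In BigQuery the table would be referenced by its dataset-qualified name. The employees table and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('a', 'eng', 300), ('b', 'eng', 200), ('c', 'ops', 100), ('d', 'ops', 120);
""")

# Subquery in WHERE: employees paid above the overall average.
above_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Subquery in FROM: aggregate first, then filter the aggregated rows.
big_depts = conn.execute("""
    SELECT dept, total
    FROM (SELECT dept, SUM(salary) AS total FROM employees GROUP BY dept) AS t
    WHERE total > 250
""").fetchall()

# Subquery in SELECT: attach a scalar computed over the whole table.
with_max = conn.execute("""
    SELECT name, salary, (SELECT MAX(salary) FROM employees) AS max_salary
    FROM employees
    ORDER BY name
""").fetchall()

# Subquery in HAVING: keep groups whose average beats the overall average.
strong_depts = conn.execute("""
    SELECT dept FROM employees
    GROUP BY dept
    HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
""").fetchall()
```

Each inner query is evaluated to produce a value or a table that the outer query then uses, which is what lets one statement express a multi-step analysis.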
Q7. What are Spark optimization techniques?
Spark optimization techniques improve performance and efficiency of Spark jobs.
Partitioning data correctly to avoid data shuffling
Caching intermediate results to avoid recomputation
Using appropriate data formats like Parquet for efficient storage and retrieval
Tuning memory settings for optimal performance
Avoiding unnecessary data transformations