IBM
I applied via LinkedIn and was interviewed before Nov 2021. There were 3 interview rounds.
I applied via Naukri.com and was interviewed before Nov 2021. There were 2 interview rounds.
Google Cloud BigQuery is a fully-managed, serverless data warehouse that uses a distributed architecture for processing and analyzing large datasets.
BigQuery stores data in Colossus, Google's distributed file system, using a columnar storage format called Capacitor.
It uses a distributed query engine called Dremel for executing SQL-like queries on large datasets.
BigQuery separates storage and compute, allowing users to scale compute resources ind...
List and tuple are both used to store collections of data, but they have some differences.
Lists are mutable while tuples are immutable
Lists use square brackets [] while tuples use parentheses ()
By convention, lists hold collections of homogeneous items while tuples group heterogeneous fields, although Python enforces neither
Lists have more built-in methods than tuples
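The mutability difference above can be shown in a few lines. This is a minimal sketch; the sample values are made up for illustration.

```python
# Lists are mutable: elements can be changed or appended in place.
scores = [70, 85, 90]
scores[0] = 75
scores.append(100)

# Tuples are immutable: item assignment raises TypeError.
record = ("alice", 30, "Pune")  # hypothetical sample values
try:
    record[0] = "bob"
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# A tuple of hashable items is itself hashable, so it can be a dict key.
location_names = {(12.97, 77.59): "Bengaluru"}

print(scores, record, tuple_was_modified)
```

A list could not be used as the dictionary key in the last statement, because lists are not hashable.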
Use SQL to find keys present in table A but not in table B (old copy of A).
Use a LEFT JOIN to combine tables A and B based on the key column
Filter the results where the key column in table B is NULL
This will give you the keys present in table A but not in table B
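The LEFT JOIN approach above can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for the real database; the table names table_a and table_b and the key column id are assumptions made for illustration.

```python
import sqlite3


def keys_only_in_a(conn):
    """Return keys present in table_a but missing from table_b."""
    query = """
        SELECT a.id
        FROM table_a AS a
        LEFT JOIN table_b AS b ON a.id = b.id
        WHERE b.id IS NULL      -- no matching row in B
        ORDER BY a.id
    """
    return [row[0] for row in conn.execute(query)]


# Hypothetical sample data: B is an older copy of A without keys 3 and 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table_a VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO table_b VALUES (?)", [(1,), (2,)])

new_keys = keys_only_in_a(conn)
print(new_keys)
```

The same result can also be written with NOT EXISTS or EXCEPT; which performs best depends on the engine.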
Use GCP Dataflow to transfer files between GCS buckets
Create a Dataflow pipeline using Apache Beam to read from source bucket and write to destination bucket
Use Beam's built-in GCS file I/O (gs:// paths) to read and write files in the Dataflow pipeline
Set up appropriate permissions for Dataflow service account to access both buckets
Cloud Composer is another orchestration tool in GCP
Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow
It allows you to author, schedule, and monitor workflows that span across GCP services
Cloud Composer provides a rich set of features like DAGs, plugins, and monitoring capabilities
It integrates seamlessly with other GCP services like BigQuery, Dataflow, and Dataproc
select * from table limit 100 usually returns faster
Using 'select * from table' retrieves all rows from the table, which can be slow if the table is large
Using 'select * from table limit 100' caps the number of rows returned, so less data is sent back to the client
In BigQuery, LIMIT does not reduce the bytes scanned or billed on a non-clustered table, so select only the needed columns or filter on partitions to cut cost
SCD stands for Slowly Changing Dimension; MERGE is a SQL statement used to insert, update, or delete rows in a BigQuery table.
SCD is used to track changes to data over time in a data warehouse
Merge in BigQuery is used to perform insert, update, or delete operations in a single statement
Example: MERGE INTO target_table USING source_table ON condition WHEN MATCHED THEN UPDATE SET col1 = value1 WHEN NOT MATCHED THEN INSERT (col1, col2) VALUES (value1, value2)
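The matched / not matched logic of MERGE can be sketched in plain Python. This is only an illustration of the semantics, not BigQuery code; the function name merge_rows and the sample rows are hypothetical. It overwrites matched rows in place, which corresponds to an SCD Type 1 update.

```python
def merge_rows(target, source):
    """Upsert source rows into a copy of target, keyed by id."""
    merged = dict(target)
    updated, inserted = [], []
    for key, row in source.items():
        if key in merged:
            # WHEN MATCHED THEN UPDATE: overwrite the changed columns.
            merged[key] = {**merged[key], **row}
            updated.append(key)
        else:
            # WHEN NOT MATCHED THEN INSERT: add the new row.
            merged[key] = dict(row)
            inserted.append(key)
    return merged, updated, inserted


# Hypothetical dimension rows keyed by customer id.
target = {1: {"name": "Asha", "city": "Pune"}}
source = {1: {"city": "Mumbai"}, 2: {"name": "Ravi", "city": "Delhi"}}

merged, updated, inserted = merge_rows(target, source)
print(merged, updated, inserted)
```

An SCD Type 2 design would instead keep the old row and add a new versioned row, so history is preserved rather than overwritten.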
BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.
BigQuery uses a columnar storage format for efficient querying.
It supports standard SQL for querying data.
BigQuery allows for real-time data streaming for analysis.
It integrates with various data sources like Google Cloud Storage, Google Sheets, etc.
BigQuery provides automatic scaling and high availability.
Dataflow function to split a sentence into words
Use a splitting step (for example a FlatMap, or Beam's Regex.split transform) to split the sentence into words
Apply ParDo function to process each word individually
Use regular expressions to handle punctuation and special characters
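The splitting logic described above can be sketched as a plain Python generator. The regular expression and the lowercasing are assumptions made for illustration. Apache Beam is deliberately not imported here; in a real pipeline a function like this could be passed to beam.FlatMap.

```python
import re

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_PATTERN = re.compile(r"[A-Za-z0-9']+")


def split_sentence(sentence):
    """Yield lowercase words from a sentence, dropping punctuation."""
    for word in WORD_PATTERN.findall(sentence):
        yield word.lower()


words = list(split_sentence("Hello, world! It's a data-pipeline test."))
print(words)
```

Yielding one word at a time matches the FlatMap contract, where each input element may produce zero or more output elements.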
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
Developed a data pipeline to ingest, process, and analyze customer feedback data for a retail company.
Used Google Cloud Platform services like BigQuery, Dataflow, and Pub/Sub for data processing.
Implemented data cleansing and transformation techniques to ensure data quality.
Created visualizations and dashboards using tools like Data Studio for stakeholders to easily interpret the data.
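Cloud Storage (GCS) offers different storage classes for varying access-frequency and cost requirements.
Standard Storage: for frequently accessed data
Nearline Storage: for data accessed about once a month or less (30-day minimum storage duration)
Coldline Storage: for data accessed about once a quarter or less (90-day minimum storage duration)
Archive Storage: for long-term retention of data accessed less than once a year (365-day minimum storage duration)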
SQL optimization techniques focus on improving query performance by reducing execution time and resource usage.
Use indexes to speed up data retrieval
Avoid using SELECT * and instead specify only the columns needed
Optimize joins by using appropriate join types and conditions
Limit the use of subqueries and instead use JOINs where possible
Use EXPLAIN to analyze query execution plans and identify bottlenecks
I applied via Naukri.com and was interviewed in Apr 2022. There were 3 interview rounds.
BigQuery is a cloud-based data warehousing tool used for analyzing large datasets quickly. Pub/Sub is a messaging service, Dataflow is a data processing tool, and Cloud Storage is a scalable object storage service.
BigQuery is used for analyzing large datasets quickly
Pub/Sub is a messaging service used for asynchronous communication between applications
Dataflow is a data processing tool used for batch and stream processin...
Window functions in BigQuery are used to perform calculations across a set of table rows related to the current row.
Unlike GROUP BY aggregates, window functions return a value for every input row instead of collapsing rows
They are used with the OVER clause, which can specify PARTITION BY, ORDER BY, and a window frame
Common window functions include ROW_NUMBER(), RANK(), and NTILE()
They can be used to calculate moving averages, cumulative sums, and more
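ROW_NUMBER() and a cumulative sum can be tried without a BigQuery project. The sketch below runs the query in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later); the sales table and its values are hypothetical, and the OVER syntax used here is the same in BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("a", 3, 5), ("b", 1, 7), ("b", 2, 3)],
)

query = """
    SELECT customer, day, amount,
           -- rank each customer's rows by amount, largest first
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn,
           -- cumulative sum per customer in day order
           SUM(amount) OVER (PARTITION BY customer ORDER BY day) AS running_total
    FROM sales
    ORDER BY customer, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every input row appears in the output with its rank and running total attached, which is the key difference from a GROUP BY aggregate.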
Types of NoSQL databases in GCP include Firestore, Bigtable, and Datastore.
Firestore is a flexible, scalable database for mobile, web, and server development.
Bigtable is a high-performance NoSQL database service for large analytical and operational workloads.
Datastore is a highly scalable NoSQL database for web and mobile applications.
Code to find the most-purchased product for each customer
Iterate through each customer's purchases
Keep track of the count of each product for each customer
Find the product with the maximum count for each customer
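The three steps above can be sketched with the standard library. The input shape (a list of (customer, product) pairs) and the function name are assumptions, since the original question does not specify them.

```python
from collections import Counter, defaultdict


def top_product_per_customer(purchases):
    """Map each customer to (product, count) for their most-bought product."""
    counts = defaultdict(Counter)
    # Count how many times each customer bought each product.
    for customer, product in purchases:
        counts[customer][product] += 1
    # Pick the product with the highest count per customer.
    return {
        customer: counter.most_common(1)[0]
        for customer, counter in counts.items()
    }


# Hypothetical purchase log.
purchases = [("c1", "pen"), ("c1", "pen"), ("c1", "book"), ("c2", "bag")]
result = top_product_per_customer(purchases)
print(result)
```

When two products tie, most_common returns the one seen first; a real solution should state how ties are to be broken.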
Creating a DataFrame in Python
Use the pandas library to create a DataFrame
Provide data in the form of a dictionary or list of lists
Specify column names if needed
I applied via Naukri.com and was interviewed in Jun 2024. There was 1 interview round.
Check if a string is a palindrome or not
Compare the string with its reverse to check for palindrome
Ignore spaces and punctuation marks when comparing
Examples: 'racecar' is a palindrome, 'hello' is not
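A minimal sketch of the reverse-and-compare approach follows. Ignoring letter case is an extra assumption that the answer above does not state.

```python
def is_palindrome(text):
    """Return True if text reads the same both ways, ignoring
    spaces, punctuation, and (by assumption) letter case."""
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


print(is_palindrome("racecar"))
print(is_palindrome("hello"))
print(is_palindrome("A man, a plan, a canal: Panama"))
```

A two-pointer scan from both ends would avoid building the reversed copy, which matters only for very long strings.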
Use Python to create a GCS bucket
Import the Cloud Storage client library (from google.cloud import storage)
Authenticate using service account credentials
Create a storage.Client and call its create_bucket method with the new bucket name
Python code to trigger a Dataflow job from a Cloud Function
Use the googleapiclient library to interact with the Dataflow API
Authenticate using service account credentials
Submit a job to Dataflow using the projects.locations.templates.launch endpoint
I applied via Walk-in and was interviewed in Mar 2022. There was 1 interview round.