Cognizant GCP Data Engineer interview questions
I was interviewed in Jan 2025.
Various data sources such as databases, APIs, files, and streaming services are used for data ingestion and processing.
Databases (e.g. MySQL, PostgreSQL)
APIs (e.g. RESTful APIs)
Files (e.g. CSV, JSON)
Streaming services (e.g. Kafka, Pub/Sub)
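A small sketch of ingesting from two of these source types, a CSV file and a REST API; the file name and URL are placeholders:

    import pandas as pd
    import requests

    # Batch ingestion: read a local CSV file into a dataframe
    df = pd.read_csv("events.csv")

    # API ingestion: fetch JSON records from a RESTful endpoint
    response = requests.get("https://example.com/api/orders")
    records = response.json()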
Partitioning is dividing data into smaller segments for efficient storage and retrieval, while clustering is sorting the data within those partitions based on specific columns.
Partitioning is done at the storage level, splitting a table into segments (for example by date) so queries scan only the relevant partitions.
Clustering physically orders the data within each partition by the clustering columns, improving query performance by letting the engine skip irrelevant blocks.
Example: partitioning by an order date column and clustering by customer ID, as in the sketch below.
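A minimal sketch of how this might look as BigQuery DDL run through the Python client; the dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()  # uses default project credentials

    # Hypothetical orders table: partitioned by day, clustered by customer
    ddl = """
    CREATE TABLE my_dataset.orders (
      order_id STRING,
      customer_id STRING,
      order_date DATE,
      amount NUMERIC
    )
    PARTITION BY order_date
    CLUSTER BY customer_id
    """
    client.query(ddl).result()  # wait for the DDL job to finish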
Using bq commands to create a table and load a CSV file in Google BigQuery
Use 'bq mk' command to create a new table in BigQuery
Use 'bq load' command to load a CSV file into the created table
Specify schema and source format when creating the table
Specify source format and destination table when loading the CSV file
Example: bq mk --table dataset.table_name schema.json
Example: bq load --source_format=CSV dataset.table_name data.csv
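For comparison, a hedged sketch of the same load using the BigQuery Python client instead of the CLI; the bucket, dataset, and table names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the file
    )
    load_job = client.load_table_from_uri(
        "gs://my-bucket/data.csv", "my_dataset.my_table", job_config=job_config
    )
    load_job.result()  # wait for the load job to complete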
Use 'bq show' command to display the schema of a table in BigQuery.
Use 'bq show' command followed by the dataset and table name to display the schema.
The schema includes the column names, data types, and mode (nullable or required).
Example: bq show project_id:dataset.table_name
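The same information is available through the Python client; a small sketch, with a placeholder table reference:

    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table("project_id.dataset.table_name")

    # Each schema field carries a name, a data type, and a mode
    for field in table.schema:
        print(field.name, field.field_type, field.mode)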
Leaf nodes are the bottom nodes in a tree structure, while columnar storage stores data in columns rather than rows.
Leaf nodes are the end nodes in a tree structure, containing actual data or pointers to data.
Columnar storage stores data in columns rather than rows, allowing for faster query performance on specific columns.
Columnar storage is commonly used in data warehouses and analytics databases.
Leaf nodes are important for fast lookups in tree-based indexes, since they hold the actual data or pointers to it.
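A toy illustration of why a columnar layout helps column-oriented queries, using plain Python structures:

    # Row-oriented: each record is stored together
    rows = [
        {"id": 1, "amount": 10.0, "country": "IN"},
        {"id": 2, "amount": 25.5, "country": "US"},
    ]

    # Column-oriented: each column is stored together
    columns = {
        "id": [1, 2],
        "amount": [10.0, 25.5],
        "country": ["IN", "US"],
    }

    # Summing one column touches every record in the row layout...
    total_row = sum(r["amount"] for r in rows)
    # ...but only a single contiguous list in the columnar layout
    total_col = sum(columns["amount"])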
BigQuery does not have fixed slots; it dynamically allocates resources based on the requirements of each query.
Unlike traditional databases, BigQuery does not assign a fixed number of slots to a query.
On-demand workloads draw from a shared pool of slots, while capacity-based pricing additionally lets you reserve slots.
The number of slots available for a query can vary depending on the complexity and size of the query.
BigQuery's serverless architecture allows it to scale automatically to handle queries of varying size and complexity.
The GCP services used in our project include BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
BigQuery for data warehousing and analytics
Dataflow for real-time data processing
Pub/Sub for messaging and event ingestion
Cloud Storage for storing data and files
Cloud Functions are event-driven functions that run in response to cloud events.
Serverless functions that automatically scale based on demand
Can be triggered by events from various cloud services
Supports multiple programming languages like Node.js, Python, etc.
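A minimal sketch of a Python Cloud Function triggered by a Pub/Sub event (first-generation background-function signature; the decoding logic is illustrative):

    import base64

    def process_event(event, context):
        """Background Cloud Function triggered by a Pub/Sub message."""
        # Pub/Sub payloads arrive base64-encoded in the 'data' field
        payload = base64.b64decode(event["data"]).decode("utf-8")
        print(f"Received message: {payload}")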
To schedule a job to trigger every hour in Airflow, use a cron expression as the schedule interval.
Define a DAG (Directed Acyclic Graph) in Airflow
Set the schedule_interval parameter to '0 * * * *' to trigger the job every hour
Example: schedule_interval='0 * * * *'
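A minimal DAG sketch under that schedule; the DAG id, start date, and task are placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="hourly_job",
        start_date=datetime(2024, 1, 1),
        schedule_interval="0 * * * *",  # run at the top of every hour
        catchup=False,
    ) as dag:
        task = BashOperator(task_id="run_job", bash_command="echo running")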
Use Python's slicing feature to display a string in reverse order.
Use string slicing with a step of -1 to reverse the string.
Example: 'hello'[::-1] will output 'olleh'.
Pub/Sub is a messaging service that allows communication between independent applications.
Pub/Sub is used for real-time messaging and event-driven systems.
It is commonly used for data ingestion, streaming analytics, and event-driven architectures.
Examples of Pub/Sub services include Google Cloud Pub/Sub, Apache Kafka, and Amazon SNS/SQS.
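A small sketch of publishing a message with the Google Cloud Pub/Sub Python client; the project and topic names are placeholders:

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "my-topic")

    # Messages are published as bytes; publish() returns a future
    future = publisher.publish(topic_path, b"order created")
    print(future.result())  # blocks until the message ID is returned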
I applied via Naukri.com and was interviewed in Nov 2023. There was 1 interview round.
GCP BigQuery is a serverless, highly scalable, and cost-effective data warehouse for analyzing big data sets.
BigQuery is a fully managed, petabyte-scale data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure.
BigQuery's architecture includes storage, Dremel execution engine, and SQL layer.
Cloud Composer is a managed workflow orchestration service that helps you create, schedule, and monitor workflows.
I applied via Naukri.com and was interviewed in Apr 2022. There were 3 interview rounds.
BigQuery is a cloud-based data warehousing tool used for analyzing large datasets quickly. Pub/Sub is a messaging service, Dataflow is a data processing tool, and Cloud Storage is a scalable object storage service.
BigQuery is used for analyzing large datasets quickly
Pub/Sub is a messaging service used for asynchronous communication between applications
Dataflow is a data processing tool used for batch and stream processing
Cloud Storage is a scalable object storage service for files and other unstructured data
Use SQL to find keys present in table A but not in table B (old copy of A).
Use a LEFT JOIN to combine tables A and B based on the key column
Filter the results where the key column in table B is NULL
This will give you the keys present in table A but not in table B
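One way to express this, sketched here as a SQL string run through the BigQuery Python client; the table and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = """
    SELECT a.id
    FROM dataset.table_a AS a
    LEFT JOIN dataset.table_b AS b
      ON a.id = b.id
    WHERE b.id IS NULL  -- no match in B means the key only exists in A
    """
    for row in client.query(sql).result():
        print(row.id)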
Use GCP Dataflow to transfer files between GCS buckets
Create a Dataflow pipeline using Apache Beam to read from source bucket and write to destination bucket
Use GCS connector to read and write files in Dataflow pipeline
Set up appropriate permissions for Dataflow service account to access both buckets
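A minimal Apache Beam sketch of that copy, assuming text files and placeholder project and bucket names:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-temp-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://source-bucket/*.txt")
            | "Write" >> beam.io.WriteToText("gs://destination-bucket/copied")
        )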
Cloud Composer is another orchestration tool in GCP
Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow
It allows you to author, schedule, and monitor workflows that span across GCP services
Cloud Composer provides a rich set of features like DAGs, plugins, and monitoring capabilities
It integrates seamlessly with other GCP services like BigQuery, Dataflow, and Dataproc
I applied via LinkedIn and was interviewed before Nov 2021. There were 3 interview rounds.
I applied via Naukri.com and was interviewed before Nov 2021. There were 2 interview rounds.
Google Cloud BigQuery is a fully-managed, serverless data warehouse that uses a distributed architecture for processing and analyzing large datasets.
BigQuery stores data in the Capacitor columnar format on Colossus, Google's distributed storage system.
It uses a distributed query engine called Dremel for executing SQL-like queries on large datasets.
BigQuery separates storage and compute, allowing users to scale compute resources independently of storage.
List and tuple are both used to store collections of data, but they have some differences.
Lists are mutable while tuples are immutable
Lists use square brackets [] while tuples use parentheses ()
Lists are typically used for collections of homogeneous data while tuples are used for heterogeneous data
Lists have more built-in methods than tuples
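A quick demonstration of the mutability difference:

    nums_list = [1, 2, 3]
    nums_list.append(4)        # fine: lists are mutable

    nums_tuple = (1, 2, 3)
    try:
        nums_tuple[0] = 99     # tuples are immutable
    except TypeError as e:
        print("tuples cannot be modified:", e)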
Window functions in BigQuery are used to perform calculations across a set of table rows related to the current row.
Window functions allow you to perform calculations on a set of rows related to the current row
They are used with the OVER() clause in SQL queries
Common window functions include ROW_NUMBER(), RANK(), and NTILE()
They can be used to calculate moving averages, cumulative sums, and more
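A sketch of one such query, here a running total of order amounts per customer; the table and column names are placeholders:

    # Running total of order amounts per customer, ordered by date
    sql = """
    SELECT
      customer_id,
      order_date,
      SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
      ) AS running_total
    FROM my_dataset.orders
    """
    # run with: bigquery.Client().query(sql).result()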
Types of NoSQL databases in GCP include Firestore, Bigtable, and Datastore.
Firestore is a flexible, scalable database for mobile, web, and server development.
Bigtable is a high-performance NoSQL database service for large analytical and operational workloads.
Datastore is a highly scalable NoSQL database for web and mobile applications.
Code to find the product each customer bought the most often (see the sketch below)
Iterate through each customer's purchases
Keep track of the count of each product for each customer
Find the product with the maximum count for each customer
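A hedged sketch, assuming purchases arrive as (customer, product) pairs:

    from collections import Counter, defaultdict

    purchases = [
        ("alice", "milk"), ("alice", "milk"), ("alice", "bread"),
        ("bob", "eggs"), ("bob", "eggs"), ("bob", "milk"),
    ]

    # Count how many times each customer bought each product
    counts = defaultdict(Counter)
    for customer, product in purchases:
        counts[customer][product] += 1

    # Pick the most frequent product per customer
    for customer, product_counts in counts.items():
        product, n = product_counts.most_common(1)[0]
        print(customer, product, n)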
Creating a dataframe in Python as a GCP Data Engineer
Use the pandas library to create a dataframe
Provide data in the form of a dictionary or list of lists
Specify column names if needed
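A minimal pandas sketch covering both input forms mentioned above:

    import pandas as pd

    # From a dictionary: keys become column names
    df1 = pd.DataFrame({"name": ["alice", "bob"], "age": [30, 25]})

    # From a list of lists: column names supplied explicitly
    df2 = pd.DataFrame([["alice", 30], ["bob", 25]], columns=["name", "age"])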
I applied via LinkedIn and was interviewed in Oct 2024. There were 2 interview rounds.
I have experience working on projects involving data processing, transformation, and analysis using GCP services like BigQuery, Dataflow, and Dataproc.
Utilized BigQuery for storing and querying large datasets
Implemented data pipelines using Dataflow for real-time data processing
Utilized Dataproc for running Apache Spark and Hadoop clusters for data processing
Worked on data ingestion and transformation using Cloud Storage