Capgemini GCP Data Engineer Interview Questions
I applied via LinkedIn and was interviewed in Oct 2024. There were 2 interview rounds.
I have experience working on projects involving data processing, transformation, and analysis using GCP services like BigQuery, Dataflow, and Dataproc.
Utilized BigQuery for storing and querying large datasets
Implemented data pipelines using Dataflow for real-time data processing
Utilized Dataproc for running Apache Spark and Hadoop clusters for data processing
Worked on data ingestion and transformation using Cloud Storage
I applied via Naukri.com and was interviewed in Jun 2024. There was 1 interview round.
Shuffle partition is a data processing technique used to redistribute data across partitions in distributed computing.
Shuffle partition helps in balancing the load across different nodes in a distributed system.
It is commonly used in frameworks like Apache Spark during operations like groupBy and join.
For example, when joining two large datasets, shuffle partition ensures that related data is processed together.
Improperly configured shuffle partitions can cause data skew and degrade performance.
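A minimal PySpark sketch of tuning the shuffle partition count (the session setup and data here are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('shuffle-demo').getOrCreate()

    # Lower the shuffle partition count (Spark's default is 200) for a small join.
    spark.conf.set('spark.sql.shuffle.partitions', '50')

    orders = spark.createDataFrame([(1, 'mouse'), (2, 'keyboard')], ['cust_id', 'product'])
    customers = spark.createDataFrame([(1, 'Alice'), (2, 'Bob')], ['cust_id', 'name'])

    # The join triggers a shuffle; rows with the same cust_id land in the same partition.
    orders.join(customers, 'cust_id').show()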
You can identify file size in Python using the os module or pathlib for efficient file handling.
Use os.path.getsize() to get the size of a file in bytes. Example: os.path.getsize('file.txt')
Use pathlib.Path.stat() to retrieve file size. Example: from pathlib import Path; Path('file.txt').stat().st_size
File size can also be checked via an open file's descriptor using os.fstat(). Example: os.fstat(open('file.txt').fileno()).st_size
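A minimal sketch showing all three approaches, assuming a file named 'file.txt' exists in the working directory:

    import os
    from pathlib import Path

    size_a = os.path.getsize('file.txt')            # size in bytes
    size_b = Path('file.txt').stat().st_size        # same value via pathlib

    with open('file.txt') as f:                     # via an open file descriptor
        size_c = os.fstat(f.fileno()).st_size

    print(size_a, size_b, size_c)  # all three agree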
Google Cloud Dataflow supports Java and Python for building data processing pipelines.
Java: Widely used for building robust data pipelines; example: Apache Beam SDK for Java.
Python: Popular for its simplicity and ease of use; example: Apache Beam SDK for Python.
Both languages allow for the creation of batch and streaming data processing applications.
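A minimal Apache Beam sketch in Python; it runs locally on the default DirectRunner, and on GCP you would instead pass the DataflowRunner and pipeline options:

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | 'Create' >> beam.Create(['alpha', 'beta', 'gamma'])
            | 'Upper' >> beam.Map(str.upper)
            | 'Print' >> beam.Map(print)
        )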
Window functions in BigQuery are used to perform calculations across a set of table rows related to the current row.
Window functions allow you to perform calculations on a set of rows related to the current row
They are used with the OVER() clause in SQL queries
Common window functions include ROW_NUMBER(), RANK(), and NTILE()
They can be used to calculate moving averages, cumulative sums, and more
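A hedged sketch running a window-function query through the google-cloud-bigquery client; the project, dataset, and table names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes credentials and project come from the environment

    # ROW_NUMBER() with OVER() ranks each customer's orders by date.
    query = '''
    SELECT
      customer_id,
      order_date,
      ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
    FROM `my_project.my_dataset.orders`
    '''
    for row in client.query(query).result():
        print(row.customer_id, row.order_rank)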
Types of NoSQL databases in GCP include Firestore, Bigtable, and Datastore.
Firestore is a flexible, scalable database for mobile, web, and server development.
Bigtable is a high-performance NoSQL database service for large analytical and operational workloads.
Datastore is a highly scalable NoSQL database for web and mobile applications.
Code to find the product with the maximum purchase count per customer
Iterate through each customer's purchases
Keep track of the count of each product for each customer
Find the product with the maximum count for each customer
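A short Python sketch, assuming purchases arrive as (customer, product) pairs:

    from collections import Counter, defaultdict

    purchases = [
        ('alice', 'laptop'), ('alice', 'mouse'), ('alice', 'mouse'),
        ('bob', 'keyboard'), ('bob', 'keyboard'), ('bob', 'monitor'),
    ]

    # Count every product per customer.
    counts = defaultdict(Counter)
    for customer, product in purchases:
        counts[customer][product] += 1

    # Pick the highest-count product for each customer.
    for customer, counter in counts.items():
        product, count = counter.most_common(1)[0]
        print(customer, product, count)  # alice mouse 2 / bob keyboard 2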
Creating a dataframe in Python
Use the pandas library to create a dataframe
Provide data in the form of a dictionary or list of lists
Specify column names if needed
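A minimal pandas sketch; the column names and values are illustrative:

    import pandas as pd

    # From a dictionary: keys become column names.
    df_from_dict = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25]})

    # From a list of lists: pass column names explicitly.
    df_from_lists = pd.DataFrame([['Alice', 30], ['Bob', 25]], columns=['name', 'age'])

    print(df_from_dict)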
I applied via Naukri.com and was interviewed before Nov 2021. There were 2 interview rounds.
Google Cloud BigQuery is a fully-managed, serverless data warehouse that uses a distributed architecture for processing and analyzing large datasets.
BigQuery uses a distributed storage system called Capacitor for storing and managing data.
It uses a distributed query engine called Dremel for executing SQL-like queries on large datasets.
BigQuery separates storage and compute, allowing users to scale compute resources independently of storage.
List and tuple are both used to store collections of data, but they have some differences.
Lists are mutable while tuples are immutable
Lists use square brackets [] while tuples use parentheses ()
Lists are typically used for collections of homogeneous data while tuples are used for heterogeneous data
Lists have more built-in methods than tuples
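A quick illustration of the mutability difference:

    nums_list = [1, 2, 3]
    nums_tuple = (1, 2, 3)

    nums_list[0] = 99        # fine: lists are mutable
    try:
        nums_tuple[0] = 99   # tuples are immutable
    except TypeError as e:
        print(e)             # 'tuple' object does not support item assignment

    person = ('Alice', 30)   # tuples often hold heterogeneous records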
I applied via LinkedIn and was interviewed before Nov 2021. There were 3 interview rounds.
I applied via Naukri.com and was interviewed in Jun 2024. There was 1 interview round.
Check if a string is a palindrome or not
Compare the string with its reverse to check for palindrome
Ignore spaces and punctuation marks when comparing
Examples: 'racecar' is a palindrome, 'hello' is not
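A short sketch that ignores spaces and punctuation, as the answer suggests:

    def is_palindrome(text):
        # Keep only alphanumeric characters, lowercased.
        cleaned = ''.join(ch.lower() for ch in text if ch.isalnum())
        return cleaned == cleaned[::-1]  # a palindrome equals its own reverse

    print(is_palindrome('racecar'))                         # True
    print(is_palindrome('hello'))                           # False
    print(is_palindrome('A man, a plan, a canal: Panama'))  # True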
Use Python to create a GCS bucket
Import the necessary libraries like google.cloud.storage
Authenticate using service account credentials
Use the library functions to create a new bucket
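A minimal sketch with the google-cloud-storage client; the bucket name is a placeholder (bucket names are globally unique) and credentials are assumed to come from the environment, e.g. GOOGLE_APPLICATION_CREDENTIALS:

    from google.cloud import storage

    client = storage.Client()  # picks up service account credentials from the environment

    bucket = client.create_bucket('my-example-bucket-12345', location='US')
    print(f'Created bucket {bucket.name}')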
Python code to trigger a Dataflow job from a Cloud Function
Use the googleapiclient library to interact with the Dataflow API
Authenticate using service account credentials
Submit a job to Dataflow using the projects.locations.templates.launch endpoint
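A hedged sketch of launching a templated Dataflow job from a Cloud Function; the project ID, region, template path, and parameters are placeholders:

    from googleapiclient.discovery import build

    # Inside a Cloud Function, credentials come from the runtime service account.
    dataflow = build('dataflow', 'v1b3')

    request = dataflow.projects().locations().templates().launch(
        projectId='my-project',
        location='us-central1',
        gcsPath='gs://my-bucket/templates/my-template',
        body={
            'jobName': 'triggered-from-cloud-function',
            'parameters': {'inputFile': 'gs://my-bucket/input.csv'},
        },
    )
    print(request.execute())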
I applied via Company Website and was interviewed before Mar 2023. There were 2 interview rounds.
SQL joins are used to combine rows from two or more tables based on a related column between them.
SQL joins are used to retrieve data from multiple tables based on a related column between them
Types of SQL joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN
In BigQuery, joins can be performed using standard SQL syntax
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column
I applied via Naukri.com and was interviewed in Nov 2023. There was 1 interview round.
GCP BigQuery is a serverless, highly scalable, and cost-effective data warehouse for analyzing big data sets.
BigQuery is a fully managed, petabyte-scale data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure.
BigQuery's architecture includes storage, Dremel execution engine, and SQL layer.
Cloud Composer is a managed workflow orchestration service that helps you create, schedule, and monitor workflows.
The GCP services used in our project include BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
BigQuery for data warehousing and analytics
Dataflow for real-time data processing
Pub/Sub for messaging and event ingestion
Cloud Storage for storing data and files
Cloud Functions are event-driven functions that run in response to cloud events.
Serverless functions that automatically scale based on demand
Can be triggered by events from various cloud services
Supports multiple programming languages like Node.js, Python, etc.
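A minimal sketch of a Pub/Sub-triggered Python Cloud Function (1st-gen event signature; the function name is arbitrary):

    import base64

    def handle_pubsub(event, context):
        # Pub/Sub message data arrives base64-encoded.
        payload = base64.b64decode(event['data']).decode('utf-8')
        print(f'Received message: {payload}')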
To schedule a job to trigger every hour in Airflow, you can use the Cron schedule interval
Define a DAG (Directed Acyclic Graph) in Airflow
Set the schedule_interval parameter to '0 * * * *' to trigger the job every hour
Example: schedule_interval='0 * * * *'
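A minimal Airflow 2.x-style DAG sketch; the DAG ID, task, and start date are illustrative:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def hourly_task():
        print('Running hourly job')

    with DAG(
        dag_id='hourly_example',
        start_date=datetime(2024, 1, 1),
        schedule_interval='0 * * * *',  # cron: minute 0 of every hour
        catchup=False,
    ) as dag:
        PythonOperator(task_id='run_hourly_task', python_callable=hourly_task)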
Use Python's slicing feature to display a string in reverse order.
Use string slicing with a step of -1 to reverse the string.
Example: 'hello'[::-1] will output 'olleh'.
Pub/Sub is a messaging service that allows communication between independent applications.
Pub/Sub is used for real-time messaging and event-driven systems.
It is commonly used for data ingestion, streaming analytics, and event-driven architectures.
Examples of Pub/Sub services include Google Cloud Pub/Sub, Apache Kafka, and Amazon SNS/SQS.
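A hedged publishing sketch with the google-cloud-pubsub client; the project and topic names are placeholders:

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path('my-project', 'my-topic')

    # publish() returns a future; result() blocks until the server acknowledges.
    future = publisher.publish(topic_path, b'hello from pub/sub')
    print(f'Published message ID: {future.result()}')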
Capgemini salaries by role (as reported):
Consultant: 58.6k salaries, ₹5.3 L/yr - ₹19 L/yr
Associate Consultant: 51.2k salaries, ₹4.5 L/yr - ₹10 L/yr
Senior Consultant: 50k salaries, ₹7.8 L/yr - ₹26 L/yr
Senior Analyst: 22.1k salaries, ₹1.6 L/yr - ₹9.1 L/yr
Senior Software Engineer: 21.5k salaries, ₹3.5 L/yr - ₹13.5 L/yr