10+ Cyient Interview Questions and Answers
Q1. How can we load multiple (50) tables at a time using ADF?
You can load multiple tables at a time using Azure Data Factory with a single pipeline that performs one copy per table.
Create a pipeline in Azure Data Factory
Either add one Copy activity per table, or, for 50 tables, use a Lookup activity to fetch the table list and a ForEach activity that runs a single parameterized Copy activity per entry (see the sketch after this list)
Configure the source and sink so each iteration loads a different table
Run the pipeline; ForEach can run its iterations in parallel so all tables load simultaneously
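As a rough illustration of what the Lookup + ForEach + parameterized Copy pattern does, here is a PySpark sketch that pulls each table over JDBC and lands it in the lake; the server, database, credentials, and table names are hypothetical placeholders:

```python
# Illustrative sketch only: mimics ADF's Lookup + ForEach + parameterized Copy
# pattern as a PySpark loop. Connection details below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-table-load").getOrCreate()

# In ADF this list would come from a Lookup activity (e.g., a control table).
tables = ["dbo.customers", "dbo.orders", "dbo.products"]  # ... up to 50 tables

jdbc_url = "jdbc:sqlserver://myserver.database.windows.net;database=mydb"

for table in tables:
    # Each iteration mirrors one run of the parameterized Copy activity.
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", "<user>")
        .option("password", "<password>")
        .load()
    )
    df.write.mode("overwrite").parquet(f"/mnt/raw/{table.replace('.', '_')}")
```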
Q2. What is the Get Metadata activity and what parameters do we have to pass?
The Get Metadata activity is used in Azure Data Factory to retrieve metadata about a specified data store or dataset.
Parameters to pass include the dataset to inspect and a field list naming the metadata to return (e.g., childItems, lastModified, size); the connection comes from the dataset's linked service.
The output of the activity includes information like schema, size, last modified timestamp, etc.
Example: Get metadata of a SQL Server table using a linked service to the database.
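For reference, the activity returns a JSON object whose fields match the requested field list; the sketch below shows a hypothetical output for a folder dataset (the concrete values are invented):

```python
# Hypothetical shape of a Get Metadata activity's output for a folder dataset.
# Field names (itemName, childItems, lastModified, ...) follow the documented
# field list; the values shown here are made up for illustration.
get_metadata_output = {
    "itemName": "input-folder",
    "itemType": "Folder",
    "lastModified": "2024-01-15T08:30:00Z",
    "childItems": [
        {"name": "orders_2024.csv", "type": "File"},
        {"name": "customers.csv", "type": "File"},
    ],
}

# Downstream activities reference these fields with expressions such as:
# @activity('Get Metadata1').output.childItems
```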
Q3. How can we monitor the child pipeline in the master pipeline?
You can monitor the child pipeline from the master pipeline using the Execute Pipeline activity's output together with Azure Monitor or the Data Factory monitoring tools.
The Execute Pipeline activity that invokes the child returns the child's run ID (pipelineRunId), which identifies the run to monitor
Use Azure Monitor to track the performance and health of the child pipeline within the master pipeline
Use the Data Factory monitoring view to inspect detailed logs and metrics for the child pipeline execution
Set up alerts and notifications to be informed of any issues or failures in the child pipeline (a polling sketch follows below)
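As a minimal sketch of programmatic monitoring, the following polls a child run's status via the azure-mgmt-datafactory SDK, assuming the child's run ID was captured from the Execute Pipeline activity; the subscription, resource group, and factory names are placeholders:

```python
# Minimal sketch: poll a child pipeline run's status by run ID using the
# azure-mgmt-datafactory SDK. All resource names are hypothetical placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

def wait_for_child_run(run_id: str) -> str:
    """Poll until the child pipeline run reaches a terminal state."""
    while True:
        run = client.pipeline_runs.get("my-rg", "my-adf", run_id)  # hypothetical names
        if run.status in ("Succeeded", "Failed", "Cancelled"):
            return run.status
        time.sleep(30)  # poll every 30 seconds
```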
Q4. What is the difference between Blob and ADLS?
Blob is a storage service for unstructured data, while ADLS is a distributed file system for big data analytics.
Blob is a general-purpose object storage service for unstructured data, while ADLS is optimized for big data analytics workloads.
Blob storage is suitable for storing large amounts of data, such as images, videos, and logs, while ADLS is designed for processing large datasets in parallel.
ADLS offers features like hierarchical namespace, POSIX-compliant file system semantics, and fine-grained access control via ACLs
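One concrete difference shows up in the Spark URI scheme used to reach each service; a minimal sketch (the storage account, container, and file names are hypothetical, and authentication configuration is omitted):

```python
# Reading the same file from Blob storage vs. ADLS Gen2 in PySpark.
# Account, container, and file names are hypothetical; storage credentials
# must be configured separately before these reads will succeed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blob-vs-adls").getOrCreate()

# Blob storage uses the wasbs:// scheme against the blob endpoint.
blob_df = spark.read.csv(
    "wasbs://raw@myaccount.blob.core.windows.net/sales/orders.csv", header=True
)

# ADLS Gen2 uses the abfss:// scheme against the dfs endpoint,
# which exposes the hierarchical namespace.
adls_df = spark.read.csv(
    "abfss://raw@myaccount.dfs.core.windows.net/sales/orders.csv", header=True
)
```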
Q5. Do you know Databricks? And how long have you been working on it?
Yes, I am familiar with Databricks and have been working on it for the past 2 years.
I have been using Databricks for data engineering tasks such as data processing, data transformation, and data visualization.
I have experience in building and optimizing data pipelines using Databricks.
I have worked on collaborative projects with team members using Databricks notebooks.
I have utilized Databricks for big data processing and analysis, leveraging its scalability and performance capabilities
Q6. How would you convince a client to migrate to the cloud?
Migrating to the cloud offers numerous benefits such as cost savings, scalability, and improved security.
Highlight the cost savings that can be achieved by migrating to the cloud, as clients can avoid upfront infrastructure costs and pay only for the resources they use.
Emphasize the scalability of cloud services, allowing clients to easily scale up or down based on their needs without the need for additional hardware investments.
Discuss the improved security measures provided by cloud platforms, such as encryption, identity management, and compliance certifications
Q7. SQL inner and left join with tables having duplicate values
SQL INNER and LEFT joins on tables with duplicate join-key values multiply the matching rows, so the result can contain more rows than either input.
Use INNER JOIN to return rows from both tables that have matching values
Use LEFT JOIN to return all rows from the left table and the matched rows from the right table; unmatched left rows get NULLs
If a key value appears m times on the left and n times on the right, the join emits m × n rows for that key
Handle duplicate values by using DISTINCT or GROUP BY clauses (a worked sketch follows below)
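A small PySpark sketch of the duplicate-multiplication behavior, using made-up tables with a repeated join key:

```python
# Minimal sketch of INNER vs LEFT join behavior when the join key is duplicated.
# Table contents are invented for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-duplicates").getOrCreate()

left = spark.createDataFrame([(1, "a"), (1, "b"), (2, "c")], ["id", "l_val"])
right = spark.createDataFrame([(1, "x"), (1, "y"), (3, "z")], ["id", "r_val"])
left.createOrReplaceTempView("l")
right.createOrReplaceTempView("r")

# id=1 appears twice on each side, so the inner join emits 2 x 2 = 4 rows for it.
spark.sql("SELECT l.id, l_val, r_val FROM l INNER JOIN r ON l.id = r.id").show()

# The left join also keeps id=2 from the left table, with a NULL r_val.
spark.sql("SELECT l.id, l_val, r_val FROM l LEFT JOIN r ON l.id = r.id").show()
```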
Q8. How do you read files in a notebook? What configuration is needed to read data? Why have you not used only an ADF trigger? What is Parquet format? Window functions vs GROUP BY? How do you read a CSV file and store it in Parquet? Dataset vs DataFrame? Transformations? Delta Lake?
Reading files in a notebook, configuration needed to read data, ADF triggers, Parquet format, window functions vs GROUP BY, reading a CSV file and storing it in Parquet, Dataset vs DataFrame, transformations, Delta Lake
To read files in a notebook, use libraries like pandas or PySpark
Configuration needed includes specifying file path, format, and any additional options
ADF trigger can be used for automated data processing, but may not be necessary for all scenarios
Parquet format is a columnar storage format that compresses well and lets queries read only the columns they need (see the sketch below)
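A minimal PySpark sketch of the CSV-to-Parquet part of the question; the storage paths are hypothetical:

```python
# Minimal sketch: read a CSV file in a notebook and store it as Parquet.
# The paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Configuration for the read: path, format, and options such as header/schema inference.
df = (
    spark.read.option("header", True)
    .option("inferSchema", True)
    .csv("abfss://raw@myaccount.dfs.core.windows.net/input/orders.csv")
)

# Writing as Parquet stores the data column-by-column with compression,
# which speeds up analytical queries that touch only a few columns.
df.write.mode("overwrite").parquet(
    "abfss://curated@myaccount.dfs.core.windows.net/output/orders"
)
```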
Q9. What is ADLS and the difference between ADLS Gen1 and Gen2?
ADLS is Azure Data Lake Storage, a scalable and secure data lake solution. ADLS Gen2 is an improved version of Gen1.
ADLS is a cloud-based storage solution for big data analytics workloads
ADLS Gen1 is based on the Hadoop Distributed File System (HDFS) and has limitations in terms of scalability and performance
ADLS Gen2 is built on Azure Blob Storage and offers improved performance, scalability, and security features
ADLS Gen2 supports a hierarchical namespace, which enables efficient directory- and file-level operations such as atomic renames
Q10. Tell me about a difficult problem you came across and how you resolved it
Encountered a data corruption issue in Azure Data Lake Storage and resolved it by restoring from a backup.
Identified the corrupted files by analyzing error logs and data inconsistencies
Restored the affected data from the latest backup available
Implemented preventive measures such as regular data integrity checks and backups
Collaborated with the Azure support team to investigate the root cause
Q11. Difference between Delta and Parquet?
Delta is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads, while Parquet is a columnar storage format optimized for reading and writing data in large volumes.
Delta builds on Parquet files and adds a transaction log, which is what provides the ACID guarantees
Delta allows updates and deletes of data, while Parquet files are immutable and must be rewritten to change
Delta supports schema evolution and enforcement, and time travel to query earlier table versions
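A minimal sketch of the update/delete difference, assuming the delta-spark package is available on the cluster; the paths are illustrative:

```python
# Minimal sketch contrasting Delta and Parquet: Delta supports in-place
# UPDATE/DELETE via its transaction log, while Parquet files are immutable.
# Paths are hypothetical; requires delta-spark configured on the cluster.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delta-vs-parquet").getOrCreate()

df = spark.range(5).withColumn("status", F.lit("new"))

# Parquet: to change a row you must rewrite the files yourself.
df.write.mode("overwrite").parquet("/tmp/demo_parquet")

# Delta: ACID update in place, recorded in the _delta_log transaction log.
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")
delta_table = DeltaTable.forPath(spark, "/tmp/demo_delta")
delta_table.update(condition="id = 3", set={"status": "'processed'"})
```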
Q12. Do you know PySpark?
Yes, pyspark is a Python API for Apache Spark, used for big data processing and analytics.
pyspark is a Python API for Apache Spark, allowing users to write Spark applications using Python.
It provides high-level APIs in Python for Spark's functionality, making it easier to work with big data.
pyspark is commonly used for data processing, machine learning, and analytics tasks.
Example: Using PySpark to read data from a CSV file, perform transformations, and store the results in a Parquet file (see the sketch below)
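A short sketch of the kind of PySpark transformation work described above, with an inline DataFrame standing in for real input data:

```python
# Minimal sketch of typical PySpark transformations: filter rows and aggregate.
# The input DataFrame is built inline for brevity.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

orders = spark.createDataFrame(
    [("2024-01-01", "books", 20.0), ("2024-01-01", "games", 55.0),
     ("2024-01-02", "books", 35.0)],
    ["order_date", "category", "amount"],
)

# Transformations are lazy; nothing executes until an action like show().
summary = (
    orders.filter(F.col("amount") > 25)
    .groupBy("category")
    .agg(F.sum("amount").alias("total_amount"))
)
summary.show()
```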
Q13. Difference between Azure IaaS and PaaS
IaaS provides virtualized infrastructure resources, while PaaS offers a platform for developing, testing, and managing applications.
IaaS allows users to rent virtualized hardware resources like virtual machines, storage, and networking, while PaaS provides a platform for developers to build, deploy, and manage applications without worrying about the underlying infrastructure.
In IaaS, users have more control over the operating system, applications, and data, while in PaaS, the provider manages the operating system and runtime so users can focus on their applications and data
Q14. What is IR in an ADF pipeline?
IR in ADF pipeline stands for Integration Runtime, which is a compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments.
IR in ADF pipeline is responsible for executing activities within the pipeline.
It can be configured to run in different modes such as Azure, Self-hosted, and SSIS.
Integration Runtime allows data movement between on-premises and cloud data stores.
It provides secure connectivity and data encryption when moving data across networks
Q15. RDD vs DataFrame
RDD is a basic abstraction in Spark representing data as a distributed collection of objects, while DataFrame is a distributed collection of data organized into named columns.
RDDs are lower-level and do not go through the Catalyst optimizer, so they are generally less optimized than DataFrames
DataFrames are easier to use for data manipulation and analysis
DataFrames provide a more structured way to work with data compared to RDDs
RDDs are suitable for unstructured data processing, while DataFrames are better for structured data
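A minimal sketch of the same aggregation written against both APIs; only the DataFrame version benefits from the Catalyst optimizer:

```python
# Minimal sketch: one aggregation in the RDD API and in the DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()

pairs = [("books", 20.0), ("games", 55.0), ("books", 35.0)]

# RDD API: raw Python tuples and explicit functions, no query optimization.
rdd_totals = spark.sparkContext.parallelize(pairs).reduceByKey(lambda a, b: a + b)
print(rdd_totals.collect())  # e.g. [('books', 55.0), ('games', 55.0)]

# DataFrame API: named columns and a declarative aggregation that Catalyst
# can optimize before execution.
df = spark.createDataFrame(pairs, ["category", "amount"])
df.groupBy("category").agg(F.sum("amount").alias("total")).show()
```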