
TCS

Rated 3.7, based on 85.5k reviews

20+ iSource Interview Questions and Answers

Updated 11 Dec 2024

Q1. What is the difference between tasks and stages in the Spark UI?

Ans.

Tasks and stages are components of the execution plan in Spark UI.

  • Tasks are the smallest unit of work in Spark, representing a single operation on a partition of data.

  • Stages are groups of tasks that are executed together as part of a larger computation.

  • Tasks within a stage can be executed in parallel, while stages are executed sequentially.

  • Tasks are created based on the transformations and actions in the Spark application.

  • Stages are created based on the dependencies between RDDs; a shuffle dependency starts a new stage.
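The bullets above can be sketched in plain Python (a toy illustration, not real Spark — the partition list and helper names are invented): tasks within a stage run in parallel, one per partition, and the next stage starts only after the previous one finishes.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model: each "task" processes one partition of the data.
partitions = [[1, 2], [3, 4], [5, 6]]

def run_stage(task, parts):
    # All tasks of a stage may run in parallel, one task per partition.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, parts))

# Stage 1: a narrow transformation (map) applied per partition.
mapped = run_stage(lambda p: [x * 10 for x in p], partitions)
# Stage 2 begins only after stage 1 completes (stages are sequential).
summed = run_stage(sum, mapped)
print(summed)  # [30, 70, 110]
```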


Q2. What are optimisation techniques used in the project?

Ans.

Optimisation techniques used in the project include indexing, query optimization, caching, and parallel processing.

  • Indexing: Creating indexes on frequently queried columns to improve search performance.

  • Query optimization: Rewriting queries to make them more efficient and reduce execution time.

  • Caching: Storing frequently accessed data in memory to reduce the need for repeated database queries.

  • Parallel processing: Distributing tasks across multiple processors to speed up data processing.
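The caching bullet can be demonstrated with Python's standard library (a minimal sketch — `expensive_lookup` is an invented stand-in for a slow database query):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def expensive_lookup(key):
    global calls
    calls += 1          # counts how often the "database" is actually hit
    return key.upper()  # stand-in for a slow query result

expensive_lookup("spark")
expensive_lookup("spark")  # second call is served from the cache
print(calls)  # 1
```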


Q3. What is the SQL query to group by employee ID in order to combine the first name and last name with a space?

Ans.

SQL query to group by employee ID and combine first name and last name with a space

  • Use the GROUP BY clause to group by employee ID

  • Use the CONCAT function to combine first name and last name with a space

  • Select employee ID, CONCAT(first_name, ' ', last_name) AS full_name
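Here is a runnable version of the query using Python's built-in sqlite3 (the table and names are illustrative; note SQLite uses `||` for concatenation where MySQL would use `CONCAT`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Asha", "Rao"), (1, "Asha", "Rao"), (2, "Ravi", "Nair")])
# SQLite's || operator concatenates; CONCAT(first_name, ' ', last_name) in MySQL.
rows = conn.execute("""
    SELECT emp_id, first_name || ' ' || last_name AS full_name
    FROM employees
    GROUP BY emp_id
""").fetchall()
print(rows)  # [(1, 'Asha Rao'), (2, 'Ravi Nair')]
```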


Q4. What are the different joins in SQL? Please give an example to elaborate.

Ans.

Different types of joins in SQL include inner join, left join, right join, and full outer join.

  • Inner join: Returns rows when there is a match in both tables.

  • Left join: Returns all rows from the left table and the matched rows from the right table.

  • Right join: Returns all rows from the right table and the matched rows from the left table.

  • Full outer join: Returns all rows from both tables, with NULLs where there is no match in the other table.

  • Example: SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;
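A runnable comparison of the two most common joins, using sqlite3 with invented sample tables (older SQLite versions lack RIGHT and FULL OUTER JOIN, so only INNER and LEFT are shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER, name TEXT);
    CREATE TABLE emp  (id INTEGER, dept_id INTEGER, name TEXT);
    INSERT INTO dept VALUES (1, 'IT'), (2, 'HR');
    INSERT INTO emp  VALUES (10, 1, 'Asha'), (11, 3, 'Ravi');
""")
# INNER JOIN: only rows with a match in both tables.
inner = conn.execute(
    "SELECT emp.name, dept.name FROM emp JOIN dept ON emp.dept_id = dept.id").fetchall()
print(inner)  # [('Asha', 'IT')]
# LEFT JOIN: every row from emp, with NULL where dept has no match.
left = conn.execute(
    "SELECT emp.name, dept.name FROM emp LEFT JOIN dept ON emp.dept_id = dept.id").fetchall()
print(left)   # [('Asha', 'IT'), ('Ravi', None)]
```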


Q5. How do you write a file to a Delta table?

Ans.

To write a file in a delta table, you can use the Delta Lake API or Spark SQL commands.

  • Use Delta Lake API to write data to a delta table

  • Use Spark SQL commands like INSERT INTO to write data to a delta table

  • Ensure that the data being written is in the correct format and schema


Q6. How can you optimize your queries for efficiency in BQ?

Ans.

Optimizing queries in BigQuery involves using partitioned tables, clustering, and optimizing joins.

  • Partition tables by date or another relevant column to reduce the amount of data scanned

  • Use clustering to group related rows together, reducing the amount of data scanned for queries

  • Avoid unnecessary joins and denormalize data where possible to reduce query complexity


Q7. Do you have experience with Dataflow, Dataproc, and Cloud Composer?

Ans.

Yes, I have experience in Dataflow, Dataproc, and Cloud Composer.

  • I have worked with Dataflow to process and analyze large datasets in real-time.

  • I have used Dataproc to create and manage Apache Spark and Hadoop clusters for big data processing.

  • I have experience with Cloud Composer for orchestrating workflows and managing data pipelines.


Q8. What are the Types of SCD?

Ans.

Types of SCD include Type 1, Type 2, and Type 3.

  • Type 1 SCD: Overwrites old data with new data, no history is maintained.

  • Type 2 SCD: Maintains historical data by creating new records for changes.

  • Type 3 SCD: Creates separate columns to store historical and current data.

  • Examples: Type 1 - Employee address updates overwrite old address. Type 2 - Employee salary changes create new record with effective date. Type 3 - Employee job title history stored in separate columns.
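The Type 2 example above can be sketched in a few lines of Python (a toy in-memory model, not a warehouse implementation — the row fields and helper name are invented): a salary change closes the current record and appends a new one, so full history is preserved.

```python
from datetime import date

# Toy SCD Type 2 history table: one open record (valid_to is None) per employee.
history = [
    {"emp_id": 7, "salary": 50000, "valid_from": date(2023, 1, 1), "valid_to": None},
]

def scd2_update(rows, emp_id, new_salary, effective):
    for row in rows:
        if row["emp_id"] == emp_id and row["valid_to"] is None:
            row["valid_to"] = effective          # close the current record
    rows.append({"emp_id": emp_id, "salary": new_salary,
                 "valid_from": effective, "valid_to": None})  # open a new one

scd2_update(history, 7, 55000, date(2024, 6, 1))
print(len(history))  # 2 rows: history is preserved, not overwritten
```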


Q9. Explain cloud functions like cloud build, cloud run in GCP.

Ans.

Cloud functions like Cloud Build and Cloud Run in GCP are serverless computing services for building and running applications in the cloud.

  • Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. It automatically builds and tests your code in the cloud.

  • Cloud Run is a managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. It automatically scales up or down based on traffic.

  • Cloud Functions is an event-driven serverless compute service that runs small units of code in response to events such as HTTP requests or Pub/Sub messages.


Q10. Why does Spark work well with Parquet files?

Ans.

Spark works well with Parquet files due to its columnar storage format, efficient compression, and ability to push down filters.

  • Parquet files are columnar storage format, which aligns well with Spark's processing model of working on columns rather than rows.

  • Parquet files support efficient compression, reducing storage space and improving read performance in Spark.

  • Spark can push down filters to Parquet files, allowing for faster query execution by only reading relevant data.



Q11. What is the role of the DAG in Spark?

Ans.

DAG (Directed Acyclic Graph) in Apache Spark is used to represent a series of data processing steps and their dependencies.

  • DAG in Spark helps optimize the execution of tasks by determining the order in which they should be executed based on dependencies.

  • It breaks down a Spark job into smaller tasks and organizes them in a way that minimizes unnecessary computations.

  • DAGs are created automatically by Spark when actions are called on RDDs or DataFrames.

  • Example: if a job applies a map followed by a groupBy, Spark splits the DAG into two stages at the shuffle boundary.
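The dependency-ordering idea can be shown with Python's standard-library `graphlib` (a toy DAG of invented step names, modelled loosely on how Spark orders stages, not Spark itself):

```python
from graphlib import TopologicalSorter

# Each key depends on the steps in its set; edges point from a step
# to its prerequisites, just as a stage depends on its parent stages.
dag = {
    "read":   set(),
    "filter": {"read"},
    "join":   {"filter", "read"},
    "write":  {"join"},
}
order = list(TopologicalSorter(dag).static_order())
print(order)  # a valid execution order, e.g. ['read', 'filter', 'join', 'write']
```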


Q12. Could you please explain GCP architecture?

Ans.

GCP architecture refers to the structure and components of Google Cloud Platform for building and managing applications and services.

  • GCP architecture is based on a global network of data centers that provide secure, scalable infrastructure for cloud services.

  • Key components include Compute Engine for virtual machines, Cloud Storage for object storage, and BigQuery for data analytics.

  • GCP architecture also includes networking services like Virtual Private Cloud (VPC) for secure, isolated networking.


Q13. How do you decide the Spark configuration for a job?

Ans.

Spark configuration for a job is decided based on factors like data size, cluster resources, and job requirements.

  • Consider the size of the data being processed to determine the number of partitions and memory requirements.

  • Evaluate the available cluster resources such as CPU cores, memory, and storage to optimize performance.

  • Adjust parameters like executor memory, executor cores, and driver memory based on the complexity of the job.

  • Use dynamic allocation to efficiently utilise cluster resources as the workload changes.
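A common rule-of-thumb sizing calculation can make the third bullet concrete (this is an assumption, not an official formula: reserve one core and 1 GB per node for the OS, cap executors at about 5 cores each, and keep one executor slot for the driver):

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node, cores_per_executor=5):
    usable_cores = cores_per_node - 1                 # leave 1 core per node for the OS
    executors_per_node = usable_cores // cores_per_executor
    total_executors = nodes * executors_per_node - 1  # reserve one slot for the driver
    mem_per_executor = (mem_gb_per_node - 1) // executors_per_node
    return total_executors, cores_per_executor, mem_per_executor

print(size_executors(nodes=10, cores_per_node=16, mem_gb_per_node=64))
# (29, 5, 21): e.g. --num-executors 29 --executor-cores 5 --executor-memory 21g
```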


Q14. What are constructors in Python?

Ans.

Constructors in Python are special methods used for initializing objects. They are called automatically when a new instance of a class is created.

  • Constructors are defined using the __init__() method in a class.

  • They are used to initialize instance variables of a class.

  • Example: a Person class defines __init__(self, name, age) to set self.name and self.age; calling Person('Alice', 30) runs it automatically.
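The example in the answer, laid out as a runnable snippet:

```python
class Person:
    def __init__(self, name, age):
        # __init__ runs automatically when Person(...) is called.
        self.name = name
        self.age = age

person1 = Person("Alice", 30)
print(person1.name, person1.age)  # Alice 30
```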


Q15. Delta Lake vs Data Lake?

Ans.

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

  • Delta Lake provides ACID transactions, schema enforcement, and data versioning on top of data lakes.

  • Data Lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.

  • Delta Lake is optimized for big data workloads and is built on top of Apache Spark.

  • Data Lake can store data from various sources like IoT devices, social media, and enterprise applications.


Q16. What is SQL? Explain it clearly.

Ans.

SQL is a programming language used for managing and manipulating relational databases.

  • SQL stands for Structured Query Language

  • It is used to communicate with databases to perform tasks such as querying data, updating data, and creating tables

  • Common SQL commands include SELECT, INSERT, UPDATE, DELETE

  • Example: SELECT * FROM employees WHERE department = 'IT'


Q17. What is Python? Explain it clearly.

Ans.

Python is a high-level programming language known for its simplicity and readability.

  • Python is an interpreted language, meaning code is executed line by line.

  • It is dynamically typed, allowing for flexibility in variable types.

  • Python is popular for web development, data analysis, artificial intelligence, and more.

  • Example: print('Hello, World!') is a simple Python program to display text.


Q18. How do you deploy a Spark application?

Ans.

Spark applications can be deployed using various methods like standalone mode, YARN, Mesos, or Kubernetes.

  • Deploy Spark application in standalone mode by submitting the application using spark-submit command

  • Deploy Spark application on YARN by setting the master to yarn and submitting the application to the YARN ResourceManager

  • Deploy Spark application on Mesos by setting the master to mesos and submitting the application to the Mesos cluster

  • Deploy Spark application on Kubernetes by setting the master to a k8s:// URL and submitting with spark-submit.


Q19. Write a Python program

Ans.

Python program to print 'Hello, World!'

  • Use the print() function in Python to display text on the screen

  • Enclose the text in single or double quotes to indicate a string
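The program described above, written out:

```python
message = "Hello, World!"  # a string literal in single or double quotes
print(message)             # print() displays text on the screen
```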


Q20. Decorators in Python

Ans.

Decorators in Python are functions that modify the behavior of other functions or methods.

  • Decorators are defined using the @decorator_name syntax before a function definition.

  • They can be used to add functionality to existing functions without modifying their code.

  • Decorators can be used for logging, timing, authentication, and more.

  • Example: @staticmethod decorator in Python is used to define a static method in a class.
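A custom decorator makes the "add functionality without modifying code" point concrete (a minimal sketch; `log_calls` and `add` are invented names for illustration):

```python
import functools

def log_calls(func):
    # The decorator wraps func and adds behaviour around every call.
    @functools.wraps(func)   # preserve func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1   # added behaviour: count invocations
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@log_calls                   # equivalent to: add = log_calls(add)
def add(a, b):
    return a + b

add(1, 2)
add(3, 4)
print(add.calls)  # 2
```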


Q21. Indexing in SQL

Ans.

Indexing in SQL is a technique used to improve the performance of queries by creating a data structure that allows for faster retrieval of data.

  • Indexes are created on columns in a database table to speed up the retrieval of rows that match a certain condition in a WHERE clause.

  • Indexes can be created using CREATE INDEX statement in SQL.

  • Types of indexes include clustered indexes, non-clustered indexes, unique indexes, and composite indexes.

  • Example: CREATE INDEX idx_lastname ON …
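A runnable demonstration via sqlite3 (the table is invented for illustration): EXPLAIN QUERY PLAN shows SQLite choosing the index for the WHERE lookup instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, lastname TEXT)")
conn.execute("CREATE INDEX idx_lastname ON employees(lastname)")
# EXPLAIN QUERY PLAN reveals whether the query will use the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE lastname = 'Rao'").fetchall()
print(plan[0][3])  # mentions "USING INDEX idx_lastname"
```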


Q22. Write a SQL query

Ans.

SQL query to retrieve all employees from a table named 'employees'

  • Use SELECT * FROM employees;

  • Replace '*' with specific columns if needed, e.g. SELECT employee_id, name FROM employees;


Interview Process at iSource

Based on 9 interviews in the last year: 1 interview round (Technical).
