Celebal Technologies
60+ Episource Interview Questions and Answers
Q1. Are you familiar with Celebal Technologies?
Celebal Technologies is a technology company specializing in data engineering and analytics solutions.
Celebal Technologies is known for providing data engineering and analytics solutions.
They offer services such as data integration, data warehousing, and data visualization.
Celebal Technologies works with clients across various industries to help them optimize their data processes.
They have expertise in technologies like Hadoop, Spark, and Python for data engineering.
The compa...read more
Q2. What is the goal of Celebal Technologies?
Celebal Technologies aims to provide innovative solutions using cutting-edge technologies to help businesses thrive in the digital age.
Developing advanced data analytics tools for businesses
Creating AI-driven solutions for automation and optimization
Offering consulting services for digital transformation
Empowering organizations with cloud computing and IoT solutions
Q3. Difference between having clause and where clause?
The WHERE clause is used to filter rows before grouping, while the HAVING clause is used to filter groups after grouping.
WHERE clause is used with SELECT, UPDATE, and DELETE statements.
HAVING clause is used with SELECT statements that include GROUP BY clause.
WHERE clause filters individual rows based on conditions.
HAVING clause filters groups based on conditions.
WHERE clause is applied before the GROUP BY clause.
HAVING clause is applied after the GROUP BY clause.
WHERE clause ...read more
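The ordering described above (WHERE before grouping, HAVING after) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration; the employees table, its columns, and the sample rows are hypothetical and not part of the original answer.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Asha", "data", 90),
        ("Ben", "data", 60),
        ("Chen", "web", 80),
        ("Dev", "web", 30),
        ("Eve", "ops", 40),
    ],
)

# WHERE removes individual rows (salary <= 35) before grouping;
# HAVING then removes whole groups whose average is below 70.
where_having_rows = conn.execute(
    """
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary > 35
    GROUP BY dept
    HAVING AVG(salary) >= 70
    ORDER BY dept
    """
).fetchall()
print(where_having_rows)
```

In this sample data, the 'web' group survives only because WHERE removed the 30 salary first; without the WHERE clause its average would be 55 and HAVING would drop it.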
Q4. What is your experience with PySpark and Python?
I have extensive experience in using PySpark and Python for data engineering tasks.
I have worked on various projects involving data processing, transformation, and analysis using PySpark and Python.
I am proficient in writing efficient and optimized code in PySpark for big data processing.
I have experience in handling large datasets and implementing complex data pipelines using PySpark and Python.
Q5. What are the main modules of Node.js? Explain in detail.
Commonly used core modules of Node.js include HTTP, File System, Path, and Events.
HTTP module allows Node.js to transfer data over the HyperText Transfer Protocol (HTTP).
File System module enables interaction with the file system, allowing reading, writing, and manipulating files.
Path module provides utilities for working with file and directory paths.
Events module allows for event-driven programming, enabling the creation and handling of custom events.
Q6. What are the advanced types of joins?
Advanced types of join include outer join, self join, and cross join.
Outer join: includes unmatched rows from one or both tables
Self join: joins a table with itself
Cross join: combines each row from one table with each row from another table
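The three join types listed above can be tried out with Python's built-in sqlite3 module. This is a small sketch; the staff and sizes tables and their contents are made up for the example.

```python
import sqlite3

# Hypothetical tables used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staff (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'Asha', NULL), (2, 'Ben', 1), (3, 'Chen', 1);
    CREATE TABLE sizes (size TEXT);
    INSERT INTO sizes VALUES ('S'), ('M');
    """
)

# Self join: staff is joined to itself to pair each person with a manager.
self_join = conn.execute(
    "SELECT e.name, m.name FROM staff e "
    "JOIN staff m ON e.manager_id = m.id ORDER BY e.id"
).fetchall()

# Left outer join: Asha has no manager, so her manager column is NULL (None).
outer_join = conn.execute(
    "SELECT e.name, m.name FROM staff e "
    "LEFT JOIN staff m ON e.manager_id = m.id ORDER BY e.id"
).fetchall()

# Cross join: every staff row paired with every size row (3 x 2 = 6 rows).
cross_join = conn.execute("SELECT name, size FROM staff CROSS JOIN sizes").fetchall()

print(self_join, outer_join, len(cross_join))
```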
Q7. What are the best practices for handling large data sets ?
Best practices for handling large data sets include data preprocessing, using distributed computing frameworks, and optimizing storage and retrieval methods.
Perform data preprocessing to clean and transform data before analysis.
Utilize distributed computing frameworks like Hadoop or Spark for parallel processing.
Optimize storage and retrieval methods by using efficient data structures and indexing.
Consider using cloud services for scalable storage and processing capabilities.
Q8. What is the purpose of a confusion matrix in data science?
A confusion matrix is a table that is used to describe the performance of a classification model.
It shows the number of true positives, true negatives, false positives, and false negatives.
It helps in evaluating the performance of a machine learning model by providing insights into the model's accuracy, precision, recall, and F1 score.
It is particularly useful in scenarios where class imbalance exists or when different misclassification costs are involved.
Example: In a binary...read more
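As a rough sketch of how the four counts and the derived metrics relate, the snippet below computes them in plain Python for a binary classifier. The label lists are invented sample data, and the helper name confusion_counts is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


# Invented sample labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn, accuracy, precision, recall, f1)
```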
Q9. What is Node.js? Describe the inner workings of Node.js.
Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine, used for server-side and networking applications.
Node.js is an open-source, cross-platform runtime environment for executing JavaScript code outside of a browser.
It uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.
Node.js allows developers to build scalable and high-performance applications.
It provides a rich set of built-in modules and libraries for various functionali...read more
Q10. What are Python data structures?
Python data structures are containers that hold and organize data in different ways.
Some common Python data structures are lists, tuples, sets, and dictionaries.
Lists are ordered and mutable, allowing duplicate elements.
Tuples are ordered and immutable, useful for storing related data together.
Sets are unordered and contain unique elements, useful for mathematical operations.
Dictionaries are key-value pairs, providing fast access to values based on keys.
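A short sketch of the four structures described above; the variable names and values are arbitrary examples.

```python
# List: ordered, mutable, duplicates allowed.
fruits = ["apple", "banana", "apple"]
fruits.append("cherry")

# Tuple: ordered and immutable; item assignment raises TypeError.
point = (3, 4)

# Set: unordered, keeps only unique elements.
unique_fruits = set(fruits)

# Dictionary: key-value pairs with fast lookup by key.
prices = {"apple": 1.2, "banana": 0.5}
prices["cherry"] = 3.0

print(fruits, point, unique_fruits, prices)
```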
Q11. What are joins? Write a subquery.
Joins are used to combine rows from two or more tables based on a related column between them.
Joins are used in SQL to retrieve data from multiple tables based on a related column between them.
Common types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Subqueries are queries nested within another query, typically used to return a single value or a list of values for comparison.
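Since the question asks for a subquery, here is one possible example, run through Python's sqlite3 module. The employees table and its rows are hypothetical; the inner query returns a single value (the average salary) that the outer query compares against.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 90), ("Ben", 60), ("Chen", 80), ("Dev", 30)],
)

# The subquery in parentheses is evaluated to one value, the average salary.
above_average = conn.execute(
    """
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
).fetchall()
print(above_average)
```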
Q12. Explain the difference between the difference?
The difference between the difference is the result of subtracting one value from another.
Difference is the result of subtracting two values.
The difference between two values can be positive, negative, or zero.
For example, the difference between 10 and 5 is 5.
Q13. How do you create a soft link and a hard link? How do you create an empty file in Linux?
To create a soft (symbolic) link, use the 'ln -s' command. To create a hard link, use the 'ln' command. To create an empty file, use the 'touch' command.
To create a soft link: ln -s <target> <link_name>
To create a hard link: ln <target> <link_name>
To create an empty file: touch <file_name>
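Q14. Multiply two numbers without using the '*' operator.
Use bitwise shifts and addition (shift-and-add) to multiply two numbers without using the '*' operator.
Shifting a number left by one bit multiplies it by 2; shifting left by k bits multiplies it by 2^k.
For each set bit k in the second number, add the first number shifted left by k to the result.
Example: 3 * 5 = (3 << 2) + (3 << 0) = 12 + 3 = 15, because 5 is 101 in binary.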
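One way to write this approach is the shift-and-add loop below. It is a sketch for integers only; the function name multiply and the sign handling are choices made for this example.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers using only shifts, bit tests, and addition."""
    # Work with magnitudes and restore the sign at the end.
    negative = (a < 0) != (b < 0)
    a, b = abs(a), abs(b)
    result = 0
    while b:
        if b & 1:      # lowest bit of b is set: add the current shifted a
            result += a
        a <<= 1        # a doubles each round (a, 2a, 4a, ...)
        b >>= 1        # move on to the next bit of b
    return -result if negative else result


print(multiply(3, 5))
```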
Q15. What is indexing in SQL?
Indexing in SQL is a technique used to improve the performance of queries by creating a data structure that allows for faster retrieval of data.
Indexes are created on columns in a database table to speed up the retrieval of data.
They work similar to the index in a book, allowing the database to quickly find the rows that match a certain condition.
Indexes can be created using a single column or a combination of columns.
Examples: CREATE INDEX index_name ON table_name(column_nam...read more
Q16. What is the use of the chmod and chown commands? Give one example.
chmod and chown commands are used to change file permissions and ownership respectively.
chmod command is used to change the permissions of a file or directory
chown command is used to change the ownership of a file or directory
Example: chmod 755 file.txt - This command gives read, write and execute permissions to the owner and read and execute permissions to group and others
Example: chown user1 file.txt - This command changes the ownership of file.txt to user1
Q17. How to create the sub directories using mkdir command?
To create sub directories using mkdir command, use the -p option followed by the directory path.
Use the command 'mkdir -p directory/subdirectory'
The -p option creates parent directories if they don't exist
Several nested levels can be created at once using 'mkdir -p directory/subdirectory1/subdirectory2'
Use the '-m' option (for example 'mkdir -m 755 directory') to set permissions for the new directory
Q18. Explain the difference between ETL and ELT.
ETL is Extract, Transform, Load where data is extracted, transformed, and loaded in that order. ELT is Extract, Load, Transform where data is extracted, loaded, and then transformed.
ETL: Data is extracted from the source, transformed in a separate system, and then loaded into the target system.
ELT: Data is extracted from the source, loaded into the target system, and then transformed within the target system.
ETL is suitable for scenarios where data needs to be transformed bef...read more
Q19. Difference between join and union?
Join combines rows from two or more tables based on a related column, while union combines rows from two or more tables into a single result set.
Join is used to combine rows from different tables based on a related column.
Union is used to combine rows from two or more tables into a single result set.
Join can be used with different types like inner join, left join, right join, etc.
Union only combines rows with the same number of columns and compatible data types.
Join can inclu...read more
Q20. Joins, ADF, CTE: find the 2nd highest number
Use SQL query with joins, common table expressions (CTE) and Azure Data Factory (ADF) to find the 2nd highest number.
Use a SQL query with joins to combine the necessary tables or datasets.
Utilize common table expressions (CTE) to simplify the query and make it more readable.
Leverage Azure Data Factory (ADF) to automate the process of running the query and retrieving the result.
Order the numbers in descending order and select the second row to find the 2nd highest number.
Q21. What are the basics of data science?
Data science involves extracting insights from data using various techniques and tools.
Data collection and cleaning
Exploratory data analysis
Statistical modeling and machine learning
Data visualization
Communication of results
Q22. What is a friend function? What are function overloading and overriding? What is the difference between run-time polymorphism and compile-time polymorphism?
Friend function is a function that is not a member of a class but has access to its private and protected members. Function overloading is defining multiple functions with the same name but different parameters. Function overriding is redefining a base class function in a derived class. Runtime polymorphism is achieved through virtual functions and late binding, while compile time polymorphism is achieved through function overloading and templates.
Friend function can access p...read more
Q23. Calculate repeating element in list
Find the repeating element in a list
Iterate through the list and keep track of elements seen so far
Use a hash set to efficiently check for duplicates
Return the first element that is already in the set
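The steps above can be sketched as follows; the function name first_repeating is a placeholder chosen for this example, and it returns None when nothing repeats.

```python
def first_repeating(items):
    """Return the first element that was already seen earlier, or None."""
    seen = set()
    for item in items:
        if item in seen:   # set membership is O(1) on average
            return item
        seen.add(item)
    return None


print(first_repeating([3, 1, 4, 1, 5, 3]))
```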
Q24. Calculate maximum salary in SQL
Use the MAX() function in SQL to calculate the maximum salary.
Use the MAX() function along with the column name of the salary field.
Example: SELECT MAX(salary) FROM employees;
Ensure the correct table and column names are used in the query.
Q25. What is the difference between a duplicate and a reference?
Duplicate is an exact copy of something, while reference is a pointer to the original object.
Duplicate is a separate copy of the original item, while reference points to the original item.
Changes made to a duplicate do not affect the original, but changes made to a reference affect the original.
Duplicate has its own memory space, while reference shares memory space with the original object.
Q26. How is an empty class created in Python?
An empty class can be created in Python using the 'pass' keyword.
Use the 'class' keyword to define a class.
Add the class name and a colon after the 'class' keyword.
Use the 'pass' keyword to indicate an empty class body.
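A minimal example of the steps above; the class name Empty and the attribute label are arbitrary.

```python
class Empty:
    pass  # placeholder statement: the class body has no members


# An empty class can still be instantiated and given attributes later.
obj = Empty()
obj.label = "added later"
print(obj.label)
```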
Q27. DNS, IP Addressing,what is cloud computing, DHCP, what is saas, paas and iaas, etc
Q28. 1. What is testing? 2. What are the types and levels of testing? 3. STLC 4. Basic questions about Selenium
Testing is the process of evaluating a system or its component(s) with the intent to find whether it satisfies the specified requirements or not.
Types of testing include functional testing, performance testing, security testing, usability testing, etc.
Levels of testing include unit testing, integration testing, system testing, acceptance testing, etc.
STLC stands for Software Testing Life Cycle and includes phases like requirement analysis, test planning, test design, test exe...read more
Q29. LSTM & GRU: which to use when?
LSTM for longer sequences, GRU for faster training and less complex models.
Use LSTM for tasks requiring long-term dependencies and memory retention.
Use GRU for faster training and simpler models with fewer parameters.
Consider using LSTM for tasks like language translation or speech recognition.
Consider using GRU for tasks like sentiment analysis or text generation.
Q30. What is Maven? What is Maven's default port number?
Maven is a build automation tool used primarily for Java projects. It manages project dependencies and builds the project. Maven itself is a command-line tool and does not listen on a port.
Maven is based on the concept of a Project Object Model (POM) file, which describes the project structure and dependencies.
It uses a centralized repository called Maven Central to download dependencies.
Maven can be used to compile, test, package, and deploy Java applications.
It provides a consistent and repeatable build process, making it easier to manage ...read more
Q31. What are the principles of project management?
Project Management principles are fundamental guidelines for managing projects effectively.
Clear objectives and goals must be defined at the beginning of the project.
Effective communication is essential for all stakeholders involved in the project.
Proper planning and scheduling are crucial for successful project completion.
Risk management strategies should be in place to address potential issues that may arise.
Regular monitoring and evaluation of progress are necessary to ens...read more
Q32. Versioning in AWS
Versioning in AWS allows you to manage different versions of your resources.
AWS S3 supports object versioning to keep multiple versions of an object in the same bucket.
AWS Lambda supports versioning to manage different versions of your functions.
AWS API Gateway supports versioning to manage different versions of your APIs.
Q33. Difference between entropy & information gain
Entropy measures randomness in data, while information gain measures the reduction in uncertainty after splitting data.
Entropy is used in decision trees to measure impurity in a dataset before splitting it.
Information gain is used in decision trees to measure the effectiveness of a split in reducing uncertainty.
Entropy ranges from 0 (pure dataset) to 1 (evenly mixed dataset) when there are two classes; with k classes the maximum is log2(k).
Information gain is calculated as the difference between the entropy of the parent node and the w...read more
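A small sketch of both quantities, assuming the usual decision-tree definitions: Shannon entropy with a base-2 logarithm, and information gain as parent entropy minus the size-weighted average of the child entropies. The function names and sample labels are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Invented sample: 4 "yes" and 4 "no" labels, split two different ways.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(entropy(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

With three equally frequent classes the same entropy function returns log2(3), which is greater than 1.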
Q34. In the context of the .NET Framework, what is .NET vs .NET Framework? Why do we use it? What is the role of DI and service lifetime scopes?
.NET (formerly .NET Core) is the cross-platform, open-source platform, while .NET Framework is the original Windows-only implementation.
.NET provides a runtime environment for executing applications on Windows, Linux, and macOS.
.NET Framework runs only on Windows and is maintained mainly for existing applications.
Both provide a runtime plus a set of libraries and tools for developing and running applications.
Dependency Injection (DI) is a design pattern used to implement Inversion of Control (IoC) in applications.
DI is used to decouple the comp...read more
Q35. Do you have knowledge of Cloud service, SQL etc?
Yes, I have knowledge of Cloud services such as AWS, Azure, and Google Cloud, as well as SQL databases.
Familiar with Cloud services like AWS, Azure, and Google Cloud
Proficient in SQL databases
Experience in setting up and managing databases in the Cloud
Q36. Explain some of the Python functions you worked on in the project, with example values.
I worked on Python functions in the project to manipulate data and perform calculations.
Used Python functions like 'sum()', 'max()', 'min()' to calculate total, maximum, and minimum values of datasets.
Implemented custom functions to clean and preprocess data before analysis.
Utilized functions like 'filter()', 'map()', 'reduce()' for data transformation and aggregation.
Created functions to generate visualizations using libraries like Matplotlib and Seaborn.
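As an illustration of the built-ins named above, the snippet below applies them to a small list. The values are made up for the example and do not come from any real project.

```python
from functools import reduce

# Invented sample values, for illustration only.
values = [4, -1, 7, 0, 3]

# Aggregations with built-ins.
summary = (sum(values), max(values), min(values))

# filter() keeps matching items, map() transforms each item,
# reduce() folds the items into a single value.
positives = list(filter(lambda v: v > 0, values))
doubled = list(map(lambda v: v + v, positives))
total = reduce(lambda acc, v: acc + v, doubled, 0)

print(summary, positives, doubled, total)
```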
Q37. What is OOP? Explain the 4 pillars.
OOPs stands for Object-Oriented Programming. The 4 pillars are Inheritance, Encapsulation, Abstraction, and Polymorphism.
Inheritance: Allows a class to inherit properties and behavior from another class. Example: Parent class 'Animal' and child class 'Dog' inheriting from 'Animal'.
Encapsulation: Bundling data and methods that operate on the data into a single unit. Example: Using private variables and public methods in a class.
Abstraction: Hiding the complex implementation de...read more
Q38. Difference in Credit underwriting procedures for Retail and Agri Loans?
Credit underwriting procedures differ in Retail and Agri Loans due to varying risk factors and collateral requirements.
Retail loans typically involve individual borrowers with stable income and credit history, while Agri loans involve farmers with fluctuating income and collateral in the form of agricultural assets.
Retail loans may require credit score checks and income verification, while Agri loans may focus more on the value of the agricultural land and crops.
Agri loans ma...read more
Q39. What are the different types of loans?
Different types of loans include personal loans, home loans, auto loans, student loans, and business loans.
Personal loans
Home loans
Auto loans
Student loans
Business loans
Q40. What are joins in SQL?
Joins in SQL are used to combine rows from two or more tables based on a related column between them.
Joins are used to retrieve data from multiple tables based on a related column
Common types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column
Q41. SQL queries to find second highest salary
Use SQL query with subquery to find second highest salary
Use ORDER BY clause to sort salaries in descending order
Use LIMIT clause to get the second row after skipping the first row
Use a subquery to avoid duplicates if multiple employees have the same highest salary
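The points above can be checked with Python's sqlite3 module. This sketch uses a hypothetical employees table in which two people share the highest salary, so it shows why a plain ORDER BY with LIMIT is not enough.

```python
import sqlite3

# Hypothetical table; two employees share the top salary on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 90), ("Ben", 90), ("Chen", 80), ("Dev", 60)],
)

# Skipping one row returns the duplicate top salary, not the second highest.
naive = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# Subquery: the largest salary that is strictly below the maximum.
with_subquery = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

# Alternative: remove duplicates first, then skip one row.
with_distinct = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

print(naive, with_subquery, with_distinct)
```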
Q42. How do you design an NLP engine?
Designing an NLP engine involves defining the problem, selecting appropriate algorithms, and training the model.
Define the problem and identify the data sources
Select appropriate algorithms for tasks such as tokenization, part-of-speech tagging, and named entity recognition
Train the model using labeled data and evaluate its performance
Continuously improve the model by incorporating feedback and updating the algorithms
Q43. Difference between Secured and Unsecured loans?
Secured loans are backed by collateral, while unsecured loans are not.
Secured loans require collateral to secure the loan, such as a house or car
Unsecured loans do not require collateral, but typically have higher interest rates
Secured loans are less risky for lenders, as they have a way to recover their money if the borrower defaults
Examples of secured loans include mortgages and auto loans, while credit cards and personal loans are examples of unsecured loans
Q44. Choices were given between data science and data engineering?
Data science focuses on analyzing and interpreting complex data, while data engineering focuses on designing and building data pipelines.
Data science involves analyzing and interpreting large amounts of data to extract insights and make predictions.
Data engineering involves designing and building data pipelines to collect, store, and process data efficiently.
Data scientists use tools like Python, R, and machine learning algorithms to analyze data.
Data engineers work with tech...read more
Q45. What is MongoDB? Explain it.
MongoDB is a NoSQL database program that uses a document-oriented data model.
NoSQL database program
Document-oriented data model
Uses JSON-like documents for data storage
Supports dynamic schemas for flexible data structures
Q46. What is DBMS and SQL?
DBMS stands for Database Management System, which is a software system that manages databases. SQL is a language used to interact with databases.
DBMS is a software system that allows users to define, create, maintain, and control access to databases.
SQL (Structured Query Language) is a language used to communicate with a database. It is used to retrieve, update, and manage data in a database.
Examples of popular DBMS include MySQL, Oracle, SQL Server, and PostgreSQL.
Examples o...read more
Q47. What is append in Power Query?
Append in Power Query is used to combine multiple tables or queries into a single table or query.
Append is used to stack tables on top of each other, adding rows from one table to the end of another.
It is useful when you have multiple tables with the same structure and want to combine them into one.
You can append tables from different data sources or within the same data source.
For example, you can append sales data from different regions into a single table for analysis.
Q48. What is merge in Power Query?
Merge in Power Query combines multiple tables into one by matching rows based on specified columns.
Merge is used to combine tables in Power Query
You can merge based on one or more columns
Different types of merges include inner, left outer, right outer, and full outer joins
Q49. SQL queries to find duplicate values
Use SQL queries with GROUP BY and HAVING clause to find duplicate values in a table.
Use GROUP BY clause to group the records based on the columns you want to check for duplicates.
Use HAVING clause to filter out the groups that have more than one record, indicating duplicates.
Example: SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > 1;
Q50. Difference between shallow copy and deep copy
Shallow copy only copies the references of nested objects, while deep copy creates new copies of nested objects.
Shallow copy creates a new object, but does not create copies of nested objects. Changes in nested objects reflect in both original and shallow copied objects.
Deep copy creates a new object and also creates copies of nested objects. Changes in nested objects do not reflect in the original object.
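Python's standard copy module shows the difference directly. In this sketch the dictionary contents are arbitrary sample data.

```python
import copy

# Arbitrary sample object with one nested list.
original = {"name": "config", "tags": ["a", "b"]}

shallow = copy.copy(original)      # new dict, but the same nested list object
deep = copy.deepcopy(original)     # new dict and a new nested list

# Mutating the nested list through the original...
original["tags"].append("c")

# ...is visible through the shallow copy but not through the deep copy.
print(shallow["tags"])
print(deep["tags"])
```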
Q51. What is filter context?
Filter context determines which rows of data are visible to calculations in DAX formulas.
Filter context is dynamic and changes based on user interactions.
It can be set by slicers, filters, or relationships between tables.
Filter context is used to calculate results based on the current filter selections.
It helps in determining which rows of data are included in calculations.
Q52. What is the SUMMARIZE function?
The SUMMARIZE function is used in DAX (Power BI) to group a table by specified columns and return a summary table.
Used to group data and perform aggregate functions like sum, count, average, etc.
Because it returns a table, it is typically used in calculated tables or inside measures rather than as a calculated column on its own.
Example: SUMMARIZE('Table', 'Table'[Column1], 'Table'[Column2], "Total Value", SUM('Table'[Value]))
Q53. What are JS libraries?
JS libraries are pre-written code that can be used to perform specific tasks in JavaScript.
Libraries can save time and effort by providing pre-written code for common tasks
Popular JS libraries include jQuery, React, and Lodash; Angular is often listed alongside them but is a full framework
Libraries can be installed using package managers like npm or yarn
Q54. What are the SOLID principles?
SOLID principles are a set of five design principles that help make software designs more understandable, flexible, and maintainable.
S - Single Responsibility Principle: A class should have only one reason to change.
O - Open/Closed Principle: Software entities should be open for extension but closed for modification.
L - Liskov Substitution Principle: Objects of a superclass should be replaceable with objects of its subclasses without affecting the functionality.
I - Interface ...read more
Q55. Explain all hooks of React.
React hooks are functions that let you use state and other React features without writing a class.
Hooks are functions that let you use state and other React features in functional components
useState() - allows functional components to have local state
useEffect() - allows functional components to perform side effects
useContext() - allows functional components to access context
useReducer() - an alternative to useState() for managing complex state logic
useMemo() - memoizes the r...read more
Q56. Write code for patterns
Code for patterns
Decide on the pattern to be created
Use loops and conditional statements to generate the pattern
Test the code with different inputs
Q57. What is Node.js?
Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine that allows developers to run JavaScript on the server side.
Node.js is an open-source, cross-platform runtime environment for executing JavaScript code outside of a browser.
It uses an event-driven, non-blocking I/O model that makes it lightweight and efficient for real-time applications.
Node.js is commonly used for building web servers, APIs, and microservices.
It has a large ecosystem of libraries and fram...read more
Q58. Pattern program using any language
Print a pattern program using any programming language
Use nested loops to print the desired pattern
Identify the pattern and determine the number of rows and columns needed
Experiment with different loop structures to achieve the desired output
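As one possible pattern, the sketch below builds a left-aligned star triangle with nested loops. The choice of pattern and the function name triangle are assumptions, since the question does not specify which pattern was asked for.

```python
def triangle(rows):
    """Return the lines of a left-aligned star triangle with the given height."""
    lines = []
    for row in range(1, rows + 1):   # outer loop: one pass per row
        line = ""
        for _ in range(row):         # inner loop: one star per column
            line += "*"
        lines.append(line)
    return lines


for line in triangle(4):
    print(line)
```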
Q59. Pattern of Pascal Triangle
Pascal's Triangle is a mathematical pattern where each number is the sum of the two numbers directly above it.
Each row starts and ends with 1.
To get the middle numbers, add the two numbers above it.
Example: Row 4 is 1 3 3 1; adding adjacent pairs (1+3=4, 3+3=6, 3+1=4) gives row 5: 1 4 6 4 1.
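The rule can be turned into a short generator of rows. This is a sketch; the function name pascal_rows is a placeholder, and rows are counted from 1.

```python
def pascal_rows(n):
    """Return the first n rows of Pascal's Triangle as lists of integers."""
    rows = []
    row = [1]
    for _ in range(n):
        rows.append(row)
        # Each inner value is the sum of the two adjacent values above it;
        # every row starts and ends with 1.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return rows


for r in pascal_rows(5):
    print(" ".join(str(v) for v in r))
```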
Q60. What is DevOps?
DevOps is a software development methodology that combines software development (Dev) with IT operations (Ops) to shorten the systems development life cycle.
DevOps focuses on collaboration, communication, and integration between software developers and IT operations teams.
It aims to automate the process of software delivery and infrastructure changes to improve the speed and quality of software development.
DevOps practices include continuous integration, continuous delivery, ...read more
Q61. What are closures?
Closures are functions that have access to their own scope, as well as the scope in which they were defined.
Closures are created when a function is defined within another function and has access to the outer function's variables.
They allow for maintaining state in an asynchronous environment.
Closures can be used to create private variables and functions in JavaScript.
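The answer above is phrased for JavaScript, but the same idea can be sketched in Python, which also supports closures. Here the inner function keeps access to a variable of the enclosing function, which acts like a private counter; the names make_counter and increment are illustrative.

```python
def make_counter():
    count = 0  # lives in the enclosing scope, not reachable from outside

    def increment():
        nonlocal count   # refer to the outer function's variable
        count += 1
        return count

    return increment     # the returned function closes over count


counter_a = make_counter()
counter_b = make_counter()   # a separate closure with its own count
print(counter_a(), counter_a(), counter_b())
```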
Q62. Explain the OOPs concept
OOPs (Object-Oriented Programming) is a programming paradigm based on the concept of objects, which can contain data in the form of fields and code in the form of procedures.
OOPs focuses on creating objects that interact with each other to solve problems.
Key principles of OOPs include encapsulation, inheritance, polymorphism, and abstraction.
Encapsulation involves bundling data and methods that operate on the data into a single unit.
Inheritance allows a class to inherit prope...read more
Q63. What is an index in SQL?
Index in SQL is a data structure that improves the speed of data retrieval operations on a database table.
Indexes are created on columns in a database table to quickly retrieve data based on the values in those columns.
They can be unique or non-unique, clustered or non-clustered.
Indexes can significantly improve the performance of SELECT queries but may slow down INSERT, UPDATE, and DELETE operations.
Examples: CREATE INDEX idx_name ON table_name(column_name); SELECT * FROM ta...read more
Q64. Explain Task Parallel Library
Task Parallel Library (TPL) is a .NET library that enables parallel and asynchronous execution of tasks.
TPL provides a higher-level abstraction for parallelism than traditional threading models.
It includes features like the Task and Task<TResult> classes and the Parallel.ForEach and Parallel.Invoke methods.
TPL can improve performance by utilizing multiple cores and processors.
TPL also includes cancellation and exception handling mechanisms.
Example: Parallel.ForEach can be used to process a collection of items in pa...read more
Q65. What is Spark ?
Q66. What is Cloud ?
Q67. Fibonacci series using Python
Fibonacci series is a sequence of numbers where each number is the sum of the two preceding ones.
Initialize variables for the first two numbers in the series (0 and 1)
Use a loop to calculate the next number by adding the previous two numbers
Continue this process until reaching the desired length of the series
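The steps above can be sketched as a short function; the name fibonacci and the choice to return a list are assumptions made for this example.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0 and 1."""
    series = []
    a, b = 0, 1              # the first two numbers in the series
    for _ in range(n):
        series.append(a)
        a, b = b, a + b      # next number is the sum of the previous two
    return series


print(fibonacci(10))
```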
Q68. Networking of Linux and cloud
Networking of Linux and cloud involves configuring network settings, security protocols, and communication between servers and services.
Configuring network interfaces in Linux using tools like ifconfig or ip command
Setting up virtual private networks (VPNs) for secure communication between cloud servers
Implementing firewall rules to control incoming and outgoing network traffic
Using protocols like SSH, HTTPS, or VPN protocols for secure communication
Utilizing cloud networking...read more
Q69. Basics of SQL and ADF
SQL is a language used for managing relational databases, while ADF (Azure Data Factory) is a cloud-based data integration service.
SQL is used to query, insert, update, and delete data in databases.
ADF is used to create data pipelines for moving and transforming data in the cloud.
SQL examples: SELECT * FROM table_name; INSERT INTO table_name (column1, column2) VALUES (value1, value2);
ADF example: Create a pipeline to copy data from an on-premises database to Azure Blob Storag...read more