20+ Data Science Interview Questions and Answers

Updated 6 Jul 2025

Asked in Infotact Solutions

5d ago

Q. What are Python’s key features? Explain list vs. tuple vs. set vs. dictionary. How do you handle missing data in a dataset? What is the difference between apply() and map() in pandas? What are lambda functions...

Ans.

Python is a versatile programming language known for its simplicity, readability, and extensive libraries for data science.

Easy to Learn: Python's syntax is clear and intuitive, making it accessible for beginners and experienced programmers alike.
Extensive Libraries: Python has a rich ecosystem of libraries like NumPy, Pandas, and Matplotlib for data manipulation and analysis.
Cross-Platform: Python runs on various operating systems, including Windows, macOS, and Linux, ensuri...read more

Asked in Infotact Solutions

6d ago

Q. What is the Central Limit Theorem? Explain p-value and its importance. What are Type I and Type II errors? Describe a normal distribution and its properties. What is correlation vs. causation?

Ans.

The Central Limit Theorem states that the distribution of sample means approaches normality as sample size increases.

Central Limit Theorem (CLT): For a population with finite variance, the sampling distribution of the mean is approximately normal when the sample size is large enough, regardless of the population's shape (n ≥ 30 is a common rule of thumb, though heavily skewed populations may need more).
P-Value: The p-value is the probability of observing results at least as extreme as those in the sample, assuming the null hypothesis is true; a low p-value (typically < 0.05) is taken as evidence against the null hypothesis.
Type I ...read more

Asked in Infotact Solutions

3d ago

Q. If your model’s accuracy drops in production, how would you troubleshoot? How would you deal with imbalanced data? How do you explain your model to a non-technical stakeholder?

Ans.

Troubleshooting model accuracy involves systematic checks, while imbalanced data requires specific techniques to address bias.

Monitor Data Drift: Check if the input data distribution has changed since the model was trained, which can affect accuracy.
Evaluate Model Performance: Use metrics like precision, recall, and F1-score to get a better understanding of model performance beyond accuracy.
Feature Importance Analysis: Identify if certain features have become less relevant or...read more
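
As a rough illustration of why accuracy alone can mislead on imbalanced data, the sketch below computes precision, recall, and F1 from scratch. The labels are made-up toy data and the function name precision_recall_f1 is hypothetical, not taken from any library.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced data: 8 negatives and 2 positives (illustrative values only).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model that always predicts the majority class.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The always-negative model looks acceptable on accuracy but never finds a positive case, which is what recall and F1 expose.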

Asked in TCS

1d ago

Q. What are the differences between a list and a tuple?

Ans.

List and tuple are both data structures in Python used to store collections of items.

Lists are mutable, meaning their values can be changed after creation.
Tuples are immutable, meaning their values cannot be changed after creation.
Lists are written with square brackets [], while tuples are usually written with parentheses (), although it is the comma that actually creates the tuple.
By convention, lists often hold collections of similar items, while tuples often group related items of different types.
Example of a list: my_list = [1, 2, 3]
Examp...read more
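
A short sketch of the mutability difference described above. The variable names and values are illustrative only.

```python
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

my_list[0] = 10      # lists support item assignment
my_list.append(4)    # and can grow in place

error_message = None
try:
    my_tuple[0] = 10  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

# Tuples are hashable when all their items are, so they can be dict keys.
labels = {(0, 0): "origin"}

# A one-item tuple needs the trailing comma.
single = (5,)
```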

Asked in Accenture

1d ago

Q. What are data structures?

Ans.

Data structures are ways of organizing and storing data in a computer so that it can be accessed and used efficiently.

Data structures can be linear or non-linear
Examples of linear data structures include arrays, linked lists, and stacks
Examples of non-linear data structures include trees and graphs
Choosing the right data structure is important for optimizing performance

Asked in Microsoft Corporation

4d ago

Q. How does regression work?

Ans.

Regression is a statistical method used to establish a relationship between a dependent variable and one or more independent variables.

Regression helps to predict the value of the dependent variable based on the values of the independent variables.
It involves fitting a line or curve to the data points to minimize the difference between the predicted and actual values.
There are different types of regression such as linear regression, logistic regression, polynomial regression,...read more
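
To make the "fitting a line" idea concrete, here is a minimal sketch of simple linear regression using the closed-form least-squares solution for one independent variable. The function name fit_line and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 with no noise.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for a new x
```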

Asked in Walmart

4d ago

Q. What is the difference between the HAVING and WHERE clauses?

Ans.

HAVING filters the groups produced by GROUP BY, while WHERE filters individual rows.

HAVING is typically used with GROUP BY to filter groups based on aggregate functions.
WHERE is used to filter rows based on conditions and cannot reference aggregate functions.
HAVING is applied after GROUP BY, whereas WHERE is applied before GROUP BY.
Example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;
Example: SELECT * FROM employees WHERE age > 30;
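
The two example queries can be tried end to end with Python's built-in sqlite3 module. This is a sketch only: the employees table layout and the rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Asha", "Engineering", 35, 70000),
        ("Ravi", "Engineering", 28, 60000),
        ("Meena", "Support", 41, 40000),
        ("John", "Support", 25, 45000),
    ],
)

# WHERE filters individual rows before any grouping happens.
over_30 = conn.execute(
    "SELECT name FROM employees WHERE age > 30 ORDER BY name"
).fetchall()

# HAVING filters groups after GROUP BY has computed the aggregates.
high_paying = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 50000"
).fetchall()
conn.close()
```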

Asked in TCS

2d ago

Q. What is object-oriented programming?

Ans.

Object-oriented programming is a programming paradigm based on the concept of objects, which can contain data and code.

Objects are instances of classes, which define the structure and behavior of the objects.
Encapsulation, inheritance, and polymorphism are key principles of object-oriented programming.
Java, C++, and Python are popular languages that support object-oriented programming.

Asked in HDFC Bank

5d ago

Q. Write an SQL query for bank transactions.

Ans.

A SQL query can retrieve and summarize bank transaction data.

Use a SELECT statement to retrieve data from the transactions table.
Filter the data by account number or transaction date with a WHERE clause.
Group the data by transaction type with GROUP BY to aggregate amounts for analysis.
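
Since the question does not specify a schema, the sketch below assumes a hypothetical transactions table with columns account_number, txn_date, txn_type, and amount, and runs one possible query through Python's sqlite3 module. All names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; a real bank's table would differ.
conn.execute(
    "CREATE TABLE transactions ("
    "account_number TEXT, txn_date TEXT, txn_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("ACC001", "2025-01-05", "credit", 1000.0),
        ("ACC001", "2025-01-10", "debit", 250.0),
        ("ACC001", "2025-02-01", "debit", 100.0),
        ("ACC002", "2025-01-07", "credit", 500.0),
    ],
)

# Filter by account and date range, then summarize per transaction type.
query = """
SELECT txn_type, COUNT(*) AS txn_count, SUM(amount) AS total_amount
FROM transactions
WHERE account_number = ?
  AND txn_date BETWEEN ? AND ?
GROUP BY txn_type
ORDER BY txn_type
"""
rows = conn.execute(query, ("ACC001", "2025-01-01", "2025-01-31")).fetchall()
conn.close()
```

Dates are stored as ISO 8601 text here, so BETWEEN compares them correctly as strings; the ? placeholders keep user input out of the SQL text.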

Asked in Invest4Edu

1d ago

Q. How can you swap two numbers without using a third variable?

Ans.

Swap two numbers using arithmetic operations without a third variable.

Use addition and subtraction: a = a + b; b = a - b; a = a - b. In languages with fixed-width integers, a + b can overflow.
Example: If a = 5 and b = 3, after operations, a = 3 and b = 5.
Use the XOR bitwise operation (integers only): a = a ^ b; b = a ^ b; a = a ^ b.
Example: If a = 5 (101) and b = 3 (011), after operations, a = 3 and b = 5.
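
Both approaches can be written out in Python as a quick check. The function names are illustrative; Python integers do not overflow, and idiomatic Python would simply write a, b = b, a.

```python
def swap_arithmetic(a, b):
    """Swap using addition and subtraction."""
    a = a + b  # a now holds the sum
    b = a - b  # sum minus old b gives old a
    a = a - b  # sum minus old a gives old b
    return a, b


def swap_xor(a, b):
    """Swap two integers using XOR."""
    a = a ^ b
    b = a ^ b  # (a ^ b) ^ b gives old a
    a = a ^ b  # (a ^ b) ^ old a gives old b
    return a, b


print(swap_arithmetic(5, 3))
print(swap_xor(5, 3))
```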

Asked in Deloitte

3d ago

Q. Explain how a neural network works.

Ans.

Neural networks are a type of machine learning algorithm inspired by the human brain, consisting of interconnected nodes that process information.

Neural networks consist of layers of interconnected nodes, with each node performing a simple mathematical operation.
Information is passed through the network via weighted connections between nodes, with the weights adjusted during training to optimize performance.
Neural networks are trained using labeled data to learn patterns and ...read more
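
The weighted connections and per-node operations can be seen in a tiny forward pass. This is a sketch with hand-picked weights, a sigmoid activation chosen as an assumption, and no training step; real networks learn the weights from data.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-picked weights for a 2-input, 2-hidden-node, 1-output network.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]


def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = dense(inputs, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)[0]


output = forward([1.0, 1.0])
```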

Asked in Deloitte

6d ago

Q. What are the different types of metrics for regression?

Ans.

Common metrics for evaluating regression models include the following.

Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
R-squared (Coefficient of Determination)
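
For reference, the four metrics can be computed directly from their definitions. The sketch below uses plain Python and made-up values; the function name regression_metrics is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)               # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


# Illustrative values only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```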

Asked in Six Red Marbles

6d ago

Q. What is the central limit theorem?

Ans.

Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.

The theorem states that, regardless of the shape of the population distribution (provided its variance is finite), the sampling distribution of the sample mean will be approximately normal for sufficiently large samples.
It is a fundamental concept in statistics and is used in hypothesis testing and confidence intervals.
For example, if you take multiple samples of a population and calculat...read more
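
A small simulation can make the theorem tangible. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1, draws many samples of size 50, and looks at the distribution of their means. The seed, sample size, and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, num_samples=2000):
    """Means of num_samples samples of size n from an exponential(1) population."""
    means = []
    for _ in range(num_samples):
        sample = [random.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = sample_means(n=50)
center = statistics.fmean(means)   # should sit near the population mean, 1.0
spread = statistics.stdev(means)   # should sit near 1 / sqrt(50)
print(center, spread)
```

Plotting a histogram of means would show a roughly bell-shaped curve even though the underlying population is strongly skewed.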

Asked in Tiger Analytics

6d ago

Q. Explain any ML model in depth.

Ans.

A machine learning model is a mathematical model that learns from data to make predictions or decisions without being explicitly programmed.

ML models can be classified into categories such as supervised learning, unsupervised learning, and reinforcement learning.
Examples of ML models include linear regression, decision trees, support vector machines, and neural networks.
ML models require training data to learn patterns and relationships, and testing data to evaluate their per...read more

Asked in Ozibook

3d ago

Q. What is data preprocessing?

Ans.

Data preprocessing is the process of cleaning, transforming, and organizing raw data before analysis.

Removing irrelevant or duplicate data
Handling missing values
Scaling features (normalization or standardization)
Encoding categorical variables
Data transformation (e.g. log transformation)
Data reduction (e.g. PCA)
Handling outliers
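
Three of these steps (handling missing values, scaling, and encoding) are sketched below in plain Python on toy data. The helper names are hypothetical; in practice a library such as pandas or scikit-learn would usually handle this.

```python
def impute_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted distinct levels."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


# Toy columns (illustrative values only).
ages = [20, None, 40, 30]
cities = ["Noida", "Pune", "Noida", "Delhi"]

ages_filled = impute_mean(ages)
ages_scaled = min_max_scale(ages_filled)
cities_encoded = one_hot(cities)  # column order: Delhi, Noida, Pune
```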

Asked in Capgemini

1d ago

Q. What are the OOP concepts?

Ans.

OOP concepts are foundational principles in programming that enable code reusability and organization through objects and classes.

Encapsulation: Bundling data and methods that operate on the data within one unit (e.g., a class).
Inheritance: Mechanism where a new class derives properties and behavior from an existing class (e.g., a 'Dog' class inheriting from an 'Animal' class).
Polymorphism: Ability to present the same interface for different data types (e.g., a function that ...read more
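
The 'Animal' and 'Dog' example can be sketched in Python as follows. The Cat class, the speak method, and the name property are additions made up for the illustration.

```python
class Animal:
    """Base class: bundles data (name) with behavior (speak)."""

    def __init__(self, name):
        self._name = name  # leading underscore marks it internal by convention

    @property
    def name(self):
        """Read-only access to the encapsulated name."""
        return self._name

    def speak(self):
        return f"{self._name} makes a sound"


class Dog(Animal):
    """Inherits from Animal and overrides speak."""

    def speak(self):
        return f"{self._name} says woof"


class Cat(Animal):
    """Another subclass with its own speak."""

    def speak(self):
        return f"{self._name} says meow"


# Polymorphism: the same call works on every Animal subtype.
animals = [Animal("Generic"), Dog("Rex"), Cat("Tom")]
sounds = [animal.speak() for animal in animals]
```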

Asked in Infosys

5d ago

Q. What is Machine Learning?

Ans.

Machine learning is a subset of artificial intelligence that enables systems to learn from data and improve over time without explicit programming.

Machine learning algorithms can be supervised (e.g., predicting house prices) or unsupervised (e.g., clustering customers).
Common algorithms include decision trees, neural networks, and support vector machines.
Applications range from image recognition (e.g., facial recognition) to natural language processing (e.g., chatbots).
Machin...read more

Asked in Chetu

3d ago

Q. What is boosting?

Ans.

Boosting is an ensemble learning technique that combines weak learners to create a strong predictive model.

Boosting sequentially trains models, each focusing on the errors of the previous one.
Common algorithms include AdaBoost, Gradient Boosting, and XGBoost.
In AdaBoost, misclassified instances are given more weight in subsequent models.
Gradient Boosting minimizes a loss function by adding models that correct previous errors.
XGBoost is an optimized version of gradient boostin...read more
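
The "each model corrects the previous errors" idea can be sketched as a toy gradient boosting regressor with squared-error loss, where every round fits a one-split decision stump to the current residuals. This is a simplified illustration under those assumptions, not how AdaBoost or XGBoost are implemented; all names and data are invented.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on x that minimizes squared error on the residuals."""
    best = None
    # Skip the largest x so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Return a predict function built from sequentially fitted stumps."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump(x) for x, p in zip(xs, predictions)
        ]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
one_round = gradient_boost(xs, ys, n_rounds=1)
twenty_rounds = gradient_boost(xs, ys, n_rounds=20)
```

Training error shrinks as rounds are added; on real data a held-out set is needed to see when extra rounds start to overfit.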

Asked in TCS

2d ago

Q. What is clustering?

Ans.

Clustering groups data points so that points within a group (a cluster) are similar to each other and dissimilar to points in other groups.

Clusters are formed based on the similarity of data points within the group.
Clustering is an unsupervised learning technique used in data science.
Examples of clustering algorithms include K-means, hierarchical clustering, and DBSCAN.
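
As an illustration of one of these algorithms, here is a minimal K-means sketch for one-dimensional data. The starting centroids are supplied by hand and the iteration count is fixed, both simplifying assumptions; library implementations choose starting points and stopping rules more carefully.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(p - centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Toy data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```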

Asked in Wipro

2d ago

Q. Write code to determine if a number is prime.

Ans.

A number can be checked for primality by testing divisibility from 2 to the square root of the number.

A prime number is greater than 1 and has no divisors other than 1 and itself.
To check if a number n is prime, test divisibility from 2 to √n.
If n is divisible by any number in this range, it is not prime.
Example: 5 is prime (divisors: 1, 5), but 4 is not (divisors: 1, 2, 4).
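
One way to write the check in Python, following the steps above. The function name is_prime is a choice made here, and skipping even divisors is a small optimization beyond what the answer describes.

```python
import math


def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd divisors up to the integer square root need checking.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


print([n for n in range(20) if is_prime(n)])
```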

Asked in FirstCry

6d ago

Q. Explain cosine similarity.

Ans.

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space.

It measures the cosine of the angle between two vectors.
Values range from -1 (opposite directions) to 1 (same direction), with 0 indicating orthogonality; magnitude is ignored, so vectors need not be identical to score 1.
Used in recommendation systems, text mining, and clustering algorithms.
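
The definition translates directly into code: the dot product divided by the product of the vector lengths. A minimal sketch in plain Python, with an illustrative function name:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1, 2], [2, 4])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```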

Asked in Accenture

1d ago

Q. Explain your project.

Ans.

Developed a machine learning model to predict customer churn for a telecommunications company.

Collected and cleaned customer data including demographics, usage patterns, and customer service interactions.
Used classification algorithms such as logistic regression and random forest to build the predictive model.
Evaluated model performance using metrics like accuracy, precision, recall, and ROC curve.
Implemented the model into the company's CRM system to identify at-risk custome...read more