Data Science

20+ Data Science Interview Questions and Answers

Updated 6 Jul 2025
5d ago

Q. What are Python’s key features? Explain list vs. tuple vs. set vs. dictionary. How do you handle missing data in a dataset? What is the difference between apply() and map() in pandas? What are lambda functions...

Ans.

Python is a versatile programming language known for its simplicity, readability, and extensive libraries for data science.

  • Easy to Learn: Python's syntax is clear and intuitive, making it accessible for beginners and experienced programmers alike.

  • Extensive Libraries: Python has a rich ecosystem of libraries like NumPy, Pandas, and Matplotlib for data manipulation and analysis.

  • Cross-Platform: Python runs on various operating systems, including Windows, macOS, and Linux, ensuri...read more

6d ago

Q. What is the Central Limit Theorem? Explain p-value and its importance. What are Type I and Type II errors? Describe a normal distribution and its properties. What is correlation vs. causation?

Ans.

The Central Limit Theorem states that the distribution of sample means approaches normality as sample size increases.

  • Central Limit Theorem (CLT): For independent samples from a population with finite variance, the sampling distribution of the mean is approximately normal when the sample size is large enough, whatever the shape of the population distribution (n ≥ 30 is a common rule of thumb, not a strict threshold).

  • P-Value: The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained; a low p-value (typically < 0.05) is treated as evidence against the null hypothesis.

  • Type I ...read more

Data Science Interview Questions and Answers for Freshers

3d ago

Q. If your model’s accuracy drops in production, how would you troubleshoot? How would you deal with imbalanced data? How do you explain your model to a non-technical stakeholder?

Ans.

Troubleshooting model accuracy involves systematic checks, while imbalanced data requires specific techniques to address bias.

  • Monitor Data Drift: Check if the input data distribution has changed since the model was trained, which can affect accuracy.

  • Evaluate Model Performance: Use metrics like precision, recall, and F1-score to get a better understanding of model performance beyond accuracy.

  • Feature Importance Analysis: Identify if certain features have become less relevant or...read more
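To illustrate why accuracy alone can mislead on imbalanced data, here is a minimal sketch that computes precision, recall, and F1 by hand. The function name `precision_recall_f1` and the sample labels are assumptions made up for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced labels: only 2 positives out of 10
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

In this made-up case accuracy looks respectable while recall for the minority class is zero, which is the kind of gap these metrics are meant to expose.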

Asked in TCS

1d ago

Q. What are the differences between a list and a tuple?

Ans.

List and tuple are both data structures in Python used to store collections of items.

  • Lists are mutable, meaning their values can be changed after creation.

  • Tuples are immutable, meaning their values cannot be changed after creation.

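  • Lists are written with square brackets [], while tuples are written with parentheses (); strictly, the comma makes the tuple, so a one-element tuple is written (1,).

  • By convention, lists usually hold a variable number of similar items, while tuples often hold a fixed group of related values of possibly different types.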

  • Example of a list: my_list = [1, 2, 3]

  • Examp...read more
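The mutability difference described above can be seen in a short sketch; the variable names are illustrative only.

```python
my_list = [1, 2, 3]
my_list[0] = 10        # allowed: lists are mutable
my_list.append(4)      # lists can also grow

my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 10   # not allowed: tuples are immutable
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Because tuples of hashable items are themselves hashable,
# they can be used as dictionary keys; lists cannot.
coords = {(0, 0): "origin"}
```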

Asked in Accenture

1d ago

Q. What are data structures?

Ans.

Data structures are ways of organizing and storing data in a computer so that it can be accessed and used efficiently.

  • Data structures can be linear or non-linear

  • Examples of linear data structures include arrays, linked lists, and stacks

  • Examples of non-linear data structures include trees and graphs

  • Choosing the right data structure is important for optimizing performance

4d ago

Q. How does regression work?

Ans.

Regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables.

  • Regression helps to predict the value of the dependent variable based on the values of the independent variables.

  • It involves fitting a line or curve to the data points to minimize the difference between the predicted and actual values.

  • There are different types of regression such as linear regression, logistic regression, polynomial regression,...read more
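As one concrete instance of "fitting a line to minimize the difference between predicted and actual values", the sketch below fits simple linear regression by ordinary least squares using the closed-form formulas. The function name `fit_line` and the sample data are assumptions for illustration; other regression types use different fitting procedures.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)

# Predict the dependent variable for a new value of x
predicted = slope * 6 + intercept
```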

Asked in Walmart

4d ago

Q. What is the difference between the HAVING and WHERE clauses?

Ans.

HAVING filters the groups produced by GROUP BY, whereas WHERE filters individual rows.

  • HAVING is used with GROUP BY to filter groups based on aggregate functions

  • WHERE is used to filter rows based on conditions

  • HAVING is applied after GROUP BY, WHERE is applied before GROUP BY

  • Example: SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;

  • Example: SELECT * FROM employees WHERE age > 30;
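The two example queries can be tried against an in-memory SQLite database from Python. This is a sketch only: the `employees` schema and the sample rows are assumptions invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Asha", "Engineering", 35, 70000),
        ("Ravi", "Engineering", 28, 60000),
        ("Meena", "Support", 41, 40000),
        ("John", "Support", 25, 45000),
    ],
)

# WHERE filters individual rows before any grouping happens
older_than_30 = conn.execute(
    "SELECT name FROM employees WHERE age > 30 ORDER BY name"
).fetchall()

# HAVING filters whole groups after the aggregate has been computed
high_paying_departments = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 50000"
).fetchall()

conn.close()
```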

Asked in TCS

2d ago

Q. What is object-oriented programming?

Ans.

Object-oriented programming is a programming paradigm based on the concept of objects, which can contain data and code.

  • Objects are instances of classes, which define the structure and behavior of the objects.

  • Encapsulation, inheritance, and polymorphism are key principles of object-oriented programming.

  • Popular languages that support object-oriented programming include Java, C++, and Python.
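A minimal Python sketch of these principles follows. The class names `Animal` and `Dog` are hypothetical and chosen only for illustration.

```python
class Animal:
    def __init__(self, name):
        # Encapsulation: state lives inside the object; the leading underscore
        # marks it as internal by Python convention (it is not enforced).
        self._name = name

    def speak(self):
        return f"{self._name} makes a sound"


class Dog(Animal):  # Inheritance: Dog reuses Animal's constructor and data
    def speak(self):  # Polymorphism: same method name, specialized behavior
        return f"{self._name} barks"


# The same call works on both objects and dispatches to the right method
sounds = [animal.speak() for animal in (Animal("Generic"), Dog("Rex"))]
```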

Asked in HDFC Bank

5d ago

Q. Write an SQL query for bank transactions.

Ans.

The question is open-ended; a typical answer writes an SQL query that retrieves and summarizes bank transaction data.

  • Use a SELECT statement to retrieve data from the transactions table.

  • Filter data based on account number or transaction date.

  • Group data by transaction type or amount for analysis.
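Since the question does not specify a schema, the sketch below assumes a hypothetical `transactions` table (columns `account_number`, `txn_date`, `txn_type`, `amount`) and runs one possible query through Python's `sqlite3` module. It filters by account and date range, then groups by transaction type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, assumed only for this example
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, account_number TEXT, txn_date TEXT, "
    "txn_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (account_number, txn_date, txn_type, amount) "
    "VALUES (?, ?, ?, ?)",
    [
        ("ACC001", "2025-01-05", "credit", 1000.0),
        ("ACC001", "2025-01-10", "debit", 250.0),
        ("ACC001", "2025-02-01", "debit", 100.0),
        ("ACC002", "2025-01-07", "credit", 500.0),
    ],
)

# Count and total per transaction type for one account within January 2025
summary = conn.execute(
    "SELECT txn_type, COUNT(*), SUM(amount) FROM transactions "
    "WHERE account_number = ? AND txn_date BETWEEN ? AND ? "
    "GROUP BY txn_type ORDER BY txn_type",
    ("ACC001", "2025-01-01", "2025-01-31"),
).fetchall()

conn.close()
```

Passing values as `?` parameters rather than formatting them into the SQL string avoids SQL injection, which matters for anything touching account data.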

Asked in Invest4Edu

1d ago

Q. How can you swap two numbers without using a third variable?

Ans.

Swap two numbers using arithmetic operations without a third variable.

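  • Use addition and subtraction: a = a + b; b = a - b; a = a - b. In languages with fixed-width integers, a + b can overflow, so this is safe only when the sum fits in the type.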

  • Example: If a = 5 and b = 3, after operations, a = 3 and b = 5.

  • Use the bitwise XOR operation (integers only): a = a ^ b; b = a ^ b; a = a ^ b.

  • Example: If a = 5 (101) and b = 3 (011), after operations, a = 3 and b = 5.
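Both approaches can be sketched in Python as below; the function names are illustrative. In Python itself the idiomatic swap is tuple unpacking, shown last, which also avoids a named temporary variable.

```python
def swap_arithmetic(a, b):
    """Swap using addition and subtraction (Python ints do not overflow)."""
    a = a + b
    b = a - b  # now holds the original a
    a = a - b  # now holds the original b
    return a, b


def swap_xor(a, b):
    """Swap using bitwise XOR; works for integers only."""
    a = a ^ b
    b = a ^ b  # now holds the original a
    a = a ^ b  # now holds the original b
    return a, b


# Idiomatic Python: tuple packing and unpacking
a, b = 5, 3
a, b = b, a
```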

Asked in Deloitte

3d ago

Q. Explain how a neural network works.

Ans.

Neural networks are a type of machine learning algorithm inspired by the human brain, consisting of interconnected nodes that process information.

  • Neural networks consist of layers of interconnected nodes, with each node performing a simple mathematical operation.

  • Information is passed through the network via weighted connections between nodes, with the weights adjusted during training to optimize performance.

  • Neural networks are trained using labeled data to learn patterns and ...read more

Asked in Deloitte

6d ago

Q. What are the different types of metrics for regression?

Ans.

Common metrics for evaluating regression models include:

  • Mean Squared Error (MSE)

  • Root Mean Squared Error (RMSE)

  • Mean Absolute Error (MAE)

  • R-squared (Coefficient of Determination)
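The four metrics listed above can be computed directly from their definitions. This is a plain-Python sketch with made-up values; the function name `regression_metrics` is an assumption for the example.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
```

RMSE and MAE are in the same units as the target, while R-squared is unitless; MSE and RMSE penalize large errors more heavily than MAE does.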

6d ago

Q. What is the central limit theorem?

Ans.

Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.

  • Provided the samples are independent and the population has finite variance, this holds regardless of the shape of the population distribution.

  • It is a fundamental concept in statistics and is used in hypothesis testing and confidence intervals.

  • For example, if you take multiple samples of a population and calculat...read more
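A small simulation can make the theorem concrete. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1 (an arbitrary choice for illustration), draws many samples of size 50, and summarizes the sample means.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000


def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, std 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The sample means should centre on the population mean (1.0), with spread
# close to sigma / sqrt(n) = 1 / sqrt(50), even though the population is skewed.
mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1 / SAMPLE_SIZE ** 0.5
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve, unlike the exponential population it was drawn from.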

6d ago

Q. Explain any ML model in depth.

Ans.

A machine learning model is a mathematical model that learns from data to make predictions or decisions without being explicitly programmed.

  • ML models can be classified into categories such as supervised learning, unsupervised learning, and reinforcement learning.

  • Examples of ML models include linear regression, decision trees, support vector machines, and neural networks.

  • ML models require training data to learn patterns and relationships, and testing data to evaluate their per...read more

Asked in Ozibook

3d ago

Q. What is data preprocessing?

Ans.

Data preprocessing is the process of cleaning, transforming, and organizing raw data before analysis.

  • Removing irrelevant or duplicate data

  • Handling missing values

  • Scaling features by normalizing or standardizing them

  • Encoding categorical variables

  • Data transformation (e.g. log transformation)

  • Data reduction (e.g. PCA)

  • Handling outliers
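Three of the steps above (handling missing values, standardizing, and encoding categorical variables) are sketched below in plain Python on made-up data. In practice libraries such as pandas or scikit-learn are usually used for this; the sketch avoids them only to stay self-contained.

```python
import statistics

# Hypothetical raw columns: None marks a missing value
raw_ages = [25, None, 35, 45, None]
raw_colors = ["red", "blue", "red", "green", "blue"]

# 1. Handle missing values: replace each gap with the mean of the observed values
observed = [a for a in raw_ages if a is not None]
mean_age = statistics.fmean(observed)
ages = [a if a is not None else mean_age for a in raw_ages]

# 2. Standardize: subtract the mean and divide by the (population) standard deviation
mu = statistics.fmean(ages)
sigma = statistics.pstdev(ages)
ages_standardized = [(a - mu) / sigma for a in ages]

# 3. One-hot encode the categorical column (categories in sorted order)
categories = sorted(set(raw_colors))
colors_one_hot = [
    [1 if color == category else 0 for category in categories]
    for color in raw_colors
]
```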

Asked in Capgemini

1d ago

Q. What are the OOP concepts?

Ans.

OOP concepts are foundational principles in programming that enable code reusability and organization through objects and classes.

  • Encapsulation: Bundling data and methods that operate on the data within one unit (e.g., a class).

  • Inheritance: Mechanism where a new class derives properties and behavior from an existing class (e.g., a 'Dog' class inheriting from an 'Animal' class).

  • Polymorphism: Ability to present the same interface for different data types (e.g., a function that ...read more

Asked in Infosys

5d ago

Q. What is Machine Learning?

Ans.

Machine learning is a subset of artificial intelligence that enables systems to learn from data and improve over time without explicit programming.

  • Machine learning algorithms can be supervised (e.g., predicting house prices) or unsupervised (e.g., clustering customers).

  • Common algorithms include decision trees, neural networks, and support vector machines.

  • Applications range from image recognition (e.g., facial recognition) to natural language processing (e.g., chatbots).

  • Machin...read more

Asked in Chetu

3d ago

Q. What is boosting?

Ans.

Boosting is an ensemble learning technique that combines weak learners to create a strong predictive model.

  • Boosting sequentially trains models, each focusing on the errors of the previous one.

  • Common algorithms include AdaBoost, Gradient Boosting, and XGBoost.

  • In AdaBoost, misclassified instances are given more weight in subsequent models.

  • Gradient Boosting minimizes a loss function by adding models that correct previous errors.

  • XGBoost is an optimized version of gradient boostin...read more

Asked in TCS

2d ago

Q. What is clustering?

Ans.

Clustering is the task of grouping data points so that points within a group (a cluster) are similar to each other and dissimilar to points in other groups.

  • Clusters are formed based on the similarity of data points within the group.

  • Clustering is an unsupervised learning technique used in data science.

  • Examples of clustering algorithms include K-means, hierarchical clustering, and DBSCAN.

Asked in Wipro

2d ago

Q. Write code to determine if a number is prime.

Ans.

To determine whether a number is prime, test its divisibility by every integer from 2 up to the square root of the number.

  • A prime number is greater than 1 and has no divisors other than 1 and itself.

  • To check if a number n is prime, test divisibility from 2 to √n.

  • If n is divisible by any number in this range, it is not prime.

  • Example: 5 is prime (divisors: 1, 5), but 4 is not (divisors: 1, 2, 4).
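The approach described above can be sketched in Python as follows. The function name `is_prime` is illustrative, and skipping even divisors is a small optimization added here rather than part of the original answer.

```python
import math


def is_prime(n):
    """Return True if n is prime, testing divisors up to the square root of n."""
    if n < 2:
        return False          # primes are greater than 1
    if n < 4:
        return True           # 2 and 3 are prime
    if n % 2 == 0:
        return False          # even numbers above 2 are not prime
    # Only odd candidates from 3 up to floor(sqrt(n)) need checking
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


primes_below_20 = [n for n in range(20) if is_prime(n)]
```

Checking only up to √n is enough because any factor larger than √n must pair with one smaller than √n.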

Asked in FirstCry

6d ago

Q. Explain cosine similarity.

Ans.

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space.

  • It measures the cosine of the angle between two vectors.

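  • Values range from -1 (opposite directions) to 1 (same direction), with 0 indicating orthogonality; magnitude is ignored, so two vectors of different lengths that point the same way still score 1.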

  • Used in recommendation systems, text mining, and clustering algorithms.
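Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below implements that definition in plain Python; the function name and the example vectors are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])            # perpendicular vectors
opposite = cosine_similarity([1, 0], [-1, 0])             # opposite directions
```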

Asked in Accenture

1d ago

Q. Explain your project.

Ans.

Developed a machine learning model to predict customer churn for a telecommunications company.

  • Collected and cleaned customer data including demographics, usage patterns, and customer service interactions.

  • Used classification algorithms such as logistic regression and random forest to build the predictive model.

  • Evaluated model performance using metrics like accuracy, precision, recall, and area under the ROC curve (AUC).

  • Implemented the model into the company's CRM system to identify at-risk custome...read more

Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2025 Info Edge (India) Ltd.
