Senior Data Scientist

10+ Senior Data Scientist Interview Questions and Answers for Freshers

Updated 16 Jul 2025

Asked in Feynn Labs

6d ago

Q. What is the difference between logistic and linear regression?

Ans.

Logistic regression is used for binary classification, while linear regression is used for predicting continuous values.

  • Logistic regression is a classification algorithm, while linear regression is a regression algorithm.

  • Logistic regression uses a logistic function to model the probability of the binary outcome.

  • Linear regression uses a linear function to model the relationship between the independent and dependent variables.

  • Logistic regression predicts discrete outcomes (e.g....read more

1d ago

Q. How is a random forest different from decision trees?

Ans.

Random forest is an ensemble learning method that uses multiple decision trees to improve prediction accuracy.

  • Random forest builds multiple decision trees and combines their predictions to reduce overfitting.

  • Decision trees are prone to overfitting and can be unstable, while random forest is more robust.

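  • Because it averages many trees, random forest is generally less sensitive to noisy or missing values than a single decision tree, although support for missing values and categorical variables depends on the implementation.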

  • Example: Random forest can be used for predicting customer churn in a telecom compa...read more

6d ago

Q. What is the formula for logistic regression?

Ans.

The logistic regression formula models the log-odds of a binary dependent variable as a linear function of one or more independent variables.

  • The formula is: log(odds) = log(p / (1 - p)) = β0 + β1x1 + β2x2 + ... + βnxn, where p is the probability of the positive outcome.

  • The logit function transforms the probability p into the log-odds.

  • The independent variables are multiplied by their respective coefficients (β) and summed up with the intercept (β0).

  • The resulting value is then transformed back to ...read more
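A minimal Python sketch of the formula above, assuming the standard sigmoid function as the inverse of the logit. The coefficient values are hypothetical and chosen only for illustration.

```python
import math

def log_odds(features, coefficients, intercept):
    """Linear predictor: beta0 + beta1*x1 + ... + betan*xn."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))

def sigmoid(z):
    """Inverse of the logit: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Maps a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and features, not taken from any fitted model.
z = log_odds(features=[2.0, 1.0], coefficients=[0.5, -1.0], intercept=0.0)
p = sigmoid(z)
```

Here the log-odds come out to 0, which corresponds to a probability of 0.5.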

4d ago

Q. How do you measure the accuracy of a model?

Ans.

Model performance can be measured using tools such as the confusion matrix, the ROC curve, and the precision-recall curve.

  • Confusion matrix shows true positives, true negatives, false positives, and false negatives.

  • ROC curve plots true positive rate against false positive rate.

  • Precision-recall curve plots precision against recall.

  • Other metrics include accuracy, F1 score, and AUC-ROC.

  • Cross-validation can also be used to evaluate model performance.
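As an illustration, the confusion-matrix counts and the metrics derived from them can be computed in a few lines of Python. This is a sketch for binary labels where 1 is the positive class; the function names and the sample labels are made up for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denominator if denominator else 0.0,
    }

# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```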

1d ago

Q. What are specificity and sensitivity?

Ans.

Specificity and sensitivity are statistical measures used to evaluate the performance of a binary classification model.

  • Specificity (true negative rate) measures the proportion of actual negatives that the model correctly identifies: TN / (TN + FP).

  • Sensitivity (also known as recall or true positive rate) measures the proportion of actual positives that the model correctly identifies: TP / (TP + FN).

  • Both measures are commonly used in medical diagnostics to assess the accuracy of tests or models.

  • Specificity and sensitivity are often used ...read more
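A short Python sketch of both measures, computed from confusion-matrix counts. The counts below are hypothetical, in the spirit of a diagnostic test.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that are correctly ruled out."""
    return tn / (tn + fp)

# Hypothetical counts: 100 actual positives and 200 actual negatives.
tp, fn, tn, fp = 90, 10, 160, 40
test_sensitivity = sensitivity(tp, fn)
test_specificity = specificity(tn, fp)
```

With these counts the sensitivity is 0.9 and the specificity is 0.8.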

3d ago

Q. What is an AUC-ROC curve?

Ans.

The ROC curve is a graphical representation of the performance of a classification model, and the AUC summarizes that curve as a single number.

  • AUC-ROC stands for Area Under the Receiver Operating Characteristic curve.

  • It is used to evaluate the performance of binary classification models.

  • The curve plots the true positive rate (sensitivity) against the false positive rate (1-specificity) at various classification thresholds.

  • AUC-ROC ranges from 0 to 1, with a higher value indicating better model performance.

  • An AUC-ROC of 0.5 repres...read more
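One way to compute the AUC without drawing the curve is to use its rank interpretation: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses a simple pairwise loop, which is fine for small inputs but quadratic in general; the function name and sample scores are illustrative.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Illustrative labels and model scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```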

Q. What are the different ML algorithms? Explain them in detail.

Ans.

Various ML algorithms include linear regression, decision trees, random forests, support vector machines, and neural networks.

  • Linear Regression: Used for predicting continuous values based on input features.

  • Decision Trees: Tree-like model of decisions used for classification and regression.

  • Random Forests: Ensemble learning method using multiple decision trees for improved accuracy.

  • Support Vector Machines: Classify data by finding the hyperplane that best separates different c...read more

6d ago

Q. What is a t-test?

Ans.

A t-test is a statistical test used to determine if there is a significant difference between the means of two groups.

  • It compares the means of two groups and assesses if the difference is statistically significant.

  • It is commonly used in hypothesis testing and comparing the effectiveness of different treatments or interventions.

  • There are different types of t-tests, such as independent samples t-test and paired samples t-test.

  • The t-test calculates a t-value and p-value, where the...read more
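The t-value for the two variants named above can be sketched with the Python standard library. For the independent-samples case this sketch assumes Welch's version, which does not require equal variances; turning the t-value into a p-value needs the t distribution, which is not in the standard library, so it is left out here. The sample data is made up.

```python
import math
import statistics

def welch_t_statistic(a, b):
    """Independent-samples t-value (Welch's variant, unequal variances allowed)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def paired_t_statistic(before, after):
    """Paired-samples t-value, computed on the per-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Made-up measurements for illustration.
t_independent = welch_t_statistic([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
t_paired = paired_t_statistic([10, 12, 14, 16], [11, 14, 15, 18])
```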

Asked in Feynn Labs

3d ago

Q. What is a random forest?

Ans.

A random forest is an ensemble learning method that combines multiple decision trees to make predictions.

  • Random forest is a supervised learning algorithm.

  • It can be used for both classification and regression tasks.

  • It creates multiple decision trees and combines their predictions to make a final prediction.

  • Each decision tree is trained on a random subset of the training data and features.

  • Random forest reduces overfitting and improves accuracy compared to a single decision tree...read more
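The idea of bootstrapped trees plus majority voting can be sketched in plain Python. This is a deliberately simplified toy, not a full random forest: each "tree" is a one-split decision stump on a single randomly chosen feature, whereas real implementations grow deeper trees and sample features at every split. All names and the data are invented for the example.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature, judged by training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        return (feature, float("inf"), majority(labels), majority(labels))
    return (feature, best[1], best[2], best[3])

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        indices = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))               # random feature
        sample_rows = [rows[i] for i in indices]
        sample_labels = [labels[i] for i in indices]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Toy data: two well-separated clusters.
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Calling `predict(forest, [0.5, 1.0])` and `predict(forest, [9.5, 8.5])` returns the class of the nearer cluster on this toy data.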

Asked in TCS

5d ago

Q. What is linear regression?

Ans.

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

  • It assumes a linear relationship between the dependent and independent variables.

  • It is used to predict the value of the dependent variable based on the values of the independent variables.

  • It can be used for both simple and multiple regression analysis.

  • Example: predicting the price of a house based on its size or predicting the salary of an employee based on their years of experience.
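The house-price example can be sketched with ordinary least squares for a single independent variable. The closed-form slope and intercept below are the standard simple-regression estimates; the sizes and prices are invented and happen to lie exactly on a line.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: house size in square feet vs price in thousands.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]
slope, intercept = fit_simple_linear_regression(sizes, prices)
predicted_price = intercept + slope * 1800
```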

6d ago

Q. What is logistic regression?

Ans.

Logistic regression is a statistical method used to analyze and model the relationship between a binary dependent variable and one or more independent variables.

  • It is used to predict the probability of a binary outcome (0 or 1).

  • It is a type of regression analysis that uses a logistic function to model the relationship between the dependent and independent variables.

  • It is commonly used in machine learning and data analysis for classification problems.

  • Example: predicting whethe...read more

Asked in Target

1d ago

Q. Given two lists, find the overlapping elements. Follow-up: How would you handle duplicates? How can you optimize for space complexity?

Ans.

Identify overlapping elements in two lists with duplicates while minimizing space complexity.

  • Use a hash map to count occurrences of each element in the first list.

  • Iterate through the second list; whenever an element's remaining count is positive, add it to the overlap and decrement the count.

  • Example: For lists [1, 2, 2, 3] and [2, 2, 4], the overlap is [2, 2].

  • If duplicates do not matter, a set built from the smaller list is enough to track seen elements and reduces space usage.

  • Example: For lists [1, 2, 3] and [3, 4, 5], the overlap is [3].
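Both variants can be sketched in Python. The first function counts the elements of one list in a hash map and consumes those counts while scanning the other, so duplicates are preserved. The second ignores duplicates and stores only the smaller list in a set, keeping extra space proportional to the shorter input. The function names are illustrative.

```python
from collections import Counter

def overlap_with_duplicates(first, second):
    """Multiset intersection, in the order elements appear in `second`."""
    remaining = Counter(first)
    result = []
    for item in second:
        if remaining[item] > 0:
            result.append(item)
            remaining[item] -= 1
    return result

def overlap_unique(first, second):
    """Distinct common elements, storing only the smaller list in a set."""
    smaller, larger = (first, second) if len(first) <= len(second) else (second, first)
    unseen = set(smaller)
    result = []
    for item in larger:
        if item in unseen:
            result.append(item)
            unseen.discard(item)  # report each common element once
    return result
```

For example, `overlap_with_duplicates([1, 2, 2, 3], [2, 2, 4])` returns `[2, 2]`, and `overlap_unique([1, 2, 3], [3, 4, 5])` returns `[3]`.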

4d ago

Q. What is a z-test?

Ans.

A z-test is a statistical test used to determine whether a sample mean differs significantly from a known population mean, or whether two population means differ from each other.

  • It is used when the sample size is large and the population standard deviation is known.

  • The test compares the sample mean to the population mean using the z-score formula.

  • The z-score is calculated as the difference between the sample mean and the population mean, divided by the standard error (the population standard deviation divided by the square root of the sample size).

  • If the calculated z-score falls within the critical region, the n...read more
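A one-sample z-test can be sketched with the standard library. Note that when testing a sample mean, the divisor is the standard error, that is, the population standard deviation divided by the square root of the sample size. The two-sided p-value comes from the standard normal distribution; the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical values: sample of 100 with mean 103, population mean 100, sd 15.
z, p_value = one_sample_z_test(
    sample_mean=103.0, population_mean=100.0, population_sd=15.0, n=100
)
```

Here z is 2.0, and the p-value falls just below the conventional 0.05 level.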

Asked in TCS

5d ago

Q. What are the different data types in Python?

Ans.

Python has various data types including int, float, str, list, tuple, dict, set, bool, and more.

  • int - integer numbers (e.g. 5)

  • float - floating point numbers (e.g. 3.14)

  • str - strings (e.g. 'hello')

  • list - ordered collection of items (e.g. [1, 2, 3])

  • tuple - ordered collection of items that cannot be changed (e.g. (1, 2, 3))

  • dict - collection of key-value pairs (e.g. {'name': 'John', 'age': 30})

  • set - unordered collection of unique items (e.g. {1, 2, 3})

  • bool - boolean values True o...read more
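A quick way to check these types in an interpreter, using the same example values as the list above. The last lines illustrate two of the properties mentioned: tuples cannot be changed after creation, and sets drop duplicate items.

```python
examples = {
    "int": 5,
    "float": 3.14,
    "str": "hello",
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "dict": {"name": "John", "age": 30},
    "set": {1, 2, 3},
    "bool": True,
}

# Map each label to the name of the type Python actually reports.
type_names = {label: type(value).__name__ for label, value in examples.items()}

# Lists are mutable: appending changes the list in place.
numbers = [1, 2, 3]
numbers.append(4)

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Sets keep only unique items.
unique_items = set([1, 1, 2, 3, 3])
```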

Asked in Wabtec

6d ago

Q. Describe your current project on OCR, including validation and testing.

Ans.

Currently working on an OCR project focusing on validation and testing.

  • Developing validation strategies to ensure accuracy of OCR results

  • Creating test cases to evaluate OCR performance under different conditions

  • Utilizing ground truth data for benchmarking OCR accuracy

  • Implementing error analysis techniques to identify and address common OCR mistakes
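One common way to benchmark OCR output against ground truth is the character error rate (CER): the edit distance between the two strings divided by the length of the ground truth. The answer above does not say which metric the project uses, so CER is an assumption here, and the sample strings are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]

def character_error_rate(ground_truth, ocr_output):
    """Edit distance normalized by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return edit_distance(ground_truth, ocr_output) / len(ground_truth)

# Invented example: the OCR confuses the letter "o" with zero and zero with "O".
cer = character_error_rate("invoice 1024", "inv0ice 1O24")
```

Grouping the individual substitutions (such as "o" read as "0") is one simple starting point for the error analysis mentioned above.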

Asked in Target

5d ago

Q. Data Structures and Algorithms - longest common prefix

Ans.

Find the longest common prefix among an array of strings.

  • Iterate through the characters of the first string.

  • Compare each character with the corresponding character in other strings.

  • Stop when a mismatch is found or the end of any string is reached.

  • Example: For ['flower', 'flow', 'flight'], the longest common prefix is 'fl'.

  • If no common prefix exists, return an empty string.
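The steps above can be sketched in Python as a character-by-character scan of the first string. The function name is illustrative.

```python
def longest_common_prefix(strings):
    """Return the longest prefix shared by every string in the list."""
    if not strings:
        return ""
    first = strings[0]
    for i, char in enumerate(first):
        for other in strings[1:]:
            # Stop at the first mismatch or when a shorter string runs out.
            if i >= len(other) or other[i] != char:
                return first[:i]
    return first
```

For example, `longest_common_prefix(['flower', 'flow', 'flight'])` returns `'fl'`. The running time is proportional to the total number of characters compared, and no extra storage is needed beyond the returned slice.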

Asked in Wabtec

4d ago

Q. Explain the Transformer architecture and how it works.

Ans.

The Transformer is a deep learning architecture that uses a self-attention mechanism for sequence processing.

  • It is built on self-attention, which lets the model weigh the importance of different input tokens when making predictions.

  • It consists of an encoder and a decoder, with multiple layers of multi-head self-attention and feedforward neural networks.

  • Transformers have been widely used in natural language processing tasks, such as machine...read more
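The core of self-attention can be sketched in plain Python as scaled dot-product attention. This is a single-head toy without the learned query, key, and value projections that a real Transformer layer applies, so the same token vectors are used for all three roles; the vectors themselves are made up.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        attention = softmax(scores)
        weights.append(attention)
        outputs.append([
            sum(w * v[d] for w, v in zip(attention, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights

# Toy embeddings for three tokens (no learned projections in this sketch).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and shows how strongly one token attends to every token in the sequence, including itself.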
