Jr. Data Scientist

10+ Jr. Data Scientist Interview Questions and Answers for Freshers

Updated 9 Jul 2025

Q. What are the differences between Left and Right Join?

Ans.

A left join returns all records from the left table and the matching records from the right table. A right join returns all records from the right table and the matching records from the left table.

A left join keeps every row of the left table; where the right table has no match, the right table's columns are filled with NULL.
A right join keeps every row of the right table; where the left table has no match, the left table's columns are filled with NULL.
In SQL, a left join is written with the LEFT JOIN keyword and a right join with the RIGHT JOIN keyword.
Left join is useful whe...read more
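
The behaviour described above can be sketched with Python's built-in sqlite3 module. The employees and departments tables and their rows are made up for illustration. Because older SQLite versions do not accept RIGHT JOIN, the right join is emulated here by swapping the table order in a LEFT JOIN, which gives the same rows.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('Asha', 1), ('Ravi', 2), ('Meena', NULL);
    INSERT INTO departments VALUES (1, 'Analytics'), (3, 'Finance');
""")

# LEFT JOIN: every employee is kept; unmatched departments come back as NULL (None).
left_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# Right join of employees with departments, emulated by swapping the tables:
# every department is kept; unmatched employees come back as NULL (None).
right_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM departments AS d
    LEFT JOIN employees AS e ON e.dept_id = d.dept_id
    ORDER BY d.dept_name
""").fetchall()

print(left_rows)
print(right_rows)
```

In this toy data, Ravi and Meena survive only in the left join, and the Finance department survives only in the right join.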

Asked in TCS

3d ago

Q. What experience do you have in model deployment?

Ans.

I have experience deploying machine learning models using cloud services like AWS SageMaker and Azure ML.

Deployed a sentiment analysis model on AWS SageMaker for real-time predictions
Deployed a recommendation system model on Azure ML for batch predictions
Used Docker containers to deploy models in production environments

Q. Explain different KPIs of a Classification Model.

Ans.

The main KPIs of a classification model are accuracy, precision, recall, F1 score, the ROC curve, and the confusion matrix.

Accuracy: the proportion of all predictions that are correct, (TP + TN) / (TP + TN + FP + FN)
Precision: the proportion of predicted positives that are true positives, TP / (TP + FP)
Recall: the proportion of actual positives that are correctly predicted, TP / (TP + FN)
F1 Score: the harmonic mean of precision and recall
ROC Curve: plots the true positive rate against the false positive rate across classification thresholds
Confusion Matrix: tabulates the counts of true positives, false positives, true negatives, and false negatives
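
As a minimal sketch of how these metrics relate to the confusion matrix counts, the function below computes them for a binary problem. The labels are invented for illustration, and the function name is hypothetical rather than taken from any library.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```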

Q. Explain the underlying process of boosting and decision trees.

Ans.

Boosting is an ensemble learning technique that combines multiple weak learners to create a strong learner, often using decision trees.

Boosting is an iterative process where each weak learner is trained to correct the errors of the previous ones.
Decision trees are commonly used as the base learner in boosting algorithms like AdaBoost and Gradient Boosting.
Boosting algorithms like XGBoost and LightGBM are popular in machine learning for their high predictive accuracy.
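
The iterative error-correcting idea can be sketched with a tiny gradient boosting loop for regression under squared error, where each weak learner is a one-split decision tree (a stump) fitted to the current residuals. This is a simplified illustration under those assumptions, not how XGBoost or LightGBM are implemented; the data, function names, and learning rate are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    correction = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up one-dimensional data with three rough levels.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
for rounds in (0, 1, 20):
    print(rounds, mse(boost(xs, ys, rounds), xs, ys))
```

The training error shrinks as rounds are added because each new stump targets the residuals left by the ensemble so far.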


Asked in Worley

1d ago

Q. What is a decision tree?

Ans.

A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label.

A decision tree is a popular machine learning algorithm used for classification and regression tasks.
It breaks down a dataset into smaller subsets based on different attributes and creates a tree-like structure to make decisions.
Each internal node of the tree represents a test on ...read more
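
To make the idea of a test at an internal node concrete, here is a minimal sketch that picks the threshold for a single numeric attribute. It assumes Gini impurity as the split criterion, which is one common choice and is not stated in the answer above; the data and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold whose two subsets have the lowest weighted impurity."""
    best_threshold, best_impurity = None, float("inf")
    # Skip the largest value so the right-hand subset is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Made-up data: does a customer of a given age buy the product?
ages = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, bought))
```

Here the test "age <= 30" separates the two classes completely, so both resulting subsets would become leaf nodes.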

Asked in Techolution

4d ago

Q. What are transformers?

Ans.

Transformers are neural network architectures, widely used in natural language processing (NLP), that learn contextual relationships between words through self-attention.

Transformers use self-attention mechanisms to weigh the importance of different words in a sentence.
They have revolutionized NLP tasks such as language translation, sentiment analysis, and text generation.
Examples of transformer models include BERT, GPT-3, and RoBERTa.
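
The self-attention weighting can be sketched as scaled dot-product attention in plain Python. As a simplifying assumption, the token embeddings are used directly as queries, keys, and values; a real transformer first applies learned projection matrices and uses several attention heads. The three two-dimensional embeddings are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token's query to every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-token sentence.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token attends to every token in the sentence, including itself.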


Asked in Deloitte

6d ago

Q. Explain logistic regression.

Ans.

Logistic regression is a statistical model used to predict the probability of a binary outcome based on one or more predictor variables.

Logistic regression is used when the dependent variable is binary (0/1, True/False, Yes/No, etc.).
It estimates the probability that a given input belongs to a particular category.
The output of logistic regression is a probability score between 0 and 1.
It uses the logistic function (sigmoid function) to map the input to the output.
Example: Pre...read more
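
The mapping from inputs to a probability can be sketched as follows. The weight and bias are hypothetical values chosen for illustration, not coefficients fitted to real data, and the feature (hours studied) is an invented example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one input row."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model: probability of passing an exam given hours studied.
weights = [0.8]
bias = -4.0
for hours in (2, 5, 8):
    p = predict_proba([hours], weights, bias)
    label = 1 if p >= 0.5 else 0  # 0.5 is a common default threshold
    print(hours, round(p, 3), label)
```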

Asked in AB InBev India

5d ago

Q. Explain the mean, median, and mode on a distribution curve.

Ans.

Mean, median, and mode are measures of central tendency on a distribution curve.

Mean is the average of all the values in the distribution.
Median is the middle value when the data is arranged in ascending order.
Mode is the value that appears most frequently in the distribution.
For example, in a distribution of [2, 3, 3, 4, 5], the mean is 3.4, the median is 3, and the mode is 3.
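
The example above can be checked with Python's statistics module. The second dataset is an added illustration, not from the answer above: replacing the largest value with an outlier shows how the mean is pulled toward the long tail of a skewed distribution while the median and mode stay put.

```python
import statistics

data = [2, 3, 3, 4, 5]
mean_value = statistics.mean(data)
median_value = statistics.median(data)
mode_value = statistics.mode(data)
print(mean_value, median_value, mode_value)

# Same data with one large outlier (a right-skewed distribution).
skewed = [2, 3, 3, 4, 50]
skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)
skewed_mode = statistics.mode(skewed)
print(skewed_mean, skewed_median, skewed_mode)
```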


Asked in Accenture

1d ago

Q. What is hyperparameter tuning?

Ans.

Hyperparameter tuning is the process of selecting the best set of hyperparameters for a machine learning model.

Hyperparameters are settings chosen before training begins rather than learned from the data, such as the learning rate or the number of hidden layers.
Hyperparameter tuning involves trying out different combinations of hyperparameters to find the ones that result in the best model performance.
Techniques for hyperparameter tuning include grid search, random search, and Bayesian optimiza...read more
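
Grid search, the simplest of these techniques, can be sketched as an exhaustive loop over every combination. The validation_loss function below is a hypothetical stand-in for training a model and scoring it on validation data; in practice that step is the expensive part.

```python
from itertools import product


def validation_loss(learning_rate, n_layers):
    """Hypothetical stand-in for 'train the model, return validation loss'."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (n_layers - 2) ** 2


def grid_search(grid, loss_fn):
    """Try every combination in the grid and keep the lowest-loss one."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        loss = loss_fn(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Candidate values for each hyperparameter (illustrative only).
grid = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "n_layers": [1, 2, 3],
}
best_params, best_loss = grid_search(grid, validation_loss)
print(best_params, best_loss)
```

Random search would sample combinations from the same ranges instead of enumerating all of them, which scales better when there are many hyperparameters.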

Asked in TCS

5d ago

Q. What is linear regression?

Ans.

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

Linear regression is used to predict the value of a dependent variable based on the value of one or more independent variables.
It assumes a linear relationship between the independent and dependent variables.
The goal of linear regression is to find the best-fitting line that represents the relationship between the variables.
The equation f...read more
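
For a single independent variable, the best-fitting line in the least-squares sense has a closed-form solution. The sketch below assumes the usual form y = intercept + slope * x and uses made-up data that lies exactly on a line; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 6
print(slope, intercept, prediction)
```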

Asked in Landmark Group

5d ago

Q. What is logistic regression?

Ans.

Logistic regression is a statistical model used to predict the probability of a binary outcome based on one or more predictor variables.

Logistic regression is used when the dependent variable is binary (e.g., 0 or 1, yes or no).
It estimates the probability that a given input belongs to a certain category.
It uses the logistic function to model the relationship between the dependent variable and independent variables.
Coefficients in logistic regression represent the impact of t...read more
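
One standard way to read a logistic regression coefficient is through the odds: increasing a feature by one unit multiplies the odds of the positive outcome by exp(coefficient). The sketch below checks this numerically with hypothetical coefficient values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


# Hypothetical intercept and coefficient for a single feature x.
beta0, beta1 = -1.0, 0.7

p_at_2 = sigmoid(beta0 + beta1 * 2)
p_at_3 = sigmoid(beta0 + beta1 * 3)

# Ratio of the odds after a one-unit increase in x.
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(odds_ratio, math.exp(beta1))
```

The two printed values agree up to floating-point error, regardless of which starting value of x is used.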

Asked in Anblicks

6d ago

Q. What is the bias-variance tradeoff?

Ans.

The bias-variance tradeoff is the balance between underfitting (high bias) and overfitting (high variance) in machine learning models.

Bias is error from erroneous assumptions in the learning algorithm, leading to underfitting.
Variance is error from sensitivity to fluctuations in the training data, leading to overfitting.
Finding the right balance between bias and variance is crucial for optimal model performance.
Regularization techniques like Lasso and Ridge regression can help in...read more
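
As one concrete illustration of regularization, the sketch below fits a single-feature ridge regression without an intercept, where the penalized least-squares slope has the closed form sum(x*y) / (sum(x*x) + lambda). The data and penalty values are made up. A larger penalty shrinks the coefficient toward zero, which adds some bias in exchange for a fit that is less sensitive to the training data.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b*x)^2) + lam * b^2 (no intercept)."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    return s_xy / (s_xx + lam)


# Made-up data, roughly y = 2x with noise.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

lambdas = [0.0, 1.0, 10.0, 100.0]
slopes = [ridge_slope(xs, ys, lam) for lam in lambdas]
for lam, b in zip(lambdas, slopes):
    print(lam, round(b, 3))
```

With lambda set to 0 the result is the ordinary least-squares slope; Lasso uses an absolute-value penalty instead and can shrink coefficients exactly to zero.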