Data Scientist

800+ Data Scientist Interview Questions and Answers

Updated 28 Feb 2025

Q151. What is Reinforcement Learning? How did you use it in creating Automated Trading Systems?

Ans.

Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment.

  • Reinforcement Learning involves an agent learning to take actions in an environment to maximize some notion of cumulative reward.

  • It uses a trial and error approach, where the agent learns from the consequences of its actions.

  • In Automated Trading Systems, Reinforcement Learning can be used to optimize trading strategies by learning from past market data.

Q152. What is the difference between Adam optimizer and Gradient Descent Optimizer?

Ans.

Adam optimizer is an extension of the Gradient Descent optimizer that adds per-parameter adaptive learning rates and momentum.

  • Adam optimizer combines the benefits of both AdaGrad and RMSProp optimizers.

  • Adam optimizer uses adaptive learning rates for each parameter.

  • Gradient Descent optimizer has a fixed learning rate for all parameters.

  • Adam optimizer includes momentum to speed up convergence.

  • Batch gradient descent computes the gradient on the entire dataset at every step; stochastic and mini-batch variants use a single example or a small batch instead.

  • Example: Adam o…
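A minimal sketch contrasting the two update rules on a toy one-parameter loss f(w) = (w - 3)^2. The loss, learning rates and step counts are illustrative assumptions; the Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)


def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # same fixed step size, scaled by the raw gradient
    return w


def adam(w, lr=0.01, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive, per-parameter step
    return w


print(gradient_descent(0.0))
print(adam(0.0, steps=1))  # first Adam step has size about lr, whatever the gradient's scale
print(adam(0.0))
```

Because Adam divides by the running root-mean-square of the gradient, its step size is governed mainly by `lr`, while plain gradient descent takes steps proportional to the raw gradient.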

Q153. How many weights are there in the given neural network layer?

Ans.

The number of weights in a fully connected (dense) layer depends on the number of neurons in the layer and the number of neurons in the previous layer.

  • Number of weights = (number of neurons in current layer) * (number of neurons in previous layer)

  • Each neuron in the current layer is connected to each neuron in the previous layer, hence a weight for each connection

  • For example, if a layer has 5 neurons and the previous layer has 3 neurons, there are 5*3 = 15 weights, plus 5 biases (one per neuron), giving 20 trainable parameters
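The counting rule for a dense layer fits in a few lines. This is a small illustrative helper (the function name is hypothetical), assuming one bias per neuron as in standard dense layers.

```python
def dense_layer_params(n_inputs: int, n_neurons: int) -> dict:
    """Trainable parameters of one fully connected layer."""
    weights = n_inputs * n_neurons  # one weight per input-to-neuron connection
    biases = n_neurons              # one bias per neuron in this layer
    return {"weights": weights, "biases": biases, "total": weights + biases}


print(dense_layer_params(3, 5))  # {'weights': 15, 'biases': 5, 'total': 20}
```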

Q154. How does Isolation Forest work? Evaluation metrics in layman's terms, PySpark basics, joblib

Ans.

Isolation Forest is an anomaly detection algorithm that works by isolating outliers in a dataset.

  • Isolation Forest is an unsupervised machine learning algorithm used for anomaly detection.

  • It works by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

  • The number of splits (path length) needed to isolate a point, averaged over many random trees, is used as a measure of its abnormality: outliers are isolated in fewer splits than normal points.

  • Evaluation metrics for Isolation Forest in layman's terms…
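The isolation idea can be sketched from scratch in plain Python. This is a simplified illustration, not a library implementation (in practice you would use something like scikit-learn's `IsolationForest`); the subsample size of 64, the 100 trees and the synthetic data are assumptions. Each random tree is built lazily along the scored point's own path, which gives the same path length as building the whole tree and then traversing it.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points.
    Normalises path lengths and estimates the remaining depth of unsplit nodes."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def path_length(x, data, rng, depth, max_depth):
    """Depth at which x gets isolated in one random tree."""
    if len(data) <= 1 or depth >= max_depth:
        return depth + c(len(data))
    f = rng.randrange(len(x))                  # random feature
    lo = min(p[f] for p in data)
    hi = max(p[f] for p in data)
    if lo == hi:                               # constant feature: stop here
        return depth + c(len(data))
    split = rng.uniform(lo, hi)                # random split between min and max
    same_side = [p for p in data if (p[f] < split) == (x[f] < split)]
    return path_length(x, same_side, rng, depth + 1, max_depth)


def anomaly_score(x, data, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1]: close to 1 suggests an anomaly, well below 0.5 looks normal."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))          # assumes at least 2 data points
    max_depth = math.ceil(math.log2(psi))
    total = 0.0
    for _ in range(n_trees):
        sample = rng.sample(data, psi)         # each tree sees a small random subsample
        total += path_length(x, sample, rng, 0, max_depth)
    return 2.0 ** (-(total / n_trees) / c(psi))


gen = random.Random(42)
normal = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]  # synthetic inliers
print(anomaly_score((0.0, 0.0), normal))  # typical point: lower score
print(anomaly_score((8.0, 8.0), normal))  # far-away point: higher score
```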


Q155. Why do we use bias in neural networks? Why do we scale our data for neural networks?

Ans.

A bias term lets each neuron shift its activation function, so the network can fit patterns that do not pass through the origin. Scaling the data improves convergence and performance.

  • Bias in neural networks helps in shifting the activation function to better fit the data.

  • It allows the model to capture the underlying patterns in the data by providing flexibility in the decision boundary.

  • Scaling puts features on comparable ranges, so the gradients for different weights have similar magnitudes and gradient descent converges faster.

  • It also helps in improving the performance…
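A minimal sketch of z-score standardization, one common way to scale inputs. The feature values are hypothetical; in practice the mean and standard deviation should be computed on the training split only and reused for validation and test data.

```python
import statistics


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [(v - mu) / sigma for v in values]


ages = [22, 35, 47, 51, 64]                          # hypothetical feature, years
incomes = [28_000, 52_000, 61_000, 75_000, 120_000]  # hypothetical feature, rupees
print(standardize(ages))
print(standardize(incomes))  # now on the same scale as the ages
```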

Q156. What is the learning factor? Assumptions of linear regression, model evaluation, bias-variance trade-off, gradient descent, AUC-ROC curve, Python function writing (string-based)

Ans.

Learning factor, assumptions for linear regression, model evaluation, bias-variance trade-off, gradient descent, AUC-ROC curve, Python function writing.

  • The learning factor (learning rate) is the step size that controls how much a model's parameters change on each update, most often in gradient descent.

  • Assumptions for linear regression include linearity, independence, homoscedasticity, and normality of residuals.

  • Model evaluation involves metrics like mean squared error and R-squared, and cross-validation techniques…
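To make the role of the learning rate concrete, here is a minimal sketch of batch gradient descent fitting a straight line by mean squared error. The toy data and the learning rates are assumptions; for this particular data, rates above roughly 0.15 make the updates diverge.

```python
def fit_line_gd(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w  # the learning rate scales every update
        b -= learning_rate * grad_b
    return w, b


xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]  # toy data generated from y = 2x + 1
print(fit_line_gd(xs, ys))                                 # close to (2.0, 1.0)
print(fit_line_gd(xs, ys, learning_rate=0.2, epochs=50))   # too large: diverges
```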


Q157. With minimal data, which model would you train for a categorical prediction?

Ans.

I would train a decision tree model as it can handle categorical data well with minimal data.

  • Decision tree models are suitable for categorical prediction with minimal data

  • They can handle both numerical and categorical data

  • Decision trees are easy to interpret and visualize

  • Examples: predicting customer churn, classifying spam emails

Q158. If you had to sell a kids' account, how would you target the customer? What features would you build?

Ans.

To target customers for a kids account, focus on features like parental controls, educational content, and interactive games.

  • Implement parental controls to assure parents of child safety online.

  • Include educational content to attract parents looking for learning opportunities.

  • Incorporate interactive games to engage children and make the account more appealing.

  • Offer rewards or incentives for completing educational activities to encourage usage.

  • Provide progress tracking features…


Q159. What splitting criteria are used in decision trees? Why does a random forest work better?

Ans.

Different splitting criteria in decision trees include Gini impurity, entropy, and misclassification error. Random forest works better due to ensemble learning and reducing overfitting.

  • Splitting criteria in decision trees: Gini impurity, entropy, misclassification error

  • Random forest works better due to ensemble learning and reducing overfitting

  • Random forest combines multiple decision trees to improve accuracy and generalization

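  • Random forest introduces randomness in feature selection (each split considers a random subset of features) and trains each tree on a bootstrap sample of the rows, which decorrelates the trees so that combining them reduces variance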
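The three splitting criteria can be computed directly from the class labels in a node. A minimal sketch using the standard formulas (helper names are illustrative); a tree picks the split whose size-weighted child impurity is lowest.

```python
import math
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def misclassification_error(labels):
    return 1.0 - max(Counter(labels).values()) / len(labels)


def weighted_impurity(left, right, impurity=gini):
    """Impurity of a split: child impurities weighted by child size (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


node = ["yes", "yes", "no", "no"]
print(gini(node), entropy(node), misclassification_error(node))  # 0.5 1.0 0.5
print(weighted_impurity(["yes", "yes"], ["no", "no"]))           # 0.0, a perfect split
```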

Q160. What is your current CTC and what is your expected CTC?

Ans.

I am currently earning X amount and looking for a competitive offer based on my skills and experience.

  • Be honest about your current salary

  • Research the market rates for Data Scientists in your area

  • Consider your experience, skills, and the value you bring to the company when stating your expected CTC

Q161. What are bias and variance? What assumptions do different machine learning algorithms make? Explain the architecture of a CNN.

Ans.

Bias and variance are two sources of error in machine learning models. Different machine learning algorithms have different assumptions. CNN architecture consists of convolutional layers, pooling layers, and fully connected layers.

  • Bias is the error due to overly simplistic assumptions in the model, while variance is the error due to the model's sensitivity to small fluctuations in the training data, which is typical of overly complex models.

  • Different machine learning algorithms have different assumptions, such as linear regression assuming a linear relationship…
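The conv → pool → fully connected pipeline can be sketched as a single forward pass in NumPy. This is a toy illustration under assumed sizes (one 8x8 grayscale image, one 3x3 filter, 10 output classes) with random weights instead of trained ones; real CNNs stack many filters and layers and learn the weights by backpropagation.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D convolution, stride 1, no padding (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


def tiny_cnn_forward(image, kernel, dense_w, dense_b):
    features = np.maximum(conv2d(image, kernel), 0.0)  # convolutional layer + ReLU
    pooled = max_pool(features)                        # pooling layer
    flat = pooled.ravel()                              # flatten
    return dense_w @ flat + dense_b                    # fully connected layer -> class scores


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))     # 8x8 input -> 6x6 after conv -> 3x3 after pooling
kernel = rng.normal(size=(3, 3))
dense_w = rng.normal(size=(10, 9))  # 9 pooled values -> 10 class scores
dense_b = np.zeros(10)
scores = tiny_cnn_forward(image, kernel, dense_w, dense_b)
print(scores.shape)  # (10,)
```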

Q162. Explain the EDA, features and metrics used in the assignment.

Ans.

EDA involved exploratory analysis of data to identify patterns and insights. Features included demographic and behavioral data. Metrics used were accuracy, precision, recall, and F1 score.

  • EDA involved data cleaning, visualization, and statistical analysis

  • Features included age, gender, income, education, and purchase history

  • Metrics used were accuracy, precision, recall, and F1 score to evaluate model performance

  • Exploratory analysis revealed a correlation between age and purchase…
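The four metrics listed above all come from the confusion matrix. A minimal sketch with dummy labels (an assumption, not data from the assignment), where 1 is the positive class:

```python
def confusion_counts(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2) as TP, FP, FN, TN
#               predicted 1   predicted 0
# actual 1        TP = 3        FN = 1
# actual 0        FP = 2        TN = 2
print(classification_metrics(y_true, y_pred))
```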

Q163. How do you build a random forest?

Ans.

To build a random forest, you need to gather and preprocess data, select features, train many decision trees on random subsets of the data, and combine them into an ensemble.

  • Gather and preprocess data from various sources

  • Select relevant features for the model

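  • Train individual decision trees, each on a bootstrap sample of the rows and considering a random subset of features at each split

  • Combine the decision trees into an ensemble by majority vote (classification) or averaging (regression)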

  • Evaluate the performance of the random forest model
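A toy sketch of the bagging-plus-voting structure, using depth-1 trees (decision stumps) so it stays short. This is a simplification under stated assumptions: real random forests grow deep trees and draw the random feature subset at every split (often about sqrt(d) features for classification); with stumps there is only one split per tree, so the per-tree subset plays that role here. All names and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-1 decision tree: the single threshold split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        label = majority(y)
        return lambda row: label
    _, f, thr, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= thr else r_lab


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        feats = rng.sample(range(d), max_features)   # random feature subset
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return trees


def predict(trees, row):
    return majority(tree(row) for tree in trees)     # majority vote across trees


# Hypothetical data: feature 0 separates the classes, feature 1 is noise
X = [[i, (i * 7) % 5] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
forest = fit_forest(X, y, n_trees=25, max_features=2)
print(predict(forest, [0, 0]), predict(forest, [19, 3]))  # 0 1
```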

Q164. What do you know about machine learning? What do you know about pandas?

Ans.

Pandas is a Python library used for data manipulation and analysis. Machine learning is a subset of artificial intelligence.

  • Pandas is used for data cleaning, preparation, and analysis

  • It provides data structures like DataFrame and Series

  • Machine learning involves training models to make predictions or decisions based on data

  • Supervised learning, unsupervised learning, and reinforcement learning are common types of machine learning

  • Examples of machine learning applications include…
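A short pandas example of the DataFrame/Series structures and the filtering and grouping operations mentioned above. The sales records are hypothetical.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "units": [10, 4, 6, 8, 12],
})

print(df.head())                 # first rows of the DataFrame
print(df["units"].describe())    # a single column is a Series; summary statistics
big_orders = df[df["units"] > 5]                         # boolean filtering
units_by_region = df.groupby("region")["units"].sum()    # split-apply-combine
print(big_orders)
print(units_by_region)
```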

Q165. What is cross-validation? Explain it in layman's terms. What is precision? What is a confusion matrix? Draw one with dummy data.

Ans.

Cross validation is a technique to evaluate the performance of a model by splitting the data into multiple subsets.

  • Cross validation helps in assessing how well a model generalizes to new data.

  • It involves splitting the data into training and testing sets multiple times to get a more reliable estimate of model performance.

  • Common types of cross validation include k-fold cross validation and leave-one-out cross validation.
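In layman's terms, k-fold cross-validation is like splitting a question bank into k parts and taking k practice tests, each time studying from k-1 parts and testing on the remaining one. A minimal sketch of how the folds are formed (the function name is illustrative; it does not shuffle, which you would normally do first):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is used for testing exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, k=5):
    print("test on", test_idx, "train on", len(train_idx), "samples")
```

The model's score is then averaged over the k test folds, which gives a more stable estimate than a single train/test split.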

Q166. Write a Python program to find the employees who belong to the sales department.

Ans.

Python program to find employees in sales department

  • Use a list of dictionaries to store employee data with 'department' key

  • Iterate through the list and check if 'department' key has value 'sales'

  • Return a list of employees who belong to sales department
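The steps above translate almost directly into a list comprehension. The employee records are hypothetical, and the case-insensitive comparison is a design choice added here so that "Sales" and "sales" both match.

```python
employees = [
    {"name": "Asha", "department": "sales"},
    {"name": "Ravi", "department": "hr"},
    {"name": "Meera", "department": "Sales"},
    {"name": "John", "department": "it"},
]


def employees_in_department(records, department="sales"):
    """Return the records whose 'department' value matches, ignoring case."""
    target = department.lower()
    return [e for e in records if e.get("department", "").lower() == target]


print([e["name"] for e in employees_in_department(employees)])  # ['Asha', 'Meera']
```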

Q167. ASCII values of the alphabet (both capital and small letters)

Ans.

An ASCII value is the numeric code assigned to a character; ASCII assigns codes to both capital and small letters.

  • ASCII values range from 65 to 90 for capital letters A to Z.

  • ASCII values range from 97 to 122 for small letters a to z.

  • For example, the ASCII value of 'A' is 65 and the ASCII value of 'a' is 97.
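Python's built-in `ord` and `chr` convert between characters and their codes, so the table can be printed directly:

```python
import string

# Map every letter to its ASCII code: 'A'-'Z' -> 65-90, 'a'-'z' -> 97-122
ascii_codes = {ch: ord(ch) for ch in string.ascii_uppercase + string.ascii_lowercase}

for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase):
    print(f"{upper} = {ord(upper):3d}    {lower} = {ord(lower):3d}")

# The capital and small forms of a letter differ by exactly 32
print(chr(ord("g") - 32))  # G
```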

Q168. Have you worked on customer segmentation?

Ans.

Yes, I have worked on customer segmentation.

  • I have used clustering algorithms like K-means and hierarchical clustering to segment customers based on their behavior and demographics.

  • I have also used decision trees and random forests to identify the most important features for segmentation.

  • I have experience with both supervised and unsupervised learning techniques for customer segmentation.

  • I have worked on projects where the goal was to identify high-value customers, churn prediction…
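K-means, the first technique mentioned, alternates between assigning each customer to the nearest centroid and moving each centroid to the mean of its customers. A minimal from-scratch sketch (plain Lloyd's algorithm with random initialization); the customer features are hypothetical and should normally be scaled first.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (annual spend in thousands, visits per month)
customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```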

Q169. What is the code to determine and print a happy number?

Ans.

A happy number is a number that eventually reaches 1 when it is repeatedly replaced by the sum of the squares of its digits.

  • Create a function to determine if a number is happy by repeatedly squaring the digits and summing them until the result is 1 or a cycle is detected.

  • Use a set to keep track of seen numbers to detect cycles.

  • Example: For number 19, the process would be 1^2 + 9^2 = 82, 8^2 + 2^2 = 68, 6^2 + 8^2 = 100, 1^2 + 0^2 + 0^2 = 1, so 19 is a happy number.
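The two bullets above (repeat the digit-square sum, use a set to detect cycles) give this implementation:

```python
def sum_of_squared_digits(n: int) -> int:
    return sum(int(d) ** 2 for d in str(n))


def is_happy(n: int) -> bool:
    seen = set()
    while n != 1 and n not in seen:  # stop at 1 or when a number repeats (a cycle)
        seen.add(n)
        n = sum_of_squared_digits(n)
    return n == 1


print(is_happy(19))                                # True
print([n for n in range(1, 50) if is_happy(n)])    # [1, 7, 10, 13, 19, 23, 28, 31, 32, 44, 49]
```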

Q170. What is the transformer architecture in the context of neural networks?

Ans.

Transformer architecture is a type of neural network architecture commonly used in natural language processing tasks.

  • Utilizes self-attention mechanism to weigh the importance of different words in a sentence

  • Consists of encoder and decoder layers for tasks like machine translation

  • Introduced by the paper 'Attention is All You Need' by Vaswani et al.

  • Popular Transformer-based models include BERT, GPT, and Transformer-XL
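The self-attention mechanism at the core of the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch; the embeddings and projection matrices are random stand-ins for learned weights, and masking, multiple heads and positional encodings are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row: how much one token attends to the others
    return weights @ V, weights


rng = np.random.default_rng(0)
tokens, d_model = 4, 8
X = rng.normal(size=(tokens, d_model))   # stand-in embeddings for a 4-token sentence
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```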

Q171. How do you approach the project if you are using logistic regression model?

Ans.

Approach involves data preprocessing, model training, evaluation, and interpretation.

  • Perform data preprocessing such as handling missing values, encoding categorical variables, and scaling features.

  • Split the data into training and testing sets.

  • Train the logistic regression model on the training data.

  • Evaluate the model using metrics like accuracy, precision, recall, and F1 score.

  • Interpret the model coefficients to understand the impact of features on the target variable.

  • Iterate…
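The workflow above can be sketched end to end with NumPy, training logistic regression by gradient descent on the log-loss. The synthetic data, the 150/50 split, the learning rate and the iteration count are all assumptions for illustration; in practice a library implementation would normally be used.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: label is 1 when the two features sum to a positive number
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# 1. Split into training and testing sets
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

# 2. Scale features using training-set statistics only
mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sigma

# 3. Train: gradient descent on the average log-loss
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(Xs[train] @ w + b)))
    w -= lr * Xs[train].T @ (p - y[train]) / len(train)
    b -= lr * np.mean(p - y[train])

# 4. Evaluate on the held-out test set
pred = (1.0 / (1.0 + np.exp(-(Xs[test] @ w + b))) >= 0.5).astype(float)
accuracy = np.mean(pred == y[test])
print(f"test accuracy: {accuracy:.2f}")

# 5. Interpret: coefficients on standardized features are directly comparable
print("coefficients:", w, "intercept:", b)
```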

Q172. What are the two schools of thought in statistics?

Ans.

The two schools of thought in statistics are frequentist and Bayesian.

  • Frequentist approach focuses on using sample data to make inferences about a population, based on the frequency of events.

  • Bayesian approach incorporates prior knowledge and beliefs into the analysis, updating probabilities as new data is collected.
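A small coin example shows the difference. The frequentist estimate of P(heads) is the observed frequency; the Bayesian estimate combines a prior with the data. The uniform Beta(1, 1) prior and the counts are assumptions; with a Beta prior the posterior is again a Beta distribution.

```python
heads, flips = 7, 10

# Frequentist: maximum likelihood estimate of P(heads)
mle = heads / flips

# Bayesian: Beta(a, b) prior + data -> Beta(a + heads, b + tails) posterior
prior_a, prior_b = 1, 1  # uniform prior (an assumption)
post_a, post_b = prior_a + heads, prior_b + (flips - heads)
posterior_mean = post_a / (post_a + post_b)

print(mle, round(posterior_mean, 3))  # 0.7 0.667
```

With more flips the posterior mean moves toward the observed frequency, so the two approaches agree more closely as data accumulates.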

Q173. How do you choose an ML algorithm based on the data given?

Ans.

ML algorithm selection is based on data characteristics, problem type, and desired outcomes.

  • Understand the problem type (classification, regression, clustering, etc.)

  • Consider the size and quality of the data

  • Evaluate the complexity of the model and interpretability requirements

  • Choose algorithms based on their strengths and weaknesses for the specific task

  • Experiment with multiple algorithms and compare their performance

  • For example, use decision trees for classification tasks, l…

Q174. Statistical significance: what is it, and how do you use hypothesis testing?

Ans.

Statistical significance means that an observed result would be unlikely to arise through random chance alone, that is, if the null hypothesis (no effect or no relationship) were true.

  • A relationship between variables is called statistically significant when data at least as extreme would be improbable under random chance alone.

  • Hypothesis testing is a common method to determine statistical significance by comparing observed data to what would be expected by chance.

  • A p-value (the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true) is compared with a significance level chosen in advance, commonly 0.05; a lower p-value indicates stronger evidence against the null hypothesis.
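A worked hypothesis test: is a coin that shows 60 heads in 100 flips fair? This sketch uses a two-sided one-sample z-test for a proportion (the normal approximation to the binomial); the counts and the 0.05 significance level are assumptions.

```python
import math


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_z_test(successes: int, n: int, p0: float = 0.5):
    """Two-sided one-sample z-test for a proportion (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error if H0 is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


z, p = proportion_z_test(60, 100)  # H0: the coin is fair (p0 = 0.5)
alpha = 0.05
print(round(z, 2), round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
# 2.0 0.0455 reject H0
```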

Q175. What is one-vs-one classification?

Ans.

One-vs-one is a strategy for multiclass classification in which a separate binary classifier is trained for every pair of classes.

  • It is used when there are more than two classes in the dataset.

  • It involves training one binary classifier for each pair of classes, using only the samples of those two classes.

  • The final prediction is made by voting: each binary classifier votes for one class of its pair, and the class with the most votes wins.

  • Example: In a dataset with 5 classes, 5*4/2 = 10 binary classifiers are trained, one for each pair of classes.
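The pairing and voting logic can be sketched without any real model. Here `pairwise_winner` is a hypothetical stand-in for the trained binary classifiers (it returns the winning class of a pair for one sample).

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All unordered class pairs: K classes give K*(K-1)/2 binary problems."""
    return list(combinations(classes, 2))


def ovo_predict(pairwise_winner, classes):
    """Majority vote over the pairwise classifiers for one sample."""
    votes = Counter(pairwise_winner(a, b) for a, b in ovo_pairs(classes))
    return votes.most_common(1)[0][0]


def toy_winner(a, b):
    # Hypothetical classifiers: any pair involving "C" votes "C", otherwise the first class
    return "C" if "C" in (a, b) else a


classes = ["A", "B", "C", "D", "E"]
print(len(ovo_pairs(classes)))             # 10
print(ovo_predict(toy_winner, classes))    # C (4 votes, versus 3 for A)
```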

Q176. What is YOLO in object detection, and why is it efficient?

Ans.

YOLO is an acronym for You Only Look Once, a real-time object detection system that uses a single neural network.

  • YOLO is a popular object detection algorithm that uses a single neural network to detect objects in real time.

  • It divides the image into a grid and predicts the bounding boxes and class probabilities for each grid cell.

  • YOLO is efficient because it needs only a single forward pass through the neural network to make predictions.

  • It can detect multiple objects in a…
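Because YOLO predicts many overlapping boxes per object, its raw output is usually cleaned up with non-maximum suppression (NMS) based on intersection over union (IoU). A minimal class-agnostic sketch; the boxes, scores and 0.5 threshold are assumptions, and real pipelines typically run NMS per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: box 1 overlaps box 0 (IoU about 0.68)
```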

Q177. Why is your CGPA so low?

Ans.

My CGPA is low because I focused more on gaining practical experience through internships and projects.

  • I prioritized gaining practical experience over theoretical knowledge

  • I took up internships and projects to gain hands-on experience

  • I believe practical experience is more valuable than just academic grades

Q178. What are joints? What is linear search? What is your hobby?

Ans.

Joints are connections between bones that allow movement and provide support to the body.

  • Joints are found throughout the body, such as the knee, elbow, and shoulder.

  • They are made up of bones, cartilage, ligaments, and synovial fluid.

  • Joints enable various types of movements, including flexion, extension, rotation, and abduction.

  • Different types of joints include hinge joints, ball-and-socket joints, and pivot joints.

  • Joint problems can lead to conditions like arthritis and joint…

Q179. What is the difference between overfitting and underfitting in machine learning?

Ans.

Overfitting occurs when a model learns the training data too well, while underfitting occurs when a model is too simple to capture the underlying patterns.

  • Overfitting: Model performs well on training data but poorly on unseen data.

  • Underfitting: Model is too simple and fails to capture the underlying patterns in the data.

  • Overfitting can be detected with cross-validation and reduced with techniques like regularization and early stopping.

  • Underfitting can be addressed by increasing model complexity…
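Polynomial fits of increasing degree illustrate both failure modes. In this sketch (the noisy quadratic data and the degrees 1, 2 and 12 are assumptions), the straight line underfits with high error everywhere, while the degree-12 polynomial drives training error down and typically does worse on unseen points than the degree-2 fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 15)
y_train = x_train ** 2 + rng.normal(scale=0.1, size=x_train.size)  # true curve is quadratic
x_test = np.linspace(-0.95, 0.95, 50)
y_test = x_test ** 2 + rng.normal(scale=0.1, size=x_test.size)


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


results = {}
for degree in (1, 2, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(y_train, np.polyval(coeffs, x_train)),
                       mse(y_test, np.polyval(coeffs, x_test)))
    print(degree, "train MSE %.4f, test MSE %.4f" % results[degree])
```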

Q180. What is the universal approximation theorem?

Ans.

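The universal approximation theorem states that a neural network with a single hidden layer, given enough neurons and a suitable (non-polynomial) activation function, can approximate any continuous function on a closed and bounded domain to any desired accuracy.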

  • A neural network with a single hidden layer can approximate any continuous function

  • It is a fundamental theorem in the field of deep learning

  • The theorem applies to a wide range of activation functions

  • The number of neurons required in the hidden layer may vary depending on the complexity of the function

  • The theorem does not guarantee the accuracy of the…
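One common form of the statement, for a continuous non-polynomial activation function σ (such as the sigmoid or ReLU): for every continuous function f on a compact set K and every tolerance ε, some single-hidden-layer network stays within ε of f everywhere on K. The number of hidden neurons N depends on f and ε, and the statement says nothing about how to find the weights by training.

```latex
\forall f \in C(K),\; K \subset \mathbb{R}^n \text{ compact},\; \forall \varepsilon > 0:\quad
\exists\, N \in \mathbb{N},\; v_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n \text{ such that }
\sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \Big| < \varepsilon
```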

Q181. What are AIC and BIC in linear regression?

Ans.

AIC and BIC are statistical measures used to compare linear regression models by balancing goodness of fit against model complexity.

  • AIC stands for Akaike Information Criterion and BIC stands for Bayesian Information Criterion.

  • Both AIC and BIC are used to compare different models and select the best one.

  • AIC penalizes complex models less severely than BIC: AIC adds 2 per parameter, while BIC adds ln(n), which is larger once the sample size n is 8 or more.

  • Lower AIC/BIC values indicate a better balance between fit and complexity.

  • AIC and BIC can be calculated using the log-likelihood function and the number of parameters.
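The formulas are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. For least-squares regression with normal errors, the maximised log-likelihood can be written in terms of the residual sum of squares (RSS). The numbers below are hypothetical; conventions differ on whether the error variance counts toward k, so compare models using one convention consistently.

```python
import math


def gaussian_log_likelihood(rss: float, n: int) -> float:
    """Maximised log-likelihood of a linear regression with normal errors (sigma^2 = RSS / n)."""
    return -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(log_lik: float, k: int) -> float:
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_lik


n, rss, k = 100, 50.0, 3  # hypothetical fit: 100 rows, 3 estimated parameters
ll = gaussian_log_likelihood(rss, n)
print(round(aic(ll, k), 2), round(bic(ll, k, n), 2))  # 220.47 228.29
```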

Q182. What is VIF, and how is it calculated?

Ans.

VIF stands for Variance Inflation Factor, a measure of multicollinearity in regression analysis.

  • VIF is calculated for each predictor variable in a regression model.

  • It measures how much the variance of the estimated regression coefficient is increased due to multicollinearity.

  • A VIF of 1 indicates no multicollinearity, while a VIF greater than 1 indicates increasing levels of multicollinearity.

  • VIF is calculated as 1 / (1 - R^2), where R^2 is the coefficient of determination from regressing that predictor on all the other predictors.
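Following that definition, each predictor is regressed on the others (with an intercept) and its R^2 plugged into 1 / (1 - R^2). A minimal NumPy sketch on synthetic data, where x2 is built to be strongly correlated with x1 and x3 is independent:

```python
import numpy as np


def vif(X):
    """VIF of each column of X (rows = observations), regressing it on the others plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return out


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.3, size=200)  # strongly correlated with x1
x3 = rng.normal(size=200)                  # independent of both
print([round(v, 2) for v in vif(np.column_stack([x1, x2, x3]))])
```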

Q183. Explain the YOLO architecture. How does it differ from SSD?

Ans.

YOLO (You Only Look Once) is a real-time object detection system that processes images in a single pass, while SSD (Single Shot MultiBox Detector) is another object detection model that also aims for real-time processing but uses a different approach.

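  • Both are single-stage detectors that make all predictions in one forward pass. The original YOLO predicts from a single final feature map, while SSD predicts from several feature maps of different resolutions, which helps it detect small objects.

  • SSD places default boxes of different aspect ratios and scales on each of these feature maps, while YOLO divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.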

Q184. What hyperparameters did you tune for your model?

Ans.

Tuned hyperparameters include learning rate, batch size, number of layers, and activation functions.

  • Adjusted learning rate to improve model convergence

  • Optimized batch size for better training efficiency

  • Experimented with different numbers of layers to find optimal model complexity

  • Tried various activation functions to enhance model performance

Q185. What is the difference between random forest and XGBoost?

Ans.

Random forest is an ensemble learning method using decision trees, while XGBoost is a gradient boosting algorithm.

  • Random forest builds multiple decision trees and combines their predictions, while XGBoost builds trees sequentially to correct errors.

  • Random forest is generally less prone to overfitting than XGBoost.

  • XGBoost is computationally efficient and often outperforms random forest in terms of predictive accuracy.

  • Random forest is easier to tune and less sensitive to hyperparameters.

Q186. How do you find the largest number in a list without using built-in functions?

Ans.

Iterate through the list and compare each element to find the largest number.

  • Iterate through the list using a loop

  • Compare each element with a variable storing the current largest number

  • Update the variable if a larger number is found
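The three steps above as a function that avoids `max()`:

```python
def largest(numbers):
    if not numbers:
        raise ValueError("an empty list has no largest element")
    current = numbers[0]          # assume the first element is the largest so far
    for value in numbers[1:]:
        if value > current:       # found a larger number: remember it
            current = value
    return current


print(largest([4, -2, 17, 9, 17, 3]))  # 17
```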

Q187. How does the work culture of a company affect work efficiency?

Ans.

Work culture greatly impacts work efficiency by influencing employee motivation, collaboration, and overall job satisfaction.

  • Positive work culture promotes employee motivation and engagement, leading to higher productivity.

  • Open communication and collaboration in the work culture can improve teamwork and problem-solving abilities.

  • A supportive work environment can boost employee morale and job satisfaction, reducing turnover rates.

  • Negative work culture, such as toxic behavior or…

Q188. How do RNNs handle exploding/vanishing gradients?

Ans.

RNNs rely on techniques like gradient clipping, careful weight initialization, and LSTM/GRU cells to handle exploding/vanishing gradients.

  • Gradient clipping limits the magnitude of gradients during backpropagation.

  • Weight initialization techniques like Xavier (or orthogonal) initialization help in preventing vanishing gradients.

  • LSTM/GRU cells have gating mechanisms that allow the network to selectively remember or forget information.

  • Normalization layers (layer normalization is the usual choice in RNNs) can also help stabilize the gradients.

  • Exploding…
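Gradient clipping by global norm, the first technique listed, rescales the whole gradient vector when its length exceeds a threshold, keeping its direction. A minimal sketch on a plain list of gradient values (the threshold is an assumption; deep learning frameworks provide built-in versions):

```python
import math


def clip_by_global_norm(grads, max_norm):
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)             # small enough: leave unchanged
    scale = max_norm / total_norm      # shrink so the new norm equals max_norm
    return [g * scale for g in grads]


print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))  # about [0.6, 0.8]
```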

Q189. RNN and CNN: what is the difference between the two?

Ans.

RNN and CNN are neural network architectures used for different types of data.

  • RNN is used for sequential data like time series, text, speech, etc.

  • CNN is used for grid-like data like images, videos, etc.

  • RNN has feedback connections while CNN has convolutional layers.

  • RNNs can handle variable-length input, while CNNs with fully connected output layers typically require fixed-size input.

  • Both can be used for classification, regression, and generation tasks.

Q190. Slope vs. gradient (again, not in relation to machine learning, and in plain English)

Ans.

Slope and gradient both measure steepness, but slope is a single number (a ratio) for a line, while gradient is a vector for a function of several variables.

  • Slope is the ratio of the change in y to the change in x on a line.

  • Gradient is the rate of change of a function with respect to its variables.

  • Slope is a scalar value, while gradient is a vector.

  • Slope is used to describe the steepness of a line, while gradient is used to describe the direction and magnitude of the change in a function.

  • Example: The slope of a line…
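A small numeric illustration: slope is one ratio for a line, while the gradient collects one rate of change per variable. The points and the function f(x, y) = x^2 + 3y are arbitrary examples, and the gradient is estimated with central differences rather than derived symbolically.

```python
def slope(p1, p2):
    """Rise over run for the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def numerical_gradient(f, point, h=1e-6):
    """Vector of partial derivatives estimated with central differences."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


print(slope((1, 2), (3, 8)))                                           # 3.0
print(numerical_gradient(lambda p: p[0] ** 2 + 3 * p[1], [1.0, 2.0]))  # about [2.0, 3.0]
```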

Q191. What do you understand by deep learning neural networks?

Ans.

Deep learning neural networks are a type of artificial neural network with multiple layers, used for complex pattern recognition.

  • Deep learning neural networks consist of multiple layers of interconnected nodes, allowing for more complex patterns to be learned.

  • They are capable of automatically learning features from data, eliminating the need for manual feature engineering.

  • Examples include Convolutional Neural Networks (CNNs) for image recognition and Recurrent Neural Networks (RNNs) for sequential data…

Q192. What is bias, and what are overfitting and underfitting?

Ans.

Bias is the error due to overly simplistic assumptions in the learning algorithm. Overfitting is when a model is too complex and fits the training data too closely, leading to poor generalization. Underfitting is when a model is too simple to capture the underlying structure of the data.

  • High bias shows up as high error on both training and test data because of oversimplified assumptions.

  • Overfitting happens when a model is too complex and captures noise in the training data…

Q193. How do you extract the top discussed topics on Twitter?

Ans.

Use Twitter API to extract tweets, perform text analysis to identify top discussed topics.

  • Access Twitter API to retrieve tweets

  • Perform text analysis using NLP techniques like TF-IDF or LDA

  • Identify keywords or hashtags with highest frequency to determine top discussed topics
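The frequency step in the last bullet can be sketched with a regular expression and `collections.Counter`. The tweets below are hypothetical placeholders for text retrieved through the API; TF-IDF or LDA give more robust topics than raw hashtag counts.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, k=3):
    """Most frequent hashtags, case-insensitive."""
    counts = Counter(tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet))
    return counts.most_common(k)


tweets = [  # hypothetical tweets
    "Budget day! #Budget2025 #economy",
    "Markets rally after the #budget2025 speech",
    "#Cricket final tonight",
    "Tax slabs changed #Budget2025 #tax",
]
print(top_hashtags(tweets))
```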

Q194. What is the difference between linear regression and classification?

Ans.

Linear regression is used for predicting continuous values, while classification is used for predicting discrete values.

  • Linear regression is used when the output variable is continuous, such as predicting house prices based on features like size and location.

  • Classification is used when the output variable is categorical, such as predicting whether an email is spam or not based on its content.

  • Linear regression aims to find the best-fitting line that describes the relationship…

Q195. What are left-skewed and right-skewed data, and how do you treat them?

Ans.

Left skewed data has a long tail on the left side, while right skewed data has a long tail on the right side.

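  • Left-skewed data: mean < median < mode. Example: scores on an easy exam, where most students score high and a few score very low.

  • Right-skewed data: mean > median > mode. Example: income distribution, where a few very high earners pull the mean up.

  • Treatment: For right-skewed data, consider a log or square root transformation. For left-skewed data, consider a power (e.g. square) transformation, or reflect the data (max + 1 - x) and then apply a log or square root transformation.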
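A quick check of the effect of a log transformation on right-skewed values. The sample is constructed (an assumption) as exponentials of symmetric numbers, so after taking logs it becomes exactly symmetric.

```python
import math
import statistics


def skewness(values):
    """Population (Fisher-Pearson) skewness: positive = long right tail, negative = long left tail."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sd) ** 3 for v in values])


right_skewed = [math.exp(z) for z in (-2, -1, -0.5, 0, 0.5, 1, 2)]
logged = [math.log(v) for v in right_skewed]
print(round(skewness(right_skewed), 3), round(skewness(logged), 3))  # positive, then about 0
```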

Q196. Explain the chi-square distribution. What are the assumptions involved?

Ans.

The chi-square distribution is a probability distribution used in statistical tests to determine the significance of relationships between categorical variables.

  • The chi-square distribution is a continuous probability distribution used in statistical tests such as the chi-square test.

  • It is skewed to the right, and its shape is determined by the degrees of freedom.

  • Assumptions of the chi-square test include random sampling, independence of observations, and expected cell counts that are not too small (a common rule of thumb is at least 5 per cell).
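A worked chi-square test of independence on a 2x2 table of hypothetical counts. The statistic sums (observed - expected)^2 / expected, where expected = row total * column total / grand total. With 1 degree of freedom, the 5% critical value is 3.841, so this table shows no significant association. Library routines such as `scipy.stats.chi2_contingency` compute the same test (by default they apply Yates' continuity correction to 2x2 tables).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


observed = [[10, 20], [30, 40]]          # hypothetical counts
print(chi_square_statistic(observed))    # about (0.794, 1)
```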

Q197. How does a cellular network work, from the server to the SM?

Ans.

Cellular network connects devices to a server through base stations and switches, using radio waves.

  • Cellular devices communicate with base stations via radio waves

  • Base stations connect to switches which route the data to the server

  • Data is transmitted in packets using protocols like TCP/IP

  • Example: A smartphone connects to a base station, which routes the data through switches to the server

Q198. Explain the difference between Faster R-CNN and YOLOv3.

Ans.

Faster R-CNN and YOLOv3 are both object detection algorithms, but they differ in their approach and performance.

  • Faster R-CNN uses a two-stage approach, first generating region proposals and then classifying them.

  • YOLOv3 uses a single-stage approach, directly predicting bounding boxes and class probabilities.

  • Faster R-CNN is generally more accurate but slower, while YOLOv3 is faster but less accurate.

  • Faster R-CNN is better suited for complex scenes with many small objects, while YOLO…

Q199. How does the fbprophet forecasting model work, and how can it be used to forecast traffic?

Ans.

fbprophet is a forecasting model developed by Facebook that uses time series data to make predictions.

  • fbprophet is an open-source forecasting tool developed by Facebook's Core Data Science team.

  • It is based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

  • fbprophet can be used to forecast traffic by providing historical data on traffic patterns and using the model to predict future trends.

  • It allows users to…
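The additive model in the bullets above is usually written as the decomposition below. g(t) is the trend, s(t) the periodic seasonality (for example weekly and yearly, modelled with Fourier terms), h(t) the holiday effects and ε_t the error term. For traffic forecasting, y(t) would be the observed traffic series (for example daily visits); Prophet expects the history as a table with a date column `ds` and a value column `y`.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```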

Q200. What is the difference between a probability density function and a probability mass function?

Ans.

A probability density function (PDF) is used for continuous random variables, while a probability mass function (PMF) is used for discrete random variables.

  • For a continuous random variable, the probability of falling within a range is the area under the PDF over that range; any single exact value has probability 0, so a PDF value is a density, not a probability.

  • The PMF gives the probability that a discrete random variable takes a specific value.

  • The PDF integrates to 1 over the entire range of the random variable.

  • The PMF sums to 1 over all possible values of the random variable.
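A numeric check of the points above, using a binomial PMF and a normal PDF (the parameters are arbitrary examples, and the integral is approximated with the midpoint rule):

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


pmf_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))  # PMF values sum to 1
print(round(pmf_total, 6))                     # 1.0
print(round(integrate(normal_pdf, -8, 8), 6))  # 1.0: total area under the PDF
print(round(integrate(normal_pdf, -1, 1), 4))  # P(-1 < X < 1), about 0.6827
print(normal_pdf(0.0, sigma=0.1) > 1)          # True: a density value can exceed 1
```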


Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.
