
TCS

3.7
based on 84k Reviews

20+ Interview Questions and Answers

Updated 21 Nov 2024

Q1. How does the decision tree algorithm work, and what is cross entropy?

Ans.

A decision tree is a tree-like model used for classification and regression. Cross entropy is a measure of the difference between two probability distributions.

  • The decision tree algorithm recursively splits the data into subsets on the attribute that gives the best split (for example, the highest information gain or the lowest Gini impurity) until a stopping criterion is met.

  • It is a popular algorithm for both classification and regression tasks.

  • Cross entropy is used as a loss function in machine learning to measure the difference between predict...read more
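
As a rough illustration of the two ideas in this answer, the sketch below computes cross entropy between a true and a predicted distribution, and the information gain of one candidate split. Information gain is only one possible split criterion and is an assumption here; the function names and the `eps` guard are illustrative, not from the original answer.

```python
import math

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * ln(q_i); eps guards against log(0)."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```

A tree builder would evaluate `information_gain` for each candidate attribute and split on the one with the largest value.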


Q2. With minimal data, which model would you train for categorical prediction?

Ans.

I would train a decision tree model as it can handle categorical data well with minimal data.

  • Decision tree models are suitable for categorical prediction with minimal data

  • They can handle both numerical and categorical data

  • Decision trees are easy to interpret and visualize

  • Examples: predicting customer churn, classifying spam emails


Q3. How does an RNN handle exploding/vanishing gradients?

Ans.

RNNs rely on techniques like gradient clipping, careful weight initialization, and LSTM/GRU cells to handle exploding and vanishing gradients.

  • Gradient clipping limits the magnitude of gradients during backpropagation.

  • Weight initialization techniques like Xavier initialization help in preventing vanishing gradients.

  • LSTM/GRU cells have gating mechanisms that allow the network to selectively remember or forget information.

  • Normalization layers can also help stabilize the gradients; in RNNs, layer normalization is more commonly used than batch normalization.

  • Exploding ...read more
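
Gradient clipping is the easiest of these techniques to show in a few lines. The sketch below clips a flat list of gradient values by their L2 norm. The function name and the flat-list representation are assumptions for illustration; deep learning frameworks ship their own clipping utilities.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Clipping keeps the direction of the gradient and only shrinks its length, which limits exploding gradients but does nothing for vanishing ones.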


Q4. Explain the difference between Faster R-CNN and YOLOv3.

Ans.

Faster R-CNN and YOLOv3 are both object detection algorithms, but they differ in approach and performance.

  • Faster R-CNN uses a two-stage approach, first generating region proposals and then classifying them.

  • YOLOv3 uses a single-stage approach, directly predicting bounding boxes and class probabilities.

  • Faster R-CNN is generally more accurate but slower, while YOLOv3 is faster but less accurate.

  • Faster R-CNN is better suited for complex scenes with many small objects, while YOLO ...read more


Q5. Why did you choose recall for a particular ML model?

Ans.

Recall was chosen for the ML model to prioritize minimizing false negatives.

  • Chose recall to focus on identifying all relevant cases, even if it means more false positives

  • Appropriate in scenarios where missing a positive case is more costly than incorrectly labeling a negative case

  • Commonly used in medical diagnosis to ensure all potential cases are identified


Q6. What is the difference between recall and precision?

Ans.

Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class, while precision is the ratio of correctly predicted positive observations to the total predicted positive observations.

  • Recall is about the actual positive instances that were correctly identified by the model.

  • Precision is about the predicted positive instances and how many of them were actually positive.

  • Recall = True Positives / (True Positives + False Negatives)

  • Precision...read more
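
The two ratios can be computed directly from raw counts. This is a minimal sketch; the function name and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive,
    and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, with 8 true positives, 2 false positives, and 4 false negatives, precision is 8/10 and recall is 8/12.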


Q7. What is logistic regression? Tell me about your projects.

Ans.

Logistic regression is a statistical model used to predict the probability of a binary outcome based on one or more predictor variables.

  • Logistic regression is used when the dependent variable is binary (e.g., 0/1, yes/no, true/false).

  • It estimates the probability that a given observation belongs to a particular category.

  • It uses the logistic function to model the relationship between the dependent variable and independent variables.

  • Example: Predicting whether a customer will pu...read more
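
The logistic function mentioned above maps a linear combination of the predictors to a probability between 0 and 1. The sketch below shows only that prediction step; the weights and bias are hypothetical inputs, not fitted values, and training is left out.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(features, weights, bias):
    """P(y = 1 | features) for a model with the given (hypothetical) parameters."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

A binary prediction is then obtained by thresholding the probability, commonly at 0.5.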


Q8. How do you remove stop words, and how does it work?

Ans.

Stop words are common words like 'the', 'is', 'and' that are removed from text data to improve analysis.

  • Stop words are commonly removed from text data to reduce noise and vocabulary size in natural language processing tasks.

  • They are typically removed after tokenization, by filtering each token against a stop word list; libraries like NLTK or spaCy provide such lists.

  • Examples of stop words include 'the', 'is', 'and', 'in', 'on', etc.
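
A minimal sketch of the idea, using only the standard library: tokenize the text, then drop every token found in a stop word set. The tiny `STOP_WORDS` set and the regular-expression tokenizer are assumptions for illustration; in practice the list would come from a library such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop word set; real lists are much longer.
STOP_WORDS = {"the", "is", "and", "in", "on"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase and tokenize text, then filter out stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stop_words]
```

Calling `remove_stop_words("The cat is on the mat and happy")` keeps only the content words `cat`, `mat`, and `happy`.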


Q9. What's the difference between K-means and KNN?

Ans.

K-means is a clustering algorithm, while KNN is a classification algorithm (it can also be used for regression).

  • K-means is unsupervised learning, KNN is supervised learning

  • K-means partitions data into K clusters based on distance, KNN classifies data points based on similarity to K neighbors

  • K-means requires specifying the number of clusters (K), KNN requires specifying the number of neighbors (K)

  • Example: K-means can be used to group customers based on purchasing behavior, KNN can be used to classify emails as spa...read more
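
To make the contrast concrete, here is a small standard-library sketch of both algorithms. KNN needs labels and votes among the nearest neighbors; K-means needs no labels and only moves centroids. The function names, the fixed iteration count, and passing the initial centroids explicitly are simplifying assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority label among the k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its cluster."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[j]  # keep an empty cluster's centroid
            for j, cluster in enumerate(clusters)
        ]
    return centroids
```

Note that the K in `knn_predict` is a neighbor count, while the K in `kmeans` is the number of centroids passed in.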


Q10. Steps involved in Machine Learning Problem Statement

Ans.

Solving a machine learning problem typically involves the following steps:

  • Define the problem statement and goals

  • Collect and preprocess data

  • Select a machine learning model

  • Train the model on the data

  • Evaluate the model's performance

  • Fine-tune the model if necessary

  • Deploy the model for predictions


Q11. Comparison of transfer learning and fine-tuning.

Ans.

Transfer learning involves using pre-trained models on a different task, while fine-tuning involves further training a pre-trained model on a specific task.

  • Transfer learning uses knowledge gained from one task to improve learning on a different task.

  • Fine-tuning involves adjusting the parameters of a pre-trained model to better fit a specific task.

  • Transfer learning is faster and requires less data compared to training a model from scratch.

  • Fine-tuning is more task-specific and ...read more


Q12. What is the bias-variance trade-off?

Ans.

The bias-variance trade-off is the balance between underfitting and overfitting in machine learning models.

  • Bias refers to error from erroneous assumptions in the learning algorithm, leading to underfitting.

  • Variance refers to error from sensitivity to small fluctuations in the training set, leading to overfitting.

  • The trade-off involves finding the level of model complexity that minimizes total error, since reducing bias usually increases variance and vice versa.

  • Regularization techniques like Lasso and Ridge regression can help i...read more


Q13. What are different ML algorithms?

Ans.

Different ML algorithms include linear regression, decision trees, random forests, support vector machines, and neural networks.

  • Linear regression: used for predicting continuous values based on input features.

  • Decision trees: used for classification and regression tasks by splitting data into branches based on feature values.

  • Random forests: ensemble method using multiple decision trees for improved accuracy.

  • Support vector machines: used for classification tasks by finding the ...read more


Q14. What is low bias and high variance?

Ans.

Low bias and high variance describe a model that is complex enough to fit the training data closely but generalizes poorly.

  • Low bias refers to a model that makes few assumptions about the data, so it fits the training data closely and achieves high accuracy on it, but potentially performs poorly on unseen data.

  • High variance refers to a model that is very sensitive to small fluctuations in the training data, leading to overfitting and poor generalization.

  • Finding the right balance between bias and variance is crucial for bu...read more


Q15. What happens with multicollinearity?

Ans.

Multicollinearity occurs when independent variables in a regression model are highly correlated.

  • Multicollinearity can lead to unstable estimates of the coefficients in the regression model.

  • It can make it difficult to determine the effect of each independent variable on the dependent variable.

  • One common way to detect multicollinearity is to calculate the variance inflation factor (VIF) for each independent variable.
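
The VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below covers only the special case of exactly two predictors, where R² equals the squared Pearson correlation between them. The function names are illustrative, and the general case needs a full regression per predictor.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is r squared."""
    r = pearson_r(x1, x2)
    denom = 1.0 - r * r
    return math.inf if denom <= 0 else 1.0 / denom
```

A VIF of 1 means the predictors are uncorrelated. Values above roughly 5 or 10 are often treated as a warning sign, though that cutoff is a rule of thumb rather than a fixed standard.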


Q16. What is a confusion matrix?

Ans.

A confusion matrix is a table used to evaluate the performance of a classification model.

  • For binary classification, it is a 2x2 matrix that shows the counts of true positive, true negative, false positive, and false negative predictions; with N classes it becomes an N x N matrix.

  • It is used to calculate metrics like accuracy, precision, recall, and F1 score.

  • Example: TP=100, TN=50, FP=10, FN=5.
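
The sketch below shows how the four cells can be counted from true and predicted labels for a binary problem, and how accuracy follows from them. The function names and the convention that the positive class is labeled 1 are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

With the example counts above (TP=100, TN=50, FP=10, FN=5), `accuracy` evaluates to 150/165.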


Q17. Explain the similarity matrix algorithm.

Ans.

A similarity matrix quantifies the similarity between data points in a dataset.

  • It calculates the similarity between each pair of data points in a dataset and represents it in a matrix form.

  • Common measures include cosine similarity, Jaccard similarity, and similarities derived from Euclidean distance (a distance must be converted before it can serve as a similarity score).

  • The diagonal of the matrix usually contains 1s as each data point is perfectly similar to itself.

  • The values in the matrix range from 0 (no similarity) to 1 (perfect similari...read more
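
A pairwise similarity matrix can be sketched in a few lines. This example uses Jaccard similarity on sets as the measure, which is just one of the options named above; the function names and the convention that two empty sets have similarity 1.0 are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def similarity_matrix(items, similarity=jaccard):
    """Square matrix whose entry [i][j] is similarity(items[i], items[j])."""
    return [[similarity(x, y) for y in items] for x in items]
```

Because Jaccard similarity is symmetric, the resulting matrix is symmetric too, with 1.0 on the diagonal.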


Q18. Identify geometric algorithm pattern

Ans.

Geometric algorithm patterns involve solving problems related to geometric shapes and structures.

  • Identifying and solving problems related to points, lines, angles, and shapes

  • Utilizing geometric formulas and theorems to find solutions

  • Examples include calculating area, perimeter, angles, and distances in geometric figures


Q19. Unsupervised learning algorithms

Ans.

Unsupervised learning algorithms are used to find patterns in data without labeled outcomes.

  • Unsupervised learning algorithms do not require labeled data for training.

  • They are used for clustering, dimensionality reduction, and anomaly detection.

  • Examples include K-means clustering, hierarchical clustering, and principal component analysis.


Q20. What is F1-score?

Ans.

F1-score is a measure of a model's accuracy that considers both precision and recall.

  • F1-score is the harmonic mean of precision and recall.

  • It ranges from 0 to 1, where 1 is the best possible F1-score.

  • F1-score is useful when you want to balance precision and recall in your model evaluation.
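
The harmonic mean penalizes an imbalance between precision and recall much more than an arithmetic mean would. A minimal sketch, where the function name and the choice to return 0.0 when both inputs are zero are assumptions:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, precision 1.0 with recall 0.1 has an arithmetic mean of 0.55, but `f1_score(1.0, 0.1)` is only about 0.18.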


Q21. Retraining a GenAI model

Ans.

Retraining a GenAI model involves updating the model with new data to improve its accuracy and performance.

  • Retraining is necessary to keep the model up-to-date with new information.

  • New data is used to fine-tune the model's parameters and improve its predictions.

  • Retraining may involve adjusting hyperparameters, adding more layers, or changing the architecture.

  • Examples: retraining a language model with new text data, retraining an image recognition model with additional images.


Q22. Deployment of a model in MLflow

Ans.

MLflow allows for easy deployment of machine learning models.

  • MLflow provides a simple way to deploy models using the `mlflow models serve` command.

  • Models can be deployed locally or to a cloud-based server for production use.

  • MLflow also supports model versioning and tracking for easy management of deployed models.


Q23. Overfitting vs Underfitting

Ans.

Overfitting occurs when a model learns the training data too well, while underfitting occurs when a model fails to capture the underlying patterns in the data.

  • Overfitting: Model is too complex and learns noise in the training data.

  • Underfitting: Model is too simple and fails to capture the underlying patterns.

  • Overfitting can lead to poor generalization and high variance.

  • Underfitting can lead to high bias and poor performance.

  • Overfitting can be reduced by regularization techniq...read more


Q24. Supervised learning algorithms

Ans.

Supervised learning algorithms are used in machine learning to predict outcomes based on labeled training data.

  • Supervised learning algorithms require labeled training data to learn the relationship between input and output variables.

  • Common supervised learning algorithms include linear regression, logistic regression, decision trees, support vector machines, and neural networks.

  • These algorithms are used for tasks such as classification, regression, and ranking.

  • Examples of supe...read more


Q25. Cosine similarity

Ans.

Cosine similarity measures the similarity between two non-zero vectors in an inner product space.

  • Cosine similarity ranges from -1 to 1, with 1 indicating vectors that point in the same direction and -1 indicating vectors that point in opposite directions.

  • It is commonly used in information retrieval, text mining, and recommendation systems.

  • Formula: cos(theta) = (A . B) / (||A|| * ||B||)

  • Example: Calculating similarity between two documents based on their word frequencies.
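
The formula and the document example can be sketched with the standard library alone. The whitespace tokenizer and the function names are assumptions for illustration; real pipelines usually apply more careful tokenization and weighting such as TF-IDF.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def document_similarity(doc_a, doc_b):
    """Cosine similarity of two documents based on raw word counts."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    return cosine_similarity([counts_a[w] for w in vocab],
                             [counts_b[w] for w in vocab])
```

Word-count vectors have no negative entries, so `document_similarity` always falls between 0 and 1 even though cosine similarity in general can be negative.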


Q26. ML models in resume

Ans.

ML models should be included in a Data Scientist's resume.

  • Include a section in your resume highlighting the ML models you have worked with.

  • Mention the specific ML algorithms and techniques you have used.

  • Provide examples of projects where you have successfully applied ML models.

  • Highlight any notable achievements or results obtained using ML models.

  • Demonstrate your understanding of model evaluation and validation techniques.


Q27. Recall vs Precision

Ans.

Recall and Precision are evaluation metrics used in classification tasks to measure the performance of a model.

  • Recall measures the ability of a model to find all the relevant instances in a dataset.

  • Precision measures the ability of a model to correctly identify only the relevant instances.

  • Recall and Precision are often used together to evaluate the trade-off between completeness and correctness in a model's predictions.

  • In medical field, Recall is important to minimize false n...read more


Interview Process at TCS

based on 22 interviews in the last 1 year
2 Interview rounds
Technical Round 1
Technical Round 2