TCS
20+ Interview Questions and Answers
Q1. How does the decision tree algorithm work, and what is cross entropy?
Decision tree algorithm is a tree-like model used for classification and regression. Cross entropy is a measure of the difference between two probability distributions.
Decision tree algorithm recursively splits the data into subsets based on the most significant attribute until a stopping criterion is met.
It is a popular algorithm for both classification and regression tasks.
Cross entropy is used as a loss function in machine learning to measure the difference between the predicted distribution and the true distribution.
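As a concrete illustration of the loss described above, here is a minimal pure-Python sketch of cross entropy between a true (one-hot) distribution and a predicted distribution; the function name and the epsilon guard are illustrative choices, not a fixed API.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

# One-hot true label vs. a confident correct prediction: low loss.
print(cross_entropy([1.0, 0.0, 0.0], [0.9, 0.05, 0.05]))   # about 0.105
# Same label vs. a confident wrong prediction: high loss.
print(cross_entropy([1.0, 0.0, 0.0], [0.05, 0.9, 0.05]))   # about 3.0
```

The further the predicted probability mass is from the true class, the larger the loss, which is exactly what makes it a useful training signal.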
Q2. With minimal data, which model would you train for categorical prediction?
I would train a decision tree model as it can handle categorical data well with minimal data.
Decision tree models are suitable for categorical prediction with minimal data
They can handle both numerical and categorical data
Decision trees are easy to interpret and visualize
Examples: predicting customer churn, classifying spam emails
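A minimal scikit-learn sketch of the idea above, assuming scikit-learn is available; the toy churn data and feature values are invented for illustration. Note that scikit-learn trees expect numeric input, so the categorical features are encoded first.

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical churn data: (plan, support_calls) -> churned?
X_raw = [["basic", "many"], ["basic", "few"], ["premium", "few"], ["premium", "many"]]
y = [1, 0, 0, 1]

# Encode string categories as numbers before fitting the tree.
enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict(enc.transform([["basic", "many"]])))  # [1]
```

Even with four training rows the tree finds the split on the support-calls feature, which is why trees are a reasonable default for small categorical datasets.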
Q3. How does an RNN handle exploding/vanishing gradients?
RNN uses techniques like gradient clipping, weight initialization, and LSTM/GRU cells to handle exploding/vanishing gradients.
Gradient clipping limits the magnitude of gradients during backpropagation.
Weight initialization techniques like Xavier initialization help in preventing vanishing gradients.
LSTM/GRU cells have gating mechanisms that allow the network to selectively remember or forget information.
Batch normalization can also help in stabilizing the gradients.
Exploding gradients occur when gradient magnitudes grow uncontrollably during backpropagation through time; vanishing gradients occur when they shrink toward zero, preventing the network from learning long-range dependencies.
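Gradient clipping by norm, mentioned above, can be sketched in plain Python (frameworks such as PyTorch provide built-in equivalents); the function below is an illustrative implementation, not a library API.

```python
import math

def clip_by_norm(grads, max_norm=1.0):
    """Rescale a gradient vector so its L2 norm never exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

# A gradient with norm 5.0 gets rescaled to norm 1.0; its direction is preserved.
print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

Clipping caps the size of each update step, which prevents a single exploding gradient from destabilizing training.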
Q4. Explain the difference between Faster R-CNN and YOLOv3.
Faster R-CNN and YOLOv3 are both object detection algorithms, but they differ in approach and performance.
Faster R-CNN uses a two-stage approach, first generating region proposals and then classifying them.
YOLOv3 uses a single-stage approach, directly predicting bounding boxes and class probabilities in one pass.
Faster R-CNN is generally more accurate but slower, while YOLOv3 is faster but less accurate.
Faster R-CNN is better suited for complex scenes with many small objects, while YOLOv3 is better suited for real-time applications where speed matters.
Q5. Why did you choose recall for a particular ML model?
Recall was chosen for the ML model to prioritize minimizing false negatives.
Chose recall to focus on identifying all relevant cases, even if it means more false positives
In scenarios where missing a positive case is more costly than incorrectly labeling a negative case
Commonly used in medical diagnosis to ensure all potential cases are identified
Q6. What is the difference between recall and precision?
Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class, while precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
Recall is about the actual positive instances that were correctly identified by the model.
Precision is about the predicted positive instances and how many of them were actually positive.
Recall = True Positives / (True Positives + False Negatives)
Precision = True Positives / (True Positives + False Positives)
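The two formulas translate directly into code; this small sketch uses made-up counts purely for illustration.

```python
def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp)

# Hypothetical counts: 90 true positives, 10 missed positives, 30 false alarms.
print(recall(90, 10))     # 0.9
print(precision(90, 30))  # 0.75
```

A model can score high on one metric and low on the other, which is why both are usually reported together.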
Q7. What is logistic regression? Tell me about your projects.
Logistic regression is a statistical model used to predict the probability of a binary outcome based on one or more predictor variables.
Logistic regression is used when the dependent variable is binary (e.g., 0/1, yes/no, true/false).
It estimates the probability that a given observation belongs to a particular category.
It uses the logistic function to model the relationship between the dependent variable and independent variables.
Example: predicting whether a customer will purchase a product.
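A minimal from-scratch sketch of the idea above: fitting a one-feature logistic regression by gradient descent on the log-loss. The tiny hours-studied dataset is invented for illustration; production code would normally use a library such as scikit-learn.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: hours studied -> passed exam (0/1).
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 0, 1, 1, 1]

# Fit weight w and bias b with plain stochastic gradient descent.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    for xi, yi in zip(X, y):
        p = sigmoid(w * xi + b)   # predicted probability of class 1
        w -= lr * (p - yi) * xi   # gradient of log-loss w.r.t. w
        b -= lr * (p - yi)        # gradient of log-loss w.r.t. b

# Probabilities on either side of the learned decision boundary.
print(sigmoid(w * 1.0 + b), sigmoid(w * 6.0 + b))
```

The logistic function squashes the linear score into (0, 1), which is what lets the model output calibrated-looking probabilities for a binary outcome.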
Q8. How do you remove stop words, and how does it work?
Stop words are common words like 'the', 'is', 'and' that are removed from text data to improve analysis.
Stop words are commonly removed from text data to improve the accuracy of natural language processing tasks.
They are typically removed after tokenization, using libraries like NLTK or spaCy.
Examples of stop words include 'the', 'is', 'and', 'in', 'on', etc.
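A minimal sketch of the tokenize-then-filter process described above, using a small hand-written stop-word list (NLTK and spaCy ship much larger curated lists):

```python
# Illustrative stop-word list; real libraries provide comprehensive ones.
STOP_WORDS = {"the", "is", "and", "in", "on", "a", "an", "of", "to"}

def remove_stop_words(text):
    tokens = text.lower().split()                       # 1. tokenize
    return [t for t in tokens if t not in STOP_WORDS]   # 2. filter

print(remove_stop_words("The cat is on the mat"))  # ['cat', 'mat']
```

Because stop words carry little discriminative meaning, dropping them shrinks the vocabulary and often sharpens downstream features like TF-IDF.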
Q9. What's the difference between K-means and KNN?
K-means is a clustering algorithm while KNN is a classification algorithm.
K-means is unsupervised learning, KNN is supervised learning
K-means partitions data into K clusters based on distance, KNN classifies data points based on similarity to K neighbors
K-means requires specifying the number of clusters (K), KNN requires specifying the number of neighbors (K)
Example: K-means can be used to group customers based on purchasing behavior, KNN can be used to classify emails as spam or not spam.
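The contrast can be shown side by side with scikit-learn (assuming it is installed); the four toy points below form two obvious groups and are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [8, 8], [8, 9]])

# K-means: unsupervised -- no labels, just "find 2 groups".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # the two nearby pairs land in the same cluster

# KNN: supervised -- labels are required for training.
y = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))  # [0], nearest neighbors are the first group
```

Note that K in each name means something different: number of clusters for K-means, number of voting neighbors for KNN.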
Q10. Steps involved in Machine Learning Problem Statement
Define the problem statement and goals
Collect and preprocess data
Select a machine learning model
Train the model on the data
Evaluate the model's performance
Fine-tune the model if necessary
Deploy the model for predictions
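The steps above can be sketched end to end with scikit-learn on its built-in iris dataset (an illustrative choice; any labeled dataset works). Fine-tuning and deployment are omitted here for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Steps 1-2: define the problem (classify iris species) and collect data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Steps 3-4: select a model and train it.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Step 5: evaluate on held-out data before tuning or deploying.
print(accuracy_score(y_test, model.predict(X_test)))
```

Splitting off a test set before training is what makes the step-5 evaluation an honest estimate of generalization.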
Q11. Comparison of transfer learning and fine-tuning.
Transfer learning involves using pre-trained models on a different task, while fine-tuning involves further training a pre-trained model on a specific task.
Transfer learning uses knowledge gained from one task to improve learning on a different task.
Fine-tuning involves adjusting the parameters of a pre-trained model to better fit a specific task.
Transfer learning is faster and requires less data compared to training a model from scratch.
Fine-tuning is more task-specific and typically achieves better performance on the target task than using the pre-trained model unchanged.
Q12. What is the bias-variance trade-off?
The bias-variance trade-off is the balance between underfitting and overfitting in machine learning models.
Bias refers to error from erroneous assumptions in the learning algorithm, leading to underfitting.
Variance refers to error from sensitivity to small fluctuations in the training set, leading to overfitting.
The trade-off involves finding the right level of model complexity to minimize both bias and variance.
Regularization techniques like Lasso and Ridge regression can help by penalizing complexity and so managing the trade-off.
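The trade-off can be demonstrated numerically: fitting polynomials of increasing degree to noisy data drives training error down while test error eventually rises. This NumPy sketch uses synthetic sine data; the degrees and noise level are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=x.shape)
x_test = np.linspace(0.01, 0.99, 30)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, size=x_test.shape)

def train_test_mse(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    p = Polynomial.fit(x, y, degree)
    train = float(np.mean((p(x) - y) ** 2))
    test = float(np.mean((p(x_test) - y_test) ** 2))
    return train, test

# Degree 1 underfits (high bias); degree 15 drives train error toward zero
# but often generalizes worse than degree 3 (high variance).
for degree in (1, 3, 15):
    print(degree, train_test_mse(degree))
```

The training error is guaranteed to shrink as degree grows; only the test error reveals when added complexity stops paying off.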
Q13. What are different ML algorithms?
Different ML algorithms include linear regression, decision trees, random forests, support vector machines, and neural networks.
Linear regression: used for predicting continuous values based on input features.
Decision trees: used for classification and regression tasks by splitting data into branches based on feature values.
Random forests: ensemble method using multiple decision trees for improved accuracy.
Support vector machines: used for classification tasks by finding the hyperplane that best separates the classes.
Q14. What are low bias and high variance?
Low bias and high variance refer to the trade-off between model complexity and generalization ability.
Low bias refers to a model that makes few restrictive assumptions about the data and fits the training data closely; combined with high variance, this shows up as high accuracy on training data but potentially poor performance on unseen data.
High variance refers to a model that is very sensitive to small fluctuations in the training data, leading to overfitting and poor generalization.
Finding the right balance between bias and variance is crucial for building models that generalize well to unseen data.
Q15. What happens with multicollinearity?
Multicollinearity occurs when independent variables in a regression model are highly correlated.
Multicollinearity can lead to unstable estimates of the coefficients in the regression model.
It can make it difficult to determine the effect of each independent variable on the dependent variable.
One common way to detect multicollinearity is to calculate the variance inflation factor (VIF) for each independent variable.
Q16. What is a confusion matrix?
Confusion matrix is a table used to evaluate the performance of a classification model.
For binary classification it is a 2x2 matrix showing the counts of true positive, true negative, false positive, and false negative predictions.
It is used to calculate metrics like accuracy, precision, recall, and F1 score.
Example: TP=100, TN=50, FP=10, FN=5.
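Plugging the example counts above into the standard formulas:

```python
# Counts from the example: TP=100, TN=50, FP=10, FN=5.
TP, TN, FP, FN = 100, 50, 10, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.909 precision=0.909 recall=0.952 f1=0.930
```

All four headline metrics fall straight out of the same four cells, which is why the confusion matrix is the usual starting point for evaluating a classifier.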
Q17. Explain the similarity matrix algorithm.
Similarity matrix algo is a method to quantify the similarity between data points in a dataset.
It calculates the similarity between each pair of data points in a dataset and represents it in a matrix form.
Common similarity measures used include cosine similarity, Euclidean distance, and Jaccard similarity.
The diagonal of the matrix usually contains 1s as each data point is perfectly similar to itself.
For measures like cosine similarity on non-negative data, the values in the matrix range from 0 (no similarity) to 1 (perfect similarity).
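A minimal NumPy sketch of building such a matrix with cosine similarity; the helper name and toy vectors are illustrative.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X_normed = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X_normed @ X_normed.T

docs = np.array([[1.0, 0.0],    # vector A
                 [0.0, 1.0],    # vector B, orthogonal to A
                 [1.0, 1.0]])   # vector C, between A and B
S = cosine_similarity_matrix(docs)
print(np.round(S, 3))  # diagonal is all 1s; the A-B entry is 0
```

Normalizing the rows first reduces every pairwise cosine to a single matrix product, which is what makes this practical for large datasets.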
Q18. Identify geometric algorithm patterns
Geometric algorithm patterns involve solving problems related to geometric shapes and structures.
Identifying and solving problems related to points, lines, angles, and shapes
Utilizing geometric formulas and theorems to find solutions
Examples include calculating area, perimeter, angles, and distances in geometric figures
Q19. Unsupervised learning algorithms
Unsupervised learning algorithms are used to find patterns in data without labeled outcomes.
Unsupervised learning algorithms do not require labeled data for training.
They are used for clustering, dimensionality reduction, and anomaly detection.
Examples include K-means clustering, hierarchical clustering, and principal component analysis.
Q20. What is the F1-score?
F1-score is a measure of a model's accuracy that considers both precision and recall.
F1-score is the harmonic mean of precision and recall.
It ranges from 0 to 1, where 1 is the best possible F1-score.
F1-score is useful when you want to balance precision and recall in your model evaluation.
Q21. Retraining a Gen AI model
Retraining GEN AI model involves updating the model with new data to improve its accuracy and performance.
Retraining is necessary to keep the model up-to-date with new information.
New data is used to fine-tune the model's parameters and improve its predictions.
Retraining may involve adjusting hyperparameters, adding more layers, or changing the architecture.
Examples: retraining a language model with new text data, or retraining an image recognition model with additional images.
Q22. Deployment of a Model in MLflow
MLflow allows for easy deployment of machine learning models.
MLflow provides a simple way to deploy models using the mlflow models serve command.
Models can be deployed locally or to a cloud-based server for production use.
MLflow also supports model versioning and tracking for easy management of deployed models.
Q23. Overfitting vs Underfitting
Overfitting occurs when a model learns the training data too well, while underfitting occurs when a model fails to capture the underlying patterns in the data.
Overfitting: Model is too complex and learns noise in the training data.
Underfitting: Model is too simple and fails to capture the underlying patterns.
Overfitting can lead to poor generalization and high variance.
Underfitting can lead to high bias and poor performance.
Overfitting can be reduced by regularization techniques, cross-validation, and early stopping.
Q24. Supervised learning algorithms
Supervised learning algorithms are used in machine learning to predict outcomes based on labeled training data.
Supervised learning algorithms require labeled training data to learn the relationship between input and output variables.
Common supervised learning algorithms include linear regression, logistic regression, decision trees, support vector machines, and neural networks.
These algorithms are used for tasks such as classification, regression, and ranking.
Examples of supervised learning applications include spam detection, house price prediction, and image classification.
Q25. Cosine similarity
Cosine similarity measures the similarity between two non-zero vectors in an inner product space.
Cosine similarity ranges from -1 to 1, with 1 indicating identical vectors and -1 indicating opposite vectors.
It is commonly used in information retrieval, text mining, and recommendation systems.
Formula: cos(theta) = (A . B) / (||A|| * ||B||)
Example: Calculating similarity between two documents based on their word frequencies.
Q26. ML models in resume
ML models should be included in a Data Scientist's resume.
Include a section in your resume highlighting the ML models you have worked with.
Mention the specific ML algorithms and techniques you have used.
Provide examples of projects where you have successfully applied ML models.
Highlight any notable achievements or results obtained using ML models.
Demonstrate your understanding of model evaluation and validation techniques.
Q27. Recall vs Precision
Recall and Precision are evaluation metrics used in classification tasks to measure the performance of a model.
Recall measures the ability of a model to find all the relevant instances in a dataset.
Precision measures the ability of a model to correctly identify only the relevant instances.
Recall and Precision are often used together to evaluate the trade-off between completeness and correctness in a model's predictions.
In the medical field, recall is important to minimize false negatives so that potential cases are not missed.