Top 250 Machine Learning Interview Questions and Answers

Updated 14 Dec 2024

Q1. What is MLT?

Ans.

MLT stands for Medical Laboratory Technician.

MLT is a healthcare professional who performs laboratory tests and procedures.
They collect and analyze samples such as blood, urine, and tissue.
MLTs work under the supervision of medical technologists or pathologists.
They operate and maintain laboratory equipment.
MLTs ensure accuracy and quality control in test results.
They may specialize in areas like microbiology, hematology, or immunology.


Q2. What is the difference between Linear Regression and Logistic Regression?

Ans.

Linear Regression is used for predicting continuous numerical values, while Logistic Regression is used for predicting binary categorical values.

Linear Regression predicts a continuous output, while Logistic Regression predicts a binary output.
Linear Regression uses a linear equation to model the relationship between the independent and dependent variables, while Logistic Regression uses a logistic function.
Linear Regression assumes a linear relationship between the variables...read more


Q3. Which test is used in logistic regression to check the significance of the variable

Ans.

The Wald test is used in logistic regression to check the significance of the variable.

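The Wald test calculates the ratio of the estimated coefficient to its standard error (the z-statistic).
Under the null hypothesis, the z-statistic is approximately standard normal, so its square follows a chi-square distribution with one degree of freedom.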
A small p-value indicates that the variable is significant.
For example, in Python, the statsmodels library provides the Wald test in the summary of a logistic regression model.
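
As a minimal sketch of the arithmetic behind the test: the z-statistic is the coefficient divided by its standard error, its square is the chi-square statistic with one degree of freedom, and the two-sided p-value comes from the standard normal distribution. The helper name `wald_test` and the example numbers are illustrative assumptions, not output from a fitted model.

```python
import math

def wald_test(coef, std_err):
    """Return (z, chi2, p_value) for a single coefficient.

    Hypothetical helper for illustration; in practice a fitted model
    supplies coef and std_err.
    """
    z = coef / std_err                           # Wald z-statistic
    chi2 = z ** 2                                # chi-square statistic, 1 degree of freedom
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value under N(0, 1)
    return z, chi2, p_value

# Made-up coefficient and standard error.
z, chi2, p = wald_test(coef=1.0, std_err=0.5)
print(f"z={z:.2f}, chi2={chi2:.2f}, p={p:.4f}")
```

With these made-up inputs the p-value falls just under 0.05, so the variable would be called significant at the 5% level.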


Q4. What is the difference between logistic and linear regression?

Ans.

Logistic regression is used for binary classification, while linear regression is used for predicting continuous values.

Logistic regression is a classification algorithm, while linear regression is a regression algorithm.
Logistic regression uses a logistic function to model the probability of the binary outcome.
Linear regression uses a linear function to model the relationship between the independent and dependent variables.
Logistic regression predicts discrete outcomes (e.g....read more


Frequently asked in

IndusInd Bank


Q5. What are random forest and KNN?

Ans.

Random forest and KNN are machine learning algorithms used for classification and regression tasks.

Random forest is an ensemble learning method that constructs multiple decision trees and combines their outputs to make a final prediction.
KNN (k-nearest neighbors) is a non-parametric algorithm that classifies new data points based on the majority class of their k-nearest neighbors in the training set.
Random forest is useful for handling high-dimensional data and avoiding overf...read more


Q6. What are the types of ML algorithms? Give an example of each.

Ans.

There are several types of ML algorithms, including supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning: algorithms learn from labeled data to make predictions or classifications (e.g., linear regression, decision trees)
Unsupervised learning: algorithms find patterns or relationships in unlabeled data (e.g., clustering, dimensionality reduction)
Reinforcement learning: algorithms learn through trial and error by interacting with an enviro...read more



Q7. Justify the need for using Recall instead of accuracy.

Ans.

Recall is more important than accuracy in certain scenarios.

Recall is important when the cost of false negatives is high.
Accuracy can be misleading when the dataset is imbalanced.
Recall measures the ability to correctly identify positive cases.
Examples include medical diagnosis and fraud detection.
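
A small sketch of why accuracy misleads on imbalanced data. The data and the always-negative "model" are made up for illustration, and the helper names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)

# Made-up imbalanced data: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10
# A useless model that always predicts the negative class.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The model scores 99% accuracy while missing every positive case, which is exactly the failure that recall exposes.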


Q8. How would you measure model effectiveness without using any of confusion matrix metrics given the data is highly imbalanced

Ans.

One way to measure model effectiveness without using confusion matrix metrics is by using area under the receiver operating characteristic curve (AUC-ROC).

Calculate the AUC-ROC score to evaluate the model's ability to distinguish between positive and negative classes.
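AUC-ROC considers the entire range of classification thresholds and does not depend on the class ratio, although on highly imbalanced data it can look optimistic, so the precision-recall curve is often reported alongside it.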
Higher AUC-ROC score indicates better model performance.
Example: A model with an AUC-ROC score of 0.85 perform...read more
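
The AUC-ROC score described above can be sketched as the fraction of (positive, negative) pairs in which the positive example gets the higher score. This pairwise form is fine for small inputs; the function name `roc_auc` and the sample scores are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 0.75
```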



Q9. In what scenarios would you advise me not to use ReLU in my hidden layers?

Ans.

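Avoid ReLU when dead neurons are a concern or when negative activations carry useful signal.

When negative pre-activations matter, use Leaky ReLU or ELU instead, since they keep a small non-zero gradient for negative inputs.
ReLU itself helps against vanishing gradients; switching to tanh or sigmoid tends to make that problem worse because both saturate.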
In some cases, using ReLU in all layers can lead to dead neurons.
Consider the nature of your data and the problem you are trying to solve before choosing an activation function.
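
The dead-neuron issue mentioned above shows up in the gradients: for a negative input, ReLU passes back a gradient of zero, while Leaky ReLU passes back a small slope. A minimal sketch, where `alpha=0.01` is a commonly used default assumed here for illustration:

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Zero gradient for negative inputs: the weight update stalls.
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    # A small non-zero slope keeps the neuron trainable.
    return 1.0 if x > 0 else alpha

x = -2.0
print(relu(x), relu_grad(x))              # 0.0 0.0
print(leaky_relu(x), leaky_relu_grad(x))  # -0.02 0.01
```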


Q10. How is object detection done using CNN?

Ans.

Object detection using CNN involves training a neural network to identify and locate objects within an image.

CNNs use convolutional layers to extract features from images
These features are then passed to detection heads (convolutional or fully connected layers) that predict class labels and bounding-box coordinates
Common architectures for object detection include YOLO, SSD, and Faster R-CNN


Q11. How do you handle overfitting and underfitting in Decision Trees

Ans.

Overfitting in decision trees can be handled by pruning, reducing tree depth, increasing dataset size, and using ensemble methods.

Prune the tree to remove unnecessary branches
Reduce tree depth to prevent overfitting
Increase dataset size to improve model generalization
Use ensemble methods like Random Forest to reduce overfitting
Underfitting can be handled by increasing tree depth, adding more features, and reducing regularization
Regularization can be used to prevent overfittin...read more


Frequently asked in

Urban Company

Q12. What is the difference between clustering and classification?

Ans.

Clustering groups data points based on similarity while classification assigns labels to data points based on predefined categories.

Clustering is unsupervised learning while classification is supervised learning.
Clustering is used to find patterns in data while classification is used to predict the category of a data point.
Examples of clustering algorithms include k-means and hierarchical clustering while examples of classification algorithms include decision trees and logist...read more


Q13. What do you know about anomaly detection?

Ans.

Anomaly detection is the process of identifying data points that deviate from the expected pattern.

Anomaly detection is used in various fields such as finance, cybersecurity, and manufacturing.
It can be done using statistical methods, machine learning algorithms, or a combination of both.
Some common techniques for anomaly detection include clustering, classification, and time series analysis.
Examples of anomalies include fraudulent transactions, network intrusions, and equipm...read more


Q14. Explain K-means clustering.

Ans.

K-means clustering is an unsupervised machine learning algorithm used to group similar data points together.

K means clustering is used to partition a dataset into K clusters based on their similarity.
It is an iterative algorithm that starts with K random centroids and assigns each data point to the nearest centroid.
The centroids are then recalculated based on the mean of the data points in each cluster and the process is repeated until convergence.
It is widely used in image se...read more
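
The assign-then-recompute loop described above can be sketched in plain Python. This is a minimal illustration on 2D points, not a production implementation: the function name `kmeans` and the sample points are assumptions, and the starting centroids are passed in explicitly (rather than drawn at random) so the run is reproducible.

```python
def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm on a list of (x, y) tuples, starting from the given centroids."""
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:     # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visually separate groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Because the result depends on the starting centroids, practical implementations usually run several random initializations and keep the best one.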


Q15. How does an RNN handle exploding/vanishing gradients?

Ans.

RNN uses techniques like gradient clipping, weight initialization, and LSTM/GRU cells to handle exploding/vanishing gradients.

Gradient clipping limits the magnitude of gradients during backpropagation.
Weight initialization techniques like Xavier initialization help in preventing vanishing gradients.
LSTM/GRU cells have gating mechanisms that allow the network to selectively remember or forget information.
Batch normalization can also help in stabilizing the gradients.
Exploding ...read more
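
Gradient clipping, as mentioned above, can be sketched as rescaling the gradient vector whenever its L2 norm exceeds a threshold. The function name `clip_by_norm` and the example values are illustrative assumptions; deep learning frameworks ship their own clipping utilities.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # already small enough, leave unchanged
    scale = max_norm / norm         # shrink all components by the same factor
    return [g * scale for g in grads]

grads = [3.0, 4.0]                  # L2 norm is 5.0
clipped = clip_by_norm(grads, max_norm=1.0)
print(clipped)
```

The direction of the gradient is preserved; only its length is capped.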


Frequently asked in

TCS

Q16. How did you prevent your model from overfitting? What did you do when it was underfit?

Ans.

To prevent overfitting, I used techniques like regularization, cross-validation, and early stopping. For underfitting, I tried increasing model complexity and adding more features.

Used regularization techniques like L1 and L2 regularization to penalize large weights
Used cross-validation to evaluate model performance on different subsets of data
Used early stopping to prevent the model from continuing to train when performance on validation set stops improving
For underfitting, ...read more


Q17. Explain any supervised and unsupervised algorithm.

Ans.

Supervised algorithms learn from labeled data while unsupervised algorithms learn from unlabeled data.

Supervised algorithms are used for classification and regression tasks.
Examples of supervised algorithms include decision trees, random forests, and support vector machines.
Unsupervised algorithms are used for clustering and dimensionality reduction tasks.
Examples of unsupervised algorithms include k-means clustering, principal component analysis, and autoencoders.


Q18. How to select features?

Ans.

Feature selection involves identifying the most relevant and informative variables for a predictive model.

Start with a large pool of potential features
Use statistical tests or machine learning algorithms to identify the most important features
Consider domain knowledge and expert input
Regularly re-evaluate and update feature selection as needed


Q19. How do you train a CNN model?

Ans.

Training a CNN model involves selecting appropriate architecture, preparing data, setting hyperparameters, and optimizing loss function.

Select appropriate CNN architecture based on the problem at hand
Prepare data by preprocessing, augmenting, and splitting into training, validation, and test sets
Set hyperparameters such as learning rate, batch size, and number of epochs
Optimize loss function using backpropagation and gradient descent
Regularize the model to prevent overfitting...read more


Q20. What is PCA, and where and how is it used?

Ans.

PCA stands for Principal Component Analysis. It is a statistical technique used for dimensionality reduction.

PCA is used to reduce the number of variables in a dataset while retaining the maximum amount of information.
It is commonly used in data preprocessing and exploratory data analysis.
PCA is also used in image processing, speech recognition, and finance.
It works by transforming the original variables into a new set of uncorrelated variables called principal components.
The...read more


Q21. What is the difference between supervised and unsupervised machine learning?

Ans.

Supervised learning uses labeled data to train models, while unsupervised learning uses unlabeled data to find patterns.

Supervised learning requires a target variable to be predicted, while unsupervised learning does not.
Supervised learning covers regression and classification tasks (for example, with decision trees), while unsupervised learning covers clustering and association rule mining.
Supervised learning is used for prediction and classification tasks, while unsupervised learning i...read more


Q22. What is the bias-variance tradeoff? Explain p-values to a non-technical and a technical audience.

Ans.

The bias-variance tradeoff is the balance between underfitting (high bias) and overfitting (high variance). A p-value is the probability of observing results at least as extreme as the data, assuming the null hypothesis is true.

The bias-variance tradeoff is the tradeoff between the model's ability to fit the training data and its ability to generalize to new data.
Overfitting occurs when the model is too complex and fits the training data too closely, resulting in poor performance on new data.
Underfitting occurs when the model is too simple and fails to capture the underlyi...read more


Q23. What is the Bias Variance trade-off and name some models with high bias and low variance?

Ans.

Bias-Variance trade-off is the balance between underfitting and overfitting. High bias, low variance models are simple but can be inaccurate; low bias, high variance models are complex and can overfit.

Bias-Variance trade-off is a fundamental concept in machine learning.
High bias models are simple and have low variance, but are inaccurate.
Low bias models are complex and have high variance, but can overfit the data.
Examples of high bias models are linear regression and decision trees with few nodes.
Examples of lo...read more


Frequently asked in

Flipkart

Q24. What LLM frameworks have you worked with?

Ans.

I have worked with several deep learning frameworks used to build and fine-tune LLMs, including TensorFlow, PyTorch, and Keras.

I have experience with TensorFlow, a popular deep learning framework.
I have also worked with PyTorch, another widely used framework for deep learning.
Keras is another deep learning framework that I have utilized in my projects.


Q25. Please tell me about the machine learning projects you have done

Ans.

I have worked on several machine learning projects, including image recognition and natural language processing.

Developed an image recognition model using convolutional neural networks
Implemented a natural language processing algorithm for sentiment analysis
Collaborated on a recommendation system using collaborative filtering
Applied machine learning techniques to predict customer churn in a telecom company


Q26. What are Transformers? Explain.

Ans.

Transformers are electrical devices that transfer energy between two or more circuits through electromagnetic induction.

Transformers are used to increase or decrease the voltage of an alternating current (AC) signal.
They consist of two or more coils of wire, known as windings, that are wound around a core made of magnetic material.
The primary winding receives the input voltage, while the secondary winding delivers the output voltage.
Step-up transformers increase the voltage, ...read more


Q27. What does KNN do during training?

Ans.

KNN during training stores all the data points and their corresponding labels to use for prediction.

KNN algorithm stores all the training data points and their corresponding labels.
At prediction time, it calculates the distance between the new data point and all the stored data points.
It selects the k-nearest neighbors based on the calculated distance.
It assigns the label of the majority of the k-nearest neighbors to the new data point.


Q28. What is SVM?

Ans.

SVM stands for Support Vector Machine, a supervised learning algorithm used for classification and regression analysis.

SVM is a type of machine learning algorithm that analyzes data for classification and regression analysis.
It works by finding the best possible boundary between different classes of data points.
SVM can be used for both linear and non-linear data.
It is commonly used in image classification, text classification, and bioinformatics.
SVM is known for its ability t...read more


Q29. Where is AML used, and where is it not used?

Ans.

AML is used in financial institutions to prevent money laundering, while it is not commonly used in other industries.

AML is used in financial institutions to detect and prevent money laundering and terrorist financing.
It involves the identification and verification of customers, monitoring of transactions, and reporting of suspicious activities.
AML is not commonly used in other industries, although some may have similar regulations or compliance requirements.
For example, casi...read more


Frequently asked in

BNP Paribas

Q30. How is XGB better than RF?

Ans.

XGB often outperforms RF on structured data because boosting corrects earlier errors and the library adds built-in regularization.

XGB uses gradient boosting, where each new tree is fit to the errors of the previous ones, which often captures complex relationships better than RF's independently trained trees
XGB adds regularization terms to its objective to help prevent overfitting
XGB's implementation is optimized for speed, though training time relative to RF depends on the data and settings
XGB parallelizes the construction of each tree, while RF can train whole trees in parallel
XGB has frequently outperformed RF in machine learning competitions, but it usually needs more careful hyperparameter tuning


Q31. What is the difference between RNN and LSTM?

Ans.

LSTM is a type of RNN that adds a gated memory cell to better capture long-term dependencies.

RNN stands for Recurrent Neural Network, while LSTM stands for Long Short-Term Memory.
LSTM adds a memory cell controlled by input, forget, and output gates to better capture long-term dependencies.
RNN suffers from vanishing/exploding gradient problem, while LSTM helps alleviate this issue.
LSTM is better suited for tasks requiring long-term memory retention, such as language translation or spe...read more


Q32. What are KNN and K-means?

Ans.

KNN is a supervised machine learning algorithm used for classification and regression. K-means is an unsupervised clustering algorithm.

KNN stands for K-Nearest Neighbors and works by finding the K closest data points to a given data point to make predictions.
K-means is a clustering algorithm that partitions data into K clusters based on similarity.
KNN is used for classification tasks, while K-means is used for clustering tasks.
Example: KNN can be used to predict whether a cus...read more


Q33. What is regularization? Why is it used?

Ans.

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function.

Regularization helps to reduce the complexity of a model by discouraging large parameter values.
It prevents overfitting by adding a penalty for complex models, encouraging simpler and more generalizable models.
Common regularization techniques include L1 regularization (Lasso), L2 regularization (Ridge), and Elastic Net regularization.
Regularization can b...read more
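
The penalty term described above can be sketched as follows. The function name `regularized_loss`, the strength `lam`, and the sample numbers are assumptions for illustration; the data loss is passed in as a plain number so the sketch stays independent of any particular model.

```python
def regularized_loss(data_loss, weights, lam=0.1, kind="l2"):
    """Data loss plus a penalty on the weights (lam is the penalty strength)."""
    if kind == "l1":
        penalty = sum(abs(w) for w in weights)   # Lasso-style: sum of absolute values
    else:
        penalty = sum(w * w for w in weights)    # Ridge-style: sum of squares
    return data_loss + lam * penalty

weights = [3.0, -4.0]
print(regularized_loss(0.5, weights, kind="l1"))
print(regularized_loss(0.5, weights, kind="l2"))
```

Larger weights raise the total loss, so minimizing it pushes the model toward smaller parameter values.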


Q34. Explain feature engineering process in ML modelling

Ans.

Feature engineering is the process of selecting and transforming relevant features from raw data to improve model performance.

Identify relevant features based on domain knowledge and data exploration
Transform features to improve their quality and relevance
Create new features by combining or extracting information from existing features
Select the most important features using feature selection techniques
Iterate the process to improve model performance


Frequently asked in

Marsh

Q35. What are RNN and LSTM?

Ans.

RNN stands for Recurrent Neural Network and LSTM stands for Long Short-Term Memory. They are types of neural networks used for sequential data processing.

RNN is a type of neural network that can process sequential data by maintaining a memory of past inputs.
LSTM is a type of RNN that can handle the vanishing gradient problem and can remember long-term dependencies.
LSTM has gates that control the flow of information into and out of the memory cell.
Both RNN and LSTM are commonl...read more


Q36. How do you publish the models and share it?

Ans.

Models are published on a cloud-based platform and shared with stakeholders via access permissions.

Models are uploaded to a cloud-based platform such as BIM 360 or Autodesk Forge.
Access permissions are set for stakeholders to view and collaborate on the models.
Regular updates are made to the models and stakeholders are notified of changes.
Issues and clashes are tracked and resolved through the platform.
Final models are exported in various formats for use in construction and m...read more


Q37. What is the difference between C and gamma in SVM?

Ans.

C is the regularization parameter, while gamma is the kernel coefficient that controls how far the influence of each training example reaches.

C controls the trade-off between a low training error and a wide margin.
A smaller C value creates a wider margin and allows more misclassifications.
Gamma controls the shape of the decision boundary and the influence of each training example.
A smaller gamma value creates a smoother decision boundary while a larger gamma value creates a more complex decision boundary...read more
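
To make the effect of gamma concrete, here is a minimal sketch that assumes the RBF kernel (the source does not name a kernel; RBF is where gamma is most often discussed). The function name `rbf_kernel` and the sample points are illustrative.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (0.0, 0.0), (1.0, 1.0)   # squared distance is 2.0
for gamma in (0.1, 1.0, 10.0):
    # Similarity between the same two points shrinks as gamma grows.
    print(gamma, rbf_kernel(x, y, gamma))
```

With a small gamma the two points still look similar to each other, so each training example influences a wide region; with a large gamma the similarity drops to nearly zero, so influence becomes very local and the boundary can become more complex.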


Q38. Where are 1D CNNs used?

Ans.

1D CNNs are used in signal processing, time series analysis, speech recognition, and natural language processing.

Signal processing: analyzing signals such as audio, EEG, ECG
Time series analysis: forecasting stock prices, weather patterns
Speech recognition: converting spoken language to text
Natural language processing: sentiment analysis, text classification


Q39. How will you get the embeddings of long sentences/paragraphs that transformer models like BERT truncate? How will you go about using BERT for such sentences? Will you use sentence embeddings or word embeddings...

Ans.

To get embeddings of long sentences/paragraphs that BERT would truncate, we can split the input into chunks that fit the model's length limit, embed each chunk, and combine the results with mean/max pooling.

Pooling techniques like mean/max pooling combine the chunk (or token) embeddings into a single embedding for the whole input.
We can also use sliding window approach to get embeddings of overlapping segments of the long input.
For using BERT on such long inputs, we can use sentence embeddings or word embeddings depending on the task.
Models like Longformer and Reformer can handle lon...read more
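
The sliding-window approach mentioned above can be sketched without any model: split the token sequence into overlapping windows, embed each window separately, then mean-pool the window embeddings. The helper names, the `max_len=512` and `stride=256` values, and the integer stand-ins for token ids are assumptions; a real pipeline would also reserve positions for special tokens and call an actual encoder on each window.

```python
def sliding_windows(tokens, max_len=512, stride=256):
    """Split a token list into overlapping windows of at most max_len tokens."""
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):   # this window reached the end
            break
        start += stride
    return windows

def mean_pool(vectors):
    """Average a list of equal-length embedding vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

tokens = list(range(1000))                   # stand-in for 1000 token ids
windows = sliding_windows(tokens)
print([(w[0], w[-1]) for w in windows])      # [(0, 511), (256, 767), (512, 999)]

# Stand-in window embeddings (one 2-dimensional vector per window).
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))   # [2.0, 3.0]
```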


Q40. Explain the KNN algorithm.

Ans.

KNN is a non-parametric algorithm used for classification and regression tasks.

KNN stands for K-Nearest Neighbors.
It works by finding the K closest data points to a given test point.
The class or value of the test point is then determined by the majority class or average value of the K neighbors.
KNN can be used for both classification and regression tasks.
It is a simple and easy-to-understand algorithm, but can be computationally expensive for large datasets.
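
A minimal sketch of the classification case described above, using Euclidean distance and a majority vote. The function name `knn_predict` and the toy data are illustrative assumptions.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all the work happens at prediction time.
    dists = [
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    ]
    dists.sort(key=lambda pair: pair[0])
    k_labels = [label for _, label in dists[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Made-up training data: two groups of three points.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, query=(2, 2)))  # a
print(knn_predict(train_points, train_labels, query=(7, 7)))  # b
```

Every prediction scans the whole training set, which is why the method becomes expensive on large datasets.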


Q41. What is Blue score in Regression

Ans.

Blue score is not a term used in regression analysis.

Blue score is not a standard term in regression analysis
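It is possible that the interviewer meant BLUE (Best Linear Unbiased Estimator, from the Gauss-Markov theorem for linear regression), the BLEU score used to evaluate machine translation, or another metric such as R-squared or mean squared error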
Without further context, it is difficult to provide a more specific answer


Q42. Explain the working of a recommendation system.

Ans.

Recommendation system uses data analysis and machine learning algorithms to suggest items to users based on their preferences.

Collect user data and item data
Analyze data to find patterns and similarities
Use machine learning algorithms to make predictions and suggest items to users
Continuously update and improve the system based on user feedback
Examples: Netflix suggesting movies based on viewing history, Amazon suggesting products based on purchase history


Q43. What is clustering? How does k-means work?

Ans.

Clustering is a technique used to group similar data points together. K-means is a popular clustering algorithm.

Clustering is an unsupervised learning technique
It is used to group similar data points together based on their features
K-means is a popular clustering algorithm that partitions data into k clusters
The algorithm works by randomly selecting k centroids and assigning each data point to the nearest centroid
The centroids are then updated based on the mean of the data po...read more


Frequently asked in

General Mills

Q44. What metrics do you use to evaluate classification models

Ans.

Metrics used to evaluate classification models

Accuracy
Precision
Recall
F1 Score
ROC Curve
Confusion Matrix


Frequently asked in

Urban Company

Q45. Do you know what AI and ML are?

Ans.

AI stands for Artificial Intelligence and ML stands for Machine Learning.

AI is the simulation of human intelligence in machines that are programmed to think and learn like humans.
ML is a subset of AI that involves training algorithms to make predictions or decisions based on data.
AI and ML are used in various industries such as healthcare, finance, and transportation.
Examples of AI and ML include virtual assistants like Siri and Alexa, self-driving cars, and fraud detection s...read more


Frequently asked in

TCS

Q46. What is Naive Bayes in ML?

Ans.

Naive Bayes is a probabilistic algorithm that uses Bayes' theorem to classify data based on prior knowledge.

Naive Bayes assumes that all features are conditionally independent of each other given the class.
It is commonly used for text classification and spam filtering.
There are three types of Naive Bayes classifiers: Gaussian, Multinomial, and Bernoulli.
It is a fast and simple algorithm that works well with high-dimensional datasets.
Naive Bayes can handle missing data and is not affected by irrelevant fea...read more


Q47. What is BERT & Transformers

Ans.

BERT is a pre-trained language model built on the Transformer architecture; both are used for natural language processing tasks such as sentiment analysis, question answering, and language translation.

BERT stands for Bidirectional Encoder Representations from Transformers and is a pre-trained language model developed by Google.
Transformers are a type of neural network architecture that can process sequential data, such as text, by attending to different parts of the input at each step.
BERT and Transformers have been used for a v...read more


Q48. Mention some optimizers and loss functions used in machine learning?

Ans.

Some optimizers and loss functions used in machine learning

Optimizers: Adam, SGD, RMSprop
Loss functions: Mean Squared Error (MSE), Cross Entropy, Hinge Loss


Q49. What do these hyper parameters in the above mentioned algorithms actually mean?

Ans.

Hyperparameters are settings that control the behavior of machine learning algorithms.

Hyperparameters are set before training the model.
They control the learning process and affect the model's performance.
Examples include learning rate, regularization strength, and number of hidden layers.
Optimizing hyperparameters is important for achieving better model accuracy.


Frequently asked in

Walmart

Q50. Is it always important to apply ML algorithms to solve any statistical problem?

Ans.

No, it is not always important to apply ML algorithms to solve any statistical problem.

ML algorithms may not be necessary for simple statistical problems
ML algorithms often require large amounts of data and computing power
ML algorithms may not always provide the most interpretable results
Statistical models may be more appropriate for certain types of data
ML algorithms should be used when they provide a clear advantage over traditional statistical methods


Q51. Have you heard about the Gaussian Mixture Model? Can you explain it with a proper industrial example?

Ans.

Gaussian Mixture Model is a probabilistic model used for clustering and density estimation.

GMM assumes that the data points are generated from a mixture of Gaussian distributions.
It estimates the parameters of these Gaussian distributions to cluster the data points.
An industrial example of GMM is in customer segmentation for targeted marketing.
GMM can also be used in anomaly detection and image segmentation.


Q52. What different splitting criteria are applied in decision trees? Why does random forest work better?

Ans.

Different splitting criteria in decision trees include Gini impurity, entropy, and misclassification error. Random forest works better due to ensemble learning and reducing overfitting.

Splitting criteria in decision trees: Gini impurity, entropy, misclassification error
Random forest works better due to ensemble learning and reducing overfitting
Random forest combines multiple decision trees to improve accuracy and generalization
Random forest introduces randomness in feature se...read more


Frequently asked in

Accenture

Q53. What is the difference between sigmoid and softmax activation function?

Ans.

Sigmoid is used for binary classification while softmax is used for multi-class classification.

Sigmoid function outputs values between 0 and 1, suitable for binary classification tasks.
Softmax function outputs a probability distribution over multiple classes, summing up to 1.
Sigmoid is used in the output layer for binary classification, while softmax is used for multi-class classification.
Softmax is the generalization of the sigmoid function for multiple classes.
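
A short sketch of both functions, including a check of the generalization claim above: with two classes and logits `[z, 0]`, softmax gives the same probability as `sigmoid(z)`. The sample logits are made up.

```python
import math

def sigmoid(z):
    """Map a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Map a list of logits to a probability distribution."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))                      # three probabilities that sum to 1

# With two classes and logits [z, 0], softmax reduces to sigmoid(z).
z = 1.5
print(softmax([z, 0.0])[0], sigmoid(z))
```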


Q54. What are underfitting and overfitting in machine learning models?

Ans.

Underfitting and overfitting are common problems in machine learning models.

Underfitting occurs when a model is too simple and fails to capture the underlying patterns in the data.
Overfitting happens when a model is too complex and learns the noise or random fluctuations in the training data.
Underfitting leads to high bias and low variance, while overfitting leads to low bias and high variance.
To address underfitting, we can increase model complexity, add more informative features, or use...read more


Q55. Why do you think the objective of predictive modeling is minimizing the cost function? How would you define a cost function after all?

Ans.

The objective of predictive modeling is to minimize the cost function as it helps in optimizing the model's performance.

Predictive modeling aims to make accurate predictions by minimizing the cost function.
The cost function quantifies the discrepancy between predicted and actual values.
By minimizing the cost function, the model can improve its ability to make accurate predictions.
The cost function can be defined differently based on the problem at hand.
For example, in a binar...read more
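
As a hedged illustration of how a cost function quantifies the gap between predicted and actual values, here are two common choices: mean squared error for regression and binary cross-entropy (log loss) for binary classification. These are examples chosen for this sketch, and the sample values are made up.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, a common cost for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, a common cost for binary classification."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)       # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

print(mse([3.0, 5.0], [2.0, 7.0]))            # 2.5
print(log_loss([1, 0], [0.9, 0.2]))           # small, because both predictions are good
print(log_loss([1, 0], [0.1, 0.8]))           # larger, because both predictions are bad
```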


Q56. How would you approach the problem of training a model to detect this plastic bottle?

Ans.

I would approach the problem by collecting a dataset of images containing plastic bottles, preprocessing the images, selecting a suitable model architecture, training the model, and evaluating its performance.

Collect a dataset of images containing plastic bottles and label them accordingly
Preprocess the images by resizing, normalizing, and augmenting them to improve model performance
Select a suitable model architecture such as Convolutional Neural Network (CNN) for image clas...read more


Q57. Explain the transformer architecture and positional encodings.

Ans.

Transformer architecture is a neural network architecture used for natural language processing tasks. Positional encoders are used to encode the position of words in a sentence.

Transformer architecture is based on the self-attention mechanism.
It consists of an encoder and a decoder.
Positional encoders are added to the input embeddings to encode the position of words in a sentence.
They are computed using sine and cosine functions of different frequencies.
Positional encoders he...read more
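
The sine and cosine scheme described above can be sketched as follows. The base constant 10000 follows the commonly cited formulation from the original Transformer paper and is an assumption here, since the text only says "different frequencies"; the function name and the tiny sizes are illustrative.

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sin on even dimensions, cos on odd dimensions."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = positional_encoding(num_positions=4, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
print(pe[1])
```

Each row is added to the input embedding at that position, so identical words at different positions get different vectors.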


Q58. How does backpropagation in neural networks work?

Ans.

Backpropagation is the algorithm used to train neural networks in supervised learning: it computes the gradient of the error with respect to each weight so the weights can be adjusted to minimize error.

It involves propagating the error backwards through the network to adjust the weights of the connections between neurons.
The algorithm uses the chain rule of calculus to calculate the gradient of the error with respect to each weight.
The weights are then updated using a learning rate and the calculated gradient.
This process is repeated for multiple iteration...read more
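
The steps above can be sketched on a deliberately tiny network with one hidden tanh unit and one output weight. The network shape, the squared-error loss, the starting weights, and the learning rate are all assumptions chosen to keep the chain rule visible.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)   # hidden activation
    return h, w2 * h        # (hidden value, prediction)

def loss(w1, w2, x, y):
    return 0.5 * (forward(w1, w2, x)[1] - y) ** 2

def gradients(w1, w2, x, y):
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y              # dL/dy_hat for L = 0.5 * (y_hat - y)^2
    d_w2 = d_yhat * h               # chain rule through y_hat = w2 * h
    d_h = d_yhat * w2               # error propagated back to the hidden unit
    d_w1 = d_h * (1 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2

def train(x, y, w1=0.5, w2=0.5, lr=0.1, steps=200):
    for _ in range(steps):
        d_w1, d_w2 = gradients(w1, w2, x, y)
        w1 -= lr * d_w1             # gradient descent update
        w2 -= lr * d_w2
    return w1, w2

w1, w2 = train(x=1.0, y=0.8)
print(loss(0.5, 0.5, 1.0, 0.8), loss(w1, w2, 1.0, 0.8))  # before vs after training
```

Real frameworks do the same bookkeeping automatically for millions of weights.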

Q59. How do you proceed with model building?

Ans.

I proceed with model building by first defining the problem, collecting and cleaning data, selecting appropriate algorithms, training and testing the model, and finally evaluating its performance.

Define the problem and set goals
Collect and clean data
Select appropriate algorithms
Train and test the model
Evaluate the model's performance
Iterate and refine the model as needed

Q60. What are the different types of algorithm methods in machine learning?

Ans.

There are various algorithm methods in machine learning, such as supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning: Algorithms learn from labeled data to make predictions or classifications.
Unsupervised learning: Algorithms learn from unlabeled data to discover patterns or relationships.
Reinforcement learning: Algorithms learn through trial and error to maximize rewards.
Other methods include semi-supervised learning, transfer learning,...read more

Frequently asked in

Cognizant

Q61. What are the different types of machine learning, with examples?

Ans.

There are three types of machine learning: supervised, unsupervised, and reinforcement learning.

Supervised learning involves training a model on labeled data to make predictions on new data. Example: predicting house prices based on features like location, size, etc.
Unsupervised learning involves finding patterns in unlabeled data. Example: clustering customers based on their purchasing behavior.
Reinforcement learning involves training a model to make decisions based on rewar...read more

Q62. What is MLOps and how is it different from machine learning?

Ans.

MLOps is the practice of combining machine learning with operations to deploy, monitor, and manage ML models in production.

MLOps focuses on the entire ML lifecycle from development to deployment and monitoring
It involves automating the process of training, deploying, and managing ML models
MLOps ensures collaboration between data scientists and operations teams for seamless model deployment
It includes version control, testing, and continuous integration/continuous deployment (...read more

Q63. what are the techniques used in ML for CV apart from CV?

Ans.

ML techniques used in computer vision (CV) include the following.

Transfer learning
Object detection
Semantic segmentation
Generative adversarial networks (GANs)
Reinforcement learning
Neural style transfer

Q64. What is YOLO in object detection and how is it efficient?

Ans.

YOLO (You Only Look Once) is a real-time object detection system that predicts bounding boxes and class probabilities with a single neural network.

YOLO treats detection as a single regression problem instead of running a classifier over many candidate regions.
It divides the image into a grid and predicts bounding boxes and class probabilities for each grid cell.
YOLO is efficient because it needs only a single forward pass through the network to make predictions.
It can detect multiple objects in a s...read more

Q65. Difference between loss function and cost function.

Ans.

Loss function measures the error for a single training example, while cost function measures the average error for the entire training set.

The loss is computed per training example, for instance the squared error of one prediction.
The cost aggregates the per-example losses over the training set, usually as an average and sometimes with a regularization term added.
The cost is the quantity the optimizer minimizes during training; in practice the two terms are often used interchangeably.
Examples of loss functions include mean squared error, cross-ent...read more
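
To make the distinction concrete, here is a minimal sketch that uses squared error as the per-example loss and its mean as the cost. The function names and the numbers are illustrative assumptions.

```python
def squared_error(y_true, y_pred):
    """Loss: error for a single training example."""
    return (y_true - y_pred) ** 2

def mse_cost(ys_true, ys_pred):
    """Cost: average of the per-example losses over the whole set."""
    losses = [squared_error(t, p) for t, p in zip(ys_true, ys_pred)]
    return sum(losses) / len(losses)

ys_true = [3.0, 5.0, 2.0]
ys_pred = [2.0, 5.0, 4.0]

per_example = [squared_error(t, p) for t, p in zip(ys_true, ys_pred)]
cost = mse_cost(ys_true, ys_pred)
print(per_example)  # one loss value per example
print(cost)         # one number for the whole set
```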

Q66. What is Random Forest algorithm?

Ans.

Random Forest is an ensemble learning algorithm that builds multiple decision trees and combines their outputs.

Random Forest is a supervised learning algorithm.
It can be used for both classification and regression tasks.
It creates multiple decision trees and combines their outputs to make a final prediction.
Random Forest reduces overfitting and improves accuracy compared to a single decision tree.
It randomly selects a subset of features for each tree to reduce correlation bet...read more

Q67. How do embeddings work?

Ans.

Embeddings represent words or phrases as dense numeric vectors, positioned so that items with similar meaning lie close together in the vector space.

Embeddings are learned through neural networks that analyze large amounts of text data.
They capture semantic and syntactic relationships between words.
They are used in natural language processing tasks such as language translation and sentiment analysis.
Popular embedding models include Word2Vec and GloVe.
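
A small sketch of how embedding vectors are typically compared, using cosine similarity. The three-dimensional vectors below are made up for illustration; real Word2Vec or GloVe vectors are learned from text and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, not taken from any trained model.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

related = cosine_similarity(embeddings["king"], embeddings["queen"])
unrelated = cosine_similarity(embeddings["king"], embeddings["apple"])
print(related > unrelated)  # related words point in more similar directions
```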

Q68. Explain AUC and ROC

Ans.

ROC (Receiver Operating Characteristic) is a curve that shows a classifier's performance across all decision thresholds. AUC (Area Under the Curve) is the area under that curve, a single number that summarizes it.

AUC is a single scalar value that represents the area under the ROC curve.
ROC curve is a plot of the true positive rate against the false positive rate for different threshold values.
AUC ranges from 0 to 1, where a higher value indicates better model performance.
An AUC of 0.5 suggests the model is no b...read more
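
The relationship between the curve and the area can be sketched in plain Python: sweep the threshold to collect (FPR, TPR) points, then integrate with the trapezoidal rule. The helper names and the four-sample data set are assumptions for illustration, and the sketch assumes both classes are present.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, one per distinct threshold, strictest first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a list of ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```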

Frequently asked in

Infosys

Q69. Explain the difference between Precision and Recall.

Ans.

Precision is the ratio of true positives to all predicted positives, while recall is the ratio of true positives to all actual positives.

Precision measures how accurate the positive predictions are, while recall measures how complete the positive predictions are.
Precision is important when the cost of false positives is high, while recall is important when the cost of false negatives is high.
A high precision means that when the model predicts a positive, it is likely to be co...read more
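
Both ratios follow directly from counting true positives, false positives, and false negatives. The sketch below assumes binary labels encoded as 0 and 1; the function name and sample labels are illustrative.

```python
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```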

Q70. What are clustering algorithms?

Ans.

Clustering algorithms are unsupervised machine learning techniques used to group similar data points together.

Clustering algorithms are used to identify patterns in data by grouping similar data points together.
They are unsupervised machine learning techniques, meaning they do not require labeled data.
Common clustering algorithms include k-means, hierarchical clustering, and DBSCAN.
Clustering can be used for customer segmentation, anomaly detection, and image segmentation, am...read more
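
As one concrete instance, k-means can be sketched for one-dimensional data in a few lines. The starting centroids are supplied by hand here, which is an assumption made to keep the example deterministic; practical implementations choose them randomly or with k-means++.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: attach each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

No labels are used anywhere, which is what makes the method unsupervised.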

Q71. Explain CNN models with practical skills

Ans.

CNN models are deep neural networks used for image classification and object recognition.

CNN models use convolutional layers to extract features from images
Pooling layers are used to reduce the spatial dimensions of the feature maps
Fully connected layers are used for classification
Examples of CNN models include VGG, ResNet, and Inception

Q72. ROC and AUC Differences

Ans.

ROC and AUC are performance metrics used in binary classification models.

ROC (Receiver Operating Characteristic) is a curve that plots the true positive rate against the false positive rate at different classification thresholds.
AUC (Area Under the Curve) is the area under the ROC curve and is a measure of the model's ability to distinguish between positive and negative classes.
ROC and AUC are commonly used to evaluate the performance of binary classification models and compa...read more

Q73. Do you know about Event Detection?

Ans.

Event Detection is the process of identifying and extracting meaningful events from data streams.

It involves analyzing data in real-time to detect patterns and anomalies
It is commonly used in fields such as finance, social media, and security
Examples include detecting fraudulent transactions, identifying trending topics on Twitter, and detecting network intrusions

Q74. How can you use GMM in anomaly detection?

Ans.

GMM can be used to model normal behavior and identify anomalies based on low probability density.

GMM can be used to fit a model to the normal behavior of a system or process.
Anomalies can be identified as data points with low probability density under the GMM model.
The number of components in the GMM can be adjusted to balance between overfitting and underfitting.
GMM can be combined with other techniques such as PCA or clustering for better anomaly detection.
Example: Using GM...read more
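
The scoring step can be sketched as below. The mixture parameters and the density threshold are assumed values standing in for a model already fitted on normal data (for example with expectation-maximization); the fitting itself is not shown.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def gmm_density(x, components):
    """components: list of (weight, mean, std) for a 1-D mixture."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

def is_anomaly(x, components, threshold):
    """Flag points whose density under the mixture is below the threshold."""
    return gmm_density(x, components) < threshold

# Assumed parameters, as if learned from normal operating data.
components = [(0.6, 10.0, 1.0), (0.4, 20.0, 2.0)]
threshold = 1e-3  # assumed cut-off; in practice tuned on held-out data

print(is_anomaly(10.5, components, threshold))  # near a mode
print(is_anomaly(35.0, components, threshold))  # far from both modes
```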

Q75. How does a Transformer work?

Ans.

Transformers work on the principle of electromagnetic induction to transfer electrical energy from one circuit to another.

Transformers have two coils of wire, a primary coil and a secondary coil, wrapped around a magnetic core.
When an alternating current flows through the primary coil, it creates a magnetic field that induces a voltage in the secondary coil.
The voltage induced in the secondary coil is proportional to the ratio of the number of turns in the secondary coil to t...read more

Q76. Which types of machines have you handled?

Ans.

I handle various types of machines including forklifts, cranes, and conveyor belts.

Forklifts
Cranes
Conveyor belts

Q77. What is the specialty in the architecture of ResNet?

Ans.

The ResNet architecture specializes in deep residual learning, which makes very deep neural networks easier to train.

ResNet introduces skip connections to help with the vanishing gradient problem in deep neural networks.
It consists of residual blocks where the input is added to the output of one or more layers.
This architecture enables the training of very deep networks (100+ layers) without issues like vanishing gradients.
ResNet won the ImageNet Large Scale Visual Recognitio...read more
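
The skip connection can be sketched on plain Python lists. This is a simplification under stated assumptions: the residual branch here is a single linear layer, whereas real ResNet blocks use convolutions and batch normalization.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, weights, bias):
    """Compute relu(F(x) + x), where F is one linear layer in this sketch."""
    fx = linear(x, weights, bias)
    return relu([f + xi for f, xi in zip(fx, x)])  # skip connection adds the input back

x = [1.0, 2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]

# With F(x) = 0 the block passes a non-negative input through unchanged.
print(residual_block(x, zero_w, zero_b))
```

Because the block only has to learn a correction to the identity, stacking many of them does not have to degrade the signal.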

Q78. How does QDA work, and what is its working principle?

Ans.

QDA is a statistical method used for classification and prediction of data based on its attributes.

QDA stands for Quadratic Discriminant Analysis.
It is a supervised learning algorithm used in machine learning.
It is based on Bayes' theorem and assumes that each class follows a Gaussian distribution with its own covariance matrix, which produces quadratic decision boundaries.
QDA calculates the probability of a data point belonging to a particular class based on its attributes.
It then assigns the data point to the class with the highest probability.
QDA is ...read more
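
For a single feature, the procedure above reduces to fitting one Gaussian per class and picking the class with the highest posterior. The sketch below makes that assumption (one dimension, non-zero variance in every class); the function names and data are illustrative.

```python
import math

def fit_qda_1d(xs, ys):
    """Estimate (prior, mean, variance) separately for each class."""
    params = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[label] = (len(values) / len(xs), mean, var)
    return params

def predict_qda_1d(x, params):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(prior, mean, var):
        return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return max(params, key=lambda label: log_posterior(*params[label]))

xs = [1.0, 2.0, 3.0, 6.0, 8.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]
params = fit_qda_1d(xs, ys)
print(predict_qda_1d(2.5, params), predict_qda_1d(9.0, params))
```

Letting each class keep its own variance is what separates QDA from linear discriminant analysis, which shares one covariance across classes.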

Q79. What is AI and its functionality in machine learning?

Ans.

AI is the simulation of human intelligence in machines that can learn and perform tasks without explicit instructions.

AI enables machines to learn from data and improve their performance over time
Machine learning is a subset of AI that involves training algorithms to make predictions or decisions based on data
AI and machine learning are used in various industries such as healthcare, finance, and transportation
Examples of AI applications include virtual assistants, image recog...read more

Q80. How do you choose the optimum probability threshold from the ROC curve?

Ans.

To choose the optimum probability threshold from the ROC curve, balance sensitivity against specificity.

Choose the threshold that maximizes Youden's J statistic (sensitivity + specificity - 1), which is equivalent to maximizing the sum of sensitivity and specificity.
Consider the cost of false positives and false negatives
Use cross-validation to evaluate the performance of different thresholds
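
A minimal sketch of the Youden's J approach: try each observed score as a threshold and keep the one with the largest sensitivity + specificity - 1. The function name and data are assumptions, and the sketch treats false positives and false negatives as equally costly.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = TPR - FPR; ties keep the lowest threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_thr, best_j = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        j = tp / pos - fp / neg  # same as sensitivity + specificity - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j

y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
thr, j = youden_threshold(y_true, scores)
print(thr, j)
```

When misclassification costs are unequal, the same loop can maximize a cost-weighted objective instead of J.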

Frequently asked in

MasterCard

Q81. Explain your ML project

Ans.

The project involves using machine learning algorithms to analyze and make predictions based on data.

Collecting and cleaning data
Selecting appropriate ML algorithms
Training and testing the model
Evaluating the model's performance
Applying the model to new data
Examples: predicting customer churn, detecting fraud, diagnosing diseases

Frequently asked in

Cisco

Q82. Explain validation sampling in detail.

Ans.

Validation sampling is a process of selecting a subset of data from a larger population to assess the accuracy and reliability of a validation method.

Validation sampling is used to evaluate the performance of a validation process or method.
It involves selecting a representative sample from a larger population.
The sample should be chosen randomly to ensure unbiased results.
The size of the sample should be sufficient to provide reliable conclusions.
Validation sampling can be us...read more

Q83. What are precision and recall?

Ans.

Precision and recall are evaluation metrics used in machine learning to measure the performance of a classification model.

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
Recall is the ratio of correctly predicted positive observations to all observations that are actually in the positive class.
Precision is important when the cost of false positives is high, while recall is important when the cost of false negatives is high.
F1 score...read more

Frequently asked in

Accenture

Q84. What is ML and random forest classifier

Ans.

ML stands for Machine Learning, a subset of AI that uses algorithms to learn from data and make predictions. Random Forest is an ensemble learning method that creates multiple decision trees and combines their predictions.

ML is a subset of AI that uses algorithms to learn from data and make predictions
Random Forest is an ensemble learning method that creates multiple decision trees and combines their predictions
Random Forest is used for classification and regression tasks
Rand...read more

Frequently asked in

Cognizant

Q85. What are the cost function and the error function?

Ans.

Cost function measures the difference between predicted and actual values. Error function measures the average of cost function.

Cost function is used to evaluate the performance of a machine learning model.
It measures the difference between predicted and actual values.
Error function is the average of cost function over the entire dataset.
It is used to optimize the parameters of the model.
Examples of cost functions are mean squared error, mean absolute error, and cross-entropy...read more

Frequently asked in

Urban Company

Q86. Explain the classification algorithms you used in your project?

Ans.

I used multiple classification algorithms in my project.

Decision Tree: Used for creating a tree-like model to make decisions based on features.
Random Forest: Ensemble method using multiple decision trees to improve accuracy.
Logistic Regression: Used to predict binary outcomes based on input variables.
Support Vector Machines: Used for classification by finding the best hyperplane to separate data points.
Naive Bayes: Based on Bayes' theorem, used for probabilistic classificatio...read more

Frequently asked in

American Express

Q87. Explain bias and variance

Ans.

Bias is error from overly simplistic assumptions in the learning algorithm, while variance is error from excessive sensitivity to the particular training set.

Bias is the error introduced by approximating a real-world problem with a model that is too simple, which can lead to underfitting.
Variance is the error introduced when the model fits the noise in the training data, which can lead to overfitting.
High bias and low variance can result in underfitting, while low bias and high variance can result in overfitting.
Finding the right balance betwe...read more

Q88. What is multicollinearity and what are its effects?

Ans.

Multicollinearity is a phenomenon where two or more independent variables in a regression model are highly correlated.

It can lead to unstable and unreliable estimates of regression coefficients.
It can make it difficult to determine the individual effect of each independent variable on the dependent variable.
It can also result in inflated standard errors and p-values, making it difficult to identify statistically significant variables.
It can be detected using methods such as c...read more

Frequently asked in

HDFC Bank

Q89. Difference between GPT and BERT model

Ans.

GPT and BERT are both Transformer-based language models: GPT is a decoder-only model trained to generate text left to right, while BERT is an encoder-only model trained to understand text using context from both directions.

GPT is an autoregressive model that predicts the next token from the preceding tokens.
BERT is trained with masked language modeling, predicting hidden tokens from the context on both sides.
GPT is unidirectional, while BERT is bidirectional.
GPT is better suited to text generation, while BERT is better suited to understanding tasks such as classification and question answering.

Frequently asked in

Fractal Analytics

Q90. Which machine learning models did you use, and why?

Ans.

I have experience with various machine learning models such as linear regression, decision trees, random forests, and neural networks.

Linear regression is used for predicting continuous outcomes.
Decision trees are used for classification and regression tasks.
Random forests are an ensemble method that combines multiple decision trees for improved accuracy.
Neural networks are used for complex pattern recognition and prediction tasks.

Q91. Explain how prediction works

Ans.

Prediction uses data analysis and statistical models to forecast future outcomes.

Prediction involves collecting and analyzing data to identify patterns and trends.
Statistical models are then used to make predictions based on the identified patterns.
Predictions can be made for a wide range of applications, such as weather forecasting, stock market trends, and customer behavior.
Accuracy of predictions can be improved by using machine learning algorithms and incorporating new da...read more

Frequently asked in

Persistent Systems

Q92. What is principal component analysis? When would you use it?

Ans.

Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space.

PCA is used to identify patterns and relationships in data by reducing the number of variables.
It helps in visualizing and interpreting complex data by representing it in a simpler form.
PCA is commonly used in fields like image processing, genetics, finance, and social sciences.
It can be used for feature extraction, noise reduction,...read more

Q93. What are classification metrics?

Ans.

Classification metrics are used to evaluate the performance of a classification model by measuring its accuracy, precision, recall, F1 score, and more.

Classification metrics help in assessing how well a model is performing in terms of predicting the correct class labels.
Common classification metrics include accuracy, precision, recall, F1 score, ROC-AUC, and confusion matrix.
Accuracy measures the overall correctness of the model's predictions, while precision and recall focus...read more

Q94. How do you choose which ML model to use?

Ans.

The choice of ML model depends on the problem, data, and desired outcome.

Consider the problem type: classification, regression, clustering, etc.
Analyze the data: size, quality, features, and target variable.
Evaluate model performance: accuracy, precision, recall, F1-score.
Consider interpretability, scalability, and computational requirements.
Experiment with multiple models: decision trees, SVM, neural networks, etc.
Use cross-validation and hyperparameter tuning for model sele...read more

Frequently asked in

Accenture

Q95. What is the Naive Bayes algorithm?

Ans.

Naive Bayes is a probabilistic algorithm used for classification and prediction based on Bayes' theorem.

It assumes that, given the class, each feature is independent of every other feature (the "naive" assumption).
It calculates the probability of each class based on the input features and selects the class with the highest probability.
It is commonly used in spam filtering, sentiment analysis, and document classification.
It requires a training dataset with labeled examples...read more

Q96. Explain classification models and how they work

Ans.

Classification models are used to predict the category or class of a new observation based on past data.

Classification models assign new data points to a specific category or class based on patterns in the training data.
Common classification algorithms include logistic regression, decision trees, random forests, and support vector machines.
These models are evaluated based on metrics like accuracy, precision, recall, and F1 score.
Example: Predicting whether an email is spam or...read more

Q97. Explain the Transformer

Ans.

A transformer is an electrical device that transfers electrical energy between two or more circuits through electromagnetic induction.

Consists of two coils of wire, known as primary and secondary coils
Primary coil receives electrical energy and creates a magnetic field
Magnetic field induces a voltage in the secondary coil, transferring energy
Used to step up or step down voltage levels in power distribution systems
Commonly used in power substations, electrical appliances, and ...read more

Q98. Explain the confusion matrix

Ans.

A confusion matrix is used to evaluate the performance of a classification model by comparing predicted labels with actual labels.

A confusion matrix is a table that describes the performance of a classification model.
For binary classification it contains four counts: True Positives, True Negatives, False Positives, and False Negatives.
These counts are used to calculate evaluation metrics like accuracy, precision, recall, and F1 score.
For example, in a binary classification problem, a co...read more

Q99. How will you handle imbalanced data?

Ans.

I will use techniques such as oversampling, undersampling, or SMOTE to handle imbalanced data.

Use oversampling to increase the number of instances in the minority class.
Use undersampling to decrease the number of instances in the majority class.
Use Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class.
Evaluate the performance of different techniques using metrics like precision, recall, and F1 score.
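
The simplest of these techniques, random oversampling, can be sketched as follows. This duplicates existing minority samples and is not SMOTE, which would synthesize new points by interpolation; the function name and the fixed seed are assumptions for reproducibility.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))  # draw with replacement from the same class
            out_y.append(label)
    return out_x, out_y

X = [[0], [1], [2], [3], [4], [9]]
y = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Resampling should be applied to the training split only, so that evaluation still reflects the real class distribution.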

Q100. What is so special about MobileNet?

Ans.

MobileNet is a lightweight deep learning model designed for mobile and embedded devices.

MobileNet uses depthwise separable convolutions to reduce the number of parameters and computations.
It has a small memory footprint and can be easily deployed on mobile and embedded devices.
MobileNet has been used for various applications such as image classification, object detection, and semantic segmentation.
It reaches accuracy competitive with much larger networks on benchmarks such as ImageNet while using far fewer parameters and operations.
Mobi...read more
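
The parameter savings from depthwise separable convolutions can be checked with simple arithmetic. The layer sizes below are arbitrary assumptions, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """One k x k filter per (input channel, output channel) pair."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(standard, separable, standard / separable)
```

For a 3 x 3 kernel the reduction approaches a factor of nine as the number of output channels grows.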
