I applied via Approached by Company and was interviewed in Sep 2024. There was 1 interview round.
I applied via Recruitment Consultant and was interviewed in Jul 2024. There were 3 interview rounds.
I applied via Job Portal and was interviewed in Apr 2024. There was 1 interview round.
XGBoost is a powerful machine learning algorithm known for its speed and performance in handling large datasets.
XGBoost stands for eXtreme Gradient Boosting, which is an implementation of gradient boosting machines.
It is widely used in machine learning competitions and is known for its speed and performance.
XGBoost uses a technique called boosting, where multiple weak learners are combined to create a strong learner.
It...
The XGBoost algorithm uses a greedy approach to determine splits, choosing at each node the split that gives the largest gain.
XGBoost evaluates the information gain for each candidate feature split to determine the best one.
The feature with the highest information gain is chosen for the split.
This process is repeated recursively for each node in the tree.
Features can be split based on numerical values or categories.
Example: If a feature like 'age' has the highest information gain, it is chosen for the split at that node.
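A minimal sketch of fitting a boosted model, assuming the xgboost package and its scikit-learn style API; the synthetic data and parameter values are purely illustrative.

# Illustrative sketch only; assumes the xgboost package is installed.
import numpy as np
from xgboost import XGBClassifier

# Synthetic toy data: 200 samples, 4 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Boosting: each new tree is fit to correct the errors of the previous ones
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X, y)

# Importances reflect how much each feature contributed to the chosen splits
print(model.feature_importances_)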
Yes, I have experience working on cloud platforms such as AWS and Google Cloud.
Experience with AWS services like S3, EC2, and Redshift
Familiarity with Google Cloud services like BigQuery and Compute Engine
Utilized cloud platforms for data storage, processing, and analysis
Entropy is a measure of randomness or uncertainty in a dataset, while information gain is the reduction in entropy after splitting a dataset based on a feature.
Entropy is used in decision tree algorithms to determine the best feature to split on.
Information gain measures the effectiveness of a feature in classifying the data.
Higher information gain indicates that a feature is more useful for splitting the data.
Entropy ...
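To make the relationship concrete, here is a small sketch in plain numpy that computes entropy and the information gain of a split; the label arrays are made up for illustration.

import numpy as np

def entropy(labels):
    # Shannon entropy of a label array, in bits
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy after splitting parent into left/right subsets
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])
print(information_gain(parent, left, right))  # 1.0 bit: a perfect split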
Hypothesis testing is a statistical method used to make inferences about a population based on sample data.
Hypothesis testing involves formulating a null hypothesis and an alternative hypothesis.
The null hypothesis is assumed to be true until there is enough evidence to reject it.
Statistical tests are used to determine the likelihood of observing the data if the null hypothesis is true.
The p-value is used to determine whether the evidence is strong enough to reject the null hypothesis.
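A hedged sketch of that workflow, assuming scipy's one-sample t-test; the sample values and the 0.05 significance level are illustrative choices.

import numpy as np
from scipy import stats

sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1])

# H0: the population mean is 5.0; H1: it is not
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

# Reject H0 only if the p-value falls below the chosen significance level
alpha = 0.05
print(p_value, "reject H0" if p_value < alpha else "fail to reject H0")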
Precision and recall are metrics used in evaluating the performance of classification models.
Precision measures the accuracy of positive predictions, while recall measures the ability of the model to find all positive instances.
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Precision is important when false positives are costly, while recall is important when false negatives are costly.
For example, in a spam email detection system, precision matters when flagging legitimate mail as spam is costly, while recall matters when missing actual spam is costly.
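A small sketch using scikit-learn's metrics to confirm the two formulas; the label vectors are invented for illustration.

from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Precision = TP / (TP + FP), Recall = TP / (TP + FN)
print(precision_score(y_true, y_pred))  # 3 / 4 = 0.75
print(recall_score(y_true, y_pred))     # 3 / 4 = 0.75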
Data imbalance refers to unequal distribution of classes in a dataset, where one class has significantly more samples than others.
Data imbalance can lead to biased models that favor the majority class.
It can result in poor performance for minority classes, as the model may struggle to accurately predict them.
Techniques like oversampling, undersampling, and using different evaluation metrics can help address data imbalance.
SMOTE stands for Synthetic Minority Over-sampling Technique, used to balance imbalanced datasets by generating synthetic samples.
SMOTE is commonly used in machine learning to address class imbalance by creating synthetic samples of the minority class.
It works by generating new instances of the minority class by interpolating between existing instances.
SMOTE is particularly useful in scenarios where the minority class is heavily underrepresented.
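A minimal sketch, assuming the imbalanced-learn (imblearn) package; the synthetic dataset and the 90/10 class split are illustrative.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Imbalanced toy dataset: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print(Counter(y))

# SMOTE interpolates between existing minority samples to create new ones
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_res))  # classes are now balanced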
I applied via Approached by Company and was interviewed in May 2024. There were 3 interview rounds.
DSA questions were asked, along with general programming-language questions and questions based on previous experience.
Machine learning, generative AI, and deep learning interview questions, plus two coding problems based on algorithms.
Cosine similarity measures the similarity between two non-zero vectors in an inner product space.
Cosine similarity ranges from -1 to 1, with 1 indicating vectors pointing in the same direction and -1 indicating opposite directions.
It is commonly used in information retrieval, text mining, and recommendation systems.
Formula: cos(theta) = (A . B) / (||A|| * ||B||)
Example: Calculating similarity between two documents based on their word frequencies.
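The formula translates directly into numpy; the document vectors below are made-up word counts.

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (A . B) / (||A|| * ||B||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# e.g. word-count vectors for two short documents
doc1 = np.array([2, 1, 0, 1])
doc2 = np.array([1, 1, 1, 0])
print(cosine_similarity(doc1, doc2))  # values close to 1 mean similar documents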
Recall is the ratio of correctly predicted positive observations to all observations in the actual class, while precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
Recall is about the actual positive instances that were correctly identified by the model.
Precision is about the predicted positive instances and how many of them were actually positive.
Recall = TP / (TP + FN), where TP is true positives and FN is false negatives.
Stop words are common words like 'the', 'is', 'and' that are removed from text data to improve analysis.
Stop words are commonly removed from text data to improve the accuracy of natural language processing tasks.
They are typically removed before tokenization and can be done using libraries like NLTK or spaCy.
Examples of stop words include 'the', 'is', 'and', 'in', 'on', etc.
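A short sketch assuming NLTK and its English stopword corpus (which must be downloaded once); the sentence is invented for illustration.

import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
text = "the model is trained on the data and evaluated on a test set"

# Keep only the words that carry meaning for downstream analysis
tokens = [w for w in text.split() if w not in stop_words]
print(tokens)  # ['model', 'trained', 'data', 'evaluated', 'test', 'set']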
I applied via Naukri.com and was interviewed in Mar 2024. There were 3 interview rounds.
Machine learning algorithms are tools used to analyze data, identify patterns, and make predictions without being explicitly programmed.
Machine learning algorithms can be categorized into supervised, unsupervised, and reinforcement learning.
Examples of machine learning algorithms include linear regression, decision trees, support vector machines, and neural networks.
These algorithms require training data to learn patterns.
Developing a credit risk model involves several steps to assess the likelihood of a borrower defaulting on a loan.
1. Define the problem and objectives of the credit risk model.
2. Gather relevant data such as credit history, income, debt-to-income ratio, etc.
3. Preprocess the data by handling missing values, encoding categorical variables, and scaling features.
4. Select a suitable machine learning algorithm such as logistic regression and train it on the prepared data (a minimal sketch of these steps follows below).
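A hedged sketch of steps 2 to 4 using scikit-learn; the column names, borrower records, and the choice of logistic regression are assumptions made purely for illustration.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical borrower data
df = pd.DataFrame({
    "income": [40000, 85000, 32000, 120000],
    "debt_to_income": [0.45, 0.20, 0.60, 0.15],
    "employment_type": ["salaried", "self-employed", "salaried", "salaried"],
    "default": [1, 0, 1, 0],
})

# Preprocess: scale numeric features, one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "debt_to_income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment_type"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(df.drop(columns="default"), df["default"])

# Predicted probability of default for each borrower
print(model.predict_proba(df.drop(columns="default"))[:, 1])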
AIC and BIC are statistical measures used for model selection in the context of regression analysis.
AIC (Akaike Information Criterion) is used to compare the goodness of fit of different models. It penalizes the model for the number of parameters used.
BIC (Bayesian Information Criterion) is similar to AIC but penalizes more heavily for the number of parameters, making it more suitable for model selection when the focus is on avoiding overly complex models.
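A small sketch assuming statsmodels, whose fitted OLS results expose .aic and .bic attributes; the simulated data and the redundant extra predictor are illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

# Compare a model with one predictor against one with a redundant extra term
X1 = sm.add_constant(x)
X2 = sm.add_constant(np.column_stack([x, rng.normal(size=100)]))
for X in (X1, X2):
    res = sm.OLS(y, X).fit()
    print(res.aic, res.bic)  # lower values indicate the preferred model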
XGBoost is a popular gradient boosting library while LightGBM is a faster and more memory-efficient alternative.
XGBoost is known for its accuracy and performance on structured/tabular data.
LightGBM is faster and more memory-efficient, making it suitable for large datasets.
LightGBM uses a histogram-based, leaf-wise tree growth strategy, whereas XGBoost grows trees level-wise by default.
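A minimal sketch assuming the lightgbm package and its scikit-learn style API; the dataset size and parameter values are illustrative.

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Leaf-wise growth with histogram-based splits; parameters are illustrative
model = LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.05)
model.fit(X, y)
print(model.score(X, y))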
Supervised learning uses labeled data to train a model, while unsupervised learning uses unlabeled data.
Supervised learning requires a target variable for training the model.
Examples of supervised learning include classification and regression.
Unsupervised learning finds patterns and relationships in data without a target variable.
Examples of unsupervised learning include clustering and dimensionality reduction.
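A short sketch contrasting the two settings with scikit-learn; the synthetic blobs, logistic regression, and k-means are illustrative choices.

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised: the labels y are used to fit the model
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: only X is used; the algorithm discovers the groups itself
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(clf.predict(X[:5]), km.labels_[:5])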
Sigmoid function is a mathematical function that maps any real value to a value between 0 and 1.
Used in machine learning for binary classification problems to produce probabilities
Commonly used in logistic regression
Has an S-shaped curve
Equation: f(x) = 1 / (1 + e^(-x))
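The equation translates directly into numpy:

import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067, 0.5, 0.9933]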
I applied via Naukri.com and was interviewed in Aug 2023. There were 3 interview rounds.
Standardization is the process of rescaling the features so that they have the properties of a standard normal distribution with a mean of 0 and a standard deviation of 1.
Standardization helps in comparing different features on a common scale.
It is useful when the features have different units or scales.
Commonly used in machine learning algorithms like support vector machines and k-nearest neighbors.
Example: If one fea...
Normalization is the process of scaling and standardizing data to a common range.
Normalization helps in comparing different features on the same scale.
Common techniques include Min-Max scaling and Z-score normalization.
Example: Scaling age and income variables to a range of 0 to 1.
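A combined sketch of standardization and min-max normalization using scikit-learn's scalers; the age and income values are made up.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Two features on very different scales: age in years, income in rupees
X = np.array([[25, 300000], [32, 550000], [47, 1200000], [52, 900000]])

# Standardization: each column rescaled to mean 0 and standard deviation 1
print(StandardScaler().fit_transform(X))

# Normalization (min-max): each column rescaled to the range [0, 1]
print(MinMaxScaler().fit_transform(X))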
Overfitting occurs when a model learns the training data too well, leading to poor performance on new data. Underfitting occurs when a model is too simple to capture the underlying patterns in the data.
Overfitting: Model is too complex, fits noise in the training data, performs poorly on new data
Underfitting: Model is too simple, fails to capture underlying patterns in the data, performs poorly on both training and new data.
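A hedged sketch illustrating the contrast with polynomial regression in scikit-learn; the noisy sine data and the degrees 1 and 15 are arbitrary illustrative choices.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 60)).reshape(-1, 1)
y = np.sin(2 * X).ravel() + rng.normal(scale=0.2, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Degree 1 tends to underfit this data; degree 15 tends to overfit it
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))  # train vs test R^2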
LLMs, or Large Language Models, are machine learning models trained to predict the next word in a sequence of words.
LLM models are commonly used in natural language processing tasks such as text generation, machine translation, and speech recognition.
They are trained on large amounts of text data to learn the relationships between words and predict the most likely next word in a given context.
Examples include GPT-style models used for chatbots and text generation.
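A minimal sketch, assuming the Hugging Face transformers library and the publicly available gpt2 model; the prompt is invented.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the most likely next token given the context
result = generator("Machine learning interviews often cover", max_new_tokens=20)
print(result[0]["generated_text"])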
Data Analyst | 9 salaries | ₹2.2 L/yr - ₹4.8 L/yr
Data Scientist | 6 salaries | ₹4 L/yr - ₹12.9 L/yr
Front end Developer | 5 salaries | ₹4 L/yr - ₹8 L/yr
Full Stack Developer | 5 salaries | ₹3.6 L/yr - ₹9.5 L/yr
HR Manager | 4 salaries | ₹4.6 L/yr - ₹6.5 L/yr