10+ Sreeleathers Interview Questions and Answers

Question 1

Q1. How will you handle a class-imbalanced dataset to increase the F1 score?

Answer

Handling a class-imbalanced dataset involves techniques such as resampling, switching algorithms, adjusting class weights, and using ensemble methods.

Use resampling techniques like oversampling the minority class or undersampling the majority class, applied to the training split only.
Try algorithms that often cope better with class imbalance, such as Random Forest or XGBoost.
Adjust class weights in the model to give more importance to the minority class.
Utilize ensemble methods like bagging or b...
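
As a rough illustration of the class-weight and oversampling ideas above, here is a minimal standard-library sketch. The function names, the toy data, and the "balanced" weighting formula n_samples / (n_classes * class_count) are assumptions chosen for the example, not part of the original answer.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
labels = [0] * 8 + [1] * 2
rows = [[i] for i in range(10)]

weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
```

The minority class ends up with a larger weight than the majority class, and the oversampled labels are evenly split. In practice the resampling would be applied only after the train/test split, so duplicated rows never leak into the evaluation data.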

Question 2

Asked in

Data Scientist Interview

Q2. Are all the decision trees the same in a random forest?

Answer

No, decision trees in a random forest are different due to the use of bootstrapping and feature randomization.

Each decision tree in a random forest is trained on a different bootstrap sample, that is, rows drawn at random with replacement from the training data.
Each decision tree in a random forest also considers only a random subset of features at each split.
The final prediction in a random forest is made by aggregating the predictions of all individual decision trees: a majority vote for classification, or an average for regression.
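
The three sources of difference described above can be sketched without any ML library. This is only an illustration of the sampling and voting steps; the helper names and sizes are made up for the example, and no actual trees are trained.

```python
import random
from collections import Counter

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a split."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(predictions):
    """Aggregate per-tree class predictions by taking the most common label."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
# Each of the 3 hypothetical trees sees its own bootstrap sample of 10 rows...
tree_samples = [bootstrap_indices(10, rng) for _ in range(3)]
# ...and considers only 2 of 6 features at a given split.
split_features = [random_feature_subset(6, 2, rng) for _ in range(3)]
# The forest's prediction aggregates the individual tree outputs.
vote = majority_vote(["cat", "dog", "cat"])
```

Because every tree gets a different sample and different candidate features, the trees disagree on some inputs, and the vote averages out their individual errors.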

Question 3

Asked in

Data Scientist Interview

Q3. What do you understand by deep learning neural networks?

Answer

Deep learning neural networks are a type of artificial neural network with multiple layers, used for complex pattern recognition.

Deep learning neural networks consist of multiple layers of interconnected nodes, allowing for more complex patterns to be learned.
They are capable of automatically learning features from data, eliminating the need for manual feature engineering.
Examples include Convolutional Neural Networks (CNNs) for image recognition and Recurrent Neural Networks...
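
To make "multiple layers of interconnected nodes" concrete, here is a minimal sketch of a forward pass through two stacked fully connected layers. The weights are arbitrary illustrative numbers, and ReLU is assumed as the hidden-layer activation.

```python
def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Apply each (weights, biases) layer in turn, with ReLU after every hidden layer."""
    activation = inputs
    for index, (weights, biases) in enumerate(layers):
        activation = dense(activation, weights, biases)
        if index < len(layers) - 1:
            activation = relu(activation)
    return activation

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer with 2 units
    ([[1.0, 1.0]], [0.0]),                    # output layer with 1 unit
]
output = forward([2.0, 1.0], layers)
```

Each layer transforms the output of the previous one, which is what lets deeper networks build more complex features out of simpler ones.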

Question 4

Asked in

Data Scientist Interview

Q4. What is bias, and what are overfitting and underfitting?

Answer

Bias is the error due to overly simplistic assumptions in the learning algorithm. Overfitting is when a model is too complex and fits the training data too closely, leading to poor generalization. Underfitting is when a model is too simple to capture the underlying structure of the data.

High bias shows up as high error on both training and test data, because the model's assumptions are too simple.
Overfitting happens when a model is too complex and captures noise in the trainin...
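
A small experiment can make the three cases visible. The sketch below uses made-up data (y = 2x plus alternating noise) and three hand-written models chosen for illustration: a constant predictor (underfits), a nearest-neighbour memorizer (overfits), and a least-squares line.

```python
def mse(model, xs, ys):
    """Mean squared error of model(x) against the targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: y = 2x with alternating +1 / -1 noise. Test data: noise-free midpoints.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [i + 0.5 for i in range(9)]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def constant_model(x):
    """Too simple: ignores x entirely (high bias, underfits)."""
    return mean_y

def memorizer(x):
    """Too flexible: returns the target of the nearest training point (overfits)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def line_model(x):
    """Least-squares line: matches the true structure of the data."""
    return slope * x + intercept

models = {"constant": constant_model, "memorizer": memorizer, "line": line_model}
errors = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
          for name, m in models.items()}
```

The constant model has high error on both splits, the memorizer has zero training error but a clearly worse test error than the line, and the line does reasonably on both.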

Question 5

Asked in

Data Scientist Interview

Q5. What is a neural network? Explain backpropagation. Explain the difference between CNN and RNN. Live coding questions were also asked.

Answer

A neural network is a computational model loosely inspired by the way the human brain works, used for machine learning tasks.

A neural network is a set of connected layers of simple units that learns underlying relationships in data, loosely mimicking how neurons in the brain pass signals to one another.
Backpropagation trains a neural network by applying the chain rule to compute the gradient of the loss with respect to every weight; the weights are then updated to reduce the difference between the predicted output and the actual output.
CNN (Convoluti...
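
Since live coding was part of this round, a hand-written backward pass is a plausible exercise. The sketch below assumes a deliberately tiny network, prediction = w2 * relu(w1 * x) with a squared-error loss; the network shape, values, and learning rate are illustrative choices.

```python
def forward(x, w1, w2):
    """Return the hidden activation and the prediction."""
    hidden = max(0.0, w1 * x)
    return hidden, w2 * hidden

def loss(x, y_true, w1, w2):
    """Squared error of the prediction."""
    return (forward(x, w1, w2)[1] - y_true) ** 2

def backward(x, y_true, w1, w2):
    """Chain rule from the loss back to each weight."""
    hidden, y_pred = forward(x, w1, w2)
    dloss_dpred = 2.0 * (y_pred - y_true)       # d(loss)/d(prediction)
    grad_w2 = dloss_dpred * hidden              # prediction = w2 * hidden
    dloss_dhidden = dloss_dpred * w2
    dhidden_dz = 1.0 if w1 * x > 0 else 0.0     # derivative of ReLU
    grad_w1 = dloss_dhidden * dhidden_dz * x    # z = w1 * x
    return grad_w1, grad_w2

x, y_true = 2.0, 3.0
w1, w2 = 0.5, 0.5
grad_w1, grad_w2 = backward(x, y_true, w1, w2)

# One gradient-descent step with an assumed learning rate of 0.01.
loss_before = loss(x, y_true, w1, w2)
loss_after = loss(x, y_true, w1 - 0.01 * grad_w1, w2 - 0.01 * grad_w2)
```

A common sanity check in interviews is to compare the analytic gradient with a finite-difference estimate such as (loss(w + eps) - loss(w - eps)) / (2 * eps).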

Question 6

Asked in

Data Scientist Interview

Q6. What is the difference between precision and recall?

Answer

Precision is the ratio of correctly predicted positive observations to the total predicted positives, while recall is the ratio of correctly predicted positive observations to all observations that are actually positive.

Precision focuses on the accuracy of positive predictions, while recall focuses on the proportion of actual positives that were correctly identified.
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
High precision means that when the model predicts a positive result...
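
The formulas above translate directly into code. This is a minimal sketch on made-up labels; it also computes F1, the harmonic mean of precision and recall, since Q1 refers to it.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: TP = 2, FP = 1, FN = 2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Here two of the three positive predictions are correct (precision 2/3), but only two of the four actual positives are found (recall 1/2).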

Question 7

Asked in

Data Scientist Interview

Q7. What is ReLU, and what are activation functions?

Answer

ReLU is an activation function used in neural networks to introduce non-linearity.

ReLU stands for Rectified Linear Unit.
It is a simple function that returns the input if it is positive, and 0 otherwise: f(x) = max(0, x).
It is commonly used in deep learning models because it is cheap to compute and less prone to vanishing gradients than sigmoid or tanh.
Other activation functions include sigmoid, tanh, and softmax (typically used at the output layer for multi-class classification).
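
The activation functions named above are short enough to write out. This is a plain-Python sketch for scalars and small lists; subtracting the maximum inside softmax is a standard numerical-stability step added here.

```python
import math

def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the max to avoid overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([1.0, 2.0, 3.0])
```

ReLU, sigmoid, and tanh act on each unit independently, while softmax normalizes across a whole vector of scores.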

Question 8

Asked in

Data Scientist Interview

Q8. What is your expected CTC?

Answer

I am looking for a competitive salary based on industry standards and my experience.

Research industry standards for Data Scientist salaries
Consider your level of experience and skills when determining salary expectations
Be open to negotiation based on the overall compensation package offered

Question 9

Asked in

Data Scientist Interview

Q9. What are stemming and lemmatization?

Answer

Stemming and lemmatization are techniques used in natural language processing to reduce words to their base or root form.

Stemming reduces words to a base form by stripping suffixes with heuristic rules.
Lemmatization reduces words to their dictionary form (lemma) by considering the context and part of speech.
Stemming is faster but may not always produce a valid word, while lemmatization is slower but produces valid words.
Example of stemming: 'running' -> 'run', 'jumps' -> 'j...
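
The contrast can be shown with a toy implementation. The suffix rules and the lookup table below are invented for illustration; real stemmers (such as the Porter stemmer) and real lemmatizers use far larger rule sets and dictionaries.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, checked in this order

def naive_stem(word):
    """Strip the first matching suffix; collapse a doubled final consonant."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "lsz":
                stem = stem[:-1]
            return stem
    return word

# Hypothetical lemma dictionary keyed by (word, part of speech).
LEMMAS = {
    ("studies", "noun"): "study",
    ("running", "verb"): "run",
    ("better", "adj"): "good",
}

def lemmatize(word, pos):
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMAS.get((word, pos), word)

stems = [naive_stem(w) for w in ("running", "jumps", "studies")]
lemmas = [lemmatize("studies", "noun"), lemmatize("better", "adj")]
```

The stem of "studies" comes out as "studi", which is not a valid word, while the lemmatizer maps it to "study" and can also handle irregular forms such as "better" because it relies on a dictionary and the part of speech.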

Question 10

Asked in

Data Scientist Interview

Q10. How do you measure multicollinearity?

Answer

Multicollinearity can be measured using a correlation matrix, the variance inflation factor (VIF), or the eigenvalues of the correlation matrix.

Calculate the correlation matrix to identify highly correlated variables.
Use the variance inflation factor, VIF_j = 1 / (1 - R_j^2), to quantify the extent of multicollinearity; values above roughly 5 to 10 are commonly treated as a warning sign.
Check for eigenvalues of the correlation matrix that are close to zero (equivalently, a large condition number), which indicate multicollinearity.
Consider using dimensionality reduction techniques like principal component analysis (PCA) to address multicollinearity.
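
For the special case of two predictors, the R^2 from regressing one on the other equals the squared Pearson correlation, so the VIF reduces to 1 / (1 - r^2). The sketch below covers only that two-predictor case, with made-up data; with more predictors each R_j^2 comes from a multiple regression instead.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [2.0, 4.0, 6.0, 8.0, 11.0]      # almost exactly 2 * x1
x2_unrelated = [1.0, -2.0, 2.0, -2.0, 1.0]     # uncorrelated with x1

vif_high = vif_two_predictors(x1, x2_collinear)
vif_low = vif_two_predictors(x1, x2_unrelated)
```

The nearly collinear pair produces a VIF far above the usual warning thresholds, while the uncorrelated pair gives the minimum possible value of 1.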

Question 11

Asked in

Data Scientist Interview

Q11. What is homoscedasticity?

Answer

Homoscedasticity refers to the assumption that the variance of errors is constant across all levels of the independent variable.

Homoscedasticity is a key assumption in linear regression analysis.
It indicates that the residuals (errors) have constant variance.
If the residuals exhibit a pattern where the spread of points increases or decreases as the predicted values increase, it violates the assumption of homoscedasticity.
This violation (heteroscedasticity) can lead to inefficient coefficient estimates and biased standard errors...
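
One informal way to check the "spread increases with predicted values" pattern is to compare residual variance between the lower and upper halves of the fitted values, in the spirit of the Goldfeld-Quandt test. The function name and data below are made up, and this is a rough diagnostic rather than a formal test.

```python
from statistics import pvariance

def spread_ratio(fitted, residuals):
    """Residual variance for the upper half of fitted values divided by the lower half."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return pvariance(ordered[half:]) / pvariance(ordered[:half])

fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Constant spread: residuals of the same size everywhere.
constant_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Growing spread: residual size proportional to the fitted value.
growing_residuals = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0]

ratio_constant = spread_ratio(fitted, constant_residuals)
ratio_growing = spread_ratio(fitted, growing_residuals)
```

A ratio near 1 is consistent with homoscedasticity, while a ratio well above (or below) 1 suggests the residual spread changes with the fitted values. Plotting residuals against fitted values is the usual first check.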

Question 12

Asked in

Data Scientist Interview

Q12. Explain the bias-variance trade-off.

Answer

Bias-variance trade off is the balance between underfitting and overfitting in machine learning models.

Bias refers to the error introduced by approximating a real-world problem with an overly simple model, leading to underfitting.
Variance refers to the model's sensitivity to fluctuations in the training data, leading to overfitting.
Finding the right balance between bias and variance is crucial for creating a model that generalizes well to unseen data.
Regularization techniques like Lasso and Ridge regres...
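
The trade-off is usually stated through the decomposition of expected squared error. Assuming the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and writing \hat{f} for a model fitted on a random training set (this notation is introduced here, not taken from the answer above):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible typically lowers the bias term and raises the variance term, so the total error is minimized somewhere in between.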

Question 13

Asked in

Data Scientist Interview

Q13. What are optimizers?

Answer

Optimizers are algorithms used to adjust the parameters of a model to minimize the error between predicted and actual values.

Optimizers are used in machine learning to minimize a model's loss function during training.
They work by iteratively adjusting the weights and biases of a model, typically using gradients of the loss.
Common optimizers include Gradient Descent, Adam, and RMSprop.
The choice of optimizer depends on the type of problem and the characteristics of the data.
Optimizers can help models converge faster and avoid ge...
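
To show what "adjusting the weights" looks like in code, here is a minimal sketch of two update rules on a one-parameter toy loss, (w - 3)^2. Plain gradient descent is from the list above; the momentum variant is added here only for contrast and is not named in the original answer. Learning rates and step counts are illustrative.

```python
def gradient(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=50):
    """Move against the gradient by a fixed fraction each step."""
    for _ in range(steps):
        w -= lr * gradient(w)
    return w

def momentum_descent(w, lr=0.1, beta=0.9, steps=200):
    """Accumulate a velocity so past gradients keep influencing the update."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * gradient(w)
        w += velocity
    return w

w_gd = gradient_descent(0.0)
w_momentum = momentum_descent(0.0)
```

Both runs end close to the minimum at w = 3. Adaptive optimizers such as Adam and RMSprop follow the same loop but additionally scale each step using running statistics of past gradients.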

Question 14

Asked in

Data Scientist Interview

Q14. Explain the Poisson distribution.

Answer

Poisson distribution is a probability distribution that expresses the likelihood of a given number of events occurring in a fixed interval of time or space.

Describes the number of events that occur in a fixed interval of time or space
Events are independent of each other
Average rate of occurrence is constant
Examples: number of emails received in an hour, number of customers arriving at a store in a day
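
The probability of seeing exactly k events when the average rate is λ per interval is P(X = k) = λ^k e^(-λ) / k!. A minimal sketch of that formula, using an assumed rate of 2 emails per hour as the example:

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when the mean number per interval is rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

rate = 2.0  # assumed average of 2 emails per hour
p_zero = poisson_pmf(0, rate)    # probability of no emails in an hour
p_three = poisson_pmf(3, rate)   # probability of exactly three emails

# The probabilities over all k sum to 1, and the mean equals the rate.
total = sum(poisson_pmf(k, rate) for k in range(50))
mean = sum(k * poisson_pmf(k, rate) for k in range(50))
```

A useful property to mention in an interview is that the mean and the variance of a Poisson distribution are both equal to λ.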

Question 15

Asked in

Data Scientist Interview

Q15. Explain pipeline flow.

Answer

Pipeline flow is the process of moving data through a series of interconnected stages or steps in a systematic manner.

Pipeline flow involves the sequential movement of data from one stage to another, with each stage performing a specific task or transformation.
It helps automate and streamline data processing, making it more efficient and scalable.
Examples of pipeline flow include data preprocessing, feature engineering, model training, and model evaluation ...
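
The sequential idea can be sketched as a list of functions applied one after another, each receiving the previous stage's output. The stage names and toy data are invented for the example; libraries such as scikit-learn provide their own pipeline abstractions built on the same principle.

```python
from functools import reduce

def drop_missing(rows):
    """Stage 1: remove rows that contain a missing value."""
    return [row for row in rows if None not in row]

def min_max_scale(rows):
    """Stage 2: rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def run_pipeline(data, steps):
    """Feed the output of each stage into the next one."""
    return reduce(lambda current, step: step(current), steps, data)

raw = [[1, 10], [None, 15], [3, 20]]
clean = run_pipeline(raw, [drop_missing, min_max_scale])
```

Keeping each stage as a separate function makes the flow easy to reorder, test, and rerun identically on new data.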
