Standard Chartered
10+ Rupeek Interview Questions and Answers
Q1. How would you perform outlier analysis (detection and treatment)?
Outlier analysis involves identifying and treating data points that are significantly different from the rest.
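Identify outliers visually with box plots and scatter plots, or statistically with z-scores and the interquartile range (IQR) rule.
Determine the cause of the outlier and decide whether to remove it or keep it in the dataset.
Consider the impact of outliers on the analysis and adjust the model accordingly, for example by using methods that are less sensitive to extreme values.
Use techniques such as winsorization (capping extreme values at chosen percentiles) or data transformation (e.g., a log transform) to treat outliers.
Repeat the analysis after treating the outliers and compare the results.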
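A minimal Python sketch of the detection and treatment steps above, assuming a 1-D numeric NumPy array. It flags points with the box-plot (1.5 × IQR) rule and with z-scores, then treats them by capping at the IQR fences. This is a simple variant of winsorization that caps at fence values rather than percentiles. The function names and sample data are illustrative.

```python
import numpy as np

def iqr_fences(x, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(x, k=1.5):
    """Boolean mask of points outside the IQR fences (box-plot rule)."""
    lower, upper = iqr_fences(x, k)
    return (x < lower) | (x > upper)

def zscore_outliers(x, threshold=3.0):
    """Boolean mask of points whose |z-score| exceeds the threshold."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

def cap_outliers(x, k=1.5):
    """Treat outliers by clipping them to the IQR fences instead of dropping them."""
    lower, upper = iqr_fences(x, k)
    return np.clip(x, lower, upper)

x = np.array([10, 12, 11, 13, 12, 11, 10, 95], dtype=float)
print(iqr_outliers(x))     # only the last value is flagged
print(zscore_outliers(x))  # nothing flagged: 95 inflates the standard deviation
print(cap_outliers(x))     # 95 is capped at the upper fence (14.5)
```

In small samples a single extreme value inflates the standard deviation. With 8 points and np.std's default ddof=0, no |z| can exceed sqrt(7) ≈ 2.65. This is one reason the IQR rule is often preferred.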
Q2. How would you impute missing values when you don't want to use a single value for imputation?
Multiple imputation can be used to impute missing values by creating multiple datasets with imputed values.
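Each dataset fills the gaps with different random draws, so the variation across datasets reflects the uncertainty of the imputation
Fit the analysis model on each imputed dataset and pool the estimates (Rubin's rules) rather than merging the datasets into one
Use predictive models (e.g., regression, k-nearest neighbours, or chained equations/MICE) to generate the imputed values
Evaluate imputation quality by masking some known values, imputing them, and measuring the error with metrics such as mean squared error or R-squared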
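Below is a simplified sketch of multiple imputation in Python with NumPy. It assumes one fully observed predictor x and one partially missing variable y, with NaN marking the missing values. Each of the m datasets fills the gaps with a regression prediction plus random residual noise. The quantity of interest (here the mean of y) is estimated on every dataset, and the estimates are pooled with Rubin's rules. This is "improper" multiple imputation because the regression coefficients are not redrawn for each dataset; MICE implementations handle that more fully. All names and data are hypothetical.

```python
import numpy as np

def stochastic_regression_impute(x, y, rng):
    """Impute missing y from x with a fitted line plus random residual noise."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (slope * x[obs] + intercept), ddof=2)
    y_imp = y.copy()
    miss = ~obs
    y_imp[miss] = slope * x[miss] + intercept + rng.normal(0, resid_sd, miss.sum())
    return y_imp

def multiple_impute_mean(x, y, m=20, seed=0):
    """Create m imputed datasets, estimate mean(y) in each, pool with Rubin's rules."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates, variances = [], []
    for _ in range(m):
        y_imp = stochastic_regression_impute(x, y, rng)
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / n)  # within-imputation variance of the mean
    q_bar = np.mean(estimates)          # pooled point estimate
    w = np.mean(variances)              # average within-imputation variance
    b = np.var(estimates, ddof=1)       # between-imputation variance
    total_var = w + (1 + 1 / m) * b     # Rubin's total variance
    return q_bar, total_var

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 3 * x + 2 + rng.normal(0, 1, 50)
y[::5] = np.nan  # every fifth value is missing
estimate, variance = multiple_impute_mean(x, y, m=20)
print(round(estimate, 2), round(variance, 4))
```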
Q3. How would you perform variable selection before modelling, and how would you handle multicollinearity?
Variable selection can be done using techniques such as a correlation matrix, stepwise regression, and principal component analysis.
Check for correlation between variables using a correlation matrix
Use stepwise regression to select variables based on their statistical significance
Use principal component analysis to replace correlated variables with fewer uncorrelated components (PCA transforms variables rather than selecting them)
Check for multicollinearity using the variance inflation factor (VIF); values above 5 or 10 are common rule-of-thumb warning levels
Consider domain knowledge and business requirements while selecting variables
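The following is a minimal NumPy sketch of the VIF check. It computes VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on all the other features with an intercept. The synthetic data is hypothetical. x2 is built as a noisy copy of x1, so both should show large VIFs, while the independent x3 stays near 1.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the 2-D array X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining features
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        factors.append(1 / (1 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                  # independent
print(vif(np.column_stack([x1, x2, x3])).round(1))
```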
Q4. How would you test model performance of classification models?
Model performance of classification models can be tested using various metrics.
Use the confusion matrix to calculate accuracy, precision, recall, and F1 score.
The ROC curve and AUC evaluate the model's ability to distinguish between positive and negative classes across all classification thresholds.
Cross-validation tests the model's performance on different subsets of the data.
Use lift charts to compare the model's performance against random selection.
Use the KS statistic to measure the separation between the score distributions of the positive and negative classes.
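Here is a dependency-free Python sketch of several of the metrics above. It assumes binary labels (1 = positive) and model scores in [0, 1]. AUC is computed through its probabilistic meaning: the chance that a random positive scores above a random negative. KS is the largest gap between the two classes' cumulative score distributions. The quadratic loops keep the logic visible; production code would sort the scores or use a library such as scikit-learn. Labels and scores are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, scores):
    """Maximum distance between the empirical score CDFs of negatives and positives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    best = 0.0
    for thr in sorted(set(scores)):
        cdf_pos = sum(s <= thr for s in pos) / len(pos)
        cdf_neg = sum(s <= thr for s in neg) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1, 0.7, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard predictions at a 0.5 cut-off
print(threshold_metrics(y_true, y_pred))
print(roc_auc(y_true, scores), ks_statistic(y_true, scores))
```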
Q5. What is Xgboost? How is it different from Random Forest?
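Xgboost is a gradient boosting algorithm used for classification and regression tasks. On tabular data it is often faster to train and more accurate than Random Forest, although this depends on the dataset and on tuning.
Xgboost stands for Extreme Gradient Boosting
It builds decision trees sequentially, with each new tree fitted to correct the errors of the ensemble built so far
It uses parallelized split finding and handles missing values natively, which helps training speed
Xgboost adds regularization (L1/L2 penalties on leaf weights and a penalty per leaf) to control overfitting
Random Forest builds decision trees independently on bootstrap samples with random feature subsets (bagging) and averages their predictions, which mainly reduces variance, whereas boosting mainly reduces bias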
Q6. How would you measure relationship between two features?
The relationship between two numeric features can be measured using a correlation coefficient.
Use Pearson's coefficient for linear relationships and Spearman's rank coefficient for monotonic relationships.
Correlation coefficient ranges from -1 to 1.
A positive correlation indicates a direct relationship between the features.
A negative correlation indicates an inverse relationship between the features.
A correlation coefficient of 0 indicates no linear (Pearson) or monotonic (Spearman) relationship; the features may still be related in a non-linear way.
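This small pure-Python sketch contrasts the Pearson and Spearman coefficients, written out by hand for transparency. The function names are illustrative. The first example shows why a coefficient of 0 does not mean "no relationship". For symmetric x, y = x² is fully determined by x, yet Pearson r is 0. The second example is a monotonic but non-linear relationship, which Spearman scores as 1.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson(x, [v * v for v in x]))  # 0.0 despite a perfect quadratic relationship

a = [1, 2, 3, 4, 5]
b = [v * v for v in a]                 # monotonic but not linear
print(round(pearson(a, b), 3), round(spearman(a, b), 3))
```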
Q7. What is the loss function in Logistic Regression?
The loss function in Logistic Regression measures how far the predicted probabilities are from the actual class labels.
It is used to optimize the model parameters during training.
The standard loss function for logistic regression is binary cross-entropy (log loss), the negative log-likelihood of the observed labels.
The goal is to minimize the loss function, which is equivalent to maximizing the likelihood of the data.
The loss function is calculated using the predicted probabilities and the actual labels.
Other linear classifiers use different losses (for example, hinge loss gives a linear SVM), but with those losses the model is no longer logistic regression.
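This minimal pure-Python version of binary cross-entropy computes L = −(1/N) Σ [y·log(p) + (1 − y)·log(1 − p)], where p is the predicted probability of class 1. Probabilities are clipped by a small eps so that log(0) never occurs; the clipping is an implementation choice, not part of the definition. The toy comparison shows that a confident wrong prediction costs far more than a confident correct one.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean log loss over samples; y in {0, 1}, p = predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_cross_entropy([1], [0.9]))          # confident and correct: about 0.105
print(binary_cross_entropy([1], [0.1]))          # confident and wrong: about 2.303
print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # uninformative: log(2), about 0.693
```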
Q8. What is Logistic Regression and when do we use it?
Logistic Regression is a statistical method used to analyze and model the relationship between a binary dependent variable and one or more independent variables.
It is used when the dependent variable is binary (0 or 1).
It estimates the probability of an event occurring based on the values of the independent variables.
It is commonly used in credit risk analysis to predict the likelihood of default.
It can also be used in marketing to predict the likelihood of a customer making a purchase.
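This minimal NumPy sketch shows how logistic regression turns a linear score into a probability. The sigmoid maps β₀ + β₁x into (0, 1), and the coefficients are found by gradient descent on the mean log loss. The single feature and the labels are hypothetical toy data; think of x as a risk indicator and y = 1 as default. Real projects would typically use a library such as statsmodels or scikit-learn rather than a hand-written optimizer.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on mean log loss; an intercept column is added to X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean log loss
        w -= lr * grad
    return w

def predict_proba(X, w):
    """Estimated P(y = 1) for each row of X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)

# Hypothetical toy data: the classes overlap, so the fitted coefficients stay finite
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
w = fit_logistic(x, y)
print(w.round(2), predict_proba(np.array([0.5, 4.0]), w).round(3))
```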
Q9. What is the learning rate used for in Xgboost?
Learning rate controls the step size at each boosting iteration in Xgboost.
Learning rate (eta) is a hyperparameter that scales the contribution of each tree to the final output.
A smaller learning rate requires more trees to be added to the model, but can lead to better performance.
A larger learning rate can speed up the training process, but may result in overfitting.
Typical values for the learning rate range from 0.01 to 0.2; the Xgboost default is 0.3.
Example: setting a learning rate of 0.1 means that each tree's output is multiplied by 0.1 before it is added to the ensemble's prediction.
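This toy illustration shows what the learning rate does, using plain gradient boosting with depth-1 trees (stumps) and squared-error loss in NumPy. It is not Xgboost's algorithm, which also uses second-order gradient information and regularization. The shrinkage step, which adds learning_rate times each tree's output, plays the same role as eta. The data (a sine curve) and the round counts are arbitrary choices for illustration.

```python
import numpy as np

def fit_stump(x, r):
    """Best single split on x that minimises the squared error of residuals r."""
    best = None
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, learning_rate, n_rounds):
    """Each stump fits the current residuals and is added after scaling by learning_rate."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)    # residuals = negative gradient of squared error
        pred += learning_rate * stump(x)  # shrink this tree's contribution
        stumps.append(stump)
    return base, stumps, pred

def predict(base, stumps, learning_rate, z):
    """Ensemble prediction: base value plus the shrunken sum of all stumps."""
    return base + learning_rate * sum(s(z) for s in stumps)

x = np.linspace(0, 6, 60)
y = np.sin(x)

def train_mse(learning_rate, n_rounds):
    _, _, pred = boost(x, y, learning_rate, n_rounds)
    return float(np.mean((y - pred) ** 2))

for lr, rounds in [(1.0, 10), (0.1, 10), (0.1, 200)]:
    print(lr, rounds, round(train_mse(lr, rounds), 4))
```

With the same 10 rounds, the smaller learning rate leaves a larger training error. Given more rounds, it catches up, which matches the trade-off described above.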
Q10. What is a p value and how is it interpreted?
P value is the probability of obtaining a result as extreme or more extreme than the observed result, assuming the null hypothesis is true.
P value is used in hypothesis testing to determine the significance of a result.
A small p value (conventionally below 0.05) is taken as evidence against the null hypothesis; the smaller it is, the stronger the evidence.
A large p value means the data are consistent with the null hypothesis, but it does not prove that the null hypothesis is true.
A p value should not be used as the sole criterion for accepting or rejecting a hypothesis.
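This minimal pure-Python sketch computes a two-sided p value for a one-sample z-test, assuming the population standard deviation is known. Under a standard normal null, P(|Z| ≥ |z|) = erfc(|z| / √2). The sample numbers are hypothetical.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """z statistic and two-sided p value for H0: population mean == mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return z, two_sided_p_value(z)

# Hypothetical sample: mean 102 from n = 100 observations, sigma = 10, testing mu0 = 100
z, p = one_sample_z_test(102, 100, 10, 100)
print(z, round(p, 4))  # z = 2.0, p is about 0.0455, below the conventional 0.05 cut-off
```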
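Q11. Given 4 coordinates, write a memory-efficient program to check whether they form a square
Program to check if 4 coordinates form a square
Calculate the squared distance between all 6 pairs of points
Check that the 4 smallest distances (the sides) are equal and nonzero
Check that the 2 largest distances (the diagonals) are equal and that each diagonal squared is twice a side squared
Use the Pythagorean theorem to calculate distances; comparing squared distances avoids square roots and floating-point error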
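This memory-efficient Python sketch of the check above keeps only the six squared pairwise distances, so memory use is constant. Comparing squared distances means no square roots or floating-point tolerances are needed for integer coordinates. A square has four equal, nonzero sides and two equal diagonals, with diagonal² = 2 × side².

```python
def dist2(p, q):
    """Squared Euclidean distance between two (x, y) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def is_square(a, b, c, d):
    """True if the four points, given in any order, are the corners of a square."""
    pts = (a, b, c, d)
    ds = sorted(dist2(pts[i], pts[j]) for i in range(4) for j in range(i + 1, 4))
    # ds[0..3]: four equal nonzero sides; ds[4..5]: two equal diagonals = 2 * side^2
    return ds[0] > 0 and ds[0] == ds[3] and ds[4] == ds[5] and ds[4] == 2 * ds[0]

print(is_square((0, 0), (1, 1), (1, 0), (0, 1)))   # True
print(is_square((0, 0), (2, 0), (2, 1), (0, 1)))   # False: rectangle
print(is_square((0, 0), (2, 1), (4, 0), (2, -1)))  # False: rhombus
```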
Q12. What loss function is used in Xgboost?
The loss function used in Xgboost is customizable and can be specified by the user.
Xgboost has built-in objectives for binary classification (binary:logistic), multi-class classification (multi:softmax, multi:softprob), and regression (reg:squarederror), among others.
For binary classification the usual choice is binary:logistic (log loss); for regression the default objective is reg:squarederror (squared error).
Users can specify their own loss by writing a custom objective that returns the gradient and Hessian of the loss, optionally with a custom evaluation function.
The objective function measures the difference between predicted and actual values.
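This sketch shows the gradient/Hessian pair that a custom Xgboost objective has to return, using log loss so it can be compared with the built-in binary:logistic. For a raw margin m (the score before the sigmoid) and label y, with p = sigmoid(m), the gradient is p − y and the Hessian is p(1 − p). The wrapper's (predt, dtrain) shape mirrors the obj callback accepted by xgboost.train. It assumes predt holds raw margins and is not run against the library here.

```python
import numpy as np

def logistic_grad_hess(margin, labels):
    """First and second derivatives of log loss with respect to the raw margin."""
    p = 1.0 / (1.0 + np.exp(-margin))
    grad = p - labels
    hess = p * (1.0 - p)
    return grad, hess

def logistic_obj(predt, dtrain):
    """Custom-objective callback shape: (predictions, DMatrix) -> (grad, hess)."""
    return logistic_grad_hess(predt, dtrain.get_label())

# Possible usage with xgboost installed (not executed here):
# booster = xgboost.train(params, dtrain, num_boost_round=100, obj=logistic_obj)
grad, hess = logistic_grad_hess(np.array([0.0, 2.0]), np.array([1.0, 0.0]))
print(grad, hess)
```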
Q13. What are the parameters in Xgboost?
Xgboost parameters include learning rate (eta), max_depth, subsample, colsample_bytree, and more.
Learning rate controls the step size during training.
Max depth limits the depth of each tree.
Subsample controls the fraction of observations to be randomly sampled for each tree.
colsample_bytree controls the fraction of features to be randomly sampled for each tree.
Other parameters include min_child_weight, gamma (minimum loss reduction required to make a split), and lambda/alpha (L2/L1 regularization on leaf weights).
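Below is an illustrative parameter dictionary that uses Xgboost's native parameter names for the settings listed above. The values are arbitrary starting points for a binary classifier, not recommendations. In practice they are tuned, for example with cross-validation.

```python
# Illustrative values only; tune them for the dataset at hand.
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.05,              # learning rate
    "max_depth": 4,           # maximum depth of each tree
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
    "min_child_weight": 1,    # minimum sum of Hessian weights in a child
    "gamma": 0.0,             # minimum loss reduction needed to make a split
    "lambda": 1.0,            # L2 regularization on leaf weights
    "alpha": 0.0,             # L1 regularization on leaf weights
}
# With xgboost installed: booster = xgboost.train(params, dtrain, num_boost_round=500)
```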
Q14. How would you test variable importance?
Variable importance can be tested using various methods such as permutation importance, drop column importance, and SHAP values.
Permutation importance involves randomly shuffling the values of a variable and measuring the decrease in model performance.
Drop-column importance involves removing a variable, retraining the model without it, and measuring the decrease in model performance.
SHAP values provide a measure of the contribution of each variable to the model output.
Other methods include correlation-based measures.
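This minimal NumPy sketch implements permutation importance as described above. It shuffles one column at a time, keeps the fitted model fixed, and records how much the mean squared error rises. The linear model and the synthetic data are hypothetical; only the first feature drives y. The same loop works with any fitted model's predict function. Drop-column importance differs in that the model is retrained without the feature.

```python
import numpy as np

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each column is shuffled (the model is NOT refit)."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    scores = []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            increases.append(mse(y, predict(Xp)) - baseline)
        scores.append(np.mean(increases))
    return np.array(scores)

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=300)  # only feature 0 matters
w = np.linalg.lstsq(X, y, rcond=None)[0]          # hypothetical fitted linear model
imp = permutation_importance(lambda A: A @ w, X, y)
print(imp.round(3))
```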
Q15. How to calculate EAD & PD?
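EAD (Exposure at Default) is the amount outstanding when the borrower defaults: the current balance for a term loan, or the drawn amount plus a credit conversion factor (CCF) times the undrawn amount for a revolving facility.
Estimate Probability of Default (PD) from historical default data and credit ratings, typically with a model such as logistic regression
Determine Loss Given Default (LGD) from collateral and recovery rates (LGD = 1 − recovery rate)
Combine the three into Expected Loss: EL = PD × LGD × EAD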
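This worked sketch shows how the credit risk quantities fit together, using hypothetical numbers. The facility is revolving, with a limit of 100,000 and 80,000 drawn. It assumes a credit conversion factor (CCF) of 50%, PD of 2%, and a 55% recovery rate (LGD of 45%). EAD = drawn + CCF × undrawn = 90,000, and expected loss EL = PD × LGD × EAD ≈ 810.

```python
def ead_revolving(drawn, limit, ccf):
    """EAD for a revolving facility: drawn balance plus CCF times the undrawn limit."""
    undrawn = max(limit - drawn, 0.0)
    return drawn + ccf * undrawn

def lgd_from_recovery(recovery_rate):
    """LGD as the share of EAD that is not recovered."""
    return 1.0 - recovery_rate

def expected_loss(prob_default, lgd, ead):
    """EL = PD x LGD x EAD."""
    return prob_default * lgd * ead

# Hypothetical inputs; the CCF and recovery rate are assumptions, not regulatory values
ead = ead_revolving(drawn=80_000, limit=100_000, ccf=0.5)  # 90,000
lgd = lgd_from_recovery(0.55)
el = expected_loss(prob_default=0.02, lgd=lgd, ead=ead)
print(ead, round(lgd, 2), round(el, 2))
```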
Q16. Difference between counterparty and credit risk?
Counterparty risk is the risk of default by a party in a financial transaction, while credit risk is the risk of loss due to a borrower's failure to repay a loan.
Counterparty risk arises in financial transactions such as derivatives contracts with banks or brokers, where the exposure to the other party changes with market values.
Credit risk is more general and refers to the risk of loss due to a borrower's failure to repay a loan or meet other financial obligations.
Counterparty risk is typically associated with …