BinaryRoots Interview Questions and Answers
Q1. Which test is used in logistic regression to check the significance of a variable
The Wald test is used in logistic regression to check the significance of an individual variable.
The Wald statistic is the ratio of the estimated coefficient to its standard error.
Under the null hypothesis that the coefficient is zero, this ratio is approximately standard normal, and its square follows a chi-square distribution with one degree of freedom.
A small p-value (for example, below 0.05) indicates that the variable is statistically significant.
For example, in Python, the statsmodels library reports the Wald z-statistic and its p-value for each coefficient in the summary of a logistic regression model.
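To make the calculation explicit, here is a minimal sketch that uses only NumPy rather than statsmodels. It fits a logistic regression by Newton-Raphson on synthetic data and then computes the Wald statistic for each coefficient. The data, the helper names `fit_logistic` and `wald_test`, and the coefficient values are assumptions made for illustration.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson; return coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted probabilities
        w = p * (1.0 - p)                         # per-observation weights
        information = X.T @ (X * w[:, None])      # Fisher information matrix
        score = X.T @ (y - p)                     # gradient of the log-likelihood
        beta = beta + np.linalg.solve(information, score)
    # Standard errors come from the inverse information at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    covariance = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(covariance))

def wald_test(beta, se):
    """Return Wald z, Wald chi-square (1 df), and two-sided p-values."""
    z = beta / se
    chi2 = z ** 2
    # For 1 degree of freedom, P(chi2 > z^2) equals P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return z, chi2, p_values

# Synthetic data (assumed): x1 drives the outcome, x2 is pure noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
true_log_odds = -0.5 + 1.5 * x1
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_log_odds))).astype(float)

beta, se = fit_logistic(X, y)
z, chi2, p_values = wald_test(beta, se)
for name, b, s, zv, pv in zip(["const", "x1", "x2"], beta, se, z, p_values):
    print(f"{name:>5}: coef={b:.3f} se={s:.3f} z={zv:.2f} p={pv:.4f}")
```

The printed z and p columns correspond to the quantities a statsmodels logistic regression summary shows for each coefficient.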
Q2. What is R square and how is it different from Adjusted R square
R square is a statistical measure that represents the proportion of the variance in the dependent variable explained by the independent variables.
R square is a value between 0 and 1, where 0 indicates that the independent variables do not explain any of the variance in the dependent variable, and 1 indicates that they explain all of it.
It is used to evaluate the goodness of fit of a regression model.
Adjusted R square takes into account the number of predictors in the model an...
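The two measures can be compared with a short sketch. It uses the standard formulas R² = 1 - SS_res / SS_tot and adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The synthetic data and helper names below are assumptions for illustration.

```python
import numpy as np

def r_squared(y, y_pred):
    """Proportion of variance in y explained by the predictions."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """R square penalized for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

def fit_predict(features, y):
    """Ordinary least squares with an intercept; returns fitted values."""
    X = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Synthetic data (assumed): one useful predictor plus five irrelevant ones.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
noise_cols = rng.normal(size=(n, 5))
y = 2.0 * x + rng.normal(size=n)

r2_small = r_squared(y, fit_predict(x.reshape(-1, 1), y))
r2_big = r_squared(y, fit_predict(np.column_stack([x, noise_cols]), y))
adj_small = adjusted_r_squared(r2_small, n, 1)
adj_big = adjusted_r_squared(r2_big, n, 6)

print(f"1 predictor : R2={r2_small:.4f} adjusted={adj_small:.4f}")
print(f"6 predictors: R2={r2_big:.4f} adjusted={adj_big:.4f}")
```

Adding the irrelevant columns can never lower R square for a least-squares fit, while adjusted R square only rises when the added predictors improve the fit by more than the penalty.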
Q3. How to check for outliers in a variable, and what treatment should you use for them
Outliers can be detected using methods like box plots, z-scores, and the interquartile range (IQR). Treatment can be removal or transformation.
Use box plots to visualize outliers
Calculate z-scores and remove data points whose absolute z-score is greater than 3
Calculate the IQR and remove data points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
Transform data using log or square root to reduce the impact of outliers
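The z-score and IQR rules above can be sketched as follows. The sample values and function names are assumptions for illustration, and the thresholds (3 and 1.5) are the conventional defaults rather than fixed requirements.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Assumed sample: twenty ordinary values and one extreme value.
data = np.array([10, 11, 12, 13] * 5 + [95], dtype=float)

z_flags = zscore_outliers(data)
iqr_flags = iqr_outliers(data)
print("z-score outliers:", data[z_flags])
print("IQR outliers    :", data[iqr_flags])

# Treatment option 1: remove the flagged points.
cleaned = data[~iqr_flags]
# Treatment option 2: keep all points but compress the scale (requires positive values).
log_transformed = np.log(data)
```

On small samples the z-score rule can miss outliers because the extreme value inflates the standard deviation, which is one reason the IQR rule is often used alongside it.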
Q4. How to check multicollinearity in Logistic regression
Multicollinearity in logistic regression can be checked using correlation matrix and variance inflation factor (VIF).
Calculate the correlation matrix of the independent variables and check for high correlation coefficients.
Calculate the VIF for each independent variable; common rules of thumb treat values greater than 5 (stricter) or 10 (more lenient) as a sign of multicollinearity.
Consider removing one of the highly correlated variables or variables with high VIF to address multicollinearity.
Example: If variables A and B have a correlation coeff...
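A minimal NumPy sketch of both checks is shown below. The VIF of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables. The synthetic columns `a`, `b`, `c` and the function name are assumptions for illustration.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (columns are independent variables)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic predictors (assumed): b is almost a copy of a, c is independent.
rng = np.random.default_rng(2)
n = 300
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
c = rng.normal(size=n)
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)
vifs = variance_inflation_factors(X)
print("correlation(a, b):", round(corr[0, 1], 3))
print("VIFs:", np.round(vifs, 2))
```

Only the independent variables enter the calculation, so the same check applies whether the final model is linear or logistic regression.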
Q5. What are variable reducing techniques
Variable reducing techniques are methods that cut down the number of variables in a dataset by identifying and keeping the most relevant ones.
These techniques aim to retain the variables that contribute most to predicting the outcome.
Some common variable reducing techniques include feature selection, dimensionality reduction, and correlation analysis.
Feature selection methods like backward elimination, forward ...
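As one illustration of dimensionality reduction, the sketch below runs principal component analysis (PCA) through NumPy's SVD and keeps enough components to explain 99% of the variance. PCA is chosen here as an example and is not named in the answer above; the data, the 99% cutoff, and the function name are assumptions.

```python
import numpy as np

def pca_explained_variance(X):
    """Return the explained-variance ratio per component and the component directions."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(X) - 1)
    return variances / variances.sum(), components

# Synthetic data (assumed): the third feature is fully determined by the first two.
rng = np.random.default_rng(3)
n = 200
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
f3 = f1 + f2
X = np.column_stack([f1, f2, f3])

ratio, components = pca_explained_variance(X)
# Smallest number of components whose cumulative ratio reaches 99%.
n_keep = int(np.searchsorted(np.cumsum(ratio), 0.99) + 1)
reduced = (X - X.mean(axis=0)) @ components[:n_keep].T

print("explained variance ratio:", np.round(ratio, 4))
print("components kept:", n_keep, "reduced shape:", reduced.shape)
```

Because the third feature carries no information beyond the first two, two components are enough here, and the dataset shrinks from three columns to two.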
Q6. Difference between bagging and boosting
Bagging and boosting are ensemble methods used in machine learning to improve model performance.
Bagging involves training multiple models independently on different bootstrap samples (random subsets drawn with replacement) of the training data and then combining their predictions through averaging or voting.
Boosting involves training models sequentially on the same dataset, with each subsequent model giving more weight to the samples that were misclassified by the previous models.
Bagging reduces variance and overfitting, while boosting reduces bias a...
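The contrast can be sketched with decision stumps (one-threshold rules) as the base model. Bagging fits each stump on a bootstrap sample and takes a majority vote; the boosting half follows the AdaBoost reweighting scheme, which is one specific boosting algorithm chosen for illustration. The toy data and all function names are assumptions.

```python
import numpy as np

def fit_stump(X, y, weights):
    """Return (weighted error, feature, threshold, polarity) of the best stump; y in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        # np.inf as a threshold allows a constant prediction.
        for thr in np.append(np.unique(X[:, j]), np.inf):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = weights[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def predict_stump(stump, X):
    _, j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def bagging(X, y, n_models, rng):
    """Train each stump independently on a bootstrap sample."""
    n = len(y)
    uniform = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        stumps.append(fit_stump(X[idx], y[idx], uniform))
    return stumps

def predict_bagging(stumps, X):
    votes = sum(predict_stump(s, X) for s in stumps)
    return np.where(votes >= 0, 1, -1)            # majority vote

def adaboost(X, y, n_models):
    """Train stumps sequentially, upweighting misclassified samples each round."""
    weights = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_models):
        stump = fit_stump(X, y, weights)
        err = max(stump[0], 1e-12)
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)   # model weight
        weights = weights * np.exp(-alpha * y * predict_stump(stump, X))
        weights = weights / weights.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict_adaboost(ensemble, X):
    score = sum(alpha * predict_stump(s, X) for alpha, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data (assumed): the positive class is an interval, which no single stump can fit.
x = np.linspace(-3, 3, 200)
X = x.reshape(-1, 1)
y = np.where(np.abs(x) < 1, 1, -1)

rng = np.random.default_rng(4)
bag_acc = np.mean(predict_bagging(bagging(X, y, 5, rng), X) == y)
boost_acc = np.mean(predict_adaboost(adaboost(X, y, 5), X) == y)
print(f"bagging accuracy : {bag_acc:.3f}")
print(f"boosting accuracy: {boost_acc:.3f}")
```

A single stump is a high-bias model on this data, so the example mainly shows the bias-reduction side of boosting; bagging pays off more with low-bias, high-variance base models such as deep trees.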
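Q7. Difference between CHAID and CART
CHAID (Chi-squared Automatic Interaction Detection) and CART (Classification and Regression Trees) are both decision tree algorithms; they differ mainly in how they choose and form splits.
CHAID selects splits using statistical significance tests (chi-square for a categorical target), while CART selects the split that most reduces an impurity measure such as the Gini index, or the variance for a regression target.
CHAID can split a node into more than two branches, while CART always produces binary splits.
CHAID stops growing the tree when no split is statistically significant, while CART typically grows a large tree and then prunes it back.
CHAID works on categorical predictors and bins continuous ones, while CART handles continuous predictors directly by searching for a threshold.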
Q8. Explain the logistic regression process
Logistic regression is a statistical method used to model the relationship between a binary dependent variable and one or more independent variables.
It computes a linear combination of the predictors (the log-odds) and passes it through the logistic (sigmoid) function to obtain the probability that the dependent variable takes a particular value.
The coefficients are typically estimated by maximum likelihood, and the predicted probability is compared with a cutoff such as 0.5 to assign a class.
It is commonly used in machine learning for classification problems, ...
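The prediction step can be sketched in a few lines: the logistic function maps a linear combination of the predictors to a probability, which is then compared with a cutoff. The coefficient values, feature values, and the 0.5 cutoff below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, features):
    """Probability of the positive class for one observation."""
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(log_odds)

# Hypothetical fitted model with two predictors.
intercept = -1.0
coefficients = [0.8, -0.5]
features = [2.0, 1.0]

probability = predict_probability(intercept, coefficients, features)
predicted_class = 1 if probability >= 0.5 else 0
print(f"probability={probability:.4f} class={predicted_class}")
```

Each coefficient is the change in log-odds for a one-unit increase in its predictor, so `math.exp(coefficient)` gives the corresponding odds ratio.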
Q9. Explain Gini coefficient
Gini coefficient measures the inequality among values of a frequency distribution.
Gini coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 represents perfect inequality.
It is commonly used to measure income inequality in a population.
A Gini coefficient of 0.4 or higher is often cited as indicating a high level of inequality, although the threshold is a convention rather than a strict rule.
Gini coefficient can be calculated using the Lorenz curve, which plots the cumulative percentage of the total income against the cumulative p...
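For a finite list of values, the area-based definition reduces to a closed form over the sorted values: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i starting at 1. The sketch below is a minimal implementation under that formula; the function name and sample incomes are assumptions for illustration.

```python
def gini_coefficient(values):
    """Gini coefficient of non-negative values using the sorted-rank formula."""
    sorted_values = sorted(values)
    n = len(sorted_values)
    total = sum(sorted_values)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum: larger values receive larger ranks.
    weighted_sum = sum(rank * v for rank, v in enumerate(sorted_values, start=1))
    return 2.0 * weighted_sum / (n * total) - (n + 1) / n

equal_incomes = [5, 5, 5, 5]        # everyone has the same income
unequal_incomes = [0, 0, 0, 10]     # one person holds everything

print(gini_coefficient(equal_incomes))
print(gini_coefficient(unequal_incomes))
```

With a finite sample the maximum is (n - 1) / n rather than exactly 1, so the second case yields 0.75 for four values and approaches 1 only as n grows.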