Citicorp

3.7

based on 4.9k Reviews

Citicorp Data Scientist Interview Questions and Answers

Updated 19 Apr 2024

8 Interview questions

A Data Scientist was asked

Q. Explain the Gini coefficient.

Ans.

Gini coefficient measures the inequality among values of a frequency distribution.

Gini coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 represents perfect inequality.
It is commonly used to measure income inequality in a population.
A Gini coefficient of 0.4 or higher is considered to be a high level of inequality.
Gini coefficient can be calculated using the Lorenz curve, which plots the cum...
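As a rough illustration of the 0-to-1 range described above, the Gini coefficient can be computed from sorted values with the standard rank formula. This is a minimal sketch; the function name `gini` and the sample values are illustrative assumptions, not part of the original answer.

```python
def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal = gini([10, 10, 10, 10])   # everyone holds the same amount
skewed = gini([0, 0, 0, 40])     # one holder has everything
print(equal, skewed)
```

In a finite sample of n values the maximum is (n - 1) / n rather than exactly 1, so the fully concentrated four-value example reaches 0.75.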

A Data Scientist was asked

Q. Explain the logistic regression process.

Ans.

Logistic regression is a statistical method used to analyze and model the relationship between a binary dependent variable and one or more independent variables.

It is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables.
It uses a logistic function to model the probability of the dependent variable taking a particular value.
It is ...
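The process described above (a logistic function mapping a linear score to a probability, with coefficients fitted to binary outcomes) can be sketched with plain gradient descent on the log-loss. This is a toy illustration under stated assumptions: the data, the learning rate, and the names `fit_logistic`, `hours`, and `passed` are hypothetical, and real work would normally use a library such as statsmodels or scikit-learn.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied vs. pass (1) / fail (0); classes overlap.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)    # predicted pass probability at 1 hour
p_high = sigmoid(w * 4.0 + b)   # predicted pass probability at 4 hours
print(w, b, p_low, p_high)
```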

A Data Scientist was asked

Q. What is the difference between bagging and boosting?

Ans.

Bagging and boosting are ensemble methods used in machine learning to improve model performance.

Bagging involves training multiple models on different subsets of the training data and then combining their predictions through averaging or voting.
Boosting involves iteratively training models on the same dataset, with each subsequent model focusing on the samples that were misclassified by the previous model.
Bagging ...
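The two data-handling mechanisms contrasted above can be sketched side by side: bagging resamples rows with replacement, while boosting reweights rows after each model. The reweighting shown here is the AdaBoost-style update, which is one common boosting scheme chosen for illustration; the helper names are hypothetical.

```python
import math
import random


def bootstrap_sample(rows, rng):
    """Bagging: each model trains on n rows drawn with replacement."""
    return [rng.choice(rows) for _ in rows]


def adaboost_reweight(weights, correct):
    """Boosting (AdaBoost-style): up-weight rows the current model got wrong.

    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this model
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha


rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)

# Four equally weighted rows; the current model misclassifies only the last one.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(sample, new_weights, alpha)
```

After the update, the single misclassified row carries half of the total weight, which is what pushes the next model to focus on it.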

A Data Scientist was asked

Q. How do you check for multicollinearity in Logistic Regression?

Ans.

Multicollinearity in logistic regression can be checked using a correlation matrix and the variance inflation factor (VIF).

Calculate the correlation matrix of the independent variables and check for high correlation coefficients.
Calculate the VIF for each independent variable and check for values greater than 5 or 10.
Consider removing one of the highly correlated variables or variables with high VIF to address multicoll...
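The correlation and VIF checks above can be sketched for the simplest case of two predictors, where VIF reduces to 1 / (1 - r^2). This is a simplification for illustration: with more predictors, each VIF comes from regressing that predictor on all the others. The variable names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: spend tracks income closely, age does not.
income = [30, 42, 55, 61, 78, 90]
spend = [16, 20, 29, 30, 40, 44]
age = [25, 52, 31, 47, 38, 29]

vif_high = vif_two_predictors(income, spend)
vif_low = vif_two_predictors(income, age)
print(vif_high, vif_low)
```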

What people are saying about Citicorp

a senior software engineer

2w (edited)

Which one should I select?

I'm holding offer from Boeing, Citi and Ford which one should I choose? My priority is work life balance and job security at some point. Boeing - 35LPA Team Lead Citi - 33.5 LPA Sr. Software Engineer Ford - 37LPA Team Lead Iris software - 28 LPA WFH - Sr. Software Engineer

A Data Scientist was asked

Q. What are variable reduction techniques?

Ans.

Variable reduction techniques are methods used to identify and select the most relevant variables in a dataset.

They reduce the number of variables in a dataset.
They aim to identify the most important variables that contribute significantly to the outcome.
Some common variable reduction techniques include feature selection, dimensionality reduction, and correlation analysi...

A Data Scientist was asked

Q. What is R-squared, and how does R-squared differ from Adjusted R-squared?

Ans.

R-squared is a statistical measure that represents the proportion of the variance in the dependent variable explained by the independent variables.

R-squared is a value between 0 and 1, where 0 indicates that the independent variables do not explain any of the variance in the dependent variable, and 1 indicates that they explain all of it.
It is used to evaluate the goodness of fit of a regression model.
Adjusted R squ...
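The difference between the two measures is easiest to see numerically. The sketch below uses the standard formulas, R^2 = 1 - SS_res / SS_tot and adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The data values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for the number of predictors k given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


y_true = [3, 5, 7, 9, 11]
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(y_true, y_pred)
adj_one = adjusted_r_squared(r2, n=5, k=1)    # same fit, one predictor
adj_three = adjusted_r_squared(r2, n=5, k=3)  # same fit, three predictors
print(r2, adj_one, adj_three)
```

For the same R-squared, the adjusted value drops as predictors are added, which is why it is preferred when comparing models with different numbers of variables.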

A Data Scientist was asked

Q. Which test is used in logistic regression to check the significance of the variable?

Ans.

The Wald test is used in logistic regression to check the significance of the variable.

The Wald test divides the estimated coefficient by its standard error to obtain a z-statistic.
Under the null hypothesis the z-statistic is approximately standard normal; equivalently, its square follows a chi-square distribution with one degree of freedom.
A small p-value indicates that the variable is significant.
For example, in Python, the summary of a statsmodels logistic regression model reports the z-statistic and p-value for each coefficient.
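The Wald calculation itself is short enough to do by hand. The sketch below computes the z-statistic, its square, and a two-sided p-value from the standard normal using only the standard library. The coefficient and standard-error values are hypothetical, not output from a fitted model.

```python
import math


def wald_test(coef, std_err):
    """Return (z, chi-square statistic, two-sided p-value) for one coefficient."""
    z = coef / std_err
    chi2 = z * z  # compared against chi-square with 1 degree of freedom
    # Two-sided normal tail; identical to the chi-square(1) tail of z^2.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi2, p_value


z_strong, chi2_strong, p_strong = wald_test(coef=0.8, std_err=0.25)
z_weak, chi2_weak, p_weak = wald_test(coef=0.1, std_err=0.25)
print(z_strong, p_strong, z_weak, p_weak)
```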

A Data Scientist was asked

Q. How do you check for outliers in a variable, and what treatment should you use to handle them?

Ans.

Outliers can be detected using statistical methods like box plots, z-score, and IQR. Treatment can be removal or transformation.

Use box plots to visualize outliers.
Calculate z-scores and remove data points whose absolute z-score is greater than 3.
Calculate the IQR and remove data points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
Transform the data using log or square root to reduce the impact of outliers.
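The IQR and z-score rules can be sketched with the standard library's `statistics` module. The function names and the sample data are illustrative assumptions, and the quartiles use the module's default (exclusive) method, which can differ slightly from other tools.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]


data = [12, 13, 14, 15, 15, 16, 17, 18, 95]
by_iqr = iqr_outliers(data)
by_z = zscore_outliers(data)
print(by_iqr, by_z)
```

On this nine-point sample the IQR rule flags 95 while the z-score rule flags nothing, because the extreme value inflates the standard deviation it is measured against. That is one reason the IQR rule is often preferred for small samples.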

Citicorp Data Scientist Interview Experiences

3 interviews found

Data Scientist Interview Questions & Answers

Anonymous

posted on 19 Apr 2024

Interview experience

Good

I appeared for an interview before Apr 2023.

Round 1 - Technical

(1 Question)

Q1. Basic statistics

Round 2 - Technical

(1 Question)

Q1. Project related

Interview Preparation Tips

Interview preparation tips for other job seekers - Do not join Citi. No job security at all. I joined and was let go within 3 months due to their restructuring and budget issues. Very bad management.

Data Scientist Interview Questions & Answers

laltaki

posted on 6 Feb 2024

Interview experience

Average

Difficulty level

Moderate

Process Duration

2-4 weeks

Result

I applied via Company Website and was interviewed before Feb 2023. There was 1 interview round.

Round 1 - Technical

(1 Question)

Q1. ML concepts, regression, regularization, etc.

Data Scientist Interview Questions & Answers

Anonymous

posted on 10 Sep 2020

I applied via Walk-in and was interviewed in Mar 2020. There was 1 interview round.

Interview Questionnaire

10 Questions

Q1. What is R-squared, and how does R-squared differ from Adjusted R-squared?

Ans.

R-squared is a statistical measure that represents the proportion of the variance in the dependent variable explained by the independent variables.

R-squared is a value between 0 and 1, where 0 indicates that the independent variables do not explain any of the variance in the dependent variable, and 1 indicates that they explain all of it.
It is used to evaluate the goodness of fit of a regression model.
Adjusted R square t...

Answered by AI

Q2. Explain what you understand by the terms WOE and IV. What is their importance, and what are their advantages and disadvantages?

Ans.

WOE (Weight of Evidence) and IV (Information Value) are metrics used for feature selection and assessing predictive power in models.

WOE transforms categorical variables into continuous variables, making them more suitable for modeling.
IV quantifies the predictive power of a feature by measuring the separation between the good and bad outcomes.
For example, if a feature has an IV of 0.3, it indicates strong predictive po...
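A small worked computation makes WOE and IV concrete. The sketch below assumes the common convention WOE = ln(share of goods / share of bads) per bin and IV = sum of (share of goods - share of bads) * WOE; some teams use the inverse ratio, which flips the sign of WOE but not IV. The bin labels and counts are hypothetical.

```python
import math


def woe_iv(bins):
    """bins maps label -> (n_good, n_bad); returns ({label: woe}, iv).

    Assumes every bin has at least one good and one bad record;
    empty cells would need smoothing before taking the log.
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for label, (good, bad) in bins.items():
        share_good = good / total_good
        share_bad = bad / total_bad
        woe[label] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[label]
    return woe, iv


# Hypothetical income bands with (non-defaulters, defaulters) counts.
bands = {"low": (100, 60), "mid": (300, 30), "high": (400, 10)}
woe, iv = woe_iv(bands)
print(woe, iv)
```

Bins with negative WOE hold a larger share of bads than goods, and the IV summarizes how well the whole variable separates the two groups.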

Answered by AI

Q3. What are variable reduction techniques?

Ans.

Variable reduction techniques are methods used to identify and select the most relevant variables in a dataset.

They reduce the number of variables in a dataset.
They aim to identify the most important variables that contribute significantly to the outcome.
Some common variable reduction techniques include feature selection, dimensionality reduction, and correlation analysis.
Fea...

Answered by AI

Q4. Which test is used in logistic regression to check the significance of the variable

Ans.

The Wald test is used in logistic regression to check the significance of the variable.

The Wald test divides the estimated coefficient by its standard error to obtain a z-statistic.
Under the null hypothesis the z-statistic is approximately standard normal; equivalently, its square follows a chi-square distribution with one degree of freedom.
A small p-value indicates that the variable is significant.
For example, in Python, the summary of a statsmodels logistic regression model reports the z-statistic and p-value for each coefficient.

Answered by AI

Q5. How do you check for multicollinearity in logistic regression?

Ans.

Multicollinearity in logistic regression can be checked using a correlation matrix and the variance inflation factor (VIF).

Calculate the correlation matrix of the independent variables and check for high correlation coefficients.
Calculate the VIF for each independent variable and check for values greater than 5 or 10.
Consider removing one of the highly correlated variables or variables with high VIF to address multicollinear...

Answered by AI

Q6. Difference between bagging and boosting

Ans.

Bagging and boosting are ensemble methods used in machine learning to improve model performance.

Bagging involves training multiple models on different subsets of the training data and then combining their predictions through averaging or voting.
Boosting involves iteratively training models on the same dataset, with each subsequent model focusing on the samples that were misclassified by the previous model.
Bagging reduc...

Answered by AI

Q7. Explain the logistic regression process

Ans.

Logistic regression is a statistical method used to analyze and model the relationship between a binary dependent variable and one or more independent variables.

It is a type of regression analysis used for predicting the outcome of a categorical dependent variable based on one or more predictor variables.
It uses a logistic function to model the probability of the dependent variable taking a particular value.
It is commo...

Answered by AI

Q8. Explain Gini coefficient

Ans.

Gini coefficient measures the inequality among values of a frequency distribution.

Gini coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 represents perfect inequality.
It is commonly used to measure income inequality in a population.
A Gini coefficient of 0.4 or higher is considered to be a high level of inequality.
Gini coefficient can be calculated using the Lorenz curve, which plots the cumulati...

Answered by AI

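Q9. Difference between chair and cart (as posted; in this modeling context the question was most likely about CHAID and CART, the decision-tree algorithms)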

Ans.

A chair is a piece of furniture used for sitting, while a cart is a vehicle used for transporting goods.

A chair typically has a backrest and armrests, while a cart does not.
A chair is designed for one person to sit on, while a cart can carry multiple items or people.
A chair is usually stationary, while a cart is mobile and can be pushed or pulled.
A chair is commonly found in homes, offices, and public spaces, while a c...

Answered by AI

Q10. How do you check for outliers in a variable, and what treatment should you use to handle them?

Ans.

Outliers can be detected using statistical methods like box plots, z-score, and IQR. Treatment can be removal or transformation.

Use box plots to visualize outliers.
Calculate z-scores and remove data points whose absolute z-score is greater than 3.
Calculate the IQR and remove data points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
Transform the data using log or square root to reduce the impact of outliers.

Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - Explain the concept properly, if not able to explain properly then take a pause and try again with some examples. Be confident.

Interview questions from similar companies

Data Scientist Interview Questions & Answers

HSBC Group

Anonymous

posted on 11 Aug 2022

I applied via Recruitment Consultant and was interviewed before Aug 2021. There was 1 interview round.

Round 1 - Technical

(1 Question)

Q1. Difference between CNN and MLP

Ans.

CNN is used for image recognition while MLP is used for general classification tasks.

CNN uses convolutional layers to extract features from images while MLP uses fully connected layers.
CNN is better suited for tasks that require spatial understanding like object detection while MLP is better for tabular data.
CNN has fewer parameters than MLP due to weight sharing in convolutional layers.
CNN can handle input of varying ...

Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - Brush up on basic statistics. Also prepare at least 2 or 3 ML algorithms for the interview.

Data Scientist Interview Questions & Answers

HSBC Group

Anonymous

posted on 13 Sep 2022

I applied via Approached by Company and was interviewed before Sep 2021. There were 3 interview rounds.

Round 1 - Resume Shortlist

Round 2 - Technical

(1 Question)

Q1. Projects and Data Science concepts

Round 3 - Technical

(1 Question)

Q1. Python and coding skills

Interview Preparation Tips

Interview preparation tips for other job seekers - Be thorough with concepts - ML, stats, NLP

Data Scientist Interview Questions & Answers

American Express

Anonymous

posted on 5 Jul 2024

Interview experience

Good

Difficulty level

Hard

Process Duration

Less than 2 weeks

Result

Selected

I applied via Campus Placement and was interviewed before Jul 2023. There were 3 interview rounds.

Round 1 - Aptitude Test

Medium-difficulty general aptitude questions and technical questions (Big Data, Python, etc.)

Round 2 - Technical

(1 Question)

Q1. ML algorithms (SVM, Random Forest, bagging, boosting, ridge, etc.)

Round 3 - Technical

(1 Question)

Q1. Deep equations and understanding of DL and ML algorithms

Ans.

Understanding deep equations and algorithms in DL and ML is crucial for a data scientist.

Deep learning involves complex neural network architectures like CNNs and RNNs.
Machine learning algorithms include decision trees, SVM, k-means clustering, etc.
Understanding the math behind algorithms helps in optimizing model performance.
Equations like gradient descent, backpropagation, and loss functions are key concepts.
Practica...

Answered by AI

Data Scientist Interview Questions & Answers

Wells Fargo

Anonymous

posted on 6 Oct 2021

Interview Questionnaire

3 Questions

Q1. Mainly resume based. In detail from the project.

Q2. Softmax vs sigmoid

Ans.

Softmax and sigmoid are both activation functions used in neural networks.

Softmax is used for multi-class classification problems, while sigmoid is used for binary classification problems.
Softmax outputs a probability distribution over the classes, while sigmoid outputs a probability for a single class.
Softmax ensures that the sum of the probabilities of all classes is 1, while sigmoid does not.
Softmax is more sensitiv...
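The contrast above can be checked numerically: softmax outputs sum to 1 across classes, while sigmoid scores each input independently. The sketch below also shows that a two-class softmax over the scores (z, 0) gives the same probability as sigmoid(z). The example scores are arbitrary.

```python
import math


def sigmoid(z):
    """Independent probability for a single score."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Probability distribution over classes; subtracts the max for stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]
class_probs = softmax(scores)                    # sums to 1 across classes
independent = [sigmoid(s) for s in scores]       # each in (0, 1), no sum constraint
two_class = softmax([1.5, 0.0])[0]               # matches sigmoid(1.5)
print(class_probs, independent, two_class)
```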

Answered by AI

Q3. Logistic regression (multiclass)

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare the projects mentioned in your resume very well

Data Scientist Interview Questions & Answers

Motilal Oswal Financial Services

Anonymous

posted on 20 Jun 2024

Interview experience

Good

Difficulty level

Easy

Process Duration

2-4 weeks

Result

Selected

I applied via IIM Jobs and was interviewed before Jun 2023. There was 1 interview round.

Round 1 - Technical

(2 Questions)

Q1. SQL basic questions

Q2. Python - pandas, numpy based questions

Interview Preparation Tips

Interview preparation tips for other job seekers - Great place to work

Data Scientist Interview Questions & Answers

Motilal Oswal Financial Services

Ravindra Kumar Pandey

posted on 7 May 2024

Interview experience

Good

Difficulty level

Moderate

Result

No response

I applied via Job Portal and was interviewed in Nov 2023. There was 1 interview round.

Round 1 - One-on-one

(5 Questions)

Q1. What is gradient descent?

Ans.

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent.

Gradient descent is used to find the minimum of a function by taking steps proportional to the negative of the gradient at the current point.
It is commonly used in machine learning to optimize the parameters of a model by minimizing the loss function.
There are different variants of gradie...
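The update rule described above (step against the gradient, scaled by a learning rate) fits in a few lines. This is a minimal one-dimensional sketch on f(x) = (x - 3)^2; the function, starting point, and learning rate are chosen only for illustration.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly move x a small step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```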

Answered by AI

Q2. What is LSTM, and what are its gates?

Ans.

LSTM (Long Short-Term Memory) is a type of recurrent neural network designed to handle long-term dependencies.

LSTM has three gates: input gate, forget gate, and output gate.
Input gate controls the flow of information into the cell state.
Forget gate decides what information to discard from the cell state.
Output gate determines the output based on the cell state.

Answered by AI

Q3. They gave me a link to a dataset and asked me to apply operations on it, e.g., value_counts, checking null values, filling missing values with the mean, etc.

Q4. What is t-test? What is Mean, Median and Mode and where to use these?

Ans.

T-test is a statistical test used to determine if there is a significant difference between the means of two groups.

Mean is the average of a set of numbers, median is the middle value when the numbers are ordered, and mode is the most frequently occurring value.
Mean is sensitive to outliers, median is robust to outliers, and mode is useful for categorical data.
T-test is used to compare means of two groups, mean is used...
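Both halves of the answer can be illustrated with the standard library: the three measures of central tendency on a skewed sample, and a two-sample t-statistic. The t-statistic here is the Welch (unequal-variance) form, chosen for illustration; turning it into a p-value needs the t distribution, which is left out. All data values are made up.

```python
import math
import statistics


def welch_t(a, b):
    """Welch two-sample t-statistic: difference in means over its standard error."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


# One extreme income pulls the mean up but leaves median and mode alone.
incomes = [30, 32, 35, 35, 40, 400]
mean_income = statistics.fmean(incomes)
median_income = statistics.median(incomes)
mode_income = statistics.mode(incomes)

# Two hypothetical groups with clearly different means.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.9, 6.1, 6.0, 5.8, 6.2]
t_stat = welch_t(group_a, group_b)
print(mean_income, median_income, mode_income, t_stat)
```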

Answered by AI

Q5. What is Random Forest?

Ans.

Random Forest is an ensemble learning method used for classification and regression tasks.

Random Forest is a collection of decision trees that are trained on random subsets of the data.
Each tree in the forest makes a prediction, and the final prediction is the average (regression) or majority vote (classification) of all trees.
Random Forest helps reduce overfitting and improve accuracy compared to a single decision tre...

Answered by AI

Interview Preparation Tips

Topics to prepare for Motilal Oswal Financial Services Data Scientist interview:

Machine Learning
Statistics
Pandas

Data Scientist Interview Questions & Answers

American Express

Anonymous

posted on 18 May 2025

Interview experience

Good

Difficulty level

Moderate

Process Duration

Less than 2 weeks

Result

Not Selected

I appeared for an interview in Apr 2025, where I was asked the following questions.

Q1. Tell me about yourself?

Q2. What are the GenAI industry trends? ChatGPT vs DeepSeek high level discussion.

Ans.

GenAI trends focus on advancements in AI models, with ChatGPT and DeepSeek showcasing different applications and capabilities.

Increased adoption of conversational AI in customer service, exemplified by ChatGPT's integration into various platforms.
DeepSeek is a family of open-weight large language models (such as DeepSeek-R1), often discussed as a lower-cost competitor to proprietary models like ChatGPT.
Emergence of hybrid models combining generativ...

Answered by AI

Q3. ML question: Random Forest vs XGBoost

Ans.

Random Forest is an ensemble method using bagging, while XGBoost uses boosting for improved accuracy and speed.

Random Forest builds multiple decision trees and merges them for better accuracy.
XGBoost optimizes the model by sequentially adding trees that correct errors of previous ones.
Random Forest is less prone to overfitting compared to individual decision trees.
XGBoost includes regularization techniques to prevent o...

Answered by AI

Q4. ML question: Classification vs Regression

Ans.

Classification predicts categories, while regression predicts continuous values in machine learning tasks.

Classification: Assigns labels to data points (e.g., spam vs. not spam).
Regression: Predicts numerical values (e.g., house prices based on features).
Classification algorithms include logistic regression, decision trees, and SVM.
Regression algorithms include linear regression, polynomial regression, and regression t...

Answered by AI

Q5. ML question: How does one handle data imbalance in a dataset?

Ans.

Data imbalance can skew model performance; various techniques can help mitigate its effects.

Resampling techniques: Use oversampling (e.g., SMOTE) or undersampling to balance classes.
Use different evaluation metrics: Focus on precision, recall, or F1-score instead of accuracy.
Implement cost-sensitive learning: Assign higher misclassification costs to minority class errors.
Utilize ensemble methods: Techniques like Random...
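The point about evaluation metrics is easy to demonstrate: on a heavily imbalanced dataset, a model that never predicts the minority class still scores high accuracy. The sketch below uses a made-up 5% positive rate and two hypothetical sets of predictions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1] * 5 + [0] * 95                       # 5% positives
always_negative = [0] * 100                       # ignores the minority class
catches_some = [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90

naive = classification_metrics(y_true, always_negative)
better = classification_metrics(y_true, catches_some)
print(naive, better)
```

The always-negative model has the higher accuracy but zero recall, while the second model finds three of the five positives at the cost of a few false alarms.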

Answered by AI

Q6. ML question: When to use accuracy, precision, recall? How to identify the classification model's performance using above three? F score, ROC area under the curve.

Q7. ML question: What are the X and Y axes of the ROC curve?

Ans.

The ROC curve plots true positive rate against false positive rate to evaluate classifier performance.

X-axis: False Positive Rate (FPR) - the ratio of negative instances incorrectly classified as positive.
Y-axis: True Positive Rate (TPR) - the ratio of positive instances correctly classified as positive.
Example: A model with a TPR of 0.9 and FPR of 0.1 indicates high sensitivity but some false alarms.
The ROC curve help...
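The axes described above can be computed directly by sweeping a threshold over the model scores and recording (FPR, TPR) at each step. The sketch below also integrates the curve with the trapezoid rule to get the area under it; the labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), for each score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))  # (x-axis, y-axis)
    return points


def auc(points):
    """Area under the curve via the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points, area)
```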

Answered by AI

Q8. SQL questions: Employee and department table; 1. Salary of employees with salary > avg salary 2. Salary of employees with salary > avg salary of department 3. Salary of top 3 employees with salary > avg sa...
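The first two parts of this question can be tried out with Python's built-in `sqlite3` module. The schema, column names, and rows below are assumptions made for illustration, since the original tables were not shared, and the third part is left out because the question text is cut off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema; the interview's actual tables are not known.
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        salary REAL,
        dept_id INTEGER REFERENCES department(id)
    );
    INSERT INTO department VALUES (1, 'Risk'), (2, 'Analytics');
    INSERT INTO employee VALUES
        (1, 'Asha', 90, 1), (2, 'Ben', 60, 1),
        (3, 'Chen', 50, 2), (4, 'Dev', 40, 2), (5, 'Esha', 30, 2);
    """
)

# 1. Employees earning more than the overall average salary.
above_company_avg = conn.execute(
    """
    SELECT name, salary FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee)
    ORDER BY salary DESC
    """
).fetchall()

# 2. Employees earning more than the average salary of their own department.
above_dept_avg = conn.execute(
    """
    SELECT e.name, e.salary FROM employee AS e
    WHERE e.salary > (SELECT AVG(salary) FROM employee WHERE dept_id = e.dept_id)
    ORDER BY e.salary DESC
    """
).fetchall()

print(above_company_avg, above_dept_avg)
```

The second query uses a correlated subquery; a window function such as `AVG(salary) OVER (PARTITION BY dept_id)` is a common alternative.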

Q9. Any other questions for us?

Ans.

I'm curious about the team's data science projects and how they align with the company's goals and vision.

What are the current data science projects the team is working on?
How does the data science team collaborate with other departments?
Can you share examples of how data-driven decisions have impacted the company?
What tools and technologies does the team primarily use?
How does the company support continuous learning a...

Answered by AI

Interview Preparation Tips

Interview preparation tips for other job seekers - Focus on ML, SQL, and Gen AI trends.

Citicorp Interview FAQs

How many rounds are there in Citicorp Data Scientist interview?

The Citicorp interview process usually has 1-2 rounds, most commonly technical rounds.

How to prepare for Citicorp Data Scientist interview?

Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at Citicorp. The most common topics and skills that interviewers at Citicorp expect are Data Science, Machine Learning, Natural Language Processing, Credit Risk and Data Analytics.

What are the top questions asked in Citicorp Data Scientist interview?

Some of the top questions asked at the Citicorp Data Scientist interview -

Which test is used in logistic regression to check the significance of the vari...read more
What is R square and how R square is different from Adjusted R squ...read more
How to check outliers in a variable, what treatment should you use to remove su...read more

Citicorp Interviews By Designations

Interview Questions for Popular Designations

3.5/5

based on 2 interview experiences

Difficulty level

Moderate 100%

Duration

2-4 weeks 100%

JPMorgan Chase & Co. Interview Questions

3.9

• 797 Interviews

Wells Fargo Interview Questions

3.8

• 619 Interviews

HSBC Group Interview Questions

3.9

• 511 Interviews

American Express Interview Questions

4.1

• 387 Interviews

BNY Interview Questions

3.8

• 366 Interviews

UBS Interview Questions

3.9

• 351 Interviews

Motilal Oswal Financial Services Interview Questions

3.6

• 337 Interviews

Morgan Stanley Interview Questions

3.6

• 308 Interviews

State Street Corporation Interview Questions

3.7

• 248 Interviews

Cholamandalam Investment & Finance Interview Questions

3.9

• 221 Interviews

Citicorp Data Scientist Salary

based on 198 salaries

₹21.1 L/yr - ₹37.8 L/yr

84% more than the average Data Scientist Salary in India

Citicorp Salaries in India

Assistant Vice President 5.2k salaries	₹28 L/yr - ₹45 L/yr
Assistant Manager 3.2k salaries	₹10.6 L/yr - ₹19 L/yr
Officer 3k salaries	₹17.6 L/yr - ₹31.5 L/yr
Vice President 2.8k salaries	₹39.2 L/yr - ₹65 L/yr
Manager 2.3k salaries	₹17.2 L/yr - ₹31 L/yr