GeekBull Consulting Data Scientist Interview Questions, Process, and Tips

Updated 28 May 2024

Top GeekBull Consulting Data Scientist Interview Questions and Answers

  • Q1. You have two different vectors with only a small change in one of the dimensions, but the predictions/output from the model are drastically different for each vector. Can you explain why this can be the case...
  • Q2. Why do optimisers matter? What's their purpose? What do they do in addition to the weight updates that vanilla gradient descent and backpropagation perform?
  • Q3. How are LSTMs better than RNNs? What makes them better? How exactly do LSTMs outperform vanilla RNNs?

GeekBull Consulting Data Scientist Interview Experiences

2 interviews found

I applied via Naukri.com and was interviewed in Jun 2022. There were 2 interview rounds.

Round 1 - Resume Shortlist 
Round 2 - One-on-one 

(12 Questions)

  • Q1. What is correlation (in plain English)?
  • Ans. 

    Correlation is a statistical measure that shows how two variables are related to each other.

    • Correlation measures the strength and direction of the relationship between two variables; the most common measure, Pearson's r, captures only linear relationships.

    • It ranges from -1 to 1, where -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation.

    • Correlation does not imply causation, meaning that just because two variables are correlat...

  • Answered by AI
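
A minimal sketch of Pearson's correlation coefficient, the most common correlation measure, written from scratch. The function name `pearson_r` and the study-hours data are illustrative assumptions, not part of the interview.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)  # the 1/n factors cancel out


hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
r = pearson_r(hours_studied, exam_score)  # close to +1: strong positive linear relationship
```
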
  • Q2. What is multi-collinearity?
  • Ans. 

    Multicollinearity is a phenomenon where two or more independent variables in a regression model are highly correlated.

    • It can lead to unstable and unreliable estimates of regression coefficients.

    • It can also make it difficult to determine the individual effect of each independent variable on the dependent variable.

    • It can be detected using correlation matrices or variance inflation factors (VIF).

    • Solutions include removing...

  • Answered by AI
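
The VIF check mentioned above can be sketched in plain Python: each column is regressed on all the others and VIF_j = 1 / (1 - R_j^2). The helper names and the three small columns are hypothetical; `x2` is roughly twice `x1`, and `x3` was constructed to be uncorrelated with both. In practice a statistics library would replace the hand-written least-squares solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: perfect multicollinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y_i for r, y_i in zip(rows, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((y_i - f) ** 2 for y_i, f in zip(y, fitted))
    ss_tot = sum((y_i - mean_y) ** 2 for y_i in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all the others."""
    vifs = []
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        vifs.append(1.0 / (1.0 - r_squared(col, others)))
    return vifs


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.0, 8.0, 9.9, 12.1]  # roughly 2 * x1
x3 = [6, 6, 3, 3, 6, 6]               # constructed to be uncorrelated with x1 and x2
vifs = variance_inflation_factors([x1, x2, x3])
```

Both `x1` and `x2` get very large VIFs (a common rule of thumb flags values above 5 or 10), while `x3` stays close to 1.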
  • Q3. What are p-values? Explain them in plain English without bringing up machine learning.
  • Ans. 

    A p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis (for example, "there is no effect") is true.

    • P-values range from 0 to 1, with a smaller value indicating stronger evidence against the null hypothesis.

    • By convention, a p-value of 0.05 or less is often considered statistically significant; the threshold is a choice, not a law.

    • P-values are commonly used in hypothesis testing to determine if a result is statistically significant or not.

  • Answered by AI
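
A plain-English example, assuming a one-sided exact binomial test: if a coin is fair (the null hypothesis), how likely is a result at least as extreme as 9 heads in 10 flips? The function name and the coin scenario are illustrative.

```python
from math import comb


def one_sided_binomial_p_value(heads, flips, p_null=0.5):
    """P(X >= heads) for X ~ Binomial(flips, p_null): how often a result this extreme
    would happen if the null hypothesis were true."""
    return sum(
        comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


p = one_sided_binomial_p_value(9, 10)
```

The result is 11/1024, about 0.011, so under the conventional 0.05 threshold this would count as statistically significant evidence that the coin favours heads.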
  • Q4. How are LSTMs better than RNNs? What makes them better? How exactly do LSTMs outperform vanilla RNNs?
  • Ans. 

    LSTMs are better than RNNs due to their ability to handle long-term dependencies.

    • LSTMs have a memory cell that can store information for long periods of time.

    • They have gates that control the flow of information into and out of the cell.

    • This allows them to selectively remember or forget information.

    • Vanilla RNNs suffer from the vanishing gradient problem, which limits their ability to handle long-term dependencies.

    • LSTMs ...

  • Answered by AI
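
To make the gate mechanism concrete, here is a sketch of a single scalar (one-unit) LSTM cell next to a vanilla RNN step, using the standard forget/input/output gate equations. The weights are hypothetical values chosen to show memory retention, not trained parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps each gate to hypothetical (w_x, w_h, bias) values."""
    def pre_activation(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    g = math.tanh(pre_activation("candidate"))  # the candidate new information
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    c = f * c_prev + i * g                      # additive cell-state update
    h = o * math.tanh(c)
    return h, c


def rnn_step(x, h_prev, w_x=0.0, w_h=0.5, b=0.0):
    """One step of a scalar vanilla RNN: the state is squashed through tanh every step."""
    return math.tanh(w_x * x + w_h * h_prev + b)


weights = {
    "forget": (0.0, 0.0, 10.0),    # forget gate close to 1: keep the memory
    "input": (0.0, 0.0, -10.0),    # input gate close to 0: write almost nothing new
    "candidate": (0.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}

h, c = 0.0, 1.0   # the cell starts out holding the value 1.0
h_rnn = 1.0       # the RNN starts out holding the same value
for _ in range(50):  # 50 time steps with zero input
    h, c = lstm_step(0.0, h, c, weights)
    h_rnn = rnn_step(0.0, h_rnn)
```

With the forget gate held near 1, the cell state survives 50 steps almost unchanged because it is updated additively (`c = f * c_prev + i * g`), whereas the vanilla RNN's state is squashed through `tanh` at every step and fades towards zero. The same additive path is what helps gradients flow over long sequences.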
  • Q5. Does pooling in CNNs have any learning?
  • Ans. 

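    Standard pooling layers (max pooling, average pooling) have no learnable parameters; they apply a fixed operation that reduces spatial resolution.

    • Learning happens in the layers around the pooling layer; during backpropagation gradients pass through pooling (for max pooling, only to the position of each maximum), but pooling itself has no weights to update.

    • Pooling can help reduce overfitting and computation by summarizing the features in each region.

    • Max pooling retains the strongest activation in a region, while average pooling takes the average.

    • Pooling reduces the spatial resolution of the feature maps.

    • Pooling provides a degree of invariance to small translations of the input.

    • However, too much pooling can lead to loss of important information.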

  • Answered by AI
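
A short sketch showing why standard pooling has nothing to learn: it is a fixed reduction (max or mean) over each window. The 4x4 feature map is made up, and a 2x2 window with stride 2 is assumed.

```python
def pool_2d(feature_map, reduce, size=2, stride=2):
    """Slide a size x size window over a 2-D list and reduce each window to one number.

    Nothing here is learned: the output depends only on the input and the fixed
    reduce function, which is why pooling layers have no trainable parameters.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]
max_pooled = pool_2d(feature_map, max)   # [[6, 2], [2, 9]]
avg_pooled = pool_2d(feature_map, mean)  # [[3.5, 1.0], [1.0, 7.25]]
```
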
  • Q6. Why do optimisers matter? What's their purpose? What do they do in addition to the weight updates that vanilla gradient descent and backpropagation perform?
  • Ans. 

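    Optimizers are used to improve the efficiency and accuracy of the training process in machine learning models.

    • Backpropagation only computes the gradients of the loss; the optimizer decides how to turn those gradients into weight updates (step size, momentum, per-parameter scaling).

    • Optimizers help in finding a good set of weights for a given model by minimizing the loss function.

    • They use various techniques like momentum, learning rate decay, and adaptive learning rates to speed up the training process.

    • Methods such as momentum can also reduce the risk of getting stuck in poor local minima or saddle points (without guaranteeing it), and help in ge...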

  • Answered by AI
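
A toy comparison, assuming a one-dimensional loss f(w) = (w - 3)^2 and commonly used default hyperparameters. Backpropagation would supply `grad(w)`; the three functions differ only in how they turn that gradient into a weight update. This is a sketch, not a benchmark: on such a well-conditioned problem plain gradient descent is already fast, and the benefits of momentum and Adam's per-parameter step sizes usually show up on noisy or badly scaled losses.

```python
import math


def grad(w):
    """Derivative of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


def gradient_descent(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # step straight down the gradient
    return w


def momentum_descent(w, lr=0.1, beta=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)  # accumulate past gradients
        w -= lr * velocity
    return w


def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step size adapted per parameter
    return w
```
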
  • Q7. What does KNN do during training?
  • Ans. 

    KNN does no real training: it simply stores all the data points and their labels (it is a "lazy learner"), and the work happens at prediction time.

    • Training: store all the training data points and their corresponding labels.

    • Prediction: calculate the distance between the new data point and all the stored data points.

    • Select the k nearest neighbors based on the calculated distance.

    • For classification, assign the majority label of the k nearest neighbors to the new data point (for regression, average their values).

  • Answered by AI
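
A minimal KNN classifier sketch (Euclidean distance, majority vote) illustrating that `fit` only stores the data, while all the distance computations happen in `predict`. The class name and toy points are illustrative assumptions.

```python
import math
from collections import Counter


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorises the data: KNN is a lazy learner.
        self.X = [tuple(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All the real work happens here, at prediction time.
        by_distance = sorted(zip(self.X, self.y), key=lambda pair: math.dist(x, pair[0]))
        k_labels = [label for _, label in by_distance[: self.k]]
        return Counter(k_labels).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_train = ["a", "a", "a", "b", "b", "b"]
model = KNNClassifier(k=3).fit(X_train, y_train)
predictions = model.predict([(1.5, 1.5), (6.5, 6.5)])  # ['a', 'b']
```
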
  • Q8. You have two different vectors with only a small change in one of the dimensions, but the predictions/output from the model are drastically different for each vector. Can you explain why this can be the case...
  • Ans. 

    A small change in one dimension can produce a drastically different output when the model is highly sensitive to that input.

    • This is known as sensitivity to the input (a sign of model instability)

    • It can be caused by strong non-linearities in the model or by overfitting, for example a very large learned weight on that dimension

    • Regularization techniques can be used to reduce sensitivity

    • Cross-validation can help identify overfitting

    • Ensemble methods can help reduce sensitivity

    • It is usually undesirable because it indicates instability, unless the input genuinely lies close to a decision boundary

  • Answered by AI
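
A small sketch of one cause listed above, assuming a logistic model: the change in the model's score is `weight * change_in_input`, so a very large weight on one dimension turns a tiny input change into a flipped prediction. The weights and vectors are hypothetical.

```python
import math


def predict_proba(x, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


x_a = [1.0, 0.0]
x_b = [1.0, 0.1]  # identical except for a 0.1 change in the second dimension

large_weights = [0.5, 50.0]  # hypothetical: a huge weight on dimension 2
small_weights = [0.5, 2.0]   # hypothetical: the kind of smaller weight regularization encourages
bias = -3.0

p_a = predict_proba(x_a, large_weights, bias)
p_b = predict_proba(x_b, large_weights, bias)
q_a = predict_proba(x_a, small_weights, bias)
q_b = predict_proba(x_b, small_weights, bias)
```

With the large weight, the predicted probability jumps from about 0.08 to about 0.92; with the smaller weight the two vectors get nearly the same output.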
  • Q9. Slope vs gradient (again not in relation to machine learning, and in plain English)
  • Ans. 

    Slope and gradient both describe steepness, but the slope of a line is a single number, while the gradient of a function of several variables is a vector.

    • Slope is the ratio of the change in y to the change in x along a line ("rise over run").

    • Gradient is the vector of partial derivatives: the rate of change of a function with respect to each of its variables.

    • Slope is a scalar value, while gradient is a vector.

    • Slope is used to describe the steepness of a line, while gradient is used to describe the direction and magnitude o...

  • Answered by AI
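
A sketch contrasting the two: the slope of a line through two points is one number, while the gradient of a function of two variables is a vector with one partial derivative per variable. The surface `x**2 + 3*y` and the central-difference step size are assumptions chosen for illustration.

```python
def slope(p1, p2):
    """Slope of the straight line through two points: rise over run, a single number."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def numerical_gradient(f, point, h=1e-6):
    """Central-difference estimate of the gradient: one partial derivative per variable."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def surface(p):
    x, y = p
    return x ** 2 + 3 * y


line_slope = slope((0, 1), (2, 5))                      # 2.0
grad_at_1_2 = numerical_gradient(surface, [1.0, 2.0])   # approximately [2.0, 3.0]
```
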
  • Q10. How are boosting and bagging algorithms different?
  • Ans. 

    Boosting and bagging are ensemble learning techniques used to improve model performance.

    • Bagging (bootstrap aggregating) trains multiple models independently, each on a bootstrap sample of the data (rows drawn with replacement), and averages their predictions (or takes a majority vote).

    • Boosting involves training multiple models sequentially, with each model focusing on the errors of the previous models.

    • Bagging mainly reduces variance (and therefore overfitting), while boosting mainly reduces bias (and therefore underfitting).

    • Examples of bagging al...

  • Answered by AI
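
A compact sketch of the difference, using one-split regression stumps as the weak learners (an assumption; libraries typically use deeper trees). Bagging trains stumps independently on bootstrap samples and averages them; boosting trains them one after another on the residuals of the ensemble so far. All names and data are illustrative.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump (threshold + two leaf means) minimising squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    if best is None:  # every x identical: fall back to a constant prediction
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging(xs, ys, n_models=25, seed=0):
    """Train stumps independently on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_rounds=25, learning_rate=0.3):
    """Train stumps sequentially, each one on the residual errors of the ensemble so far."""
    base = sum(ys) / len(ys)
    models = []

    def predict(x):
        return base + learning_rate * sum(m(x) for m in models)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        models.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 2 for i in range(20)]
ys = [x * x for x in xs]  # a curved target that a single stump cannot fit well
single = fit_stump(xs, ys)
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
```
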
  • Q11. What is a logarithm? (in linear algebra) What is its significance and what purpose does it serve?
  • Ans. 

    A logarithm is the inverse of exponentiation: log_b(x) is the power to which the base b must be raised to give x (for example, log_10(1000) = 3).

    • Logarithms simplify calculations involving very large or very small numbers.

    • They transform multiplicative relationships into additive ones, since log(a·b) = log(a) + log(b); this is why log transforms can turn multiplicative models into linear ones.

    • Logarithms are also used in data analysis to transform right-skewed data into a more nearly normal distribution.

    • Common logarithms use base 10, while natural logarithms use base e (approximately 2.718).

  • Answered by AI
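
A few quick checks of the properties listed above, using Python's `math` module; the income figures are made up to illustrate a right-skewed variable.

```python
import math

# log_b(x) answers "to what power must b be raised to give x?"
log10_of_1000 = math.log10(1000)  # 10 ** 3 == 1000
natural_log_of_e = math.log(math.e)

# Multiplication becomes addition: log(a * b) = log(a) + log(b)
a, b = 123.0, 4567.0
product_via_logs = math.exp(math.log(a) + math.log(b))  # equals a * b up to rounding

# A right-skewed variable: the largest value is 50 times the smallest
incomes = [20_000, 25_000, 30_000, 40_000, 1_000_000]
log_incomes = [math.log10(v) for v in incomes]  # from about 4.3 to 6.0
```
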
  • Q12. What are gradients? (not in relation to machine learning)
  • Ans. 

    A gradient collects the rates of change (partial derivatives) of a function with respect to each of its variables.

    • Gradients are used in calculus to measure the rate of change of a function.

    • They are represented as vectors and indicate the direction of steepest ascent.

    • Gradients are used in optimization problems to find the minimum or maximum value of a function.

    • They are also used in physics to calculate the force acting on a particle.

    • Gradients can be calculated by taking the partial derivative of the function with respect to each variable.

  • Answered by AI
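
A sketch of the optimization use mentioned above, assuming the made-up surface f(x, y) = (x - 1)^2 + (y + 2)^2: stepping against the gradient (steepest descent) walks downhill to the minimum at (1, -2), and a small step along the gradient increases f.

```python
def f(x, y):
    """A bowl-shaped surface with its minimum at (1, -2)."""
    return (x - 1) ** 2 + (y + 2) ** 2


def gradient(x, y):
    """Analytic gradient of f: one partial derivative per variable."""
    return 2 * (x - 1), 2 * (y + 2)


def gradient_descent(start, lr=0.1, steps=200):
    x, y = start
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - lr * gx, y - lr * gy  # move against the gradient (steepest descent)
    return x, y


x_min, y_min = gradient_descent((5.0, 5.0))
```
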

Interview Preparation Tips

Interview preparation tips for other job seekers - Be strong in fundamentals and be able to explain the what and why of every project on your resume and of all the things used in those projects.

Interview experience: 1 (Bad)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Referral and was interviewed in Apr 2024. There was 1 interview round.

Round 1 - Technical 

(1 Question)

  • Q1. Time Series forecasting questions

Interview Preparation Tips

Interview preparation tips for other job seekers - They were happy with me and sent an intent to offer, but later declined because the requirements changed (a waste of time).

Interview questions from similar companies

Interview experience: 5 (Excellent)
Difficulty level: -
Process duration: -
Result: -
Round 1 - Technical 

(3 Questions)

  • Q1. Difference between bagging and boosting
  • Ans. 

    Bagging and boosting are ensemble learning techniques used to improve the performance of machine learning models by combining multiple weak learners.

    • Bagging (Bootstrap Aggregating) involves training multiple models independently on different subsets of the training data and then combining their predictions through averaging or voting.

    • Boosting involves training multiple models sequentially, where each subsequent model c...

  • Answered by AI
  • Q2. Parameters of Decision Tree
  • Ans. 

    Parameters of a Decision Tree include max depth, min samples split, criterion, and splitter.

    • Max depth: maximum depth of the tree

    • Min samples split: minimum number of samples required to split an internal node

    • Criterion: function to measure the quality of a split (e.g. 'gini' or 'entropy')

    • Splitter: strategy used to choose the split at each node (e.g. 'best' or 'random')

  • Answered by AI
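
A sketch of what the `criterion`, `max_depth` and `min_samples_split` parameters control. The parameter names match scikit-learn's `DecisionTreeClassifier`, but the functions below are hypothetical illustrations, not that library's implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a node (criterion='gini')."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy of a node in bits (criterion='entropy')."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def can_split(depth, n_samples, max_depth=None, min_samples_split=2):
    """Hypothetical helper showing what max_depth and min_samples_split restrict."""
    if max_depth is not None and depth >= max_depth:
        return False  # the tree is already as deep as allowed
    return n_samples >= min_samples_split  # too few samples: keep this node as a leaf


node = ["yes"] * 6 + ["no"] * 2
```

For the example node (6 "yes", 2 "no"), Gini impurity is 0.375 and entropy is about 0.811 bits; the tree would choose the split that lowers the chosen criterion the most.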
  • Q3. Explain any one of your projects in detail
  • Ans. 

    Developed a predictive model to forecast customer churn in a telecom company

    • Collected and cleaned customer data including usage patterns and demographics

    • Used machine learning algorithms such as logistic regression and random forest to build the model

    • Evaluated model performance using metrics like accuracy, precision, and recall

    • Provided actionable insights to the company to reduce customer churn rate

  • Answered by AI

Interview experience: 3 (Average)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Not Selected

I was interviewed in Oct 2024.

Round 1 - Technical 

(1 Question)

  • Q1. Project related questions from your CV
Round 2 - Technical 

(2 Questions)

  • Q1. Question on transformers
  • Q2. Comparison of transfer learning and fine-tuning.
  • Ans. 

    Transfer learning reuses a model pre-trained on one task for a different task; fine-tuning is a common way of doing this, by further training some or all of the pre-trained model's weights on data from the new task.

    • Transfer learning uses knowledge gained from one task to improve learning on a different task.

    • Fine-tuning involves adjusting the parameters of a pre-trained model to better fit a specific task.

    • Transfer learning is faster and requires less data compared to training a...

  • Answered by AI

Interview experience: 3 (Average)
Difficulty level: Moderate
Process duration: Less than 2 weeks
Result: Selected

I was approached by the company and was interviewed in Aug 2024. There were 2 interview rounds.

Round 1 - Coding Test 

Round 2 - Technical 

(5 Questions)

  • Q1. What is the Python language?
  • Ans. 

    Python is a high-level programming language known for its simplicity and readability.

    • Python is widely used for web development, data analysis, artificial intelligence, and scientific computing.

    • It emphasizes code readability and uses indentation for block delimiters.

    • Python has a large standard library and a vibrant community of developers.

    • Example: print('Hello, World!')

    • Example: import pandas as pd

  • Answered by AI
  • Q2. What are code problems?
  • Ans. 

    Code problems refer to issues or errors in the code that need to be identified and fixed.

    • Code problems can include syntax errors, logical errors, or performance issues.

    • Examples of code problems include syntax errors such as a missing colon in Python (or a missing semicolon in languages like Java), incorrect variable assignments, or inefficient algorithms.

    • Identifying and resolving code problems is a key skill for data scientists to ensure accurate and efficient data analysis.

  • Answered by AI
  • Q3. What is Python code?
  • Ans. 

    Python code is source code written in the Python programming language, which is widely used for data analysis, machine learning, and scientific computing.

    • Python code is written in a text editor or an integrated development environment (IDE)

    • Python code is executed using a Python interpreter

    • Python code can be used for data manipulation, visualization, and modeling

  • Answered by AI
  • Q4. What is the project
  • Ans. 

    The project is a machine learning model to predict customer churn for a telecommunications company.

    • Developing predictive models using machine learning algorithms

    • Analyzing customer data to identify patterns and trends

    • Evaluating model performance and making recommendations for reducing customer churn

  • Answered by AI
  • Q5. What is the lnderssip
  • Ans. 

    The question seems to be incomplete or misspelled.

    • It is possible that the interviewer made a mistake while asking the question.

    • Ask for clarification or context to provide a relevant answer.

  • Answered by AI

Interview Preparation Tips

Topics to prepare for IBM Data Scientist interview:
  • Python
  • Machine Learning
Interview preparation tips for other job seekers - No

Interview experience: 4 (Good)
Difficulty level: -
Process duration: -
Result: -
Round 1 - Assignment 

NER training using deep learning

Round 2 - Technical 

(2 Questions)

  • Q1. Describe the approach taken for the assignment
  • Ans. 

    I approach assignments by breaking them down into smaller tasks, setting deadlines, and regularly checking progress.

    • Break down the assignment into smaller tasks to make it more manageable

    • Set deadlines for each task to stay on track

    • Regularly check progress to ensure everything is on schedule

    • Seek feedback from colleagues or supervisors to improve the quality of work

  • Answered by AI
  • Q2. Scenario based questions

Interview experience: 3 (Average)
Difficulty level: Moderate
Process duration: 4-6 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Sep 2024. There were 2 interview rounds.

Round 1 - Technical 

(3 Questions)

  • Q1. Overfitting and Underfitting
  • Q2. Find Nth-largest element
  • Ans. 

    Find Nth-largest element in an array

    • Sort the array in descending order

    • Return the element at index N-1 (using 0-based indexing)

  • Answered by AI
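
The sorting approach above, plus a heap-based alternative from the standard library's `heapq` module that avoids sorting the whole array. Both count duplicates separately (so the 2nd largest of [5, 3, 5, 1] is 5); use `set(arr)` first if distinct values are wanted. Function names are illustrative.

```python
import heapq


def nth_largest_by_sorting(arr, n):
    """Sort in descending order and take index n - 1: O(len(arr) log len(arr))."""
    if not 1 <= n <= len(arr):
        raise ValueError("n must be between 1 and len(arr)")
    return sorted(arr, reverse=True)[n - 1]


def nth_largest_with_heap(arr, n):
    """heapq.nlargest keeps only n candidates at a time, which helps when n is small."""
    if not 1 <= n <= len(arr):
        raise ValueError("n must be between 1 and len(arr)")
    return heapq.nlargest(n, arr)[-1]
```
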
  • Q3. NLP Data preprocessing
Round 2 - HR 

(2 Questions)

  • Q1. Salary Discussion
  • Q2. Fitment discussion

Interview experience: 5 (Excellent)
Difficulty level: Easy
Process duration: Less than 2 weeks
Result: Not Selected

I applied via Naukri.com and was interviewed in Jul 2024. There were 2 interview rounds.

Round 1 - One-on-one 

(3 Questions)

  • Q1. Tell me about yourself?
  • Ans. 

    I am a data scientist with a background in statistics and machine learning, passionate about solving complex problems using data-driven approaches.

    • Background in statistics and machine learning

    • Experience in solving complex problems using data-driven approaches

    • Passionate about leveraging data to drive insights and decision-making

  • Answered by AI
  • Q2. Describe one of my main projects in detail.
  • Ans. 

    Developed a predictive model for customer churn in a telecom company.

    • Collected and cleaned customer data including usage patterns and demographics.

    • Used machine learning algorithms such as logistic regression and random forest to build the model.

    • Evaluated model performance using metrics like accuracy, precision, and recall.

    • Implemented the model into the company's CRM system for real-time predictions.

  • Answered by AI
  • Q3. A few questions related to my projects.
Round 2 - Technical 

(1 Question)

  • Q1. Questions on basic Python (since I am a fresher)

Interview Preparation Tips

Interview preparation tips for other job seekers - Overall, it was a good experience for me. Very friendly interviewers. I couldn't make it after the second round. I came to know where I was lacking.

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: -
Result: No response

I applied via Naukri.com and was interviewed in Jul 2024. There was 1 interview round.

Round 1 - Technical 

(6 Questions)

  • Q1. Which GenAI projects have I worked on?
  • Q2. What is the context window in LLMs?
  • Ans. 

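    The context window of an LLM is the maximum number of tokens the model can take into account at once, including both the prompt and the tokens it generates.

    • Tokens outside the context window are not seen by the model, so long conversations or documents must be truncated, summarized, or split into chunks.

    • A larger context window lets the model use more context, but increases memory and compute cost (for standard self-attention, the cost grows quadratically with sequence length).

    • Decoder-only LLMs predict the next token from the preceding tokens in the window only; unlike the symmetric word windows used to train word embeddings such as word2vec, they do not look at words after the target.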

  • Answered by AI
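
A sketch of what the window means in practice, assuming a hypothetical helper and whitespace-split words standing in for real tokenizer tokens: the prompt must leave room for the tokens still to be generated, so the oldest prompt tokens are dropped first.

```python
def fit_to_context(prompt_tokens, context_window, max_new_tokens):
    """Hypothetical helper: keep the most recent prompt tokens so that the prompt
    plus the tokens still to be generated fit inside the context window."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return prompt_tokens[-budget:]  # drops the oldest tokens if the prompt is too long


tokens = "the quick brown fox jumps over the lazy dog".split()  # 9 stand-in "tokens"
kept = fit_to_context(tokens, context_window=8, max_new_tokens=3)
```
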
  • Q3. What is the top_k parameter?
  • Ans. 

    In LLM text generation, top_k is a sampling parameter: at each step, the next token is sampled only from the k most probable tokens.

    • Tokens outside the top k are discarded and the remaining probabilities are renormalized before sampling.

    • A small top_k (top_k=1 is equivalent to greedy decoding) makes the output more deterministic; a larger top_k allows more varied output.

    • The same name is used more generally for returning the k highest-scoring results; for example, in recommendation systems, setting top_k=5 returns the top 5 recommended items for a user.

  • Answered by AI
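
A minimal sketch of top-k sampling over a made-up next-token distribution: keep the k most probable tokens and sample among them in proportion to their probabilities (`random.choices` normalises the weights). The function name and probabilities are illustrative.

```python
import random


def top_k_sample(next_token_probs, k, rng=random):
    """Sample the next token from the k most probable candidates only."""
    top = sorted(next_token_probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [prob for _, prob in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"cat": 0.50, "dog": 0.30, "car": 0.15, "banana": 0.05}  # made-up model output
rng = random.Random(42)
samples = [top_k_sample(probs, k=2, rng=rng) for _ in range(1000)]
```
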
  • Q4. What are regex patterns in Python?
  • Ans. 

    Regex patterns in Python are sequences of characters that define a search pattern.

    • Regex patterns are used for pattern matching and searching in strings.

    • They are used through Python's built-in 're' module (for example re.search, re.findall, and re.sub).

    • Examples of regex patterns include searching for email addresses, phone numbers, or specific words in a text.

  • Answered by AI
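
A small example with Python's `re` module. The email pattern is a simplified assumption that covers common addresses, not a full validator of the email standard, and the phone format is made up.

```python
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")

text = "Contact alice@example.com or bob.smith@data.example.org, or call 555-0123."
emails = EMAIL_PATTERN.findall(text)  # ['alice@example.com', 'bob.smith@data.example.org']
phone = PHONE_PATTERN.search(text)    # match object for '555-0123'
```
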
  • Q5. What are iterators and tuples?
  • Ans. 

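    An iterator is an object that returns the elements of a sequence one at a time. Tuples are immutable sequences of elements.

    • An iterator implements __iter__() and __next__(); calling iter() on an iterable such as a list or dictionary returns an iterator, and next() fetches the next element until StopIteration is raised.

    • Tuples are similar to lists but are immutable, meaning their elements cannot be changed after creation

    • Example of iteration: for item in my_list: print(item) (the for loop calls iter() on the list and then next() on the resulting iterator)

    • Example of tuple: my_tuple = (1, 2, 3)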

  • Answered by AI
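
A short sketch of the iterator protocol and tuple immutability described above; the `Countdown` class is a hypothetical example.

```python
numbers = [10, 20, 30]
it = iter(numbers)  # a list is an iterable; iter() returns an iterator over it
first = next(it)    # 10
second = next(it)   # 20
rest = list(it)     # [30]: consumes what is left, so the iterator is now exhausted


class Countdown:
    """A minimal custom iterator: it implements __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


point = (3, 4)
try:
    point[0] = 5  # tuples do not support item assignment
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True
```
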
  • Q6. Do I have REST API experience?
  • Ans. 

    Yes, I have experience working with REST APIs in various projects.

    • Developed RESTful APIs using Python Flask framework

    • Consumed REST APIs in data analysis projects using requests library

    • Used Postman for testing and debugging REST APIs

  • Answered by AI

Interview experience: 4 (Good)
Difficulty level: Moderate
Process duration: 2-4 weeks
Result: Not Selected

I applied via Job Portal and was interviewed in Apr 2024. There was 1 interview round.

Round 1 - Technical 

(9 Questions)

  • Q1. Explain the XGBoost algorithm
  • Ans. 

    XGBoost is a powerful machine learning algorithm known for its speed and performance in handling large datasets.

    • XGBoost stands for eXtreme Gradient Boosting, which is an implementation of gradient boosting machines.

    • It is widely used in machine learning competitions and is known for its speed and performance.

    • XGBoost uses a technique called boosting, where multiple weak learners are combined to create a strong learner.

    • It...

  • Answered by AI
  • Q2. An XGBoost model is trained on 10-20 features. How are the splits decided, and on which feature will each node be divided?
  • Ans. 

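    XGBoost uses a greedy approach: at each node it evaluates candidate split thresholds on every feature and picks the feature and threshold with the highest gain.

    • The gain is computed from the first- and second-order gradients (g and h) of the loss, not from entropy-based information gain: Gain = 1/2 [G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - (G_L + G_R)^2/(H_L + H_R + lambda)] - gamma, where G and H are the sums of g and h over the samples in each child.

    • The feature/threshold pair with the highest gain is chosen for the split; splits whose gain is not positive are pruned.

    • This process is repeated recursively for each node in the tree, until limits such as max_depth stop it.

    • Features can be split based on numerical values or categories; for large datasets, XGBoost can use approximate or histogram-based candidate thresholds instead of trying every value.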

    • Example: If a feature like 'age' has the hi...

  • Answered by AI
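
A simplified sketch of the exact greedy split search described above, assuming squared-error loss (1/2)(y - p)^2, for which g = p - y and h = 1. `reg_lambda` and `gamma` play the role of XGBoost's `lambda` and `gamma` parameters, but this is an illustration, not XGBoost's implementation (which adds missing-value handling, histogram methods and more). The toy data is made up: feature 0 ("age") separates the labels and feature 1 is noise.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """XGBoost-style gain for splitting one node into a left and a right child."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


def best_split(rows, grads, hessians, reg_lambda=1.0, gamma=0.0):
    """Exact greedy search over every feature and every threshold between sorted values."""
    g_total, h_total = sum(grads), sum(hessians)
    best = (0.0, None, None)  # (gain, feature index, threshold); no split unless gain > 0
    for f in range(len(rows[0])):
        order = sorted(range(len(rows)), key=lambda i: rows[i][f])
        g_left = h_left = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            g_left += grads[i]
            h_left += hessians[i]
            if rows[i][f] == rows[nxt][f]:
                continue  # no threshold separates equal values
            gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left,
                              reg_lambda, gamma)
            if gain > best[0]:
                best = (gain, f, (rows[i][f] + rows[nxt][f]) / 2)
    return best


rows = [(25, 3), (30, 1), (35, 4), (50, 1), (55, 5), (60, 9)]  # (age, noise)
labels = [0, 0, 0, 1, 1, 1]
predictions = [0.5] * len(rows)                        # current model output
grads = [p - y for p, y in zip(predictions, labels)]   # g = p - y for squared error
hessians = [1.0] * len(rows)                           # h = 1 for squared error
gain, feature, threshold = best_split(rows, grads, hessians)
```

Starting from a prediction of 0.5 for every row, the search picks feature 0 with threshold 42.5 and a gain of 0.5625.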
  • Q3. Do you have any experience on cloud platform?
  • Ans. 

    Yes, I have experience working on cloud platforms such as AWS and Google Cloud.

    • Experience with AWS services like S3, EC2, and Redshift

    • Familiarity with Google Cloud services like BigQuery and Compute Engine

    • Utilized cloud platforms for data storage, processing, and analysis

  • Answered by AI
  • Q4. What is entropy, information gain?
  • Ans. 

    Entropy is a measure of randomness or uncertainty in a dataset, while information gain is the reduction in entropy after splitting a dataset based on a feature.

    • Entropy is used in decision tree algorithms to determine the best feature to split on.

    • Information gain measures the effectiveness of a feature in classifying the data.

    • Higher information gain indicates that a feature is more useful for splitting the data.

    • Entropy ...

  • Answered by AI
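
A sketch of entropy and information gain for a split, with made-up spam/ham labels: a split that separates the classes perfectly gains a full bit, and one that leaves each child as mixed as the parent gains nothing. Function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["spam"] * 4 + ["ham"] * 4
perfect_split = [["spam"] * 4, ["ham"] * 4]
useless_split = [["spam", "spam", "ham", "ham"], ["spam", "spam", "ham", "ham"]]
```
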
  • Q5. What is hypothesis testing?
  • Ans. 

    Hypothesis testing is a statistical method used to make inferences about a population based on sample data.

    • Hypothesis testing involves formulating a null hypothesis and an alternative hypothesis.

    • The null hypothesis is assumed to be true until there is enough evidence to reject it.

    • Statistical tests are used to determine the likelihood of observing the data if the null hypothesis is true.

    • The p-value is used to determine ...

  • Answered by AI
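
One concrete way to run a hypothesis test, sketched as a permutation test (a choice made here for illustration; t-tests are another common option). The null hypothesis is that the group labels do not matter, so the p-value is the fraction of random relabellings that produce a difference in means at least as large as the observed one. The data are made up.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the data at random, as the null hypothesis allows
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so the p-value is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


old_design = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]  # made-up task times, in seconds
new_design = [13.0, 12.9, 13.4, 13.1, 12.8, 13.3]
p_value = permutation_test(old_design, new_design)
```
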
  • Q6. Explain precision and recall, when are they used in which scenario?
  • Ans. 

    Precision and recall are metrics used in evaluating the performance of classification models.

    • Precision measures the accuracy of positive predictions, while recall measures the ability of the model to find all positive instances.

    • Precision = TP / (TP + FP)

    • Recall = TP / (TP + FN)

    • Precision is important when false positives are costly, while recall is important when false negatives are costly.

    • For example, in a spam email de...

  • Answered by AI
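
A minimal sketch computing both metrics from the formulas above. Returning 0.0 when a denominator is zero is a convention chosen here, and the labels are made up.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here precision is 2/3 (one of three positive predictions was wrong) and recall is 0.5 (two of four actual positives were missed).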
  • Q7. What is data imbalance?
  • Ans. 

    Data imbalance refers to unequal distribution of classes in a dataset, where one class has significantly more samples than others.

    • Data imbalance can lead to biased models that favor the majority class.

    • It can result in poor performance for minority classes, as the model may struggle to accurately predict them.

    • Techniques like oversampling, undersampling, and using different evaluation metrics can help address data imbala...

  • Answered by AI
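
A sketch of why imbalance is a problem, using made-up labels with 95 legitimate and 5 fraudulent transactions: a model that always predicts the majority class scores 95% accuracy but finds none of the fraud. The class-weight formula shown is the "balanced" heuristic also used by scikit-learn's `class_weight="balanced"`.

```python
from collections import Counter

labels = ["legit"] * 95 + ["fraud"] * 5
always_legit = ["legit"] * len(labels)  # a useless model that ignores the minority class

accuracy = sum(t == p for t, p in zip(labels, always_legit)) / len(labels)
fraud_recall = (
    sum(t == p == "fraud" for t, p in zip(labels, always_legit)) / labels.count("fraud")
)

# "Balanced" class weights: n_samples / (n_classes * count of that class)
counts = Counter(labels)
class_weights = {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}
```
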
  • Q8. What is SMOTE? Do you have any experience working on Time Series? Code analysis of global variable?
  • Ans. 

    SMOTE stands for Synthetic Minority Over-sampling Technique, used to balance imbalanced datasets by generating synthetic samples.

    • SMOTE is commonly used in machine learning to address class imbalance by creating synthetic samples of the minority class.

    • It works by generating new instances of the minority class by interpolating between existing instances.

    • SMOTE is particularly useful in scenarios where the minority class i...

  • Answered by AI
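
A minimal sketch of SMOTE's core interpolation step in plain Python: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. The function and data are illustrative; libraries such as imbalanced-learn provide a full implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a minority sample
    and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(minority, n_new=10, k=2)
```
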
  • Q9. Find the 5th highest salary in every department. What are window functions? What is the difference between UNION and UNION ALL? What is the difference between DELETE and TRUNCATE?
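
A runnable sketch of these SQL questions using Python's built-in `sqlite3` module and a made-up `employees` table. It assumes an SQLite version with window-function support (3.25 or newer). `DENSE_RANK()` is a window function: it ranks rows within each department partition without collapsing them the way `GROUP BY` would, and gives tied salaries the same rank. SQLite has no `TRUNCATE` statement, so only `DELETE` is shown; in databases such as PostgreSQL, MySQL and SQL Server, `TRUNCATE TABLE` removes all rows at once without a `WHERE` clause and is usually faster, while whether it can be rolled back depends on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('A', 'Eng', 100), ('B', 'Eng', 90), ('C', 'Eng', 90), ('D', 'Eng', 80),
        ('E', 'Eng', 70), ('F', 'Eng', 60), ('G', 'Eng', 50),
        ('H', 'Ops', 40), ('I', 'Ops', 30);
    """
)

# 5th highest distinct salary per department (departments with fewer levels return nothing)
fifth_highest = conn.execute(
    """
    SELECT DISTINCT department, salary
    FROM (
        SELECT department, salary,
               DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
        FROM employees
    )
    WHERE salary_rank = 5
    """
).fetchall()

# UNION removes duplicate rows; UNION ALL keeps every row from both queries
union_rows = conn.execute("SELECT department FROM employees UNION SELECT 'Eng'").fetchall()
union_all_rows = conn.execute(
    "SELECT department FROM employees UNION ALL SELECT 'Eng'"
).fetchall()

# DELETE removes only the rows matching the WHERE clause
conn.execute("DELETE FROM employees WHERE department = 'Ops'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```
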

Interview Preparation Tips

Interview preparation tips for other job seekers - Prepare the basics well. Go through the top questions asked for SQL, Python and Data Science.
Be well versed in your resume projects and the concepts used in them.

GeekBull Consulting Interview FAQs

How many rounds are there in GeekBull Consulting Data Scientist interview?
GeekBull Consulting interview process usually has 1-2 rounds. The most common rounds in the GeekBull Consulting interview process are Resume Shortlist, One-on-one Round and Technical.
How to prepare for GeekBull Consulting Data Scientist interview?
Go through your CV in detail and study all the technologies mentioned in your CV. Prepare at least two technologies or languages in depth if you are appearing for a technical interview at GeekBull Consulting. The most common topics and skills that interviewers at GeekBull Consulting expect are Data Science, Data Warehousing, Deep Learning, LLM and Machine Learning.
What are the top questions asked in GeekBull Consulting Data Scientist interview?

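Some of the top questions asked at the GeekBull Consulting Data Scientist interview:

  1. You have two different vectors with only a small change in one of the dimensions, but the predictions/output from the model are drastically different for each vector. Can you explain why this can be the case...
  2. Why do optimisers matter? What's their purpose? What do they do in addition to the weight updates that vanilla gradient descent and backpropagation perform?
  3. How are LSTMs better than RNNs? What makes them better? How exactly do LSTMs outperform vanilla RNNs?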

People are getting interviews through (based on 2 GeekBull Consulting interviews):
Referral: 50%
Job Portal: 50%
Moderate Confidence: the data is based on a sufficient number of responses received from the candidates.

GeekBull Consulting salaries:

Software Engineer (5 salaries): ₹4.5 L/yr - ₹7.5 L/yr
Project Manager (4 salaries): ₹6 L/yr - ₹8.4 L/yr
Embedded Engineer (4 salaries): ₹3 L/yr - ₹4.3 L/yr
Embedded Software Engineer (3 salaries): ₹4.2 L/yr - ₹9 L/yr
Associate Data Scientist (3 salaries): ₹3.2 L/yr - ₹3.2 L/yr
