Decision Scientist
30+ Decision Scientist Interview Questions and Answers
Q1. Stats: significance of mean and standard deviation; the normal distribution and the percentage of data within each SD
Explanation of significance of mean and standard deviation in normal distribution.
Mean represents the central tendency of the data while standard deviation measures the spread of the data.
In a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
These percentages can be used to calculate the probability of a data point falling within a certain range of...
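The 68/95/99.7 figures can be checked numerically. The sketch below relies on the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable; the function name `prob_within_k_sd` is only an illustrative choice.

```python
import math

def prob_within_k_sd(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

# Coverage for one, two, and three standard deviations
coverage = {k: prob_within_k_sd(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

The same function answers range questions for any normal distribution once the range is expressed in units of standard deviations.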
Q2. Guesstimate: Estimate the number of sweet shops in your city
There are approximately 500 sweet shops in my city.
The number of sweet shops may vary depending on the size of the city.
Factors such as population density and cultural preferences may also affect the number of sweet shops.
A rough estimate can be made by multiplying the city's population by the number of sweet shops per capita observed in similar cities.
Alternatively, a survey or data analysis of existing sweet shops can provide a more accurate estimate.
Decision Scientist Interview Questions and Answers for Freshers
Q3. How do you stay up to date with new analytical tools and techniques?
I stay up to date with new analytical tools and techniques by attending workshops, online courses, reading research papers, and participating in industry conferences.
Attend workshops and training sessions on new tools and techniques
Take online courses and certifications to learn about the latest advancements
Read research papers and articles to stay informed about cutting-edge methods
Participate in industry conferences and networking events to exchange knowledge and ideas
Q4. When is a z-test used and when is a t-test used?
A z-test is used when the population standard deviation is known, typically with a large sample. A t-test is used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample is small.
A z-test is used for hypothesis testing when the population standard deviation is known and the sample is large (n > 30 is a common rule of thumb).
A t-test is used when the population standard deviation is unknown, especially with a small sample (n < 30); with large samples the t distribution approaches the normal, so the two tests give nearly the same result.
Z test is used for comparing means of two populations when the population standard deviation is know...
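To make the distinction concrete, here is a minimal sketch of the two one-sample test statistics using only the standard library. The sample values, the hypothesised mean `mu0`, and the "known" `sigma` are made up for illustration.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic: the population SD `sigma` is assumed known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """One-sample t statistic: the SD is estimated from the sample (n - 1 denominator)."""
    n = len(sample)
    s = statistics.stdev(sample)
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))

sample = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical measurements
z = z_statistic(sample, mu0=50, sigma=2.0)  # sigma = 2.0 is an assumed known value
t = t_statistic(sample, mu0=50)

# Two-sided p-value for the z statistic from the standard normal CDF
p_z = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
print(f"z = {z:.4f}, p = {p_z:.4f}; t = {t:.4f} with {len(sample) - 1} degrees of freedom")
```

The t statistic would be compared against a t distribution with n - 1 degrees of freedom; the standard library has no t CDF, so that step is left out here.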
Q5. What is the difference between data and data source?
Data is the information collected and stored, while data source is the origin or location from which the data is obtained.
Data is the raw facts and figures that are collected and stored for analysis.
Data source is the location or system from which the data is collected, such as a database, sensor, or survey.
Examples of data sources include customer surveys, website analytics, and social media platforms.
Q6. What is the chi-square test and when is it used?
Chi square test is a statistical test used to determine if there is a significant association between two categorical variables.
Chi square test is used to compare observed frequencies with expected frequencies in a contingency table.
It is commonly used in research to analyze data and determine if there is a relationship between two variables.
For example, it can be used to test if there is a significant difference in the distribution of a disease between two groups.
Another exa...
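The comparison of observed and expected frequencies can be sketched in a few lines. This computes the Pearson chi-square statistic for a contingency table; the 2 x 2 counts are invented for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = group A / group B, columns = disease / no disease
observed = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.4f}, degrees of freedom = {dof}")
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic above that would suggest an association between the two variables.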
Q7. Most challenging problem you have solved or tried solving
Developing a predictive model for customer churn in a telecom company
Gathering and cleaning large amounts of customer data
Identifying key predictors of churn through statistical analysis
Building and testing various machine learning models
Iteratively refining the model to improve accuracy
Presenting findings and recommendations to stakeholders
Q8. Clustering project explanation and clustering metrics used.
Utilized K-means clustering to group customers based on purchasing behavior. Evaluated clusters using silhouette score and inertia.
Used K-means clustering algorithm to group customers into segments
Evaluated the quality of clusters using silhouette score and inertia
Silhouette score measures how similar an object is to its own cluster compared to other clusters
Inertia measures how tightly the clusters are packed together
Example: Clustered customers based on demographics and pur...
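Q9. SQL coding: find the month-wise cumulative sum of a value.
Use a SQL window function to calculate the month-wise cumulative sum.
Aggregate the values by month first, then apply SUM() with an OVER() clause to get the running total.
Order the window by month (ORDER BY inside OVER()) so that each row adds to the total of the preceding months.
Use PARTITION BY only when the running total should restart for each group (for example, per year or per customer); partitioning by month itself would reset the total every month.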
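A runnable sketch of a month-wise running total, using an in-memory SQLite database from Python. The `sales` table, its columns, and the rows are hypothetical, and the window function assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-05", 100), ("2024-01-20", 50), ("2024-02-10", 70), ("2024-03-15", 30)],
)

query = """
SELECT month,
       monthly_total,
       SUM(monthly_total) OVER (ORDER BY month) AS cumulative_total
FROM (
    -- First collapse the rows to one total per month
    SELECT strftime('%Y-%m', sale_date) AS month,
           SUM(amount) AS monthly_total
    FROM sales
    GROUP BY strftime('%Y-%m', sale_date)
)
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner query produces one row per month, and the outer window sums those monthly totals in month order without any PARTITION BY, so the total never resets.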
Q10. Why do you want to work with analytical data?
I am passionate about uncovering insights and patterns in data to drive informed decision-making.
I enjoy working with numbers and finding trends in data sets.
Analyzing data allows me to make strategic decisions based on evidence rather than intuition.
I find satisfaction in solving complex problems using statistical methods and algorithms.
Q11. Past project explanation on predictive modeling
Developed predictive model to forecast customer churn using machine learning algorithms
Collected and cleaned customer data from various sources
Performed feature engineering to create relevant predictors
Built and trained machine learning models such as logistic regression and random forest
Evaluated model performance using metrics like accuracy, precision, and recall
Implemented the model in a production environment for real-time predictions
Q12. A guesstimate of the number of flights at Bengaluru airport
The number of flights at Bengaluru airport can vary depending on the day and time, but on average, there are around 400-500 flights per day.
Consider the average number of flights per day at Bengaluru airport
Take into account the peak hours and off-peak hours for flight operations
Factor in the types of flights - domestic, international, cargo, etc.
Look at historical data or industry reports for more accurate estimates
Q13. Difference between statistics and data analysis
Stats focuses on summarizing and interpreting data, while data analysis involves exploring and drawing insights from data.
Statistics involves collecting, organizing, analyzing, and interpreting data to make informed decisions.
Data analysis involves cleaning, transforming, and visualizing data to discover patterns, trends, and insights.
Statistics uses mathematical formulas and techniques to summarize data, such as mean, median, and standard deviation.
Data analysis uses tools l...
Q14. Explain what you did in your house price model
I developed a predictive model to estimate house prices based on various factors.
Collected and cleaned data on house features, location, and sale prices
Performed exploratory data analysis to identify key variables impacting house prices
Built and trained machine learning models such as linear regression or random forest
Evaluated model performance using metrics like RMSE or R-squared
Used the model to make predictions on new data and assess accuracy
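The two evaluation metrics mentioned above can be computed directly from predictions. This is a minimal sketch; the prices are invented and stand in for a model's output on held-out data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in the target's units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical sale prices (in thousands) and model predictions
actual = [200, 300, 400]
predicted = [210, 290, 410]
model_rmse = rmse(actual, predicted)
model_r2 = r_squared(actual, predicted)
print(f"RMSE = {model_rmse:.2f}, R-squared = {model_r2:.3f}")
```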
Q15. What is binomial distribution?
Binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent trials.
Describes the number of successes in a fixed number of independent trials
Each trial has only two possible outcomes (success or failure)
The trials are independent and the probability of success is constant
Examples: Coin toss (success = heads), Pass/fail exams, Yes/no surveys
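For a worked example, the probability of exactly k successes in n trials is C(n, k) * p^k * (1 - p)^(n - k). The sketch below evaluates it for four fair coin tosses; the function name is illustrative.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Number of heads in 4 tosses of a fair coin
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
for k, prob in enumerate(probs):
    print(f"P(heads = {k}) = {prob}")
```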
Q16. What is central limit theorem?
Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
Central Limit Theorem is a fundamental concept in statistics.
It states that the sampling distribution of the sample mean will be approximately normally distributed regardless of the shape of the population distribution.
As the sample size increases, the sampling distribution of the sample mean becomes more normally distributed.
It is used ...
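A small simulation can illustrate the theorem. The sketch below draws repeated samples from an exponential distribution (strongly skewed, with mean 1 and standard deviation 1) and looks at the distribution of the sample means; the sample size, repetition count, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from an exponential distribution with rate 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 50             # size of each sample
repetitions = 2000
means = [sample_mean(n) for _ in range(repetitions)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: 1 / sqrt({n}) = {1 / n ** 0.5:.3f})")
```

The sample means cluster around the population mean with a spread close to sigma / sqrt(n), even though the underlying data are far from normal.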
Q17. What is K-means clustering?
K-means clustering is a popular unsupervised machine learning algorithm used for clustering data points into groups based on similarity.
Divides data points into K clusters based on similarity
Minimizes the sum of squared distances within each cluster
Requires specifying the number of clusters (K) beforehand
Iteratively assigns data points to the nearest cluster centroid
Commonly used in customer segmentation, image compression, and anomaly detection
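The assign-and-update loop described above can be sketched in plain Python. This is a bare-bones version of Lloyd's algorithm with a fixed number of iterations and random initial centroids, not a substitute for a library implementation; the points are made up.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, k, n_iter=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [mean_point(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    # Inertia: sum of squared distances from each point to its nearest centroid
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # two obvious groups
centroids, inertia = kmeans(points, k=2)
print(sorted(centroids), round(inertia, 4))
```

Inertia is the quantity the algorithm tries to minimise, which is why it also serves as a cluster-quality metric when comparing different values of K.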
Q18. Order of execution of a SQL query
SQL clauses are evaluated in a logical order that differs from the order in which they are written.
The logical order is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
The SELECT clause is written first but evaluated late: it runs after FROM, WHERE, GROUP BY, and HAVING, which is why column aliases defined in SELECT are generally not available in WHERE.
The FROM clause is evaluated first to identify the tables (and joins) that supply the data.
The WHERE clause then filters individual rows based on the specified conditions.
The GROUP BY clause groups the remaining rows based on the specified columns.
The HAVING clause filter...
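The query below is annotated with the logical evaluation order usually described for SQL (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY). It runs against an in-memory SQLite database; the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 40), ("b", 5), ("b", 5), ("c", 100), ("c", -1)],
)

query = """
SELECT customer, SUM(amount) AS total   -- 5. compute the output columns
FROM orders                             -- 1. identify the source rows
WHERE amount > 0                        -- 2. filter individual rows
GROUP BY customer                       -- 3. form groups
HAVING SUM(amount) >= 50                -- 4. filter whole groups
ORDER BY total DESC                     -- 6. sort the final result
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The negative row for customer "c" is dropped by WHERE before grouping, and customer "b" is dropped by HAVING after grouping, which shows why the two filters are not interchangeable.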
Q19. What is precision recall auc
Precision-Recall AUC is a metric used to evaluate the performance of classification models, particularly in imbalanced datasets.
Precision-Recall AUC focuses on the trade-off between precision and recall for different threshold values.
It is particularly useful when dealing with imbalanced datasets where the positive class is rare.
A higher Precision-Recall AUC indicates better model performance in terms of precision and recall.
It is often used in conjunction with the ROC AUC me...
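One common way to summarise the precision-recall curve is average precision, a step-wise approximation of the area under it. The sketch below computes it from scratch, assuming binary 0/1 labels and no tied scores; the labels and scores are invented.

```python
def precision_recall_points(y_true, scores):
    """(precision, recall) after each prediction, going from highest to lowest score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    tp = 0
    points = []
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label  # label is 1 for a positive, 0 for a negative
        points.append((tp / rank, tp / total_pos))
    return points

def average_precision(y_true, scores):
    """Sum of precision weighted by the increase in recall at each threshold."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in precision_recall_points(y_true, scores):
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

y_true = [1, 0, 1, 1, 0]            # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # hypothetical model scores
ap = average_precision(y_true, scores)
print(f"average precision = {ap:.4f}")
```

Each threshold corresponds to one (precision, recall) point, so lowering the threshold walks along the curve from high precision toward high recall.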
Q20. What is Data and Why Mu Sigma
Data is information collected and analyzed for decision-making. Mu Sigma is a leading analytics company.
Data is raw facts and figures that can be processed to gain insights.
Mu Sigma is a data analytics company that helps businesses make data-driven decisions.
Data can come in various forms such as structured, unstructured, and semi-structured.
Mu Sigma uses advanced analytics techniques like machine learning and AI to extract valuable insights from data.
Data is essential for bu...
Q21. Explain working of recommendation system.
Recommendation system uses data analysis and machine learning algorithms to suggest items to users based on their preferences.
Collect user data and item data
Analyze data to find patterns and similarities
Use machine learning algorithms to make predictions and suggest items to users
Continuously update and improve the system based on user feedback
Examples: Netflix suggesting movies based on viewing history, Amazon suggesting products based on purchase history
Q22. What is cross validation
Cross validation is a technique used to evaluate the performance of a machine learning model by testing it on multiple subsets of the data.
It involves dividing the data into multiple subsets or folds.
The model is trained on all folds but one and tested on the held-out fold.
This process is repeated so that each fold serves as the test set once, and the results are averaged to get a final performance metric.
It helps to detect overfitting and provides a more reliable estimate of the model's performance on unseen data than a single train/test split.
Examples in...
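The fold rotation can be sketched as an index generator. This version splits the indices in order without shuffling, which is a simplification; in practice the data is usually shuffled or stratified first.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the k scores averaged.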
Q23. Case study of Vodafone Idea
Vodafone Idea is a telecom company in India facing financial challenges due to intense competition and regulatory issues.
Vodafone Idea is struggling to compete with other telecom companies in India such as Reliance Jio and Bharti Airtel.
The company has a large debt burden and has been unable to raise funds due to regulatory issues.
Vodafone Idea has been losing subscribers and market share due to poor network quality and customer service.
The Indian government has recently anno...
Q24. What is confusion matrix
A confusion matrix is a table used to evaluate the performance of a classification model.
It shows the number of true positives, true negatives, false positives, and false negatives.
It helps in calculating various evaluation metrics like accuracy, precision, recall, and F1 score.
It is useful in comparing the performance of different models.
Example: A confusion matrix for a binary classification problem can be represented as follows: | | Predicted Positive | Predicted Negative ...
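The four cells and the metrics derived from them can be computed as follows. This is a minimal sketch for binary 0/1 labels; the label lists are invented.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0]  # hypothetical actual labels
y_pred = [1, 0, 0, 1, 1, 0]  # hypothetical predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn, accuracy, precision, recall, f1)
```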
Q25. What problem does ML solve
ML solves complex problems by analyzing data and making predictions or decisions based on patterns and trends.
ML can solve problems related to prediction, classification, clustering, anomaly detection, and recommendation.
Examples include predicting customer churn, classifying spam emails, clustering similar customer segments, detecting fraudulent transactions, and recommending products based on user behavior.
ML can automate tasks that are too complex or time-consuming for hum...
Q26. What is xgboost algorithm
XGBoost is a powerful machine learning algorithm known for its speed and performance in handling large datasets.
XGBoost stands for eXtreme Gradient Boosting
It is an implementation of gradient boosted decision trees designed for speed and performance
XGBoost is widely used in machine learning competitions and real-world applications
It can handle missing data, regularization, parallel processing, and custom optimization objectives
Example: XGBoost has been used in predicting cust...
Q27. python code using recursion
Python code using recursion
Recursion is a technique in which a function calls itself to solve a problem
Base case is important to prevent infinite recursion
Example: Factorial calculation using recursion - def factorial(n): return 1 if n == 0 else n * factorial(n-1)
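Written out over several lines, the same factorial makes the base case and the recursive step easier to see. The negative-input check is an addition for safety and is not part of the one-line version.

```python
def factorial(n: int) -> int:
    """Recursive factorial of a non-negative integer."""
    if n < 0:
        # Without this guard a negative input would recurse until Python's limit is hit
        raise ValueError("factorial is undefined for negative integers")
    if n == 0:
        return 1  # base case stops the recursion
    return n * factorial(n - 1)  # recursive step on a smaller input

print(factorial(5))
```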
Q28. What is inheritance?
Inheritance is the mechanism in object-oriented programming where a new class inherits attributes and methods from an existing class.
Inheritance allows for code reusability and promotes the concept of hierarchical classification.
The class that is being inherited from is called the parent class or superclass, while the class that inherits is called the child class or subclass.
Subclasses can add new attributes or methods, or override existing ones from the superclass.
Example: A...
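A short Python sketch of these ideas, using hypothetical `Animal` and `Dog` classes: the subclass reuses one inherited method, overrides another, and adds a new one.

```python
class Animal:
    """Parent class (superclass)."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound"

    def describe(self):
        return f"{self.name} is an animal"


class Dog(Animal):
    """Child class (subclass) that inherits from Animal."""

    def speak(self):
        # Overrides the parent's method
        return f"{self.name} barks"

    def fetch(self):
        # New method that exists only on the subclass
        return f"{self.name} fetches the ball"


dog = Dog("Rex")
print(dog.describe())  # inherited unchanged from Animal
print(dog.speak())     # overridden in Dog
print(dog.fetch())     # added by Dog
```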
Q29. Different types of SQL joins
Different types of SQL joins include inner join, left join, right join, and full outer join.
Inner join: Returns rows when there is a match in both tables
Left join: Returns all rows from the left table and the matched rows from the right table
Right join: Returns all rows from the right table and the matched rows from the left table
Full outer join: Returns all rows from both tables, matching them where possible and filling in NULLs where there is no match
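The four join types can be imitated in plain Python to show which rows each one keeps. This sketch assumes each table is a dictionary with unique keys, and uses None where SQL would produce NULL; the `join` helper and the sample data are illustrative only.

```python
def join(left, right, how="inner"):
    """Join two {key: value} tables on their keys; a missing side becomes None."""
    if how == "inner":
        keys = [k for k in left if k in right]
    elif how == "left":
        keys = list(left)
    elif how == "right":
        keys = list(right)
    elif how == "full":
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(k, left.get(k), right.get(k)) for k in keys]

customers = {1: "Asha", 2: "Ben", 3: "Chen"}   # customer_id -> name
orders = {2: "book", 3: "pen", 4: "lamp"}      # customer_id -> item

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, how))
```

Customer 1 has no order and order 4 has no customer, so those two rows appear or disappear depending on the join type.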
Q30. Types of Joins in SQL
Types of joins in SQL include inner join, left join, right join, and full outer join.
Inner join: Returns rows when there is a match in both tables
Left join: Returns all rows from the left table and the matched rows from the right table
Right join: Returns all rows from the right table and the matched rows from the left table
Full outer join: Returns all rows from both tables, matching them where possible and filling in NULLs where there is no match
Q31. Reverse the string
Reverse a given string
Use a built-in where the language provides one (for example, slicing with s[::-1] in Python), or loop through the string
Create a new string and add characters in reverse order
Handle edge cases like empty string or single character
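Both approaches are shown below in Python: slicing with a step of -1, and an explicit loop that builds the result from the last character to the first. Empty and single-character strings need no special handling in either version.

```python
def reverse_slice(s: str) -> str:
    """Reverse a string using slicing with a negative step."""
    return s[::-1]

def reverse_loop(s: str) -> str:
    """Reverse a string by walking the indices from last to first."""
    chars = []
    for i in range(len(s) - 1, -1, -1):
        chars.append(s[i])
    return "".join(chars)

for text in ("", "a", "decision"):
    print(repr(text), "->", repr(reverse_slice(text)), repr(reverse_loop(text)))
```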
Q32. Explain tree models.
Tree models are predictive models that use a tree-like graph of decisions and their possible consequences.
Tree models split the data into subsets based on the most significant attributes.
They are used for classification and regression tasks.
Examples include decision trees, random forests, and gradient boosting trees.
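How a tree chooses a split can be sketched with a single-feature "decision stump". The version below picks the threshold that minimises weighted Gini impurity, which is one common splitting criterion (entropy is another); it assumes binary 0/1 labels, and the data is made up.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of the positive class
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Threshold on one numeric feature that minimises the weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    # The largest value is skipped: splitting there would leave the right side empty
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

values = [1, 2, 3, 10, 11, 12]  # hypothetical feature values
labels = [0, 0, 0, 1, 1, 1]     # hypothetical class labels
threshold, impurity = best_split(values, labels)
print(f"split at x <= {threshold}, weighted Gini = {impurity}")
```

A full decision tree repeats this search over every feature and then recurses into each side of the chosen split; random forests and gradient boosting combine many such trees.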
Q33. Tools used
I use a variety of tools depending on the task at hand, including Python, R, SQL, Excel, and Tableau.
Python for data cleaning and analysis
R for statistical modeling and visualization
SQL for querying and manipulating databases
Excel for basic data analysis and visualization
Tableau for creating interactive dashboards and visualizations