Random Forest is an ensemble learning algorithm that trains multiple decision trees and combines their predictions.
Each tree is trained on a bootstrap sample of the rows and considers a random subset of the features at each split.
Each tree predicts independently; the final prediction is the majority vote of the trees for classification and the average of their predictions for regression.
Random Forest is used for classification and regression tasks, a...
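A minimal sketch of the idea above, assuming scikit-learn is available. The synthetic dataset and the parameter values are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 trees; each is fit on a bootstrap sample and picks from a random
# subset of features at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

n_trees = len(forest.estimators_)
accuracy = forest.score(X_test, y_test)
print(f"trees: {n_trees}, test accuracy: {accuracy:.2f}")
```

For a regression target, `RandomForestRegressor` follows the same pattern and averages the trees' numeric predictions.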
A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled data structure.
A Series can hold data of any type under a single dtype, while a DataFrame is a collection of Series that share an index, so each column can have its own dtype.
A DataFrame is like a table with rows and columns, while a Series is like a single column of that table.
A DataFrame is the more general structure: it supports operations that span several columns, such as merges, group-bys, and pivots.
Example: Series - a column of employee names. Dataframe - a table with columns f...
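The distinction can be sketched in pandas as follows. The employee names and the `age` column are hypothetical values chosen for illustration.

```python
import pandas as pd

# A Series: one labeled column of values.
names = pd.Series(["Asha", "Ravi", "Meena"], name="name")

# A DataFrame: several columns sharing one index (hypothetical data).
employees = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [29, 35, 41]}
)

# Selecting a single column of a DataFrame gives back a Series.
name_column = employees["name"]

print(names.ndim, employees.ndim)
print(type(name_column).__name__)
```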
Linear regression gives valid coefficient estimates and reliable inference only when its assumptions hold.
Linear relationship between independent and dependent variables
Independence of residuals (errors)
Homoscedasticity (constant variance of residuals)
Normality of residuals
No multicollinearity among independent variables
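Some of these assumptions can be checked numerically from the residuals. The following is a rough sketch using NumPy only; the simulated data, the choice of diagnostics (Durbin-Watson statistic, a split-half spread ratio, predictor correlation), and all variable names are assumptions for illustration rather than a complete diagnostic procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data that satisfies the assumptions by construction.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Independence: a Durbin-Watson value near 2 suggests no first-order
# autocorrelation in the residuals.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: residual spread should be similar for low and high
# fitted values, so this ratio should be close to 1.
order = np.argsort(fitted)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2 :]]
spread_ratio = high.std() / low.std()

# Multicollinearity: strongly correlated predictors are a warning sign.
predictor_corr = np.corrcoef(x1, x2)[0, 1]

print(durbin_watson, spread_ratio, predictor_corr)
```

Linearity and normality of residuals are usually judged from a residual-versus-fitted plot and a Q-Q plot, which are omitted here.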
RANK gives tied rows the same rank and skips the ranks that follow, while DENSE_RANK gives tied rows the same rank and continues with the next consecutive rank.
RANK leaves gaps in the ranks after ties, while DENSE_RANK never has gaps.
Neither function assigns unique ranks when there are ties; ROW_NUMBER is the function that numbers every row uniquely.
Example: If three rows have the same value and are ranked 1, 1, and 2 ...
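The behavior on ties can be seen with a small in-memory SQLite table. This is a sketch that assumes the bundled SQLite supports window functions (version 3.25 or later); the `scores` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 90), ("C", 80), ("D", 70)],
)

# A and B tie on score, so RANK and DENSE_RANK diverge from the third row on.
rows = conn.execute(
    """
    SELECT name,
           score,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('A', 90, 1, 1, 1)
# ('B', 90, 1, 1, 2)
# ('C', 80, 3, 2, 3)
# ('D', 70, 4, 3, 4)
```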
Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
It holds regardless of the shape of the population distribution, provided the observations are independent and the population has finite variance; the spread of the sample mean shrinks in proportion to one over the square root of the sample size.
It is important beca...
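A short simulation can make the theorem concrete. This sketch assumes an exponential population (strongly right-skewed) and a sample size of 50; both are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def skewness(values):
    """Sample skewness: third central moment over variance to the 3/2."""
    centered = values - values.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


# Population: exponential with mean 1 and standard deviation 1.
population = rng.exponential(scale=1.0, size=100_000)

# 10,000 samples of size 50; keep the mean of each sample.
sample_size = 50
sample_means = rng.exponential(scale=1.0, size=(10_000, sample_size)).mean(axis=1)

print("population skewness:", skewness(population))
print("sample-mean skewness:", skewness(sample_means))
print("std of sample means:", sample_means.std())
print("1 / sqrt(n):", 1 / np.sqrt(sample_size))
```

The sample means are far less skewed than the population, and their standard deviation is close to the population standard deviation divided by the square root of the sample size.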
SQL query to join two tables
Use JOIN keyword to combine rows from two or more tables based on a related column between them
Specify the columns to be selected from each table
Use ON keyword to specify the join condition
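The steps above can be sketched with Python's built-in `sqlite3` module. The `employees` and `departments` tables, their columns, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Analytics"), (2, "Engineering")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", 1), (2, "Ravi", 2), (3, "Meena", None)],
)

# Select the wanted columns from each table and state the join condition
# with ON. An inner join drops Meena, who has no matching department.
rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d
      ON e.dept_id = d.id
    ORDER BY e.name
    """
).fetchall()

print(rows)  # [('Asha', 'Analytics'), ('Ravi', 'Engineering')]
```

Replacing `JOIN` with `LEFT JOIN` would keep Meena with a NULL department name.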
R-Squared measures the proportion of variance explained by the model, while Adjusted R-Squared adjusts for the number of predictors in the model.
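R-Squared never decreases when predictors are added to the model, even if they are not relevant.
Adjusted R-Squared penalizes each added predictor, so it rises only when a new predictor improves the fit by more than chance would, making it a more reliable measure for comparing models with different numbers of predictors.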
R-Squared can never decrease when adding predictors, while Adjusted R...
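The difference can be illustrated by adding pure-noise predictors to a simple model. This is a sketch on simulated data; the helper names `r_squared` and `adjusted_r_squared` are hypothetical, and the adjustment uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
import numpy as np


def r_squared(X, y):
    """Fit OLS with an intercept and return R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(n, 1))               # one genuinely useful predictor
y = 3.0 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 5))           # five irrelevant predictors

r2_base = r_squared(x, y)
r2_more = r_squared(np.column_stack([x, noise]), y)
adj_base = adjusted_r_squared(r2_base, n, 1)
adj_more = adjusted_r_squared(r2_more, n, 6)

print(f"R2:          {r2_base:.4f} -> {r2_more:.4f}")
print(f"Adjusted R2: {adj_base:.4f} -> {adj_more:.4f}")
```

R-Squared cannot go down when the noise columns are added, while the gap between R-Squared and Adjusted R-Squared widens with the number of predictors.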
Analyzing datasets and building a Machine Learning model for Associate Data Scientist role.
1. Explore and understand the datasets to identify patterns and relationships.
2. Preprocess the data by handling missing values, encoding categorical variables, and scaling numerical features.
3. Split the data into training and testing sets for model evaluation.
4. Choose a suitable Machine Learning algorithm based on the nat...
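The workflow above might look like the following in scikit-learn. This is a sketch under assumptions: the synthetic columns (`age`, `income`, `city`), the target definition, and the choice of logistic regression are all illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400

# Hypothetical dataset with numeric and categorical features.
df = pd.DataFrame(
    {
        "age": rng.normal(40, 10, n),
        "income": rng.normal(50, 15, n),
        "city": rng.choice(["A", "B", "C"], n),
    }
)
df.loc[rng.choice(n, 20, replace=False), "income"] = np.nan  # missing values
target = (df["age"] + rng.normal(0, 5, n) > 40).astype(int)

# Preprocessing: impute and scale numeric columns, one-hot encode categories.
numeric = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ]
)
model = Pipeline(
    [("preprocess", preprocess), ("classifier", LogisticRegression(max_iter=1000))]
)

# Split, fit on the training set only, then evaluate on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    df, target, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Putting the preprocessing inside the pipeline means the imputer and scaler are fit on the training split only, which avoids leaking information from the test set.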
Conduct EDA on datasets to uncover trends, patterns, and insights for informed decision-making.
Check for missing values and handle them appropriately, e.g., imputation or removal.
Visualize distributions of key variables using histograms or box plots to identify outliers.
Analyze correlations between features using heatmaps to understand relationships.
Segment data by categories to uncover trends, e.g., sales by regi...
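A compact pandas sketch of these checks follows. The `sales` table and its values are hypothetical, and the 1.5 x IQR rule stands in for the visual outlier check a histogram or box plot would give.

```python
import pandas as pd

# Hypothetical sales data with one missing value and one extreme row.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South", "East", "North", "South"],
        "units": [10, 12, 11, None, 13, 9, 95, 12],
        "revenue": [100.0, 125.0, 108.0, 90.0, 131.0, 88.0, 940.0, 119.0],
    }
)

# Missing values: count them, then impute with the column median.
missing_counts = sales.isna().sum()
sales["units"] = sales["units"].fillna(sales["units"].median())

# Outliers: flag rows outside 1.5 * IQR of revenue (the box-plot rule).
q1, q3 = sales["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = sales[
    (sales["revenue"] < q1 - 1.5 * iqr) | (sales["revenue"] > q3 + 1.5 * iqr)
]

# Correlations between numeric features (the data behind a heatmap).
correlations = sales[["units", "revenue"]].corr()

# Segment by category: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(missing_counts, outliers, correlations, revenue_by_region, sep="\n\n")
```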
Stemming reduces words to their root form, while lemmatization reduces words to their dictionary form.
Stemming strips affixes, mostly suffixes, with heuristic rules to get the root form (e.g. 'running' becomes 'run')
Lemmatization uses a vocabulary and morphological analysis, usually with the word's part of speech, to reduce words to their base form (e.g. 'better' becomes 'good')
Lemmatization is more accurate but slower than stemming
Stemming is faster but may not always result in a valid word
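The contrast can be shown with a deliberately tiny toy implementation. Both functions and the `TOY_LEMMAS` dictionary below are hypothetical stand-ins written for this example; real projects would typically use a library stemmer and lemmatizer instead.

```python
def toy_stem(word):
    """Strip one common suffix by rule, with no dictionary lookup."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final consonant: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


# A lemmatizer needs vocabulary knowledge; this tiny table plays that role.
TOY_LEMMAS = {"better": "good", "running": "run", "studies": "study", "mice": "mouse"}


def toy_lemmatize(word):
    """Look the word up in the vocabulary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


for word in ("running", "studies", "better"):
    print(word, "->", toy_stem(word), "|", toy_lemmatize(word))
# running -> run | run
# studies -> studi | study
# better -> better | good
```

The rule-based stemmer is cheap but produces the non-word 'studi' and cannot relate 'better' to 'good', while the lookup-based lemmatizer handles both at the cost of needing a vocabulary.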
I applied via Naukri.com and was interviewed in Jun 2024. There were 3 interview rounds.
ETL stands for Extract, Transform, Load. It is a process used in data warehousing to extract data from various sources, transform it into a consistent format, and load it into a target database.
ETL stands for Extract, Transform, Load
Extract: Involves extracting data from various sources such as databases, applications, and files
Transform: Involves cleaning, filtering, and transforming the extracted data into a consiste...
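A minimal end-to-end sketch of the three stages, using only the Python standard library. The CSV content, the `payments` table, and the cleaning rules are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a source file; a real job would read from files, APIs, or databases.
raw_csv = io.StringIO("name,amount\n alice ,10.5\nBOB,20\n,5\n")


def extract(source):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: trim and normalize names, cast amounts, drop nameless rows."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue
        cleaned.append((name, float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)

loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(loaded)  # [('Alice', 10.5), ('Bob', 20.0)]
```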
I applied via Referral and was interviewed before Nov 2020. There were 3 interview rounds.
I applied via Naukri.com and was interviewed in Sep 2019. There was 1 interview round.
I am a recent graduate with a degree in Computer Science and experience in web development.
Recent graduate with a degree in Computer Science
Experience in web development
Strong problem-solving skills
Proficient in programming languages such as Java, JavaScript, and HTML/CSS
My hobbies include reading, hiking, and playing the guitar.
Reading: I enjoy reading fiction and non-fiction books in my free time.
Hiking: I love exploring nature trails and challenging myself with new hikes.
Playing the guitar: I have been playing the guitar for several years and enjoy learning new songs.
Our company is a leading tech startup specializing in AI-driven solutions for businesses.
Specializes in AI-driven solutions for businesses
Considered a leading tech startup in the industry
Known for innovative and cutting-edge technology
Has a strong focus on research and development
Provides services to a wide range of industries
I want to join your company because of its innovative projects, strong company culture, and opportunities for growth.
Innovative projects that align with my interests and skills
Strong company culture that values collaboration and employee development
Opportunities for growth and advancement within the company
I completed various training programs and projects during my college years, gaining hands-on experience in different areas.
Completed a training program in data analysis using Python and R
Developed a mobile application for a class project using Java and Android Studio
Participated in a research project on renewable energy sources
Completed an internship at a local software company, working on web development projects
I appeared for an interview before Jun 2016.
I appeared for an interview before Aug 2016.
I appeared for an interview before May 2016.
I appeared for an interview in Mar 2017.
To make the red fishes 98%, 100 red fishes have to be removed from the aquarium, assuming the usual form of this puzzle: 200 fishes, 99% of them red, and only red fishes removed.
Calculate 99% of 200 fishes to find the number of red fishes: 198.
Subtract the number of red fishes from 200 to find the number of non-red fishes: 2.
After the removal, the 2 non-red fishes must be 2% of the total, so the new total is 2 / 0.02 = 100 fishes.
Subtract the new total from the original 200 to find the number of red fishes to be removed: 100, which leaves 98 red fishes out of 100.
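The arithmetic can be checked in a few lines. This sketch assumes the common statement of the puzzle (200 fishes, 99% red, only red fishes are removed, target 98% red), since the question text itself is not reproduced here; the function name is hypothetical.

```python
from fractions import Fraction


def red_fish_to_remove(total, red_share, target_share):
    """Number of red fishes to remove so that red fishes make up target_share."""
    red = total * red_share
    non_red = total - red
    # The non-red count is unchanged, so it fixes the new total.
    new_total = non_red / (1 - target_share)
    return total - new_total


removed = red_fish_to_remove(200, Fraction(99, 100), Fraction(98, 100))
print(removed)  # 100
```

Under these assumptions the result is 100, not 50: the non-red share doubles from 1% to 2%, so the total has to halve.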
posted on 28 Jun 2017
I appeared for an interview in Mar 2017.
To make the red fishes 98%, 100 red fishes have to be removed from the aquarium, assuming 200 fishes of which 99% are red and only red fishes are removed.
Calculate 1% of 200 fishes to find the number of non-red fishes: 2.
For the red fishes to be 98%, those 2 non-red fishes must represent 2% of the new total, so the new total is 100.
Subtract the new total from 200 to find out how many red fishes to remove: 100.
Some of the top questions asked at the GeakMinds Associate Data Scientist interview -
| Role | Salaries reported | Salary range |
| Software Engineer | 10 | ₹3.6 L/yr - ₹5.8 L/yr |
| QA Automation Lead | 9 | ₹30 L/yr - ₹31 L/yr |
| Data Scientist | 8 | ₹3 L/yr - ₹5 L/yr |
| Associate Software Engineer | 6 | ₹2.7 L/yr - ₹3.9 L/yr |
| Senior Data Scientist | 6 | ₹6.5 L/yr - ₹12.6 L/yr |