C5i
Machine learning enables computers to learn from data and make predictions or decisions without being explicitly programmed.
Machine learning can automate and optimize complex processes
It can help identify patterns and insights in large datasets
It can improve accuracy and efficiency in decision-making
Examples include image recognition, natural language processing, and predictive analytics
It can also be used for ano...
I appeared for an interview in May 2024.
Questions based on ML, Python, and data visualization.
ML, DL, Python, NLP, and data visualization.
TF-IDF is a numerical statistic that reflects the importance of a word in a document relative to a collection of documents.
TF-IDF stands for Term Frequency-Inverse Document Frequency.
It is used in Natural Language Processing (NLP) to determine the importance of a word in a document.
TF-IDF is calculated by multiplying the term frequency (TF) of a word by the inverse document frequency (IDF) of the word.
It helps in ident...
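The TF times IDF calculation described above can be sketched in a few lines. This is a minimal illustration, assuming TF is the term's relative frequency within a document and IDF is the natural log of (number of documents / number of documents containing the term). The toy documents and the `tf_idf` helper are made up for this example, and library implementations typically use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # TF (relative frequency) multiplied by IDF (unsmoothed log ratio).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

A word that occurs in every document ("the") gets a score of zero, while a word unique to one document ("mat") scores higher than one shared by two documents ("cat").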
I applied via Naukri.com and was interviewed before Dec 2023. There were 3 interview rounds.
Test on basic Python: data structures (lists, tuples, and dictionaries), loops, and conditional statements.
Framework and requirements for chatbot implementation.
I applied via Recruitment Consultant and was interviewed in Dec 2018. There were 3 interview rounds.
I chose the Data Science field because of its potential to solve complex problems and make a positive impact on society.
Fascination with data and its potential to drive insights
Desire to solve complex problems and make a positive impact on society
Opportunity to work with cutting-edge technology and tools
Ability to work in a variety of industries and domains
Examples: Predictive maintenance in manufacturing, fraud detection ...
Linear Regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables.
It assumes a linear relationship between the dependent and independent variables.
The equation of a simple linear regression is Y = a + bX + e, where Y is the dependent variable, X is the independent variable, a is the intercept, b is the slope, and e is the error term.
Multiple linea...
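As a rough illustration of Y = a + bX + e, the sketch below estimates a and b by ordinary least squares. The answer above does not name an estimator, so least squares is an assumption here, and the data and the `fit_simple_linear_regression` helper are invented for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate intercept a and slope b of Y = a + bX by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up data lying exactly on y = 1 + 2x, so the error term is zero.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)
```

With noisy data the same formulas apply; the residuals y - (a + b*x) then play the role of the error term e.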
Linear Regression is used for predicting continuous numerical values, while Logistic Regression is used for predicting binary categorical values.
Linear Regression predicts a continuous output, while Logistic Regression predicts a binary output.
Linear Regression uses a linear equation to model the relationship between the independent and dependent variables, while Logistic Regression uses a logistic function.
Linear Regr...
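The difference between the two model forms can be shown side by side. In this sketch both models share the same linear term a + b*x; the logistic model passes it through the logistic (sigmoid) function to get a probability. The coefficients and the 0.5 decision threshold are hypothetical choices for illustration.

```python
import math


def linear_output(x, a, b):
    """Linear regression: the prediction is the linear term itself."""
    return a + b * x


def logistic_output(x, a, b):
    """Logistic regression: squash the linear term into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def predict_class(x, a, b, threshold=0.5):
    """Turn the probability into a binary label (threshold is an assumption)."""
    return 1 if logistic_output(x, a, b) >= threshold else 0


# Hypothetical coefficients.
a, b = 0.0, 1.0
continuous = linear_output(10, a, b)     # unbounded numeric value
probability = logistic_output(10, a, b)  # always strictly between 0 and 1
```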
A confusion matrix is a table used to evaluate the performance of a classification model.
For binary classification it is a 2x2 matrix that shows the number of true positives, false positives, true negatives, and false negatives; with N classes it is N x N.
It helps in calculating various metrics like accuracy, precision, recall, and F1 score.
It is useful in identifying the strengths and weaknesses of a model and improving its performance.
Example: In a binary classification p...
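A small sketch of how the four counts and the derived metrics fit together for a binary problem. The labels below are invented, and the function names are illustrative rather than taken from any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up labels for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
accuracy, precision, recall, f1 = metrics(*counts)
```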
No, a confusion matrix is not used in Linear Regression.
A confusion matrix is used to evaluate classification models.
Linear Regression is a regression model, not a classification model.
Evaluation metrics for Linear Regression include R-squared, Mean Squared Error, etc.
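The two regression metrics named above can be computed directly from predictions. This sketch uses the standard definitions (MSE as the mean of squared residuals, R-squared as 1 minus residual sum of squares over total sum of squares); the numbers are invented.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```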
KNN is a non-parametric algorithm used for classification and regression tasks.
KNN stands for K-Nearest Neighbors.
It works by finding the K closest data points to a given test point.
The class or value of the test point is then determined by the majority class or average value of the K neighbors.
KNN can be used for both classification and regression tasks.
It is a simple and easy-to-understand algorithm, but can be compu...
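The neighbor lookup and the vote or average described above can be sketched as follows. Euclidean distance is assumed here (the answer does not name a distance), and the points, labels, and `knn_predict` helper are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_targets, query, k=3, task="classification"):
    """Predict for `query` from its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_points, train_targets),
        key=lambda pair: math.dist(pair[0], query),
    )
    targets = [target for _, target in ranked[:k]]
    if task == "classification":
        # Majority class among the k neighbors.
        return Counter(targets).most_common(1)[0][0]
    # Regression: average value of the k neighbors.
    return sum(targets) / len(targets)


# Made-up 2-D points forming two clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
classes = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label = knn_predict(points, classes, (2, 2), k=3)
value = knn_predict(points, values, (1, 1), k=3, task="regression")
```

Every prediction scans all training points, which is one reason the method gets slow on large datasets.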
Random Forest is an ensemble learning method that builds multiple decision trees and combines their outputs to improve accuracy.
Random Forest is a type of supervised learning algorithm used for classification and regression tasks.
It creates multiple decision trees and combines their outputs to make a final prediction.
Each decision tree is built using a random subset of features and data points to reduce overfitting.
Ran...
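The idea of training many trees on random subsets and combining their votes can be illustrated with a heavily simplified sketch. As an assumption made for brevity, each "tree" here is a one-split stump on a single randomly chosen feature, trained on a bootstrap sample; real random forests grow full decision trees and sample features at every split. All names and data are illustrative.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """One-split 'tree': threshold halfway between the feature's min and max."""
    column = [row[feature] for row in rows]
    threshold = (min(column) + max(column)) / 2
    left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    fallback = Counter(labels).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else fallback
    right_label = Counter(right).most_common(1)[0][0] if right else fallback
    return feature, threshold, left_label, right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Combine the trees by majority vote."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up, well-separated data.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["low"] * 4 + ["high"] * 4
forest = train_forest(rows, labels)
```

An individual stump can be wrong when its bootstrap sample happens to contain only one class; the majority vote across trees is what smooths this out.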
I have worked on various projects involving data analysis, machine learning, and predictive modeling.
Developed a predictive model to forecast customer churn for a telecommunications company.
Built a recommendation system using collaborative filtering for an e-commerce platform.
Performed sentiment analysis on social media data to understand customer opinions and preferences.
Implemented a fraud detection system using anom...
My friends think of me as reliable, supportive, and always up for a good time.
Reliable - always there when they need help or support
Supportive - willing to listen and offer advice
Fun-loving - enjoys socializing and trying new things
I applied via Approached by Company and was interviewed in Feb 2022. There were 2 interview rounds.
I applied via LinkedIn and was interviewed before Oct 2023. There were 2 interview rounds.
Graph-based question on acyclic graphs.
posted on 11 Sep 2024
I applied via Company Website and was interviewed in Aug 2024. There was 1 interview round.
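Here RAG is read as Red, Amber, Green: a data processing pipeline that sorts data into these three status categories based on defined criteria. In machine learning, "RAG pipeline" more commonly means Retrieval-Augmented Generation, in which documents retrieved for a query are passed to a language model as context.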
RAG stands for Red, Amber, Green which are used to categorize data based on certain criteria
Red category typically represents data that needs immediate attention or action
Amber category represents data that requires monitoring or further investigation
Green category represents data that...
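A Red/Amber/Green categorization of the kind described above reduces to a pair of threshold checks. The metric (an error rate) and the threshold values below are hypothetical, chosen only to make the sketch concrete.

```python
def rag_status(value, amber_threshold, red_threshold):
    """Map a numeric measure to Red, Amber, or Green (higher is worse)."""
    if value >= red_threshold:
        return "Red"      # needs immediate attention
    if value >= amber_threshold:
        return "Amber"    # needs monitoring or investigation
    return "Green"


# Hypothetical error rates and thresholds.
error_rates = {"job_a": 0.01, "job_b": 0.07, "job_c": 0.25}
statuses = {
    name: rag_status(rate, amber_threshold=0.05, red_threshold=0.20)
    for name, rate in error_rates.items()
}
```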
A confusion matrix is used to evaluate the performance of a classification model by comparing predicted values with actual values.
A confusion matrix is a table that describes the performance of a classification model.
For a binary classifier it consists of four counts: True Positive, True Negative, False Positive, and False Negative.
These metrics are used to calculate other evaluation metrics like accuracy, precision, recall, and F1 s...
I applied via Naukri.com and was interviewed in Oct 2023. There were 4 interview rounds.
SQL, Python, and statistics MCQs, plus an aptitude test. These were medium-level questions.
Remove duplicates from list b, keep elements not in list a, and sort in ascending order.
Create a set from list b to remove duplicates
Use list comprehension to keep elements not in list a
Sort the final list in ascending order
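The three steps above can be sketched as a short function. The input lists are made up for illustration.

```python
def unique_not_in_a(a, b):
    """Deduplicate b, drop anything present in a, and sort ascending."""
    exclude = set(a)  # set membership checks are fast
    return sorted(x for x in set(b) if x not in exclude)


# Made-up inputs.
a = [1, 2, 3]
b = [5, 3, 7, 5, 1, 9, 7]
result = unique_not_in_a(a, b)
```

For hashable elements, `sorted(set(b) - set(a))` is an equivalent one-liner.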
Use the DISTINCT keyword in SQL to return rows without duplicates; this filters the query result and does not delete anything from the table itself.
Use the SELECT DISTINCT statement to retrieve unique rows from the table.
Identify the columns that should be used to determine uniqueness.
Example: SELECT DISTINCT column1, column2 FROM tab1;
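The example query can be tried against an in-memory SQLite database from Python. The table and column names follow the example above; the rows and the ORDER BY clause (added only to make the output order predictable) are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (column1 TEXT, column2 INTEGER)")
# Made-up rows, including one exact duplicate.
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 2), ("a", 3)],
)

# DISTINCT collapses rows that match on every selected column.
unique_rows = conn.execute(
    "SELECT DISTINCT column1, column2 FROM tab1 ORDER BY column1, column2"
).fetchall()

# The table itself still holds all four rows.
total_rows = conn.execute("SELECT COUNT(*) FROM tab1").fetchone()[0]
conn.close()
```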
Given two data science case studies and asked about different ways to improve the models.
How to work with an imbalanced dataset?
How to remove null values? What is feature engineering?
What is PCA?
How does XGBoost work?