Impact Analytics
I applied via Approached by Company and was interviewed in Aug 2024. There were 3 interview rounds.
A/B testing is a method for comparing two versions of a webpage or app to determine which one performs better.
A/B testing involves creating two versions (A and B) of a webpage or app that differ in a single element
Users are randomly assigned to either version A or B, and the same performance metric is measured for both groups
The version that performs better on the desired outcome, by a statistically significant margin, is selected for implementation
Example: Testing two different call...
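One common way to decide which version "performs better" is a two-proportion z-test on conversion rates. The sketch below is only an illustration: the function name `two_proportion_z_test` and the visitor counts are made up, not taken from the interview.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both versions convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 200 of 5000 users converted on A, 260 of 5000 on B
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to random assignment alone.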
It was a classification problem
posted on 24 Oct 2024
I applied via Referral and was interviewed in Sep 2024. There was 1 interview round.
I applied via Referral and was interviewed in Mar 2024. There was 1 interview round.
I applied via Company Website and was interviewed before May 2020. There were 3 interview rounds.
I applied via Company Website and was interviewed in Mar 2024. There were 2 interview rounds.
Good questions. 3 SQL case studies at end
I applied via Recruitment Consultant and was interviewed before Jun 2023. There were 2 interview rounds.
I was given an assignment on a simple problem, where the task was to analyse the problem statement and create a working solution.
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained natural language processing model.
BERT is a transformer-based machine learning algorithm developed by Google.
It is designed to understand the context of words in a sentence by considering both the left and right context simultaneously.
BERT has been pre-trained on a large corpus of text data and can be fine-tuned for specific NLP tasks like ...
Logistic regression is a type of regression analysis used to predict the probability of a binary outcome.
Logistic regression is used when the dependent variable is binary (e.g. 0 or 1, yes or no).
It estimates the probability that a given input belongs to a certain category.
The linear combination of the inputs is passed through a sigmoid function so that the output falls between 0 and 1.
It uses the logistic function to model ...
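A minimal sketch of the prediction step, assuming a model that has already been fitted. The weights, bias, and the helper name `predict_proba` are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    # Linear combination of the inputs, then the sigmoid
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical fitted coefficients
weights = [0.8, -1.2]
bias = 0.1

p = predict_proba([2.0, 1.0], weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is a common default threshold
print(p, label)
```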
R-squared value is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
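R-squared typically ranges from 0 to 1, with 1 indicating a perfect fit; it can be negative when a model fits worse than simply predicting the mean (for example, on held-out data).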
It is used to evaluate the goodness of fit of a regression model.
A higher R-squared value indicates that the model explains a larger proportion of the variance in the dependent variable.
F...
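As a worked illustration, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data below is made up, and the second set of predictions is deliberately poor to show a value below 0.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]

# Predictions close to the observed values
good = r_squared(y_true, [2.8, 5.1, 7.2, 8.9])

# Predictions worse than always predicting the mean
bad = r_squared(y_true, [9.0, 3.0, 9.0, 3.0])

print(good, bad)
```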
Python, pandas, SQL
I applied via Naukri.com and was interviewed in Feb 2022. There were 4 interview rounds.
Test had a mix of questions on Statistics, Probability, Machine Learning, SQL and Python.
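To read special characters correctly in pandas, pass the file's actual encoding through the encoding parameter when reading the data.
Specify the encoding that the file was saved with (for example 'utf-8' or 'latin-1')
A mismatched encoding raises a UnicodeDecodeError or produces garbled characters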
Example: pd.read_csv('filename.csv', encoding='utf-8')
Use pandas' read_csv() method with appropriate parameters to read large .csv files quickly.
Use the chunksize parameter to read the file in smaller chunks
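Use the low_memory parameter to control whether the file is parsed in internal chunks (the default, low_memory=True, lowers memory use during parsing but can yield mixed-type columns)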
Use the dtype parameter to specify data types for columns
Use the usecols parameter to read only necessary columns
Use the skiprows parameter to skip unnecessary rows
Use the nrows parameter to read only a specific numb...
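A small sketch combining several of these parameters. An in-memory CSV stands in for a large file so the example is self-contained; the column names and values are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for a large file on disk (made-up data)
csv_text = "id,city,amount,note\n" + "\n".join(
    f"{i},city{i % 3},{i * 1.5},unused" for i in range(10)
)

reader = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "amount"],                    # read only the needed columns
    dtype={"id": "int32", "amount": "float32"},  # smaller, explicit dtypes
    chunksize=4,                                 # iterate 4 rows at a time
)

total = 0.0
rows_seen = 0
for chunk in reader:
    # Each chunk is an ordinary DataFrame; aggregate and discard it
    total += float(chunk["amount"].sum())
    rows_seen += len(chunk)

print(rows_seen, total)
```

Processing chunk by chunk keeps only a few rows in memory at once, at the cost of having to express the computation as a running aggregate.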
Use vectorized operations, avoid loops, and optimize memory usage.
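Use vectorized operations such as column arithmetic, NumPy functions, and the .str and .dt accessors instead of loops.
Avoid iterrows() and, where possible, apply(), map(), and applymap() with Python functions: they run Python code element by element and are much slower than vectorized operations. When a loop is unavoidable, itertuples() is faster than iterrows().
Optimize memory usage by using appropriate data types (for example, category for low-cardinality strings) and dropping unnecessary columns.
Do not rely on inplace=True for speed: it usually still creates a copy internally, so it rarely saves time or memory.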
Use the pd.eval()...
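The difference between a row loop and a vectorized expression can be sketched as follows. The DataFrame is a made-up toy; on real data the gap in speed grows with the number of rows.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row loop: works, but runs Python code once per row
loop_total = [row.price * row.qty for row in df.itertuples()]

# Vectorized: one expression over whole columns
df["total"] = df["price"] * df["qty"]

# Smaller dtype where the value range allows it
df["qty"] = df["qty"].astype("int8")

print(df)
```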
Generators are functions that allow you to iterate over a sequence of values without creating the entire sequence in memory. Decorators are functions that modify the behavior of other functions.
Generators use the yield keyword to return values one at a time
Generators are memory efficient and can handle large datasets
Decorators are functions that take another function as input and return a modified version of that funct...
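A short sketch of both ideas. The names `squares`, `count_calls`, and `add` are illustrative only.

```python
import functools

def squares(n):
    """Generator: yields one square at a time instead of building a list."""
    for i in range(n):
        yield i * i

def count_calls(func):
    """Decorator: wraps func and counts how often it is called."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(a, b):
    return a + b

first_three = list(squares(3))
add(1, 2)
add(3, 4)
print(first_three, add.calls)
```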
Retrieve top 2 cities per state based on max number in third column of pandas dataframe.
Sort the dataframe by the third column in descending order
Group the sorted dataframe by the state column
Retrieve the top 2 rows of each group using the head(2) function
groupby().head(2) already returns a single dataframe, so no pd.concat() call is needed
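One way to write this as a single chain is sketched below. The column names and numbers are made up; `groupby().head(2)` returns one dataframe, so this version has no concatenation step.

```python
import pandas as pd

# Made-up data: state, city, and a numeric third column
df = pd.DataFrame({
    "state": ["KA", "KA", "KA", "MH", "MH", "MH"],
    "city": ["Bengaluru", "Mysuru", "Mangaluru", "Mumbai", "Pune", "Nagpur"],
    "value": [90, 40, 55, 80, 95, 30],
})

top2 = (
    df.sort_values("value", ascending=False)  # largest values first
      .groupby("state", sort=False)
      .head(2)                                # first 2 rows of each state
      .sort_values(["state", "value"], ascending=[True, False])
      .reset_index(drop=True)
)

print(top2)
```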
my_list[5] retrieves the 6th element of the list.
Indexing starts from 0 in Python.
The integer inside the square brackets is the index of the element to retrieve.
If the index is out of range, an IndexError is raised.
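A quick check of this behaviour, using a made-up list:

```python
my_list = [10, 20, 30, 40, 50, 60]

sixth = my_list[5]  # index 5 is the 6th element

try:
    my_list[6]      # only indices 0..5 exist
    out_of_range = False
except IndexError:
    out_of_range = True

print(sixth, out_of_range)
```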
A Python dictionary cannot hold the same key twice; to store several values under one key, use defaultdict from the collections module.
Import defaultdict from the collections module
Create a defaultdict(list) object, so that each new key starts with an empty list
Append each value to the list stored under its key
Access all values for a key using that key
Example: from collections import defaultdict; d = defaultdict(list); d['key'].append('value1'); d['key'].append('value2')
Lambda functions are anonymous functions used for short and simple operations. They are different from regular functions in their syntax and usage.
Lambda functions are defined with the 'lambda' keyword and do not need a name.
They can take any number of arguments but can only have one expression.
They are commonly used in functional programming and as arguments to higher-order functions.
Lambda functions are oft...
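A brief comparison of a regular function with an equivalent lambda, plus two typical uses as arguments to higher-order functions. The example values are made up.

```python
# Regular function
def square(x):
    return x * x

# Equivalent lambda: a single expression, no return statement
square_lambda = lambda x: x * x

words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))

evens = list(filter(lambda n: n % 2 == 0, range(6)))

print(square(4), square_lambda(4), by_length, evens)
```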
Merge and join are used to combine dataframes in pandas.
Merge combines dataframes on common columns or on indexes.
Join combines dataframes on their index by default; its on parameter can match a column of the calling dataframe against the index of the other.
Merge can match differently named key columns through left_on and right_on.
Both support inner, outer, left, and right joins through the how parameter; merge defaults to an inner join, while join defaults to a left join.
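The sketch below contrasts the two calls on made-up tables. In pandas, `merge` uses `how="inner"` unless told otherwise and `join` uses `how="left"`, which is why the two results differ in length here.

```python
import pandas as pd

# Made-up tables with differently named key columns
orders = pd.DataFrame({"cust_id": [1, 2, 4], "amount": [100, 250, 75]})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})

# merge: match columns by name; default how="inner"
inner = orders.merge(customers, left_on="cust_id", right_on="id")

# join: match on the index; default how="left"
joined = orders.set_index("cust_id").join(customers.set_index("id"))

print(inner)
print(joined)
```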
The resultant table will have the columns from both tables, and its rows will be combinations of matching rows.
A shared key column appears once; other overlapping column names receive suffixes (_x and _y by default in merge)
Each row of the first table is paired with every row of the second table that has the same key
If the second table has repeated keys, the matching row from the first table is repeated once for each of them
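A small illustration with made-up tables in which the second table repeats the key "a":

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
right = pd.DataFrame({"key": ["a", "a", "c"], "y": [10, 20, 30]})

# Inner join on the shared key column
result = left.merge(right, on="key", how="inner")

print(result)
```

The left row with key "a" appears twice in the result, once for each matching row on the right; keys "b" and "c" have no match and are dropped by the inner join.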
Eigenvalues and eigenvectors are linear algebra concepts used in machine learning for dimensionality reduction and feature extraction.
Eigenvalues represent the scaling factor of the eigenvectors.
Eigenvectors are the directions along which a linear transformation acts by stretching or compressing.
In machine learning, eigenvectors are used for principal component analysis (PCA) to reduce the dimensionality of data.
Eigenv...
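The defining relation A v = λ v can be checked numerically. This sketch uses NumPy and a made-up symmetric matrix; `np.linalg.eigh` is chosen because it is meant for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# Made-up symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of eigenvectors are the eigenvectors of A
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Applying A to each eigenvector only rescales it by its eigenvalue
lhs = A @ eigenvectors
rhs = eigenvectors * eigenvalues  # scales column j by eigenvalue j

print(eigenvalues)
print(np.allclose(lhs, rhs))
```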
PCA is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space.
PCA can be used for feature extraction, data visualization, and noise reduction.
PCA cannot be used for causal inference or to handle missing data.
PCA assumes linear relationships between variables and may not work well with non-linear data.
PCA can be applied to various fields such as finance, image process
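A minimal PCA sketch using an eigendecomposition of the covariance matrix, assuming NumPy is available. The two-column dataset is synthetic and deliberately correlated, so one component captures almost all of the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the second column is roughly twice the first
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

# 1. Center the data
Xc = X - X.mean(axis=0)

# 2. Eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort components by variance, largest first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Share of variance per component, and projection onto the first one
explained = eigvals / eigvals.sum()
X_reduced = Xc @ eigvecs[:, :1]

print(explained)
```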
VIF stands for Variance Inflation Factor, a measure of multicollinearity in regression analysis.
VIF is calculated for each predictor variable in a regression model.
It measures how much the variance of the estimated regression coefficient is increased due to multicollinearity.
A VIF of 1 indicates no multicollinearity, while a VIF greater than 1 indicates increasing levels of multicollinearity.
VIF is calculated as 1 / (1...
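The standard definition is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below implements that definition with NumPy least squares; the helper name `vif` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        result.append(1 / (1 - r2))
    return result

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated predictor

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The two nearly identical predictors get large VIFs, while the unrelated one stays close to 1.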
AIC and BIC are model selection criteria that balance goodness of fit against model complexity; they apply to linear regression and to other likelihood-based models.
AIC stands for Akaike Information Criterion and BIC stands for Bayesian Information Criterion.
Both AIC and BIC are used to compare different models and select the best one.
AIC penalizes complex models less severely than BIC.
Lower AIC/BIC values indicate a better trade-off between fit and complexity; the values are only comparable between models fitted to the same data.
AIC and BIC can be calculated...
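Using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), with k parameters, n observations, and maximized likelihood L, the different penalties can be seen on a toy comparison. The log-likelihood values below are made up.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_likelihood

n = 100  # observations (hypothetical)

# Hypothetical fits: the complex model fits slightly better
simple = {"ll": -120.0, "k": 3}
complex_ = {"ll": -113.0, "k": 8}

aic_simple = aic(simple["ll"], simple["k"])
aic_complex = aic(complex_["ll"], complex_["k"])
bic_simple = bic(simple["ll"], simple["k"], n)
bic_complex = bic(complex_["ll"], complex_["k"], n)

print(aic_simple, aic_complex)
print(bic_simple, bic_complex)
```

In this made-up case AIC prefers the complex model while BIC prefers the simple one, because ln(100) is larger than 2 and so each extra parameter costs more under BIC.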
Logistic regression is fitted by minimizing a loss function.
The goal of training is to find the coefficients that minimize the loss function.
The loss function measures the difference between the predicted probabilities and the actual labels.
An optimization algorithm such as gradient descent searches for the coefficient values that minimize the loss function.
A lower loss on the training data means the predicted probabilities match the labels more closely.
Examples of loss functions used in logistic regression are cross-entropy
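A minimal sketch of binary cross-entropy (log loss) in plain Python. The labels and probabilities are made up; the `eps` clipping is a common safeguard against taking log(0).

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between labels (0/1) and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0]

confident_right = log_loss(labels, [0.9, 0.1])
confident_wrong = log_loss(labels, [0.1, 0.9])

print(confident_right, confident_wrong)
```

Confident wrong predictions are penalized far more heavily than confident correct ones.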
One vs Rest is a technique used to extend binary classification to multi-class problems in logistic regression.
It involves training multiple binary classifiers, one for each class.
In each classifier, one class is treated as the positive class and the rest as negative.
The class with the highest probability is predicted as the final output.
It is also known as one vs all or one vs others.
Example: In a 3-class problem, we ...
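The final prediction step can be sketched as picking the class whose binary classifier gives the highest score. The scores below are made-up stand-ins for the outputs of three fitted "class vs rest" models, and `predict_one_vs_rest` is a hypothetical helper.

```python
def predict_one_vs_rest(scores):
    """Return the class whose 'this class vs rest' score is highest."""
    return max(scores, key=scores.get)

# Hypothetical outputs of three binary classifiers for one sample.
# They come from separate models, so they need not sum to 1.
scores = {"cat": 0.2, "dog": 0.7, "bird": 0.4}

predicted = predict_one_vs_rest(scores)
print(predicted)
```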
One vs one is a multi-class classification strategy in which a separate binary classifier is trained for each pair of classes.
It is used when there are more than two classes in the dataset.
It involves training one binary classifier for each pair of classes, K(K-1)/2 classifiers for K classes.
The final prediction is made by combining the results of all the binary classifiers, typically by majority vote.
Example: In a dataset with 5 classes, 10 binary classifiers will be traine
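The pair count and a majority-vote combination can be sketched as follows. The pairwise "winners" are a placeholder rule (the alphabetically earlier class always wins) standing in for real fitted classifiers.

```python
from collections import Counter
from itertools import combinations

classes = ["A", "B", "C", "D", "E"]

# One binary classifier per unordered pair of classes
pairs = list(combinations(classes, 2))

# Placeholder for each pairwise classifier's prediction on one sample
pair_winners = {pair: min(pair) for pair in pairs}

# Majority vote across all pairwise predictions
votes = Counter(pair_winners.values())
predicted = votes.most_common(1)[0][0]

print(len(pairs), predicted)
```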
Role | Salaries reported | Range
Senior Data Scientist | 72 | ₹9 L/yr - ₹26.5 L/yr
Data Scientist | 69 | ₹6 L/yr - ₹21 L/yr
Senior Software Engineer | 62 | ₹11.5 L/yr - ₹39 L/yr
Software Engineer | 51 | ₹6 L/yr - ₹23 L/yr
Business Analyst | 46 | ₹8 L/yr - ₹14 L/yr