I applied via Referral and was interviewed in Feb 2022. There were 5 interview rounds.
Given time-series data for providers, compute the number of seconds each provider was online, per provider and per hour.
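One possible way to approach this question is sketched below. The input layout is an assumption, since the question does not state it: each record is taken to be a (provider_id, online_start, online_end) session, and the function name and sample data are hypothetical. A session that crosses an hour boundary is split so that each hour only receives the seconds that fall inside it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def seconds_online_by_hour(sessions):
    """Return {(provider_id, hour_start): seconds_online}.

    Assumes each session is (provider_id, start, end) with start <= end.
    """
    totals = defaultdict(float)
    for provider_id, start, end in sessions:
        cursor = start
        while cursor < end:
            # Truncate to the start of the hour the cursor is in.
            hour_start = cursor.replace(minute=0, second=0, microsecond=0)
            next_hour = hour_start + timedelta(hours=1)
            # Only count up to the end of this hour or the end of the session.
            segment_end = min(end, next_hour)
            totals[(provider_id, hour_start)] += (segment_end - cursor).total_seconds()
            cursor = segment_end
    return dict(totals)

# Hypothetical sample data for illustration only.
sessions = [
    ("p1", datetime(2022, 2, 1, 9, 50), datetime(2022, 2, 1, 10, 15)),
    ("p1", datetime(2022, 2, 1, 10, 30), datetime(2022, 2, 1, 10, 45)),
    ("p2", datetime(2022, 2, 1, 9, 0), datetime(2022, 2, 1, 9, 0, 30)),
]
result = seconds_online_by_hour(sessions)
```

In SQL the same idea is usually expressed by joining the sessions against a table of hour buckets and summing the overlap of each session with each bucket.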
Assumptions in Linear Regression
Linear relationship between independent and dependent variables
Homoscedasticity (constant variance) of residuals
Independence of residuals
Normal distribution of residuals
No multicollinearity among independent variables
Overfitting and underfitting are two common problems in machine learning models.
Overfitting occurs when a model is too complex and fits the training data too closely, resulting in poor performance on new data.
Underfitting occurs when a model is too simple and cannot capture the underlying patterns in the data, resulting in poor performance on both training and new data.
Overfitting can be prevented by using regularizati...
To improve the performance of Linear Regression, you can consider feature engineering, regularization, and handling outliers.
Perform feature engineering to create new features that capture important information.
Apply regularization techniques like L1 or L2 regularization to prevent overfitting.
Handle outliers by either removing them or using robust regression techniques.
Check for multicollinearity among the independent...
Metrics used to evaluate Linear Regression
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R-squared (R²)
Adjusted R-squared (Adj R²)
Mean Absolute Error (MAE)
Residual Sum of Squares (RSS)
Akaike Information Criterion (AIC)
Bayesian Information Criterion (BIC)
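As a rough illustration of the error-based metrics in this list, the sketch below computes them from scratch. The function name and the sample values are made up for the example; AIC and BIC are left out because they also depend on the model's likelihood.

```python
import math

def regression_metrics(y_true, y_pred, num_features):
    """Compute common regression metrics for paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    rss = sum(r * r for r in residuals)                # Residual Sum of Squares
    mean_y = sum(y_true) / n
    tss = sum((t - mean_y) ** 2 for t in y_true)       # Total Sum of Squares
    r2 = 1 - rss / tss
    return {
        "rss": rss,
        "mse": rss / n,
        "rmse": math.sqrt(rss / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": r2,
        # Adjusted R-squared penalizes additional features.
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - num_features - 1),
    }

# Hypothetical values for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0], num_features=1)
```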
In the usual convention, the loss (error) function measures the difference between the predicted and actual value for a single example, and the cost function is the average of the loss over the dataset.
The cost function is used to evaluate the performance of a machine learning model.
The loss (error) function measures the difference between the predicted and actual value for one example.
The cost function is the average of the loss over the entire training dataset.
It is minimized to optimize the parameters of the model.
Examples of cost functions are ...
Overfitting in Linear Regression can be handled by using regularization techniques.
Regularization techniques like Ridge regression and Lasso regression can help in reducing overfitting.
Cross-validation can be used to find the optimal regularization parameter.
Feature selection and dimensionality reduction techniques can also help in reducing overfitting.
Collecting more data can help in reducing overfitting by providing
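To make the effect of regularization concrete, here is a minimal sketch of Ridge regression with a single feature, using the closed-form solution. The function name and data are hypothetical, and the intercept is assumed to be unpenalized. Increasing the penalty shrinks the slope toward zero.

```python
def fit_ridge_1d(x, y, penalty):
    """Fit y ~ intercept + slope * x with an L2 penalty on the slope only."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # With penalty = 0 this is ordinary least squares.
    slope = sxy / (sxx + penalty)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
ols_fit = fit_ridge_1d(x, y, penalty=0.0)
ridge_fit = fit_ridge_1d(x, y, penalty=5.0)
```

In practice the penalty would be chosen by cross-validation, as noted above, rather than fixed by hand.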
Least Squares Method and Maximum Likelihood are both used to estimate parameters, but differ in their approach.
Least Squares Method minimizes the sum of squared errors between the observed and predicted values.
Maximum Likelihood estimates the parameters that maximize the likelihood of observing the given data.
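Least Squares Method does not itself require a distributional assumption; when the errors are independent and normally distributed with constant variance, it gives the same coefficient estimates as Maximum Likelihood.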
Maximum Likelihood does n...
Logistic Regression formula is used to model the probability of a certain event occurring.
The formula is: P(Y=1) = e^(b0 + b1*X1 + b2*X2 + ... + bn*Xn) / (1 + e^(b0 + b1*X1 + b2*X2 + ... + bn*Xn))
Y is the dependent variable and X1, X2, ..., Xn are the independent variables
b0, b1, b2, ..., bn are the coefficients that need to be estimated
The formula is used to predict the probability of a binary outcome, such as whether...
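The formula above can be evaluated directly, as in this small sketch. The function name and the coefficient values are illustrative assumptions, not fitted estimates. It uses the algebraically equivalent form 1 / (1 + e^(-z)).

```python
import math

def predict_probability(intercept, coefficients, features):
    """Return P(Y=1) for one observation under a logistic regression model."""
    # Linear predictor: b0 + b1*X1 + ... + bn*Xn
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Equivalent to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: b0 = -1.0, b1 = 0.5, with X1 = 2.0 so that z = 0.
probability = predict_probability(-1.0, [0.5], [2.0])
```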
Type I error is rejecting a true null hypothesis, while Type II error is failing to reject a false null hypothesis.
Type I error is also known as a false positive
Type II error is also known as a false negative
The probability of a Type I error equals the significance level (alpha), so a higher significance level makes Type I errors more likely
For a fixed sample size, a lower significance level makes Type II errors more likely
Examples: Type I error - Convicting an innocent person, Type II error - Failing to convi...
Metrics used to evaluate classification models
Accuracy
Precision
Recall
F1 Score
ROC Curve
Confusion Matrix
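Most of the metrics listed above can be derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
scores = classification_metrics(tp=8, fp=2, fn=4, tn=6)
```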
Overfitting in decision trees can be handled by pruning, reducing tree depth, increasing dataset size, and using ensemble methods.
Prune the tree to remove unnecessary branches
Reduce tree depth to prevent overfitting
Increase dataset size to improve model generalization
Use ensemble methods like Random Forest to reduce overfitting
Underfitting can be handled by increasing tree depth, adding more features, and reducing regu...
Case Study - How do you improve user engagement of Facebook?
Guesstimates - How many people watched the Squid Game series on Netflix
How do you reduce partner churn in UC?
I applied via Job Portal
SQL and Python data analysis questions
Matching data in different columns involves comparing the values in the columns and identifying similarities or patterns.
Use string matching techniques like exact match, partial match, or fuzzy matching.
Apply data cleaning and preprocessing techniques to standardize the data before matching.
Utilize advanced algorithms like Levenshtein distance or Jaccard similarity for more complex matching.
Consider using database join...
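As a sketch of the standardize-then-match idea, the example below normalizes two strings and then computes their Levenshtein distance with the standard dynamic-programming recurrence. The helper names and sample strings are hypothetical.

```python
def normalize(value):
    """Lowercase and collapse whitespace so formatting differences do not count."""
    return " ".join(value.lower().split())

def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]

# After normalization these two values are an exact match (distance 0).
distance_clean = levenshtein(normalize("  Acme  Corp"), normalize("acme corp"))
# A one-character typo gives distance 1.
distance_typo = levenshtein(normalize("Acme Crp"), normalize("Acme Corp"))
```

A threshold on the distance, chosen relative to string length, would then decide whether two values count as a match.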
I applied via LinkedIn and was interviewed in Sep 2022. There were 4 interview rounds.
I applied via Campus Placement and was interviewed in Nov 2024. There were 4 interview rounds.
Questions from arrays and strings and some aptitude questions
To merge two CSV files, you can use software like Microsoft Excel or programming languages like Python.
Open both CSV files in software like Microsoft Excel.
Copy the data from one CSV file and paste it into the other CSV file.
Save the merged CSV file with a new name.
Alternatively, you can use programming languages like Python to merge CSV files by reading both files, combining the data, and writing to a new file.
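A minimal sketch of the Python route, assuming "merge" means appending the rows of one file to the other and that both files share the same header row. The function name, file names, and sample contents are hypothetical.

```python
import csv
import os
import tempfile

def merge_csv_files(first_path, second_path, output_path):
    """Append the data rows of second_path to first_path and write output_path.

    Assumes both files are non-empty and have identical header rows.
    Returns the number of data rows written.
    """
    with open(first_path, newline="") as handle:
        first_rows = list(csv.reader(handle))
    with open(second_path, newline="") as handle:
        second_rows = list(csv.reader(handle))
    if first_rows[0] != second_rows[0]:
        raise ValueError("CSV files have different headers")
    merged = first_rows + second_rows[1:]  # keep a single header row
    with open(output_path, "w", newline="") as handle:
        csv.writer(handle).writerows(merged)
    return len(merged) - 1

# Demonstration with temporary files.
with tempfile.TemporaryDirectory() as folder:
    path_a = os.path.join(folder, "a.csv")
    path_b = os.path.join(folder, "b.csv")
    path_out = os.path.join(folder, "merged.csv")
    with open(path_a, "w", newline="") as handle:
        handle.write("id,name\n1,alpha\n2,beta\n")
    with open(path_b, "w", newline="") as handle:
        handle.write("id,name\n3,gamma\n")
    row_count = merge_csv_files(path_a, path_b, path_out)
    with open(path_out, newline="") as handle:
        merged_rows = list(csv.reader(handle))
```

If the files instead share a key column and need to be combined side by side, that is a join rather than an append, and a library such as pandas is the more common choice.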
I applied to this company because of its reputation in the industry, opportunities for growth, and company culture.
Reputation in the industry - I have heard great things about the company's innovative projects and successful track record.
Opportunities for growth - The company offers various training programs and career advancement opportunities for employees.
Company culture - I value a positive work environment and the...
Normal distribution is a probability distribution that is symmetric and bell-shaped.
It is also known as Gaussian distribution.
It is characterized by mean and standard deviation.
Many natural phenomena approximately follow a normal distribution, such as height and weight of individuals.
About 68% of the data falls within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
It is wid...
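The 68/95/99.7 percentages can be checked numerically with the standard library's NormalDist class. This is only a quick verification sketch; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Shares for one, two and three standard deviations.
shares = [share_within(k) for k in (1, 2, 3)]
```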
SQL jobs include database administrator, data analyst, data scientist, business intelligence analyst, and software developer.
Database Administrator
Data Analyst
Data Scientist
Business Intelligence Analyst
Software Developer
I applied via Company Website and was interviewed in Jul 2023. There were 4 interview rounds.
Number series, clock, logic, arithmetic, geometry
Basic coding test: answer basic-knowledge questions and write code
Python is a high-level programming language known for its simplicity and readability.
Python is widely used for web development, data analysis, artificial intelligence, and scientific computing.
It emphasizes code readability and uses indentation to define code blocks.
Python has a large standard library and a vibrant community of developers.
Example: print('Hello, World!')
Example: import pandas as pd
Python is a versatile programming language used for data analysis, web development, artificial intelligence, automation, and more.
Data analysis and visualization
Web development (Django, Flask)
Artificial intelligence and machine learning (TensorFlow, PyTorch)
Automation and scripting
Scientific computing (NumPy, SciPy)
Python is a high-level programming language known for its simplicity and readability.
Python is an interpreted language: there is no separate compile step, because the interpreter compiles the source to bytecode automatically before running it.
It supports multiple programming paradigms, including object-oriented, imperative, and functional programming.
Python has a large standard library and a thriving community, making it versatile and widely used.
Example: Python is used for web develop...
Object-oriented programming (OOP) is a programming paradigm based on the concept of 'objects', which can contain data and code.
OOP allows for the organization of code into reusable components called classes.
Classes can have attributes (variables) and methods (functions) associated with them.
In Python, everything is an object, and classes can be defined using the 'class' keyword.
Encapsulation, inheritance, and polymorph
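A small illustrative sketch of these ideas in Python follows. The class names are hypothetical. The base class keeps its data in attributes, subclasses inherit from it, and each subclass overrides area so that the same call behaves differently per type.

```python
import math

class Shape:
    """Base class: holds a name and defines the interface subclasses implement."""

    def __init__(self, name):
        self._name = name  # leading underscore marks the attribute as internal by convention

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        # Calls whichever area() the concrete subclass provides (polymorphism).
        return f"{self._name} with area {self.area():.2f}"

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")  # reuse the base-class initializer (inheritance)
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2

descriptions = [shape.describe() for shape in (Rectangle(2, 3), Circle(1))]
```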
In Python, array-like data is usually stored in a list, which can hold elements of any type; the standard-library array module and NumPy arrays store elements of a single type.
Lists can store elements such as integers, floats, or strings.
Arrays are indexed starting from 0, with elements accessed using their index.
Example: arr = ['apple', 'banana', 'cherry']
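The example above is a built-in list. The short sketch below shows zero-based indexing on it and, for contrast, the standard-library array module, which enforces a single element type. The variable names are illustrative.

```python
from array import array

arr = ['apple', 'banana', 'cherry']
first = arr[0]    # indexing starts at 0
last = arr[-1]    # negative indices count from the end

# A typed array: 'i' means signed integers only.
numbers = array('i', [1, 2, 3])
numbers.append(4)

# Appending a value of the wrong type is rejected.
try:
    numbers.append('five')
    rejected = False
except TypeError:
    rejected = True
```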
I applied via Job Portal and was interviewed before Jun 2022. There were 2 interview rounds.
I applied via Company Website and was interviewed before Feb 2023. There was 1 interview round.
Types of join in SQL include inner join, left join, right join, and full outer join.
Inner join returns only the matching records from both tables.
Left join returns all records from the left table and the matching records from the right table.
Right join returns all records from the right table and the matching records from the left table.
Full outer join returns all records when there is a match in either the left or rig
A left join is used to combine data from two tables based on a common column, including all records from the left table.
Left join returns all rows from the left table and the matching rows from the right table.
It is useful when you want to retrieve all records from the left table, even if there are no matches in the right table.
The result of a left join will have NULL values in the columns from the right table where th...
The number of rows after applying join operations depends on the type of join used and the data in the tables being joined.
Inner join retains only the rows that have matching values in both tables
Left join retains all rows from the left table and the matched rows from the right table
Right join retains all rows from the right table and the matched rows from the left table
Full outer join retains all rows when there is a
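To see how the row count depends on both the join type and the data, here is a small sketch using an in-memory SQLite database. The table names and rows are hypothetical. One customer has two orders and one has none, so the inner join returns more rows than there are matched customers, and the left join adds a NULL-padded row for the unmatched customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Inner join: only customers with at least one order; duplicates in the key multiply rows.
inner_rows = conn.execute(
    "SELECT c.id, o.order_id FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id"
).fetchall()

# Left join: every customer appears; customer 3 has no order, so order_id is NULL (None).
left_rows = conn.execute(
    "SELECT c.id, o.order_id FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id"
).fetchall()

conn.close()
```

Here the inner join yields 3 rows and the left join yields 4, even though the left table has only 3 rows.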
VLOOKUP in Excel is used to search for a value in a table and return a corresponding value, while SQL clauses like JOIN and WHERE are used to retrieve data from multiple tables based on specified conditions.
VLOOKUP is a spreadsheet function that looks up values in a single table range, while SQL queries can work on multiple tables.
VLOOKUP requires the lookup column to be sorted in ascending order only for approximate matches (exact-match lookups do not need sorting), while SQL joins do not have this requiremen...
I applied via Company Website and was interviewed before Feb 2023. There were 2 interview rounds.
Simple SQL & Excel test
I applied via Company Website and was interviewed in Apr 2024. There was 1 interview round.
HackerRank Test - Python, SQL
Some of the top questions asked at the Urban Company Senior Data Analyst interview -