AB InBev India
Vitality by Design Interview Questions and Answers
Q1. How did you prevent your model from overfitting? What did you do when it was underfit?
To prevent overfitting, I used techniques like regularization, cross-validation, and early stopping. For underfitting, I tried increasing model complexity and adding more features.
Used regularization techniques like L1 and L2 regularization to penalize large weights
Used cross-validation to evaluate model performance on different subsets of data
Used early stopping to prevent the model from continuing to train when performance on validation set stops improving
For underfitting, increased model complexity and added more features
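The techniques above can be sketched with scikit-learn; this is a minimal illustration (synthetic data, arbitrary hyperparameters), not the exact setup from the interview answer:

```python
# Sketch: L2 regularization, cross-validation, and early stopping in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2 regularization: a smaller C means a stronger penalty on large weights.
model = LogisticRegression(C=0.1, penalty="l2", max_iter=1000)

# Cross-validation: evaluate performance on 5 different train/validation splits.
scores = cross_val_score(model, X, y, cv=5)

# Early stopping: halt training when the held-out validation score stops improving.
sgd = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=5, random_state=0)
sgd.fit(X, y)
```

For underfitting, the corresponding moves would be the reverse: raise `C`, switch to a more expressive model, or engineer additional features.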
Q2. Why was this model/approach used instead of others?
The model/approach was chosen based on its accuracy, interpretability, and scalability.
The chosen model/approach had the highest accuracy compared to others.
The chosen model/approach was more interpretable and easier to explain to stakeholders.
The chosen model/approach was more scalable and could handle larger datasets.
Other models/approaches were considered but did not meet the requirements or had limitations.
The chosen model/approach was also more suitable for the specific problem at hand.
Q3. What approach did you use and why?
I used a combination of supervised and unsupervised learning approaches to analyze the data.
I used supervised learning to train models for classification and regression tasks.
I used unsupervised learning to identify patterns and relationships in the data.
I also used feature engineering to extract relevant features from the data.
I chose this approach because it allowed me to gain insights from the data and make predictions based on it.
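One common way to combine the two paradigms, sketched here with hypothetical data: use an unsupervised step (clustering) as feature engineering for a supervised model. This is an illustration of the pattern, not the candidate's actual pipeline:

```python
# Sketch: unsupervised cluster labels used as an engineered feature
# for a downstream supervised classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Unsupervised learning: discover structure, assign each point a cluster id.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Feature engineering: append the cluster id as an extra column.
X_aug = np.column_stack([X, clusters])

# Supervised learning: classify using the augmented feature set.
clf = RandomForestClassifier(random_state=0).fit(X_aug, y)
```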
Q4. Explain chi square distribution. What are the assumptions involved?
Chi square distribution is a probability distribution used in statistical tests to determine the significance of relationships between categorical variables.
Chi square distribution is a continuous probability distribution that is used in statistical tests such as the chi square test.
It is skewed to the right and its shape is determined by the degrees of freedom.
Assumptions involved in the chi square test include: random sampling, independence of observations, and sufficiently large expected cell counts (commonly at least 5 per cell)
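A chi square test of independence on a small contingency table, as a sketch with made-up counts:

```python
# Sketch: chi square test of independence between two categorical variables.
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = groups, columns = outcome categories (made-up data).
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (cols - 1) = 1 for a 2x2 table.
# The test is reliable only if the expected counts are all >= 5.
```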
Q5. Deep dive into clustering and bagging boosting algorithms
Clustering, bagging, and boosting are popular techniques in machine learning for grouping data points and improving model accuracy.
Clustering algorithms like K-means, DBSCAN, and hierarchical clustering are used to group similar data points together based on certain criteria.
Bagging algorithms like Random Forest create multiple subsets of the training data and train individual models on each subset, then combine their predictions to improve accuracy.
Boosting algorithms like AdaBoost and Gradient Boosting train models sequentially, with each new model focusing on the errors of the previous ones, to reduce bias and improve accuracy.
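The bagging-versus-boosting contrast can be sketched side by side on synthetic data (the models and hyperparameters here are illustrative choices):

```python
# Sketch: bagging (Random Forest) vs. boosting (AdaBoost) on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: weak learners trained sequentially, each reweighting prior errors.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in [("bagging", bagging), ("boosting", boosting)]}
```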
Q6. Explain different clustering algorithms
Clustering algorithms group similar data points together based on certain criteria.
K-means: partitions data into K clusters based on centroids
Hierarchical clustering: creates a tree of clusters
DBSCAN: density-based clustering algorithm
Mean Shift: iteratively shifts candidate centroids toward the densest nearby region of data points
Gaussian Mixture Models: assumes data points are generated from a mixture of Gaussian distributions
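Three of the algorithms listed above, run on the same synthetic blobs (cluster counts and `eps` are illustrative):

```python
# Sketch: K-means, hierarchical (agglomerative), and DBSCAN on the same data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: needs the cluster count up front, partitions around centroids.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: builds a tree of merges, cut at 3 clusters here.
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN: density-based; infers the cluster count, labels outliers as -1.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
```

Note the practical trade-off: K-means and agglomerative clustering require choosing the number of clusters, while DBSCAN instead requires density parameters (`eps`, `min_samples`).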
Q7. Librarires Used
I have experience using libraries such as Pandas, NumPy, Scikit-learn, and Matplotlib for data analysis and visualization.
Pandas for data manipulation
NumPy for numerical operations
Scikit-learn for machine learning algorithms
Matplotlib for data visualization
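A minimal pipeline touching all four libraries, with toy data invented for the example:

```python
# Sketch: Pandas + NumPy for data, Scikit-learn for modeling,
# Matplotlib for plotting.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Pandas/NumPy: build a small tabular dataset (y = 2x + 1).
df = pd.DataFrame({"x": np.arange(10), "y": 2.0 * np.arange(10) + 1.0})

# Scikit-learn: fit a simple linear model.
model = LinearRegression().fit(df[["x"]], df["y"])

# Matplotlib: plot the data and the fitted line.
plt.scatter(df["x"], df["y"])
plt.plot(df["x"], model.predict(df[["x"]]))
plt.savefig("fit.png")
```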