AB InBev India
Vitality by Design Interview Questions and Answers
Q1. How did you prevent your model from overfitting? What did you do when it was underfit?
To prevent overfitting, I used techniques like regularization, cross-validation, and early stopping. For underfitting, I tried increasing model complexity and adding more features.
Used regularization techniques like L1 and L2 regularization to penalize large weights
Used cross-validation to evaluate model performance on different subsets of data
Used early stopping to prevent the model from continuing to train when performance on validation set stops improving
For underfitting, increased model complexity and added more features
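The techniques above can be sketched with scikit-learn; this is a minimal illustration (synthetic data, arbitrary hyperparameters), not the exact setup from the interview answer:

```python
# Sketch: L2 regularization, cross-validation, and early stopping in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2 regularization: a smaller C means a stronger penalty on large weights.
model = LogisticRegression(C=0.1, penalty="l2", max_iter=1000)

# Cross-validation: evaluate performance on 5 different train/validation splits.
scores = cross_val_score(model, X, y, cv=5)

# Early stopping: halt training when the held-out validation score stops improving.
sgd = SGDClassifier(early_stopping=True, validation_fraction=0.2,
                    n_iter_no_change=5, random_state=0)
sgd.fit(X, y)
```

For underfitting, the corresponding moves would be the reverse: raise `C`, switch to a more expressive model, or engineer additional features.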
Q2. Why was this model/approach used instead of others?
The model/approach was chosen based on its accuracy, interpretability, and scalability.
The chosen model/approach had the highest accuracy compared to others.
The chosen model/approach was more interpretable and easier to explain to stakeholders.
The chosen model/approach was more scalable and could handle larger datasets.
Other models/approaches were considered but did not meet the requirements or had limitations.
The chosen model/approach was also more suitable for the specific problem at hand.
Q3. What approach did you use and why?
I used a combination of supervised and unsupervised learning approaches to analyze the data.
I used supervised learning to train models for classification and regression tasks.
I used unsupervised learning to identify patterns and relationships in the data.
I also used feature engineering to extract relevant features from the data.
I chose this approach because it allowed me to gain insights from the data and make predictions based on it.
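One common way to combine the two paradigms, sketched here with hypothetical data: use an unsupervised step (clustering) as feature engineering for a supervised model. This is an illustration of the pattern, not the candidate's actual pipeline:

```python
# Sketch: unsupervised cluster labels used as an engineered feature
# for a downstream supervised classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Unsupervised learning: discover structure, assign each point a cluster id.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Feature engineering: append the cluster id as an extra column.
X_aug = np.column_stack([X, clusters])

# Supervised learning: classify using the augmented feature set.
clf = RandomForestClassifier(random_state=0).fit(X_aug, y)
```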
Q4. Explain chi square distribution. What are the assumptions involved?
Chi square distribution is a probability distribution used in statistical tests to determine the significance of relationships between categorical variables.
Chi square distribution is a continuous probability distribution that is used in statistical tests such as the chi square test.
It is skewed to the right and its shape is determined by the degrees of freedom.
Assumptions involved in the chi square test include: random sampling, independence of observations, and sufficiently large expected cell counts (commonly at least 5 per cell)
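A chi square test of independence on a small contingency table, as a sketch with made-up counts:

```python
# Sketch: chi square test of independence between two categorical variables.
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = groups, columns = outcome categories (made-up data).
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (cols - 1) = 1 for a 2x2 table.
# The test is reliable only if the expected counts are all >= 5.
```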
Q5. Deep dive into clustering and bagging boosting algorithms
Clustering, bagging, and boosting are popular techniques in machine learning for grouping data points and improving model accuracy.
Clustering algorithms like K-means, DBSCAN, and hierarchical clustering are used to group similar data points together based on certain criteria.
Bagging algorithms like Random Forest create multiple subsets of the training data and train individual models on each subset, then combine their predictions to improve accuracy.
Boosting algorithms like AdaBoost and Gradient Boosting train models sequentially, with each new model focusing on the errors of the previous ones, to reduce bias and improve accuracy.
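The bagging-versus-boosting contrast can be sketched side by side on synthetic data (the models and hyperparameters here are illustrative choices):

```python
# Sketch: bagging (Random Forest) vs. boosting (AdaBoost) on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)

# Boosting: weak learners trained sequentially, each reweighting prior errors.
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)

results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in [("bagging", bagging), ("boosting", boosting)]}
```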
Q6. Explain different clustering algorithms
Clustering algorithms group similar data points together based on certain criteria.
K-means: partitions data into K clusters based on centroids
Hierarchical clustering: creates a tree of clusters
DBSCAN: density-based clustering algorithm
Mean Shift: iteratively shifts candidate centroids toward the densest nearby region of data points
Gaussian Mixture Models: assumes data points are generated from a mixture of Gaussian distributions
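Three of the algorithms listed above, run on the same synthetic blobs (cluster counts and `eps` are illustrative):

```python
# Sketch: K-means, hierarchical (agglomerative), and DBSCAN on the same data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# K-means: needs the cluster count up front, partitions around centroids.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: builds a tree of merges, cut at 3 clusters here.
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN: density-based; infers the cluster count, labels outliers as -1.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
```

Note the practical trade-off: K-means and agglomerative clustering require choosing the number of clusters, while DBSCAN instead requires density parameters (`eps`, `min_samples`).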
Q7. Librarires Used
I have experience using libraries such as Pandas, NumPy, Scikit-learn, and Matplotlib for data analysis and visualization.
Pandas for data manipulation
NumPy for numerical operations
Scikit-learn for machine learning algorithms
Matplotlib for data visualization
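A minimal pipeline touching all four libraries, with toy data invented for the example:

```python
# Sketch: Pandas + NumPy for data, Scikit-learn for modeling,
# Matplotlib for plotting.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Pandas/NumPy: build a small tabular dataset (y = 2x + 1).
df = pd.DataFrame({"x": np.arange(10), "y": 2.0 * np.arange(10) + 1.0})

# Scikit-learn: fit a simple linear model.
model = LinearRegression().fit(df[["x"]], df["y"])

# Matplotlib: plot the data and the fitted line.
plt.scatter(df["x"], df["y"])
plt.plot(df["x"], model.predict(df[["x"]]))
plt.savefig("fit.png")
```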