Data Science Intern

100+ Data Science Intern Interview Questions and Answers

Updated 2 Jul 2025

Asked in Samsung

4d ago

Q. Rotate Matrix by 90 Degrees Problem Statement

Given a square matrix 'MATRIX' of non-negative integers, rotate the matrix by 90 degrees in an anti-clockwise direction using only constant extra space.

Input:

The ...read more

Ans.

The task is to rotate a square matrix by 90 degrees in an anti-clockwise direction using constant extra space.

Iterate through each layer of the matrix from outer to inner
For each layer, perform a four-way swap of elements
Continue this process until all layers have been rotated
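
The layer-by-layer approach above can be sketched as follows. This is a minimal illustration, not a reference solution; the function name `rotate_anticlockwise` is a hypothetical choice, and the matrix is assumed to be a square list of lists.

```python
def rotate_anticlockwise(matrix):
    """Rotate a square matrix 90 degrees anti-clockwise in place (O(1) extra space)."""
    n = len(matrix)
    for layer in range(n // 2):
        first, last = layer, n - 1 - layer
        for i in range(first, last):
            offset = i - first
            top = matrix[first][i]                                       # save the top element
            matrix[first][i] = matrix[i][last]                           # right  -> top
            matrix[i][last] = matrix[last][last - offset]                # bottom -> right
            matrix[last][last - offset] = matrix[last - offset][first]   # left   -> bottom
            matrix[last - offset][first] = top                           # top    -> left
    return matrix


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
rotate_anticlockwise(grid)
print(grid)
```

Each inner iteration moves four elements using a single temporary variable, which is what keeps the extra space constant.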

Q. Given an array (duplicates allowed), how do you find the second highest element with a time complexity of O(N) using a single traversal and a space complexity of O(1)?

Ans.

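Find the second highest element of an array in a single O(N) pass using O(1) extra space.

Initialize two variables to store the highest and second highest elements.
Traverse the array once: when an element exceeds the highest, move the old highest into the second slot; when it lies strictly between the two, update only the second.
Skip elements equal to the current highest so that duplicates are not reported as the second highest.
Return the second highest element.
Handle edge cases such as an empty array, a single element, or an array whose elements are all equal.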
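
A minimal sketch of the single-pass approach, assuming that "second highest" means the second largest distinct value and that `None` is an acceptable result when no such value exists. The function name `second_highest` is hypothetical.

```python
def second_highest(values):
    """Return the second largest distinct value, or None if there is none."""
    highest = second = None
    for v in values:
        if highest is None or v > highest:
            # New maximum: the previous maximum becomes the runner-up.
            second = highest
            highest = v
        elif v != highest and (second is None or v > second):
            # Strictly between the runner-up and the maximum.
            second = v
    return second


print(second_highest([5, 1, 5, 3]))
```

Only two variables are kept regardless of input size, so the space cost stays O(1).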

Data Science Intern Interview Questions and Answers for Freshers

Q. What is gradient descent, and why does it follow the tangent (slope) of the cost curve? Explain it and write down its formula.

Ans.

Gradient descent is an optimization algorithm used to minimize the cost function of a machine learning model.

Gradient descent is used to update the parameters of a model to minimize the cost function.
It follows the direction of steepest descent, which is the negative gradient of the cost function.
The learning rate determines the step size of the algorithm.
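For linear regression with a mean squared error cost, the update rule is: theta = theta - alpha * (1/m) * sum((hypothesis - y) * x), where alpha is the learning rate and m is the number of training examples.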
The cost function should be ...read more
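
As an illustration of the update rule, here is a small sketch of batch gradient descent for one-feature linear regression. The model form (an intercept `theta0` plus a slope `theta1`), the default learning rate, and the iteration count are assumptions made for the example.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on mean squared error."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # errors[i] is (hypothesis - y) for training example i
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # gradient w.r.t. the intercept
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # gradient w.r.t. the slope
        # Step against the gradient, scaled by the learning rate.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


theta0, theta1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(theta0, 3), round(theta1, 3))
```

The data above lie exactly on y = 1 + 2x, so the fitted parameters should approach 1 and 2. A learning rate that is too large makes the updates diverge instead of converge.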

Asked in Uber

1d ago

Q. What new feature would you like to add to Uber?

Ans.

I would like to add a feature that allows users to schedule rides in advance.

Users can schedule rides for important events or appointments
Option to choose specific driver or vehicle for scheduled rides
Notifications for upcoming scheduled rides
Ability to edit or cancel scheduled rides

Asked in Siemens

2d ago

Q. How do you implement a machine learning algorithm based on a given case study, and which algorithm do you choose and why?

Ans.

To implement a machine learning algorithm based on a case study, choose an algorithm based on the type of data and problem to be solved.

Understand the problem statement and the type of data available.
Preprocess the data by handling missing values, encoding categorical variables, and scaling features.
Split the data into training and testing sets.
Choose an appropriate algorithm based on the problem type (classification, regression, clustering) and data characteristics.
Train the...read more

Q. How many squares are there on a chessboard?

Ans.

A chessboard has 64 small squares, but also contains larger squares formed by combining them.

A standard chessboard is 8x8, totaling 64 small squares.
Larger squares can be formed: 1x1, 2x2, ..., up to 8x8.
Count of squares of size kxk is (8-k+1)^2.
Total squares = 1^2 + 2^2 + 3^2 + ... + 8^2 = 204.
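
The sum can be checked with a short sketch. The function name `count_squares` is hypothetical, and the board is assumed to be n x n.

```python
def count_squares(n=8):
    """Count all axis-aligned squares on an n x n board."""
    # A k x k square can be placed in (n - k + 1) positions along each axis.
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))


print(count_squares(8))
```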

Q. What is the coefficient of x^7 in the equation y=(x^101-1)(x^100+1)(x^99-1)...(x^0+1)?

Ans.

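The coefficient of x^7 is 10, assuming the signs alternate so that odd powers carry -1 and even powers carry +1.

The binomial theorem does not apply because the factors differ, so count the ways of picking one term from each factor instead.
The factor (x^0+1) equals 2, and the constants of the other 101 factors multiply to (-1)^51 = -1.
An x^7 term arises by taking x^k from distinct factors whose exponents sum to 7: {7}, {6,1}, {5,2}, {4,3}, {4,2,1}.
Each such choice contributes 2 * (-1) * (-1)^7 = +2, so the coefficient of x^7 is 5 * 2 = 10.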
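
One way to check the coefficient is to multiply the factors out while discarding every power above x^7. The sketch below assumes the sign pattern implied by the question (odd exponents paired with -1, even exponents with +1); the function name `coefficient` is hypothetical.

```python
def coefficient(target=7, max_exponent=101):
    """Coefficient of x^target in the product of (x^k + s_k) for k = 0..max_exponent,
    where s_k is +1 for even k and -1 for odd k."""
    # coeffs[i] holds the coefficient of x^i; powers above `target` are dropped.
    coeffs = [1] + [0] * target
    for k in range(max_exponent + 1):
        constant = 1 if k % 2 == 0 else -1
        # Multiply the running polynomial by the constant term of this factor...
        updated = [constant * c for c in coeffs]
        # ...and add the contribution of its x^k term (empty once k > target).
        for i in range(target + 1 - k):
            updated[i + k] += coeffs[i]
        coeffs = updated
    return coeffs[target]


print(coefficient())
```

Under that sign assumption the result is 10: five sets of distinct exponents sum to 7, and each contributes +2.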

Asked in Ideapoke Technologies

1d ago

Q. Where did you complete your Data Science course?

Ans.

I completed the Data Science course at XYZ University.

Completed Data Science course at XYZ University
Received hands-on training in machine learning algorithms
Worked on real-world projects during the course

Q. What parameters are important for deciding whether a man can reach the bus stop?

Ans.

Key parameters include distance, speed, physical condition, and external factors affecting travel to the bus stop.

Distance to the bus stop: Longer distances require more time and effort.
Walking speed: A faster pace increases the likelihood of reaching on time.
Physical fitness: A person's health can impact their ability to walk quickly.
Traffic conditions: Busy roads may slow down travel time.
Weather conditions: Rain or snow can hinder movement and affect timing.
Time until bus ...read more

Q. How do you choose the K value in K-means clustering? If there are any techniques, name them and explain them.

Ans.

Choosing the optimal K value in K-means clustering is crucial for accurate results.

Elbow method: Plotting the sum of squared distances vs. K and selecting the K value where the curve bends like an elbow.
Silhouette method: Calculating the average silhouette score for different K values and choosing the one with the highest score.
Gap statistic method: Comparing the within-cluster dispersion to a reference null distribution to find the optimal K value.
Cross-validation: Splitting...read more

Q. Explain Random forest. What is gini impurity.

Ans.

Random forest is an ensemble learning method that builds many decision trees and, for classification, outputs the mode of their predicted classes; for regression it averages their outputs. Gini impurity measures how mixed the class labels in a node are and is used to choose decision tree splits.

Random forest is a collection of decision trees, each trained on a bootstrap sample (a random sample with replacement) of the data.
At each split, a tree considers only a random subset of the features, which decorrelates the trees.
For classification, the final prediction is the mode of the predictions of all the trees.
Gini impurity i...read more
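
For reference, the Gini impurity of a node is 1 minus the sum of the squared class proportions. A minimal sketch follows; the function name `gini_impurity` is a hypothetical choice, and returning 0.0 for an empty node is an assumption.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_c ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["yes", "yes", "no", "no"]))
print(gini_impurity(["yes", "yes", "yes", "yes"]))
```

A pure node scores 0, and an evenly split two-class node scores 0.5, the maximum for two classes.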

Asked in Wolters Kluwer

3d ago

Q. Which ML algorithm did you use in your project?

Ans.

I used the Random Forest algorithm in my project.

Random Forest is an ensemble learning method that combines multiple decision trees to make predictions.
It is used for both classification and regression tasks.
Random Forest reduces overfitting and provides feature importance.
Example: I used Random Forest to predict customer churn in a telecom company.

Asked in Mahindra & Mahindra

6d ago

Q. Given a table containing student name, class, 11th marks, and 12th marks, write an SQL query to determine whether class 11 marks or class 12 marks are greater.

Ans.

Analyze student marks to determine if Class 11 or Class 12 marks are higher.

Compare the '11th marks' and '12th marks' columns for each student.
Use SQL query: SELECT student_name, CASE WHEN class_11_marks > class_12_marks THEN 'Class 11' WHEN class_11_marks < class_12_marks THEN 'Class 12' ELSE 'Equal' END AS higher_class FROM students;
Example: If 'John' has 85 in Class 11 and 90 in Class 12, the result will show 'Class 12' as higher.
Aggregate results to see how many students performed better in Class 11 vs Class 12.
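
The comparison can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names follow the query above, the class column from the question is omitted for brevity, and the 'Equal' branch for ties is an assumption about the desired output.

```python
import sqlite3


def compare_marks(rows):
    """Return (student_name, higher_class) pairs for (name, marks_11, marks_12) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students ("
        "student_name TEXT, class_11_marks INTEGER, class_12_marks INTEGER)"
    )
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    query = """
        SELECT student_name,
               CASE
                   WHEN class_11_marks > class_12_marks THEN 'Class 11'
                   WHEN class_11_marks < class_12_marks THEN 'Class 12'
                   ELSE 'Equal'
               END AS higher_class
        FROM students
        ORDER BY student_name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(compare_marks([("John", 85, 90), ("Asha", 92, 88), ("Ravi", 75, 75)]))
```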

Asked in TCS

5d ago

Q. What is the difference between call by reference and call by value?

Ans.

Call by value passes a copy of the value while call by reference passes the address of the value.

Call by value does not modify the caller's original variable, while call by reference can modify it.
Which mechanism applies depends on the language rather than on the data type: C passes everything by value (pointers are used to emulate references), C++ supports both, and Java and Python pass object references by value.

Asked in Avantao Technologies

2d ago

Q. If you were given an unstructured data set with no clear objective, how would you approach finding insights from it?

Ans.

To extract insights from unstructured data, I would employ exploratory analysis, NLP, and clustering techniques.

1. Data Exploration: Begin by understanding the data's structure and content. For example, if it's text data, look for common themes or topics.
2. Preprocessing: Clean the data by removing noise, such as irrelevant information or formatting issues. For instance, in text data, remove stop words and punctuation.
3. Natural Language Processing (NLP): Use NLP techniques t...read more

Asked in Wolters Kluwer

2d ago

Q. Apply a join operation to the tables provided.

Ans.

Joining tables combines related data for analysis, enhancing insights and decision-making.

A join operation merges rows from two or more tables based on a related column.
Types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
Example: INNER JOIN returns only matching rows from both tables.
LEFT JOIN returns all rows from the left table and matched rows from the right table.
Example: If Table A has 3 rows and Table B has 2 matching rows, INNER JOIN will retu...read more
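
The difference between INNER JOIN and LEFT JOIN can be seen on a toy data set. The sketch below uses Python's built-in sqlite3 module; the table names `table_a` and `table_b`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def join_demo():
    """Return (inner_join_rows, left_join_rows) for two small example tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (id INTEGER, name TEXT);
        CREATE TABLE table_b (a_id INTEGER, score INTEGER);
        INSERT INTO table_a VALUES (1, 'x'), (2, 'y'), (3, 'z');
        INSERT INTO table_b VALUES (1, 10), (2, 20);
    """)
    inner = conn.execute(
        "SELECT a.name, b.score FROM table_a AS a "
        "INNER JOIN table_b AS b ON a.id = b.a_id ORDER BY a.id"
    ).fetchall()
    left = conn.execute(
        "SELECT a.name, b.score FROM table_a AS a "
        "LEFT JOIN table_b AS b ON a.id = b.a_id ORDER BY a.id"
    ).fetchall()
    conn.close()
    return inner, left


inner_rows, left_rows = join_demo()
print(inner_rows)
print(left_rows)
```

The inner join keeps only the two matching rows, while the left join keeps all three rows of `table_a` and fills the missing score with NULL (`None` in Python).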

Asked in ffreedom app

3d ago

Q. What is the difference between Data Definition Language (DDL) and Data Manipulation Language (DML)?

Ans.

DDL is used to define the structure of database objects, while DML is used to manipulate data within those objects.

DDL is used to create, modify, and delete database objects such as tables, indexes, and views.
DML is used to insert, update, retrieve, and delete data within those database objects.
Examples of DDL statements include CREATE TABLE, ALTER INDEX, and DROP VIEW.
Examples of DML statements include INSERT INTO, UPDATE SET, and DELETE FROM.

Asked in GE Aerospace

1d ago

Q. When the performance metrics have a high variance, which performance metric do you rely on?

Ans.

Choosing the right performance metric depends on the specific goals and context of the model's application.

Consider the business objective: For example, in fraud detection, prioritize recall to catch as many fraudulent cases as possible.
Evaluate the cost of false positives vs. false negatives: In medical diagnosis, a false negative can be more harmful than a false positive.
Use domain knowledge: In a recommendation system, precision might be more important to ensure users rece...read more

Asked in Mahindra & Mahindra

3d ago

Q. Given a table of students and their marks in 11th and 12th grade, find the students who scored more than the average marks in both grades.

Ans.

Find students who scored more than the average marks in both 11th and 12th grades.

Calculate the average 11th grade marks and the average 12th grade marks across all students.
Compare each student's marks with the respective average marks to find those who scored higher in both grades.
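
A possible query, run through Python's built-in sqlite3 module. The table name `students` and the columns `marks_11` and `marks_12` are assumptions, and each average is taken across all students with a scalar subquery.

```python
import sqlite3


def above_average_in_both(rows):
    """Return names of students above the overall average in both grades."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (student_name TEXT, marks_11 REAL, marks_12 REAL)"
    )
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    query = """
        SELECT student_name
        FROM students
        WHERE marks_11 > (SELECT AVG(marks_11) FROM students)
          AND marks_12 > (SELECT AVG(marks_12) FROM students)
        ORDER BY student_name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


sample = [("A", 90, 95), ("B", 60, 99), ("C", 80, 50), ("D", 70, 70)]
print(above_average_in_both(sample))
```

In the sample data the averages are 75 and 78.5, so only student A exceeds both.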

Asked in Siemens

1d ago

Q. What is LDA, and can you represent LDA using a diagram?

Ans.

LDA stands for Latent Dirichlet Allocation, a topic modeling technique used in natural language processing.

LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar.
It is commonly used in text mining to extract topics from a collection of documents.
LDA assumes that each document is a mixture of a small number of topics and that each word's presence is attributable to one of t...read more

Asked in OneBanc Technologies

4d ago

Q. As a guesstimate question, how many tea cups are consumed daily in Delhi?

Ans.

Estimating daily tea cup consumption in Delhi involves population size, tea-drinking habits, and cultural factors.

Delhi's population is approximately 20 million.
Assuming 30% of the population drinks tea daily, that's 6 million tea drinkers.
If each tea drinker consumes an average of 2 cups per day, that totals 12 million cups.
Cultural factors: Tea is a popular beverage in India, often consumed at home and in street stalls.

Asked in Mahindra & Mahindra

1d ago

Q. Write an SQL query to find customers who have ordered all products from all categories.

Ans.

Use relational division: keep the customers whose set of ordered products covers every product in every category.

Join the Customers, Orders, and Products tables.
Group by customer and count the distinct products ordered.
Keep only the customers whose distinct product count equals the total number of products, for example with HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM Products).
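
A sketch of this grouping approach using Python's built-in sqlite3 module. The schema is hypothetical (`products(product_id, category)` and `orders(customer_id, product_id)`), and the Customers table is left out because customer IDs in the orders are enough for the illustration. Since ordering every product implies covering every category, the query compares the distinct product count against the size of the product catalogue.

```python
import sqlite3


def customers_who_ordered_everything(products, orders):
    """Return IDs of customers whose orders cover every product in the catalogue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id INTEGER, category TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, product_id INTEGER)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", products)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    query = """
        SELECT o.customer_id
        FROM orders AS o
        JOIN products AS p ON p.product_id = o.product_id
        GROUP BY o.customer_id
        HAVING COUNT(DISTINCT o.product_id) = (SELECT COUNT(*) FROM products)
        ORDER BY o.customer_id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


catalogue = [(1, "a"), (2, "a"), (3, "b")]
placed = [(10, 1), (10, 2), (10, 3), (10, 1), (20, 1), (20, 2)]
print(customers_who_ordered_everything(catalogue, placed))
```

COUNT(DISTINCT ...) matters here: customer 10 ordered product 1 twice, and the repeat must not inflate the count.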

Asked in OneBanc Technologies

4d ago

Q. What is the average number of cups of tea people drink per day in Delhi?

Ans.

The average number of cups of tea a person drinks per day in Delhi varies with individual preferences and habits.

Individual consumption can range from 1 to 5 cups per day.
Factors such as age, gender, occupation, and cultural background can influence tea consumption.
Some people may drink more tea in the morning for a caffeine boost, while others may prefer tea throughout the day for relaxation.
Tea consumption may also vary based on the season, with more tea being consum...read more

Asked in Wolters Kluwer

5d ago

Q. What is SQL, and what is a database?

Ans.

SQL (Structured Query Language) is the standard language for managing and querying relational databases. A database is a structured collection of data.

SQL is used to retrieve, insert, update, and delete data from a database.
A database is managed by a database management system (DBMS), the software that stores and organizes the data in a structured manner.
SQL allows users to define the structure of a database, create tables, and establish relationships between tables.
Examples of relational database management systems include MySQL, Oracle, and SQL Server.

Asked in ffreedom app

6d ago

Q. What is the SQL query for calculating a moving average?

Ans.

The SQL query for calculating a moving average involves using window functions.

Use the OVER clause with the ORDER BY clause to define the window frame for the moving average calculation.
Use the AVG() function to calculate the average within the window frame.
Example: SELECT value, AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM table_name;
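
The window query can be tried with Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The table name `readings` and its columns are hypothetical stand-ins for `table_name`, `date`, and `value`.

```python
import sqlite3


def moving_average(values):
    """Three-row trailing moving average computed with a SQL window function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day INTEGER, value REAL)")
    # Number the rows 1, 2, 3, ... so the window has a column to order by.
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)", list(enumerate(values, start=1))
    )
    query = """
        SELECT AVG(value) OVER (
                   ORDER BY day
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS moving_avg
        FROM readings
        ORDER BY day
    """
    averages = [row[0] for row in conn.execute(query)]
    conn.close()
    return averages


print(moving_average([10, 20, 30, 40]))
```

The first two rows average over fewer than three values because the frame is clipped at the start of the table.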

Asked in Wolters Kluwer

4d ago

Q. What do you understand by joins in SQL?

Ans.

Joins in SQL are used to combine rows from two or more tables based on a related column between them.

Joins are used to retrieve data from multiple tables in a single query.
Common types of joins include inner join, left join, right join, and full outer join.
Joins are performed using the JOIN keyword and specifying the columns to join on.
Joins can be used to combine tables based on matching values or non-matching values.
Joins help in creating relationships between tables and fe...read more

Asked in Mobile Premier League

4d ago

Q. Which algorithm, Random Forest or XGBoost, would you prefer when the model has low bias?

Ans.

When bias is already low, the remaining error is mostly variance, so Random Forest is usually the safer first choice, although a carefully regularized XGBoost model can still compete.

Random Forest uses bagging: it averages many decorrelated trees, which mainly reduces variance.
XGBoost uses gradient boosting: it adds trees sequentially to correct residual errors, which mainly reduces bias.
A low bias model gains little from further bias reduction and is more exposed to overfitting, so boosting needs shrinkage, subsampling, and early stopping to keep variance in check.
In scenarios where the model already has low bias, XGBoost'...read more

Asked in Mvg Innovations

1d ago

Q. What is overfitting, and how do you handle missing values?

Ans.

Overfitting is when a model is too complex and fits the training data too closely, leading to poor performance on new data.

Regularization techniques like L1 and L2 can be used to prevent overfitting
Cross-validation can be used to evaluate model performance on new data
Reducing the complexity of the model can also help prevent overfitting
Handling missing values can be done by imputing them with mean, median or mode values
Alternatively, missing values can be dropped if they are ...read more
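
Mean or median imputation can be sketched in a few lines. This is an illustration only: missing entries are assumed to be represented by `None`, and the function name `impute` and its `strategy` parameter are hypothetical.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn a fill value from
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


print(impute([1, None, 3]))
print(impute([1, None, 2, 9], strategy="median"))
```

The median is less sensitive to outliers than the mean, which is why it is often preferred for skewed columns. To avoid leaking information, the fill value would normally be computed on the training split only.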

Asked in Labmentix

5d ago

Q. Can you describe a time when you encountered a challenge in a past project that involved SQL?

Ans.

Faced a challenge optimizing a complex SQL query that was slowing down data retrieval for a project.

Identified the issue during testing when query execution time exceeded acceptable limits.
Analyzed the query plan and discovered missing indexes on key columns.
Implemented indexing strategies, which reduced execution time from 30 seconds to under 2 seconds.
Collaborated with team members to ensure the changes did not affect other parts of the application.

Asked in AI Variant

4d ago

Q. What are the reasons for using deep learning in addition to traditional machine learning techniques?

Ans.

Deep learning excels in handling complex data patterns, automating feature extraction, and improving accuracy over traditional methods.

Deep learning can automatically extract features from raw data, reducing the need for manual feature engineering. For example, in image recognition, convolutional neural networks (CNNs) can identify edges, shapes, and objects without explicit programming.
It is particularly effective for large datasets, where traditional machine learning may st...read more