Data Analyst
1000+ Data Analyst Interview Questions and Answers
Popular Companies
Q51. Explain reconciliation assignment that you have done in first step.
Reconciliation assignment involves comparing and matching data from different sources.
Identify sources of data to be reconciled
Compare data from different sources
Match data based on common identifiers
Resolve discrepancies and errors
Ensure accuracy and completeness of reconciled data
Q52. How much amount of data you Handel till now.
I have handled large amounts of data in my previous roles.
I have experience handling terabytes of data in my previous role as a data analyst at XYZ company.
I have worked with data from various sources such as databases, spreadsheets, and APIs.
I have also used tools like SQL, Python, and Excel to manipulate and analyze data.
I am comfortable working with both structured and unstructured data.
I have experience cleaning and transforming data to make it usable for analysis.
Q53. Whom are you going to query when you have TP discrepant data
I would query the relevant stakeholders involved in the data collection process.
Identify the stakeholders involved in the data collection process
Reach out to them to understand the data collection process and identify any potential issues
Collaborate with them to resolve the TP discrepant data
Q54. Suppose we need to find the sales of Rosogolla sweets being sold in Kolkata. What approach would you take?
I would use a combination of primary and secondary research to find the sales of Rosogolla sweets in Kolkata.
Conduct surveys and interviews with local sweet shop owners and customers
Analyze sales data from major sweet shops in Kolkata
Research online for articles and reports on the popularity of Rosogolla in Kolkata
Compare sales data with other popular sweets in Kolkata
Consider factors such as seasonality and festivals that may affect sales
Q55. Guesstimate the number of people travelling by local metro in mumbai
The number of people travelling by local metro in Mumbai is estimated to be in millions.
Mumbai has a population of over 20 million people.
The local metro is a popular mode of transportation in Mumbai.
During peak hours, the metro trains are usually crowded.
The metro network in Mumbai is extensive, covering various parts of the city.
The number of daily commuters on the metro can be estimated based on population density and transportation preferences.
Q56. What is row number, rank, dense rank? explain partition by, indexing joins, query optimization techniques
Explanation of row number, rank, dense rank, partition by, indexing joins, and query optimization techniques.
Row number is a function that assigns a unique number to each row in a result set.
Rank is a function that assigns a rank to each row based on the values in a specified column.
Dense rank is a function that assigns a rank to each row based on the values in a specified column, but with no gaps in the ranking.
Partition by is a clause that divides a result set into partitio...read more
Share interview questions and help millions of jobseekers 🌟
Q57. Filters Transfer data from 1 sheet to other Join , inner, outer, self, left right Along with the syntax Colene function in SQL Show top value show values between a date range Sumifs countifs
Answering questions related to data filtering, transferring, joining, and SQL functions for a Data Analyst role.
To transfer data from one sheet to another, you can use functions like VLOOKUP or INDEX/MATCH in Excel.
Different types of joins in SQL include INNER JOIN, OUTER JOIN, LEFT JOIN, RIGHT JOIN, and SELF JOIN.
Syntax for INNER JOIN: SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column;
Syntax for SUMIFS in Excel: =SUMIFS(sum_range, criteria_range1,...read more
Q58. Excel :- Difference between vlookup and Index(Match)
VLOOKUP searches for a value in the first column of a table and returns a corresponding value in the same row. INDEX/MATCH uses two functions to look up a value based on its position in a table.
VLOOKUP only searches for values in the first column of a table, while INDEX/MATCH can search for values in any column.
INDEX/MATCH is more flexible and efficient for large datasets.
VLOOKUP is easier to use for simple lookups with small datasets.
INDEX/MATCH can handle errors more effect...read more
Data Analyst Jobs
Q59. Given a table with fields(id, name, salary, managerId) find out the managers who have atleast 5 reportees under him/her.
Use SQL query to count number of reportees for each manager and filter out those with atleast 5 reportees.
Write a SQL query to count number of reportees for each manager using GROUP BY clause
Add HAVING clause to filter out managers with atleast 5 reportees
Example: SELECT managerId, COUNT(id) AS num_reportees FROM table_name GROUP BY managerId HAVING num_reportees >= 5
Q60. what is the difference between a list and a tuple and what is list comprehension?
A list is mutable and can be modified, while a tuple is immutable and cannot be changed. List comprehension is a concise way to create lists.
List: Mutable, can be modified. Tuple: Immutable, cannot be changed.
List: Defined with square brackets []. Tuple: Defined with parentheses ().
List comprehension: Concise way to create lists by iterating over an iterable and applying an expression.
Example of list comprehension: squares = [x**2 for x in range(10)]
Q61. What is the difference between 'Extract Data' and 'Live Connection' in Tableau?
Extract Data saves a snapshot of data in Tableau workbook, while Live Connection directly connects to data source.
Extract Data creates a static copy of data in Tableau workbook, while Live Connection directly queries data source in real-time.
Extract Data is useful for working offline or with small datasets, while Live Connection is ideal for large datasets or when data is frequently updated.
Extract Data can improve performance as it reduces the need to query the data source r...read more
Q62. What is the difference between Tree Map and Heatmap in Tableau?
Tree Map visualizes hierarchical data using nested rectangles, while Heatmap displays data values using color gradients.
Tree Map displays data hierarchically with nested rectangles, where the size and color represent different measures.
Heatmap visualizes data values using color gradients, with darker colors indicating higher values.
Tree Map is useful for showing hierarchical data structures, while Heatmap is effective for comparing values across categories.
Example: Tree Map c...read more
Q63. how to do you detect outliers in the data and how do you treat them
Outliers can be detected using statistical methods and treated by either removing them or replacing them with a more appropriate value.
Use box plots, scatter plots, or histograms to visually identify outliers
Calculate the z-score or interquartile range (IQR) to identify outliers statistically
Remove outliers if they are due to data entry errors or measurement errors
Replace outliers with a more appropriate value if they are valid data points but extreme
Consider the impact of ou...read more
Q64. What do you know about R1RCM and the role you applied for?
R1RCM is a healthcare revenue cycle management company. The role is for a data analyst position.
R1RCM specializes in providing revenue cycle management services to healthcare providers.
The company uses advanced technology and analytics to optimize revenue cycle performance.
As a data analyst, the role involves analyzing and interpreting healthcare data to identify trends and insights.
The data analyst will work closely with other teams to develop strategies for improving revenu...read more
Q65. What is the use if store procedure ?
Stored procedures are precompiled SQL statements that can be reused and executed multiple times.
Stored procedures improve performance by reducing network traffic and improving security.
They can be used to encapsulate business logic and provide a consistent interface to the database.
Stored procedures can also be used to simplify complex queries and transactions.
Examples include procedures for inserting, updating, and deleting data, as well as generating reports and performing ...read more
Q66. Brexit vote and how it would effect the indian economy?
The Brexit vote could have both positive and negative effects on the Indian economy.
Positive effects: Increased trade opportunities with the UK, potential for attracting foreign investments from companies relocating from the UK.
Negative effects: Uncertainty in global markets leading to volatility in exchange rates, potential decline in exports to the UK.
Example: Indian IT companies may face challenges due to stricter immigration policies in the UK.
Example: Indian textile expo...read more
Q67. Compare two arrays in python and print if both of them are same or not?
Compare two arrays in python and print if both of them are same or not.
Use the '==' operator to compare the arrays.
If the arrays have the same elements in the same order, they are considered the same.
If the arrays have different elements or different order, they are considered different.
Print 'Same' if the arrays are the same, otherwise print 'Different'.
Q68. Do you have any experience working in cloud environments?
Yes, I have experience working in cloud environments.
I have worked with AWS, Azure, and Google Cloud Platform.
I have experience with cloud-based data storage and processing.
I have used cloud-based tools for data visualization and analysis.
I am familiar with cloud security and compliance measures.
Q69. Use lookup with the product based given data and find the needed data
Using lookup with product-based data to find the needed data.
Use a lookup function like VLOOKUP or INDEX/MATCH to search for the needed data
Identify the key or unique identifier to match the data
Specify the range or table where the data is located
Retrieve the desired data based on the lookup value
Q70. Reliance fresh wants to open a store in north-eastern part of India. Should it or should it not?
Yes, Reliance Fresh should open a store in the north-eastern part of India.
The north-eastern part of India has a growing population and increasing consumer demand.
Reliance Fresh can tap into the untapped market potential in the region.
Opening a store in the north-eastern part of India can help Reliance Fresh expand its presence and increase its market share.
Reliance Fresh can leverage its existing supply chain and distribution network to efficiently serve customers in the reg...read more
Q71. Do you have experience in converting python scripts to R?
Yes, I have experience in converting Python scripts to R.
I have converted several Python scripts to R for data analysis projects.
I am proficient in both Python and R programming languages.
I can provide examples of projects where I have successfully converted Python scripts to R.
Q72. extract the details of all employees whose salary is more than 5000000 and write a query.
Query to extract details of employees with salary more than 5000000.
Use SELECT statement to retrieve employee details
Add WHERE clause to filter employees with salary greater than 5000000
Example: SELECT * FROM employees WHERE salary > 5000000
Q73. How do you optimize and manage large datasets, and what data sources do you utilize to obtain this data?
I optimize and manage large datasets by using tools like SQL, Python, and Excel. I utilize various data sources such as databases, APIs, and web scraping.
Utilize SQL queries to efficiently extract and manipulate data from databases
Use Python for data cleaning, analysis, and visualization tasks
Leverage Excel for organizing and summarizing data in a user-friendly format
Obtain data from sources like internal databases, external APIs, and web scraping techniques
Q74. What was your role in the project, specifically related to data transformation and dashboard creation?
I was responsible for data cleaning, transformation, and visualization to create interactive dashboards for stakeholders.
Performed data cleaning and transformation to ensure accuracy and consistency of data
Utilized tools like SQL, Python, and Tableau to manipulate and visualize data
Collaborated with stakeholders to understand their requirements and design interactive dashboards
Generated insights from data analysis to drive decision-making processes
Q75. Can aggregate functions be used with HAVING clause? With WHERE clause? What is the difference between WHERE and HAVING clause? Can HAVING be used without GROUP BY?
Yes, aggregate functions can be used with HAVING and WHERE clauses. WHERE filters rows before grouping, HAVING filters groups after grouping.
Aggregate functions can be used with HAVING clause to filter groups based on the result of the aggregate function.
Aggregate functions can also be used with WHERE clause to filter rows before grouping.
WHERE clause filters rows before grouping, while HAVING clause filters groups after grouping.
HAVING clause requires a GROUP BY clause, but ...read more
Q76. If we have 3 numerical columns, which chart is a better representation these column?
A scatter plot is a better representation for 3 numerical columns.
Use a scatter plot to show the relationship between the numerical columns.
Scatter plots are effective for visualizing correlations and patterns in data.
Each point on the plot represents a data point with values from all 3 columns.
Q77. How your work will increase our revenue? By sharing the data we can improve our services.
Sharing data insights can help improve services and identify revenue opportunities.
Analyzing customer behavior can help identify areas for improvement
Identifying trends in sales data can help optimize pricing strategies
Using data to personalize marketing efforts can increase customer engagement
Predictive modeling can help identify potential revenue opportunities
Tracking key performance indicators can help optimize business operations
Q78. Describe the steps for Universe design from scratch?
Universe design involves identifying data sources, defining business requirements, creating a schema, and building the universe.
Identify data sources and gather requirements
Create a schema and map data to it
Build the universe using a tool like SAP BusinessObjects
Test and validate the universe
Publish the universe for end-users to access
Q79. How many cups of tea is consumed in Delhi for a week?
It is impossible to accurately determine the number of cups of tea consumed in Delhi for a week without proper data.
There is no official data available on the exact number of cups of tea consumed in Delhi for a week.
The number of cups of tea consumed in Delhi for a week varies depending on various factors such as weather, season, and individual preferences.
It is estimated that Delhi consumes a significant amount of tea due to its cultural and historical significance.
Factors s...read more
Q80. What are KPIs, for the given data what are possible useful KPIs.
KPIs are Key Performance Indicators used to measure the performance of a business or organization. Possible KPIs for data analysis include revenue, customer acquisition cost, conversion rate, and customer retention rate.
Revenue: Total income generated from sales
Customer acquisition cost: Cost incurred to acquire a new customer
Conversion rate: Percentage of website visitors who take a desired action
Customer retention rate: Percentage of customers who continue to use a product ...read more
Q81. There is table which holds details of matches played between the countries in first 2 columns and who won the match in 3rd column. write a query to get no. of matches played by each team and no. of matches won...
read moreQuery to get number of matches played and won by each team from a table of match details.
Use GROUP BY clause to group the data by team name.
Use COUNT() function to get the number of matches played by each team.
Use SUM() function with a condition to get the number of matches won by each team.
Q82. Can regression be used to solve Classification Problems?
No, regression is used for predicting continuous values, while classification is used for predicting categorical values.
Regression predicts continuous values, while classification predicts categorical values
Regression models include linear regression, polynomial regression, etc.
Classification models include logistic regression, decision trees, etc.
Q83. 7 gold bars connected with chain. how to give the gold bar to worker for each day, with minimal number if cuts.
Cut the chain twice to give one gold bar each day.
Cut the chain between the first and second gold bar to give one bar each day.
On the last day, cut the chain between the sixth and seventh gold bar to give the last bar.
Total number of cuts required is 2.
Q84. Why the offline data is so important for any company?
Offline data is important for companies as it provides insights into customer behavior and preferences.
Offline data can help companies understand customer behavior and preferences
It can be used to identify trends and patterns in customer data
Offline data can also be used to improve customer experience and personalize marketing efforts
Examples of offline data include in-store purchases, customer service interactions, and surveys
Offline data can be combined with online data to ...read more
Q85. What is Regression and Classification in Machine Learning??
Regression predicts continuous values while classification predicts categorical values.
Regression is used to predict a continuous output variable based on one or more input variables.
Classification is used to predict a categorical output variable based on one or more input variables.
Regression algorithms include linear regression, polynomial regression, and logistic regression.
Classification algorithms include decision trees, random forests, and support vector machines.
Exampl...read more
Q86. Tell us data science project where you developed a script for forecastings, KNN, Data science models with practical examples.
Developed a script for forecasting using KNN and data science models.
Developed a script using Python to forecast sales for a retail company using KNN algorithm.
Built a data science model to predict customer churn for a telecom company using logistic regression.
Used time series analysis to forecast stock prices for a financial services company.
Developed a script to predict the likelihood of a customer defaulting on a loan using decision trees.
Implemented a recommendation syste...read more
Q87. Compare 2 scorce files record count and print if their are equally or not
Compare record count of 2 score files and print if they are equal or not.
Read both score files
Count the number of records in each file
Compare the record counts
Print if they are equal or not
Q88. What according to you is financial Market?
Financial market is a platform where buyers and sellers trade financial assets such as stocks, bonds, currencies, and commodities.
Financial market facilitates the exchange of financial assets between buyers and sellers
It includes stock markets, bond markets, currency markets, and commodity markets
Financial market plays a crucial role in the economy by allocating capital to businesses and governments
Examples of financial markets include NYSE, NASDAQ, London Stock Exchange, and...read more
Q89. Write a sql query to find out different routes for a airline. Table has two columns which are destination and source
Use a SQL query to find different routes for an airline based on source and destination columns in a table.
Use a SELECT statement to retrieve the distinct combinations of source and destination.
Use the DISTINCT keyword to ensure only unique routes are returned.
Order the results by source and destination for easier analysis.
Q90. what is the difference between columns and measures in power BI
Columns are the fields in a dataset, while measures are calculations based on those fields in Power BI.
Columns are the raw data fields in a dataset, such as 'Sales Amount' or 'Product Category'.
Measures are calculations performed on columns, like sum, average, count, etc.
Measures are created using DAX (Data Analysis Expressions) in Power BI.
Columns are used for grouping, filtering, and slicing data, while measures are used for calculations and aggregations.
Q91. Why interested in the field of Data Analytics, Why S&P?
Passionate about leveraging data to drive insights and decisions, attracted to S&P's reputation and diverse data sets.
Fascinated by the power of data to uncover trends and patterns
Enjoy problem-solving and making data-driven decisions
Impressed by S&P's reputation for accurate and reliable data
Excited about the opportunity to work with diverse data sets at S&P
Q92. Why logistic regression called why not classifier called.
Logistic regression is a type of classifier that uses a logistic function to model a binary dependent variable.
Logistic regression is a statistical method used to analyze a dataset in which there are one or more independent variables that determine an outcome.
It is called logistic regression because it uses a logistic function to model a binary dependent variable.
The logistic function is an S-shaped curve that can take any real-valued number and map it into a value between 0 ...read more
Q93. Why python is an awful language and would you be willing to work as a back end developer using spring or Node
Python is not an awful language, but I am open to working with other technologies like Spring or Node.
Python is a versatile and widely-used language in data analysis and machine learning.
It has a large community and extensive libraries like Pandas and NumPy.
Spring and Node are also popular choices for backend development, offering different strengths and capabilities.
I am open to learning and working with new technologies to expand my skill set.
Q94. Process route for Steel Making Process, What are the prerequisites for making a steel water bottle? Draw the Hall Herault process diagram
The Steel Making Process and Prerequisites for Making a Steel Water Bottle
The steel making process involves converting iron ore into steel through various stages such as smelting, refining, and casting
Prerequisites for making a steel water bottle include obtaining high-quality steel, designing the bottle, and manufacturing it through processes such as stamping and welding
The Hall Herault process is a method for producing aluminum through the electrolysis of alumina dissolved ...read more
Q95. Puzzle of how to fill 4litres water in 3litre and 5litre capacity vessel
Fill 4L water in 3L and 5L capacity vessels
Fill 3L vessel completely
Pour the water from 3L vessel into 5L vessel
Fill 3L vessel again
Pour the water from 3L vessel into 5L vessel until it's full
The remaining water in 3L vessel is 1L
Q96. Q1. Case study for Number of Green T-shirts sold in US?
A case study on the number of green T-shirts sold in the US.
Identify the target audience for green T-shirts
Analyze the market demand for green T-shirts
Study the sales data of green T-shirts in the US
Identify the popular brands and styles of green T-shirts
Analyze the impact of seasonality on sales
Consider the pricing strategy of green T-shirts
Identify potential marketing opportunities to increase sales
Q97. What are the joins used in SQL and What is Left Outer Join
SQL joins are used to combine data from two or more tables. Left Outer Join returns all the rows from the left table and matching rows from the right table.
Joins are used to combine data from two or more tables in SQL
Common types of joins are Inner Join, Left Outer Join, Right Outer Join, and Full Outer Join
Left Outer Join returns all the rows from the left table and matching rows from the right table
Syntax: SELECT * FROM table1 LEFT JOIN table2 ON table1.column = table2.colu...read more
Q98. a men fell in 50m well he climbs 4m up and slips 3 m down in a day.how many days would it take him to come out of the well?
It would take the man 47 days to come out of the well.
The man climbs 1m (4m up - 3m down) each day.
He needs to climb a total of 50m to come out of the well.
Therefore, it would take him 47 days to climb the remaining 46m (50m - 4m).
Q99. Dcount function ABS function Excel remove minus sign from integer using a formula.
Dcount function counts the number of unique values in a dataset. ABS function returns the absolute value of a number. To remove minus sign from an integer in Excel, use a formula like =ABS(A1).
Dcount function counts unique values in a dataset
ABS function returns absolute value of a number
To remove minus sign from an integer in Excel, use formula like =ABS(A1)
Q100. How to track 1 million records being generated online every 5 minutes
Use a distributed system with real-time processing to track 1 million records generated every 5 minutes.
Implement a distributed system like Apache Kafka or Apache Spark to handle the large volume of data.
Use real-time processing to ensure that the data is analyzed and tracked as soon as it is generated.
Consider using cloud-based solutions like AWS or Google Cloud Platform for scalability and cost-effectiveness.
Implement data validation and error handling to ensure data accura...read more
Interview Questions of Similar Designations
Top Interview Questions for Data Analyst Related Skills
Interview experiences of popular companies
Calculate your in-hand salary
Confused about how your in-hand salary is calculated? Enter your annual salary (CTC) and get your in-hand salary
Reviews
Interviews
Salaries
Users/Month