90+ Uplers Interview Questions and Answers

Updated 16 Dec 2024

Q1. Q4. What is the probability of getting 5 Sundays in a 31-day month?

Ans.

The probability of getting 5 Sundays in a 31-day month is 3/7, or about 42.9%, assuming the month is equally likely to start on any day of the week.

  • A 31-day month consists of 4 full weeks plus 3 extra days.

  • The 4 full weeks always contribute exactly 4 Sundays.

  • A fifth Sunday occurs only if one of the 3 extra days is a Sunday, which happens when the month starts on a Friday, Saturday, or Sunday.

  • That is 3 favourable starting days out of 7, so the probability is 3/7 ≈ 0.429.
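
One way to check the answer is to enumerate all seven possible starting weekdays and count the Sundays for each. This is a minimal sketch that assumes every starting weekday is equally likely; the function name and the weekday numbering are illustrative choices.

```python
def sundays_in_month(start_weekday: int, days: int = 31) -> int:
    """Count Sundays in a month; weekdays are numbered 0 = Monday ... 6 = Sunday."""
    return sum(1 for d in range(days) if (start_weekday + d) % 7 == 6)

# Try every possible starting weekday and keep those that give 5 Sundays.
favourable = [s for s in range(7) if sundays_in_month(s) == 5]
probability = len(favourable) / 7
print(favourable, round(probability, 3))
```

Only months starting on Friday, Saturday, or Sunday (4, 5, 6 in this numbering) contain 5 Sundays, which gives 3/7.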

Q2. Q4. You are standing in a field. The chance of seeing at least 1 plane in 10 minutes is 15%. What is the probability of seeing at least 1 plane in the next 30 minutes?

Ans.

The probability is 1 - 0.85^3 ≈ 38.6%, assuming the three 10-minute intervals are independent.

  • The probability of not seeing a plane in 10 minutes is 1 - 0.15 = 0.85.

  • Use the formula P(X>=1) = 1 - P(X=0).

  • The probability of not seeing a plane in 30 minutes is 0.85^3 ≈ 0.614.

  • The probability of seeing at least 1 plane in 30 minutes is 1 - 0.614 ≈ 0.386.
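
The steps above can be sketched in a few lines. The sketch assumes the three 10-minute windows are independent, which the question does not state explicitly; the variable names are illustrative.

```python
p_seen_10 = 0.15             # given: P(at least one plane in 10 minutes)
p_none_10 = 1 - p_seen_10    # P(no plane in 10 minutes)
p_none_30 = p_none_10 ** 3   # no plane in three independent 10-minute windows
p_seen_30 = 1 - p_none_30    # P(at least one plane in 30 minutes)
print(round(p_seen_30, 4))
```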

Q3. Q5. If we select a random point in a circle of radius 1 unit, what is the probability that the point is closer to the circumference than to the centre?

Ans.

The probability is 3/4 (0.75), assuming the point is chosen uniformly over the area of the circle.

  • A point at distance d from the centre is closer to the circumference when 1 - d < d, that is, when d > 1/2.

  • These points form the ring between radius 1/2 and radius 1.

  • Using the formula for the area of a circle, A = πr^2, the ring has area π(1)^2 - π(1/2)^2 = 3π/4.

  • Dividing by the total area π gives 3/4.
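
A simulation is one way to sanity-check the area argument. The sketch below assumes the point is uniform over the disc and uses rejection sampling from the enclosing square; the function name, sample size, and seed are illustrative.

```python
import random

def estimate_closer_to_edge(samples: int = 200_000, seed: int = 0) -> float:
    """Estimate P(point is closer to the circumference than to the centre)."""
    rng = random.Random(seed)
    hits = 0
    accepted = 0
    while accepted < samples:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r2 = x * x + y * y
        if r2 > 1:
            continue            # outside the unit circle: reject and resample
        accepted += 1
        if r2 > 0.25:           # distance from centre > 1/2
            hits += 1
    return hits / accepted

estimate = estimate_closer_to_edge()
print(round(estimate, 3))
```

The estimate should land close to 0.75, the ratio of the ring area 3π/4 to the full area π.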

Q4. Q1. Implement Python's collections.Counter from scratch.

Ans.

Implementing Python's collections.Counter from scratch

  • Create an empty dictionary to store the elements and their count

  • Iterate through the input list and add elements to the dictionary with their count

  • Return the dictionary

  • Example: input_list = ['apple', 'banana', 'apple', 'orange', 'banana']

  • Output: {'apple': 2, 'banana': 2, 'orange': 1}
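
The steps above can be sketched as a small function. This is a minimal illustration of the counting idea only, not a full replacement for collections.Counter; the name count_items is hypothetical.

```python
def count_items(items):
    """Return a dict mapping each element to the number of times it occurs."""
    counts = {}
    for item in items:
        # dict.get supplies 0 the first time an element is seen.
        counts[item] = counts.get(item, 0) + 1
    return counts

input_list = ['apple', 'banana', 'apple', 'orange', 'banana']
result = count_items(input_list)
print(result)
```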


Q5. Q2. What will be the approach If all the features are categorical in Linear Regression. Q3. What is Dummy variable trap? If we don't remove dummy variable what will be the issue and does it impact performance o...

Ans.

Categorical features in Linear Regression require encoding using dummy variables. Removing one dummy variable avoids the dummy variable trap.

  • Categorical features need to be encoded using dummy variables to be used in Linear Regression

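  • Dummy variable trap occurs when one dummy variable can be predicted exactly from the others (for a feature with k categories, the k dummies always sum to 1, which duplicates the intercept)

  • Removing one dummy variable per feature removes this perfect multicollinearity, so the coefficients can be estimated uniquely and read relative to the dropped baseline category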

  • Example: Gender (Male/Female) can be encoded as a dummy variable wi...read more
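
Dropping one dummy per feature can be sketched in plain Python. This is an illustrative encoder only; the function name dummy_encode and the choice of the alphabetically first category as the baseline are assumptions, and in practice a library encoder would normally be used.

```python
def dummy_encode(values, drop_first=True):
    """One-hot encode a categorical column, optionally dropping the first category."""
    categories = sorted(set(values))
    # Dropping one category makes it the baseline and avoids the dummy variable trap.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

columns, rows = dummy_encode(['Male', 'Female', 'Female', 'Male'])
print(columns, rows)
```

With two categories, a single column remains, and the dropped category ('Female' here) is represented by a 0 in that column.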

Q6. Q1. Implement a program to check if a number is a power of 3.

Ans.

Program to check if a number is a power of 3

  • Use a base-3 logarithm and check if the result is an integer (this can fail for large numbers because of floating-point rounding)

  • Check if the number is greater than 0

  • Check if the remainder is 0 when the number is divided by 3 repeatedly
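
The repeated-division approach avoids floating-point issues entirely. A minimal sketch, with an illustrative function name:

```python
def is_power_of_three(n: int) -> bool:
    """Return True if n equals 3**k for some integer k >= 0."""
    if n <= 0:
        return False
    # Strip factors of 3; a power of 3 reduces to exactly 1.
    while n % 3 == 0:
        n //= 3
    return n == 1

print(is_power_of_three(27), is_power_of_three(45))
```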


Q7. Q2. Do Matrix Multiplication. Q3. Implement Factorial and Fibonacci Series with different Approaches.

Ans.

Matrix multiplication, factorial and Fibonacci series implementation

  • Matrix multiplication involves multiplying two matrices to get a third matrix

  • Factorial is the product of all positive integers up to a given number

  • Fibonacci series is a sequence of numbers where each number is the sum of the two preceding ones

  • Factorial can be implemented using recursion or iteration

  • Fibonacci series can be implemented using recursion or iteration
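
The three tasks can be sketched with plain lists and loops. These are illustrative implementations under the assumption that matrices are lists of rows and that the Fibonacci series starts at 0; all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def factorial_recursive(n):
    """Factorial by recursion."""
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def factorial_iterative(n):
    """Factorial by iteration."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def fibonacci(count):
    """Return the first `count` Fibonacci numbers, iteratively."""
    series, a, b = [], 0, 1
    for _ in range(count):
        series.append(a)
        a, b = b, a + b
    return series

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
print(factorial_iterative(5), fibonacci(7))
```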

Q8. Q5. There were 100 coins: 99 unbiased coins and 1 biased coin. Derive the probability of getting 10 heads given the event of unbiased coins using Bayes Theorem.

Ans.

Using Bayes Theorem, find the probability of getting 10 heads given 99 unbiased coins and 1 biased coin.

  • Identify the prior probability of getting 10 heads with unbiased coins

  • Calculate the likelihood of getting 10 heads with the biased coin

  • Use Bayes Theorem to calculate the posterior probability of getting 10 heads given the mix of coins

  • Consider the impact of the biased coin on the overall probability
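
The question does not say how the coin is biased, so the sketch below makes a loud assumption: the biased coin always lands heads. Under that assumption it applies Bayes Theorem to the common version of this puzzle, namely the probability that a randomly picked coin is the biased one given that it showed 10 heads in 10 tosses. All variable names are illustrative.

```python
from fractions import Fraction

prior_biased = Fraction(1, 100)       # 1 biased coin out of 100
prior_fair = Fraction(99, 100)        # 99 unbiased coins out of 100

like_biased = Fraction(1, 1) ** 10    # ASSUMPTION: biased coin always shows heads
like_fair = Fraction(1, 2) ** 10      # fair coin: (1/2)^10

# Total probability of observing 10 heads with a randomly picked coin.
evidence = prior_biased * like_biased + prior_fair * like_fair

# Bayes Theorem: P(biased | 10 heads).
posterior_biased = prior_biased * like_biased / evidence
print(evidence, posterior_biased, float(posterior_biased))
```

A different bias, for example heads with probability 0.9, only changes like_biased; the rest of the calculation stays the same.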


Q9. How can you prove to the client that students in higher classes are taller than those in lower classes?

Ans.

We can use statistical analysis to prove that students in higher classes are taller than those in lower classes.

  • Collect height data of students from different classes

  • Use statistical measures like mean, median, and mode to compare the heights of students in different classes

  • Perform hypothesis testing to determine if the difference in height between classes is statistically significant

  • Visualize the data using graphs and charts to make it easier for the client to understand

  • Provi...read more

Q10. What is entropy ? What is gini index? Give a real life example of derivative and second derivative. What is the difference between P-value and beta value? How do you handle imbalanced dataset? What is the diffe...

Ans.

Entropy is a measure of randomness or disorder in a system. Gini index is a measure of impurity in a dataset. Derivatives measure rate of change. P-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. Beta value is the coefficient in a regression model. Imbalanced datasets have unequal class distribution. Recall is the proportion of actual positives correctly identified. Precision is the proportion of predicted positives that are actually positive. Slope in one variable is...

Q11. Why accuracy score should not be used on imbalanced dataset?

Ans.

Accuracy score can be misleading on imbalanced datasets.

  • Accuracy score can be high even if the model is not performing well on the minority class.

  • F1 score, precision, and recall are better metrics for imbalanced datasets.

  • Stratified sampling, oversampling, and undersampling can help balance the dataset.

  • Example: A model predicting cancer in a dataset with only 1% positive cases.

  • Using accuracy score, a model that always predicts negative will have 99% accuracy.

  • However, this mode...read more
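
The cancer example can be reproduced with made-up labels. The sketch below uses a hypothetical dataset of 1,000 cases with 1% positives and a model that always predicts negative; the numbers are illustrative, not real data.

```python
# Hypothetical labels: 10 positive cases (1) and 990 negative cases (0).
labels = [1] * 10 + [0] * 990
predictions = [0] * len(labels)   # a model that always predicts negative

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall: share of actual positives that the model found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)
print(accuracy, recall)
```

Accuracy is 0.99 while recall is 0.0, which is why accuracy alone hides the failure on the minority class.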

Q12. Regression models: Which one should be used in which case?

Ans.

Different regression models are used based on the type of data and relationship between variables.

  • Linear regression is used when there is a linear relationship between the independent and dependent variables.

  • Logistic regression is used when the dependent variable is binary.

  • Polynomial regression is used when the relationship between variables is non-linear.

  • Ridge regression is used when there is multicollinearity in the data.

  • Lasso regression is used when feature selection is im...read more

Q13. Different varieties on Fibonacci series in Python.

Ans.

Different varieties of Fibonacci series in Python.

  • Standard Fibonacci series

  • Fibonacci series with user-defined starting numbers

  • Fibonacci series with user-defined length

  • Fibonacci series with user-defined step

  • Fibonacci series with user-defined function

Q14. ML algorithm overview of what I have used in my projects

Ans.

I have used various ML algorithms such as linear regression, decision trees, random forests, and neural networks in my projects.

  • Linear regression for predicting continuous values

  • Decision trees for classification and regression tasks

  • Random forests for ensemble learning and improved accuracy

  • Neural networks for complex pattern recognition

Q15. What is the difference between List and Tuple?

Ans.

List is mutable and Tuple is immutable in Python.

  • List can be modified after creation while Tuple cannot be modified.

  • List uses square brackets [] while Tuple uses parentheses ().

  • Lists are conventionally used for homogeneous sequences while tuples often group heterogeneous fields, although Python enforces neither.

  • Tuples are usually slightly faster to create and use less memory than lists.

  • Example of List: [1, 2, 3] and Example of Tuple: (1, 'hello', 3.14)
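
The mutability difference can be seen directly. A minimal sketch with illustrative variable names:

```python
numbers = [1, 2, 3]
numbers[0] = 10                 # lists can be modified in place

record = (1, 'hello', 3.14)
try:
    record[0] = 10              # tuples do not support item assignment
    tuple_modified = True
except TypeError:
    tuple_modified = False

print(numbers, record, tuple_modified)
```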

Q16. Given a list of stock prices, identify the days when a person should buy and sell to earn maximum profit

Ans.

To maximize profit, buy at the lowest price that is followed by a later, higher price, and sell on that later day.

  • Traverse the prices once, tracking the lowest price seen so far as the candidate buy day

  • For each later day, compute the profit from selling on that day after buying at the lowest price so far

  • Keep the buy and sell days that give the largest profit; the sell day must come after the buy day
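
A single-pass version of this problem can be sketched as follows. It assumes one buy followed by one sell, with days numbered from 0; the function name best_trade and the return format are illustrative.

```python
def best_trade(prices):
    """Return (profit, buy_day, sell_day) for one buy then one sell, or None."""
    if len(prices) < 2:
        return None
    min_day = 0                      # day with the lowest price seen so far
    best = (0, 0, 0)                 # (profit, buy_day, sell_day)
    for day in range(1, len(prices)):
        profit = prices[day] - prices[min_day]
        if profit > best[0]:
            best = (profit, min_day, day)
        if prices[day] < prices[min_day]:
            min_day = day
    # No profitable trade exists if prices never rise after a purchase.
    return best if best[0] > 0 else None

print(best_trade([7, 1, 5, 3, 6, 4]))
```

Tracking the minimum so far guarantees the buy day always precedes the sell day, which simply picking the global minimum and maximum does not.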

Q17. 1. Confusion Matrix 2. What is recall and precision? 3. Explain about ROC curve 4. Based on what RFE eliminate the features? 5. SQL question which requires grouping 6. How to read a dataframe, display top 5 row...

Ans.

Interview questions for Senior Analyst Data Science

  • Confusion matrix is a table used to evaluate the performance of a classification model

  • Recall is the ratio of true positives to the sum of true positives and false negatives

  • Precision is the ratio of true positives to the sum of true positives and false positives

  • ROC curve is a graphical representation of the performance of a binary classifier

  • RFE eliminates features based on their importance to the model

  • SQL question may involve ...read more

Q18. What is permutation and combination and how is it used in data science?

Ans.

Permutation and combination are mathematical concepts used to count the number of possible outcomes in a given scenario.

  • Permutation is the arrangement of objects in a specific order while combination is the selection of objects without considering the order.

  • Permutation formula: nPr = n!/(n-r)! where n is the total number of objects and r is the number of objects selected.

  • Combination formula: nCr = n!/(r!(n-r)!) where n is the total number of objects and r is the number of objec...
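
Both formulas can be checked against the standard library. The sketch assumes Python 3.8 or later, where math.perm and math.comb are available; the values n = 5 and r = 2 are illustrative.

```python
import math

n, r = 5, 2

# Permutations: ordered selections, n! / (n - r)!
permutations = math.perm(n, r)
# Combinations: unordered selections, n! / (r! * (n - r)!)
combinations = math.comb(n, r)

print(permutations, combinations)
```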

Q19. What is P-value in regression summary?

Ans.

P-value in a regression summary is the probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the null hypothesis (that the coefficient is zero) is true.

  • P-value is used to determine the statistical significance of the regression coefficient.

  • A low P-value (less than 0.05) indicates that the coefficient is statistically significant.

  • A high P-value (greater than 0.05) indicates that the coefficient is not statistically significant.

  • P-value is calculated using the t-test or F-test depending on the...read more

Q20. Compare two arrays in Python and print whether they are the same.

Ans.

Compare two arrays in Python and print whether they are the same.

  • Use the '==' operator to compare lists. For NumPy arrays '==' compares element by element, so use numpy.array_equal() instead.

  • If the arrays have the same elements in the same order, they are considered the same.

  • If the arrays have different elements or different order, they are considered different.

  • Print 'Same' if the arrays are the same, otherwise print 'Different'.
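
For plain Python lists, the comparison can be sketched like this. The function name is illustrative, and the sketch assumes the arrays are lists rather than NumPy arrays.

```python
def describe_comparison(first, second):
    """Return 'Same' when both lists hold equal elements in the same order."""
    return 'Same' if first == second else 'Different'

print(describe_comparison([1, 2, 3], [1, 2, 3]))
print(describe_comparison([1, 2, 3], [3, 2, 1]))
```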

Q21. Explain Databricks DLT, and when will you use batch vs streaming?

Ans.

Databricks DLT (Delta Live Tables) is a declarative framework for building batch and streaming data pipelines.

  • DLT stands for Delta Live Tables. Its tables are stored in Delta Lake, the storage layer that brings ACID transactions to Apache Spark and big data workloads.

  • Batch processing is used when data is collected over a period of time and processed in large chunks, while streaming processing is used for real-time data processing.

  • Use batch processing for historical data analysis, ETL jobs, and periodic reporting. Use streaming proce...read more

Q22. How is memory managed in Python?

Ans.

Python uses automatic memory management through garbage collection.

  • Python uses reference counting to keep track of memory usage.

  • When an object's reference count drops to zero, it is deleted.

  • Python also uses a garbage collector to handle circular references.

  • Memory allocation is handled by the Python memory manager.

  • Python provides tools like the 'gc' module for managing memory usage.

Q23. Sample T test. What is it?

Ans.

Sample T test is a statistical test used to determine if there is a significant difference between the means of two groups.

  • It is used to compare the means of two groups.

  • It assumes that the data is normally distributed.

  • It is commonly used in research studies to determine if a treatment has a significant effect.

  • Example: A sample T test can be used to compare the mean weight of two groups of people who followed different diets.

Q24. Get the second-highest element from an array (duplicate elements are allowed). Required time complexity: O(N) with a single traversal. Space complexity: O(1)

Ans.

Get the second-highest element from an array with O(N) time complexity and O(1) space complexity.

  • Initialize two variables to store the highest and second highest elements.

  • Traverse the array and update the variables accordingly.

  • Return the second highest element.

  • Handle edge cases like empty array or array with only one element.
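
The single-traversal approach can be sketched as below. The sketch assumes "second highest" means the second-largest distinct value, so duplicates of the maximum are skipped; it returns None when no such value exists. The function name is illustrative.

```python
def second_highest(values):
    """Return the second-largest distinct value in one pass, or None."""
    highest = second = None
    for v in values:
        if highest is None or v > highest:
            # New maximum: the old maximum becomes the runner-up.
            highest, second = v, highest
        elif v != highest and (second is None or v > second):
            second = v
    return second

print(second_highest([5, 1, 5, 3]))
```

Only two variables are kept, so the extra space is O(1) regardless of the array length.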

Q25. What are your relevant projects in data science, and which tools and technologies do you have expertise in?

Ans.

Relevant projects in Data Science and expertise in tools and technologies

  • Projects: Predictive modeling, Natural Language Processing, Computer Vision, Recommender Systems, Time Series Analysis

  • Tools: Python, R, SQL, Tableau, Hadoop, Spark, TensorFlow, Keras, Scikit-learn

  • Technologies: Machine Learning, Deep Learning, Big Data, Cloud Computing, Data Visualization

Q26. Different type of license in power bi. Data Modelling.

Ans.

Power BI offers different types of licenses for data modeling, including Power BI Pro and Power BI Premium.

  • Power BI Pro license allows users to create and share reports and dashboards with others.

  • Power BI Premium license offers additional features such as larger data capacity and advanced AI capabilities.

  • Power BI Embedded license is designed for embedding reports and dashboards into custom applications.

  • Power BI Report Server license allows for on-premises report publishing an...read more

Q27. 1. Different Types of integration runtime in adf 2. How to copy 100 files from one adls path to another in adf 3. Diff between DAG and Lineage , narrow and wide transformation in Spark 4. DBUtils questions. 5. ...

Ans.

The interview questions cover topics related to Azure Data Factory, Spark, and Python programming.

  • Integration runtimes in ADF include the Azure, Self-hosted, and Azure-SSIS IRs.

  • To copy 100 files in ADF, use a Copy Data activity with a wildcard path in source and sink datasets.

  • DAG in Spark represents a directed acyclic graph of computation, while lineage tracks the data flow.

  • Narrow transformations in Spark operate on a single partition, wide transformations shuffle data across partition...read more

Q28. What is the difference between deltalake and delta warehouse

Ans.

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads, while Delta Warehouse is a cloud-based data warehouse service.

  • Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

  • Delta Warehouse is a cloud-based data warehouse service that provides scalable storage and analytics capabilities.

  • Delta Lake is more focused on data lake operations and ensuring data reliabilit...read more

Q29. What is R-squared?

Ans.

R-squared is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable.

  • R-squared ranges from 0 to 1, with 1 indicating that all variance in the dependent variable is explained by the independent variable.

  • It is used in regression analysis to determine how well the regression line fits the data points.

  • A higher R-squared value indicates a better fit of the model to the data, while a lower value sugge...read more

Q30. Architecture diagram of project

Ans.

The architecture diagram of the project showcases the overall structure and components of the system.

  • The architecture diagram typically includes components like servers, databases, APIs, and client applications.

  • It shows how these components interact with each other and the flow of data within the system.

  • Commonly used tools for creating architecture diagrams include Microsoft Visio, Lucidchart, and draw.io.

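Q31. Coefficient of x^7 in the equation y = (x^101 - 1)(x^100 + 1)(x^99 - 1)...(x^0 + 1)?

Ans.

Coefficient of x^7 in a given equation

  • Expanding the product means choosing either the power of x or the constant from each factor

  • A term in x^7 arises when the chosen powers of x have distinct exponents that add up to 7 (7, 6+1, 5+2, 4+3 or 4+2+1), with the constant taken from every other factor

  • The coefficient of x^7 is the sum of the products of those constants over all such choices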
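
The coefficient can also be computed by multiplying the factors out while discarding every power above x^7. The sketch assumes the elided pattern continues as shown, with the constant being -1 for odd exponents and +1 for even exponents down to (x^0 + 1); the function name is illustrative.

```python
def coefficient_of_x7(max_degree: int = 7) -> int:
    """Multiply out the factors (x^k + c_k), k = 101..0, keeping terms up to x^7."""
    poly = [1] + [0] * max_degree            # poly[d] is the coefficient of x^d
    for k in range(101, -1, -1):
        constant = 1 if k % 2 == 0 else -1   # ASSUMPTION: sign follows parity of k
        new = [c * constant for c in poly]   # choose the constant from this factor
        # Choose x^k from this factor: shifts every term up by k (if it still fits).
        for d in range(max_degree + 1 - k):
            new[d + k] += poly[d]
        poly = new
    return poly[max_degree]

print(coefficient_of_x7())
```

Under that assumption the result is 10: each of the five ways of writing 7 as a sum of distinct exponents contributes 2.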

Q32. Find the most frequent word in a sentence.

Ans.

The most frequent word in a sentence can be found by counting the occurrence of each word and selecting the one with the highest count.

  • Split the sentence into words using whitespace as delimiter

  • Create a dictionary to store the count of each word

  • Iterate through the words and update the count in the dictionary

  • Find the word with the highest count in the dictionary
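
The steps above can be sketched as follows. The sketch treats words as case-sensitive and does not strip punctuation, which are simplifying assumptions; the function name is illustrative.

```python
def most_frequent_word(sentence):
    """Return the word with the highest count, or None for an empty sentence."""
    counts = {}
    for word in sentence.split():          # split on whitespace
        counts[word] = counts.get(word, 0) + 1
    # max() returns the first word reaching the highest count when there is a tie.
    return max(counts, key=counts.get) if counts else None

print(most_frequent_word("the cat and the hat"))
```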

Q33. What are the evaluation metrics for classification and regression models? Explain bias and variance.

Ans.

Evaluation metrics for classification and regression models are different. Bias and variance are important factors to consider.

  • Classification metrics include accuracy, precision, recall, F1 score, ROC curve, and AUC.

  • Regression metrics include mean squared error, mean absolute error, R-squared, and adjusted R-squared.

  • Bias is the error from overly simple modelling assumptions (the gap between the model's average prediction and the actual values), while variance is how much the model's predictions change when it is trained on different samples of the data.

  • High bias in...read more

Q34. How to find the largest number in a list without using a built-in function

Ans.

Iterate through the list and compare each element to find the largest number.

  • Iterate through the list using a loop

  • Compare each element with a variable storing the current largest number

  • Update the variable if a larger number is found
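
A minimal sketch of the loop described above; the function name and the choice to raise an error on an empty list are assumptions.

```python
def largest(numbers):
    """Return the largest element without calling max()."""
    if not numbers:
        raise ValueError("list is empty")
    current = numbers[0]          # start with the first element
    for n in numbers[1:]:
        if n > current:
            current = n           # found a larger value
    return current

print(largest([3, 9, -2, 7]))
```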

Q35. Difference between generator and iterator?

Ans.

Generator generates values on the fly while iterator iterates over a collection of values.

  • Generator is a function that returns an iterator.

  • Generators use 'yield' keyword to return values one at a time.

  • Iterators are objects that implement the '__next__' method (together with '__iter__') to return the next value in a collection.

  • Iterators can be created from lists, strings, dictionaries, sets, etc. by calling iter().

  • Generators are useful for generating large sequences of values without having to store them in memory.

  • Iterators are usef...read more
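
The difference can be seen by writing the same countdown both ways. This is an illustrative sketch; the class and function names are hypothetical.

```python
class Countdown:
    """Iterator written by hand: implements __iter__ and __next__."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start):
    """Generator: Python builds the iterator from the 'yield' statements."""
    while start > 0:
        yield start
        start -= 1

print(list(Countdown(3)), list(countdown(3)))
```

Both produce the same values; the generator version needs far less code because the iterator protocol is implemented automatically.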

Q36. What are decision Trees and All the algorithms that you have used in ur project?

Ans.

Decision Trees are a type of supervised learning algorithm used for classification and regression tasks.

  • Decision Trees are used to create a model that predicts the value of a target variable based on several input variables.

  • The algorithm splits the data into subsets based on the most significant attribute and continues recursively until a leaf node is reached.

  • Some of the algorithms used in my project include Random Forest, Gradient Boosting, and XGBoost.

  • Random Forest is an en...read more

Q37. What is difference between C and gamma in SVM

Ans.

C is the regularization parameter while gamma controls the shape of the decision boundary in SVM (for kernels such as RBF).

  • C controls the trade-off between a wide margin and a low training error.

  • A smaller C value creates a wider margin and allows more misclassifications.

  • Gamma controls the shape of the decision boundary and the influence of each training example.

  • A smaller gamma value creates a smoother decision boundary while a larger gamma value creates a more complex decision boundary...read more

Q38. Do you know numpy pandas?

Ans.

Yes, numpy and pandas are Python libraries used for data analysis and manipulation.

  • NumPy is used for numerical operations on arrays and matrices.

  • Pandas is used for data manipulation and analysis, providing data structures like DataFrame.

  • Both libraries are commonly used in data science and machine learning.

  • Example: import numpy as np; import pandas as pd;

Q39. Expected ctc and current ctc negotiations

Ans.

Discussing expected and current salary for negotiation purposes.

  • Be honest about your current salary and provide a realistic expectation for your desired salary.

  • Highlight your skills and experience that justify your desired salary.

  • Be open to negotiation and willing to discuss other benefits besides salary.

  • Research industry standards and salary ranges for similar positions to support your negotiation.

  • Focus on the value you can bring to the company rather than just the monetary ...read more

Q40. Project description

Ans.

Developed a data analysis project to optimize marketing strategies for a retail company.

  • Utilized customer segmentation techniques to identify target demographics

  • Analyzed sales data to determine the most effective marketing channels

  • Implemented A/B testing to measure the impact of different marketing campaigns

Q41. Why use MSE metric

Ans.

MSE metric is commonly used in data analysis to measure the average squared difference between predicted values and actual values.

  • MSE helps to quantify the accuracy of a model by penalizing large errors more than small errors.

  • It gives a single number summarizing how well the model is performing, although it is expressed in squared units of the target, so its square root (RMSE) is often reported alongside it.

  • MSE is differentiable, making it suitable for optimization algorithms like gradient descent.

  • Example: In linear regression, MSE is often used to evaluate the performan...read more
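
MSE is simple to compute by hand. The sketch below uses made-up actual and predicted values; the function name is illustrative.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(mse)
```

The error of 2 contributes four times as much as the error of 1, which shows how squaring penalizes large errors more heavily.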

Q42. Why use MSE metrics

Ans.

MSE metrics are commonly used to measure the average squared difference between predicted values and actual values in statistical analysis.

  • MSE helps in evaluating the performance of a predictive model by quantifying the accuracy of the model's predictions.

  • It penalizes large errors more heavily than small errors, making it a useful metric for identifying outliers or areas where the model is underperforming.

  • MSE is widely used in machine learning, regression analysis, and time s...read more

Q43. How to handle imbalanced data in text analytics?

Ans.

Imbalanced data in text analytics can be handled by techniques like oversampling, undersampling, and SMOTE.

  • Use oversampling to increase the number of instances in the minority class

  • Use undersampling to decrease the number of instances in the majority class

  • Use SMOTE to generate synthetic samples for the minority class

  • Use cost-sensitive learning algorithms to assign higher misclassification costs to the minority class

  • Use ensemble methods like bagging and boosting to combine mul...read more

Q44. What are different types of indexing

Ans.

Different types of indexing include primary indexing, secondary indexing, clustered indexing, and non-clustered indexing.

  • Primary indexing: Index based on the primary key of a table, typically implemented using a B-tree structure.

  • Secondary indexing: Index based on a non-primary key column, allowing for faster retrieval of data based on that column.

  • Clustered indexing: Physically reorders the table based on the indexed column, leading to faster retrieval of data but slower inser...read more

Q45. What is probability

Ans.

Probability is the likelihood of a specific event occurring, expressed as a number between 0 and 1.

  • Probability ranges from 0 (impossible event) to 1 (certain event)

  • It can be calculated by dividing the number of favorable outcomes by the total number of possible outcomes

  • Probability can be represented as a percentage, fraction, or decimal

Q46. Explain null hypothesis and p-value in terms of probability

Ans.

Null hypothesis is a statement that assumes no relationship or difference between variables. P-value is the probability of obtaining results as extreme as the observed data, assuming the null hypothesis is true.

  • Null hypothesis is a statement that assumes no effect or relationship between variables

  • P-value is the probability of obtaining results as extreme as the observed data, assuming the null hypothesis is true

  • Null hypothesis is typically denoted as H0, while an alternative ...read more

Q47. What is word embedding? Explain its significance.

Ans.

Word embedding is a technique to represent words as vectors in a high-dimensional space.

  • Word embedding captures the semantic meaning of words and their relationships.

  • It is used in natural language processing tasks such as text classification, sentiment analysis, and machine translation.

  • Popular word embedding models include Word2Vec, GloVe, and FastText.

  • Word embedding can be pre-trained on large corpora or trained on specific domain data.

  • It reduces the dimensionality of the in...read more

Q48. What are your weaknesses and strengths?

Ans.

My weakness is overthinking and my strength is attention to detail.

  • Weakness: tend to overthink situations, which can lead to indecision or anxiety

  • Strength: strong attention to detail, ensuring accuracy and thoroughness in work

  • Example: Weakness - I sometimes spend too much time analyzing a problem before taking action. Strength - I am meticulous in my work, catching even the smallest errors.

Q49. Difference between PCA, KNN , Decision Tree

Ans.

PCA reduces dimensionality, KNN is a non-parametric classification algorithm, Decision Tree is a tree-like model for classification.

  • PCA is used for dimensionality reduction by transforming data into a new coordinate system

  • KNN is a non-parametric classification algorithm that classifies new data points based on similarity to training data

  • Decision Tree is a tree-like model where each internal node represents a feature, each branch represents a decision, and each leaf node repre...read more

Q50. Python code to find the root of a number

Ans.

Python code to find the root of a number

  • Use the math module to access the sqrt() function

  • Use the ** operator to raise the number to the power of 1/n

  • Handle negative numbers by converting them to complex numbers
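
The three points above can be sketched with the standard library. The input values are illustrative; cmath.sqrt is used for the negative case because math.sqrt raises an error on negative input.

```python
import cmath
import math

square_root = math.sqrt(16)        # square root via the math module
cube_root = 27 ** (1 / 3)          # n-th root via the ** operator (floating-point result)
negative_root = cmath.sqrt(-9)     # complex square root of a negative number

print(square_root, cube_root, negative_root)
```

Roots computed with ** are subject to floating-point rounding, so comparisons should use a tolerance such as math.isclose.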

Q51. write a CI/CD Pipeline code for a 3 tier application

Ans.

A CI/CD Pipeline code for a 3 tier application

  • Use a version control system like Git to store the application code

  • Set up a CI tool like Jenkins to automate the build process

  • Define stages in the pipeline for building, testing, and deploying each tier of the application

  • Leverage tools like Docker for containerization and Kubernetes for orchestration

  • Implement automated testing at each stage to ensure code quality and reliability

Q52. Write python code

Ans.

Python code to find the sum of all elements in a list

  • Use the sum() function to find the sum of all elements in a list

  • Ensure the list contains only numeric values for accurate results

Q53. How does random forest work

Ans.

Random forest is an ensemble learning method that builds multiple decision trees and merges their predictions.

  • Random forest builds each decision tree on a bootstrap sample of the training data, considering a random subset of features at each split.

  • Each tree in the random forest independently predicts the outcome; the final prediction is the majority vote of the trees for classification or the average of their predictions for regression.

  • Random forest is effective in handling high-dimensional data and can handle missing values and outliers well.

  • It is a popular a...read more

Q54. What are the methods of variable selection?

Q55. how did i solve business problems through analytics

Ans.

I utilized data analytics to identify root causes of business problems and develop effective solutions.

  • Utilized data analytics tools such as Excel, Tableau, and SQL to analyze large datasets

  • Identified trends and patterns in data to pinpoint areas of improvement

  • Developed predictive models to forecast future business outcomes

  • Collaborated with cross-functional teams to implement data-driven solutions

  • Monitored key performance indicators to track the success of implemented solutio...read more

Q56. Write a program to arrange array of numbers in ascending order.

Q57. What is indexing in SQL

Ans.

Indexing in SQL is a technique to improve the performance of queries by creating a data structure that allows for faster retrieval of data.

  • Indexes are created on columns in a database table to speed up the retrieval of data.

  • They work similar to the index in a book, allowing the database to quickly find the rows that match a certain value.

  • Indexes can be created using single or multiple columns.

  • Examples: CREATE INDEX index_name ON table_name(column_name);

Q58. Design round for adf pipeline

Ans.

Designing an ADF pipeline for data processing

  • Identify data sources and destinations

  • Define data transformations and processing steps

  • Consider scheduling and monitoring requirements

  • Utilize ADF activities like Copy Data, Data Flow, and Databricks

  • Implement error handling and logging mechanisms

Q59. 1. Explain about clustering methods.

Ans.

Clustering methods group similar data points together based on their characteristics.

  • Clustering is an unsupervised learning technique.

  • It is used to identify patterns and groupings in data.

  • Common clustering methods include k-means, hierarchical, and density-based clustering.

  • K-means clustering partitions data into k clusters based on distance from a centroid.

  • Hierarchical clustering creates a tree-like structure of nested clusters.

  • Density-based clustering identifies areas of hig...read more

Q60. Explain Linear and Logistic Regression

Ans.

Linear regression is used for predicting continuous numerical values, while logistic regression is used for predicting binary categorical values.

  • Linear regression models the relationship between a dependent variable and one or more independent variables using a linear equation.

  • Logistic regression models the probability of a binary outcome using a logistic function.

  • Linear regression is used for tasks like predicting house prices based on features like area and number of rooms....read more

Q61. Sort a nearly sorted array.

Ans.

Sort a nearly sorted array (each element is at most k positions from its sorted position) using a min heap

  • Create a min heap of size k+1

  • Insert first k+1 elements into min heap

  • For remaining elements, extract min and insert new element

  • Extract all remaining elements from min heap

  • Time complexity: O(nlogk)

  • Example: [2, 1, 4, 3, 6, 5] with k = 1 sorts to [1, 2, 3, 4, 5, 6]
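
The heap steps above can be sketched with the standard heapq module. The sketch assumes every element is at most k positions away from its sorted position; the function name is illustrative.

```python
import heapq

def sort_nearly_sorted(values, k):
    """Sort a list in which each element is at most k positions out of place."""
    heap = list(values[:k + 1])      # min heap holding the first k+1 elements
    heapq.heapify(heap)
    result = []
    for value in values[k + 1:]:
        # heapreplace pops the smallest element, then pushes the new one.
        result.append(heapq.heapreplace(heap, value))
    while heap:                      # drain whatever is left in the heap
        result.append(heapq.heappop(heap))
    return result

print(sort_nearly_sorted([6, 5, 3, 2, 8, 10, 9], 3))
```

The heap never holds more than k+1 elements, which is where the O(n log k) time bound comes from.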

Q62. What is random forest

Ans.

Random forest is an ensemble learning method used for classification and regression tasks.

  • Random forest is a collection of decision trees that are trained on random subsets of the data.

  • Each tree in the random forest independently predicts the target variable, and the final prediction is made by averaging the trees' predictions (regression) or taking a majority vote (classification).

  • Random forest is more robust to overfitting and noisy data than a single decision tree, and can handle large datasets with high dimensionality.

  • It is a popular machine learning al...read more

Q63. Rotate two dimensional array

Ans.

Rotate a 2D array by 90 degrees clockwise or counterclockwise.

  • Transpose the matrix by swapping elements across the diagonal

  • Reverse each row for a clockwise rotation, or each column for a counterclockwise rotation

  • Example: [[1,2],[3,4]] rotated clockwise becomes [[3,1],[4,2]]
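
The transpose-then-reverse approach above can be sketched as below. The function names are assumptions for this example, and the sketch returns a new matrix rather than rotating in place.

```python
def rotate_clockwise(matrix):
    """Rotate 90 degrees clockwise: transpose, then reverse each row."""
    transposed = [list(row) for row in zip(*matrix)]
    return [row[::-1] for row in transposed]


def rotate_counterclockwise(matrix):
    """Rotate 90 degrees counterclockwise: transpose, then reverse each column
    (equivalently, reverse the order of the rows)."""
    transposed = [list(row) for row in zip(*matrix)]
    return transposed[::-1]


clockwise = rotate_clockwise([[1, 2], [3, 4]])
counterclockwise = rotate_counterclockwise([[1, 2], [3, 4]])
```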

Q64. Explain macros in Excel

Ans.

Macros in Excel are automated sequences of commands that can be created to perform repetitive tasks.

  • Macros can be recorded or written using Visual Basic for Applications (VBA)

  • They can automate tasks such as formatting, data manipulation, and calculations

  • Macros can be assigned to buttons or keyboard shortcuts for easy access

  • They can save time and reduce errors in repetitive tasks

Q65. What is PCA? Explain the working.

Ans.

PCA stands for Principal Component Analysis. It is a dimensionality reduction technique used to reduce the number of variables in a dataset while preserving the most important information.

  • PCA is used to transform high-dimensional data into a lower-dimensional space by finding the principal components that explain the maximum variance in the data.

  • The first principal component is the direction in which the data varies the most, followed by the second principal component, and so...read more

Q66. Explain lasso in feature selection

Ans.

Lasso is a feature selection technique that penalizes the absolute size of the regression coefficients.

  • Lasso stands for Least Absolute Shrinkage and Selection Operator

  • It adds a penalty term to the regression equation, forcing some coefficients to be exactly zero

  • Helps in selecting the most important features and reducing overfitting

  • Useful when dealing with high-dimensional data

  • Example: In a dataset with multiple features, lasso regression can be used to select the most relevan...read more

Q67. Experience in terraform Azure DevOps CI/CD Kubernetes Monitoring

Ans.

I have extensive experience in Terraform, Azure DevOps CI/CD, Kubernetes, and monitoring tools.

  • Implemented infrastructure as code using Terraform to automate provisioning of resources

  • Set up CI/CD pipelines in Azure DevOps for automated deployment

  • Managed Kubernetes clusters for container orchestration

  • Utilized monitoring tools like Prometheus and Grafana for performance tracking

Q68. What are the different types of filters in Power BI?

Q69. Explain spark architecture

Ans.

Spark is a distributed computing framework whose architecture consists of a driver program, a cluster manager, and worker nodes.

  • Consists of a driver program that manages the execution of tasks

  • Utilizes a cluster manager to allocate resources and schedule tasks

  • Worker nodes execute the tasks and store data in memory or disk

  • Supports fault tolerance through resilient distributed datasets (RDDs)

Q70. Difference between Ridge and Lasso regression

Ans.

Ridge and Lasso regression are both regularization techniques used in linear regression to prevent overfitting.

  • Ridge regression adds a penalty equivalent to the square of the magnitude of coefficients, while Lasso regression adds a penalty equivalent to the absolute value of the magnitude of coefficients.

  • Ridge regression shrinks the coefficients towards zero but never exactly to zero, while Lasso regression can shrink some coefficients to zero, effectively performing feature ...read more
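
The two penalties described above differ only in the last term of the objective. The notation below is an assumption for illustration: n observations, p features, coefficients β_j, and a tuning parameter λ ≥ 0 that controls the strength of the penalty.

```latex
% Ridge regression: squared (L2) penalty on the coefficients.
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso regression: absolute-value (L1) penalty on the coefficients.
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```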

Q71. How will you design the data models

Ans.

I will design the data models by analyzing the requirements, identifying entities and relationships, creating entity-relationship diagrams, and normalizing the data.

  • Analyze the requirements to understand the data needs

  • Identify entities and their relationships

  • Create entity-relationship diagrams to visualize the structure

  • Normalize the data to reduce redundancy and improve efficiency

Q72. How to select features?

Ans.

Feature selection involves identifying the most relevant and informative variables for a predictive model.

  • Start with a large pool of potential features

  • Use statistical tests or machine learning algorithms to identify the most important features

  • Consider domain knowledge and expert input

  • Regularly re-evaluate and update feature selection as needed

Q73. How do you handle low performance

Ans.

I address low performance by identifying root causes, providing feedback and support, setting clear expectations, and offering opportunities for improvement.

  • Identify the root causes of low performance through performance evaluations and feedback.

  • Provide constructive feedback and support to help the individual improve.

  • Set clear expectations and goals for performance improvement.

  • Offer training, resources, and opportunities for the individual to enhance their skills and knowledg...read more

Q74. Explain overfitting

Ans.

Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern.

  • Overfitting happens when a model is too complex and captures noise in the training data.

  • It leads to poor generalization to new, unseen data.

  • Regularization techniques like L1/L2 regularization can help prevent overfitting.

  • Cross-validation can be used to detect and prevent overfitting.

  • Example: A decision tree with too many branches that perfectly fits the training data but p...read more

Q75. Architecture Diagram of Deployments in production

Ans.

The architecture diagram of deployments in production showcases the flow of models from training to deployment.

  • The diagram typically includes components such as data storage, model training, model serving, and monitoring.

  • It shows how data flows through the system, how models are trained and tested, and how they are deployed for inference.

  • Common tools used in MLOps architecture include Kubernetes for orchestration, Docker for containerization, and CI/CD pipelines for automatio...read more

Q76. t-test significance

Ans.

A t-test is used to determine if there is a significant difference between the means of two groups.

  • T-test is a statistical test used to compare the means of two groups.

  • It calculates the t-value, which is then compared to a critical value to determine significance.

  • The lower the p-value, the stronger the evidence against the null hypothesis that the two group means are equal.

  • For example, a t-test can be used to compare the average test scores of two different classes.

  • Make sure to check assumptions like normalit...read more
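
As a rough illustration of the t-value mentioned above, the sketch below computes the statistic for two independent samples using only the standard library. It uses Welch's variant, which does not assume equal variances; that choice, the function name `welch_t_statistic`, and the sample scores are assumptions for the example. Turning the statistic into a p-value needs the t-distribution, which a statistics library such as SciPy can provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(a, b):
    """t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b), using sample variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


class_a_scores = [1, 2, 3, 4, 5]  # hypothetical test scores
class_b_scores = [2, 3, 4, 5, 6]  # hypothetical test scores
t_value = welch_t_statistic(class_a_scores, class_b_scores)
```

A t-value close to zero, as with these samples, indicates that the difference in means is small relative to the variability within the groups.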

Q77. Difference between NSG and ASG

Ans.

NSG stands for Network Security Group and is used to control inbound and outbound traffic to Azure resources. ASG stands for Application Security Group and is used to group virtual machines logically so that NSG rules can reference the group instead of explicit IP addresses.

  • NSG controls traffic by setting rules for inbound and outbound traffic based on source and destination IP addresses, ports, and protocols.

  • ASG groups virtual machines (through their network interfaces) by application role, such as web servers or database servers, and an NSG rule can then name the ASG as its source or destination.

Q78. Program to count the number of vowels in a string

Ans.

Count the number of vowels in a given array of strings.

  • Iterate through each string in the array

  • For each string, iterate through each character and check if it is a vowel (a, e, i, o, u)

  • Increment a counter for each vowel found in the string
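
The counting steps above can be sketched as follows. The function name `count_vowels` and the sample input are assumptions for this example, and the sketch treats uppercase and lowercase vowels alike.

```python
VOWELS = set("aeiou")


def count_vowels(strings):
    """Return the total number of vowels across all strings in the list."""
    count = 0
    for text in strings:
        for ch in text.lower():
            if ch in VOWELS:
                count += 1
    return count


total = count_vowels(["Tiger", "Analytics"])
```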

Q79. Decision Tree algorithm

Ans.

Decision Tree algorithm is a supervised learning algorithm used for classification and regression tasks.

  • Decision Tree algorithm is based on a tree-like model of decisions and their possible consequences.

  • It uses a set of rules to split the data into branches and make predictions at the leaf nodes.

  • The algorithm selects the best attribute to split the data based on certain criteria like information gain or Gini index.

  • Decision Trees can handle both categorical and numerical data....read more

Q80. What is marginal costing?

Ans.

Marginal costing is a costing technique where only variable costs are considered in determining the cost of a product or service.

  • Marginal costing helps in determining the contribution margin of a product, which is the difference between its selling price and variable costs.

  • Fixed costs are not included in the calculation under marginal costing.

  • It is useful for decision-making as it helps in analyzing the impact of changes in production volume on profitability.

  • Example: If a com...read more
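
The contribution margin described above can be worked through with a small calculation. All figures and function names below are hypothetical, chosen only to illustrate the arithmetic; fixed costs are treated as a period cost deducted from the total contribution rather than being assigned to each unit.

```python
def contribution_per_unit(selling_price, variable_cost):
    """Contribution margin per unit = selling price - variable cost per unit."""
    return selling_price - variable_cost


def profit(units, selling_price, variable_cost, fixed_costs):
    """Profit = total contribution - fixed costs for the period."""
    return units * contribution_per_unit(selling_price, variable_cost) - fixed_costs


# Hypothetical product: sells at 50, variable cost 30 per unit, fixed costs 15,000.
margin = contribution_per_unit(50, 30)
profit_at_1000 = profit(1000, 50, 30, 15000)
profit_at_1500 = profit(1500, 50, 30, 15000)
```

Comparing the two volumes shows how each extra unit adds its full contribution to profit once the fixed costs are covered.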

Q81. Java hash map and how it works

Ans.

Java hash map is a data structure that stores key-value pairs and uses hashing to efficiently retrieve values based on keys.

  • HashMap in Java implements the Map interface and allows one null key and multiple null values.

  • It uses hashing to store and retrieve key-value pairs, providing average-case O(1) time complexity for get() and put() operations.

  • Example: HashMap<String, Integer> map = new HashMap<>(); map.put("key", 1); int value = map.get("key");
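
To show how hashing maps a key to a storage slot, here is a simplified sketch written in Python rather than Java. The class `ToyHashMap` is hypothetical and uses separate chaining with a fixed number of buckets; Java's real HashMap is more elaborate (for example, it resizes as it fills), so this illustrates the idea only.

```python
class ToyHashMap:
    """Minimal hash map: a fixed list of buckets, each holding (key, value) pairs."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        # The hash code picks the bucket; keys that collide share the same list.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the value for an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


toy_map = ToyHashMap()
toy_map.put("key", 1)
toy_map.put("key", 2)
value = toy_map.get("key")
```

Lookups stay fast on average because only one bucket is scanned; performance degrades when many keys land in the same bucket.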

Q82. What is top down approach

Ans.

Top down approach is a problem solving or design strategy that starts with the larger overview and breaks it down into smaller components.

  • Start with a broad overview of the problem or design

  • Break it down into smaller components or sub-problems

  • Address each component individually before integrating them back together

  • Commonly used in software development, project management, and system design

Q83. Explain any one ML model in depth

Ans.

A machine learning model is a mathematical model that learns from data to make predictions or decisions without being explicitly programmed.

  • ML models can be classified into categories such as supervised learning, unsupervised learning, and reinforcement learning.

  • Examples of ML models include linear regression, decision trees, support vector machines, and neural networks.

  • ML models require training data to learn patterns and relationships, and testing data to evaluate their per...read more

Q84. Difference between boosting and bagging

Ans.

Boosting focuses on improving the performance of weak learners sequentially, while bagging uses parallel ensemble learning with bootstrapping.

  • Boosting combines multiple weak learners to create a strong learner by giving more weight to misclassified instances in each iteration.

  • Bagging creates multiple subsets of the training data through bootstrapping and trains a separate model on each subset independently to reduce variance.

  • Examples: AdaBoost, Gradient Boosting for boosting; Random Forest for b...read more

Q85. Longest non repeating substring

Ans.

Find the longest substring without repeating characters

  • Use a sliding window approach to track the longest substring without repeating characters

  • Keep track of the characters seen so far and their positions in a hashmap

  • Update the start of the window when a repeating character is encountered
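
The sliding-window approach above can be sketched as below. The function name `longest_unique_substring` is an assumption for this example; when several substrings tie for the longest length, this sketch returns the first one.

```python
def longest_unique_substring(text):
    """Return the longest substring of `text` that contains no repeated character."""
    last_seen = {}            # character -> index of its most recent occurrence
    start = 0                 # left edge of the current window
    best_start, best_len = 0, 0

    for i, ch in enumerate(text):
        # If ch already occurs inside the window, move the window start past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1

    return text[best_start:best_start + best_len]


longest = longest_unique_substring("pwwkew")
```

Each character is visited once, so the running time is linear in the length of the string.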

Q86. SQL query for joining

Ans.

SQL query for joining tables to retrieve data from multiple related tables.

  • Use JOIN keyword to combine rows from two or more tables based on a related column between them.

  • Types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.

  • Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;
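
To see how INNER JOIN and LEFT JOIN differ on real rows, the sketch below runs both against an in-memory SQLite database from Python. The `customers` and `orders` tables and their contents are hypothetical, invented only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN keeps every customer; missing order columns come back as NULL (None).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

conn.close()
```

The customer without orders appears only in the LEFT JOIN result, which is the practical difference between the two join types.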

Q87. Java program for oops concepts

Ans.

Java program showcasing OOPs concepts like inheritance, encapsulation, polymorphism, and abstraction.

  • Create classes representing different entities with properties and methods

  • Use inheritance to create a parent-child class relationship

  • Demonstrate encapsulation by setting private variables and using getter and setter methods

  • Show polymorphism by overriding methods in child classes

  • Implement abstraction by creating abstract classes or interfaces

Q88. Writing XPath for any page

Ans.

XPath is a query language used to locate elements in an XML or HTML document based on their attributes and structure.

  • Identify unique attributes of the element you want to locate

  • Use the '//' operator to search for elements anywhere in the document

  • Use the '[@attribute='value']' syntax to specify the attribute and value you are looking for

  • Combine multiple conditions using 'and' or 'or' operators

  • Use functions like 'contains()', 'starts-with()', and 'text()' to refine your XPath
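
The '//' and '[@attribute='value']' patterns above can be tried out with Python's standard `xml.etree.ElementTree` module. The page markup below is hypothetical. Note that ElementTree supports only a subset of XPath, so functions such as 'contains()' are not shown here; a full XPath engine (for example in a browser automation tool) is needed for those.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for this example.
PAGE = """
<html>
  <body>
    <form id="login">
      <input type="text" name="username" />
      <input type="password" name="password" />
      <button type="submit">Sign in</button>
    </form>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# './/' searches all descendants; '[@name=...]' filters on an attribute value.
username_input = root.find(".//input[@name='username']")
submit_button = root.find(".//button[@type='submit']")

# Conditions can be chained along a path: inputs directly inside the login form.
login_inputs = root.findall(".//form[@id='login']/input")
```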

Q89. Expectations from Tiger Analytics

Ans.

Tiger Analytics expects high-quality project delivery, effective communication, and proactive problem-solving from Program Managers.

  • Deliver high-quality projects on time and within budget

  • Maintain open and clear communication with stakeholders

  • Proactively identify and address project risks and issues

  • Collaborate effectively with cross-functional teams

  • Drive continuous improvement in project delivery processes

Q90. What is your expected CTC?

Ans.

Expected CTC refers to the salary range that the candidate is looking for in the new position.

  • Research industry standards for Software Developer salaries

  • Consider your experience, skills, and location when determining expected CTC

  • Be prepared to negotiate based on the job responsibilities and benefits package

  • Provide a range rather than a specific number to allow for flexibility

Q91. Star vs snowflake schema

Ans.

Star schema is a denormalized schema with a single central fact table surrounded by dimension tables. Snowflake schema is a normalized schema with multiple interconnected dimension tables.

  • Star schema is easier to understand and query due to denormalization.

  • Snowflake schema saves storage space by normalizing data.

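  • Both schemas are used in data warehouses. Star schema typically gives simpler, faster reporting queries because fewer joins are needed, while snowflake schema suits dimensions with deep hierarchies where reduced redundancy matters more than query simplicity.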

  • Example: A star schema for a sales database would have a f...read more

Q92. Explain cloud computing

Ans.

Cloud computing is the delivery of computing services over the internet, including storage, servers, databases, networking, software, and more.

  • Cloud computing allows users to access resources on-demand without the need for physical infrastructure

  • Examples of cloud computing services include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform

  • Cloud computing offers scalability, flexibility, cost-effectiveness, and increased collaboration

Q93. Python code to sort elements

Ans.

Python code to sort elements in an array of strings

  • Use the sorted() function to sort the elements in the array

  • Specify the key parameter if you want to sort based on a specific criterion

  • Use the reverse parameter to sort in descending order if needed
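
The three points above can be illustrated with the built-in `sorted()` function. The sample list `names` is hypothetical.

```python
names = ["pear", "Banana", "apple", "fig"]

# Default ordering compares strings by code point, so uppercase letters sort first.
alphabetical = sorted(names)

# key= transforms each element before comparison; here it ignores case.
case_insensitive = sorted(names, key=str.lower)

# reverse=True gives descending order; combined with key=len it sorts longest first.
by_length_desc = sorted(names, key=len, reverse=True)
```

`sorted()` returns a new list and leaves `names` unchanged; `list.sort()` accepts the same `key` and `reverse` parameters when in-place sorting is preferred.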

Q94. Reduce cost of tableau access

Ans.

Reduce Tableau access costs by optimizing licenses, utilizing server resources efficiently, and training users on best practices.

  • Optimize Tableau licenses by identifying and removing unused licenses or downgrading to a lower tier if possible

  • Utilize Tableau Server resources efficiently by scheduling extracts during off-peak hours and optimizing server performance settings

  • Train users on best practices to reduce unnecessary usage of Tableau features and improve efficiency

Q95. Design a data model

Ans.

Design a data model for a customer relationship management system

  • Identify entities such as customers, products, orders, and sales representatives

  • Establish relationships between entities (e.g. a customer can place multiple orders)

  • Define attributes for each entity (e.g. customer name, product price)

  • Consider normalization to reduce redundancy and improve data integrity

Q96. Sum of two numbers in Java

Ans.

The question is asking for a Java program that calculates the sum of two numbers.

  • Create two variables to store the numbers to be added.

  • Use the + operator to add the two numbers together.

  • Print or return the result of the addition.
