Tiger Analytics
90+ Uplers Interview Questions and Answers
Q1. What is the probability of getting 5 Sundays in a 31-day month?
The probability of getting 5 Sundays in a 31-day month is 3/7 (about 43%), assuming the month is equally likely to start on any day of the week.
A 31-day month contains 4 full weeks (28 days) plus 3 extra days.
The 4 full weeks always contain exactly 4 Sundays.
A fifth Sunday occurs only if one of the 3 extra days is a Sunday.
The 3 extra days are consecutive, so they form one of 7 equally likely day-of-week windows, and exactly 3 of them (Fri-Sat-Sun, Sat-Sun-Mon, Sun-Mon-Tue) contain a Sunday.
Therefore P(5 Sundays) = 3/7 ≈ 0.43.
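A brute-force enumeration of the 7 possible starting weekdays gives the exact answer. This is a minimal sketch: `prob_five_sundays` is an illustrative name, and labelling Sunday as 0 is an arbitrary choice.

```python
from fractions import Fraction

def prob_five_sundays(days_in_month: int = 31) -> Fraction:
    """Probability of exactly 5 Sundays, assuming each start weekday is equally likely."""
    favorable = 0
    for start in range(7):  # weekday of day 1; 0 stands for Sunday
        sundays = sum(1 for d in range(days_in_month) if (start + d) % 7 == 0)
        if sundays == 5:
            favorable += 1
    return Fraction(favorable, 7)

print(prob_five_sundays())  # 3/7
```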
Q2. You are standing in a field. The chance of seeing at least 1 plane in 10 minutes is 15%. What is the probability of seeing at least 1 plane in the next 30 minutes?
Probability of seeing a plane in 30 minutes given a 15% chance in 10 minutes, assuming planes appear independently at a constant rate, so 30 minutes is three independent 10-minute windows.
P(no plane in 10 minutes) = 1 - 0.15 = 0.85
Use the complement rule: P(X >= 1) = 1 - P(X = 0)
P(no plane in 30 minutes) = 0.85^3 = 0.614125
P(at least 1 plane in 30 minutes) = 1 - 0.614125 = 0.385875, i.e. about 38.6%
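The complement-rule calculation as a small function. It is a sketch that assumes disjoint windows are independent with the same per-window probability; `prob_at_least_one` is an illustrative name.

```python
def prob_at_least_one(p_window: float, n_windows: int) -> float:
    """P(at least one event) over n windows, assuming independent windows with equal probability."""
    p_none_single = 1 - p_window             # P(X = 0) in one window
    p_none_all = p_none_single ** n_windows  # P(X = 0) in every window
    return 1 - p_none_all                    # P(X >= 1) = 1 - P(X = 0)

print(round(prob_at_least_one(0.15, 3), 6))  # 0.385875
```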
Q3. If we select a random point in a circle of radius 1 unit, what is the probability that the point is closer to the circumference than to the centre?
Probability that a random point in a circle of radius 1 unit is closer to the circumference than to the centre.
The probability is 3/4 (0.75), assuming the point is chosen uniformly over the area of the circle.
A point at distance r from the centre is 1 - r from the circumference, so it is closer to the circumference when 1 - r < r, i.e. r > 1/2.
Using A = πr^2, the inner disc of radius 1/2 has area π/4, so the ring closer to the circumference has area π - π/4 = 3π/4, which is 3/4 of the total area π.
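A Monte Carlo check of the area argument. This is a sketch that assumes "random point" means uniform over the disc, drawn here by rejection sampling from the enclosing square. `estimate_closer_to_edge` is an illustrative name.

```python
import random

def estimate_closer_to_edge(n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate for a point drawn uniformly from the unit disc."""
    rng = random.Random(seed)
    accepted = hits = 0
    while accepted < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r_squared = x * x + y * y
        if r_squared > 1:      # rejection sampling: keep only points inside the circle
            continue
        accepted += 1
        if r_squared > 0.25:   # r > 1/2 means closer to the circumference
            hits += 1
    return hits / accepted

print(estimate_closer_to_edge())  # close to 0.75
```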
Q4. Implement Python's collections.Counter from scratch.
Implementing Python's collections.Counter from scratch
Create an empty dictionary to store the elements and their count
Iterate through the input list and add elements to the dictionary with their count
Return the dictionary
Example: input_list = ['apple', 'banana', 'apple', 'orange', 'banana']
Output: {'apple': 2, 'banana': 2, 'orange': 1}
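A minimal sketch of the steps above. `count_items` and `most_common` are illustrative names. Only the counting and ranking parts of `collections.Counter` are reproduced, not its arithmetic between counters.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

def count_items(items: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Count occurrences the way collections.Counter does for an iterable."""
    counts: Dict[Hashable, int] = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

def most_common(counts: Dict[Hashable, int],
                n: Optional[int] = None) -> List[Tuple[Hashable, int]]:
    """Items sorted by count, highest first; ties keep first-seen order (sorted is stable)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if n is None else ranked[:n]

input_list = ['apple', 'banana', 'apple', 'orange', 'banana']
print(count_items(input_list))                  # {'apple': 2, 'banana': 2, 'orange': 1}
print(most_common(count_items(input_list), 1))  # [('apple', 2)]
```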
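Q5. What approach would you take if all the features in a Linear Regression model are categorical? What is the dummy variable trap? If we don't remove a dummy variable, what will be the issue, and does it impact performance o...
Categorical features in Linear Regression must be encoded as dummy (one-hot) variables. Dropping one dummy per feature avoids the dummy variable trap.
Categorical features need to be encoded using dummy variables to be used in Linear Regression
The dummy variable trap occurs when all k dummies of a feature are kept alongside the intercept: they sum to 1 in every row, so one dummy can be predicted exactly from the others (perfect multicollinearity)
If all dummies are kept, X^T X is singular, so ordinary least squares has no unique solution; software may raise an error or silently drop a column. Predictions are often unaffected, but the coefficients become unstable and uninterpretable. Dropping one dummy makes the remaining coefficients unique and relative to the dropped baseline category
Example: Gender (Male/Female) can be encoded as a dummy variable wi...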
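A pure-Python sketch of dummy encoding that shows the trap. With all k columns kept, every row sums to 1, which duplicates the intercept column. `dummy_encode` is an illustrative name. In pandas the same idea is usually expressed as `pd.get_dummies(..., drop_first=True)`.

```python
from typing import List, Sequence, Tuple

def dummy_encode(values: Sequence[str],
                 drop_first: bool = True) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode one categorical column; drop_first keeps k - 1 dummies."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels   # the dropped level becomes the baseline
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

gender = ["Male", "Female", "Female", "Male"]
print(dummy_encode(gender))                # (['Male'], [[1], [0], [0], [1]])
full_cols, full_rows = dummy_encode(gender, drop_first=False)
print([sum(row) for row in full_rows])     # [1, 1, 1, 1]: identical to the intercept column
```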
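Q6. Implement a program to check if a number is a power of 3.
Program to check if a number is a power of 3
A logarithm-based check (is log base 3 of n an integer?) works in principle, but floating-point rounding can make it fail for exact powers
Check that the number is greater than 0 (0 and negative numbers are never powers of 3)
Repeatedly divide the number by 3 while the remainder is 0; the number is a power of 3 if this ends at exactly 1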
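A sketch of the repeated-division approach, which uses integer arithmetic only and so avoids the floating-point pitfalls of logarithms. `is_power_of_three` is an illustrative name.

```python
def is_power_of_three(n: int) -> bool:
    """True if n == 3**k for some integer k >= 0, using integer arithmetic only."""
    if n <= 0:            # 0 and negatives are never powers of 3
        return False
    while n % 3 == 0:     # strip factors of 3 while the remainder is 0
        n //= 3
    return n == 1         # only a pure power of 3 reduces to exactly 1

print([k for k in range(1, 100) if is_power_of_three(k)])  # [1, 3, 9, 27, 81]
```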
Q7. Implement matrix multiplication. Implement factorial and the Fibonacci series using different approaches.
Matrix multiplication, factorial and Fibonacci series implementation
Matrix multiplication: for an m×n matrix A and an n×p matrix B, the product C = AB is an m×p matrix with C[i][j] = sum over k of A[i][k]·B[k][j]
Factorial is the product of all positive integers up to a given number
Fibonacci series is a sequence of numbers where each number is the sum of the two preceding ones
Factorial can be implemented using recursion or iteration
Fibonacci series can be implemented using recursion or iteration
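A compact sketch of the three tasks in plain Python, with iterative and recursive variants. All function names are illustrative. The memoized Fibonacci uses `functools.lru_cache` so the recursion does not blow up exponentially.

```python
from functools import lru_cache
from typing import List

Matrix = List[List[float]]

def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """C[i][j] = sum_k A[i][k] * B[k][j]; requires cols(A) == rows(B)."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible shapes")
    cols_b = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def factorial_iterative(n: int) -> int:
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def factorial_recursive(n: int) -> int:
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def fib_iterative(n: int) -> int:
    """n-th Fibonacci number with F(0) = 0, F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

@lru_cache(maxsize=None)
def fib_memoized(n: int) -> int:
    """Recursive version; caching avoids recomputing the same subproblems."""
    return n if n < 2 else fib_memoized(n - 1) + fib_memoized(n - 2)

print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))     # [[19, 22], [43, 50]]
print(factorial_iterative(5), factorial_recursive(5))  # 120 120
print([fib_iterative(i) for i in range(8)])            # [0, 1, 1, 2, 3, 5, 8, 13]
```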
Q8. There are 100 coins: 99 unbiased coins and 1 biased coin. Derive the probability of getting 10 heads given the event of unbiased coins using Bayes' Theorem.
Using Bayes' Theorem, find the probability of getting 10 heads given 99 unbiased coins and 1 biased coin, assuming one coin is picked at random and tossed 10 times.
Prior: P(unbiased) = 99/100 and P(biased) = 1/100
Likelihoods: P(10 heads | unbiased) = (1/2)^10 = 1/1024, and P(10 heads | biased) = p^10, where p is the biased coin's probability of heads
Total probability: P(10 heads) = (99/100)(1/1024) + (1/100)p^10; Bayes' Theorem then gives P(unbiased | 10 heads) = (99/100)(1/1024) / P(10 heads)
Consider the impact of the biased coin: when p is close to 1, it dominates the posterior even though it is only 1 of the 100 coins
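A sketch of the Bayes calculation using exact fractions. The source does not state the biased coin's probability of heads. The usage line assumes, for illustration only, a double-headed coin (p = 1), which gives P(biased | 10 heads) = 1024/1123 ≈ 0.91. `coin_posteriors` is an illustrative name.

```python
from fractions import Fraction

def coin_posteriors(p_biased_heads: Fraction, n_heads: int = 10, n_coins: int = 100) -> dict:
    """Bayes' Theorem for one coin drawn at random from n_coins (one biased) and tossed n_heads times, all heads."""
    prior_biased = Fraction(1, n_coins)
    prior_fair = 1 - prior_biased
    like_fair = Fraction(1, 2) ** n_heads     # P(all heads | unbiased)
    like_biased = p_biased_heads ** n_heads   # P(all heads | biased)
    evidence = prior_fair * like_fair + prior_biased * like_biased  # P(all heads)
    return {
        "evidence": evidence,
        "posterior_unbiased": prior_fair * like_fair / evidence,
        "posterior_biased": prior_biased * like_biased / evidence,
    }

# Assumption for illustration only: the biased coin always lands heads (p = 1).
for name, value in coin_posteriors(Fraction(1)).items():
    print(name, value, float(value))
```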
Q9. How can you prove to the client that students in higher classes are taller than those in lower classes?
We can use statistical analysis to support the claim that students in higher classes are taller than those in lower classes; a test provides strong evidence rather than absolute proof.
Collect height data of students from different classes
Use statistical measures like mean, median, and mode to compare the heights of students in different classes
Perform hypothesis testing to determine if the difference in height between classes is statistically significant
Visualize the data using graphs and charts to make it easier for the client to understand
Provi...
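One way to run the hypothesis test for two classes is a one-sided permutation test, swapped in here because it needs no distributional assumptions. This is a sketch: the function name is illustrative, and the heights in the usage lines are hypothetical numbers, not real data. With more than two classes, ANOVA or a trend test across class levels would be the natural extension.

```python
import random
from statistics import mean
from typing import Sequence

def permutation_p_value(higher: Sequence[float], lower: Sequence[float],
                        n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for H1: mean(higher) > mean(lower)."""
    rng = random.Random(seed)
    observed = mean(higher) - mean(lower)
    pooled = list(higher) + list(lower)
    n_high = len(higher)
    as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel students at random, as the null hypothesis allows
        diff = mean(pooled[:n_high]) - mean(pooled[n_high:])
        if diff >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_resamples + 1)  # add-one keeps the estimate above 0

# Hypothetical heights in cm, for illustration only
class_10 = [160, 165, 158, 170, 168, 162]
class_6 = [140, 145, 138, 150, 142, 147]
print(permutation_p_value(class_10, class_6))
```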
Q10. What is entropy? What is the Gini index? Give a real-life example of a derivative and a second derivative. What is the difference between a p-value and a beta value? How do you handle an imbalanced dataset? What is the diffe...
Entropy is a measure of randomness or disorder (impurity) in a set of labels. The Gini index is another measure of impurity in a dataset. Derivatives measure rate of change: velocity is the derivative of position with respect to time, and acceleration is its second derivative. The p-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. The beta value is the estimated coefficient of a variable in a regression model. Imbalanced datasets have an unequal class distribution. Recall is the proportion of actual positives correctly identified. Precision is the proportion of predicted positives that are actually positive. Slope in one variable is...
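Both impurity measures computed from class proportions. This is a minimal sketch: entropy is in bits (log base 2), and the function names are illustrative. A 50/50 split is maximally impure under both measures, and a pure node scores 0.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits: -sum p * log2(p) over class proportions."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum p^2 over class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]), gini(["yes", "yes", "no", "no"]))  # 1.0 0.5
print(entropy(["yes"] * 4), gini(["yes"] * 4))  # pure node
```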
Q11. Why accuracy score should not be used on imbalanced dataset?
Accuracy score can be misleading on imbalanced datasets.
Accuracy score can be high even if the model is not performing well on the minority class.
F1 score, precision, and recall are better metrics for imbalanced datasets.
Stratified sampling keeps the class ratio consistent across train/test splits, while oversampling and undersampling can help balance the training data.
Example: A model predicting cancer in a dataset with only 1% positive cases.
Using accuracy score, a model that always predicts negative will have 99% accuracy.
However, this mode...
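The cancer example in a few lines. A model that always predicts "negative" on data with 1% positives scores 99% accuracy but 0% recall. The function names are illustrative, and the labels are synthetic, built only to match the 1% ratio in the example.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

y_true = [1] * 1 + [0] * 99   # 1% positive cases, as in the cancer example
y_pred = [0] * 100            # a "model" that always predicts negative
print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.99 0.0
```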
Q12. Regression models: Which one should be used in which case?
Different regression models are used based on the type of data and relationship between variables.
Linear regression is used when there is a linear relationship between the independent and dependent variables.
Logistic regression is used when the dependent variable is binary.
Polynomial regression is used when the relationship between variables is non-linear.
Ridge regression is used when there is multicollinearity in the data.
Lasso regression is used when feature selection is im...
Q13. Different variations of the Fibonacci series in Python.
Different variations of the Fibonacci series in Python.
Standard Fibonacci series
Fibonacci series with user-defined starting numbers
Fibonacci series with user-defined length
Fibonacci series with user-defined step
Fibonacci series with user-defined function
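A single generator can cover several of these variations: custom starting numbers, a user-chosen length (via `itertools.islice`), and a user-defined combining function. This is a sketch with illustrative names. The "user-defined step" variant is not covered because the source does not say what it means.

```python
from itertools import islice
from operator import add, mul
from typing import Callable, Iterator, List

def fib_like(a: int = 0, b: int = 1,
             combine: Callable[[int, int], int] = add) -> Iterator[int]:
    """Infinite Fibonacci-style sequence with custom start values and combining rule."""
    while True:
        yield a
        a, b = b, combine(a, b)

def take(n: int, it: Iterator[int]) -> List[int]:
    """User-defined length: the first n terms."""
    return list(islice(it, n))

print(take(8, fib_like()))           # standard: [0, 1, 1, 2, 3, 5, 8, 13]
print(take(6, fib_like(2, 1)))       # Lucas numbers: [2, 1, 3, 4, 7, 11]
print(take(5, fib_like(1, 2, mul)))  # product rule: [1, 2, 2, 4, 8]
```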
Q14. ML algorithm overview of what I have used in my projects
I have used various ML algorithms such as linear regression, decision trees, random forests, and neural networks in my projects.
Linear regression for predicting continuous values
Decision trees for classification and regression tasks
Random forests for ensemble learning and improved accuracy
Neural networks for complex pattern recognition
Q15. What is the difference between List and Tuple?
List is mutable and Tuple is immutable in Python.
List can be modified after creation while Tuple cannot be modified.
List uses square brackets [] while Tuple uses parentheses ().
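By convention, lists often hold homogeneous items and tuples hold heterogeneous records, but Python enforces neither.
Tuples are slightly faster to create and use less memory than lists, and a tuple of hashable items can be used as a dictionary key.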
Example of List: [1, 2, 3] and Example of Tuple: (1, 'hello', 3.14)
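A short demonstration of the practical differences: in-place modification, the error raised when assigning into a tuple, and tuples as dictionary keys. The variable names and the `(0, 0)` key are illustrative.

```python
nums = [1, 2, 3]
nums.append(4)                 # lists are mutable: modified in place
print(nums)                    # [1, 2, 3, 4]

record = (1, 'hello', 3.14)
try:
    record[0] = 99             # tuples are immutable
except TypeError as exc:
    print("TypeError:", exc)

locations = {(0, 0): "origin"}   # a tuple of hashable items can be a dict key
print(locations[(0, 0)])         # origin
```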
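Q16. Given a list of stock prices, identify the days when a person should buy and sell to earn the maximum profit
To maximize profit from a single trade, buy at the lowest price seen so far and sell at the highest price that comes after it; the sell day must come after the buy day.
Traverse the prices once, tracking the minimum price seen so far (the candidate buy day)
On each day, compute the profit of selling at today's price against that minimum and keep the best buy/sell pair; this runs in O(n) time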
Consider market trends and analysis for optimal buying and selling days
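A one-pass sketch for the single-trade version of the problem. It assumes one buy followed by one sell and 0-indexed days, and returns `None` when no profitable trade exists. `best_buy_sell_days` is an illustrative name.

```python
from typing import Optional, Sequence, Tuple

def best_buy_sell_days(prices: Sequence[float]) -> Optional[Tuple[int, int, float]]:
    """(buy_day, sell_day, profit) for the best single trade, or None if no profit is possible."""
    best = None
    min_day = 0                      # day with the lowest price seen so far
    for day in range(1, len(prices)):
        profit = prices[day] - prices[min_day]
        if profit > 0 and (best is None or profit > best[2]):
            best = (min_day, day, profit)
        if prices[day] < prices[min_day]:
            min_day = day            # a cheaper buy day for future sells
    return best

print(best_buy_sell_days([7, 1, 5, 3, 6, 4]))  # (1, 4, 5): buy on day 1, sell on day 4
```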
Q17. 1. Confusion Matrix 2. What is recall and precision? 3. Explain about ROC curve 4. Based on what RFE eliminate the features? 5. SQL question which requires grouping 6. How to read a dataframe, display top 5 row...
Interview questions for Senior Analyst Data Science
Confusion matrix is a table used to evaluate the performance of a classification model
Recall is the ratio of true positives to the sum of true positives and false negatives
Precision is the ratio of true positives to the sum of true positives and false positives
ROC curve is a graphical representation of the performance of a binary classifier
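RFE (Recursive Feature Elimination) repeatedly fits the model, ranks features by importance (for example coefficient magnitude or feature importance scores), and removes the least important ones until the desired number of features remains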
SQL question may involve ...
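A sketch that builds the confusion-matrix counts and derives precision and recall from them, matching the definitions above. The labels are a small hypothetical example, and the function names are illustrative.

```python
from typing import Dict, Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """TP, FP, FN, TN for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1
        else:
            counts["FN" if t == positive else "TN"] += 1
    return counts

def precision_recall(c: Dict[str, int]) -> Tuple[float, float]:
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical labels
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
c = confusion_counts(y_true, y_pred)
print(c)                    # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
print(precision_recall(c))  # (0.75, 0.75)
```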
Q18. What is permutation and combination and how is it used in data science?
Permutation and combination are mathematical concepts used to count the number of possible outcomes in a given scenario.
Permutation is the arrangement of objects in a specific order while combination is the selection of objects without considering the order.
Permutation formula: nPr = n!/(n-r)! where n is the total number of objects and r is the number of objects selected.
Combination formula: nCr = n!/(r!(n-r)!) where n is the total number of objects and r is the number of objec...
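The two formulas written out with factorials and checked against the standard library's `math.perm` and `math.comb` (Python 3.8+). A typical data science use is counting the feature subsets a search would have to try, for example C(10, 3) = 120 three-feature subsets of 10 features. `n_p_r` and `n_c_r` are illustrative names.

```python
from math import comb, factorial, perm

def n_p_r(n: int, r: int) -> int:
    return factorial(n) // factorial(n - r)                   # ordered arrangements

def n_c_r(n: int, r: int) -> int:
    return factorial(n) // (factorial(r) * factorial(n - r))  # unordered selections

print(n_p_r(5, 2), perm(5, 2))  # 20 20
print(n_c_r(5, 2), comb(5, 2))  # 10 10
print(n_c_r(10, 3))             # 120
```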
Q19. What is P-value in regression summary?
The p-value for a coefficient in a regression summary is the probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming the true coefficient is zero (the null hypothesis).
P-value is used to determine the statistical significance of the regression coefficient.
A low p-value (below the chosen significance level, commonly 0.05) indicates that the coefficient is statistically significant.
A high p-value (above that level) indicates that the coefficient is not statistically significant.
P-value is calculated using the t-test or F-test depending on the...
Q20. Compare two arrays in Python and print whether they are the same or not.
Compare two arrays in Python and print whether they are the same.
Use the '==' operator to compare the arrays.
If the arrays have the same elements in the same order, they are considered the same.
If the arrays have different elements or different order, they are considered different.
Print 'Same' if the arrays are the same, otherwise print 'Different'.
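A sketch of the order-sensitive comparison described above, plus an order-insensitive variant that still respects duplicate counts. The function names are illustrative. For NumPy arrays, `==` compares element-wise, so `numpy.array_equal` is the usual whole-array check.

```python
from collections import Counter
from typing import Sequence

def compare_arrays(a: Sequence, b: Sequence) -> str:
    """'Same' only if both have the same elements in the same order."""
    return "Same" if list(a) == list(b) else "Different"

def same_ignoring_order(a: Sequence, b: Sequence) -> bool:
    """Order-insensitive check that also respects duplicate counts."""
    return Counter(a) == Counter(b)

print(compare_arrays([1, 2, 3], [1, 2, 3]))       # Same
print(compare_arrays([1, 2, 3], [3, 2, 1]))       # Different
print(same_ignoring_order([1, 2, 3], [3, 2, 1]))  # True
```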
Q21. Explain databricks dlt, and when will you use batch vs streaming?
Databricks DLT (Delta Live Tables) is a declarative framework for building reliable batch and streaming data pipelines.
DLT manages pipeline orchestration and data quality checks (expectations) and stores its results as Delta Lake tables; Delta Lake itself is the storage layer that brings ACID transactions to Apache Spark and big data workloads.
Batch processing is used when data is collected over a period of time and processed in large chunks, while streaming processing is used for real-time data processing.
Use batch processing for historical data analysis, ETL jobs, and periodic reporting. Use streaming proce...
Q22. How is memory managed in Python?
Python uses automatic memory management through garbage collection.
Python uses reference counting to keep track of memory usage.
When an object's reference count drops to zero, it is deleted.
Python also uses a garbage collector to handle circular references.
Memory allocation is handled by the Python memory manager.
Python provides tools like the 'gc' module for managing memory usage.
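A small demonstration of the two mechanisms, using `weakref` to observe when an object is freed. It assumes CPython: reference counting frees an ordinary object as soon as its last reference goes away, while a reference cycle survives until the `gc` module's cycle collector runs. The class and function names are illustrative.

```python
import gc
import weakref

class Node:
    """Plain object that can hold a reference to another Node."""
    def __init__(self) -> None:
        self.partner = None

def freed_by_refcount() -> bool:
    obj = Node()
    probe = weakref.ref(obj)   # a weak reference does not keep obj alive
    del obj                    # refcount drops to zero here
    return probe() is None     # CPython frees the object immediately

def cycle_needs_collector():
    was_enabled = gc.isenabled()
    gc.disable()               # keep the automatic collector out of the demo
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a   # reference cycle: counts never reach zero
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None
        gc.collect()                  # cycle detector finds the unreachable pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()

print(freed_by_refcount())      # True on CPython
print(cycle_needs_collector())  # (True, False) on CPython
```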
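Q23. Sample t-test. What is it?
A two-sample t-test is a statistical test used to determine whether there is a significant difference between the means of two groups; a one-sample t-test instead compares one group's mean with a fixed value.
It compares the difference in means relative to the variability within the groups.
It assumes the observations are independent and approximately normally distributed; Student's version also assumes equal variances, while Welch's version does not.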
It is commonly used in research studies to determine if a treatment has a significant effect.
Example: A sample T test can be used to compare the mean weight of two groups of people who followed different diets.
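A sketch of Welch's two-sample t statistic and its approximate degrees of freedom, using only the `statistics` module. Turning t into a p-value needs the t distribution's CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. `welch_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple

def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)   # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))  # (-1.0, 8.0)
```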
Q24. Get the second highest element from an array (duplicate elements are allowed). Required time complexity: O(N) with a single traversal; space complexity: O(1).
Get the second highest element from an array with O(N) time complexity and O(1) space complexity.
Initialize two variables to store the highest and second highest elements.
Traverse the array once, updating both variables and skipping values equal to the current highest so that duplicates are not counted twice.
Return the second highest element.
Handle edge cases like empty array or array with only one element.
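A single-pass sketch with two variables. It assumes "second highest" means the second largest distinct value and returns `None` when none exists (empty input, one element, or all elements equal). `second_highest` is an illustrative name.

```python
from typing import Optional, Sequence

def second_highest(values: Sequence[int]) -> Optional[int]:
    """Second largest distinct value in one pass with O(1) extra space; None if it does not exist."""
    first: Optional[int] = None
    second: Optional[int] = None
    for v in values:
        if first is None or v > first:
            first, second = v, first     # old maximum becomes the runner-up
        elif v != first and (second is None or v > second):
            second = v                   # duplicates of the maximum are skipped
    return second

print(second_highest([10, 20, 20, 5]))  # 10
print(second_highest([7, 7]))           # None
```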
Q25. What are your relevant projects in data science, and which tools and technologies are you expert in?
Relevant projects in Data Science and expertise in tools and technologies
Projects: Predictive modeling, Natural Language Processing, Computer Vision, Recommender Systems, Time Series Analysis
Tools: Python, R, SQL, Tableau, Hadoop, Spark, TensorFlow, Keras, Scikit-learn
Technologies: Machine Learning, Deep Learning, Big Data, Cloud Computing, Data Visualization
Q26. Different types of licenses in Power BI. Data modelling.
Power BI offers different types of licenses, including Power BI Pro and Power BI Premium.
Power BI Pro license allows users to create and share reports and dashboards with others.
Power BI Premium license offers additional features such as larger data capacity and advanced AI capabilities.
Power BI Embedded license is designed for embedding reports and dashboards into custom applications.
Power BI Report Server license allows for on-premises report publishing an...
Q27. 1. Different types of integration runtime in ADF 2. How to copy 100 files from one ADLS path to another in ADF 3. Difference between DAG and lineage, and between narrow and wide transformations in Spark 4. DBUtils questions. 5. ...
The interview questions cover topics related to Azure Data Factory, Spark, and Python programming.
Integration runtimes in ADF include Azure, Self-hosted, and Azure-SSIS IRs.
To copy 100 files in ADF, use a Copy Data activity with a wildcard file path in the source settings and a sink dataset that points to the destination folder.
Lineage is the recorded chain of transformations from which Spark can recompute lost partitions; the DAG is the execution plan Spark builds from that lineage and splits into stages at shuffle boundaries.
Narrow transformations (e.g. map, filter) compute each output partition from a single input partition, while wide transformations (e.g. groupByKey, reduceByKey) shuffle data across partition...
Q28. What is the difference between Delta Lake and Delta Warehouse?
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads, while Delta Warehouse is a cloud-based data warehouse service.
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Warehouse is a cloud-based data warehouse service that provides scalable storage and analytics capabilities.
Delta Lake is more focused on data lake operations and ensuring data reliabilit...
Q29. What is R-squared?
R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variable(s) in the model.
For a least-squares fit with an intercept, R-squared ranges from 0 to 1 on the training data, with 1 indicating that all variance in the dependent variable is explained by the model.
It is used in regression analysis to determine how well the regression line fits the data points.
A higher R-squared value indicates a better fit of the model to the data, while a lower value sugge...
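The definition R² = 1 - SS_res / SS_tot as a short function. Predicting the mean of y for every point gives exactly 0. On new data, R² can even go negative for a model that does worse than the mean. `r_squared` is an illustrative name.

```python
from typing import Sequence

def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation around the mean
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7], [2, 5, 8]))  # 0.75
print(r_squared([3, 5, 7], [5, 5, 5]))  # 0.0: no better than predicting the mean
```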
Q30. Architecture diagram of project
The architecture diagram of the project showcases the overall structure and components of the system.
The architecture diagram typically includes components like servers, databases, APIs, and client applications.
It shows how these components interact with each other and the flow of data within the system.
Commonly used tools for creating architecture diagrams include Microsoft Visio, Lucidchart, and draw.io.
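Q31. What is the coefficient of x^7 in y = (x^101 - 1)(x^100 + 1)(x^99 - 1) ... (x^0 + 1)?
Coefficient of x^7 in the given product, assuming the signs keep alternating as in the first factors (-1 for odd exponents, +1 for even exponents)
This is a product of many different factors, not a single binomial power, so expand it by choosing one term from each factor: either x^k or its constant ±1
An x^7 term arises from picking x^k in factors whose distinct exponents sum to 7: {7}, {6, 1}, {5, 2}, {4, 3}, {4, 2, 1}; every other factor contributes its constant, and the last factor (x^0 + 1) is simply 2
The product of all constants for k = 1..101 is -1, and each of the 5 choices removes exactly one odd-exponent (-1) constant, so each contributes +1 × 2 and the coefficient of x^7 is 2 × 5 = 10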
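A brute-force check that multiplies out the product but keeps only powers up to x^7. It assumes the same alternating sign pattern (-1 for odd k, +1 for even k) as the reasoning above. `coefficient` is an illustrative name.

```python
def coefficient(target_degree: int = 7, max_exp: int = 101) -> int:
    """Coefficient of x**target_degree in prod_{k=0}^{max_exp} (x**k + s_k),
    assuming s_k = -1 for odd k and +1 for even k."""
    poly = [1] + [0] * target_degree          # poly[i] = coefficient of x**i, truncated
    for k in range(max_exp, -1, -1):
        s_k = -1 if k % 2 else 1
        new = [0] * (target_degree + 1)
        for i, c in enumerate(poly):
            if c:
                new[i] += c * s_k             # take the constant s_k
                if i + k <= target_degree:
                    new[i + k] += c           # take the x**k term
        poly = new
    return poly[target_degree]

print(coefficient())  # 10
```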
Q32. Most frequent word in a sentence?
The most frequent word in a sentence can be found by counting the occurrence of each word and selecting the one with the highest count.
Split the sentence into words using whitespace as delimiter
Create a dictionary to store the count of each word
Iterate through the words and update the count in the dictionary
Find the word with the highest count in the dictionary
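A sketch of the steps above using `collections.Counter` as the counting dictionary. Lower-casing words and stripping surrounding punctuation are assumptions about what counts as the same word. An empty sentence would raise an `IndexError`.

```python
import string
from collections import Counter

def most_frequent_word(sentence: str) -> str:
    """Most common word after lower-casing and stripping surrounding punctuation."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    counts = Counter(w for w in words if w)   # skip tokens that were pure punctuation
    return counts.most_common(1)[0][0]        # ties: the first word seen wins

print(most_frequent_word("The cat sat on the mat, and the dog sat too."))  # the
```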
Q33. What are the evaluation metrics for classification and regression models? Bias and variance.
Evaluation metrics for classification and regression models are different. Bias and variance are important factors to consider.
Classification metrics include accuracy, precision, recall, F1 score, ROC curve, and AUC.
Regression metrics include mean squared error, mean absolute error, R-squared, and adjusted R-squared.
Bias is the error from overly simple model assumptions (how far the model's average prediction is from the true values), while variance is how much the model's predictions change when it is trained on different samples of the data.
High bias in...
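The common regression metrics side by side. The toy data has one large miss, which shows how MSE penalizes a large error more than MAE does. The function names are illustrative.

```python
from math import sqrt
from typing import Sequence

def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sqrt(mse(y_true, y_pred))   # back in the target's units

y_true = [3, 5, 7, 9]
y_pred = [3, 5, 7, 13]   # one large miss of 4
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))  # 1.0 4.0 2.0
```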
Q34. How to find the largest number in a list without using a built-in function
Iterate through the list and compare each element to find the largest number.
Iterate through the list using a loop
Compare each element with a variable storing the current largest number
Update the variable if a larger number is found
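A sketch of the manual scan, with no `max()` call. Raising `ValueError` on an empty list mirrors what `max()` does. `largest` is an illustrative name.

```python
from typing import Sequence

def largest(values: Sequence[float]) -> float:
    """Largest element using a manual scan instead of max()."""
    if not values:
        raise ValueError("empty list has no largest element")
    current = values[0]
    for v in values:
        if v > current:    # keep the bigger of the two
            current = v
    return current

print(largest([3, 41, 7, -2, 41, 9]))  # 41
```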
Q35. Difference between generator and iterator?
Every generator is an iterator: a generator produces values lazily from a function body, while an iterator is any object that returns values one at a time.
A generator function (a function containing 'yield') returns a generator object, which is an iterator.
Generators use the 'yield' keyword to return values one at a time, pausing and resuming their local state between calls.
Iterators are objects that implement '__iter__' and '__next__'; next() returns the next value and StopIteration signals the end.
Iterators can be obtained from iterables such as lists, strings, dicts, and sets by calling iter().
Generators are useful for generating large sequences of values without having to store them in memory.
Iterators are usef...
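The same countdown written both ways. A hand-written iterator class implements `__iter__` and `__next__`, while a generator function gets equivalent behaviour from `yield`. `Countdown` and `countdown` are illustrative names.

```python
class Countdown:
    """Class-based iterator: implements __iter__ and __next__ by hand."""
    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start: int):
    """Generator function: 'yield' gives the same behaviour with less code."""
    while start > 0:
        yield start
        start -= 1

print(list(Countdown(3)), list(countdown(3)))  # [3, 2, 1] [3, 2, 1]
gen = countdown(2)
print(iter(gen) is gen, next(gen), next(gen))  # True 2 1
```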
Q36. What are decision trees, and which algorithms have you used in your project?
Decision Trees are a type of supervised learning algorithm used for classification and regression tasks.
Decision Trees are used to create a model that predicts the value of a target variable based on several input variables.
The algorithm picks the split that best separates the target (for example the largest reduction in Gini impurity or entropy) and repeats recursively on each subset until a stopping condition is met at a leaf node.
Some of the algorithms used in my project include Random Forest, Gradient Boosting, and XGBoost.
Random Forest is an en...
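A sketch of how a tree picks one split on a numeric feature. It tries midpoints between neighbouring values and keeps the threshold with the lowest weighted Gini impurity, which is the kind of search CART-style trees repeat at every node. The function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

def gini(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs: Sequence[float], ys: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Threshold on one feature that minimizes the weighted Gini impurity of the two children."""
    values = sorted(set(xs))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2   # candidate: midpoint between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

print(best_split([1, 2, 3, 10, 11, 12], ["no", "no", "no", "yes", "yes", "yes"]))  # (6.5, 0.0)
```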
Q37. What is the difference between C and gamma in SVM?
C is the regularization parameter while gamma controls the shape of the decision boundary in SVM.
C controls the trade-off between a wide margin and correctly classifying the training points (it is the penalty for margin violations).
A smaller C value creates a wider margin and allows more misclassifications.
Gamma is a parameter of the RBF (and polynomial/sigmoid) kernels that controls how far the influence of a single training example reaches.
A smaller gamma value creates a smoother decision boundary while a larger gamma value creates a more complex decision boundary...
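The RBF kernel makes gamma's role concrete. The similarity between two points decays as exp(-gamma · distance²), so a larger gamma shrinks each training point's sphere of influence. This is a sketch of the kernel only, not an SVM, and `rbf_kernel` is an illustrative name.

```python
import math
from typing import Sequence

def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - z||^2): similarity between two points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

for gamma in (0.1, 1.0, 10.0):
    print(gamma, round(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma), 4))
```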
Q38. Do you know numpy pandas?
Yes, numpy and pandas are Python libraries used for data analysis and manipulation.
NumPy is used for numerical operations on arrays and matrices.
Pandas is used for data manipulation and analysis, providing data structures like DataFrame.
Both libraries are commonly used in data science and machine learning.
Example: import numpy as np; import pandas as pd;
Q39. Expected ctc and current ctc negotiations
Discussing expected and current salary for negotiation purposes.
Be honest about your current salary and provide a realistic expectation for your desired salary.
Highlight your skills and experience that justify your desired salary.
Be open to negotiation and willing to discuss other benefits besides salary.
Research industry standards and salary ranges for similar positions to support your negotiation.
Focus on the value you can bring to the company rather than just the monetary ...
Q40. Project description
Developed a data analysis project to optimize marketing strategies for a retail company.
Utilized customer segmentation techniques to identify target demographics
Analyzed sales data to determine the most effective marketing channels
Implemented A/B testing to measure the impact of different marketing campaigns
Q41. Why use MSE metric
MSE metric is commonly used in data analysis to measure the average squared difference between predicted values and actual values.
MSE helps to quantify the accuracy of a model by penalizing large errors more than small errors.
It gives a single number summarizing model error, although it is in squared units of the target, so RMSE (its square root) is often reported for easier interpretation.
MSE is differentiable, making it suitable for optimization algorithms like gradient descent.
Example: In linear regression, MSE is often used to evaluate the performan...
Q42. Why use MSE metrics
MSE metrics are commonly used to measure the average squared difference between predicted values and actual values in statistical analysis.
MSE helps in evaluating the performance of a predictive model by quantifying the accuracy of the model's predictions.
It penalizes large errors more heavily than small errors, which makes it sensitive to outliers and useful for spotting where the model makes large mistakes.
MSE is widely used in machine learning, regression analysis, and time s...
Q43. How to handle imbalanced data in text analytics?
Imbalanced data in text analytics can be handled by techniques like oversampling, undersampling, and SMOTE.
Use oversampling to increase the number of instances in the minority class
Use undersampling to decrease the number of instances in the majority class
Use SMOTE to generate synthetic samples for the minority class
Use cost-sensitive learning algorithms to assign higher misclassification costs to the minority class
Use ensemble methods like bagging and boosting to combine mul...
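A sketch of plain random oversampling, shown here instead of SMOTE because it needs no nearest-neighbour search. It duplicates randomly chosen minority examples until every class matches the largest one and should be applied to the training split only. The texts are hypothetical. For SMOTE itself, the imbalanced-learn package provides `imblearn.over_sampling.SMOTE`, which for text is applied to vectorized features.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

def random_oversample(texts: Sequence[str], labels: Sequence[str],
                      seed: int = 0) -> Tuple[List[str], List[str]]:
    """Duplicate randomly chosen minority-class examples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, lab in zip(texts, labels) if lab == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels

texts = ["win a prize now", "meeting at 10", "lunch tomorrow?", "see attached report"]
labels = ["spam", "ham", "ham", "ham"]
_, balanced = random_oversample(texts, labels)
print(Counter(balanced))  # Counter({'spam': 3, 'ham': 3})
```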
Q44. What are different types of indexing
Different types of indexing include primary indexing, secondary indexing, clustered indexing, and non-clustered indexing.
Primary indexing: Index based on the primary key of a table, typically implemented using a B-tree structure.
Secondary indexing: Index based on a non-primary key column, allowing for faster retrieval of data based on that column.
Clustered indexing: Physically reorders the table based on the indexed column, leading to faster retrieval of data but slower inser...
Q45. What is probability
Probability is the likelihood of a specific event occurring, expressed as a number between 0 and 1.
Probability ranges from 0 (impossible event) to 1 (certain event)
When all outcomes are equally likely, it can be calculated by dividing the number of favorable outcomes by the total number of possible outcomes
Probability can be represented as a percentage, fraction, or decimal
Q46. Explain null hypothesis and p-value in terms of probability
Null hypothesis is a statement that assumes no relationship or difference between variables. P-value is the probability of obtaining results as extreme as the observed data, assuming the null hypothesis is true.
Null hypothesis is a statement that assumes no effect or relationship between variables
P-value is the probability of obtaining results as extreme as the observed data, assuming the null hypothesis is true
Null hypothesis is typically denoted as H0, while an alternative ...
Q47. What is word embedding and explain its significance?
Word embedding is a technique to represent words as dense, real-valued vectors in a continuous vector space, typically with tens to a few hundred dimensions.
Word embedding captures the semantic meaning of words and their relationships.
It is used in natural language processing tasks such as text classification, sentiment analysis, and machine translation.
Popular word embedding models include Word2Vec, GloVe, and FastText.
Word embedding can be pre-trained on large corpora or trained on specific domain data.
It reduces the dimensionality of the in...
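Related words end up with similar vectors, and similarity is usually measured with cosine similarity. The 3-dimensional vectors below are made up purely for illustration. Real Word2Vec, GloVe, or FastText vectors are learned from a corpus and are much longer.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(angle) between two vectors: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical toy embeddings, for illustration only
embeddings = {
    "king": [0.8, 0.65, 0.1],
    "queen": [0.75, 0.7, 0.15],
    "banana": [0.05, 0.1, 0.9],
}
print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["banana"]))
```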
Q48. What is your weakness and strength?
My weakness is overthinking and my strength is attention to detail.
Weakness: tend to overthink situations, which can lead to indecision or anxiety
Strength: strong attention to detail, ensuring accuracy and thoroughness in work
Example: Weakness - I sometimes spend too much time analyzing a problem before taking action. Strength - I am meticulous in my work, catching even the smallest errors.
Q49. Difference between PCA, KNN, and Decision Tree
PCA reduces dimensionality, KNN is a non-parametric algorithm for classification (and regression), and a Decision Tree is a tree-like model for classification and regression.
PCA is used for dimensionality reduction by transforming data into a new coordinate system
KNN is a non-parametric classification algorithm that classifies new data points based on similarity to training data
Decision Tree is a tree-like model where each internal node represents a feature, each branch represents a decision, and each leaf node repre...
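A minimal KNN classifier shows what "non-parametric" means in practice. There is no training step: the stored data is the model, and a prediction is a majority vote among the k nearest points by Euclidean distance (`math.dist`, Python 3.8+). The training points and the name `knn_predict` are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def knn_predict(train_X: List[Sequence[float]], train_y: List[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Majority label among the k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)), knn_predict(train_X, train_y, (5.5, 5.5)))  # a b
```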
Q50. Python code to find the root of a number
Python code to find the root of a number
Use the math module to access the sqrt() function
Use the ** operator to raise the number to the power of 1/n
Handle negative numbers: an odd root of a negative number is real (the cube root of -8 is -2), while an even root requires complex numbers
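A sketch combining the three bullets: `math.sqrt` for square roots, `** (1 / n)` for other roots, and special handling for negatives. Results of `** (1 / n)` are floating-point approximations, for example the cube root of 27 may not come out as exactly 3. `nth_root` is an illustrative name.

```python
import math
from typing import Union

def nth_root(x: float, n: int = 2) -> Union[float, complex]:
    """Real n-th root when one exists, otherwise the principal complex root."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if x >= 0:
        return math.sqrt(x) if n == 2 else x ** (1 / n)
    if n % 2 == 1:
        return -((-x) ** (1 / n))   # odd root of a negative number is real
    return complex(x) ** (1 / n)    # even root of a negative number is complex

print(nth_root(16))      # 4.0
print(nth_root(-8, 3))   # approximately -2.0
print(nth_root(-4))      # approximately 2j
```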
Q51. Write CI/CD pipeline code for a 3-tier application
A CI/CD pipeline for a 3-tier application
Use a version control system like Git to store the application code
Set up a CI tool like Jenkins to automate the build process
Define stages in the pipeline for building, testing, and deploying each tier of the application
Leverage tools like Docker for containerization and Kubernetes for orchestration
Implement automated testing at each stage to ensure code quality and reliability
Q52. Write python code
Python code to find the sum of all elements in a list
Use the sum() function to find the sum of all elements in a list
Ensure the list contains only numeric values for accurate results
Q53. How does random forest work
Random forest is an ensemble learning method that builds multiple decision trees and merges their predictions.
Random forest trains each decision tree on a bootstrap sample (random sampling with replacement) of the training data and considers a random subset of features at each split.
Each tree independently predicts the outcome; the final prediction is the majority vote of the trees for classification, or the average of their predictions for regression.
Random forest is effective in handling high-dimensional data and is fairly robust to outliers; how missing values are handled depends on the implementation.
It is a popular a...
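A deliberately simplified random forest that shows the three ingredients: bootstrap samples, random feature subsets, and a majority vote. To stay short, each "tree" here is a depth-1 stump scored with Gini impurity, so the feature subset is drawn once per tree. Real implementations grow full trees. All names and the toy data are illustrative.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]   # (feature, threshold, left_label, right_label)

def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels: Sequence[int]) -> int:
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X: List[Sequence[float]], y: List[int], features: List[int]) -> Stump:
    """Best single split among the allowed features, scored by weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t, majority(left), majority(right)), score
    if best is None:   # no split possible: predict the majority class everywhere
        return (features[0], float("inf"), majority(y), majority(y))
    return best

def fit_forest(X, y, n_trees: int = 25, max_features: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]     # bootstrap sample
        features = rng.sample(range(len(X[0])), max_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict(forest: List[Stump], row: Sequence[float]) -> int:
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]   # majority vote across trees

X = [(1, 1), (2, 1), (3, 2), (8, 9), (9, 8), (10, 9)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, (1.5, 1)), predict(forest, (9, 9)))  # 0 1
```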
Q54. What are the methods of variable selection?
Q55. How did I solve business problems through analytics?
I utilized data analytics to identify root causes of business problems and develop effective solutions.
Utilized data analytics tools such as Excel, Tableau, and SQL to analyze large datasets
Identified trends and patterns in data to pinpoint areas of improvement
Developed predictive models to forecast future business outcomes
Collaborated with cross-functional teams to implement data-driven solutions
Monitored key performance indicators to track the success of implemented solutio...
Q56. Write a program to arrange an array of numbers in ascending order.
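Interviewers usually want a hand-written sort. Insertion sort is one simple choice and runs in O(n²), which is fine for small inputs. In everyday code, `sorted(values)` or `values.sort()` does the same job. `insertion_sort` is an illustrative name.

```python
from typing import List

def insertion_sort(values: List[float]) -> List[float]:
    """Return a new list in ascending order."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:   # shift larger elements one slot right
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

print(insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```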
Q57. What is indexing in SQL?
Indexing in SQL is a technique to improve the performance of queries by creating a data structure that allows for faster retrieval of data.
Indexes are created on columns in a database table to speed up the retrieval of data.
They work similar to the index in a book, allowing the database to quickly find the rows that match a certain value.
Indexes can be created using single or multiple columns.
Example: CREATE INDEX index_name ON table_name(column_name);
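Python's built-in `sqlite3` module is enough to see an index change the query plan. `EXPLAIN QUERY PLAN` reports a full table scan before the index exists and an index lookup afterwards. The table, rows, and index name are hypothetical, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the 'detail' column of EXPLAIN QUERY PLAN (wording varies by SQLite version)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees (name, dept) VALUES (?, ?)",
                 [("emp1", "Sales"), ("emp2", "IT"), ("emp3", "IT")])

sql = "SELECT name FROM employees WHERE dept = ?"
plan_before = query_plan(conn, sql, ("IT",))   # full table scan
conn.execute("CREATE INDEX idx_employees_dept ON employees(dept)")
plan_after = query_plan(conn, sql, ("IT",))    # lookup through the new index
print(plan_before)
print(plan_after)
```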
Q58. Design round for an ADF pipeline
Designing an ADF pipeline for data processing
Identify data sources and destinations
Define data transformations and processing steps
Consider scheduling and monitoring requirements
Utilize ADF activities like Copy Data, Data Flow, and Databricks
Implement error handling and logging mechanisms
Q59. Explain clustering methods.
Clustering methods group similar data points together based on their characteristics.
Clustering is an unsupervised learning technique.
It is used to identify patterns and groupings in data.
Common clustering methods include k-means, hierarchical, and density-based clustering.
K-means clustering partitions data into k clusters based on distance from a centroid.
Hierarchical clustering creates a tree-like structure of nested clusters.
Density-based clustering identifies areas of hig...
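A minimal k-means (Lloyd's algorithm) sketch. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster until the centroids stop moving. Starting from k randomly chosen points is one simple initialisation; libraries typically use smarter schemes such as k-means++. The function name and points are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: assign to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]  # k of the points as starting centroids
    clusters: List[List[Point]] = []
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```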
Q60. Explain Linear and Logistic Regression
Linear regression is used for predicting continuous numerical values, while logistic regression is used for predicting binary categorical values.
Linear regression models the relationship between a dependent variable and one or more independent variables using a linear equation.
Logistic regression models the probability of a binary outcome using a logistic function.
Linear regression is used for tasks like predicting house prices based on features like area and number of rooms...
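A sketch of the two building blocks. It shows the closed-form least-squares fit for one feature, and the logistic (sigmoid) function that logistic regression applies to a linear score to get a probability. Fitting logistic regression's coefficients (usually by maximum likelihood) is not shown. The function names are illustrative.

```python
import math
from typing import Sequence, Tuple

def fit_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def logistic(z: float) -> float:
    """Sigmoid: maps a linear score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)   # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)

print(fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
print(logistic(0.0))                                  # 0.5
```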
Q61. Sort a nearly sorted array.
Sort a nearly sorted array (every element is at most k positions away from its sorted position) using a min heap
Create a min heap of size k+1
Insert first k+1 elements into min heap
For remaining elements, extract min and insert new element
Extract all remaining elements from min heap
Time complexity: O(nlogk)
Example: [6, 5, 3, 2, 8, 10, 9] with k = 3 becomes [2, 3, 5, 6, 8, 9, 10]
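The heap steps above with Python's `heapq`. `heappushpop` pushes the next element and pops the current minimum in one call, so the heap never holds more than k + 1 items. `sort_nearly_sorted` is an illustrative name.

```python
import heapq
from typing import List

def sort_nearly_sorted(arr: List[int], k: int) -> List[int]:
    """Sort an array where each element is at most k positions from its sorted position."""
    heap = arr[:k + 1]        # the smallest element must be among the first k + 1
    heapq.heapify(heap)
    result = []
    for x in arr[k + 1:]:
        result.append(heapq.heappushpop(heap, x))  # push next element, pop current minimum
    while heap:
        result.append(heapq.heappop(heap))         # drain the rest in order
    return result

print(sort_nearly_sorted([6, 5, 3, 2, 8, 10, 9], k=3))  # [2, 3, 5, 6, 8, 9, 10]
```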
Q62. What is random forest
Random forest is an ensemble learning method used for classification and regression tasks.
Random forest is a collection of decision trees that are trained on random subsets of the data.
Each tree in the random forest independently predicts the target variable; the final prediction is the majority vote of the trees for classification, or the average of their predictions for regression.
Random forest is less prone to overfitting than a single decision tree, is fairly robust to noisy data, and can handle large datasets with high dimensionality.
It is a popular machine learning al...
Q63. Rotate a two-dimensional array
Rotate a 2D array by 90 degrees clockwise or counterclockwise.
Transpose the matrix by swapping elements across the diagonal
Then reverse each row for a clockwise rotation, or reverse the order of the rows for a counterclockwise rotation
Example: [[1,2],[3,4]] rotated clockwise becomes [[3,1],[4,2]]
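Both rotations written with `zip(*...)`, which transposes a list of rows. These return new lists rather than rotating in place, and the function names are illustrative.

```python
from typing import List

def rotate_clockwise(matrix: List[List[int]]) -> List[List[int]]:
    """Reverse the row order, then transpose (same result as transpose + reverse each row)."""
    return [list(row) for row in zip(*matrix[::-1])]

def rotate_counterclockwise(matrix: List[List[int]]) -> List[List[int]]:
    """Transpose, then reverse the order of the rows."""
    return [list(row) for row in zip(*matrix)][::-1]

print(rotate_clockwise([[1, 2], [3, 4]]))         # [[3, 1], [4, 2]]
print(rotate_counterclockwise([[1, 2], [3, 4]]))  # [[2, 4], [1, 3]]
```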
Q64. Explain macros in Excel
Macros in Excel are automated sequences of commands that can be created to perform repetitive tasks.
Macros can be recorded or written using Visual Basic for Applications (VBA)
They can automate tasks such as formatting, data manipulation, and calculations
Macros can be assigned to buttons or keyboard shortcuts for easy access
They can save time and reduce errors in repetitive tasks
Q65. What is PCA? Explain the working.
PCA stands for Principal Component Analysis. It is a dimensionality reduction technique used to reduce the number of variables in a dataset while preserving the most important information.
PCA is used to transform high-dimensional data into a lower-dimensional space by finding the principal components that explain the maximum variance in the data.
The first principal component is the direction in which the data varies the most, followed by the second principal component, and so...
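A pure-Python sketch of how the first principal component is found: center the data, build the covariance matrix, then use power iteration to get its top eigenvector. Power iteration is swapped in here for brevity; libraries typically use SVD. It assumes the arbitrary start vector is not orthogonal to the top eigenvector. The function name and data are illustrative.

```python
import math
from typing import List, Sequence, Tuple

def first_principal_component(data: Sequence[Sequence[float]],
                              n_iter: int = 500) -> Tuple[List[float], float]:
    """Direction of maximum variance and the variance along it."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]      # step 1: center
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]                                               # step 2: covariance
    vec = [1.0] + [0.5] * (d - 1)                                           # arbitrary start vector
    for _ in range(n_iter):                                                 # step 3: power iteration
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:   # no variance at all: every direction is equivalent
            break
        vec = [v / norm for v in nxt]
    explained = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, explained

component, variance = first_principal_component([(1, 1), (2, 2), (3, 3), (4, 4)])
print(component, variance)
```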
Q66. Explain lasso in feature selection
Lasso is a feature selection technique that penalizes the absolute size of the regression coefficients.
Lasso stands for Least Absolute Shrinkage and Selection Operator
It adds a penalty term to the regression equation, forcing some coefficients to be exactly zero
Helps in selecting the most important features and reducing overfitting
Useful when dealing with high-dimensional data
Example: In a dataset with multiple features, lasso regression can be used to select the most relevan...
Q67. Experience in terraform Azure DevOps CI/CD Kubernetes Monitoring
I have extensive experience in Terraform, Azure DevOps CI/CD, Kubernetes, and monitoring tools.
Implemented infrastructure as code using Terraform to automate provisioning of resources
Set up CI/CD pipelines in Azure DevOps for automated deployment
Managed Kubernetes clusters for container orchestration
Utilized monitoring tools like Prometheus and Grafana for performance tracking
Q68. What are the different types of filters in Power BI?
Q69. Explain spark architecture
Spark architecture is a distributed computing framework that consists of a driver program, cluster manager, and worker nodes.
Consists of a driver program that manages the execution of tasks
Utilizes a cluster manager to allocate resources and schedule tasks
Worker nodes run executors, which execute the tasks and store data in memory or on disk
Supports fault tolerance through resilient distributed datasets (RDDs)
Q70. Difference between Ridge and Lasso regression
Ridge and Lasso regression are both regularization techniques used in linear regression to prevent overfitting.
Ridge regression adds a penalty proportional to the sum of the squared coefficients (L2), while Lasso regression adds a penalty proportional to the sum of the absolute values of the coefficients (L1).
Ridge regression shrinks the coefficients towards zero but never exactly to zero, while Lasso regression can shrink some coefficients to exactly zero, effectively performing feature selection.
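Written out, assuming the common unscaled least-squares objective (some texts multiply the loss by 1/2 or 1/n, which changes the constants below), the two penalties and their effect in the simple case of orthonormal features look like this. Here b_j denotes the ordinary least-squares estimate and λ ≥ 0 the regularization strength.

```latex
% Penalized least-squares objectives
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j} \beta_j^2
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j} \lvert \beta_j \rvert

% If X^T X = I, then \lVert y - X\beta \rVert_2^2 = \text{const} + \sum_j (\beta_j - b_j)^2,
% so each coordinate can be minimized separately:
\hat{\beta}_j^{\text{ridge}} = \frac{b_j}{1 + \lambda}
\qquad
\hat{\beta}_j^{\text{lasso}} = \operatorname{sign}(b_j)\,\max\!\left(\lvert b_j \rvert - \tfrac{\lambda}{2},\; 0\right)
```

Ridge divides every coefficient by the same factor, so none reaches zero; lasso subtracts a constant and clips at zero, which is why any coefficient with |b_j| ≤ λ/2 is removed.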
Q71. How will you design the data models
I will design the data models by analyzing the requirements, identifying entities and relationships, creating entity-relationship diagrams, and normalizing the data.
Analyze the requirements to understand the data needs
Identify entities and their relationships
Create entity-relationship diagrams to visualize the structure
Normalize the data to reduce redundancy and improve efficiency
Q72. How to select features?
Feature selection involves identifying the most relevant and informative variables for a predictive model.
Start with a large pool of potential features
Use statistical tests or machine learning algorithms to identify the most important features
Consider domain knowledge and expert input
Regularly re-evaluate and update feature selection as needed
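One of the statistical options mentioned above is a simple filter: rank features by the absolute Pearson correlation with the target and keep the top k. The sketch below assumes numeric features stored in a dict keyed by hypothetical column names. Correlation only captures linear relationships, so this is a starting point rather than a complete selection method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column to avoid dividing by zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_features(features, target, k):
    """Return the names of the k features most (absolutely) correlated with target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with target [2, 4, 6, 8, 10], a feature [1, 2, 3, 4, 5] and a feature [5, 4, 3, 2, 1] both score 1.0 (the second is perfectly negatively correlated), while an unrelated column scores near 0 and is dropped.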
Q73. How do you handle low performance
I address low performance by identifying root causes, providing feedback and support, setting clear expectations, and offering opportunities for improvement.
Identify the root causes of low performance through performance evaluations and feedback.
Provide constructive feedback and support to help the individual improve.
Set clear expectations and goals for performance improvement.
Offer training, resources, and opportunities for the individual to enhance their skills and knowledge.
Q74. Explain overfitting
Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern.
Overfitting happens when a model is too complex and captures noise in the training data.
It leads to poor generalization to new, unseen data.
Regularization techniques such as L1 or L2 penalties can help prevent overfitting.
Cross-validation can be used to detect overfitting and to choose a model complexity that generalizes well.
Example: A decision tree with too many branches that perfectly fits the training data but performs poorly on new data.
Q75. Architecture Diagram of Deployments in production
The architecture diagram of deployments in production showcases the flow of models from training to deployment.
The diagram typically includes components such as data storage, model training, model serving, and monitoring.
It shows how data flows through the system, how models are trained and tested, and how they are deployed for inference.
Common tools used in MLOps architecture include Kubernetes for orchestration, Docker for containerization, and CI/CD pipelines for automation.
Q76. t-test significance
A t-test is used to determine if there is a significant difference between the means of two groups.
T-test is a statistical test used to compare the means of two groups.
It calculates the t-value, which is then compared to a critical value to determine significance.
The t-value can also be converted to a p-value: the lower the p-value, the stronger the evidence that the two group means differ (a common threshold is p < 0.05).
For example, a t-test can be used to compare the average test scores of two different classes.
Make sure to check the test's assumptions, such as normality.
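A small sketch of the t-value calculation using only Python's standard library. It uses Welch's version of the two-sample t-test, which does not assume equal variances in the two groups; that choice, the function name, and the sample scores are assumptions for illustration. Converting t to a p-value needs the t-distribution CDF, which the standard library does not provide, so the sketch stops at the t statistic and its degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For scores [1, 2, 3, 4, 5] versus [3, 4, 5, 6, 7], this gives t = -2.0 with df = 8. The two-sided 5% critical value from a t-table for 8 degrees of freedom is about 2.306, so this difference would not be significant at that level.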
Q77. Difference between NSG and ASG
NSG stands for Network Security Group and is used to filter inbound and outbound network traffic to Azure resources. ASG stands for Application Security Group and is used to group virtual machines by application role so that NSG rules can target the group instead of individual IP addresses.
NSG controls traffic by setting rules for inbound and outbound traffic based on source and destination IP addresses, ports, and protocols.
An ASG does not filter traffic by itself; it is referenced as the source or destination in NSG rules, so the rules keep working as VMs are added to or removed from the group. (Spreading VMs across fault domains and update domains for high availability is the job of availability sets, not ASGs.)
Q78. Program to count the number of vowels in strings
Count the number of vowels in a given array of strings.
Iterate through each string in the array
For each string, iterate through each character and check if it is a vowel (a, e, i, o, u)
Increment a counter for each vowel found in the string
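A direct Python version of the steps above. It treats a, e, i, o, u as vowels in either case (the uppercase handling is an assumption; the question does not say) and returns one count per string, plus a helper for the overall total. The function names are hypothetical.

```python
VOWELS = set("aeiouAEIOU")


def count_vowels(strings):
    """Return a list with the number of vowels in each string."""
    return [sum(1 for ch in s if ch in VOWELS) for s in strings]


def total_vowels(strings):
    """Return the total number of vowels across all strings."""
    return sum(count_vowels(strings))
```

For ["apple", "Sky", "EDUCATION"] this gives [2, 0, 5]; "y" is not counted as a vowel.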
Q79. Decision Tree algorithm
Decision Tree algorithm is a supervised learning algorithm used for classification and regression tasks.
Decision Tree algorithm is based on a tree-like model of decisions and their possible consequences.
It uses a set of rules to split the data into branches and make predictions at the leaf nodes.
The algorithm selects the best attribute to split the data based on certain criteria like information gain or Gini index.
Decision Trees can handle both categorical and numerical data.
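A minimal sketch of the split-selection step described above, using the Gini index on a single numeric feature. Real implementations (CART, for example) repeat this over every feature and then recurse on each branch; the function names and the choice of midpoints between distinct values as candidate thresholds are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    n = len(values)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best
```

With values [1, 2, 3, 10, 11, 12] and labels a, a, a, b, b, b, the best split is at 6.5 with a weighted Gini of 0.0, because both branches end up pure.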
Q80. What is marginal costing?
Marginal costing is a costing technique where only variable costs are considered in determining the cost of a product or service.
Marginal costing helps in determining the contribution margin of a product, which is the difference between its selling price and variable costs.
Fixed costs are not included in the calculation under marginal costing.
It is useful for decision-making as it helps in analyzing the impact of changes in production volume on profitability.
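A short worked example of the marginal costing calculation described above. The figures (a selling price of 50, a variable cost of 30 per unit, and fixed costs of 10,000 for the period) and the function names are hypothetical, chosen only to show how contribution and profit respond to production volume.

```python
def contribution_margin(selling_price, variable_cost):
    """Contribution per unit = selling price - variable cost per unit."""
    return selling_price - variable_cost


def marginal_costing_profit(units, selling_price, variable_cost, fixed_costs):
    """Profit = total contribution - fixed costs (fixed costs are not assigned to units)."""
    return units * contribution_margin(selling_price, variable_cost) - fixed_costs
```

Each unit contributes 50 - 30 = 20. Selling 1,000 units gives a total contribution of 20,000 and a profit of 10,000; raising volume to 1,500 units raises profit to 20,000, because fixed costs stay the same.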
Q81. Java HashMap and how it works
Java hash map is a data structure that stores key-value pairs and uses hashing to efficiently retrieve values based on keys.
HashMap in Java implements the Map interface and allows one null key and any number of null values.
It uses hashing to store and retrieve key-value pairs, giving average-case O(1) time complexity for get() and put() operations.
Example: HashMap<String, Integer> map = new HashMap<>(); map.put("key", 1); int value = map.get("key");
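Java's HashMap source is more involved, so the following is only a simplified Python model of the idea it is built on: an array of buckets, a bucket index derived from the key's hash, separate chaining for collisions, and doubling the table when the size exceeds capacity × load factor. The defaults 16 and 0.75 mirror Java's documented defaults; everything else (the class and method names, modulo indexing, list-based chains) is an assumption for illustration. Java 8 and later also convert long chains into balanced trees, which this sketch omits.

```python
class ChainedHashMap:
    """Toy separate-chaining hash map (illustration only, not Java's actual code)."""

    def __init__(self, capacity=16, load_factor=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._load_factor = load_factor

    def _index(self, key):
        # Map the key's hash to a bucket position
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite the value
                return
        bucket.append((key, value))      # new key (or collision): chain it
        self._size += 1
        if self._size > self._load_factor * len(self._buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the bucket array and rehash every entry into it
        old_items = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in old_items:
            self._buckets[self._index(k)].append((k, v))

    def __len__(self):
        return self._size
```

Lookups stay fast on average because resizing keeps each chain short; in the worst case, when many keys land in one bucket, a lookup degrades to scanning that chain.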
Q82. What is the top-down approach?
The top-down approach is a problem-solving or design strategy that starts with the big-picture overview and breaks it down into smaller components.
Start with a broad overview of the problem or design
Break it down into smaller components or sub-problems
Address each component individually before integrating them back together
Commonly used in software development, project management, and system design
Q83. Explain any one ML model in depth
A machine learning model is a mathematical model that learns from data to make predictions or decisions without being explicitly programmed.
ML models can be classified into categories such as supervised learning, unsupervised learning, and reinforcement learning.
Examples of ML models include linear regression, decision trees, support vector machines, and neural networks.
ML models require training data to learn patterns and relationships, and testing data to evaluate their performance.
Q84. Difference between boosting and bagging
Boosting trains weak learners sequentially, with each new learner focusing on the errors of the previous ones, while bagging trains learners in parallel on bootstrap samples of the data.
Boosting combines multiple weak learners into a strong learner; AdaBoost, for example, gives more weight to misclassified instances in each iteration.
Bagging creates multiple subsets of the training data through bootstrapping, trains a model on each subset independently, and averages (or votes on) their predictions to reduce variance.
Examples: AdaBoost and Gradient Boosting for boosting; Random Forest for bagging.
Q85. Longest non-repeating substring
Find the longest substring without repeating characters
Use a sliding window approach to track the longest substring without repeating characters
Keep track of the characters seen so far and their positions in a hashmap
Update the start of the window when a repeating character is encountered
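The sliding-window steps above translate into a single pass over the string. This is a minimal sketch; the function name is hypothetical. The dictionary stores the last index at which each character was seen, and the window start only moves forward, so the whole scan is O(n).

```python
def longest_unique_substring(s):
    """Return the longest substring of s with no repeated characters."""
    last_seen = {}            # character -> most recent index
    start = 0                 # left edge of the current window
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        # A repeat inside the current window: move the start past its last occurrence
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]
```

For example, "abcabcbb" gives "abc" and "pwwkew" gives "wke". If only the length is needed, return best_len instead.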
Q86. SQL query for joining
SQL query for joining tables to retrieve data from multiple related tables.
Use JOIN keyword to combine rows from two or more tables based on a related column between them.
Types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Example: SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;
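To see INNER JOIN and LEFT JOIN side by side, the sketch below runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The customers and orders tables and their rows are hypothetical. SQLite only added RIGHT JOIN and FULL OUTER JOIN in version 3.39.0, so those are left out here.

```python
import sqlite3


def demo_joins():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """)
    # INNER JOIN: only customers that have at least one matching order
    inner = cur.execute("""
        SELECT c.name, o.amount
        FROM customers c
        INNER JOIN orders o ON c.id = o.customer_id
        ORDER BY o.id
    """).fetchall()
    # LEFT JOIN: every customer; COUNT(o.id) is 0 when no order matches
    left = cur.execute("""
        SELECT c.name, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON c.id = o.customer_id
        GROUP BY c.id
        ORDER BY c.id
    """).fetchall()
    conn.close()
    return inner, left
```

The inner join returns only Asha's and Ravi's orders, while the left join also keeps Meera with an order count of 0.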
Q87. Java program for OOP concepts
Java program showcasing OOP concepts like inheritance, encapsulation, polymorphism, and abstraction.
Create classes representing different entities with properties and methods
Use inheritance to create a parent-child class relationship
Demonstrate encapsulation by setting private variables and using getter and setter methods
Show polymorphism by overriding methods in child classes
Implement abstraction by creating abstract classes or interfaces
Q88. Writing XPath for any page
XPath is a query language used to locate elements on a web page based on their attributes and structure.
Identify unique attributes of the element you want to locate
Use the '//' operator to search for elements anywhere in the document
Use the //tag[@attribute='value'] syntax to match an element by an attribute value
Combine multiple conditions using 'and' or 'or' operators
Use functions like 'contains()' and 'starts-with()', along with the 'text()' node test, to refine your XPath
Q89. Expectations from Tiger Analytics
Tiger Analytics expects high-quality project delivery, effective communication, and proactive problem-solving from Program Managers.
Deliver high-quality projects on time and within budget
Maintain open and clear communication with stakeholders
Proactively identify and address project risks and issues
Collaborate effectively with cross-functional teams
Drive continuous improvement in project delivery processes
Q90. What is your expected CTC?
Expected CTC (cost to company) refers to the salary range that the candidate is looking for in the new position.
Research industry standards for Software Developer salaries
Consider your experience, skills, and location when determining expected CTC
Be prepared to negotiate based on the job responsibilities and benefits package
Provide a range rather than a specific number to allow for flexibility
Q91. Star vs snowflake schema
Star schema is a denormalized schema with a single central fact table surrounded by dimension tables. Snowflake schema is a normalized schema with multiple interconnected dimension tables.
Star schema is easier to understand and query due to denormalization.
Snowflake schema saves storage space by normalizing data.
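Star schema is usually preferred for reporting queries because it needs fewer joins, while snowflake schema reduces redundancy in dimension tables at the cost of more joins.
Example: A star schema for a sales database would have a sales fact table at its center, linked directly to each dimension table.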
Q92. Explain cloud computing
Cloud computing is the delivery of computing services over the internet, including storage, servers, databases, networking, software, and more.
Cloud computing allows users to access resources on-demand without the need for physical infrastructure
Examples of cloud computing services include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform
Cloud computing offers scalability, flexibility, cost-effectiveness, and increased collaboration
Q93. Python code to sort elements
Python code to sort elements in an array of strings
Use the sorted() function to sort the elements in the array
Specify the key parameter if you want to sort based on a specific criterion (for example, key=len)
Use the reverse parameter to sort in descending order if needed
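The options above look like this in practice. The list of fruit names is just sample data. Python's sort is stable, so items with equal keys keep their original relative order, including when reverse=True.

```python
fruits = ["banana", "Cherry", "apple", "fig"]

# Default: compares by Unicode code point, so uppercase letters sort before lowercase
alphabetical = sorted(fruits)

# key=str.lower compares the strings case-insensitively
case_insensitive = sorted(fruits, key=str.lower)

# key=len sorts by length; reverse=True makes it longest first
longest_first = sorted(fruits, key=len, reverse=True)

# list.sort() sorts the list in place (and returns None) instead of creating a new list
fruits.sort()
```

Here alphabetical is ['Cherry', 'apple', 'banana', 'fig'] because "C" sorts before lowercase letters, whereas case_insensitive is ['apple', 'banana', 'Cherry', 'fig'].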
Q94. Reduce the cost of Tableau access
Reduce Tableau access costs by optimizing licenses, utilizing server resources efficiently, and training users on best practices.
Optimize Tableau licenses by identifying and removing unused licenses or downgrading to a lower tier if possible
Utilize Tableau Server resources efficiently by scheduling extracts during off-peak hours and optimizing server performance settings
Train users on best practices to reduce unnecessary usage of Tableau features and improve efficiency
Q95. Design a data model
Design a data model for a customer relationship management system
Identify entities such as customers, products, orders, and sales representatives
Establish relationships between entities (e.g. a customer can place multiple orders)
Define attributes for each entity (e.g. customer name, product price)
Consider normalization to reduce redundancy and improve data integrity
Q96. Sum of two numbers in Java
The question is asking for a Java program that calculates the sum of two numbers.
Create two variables to store the numbers to be added.
Use the + operator to add the two numbers together.
Print or return the result of the addition.