Use SQL query with window function to rank members by transaction amount in each city.
Use SQL query with PARTITION BY clause to group members by city
Use ORDER BY clause to rank members by transaction amount
Filter on rank 2 to select the member with the second-highest transaction amount in each city
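The steps above can be sketched with Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical, and the sketch assumes the bundled SQLite is version 3.25 or newer (needed for window functions). DENSE_RANK is one reasonable choice here; ROW_NUMBER or RANK would treat ties differently.

```python
import sqlite3

# Hypothetical schema and sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (member TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Asha", "Delhi", 500.0),
        ("Ravi", "Delhi", 300.0),
        ("Meera", "Delhi", 100.0),
        ("John", "Mumbai", 900.0),
        ("Sara", "Mumbai", 700.0),
    ],
)

# PARTITION BY restarts the ranking for every city;
# ORDER BY amount DESC gives rank 1 to the largest amount.
query = """
SELECT member, city, amount
FROM (
    SELECT member, city, amount,
           DENSE_RANK() OVER (PARTITION BY city ORDER BY amount DESC) AS rnk
    FROM transactions
) AS ranked
WHERE rnk = 2
ORDER BY city
"""
second_highest = conn.execute(query).fetchall()
print(second_highest)
```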
Broadcast Variables are read-only shared variables that are cached on each machine in a cluster for efficient data distribution.
Broadcast Variables are used to efficiently distribute large read-only datasets to all nodes in a Spark cluster.
They are useful for tasks like joining a small lookup table with a large dataset.
Broadcast variables are cached in memory on each machine to avoid unnecessary data shuffling during c...
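The lookup-table join mentioned above can be illustrated in plain Python. This is only a sketch of the idea, not the Spark API: the nested lists stand in for partitions on different machines, and the data and function name are hypothetical.

```python
from types import MappingProxyType

# Small lookup table (hypothetical data). The read-only view mimics the
# fact that a broadcast value must not be modified by tasks.
country_names = MappingProxyType({"IN": "India", "US": "United States"})

# The large dataset, split into partitions as it would be across executors.
partitions = [
    [("order-1", "IN"), ("order-2", "US")],
    [("order-3", "IN"), ("order-4", "FR")],
]

def join_partition(rows, lookup):
    """Join each row against the local copy of the lookup table (no shuffle)."""
    return [(order_id, lookup.get(code, "unknown")) for order_id, code in rows]

# Every partition is processed independently using the same shared lookup.
joined = [row for part in partitions for row in join_partition(part, country_names)]
print(joined)
```

In Spark itself the lookup would be shipped to each executor once, and each task would read its local copy instead of moving the large dataset across the network.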
Using self join to analyze customer behavior in an e-commerce platform.
Identifying patterns in customer purchase history
Analyzing customer preferences based on past purchases
Segmenting customers based on their buying behavior
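One way a self join can surface purchase patterns is by pairing products bought by the same customer. The sketch below uses sqlite3 with a hypothetical purchases table and made-up rows; it is an illustration of the technique, not the interviewee's actual answer.

```python
import sqlite3

# Hypothetical purchase history, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "laptop"), (1, "mouse"),
        (2, "laptop"), (2, "mouse"),
        (3, "laptop"), (3, "keyboard"),
    ],
)

# The table is joined to itself on customer_id. The condition
# a.product < b.product keeps each unordered pair once and drops self-pairs.
query = """
SELECT a.product, b.product, COUNT(*) AS customers
FROM purchases AS a
JOIN purchases AS b
  ON a.customer_id = b.customer_id AND a.product < b.product
GROUP BY a.product, b.product
ORDER BY customers DESC, a.product, b.product
"""
product_pairs = conn.execute(query).fetchall()
print(product_pairs)
```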
Normalization in SQL is the process of organizing data in a database to reduce redundancy and improve data integrity.
1NF (First Normal Form) - Each column in a table must contain atomic values, and there should be no repeating groups.
2NF (Second Normal Form) - Table should be in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF (Third Normal Form) - Table should be in 2NF and there sh...
Alter is used to modify the structure of a table, while update is used to modify the data in a table.
Alter is used to add, remove, or modify columns in a table.
Update is used to change the values of existing records in a table.
Alter can change the structure of a table, such as adding a new column or changing the data type of a column.
Update is used to modify the data in a table, such as changing the value of a specific...
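The difference can be seen in a short sqlite3 sketch. The employees table and its values are hypothetical. ALTER changes the column list, while UPDATE changes the stored values.

```python
import sqlite3

# Hypothetical table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Asha')")

# ALTER modifies the structure: a new column appears in the schema.
conn.execute("ALTER TABLE employees ADD COLUMN salary REAL")
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]

# UPDATE modifies the data: an existing row gets a new value.
conn.execute("UPDATE employees SET salary = 50000 WHERE id = 1")
rows = conn.execute("SELECT id, name, salary FROM employees").fetchall()

print(columns)
print(rows)
```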
Map applies a function to each element in a collection and returns a new collection. Flatmap applies a function that returns a collection to each element and flattens the result.
Map transforms each element in a collection using a function and returns a new collection.
Flatmap applies a function that returns a collection to each element and flattens the result into a single collection.
Map does not flatten nested collecti...
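A plain-Python sketch of the distinction follows. Python has no built-in flatMap, so itertools.chain.from_iterable is used here as a stand-in for the flattening step; the sample lines are made up.

```python
from itertools import chain

lines = ["hello world", "map and flatmap"]

# map: one output element per input element, so the result stays nested.
mapped = list(map(str.split, lines))

# flatMap equivalent: apply the same function, then flatten one level.
flat_mapped = list(chain.from_iterable(map(str.split, lines)))

print(mapped)
print(flat_mapped)
```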
A generator function is a function that can pause and resume its execution, allowing it to yield multiple values over time.
Generator functions are defined using the 'function*' syntax in JavaScript.
They use the 'yield' keyword to return values one at a time.
Generators can be iterated over using a 'for...of' loop.
They are useful for generating sequences of values lazily, improving memory efficiency.
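The answer above describes JavaScript syntax. As a rough Python analogue, a function containing yield behaves the same way: it pauses at each yield and resumes when the next value is requested. The countdown function below is a hypothetical example.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing after each value."""
    while start > 0:
        yield start   # execution pauses here until the next value is requested
        start -= 1

gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes and drains the remaining values

print(first, rest)
```

No list of values is built up front; each value is produced only when the consumer asks for it.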
Use a left join as a computationally efficient way to find customer names from the customer profile and transaction tables.
Use a left join to combine the customer profile and transaction tables on customer id
A left join includes all customers from the profile table, even those without transactions
A correlated subquery may be less efficient because it can be re-evaluated for each row of the outer query, although many query optimizers rewrite it as a join
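A minimal sqlite3 sketch of the left join described above. The customers and transactions tables, their columns, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical profile and transaction tables, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO transactions VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# LEFT JOIN keeps every customer. COUNT(t.id) ignores the NULLs produced
# for customers with no matching transaction, so they get a count of 0.
query = """
SELECT c.name, COUNT(t.id) AS n_transactions
FROM customers AS c
LEFT JOIN transactions AS t ON t.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
"""
customer_counts = conn.execute(query).fetchall()
print(customer_counts)
```

If only customers with at least one transaction are wanted, an inner join would be the more direct choice.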
AWS Lambda is a serverless compute service that runs code in response to events and automatically manages the computing resources required.
Lambda functions are event-driven and can be triggered by various AWS services such as S3, DynamoDB, API Gateway, etc.
They are written in languages like Python, Node.js, Java, etc.
Lambda functions are scalable and cost-effective as you only pay for the compute time you consum...
List comprehension is a concise way to create lists in Python by applying an expression to each item in an iterable.
Syntax: [expression for item in iterable]
Can include conditionals: [expression for item in iterable if condition]
Example: squares = [x**2 for x in range(10)]
I applied via LinkedIn and was interviewed in Aug 2023. There were 2 interview rounds.
Subqueries are nested queries that provide intermediate results for the main query.
Subqueries can be used in SELECT, INSERT, UPDATE, or DELETE statements.
A non-correlated subquery is executed once, and its result is used as input to the outer query.
Example: SELECT * FROM employees WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York');
Subqueries can be correlated, meaning they refer...
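The two kinds of subquery can be compared in a small sqlite3 sketch. The first query is the example from the answer above; the second, correlated query and all table contents are hypothetical additions for illustration.

```python
import sqlite3

# Hypothetical tables and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL);
INSERT INTO departments VALUES (1, 'New York'), (2, 'London');
INSERT INTO employees VALUES ('Asha', 1, 100), ('Ravi', 1, 80), ('Meera', 2, 90);
""")

# Non-correlated: the inner query does not depend on the outer row.
in_new_york = conn.execute("""
SELECT name FROM employees
WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York')
ORDER BY name
""").fetchall()

# Correlated: the inner query references e.department_id from the outer row.
above_dept_average = conn.execute("""
SELECT e.name FROM employees AS e
WHERE e.salary > (SELECT AVG(e2.salary) FROM employees AS e2
                  WHERE e2.department_id = e.department_id)
ORDER BY e.name
""").fetchall()

print(in_new_york)
print(above_dept_average)
```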
CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. It is different from a Stored Procedure as it is only available for the duration of the query.
CTE stands for Common Table Expression and is defined using the WITH keyword.
CTEs are mainly used for recursive queries, complex joins, and simplifying complex queries.
CTEs are not stored in the database like Stored Proce...
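A short sqlite3 sketch of a recursive CTE defined with the WITH keyword. The CTE name counter and its bounds are hypothetical; the point is that the result set exists only while this one statement runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CTE "counter" starts at 1 and keeps adding rows until n reaches 5.
# It is not stored in the database and cannot be referenced by later statements.
query = """
WITH RECURSIVE counter(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 5
)
SELECT n FROM counter
"""
numbers = [row[0] for row in conn.execute(query)]
print(numbers)
```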
Transformations in PySpark are lazily evaluated, while actions trigger the execution of those transformations.
Transformations are operations that are not executed immediately but create a plan for execution.
Actions are operations that trigger the execution of transformations and return results.
Examples of transformations include map, filter, and reduceByKey.
Examples of actions include collect, count, and saveAsTextFile.
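Lazy evaluation can be illustrated without Spark, since Python's built-in map and filter are also lazy. This is only an analogy under that assumption, not PySpark code: building the pipeline plays the role of the transformations, and list() plays the role of an action. The helper double and the calls list are hypothetical.

```python
calls = []

def double(x):
    calls.append(x)   # record when the function actually runs
    return x * 2

# "Transformations": only a plan is built here, nothing is computed yet.
pipeline = filter(lambda v: v > 2, map(double, [1, 2, 3]))
calls_before_action = list(calls)

# "Action": consuming the pipeline forces the computation.
result = list(pipeline)

print(calls_before_action, result, calls)
```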
I applied via Campus Placement
I appeared for an interview in Oct 2016.
I cannot provide investment advice, but here are five companies that have shown strong financial performance in recent years.
Apple - consistently high revenue and profit margins
Amazon - dominant player in e-commerce and cloud computing
Microsoft - strong growth in cloud computing and enterprise software
Alphabet (Google) - diversified revenue streams and strong advertising business
Visa - dominant player in the payments i...
The Brexit vote could have both positive and negative effects on the Indian economy.
Positive effects: Increased trade opportunities with the UK, potential for attracting foreign investments from companies relocating from the UK.
Negative effects: Uncertainty in global markets leading to volatility in exchange rates, potential decline in exports to the UK.
Example: Indian IT companies may face challenges due to stricter i...
I applied via Walk-in and was interviewed before May 2021. There were 2 interview rounds.
I have worked on various databases including MySQL, Oracle, and MongoDB.
MySQL
Oracle
MongoDB
I have 3 years of experience working as a Data Analyst in the finance industry.
Utilized SQL to extract and analyze data from databases
Created visualizations using Tableau to present findings to stakeholders
Performed predictive modeling using Python to forecast financial trends
I applied via Approached by Company and was interviewed in Jan 2023. There were 3 interview rounds.
Three Data model questions will be given to solve within 24 hours.
A case study on the number of green T-shirts sold in the US.
Identify the target audience for green T-shirts
Analyze the market demand for green T-shirts
Study the sales data of green T-shirts in the US
Identify the popular brands and styles of green T-shirts
Analyze the impact of seasonality on sales
Consider the pricing strategy of green T-shirts
Identify potential marketing opportunities to increase sales
I applied via Campus Placement and was interviewed before Apr 2023. There were 2 interview rounds.
ANOVA is used to compare means across multiple groups to determine if at least one group differs significantly.
ANOVA is used when comparing three or more groups to see if their means are statistically different.
Example: Testing the effectiveness of three different diets on weight loss.
It helps in understanding the impact of categorical independent variables on a continuous dependent variable.
Example: Analyzing test sco...
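The one-way ANOVA F statistic for the three-diet example can be computed by hand as a sketch. The weight-loss figures are made up for illustration. Turning F into a p-value would require the F distribution (for example from a statistics library), which is left out here.

```python
from statistics import fmean

# Hypothetical weight loss (kg) for three diets, invented for illustration.
groups = {
    "diet_a": [2.0, 3.0, 4.0],
    "diet_b": [4.0, 5.0, 6.0],
    "diet_c": [6.0, 7.0, 8.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = fmean(all_values)

# Between-group variation: how far each group mean is from the grand mean.
ss_between = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
# Within-group variation: how far each observation is from its own group mean.
ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares; a large F suggests the means differ.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)
```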
Choosing an aggregation method for an index depends on the components and their context, ensuring relevance and accuracy.
Identify the components: Understand what individual metrics or data points will be included in the index.
Determine the aggregation method: Common methods include sum, average, weighted average, or geometric mean.
Consider the context: The importance of each component may vary based on the specific app...
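The aggregation methods listed above can give noticeably different index values for the same components. The sketch below compares three of them; the component names, scores, and weights are hypothetical.

```python
from statistics import fmean, geometric_mean

# Hypothetical component scores and weights, invented for illustration.
components = {"quality": 4.0, "reach": 9.0}
weights = {"quality": 0.25, "reach": 0.75}   # assumed to sum to 1

# Simple average: every component counts equally.
simple_average = fmean(components.values())

# Weighted average: components contribute in proportion to their weight.
weighted_average = sum(components[k] * weights[k] for k in components)

# Geometric mean: penalizes a low score on any single component more heavily.
geo_mean = geometric_mean(components.values())

print(simple_average, weighted_average, geo_mean)
```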
Question:
Suppose you are trying to detect whether a particular credit card transaction is fraudulent. The individual to whom the card belongs has a very healthy credit score. All bills were paid on time, and the average transaction amount was not that high ($800). The individual has not been out of the country in the last couple of decades. Here is a list of transactions:
1) Gold jewellery worth $5,000
2) Groceries worth $35
3) Second-hand car worth $8,000
4) Burgers worth $10
Which transaction looks fraudulent to you?
There is no specific answer. They just want to see how you think through the problem. One can make use of historical data to estimate the probability of each of these transactions being fraudulent. Econometrically, one can develop a binary logit model. That would involve identifying features of individuals like the one described above and using those features to estimate the probability that a transaction is fraudulent.
The model also needs to include external features, not only individual-specific ones. For example, the first transaction might not be as suspicious as it looks: in heavily regulated markets, the risk associated with reselling the gold or exchanging it for money might be high enough to discourage a fraudster from buying gold. Thus regulation might also be a valid feature, distinct from features describing an individual's characteristics.
Of course, problems of overfitting would arise if an excessive number of features is used. Various means of finding the optimal degrees of freedom can be employed.
One can also do better with more complicated decision algorithms that involve machine learning models.
Eventually one needs to decide at what probability threshold a transaction will be declared fraudulent.
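The logit-plus-threshold reasoning can be sketched as follows. The intercept, coefficient, and threshold are made-up numbers chosen purely for illustration, not estimates from any data, and the single feature (amount relative to the $800 average) is a simplifying assumption; a real model would be fitted on labelled transactions and would use many more features.

```python
import math

AVERAGE_AMOUNT = 800.0      # average transaction amount given in the question
INTERCEPT = -4.0            # hypothetical coefficient, not estimated from data
AMOUNT_RATIO_COEF = 0.5     # hypothetical coefficient, not estimated from data
THRESHOLD = 0.5             # hypothetical decision threshold

transactions = {
    "Gold jewellery": 5000.0,
    "Groceries": 35.0,
    "Second-hand car": 8000.0,
    "Burgers": 10.0,
}

def fraud_probability(amount):
    """Binary logit: map a linear score to a probability with the logistic function."""
    logit = INTERCEPT + AMOUNT_RATIO_COEF * (amount / AVERAGE_AMOUNT)
    return 1.0 / (1.0 + math.exp(-logit))

probabilities = {name: fraud_probability(a) for name, a in transactions.items()}

# A transaction is declared fraudulent only if its probability reaches the threshold.
flagged = [name for name, p in probabilities.items() if p >= THRESHOLD]
print(flagged)
```

Lowering the threshold would flag more transactions at the cost of more false alarms, which is the trade-off the last step of the answer refers to.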
A business analyst is a professional who analyzes an organization's business domain and documents its business processes or systems.
Analyzes business processes to identify areas for improvement
Works with stakeholders to gather requirements for new systems or processes
Creates documentation such as business requirements documents and process maps
Helps to bridge the gap between business stakeholders and technical teams
Use...
Some of the top questions asked at the TransOrg Analytics Data Engineer interview -
Data Analyst: 47 salaries, ₹6 L/yr - ₹20 L/yr
Analyst: 40 salaries, ₹7 L/yr - ₹14 L/yr
Data Scientist: 25 salaries, ₹7 L/yr - ₹16 L/yr
Analytics Specialist: 24 salaries, ₹8 L/yr - ₹21 L/yr
Data Science Analyst: 12 salaries, ₹8 L/yr - ₹12 L/yr