Use a SQL window function to rank members by transaction amount within each city.
Use the PARTITION BY clause so the ranking restarts for each city
Use the ORDER BY clause (descending) inside the window to rank members by transaction amount
Select the member ranked second in each city
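The ranking steps above can be sketched as follows. This is a minimal sketch that assumes a hypothetical `transactions(member, city, amount)` table with one row per transaction, so amounts are first summed per member. It uses SQLite through Python's `sqlite3` module, which needs SQLite 3.25 or newer for window functions. `DENSE_RANK` is one choice among several: tied totals share a rank, whereas `ROW_NUMBER` would always pick exactly one member per city.

```python
import sqlite3

def second_highest_per_city(rows):
    """Return (city, member, total) for the member ranked 2nd by total amount in each city."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per transaction.
    conn.execute("CREATE TABLE transactions (member TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    query = """
        WITH totals AS (                      -- total transaction amount per member and city
            SELECT city, member, SUM(amount) AS total
            FROM transactions
            GROUP BY city, member
        ),
        ranked AS (                           -- the rank restarts for every city
            SELECT city, member, total,
                   DENSE_RANK() OVER (PARTITION BY city ORDER BY total DESC) AS rnk
            FROM totals
        )
        SELECT city, member, total
        FROM ranked
        WHERE rnk = 2
        ORDER BY city
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("a", "Delhi", 100), ("b", "Delhi", 300), ("c", "Delhi", 200), ("a", "Delhi", 50),
    ("d", "Pune", 500), ("e", "Pune", 400),
]
print(second_highest_per_city(sample))  # [('Delhi', 'c', 200.0), ('Pune', 'e', 400.0)]
```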
Broadcast variables are read-only shared variables that Spark caches on each machine in a cluster, instead of shipping a copy with every task.
They are used to efficiently distribute read-only data that fits in memory on each node to all nodes in a Spark cluster.
They are useful for tasks like joining a small lookup table with a large dataset.
Broadcast variables are cached in memory on each machine to avoid unnecessary data shuffling during c...
A self join (joining a table to itself) can be used to analyze customer behavior on an e-commerce platform, for example by:
Identifying patterns in customer purchase history
Analyzing customer preferences based on past purchases
Segmenting customers based on their buying behavior
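As one concrete instance of identifying patterns in purchase history, a self join can list pairs of different products bought by the same customer. This is a minimal sketch; the `orders(customer_id, product)` table and its data are hypothetical, and SQLite stands in for whatever database the platform actually uses.

```python
import sqlite3

def products_bought_together(orders):
    """Self join: pair up different products bought by the same customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    # The same table is aliased twice (a, b) and joined on customer_id.
    # a.product < b.product keeps each unordered pair once and drops self-pairs;
    # DISTINCT removes repeats when a customer bought the same product twice.
    query = """
        SELECT DISTINCT a.customer_id, a.product, b.product
        FROM orders AS a
        JOIN orders AS b
          ON a.customer_id = b.customer_id
         AND a.product < b.product
        ORDER BY a.customer_id, a.product, b.product
    """
    pairs = conn.execute(query).fetchall()
    conn.close()
    return pairs

print(products_bought_together([(1, "ink"), (1, "pen"), (1, "paper"), (2, "pen")]))
```

Counting how often each pair occurs across customers would be a natural next step for segmentation.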
Normalization in SQL is the process of organizing data in a database to reduce redundancy and improve data integrity.
1NF (First Normal Form) - Each column in a table must contain atomic values, and there should be no repeating groups.
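2NF (Second Normal Form) - Table should be in 1NF and every non-key attribute must be fully functionally dependent on the whole primary key (no partial dependencies).
3NF (Third Normal Form) - Table should be in 2NF and there should be no transitive dependencies, i.e., non-key attributes must not depend on other non-key attributes.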
ALTER is used to modify the structure of a table, while UPDATE is used to modify the data in a table.
ALTER (a DDL command) is used to add, remove, or modify columns in a table.
UPDATE (a DML command) is used to change the values of existing records in a table.
ALTER can change the structure of a table, such as adding a new column or changing the data type of a column.
UPDATE modifies the data in a table, such as changing the value of a specific column in the rows that match a WHERE condition.
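The contrast can be seen in a small sketch using SQLite via Python's `sqlite3` module, with a hypothetical `customers` table. Note that SQLite supports only a limited subset of ALTER TABLE (for example ADD COLUMN and RENAME); changing a column's data type is not supported there and uses different syntax in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

# ALTER changes the table's structure: add a new column (existing rows get NULL).
conn.execute("ALTER TABLE customers ADD COLUMN city TEXT")

# UPDATE changes the data: set the new column only for rows matching the WHERE clause.
conn.execute("UPDATE customers SET city = 'Delhi' WHERE id = 1")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()
print(columns)  # ['id', 'name', 'city']
print(rows)     # [(1, 'Asha', 'Delhi'), (2, 'Ravi', None)]
```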
Map applies a function to each element in a collection and returns a new collection. flatMap applies a function that returns a collection to each element and flattens the result.
Map transforms each element in a collection using a function and returns a new collection with one output per input element.
flatMap applies a function that returns a collection to each element and flattens the results into a single collection.
Map does not flatten nested collections, so a function that returns a list produces a collection of lists.
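The difference can be shown in plain Python. Python has no built-in `flatMap`, so this sketch uses `itertools.chain.from_iterable` over `map` as a common equivalent; the PySpark RDD methods `map` and `flatMap` behave analogously on distributed data.

```python
from itertools import chain

sentences = ["hello world", "map vs flatmap"]

# map: one output element per input element, so the result is a list of lists.
mapped = list(map(str.split, sentences))

# flatMap equivalent: apply the same function, then flatten one level into a single list.
flat_mapped = list(chain.from_iterable(map(str.split, sentences)))

print(mapped)       # [['hello', 'world'], ['map', 'vs', 'flatmap']]
print(flat_mapped)  # ['hello', 'world', 'map', 'vs', 'flatmap']
```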
A generator function is a function that can pause and resume its execution, allowing it to yield multiple values over time.
Generator functions are defined using the 'function*' syntax in JavaScript.
They use the 'yield' keyword to produce values one at a time, pausing until the next value is requested.
Generators can be iterated over using a 'for...of' loop.
They are useful for generating sequences of values lazily, improving memory efficiency.
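The points above describe JavaScript's `function*`. Python has the same pause-and-resume idea: any `def` containing `yield` is a generator function, and a `for` loop plays the role of `for...of`. The sketch below is in Python rather than JavaScript, and the function names are illustrative.

```python
from itertools import islice

def countdown(n):
    """Yield n, n-1, ..., 1 lazily; execution pauses at each yield."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3 (runs until the first yield, then pauses)
print(list(gen))  # [2, 1] (resumes from where it paused)

for value in countdown(2):  # the counterpart of JavaScript's for...of
    print(value)

def squares():
    """An infinite sequence is fine because values are produced only on demand."""
    i = 0
    while True:
        yield i * i
        i += 1

print(list(islice(squares(), 5)))  # [0, 1, 4, 9, 16]
```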
A LEFT JOIN is often a computationally efficient way to find customer names from the customer profile and transaction tables.
Use a LEFT JOIN to combine the customer profile and transaction tables on customer id
A LEFT JOIN keeps every customer from the profile table, even those without transactions
A correlated subquery may be less efficient, as it can be executed once for each row of the outer query
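A small sketch of both approaches, using SQLite via Python's `sqlite3` module. The `customer_profile` and `transactions` tables, their columns, and the data are hypothetical. Whether the correlated subquery is actually slower depends on the database's query optimizer, so this only illustrates the shape of each query, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_profile (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customer_profile VALUES (?, ?)", [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 1, 75.0), (12, 3, 40.0)])

# LEFT JOIN: every profile row is kept; customers without transactions get NULL amounts.
left_join_rows = conn.execute("""
    SELECT p.name, t.amount
    FROM customer_profile AS p
    LEFT JOIN transactions AS t ON p.customer_id = t.customer_id
    ORDER BY p.customer_id, t.txn_id
""").fetchall()

# Correlated subquery alternative: the inner SELECT references p.customer_id,
# so it is logically evaluated once per outer row.
subquery_rows = conn.execute("""
    SELECT p.name,
           (SELECT SUM(t.amount) FROM transactions AS t
            WHERE t.customer_id = p.customer_id) AS total
    FROM customer_profile AS p
    ORDER BY p.customer_id
""").fetchall()

print(left_join_rows)  # [('Asha', 250.0), ('Asha', 75.0), ('Ravi', None), ('Meera', 40.0)]
print(subquery_rows)   # [('Asha', 325.0), ('Ravi', None), ('Meera', 40.0)]
```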
AWS Lambda is a serverless computing service that runs code in response to events and automatically manages the required compute resources.
Lambda functions are event-driven and can be triggered by AWS services such as S3, DynamoDB, and API Gateway.
They can be written in languages such as Python, Node.js, and Java.
Lambda functions scale automatically and are cost-effective because you pay only for the compute time you consume.
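For the Python runtime, Lambda invokes a handler function with two arguments: the triggering event and a runtime context object. The sketch below assumes an API Gateway proxy-style event and response shape; the handler name `lambda_handler` and the `name` query parameter are illustrative, since the handler name is whatever the function is configured to use.

```python
import json

def lambda_handler(event, context):
    """Minimal handler sketch for an API Gateway proxy-style event.

    Lambda passes the triggering event (a dict) and a runtime context object;
    this sketch only reads the event.
    """
    # queryStringParameters may be null in proxy events when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

if __name__ == "__main__":
    # Local smoke test with a hand-built event; context is unused here, so None is passed.
    print(lambda_handler({"queryStringParameters": {"name": "Lambda"}}, None))
```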
List comprehension is a concise way to create lists in Python by applying an expression to each item in an iterable.
Syntax: [expression for item in iterable]
Can include conditionals: [expression for item in iterable if condition]
Example: squares = [x**2 for x in range(10)]
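A short sketch of the conditional form next to the loop it replaces, plus the related if/else expression, which goes before the `for` clause rather than after it.

```python
# Conditional filter: keep only even numbers, then square them.
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# The equivalent explicit loop, for comparison.
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x**2)

# A conditional expression (if/else) transforms every item instead of filtering.
labels = ["even" if x % 2 == 0 else "odd" for x in range(4)]

print(even_squares)  # [0, 4, 16, 36, 64]
print(labels)        # ['even', 'odd', 'even', 'odd']
```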
I applied via LinkedIn and was interviewed in Aug 2023. There were 2 interview rounds.
Subqueries are nested queries that provide intermediate results to the main (outer) query.
Subqueries can be used in SELECT, INSERT, UPDATE, or DELETE statements.
A non-correlated subquery is executed once, and its result is used as input to the outer query.
Example: SELECT * FROM employees WHERE department_id IN (SELECT id FROM departments WHERE location = 'New York');
Subqueries can be correlated, meaning they refer to columns of the outer query and are logically evaluated once per outer row.
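A correlated subquery in action: finding employees paid above their own department's average. This sketch uses SQLite via Python's `sqlite3`; the `employees(name, department_id, salary)` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("A", 1, 50000), ("B", 1, 70000), ("C", 2, 40000), ("D", 2, 60000), ("E", 2, 80000)],
)

# Correlated: the inner query refers to e.department_id from the outer row,
# so it is logically re-evaluated for each employee.
above_avg = conn.execute("""
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (
        SELECT AVG(e2.salary) FROM employees AS e2
        WHERE e2.department_id = e.department_id
    )
    ORDER BY e.name
""").fetchall()
print(above_avg)  # [('B',), ('E',)]
```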
CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. It is different from a Stored Procedure as it is only available for the duration of the query.
CTE stands for Common Table Expression and is defined using the WITH keyword.
CTEs are mainly used for recursive queries, complex joins, and simplifying complex queries.
CTEs are not stored in the database like Stored Procedures.
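A recursive CTE is the case where WITH is hardest to replace with a plain subquery. This sketch walks a hypothetical `staff(id, name, manager_id)` reporting chain in SQLite via Python's `sqlite3`; the CTE named `chain` exists only while this one statement runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "VP", 1), (3, "Manager", 2), (4, "Engineer", 3), (5, "Analyst", 2)],
)

# Anchor member: the root (no manager). Recursive member: repeatedly add
# employees whose manager is already in the result, one level deeper each time.
hierarchy = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        SELECT s.id, s.name, c.depth + 1
        FROM staff AS s
        JOIN chain AS c ON s.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
""").fetchall()
print(hierarchy)  # [('CEO', 0), ('VP', 1), ('Manager', 2), ('Analyst', 2), ('Engineer', 3)]
```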
In PySpark, transformations are lazily evaluated, while actions trigger the execution of those transformations.
Transformations are operations that are not executed immediately but create a plan for execution.
Actions are operations that trigger the execution of transformations and return results.
Examples of transformations include map, filter, and reduceByKey.
Examples of actions include collect, count, and saveAsTextFile.
I applied via Campus Placement
I appeared for an interview in Oct 2016.
I cannot provide investment advice, but here are five companies that have shown strong financial performance in recent years.
Apple - consistently high revenue and profit margins
Amazon - dominant player in e-commerce and cloud computing
Microsoft - strong growth in cloud computing and enterprise software
Alphabet (Google) - diversified revenue streams and strong advertising business
Visa - dominant player in the payments industry
The Brexit vote could have both positive and negative effects on the Indian economy.
Positive effects: Increased trade opportunities with the UK, potential for attracting foreign investments from companies relocating from the UK.
Negative effects: Uncertainty in global markets leading to volatility in exchange rates, potential decline in exports to the UK.
Example: Indian IT companies may face challenges due to stricter i...
I applied via Walk-in and was interviewed before May 2021. There were 2 interview rounds.
A business analyst is a professional who analyzes an organization's business domain and documents its business processes or systems.
Analyzes business processes to identify areas for improvement
Works with stakeholders to gather requirements for new systems or processes
Creates documentation such as business requirements documents and process maps
Helps to bridge the gap between business stakeholders and technical teams
Use...
I applied via Approached by Company and was interviewed in Jan 2023. There were 3 interview rounds.
Three data model questions were given, to be solved within 24 hours.
A case study on the number of green T-shirts sold in the US.
Identify the target audience for green T-shirts
Analyze the market demand for green T-shirts
Study the sales data of green T-shirts in the US
Identify the popular brands and styles of green T-shirts
Analyze the impact of seasonality on sales
Consider the pricing strategy of green T-shirts
Identify potential marketing opportunities to increase sales
I appeared for an interview in Aug 2021.
Round duration - 90 Minutes
Round difficulty - Medium
Timing: 5-9 PM
You are given an array/list of integers with length 'N'. A sliding window of size 'K' moves from the start to the end of the array. For each of the 'N'-'K'+1 possi...
The problem involves finding the maximum element in each sliding window of size 'K' in an array of integers.
Iterate through the array and maintain a deque to store the indices of elements in the current window.
Remove indices from the deque that are outside the current window.
Keep the deque in decreasing order of element values to easily find the maximum element in each window.
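The deque approach above can be sketched in Python as follows. It assumes 1 <= K <= N, as the problem setup implies, and returns an empty list otherwise. Each index enters and leaves the deque at most once, so the whole pass is O(N).

```python
from collections import deque

def max_sliding_window(nums, k):
    """Return the maximum of every window of size k using a monotonic deque."""
    if k <= 0 or k > len(nums):
        return []
    dq = deque()  # indices whose values are kept in decreasing order
    result = []
    for i, value in enumerate(nums):
        # Drop the index that has slid out of the window on the left.
        if dq and dq[0] <= i - k:
            dq.popleft()
        # Drop smaller-or-equal values from the back; they can never be a future maximum.
        while dq and nums[dq[-1]] <= value:
            dq.pop()
        dq.append(i)
        # Once the first full window is formed, the front holds the window's maximum.
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result

print(max_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3))  # [3, 3, 5, 5, 6, 7]
```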
Round duration - 45 Minutes
Round difficulty - Medium
Timing: morning to noon
The interviewer was super friendly.
He even helped me at places where I got stuck.
Explaining a complex joins problem in DBMS
Discussing the use of different types of joins like inner join, outer join, self join, etc.
Explaining how to handle null values and duplicates during joins
Demonstrating a scenario where multiple tables need to be joined based on different keys
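A small sketch of how NULL keys and duplicate rows affect joins, using SQLite via Python's `sqlite3`. The `orders` and `customers` tables and their rows are hypothetical. NULL never compares equal to anything, so an INNER JOIN silently drops rows with NULL keys, and a duplicated row on one side multiplies the matches; deduplicating first and using LEFT JOIN with COALESCE is one common way to handle both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 10), (3, None), (4, 20)])
# Customer 10 is duplicated on purpose; customer 20 has no profile row.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Asha"), (10, "Asha"), (30, "Ravi")])

# INNER JOIN: order 3 (NULL key) and order 4 (no match) disappear,
# and each order for customer 10 appears twice because of the duplicate row.
inner = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN against a deduplicated customer set keeps every order once;
# COALESCE replaces the missing names with a placeholder.
left = conn.execute("""
    SELECT o.order_id, COALESCE(c.name, 'unknown')
    FROM orders AS o
    LEFT JOIN (SELECT DISTINCT customer_id, name FROM customers) AS c
      ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

print(inner)  # [(1, 'Asha'), (1, 'Asha'), (2, 'Asha'), (2, 'Asha')]
print(left)   # [(1, 'Asha'), (2, 'Asha'), (3, 'unknown'), (4, 'unknown')]
```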
Round duration - 15 Minutes
Round difficulty - Easy
Easiest HR round ever.
Just 15 mins after the technical interview round
Tip 1 : Prepare short hand written notes for a quick glance before each interview.
Tip 2 : Start with easy DSA questions and slowly increase the level to medium and then to hard. Do not rush things.
Tip 3 : Follow Love Babbar's DSA sheet. It's the best.
Tip 1 : Mention only genuine skills on your resume. Do not lie or exaggerate.
Tip 2 : Do not put Coursera/Udemy or similar course certifications on your resume, as interviewers do not care where you learnt things; they only care about what you know.
Logical and math questions
Learn OOP concepts; they are very important.
I applied via Campus Placement
1-hour test with basic aptitude questions.
Discussion on American Express and economics
Some of the top questions asked at the TransOrg Analytics Data Engineer interview -
Data Analyst: 47 salaries, ₹8.5 L/yr - ₹14.1 L/yr
Analyst: 40 salaries, ₹7 L/yr - ₹14 L/yr
Data Scientist: 25 salaries, ₹7 L/yr - ₹16 L/yr
Analytics Specialist: 24 salaries, ₹8 L/yr - ₹21 L/yr
Data Science Analyst: 12 salaries, ₹8 L/yr - ₹12 L/yr
Bajaj Finserv
Wells Fargo
JPMorgan Chase & Co.
HSBC Group