Ernst & Young
Camfil Interview Questions and Answers
Q1. Difference between star and snowflake schema in detail
Star schema has a single fact table connected to multiple dimension tables, while Snowflake schema has normalized dimension tables.
Star schema denormalizes data for faster query performance
Snowflake schema normalizes data to reduce redundancy
Star schema is easier to understand and query
Snowflake schema saves storage and makes dimension hierarchies easier to maintain, at the cost of extra joins
Example: A star schema for a sales database would have a fact table for sales transactions connected to dimension tables for products, customers, a...
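The contrast above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (fact_sales, star_dim_product, snow_dim_product, snow_dim_category) and the sample rows are hypothetical and do not come from the original answer. It shows the same "revenue by category" question answered with one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star variant: the product dimension is denormalized (category stored inline).
CREATE TABLE star_dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Snowflake variant: category is normalized into its own table.
CREATE TABLE snow_dim_category (
    category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT,
    category_id INTEGER REFERENCES snow_dim_category(category_id));

-- One fact table, shared by both variants for brevity.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);

INSERT INTO star_dim_product VALUES
    (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mouse', 'Electronics');
INSERT INTO snow_dim_category VALUES (10, 'Stationery'), (20, 'Electronics');
INSERT INTO snow_dim_product VALUES (1, 'Pen', 10), (2, 'Notebook', 10), (3, 'Mouse', 20);
INSERT INTO fact_sales VALUES (1, 1, 2.0), (2, 2, 5.0), (3, 3, 20.0), (4, 1, 2.0);
""")

# Star schema: a single join from the fact table reaches the category.
star_rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN star_dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Snowflake schema: the same question needs an extra join through the category table.
snow_rows = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN snow_dim_product p ON p.product_id = f.product_id
    JOIN snow_dim_category c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()

print(star_rows)
print(snow_rows)
```

Both queries return the same totals; the difference is that the star layout repeats the category text on every product row, while the snowflake layout stores each category once and pays for it with the additional join.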
Q2. Difference between transient table, fact table, and dimension table
Transient tables are temporary tables, fact tables contain quantitative data, and dimension tables contain descriptive data.
Transient tables are used for intermediate processing or staging data; in Snowflake specifically, a transient table persists until it is explicitly dropped but has no Fail-safe period.
Fact tables contain quantitative data such as sales, revenue, or quantities.
Dimension tables contain descriptive data like customer names, product categories, or dates.
Q3. Maximum salary for each department in SQL
Use a SQL query to find the maximum salary for each department
Use the GROUP BY clause to group the data by department
Use the MAX() function to find the maximum salary within each group
Join the tables if necessary to get department information
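The steps above can be sketched as a single query, run here through Python's sqlite3 module so it is self-contained. The employees and departments tables, their columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER);

INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Finance');
INSERT INTO employees VALUES
    (1, 'Asha', 1, 120), (2, 'Ben', 1, 150),
    (3, 'Chen', 2, 90),  (4, 'Dev', 2, 110);
""")

# Join to get the department name, group by department, take MAX(salary) per group.
max_salaries = conn.execute("""
    SELECT d.dept_name, MAX(e.salary) AS max_salary
    FROM employees e
    JOIN departments d ON d.dept_id = e.dept_id
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()

print(max_salaries)
```

If the department name already lives in the employees table, the join can be dropped and the query reduces to SELECT department, MAX(salary) ... GROUP BY department.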
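Q4. SQL order of execution
SQL order of execution is the logical sequence in which a query's clauses are evaluated: FROM and JOIN, WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, then LIMIT. This differs from the order in which the clauses are written.
At the engine level, the SQL query is parsed and validated first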
Next, the query optimizer creates an execution plan
Execution plan includes steps like table scans, index scans, joins, etc.
Finally, the query is executed and results are returned
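A small sketch can make the logical clause order visible: WHERE filters individual rows before grouping, HAVING filters groups after aggregation, and ORDER BY and LIMIT apply last. The orders table and its rows are hypothetical, and the query runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES
    ('a', 50), ('a', 200), ('b', 300), ('b', 10), ('c', 500);
""")

result = conn.execute("""
    SELECT customer, SUM(amount) AS total   -- 5. compute the output columns
    FROM orders                             -- 1. pick the source rows
    WHERE amount >= 100                     -- 2. drop small orders row by row
    GROUP BY customer                       -- 3. form one group per customer
    HAVING SUM(amount) >= 300               -- 4. drop groups with a low total
    ORDER BY total DESC                     -- 6. sort the remaining groups
    LIMIT 1                                 -- 7. keep only the top group
""").fetchall()

print(result)
```

Customer 'a' has 250 in total, but only 200 of it survives the WHERE step, so the group is removed by HAVING. That outcome only makes sense if WHERE runs before GROUP BY and HAVING runs after it.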
Q5. List vs tuple difference
List and tuple are both sequence data types in Python, but the main difference is that lists are mutable while tuples are immutable.
Lists are enclosed in square brackets [], while tuples are enclosed in parentheses ().
Lists can be modified by adding, removing, or changing elements, while tuples cannot be modified once created.
Lists are typically used for collections of similar items, while tuples are used for heterogeneous data or to represent a fixed set of values.
Lists have...
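A short sketch of the mutability difference described above. The variable names are illustrative. The last lines also show one practical consequence not spelled out in the answer: a tuple of hashable elements can serve as a dictionary key, while a list cannot.

```python
# Lists are mutable: elements can be added, removed, or replaced in place.
numbers = [1, 2, 3]
numbers.append(4)
numbers[0] = 10

# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 99
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# A tuple of hashable elements can be used as a dict key.
distances = {point: 5.0}

# A list is unhashable, so using it as a dict key raises TypeError.
try:
    bad = {numbers: "nope"}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(numbers, tuple_was_modified, distances, list_is_hashable)
```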
Q6. Spark Optimization Techniques
Spark optimization techniques improve performance and efficiency of Spark jobs.
Use partitioning to distribute data evenly across nodes
Cache intermediate results to avoid recomputation
Reduce shuffle cost by filtering and projecting early and by preferring reduceByKey over groupByKey
Use broadcast variables for small lookup tables
Tune memory and executor settings for optimal performance
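The broadcast idea from the list above can be illustrated without a cluster. The sketch below is a plain-Python simulation, not Spark code: the partitions list stands in for data spread across executors, and country_names stands in for a small lookup table copied to each of them. All names and rows are hypothetical. In PySpark itself, the comparable hint for DataFrame joins is pyspark.sql.functions.broadcast.

```python
# Plain-Python simulation of a broadcast (map-side) join; no Spark API is used.

# Small lookup table: cheap to copy to every worker.
country_names = {"IN": "India", "SE": "Sweden"}

# Large dataset, already split into partitions that live on different workers.
partitions = [
    [("o1", "IN"), ("o2", "SE")],
    [("o3", "IN")],
]

def join_partition(rows, lookup):
    """Join one partition locally against its in-memory copy of the lookup table."""
    return [(order_id, lookup.get(code, "unknown")) for order_id, code in rows]

# Each partition is joined independently, so no rows move between partitions.
joined = [row for part in partitions for row in join_partition(part, country_names)]

print(joined)
```

Because every partition holds its own copy of the lookup table, the large dataset never has to be repartitioned by join key, which is the shuffle that a broadcast join avoids.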