Mirabilis Design Interview Questions and Answers
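Q1. Write Python code to remove duplicates from a list of strings
Remove duplicates from a list of strings while keeping the original order
Use dict.fromkeys() to drop duplicates; dictionaries preserve insertion order (Python 3.7+)
Convert the result back to a list; set() also removes duplicates but does not guarantee the order of the strings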
Example: input_list = ['apple', 'banana', 'apple', 'orange']
Output: ['apple', 'banana', 'orange']
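A minimal sketch of the order-preserving approach, using the example list above. The function name dedupe_keep_order is only an illustrative choice. A plain set() would also drop the duplicates, but it does not guarantee that the strings come back in their original order.

```python
def dedupe_keep_order(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


input_list = ['apple', 'banana', 'apple', 'orange']
result = dedupe_keep_order(input_list)
print(result)  # ['apple', 'banana', 'orange']
```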
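Q2. Write SQL to get the 2nd highest salary
Use a SQL query with ORDER BY and LIMIT to get the 2nd highest salary.
Use a SELECT DISTINCT statement to retrieve the salary column, so that tied salaries count once
Use an ORDER BY clause to sort salaries in descending order
Use LIMIT 1,1 (MySQL syntax, equivalent to LIMIT 1 OFFSET 1) to skip the highest value and return the second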
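The query can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the employees table, its columns, and the sample rows are made up for illustration, and the portable LIMIT 1 OFFSET 1 form is used in place of LIMIT 1,1. DISTINCT is included so that two employees sharing the top salary do not make the top value appear as the "second" row.

```python
import sqlite3

# In-memory database with a hypothetical employees table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("a", 300), ("b", 500), ("c", 500), ("d", 400)],
)

# Sort distinct salaries from high to low, skip the first, take the next one
query = """
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
"""
second_highest = conn.execute(query).fetchone()[0]
print(second_highest)  # 400
conn.close()
```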
Q3. Remove duplicates in a dataframe
Use the pandas drop_duplicates() method to remove duplicate rows from a dataframe
By default, drop_duplicates() compares all columns
Specify the subset parameter to remove duplicates based on specific columns
Use the keep parameter to control which duplicate to keep ('first', 'last', or False to drop every duplicated row)
Example: df.drop_duplicates()
Example: df.drop_duplicates(subset=['column1', 'column2'])
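The effect of the subset and keep parameters is easier to see on a small frame. The sketch below assumes pandas is installed; the column names follow the example above and the data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": [1, 1, 2, 2],
    "column3": [10, 10, 30, 40],
})

# Rows 0 and 1 are identical in every column, so one of them is dropped
all_columns = df.drop_duplicates()

# Only column1 and column2 are compared, so rows 2 and 3 also count as duplicates
by_subset = df.drop_duplicates(subset=["column1", "column2"])

# keep='last' retains the last row of each duplicate group instead of the first
keep_last = df.drop_duplicates(subset=["column1", "column2"], keep="last")

# keep=False drops every row that has a duplicate
drop_all = df.drop_duplicates(subset=["column1", "column2"], keep=False)

print(len(all_columns), len(by_subset), len(drop_all))  # 3 2 0
```

drop_duplicates() returns a new dataframe and leaves df unchanged unless inplace=True is passed.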
Q4. PySpark SCD Type 2 implementation
Implementing Slowly Changing Dimension (SCD) Type 2 in PySpark
Use PySpark DataFrame operations to handle the SCD Type 2 implementation
Maintain history by inserting a new row for each changed record and setting an end date on the previous version
Use window functions and joins to identify changes and update records accordingly
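The row-level logic behind SCD Type 2 can be sketched in plain Python, which runs without a Spark session. This is not PySpark code: it only illustrates the close-old-row, insert-new-row rule that the DataFrame joins would express. The column names (id, city, start_date, end_date, is_current) and the function apply_scd2 are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Return the dimension rows after applying one SCD Type 2 load."""
    latest = {row["id"]: row for row in incoming}
    result = []
    open_ids = set()  # ids whose current version stays valid

    for row in dimension:
        row = dict(row)  # copy so the input is not mutated
        new = latest.get(row["id"])
        if row["is_current"]:
            if new is not None and new["city"] != row["city"]:
                # Tracked attribute changed: close the old version
                row["end_date"] = load_date
                row["is_current"] = False
            else:
                open_ids.add(row["id"])
        result.append(row)

    for new in incoming:
        if new["id"] not in open_ids:
            # Changed or brand-new key: open a new current version
            result.append({
                "id": new["id"], "city": new["city"],
                "start_date": load_date, "end_date": None, "is_current": True,
            })
    return result


dimension = [
    {"id": 1, "city": "Pune", "start_date": date(2023, 1, 1), "end_date": None, "is_current": True},
    {"id": 2, "city": "Delhi", "start_date": date(2023, 1, 1), "end_date": None, "is_current": True},
]
incoming = [{"id": 1, "city": "Mumbai"}, {"id": 2, "city": "Delhi"}, {"id": 3, "city": "Chennai"}]

updated = apply_scd2(dimension, incoming, date(2024, 1, 1))
for row in updated:
    print(row)
```

Here id 1 ends up with a closed Pune row and a current Mumbai row, id 2 is untouched, and id 3 is inserted as a new current row.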
Q5. Data skewness handling in Spark
Handling data skew in Spark means redistributing data so that work is balanced across partitions.
Use repartition() to redistribute data evenly across partitions; coalesce() only merges partitions without a full shuffle, so it does not fix skew
Consider using broadcast joins for small tables to avoid shuffling the large, skewed side
Implement custom partitioning strategies, such as salting hot keys, for specific use cases
Monitor job performance and adjust partitioning as needed
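One common custom partitioning strategy is salting: appending a small suffix to a hot key so that its rows spread over several partitions. The sketch below shows the idea in plain Python rather than Spark. The toy_partition function is a deliberately simple stand-in for a hash partitioner (it is not Spark's hash), and the key names and sizes are invented.

```python
from collections import Counter

NUM_PARTITIONS = 4
NUM_SALTS = 8


def toy_partition(key):
    """Toy deterministic partitioner: byte sum of the key modulo partition count."""
    return sum(key.encode("utf-8")) % NUM_PARTITIONS


# 900 rows share one hot key; 100 rows are spread over ten other keys
keys = ["hot"] * 900 + [f"k{i % 10}" for i in range(100)]

# Without salting, every "hot" row lands in the same partition
plain_load = Counter(toy_partition(k) for k in keys)

# With salting, the hot key becomes hot_0 .. hot_7 and spreads out
salted_keys = [
    f"{k}_{i % NUM_SALTS}" if k == "hot" else k
    for i, k in enumerate(keys)
]
salted_load = Counter(toy_partition(k) for k in salted_keys)

print("largest partition without salt:", max(plain_load.values()))
print("largest partition with salt:   ", max(salted_load.values()))
```

In a real join, the other side has to be replicated once per salt value so that every salted key still finds its match, and the salt is dropped after the join.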
Q6. Optimisation techniques in Spark
Optimisation techniques in Spark improve performance by efficiently utilizing resources.
Use partitioning to distribute data evenly across nodes
Cache intermediate results to avoid recomputation
Use broadcast variables for small lookup tables
Optimize shuffle operations to reduce data movement