30+ GMR Group Interview Questions and Answers

Updated 28 Dec 2024

Q1. How does query acceleration speed up query processing?

Ans.

Query acceleration speeds up query processing by optimizing query execution and reducing the time taken to retrieve data.

  • Query acceleration uses techniques like indexing, partitioning, and caching to optimize query execution.

  • It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-memory processing.

  • Examples include using columnar storage formats like Parquet or optimizing join operations.

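A minimal PySpark sketch of two of these techniques: partition/column pruning on a columnar (Parquet) dataset plus in-memory caching. The path and column names here are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Columnar Parquet lets Spark prune partitions and columns, so this
    # filter reads only the files and columns it actually needs.
    sales = spark.read.parquet("/data/sales")  # hypothetical path
    recent = sales.filter(sales.year == 2024).select("order_id", "amount")

    # Caching keeps the filtered result in memory for repeated queries.
    recent.cache()
    recent.count()  # materializes the cache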

Q2. What conditions are used in SQL? And when we already have a table, how do we create a new one from it?

Ans.

SQL conditions are used to filter data based on specified criteria. Common conditions include WHERE, AND, OR, IN, BETWEEN, etc.

  • Common SQL conditions include WHERE, AND, OR, IN, BETWEEN, LIKE, etc.

  • Conditions are used to filter data based on specified criteria in SQL queries.

  • Examples: WHERE salary > 50000, AND department = 'IT', OR age < 30

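A short runnable sketch of these conditions through spark.sql (the table and values are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.createDataFrame(
        [("Alice", 60000, "IT", 30), ("Bob", 45000, "HR", 28)],
        ["name", "salary", "department", "age"],
    ).createOrReplaceTempView("employees")

    # Filtering with WHERE, AND, IN, and BETWEEN.
    spark.sql("""
        SELECT name, salary FROM employees
        WHERE salary > 50000
          AND department IN ('IT', 'HR')
          AND age BETWEEN 25 AND 40
    """).show()

    # Second part of the question: create a new table from an existing one.
    spark.sql("CREATE TABLE employees_copy AS SELECT * FROM employees")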

Q3. How do you handle missing data in a PySpark DataFrame?

Ans.

Handle missing data in a PySpark DataFrame with functions like dropna, fillna, or replace.

  • Use dropna() function to remove rows with missing data

  • Use fillna() function to fill missing values with a specified value

  • Use the replace() function to substitute specific values, such as placeholder strings, with nulls or corrected values

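A runnable sketch of the three approaches (the sample data is made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("Alice", None), ("Bob", 30), ("N/A", 25)], ["name", "age"])

    df.dropna().show()                                # drop rows with any null
    df.fillna({"name": "unknown", "age": 0}).show()   # fill nulls per column
    df.replace("N/A", None, subset=["name"]).show()   # turn placeholders into nulls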

Q4. In Databricks, when a Spark job is submitted, what happens at the backend? Explain the flow.

Ans.

When a Spark job is submitted in Databricks, several backend processes are triggered to execute it.

  • The submitted spark job is divided into tasks by the Spark driver.

  • The tasks are then scheduled to run on the available worker nodes in the cluster.

  • The worker nodes execute the tasks and return the results to the driver.

  • The driver aggregates the results and presents them to the user.

  • Various optimizations such as data shuffling and caching may be applied during the execution process.


Q5. How would you delete duplicate records from a table?

Ans.

To delete duplicate records from a table, you can use the DELETE statement with a self-join or subquery.

  • Identify the duplicate records using a self-join or subquery

  • Use the DELETE statement to remove the duplicate records

  • Consider using a temporary table to store the unique records before deleting the duplicates

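A hedged PySpark sketch of the window-function approach (column names are illustrative); in plain SQL the same idea is a DELETE whose subquery keeps one id, e.g. MIN(id), per duplicate group:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql.functions import row_number

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "Alice"), (2, "Alice"), (3, "Bob")], ["id", "name"])

    # Number the rows inside each duplicate group, then keep only the first.
    w = Window.partitionBy("name").orderBy("id")
    deduped = df.withColumn("rn", row_number().over(w)).filter("rn = 1").drop("rn")
    deduped.show()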

Q6. How do you create a duplicate table? What are window functions? What are the types of joins? Explain each join.

Ans.

To duplicate a table, use CREATE TABLE AS or INSERT INTO SELECT. Window functions are used for calculations across a set of table rows. Types of joins include INNER, LEFT, RIGHT, and FULL OUTER joins.

  • To duplicate a table, use CREATE TABLE AS or INSERT INTO SELECT

  • Window functions are used for calculations across a set of table rows

  • Types of joins include INNER, LEFT, RIGHT, and FULL OUTER joins

  • Explain each join: INNER - returns rows with at least one match in both tables; LEFT - returns all rows from the left table plus matches from the right; RIGHT - the mirror of LEFT; FULL OUTER - returns all rows from both tables, matched where possible (see the sketch below)

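A compact sketch of the four join types on two toy DataFrames (the data is made up):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    emp = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["dept_id", "name"])
    dept = spark.createDataFrame([(1, "IT"), (3, "HR")], ["dept_id", "dept"])

    emp.join(dept, "dept_id", "inner").show()  # only dept_id 1 matches
    emp.join(dept, "dept_id", "left").show()   # keeps Bob, dept is null
    emp.join(dept, "dept_id", "right").show()  # keeps HR, name is null
    emp.join(dept, "dept_id", "full").show()   # all rows from both sides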

Q7. How do you filter data from dashboard A to dashboard B?

Ans.

Use data connectors or APIs to extract and transfer data from one dashboard to another.

  • Utilize data connectors or APIs provided by the dashboard platforms to extract data from dashboard A.

  • Transform the data as needed to match the format of dashboard B.

  • Use the data connectors or APIs of dashboard B to load the filtered data from dashboard A.


Q8. How do you do performance optimization in Spark?

Ans.

Performance optimization in Spark involves tuning configurations, optimizing code, and utilizing caching.

  • Tune Spark configurations such as executor memory, cores, and parallelism

  • Optimize code by reducing unnecessary shuffles, using efficient transformations, and avoiding unnecessary data movements

  • Utilize caching to store intermediate results in memory for faster access

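A hedged sketch of a few of these knobs (the values are illustrative, not recommendations):

    from pyspark.sql import SparkSession

    # Tune executor resources and shuffle parallelism when building the session.
    spark = (SparkSession.builder
             .config("spark.executor.memory", "8g")
             .config("spark.executor.cores", "4")
             .config("spark.sql.shuffle.partitions", "200")
             .getOrCreate())

    df = spark.read.parquet("/data/events")  # hypothetical path

    # Cache an intermediate result that several downstream queries reuse.
    filtered = df.filter(df.status == "active").cache()
    filtered.count()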

Q9. Do you have hands-on experience with big data tools?

Ans.

Yes, I have hands-on experience with big data tools.

  • I have worked extensively with Hadoop, Spark, and Kafka.

  • I have experience with data ingestion, processing, and storage using these tools.

  • I have also worked with NoSQL databases like Cassandra and MongoDB.

  • I am familiar with data warehousing concepts and have worked with tools like Redshift and Snowflake.


Q10. Describe the SSO process between Snowflake and Azure Active Directory.

Ans.

SSO process between Snowflake and Azure Active Directory involves configuring SAML-based authentication.

  • Configure Snowflake to use SAML authentication with Azure AD as the identity provider

  • Set up a trust relationship between Snowflake and Azure AD

  • Users authenticate through Azure AD and are granted access to Snowflake resources

  • SSO eliminates the need for separate logins and passwords for Snowflake and Azure AD


Q11. How much data can be stored in a MySQL database?

Ans.

The maximum amount of data that can be stored in a MySQL database depends on various factors.

  • The maximum size of a MySQL database is determined by the file system and operating system limitations.

  • The maximum size of a single table is 64 terabytes (TB) for the InnoDB storage engine and 256 TB for the MyISAM storage engine.

  • If a table has an AUTO_INCREMENT key, its row count is effectively bounded by that column's maximum value.

  • The maximum size of a row in MySQL is 65,535 bytes.


Q12. Time travel, different types of tables in snowflake and their retention periods

Ans.

Snowflake has different types of tables with varying Time Travel retention periods. Time travel allows accessing historical data.

  • Snowflake has three table types: permanent, transient, and temporary

  • Transient and temporary tables have a Time Travel retention period of at most 1 day

  • Permanent tables default to 1 day of Time Travel, extendable up to 90 days on Enterprise Edition

  • Time travel in Snowflake allows querying table data as it existed at earlier points in time

  • Time travel is enabled by default with a 1-day retention for transient tables


Q13. Automatic data loading from pipes into Snowflake.

Ans.

Automate data loading from pipes into Snowflake for efficient data processing.

  • Use Snowpipe, a continuous data ingestion service provided by Snowflake, to automatically load data from pipes into Snowflake tables.

  • Snowpipe monitors a stage for new data files and loads them into the specified table in real-time.

  • Configure Snowpipe to trigger a data load whenever new data files are added to the stage, eliminating the need for manual intervention.

  • Snowpipe supports various file formats such as CSV, JSON, Avro, ORC, Parquet, and XML.

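A hedged sketch of defining such a pipe through Snowflake's Python connector; the connection details, stage, and table names are hypothetical, and AUTO_INGEST assumes a cloud-storage event notification is configured:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***")  # placeholders
    cur = conn.cursor()

    # Snowpipe: continuously COPY new files from the stage into the table.
    cur.execute("""
        CREATE PIPE raw.events_pipe AUTO_INGEST = TRUE AS
        COPY INTO raw.events
        FROM @raw.events_stage
        FILE_FORMAT = (TYPE = 'JSON')
    """)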

Q14. How do you combine two columns in a PySpark DataFrame?

Ans.

Use the withColumn method in PySpark to combine two columns in a DataFrame.

  • Use the withColumn method to create a new column by combining two existing columns

  • Specify the new column name and the expression to combine the two columns

  • Example: df = df.withColumn('combined_column', concat(col('column1'), lit(' '), col('column2')))

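The example above, made runnable with its imports (the sample data is made up):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import concat, col, lit

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("John", "Doe")], ["column1", "column2"])

    # Concatenate the two columns with a space between them.
    df = df.withColumn("combined_column",
                       concat(col("column1"), lit(" "), col("column2")))
    df.show()  # combined_column holds "John Doe"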

Q15. How would you truncate a table?

Ans.

Truncating a table removes all data from the table while keeping the structure intact.

  • Truncate is a DDL (Data Definition Language) command in SQL.

  • It is used to quickly delete all rows from a table.

  • Truncate is faster than using the DELETE statement.

  • Truncate typically cannot be rolled back and generates minimal log data (exact behavior varies by database).

  • The table structure, indexes, and constraints remain intact after truncation.


Q16. Why would someone index a table?

Ans.

To improve query performance by reducing the time it takes to retrieve data from a table.

  • Indexes help to speed up data retrieval operations by allowing the database to quickly locate the required data.

  • They can be used to optimize queries that involve filtering, sorting, or joining data.

  • Indexes can also speed up locating the rows affected by updates and deletes, though they add write overhead to inserts.

  • Choosing the right columns to index is important to ensure maximum benefit.

  • Example: indexing a column that appears frequently in WHERE clauses or joins, such as an employee ID.


Q17. What is the difference between a Lookup activity and a Stored Procedure activity?

Ans.

Lookup is used to retrieve a single value from a dataset, while stored procedure activity executes a stored procedure in a database.

  • Lookup is used in data pipelines to retrieve a single value or a set of values from a dataset.

  • Stored procedure activity is used in ETL processes to execute a stored procedure in a database.

  • Lookup is typically used for data enrichment or validation purposes.

  • Stored procedure activity is commonly used for data transformation or loading tasks.


Q18. How do you handle large amounts of data in Tableau?

Ans.

Utilize Tableau's features like data extracts, data blending, and performance optimization techniques.

  • Use data extracts to improve performance by reducing the amount of data being processed.

  • Utilize data blending to combine data from multiple sources without the need for complex ETL processes.

  • Optimize performance by using filters, aggregations, and calculations efficiently.

  • Consider using Tableau's in-memory data engine for faster processing of large datasets.


Q19. Snowflake architecture in your current project.

Ans.

Snowflake architecture is used in our project for cloud-based data warehousing.

  • Snowflake follows a multi-cluster shared data architecture.

  • It separates storage and compute resources, allowing for independent scaling.

  • Data is stored in virtual warehouses, which are compute clusters that can be scaled up or down based on workload.


Q20. Performance tuning techniques in Spark & Hive

Ans.

Performance tuning techniques in Spark & Hive involve optimizing resource allocation, partitioning data, using appropriate data formats, and caching.

  • Optimize resource allocation by adjusting memory and CPU settings based on workload requirements

  • Partition data to distribute processing load evenly across nodes

  • Use appropriate data formats like Parquet or ORC for efficient storage and retrieval

  • Cache intermediate results to avoid recomputation and improve query performance


Q21. Difference between extract and live connection

Ans.

Extract connection imports data into Tableau while live connection directly connects to the data source.

  • Extract connection creates a static snapshot of data while live connection accesses real-time data from the source.

  • Extract connection is useful for large datasets or when offline access is needed.

  • Live connection is beneficial for real-time analysis and when data needs to be updated frequently.

  • Examples: Extract connection - importing a CSV file into Tableau. Live connection - querying a production database directly for up-to-the-minute dashboards.


Q22. Spark architecture in detail

Ans.

Spark architecture includes driver, executor, and cluster manager components for distributed data processing.

  • Spark architecture consists of a driver program that manages the execution of tasks across multiple worker nodes.

  • Executors are responsible for executing tasks on worker nodes and storing data in memory or disk.

  • Cluster manager is used to allocate resources and schedule tasks across the cluster.

  • Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext in the driver program.


Q23. What is DENSE_RANK in SQL?

Ans.

Dense rank in SQL assigns a unique rank to each distinct row in a result set, with no gaps between the ranks.

  • Dense rank is used to assign a rank to each row in a result set without any gaps.

  • It differs from regular rank in that it does not skip ranks if there are ties.

  • For example, if two rows have the same value and are ranked 1st, the next row will be ranked 2nd, not 3rd.

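A small illustration of the tie behavior, run through spark.sql against a made-up temp view:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.createDataFrame(
        [("Alice", 90), ("Bob", 90), ("Cara", 80)], ["name", "score"]
    ).createOrReplaceTempView("scores")

    # DENSE_RANK gives Alice and Bob rank 1 and Cara rank 2 (no gap);
    # RANK would instead give Cara rank 3.
    spark.sql("""
        SELECT name, score,
               DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,
               RANK()       OVER (ORDER BY score DESC) AS rnk
        FROM scores
    """).show()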

Q24. What are the OOP concepts of Java?

Ans.

Object-oriented programming concepts in Java

  • Encapsulation: bundling data and methods that operate on the data into a single unit

  • Inheritance: allows a class to inherit properties and behavior from another class

  • Polymorphism: ability of a method to do different things based on the object it is acting upon

  • Abstraction: hiding the implementation details and showing only the functionality to the user


Q25. Database roles in Snowflake.

Ans.

Database roles in Snowflake define permissions and access control for users and objects.

  • Database roles in Snowflake are used to manage permissions and access control for users and objects.

  • Roles can be assigned to users or other roles to grant specific privileges.

  • Examples of roles in Snowflake include ACCOUNTADMIN, SYSADMIN, SECURITYADMIN, and PUBLIC.


Q26. What is PySpark?

Ans.

PySpark is the Python API for Apache Spark, a powerful open-source distributed computing system.

  • PySpark allows users to write Spark applications using the Python programming language.

  • It provides high-level APIs in Python for Spark's core functionality.

  • PySpark can be used for processing large datasets in a distributed computing environment.

  • Example: using PySpark to perform data analysis and machine learning tasks on big data.


Q27. What is a Spark cluster?

Ans.

A Spark cluster is a group of interconnected computers that work together to process large datasets using Apache Spark.

  • Consists of a master node and multiple worker nodes

  • Master node manages the distribution of tasks and resources

  • Worker nodes execute the tasks in parallel

  • Used for processing big data and running distributed computing jobs


Q28. How does Hive work with HDFS?

Ans.

Hive is a data warehouse system built on top of Hadoop for querying and analyzing large datasets stored in HDFS.

  • Hive translates SQL-like queries into MapReduce jobs to process data stored in HDFS

  • It uses a metastore to store metadata about tables and partitions

  • HiveQL is the query language used in Hive, similar to SQL

  • Hive supports partitioning, bucketing, and indexing for optimizing queries


Q29. Session Policy in Snowflake.

Ans.

Session Policy in Snowflake defines the behavior of a session, including session timeout and idle timeout settings.

  • Session Policy can be set at the account, user, or role level in Snowflake.

  • Session Policy settings include session timeout, idle timeout, and other session-related configurations.

  • Example: Setting a session timeout of 30 minutes will automatically end the session if there is no activity for 30 minutes.


Q30. Network Policy in Snowflake.

Ans.

Network Policy in Snowflake controls access to Snowflake resources based on IP addresses or ranges.

  • Network Policies are used to restrict access to Snowflake resources based on IP addresses or ranges.

  • They can be applied at the account, user, or role level.

  • Network Policies can be used to whitelist specific IP addresses or ranges that are allowed to access Snowflake resources.

  • They can also be used to blacklist IP addresses or ranges that are not allowed to access Snowflake resources.

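A hedged sketch of creating and applying such a policy with Snowflake's Python connector (credentials and IP ranges are placeholders):

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="admin_user", password="***")  # placeholders
    cur = conn.cursor()

    # Allow only the corporate range, but block one address inside it.
    cur.execute("""
        CREATE NETWORK POLICY corp_only
          ALLOWED_IP_LIST = ('192.168.1.0/24')
          BLOCKED_IP_LIST = ('192.168.1.99')
    """)
    cur.execute("ALTER ACCOUNT SET NETWORK_POLICY = corp_only")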

Q31. Write an SQL query to find the maximum salary.

Ans.

Use SQL query with MAX function to find the highest salary in a table.

  • Use SELECT MAX(salary) FROM table_name;

  • Make sure to replace 'salary' with the actual column name in the table.

  • Ensure proper permissions to access the table.


Q32. Default join in Tableau

Ans.

Default join in Tableau is inner join

  • Default join in Tableau is inner join, which only includes rows that have matching values in both tables

  • Other types of joins in Tableau include left join, right join, and full outer join

  • To change the default join type in Tableau, you can drag the field from one table to another and select the desired join type


Q33. PySpark optimization techniques

Ans.

One pyspark optimization technique is using broadcast variables to efficiently distribute read-only data across all nodes.

  • Use broadcast variables to efficiently distribute read-only data across all nodes

  • Avoid shuffling data unnecessarily by using partitioning and caching

  • Optimize data processing by using appropriate transformations and actions

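A minimal sketch of the broadcast technique from the first bullet (the data is made up):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()
    large = spark.createDataFrame(
        [(i, i % 3) for i in range(1000)], ["id", "code"])
    small = spark.createDataFrame(
        [(0, "red"), (1, "green"), (2, "blue")], ["code", "color"])

    # Broadcasting the small lookup table ships a copy to every executor,
    # so the join avoids shuffling the large side.
    large.join(broadcast(small), "code").show(5)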

Q34. Explain blending

Ans.

Blending is the process of combining multiple data sources or models to create a single, unified dataset or prediction.

  • Blending involves taking the outputs of multiple models and combining them to improve overall performance.

  • It is commonly used in machine learning competitions to create an ensemble model that outperforms individual models.

  • Blending can also refer to combining different data sources, such as blending demographic data with sales data for analysis.


Q35. LEFT JOIN in SQL

Ans.

Left join in SQL combines rows from two tables based on a related column, including all rows from the left table.

  • Left join keyword: LEFT JOIN

  • Syntax: SELECT columns FROM table1 LEFT JOIN table2 ON table1.column = table2.column

  • Retrieves all rows from table1 and the matching rows from table2, if any

  • Non-matching rows from table2 will have NULL values for columns from table2


Q36. Introduce yourself.

Ans.

I am a Senior Data Engineer with expertise in data processing and analysis.

  • Experienced in designing and implementing data pipelines

  • Proficient in programming languages like Python and SQL

  • Skilled in working with big data technologies like Hadoop and Spark

  • Familiar with data warehousing and ETL processes

  • Strong problem-solving and analytical skills


Interview Process at GMR Group

Based on 36 interviews in the last 1 year: 3 interview rounds
  • Technical Round 1
  • Technical Round 2
  • HR Round