HPCL Rajasthan Refinery Interview Questions and Answers
Q1. What are Python DataFrames, and how and where have you used them in a project?
Python dataframes are used to organize and manipulate data in a tabular format.
Dataframes are created using the pandas library in Python.
They allow for easy manipulation of data, such as filtering, sorting, and grouping.
Dataframes can be used in various projects, such as data analysis, machine learning, and data visualization.
Examples of using dataframes include analyzing sales data, predicting customer behavior, and visualizing stock market trends.
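The filtering, sorting, and grouping operations mentioned above can be sketched with pandas as follows. This is a minimal illustration that assumes pandas is installed; the `sales` data and its column names are invented for the example and do not come from any real project.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East", "South"],
        "product": ["A", "B", "B", "A", "A"],
        "amount": [120.0, 80.0, 200.0, 50.0, 150.0],
    }
)

# Filtering: keep only rows with an amount of at least 100.
large_orders = sales[sales["amount"] >= 100]

# Sorting: order rows by amount, largest first.
sorted_sales = sales.sort_values("amount", ascending=False)

# Grouping: total amount per region.
totals_by_region = sales.groupby("region")["amount"].sum()

print(large_orders)
print(sorted_sales.head(3))
print(totals_by_region)
```

In an interview answer, it usually helps to tie each operation to a concrete step of the project, for example filtering out invalid records before aggregation.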
Q2. Describe a project you handled in your last organisation
Developed a data pipeline to ingest, process, and analyze customer feedback data for product improvement.
Designed and implemented ETL processes to extract data from various sources
Utilized Apache Spark for data processing and analysis
Built data visualizations to present insights to stakeholders
Q3. What is the architecture of Spark?
Spark uses a master-worker architecture: a driver program coordinates executors that run on worker nodes, with resources provided by a cluster manager.
The driver program communicates with the cluster manager to allocate resources, then splits each job into tasks and schedules them on the executors.
The cluster manager can be Spark's standalone manager, YARN, Mesos, or Kubernetes.
Executors on the worker nodes run the tasks and cache data in memory or on disk.
Spark can also utilize external data sources like Hadoop Distributed File System (HDFS) or Amazon S3.
Spark supports various APIs like SQL, Streaming, MLlib, and GraphX.
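The driver and executor roles described above can be illustrated with a toy analogy in plain Python. This is not Spark code and uses no Spark API: a thread pool stands in for the executors, and the function names (`split_into_partitions`, `run_task`, `driver`) are hypothetical names chosen for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: split the data into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(partition):
    """Executor side: one task processes one partition (a partial sum of squares)."""
    return sum(x * x for x in partition)


def driver(data, num_executors=3):
    """Schedule one task per partition, then combine the partial results."""
    partitions = split_into_partitions(data, num_executors)
    # The thread pool stands in for executors running on worker nodes.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_results = list(executors.map(run_task, partitions))
    return sum(partial_results)


total = driver(list(range(10)))
print(total)  # 285
```

The point of the analogy is the division of labour: the driver only plans and combines, while the per-partition work happens in parallel elsewhere.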
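Q4. Does a cloned table retain any privileges?
It depends on the platform, but a cloned table usually does not inherit the privileges granted on the original table.
In Snowflake, for example, a cloned table does not retain the privileges granted on the source table unless COPY GRANTS is specified in the CREATE TABLE ... CLONE statement.
When a whole database or schema is cloned, the child objects inside the clone do keep the privileges granted on their source objects.
Without copied grants, access to the clone must be granted explicitly, so review the grants after cloning to keep access control consistent across tables.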
Q5. What are examples of IaaS, PaaS, and SaaS?
Examples of IaaS, PaaS, and SaaS include AWS (IaaS), Google App Engine (PaaS), and Salesforce (SaaS).
IaaS - Infrastructure as a Service: Amazon EC2 (AWS), Azure Virtual Machines, Google Compute Engine
PaaS - Platform as a Service: Google App Engine, Heroku, Microsoft Azure App Service
SaaS - Software as a Service: Salesforce, Google Workspace, Microsoft Office 365
Q6. Which ADF activities have you used?
Some ADF activities include Copy Data, Execute Pipeline, Lookup, and Web Activity.
Copy Data activity for moving data between sources and sinks
Execute Pipeline activity for running another pipeline within a pipeline
Lookup activity for retrieving data from a dataset
Web Activity for calling a web service or API
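Q7. What is an SMB join?
An SMB join is a join optimization used in Hive for joining two large tables.
SMB join stands for Sort Merge Bucket join.
It is used when both tables are too large for a map-side (broadcast) join.
Both tables must be bucketed and sorted on the join key, with compatible bucket counts, so that matching buckets can be merged directly.
It is efficient because the data is already bucketed and sorted, so the join avoids a full shuffle and sort at query time.
Example (Hive): SET hive.auto.convert.sortmerge.join=true; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SELECT * FROM table1 JOIN table2 ON table1.column = table2.column;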
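The sort-merge-bucket idea itself can be sketched in plain Python: hash both inputs into buckets on the join key, sort each bucket, then merge-join only the buckets with the same number. This is an illustrative sketch, not how any engine implements it; in a real system the bucketing and sorting are done once when the tables are written, not at join time. All function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def bucket_and_sort(rows, num_buckets):
    """Hash each (key, value) row into a bucket, then sort each bucket by key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return {b: sorted(items, key=lambda r: r[0]) for b, items in buckets.items()}


def merge_join(left, right):
    """Merge-join two lists of (key, value) rows that are already sorted by key."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row with this key with every matching right row.
            while i < len(left) and left[i][0] == lkey:
                for _, rvalue in right[j:j_end]:
                    result.append((lkey, left[i][1], rvalue))
                i += 1
            j = j_end
    return result


def smb_join(left_rows, right_rows, num_buckets=4):
    """Join two row sets bucket by bucket; only same-numbered buckets can match."""
    left_buckets = bucket_and_sort(left_rows, num_buckets)
    right_buckets = bucket_and_sort(right_rows, num_buckets)
    joined = []
    for b in left_buckets.keys() & right_buckets.keys():
        joined.extend(merge_join(left_buckets[b], right_buckets[b]))
    return joined


# Hypothetical sample data: (customer_id, payload) pairs.
orders = [(3, "order-c"), (1, "order-a"), (2, "order-b"), (1, "order-d")]
customers = [(1, "Asha"), (2, "Ravi"), (4, "Meena")]
print(sorted(smb_join(orders, customers)))
```

Because equal keys always hash to the same bucket, no row ever needs to be compared against a row from a different bucket, which is what removes the shuffle step.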
Q8. What is the difference between ADF and ADB?
ADF stands for Azure Data Factory, a cloud-based data integration service. ADB stands for Azure Databricks, an Apache Spark-based analytics platform.
ADF is used for data integration and orchestration, while ADB is used for big data analytics and machine learning.
ADF provides a visual interface for building data pipelines, while ADB offers collaborative notebooks for data exploration and analysis.
ADF supports various data sources and destinations, while ADB is optimized for pr...
Q9. Write code to check whether a string is a palindrome
A palindrome is a word, phrase, number, or other sequence of characters that reads the same forward and backward.
Check if the string is equal to its reverse to determine if it's a palindrome.
Ignore spaces and punctuation when checking for palindromes.
Convert the string to lowercase before checking for palindromes.
Examples: 'racecar', 'A man, a plan, a canal, Panama'
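The steps above can be put together in a short Python function. This is one possible sketch; the name `is_palindrome` and the third test string are chosen here for illustration.

```python
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forward and backward,
    ignoring case, spaces, and punctuation."""
    # Keep only letters and digits, lowercased.
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    # Compare the cleaned sequence with its reverse.
    return cleaned == cleaned[::-1]


print(is_palindrome("racecar"))                         # True
print(is_palindrome("A man, a plan, a canal, Panama"))  # True
print(is_palindrome("Data Engineer"))                   # False
```

If the interviewer asks for a version without extra memory, a two-pointer scan from both ends of the string is a common alternative.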