Hitech Digital Solutions
I applied via Naukri.com and was interviewed in Aug 2024. There was 1 interview round.
I was asked about financial terms such as intangible assets.
I applied via Naukri.com and was interviewed in Aug 2024. There was 1 interview round.
I am a Data Engineer with experience in designing and implementing project architectures. My day-to-day responsibilities include data processing, ETL tasks, and ensuring data quality.
Designing and implementing project architectures for data processing
Performing ETL tasks to extract, transform, and load data into the system
Ensuring data quality and integrity through data validation and cleansing
Collaborating with cross-...
Use SQL to calculate the difference in marks for each student ID across different years.
Use a self join on the table to compare marks for the same student ID across different years.
Calculate the difference in marks by subtracting the marks from different years.
Group the results by student ID to get the difference in marks for each student.
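A minimal runnable sketch of that self-join, using sqlite3 so it stands alone; the table and column names (marks, student_id, year, score) are assumptions, since the original question gives no schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marks (student_id INTEGER, year INTEGER, score INTEGER);
    INSERT INTO marks VALUES (1, 2022, 70), (1, 2023, 85),
                             (2, 2022, 60), (2, 2023, 55);
""")

# Self join: pair each student's row with the same student's row for the
# following year, then subtract to get the year-over-year difference.
rows = conn.execute("""
    SELECT a.student_id,
           b.year AS later_year,
           b.score - a.score AS score_diff
    FROM marks a
    JOIN marks b
      ON a.student_id = b.student_id
     AND b.year = a.year + 1
    ORDER BY a.student_id
""").fetchall()

print(rows)  # [(1, 2023, 15), (2, 2023, -5)]
```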
The question asks which gender makes the most purchases in each state.
Aggregate the data by state and gender to calculate the total purchases made by each gender in each state.
Identify the gender with the highest total purchases in each state.
Present the results in a table or chart for easy visualization.
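A small pandas sketch of that aggregation; the column names (state, gender, amount) are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "state":  ["KA", "KA", "MH", "MH", "MH"],
    "gender": ["F", "M", "F", "M", "M"],
    "amount": [120, 80, 50, 90, 40],
})

# Total purchases per (state, gender) pair...
totals = df.groupby(["state", "gender"], as_index=False)["amount"].sum()

# ...then keep the top-spending gender within each state.
top = totals.loc[totals.groupby("state")["amount"].idxmax()]
print(top)  # KA/F with 120, MH/M with 130
```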
ADF stands for Azure Data Factory, a cloud-based data integration service that allows you to create, schedule, and manage data pipelines.
ADF is used for building, scheduling, and monitoring data pipelines to move and transform data from various sources to destinations.
It supports data integration between various data stores such as Azure SQL Database, Azure Blob Storage, and on-premises data sources.
ADF provides a visu...
DAG stands for Directed Acyclic Graph, a data structure used to represent dependencies between tasks in a workflow.
DAG is a collection of nodes connected by edges, where each edge has a direction and there are no cycles.
It is commonly used in data engineering for representing data pipelines and workflows.
DAGs help in visualizing and optimizing the order of tasks to be executed in a workflow.
Popular tools like Apache Ai...
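The truncated line above presumably refers to Apache Airflow. A hedged sketch of an Airflow DAG (assuming Airflow 2.4+, where the schedule parameter replaced schedule_interval); the task names and callables are illustrative only.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract step")

def load():
    print("load step")

with DAG(dag_id="example_etl", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False):
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # >> adds a directed edge from extract to load; Airflow rejects cycles,
    # which is what keeps the graph acyclic.
    t_extract >> t_load
```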
Lineage refers to the history and origin of data, including its source, transformations, and dependencies.
Lineage helps in understanding how data is generated, processed, and transformed throughout its lifecycle.
It tracks the flow of data from its source to its destination, including any intermediate steps or transformations.
Lineage is important for data governance, data quality, and troubleshooting data issues.
Example...
Spark handles fault tolerance through resilient distributed datasets (RDDs) and lineage tracking.
Spark achieves fault tolerance through RDDs, which are immutable distributed collections of objects that can be rebuilt if a partition is lost.
RDDs track the lineage of transformations applied to the data, allowing lost partitions to be recomputed based on the original data and transformations.
Spark also replicates data par...
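A small PySpark sketch of that lineage: toDebugString() prints the recorded chain of transformations, which is exactly what Spark replays to rebuild a lost partition.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("lineage-demo").getOrCreate()

rdd = spark.sparkContext.parallelize(range(10))
evens_squared = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# The lineage (map -> filter over the parallelized source) that Spark would
# re-execute if a partition of evens_squared were lost.
print(evens_squared.toDebugString().decode())
print(evens_squared.collect())
spark.stop()
```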
Only one job will run at a time in this Spark scenario, though its tasks run in parallel.
In Spark, each core runs one task at a time, so four cores provide four concurrent task slots.
With those four cores spread across the four worker nodes, up to four tasks can run in parallel across the cluster.
Jobs within a single application are submitted sequentially by default, so only one job runs at a time while its tasks execute in parallel, as the sketch below illustrates.
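A sketch of that setup, pinning one core per executor so the arithmetic matches the answer; spark.executor.instances and spark.executor.cores only take effect on a real cluster manager, so this is illustrative rather than something local[*] honors.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parallelism-demo")
         .config("spark.executor.instances", "4")  # four worker executors
         .config("spark.executor.cores", "1")      # one task slot each
         .getOrCreate())

rdd = spark.sparkContext.parallelize(range(8), numSlices=8)

# One action = one job. Its eight tasks would be scheduled four at a time
# on the four available cores, while the job itself runs alone.
print(rdd.map(lambda x: x * 2).collect())
spark.stop()
```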
I have used techniques like indexing, query optimization, and parallel processing in my projects.
Indexing: Used to improve the speed of data retrieval by creating indexes on columns frequently used in queries.
Query optimization: Rewriting queries to improve efficiency and reduce execution time.
Parallel processing: Distributing tasks across multiple processors to speed up data processing.
Caching: Storing frequently acce...
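An illustrative sketch of the indexing point, using sqlite3 so it runs standalone; the table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")

# Index the column that frequent lookups filter on, so reads can seek
# directly instead of scanning the whole table.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("acme",)
).fetchall()
print(plan)  # the plan's detail column mentions idx_orders_customer
```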
posted on 11 Nov 2024
Key Performance Indicators (KPIs) are measurable values that demonstrate how effectively a company is achieving key business objectives.
KPIs are specific, measurable, achievable, relevant, and time-bound metrics used to evaluate the success of an organization or a particular activity.
Examples of KPIs include revenue growth rate, customer acquisition cost, customer retention rate, website traffic, conversion rate, and e...
loc is label-based indexing while iloc is integer-based indexing in pandas.
loc is used to access a group of rows and columns by labels
iloc is used to access a group of rows and columns by integer position
Example: df.loc['row_label', 'column_label'] vs df.iloc[0, 1]
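The same contrast as a runnable snippet; the row labels and data are invented for the demo.

```python
import pandas as pd

df = pd.DataFrame({"name": ["asha", "ravi"], "score": [91, 84]},
                  index=["r1", "r2"])

print(df.loc["r1", "score"])  # label-based lookup -> 91
print(df.iloc[0, 1])          # position-based lookup (row 0, col 1) -> 91
```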
I applied via LinkedIn and was interviewed in Jan 2024. There was 1 interview round.
Pyspark is a Python API for Apache Spark, a powerful open-source distributed computing system.
Pyspark is used for processing large datasets in parallel across a cluster of computers.
It provides high-level APIs in Python for Spark programming.
Pyspark allows seamless integration with other Python libraries like Pandas and NumPy.
Example: Using Pyspark to perform data analysis and machine learning tasks on big data sets.
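A minimal sketch of that kind of usage: a local SparkSession, a small aggregation, and a hop into pandas; the data is illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("pyspark-demo").getOrCreate()

df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
df.groupBy("key").agg(F.sum("value").alias("total")).show()

# toPandas() is one of the integration points with libraries like Pandas.
pdf = df.toPandas()
print(pdf)
spark.stop()
```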
Pyspark SQL is a module in Apache Spark that provides a SQL interface for working with structured data.
Pyspark SQL allows users to run SQL queries on Spark dataframes.
It provides a more concise and user-friendly way to interact with data compared to traditional Spark RDDs.
Users can leverage the power of SQL for data manipulation and analysis within the Spark ecosystem.
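A short sketch of that interface: register a DataFrame as a temporary view, then query it with plain SQL; the names are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("sql-demo").getOrCreate()

df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
df.createOrReplaceTempView("kv")

# The same aggregation a dataframe API call would express, written as SQL.
spark.sql("SELECT key, SUM(value) AS total FROM kv GROUP BY key").show()
spark.stop()
```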
To merge 2 dataframes of different schema, use join operations or data transformation techniques.
Use join operations like inner join, outer join, left join, or right join based on the requirement.
Perform data transformation to align the schemas before merging.
Use tools like Apache Spark, Pandas, or SQL to merge dataframes with different schemas.
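One concrete way to do this in PySpark, assuming Spark 3.1+ where unionByName accepts allowMissingColumns; columns absent from one side come back as null.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("merge-demo").getOrCreate()

left = spark.createDataFrame([(1, "a")], ["id", "name"])
right = spark.createDataFrame([(2, 9.5)], ["id", "score"])

# Align by column name and fill the gaps with nulls rather than failing.
merged = left.unionByName(right, allowMissingColumns=True)
merged.show()
spark.stop()
```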
Pyspark streaming is a scalable and fault-tolerant stream processing engine built on top of Apache Spark.
Pyspark streaming allows for real-time processing of streaming data.
It provides high-level APIs in Python for creating streaming applications.
Pyspark streaming supports various data sources like Kafka, Flume, Kinesis, etc.
It enables windowed computations and stateful processing for handling streaming data.
Example: C...
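A hedged Structured Streaming sketch (the current streaming API; the older DStream flavor differs); the built-in rate source just emits synthetic rows, standing in here for a real source like Kafka or Kinesis.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("stream-demo").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Print each micro-batch to stdout for about ten seconds, then stop.
query = stream.writeStream.format("console").outputMode("append").start()
query.awaitTermination(10)
query.stop()
spark.stop()
```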
I was interviewed before Jan 2024.
I applied via Naukri.com and was interviewed before Jun 2021. There was 1 interview round.
I applied via Company Website and was interviewed in Jan 2024. There was 1 interview round.
Spark architecture includes driver, cluster manager, and worker nodes for distributed processing.
Spark architecture consists of a driver program that manages the execution of tasks on worker nodes.
Cluster manager is responsible for allocating resources and scheduling tasks across worker nodes.
Worker nodes execute the tasks and store data in memory or disk for processing.
Example: In a Spark application, the driver progr...
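A sketch tying those roles together: this script is the driver, master() names the cluster manager, and the executors on worker nodes run the tasks; local[*] is used only so the snippet runs without a cluster.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("architecture-demo")
         .master("local[*]")  # e.g. "spark://host:7077" or "yarn" on a cluster
         .getOrCreate())

# The driver plans this job; executors on the worker nodes run its tasks.
print(spark.sparkContext.parallelize(range(100)).sum())
spark.stop()
```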
I was interviewed before Oct 2022.
1. ETL Pipeline
2. PySpark Code
3. SQL
I applied via Recruitment Consultant and was interviewed before Jul 2023. There were 2 interview rounds.
Handling ADF pipelines involves designing, building, and monitoring data pipelines in Azure Data Factory.
Designing data pipelines using ADF UI or code
Building pipelines with activities like copy data, data flow, and custom activities
Monitoring pipeline runs and debugging issues
Optimizing pipeline performance and scheduling triggers
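For flavor, a loose sketch of the JSON an ADF copy pipeline is defined by, expressed here as a Python dict; every name is hypothetical and the exact schema should be checked against the Azure Data Factory docs.

```python
import json

# Hypothetical pipeline definition: one Copy activity from a blob dataset
# to a SQL dataset, in the general shape ADF's pipeline JSON uses.
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlob",
                "type": "Copy",
                "inputs": [{"referenceName": "BlobInputDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlOutputDataset",
                             "type": "DatasetReference"}],
            }
        ]
    },
}
print(json.dumps(pipeline, indent=2))
```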
Designation | Salaries reported | Salary range
Aeronautical Analyst | 89 | ₹2.7 L/yr - ₹7.2 L/yr
BIM Modeller | 61 | ₹2.2 L/yr - ₹5.9 L/yr
Design Engineer | 56 | ₹2 L/yr - ₹7.2 L/yr
Customer Support Representative | 55 | ₹2.3 L/yr - ₹3.6 L/yr
Process Associate | 37 | ₹1.5 L/yr - ₹3.2 L/yr