I applied via Approached by Company and was interviewed in Dec 2023. There was 1 interview round.
ADF pipelines can be triggered using triggers like schedule, event, manual, or tumbling window.
Use a schedule trigger to run the pipeline at specific times or intervals.
Use an event trigger to start the pipeline based on an event like a file being added to a storage account.
Manually trigger the pipeline through the ADF UI or REST API (a minimal REST sketch follows this list).
A tumbling window trigger runs the pipeline at regular intervals based on fixed-size, non-overlapping time windows, retaining state between runs.
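As a rough illustration of the manual/REST option above, the sketch below calls the Data Factory "Pipelines - Create Run" endpoint. The subscription, resource group, factory, and pipeline names are placeholders, and it assumes an Azure AD bearer token has already been acquired (for example via the azure-identity package).

import requests

# Hypothetical placeholders -- replace with real values for your factory.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY = "<factory-name>"
PIPELINE = "<pipeline-name>"
TOKEN = "<azure-ad-bearer-token>"

# Data Factory "Pipelines - Create Run" REST endpoint (api-version 2018-06-01).
url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY}/pipelines/{PIPELINE}/createRun?api-version=2018-06-01"
)

resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"}, json={})
resp.raise_for_status()
print("Started pipeline run:", resp.json()["runId"])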
To ensure ADF pipeline does not fail, monitor pipeline health, handle errors gracefully, optimize performance, and conduct regular testing.
Monitor pipeline health regularly to identify and address potential issues proactively
Handle errors gracefully by implementing error handling mechanisms such as retries, logging, and notifications
Optimize performance by tuning pipeline configurations and optimizing data processing logic (a run-status monitoring sketch follows this list)
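To make the "monitor pipeline health" and "notifications" points concrete, here is a rough sketch that polls a pipeline run's status through the Data Factory "Pipeline Runs - Get" REST endpoint and flags failures; the run ID, URL segments, credentials, and notification channel are all placeholders.

import time
import requests

TOKEN = "<azure-ad-bearer-token>"   # placeholder credential
RUN_URL = (                         # "Pipeline Runs - Get" endpoint, api-version 2018-06-01
    "https://management.azure.com/subscriptions/<sub>/resourceGroups/<rg>"
    "/providers/Microsoft.DataFactory/factories/<factory>"
    "/pipelineRuns/<run-id>?api-version=2018-06-01"
)

# Poll until the run reaches a terminal state, then alert on failure.
while True:
    run = requests.get(RUN_URL, headers={"Authorization": f"Bearer {TOKEN}"}).json()
    status = run["status"]          # Queued / InProgress / Succeeded / Failed / Cancelled
    if status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)

if status != "Succeeded":
    # Replace with a real notification channel (email, Teams webhook, PagerDuty, ...).
    print(f"ALERT: pipeline run ended with status {status}: {run.get('message')}")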
I applied via Recruitment Consultant.
The aptitude test lasts 30 minutes and focuses on topics relevant to data engineering, including Spark, SQL, Azure, and PySpark.
The coding test is a one-hour examination on PySpark.
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
I am a Senior Data Engineer with experience in building scalable data pipelines and optimizing data processing workflows.
Experience in designing and implementing ETL processes using tools like Apache Spark and Airflow
Proficient in working with large datasets and optimizing query performance
Strong background in data modeling and database design
Worked on projects involving real-time data processing and streaming analytics
Decorators in Python are functions that modify the behavior of other functions or methods.
Decorators are defined using the @decorator_name syntax before a function definition.
They can be used to add functionality to existing functions without modifying their code.
Decorators can be used for logging, timing, authentication, and more.
Example: @staticmethod decorator in Python is used to define a static method in a class.
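A minimal sketch of a hypothetical timing/logging decorator, illustrating the @decorator_name syntax described above:

import functools
import time

def timed(func):
    """Hypothetical decorator that logs how long the wrapped function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def load_data(n):
    return list(range(n))

load_data(1_000_000)  # prints something like "load_data took 0.03s"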
SQL query to group by employee ID and combine first name and last name with a space
Use the GROUP BY clause to group by employee ID
Use the CONCAT function to combine first name and last name with a space
Select employee ID, CONCAT(first_name, ' ', last_name) AS full_name
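A runnable sketch of one way to express this, using Spark SQL on a small in-memory view (the employees view and column names are assumptions); note that the non-aggregated name columns also have to appear in the GROUP BY:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("concat-demo").getOrCreate()

# Hypothetical sample data standing in for an employees table.
spark.createDataFrame(
    [(1, "Alice", "Smith"), (2, "Bob", "Jones")],
    ["employee_id", "first_name", "last_name"],
).createOrReplaceTempView("employees")

spark.sql("""
    SELECT employee_id,
           CONCAT(first_name, ' ', last_name) AS full_name
    FROM employees
    GROUP BY employee_id, first_name, last_name
""").show()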
Constructors in Python are special methods used for initializing objects. They are called automatically when a new instance of a class is created.
Constructors are defined using the __init__() method in a class.
They are used to initialize instance variables of a class.
Example:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person1 = Person('Alice', 30)
Indexing in SQL is a technique used to improve the performance of queries by creating a data structure that allows for faster retrieval of data.
Indexes are created on columns in a database table to speed up the retrieval of rows that match a certain condition in a WHERE clause.
Indexes can be created using CREATE INDEX statement in SQL.
Types of indexes include clustered indexes, non-clustered indexes, unique indexes, and composite indexes, among others.
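As a small illustration, the sketch below creates an index with standard CREATE INDEX syntax, using Python's built-in sqlite3 module purely for a self-contained demo; the table and column names are made up, and details such as clustered vs non-clustered indexes vary by database engine.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 50000), (2, 20, 60000), (3, 10, 55000)],
)

# Index on the column used in WHERE clauses, so lookups can avoid a full table scan.
conn.execute("CREATE INDEX idx_employees_department ON employees (department_id)")

rows = conn.execute(
    "SELECT employee_id, salary FROM employees WHERE department_id = ?", (10,)
).fetchall()
print(rows)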
Spark works well with Parquet files due to its columnar storage format, efficient compression, and ability to push down filters.
Parquet files are columnar storage format, which aligns well with Spark's processing model of working on columns rather than rows.
Parquet files support efficient compression, reducing storage space and improving read performance in Spark.
Spark can push down filters to Parquet files, allowing predicates to be evaluated at the row-group level so only the needed data is read.
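A brief PySpark sketch of column pruning and filter pushdown when reading Parquet (the file path and column names are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Hypothetical path; only the selected columns and matching row groups are read,
# because Spark pushes the projection and filter down to the Parquet reader.
df = (
    spark.read.parquet("/data/events.parquet")
         .select("event_id", "event_date", "amount")
         .filter(F.col("event_date") >= "2024-01-01")
)

df.explain()  # the physical plan shows PushedFilters on the Parquet scan
df.show(5)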
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
Use a SQL query with the MAX function to find the highest salary in a table.
Use SELECT MAX(salary) FROM table_name;
Make sure to replace 'salary' with the actual column name in the table.
Ensure proper permissions to access the table.
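A quick Spark SQL sketch of the same idea against a hypothetical employees view (replace the view, column names, and sample data with your own):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("max-salary-demo").getOrCreate()

# Hypothetical table standing in for the real salary table.
spark.createDataFrame(
    [("Alice", 50000), ("Bob", 75000), ("Carol", 62000)],
    ["name", "salary"],
).createOrReplaceTempView("employees")

spark.sql("SELECT MAX(salary) AS highest_salary FROM employees").show()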
Dense rank in SQL assigns a unique rank to each distinct row in a result set, with no gaps between the ranks.
Dense rank is used to assign a rank to each row in a result set without any gaps.
It differs from regular rank in that it does not skip ranks if there are ties.
For example, if two rows have the same value and are ranked 1st, the next row will be ranked 2nd, not 3rd.
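A sketch of DENSE_RANK as a window function in PySpark, showing that tied salaries share a rank and the next rank is not skipped (the sample data is made up):

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dense-rank-demo").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 90000), ("Bob", 90000), ("Carol", 80000)],
    ["name", "salary"],
)

w = Window.orderBy(F.desc("salary"))
df.withColumn("dense_rank", F.dense_rank().over(w)) \
  .withColumn("rank", F.rank().over(w)) \
  .show()
# Alice and Bob both get dense_rank 1; Carol gets 2 (rank() would give her 3).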
Spark cluster is a group of interconnected computers that work together to process large datasets using Apache Spark.
Consists of a master node and multiple worker nodes
Master node manages the distribution of tasks and resources
Worker nodes execute the tasks in parallel
Used for processing big data and running distributed computing jobs
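As a rough sketch, a SparkSession can be pointed at a standalone cluster's master so the driver distributes tasks to the worker nodes; the master URL and resource settings below are placeholders, and on YARN or Kubernetes the master string differs.

from pyspark.sql import SparkSession

# Hypothetical standalone master URL and executor sizing.
spark = (
    SparkSession.builder
        .appName("cluster-demo")
        .master("spark://spark-master:7077")
        .config("spark.executor.memory", "4g")
        .config("spark.executor.cores", "2")
        .getOrCreate()
)

# The driver plans the job; executors on the worker nodes run the tasks in parallel.
print(spark.sparkContext.parallelize(range(1_000_000)).sum())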
Hive is a data warehouse system built on top of Hadoop for querying and analyzing large datasets stored in HDFS.
Hive translates SQL-like queries into MapReduce jobs to process data stored in HDFS
It uses a metastore to store metadata about tables and partitions
HiveQL is the query language used in Hive, similar to SQL
Hive supports partitioning, bucketing, and indexing for optimizing queries
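A small sketch of working with a Hive-managed, partitioned table from Spark with Hive support enabled; the table name, columns, and partition column are assumptions.

from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark use the Hive metastore for table metadata.
spark = (
    SparkSession.builder
        .appName("hive-demo")
        .enableHiveSupport()
        .getOrCreate()
)

# Hypothetical partitioned table; the partition filter prunes HDFS directories.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (order_id INT, amount DOUBLE)
    PARTITIONED BY (order_date STRING)
""")

spark.sql("""
    SELECT order_date, SUM(amount) AS total
    FROM sales
    WHERE order_date = '2024-01-01'
    GROUP BY order_date
""").show()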
I was interviewed in Aug 2024.
Python and SQL tasks
I was interviewed in Sep 2024.
PySpark is a Python API for big data processing using the Apache Spark framework.
PySpark is used for processing large datasets in parallel.
It provides APIs for data manipulation, querying, and analysis.
Example: Using pyspark to read a CSV file and perform data transformations.
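Following the CSV example mentioned above, a minimal PySpark sketch; the file path and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("csv-demo").getOrCreate()

# Hypothetical input file; header and schema inference are enabled for the demo.
orders = spark.read.csv("/data/orders.csv", header=True, inferSchema=True)

# Simple transformations: filter, derive a column, aggregate.
summary = (
    orders.filter(F.col("amount") > 0)
          .withColumn("order_month", F.date_format("order_date", "yyyy-MM"))
          .groupBy("order_month")
          .agg(F.sum("amount").alias("total_amount"))
)

summary.show()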
Databricks optimisation techniques improve performance and efficiency of data processing on the Databricks platform.
Use cluster sizing and autoscaling to optimize resource allocation based on workload
Leverage Databricks Delta for optimized data storage and processing
Utilize caching and persisting data to reduce computation time
Optimize queries by using appropriate indexing and partitioning strategies
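A brief sketch of the caching and partitioning points on a generic Spark DataFrame; cluster sizing, autoscaling, and Delta-specific commands such as OPTIMIZE are handled on the Databricks side rather than shown here, the paths and columns are placeholders, and format("delta") assumes the Delta Lake libraries available on Databricks.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dbx-opt-demo").getOrCreate()

events = spark.read.parquet("/mnt/raw/events")      # hypothetical mounted path

# Cache a DataFrame that is reused by several downstream queries.
frequent = events.filter(F.col("event_type") == "purchase").cache()
frequent.count()                                     # materialise the cache

# Write partitioned by a commonly filtered column so later reads can prune files.
(frequent.write
         .mode("overwrite")
         .partitionBy("event_date")
         .format("delta")
         .save("/mnt/curated/purchases"))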
Databricks is a unified data analytics platform that provides a collaborative environment for data engineers.
Databricks is built on top of Apache Spark and provides a workspace for data engineering tasks.
It allows for easy integration with various data sources and tools for data processing.
Databricks provides features like notebooks, clusters, and libraries for efficient data engineering workflows.
posted on 26 Oct 2024
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
Spark Optimization, Transformation, DLT, DL, Data Governance
Python
SQL
Role | Salaries reported | Salary range
Software Developer | 7 salaries | ₹4 L/yr - ₹18 L/yr
Software Engineer | 5 salaries | ₹2.3 L/yr - ₹34.5 L/yr
Manager | 5 salaries | ₹6 L/yr - ₹12.5 L/yr
Senior Manager Information Technology | 5 salaries | ₹20 L/yr - ₹25 L/yr
Senior Administrator | 4 salaries | ₹11.5 L/yr - ₹35 L/yr
TCS
Infosys
Wipro
HCLTech