Insight Global Technologies
I applied via LinkedIn and was interviewed in Sep 2024. There were 2 interview rounds.
Set up an ETL flow for data in a Lakehouse using Databricks
Connect Databricks to the Lakehouse storage (e.g. Azure Data Lake Storage)
Define the ETL process using Databricks notebooks or jobs
Extract data from the Lakehouse, transform it as needed, and load it into the target destination
Monitor and schedule ETL jobs for automated data processing; a PySpark sketch of the flow follows
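A minimal PySpark sketch of such a flow, assuming a JSON landing zone in ADLS and a Delta target table (the storage account, paths, and table names are hypothetical):

```python
# Minimal PySpark ETL sketch for a Databricks Lakehouse flow.
# The storage account, container, path, and table names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in Databricks notebooks

# Extract: read raw files landed in Azure Data Lake Storage
raw = (spark.read
       .format("json")
       .load("abfss://raw@mystorageacct.dfs.core.windows.net/sales/"))

# Transform: deduplicate, type the date column, drop invalid rows
clean = (raw
         .dropDuplicates(["order_id"])
         .withColumn("order_date", F.to_date("order_date"))
         .filter(F.col("amount") > 0))

# Load: write a Delta table for downstream consumers
(clean.write
      .format("delta")
      .mode("overwrite")
      .saveAsTable("lakehouse.silver_sales"))
```

The notebook itself would then be attached to a Databricks job for scheduling and monitoring.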
I handle failures in ADF Pipelines by setting up monitoring, alerts, retries, and error handling mechanisms.
Implement monitoring to track pipeline runs and identify failures
Set up alerts to notify when a pipeline fails
Configure retries for transient failures
Use failure-path dependencies (On Failure / On Completion conditions) to emulate Try/Catch and manage exceptions
Utilize Azure Monitor to analyze pipeline performance and troubleshoot issues; retry settings are sketched below
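A hedged illustration of where retries are configured: each activity carries a policy block in the pipeline JSON, shown here as a Python dict for readability (the activity name and values are assumptions):

```python
# Hedged illustration: retry behaviour lives on each activity's "policy"
# block in the ADF pipeline JSON, shown here as a Python dict for readability.
# The activity name and values are hypothetical.
copy_activity = {
    "name": "CopySalesToLake",
    "type": "Copy",
    "policy": {
        "retry": 3,                    # rerun the activity on transient failures
        "retryIntervalInSeconds": 60,  # wait between retry attempts
        "timeout": "0.01:00:00"        # give up after one hour
    }
}
# Error handling itself is wired with dependency conditions: e.g. a Web
# activity that raises an alert runs only when this activity ends in "Failed".
```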
Yes, I have worked on developing a Data Validation Framework to ensure data accuracy and consistency.
Developed automated data validation scripts to check for data accuracy and consistency
Implemented data quality checks to identify and resolve data issues
Utilized tools like SQL queries, Python scripts, and Azure Data Factory for data validation
Worked closely with data stakeholders to define validation rules and requirements; a sample check is sketched below
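A minimal sketch of the kind of rule such a framework runs, assuming a PySpark environment; the table and key column names are hypothetical:

```python
# Hedged sketch of a simple validation rule: null and duplicate checks on a
# key column. The table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("lakehouse.silver_sales")

null_keys = df.filter(F.col("order_id").isNull()).count()
dup_keys = (df.groupBy("order_id").count()
              .filter(F.col("count") > 1)
              .count())

assert null_keys == 0, f"{null_keys} rows have a NULL order_id"
assert dup_keys == 0, f"{dup_keys} order_id values are duplicated"
```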
SQL query to fetch Top 3 revenue generating Products from Sales table
Use the SELECT statement to retrieve data from the Sales table
Use the GROUP BY clause to group the data by Product
Use the ORDER BY clause to sort by total revenue in descending order
Use the LIMIT clause (TOP 3 in T-SQL) to fetch only the top 3 revenue-generating products, as in the sketch below
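A sketch of the query, assuming a Sales table with Product and Revenue columns (the names are assumptions), run here through Spark SQL:

```python
# Hypothetical Sales(Product, Revenue) table, queried through Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

top_products = spark.sql("""
    SELECT Product,
           SUM(Revenue) AS total_revenue
    FROM Sales
    GROUP BY Product
    ORDER BY total_revenue DESC
    LIMIT 3
""")
top_products.show()
```

In T-SQL (e.g. Synapse), LIMIT 3 would become SELECT TOP 3.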
SQL query to fetch customers who have not transacted in last 30 days but did before
Use a subquery to find customers who transacted more than 30 days ago
Use NOT IN or NOT EXISTS to exclude customers who transacted in the last 30 days (see the sketch below)
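A sketch under the assumption of a Transactions table with customer_id and txn_date columns, again via Spark SQL:

```python
# Hypothetical Transactions(customer_id, txn_date) table, Spark SQL syntax.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

lapsed_customers = spark.sql("""
    SELECT DISTINCT customer_id
    FROM Transactions
    WHERE txn_date < DATE_SUB(CURRENT_DATE(), 30)
      AND customer_id NOT IN (
            SELECT customer_id
            FROM Transactions
            WHERE txn_date >= DATE_SUB(CURRENT_DATE(), 30)
          )
""")
lapsed_customers.show()
```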
Dynamic Content in ADF allows for dynamic values to be passed between activities in Azure Data Factory.
Dynamic Content can be used to pass values between activities, such as passing output from one activity as input to another.
Expressions can be used within Dynamic Content to manipulate data or create dynamic values.
Dynamic Content can be used in various ADF components like datasets, linked services, and activities.
For...
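A hedged illustration of how such expressions look, rendered as a Python dict mirroring the JSON in which they appear; the dataset, parameter, and activity names are hypothetical:

```python
# Hedged illustration of ADF dynamic content: expressions are strings that
# start with '@' inside the resource JSON, mirrored here as a Python dict.
# The dataset, parameter, and activity names are hypothetical.
sink_dataset_reference = {
    "referenceName": "SalesOutputDataset",
    "parameters": {
        # build a dated folder path from a pipeline parameter
        "folderPath": "@concat('curated/sales/', pipeline().parameters.runDate)",
        # reuse a value returned by an earlier Lookup activity
        "rowCount": "@activity('LookupRowCount').output.firstRow.cnt"
    }
}
```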
I have applied optimization techniques like partitioning, caching, and cluster sizing in Databricks projects.
Utilized partitioning to improve query performance by limiting the amount of data scanned
Implemented caching to store frequently accessed data in memory for faster retrieval
Adjusted cluster sizing based on workload requirements to optimize cost and performance; the first two techniques are illustrated below
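A brief sketch of the partitioning and caching techniques, with hypothetical table and column names:

```python
# Hedged sketch of the partitioning and caching techniques; table and column
# names are hypothetical. Cluster sizing is set on the cluster or job
# configuration (worker count, autoscaling), not in code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sales = spark.table("lakehouse.silver_sales")

# Partitioning: write the table partitioned by a commonly filtered column so
# queries scan only the relevant partitions
(sales.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("lakehouse.silver_sales_partitioned"))

# Caching: keep a frequently reused subset in memory for faster retrieval
hot = spark.table("lakehouse.silver_sales_partitioned").filter("order_date >= '2024-01-01'")
hot.cache()
hot.count()  # action that materializes the cache
```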
Distributed table in Synapse is a table that is distributed across multiple nodes for parallel processing.
Distributed tables in Synapse are divided into distributions to optimize query performance.
There are three distribution types: Hash distribution, Round-robin distribution, and Replicate distribution.
Hash distribution is ideal for joining large tables on a common key, Round-robin distribution spreads rows evenly across distributions (a good default for staging tables), and Replicate distribution copies the full table to every compute node, which suits small dimension tables; a DDL example follows
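A hedged DDL sketch for a hash-distributed fact table in a dedicated SQL pool, sent from Python via pyodbc; the connection details and table definition are hypothetical:

```python
# Hedged sketch: creating a hash-distributed table in a Synapse dedicated SQL
# pool, sent from Python via pyodbc. Connection details, table, and columns
# are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;Database=salesdw;"
    "Authentication=ActiveDirectoryInteractive;"
)

ddl = """
CREATE TABLE dbo.FactSales
(
    SaleId    BIGINT        NOT NULL,
    ProductId INT           NOT NULL,
    Revenue   DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductId),  -- co-locates rows that join on ProductId
    CLUSTERED COLUMNSTORE INDEX      -- typical storage for large fact tables
);
"""

cursor = conn.cursor()
cursor.execute(ddl)
conn.commit()
```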
You can load data from Databricks to Synapse using PolyBase or Azure Data Factory.
Use PolyBase to load data by creating an external table in Synapse that points to the files Databricks has written in the data lake
Alternatively, use Azure Data Factory to copy data from Databricks to Synapse by creating a pipeline with Databricks as source and Synapse as destination.
Ensure proper permissions and connectivity between Databricks and Synapse; a connector example follows
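A hedged sketch of the Databricks Azure Synapse connector write path; the JDBC URL, staging location, and table names are assumptions:

```python
# Hedged sketch of the Databricks Azure Synapse connector: data is staged in
# ADLS (tempDir) and then loaded into the dedicated SQL pool. The JDBC URL,
# storage paths, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("lakehouse.silver_sales")

(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=salesdw")
   .option("tempDir", "abfss://staging@mystorageacct.dfs.core.windows.net/synapse-load/")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.FactSales")
   .mode("append")
   .save())
```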
Yes, I have worked on real-time data processing projects using technologies like Apache Kafka and Spark Streaming.
Implemented real-time data pipelines using Apache Kafka for streaming data ingestion
Utilized Spark Streaming for processing and analyzing real-time data
Worked on monitoring and optimizing the performance of real-time data processing systems; a minimal streaming example follows
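A minimal Structured Streaming sketch of such a pipeline; broker addresses, topic, and paths are hypothetical:

```python
# Hedged sketch of a Structured Streaming job reading from Kafka and writing
# Delta; broker addresses, topic, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "sales-events")
          .load())

parsed = events.select(F.col("value").cast("string").alias("payload"),
                       F.col("timestamp"))

(parsed.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/sales-events")
       .outputMode("append")
       .start("/mnt/delta/sales_events_raw"))
```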
Salaries at Insight Global Technologies:
HR Recruiter: 3 salaries reported, ₹1.5 L/yr - ₹2.8 L/yr
Senior Data Engineer: 3 salaries reported, ₹34 L/yr - ₹42.4 L/yr