Insight Global Technologies
10+ Interview Questions and Answers (Greenfield Ezra Street, Kolkata)
Q1. Write a SQL query to fetch the customers who have not done any transaction in the last 30 days but did before that
SQL query to fetch customers who have not transacted in the last 30 days but did so earlier
Use a subquery to find customers who transacted more than 30 days ago
Use NOT IN or NOT EXISTS to exclude customers who transacted in the last 30 days, as sketched below
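A minimal sketch using NOT EXISTS, assuming a transactions table with customer_id and txn_date columns (both names are placeholders); date arithmetic varies by dialect, e.g. DATEADD(day, -30, GETDATE()) on SQL Server:

    -- Customers active before the last 30 days but silent since
    SELECT DISTINCT t.customer_id
    FROM transactions t
    WHERE t.txn_date < CURRENT_DATE - INTERVAL '30' DAY
      AND NOT EXISTS (
          SELECT 1
          FROM transactions r
          WHERE r.customer_id = t.customer_id
            AND r.txn_date >= CURRENT_DATE - INTERVAL '30' DAY
      );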
Q2. What is a distributed table in Synapse? How do you choose a distribution type?
A distributed table in Synapse is spread across multiple compute nodes for parallel processing.
Table data is divided into 60 distributions so queries can run in parallel and perform better.
There are three distribution types: hash, round-robin, and replicated.
Hash distribution is ideal for large fact tables joined on a common key; round-robin spreads rows evenly and suits staging loads; replicated tables copy the data to every compute node and suit small dimension tables.
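A minimal DDL sketch for a Synapse dedicated SQL pool showing all three options; table and column names are placeholders:

    -- Hash distribution: co-locate rows that join on ProductId
    CREATE TABLE FactSales
    (
        SaleId    INT NOT NULL,
        ProductId INT NOT NULL,
        Revenue   DECIMAL(18,2)
    )
    WITH (DISTRIBUTION = HASH(ProductId), CLUSTERED COLUMNSTORE INDEX);

    -- Round-robin: spread staging rows evenly across distributions
    CREATE TABLE StgSales
    (
        SaleId    INT NOT NULL,
        ProductId INT NOT NULL,
        Revenue   DECIMAL(18,2)
    )
    WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

    -- Replicate: copy a small dimension table to every compute node
    CREATE TABLE DimProduct
    (
        ProductId   INT NOT NULL,
        ProductName NVARCHAR(100)
    )
    WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);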
Q3. What is Dynamic Content in ADF and how did you use it in previous projects?
Dynamic Content in Azure Data Factory (ADF) allows dynamic values to be computed and passed between activities at runtime.
Dynamic Content can be used to pass values between activities, such as passing output from one activity as input to another.
Expressions can be used within Dynamic Content to manipulate data or create dynamic values.
Dynamic Content can be used in various ADF components like datasets, linked services, and activities.
For example, in a pipeline, you can use Dynamic Content to pass runtime values such as a parameterized file name into a dataset, as sketched below.
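A sketch of a typical expression, assuming a pipeline parameter named fileDate (the parameter and file layout are hypothetical); ADF evaluates anything marked as an Expression at runtime:

    "fileName": {
        "value": "@concat('sales_', pipeline().parameters.fileDate, '.csv')",
        "type": "Expression"
    }

Other common expressions include @activity('CopyRaw').output to read a previous activity's output ('CopyRaw' is a placeholder activity name) and @formatDateTime(utcNow(), 'yyyyMMdd') for run-date stamps.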
Q4. What optimization techniques have you applied in projects using Databricks?
I have applied optimization techniques like partitioning, caching, and cluster sizing in Databricks projects.
Utilized partitioning to improve query performance by limiting the amount of data scanned
Implemented caching to store frequently accessed data in memory for faster retrieval
Adjusted cluster sizing based on workload requirements to optimize cost and performance (partitioning and caching are sketched below)
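A minimal PySpark sketch of the first two techniques; paths and the event_date column are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.read.parquet("/mnt/raw/sales")  # placeholder path

    # Partition output by date so queries filtering on event_date
    # scan only the matching folders (partition pruning).
    df.write.partitionBy("event_date").mode("overwrite").parquet("/mnt/curated/sales")

    # Cache a frequently reused DataFrame in memory for faster retrieval.
    hot = spark.read.parquet("/mnt/curated/sales").where("event_date >= '2024-01-01'")
    hot.cache()
    hot.count()  # trigger an action to materialize the cache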
Q5. Have you worked on any real-time data processing projects?
Yes, I have worked on real-time data processing projects using technologies like Apache Kafka and Spark Streaming.
Implemented real-time data pipelines using Apache Kafka for streaming data ingestion
Utilized Spark Streaming for processing and analyzing real-time data
Worked on monitoring and optimizing the performance of real-time data processing systems; a minimal Kafka-to-Spark sketch follows
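A minimal Structured Streaming sketch reading from Kafka; the broker address, topic name, and checkpoint path are placeholders, and a Databricks/Delta sink is assumed:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Ingest a Kafka topic as a streaming DataFrame
    events = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "transactions")
        .load()
        .select(col("value").cast("string").alias("payload")))

    # Continuously append to a Delta location with checkpointing
    query = (events.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/chk/transactions")
        .start("/mnt/streams/transactions"))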
Q6. How do you load data into Synapse that is available in Databricks?
You can load data from Databricks to Synapse using PolyBase or Azure Data Factory.
Use PolyBase to load data from Databricks to Synapse by creating an external table in Synapse pointing to the Databricks data location.
Alternatively, use Azure Data Factory to copy data from Databricks to Synapse by creating a pipeline with Databricks as source and Synapse as destination.
Ensure proper permissions and connectivity between Databricks and Synapse for the transfer; a connector sketch follows.
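A minimal sketch using the Azure Synapse connector bundled with Databricks, which stages data in ADLS and loads it via PolyBase/COPY; the JDBC URL, staging path, and table names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.table("curated.sales")  # placeholder source table in Databricks

    (df.write
       .format("com.databricks.spark.sqldw")
       .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydw")
       .option("tempDir", "abfss://staging@myaccount.dfs.core.windows.net/tmp")
       .option("forwardSparkAzureStorageCredentials", "true")
       .option("dbTable", "dbo.FactSales")
       .mode("append")
       .save())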
Q7. Have you worked on any Data Validation Framework?
Yes, I have worked on developing a Data Validation Framework to ensure data accuracy and consistency.
Developed automated data validation scripts to check for data accuracy and consistency
Implemented data quality checks to identify and resolve data issues
Utilized tools like SQL queries, Python scripts, and Azure Data Factory for data validation
Worked closely with data stakeholders to define validation rules and requirements; a simplified check is sketched below
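A hypothetical sketch of rule-driven checks on a PySpark DataFrame; the customer_id column and the specific rules are illustrative, not a particular framework:

    from pyspark.sql import DataFrame

    def validate(df: DataFrame) -> list:
        """Return human-readable failures for basic data quality rules."""
        failures = []
        # Completeness: the table should not be empty
        if df.count() == 0:
            failures.append("row count is zero")
        # Accuracy: key column should never be NULL
        null_ids = df.where(df["customer_id"].isNull()).count()
        if null_ids > 0:
            failures.append(f"{null_ids} rows with NULL customer_id")
        # Consistency: key column should be unique
        dupes = df.count() - df.dropDuplicates(["customer_id"]).count()
        if dupes > 0:
            failures.append(f"{dupes} duplicate customer_id values")
        return failures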
Q8. Set up an ETL flow for data present in the Lakehouse using Databricks
Set up an ETL flow for Lakehouse data using Databricks
Connect Databricks to the Lakehouse storage (e.g. Azure Data Lake Storage)
Define the ETL process using Databricks notebooks or jobs
Extract data from the Lakehouse, transform it as needed, and load it into the target destination
Monitor and schedule the ETL jobs for automated processing; a notebook-style sketch follows
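A minimal notebook-style ETL sketch; the paths, column names, and target schema are placeholders (the curated schema is assumed to exist):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = SparkSession.builder.getOrCreate()

    # Extract: raw files landed in the lakehouse (ADLS mount assumed)
    raw = spark.read.json("/mnt/lake/raw/orders")

    # Transform: type the columns and drop obviously bad rows
    clean = (raw
        .withColumn("order_date", to_date(col("order_ts")))
        .where(col("order_id").isNotNull()))

    # Load: write a curated Delta table for downstream consumers
    clean.write.format("delta").mode("overwrite").saveAsTable("curated.orders")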
Q9. Write a SQL query to fetch the Top 3 revenue-generating products from a Sales table
SQL query to fetch the Top 3 revenue-generating products from a Sales table
Use the SELECT statement to retrieve data from the Sales table
Use the GROUP BY clause to group the data by product
Use the ORDER BY clause to sort the summed revenue in descending order
Use the LIMIT clause (TOP 3 on SQL Server/Synapse) to fetch only the top 3 products, as shown below
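A minimal sketch, assuming the Sales table has product and revenue columns (both placeholders):

    -- Top 3 products by total revenue
    SELECT product, SUM(revenue) AS total_revenue
    FROM Sales
    GROUP BY product
    ORDER BY total_revenue DESC
    LIMIT 3;  -- on SQL Server/Synapse, use SELECT TOP 3 instead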
Q10. How did you handle failures in ADF pipelines?
I handle failures in ADF Pipelines by setting up monitoring, alerts, retries, and error handling mechanisms.
Implement monitoring to track pipeline runs and identify failures
Set up alerts to notify when a pipeline fails
Configure retries for transient failures
Use failure dependency paths (ADF's equivalent of try/catch) to route failed activities to error-handling steps
Utilize Azure Monitor to analyze pipeline performance and troubleshoot issues; a retry policy snippet follows
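A sketch of the per-activity retry policy in ADF pipeline JSON; the timeout and retry values are illustrative:

    "policy": {
        "timeout": "0.01:00:00",
        "retry": 2,
        "retryIntervalInSeconds": 30
    }

Transient failures (throttling, brief network drops) are often absorbed by retries alone, while the failure dependency path handles anything that still fails after the last retry.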