I applied via Naukri.com and was interviewed before Oct 2022. There were 3 interview rounds.
Limitations of AWS Lambda service
Limited execution time (15 minutes maximum)
Limited memory allocation (10 GB maximum; the cap was roughly 3 GB before late 2020)
Cold start latency can impact performance
Limited support for long-running processes
Difficulty in debugging and monitoring
Parallel data processing using AWS services involves distributing data processing tasks across multiple resources for faster and more efficient processing.
Use AWS Glue for ETL (Extract, Transform, Load) tasks in parallel
Leverage AWS EMR (Elastic MapReduce) for processing large amounts of data in parallel using Hadoop or Spark
Utilize AWS Lambda for serverless parallel processing of small tasks
Implement AWS Batch for parallel execution of batch computing workloads
Snowflake is a cloud-based data warehousing platform that allows users to store and analyze large amounts of data.
Snowflake is a fully managed service that works on a pay-as-you-go model
It separates storage and compute resources, allowing users to scale each independently
Snowflake supports SQL queries and has built-in support for semi-structured data like JSON and Avro
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
They can be used with aggregate functions like SUM, AVG, COUNT, etc.
Common window functions include ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, etc.
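A quick way to see these in action is Python's built-in sqlite3 module (window functions need SQLite 3.25+, which recent Python builds bundle); the sales table and its values below are made up for illustration.

```python
import sqlite3

# In-memory database; sample table and values are invented for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100), ("north", 300), ("south", 150), ("south", 250)])

# SUM as a window function gives a per-region running total;
# LAG pulls the previous row's value within the same partition
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY amount) AS running_total,
           LAG(amount) OVER (PARTITION BY region ORDER BY amount) AS prev_amount
    FROM sales
""").fetchall()
for row in rows:
    print(row)
```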
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
The company approached me, and I was interviewed in Jun 2024. There were 2 interview rounds.
A Python coding question and a couple of SQL questions
Spark optimization techniques focus on improving performance and efficiency of Spark jobs.
Partitioning data to optimize parallelism
Caching frequently accessed data
Using broadcast variables for small lookup tables
Avoiding shuffling operations whenever possible
Tuning memory settings for optimal performance
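A minimal PySpark sketch of three of these techniques, assuming pyspark is installed; the DataFrames and their sizes are invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

# Illustrative DataFrames; in practice these come from real sources
events = spark.range(1_000_000).withColumnRenamed("id", "user_id")
users = spark.createDataFrame([(i, f"user{i}") for i in range(100)],
                              ["user_id", "name"])

# Repartition on the join key to control parallelism before a wide operation
events = events.repartition(8, "user_id")

# Cache a DataFrame that will be reused across multiple actions
events.cache()

# Broadcast the small lookup table so the join avoids shuffling the big side
joined = events.join(broadcast(users), "user_id")
print(joined.count())
```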
I have faced difficulties in handling large volumes of data, ensuring data quality, and managing dependencies in ETL pipelines.
Handling large volumes of data can lead to performance issues and scalability challenges.
Ensuring data quality involves dealing with data inconsistencies, errors, and missing values.
Managing dependencies between different stages of the ETL process can be complex and prone to failures.
I applied via Walk-in and was interviewed in Apr 2024. There were 2 interview rounds.
SQL query to retrieve all employees from a table named 'employees'
Use SELECT * FROM employees;
Replace '*' with specific columns if needed, e.g. SELECT employee_id, name FROM employees;
Python program to print 'Hello, World!'
Use the print() function in Python to display text on the screen
Enclose the text in single or double quotes to indicate a string
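The complete program is a single line; either quoting style produces the same string.

```python
# Either quote style works for a Python string literal
print("Hello, World!")
print('Hello, World!')
```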
I applied via LinkedIn and was interviewed in Feb 2024. There were 3 interview rounds.
Working with nested JSON using PySpark involves using the StructType and StructField classes to define the schema and then using the select function to access nested fields.
Define the schema using StructType and StructField classes
Use the select function to access nested fields
Use dot notation to access nested fields, for example df.select('nested_field.sub_field')
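A minimal sketch, assuming pyspark is installed; the schema and the sample record are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("nested-json-sketch").getOrCreate()

# Hypothetical schema: a top-level name plus a nested address struct
schema = StructType([
    StructField("name", StringType()),
    StructField("address", StructType([
        StructField("city", StringType()),
        StructField("zip", StringType()),
    ])),
])

data = [("Alice", ("Pune", "411001"))]
df = spark.createDataFrame(data, schema)

# Dot notation reaches into the nested struct
df.select("name", "address.city").show()
```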
Implementing SCD2 involves tracking historical changes in data over time.
Identify the business key that uniquely identifies each record
Add effective start and end dates to track when the record was valid
Insert new records with updated data and end date of '9999-12-31'
Update end date of previous record when a change occurs
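A minimal SCD2 sketch using Python's built-in sqlite3 module; the dim_customer table, its columns, and the change being tracked are all hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Hypothetical dimension table keyed on customer_id (the business key)
conn.execute("""CREATE TABLE dim_customer (
    customer_id INTEGER, city TEXT,
    start_date TEXT, end_date TEXT)""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Pune', '2023-01-01', '9999-12-31')")

today = date.today().isoformat()

# Step 1: close out the current record when the tracked attribute changes
conn.execute("""UPDATE dim_customer
                SET end_date = ?
                WHERE customer_id = 1 AND end_date = '9999-12-31'""", (today,))

# Step 2: insert the new version with the open-ended end date
conn.execute("INSERT INTO dim_customer VALUES (1, 'Mumbai', ?, '9999-12-31')", (today,))

for row in conn.execute("SELECT * FROM dim_customer ORDER BY start_date"):
    print(row)
```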
Use a SQL query to select data from table 2 where data exists in table 1
Use a JOIN statement to link the two tables based on a common column
Specify the columns you want to select from table 2
Use a WHERE clause to check for existence of data in table 1
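One way to write this, shown here through sqlite3 with made-up tables, is an EXISTS subquery (an inner JOIN on the common column works equally well):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 'a'), (3, 'c')])

# Select from table2 only where a matching id exists in table1
rows = conn.execute("""
    SELECT t2.id, t2.value
    FROM table2 t2
    WHERE EXISTS (SELECT 1 FROM table1 t1 WHERE t1.id = t2.id)
""").fetchall()
print(rows)  # [(1, 'a')]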
The number of records retrieved after performing joins depends on the type of join - inner, left, right, or outer.
Inner join retrieves only the matching records from both tables
Left join retrieves all records from the left table and matching records from the right table
Right join retrieves all records from the right table and matching records from the left table
Full outer join retrieves all records from both tables, filling in NULLs where there is no match
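A small sqlite3 demo of how the join type changes the row count, using made-up tables; note that SQLite only added RIGHT and FULL OUTER JOIN in version 3.39, so this sketch sticks to INNER and LEFT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (k INTEGER)")
conn.execute("CREATE TABLE b (k INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (4,)])

# Inner join keeps only matches; left join keeps every row of a
inner = conn.execute("SELECT COUNT(*) FROM a JOIN b USING (k)").fetchone()[0]
left = conn.execute("SELECT COUNT(*) FROM a LEFT JOIN b USING (k)").fetchone()[0]
print(inner, left)  # 2 3
```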
I applied via a recruitment consultant and was interviewed in Mar 2024. There were 2 interview rounds.
Files can be read in AWS Glue using Data Catalog, crawlers, and Glue ETL jobs.
Use AWS Glue Data Catalog to store metadata information about the files.
Crawlers can automatically infer the schema of the files and populate the Data Catalog.
Glue ETL jobs can then be used to read the files from various sources like S3, RDS, etc.
Supports various file formats like CSV, JSON, Parquet, etc.
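A hedged sketch of both read paths; this only runs inside an AWS Glue job environment, and every database, table, and S3 path below is a placeholder.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read via the Data Catalog entry that a crawler populated
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",   # placeholder
    table_name="my_table",    # placeholder
)

# Or read directly from S3, specifying the format explicitly
dyf_s3 = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/data/"]},  # placeholder path
    format="parquet",
)
print(dyf.count())
```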
Duplicate records can be identified using SQL queries by comparing columns and using aggregate functions.
Use GROUP BY clause with COUNT() function to identify duplicate records
Use HAVING clause to filter out records with count greater than 1
Join the table with itself on specific columns to find duplicates
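A runnable illustration of the GROUP BY / HAVING approach, using sqlite3 with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("a", "x"), ("a", "x"), ("b", "y")])

# GROUP BY the candidate key columns; HAVING keeps only groups that repeat
dupes = conn.execute("""
    SELECT name, dept, COUNT(*) AS cnt
    FROM emp
    GROUP BY name, dept
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('a', 'x', 2)]
```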
I was interviewed in Jul 2023.
Snowflake architecture is used in our project for cloud-based data warehousing.
Snowflake follows a multi-cluster shared data architecture.
It separates storage and compute resources, allowing for independent scaling.
Data is stored in virtual warehouses, which are compute clusters that can be scaled up or down based on workload.
Database roles in Snowflake define permissions and access control for users and objects.
Roles can be assigned to users or other roles to grant specific privileges.
Examples of roles in Snowflake include ACCOUNTADMIN, SYSADMIN, SECURITYADMIN, and PUBLIC.
Session Policy in Snowflake defines the behavior of a session, including session timeout and idle timeout settings.
Session Policy can be set at the account, user, or role level in Snowflake.
Session Policy settings include session timeout, idle timeout, and other session-related configurations.
Example: Setting a session timeout of 30 minutes will automatically end the session if there is no activity for 30 minutes.
SSO process between Snowflake and Azure Active Directory involves configuring SAML-based authentication.
Configure Snowflake to use SAML authentication with Azure AD as the identity provider
Set up a trust relationship between Snowflake and Azure AD
Users authenticate through Azure AD and are granted access to Snowflake resources
SSO eliminates the need for separate logins and passwords for Snowflake and Azure AD
Network Policy in Snowflake controls access to Snowflake resources based on IP addresses or ranges.
They can be applied at the account, user, or role level.
Network Policies can be used to whitelist specific IP addresses or ranges that are allowed to access Snowflake resources.
They can also be used to blacklist IP addresses or ranges that should be denied access.
Automate data loading from pipes into Snowflake for efficient data processing.
Use Snowpipe, a continuous data ingestion service provided by Snowflake, to automatically load data from pipes into Snowflake tables.
Snowpipe monitors a stage for new data files and loads them into the specified table in real-time.
Configure Snowpipe to trigger a data load whenever new data files are added to the stage, eliminating the need for manual loads or scheduled batch jobs.
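A sketch of the pipe definition, assuming the snowflake-connector-python package and pre-existing Snowflake objects (database, schema, target table, and external stage); every name and credential below is a placeholder.

```python
# Assumes snowflake-connector-python; all identifiers are placeholders
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholders
)
cur = conn.cursor()

# AUTO_INGEST lets cloud storage event notifications trigger the load
cur.execute("""
    CREATE PIPE my_db.my_schema.load_events
    AUTO_INGEST = TRUE
    AS COPY INTO my_db.my_schema.events
       FROM @my_db.my_schema.events_stage
       FILE_FORMAT = (TYPE = 'JSON')
""")
```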
Query acceleration speeds up query processing by optimizing query execution and reducing the time taken to retrieve data.
Query acceleration uses techniques like indexing, partitioning, and caching to optimize query execution.
It reduces the time taken to retrieve data by minimizing disk I/O and utilizing in-memory processing.
Examples include using columnar storage formats like Parquet or optimizing join operations.
I applied via Company Website and was interviewed in Aug 2023. There was 1 interview round.
Find duplicates in a list
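One common Python approach counts occurrences with collections.Counter:

```python
from collections import Counter

def find_duplicates(items):
    """Return the values that appear more than once, in first-seen order."""
    counts = Counter(items)
    return [item for item, n in counts.items() if n > 1]

print(find_duplicates([1, 2, 2, 3, 3, 3]))  # [2, 3]
```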
I applied via Naukri.com and was interviewed before Jan 2023. There was 1 interview round.
Rank types in SQL
RANK(): tied rows receive the same rank, and a gap follows each tied group (1, 1, 3)
DENSE_RANK(): tied rows receive the same rank with no gaps (1, 1, 2)
ROW_NUMBER(): every row gets a unique sequential number, even across ties (1, 2, 3)
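The difference is easiest to see on tied values; here is a sqlite3 demo (SQLite 3.25+) with made-up scores:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(90,), (90,), (80,)])

# The two 90s tie: RANK skips a value, DENSE_RANK does not,
# ROW_NUMBER numbers them arbitrarily but uniquely
for row in conn.execute("""
    SELECT score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
    FROM scores
"""):
    print(row)
# (90, 1, 1, 1), (90, 1, 1, 2), (80, 3, 2, 3)
```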
ROW_NUMBER() is a window function (in SQL Server and most other databases) that assigns a unique sequential number to each row in a result set.
Row_Number is used to generate a unique identifier for each row in a result set
It is commonly used for pagination, ranking, and partitioning data
The function requires an ORDER BY clause to determine the order of the rows
The result of Row_Number is an integer value starting from 1
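A pagination sketch in sqlite3, fetching "page 2" of three rows from a made-up table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(1, 8)])

# Page 2 with a page size of 3: rows numbered 4 through 6
page = conn.execute("""
    SELECT id, name FROM (
        SELECT id, name,
               ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM emp
    ) AS t WHERE rn BETWEEN 4 AND 6
""").fetchall()
print(page)  # [(4, 'emp4'), (5, 'emp5'), (6, 'emp6')]
```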
Performance tuning in SQL involves optimizing queries, indexes, and database configurations to improve query execution time.
Identify and optimize slow-performing queries
Create and maintain appropriate indexes
Partition large tables to improve query performance
Optimize database configurations and server settings
Use query execution plans to identify bottlenecks
Consider denormalization for frequently accessed data
Use appropriate data types and avoid implicit conversions in predicates
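As a small illustration of reading execution plans, sqlite3 exposes EXPLAIN QUERY PLAN, which shows a full scan turning into an index search once an index exists (the table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

# Without an index the plan shows a full table scan
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())

# After adding an index, the plan switches to an index search
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchall())
```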
I applied via Naukri.com and was interviewed in Aug 2020. There were 4 interview rounds.