Infinite Computer Solutions
I applied via Approached by Company and was interviewed in Sep 2023. There were 3 interview rounds.
I applied via LinkedIn and was interviewed in Nov 2024. There was 1 interview round.
Databricks is a unified analytics platform that provides a collaborative environment for data scientists, engineers, and analysts.
Databricks is built on top of Apache Spark, providing a unified platform for data engineering, data science, and business analytics.
Internals of Databricks include a cluster manager, job scheduler, and workspace for collaboration.
Optimization techniques in Databricks include query optimization.
I have worked with Azure Data Factory, Azure Databricks, and Azure SQL Database.
Azure Data Factory for data integration and orchestration
Azure Databricks for big data processing and analytics
Azure SQL Database for relational database management
I applied via Job Portal and was interviewed in Aug 2024. There were 3 interview rounds.
It's a mandatory test, even for experienced people.
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
Window functions are used to calculate values based on a specific subset of rows within a table.
They allow for ranking, aggregation, and other calculations without grouping the rows.
Examples of window functions include ROW_NUMBER(), RANK(), and SUM() OVER().
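The functions named above can be tried with Python's built-in sqlite3 module (a minimal sketch; the `sales` table and its columns are made up, and window functions need SQLite 3.25 or later, which recent Python builds bundle):

```python
# Sketch of SQL window functions via Python's built-in sqlite3.
# Hypothetical sales table; requires SQLite >= 3.25 for OVER support.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dept TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A", 100), ("A", 300), ("B", 200)])

# RANK() orders rows within each department; SUM() OVER adds a per-
# department total without collapsing rows the way GROUP BY would.
rows = conn.execute("""
    SELECT dept, amount,
           RANK() OVER (PARTITION BY dept ORDER BY amount DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY dept) AS dept_total
    FROM sales
    ORDER BY dept, rnk
""").fetchall()
print(rows)
```

Note how every input row survives in the output, each carrying its rank and its department total side by side.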
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Lake is built on top of Apache Spark and provides ACID transactions for big data processing.
It allows for schema enforcement and evolution, data versioning, and time travel queries.
Delta Lake is compatible with popular data science and machine learning libraries like TensorFlow and PyTorch.
I applied via Naukri.com and was interviewed in Jan 2024. There was 1 interview round.
I applied via Naukri.com and was interviewed in Mar 2021. There were 6 interview rounds.
I applied via Recruitment Consultant and was interviewed in Jul 2024. There were 2 interview rounds.
If a job fails in the pipeline after the data processing cycle is over, it can lead to incomplete or inaccurate data.
Incomplete data may affect downstream processes and analysis
Data quality may be compromised if errors are not addressed
Monitoring and alerting systems should be in place to detect and handle failures
Re-running the failed job or implementing error handling mechanisms can help prevent issues in the future
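One of the mitigations listed above, re-running the failed job, can be sketched as a retry with exponential backoff (an illustrative stand-in; `run_with_retries` and the flaky job are hypothetical, not a real pipeline API):

```python
# Sketch of retrying a failed pipeline job with exponential backoff.
# `flaky_job` simulates a step that fails transiently twice.
import time

def run_with_retries(job, max_attempts=3, base_delay=0.01):
    """Run `job`, retrying on failure so a transient error does not
    leave the cycle with incomplete data."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # surface the failure to monitoring/alerting
            time.sleep(base_delay * 2 ** (attempt - 1))

attempts = {"n": 0}
def flaky_job():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky_job)
print(result)  # succeeds on the third attempt
```

Retries handle transient failures; persistent ones should still be raised so the alerting system mentioned above can catch them.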
Repartition increases the number of partitions in a DataFrame, while coalesce reduces the number of partitions without shuffling data.
Repartition involves a full shuffle of the data across the cluster, which can be expensive.
Coalesce minimizes data movement by only creating new partitions if necessary.
Repartition is typically used when increasing parallelism or evenly distributing data, while coalesce is used for reducing the number of partitions.
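The distinction can be illustrated with plain Python lists standing in for partitions (a toy model of the data movement, not Spark's actual implementation):

```python
# Toy model: lists of lists stand in for Spark partitions.

def repartition(partitions, n):
    """Full shuffle: every row is rehashed, so any row may move
    to any of the n new partitions (n can exceed the old count)."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            out[hash(row) % n].append(row)
    return out

def coalesce(partitions, n):
    """No shuffle: existing partitions are merged whole into n
    buckets, so rows never leave their original grouping."""
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)
    return out

data = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(coalesce(data, 2))          # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(len(repartition(data, 8)))  # 8: repartition can grow the count
```

Coalesce only concatenates existing partitions, which is why it is cheap but cannot increase the partition count, while repartition moves every row.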
SQL code to get the city1-city2 distance from a table where city pairs repeat in both orders
Use a self join on the table to match city1 and city2
Calculate the distance between the cities using appropriate formula
Consider using a subquery if needed
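The self-join approach above can be sketched with sqlite3 (a hedged sketch; the `distances` table, its columns, and the sample cities are hypothetical):

```python
# De-duplicating repeated (city1, city2) pairs with a self join.
# Hypothetical distances(city1, city2, dist) table, run via sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distances (city1 TEXT, city2 TEXT, dist INTEGER)")
conn.executemany("INSERT INTO distances VALUES (?, ?, ?)",
                 [("Delhi", "Pune", 1400),
                  ("Pune", "Delhi", 1400),   # same pair, reversed
                  ("Pune", "Goa", 450)])

# Keep a row if its reversed twin does not exist, or if this row is
# the canonical ordering (city1 < city2) of the pair.
rows = conn.execute("""
    SELECT d1.city1, d1.city2, d1.dist
    FROM distances d1
    LEFT JOIN distances d2
      ON d1.city1 = d2.city2 AND d1.city2 = d2.city1
    WHERE d2.city1 IS NULL OR d1.city1 < d1.city2
    ORDER BY d1.city1
""").fetchall()
print(rows)
```

The `LEFT JOIN` matches each row against its reversed twin, and the `WHERE` clause keeps exactly one representative of each pair.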
Data partitioning in a pipeline involves dividing data into smaller chunks for processing and analysis.
Data can be partitioned based on a specific key or attribute, such as date, location, or customer ID.
Partitioning helps distribute data processing tasks across multiple nodes or servers for parallel processing.
Common partitioning techniques include range partitioning, hash partitioning, and list partitioning.
Example: ...
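One of the techniques listed above, hash partitioning by a key such as customer ID, can be sketched in plain Python (an illustrative model; the order records are made up):

```python
# Sketch of hash partitioning by a key column.
from collections import defaultdict

def hash_partition(records, key, num_partitions):
    """Assign each record to a partition by hashing its key, so all
    records sharing a key land in the same partition."""
    parts = defaultdict(list)
    for rec in records:
        parts[hash(rec[key]) % num_partitions].append(rec)
    return parts

orders = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 20},
    {"customer_id": "c1", "amount": 30},
]
parts = hash_partition(orders, "customer_id", 4)

# Both c1 orders share one partition, so per-customer work on each
# partition can run on separate nodes in parallel.
c1_parts = {p for p, recs in parts.items()
            for r in recs if r["customer_id"] == "c1"}
print(len(c1_parts))  # 1
```

Range partitioning would instead bucket records by sorted key ranges (e.g. by date), which keeps related keys physically adjacent.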
EMR is a managed Hadoop framework for processing large amounts of data, while EC2 is a scalable virtual server in AWS.
EMR stands for Elastic MapReduce and is a managed Hadoop framework for processing large amounts of data.
EC2 stands for Elastic Compute Cloud and is a scalable virtual server in Amazon Web Services (AWS).
EMR allows for easy provisioning and scaling of Hadoop clusters, while EC2 provides resizable compute capacity.
I have experience working with both Star and Snowflake schemas in my projects.
Star schema is a denormalized schema where one central fact table is connected to multiple dimension tables.
Snowflake schema is a normalized schema where dimension tables are further normalized into sub-dimension tables.
Used Star schema for simpler, smaller datasets where performance is a priority.
Used Snowflake schema for complex, larger datasets.
Yes, I have used Python and PySpark in my projects for data engineering tasks.
I have used Python for data manipulation, analysis, and visualization.
I have used PySpark for big data processing and distributed computing.
I have experience in writing PySpark jobs to process large datasets efficiently.
Yes, I have experience with serverless schema.
I have worked with AWS Lambda to build serverless applications.
I have experience using serverless frameworks like Serverless Framework or AWS SAM.
I have designed and implemented serverless architectures using services like AWS API Gateway and AWS DynamoDB.
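A handler in that kind of API Gateway + Lambda setup might look like the following (a hedged sketch: the route, event shape, and greeting logic are illustrative, and DynamoDB access is omitted so it stays locally runnable):

```python
# Sketch of an AWS Lambda handler behind API Gateway, using the
# proxy-integration event/response shape. Runs locally with a fake event.
import json

def lambda_handler(event, context):
    """Handle GET /hello?name=..., returning the response dict
    API Gateway proxy integration expects."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Local invocation with a minimal fake event (no AWS account needed).
resp = lambda_handler({"queryStringParameters": {"name": "dev"}}, None)
print(resp["statusCode"], resp["body"])
```

The point of the pattern is that the handler is a plain function: API Gateway supplies the event, Lambda supplies the runtime, and no server is provisioned.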
Software Engineer | 1.4k salaries | ₹3.6 L/yr - ₹12.4 L/yr
Senior Software Engineer | 1.2k salaries | ₹6.5 L/yr - ₹22 L/yr
Technical Lead | 803 salaries | ₹8.2 L/yr - ₹28.5 L/yr
Associate Software Engineer | 709 salaries | ₹2 L/yr - ₹10 L/yr
Software Test Engineer | 609 salaries | ₹3.2 L/yr - ₹10 L/yr