I applied via Referral and was interviewed before Oct 2023. There was 1 interview round.
Apache Spark is a distributed computing framework that provides in-memory processing capabilities for big data analytics.
Apache Spark is designed for speed and ease of use in processing large datasets.
It supports multiple programming languages such as Scala, Java, Python, and R.
Spark provides high-level APIs like Spark SQL for structured data processing and Spark Streaming for real-time data processing.
It includes libraries such as MLlib for machine learning and GraphX for graph processing; a short PySpark sketch of these APIs follows below.
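To make this concrete, here is a minimal PySpark sketch of the DataFrame and Spark SQL APIs mentioned above (the file name and column names are assumed for illustration):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session (entry point for the DataFrame and SQL APIs)
spark = SparkSession.builder.appName("spark-intro-demo").getOrCreate()

# Read a hypothetical CSV file into a DataFrame and run a simple aggregation
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)
totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))
totals.show()

# The same logic expressed in Spark SQL against a temporary view
sales.createOrReplaceTempView("sales")
spark.sql("SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region").show()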
Streaming solutions involve real-time data processing and delivery.
Use Azure Stream Analytics for real-time data processing
Utilize Azure Event Hubs for event ingestion at scale
Consider Azure Media Services for video streaming
Implement Azure Functions for serverless processing of streaming data
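As a rough sketch of the event-ingestion piece, the snippet below sends a small batch of events to Azure Event Hubs using the azure-eventhub Python SDK (v5 assumed); the connection string and hub name are placeholders:

from azure.eventhub import EventHubProducerClient, EventData

# Connect to an Event Hub (placeholder credentials)
producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",
    eventhub_name="<EVENT_HUB_NAME>",
)

# Batch a few JSON-style messages and send them in one call
batch = producer.create_batch()
batch.add(EventData('{"device": "sensor-1", "reading": 21.4}'))
batch.add(EventData('{"device": "sensor-2", "reading": 19.8}'))
producer.send_batch(batch)
producer.close()

Downstream, a Stream Analytics job or an Azure Function can consume these events for real-time processing.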
I applied via Approached by Company and was interviewed before Sep 2021. There was 1 interview round.
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
I applied via Recruitment Consultant and was interviewed in Aug 2024. There were 3 interview rounds.
The output of an inner join of table 1 and table 2 will be 2, 3, and 5.
Inner join only includes rows that have matching values in both tables.
Values 2, 3, and 5 are present in both tables, so they will be included in the output.
Null values are not considered as matching values in inner join.
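A small PySpark sketch of the inner-join behaviour described above (the table contents, including the NULL rows, are assumed purely for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inner-join-demo").getOrCreate()

table1 = spark.createDataFrame([(1,), (2,), (3,), (5,), (None,)], ["id"])
table2 = spark.createDataFrame([(2,), (3,), (5,), (7,), (None,)], ["id"])

# Inner join keeps only ids present in both tables; NULLs never match each other
result = table1.join(table2, on="id", how="inner")
result.show()   # rows with id 2, 3 and 5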
The project involves building a data pipeline to ingest, process, and analyze large volumes of data from various sources in Azure.
Utilizing Azure Data Factory for data ingestion and orchestration
Implementing Azure Databricks for data processing and transformation
Storing processed data in Azure Data Lake Storage
Using Azure Synapse Analytics for data warehousing and analytics
Leveraging Azure DevOps for CI/CD pipeline automation
Designing an effective ADF pipeline involves considering various metrics and factors.
Understand the data sources and destinations
Identify the dependencies between activities
Optimize data movement and processing for performance
Monitor and track pipeline execution for troubleshooting
Consider security and compliance requirements
Use parameterization and dynamic content for flexibility
Implement error handling and retries for failed activities
I applied via Recruitment Consultant and was interviewed in Nov 2024. There was 1 interview round.
To build a Docker file with a specific tag, you can use the 'docker build' command with the '-t' flag followed by the desired tag.
Use the 'docker build' command with the '-t' flag to specify the tag.
Example: docker build -t myimage:latest .
Replace 'myimage' with the desired image name and 'latest' with the desired tag.
Apache Maven is commonly used for building Java applications.
Apache Maven is a popular build automation tool used for Java projects.
It simplifies the build process by providing a standard way to structure projects and manage dependencies.
Maven uses a Project Object Model (POM) file to define project settings and dependencies.
Example: mvn clean install command is used to build and package a Java project using Maven.
posted on 8 Nov 2024
Terraform is an open-source infrastructure as code software tool created by HashiCorp.
Terraform allows users to define and provision infrastructure using a declarative configuration language.
It supports multiple cloud providers such as AWS, Azure, Google Cloud, and more.
Terraform uses 'terraform plan' to create an execution plan and 'terraform apply' to apply the changes.
It helps in automating the creation, modification, and deletion of infrastructure resources.
Azure DevOps is a set of development tools provided by Microsoft to help teams collaborate and deliver high-quality software.
Azure DevOps includes services such as Azure Repos, Azure Pipelines, Azure Boards, Azure Artifacts, and Azure Test Plans.
It allows for version control, continuous integration/continuous deployment (CI/CD), project management, and testing.
Teams can plan, build, test, and deploy applications using these services.
CI/CD pipelines automate the process of building, testing, and deploying code changes.
CI/CD stands for Continuous Integration/Continuous Deployment
Automates the process of integrating code changes into a shared repository and deploying them to production
Helps in detecting and fixing integration errors early in the development process
Enables faster delivery of software updates and improvements
Popular tools for CI/CD pipelines include Jenkins, GitLab CI, and Azure Pipelines.
Docker is a platform for developing, shipping, and running applications in containers. Kubernetes is a container orchestration tool for managing containerized applications across a cluster of nodes.
Docker allows developers to package applications and their dependencies into containers for easy deployment.
Kubernetes automates the deployment, scaling, and management of containerized applications.
Docker containers are lightweight and portable compared to virtual machines.
Activities in Azure Data Factory (ADF) are the building blocks of a pipeline and perform various tasks like data movement, data transformation, and data orchestration.
Activities can be used to copy data from one location to another (Copy Activity)
Activities can be used to transform data using mapping data flows (Data Flow Activity)
Activities can be used to run custom code or scripts (Custom Activity)
Activities can be used to orchestrate control flow (e.g., ForEach, If Condition, Execute Pipeline)
Dataframes in pyspark are distributed collections of data organized into named columns.
Dataframes are similar to tables in a relational database, with rows and columns.
They can be created from various data sources like CSV, JSON, Parquet, etc.
Dataframes support SQL queries and transformations using PySpark functions.
Example: df = spark.read.csv('file.csv')
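Expanding on the example above, here is a short illustrative sketch of typical DataFrame operations (the data and column names are made up):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()

# Build a DataFrame from in-memory rows with named columns
df = spark.createDataFrame(
    [("Alice", "HR", 45000), ("Bob", "IT", 62000), ("Cara", "IT", 70000)],
    ["name", "department", "salary"],
)

# Column-based transformations: filter, derive a new column, aggregate
it_staff = df.filter(F.col("department") == "IT").withColumn("bonus", F.col("salary") * 0.10)
it_staff.groupBy("department").avg("salary").show()

# DataFrames can also be written back out to columnar formats such as Parquet
it_staff.write.mode("overwrite").parquet("/tmp/it_staff")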
I applied via Naukri.com
I applied via Recruitment Consultant and was interviewed in Mar 2024. There was 1 interview round.
I connect on-premises environments to Azure using Azure ExpressRoute or a VPN Gateway.
Use Azure ExpressRoute for private connection through a dedicated connection.
Set up a VPN Gateway for secure connection over the internet.
Ensure proper network configurations and security settings.
Use Azure Virtual Network Gateway to establish the connection.
Consider using an Azure Site-to-Site VPN for connecting an on-premises network to an Azure Virtual Network.
Autoloader in Databricks is a feature that automatically loads new data files as they arrive in a specified directory.
Autoloader monitors a specified directory for new data files and loads them into a Databricks table.
It supports various file formats such as CSV, JSON, Parquet, Avro, and ORC.
Autoloader simplifies the process of ingesting streaming data into Databricks without the need for manual intervention.
It can be ...
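A minimal Auto Loader sketch on a Databricks runtime (the paths and table name are placeholders):

# Incrementally ingest new files from a monitored directory
df = (spark.readStream
      .format("cloudFiles")                                         # Auto Loader source
      .option("cloudFiles.format", "json")                          # incoming file format
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")   # where the inferred schema is tracked
      .load("/mnt/raw/events"))                                     # directory monitored for new files

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/events")         # progress tracking for exactly-once ingestion
   .trigger(availableNow=True)                                      # process all pending files, then stop
   .toTable("bronze_events"))                                       # write into a managed table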
JSON data normalization involves structuring data to eliminate redundancy and improve efficiency.
Identify repeating groups of data
Create separate tables for each group
Establish relationships between tables using foreign keys
Eliminate redundant data by referencing shared values
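As a small illustration, the Python snippet below splits a nested JSON document into a parent table and a child table linked by a foreign key, using pandas (the order/items structure is assumed for illustration):

import pandas as pd

orders_json = [
    {"order_id": 1, "customer": "Alice",
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"order_id": 2, "customer": "Bob",
     "items": [{"sku": "A1", "qty": 5}]},
]

# Parent table: one row per order, with the repeating "items" group removed
orders = pd.json_normalize(orders_json).drop(columns=["items"])

# Child table: one row per item, keyed back to its order via order_id (the foreign key)
order_items = pd.json_normalize(orders_json, record_path="items", meta=["order_id"])

print(orders)
print(order_items)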
I applied via Job Fair and was interviewed in Dec 2023. There was 1 interview round.
The problem of slow performance in Amazon can be attributed to various factors.
Insufficient server capacity leading to high latency
Network congestion causing delays in data transfer
Inefficient code or algorithms affecting processing speed
Inadequate optimization of database queries
Heavy traffic load impacting overall system performance
Amazon's product is a popular online marketplace and cloud computing platform.
Amazon offers a wide range of products and services for customers and businesses.
It allows individuals and companies to sell and buy products online.
Amazon also provides cloud computing services through Amazon Web Services (AWS).
Some examples of Amazon's products include Amazon Prime, Amazon Echo, and Amazon Web Services (AWS).