Photon Interactive
I applied via Naukri.com and was interviewed in Jul 2024. There was 1 interview round.
Large-scale data processing in PySpark involves partitioning, caching, and optimizing transformations for efficient processing; a short sketch follows the list below.
Partitioning data to distribute workload evenly across nodes
Caching intermediate results to avoid recomputation
Optimizing transformations to minimize shuffling and reduce data movement
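A minimal PySpark sketch of these three points, assuming a hypothetical Parquet input path, partition count, and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("large-data-demo").getOrCreate()

# Partition on the key used downstream so the workload spreads evenly
# across executors (path, count, and column are hypothetical).
df = spark.read.parquet("/data/transactions").repartition(200, "customer_id")

# Cache an intermediate result that several queries reuse, so the full
# lineage is not recomputed each time.
enriched = df.filter("amount > 0").cache()

# Both actions reuse the cached data instead of re-reading the source,
# and grouping on the partition key keeps shuffling to a minimum.
enriched.groupBy("customer_id").sum("amount").show()
enriched.count()
```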
Data governance is implemented through policies, processes, and tools to ensure data quality, security, and compliance.
Establish data governance policies and procedures to define roles, responsibilities, and processes for managing data
Implement data quality controls to ensure accuracy, completeness, and consistency of data
Utilize data security measures such as encryption, access controls, and monitoring to protect sensitive data
Two data lineage tools are Apache Atlas and Informatica Enterprise Data Catalog.
Apache Atlas is an open source tool for metadata management and governance in Hadoop ecosystems.
Informatica Enterprise Data Catalog provides a comprehensive data discovery and metadata management solution.
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
I was interviewed in Oct 2024.
Designing an ADF pipeline for data processing
Identify data sources and destinations
Define data transformations and processing steps
Consider scheduling and monitoring requirements
Utilize ADF activities like Copy Data, Data Flow, and Databricks
Implement error handling and logging mechanisms; a minimal SDK sketch follows this list
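As one concrete illustration, a pipeline like this could be registered with the azure-mgmt-datafactory Python SDK. This is a sketch, not a complete pipeline: every resource, dataset, and pipeline name below is a placeholder, and the datasets are assumed to already exist in the factory.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink, BlobSource, CopyActivity, DatasetReference, PipelineResource,
)

# Authenticate and point the client at a subscription (placeholder id).
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A Copy Data activity moving data between two pre-defined blob datasets
# (the dataset names are hypothetical).
copy_step = CopyActivity(
    name="CopyRawToStaging",
    inputs=[DatasetReference(type="DatasetReference", reference_name="RawBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="StagingBlobDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Register the pipeline; scheduling, error handling, and monitoring would be
# layered on via triggers and activity policies.
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "DailyIngestPipeline",
    PipelineResource(activities=[copy_step]),
)
```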
Discussing expected and current salary for negotiation purposes.
Be honest about your current salary and provide a realistic expectation for your desired salary.
Highlight your skills and experience that justify your desired salary.
Be open to negotiation and willing to discuss other benefits besides salary.
Research industry standards and salary ranges for similar positions to support your negotiation.
Focus on the value you bring to the company.
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.
Inefficient code can lead to slow performance, such as using collect() on large datasets.
Data skew can cause uneven distribution of data across partitions, impacting processing time.
Resource constraints like insufficient memory or CPU can result in slow Spark jobs.
Improper configuration settings, such as poorly sized executor memory or shuffle partitions, can also degrade performance; the sketch below shows a few common fixes.
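A brief sketch of two of these fixes, assuming hypothetical Parquet inputs: replacing a driver-side collect() with a bounded, distributed preview, and broadcasting a small table so a join neither shuffles the large side nor suffers from key skew.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("perf-demo")
         # Example tuning knob; the right value depends on data volume and cluster size.
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

events = spark.read.parquet("/data/events")        # large fact table (hypothetical)
countries = spark.read.parquet("/data/countries")  # small dimension table

# Anti-pattern: events.collect() pulls every row onto the driver and can
# exhaust its memory. Keep previews distributed and bounded instead:
events.limit(10).show()

# Broadcasting the small table ships a copy to every executor, so the join
# requires no shuffle of the large side and is insensitive to key skew.
joined = events.join(broadcast(countries), "country_code")
joined.groupBy("country_code").count().show()
```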
I applied via Recruitment Consultant and was interviewed in Sep 2024. There were 2 interview rounds.
Accumulators are shared variables that are updated by worker nodes and can be used for aggregating information across tasks.
Accumulators are used for implementing counters and sums in Spark.
Worker tasks can only add to an accumulator; its value can only be read back by the driver program.
Accumulators are useful for debugging and monitoring purposes.
Example: counting the number of errors encountered during processing, as in the sketch below.
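A minimal sketch of that error-counting pattern (the input values are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator-demo").getOrCreate()
sc = spark.sparkContext

# The driver creates the accumulator; tasks may only add to it.
error_count = sc.accumulator(0)

def parse(line):
    try:
        return int(line)
    except ValueError:
        error_count.add(1)  # a worker task records one bad record
        return None

rdd = sc.parallelize(["1", "2", "oops", "4", "bad"])
valid = rdd.map(parse).filter(lambda x: x is not None)
valid.count()  # an action must run before the accumulator is populated

# Only the driver reads the final value.
print(f"errors: {error_count.value}")  # -> errors: 2
```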
Spark architecture is a distributed computing framework that consists of a driver program, cluster manager, and worker nodes.
Spark architecture includes a driver program that manages the execution of the Spark application.
It also includes a cluster manager that allocates resources and schedules tasks on worker nodes.
Worker nodes are responsible for executing the tasks and storing data in memory or disk.
A minimal sketch of how these pieces fit together follows.
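In the local-mode sketch below the "cluster manager" is just threads on one machine, but the driver/executor split is the same as on YARN or Kubernetes:

```python
from pyspark.sql import SparkSession

# The driver program builds the session and plans the job; "local[4]"
# stands in for a real cluster manager such as YARN or Kubernetes.
spark = (SparkSession.builder
         .appName("architecture-demo")
         .master("local[4]")
         .getOrCreate())

# Transformations are lazy: the driver only records the lineage here.
df = spark.range(1_000_000).selectExpr("id % 10 AS key")

# The action triggers the cluster manager to schedule tasks on worker
# nodes (executors), which compute and hold data in memory or on disk.
df.groupBy("key").count().show()
```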
Query to find duplicate data using SQL
Use GROUP BY and HAVING clause to identify duplicate records
Select columns to check for duplicates
Use the COUNT() function to count occurrences of each record, as in the sketch below
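A sketch of that query run through Spark SQL, using a made-up users table where the email column defines a duplicate:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dup-check").getOrCreate()

# Hypothetical table with one duplicated email.
spark.createDataFrame(
    [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com")],
    ["id", "email"],
).createOrReplaceTempView("users")

# Group on the columns that define a duplicate and keep groups seen more than once.
spark.sql("""
    SELECT email, COUNT(*) AS occurrences
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
""").show()  # -> a@x.com | 2
```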
Pub/sub is a messaging pattern where senders (publishers) of messages do not program the messages to be sent directly to specific receivers (subscribers).
Pub/sub stands for publish/subscribe.
Publishers send messages to a topic, and subscribers receive messages from that topic.
It allows for decoupling of components in a system, enabling scalability and flexibility.
Examples include Apache Kafka and Google Cloud Pub/Sub; a minimal sketch follows.
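A minimal Google Cloud Pub/Sub example using the google-cloud-pubsub client; the project, topic, and subscription names are placeholders and are assumed to exist already.

```python
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project_id = "my-project"          # placeholder names
topic_id = "orders"
subscription_id = "orders-worker"

# Publisher: sends to a topic without knowing who will consume the message.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
publisher.publish(topic_path, data=b"order-created:42").result()

# Subscriber: receives whatever is delivered to its subscription on that topic.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message):
    print(f"received: {message.data}")
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
try:
    future.result(timeout=10)  # listen briefly for the demo
except TimeoutError:
    future.cancel()
```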
I have used services like BigQuery, Dataflow, Pub/Sub, and Cloud Storage in GCP.
BigQuery for data warehousing and analytics
Dataflow for real-time data processing
Pub/Sub for messaging and event ingestion
Cloud Storage for storing data and files (see the sketch below)
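A short sketch touching two of those services with the official Python clients; the project, dataset, bucket, and file names are all hypothetical:

```python
from google.cloud import bigquery, storage

# BigQuery: run an analytical query (dataset and table are made up).
bq = bigquery.Client(project="my-project")
rows = bq.query(
    "SELECT status, COUNT(*) AS n FROM `my-project.sales.orders` GROUP BY status"
).result()
for row in rows:
    print(row.status, row.n)

# Cloud Storage: stage a local file for later ingestion.
gcs = storage.Client(project="my-project")
gcs.bucket("my-ingest-bucket").blob("raw/orders.csv").upload_from_filename("orders.csv")
```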
I was approached by the company and interviewed in Nov 2024. There were 3 interview rounds.
I applied via Campus Placement
Based on SQL, statistics, Python, and cognitive ability.
Address toxic work culture by open communication, setting boundaries, seeking support, and considering leaving if necessary.
Open communication with colleagues and management about issues
Set boundaries to protect your mental and emotional well-being
Seek support from HR, a mentor, or a therapist if needed
Consider leaving the toxic work environment if the situation does not improve
I applied via Walk-in and was interviewed in Aug 2024. There were 5 interview rounds.
Maths, grammar & communication
Why do you like this job opportunity?
I applied via Naukri.com and was interviewed in Oct 2024. There was 1 interview round.
Incremental load in PySpark refers to loading only new or updated data into a dataset without reloading the entire dataset.
Use Delta Lake's MERGE operation to upsert only the new or changed records into the target table.
Utilize the 'partitionBy' function to optimize incremental loads by partitioning the data based on specific columns.
Implement logic to identify new or updated records based on timestamps or unique keys, as in the sketch below.
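A sketch of such an incremental load using Delta Lake's MERGE, assuming a Spark session with the Delta extensions enabled; the paths and key column are hypothetical.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Target Delta table and today's extract (paths are hypothetical).
target = DeltaTable.forPath(spark, "/lake/customers")
updates = spark.read.parquet("/staging/customers_delta")

# MERGE applies only new or changed rows instead of rewriting the table.
(target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()     # changed records replace the old version
    .whenNotMatchedInsertAll()  # brand-new records are appended
    .execute())
```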
I applied via Recruitment Consultant and was interviewed in Jul 2024. There was 1 interview round.
Senior Software Engineer | 986 salaries | ₹6 L/yr - ₹23 L/yr
Software Engineer | 505 salaries | ₹3.2 L/yr - ₹13 L/yr
Technical Lead | 399 salaries | ₹10 L/yr - ₹27 L/yr
Software Test Engineer | 136 salaries | ₹2.8 L/yr - ₹11.4 L/yr
Project Manager | 109 salaries | ₹8.5 L/yr - ₹24.5 L/yr