Altimetrik
I applied via Campus Placement and was interviewed in Aug 2021. There were 6 interview rounds.
The second round covered both aptitude and coding. The aptitude part mostly consisted of basic problems, along with some data science questions on bias, statistics, and probability.
There were 2 coding problems; the ones I got were on the easier side and took less than 15 minutes to solve both.
Gradient descent is an optimization algorithm used to minimize the cost function of a machine learning model.
Gradient descent is used to update the parameters of a model to minimize the cost function.
It follows the direction of steepest descent, which is the negative gradient of the cost function.
The learning rate determines the step size of the algorithm.
The update rule is: theta = theta - alpha * dJ(theta)/dtheta, where alpha is the learning rate and J is the cost function.
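A minimal Python sketch of one gradient descent step for linear regression with a mean squared error cost, following the update rule above; the variable names are illustrative, not from the interview.

    import numpy as np

    def gradient_descent_step(theta, X, y, alpha):
        # Predictions for the current parameters
        m = len(y)
        predictions = X @ theta
        # Gradient of the mean squared error cost: (1/m) * X^T (X*theta - y)
        gradient = (1 / m) * X.T @ (predictions - y)
        # Update rule: theta = theta - alpha * gradient
        return theta - alpha * gradient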
A dictionary sorted in ascending order based on keys.
Create a dictionary with key-value pairs
Use the sorted() function to sort the dictionary based on keys
Convert the sorted dictionary into a list of tuples
Use the dict() constructor to create a new dictionary from the sorted list of tuples
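A small Python example of the steps above (the sample data is made up for illustration):

    scores = {"banana": 3, "apple": 5, "cherry": 1}

    # sorted() on a dict iterates over its keys in ascending order
    sorted_items = [(key, scores[key]) for key in sorted(scores)]

    # Rebuild a dictionary from the sorted list of tuples
    sorted_scores = dict(sorted_items)
    print(sorted_scores)  # {'apple': 5, 'banana': 3, 'cherry': 1}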
I applied via Recruitment Consultant and was interviewed in Mar 2021. There was 1 interview round.
Code for parsing a triangle
Use a loop to iterate through each line of the triangle
Split each line into an array of numbers
Store the parsed numbers in a 2D array or a list of lists
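A possible Python sketch, assuming the triangle arrives as lines of space-separated numbers (the exact input format is an assumption):

    triangle_text = """1
    2 3
    4 5 6"""

    triangle = []
    for line in triangle_text.splitlines():
        # Split each line into numbers and store them as one row
        row = [int(value) for value in line.split()]
        triangle.append(row)

    print(triangle)  # [[1], [2, 3], [4, 5, 6]]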
An ASCII value is the numerical representation of a character; there are separate ranges for capital (uppercase) and small (lowercase) letters.
ASCII values range from 65 to 90 for capital letters A to Z.
ASCII values range from 97 to 122 for small letters a to z.
For example, the ASCII value of 'A' is 65 and the ASCII value of 'a' is 97.
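In Python, ord() and chr() convert between characters and their ASCII values:

    print(ord('A'), ord('Z'))   # 65 90
    print(ord('a'), ord('z'))   # 97 122
    print(chr(65), chr(97))     # A a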
I applied via Campus Placement and was interviewed before Jul 2020. There was 1 interview round.
I rate myself highly in PL/SQL with expertise in mview, index, CTE, and merge statement.
I have extensive knowledge and experience in writing PL/SQL code.
I am proficient in creating and managing materialized views (mview) to improve query performance.
I am skilled in creating and managing indexes to optimize database performance.
I am well-versed in using Common Table Expressions (CTE) for complex queries and recursive operations, and in using the MERGE statement for upserts.
I applied via Naukri.com and was interviewed in Sep 2024. There was 1 interview round.
To create a pipeline in ADF, you can use the Azure Data Factory UI or code-based approach.
Use Azure Data Factory UI to visually create and manage pipelines
Use code-based approach with JSON to define pipelines and activities
Add activities such as data movement, data transformation, and data processing to the pipeline
Set up triggers and schedules for the pipeline to run automatically
Activities in pipelines include data extraction, transformation, loading, and monitoring.
Data extraction: Retrieving data from various sources such as databases, APIs, and files.
Data transformation: Cleaning, filtering, and structuring data for analysis.
Data loading: Loading processed data into a data warehouse or database.
Monitoring: Tracking the performance and health of the pipeline to ensure data quality and reliability.
getmetadata is used to retrieve metadata information about a dataset or data source.
getmetadata can provide information about the structure, format, and properties of the data.
It can be used to understand the data schema, column names, data types, and any constraints or relationships.
This information is helpful for data engineers to properly process, transform, and analyze the data.
For example, getmetadata can be used to list the files in a folder or check a file's size and last-modified date before processing.
Triggers in databases are special stored procedures that are automatically executed when certain events occur.
Types of triggers include: DML triggers (for INSERT, UPDATE, DELETE operations), DDL triggers (for CREATE, ALTER, DROP operations), and logon triggers.
Triggers can be classified as row-level triggers (executed once for each row affected by the triggering event) or statement-level triggers (executed once for each triggering statement).
Normal cluster is used for interactive workloads while job cluster is used for batch processing in Databricks.
Normal cluster is used for ad-hoc queries and exploratory data analysis.
Job cluster is used for running scheduled jobs and batch processing tasks.
Normal cluster is terminated after a period of inactivity, while job cluster is terminated after the job completes.
Normal cluster is more cost-effective for short-lived interactive sessions shared by several users, while job clusters are generally cheaper for automated workloads because they exist only for the duration of the job.
Slowly changing dimensions refer to data warehouse dimensions that change slowly over time.
SCDs are used to track historical changes in data over time.
There are three types of SCDs - Type 1, Type 2, and Type 3.
Type 1 SCDs overwrite old data with new data, Type 2 creates new records for changes, and Type 3 maintains both old and new data in separate columns.
Example: A customer's address changing would be a Type 2 SCD.
Use Python's 'with' statement to ensure proper resource management and exception handling.
Use 'with' statement to automatically close files after use
Helps in managing resources like database connections
Ensures proper cleanup even in case of exceptions
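A short example of the 'with' statement for file handling; the file name is just a placeholder:

    # The file is closed automatically, even if an exception is raised inside the block
    with open("data.txt", "w") as f:
        f.write("hello\n")

    # A manual version would need try/finally and an explicit f.close()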
List is mutable, tuple is immutable in Python.
List can be modified after creation, tuple cannot be modified.
List uses square brackets [], tuple uses parentheses ().
Lists are used for collections of items that may need to be changed, tuples are used for fixed collections of items.
Example: list_example = [1, 2, 3], tuple_example = (4, 5, 6)
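A quick demonstration of the mutability difference:

    list_example = [1, 2, 3]
    tuple_example = (4, 5, 6)

    list_example.append(4)      # works: lists are mutable
    # tuple_example.append(7)   # would raise AttributeError: tuples have no append
    # tuple_example[0] = 9      # would raise TypeError: tuples are immutable
    print(list_example)         # [1, 2, 3, 4]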
Datalake 1 and Datalake 2 are both storage systems for big data, but they may differ in terms of architecture, scalability, and use cases.
Datalake 1 may use a Hadoop-based architecture while Datalake 2 may use a cloud-based architecture like AWS S3 or Azure Data Lake Storage.
Datalake 1 may be more suitable for on-premise data storage and processing, while Datalake 2 may offer better scalability and flexibility for cloud-native workloads.
To read a file in Databricks, you can use the Databricks File System (DBFS) or Spark APIs.
Use dbutils.fs.ls('dbfs:/path/to/file') to list files in DBFS
Use spark.read.format('csv').load('dbfs:/path/to/file') to read a CSV file
Use spark.read.format('parquet').load('dbfs:/path/to/file') to read a Parquet file
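A slightly fuller PySpark sketch, assuming a CSV with a header row; the paths and options are illustrative, and dbutils is only available inside Databricks notebooks:

    # List files under a DBFS folder
    files = dbutils.fs.ls("dbfs:/path/to/folder")

    # Read a CSV file with a header and an inferred schema
    df = (spark.read
          .format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("dbfs:/path/to/file.csv"))
    df.show(5)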
Star schema is denormalized with one central fact table surrounded by dimension tables, while snowflake schema is normalized with multiple related dimension tables.
Star schema is easier to understand and query due to denormalization.
Snowflake schema saves storage space by normalizing data.
Star schema is better for data warehousing and OLAP applications.
Snowflake schema is better when dimension data has complex hierarchies and relationships that benefit from normalization.
repartition increases partitions while coalesce decreases partitions in Spark
repartition shuffles data and can be used for increasing partitions for parallelism
coalesce reduces partitions without shuffling data, useful for reducing overhead
repartition is more expensive than coalesce as it involves data movement
example: df.repartition(10) vs df.coalesce(5)
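A small PySpark illustration of the partition counts before and after, assuming df is an existing DataFrame:

    print(df.rdd.getNumPartitions())        # current number of partitions

    df_more = df.repartition(10)            # full shuffle, can increase partitions
    print(df_more.rdd.getNumPartitions())   # 10

    df_less = df_more.coalesce(5)           # merges partitions without a full shuffle
    print(df_less.rdd.getNumPartitions())   # 5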
Parquet file format is a columnar storage format used for efficient data storage and processing.
Parquet files store data in a columnar format, which allows for efficient querying and processing of specific columns without reading the entire file.
It supports complex nested data structures like arrays and maps.
Parquet files are highly compressed, reducing storage space and improving query performance.
It is commonly used with big data processing engines such as Spark and Hive.
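A minimal PySpark example of writing and reading Parquet; the path and column names are placeholders:

    # Write a DataFrame to Parquet: column-oriented layout plus compression
    df.write.mode("overwrite").parquet("dbfs:/tmp/events_parquet")

    # Reading back only the needed columns avoids scanning the whole file
    events = spark.read.parquet("dbfs:/tmp/events_parquet").select("event_id", "event_date")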
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
Spark performance problems can arise due to inefficient code, data skew, resource constraints, and improper configuration.
Inefficient code can lead to slow performance, such as using collect() on large datasets.
Data skew can cause uneven distribution of data across partitions, impacting processing time.
Resource constraints like insufficient memory or CPU can result in slow Spark jobs.
Improper configuration settings, such as too few shuffle partitions or undersized executors, can also degrade performance.
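One common fix, sketched in PySpark: avoid pulling an entire dataset to the driver with collect() when only a sample or an aggregate is needed (df and the output path are placeholders):

    # Risky on large data: materializes every row on the driver
    # rows = df.collect()

    # Safer alternatives
    sample_rows = df.take(10)      # only a few rows reach the driver
    row_count = df.count()         # aggregate computed on the executors
    df.write.mode("overwrite").parquet("dbfs:/tmp/output")   # keep large results distributed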
Improving query performance by optimizing indexes, using proper data types, and minimizing data retrieval.
Optimize indexes on frequently queried columns
Use proper data types to reduce storage space and improve query speed
Minimize data retrieval by only selecting necessary columns
Avoid using SELECT * in queries
Use query execution plans to identify bottlenecks and optimize accordingly
SCD type2 table is used to track historical changes in data by creating new records for each change.
Contains current and historical data
New records are created for each change
Includes effective start and end dates for each record
Requires additional columns like surrogate keys and version numbers
Used for slowly changing dimensions in data warehousing
I applied via LinkedIn and was interviewed in Jul 2023. There were 3 interview rounds.
They asked Python coding questions and SQL queries.
I applied via Naukri.com and was interviewed in Apr 2023. There were 3 interview rounds.
Finding the indices of 2 numbers in a list whose sum equals a target, without a nested for loop.
Use dictionary to store the difference between target and each element of list.
Iterate through list and check if element is in dictionary.
Return the indices of the two elements that add up to target.
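A standard single-pass Python implementation of this idea:

    def two_sum(nums, target):
        seen = {}  # maps value -> index
        for i, value in enumerate(nums):
            complement = target - value
            if complement in seen:
                return [seen[complement], i]
            seen[value] = i
        return None  # no pair adds up to target

    print(two_sum([2, 7, 11, 15], 9))  # [0, 1]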
Random forest and KNN are machine learning algorithms used for classification and regression tasks.
Random forest is an ensemble learning method that constructs multiple decision trees and combines their outputs to make a final prediction.
KNN (k-nearest neighbors) is a non-parametric algorithm that classifies new data points based on the majority class of their k-nearest neighbors in the training set.
Random forest is usually more robust on larger, higher-dimensional datasets, while KNN is simple but sensitive to feature scaling and slow at prediction time on large training sets.
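A brief scikit-learn sketch comparing the two algorithms on a toy dataset, purely for illustration:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

    print("Random forest accuracy:", rf.score(X_test, y_test))
    print("KNN accuracy:", knn.score(X_test, y_test))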
To find unique keys in 2 dictionaries.
Create a set of keys for each dictionary
Use set operations to find the unique keys
Return the unique keys
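A compact Python version using set operations (the sample data is made up):

    d1 = {"a": 1, "b": 2, "c": 3}
    d2 = {"b": 20, "d": 40}

    # Keys that appear in exactly one of the two dictionaries
    unique_keys = set(d1) ^ set(d2)           # symmetric difference
    print(unique_keys)                        # {'a', 'c', 'd'}

    # Keys only in d1, and keys only in d2
    print(set(d1) - set(d2), set(d2) - set(d1))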
AWS EC2 model deployment involves creating an instance, installing necessary software, and deploying the model.
Create an EC2 instance with the desired specifications
Install necessary software and dependencies on the instance
Upload the model and any required data to the instance
Deploy the model using a web server or API
Monitor the instance and model performance for optimization
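A minimal Flask sketch of the "deploy behind an API" step; the model file name, feature format, and port are assumptions, not from the interview.

    import pickle
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Hypothetical model file copied onto the EC2 instance
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]   # e.g. {"features": [1.2, 3.4]}
        prediction = model.predict([features])
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        # On EC2 you would typically run this behind gunicorn/nginx instead
        app.run(host="0.0.0.0", port=5000)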
Overloading is the ability to define multiple methods with the same name but different parameters.
Overloading allows for more flexibility in method naming and improves code readability.
Examples include defining multiple constructors for a class with different parameter lists or defining a method that can accept different data types as input.
Overloading is resolved at compile time based on the number and types of arguments (in languages such as Java and C++).
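Python does not resolve overloads at compile time; its closest built-in analogue is functools.singledispatch, sketched here purely as an illustration:

    from functools import singledispatch

    @singledispatch
    def describe(value):
        return f"generic value: {value}"

    @describe.register(int)
    def _(value):
        return f"integer: {value}"

    @describe.register(str)
    def _(value):
        return f"string: '{value}'"

    print(describe(42))       # integer: 42
    print(describe("hello"))  # string: 'hello'
    print(describe(3.5))      # generic value: 3.5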