Data Scientist - LLM Models (16-20 yrs)
Varite
posted 5d ago
About The Job:
As an LLM Data Scientist, you will be at the forefront of leveraging large language models to extract valuable insights and patterns from textual data. Your role will involve harnessing the power of advanced natural language processing (NLP) techniques to enhance data-driven decision-making and contribute to the success of our LLM project.
Essential Job Functions:
NLP and Language Model Integration - Apply advanced NLP techniques to preprocess, analyze, and extract meaningful information from large textual datasets. Integrate and leverage large language models such as Llama 2/3, Mistral, or similar offline LLMs to address project-specific goals.
Small LLMs / Tiny LLMs - Familiarity with small language models (SLMs) and tiny LLMs such as Phi-3 and OpenELM, including their performance characteristics, usage requirements, and the nuances of how use-case applications can consume them.
Feature Engineering for Text Data - Engineer features to capture the nuances of textual data and optimize model performance. Implement strategies for effective representation of linguistic structures within the project's context.
Model Fine-Tuning and Optimization - Fine-tune pre-trained offline large language models such as Llama 2/3 to align with the project's objectives and enterprise-specific requirements. Optimize language models for efficiency, coherence, and relevance within the project scope. Demonstrate a solid understanding of the fine-tuning recipes and retrieval-augmented generation (RAG) techniques applied to base LLMs.
Performance Monitoring and Optimization - Evaluate the performance of LLMs and implement optimizations to enhance efficiency and accuracy.
Text Generation and Creative Applications - Explore creative applications of large language models, including text generation, summarization, and context-aware responses.
Qualifications:
16 to 20 years of experience
Skills & Tools:
Programming Languages
- Proficiency in Python for data analysis, statistical modeling, and machine learning.
Machine Learning Libraries
- Hands-on experience with machine learning libraries such as scikit-learn, Hugging Face, TensorFlow, and PyTorch.
- Understanding of fine-tuning recipes, covering data preparation and training objectives (such as next-token prediction and fill-in-the-middle), for both code and documents.
- Experience configuring fine-tuning infrastructure, such as GPU memory and compute, for different fine-tuning techniques and model sizes.
- Solid understanding of fine-tuning techniques, including full fine-tuning and PEFT methods such as LoRA and QLoRA, and of which strategy to adopt for a given use case.
Statistical Analysis
- Strong understanding of statistical techniques and their application in data analysis.
Data Manipulation and Analysis
- Expertise in data manipulation and analysis using Pandas and NumPy.
Communication Skills
- Excellent communication skills with the ability to convey technical concepts to non-technical audiences.
Functional Areas: Analytics & Business Intelligence