At C4Scale, we are seeking passionate engineers for a role focused on the design, development, fine-tuning, and optimization of Generative AI LLM models and applications, leveraging cutting-edge technologies to solve real-world problems.
You will play a key part in building GenAI-driven products at a YC-funded company.
You will work with smart engineers who are building SOTA technologies to deliver APIs and data at millisecond latencies.
What You Will Do:
Fine-tune large language models (LLMs) on code and text datasets to optimize performance for specific use cases and applications.
Develop and deploy LLM-based and NLP applications leveraging state-of-the-art AI/ML technologies.
Design and maintain scalable Python API backends to serve models in production environments, ensuring high performance, scalability, security, and reliability.
Conduct experiments to improve model accuracy, efficiency, and latency through hyperparameter tuning, optimization techniques, and advanced training strategies.
Collaborate with product teams to align AI capabilities with business goals.
Monitor, evaluate, and fine-tune deployed models to ensure they meet KPIs and deliver an excellent user experience.
Stay updated on advancements in AI/ML, Generative AI, and related fields to incorporate new techniques and tools into the development process.
What You Will Need:
A minimum of 4 years of overall experience, including at least one project involving fine-tuning models.
Strong experience building domain-specific LLMs by fine-tuning and deploying large language models such as GPT, LLaMA, Mistral, T5, or similar.
Proficiency in Python and experience with frameworks like PyTorch or TensorFlow for AI/ML development.
Proficiency in building Retrieval-Augmented Generation (RAG) solutions with vector databases (e.g., Pinecone, ChromaDB), frameworks such as LangChain and LlamaIndex, and agentic frameworks such as Langflow or Crew AI for automation tasks.
Expertise in building and managing APIs using Python-based frameworks such as FastAPI, Flask, or Django.
Preferred Qualifications / Added Advantage:
Solid understanding of transformer architectures and knowledge of pre-training, fine-tuning, and transfer learning methodologies.
A proactive approach to solving complex problems, coupled with excellent collaboration and communication skills.
Understanding of containerization and orchestration technologies (Docker, Kubernetes).
Knowledge of any of the major cloud platforms (AWS, GCP, Azure) and MLOps tools for managing AI/ML workloads in production.
What We Offer:
macOS laptop.
Competitive salary and benefits package.
Opportunity to work on cutting-edge technologies and challenging projects.