Overview
Learning Materials
- Intro to Large Language Models by Andrej Karpathy
- Short Courses by DeepLearning.AI
- What We Learned from a Year of Building with LLMs (Part I)
Local LLMs
Evaluations
- A Practical Guide to RAG Pipeline Evaluation (Part 1: Retrieval)
- A Practical Guide to RAG Pipeline Evaluation (Part 2: Generation)
- How important is a Golden Dataset for LLM evaluation?
- Case Study: Reference-free vs Reference-based evaluation of RAG pipeline
- How to evaluate complex GenAI Apps: a granular approach
- Generate Synthetic Data to Test LLM Applications
Leaderboards
Language Models
- LMSYS Chatbot Arena
- SEAL Leaderboards
- Comparing models for quality, speed, price, etc.
- Hallucinations: Vectara, Hallucinations
- Code Generation: BigCode
- Tools/Functions: Gorilla, Nexus, ToolBench
- Performance (latency, throughput, memory, etc.)
- Enterprise Scenarios