Skill · Cliptics · ai research · v1.0.0 · MIT

Enterprise-grade skill with production-ready patterns. Includes structured workflows, validation checks, and reusable patterns for AI research.

Pro LLM Workspace

Overview

A professional LLM development workspace is a structured environment for building, testing, evaluating, and deploying large language model applications. It encompasses local model serving, prompt engineering workflows, evaluation pipelines, experiment tracking, and integration with both local and cloud-hosted models. This template provides a comprehensive guide to setting up a workspace that supports the full LLM development lifecycle, from rapid prototyping with local models to production deployment with cloud APIs. The workspace integrates tools like Ollama and LM Studio for local inference, LangChain and LlamaIndex for application frameworks, and evaluation tooling for systematic quality assessment. Whether you are building RAG systems, fine-tuning models, or developing agent workflows, this workspace configuration ensures reproducibility, fast iteration, and smooth transitions from development to production.

When to Use

  • Rapid LLM prototyping: Quickly test ideas with local models before committing to cloud API costs.
  • Prompt engineering workflow: Systematically develop, version, and evaluate prompts across multiple models.
  • RAG application development: Build and iterate on retrieval-augmented generation systems with local vector stores.
  • Model comparison: Evaluate multiple models (local and cloud) on the same tasks to select the best fit.
  • Fine-tuning workflows: Prepare datasets, run fine-tuning jobs, and evaluate fine-tuned models.
  • Team LLM development: Establish consistent development practices across a team building LLM applications.

Quick Start

Core Tool Installation

```bash
# Local model serving
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama pull nomic-embed-text

# Python environment
python -m venv llm-workspace && source llm-workspace/bin/activate
pip install langchain langchain-openai langchain-community
pip install chromadb sentence-transformers
pip install jupyter ipykernel

# Evaluation and experiment tracking
pip install ragas mlflow promptfoo
```

Workspace Directory Structure

```
llm-workspace/
├── prompts/              # Versioned prompt templates
│   ├── v1/
│   └── v2/
├── data/                 # Training and evaluation data
│   ├── eval-sets/
│   └── fine-tune/
├── notebooks/            # Exploration and prototyping
├── src/                  # Application source code
│   ├── chains/
│   ├── agents/
│   └── tools/
├── evals/                # Evaluation scripts and results
├── configs/              # Model and environment configs
│   ├── models.yaml
│   └── .env
├── scripts/              # Utility scripts
└── mlruns/               # MLflow experiment tracking
```

Verify Setup

```python
# Test local model
import ollama

response = ollama.chat(model="llama3.1:8b", messages=[
    {"role": "user", "content": "Hello, confirm you are running locally."}
])
print(response["message"]["content"])

# Test cloud model
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
print(llm.invoke("Confirm cloud connectivity.").content)
```

Core Concepts

Model Configuration Layer

```yaml
# configs/models.yaml
models:
  local:
    fast:
      provider: ollama
      model: llama3.1:8b
      base_url: http://localhost:11434
      use_for: [prototyping, testing, simple-tasks]
    embed:
      provider: ollama
      model: nomic-embed-text
      base_url: http://localhost:11434
  cloud:
    primary:
      provider: openai
      model: gpt-4o
      use_for: [complex-reasoning, production]
    fast:
      provider: openai
      model: gpt-4o-mini
      use_for: [classification, simple-generation]
    embed:
      provider: openai
      model: text-embedding-3-small

routing:
  development: local.fast
  evaluation: cloud.primary
  production: cloud.primary
  embedding: local.embed
```

Model Router Implementation

```python
import yaml
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama


class ModelRouter:
    def __init__(self, config_path="configs/models.yaml"):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

    def get_model(self, purpose: str = "development"):
        # Fall back to the development route when the purpose is unknown
        routing = self.config["routing"]
        route = routing.get(purpose, routing["development"])
        category, name = route.split(".")
        model_config = self.config["models"][category][name]
        if model_config["provider"] == "ollama":
            return Ollama(
                model=model_config["model"],
                base_url=model_config["base_url"],
            )
        if model_config["provider"] == "openai":
            return ChatOpenAI(model=model_config["model"])
        raise ValueError(f"Unknown provider: {model_config['provider']}")


router = ModelRouter()
dev_model = router.get_model("development")   # Local Ollama
prod_model = router.get_model("production")   # Cloud GPT-4o
```

Prompt Version Management

```python
import json
from pathlib import Path
from datetime import datetime


class PromptManager:
    def __init__(self, prompts_dir="prompts"):
        self.dir = Path(prompts_dir)
        self.dir.mkdir(exist_ok=True)

    def save(self, name: str, template: str, metadata: dict = None):
        version = datetime.now().strftime("%Y%m%d_%H%M%S")
        prompt_dir = self.dir / name / version
        prompt_dir.mkdir(parents=True, exist_ok=True)
        (prompt_dir / "template.txt").write_text(template)
        (prompt_dir / "metadata.json").write_text(
            json.dumps({
                "version": version,
                "created": datetime.now().isoformat(),
                **(metadata or {}),
            }, indent=2)
        )
        return version

    def load(self, name: str, version: str = "latest") -> str:
        prompt_dir = self.dir / name
        if version == "latest":
            # Timestamped directory names sort chronologically
            versions = sorted(prompt_dir.iterdir())
            prompt_dir = versions[-1]
        else:
            prompt_dir = prompt_dir / version
        return (prompt_dir / "template.txt").read_text()


pm = PromptManager()
pm.save("summarizer", "Summarize the following:\n{text}\n\nKey points:")
template = pm.load("summarizer", "latest")
```

Evaluation Pipeline

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    input: str
    expected: str
    tags: list[str] = None


def run_evaluation(model, eval_set: list[EvalCase], judge_model=None):
    results = []
    for case in eval_set:
        output = model.invoke(case.input)
        score = judge_model.invoke(
            f"Rate 1-5 how well this output matches the expected.\n"
            f"Output: {output}\nExpected: {case.expected}\nScore:"
        ) if judge_model else None
        results.append({
            "input": case.input,
            "output": str(output),
            "expected": case.expected,
            "score": score,
            "tags": case.tags,
        })
    return results


# Usage
eval_set = [
    EvalCase("Summarize: AI is transforming...", "AI transforms industries..."),
    EvalCase("Summarize: Climate change...", "Climate change impacts..."),
]
results = run_evaluation(
    model=router.get_model("development"),
    eval_set=eval_set,
    judge_model=router.get_model("evaluation"),
)
```

Configuration Reference

| Tool | Purpose | Configuration |
| --- | --- | --- |
| Ollama | Local model serving | `OLLAMA_HOST=0.0.0.0:11434` |
| LM Studio | GUI model management | Default port 1234 |
| MLflow | Experiment tracking | `MLFLOW_TRACKING_URI=./mlruns` |
| ChromaDB | Local vector store | `CHROMA_PERSIST_DIR=./chroma_db` |
| Promptfoo | Prompt evaluation | `promptfoo.yaml` config file |
| LangSmith | Cloud tracing | `LANGCHAIN_TRACING_V2=true` |
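The Promptfoo row above refers to a config file; a minimal sketch is shown below. The field names follow Promptfoo's documented schema, but the provider identifiers and test values here are illustrative, so verify against the version you install:

```yaml
# promptfoo.yaml — minimal evaluation config (illustrative)
prompts:
  - "Summarize the following:\n{{text}}\n\nKey points:"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "AI is transforming industries..."
    assert:
      - type: contains
        value: "AI"
```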

Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| `OPENAI_API_KEY` | OpenAI API key | `sk-...` |
| `ANTHROPIC_API_KEY` | Anthropic API key | `sk-ant-...` |
| `OLLAMA_HOST` | Ollama server address | `http://localhost:11434` |
| `LANGCHAIN_TRACING_V2` | Enable LangSmith tracing | `true` |
| `MLFLOW_TRACKING_URI` | MLflow tracking server | `./mlruns` |
| `HF_TOKEN` | Hugging Face access token | `hf_...` |
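These variables can be collected in the `configs/.env` file from the directory structure above. All values here are placeholders:

```shell
# configs/.env — placeholder values; never commit real keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
OLLAMA_HOST=http://localhost:11434
LANGCHAIN_TRACING_V2=true
MLFLOW_TRACKING_URI=./mlruns
HF_TOKEN=hf_...
```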

Best Practices

  1. Use local models for development, cloud for evaluation: Prototype rapidly with Ollama-served models to avoid API costs, then validate with cloud models before production deployment.

  2. Version everything: Track prompts, evaluation datasets, model configurations, and results with explicit versioning. Use Git for code and prompts, DVC for large datasets.

  3. Build evaluation sets early: Create evaluation datasets from day one. Even 20-30 curated examples provide valuable signal for detecting regressions as you iterate on prompts and models.

  4. Implement a model router: Abstract model selection behind a router that switches between local and cloud models based on the development stage. This makes it trivial to upgrade models or switch providers.

  5. Use structured output schemas: Define Pydantic models for LLM outputs from the start. This catches format errors early and makes downstream processing reliable.

  6. Track experiments with MLflow: Log every significant prompt change, model swap, or parameter tweak as an MLflow experiment. This creates an auditable history of what worked and what did not.

  7. Separate concerns in your codebase: Keep prompts, chains, tools, and evaluation logic in separate directories. This makes it easy to test and swap components independently.

  8. Test with multiple models: Regularly run your evaluation suite against different models to avoid overfitting your prompts to a single model's quirks.

  9. Automate quality gates: Set up CI scripts that run evaluation suites and block merges when quality metrics drop below thresholds.

  10. Document model-specific behaviors: Keep notes on model-specific prompt patterns, token limits, and known failure modes. This institutional knowledge accelerates onboarding and debugging.
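Best practice 5 can be sketched with a minimal Pydantic schema. The `Summary` model and the raw JSON string are illustrative; in a real chain the string would come back from the LLM (or you could pass the schema to LangChain's `with_structured_output` helper):

```python
# Hypothetical schema for a summarization task: the LLM is instructed to
# return JSON matching this model, and validation catches format drift early.
from pydantic import BaseModel, ValidationError


class Summary(BaseModel):
    title: str
    key_points: list[str]
    confidence: float


# Simulated raw model output (in practice, the LLM response text)
raw = '{"title": "AI in industry", "key_points": ["adoption is rising"], "confidence": 0.8}'

try:
    summary = Summary.model_validate_json(raw)
    print(summary.key_points)
except ValidationError as e:
    print(f"Malformed LLM output: {e}")
```

Validation failures surface immediately as `ValidationError` rather than as silent downstream bugs.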
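The quality gates from best practice 9 can be sketched as a small script that CI runs after the evaluation suite. The threshold, metric, and results format are assumptions; adapt them to your eval output:

```python
# Minimal quality-gate check: average the eval scores and report pass/fail.
# In CI you would call sys.exit(0 or 1) on the result to block the merge.
import json
import tempfile

THRESHOLD = 0.85  # illustrative threshold; tune per project


def check_gate(results_path: str) -> bool:
    """Return True when the average eval score clears the threshold."""
    with open(results_path) as f:
        results = json.load(f)
    scores = [r["score"] for r in results if r.get("score") is not None]
    avg = sum(scores) / len(scores) if scores else 0.0
    print(f"average score: {avg:.3f} (threshold {THRESHOLD})")
    return avg >= THRESHOLD


# Demo with a synthetic results file; in CI, point this at your eval output
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"score": 0.9}, {"score": 0.8}], f)
    demo_path = f.name

passed = check_gate(demo_path)
print("gate passed" if passed else "gate failed")
```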

Troubleshooting

**Ollama model download fails or hangs.** Check available disk space (models range from 4 GB to 40 GB+). Verify network connectivity with `curl http://localhost:11434/api/tags`. Restart the Ollama service with `ollama serve` and retry the pull.

**Local model responses are slow.** Ensure you are using a quantized model appropriate for your hardware. On CPU-only machines, use Q4_K_M quantizations. Check that no other processes are consuming GPU memory with `nvidia-smi`.

**LangChain version conflicts.** Pin your LangChain versions explicitly: `langchain==0.3.x`, `langchain-openai==0.2.x`. Use a virtual environment per project and avoid mixing LangChain v0.2 and v0.3 dependencies.

**Evaluation results are inconsistent across runs.** Set temperature to 0 for evaluation runs. Use fixed random seeds where available. For LLM-as-a-judge evaluations, run multiple judge passes and average the scores.
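The multi-pass judging suggestion can be sketched as follows. `judge_score` is a hypothetical helper, and it assumes the judge replies with free text containing a 1-5 integer score:

```python
import re
import statistics


def judge_score(judge_model, prompt: str, passes: int = 3) -> float:
    """Run the judge several times and average the extracted 1-5 scores."""
    scores = []
    for _ in range(passes):
        reply = str(judge_model.invoke(prompt))
        match = re.search(r"[1-5]", reply)  # first digit in range counts as the score
        if match:
            scores.append(int(match.group()))
    return statistics.mean(scores) if scores else 0.0
```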

**Memory issues with large models locally.** Monitor RAM with `htop`. For models larger than available RAM, use smaller quantizations or switch to API-based models. Close other memory-intensive applications during local inference.
