From: Svjatoslav Agejenko Date: Sun, 7 Dec 2025 11:50:56 +0000 (+0200) Subject: Update example configuration and documentation to include additional model parameters... X-Git-Url: http://www2.svjatoslav.eu/gitweb/?a=commitdiff_plain;h=cb728a0003d5a3f8370c6248d57a74aceb2b1fb1;p=alyverkko-cli.git Update example configuration and documentation to include additional model parameters and detailed concept glossary for Älyverkko CLI usage. --- diff --git a/doc/examples/alyverkko-cli.yaml b/doc/examples/alyverkko-cli.yaml index 5becf00..b84f709 100644 --- a/doc/examples/alyverkko-cli.yaml +++ b/doc/examples/alyverkko-cli.yaml @@ -1,16 +1,33 @@ -tasks_directory: "/home/user/AI/tasks" -models_directory: "/home/user/AI/models" +tasks_directory: "/home/john/AI/tasks" +models_directory: "/home/john/AI/models" +skills_directory: "/home/john/.config/alyverkko-cli/skills" +llama_cli_path: "/home/john/AI/llama.cpp/build/bin/llama-cli" + default_temperature: 0.7 -llama_cpp_dir_path: "/home/user/AI/llama.cpp/" + batch_thread_count: 10 thread_count: 6 -skills_directory: "/home/user/.config/alyverkko-cli/skills" + models: + + - alias: "qwen3-next-80b-A3B-thinking" + filesystem_path: "Qwen3-Next-80B-A3B-Thinking-UD-Q4_K_XL.gguf" + context_size_tokens: 131072 + temperature: 0.6 + top_p: 0.95 + top_k: 20 + min_p: 0 + - alias: "default" - filesystem_path: "WizardLM-2-8x22B.Q5_K_M-00001-of-00005.gguf" - context_size_tokens: 64000 - end_of_text_marker: null - - alias: "mistral" - filesystem_path: "Mistral-Large-Instruct-2407.Q8_0.gguf" - context_size_tokens: 32768 - end_of_text_marker: null + filesystem_path: "Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q8_0.gguf" + context_size_tokens: 131072 + temperature: 0.85 + repeat_penalty: 1.1 + top_p: 0.95 + + - alias: "tongyi" + filesystem_path: "Alibaba-NLP_Tongyi-DeepResearch-30B-A3B-Q8_0.gguf" + context_size_tokens: 131072 + temperature: 0.85 + repeat_penalty: 1.1 + top_p: 0.95 diff --git a/doc/index.org b/doc/index.org index aec64ac..35e2e7e 100644 --- a/doc/index.org +++ b/doc/index.org @@ -130,14 +130,6 @@ Draft an outline for a book on science fiction or improve its plot. Here is [[https://www.svjatoslav.eu/writing/Whispers%20in%20the%20Stream%20of%20Time.html][example sci-fi book]] that was written with the help of *Älyverkko CLI*. -* Getting started - -When you first encounter Älyverkko CLI, the setup process might seem -involved compared to cloud-based AI services. That's completely -understandable! Let me walk you through why each step exists and how -it ultimately creates a powerful, private, and cost-effective AI -assistant that works /for you/. - ** Why Bother With This Setup? (The Big Picture) Before diving into steps, let's address the elephant in the room: *Why @@ -158,6 +150,564 @@ This isn't designed for real-time chatting (CPU inference is slow), but for substantial tasks where quality matters more than speed: code generation, document analysis, content creation, etc. +* Concept glossary +** General concepts +*** Task + +A /task/ represents a single unit of work submitted to the Älyverkko +CLI system for AI processing. It consists of two core components: + +- a [[id:89bc60f0-89d4-4e10-ae80-8f824f2e3c55][system prompt]] (defining the AI's role/behavior) and +- a [[id:009e5410-f852-4faa-b81a-f9c98b056ae3][user prompt]] (the specific request or question). + +Tasks are implemented as plain text files that begin with a +[[id:cd4b622a-6b74-4fac-85fe-f5b056367824]["TOCOMPUTE" header]] line specifying processing parameters. 
When +processed, the system appends the AI's response in structured format +and renames the file with a =DONE:= prefix. Tasks represent the +fundamental interaction pattern between users and the system - you +create a task file, Älyverkko CLI processes it while you work on other +things, and later you receive the completed response. The asynchronous +nature makes this ideal for CPU-based batch processing where responses +may take minutes to hours. + +*** Skill +A /skill/ is a predefined behavioral configuration for the AI, +implemented as a YAML file in the skills directory. Each skill +defines: + +- A [[id:89bc60f0-89d4-4e10-ae80-8f824f2e3c55][system prompt]] that establishes the AI's role and behavior + +- Optional generation parameters ([[id:24a0a54b-828b-4c78-8208-504390848fbc][temperature]], [[id:047f5bf7-e964-49ac-a666-c7ac75754e54][top-p]] , etc.) + +- The == placeholder where user input gets injected + +Skills function as specialized "personas" for different task +types. For example, a =summary.yaml= skill might contain instructions +for concise text summarization, while a =writer.yaml= skill could +optimize for creative prose. The power of skills lies in their +reusability - once defined, you can apply the same behavioral +configuration across countless tasks by simply referencing the skill +name in your task [[id:cd4b622a-6b74-4fac-85fe-f5b056367824][TOCOMPUTE:]] header. Skills abstract away repetitive +instructions, letting you focus on the actual content of your request +rather than constantly redefining how the AI should behave. + +*** Model + +A /model/ refers to a specific AI language model implementation in +GGUF format, capable of processing tasks. Each model is configured +with: +- An alias (e.g., "default", "mistral") +- File name of the GGUF model file +- Context size (maximum tokens processable) +- Optional generation parameters +- Optional end-of-text marker + +Models represent the underlying neural network "brains" of the system. +While skills define /how/ the AI should behave, models determine +/what/ the AI is capable of. Larger models (e.g., 70B+ parameters) +generally produce higher quality outputs but require more RAM and +process slower. The system supports multiple registered models, +allowing you to select the appropriate capability/performance tradeoff +for each task via the =model== parameter in your task file. Models are +typically stored in the =models_directory= and must be compatible with +llama.cpp for CPU-based inference. + +*** "TOCOMPUTE" Marker +:PROPERTIES: +:ID: cd4b622a-6b74-4fac-85fe-f5b056367824 +:END: + +The =TOCOMPUTE:= marker is a special header line that /must/ appear as +the first line of any task file to trigger processing. + +Example: +#+begin_example +TOCOMPUTE: skill=default model=default priority=5 +#+end_example + +This line specifies three critical parameters: +- =skill==: Which behavioral configuration to use (default: "default") +- =model==: Which AI model to execute the task (default: "default") +- =priority==: Integer determining processing order (higher = sooner) + +The presence of this marker transforms an ordinary text file into an +executable task. Älyverkko CLI ignores files without this header, +allowing you to safely save draft versions. When you're ready for +processing, simply add this line and save the file - the daemon will +detect the change within seconds and queue the task. This marker-based +system enables asynchronous workflow: prepare your task at your pace, +then signal completion with this single line. 
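To make the format concrete, here is a small, entirely hypothetical
task file as it might look before processing (the skill and model
aliases, the priority, and the request text are only illustrative):

#+begin_example
TOCOMPUTE: skill=default model=default priority=10

Summarize the meeting notes below into five bullet points,
keeping every decision and action item.

[meeting notes pasted here]
#+end_example

After processing, the first line becomes a =DONE:= header and the
AI's answer is appended under structured =* USER:= and =* ASSISTANT:=
sections, as described in the next section.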
+ +*** "DONE" Marker +:PROPERTIES: +:ID: aa69e23a-248a-4459-a36e-74b43948dba9 +:END: + +The =DONE:= marker appears as the first line of processed task files, +replacing the original =TOCOMPUTE:= line. Its format documents exactly +how the task was processed: + +#+begin_example +DONE: skill=default model=default duration=2m +#+end_example + +This line records: +- Which skill was used +- Which model processed the task +- How long processing took (in seconds/minutes/hours) + +The DONE marker serves multiple critical functions: it prevents +reprocessing of completed tasks, provides an audit trail of processing +parameters, and gives immediate visual feedback about the task's +execution environment. Combined with the structured =* USER:= and =* +ASSISTANT:= sections that follow, it creates a self-documenting +conversation history that preserves both the original request and AI +response in context. This format enables iterative refinement - you +can review the AI's response, add follow-up questions, and re-add a +=TOCOMPUTE:= line to continue the conversation. + +*** Priority + +/Priority/ is an integer value specified in the =TOCOMPUTE:= header +(e.g., =priority=10=) that determines task processing order. Higher +integer values indicate higher priority - a task with =priority=10= +will process before one with =priority=5=. The system uses a priority +queue that processes tasks in descending priority order, with random +tiebreakers for equal priorities. + +This feature is essential for managing multiple concurrent tasks. For +example: +- Urgent tasks: =priority=100= +- Normal tasks: =priority=0= (default) +- Low priority background tasks: =priority=-10= + +When you have many tasks queued (e.g., overnight processing), priority +ensures critical work gets attention first. The flexible integer +system allows fine-grained control - you're not limited to just +"high/medium/low" but can create nuanced priority tiers matching your +workflow. Note that extremely high priorities won't make processing +faster (that depends on model/hardware), but will ensure those tasks +jump the queue. + +*** System Prompt +:PROPERTIES: +:ID: 89bc60f0-89d4-4e10-ae80-8f824f2e3c55 +:END: + +The /system prompt/ is the foundational instruction set that defines +the AI's role, behavior, and constraints for a task. It's implemented +through skills as the =prompt= field in YAML files, containing the +special == placeholder where user input gets injected. + +Characteristics of effective system prompts: +- Establish clear role ("You are an expert Python developer...") +- Define output format requirements +- Set behavioral boundaries +- Include domain-specific knowledge + +For example, a code review skill's system prompt might: +1. Instruct the AI to analyze for security vulnerabilities +2. Require responses in markdown with specific sections +3. Specify ignoring certain file types +4. Define severity classification standards + +The system prompt operates "behind the scenes" - users never see it +directly in task files, only its influence on the AI's responses. +Well-crafted prompts dramatically improve output quality by providing +consistent context across all tasks using that skill. They represent +the primary mechanism for customizing AI behavior without retraining +models. + +*** User Prompt +:PROPERTIES: +:ID: 009e5410-f852-4faa-b81a-f9c98b056ae3 +:END: + +The /user prompt/ is the specific request, question, or content you +provide as input to the AI within a task file. 
It appears after the +=TOCOMPUTE:= header and forms the substantive content the AI will +process. + +Effective user prompts typically: +- Clearly state the desired outcome +- Provide sufficient context +- Specify any constraints or requirements +- Reference relevant materials when needed + +For example, a good user prompt for code generation might: + +#+begin_example +Generate a Python function that processes CSV files, handling: +- Missing values by interpolation +- Date formatting in ISO 8601 +- Memory efficiency for large files + +Include docstrings and type hints. Target Python 3.10+. +#+end_example + +Unlike the [[id:89bc60f0-89d4-4e10-ae80-8f824f2e3c55][system prompt]] (which defines /how/ the AI behaves), the +user prompt defines /what/ specific work should be done. It's where +you bring your domain knowledge and task requirements to the +interaction. Well-structured prompts yield significantly better +results - the AI can only work with what you provide. + +*** Model Library + +The /model library/ is the internal registry of all available AI +models configured in the system. It's constructed during startup +from: +- The =models= list in the [[id:fd687508-0a76-4fee-9a1c-4031cb403c60][configuration file]] +- Verified model files in the models directory + +Key functions of the model library: +- Validates model file existence +- Resolves relative/absolute paths +- Provides model lookup by alias +- Manages default model selection + +When you run =alyverkko-cli listmodels=, it queries this library to +show available models (marking missing files with "-missing"). The +library ensures that when a task specifies =model=mistral=, the system +can locate the correct GGUF file and its associated parameters. It +serves as the critical bridge between your configuration and the +actual model files on disk, handling all path resolution and +validation so your tasks can reference models by simple aliases. + +*** GGUF Format + +/GGUF/ is the binary model format used by llama.cpp for AI inference. + +Key advantages for Älyverkko CLI users: +- Enables CPU-only operation (no GPU required) +- Multiple quantization levels (Q4_K, Q8_0, etc.) +- Active development community + +When downloading models, you'll typically see filenames like +=model-Q4_K_M.gguf= where the suffix indicates quantization +level. Lower quantization (Q4) uses less RAM but sacrifices some +quality; higher (Q8) preserves more accuracy at greater memory +cost. The format's efficiency is why you can run 70B+ parameter models +on consumer hardware - a 4-bit quantized 70B model requires "only" +~40GB RAM versus hundreds of GB for full precision. + +*** llama.cpp + +/llama.cpp/ is the open-source inference engine that powers Älyverkko +CLI's CPU-based AI processing. It's a critical dependency, in +particular a standalone executable (=llama-cli=) that handles: + +- Loading GGUF format models +- Tokenization and detokenization +- Core neural network computations +- Generation parameter application + +Key features enabling Älyverkko CLI's functionality: +- Optimized CPU kernels for AVX2/AVX512 +- Quantization support for memory efficiency +- Batched/unattended processing capabilities +- Cross-platform compatibility + +Älyverkko CLI acts as a sophisticated wrapper around llama.cpp, +managing the complex workflow of task processing while leveraging +llama.cpp's efficient inference capabilities. 
The =llama_cli_path=
configuration specifies where to find this executable, which must be
built separately from source to optimize for your specific
CPU. Without llama.cpp, Älyverkko CLI couldn't execute any AI tasks -
it's the actual "brain" behind the system.

** Important files and directories
*** Configuration File
:PROPERTIES:
:ID: fd687508-0a76-4fee-9a1c-4031cb403c60
:END:

The /configuration file/ (default =~/.config/alyverkko-cli.yaml=) is
the central YAML file defining all system parameters. It contains four
critical sections:

1. *Core Paths*:
   - =tasks_directory=: Where task files live
   - =models_directory=: Location of GGUF model files
   - =skills_directory=: Directory for skill YAML files
   - =llama_cli_path=: Path to the llama.cpp executable
2. *Generation Parameters*:
   - Global defaults for temperature, top_p, etc.
   - Affect all tasks unless overridden
3. *Performance Tuning*:
   - =thread_count= and =batch_thread_count=, tuned for your specific
     hardware
4. *Model Definitions*:
   - Aliases, paths, and parameters for each registered model

This file serves as the system's blueprint - without it, Älyverkko CLI
doesn't know where to find models or tasks, nor how to process them.
The configuration wizard simplifies initial setup, but advanced users
often edit this file directly for fine-grained control. Parameter
precedence follows *skill* > *model* > *global* rules, creating a
flexible hierarchy for managing complex workflows.

*** Skill Directory

The /skill directory/ (configured via =skills_directory=) is the
filesystem location where the YAML files defining AI behaviors are
stored. Each file in this directory represents a distinct skill (e.g.,
=default.yaml=, =summary.yaml=), with the filename (minus extension)
serving as the skill's alias.

This directory enables:
- Organization of different AI personas
- Easy addition and removal of capabilities
- Version control of prompt engineering
- Sharing of skill configurations

When setting up Älyverkko CLI, you typically start with sample skills
from the documentation, then gradually customize them to match your
needs. The directory structure keeps your behavioral configurations
separate from model files and task data, creating a clean separation
of concerns. Skills are reloadable at runtime - modifying a skill YAML
file automatically affects subsequent tasks using that skill, without
requiring an Älyverkko CLI restart.

*** Task Directory

The /task directory/ is the designated filesystem location where users
place task files for processing, configured via =tasks_directory= in
the YAML configuration file. Älyverkko CLI continuously monitors this
directory using filesystem watchers for new or modified files. When a
file with a =TOCOMPUTE:= header is detected, it's added to the
processing queue according to its priority. After completion, the
original file is marked with a =DONE:= prefix. This directory serves
as the central hub for user-AI interaction - users create and edit
task files here using their preferred text editor, and completed
results appear in the same location.

The beauty of file-based interaction is that no user interface is
imposed: you can use whichever tools or editor you prefer. The tasks
directory can also be synchronized between multiple computers or users
with Dropbox, Syncthing, or similar tools. This way, a travel laptop
can use the processing capability of a more powerful computer at home.
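Before moving on to generation parameters, here is a minimal sketch of
what a file in the skill directory could look like. It is an
illustration only: the =prompt= and =temperature= fields follow the
descriptions in this glossary, the =top_p= key mirrors the model
configuration example, and the placeholder token that injects the user
prompt is omitted because its exact syntax is not shown here - consult
the sample skills from the documentation for the authoritative format.

#+begin_src yaml
# Hypothetical skill file: summary.yaml (the filename minus extension
# is the skill alias, i.e. "summary"). The user-prompt placeholder
# that belongs inside the prompt text is omitted; see the sample
# skills shipped with the documentation for its exact syntax.
prompt: |
  You are a careful technical summarizer. Condense the provided text
  into at most five bullet points, preserving decisions and numbers.
temperature: 0.3
top_p: 0.9
#+end_src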
+ +** Generation parameters +*** Temperature +:PROPERTIES: +:ID: 24a0a54b-828b-4c78-8208-504390848fbc +:END: + +/Temperature/ is a generation parameter controlling the randomness and +creativity of AI responses, typically ranging from 0.0 (completely +deterministic) to 2.0+ (highly creative). Lower values produce more +focused, predictable outputs ideal for factual tasks, while higher +values encourage diverse, unexpected responses better for +brainstorming. + +The parameter operates through a sophisticated probability +distribution: +- Temperature = 0.0: Always selects highest-probability token (repetitive but reliable) +- Temperature = 0.7: Balanced exploration (common default) +- Temperature = 1.5+: Significant randomness (may produce nonsensical outputs) + +Älyverkko CLI implements a three-tier hierarchy for temperature +settings: [[id:456dd42e-a474-4464-a14e-384c68713537][*skill-specific* > *model-specific* > *global default*]]. This +allows precise control +- your =creative-writing.yaml= skill might use temperature=0.9 +- while your =code-review.yaml= skill uses 0.2. + +The system automatically selects the most specific applicable value, +giving you surgical control over response characteristics without +modifying model files. + +*** Top-p (Nucleus Sampling) +:PROPERTIES: +:ID: 047f5bf7-e964-49ac-a666-c7ac75754e54 +:END: + +/Top-p/ (or nucleus sampling) is a generation parameter (range 0.0-1.0) +that dynamically selects the smallest set of highest-probability tokens +whose cumulative probability exceeds the p-value. For example, with +top_p=0.9, the model considers only tokens comprising the top 90% of the +probability distribution. + +- Low values (0.3-0.6): Focused, conservative responses +- Medium values (0.7-0.9): Balanced exploration (common default) +- High values (0.95+): Maximum diversity within coherence + +Unlike temperature which affects all tokens uniformly, top-p +dynamically adjusts the token selection pool based on the current +context's probability distribution. This often produces more natural +variation in responses. Like other parameters, top-p follows the +*skill* > *model* > *global* hierarchy, allowing context-specific +tuning. The default setting (typically 0.9-0.95) works well for most +general-purpose tasks while preventing extremely low-probability +("nonsense") outputs. + +*** Repeat Penalty + +/Repeat penalty/ is a parameter (>0.0) that discourages the AI from +repeating identical phrases or tokens. A value of 1.0 means no +penalty, while values >1.0 increasingly penalize repetitions. For +example, repeat_penalty=1.2 applies a 20% reduction to the probability +of tokens that have recently appeared. + +This parameter is crucial for maintaining response quality in longer +outputs: +- Values 1.0-1.1: Mild repetition control (good for most tasks) +- Values 1.1-1.3: Stronger anti-repetition (helpful for verbose outputs) +- Values >1.5: May produce unnatural phrasing + +The parameter operates by modifying the token probability distribution +during generation - tokens that have appeared in the recent context +have their probabilities reduced by the penalty factor. This happens +dynamically throughout generation, making it more effective than +simple post-processing filters. Like other generation parameters, +repeat penalty follows the *skill* > *model* > *global* hierarchy, +allowing you to configure strict anti-repetition for technical writing +while allowing more repetition in poetic outputs. 
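The interplay of temperature and top-p is easier to see in code. The
short Python sketch below is purely illustrative - Älyverkko CLI does
no sampling itself; llama.cpp applies these parameters internally -
but it shows the mechanics: temperature rescales scores before they
become probabilities, and top-p then keeps only the smallest
high-probability set of candidates.

#+begin_src python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens
    the distribution, higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.2, -1.0]          # toy scores for 4 candidate tokens
probs = apply_temperature(logits, 0.7)   # lower temperature -> sharper distribution
print(top_p_filter(probs, 0.9))          # indices of tokens surviving nucleus sampling
#+end_src

With a lower temperature or a smaller top_p value, fewer candidates
survive - exactly the effect described in the value ranges above.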
*** Top-k

/Top-k/ is a generation parameter that restricts token selection to
the K most probable tokens at each step, regardless of their actual
probability values. For example, with top_k=40, the model only
considers the 40 highest-probability tokens when generating each new
token.

Usage considerations:
- Lower values (20-40): More focused, conservative outputs
- Higher values (50-100): Greater diversity within coherence
- Value of 0: Disables top-k filtering (uses full vocabulary)

Unlike temperature, which reshapes the whole probability distribution,
top-k creates a hard cutoff - tokens outside the top K have zero
chance of selection. This provides more deterministic control over
output diversity. The parameter follows the standard *skill* >
*model* > *global* hierarchy, allowing context-specific tuning. While
less commonly adjusted than temperature or top-p, top-k offers
valuable fine control for specialized tasks where you want to strictly
limit the token selection pool.

*** Min-p

/Min-p/ (minimum probability threshold) is an advanced generation
parameter that filters out tokens whose probability falls below a
specified fraction of the highest-probability token's probability. For
example, with min_p=0.05, only tokens with probability ≥5% of the top
token's probability are considered.

Key characteristics:
- Range 0.0-1.0 (0.0 disables the filter)
- Complements rather than replaces top-p
- More adaptive than fixed top-k

This parameter helps eliminate extremely low-probability "tail" tokens
that might produce nonsensical outputs, while maintaining more
flexibility than strict top-k filtering. It's particularly useful for:
- Reducing rare factual errors
- Preventing improbable word combinations
- Maintaining response coherence in long outputs

Like other generation parameters, min_p follows the *skill* > *model*
> *global* hierarchy, though it's typically left at its default (0.0)
unless you are addressing specific output quality issues. Advanced
users might experiment with min_p=0.03-0.07 for critical applications
requiring maximum response reliability.

*** Thread Count

/Thread count/ specifies the number of CPU threads dedicated to the
core AI inference process (configured via =thread_count= in
YAML). This parameter primarily affects how efficiently the system
utilizes your CPU's computational resources during token
generation. Token generation is typically bound by RAM speed, not by
CPU compute.

The parameter targets the phase of transforming tokens through the
neural network layers. Since this phase is usually limited by memory
bandwidth rather than pure compute, increasing threads beyond what
your RAM can feed will not improve speed; it only keeps CPU cores
uselessly busy-waiting for data.

For instance, on an AMD Ryzen 5 5600G I observed that AI throughput
gains start diminishing quickly once about 3 threads are in use, and
there is almost no performance difference between 5 and 6 threads even
though the CPU advertises 12 hardware threads. The reason is that RAM
bandwidth becomes fully saturated with just a few threads.

*** Batch Thread Count

/Batch thread count/ specifies the threads used for prompt
preprocessing (configured via =batch_thread_count=). This parameter
affects how quickly the system parses your input text for the AI
model.

Unlike *thread_count*, which handles token generation, prompt
preprocessing is typically compute-bound rather than RAM-bound, so
higher values often help up to your CPU's logical core count.
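As a concrete reference point, the example configuration bundled with
Älyverkko CLI (=doc/examples/alyverkko-cli.yaml=) sizes the two thread
pools differently, reflecting the distinction above; adjust both
numbers to your own CPU and RAM.

#+begin_src yaml
# From the example configuration: prompt preprocessing is largely
# compute-bound, token generation is RAM-bandwidth-bound.
batch_thread_count: 10   # prompt preprocessing; can scale toward logical core count
thread_count: 6          # token generation; gains flatten once RAM bandwidth saturates
#+end_src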
*** End of Text Marker

An /end of text marker/ is an optional string (e.g., "###", "[end of
text]") specified per-model that signals the AI has completed its
response. When configured, Älyverkko CLI automatically truncates
output at this marker, removing any trailing artifacts.

This parameter is useful for models that emit a specific termination
sequence, so that the sequence is not shown to the user.

For example, if a model typically ends responses with "###", setting
=end_of_text_marker: "###"= ensures the system removes "###" from the
end of the AI response.

*** Context Size Tokens

/Context size tokens/ defines the maximum number of tokens
(word-pieces) a model can process, configured per-model via
=context_size_tokens=. This parameter represents the AI's "working
memory" capacity for any given task.

Critical implications:
- Determines maximum combined input+output length.
- Larger contexts require significantly more RAM.
- Most models support 4K-128K tokens.

This parameter fundamentally shapes what tasks a model can handle -
code analysis of large files, book chapter processing, and
multi-document summarization all require sufficient context
size. Always verify your model's actual supported context - exceeding
it causes unpredictable or significantly degraded model output.

*** Parameter Precedence Hierarchy
:PROPERTIES:
:ID: 456dd42e-a474-4464-a14e-384c68713537
:END:

Älyverkko CLI implements a three-tier /parameter precedence hierarchy/
for generation settings (temperature, top_p, etc.):

1. *Skill-specific values* (highest priority)
   - Defined in skill YAML files
   - Example: =temperature: 0.3= in =summary.yaml=

2. *Model-specific values* (middle priority)
   - Defined in model configuration
   - Example: =temperature: 0.6= for the "mistral" model

3. *Global defaults* (lowest priority)
   - Set in the main configuration
   - Example: =default_temperature: 0.7=

The system automatically selects the most specific applicable value,
creating a flexible "rule cascade" where specialized configurations
override broader ones.

* Getting started

When you first encounter Älyverkko CLI, the setup process might seem
involved compared to cloud-based AI services. That's completely
understandable! Let me walk you through why each step exists and how
it ultimately creates a powerful, private, and cost-effective AI
assistant that works /for you/.

** Your Setup Journey - What to Expect

Here's what you'll be doing, explained simply with /why/ each step
@@ -416,7 +966,6 @@ Each model in the =models= list can have:
   can identify and remove them so that they don't leak into
   conversation. Default value is: *null*.

*** Configuration file example

The application is configured using a YAML-formatted configuration