Added AI server setup: to be reviewed

This commit is contained in:
2026-01-21 19:10:32 +01:00
parent 30d560f804
commit b580137ee8
16 changed files with 4089 additions and 0 deletions

View File

@@ -0,0 +1,143 @@
# Installing Webmin and Docker on Ubuntu
This guide walks you through installing Webmin on Ubuntu and expanding logical volumes through Webmin's interface. It also covers installing Docker on Ubuntu.
---
## Part 1: Installing Webmin on Ubuntu
Webmin is a web-based interface for managing Unix-like systems, making tasks such as user management, server configuration, and software installation easier.
### Step 1: Update Your System
Before installing Webmin, update your system to ensure all packages are up to date.
```bash
sudo apt update && sudo apt upgrade -y
```
### Step 2: Add the Webmin Repository and Key
To add the Webmin repository, download and run the setup script.
```bash
curl -o setup-repos.sh https://raw.githubusercontent.com/webmin/webmin/master/setup-repos.sh
sudo sh setup-repos.sh
```
### Step 3: Install Webmin
With the repository set up, install Webmin:
```bash
sudo apt-get install webmin --install-recommends
```
### Step 4: Access Webmin
Once installed, Webmin runs on port 10000. You can access it by opening a browser and navigating to:
```
https://<your-server-ip>:10000
```
If you are using a firewall, allow traffic on port 10000:
```bash
sudo ufw allow 10000
```
You can now log in to Webmin using your system's root credentials.
---
## Part 2: Expanding a Logical Volume Using Webmin
Expanding a logical volume through Webmin's Logical Volume Management (LVM) interface is a simple process.
### Step 1: Access Logical Volume Management
Log in to Webmin and navigate to:
**Hardware > Logical Volume Management**
Here, you can manage physical volumes, volume groups, and logical volumes.
### Step 2: Add a New Physical Volume
If you've added a new disk or partition to your system, you need to allocate it to a volume group before expanding the logical volume. To do this:
1. Locate your volume group in the Logical Volume Management module.
2. Click **Add Physical Volume**.
3. Select the new partition or RAID device and click **Add to volume group**. This action increases the available space in the group.
### Step 3: Resize the Logical Volume
To extend a logical volume:
1. In the **Logical Volumes** section, locate the logical volume you wish to extend.
2. Select **Resize**.
3. Specify the additional space or use all available free space in the volume group.
4. Click **Apply** to resize the logical volume.
### Step 4: Resize the Filesystem
After resizing the logical volume, expand the filesystem to match:
1. Click on the logical volume to view its details.
2. For supported filesystems like ext2, ext3, or ext4, click **Resize Filesystem**. The filesystem will automatically adjust to the new size of the logical volume. (The equivalent command-line steps are sketched below.)
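For reference, the same workflow can be done from the command line. A minimal sketch, assuming a new disk `/dev/sdb`, a volume group `vg0`, a logical volume `lv-data`, and an ext4 filesystem (all hypothetical names, substitute your own):
```bash
# Initialize the new disk as a physical volume and add it to the volume group
sudo pvcreate /dev/sdb
sudo vgextend vg0 /dev/sdb

# Extend the logical volume by all free space in the group
sudo lvextend -l +100%FREE /dev/vg0/lv-data

# Grow the ext4 filesystem to match the new logical volume size
sudo resize2fs /dev/vg0/lv-data
```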
---
## Part 3: Installing Docker on Ubuntu
This section covers installing Docker on Ubuntu.
### Step 1: Remove Older Versions
If you have previous versions of Docker installed, remove them:
```bash
sudo apt remove docker docker-engine docker.io containerd runc
```
### Step 2: Add Docker's Official GPG Key and Repository
Add Docker's GPG key and repository to your system's APT sources:
```bash
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
```
### Step 3: Install Docker
Now, install Docker:
```bash
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
### Step 4: Post-Installation Steps
To allow your user to run Docker commands without `sudo`, add your user to the Docker group:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
Test your Docker installation by running the following command:
```bash
docker run hello-world
```
For more information, visit the official [Docker installation page](https://docs.docker.com/engine/install/ubuntu/).

View File

@@ -0,0 +1,182 @@
# How to Install the Latest Version of NVIDIA CUDA on Ubuntu 22.04 LTS
If you're looking to unlock the power of your NVIDIA GPU for scientific computing, machine learning, or other parallel workloads, CUDA is essential. Follow this step-by-step guide to install the latest CUDA release on Ubuntu 22.04 LTS.
## Prerequisites
Before proceeding with installing CUDA, ensure your system meets the following requirements:
- **Ubuntu 22.04 LTS**: this version is highly recommended for stability and compatibility.
- **NVIDIA GPU + drivers**: CUDA requires an NVIDIA GPU with the proprietary NVIDIA drivers installed.
To check for an NVIDIA GPU, open a terminal and run:
```bash
lspci | grep -i NVIDIA
```
If an NVIDIA GPU is present, it will be listed. If not, consult NVIDIA's documentation on installing the latest display drivers.
## Step 1: Install Latest NVIDIA Drivers
Install the latest NVIDIA drivers matched to your GPU model and CUDA version using Ubuntu's built-in Additional Drivers utility:
1. Open **Settings -> Software & Updates -> Additional Drivers**
2. Select the recommended driver under the NVIDIA heading
3. Click **Apply Changes** and **Reboot**
Verify the driver installation by running:
```bash
nvidia-smi
```
This should print details on your NVIDIA GPU and driver version.
## Step 2: Add the CUDA Repository
Add NVIDIA's official repository to your system to install CUDA:
1. Visit NVIDIA's CUDA Download Page and select "Linux", "x86_64", "Ubuntu", "22.04", "deb(network)"
2. Copy the repository installation commands for Ubuntu 22.04:
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
```
Run these commands to download repository metadata and add the apt source.
## Step 3: Install CUDA Toolkit
Install CUDA using apt:
```bash
sudo apt-get -y install cuda
```
Press **Y** to proceed and allow the latest supported version of the CUDA toolkit to install.
## Step 4: Configure Environment Variables
Update environment variables to recognize the CUDA compiler, tools, and libraries:
Create (or open) `/etc/profile.d/cuda.sh` as root and add the following configuration:
```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
Save changes and refresh environment variables:
```bash
source /etc/profile.d/cuda.sh
```
Alternatively, reboot to load the updated environment variables.
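If you prefer to create the file non-interactively, here is a minimal sketch using `tee` (it writes the same two exports shown above and requires sudo):
```bash
# Create /etc/profile.d/cuda.sh with the CUDA paths and load it into the current shell
sudo tee /etc/profile.d/cuda.sh > /dev/null << 'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF
source /etc/profile.d/cuda.sh
```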
## Step 5: Verify Installation
Validate the installation:
1. Check the `nvcc` compiler version:
```bash
nvcc --version
```
This should display details on the CUDA compiler driver, including the installed version.
2. Verify GPU details with NVIDIA SMI:
```bash
nvidia-smi
```
# Optional: Setting Up cuDNN with CUDA: A Comprehensive Guide
This guide will walk you through downloading cuDNN from NVIDIA's official site, extracting it, copying the necessary files to the CUDA directory, and setting up environment variables for CUDA.
## Step 1: Download cuDNN
1. **Visit the NVIDIA cuDNN Archive**:
Navigate to the [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
2. **Select the Version**:
Choose the appropriate version of cuDNN compatible with your CUDA version. For this guide, we'll assume you are downloading `cudnn-linux-x86_64-8.9.7.29_cuda12-archive`.
3. **Download the Archive**:
Download the `tar.xz` file to your local machine.
## Step 2: Extract cuDNN
1. **Navigate to the Download Directory**:
Open a terminal and navigate to the directory where the archive was downloaded.
```bash
cd ~/Downloads
```
2. **Extract the Archive**:
Use the `tar` command to extract the contents of the archive.
```bash
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
```
This will create a directory named `cudnn-linux-x86_64-8.9.7.29_cuda12-archive`.
## Step 3: Copy cuDNN Files to CUDA Directory
1. **Navigate to the Extracted Directory**:
Move into the directory containing the extracted cuDNN files.
```bash
cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive
```
2. **Copy Header Files**:
Copy the header files to the CUDA include directory.
```bash
sudo cp include/cudnn*.h /usr/local/cuda-12.5/include/
```
3. **Copy Library Files**:
Copy the library files to the CUDA lib64 directory.
```bash
sudo cp lib/libcudnn* /usr/local/cuda-12.5/lib64/
```
4. **Set Correct Permissions**:
Ensure the copied files have the appropriate permissions.
```bash
sudo chmod a+r /usr/local/cuda-12.5/include/cudnn*.h /usr/local/cuda-12.5/lib64/libcudnn*
```
## Step 4: Set Up Environment Variables
1. **Open Your Shell Profile**:
Open your `.bashrc` or `.bash_profile` file in a text editor.
```bash
nano ~/.bashrc
```
2. **Add CUDA to PATH and LD_LIBRARY_PATH**:
Add the following lines to set the environment variables for CUDA. This example assumes CUDA 12.5.
```bash
export PATH=/usr/local/cuda-12.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH
```
3. **Apply the Changes**:
Source the file to apply the changes immediately.
```bash
source ~/.bashrc
```
## Verification
1. **Check CUDA Installation**:
Verify that CUDA is correctly set up by running:
```bash
nvcc --version
```
2. **Check cuDNN Installation**:
Optionally, you can compile and run a sample program to ensure cuDNN is working correctly, or use the quick header check sketched below.
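A lighter-weight check is to read the version macros from the installed header; this assumes the CUDA 12.5 paths used earlier in this guide:
```bash
# Print the cuDNN version macros (major, minor, patch level)
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda-12.5/include/cudnn_version.h
```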
By following these steps, you will have downloaded and installed cuDNN, integrated it into your CUDA setup, and configured your environment variables for smooth operation. This ensures that applications requiring both CUDA and cuDNN can run without issues.

View File

@@ -0,0 +1,85 @@
# OPTIONAL: Setting NVIDIA GPU Power Limit at System Startup
## Overview
This guide explains how to set the power limit for NVIDIA GPUs at system startup using a systemd service. This ensures the power limit setting is persistent across reboots.
## Steps
### 1. Create and Configure the Service File
1. Open a terminal and create a new systemd service file:
```bash
sudo nano /etc/systemd/system/nvidia-power-limit.service
```
2. Add the following content to the file, replacing `270` with the desired power limit (e.g., 270 watts for your GPUs):
- For Dual GPU Setup:
```ini
[Unit]
Description=Set NVIDIA GPU Power Limit
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -i 0 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 1 -pl 270
[Install]
WantedBy=multi-user.target
```
- For Quad GPU Setup:
```ini
[Unit]
Description=Set NVIDIA GPU Power Limit
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -i 0 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 1 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 2 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 3 -pl 270
[Install]
WantedBy=multi-user.target
```
Save and close the file.
### 2. Apply and Enable the Service
1. Reload the systemd manager configuration:
```bash
sudo systemctl daemon-reload
```
2. Enable the service to ensure it runs at startup:
```bash
sudo systemctl enable nvidia-power-limit.service
```
### 3. (Optional) Start the Service Immediately
To apply the power limit immediately without rebooting:
```bash
sudo systemctl start nvidia-power-limit.service
```
## Verification
Check the power limits using `nvidia-smi`:
```bash
nvidia-smi -q -d POWER
```
Look for the "Power Management" section to verify the new power limits.
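You can also confirm that the service itself ran without errors:
```bash
systemctl status nvidia-power-limit.service
```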
By following this guide, you can ensure that your NVIDIA GPUs have a power limit set at every system startup, providing consistent and controlled power usage for your GPUs.

View File

@@ -0,0 +1,103 @@
# Ollama & OpenWebUI Docker Setup
## Ollama with Nvidia GPU
Ollama makes it easy to get up and running with large language models locally.
To run Ollama using an Nvidia GPU, follow these steps:
### Step 1: Install the NVIDIA Container Toolkit
#### Install with Apt
1. **Configure the repository**:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```
2. **Install the NVIDIA Container Toolkit packages**:
```bash
sudo apt-get install -y nvidia-container-toolkit
```
#### Install with Yum or Dnf
1. **Configure the repository**:
```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```
2. **Install the NVIDIA Container Toolkit packages**:
```bash
sudo yum install -y nvidia-container-toolkit
```
### Step 2: Configure Docker to Use Nvidia Driver
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Step 3: Start the Container
```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama
```
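Optionally, confirm the Ollama API is reachable from the host before running any models; the `/api/tags` endpoint lists the locally installed models:
```bash
curl http://localhost:11434/api/tags
```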
## Running Multiple Instances with Specific GPUs
You can run multiple instances of the Ollama server and assign specific GPUs to each instance. On my server, I have 4 NVIDIA 3090 GPUs, which I use as described below:
### Ollama Server for GPUs 0 and 1
```bash
docker run -d --gpus '"device=0,1"' -v ollama:/root/.ollama -p 11435:11434 --restart always --name ollama1 --network ollama-network ollama/ollama
```
### Ollama Server for GPUs 2 and 3
```bash
docker run -d --gpus '"device=2,3"' -v ollama:/root/.ollama -p 11436:11434 --restart always --name ollama2 --network ollama-network ollama/ollama
```
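Both commands above attach the containers to a user-defined Docker network named `ollama-network`; if that network does not exist yet, create it first:
```bash
docker network create ollama-network
```
Note that both instances mount the same `ollama` volume, so models pulled by one server are available to the other.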
## Running Models Locally
Once the container is up and running, you can execute models using:
```bash
docker exec -it ollama ollama run llama3.1
```
```bash
docker exec -it ollama ollama run llama3.1:70b
```
```bash
docker exec -it ollama ollama run qwen2.5-coder:1.5b
```
```bash
docker exec -it ollama ollama run deepseek-v2
```
### Try Different Models
Explore more models available in the [Ollama library](https://github.com/ollama/ollama).
## OpenWebUI Installation
To install and run OpenWebUI, use the following command:
```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

View File

@@ -0,0 +1,48 @@
### Wiki: Updating Docker Containers for Ollama and OpenWebUI
This guide explains the steps to update Docker containers for **Ollama** and **OpenWebUI**. Follow the instructions below to stop, remove, pull new images, and run the updated containers.
---
## Ollama
### Steps to Update
1. **Stop Existing Containers**
2. **Remove Existing Containers**
3. **Pull the Latest Ollama Image**
4. **Run Updated Containers**
For GPU devices 0 and 1:
```bash
docker stop ollama
docker rm ollama
docker pull ollama/ollama
docker run -d --gpus '"device=0,1"' -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama -e OLLAMA_KEEP_ALIVE=1h ollama/ollama
```
For NVIDIA Jetson or CPU-only systems:
```bash
docker stop ollama
docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama -e OLLAMA_KEEP_ALIVE=1h ollama/ollama
```
---
## OpenWebUI
```bash
docker stop open-webui
docker rm open-webui
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
---
### Notes
- Make sure to adjust GPU allocation or port numbers as necessary for your setup.
- The `OLLAMA_KEEP_ALIVE` environment variable is set to `1h` so that loaded models stay in memory for an hour after the last request.

View File

@@ -0,0 +1,58 @@
# Running SearXNG with Custom Settings in Docker
## Overview
This guide walks you through the steps to run a SearXNG instance in Docker using a custom `settings.yml` configuration file. This setup is ideal for users who want to customize their SearXNG instance without needing to rebuild the Docker image every time they make a change.
## Prerequisites
- **Docker**: Ensure Docker is installed on your machine. Verify the installation by running `docker --version`.
- **Git**: For cloning the SearXNG repository, make sure Git is installed.
## Steps
### 1. Use the Official Image or Clone the SearXNG Repository
You can pull the official image directly from Docker Hub:
```bash
docker pull docker.io/searxng/searxng:latest
```
### 2. Customize `settings.yml`
Place your custom `settings.yml` file in the directory of your choice. Ensure that this file is configured according to your needs, including enabling JSON responses if required.
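As a minimal sketch, the following writes a `settings.yml` that inherits the upstream defaults and only enables JSON output alongside HTML (`use_default_settings` and `search.formats` are standard SearXNG options; adjust the rest to your needs):
```bash
# Write a minimal settings.yml that keeps the defaults and enables the JSON format
cat > settings.yml << 'EOF'
use_default_settings: true
search:
  formats:
    - html
    - json
EOF
```
Depending on your deployment, you may also need to set `server.secret_key`; see the SearXNG documentation for the full list of options.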
### 3. Run the SearXNG Docker Container
Run the Docker container using your custom `settings.yml` file. Choose the appropriate command based on whether you are using the official image or a custom build.
#### For the Official Image:
```bash
docker run -d -p 4000:8080 --restart always --name searxng -v ./settings.yml:/etc/searxng/settings.yml searxng/searxng:latest
```
#### Command Breakdown:
- `-d`: Runs the container in detached mode.
- `-p 4000:8080`: Maps port 8080 in the container to port 4000 on your host machine.
- `-v ./settings.yml:/etc/searxng/settings.yml`: Mounts the custom `settings.yml` file into the container.
- `searxng/searxng:latest` or `searxng/searxng`: The Docker image being used.
### 4. Access SearXNG
Once the container is running, you can access your SearXNG instance by navigating to `http://<hostname>:4000` in your web browser.
### 5. Testing JSON Output
To verify that the JSON output is correctly configured, you can use `curl` or a similar tool:
```bash
curl "http://<hostname>:4000/search?q=python&format=json"
```
This should return search results in JSON format.
### 6. Configuration URL for OpenWebUI
`http://<hostname>:4000/search?q=<query>`

File diff suppressed because it is too large

View File

@@ -0,0 +1,123 @@
# ComfyUI Docker Setup with GGUF Support and ComfyUI Manager
This guide provides detailed steps to build and run **ComfyUI** with **GGUF support** and **ComfyUI Manager** using Docker. The GGUF format is optimized for quantized models, and ComfyUI Manager is included for easy node management.
## Prerequisites
Before starting, ensure you have the following installed on your system:
- **Docker**
- **NVIDIA GPU with CUDA support** (if using GPU acceleration)
- **Directory structure for the repository, models, and checkpoints**, created as follows:
```bash
mkdir -p ~/dev-ai/vison/models/checkpoints
```
### 1. Clone the ComfyUI Repository
First, navigate to the `~/dev-ai/vison` directory and clone the ComfyUI repository to your local machine:
```bash
cd ~/dev-ai/vison
```
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
```
### 2. Create the Dockerfile
Copy the provided `Dockerfile` into the root of your ComfyUI directory. This file contains the necessary configuration for building the Docker container with GGUF support.
### 3. Build the Docker Image
```bash
docker build -t comfyui-gguf:latest .
```
This will create a Docker image named `comfyui-gguf:latest` with both **ComfyUI Manager** and **GGUF support** built in.
### 4. Run the Docker Container
Once the image is built, you can run the Docker container with volume mapping for your models.
```bash
docker run --name comfyui -p 8188:8188 --gpus all \
-v /home/mukul/dev-ai/vison/models:/app/models \
-d comfyui-gguf:latest
```
This command maps your local `models` directory to `/app/models` inside the container and exposes ComfyUI on port `8188`.
### 5. Download and Place Checkpoint Models
Download your Civitai checkpoint models and place them in the `checkpoints` directory (volume-mapped into the container), for example:
https://civitai.com/models/139562/realvisxl-v50
To use GGUF models or other safetensor models, follow the steps below to download them directly into the `checkpoints` directory.
1. **Navigate to the Checkpoints Directory**:
```bash
cd /home/mukul/dev-ai/vison/models/checkpoints
```
2. **Download `flux1-schnell-fp8.safetensors`**:
```bash
wget "https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors?download=true" -O flux1-schnell-fp8.safetensors
```
3. **Download `flux1-dev-fp8.safetensors`**:
```bash
wget "https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/flux1-dev-fp8.safetensors?download=true" -O flux1-dev-fp8.safetensors
```
These commands will place the corresponding `.safetensors` files into the `checkpoints` directory.
### 6. Access ComfyUI
After starting the container, access the ComfyUI interface in your web browser:
```bash
http://<your-server-ip>:8188
```
Replace `<your-server-ip>` with your server's IP address or use `localhost` if you're running it locally.
### 7. Using GGUF Models
In the ComfyUI interface:
- Use the **UnetLoaderGGUF** node (found in the `bootleg` category) to load GGUF models.
- Ensure your GGUF files are correctly named and placed in the `/app/models/checkpoints` directory for detection by the loader node (see the sketch below).
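Because the models directory is volume-mapped, GGUF files placed in the host-side checkpoints folder appear inside the container automatically. A sketch with placeholders (substitute the real download URL and filename of the GGUF model you want):
```bash
cd /home/mukul/dev-ai/vison/models/checkpoints
# <gguf-model-url> and the output filename are placeholders
wget "<gguf-model-url>" -O your-model-Q4_K_S.gguf
```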
### 8. Managing Nodes with ComfyUI Manager
With **ComfyUI Manager** built into the image:
- **Install** missing nodes as needed when uploading workflows.
- **Enable/Disable** conflicting nodes from the ComfyUI Manager interface.
### 9. Stopping and Restarting the Docker Container
To stop the running container:
```bash
docker stop comfyui
```
To restart the container:
```bash
docker start comfyui
```
### 10. Logs and Troubleshooting
To view the container logs:
```bash
docker logs comfyui
```
This will provide details if anything goes wrong or if you encounter issues with GGUF models or node management.

View File

@@ -0,0 +1,33 @@
# Base image with Python 3.11 and CUDA 12.5 support
FROM nvidia/cuda:12.5.0-runtime-ubuntu22.04
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
python3-pip \
libgl1-mesa-glx \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy the cloned ComfyUI repository
COPY . /app
# Install Python dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Clone and install ComfyUI Manager
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git /app/custom_nodes/ComfyUI-Manager && \
pip install -r /app/custom_nodes/ComfyUI-Manager/requirements.txt
# Clone and install GGUF support for ComfyUI
RUN git clone https://github.com/city96/ComfyUI-GGUF.git /app/custom_nodes/ComfyUI-GGUF && \
pip install --upgrade gguf
# Expose the port used by ComfyUI
EXPOSE 8188
# Run ComfyUI with the server binding to 0.0.0.0
CMD ["python3", "main.py", "--listen", "0.0.0.0"]

View File

@@ -0,0 +1,113 @@
## Running uncensored models on the NVIDIA Jetson Orin Nano Super Developer Kit
This guide is aimed at helping you set up uncensored models seamlessly on your Jetson Orin Nano, ensuring you can run powerful image generation models on this compact yet powerful device.
This tutorial will walk you through each step of the process. Even if you're starting from a fresh installation, following along should ensure everything is set up correctly. And if anything doesn't work as expected, feel free to reach out; I'll keep this guide updated to keep it running smoothly.
---
## Let's Dive In
### Step 1: Installing Miniconda and Setting Up a Python Environment
First, we need to install Miniconda on your Jetson Nano. This will allow us to create an isolated Python environment for managing dependencies. Let's set up our project environment.
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
chmod +x Miniconda3-latest-Linux-aarch64.sh
./Miniconda3-latest-Linux-aarch64.sh
conda update conda
```
Now, we create and activate a Python 3.10 environment for our project.
```bash
conda create -n comfyui python=3.10
conda activate comfyui
```
### Step 2: Installing CUDA, cuDNN, TensorRT, and Verifying nvcc
CUDA, cuDNN, and TensorRT come preconfigured with JetPack 6.1, so no separate installation is required.
Next, confirm that CUDA is installed correctly by checking the `nvcc` version.
```bash
nvcc --version
```
### Step 3: Installing PyTorch, TorchVision, and TorchAudio
Now let's install the essential libraries for image generation (PyTorch, TorchVision, and TorchAudio) from the Jetson wheel index [devpi - cu12.6](http://jetson.webredirect.org/jp6/cu126):
```bash
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/5cf/9ed17e35cb752/torch-2.5.0-cp310-cp310-linux_aarch64.whl
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/9d2/6fac77a4e832a/torchvision-0.19.1a0+6194369-cp310-cp310-linux_aarch64.whl
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/812/4fbc4ba6df0a3/torchaudio-2.5.0-cp310-cp310-linux_aarch64.whl
```
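To confirm the wheels installed correctly and that PyTorch can see the GPU, run a quick check:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```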
### Step 4: Cloning the Project Repository
Now, we clone the necessary source code for the project from GitHub. This will include the files for running uncensored models from civitai.com.
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
```
### Step 5: Installing Project Dependencies
Next, install the required dependencies for the project by running the `requirements.txt` file.
```bash
pip install -r requirements.txt
```
### Step 6: Resolving Issues with NumPy (if necessary)
If you encounter issues with NumPy, such as compatibility problems, you can fix it by downgrading to a version below 2.0.
```bash
pip install "numpy<2"
```
### Step 7: Running ComfyUI
Finally, we can run ComfyUI to check if everything is set up properly. Start the app with the following command:
```bash
python main.py --listen 0.0.0.0
```
---
## Great! Now that you've got ComfyUI up and running, it's time to load your first uncensored model.
1. Navigate to [civitai.com](https://civitai.com) and select a model. For example, you can choose the following model:
[RealVisionBabes v1.0](https://civitai.com/models/543456?modelVersionId=604282)
2. Download the model file: [realvisionbabes_v10.safetensors](https://civitai.com/api/download/models/604282?type=Model&format=SafeTensor&size=pruned&fp=fp16)
3. Place it inside the `models/checkpoints` folder.
4. Download the VAE file: [ClearVAE_V2.3_fp16.pt](https://civitai.com/api/download/models/604282?type=VAE)
5. Place it inside the `models/vae` folder.
---
## You're all set to launch your first run!
Visit the URL provided by ComfyUI (`http://jetson:8188`) in your browser.
Go to the [ControlNet reference demo](https://civitai.com/posts/3943573), download the workflow (also available in the repo as `workflow-api.json`), and import it into ComfyUI.
Hit the "Queue Prompt" button and watch the magic unfold!
Happy generating! 🎉

View File

@@ -0,0 +1,91 @@
You are an expert prompt generator. Your task is to transform user requests into detailed, vivid, and imaginative prompts that can be used to generate visually captivating images with a diffusion model. You should:
- **Analyze** the user's request carefully and extract the key visual elements.
- **Generate a prompt** that describes the image in clear and evocative terms, ensuring it is visually rich and imaginative.
- **Ensure details** are specific about the atmosphere, setting, colors, lighting, textures, and unique characteristics.
- **Use creative language** to enhance the visual quality, whether the style is realistic, surreal, or abstract.
- **Consider mood and style** (e.g., dark and moody, bright and lively, minimalist, detailed, etc.).
Here are some examples to guide you:
---
### **Few-Shot Examples:**
**Example 1:**
**User Request:**
"I want to see a futuristic cityscape at night with neon lights and flying cars."
**Generated Prompt:**
"A vibrant futuristic cityscape at night, with towering skyscrapers that stretch towards a starless sky. The buildings are covered in shimmering neon lights—bright blues, purples, and pinks—casting colorful reflections onto the glossy streets below. Flying cars zip through the air, leaving glowing trails in their wake, while holographic billboards advertise virtual products. The atmosphere is electric, bustling with energy and technology, as a soft mist rises from the ground, adding a touch of mystery to the scene."
---
**Example 2:**
**User Request:**
"Id love a serene mountain landscape with a calm lake and a small wooden cabin."
**Generated Prompt:**
"A tranquil mountain landscape at dawn, with majestic snow-capped peaks towering in the distance. A serene, glassy lake reflects the vibrant colors of the early morning sky—soft pinks, oranges, and purples. A small, rustic wooden cabin sits by the lakeshore, its smoke rising from the chimney, blending gently with the mist above the water. Pine trees surround the cabin, their dark green needles adding depth to the peaceful scene. The air is crisp, and the whole environment exudes a sense of quiet solitude."
---
**Example 3:**
**User Request:**
"I want a mystical creature in a dark enchanted forest."
**Generated Prompt:**
"A mystical creature standing tall in the heart of a dark, enchanted forest. The creature has the body of a lion, but its fur is deep indigo, shimmering with silver flecks like stars. Its eyes glow with an ethereal light, casting an otherworldly glow across the forest floor. The forest is dense with towering trees whose bark is twisted, covered in glowing moss. Fog weaves through the trees, and mysterious flowers glow faintly in the shadows. The atmosphere is magical, filled with the sense of an ancient, forgotten world full of wonder."
---
**Example 4:**
**User Request:**
"Can you create a vibrant sunset on a tropical beach with palm trees?"
**Generated Prompt:**
"A stunning tropical beach at sunset, where the sky is ablaze with fiery hues of red, orange, and pink, melting into the calm blue of the ocean. The golden sand is warm, and the gentle waves lap against the shore. Silhouetted palm trees frame the scene, their long leaves swaying in the soft breeze. The sun is just dipping below the horizon, casting a golden glow across the water. The atmosphere is peaceful yet vibrant, with the serene sounds of the ocean adding to the beauty of the moment."
---
**Example 5:**
**User Request:**
"Imagine an underwater scene with colorful coral reefs and exotic fish."
**Generated Prompt:**
"A vibrant underwater scene, where the sunlight filters down through crystal-clear water, illuminating the colorful coral reefs below. The corals are in shades of purple, pink, and yellow, teeming with life. Schools of exotic fish dart through the scene—brightly colored in hues of electric blue, orange, and green. The water is calm, with soft ripples distorting the light, while gentle seaweed sways with the current. The scene is peaceful and full of life, a kaleidoscope of color beneath the ocean's surface."
---
### **User Request:**
"Please create a scene with a magical waterfall in a forest."
**Generated Prompt:**
"A breathtaking magical waterfall cascading down from a high cliff, surrounded by an ancient forest. The water sparkles with iridescent hues, as if glowing with a soft, mystical light. Lush green foliage and towering trees frame the waterfall, with delicate vines hanging down like natures curtains. Mist rises from the base of the waterfall, creating a rainbow in the air. Sunlight filters through the canopy above, casting dappled light across the mossy rocks and the peaceful forest floor. The atmosphere is serene, almost dreamlike, filled with the sound of the waters soothing rush."
---
### **User Request:**
"I want to see an alien landscape on another planet with strange rock formations."
**Generated Prompt:**
"A surreal alien landscape on a distant planet, bathed in the pale light of two suns setting on the horizon. The ground is rocky, with bizarre rock formations that defy gravity, twisting and spiraling upward like ancient sculptures. The sky above is a vibrant shade of purple, dotted with swirling clouds and distant stars. The air is thick with an otherworldly mist, and strange, bioluminescent plants glow faintly in the twilight. The scene is alien and unearthly, with a sense of wonder and curiosity as the landscape stretches endlessly into the unknown."
---
### **User Request:**
"Could you create a winter scene with a frozen lake and a snowman?"
**Generated Prompt:**
"A peaceful winter scene with a frozen lake covered in a smooth sheet of ice, reflecting the soft pale blue of the overcast sky. Snow gently falls from the sky, coating the landscape in a thick layer of white. A cheerful snowman stands at the edge of the lake, its coal-black eyes and carrot nose adding a touch of whimsy to the quiet surroundings. Snow-covered pine trees line the shore, their branches weighed down by the snow. The air is crisp and fresh, and the entire scene feels calm, still, and full of the quiet beauty of winter."
---
### **End of Few-Shot Examples**

View File

@@ -0,0 +1,129 @@
{
"1": {
"inputs": {
"ckpt_name": "realvisionbabes_v10.safetensors"
},
"class_type": "CheckpointLoaderSimple",
"_meta": {
"title": "Load Checkpoint"
}
},
"2": {
"inputs": {
"stop_at_clip_layer": -1,
"clip": [
"1",
1
]
},
"class_type": "CLIPSetLastLayer",
"_meta": {
"title": "CLIP Set Last Layer"
}
},
"3": {
"inputs": {
"text": "amateur, instagram photo, beautiful face",
"clip": [
"2",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Prompt)"
}
},
"4": {
"inputs": {
"text": "Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, ugly, cropped, worst quality, low quality, mutation, poorly drawn, abnormal eye proportion, bad\nart, ugly face, messed up face, high forehead, professional photo shoot, makeup, photoshop, doll, plastic_doll, silicone, anime, cartoon, fake, filter, airbrush, 3d max, infant, featureless, colourless, impassive, shaders, two heads, crop,",
"clip": [
"2",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Prompt)"
}
},
"5": {
"inputs": {
"seed": 411040191827786,
"steps": 30,
"cfg": 3,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "normal",
"denoise": 1,
"model": [
"1",
0
],
"positive": [
"3",
0
],
"negative": [
"4",
0
],
"latent_image": [
"6",
0
]
},
"class_type": "KSampler",
"_meta": {
"title": "KSampler"
}
},
"6": {
"inputs": {
"width": 512,
"height": 768,
"batch_size": 1
},
"class_type": "EmptyLatentImage",
"_meta": {
"title": "Empty Latent Image"
}
},
"7": {
"inputs": {
"samples": [
"5",
0
],
"vae": [
"8",
0
]
},
"class_type": "VAEDecode",
"_meta": {
"title": "VAE Decode"
}
},
"8": {
"inputs": {
"vae_name": "ClearVAE_V2.3_fp16.pt"
},
"class_type": "VAELoader",
"_meta": {
"title": "Load VAE"
}
},
"9": {
"inputs": {
"filename_prefix": "ComfyUI",
"images": [
"7",
0
]
},
"class_type": "SaveImage",
"_meta": {
"title": "Save Image"
}
}
}

View File

@@ -0,0 +1,316 @@
# Running DeepSeek-R1-0528 (FP8 Hybrid) with KTransformers
This guide provides instructions to run the DeepSeek-R1-0528 model locally using a hybrid FP8 (GPU) and Q4_K_M GGUF (CPU) approach with KTransformers, managed via Docker. This setup is optimized for high-end hardware (e.g., NVIDIA RTX 4090, high-core count CPU, significant RAM).
**Model Version:** DeepSeek-R1-0528
**KTransformers Version (Working):** `approachingai/ktransformers:v0.2.4post1-AVX512`
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Model Preparation](#model-preparation)
* [Step 2a: Download FP8 Base Model (Host)](#step-2a-download-fp8-base-model-host)
* [Step 2b: Download Q4\_K\_M GGUF Model (Host)](#step-2b-download-q4_k_m-gguf-model-host)
* [Step 2c: Merge Models (Inside Docker)](#step-2c-merge-models-inside-docker)
* [Step 2d: Set Ownership & Permissions (Host)](#step-2d-set-ownership--permissions-host)
3. [Running the Model with KTransformers](#running-the-model-with-ktransformers)
* [Single GPU (e.g., 1x RTX 4090)](#single-gpu-eg-1x-rtx-4090)
* [Multi-GPU (e.g., 2x RTX 4090)](#multi-gpu-eg-2x-rtx-4090)
4. [Testing the Server](#testing-the-server)
5. [Key Server Parameters](#key-server-parameters)
6. [Notes on KTransformers v0.3.1](#notes-on-ktransformers-v031)
7. [Available Optimize Config YAMLs (for reference)](#available-optimize-config-yamls-for-reference)
8. [Troubleshooting Tips](#troubleshooting-tips)
---
## 1. Prerequisites
* **Hardware:**
* NVIDIA GPU with FP8 support (e.g., RTX 40-series, Hopper series).
* High core-count CPU (e.g., Intel Xeon, AMD Threadripper).
* Significant System RAM (ideally 512GB for larger GGUF experts and context). The Q4_K_M experts for a large model can consume 320GB+ alone.
* Fast SSD (NVMe recommended) for model storage.
* **Software (on Host):**
* Linux OS (Ubuntu 24.04 LTS recommended).
* NVIDIA Drivers (ensure they are up-to-date and support your GPU and CUDA version).
* Docker Engine.
* NVIDIA Container Toolkit (for GPU access within Docker).
* Conda or a Python virtual environment manager.
* Python 3.9+
* `huggingface_hub` and `hf_transfer`
* Git (for cloning KTransformers if you need to inspect YAMLs or contribute).
---
## 2. Model Preparation
We assume your models will be downloaded and stored under `/home/mukul/dev-ai/models` on your host system. This path will be mounted into the Docker container as `/models`. Adjust paths if your setup differs.
### Step 2a: Download FP8 Base Model (Host)
Download the official DeepSeek-R1-0528 FP8 base model components.
```bash
# Ensure the required packages are installed. Conda is recommended for environment management.
pip install -U huggingface_hub hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1 # For faster downloads
```
```bash
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
BASE_MODEL_HF_ID="deepseek-ai/DeepSeek-R1-0528"
LOCAL_BASE_MODEL_PATH="${HOST_MODEL_DIR}/${BASE_MODEL_HF_ID}"
mkdir -p "${LOCAL_BASE_MODEL_PATH}"
echo "Downloading base model to: ${LOCAL_BASE_MODEL_PATH}"
huggingface-cli download --resume-download "${BASE_MODEL_HF_ID}" \
  --local-dir "${LOCAL_BASE_MODEL_PATH}"
```
### Step 2b: Download Q4_K_M GGUF Model (Host)
Download the Unsloth Q4_K_M GGUF version of DeepSeek-R1-0528 using the attached python script.
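If you prefer not to use the script, an equivalent download can be done with `huggingface-cli` directly; a sketch, reusing the host model directory defined above:
```bash
# Download only the Q4_K_M GGUF shards from the Unsloth repository
huggingface-cli download unsloth/DeepSeek-R1-0528-GGUF \
  --include "Q4_K_M/*" \
  --local-dir "${HOST_MODEL_DIR}/unsloth/DeepSeek-R1-0528-GGUF"
```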
### Step 2c: Merge Models (Inside Docker)
This step uses the KTransformers Docker image to merge the FP8 base and Q4\_K\_M GGUF weights.
```bash
docker stop ktransformers
docker run --rm --gpus '"device=1"' \
-v /home/mukul/dev-ai/models:/models \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
```bash
python merge_tensors/merge_safetensor_gguf.py \
--safetensor_path /models/deepseek-ai/DeepSeek-R1-0528 \
--gguf_path /models/unsloth/DeepSeek-R1-0528-GGUF/Q4_K_M \
--output_path /models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8
```
### Step 2d: Set Ownership & Permissions (Host)
After Docker creates the merged files, fix ownership and permissions on the host.
```bash
HOST_OUTPUT_DIR_QUANT="/home/mukul/dev-ai/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8" # As defined above
echo "Setting ownership for merged files in: ${HOST_OUTPUT_DIR_QUANT}"
sudo chown -R $USER:$USER "${HOST_OUTPUT_DIR_QUANT}"
sudo find "${HOST_OUTPUT_DIR_QUANT}" -type f -exec chmod 664 {} \;
sudo find "${HOST_OUTPUT_DIR_QUANT}" -type d -exec chmod 775 {} \;
echo "Ownership and permissions set. Verification:"
ls -la "${HOST_OUTPUT_DIR_QUANT}"
```
---
## 3. Running the Model with KTransformers
Ensure the Docker image `approachingai/ktransformers:v0.2.4post1-AVX512` is pulled.
### Single GPU (e.g., 1x RTX 4090)
**1. Start Docker Container:**
```bash
# Stop any previous instance
docker stop ktransformers || true # Allow if not running
docker rm ktransformers || true # Allow if not existing
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
TARGET_GPU="1" # Specify GPU ID, e.g., "0", "1", or "all"
docker run --rm --gpus "\"device=${TARGET_GPU}\"" \
-v "${HOST_MODEL_DIR}:/models" \
-p 10002:10002 \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
**2. Inside the Docker container shell, launch the server:**
```bash
# Set environment variable for PyTorch CUDA memory allocation
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
CONTAINER_MERGED_MODEL_PATH="/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8"
CONTAINER_BASE_MODEL_CONFIG_PATH="/models/deepseek-ai/DeepSeek-R1-0528"
# Launch server
python3 ktransformers/server/main.py \
--gguf_path "${CONTAINER_MERGED_MODEL_PATH}" \
--model_path "${CONTAINER_BASE_MODEL_CONFIG_PATH}" \
--model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
--cpu_infer 57 \
--max_new_tokens 16384 \
--cache_lens 24576 \
--cache_q4 true \
--temperature 0.6 \
--top_p 0.95 \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml \
--force_think \
--use_cuda_graph \
--host 0.0.0.0 \
--port 10002
```
*Note: The `--optimize_config_path` still refers to a `DeepSeek-V3` YAML. This V3 config is compatible and recommended.*
### Multi-GPU (e.g., 2x RTX 4090)
**1. Start Docker Container:**
```bash
# Stop any previous instance
docker stop ktransformers || true
docker rm ktransformers || true
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
TARGET_GPUS="0,1" # Specify GPU IDs
docker run --rm --gpus "\"device=${TARGET_GPUS}\"" \
-v "${HOST_MODEL_DIR}:/models" \
-p 10002:10002 \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
**2. Inside the Docker container shell, launch the server:**
```bash
# Set environment variable (optional for multi-GPU, but can be helpful)
# export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Define container paths
CONTAINER_MERGED_MODEL_PATH="/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8"
CONTAINER_BASE_MODEL_CONFIG_PATH="/models/deepseek-ai/DeepSeek-R1-0528"
# Launch server
python3 ktransformers/server/main.py \
--gguf_path "${CONTAINER_MERGED_MODEL_PATH}" \
--model_path "${CONTAINER_BASE_MODEL_CONFIG_PATH}" \
--model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
--cpu_infer 57 \
--max_new_tokens 24576 \
--cache_lens 32768 \
--cache_q4 true \
--temperature 0.6 \
--top_p 0.95 \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml \
--force_think \
--use_cuda_graph \
--host 0.0.0.0 \
--port 10002
```
*Note: The `--optimize_config_path` still refers to a `DeepSeek-V3` YAML. This is intentional.*
---
## 4. Testing the Server
Once the server is running inside Docker (look for "Uvicorn running on http://0.0.0.0:10002"), open a **new terminal on your host machine** and test with `curl`:
```bash
curl http://localhost:10002/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "KVCache-ai/DeepSeek-R1-0528-q4km-fp8",
"messages": [{"role": "user", "content": "Explain the concept of Mixture of Experts in large language models in a simple way."}],
"max_tokens": 250,
"temperature": 0.6,
"top_p": 0.95
}'
```
A JSON response containing the model's output indicates success.
---
## 5. Key Server Parameters
* `--gguf_path`: Path inside the container to your **merged** hybrid model files.
* `--model_path`: Path inside the container to the **original base model's** directory (containing `config.json`, `tokenizer.json`, etc.). KTransformers needs this for model configuration.
* `--model_name`: Arbitrary name for the API endpoint. Used in client requests.
* `--cpu_infer`: Number of CPU threads for GGUF expert inference. Tune based on your CPU cores (e.g., `57` for a 56-core/112-thread CPU might leave some cores for other tasks, or you could try higher).
* `--max_new_tokens`: Maximum number of tokens the model can generate in a single response.
* `--cache_lens`: Maximum KV cache size in tokens. Directly impacts context length capacity and VRAM usage.
* `--cache_q4`: (Boolean) If `true`, quantizes the KV cache to 4-bit. **Crucial for saving VRAM**, especially with long contexts.
* `--temperature`, `--top_p`: Control generation randomness.
* `--optimize_config_path`: Path to the KTransformers YAML file defining the layer offloading strategy (FP8 on GPU, GGUF on CPU). **Essential for the hybrid setup.**
* `--force_think`: (KTransformers specific) Potentially related to how the model processes or plans.
* `--use_cuda_graph`: Enables CUDA graphs for potentially faster GPU execution by reducing kernel launch overhead.
* `--host`, `--port`: Network interface and port for the server.
* `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`: Environment variable to help PyTorch manage CUDA memory more flexibly and potentially avoid OOM errors.
---
## 6. Notes on KTransformers v0.3.1
As of 2025-06-02, the `approachingai/ktransformers:v0.3.1-AVX512` image was reported as **not working** with the provided single GPU or multi-GPU configuration.
**Attempted Docker Start Command (v0.3.1 - Non-Functional):**
```bash
# docker stop ktransformers # (if attempting to switch)
# docker run --rm --gpus '"device=0,1"' \
# -v /home/mukul/dev-ai/models:/models \
# -p 10002:10002 \
# --name ktransformers \
# -itd approachingai/ktransformers:v0.3.1-AVX512
#
# docker exec -it ktransformers /bin/bash
```
**Attempted Server Launch (v0.3.1 - Non-Functional):**
```bash
# # Inside the v0.3.1 Docker container shell
# PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 ktransformers/server/main.py \
# --gguf_path /models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8 \
# --model_path /models/deepseek-ai/DeepSeek-R1-0528 \
# --model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
# --cpu_infer 57 \
# --max_new_tokens 32768 \
# --cache_lens 65536 \
# --cache_q4 true \
# --temperature 0.6 \
# --top_p 0.95 \
# --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml \
# --force_think \
# --use_cuda_graph \
# --host 0.0.0.0 \
# --port 10002
```
Stick to `approachingai/ktransformers:v0.2.4post1-AVX512` for the configurations described above until compatibility issues with newer versions are resolved for this specific model and setup.
---
## 7. Available Optimize Config YAMLs (for reference)
The KTransformers repository contains various optimization YAML files. The ones used in this guide are for `DeepSeek-V3` but are being applied to `DeepSeek-R1-0528`. Their direct compatibility or optimality for R1-0528 should be verified. If KTransformers releases specific YAMLs for DeepSeek-R1-0528, those should be preferred.
Reference list of some `DeepSeek-V3` YAMLs (path `ktransformers/optimize/optimize_rules/` inside the container):
```
DeepSeek-V3-Chat-amx.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml
DeepSeek-V3-Chat-multi-gpu-4.yaml
DeepSeek-V3-Chat-multi-gpu-8.yaml
DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
DeepSeek-V3-Chat-multi-gpu-marlin.yaml
DeepSeek-V3-Chat-multi-gpu.yaml
DeepSeek-V3-Chat-serve.yaml
DeepSeek-V3-Chat.yaml
```

View File

@@ -0,0 +1,65 @@
from huggingface_hub import hf_hub_download, list_repo_files
import os

# Configuration
repo_id = "unsloth/DeepSeek-R1-0528-GGUF"
folder_in_repo = "Q4_K_M"
file_extension = ".gguf"

# Expand the tilde (~) to the user's home directory
local_base_dir = os.path.expanduser("~/dev-ai/models/unsloth/DeepSeek-R1-0528-GGUF")

# Create the base directory (hf_hub_download would also create it when
# local_dir_use_symlinks=False, but explicit creation is fine)
os.makedirs(local_base_dir, exist_ok=True)

# Download files
print(f"Listing files from {repo_id} in folder {folder_in_repo} with extension {file_extension}...")
try:
    all_repo_files = list_repo_files(repo_id, repo_type='model')
    files_to_download = [
        f for f in all_repo_files
        if f.startswith(folder_in_repo + "/") and f.endswith(file_extension)
    ]
    if not files_to_download:
        print(f"No files found in '{folder_in_repo}' with extension '{file_extension}'.")
    else:
        print(f"Found {len(files_to_download)} file(s) to download.")
        for filename_in_repo in files_to_download:
            print(f"Downloading {filename_in_repo}...")
            # filename is the path of the file within the repository; with local_dir
            # set, the file keeps its repo path structure, e.g. "Q4_K_M/file.gguf"
            # is saved as local_base_dir/Q4_K_M/file.gguf.
            try:
                downloaded_file_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename_in_repo,
                    local_dir=local_base_dir,
                    local_dir_use_symlinks=False,
                    # Set resume_download=True to resume interrupted downloads
                    # resume_download=True,
                )
                print(f"Successfully downloaded and saved to: {downloaded_file_path}")
            except Exception as e:
                print(f"Error downloading {filename_in_repo}: {str(e)}")
except Exception as e:
    print(f"Error listing files from repository: {str(e)}")

print("Download process complete.")

View File

@@ -0,0 +1,137 @@
# **Port Forwarding Magic: Set Up Bolt.New with Remote Ollama Server and Qwen2.5-Coder:32B**
This guide demonstrates how to use **port forwarding** to connect your local **Bolt.New** setup to a **remote Ollama server**, solving issues with apps that don't allow full customization. We'll use the open-source [Bolt.New repository](https://github.com/coleam00/bolt.new-any-llm) as our example, and we'll even show you how to extend the context length for the popular **Qwen2.5-Coder:32B model**.
If you encounter installation issues, submit an [issue](https://github.com/coleam00/bolt.new-any-llm/issues) or contribute by forking and improving this guide.
---
## **What You'll Learn**
- Clone and configure **Bolt.New** for your local development.
- Use **SSH tunneling** to seamlessly forward traffic to a remote server.
- Extend the context length of AI models for enhanced capabilities.
- Run **Bolt.New** locally.
---
## **Prerequisites**
Download and install Node.js from [https://nodejs.org/en/download/](https://nodejs.org/en/download/), and make sure `pnpm` is available (for example, via `npm install -g pnpm`).
---
## **Step 1: Clone the Repository**
1. Open Terminal.
2. Clone the repository:
```bash
git clone https://github.com/coleam00/bolt.new-any-llm.git
```
---
## **Step 2: Stop Local Ollama Service**
If Ollama is already running on your machine, stop it to avoid conflicts with the remote server.
- **Stop the service**:
```bash
sudo systemctl stop ollama.service
```
- **OPTIONAL: Disable it from restarting**:
```bash
sudo systemctl disable ollama.service
```
---
## **Step 3: Forward Local Traffic to the Remote Ollama Server**
To forward all traffic from `localhost:11434` to your remote Ollama server (`ai.mtcl.lan:11434`), set up SSH tunneling:
1. Open a terminal and run:
```bash
ssh -L 11434:ai.mtcl.lan:11434 mukul@ai.mtcl.lan
```
- Replace `mukul` with your remote username.
- Replace `ai.mtcl.lan` with your server's hostname or IP.
2. Keep this terminal session running while using Bolt.New. This ensures your app communicates with the remote server as if it's local. A quick check that the tunnel works is shown below.
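With the tunnel open, you can verify from another local terminal that the remote Ollama instance responds; the `/api/tags` endpoint lists the models installed on the remote server:
```bash
curl http://localhost:11434/api/tags
```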
---
## **Step 4: OPTIONAL: Extend Ollama Model Context Length**
By default, Ollama models have a context length of 2048 tokens. For tasks requiring larger input, extend this limit for **Qwen2.5-Coder:32B**:
1. SSH into your remote server:
```bash
ssh mukul@ai.mtcl.lan
```
2. Access the Docker container running Ollama:
```bash
docker exec -it ollama /bin/bash
```
3. Create a `Modelfile`:
While inside the Docker container, run the following commands to create the Modelfile:
```bash
echo "FROM qwen2.5-coder:32b" > /tmp/Modelfile
echo "PARAMETER num_ctx 32768" >> /tmp/Modelfile
```
If you prefer, you can use `cat` to create the file directly:
```bash
cat > /tmp/Modelfile << EOF
FROM qwen2.5-coder:32b
PARAMETER num_ctx 32768
EOF
```
4. Create the new model:
```bash
ollama create -f /tmp/Modelfile qwen2.5-coder-extra-ctx:32b
```
5. Verify the new model:
```bash
ollama list
```
You should see `qwen2.5-coder-extra-ctx:32b` listed.
6. Exit the Docker container:
```bash
exit
```
---
## **Step 5: Run Bolt.New Without Docker**
1. **Install Dependencies**
Navigate to the cloned repository:
```bash
cd bolt.new-any-llm
pnpm install
```
2. **Start the Development Server**
Run:
```bash
pnpm run dev
```
---
## **Summary**
This guide walks you through setting up **Bolt.New** with a **remote Ollama server**, ensuring seamless communication through SSH tunneling. We've also shown you how to extend the context length for **Qwen2.5-Coder:32B**, making it ideal for advanced development tasks.
With this setup:
- You'll offload heavy computation to your remote server.
- Your local machine remains light and responsive.
- Buggy `localhost` configurations? No problem—SSH tunneling has you covered.
Credits: [Bolt.New repository](https://github.com/coleam00/bolt.new-any-llm).
Let's build something amazing! 🚀

View File

@@ -0,0 +1,107 @@
### **Guide to Set Up Bridge Networking on Ubuntu for Virtual Machines**
This guide explains how to configure bridge networking on Ubuntu to allow virtual machines (VMs) to directly access the network, obtaining their own IP addresses from the DHCP server.
By following this guide, you can successfully set up bridge networking, enabling your virtual machines to directly access the network as if they were standalone devices.
---
#### **Step 1: Identify Your Primary Network Interface**
The primary network interface is the one currently used by the server for network access. Identify it with the following command:
```bash
ip link show
```
Look for the name of the interface (e.g., `enp8s0`) with `state UP`.
---
#### **Step 2: Backup Your Current Network Configuration**
Before making any changes, back up the existing netplan configuration file:
```bash
sudo cp /etc/netplan/00-installer-config.yaml /etc/netplan/00-installer-config.yaml.bak
```
---
#### **Step 3: Configure the Bridge**
Edit the netplan configuration file:
```bash
sudo nano /etc/netplan/00-installer-config.yaml
```
Replace its content with the following, adjusted for your environment:
```yaml
network:
  version: 2
  ethernets:
    enp8s0:
      dhcp4: no
  bridges:
    br0:
      interfaces: [enp8s0]
      dhcp4: true
```
- `enp8s0`: Your physical network interface.
- `br0`: The new bridge interface that will be used by the virtual machines and the host.
Save and exit the file.
---
#### **Step 4: Apply the Configuration**
Apply the new network configuration to create the bridge:
```bash
sudo netplan apply
```
---
#### **Step 5: Verify the Bridge Configuration**
Check that the bridge `br0` is active and has an IP address:
```bash
ip addr show br0
```
You should see an output like this:
```plaintext
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 46:10:cc:63:f4:37 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 metric 100 brd 192.168.1.255 scope global dynamic br0
valid_lft 7102sec preferred_lft 7102sec
```
---
#### **Step 6: Configure Virtual Machines to Use the Bridge**
For VMs created with tools like `virt-manager` or `virsh`:
1. When configuring the VM's network interface, choose **Bridge** as the network source.
2. Set `br0` as the bridge interface.
3. The VM will now obtain an IP address dynamically from the same DHCP server as the host.
For `virt-manager`:
- Go to **Add Hardware > Network**.
- Choose **Bridge br0** as the source (a command-line alternative is sketched below).
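For command-line provisioning, the bridge can also be selected at install time with `virt-install`. A minimal sketch in which the VM name, resources, ISO path, and OS variant are placeholders:
```bash
virt-install \
  --name test-vm \
  --memory 4096 \
  --vcpus 2 \
  --disk size=20 \
  --cdrom ~/isos/ubuntu-22.04.iso \
  --network bridge=br0 \
  --osinfo ubuntu22.04
```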
---
#### **Step 7: Test the Setup**
1. Start a VM and ensure it obtains a dynamic IP address from the network.
2. Test connectivity by pinging the gateway or external servers from the VM.
---
### **Key Considerations**
1. **Dynamic IP for Host:** The host server's IP address will now be associated with the bridge (`br0`) instead of the physical interface (`enp8s0`). This is expected behavior.
2. **Backup Configuration:** Always maintain a backup of your original network configuration to revert changes if needed.
3. **Network Manager vs. Netplan:** Use only one method (`netplan` or `nmcli`) for managing network configurations to avoid conflicts.
4. **Alternative Access:** If you are working on a remote server, ensure alternative access (e.g., a second network interface) before applying network changes.