Added AI server setup: to be reviewed

This commit is contained in:
2026-01-21 19:10:32 +01:00
parent 30d560f804
commit b580137ee8
16 changed files with 4089 additions and 0 deletions

View File

@@ -0,0 +1,143 @@
# Installing Webmin and Docker on Ubuntu
This guide walks you through installing Webmin on Ubuntu and expanding logical volumes through Webmin's interface. It also covers installing Docker on Ubuntu.
---
## Part 1: Installing Webmin on Ubuntu
Webmin is a web-based interface for managing Unix-like systems, making tasks such as user management, server configuration, and software installation easier.
### Step 1: Update Your System
Before installing Webmin, update your system to ensure all packages are up to date.
```bash
sudo apt update && sudo apt upgrade -y
```
### Step 2: Add the Webmin Repository and Key
To add the Webmin repository, download and run the setup script.
```bash
curl -o setup-repos.sh https://raw.githubusercontent.com/webmin/webmin/master/setup-repos.sh
sudo sh setup-repos.sh
```
### Step 3: Install Webmin
With the repository set up, install Webmin:
```bash
sudo apt-get install webmin --install-recommends
```
### Step 4: Access Webmin
Once installed, Webmin runs on port 10000. You can access it by opening a browser and navigating to:
```
https://<your-server-ip>:10000
```
If you are using a firewall, allow traffic on port 10000:
```bash
sudo ufw allow 10000
```
You can now log in to Webmin using your system's root credentials.
---
## Part 2: Expanding a Logical Volume Using Webmin
Expanding a logical volume through Webmin's Logical Volume Management (LVM) interface is a simple process.
### Step 1: Access Logical Volume Management
Log in to Webmin and navigate to:
**Hardware > Logical Volume Management**
Here, you can manage physical volumes, volume groups, and logical volumes.
### Step 2: Add a New Physical Volume
If you've added a new disk or partition to your system, you need to allocate it to a volume group before expanding the logical volume. To do this:
1. Locate your volume group in the Logical Volume Management module.
2. Click **Add Physical Volume**.
3. Select the new partition or RAID device and click **Add to volume group**. This action increases the available space in the group.
### Step 3: Resize the Logical Volume
To extend a logical volume:
1. In the **Logical Volumes** section, locate the logical volume you wish to extend.
2. Select **Resize**.
3. Specify the additional space or use all available free space in the volume group.
4. Click **Apply** to resize the logical volume.
### Step 4: Resize the Filesystem
After resizing the logical volume, expand the filesystem to match:
1. Click on the logical volume to view its details.
2. For supported filesystems like ext2, ext3, or ext4, click **Resize Filesystem**. The filesystem will automatically adjust to the new size of the logical volume. (The equivalent command-line steps are sketched below.)
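For reference, the same workflow can be done from the command line. A minimal sketch, assuming a new disk `/dev/sdb`, a volume group `vg0`, a logical volume `lv-data`, and an ext4 filesystem (all hypothetical names, substitute your own):
```bash
# Initialize the new disk as a physical volume and add it to the volume group
sudo pvcreate /dev/sdb
sudo vgextend vg0 /dev/sdb

# Extend the logical volume by all free space in the group
sudo lvextend -l +100%FREE /dev/vg0/lv-data

# Grow the ext4 filesystem to match the new logical volume size
sudo resize2fs /dev/vg0/lv-data
```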
---
## Part 3: Installing Docker on Ubuntu
This section covers installing Docker on Ubuntu.
### Step 1: Remove Older Versions
If you have previous versions of Docker installed, remove them:
```bash
sudo apt remove docker docker-engine docker.io containerd runc
```
### Step 2: Add Docker's Official GPG Key and Repository
Add Docker's GPG key and repository to your system's APT sources:
```bash
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
```
### Step 3: Install Docker
Now, install Docker:
```bash
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
### Step 4: Post-Installation Steps
To allow your user to run Docker commands without `sudo`, add your user to the Docker group:
```bash
sudo usermod -aG docker $USER
newgrp docker
```
Test your Docker installation by running the following command:
```bash
docker run hello-world
```
For more information, visit the official [Docker installation page](https://docs.docker.com/engine/install/ubuntu/).

View File

@@ -0,0 +1,182 @@
# How to Install the Latest Version of NVIDIA CUDA on Ubuntu 22.04 LTS
If you're looking to unlock the power of your NVIDIA GPU for scientific computing, machine learning, or other parallel workloads, CUDA is essential. Follow this step-by-step guide to install the latest CUDA release on Ubuntu 22.04 LTS.
## Prerequisites
Before proceeding with installing CUDA, ensure your system meets the following requirements:
- **Ubuntu 22.04 LTS**: this version is highly recommended for stability and compatibility.
- **NVIDIA GPU + drivers**: CUDA requires an NVIDIA GPU with the proprietary NVIDIA drivers installed.
To check for an NVIDIA GPU, open a terminal and run:
```bash
lspci | grep -i NVIDIA
```
If an NVIDIA GPU is present, it will be listed. If not, consult NVIDIA's documentation on installing the latest display drivers.
## Step 1: Install Latest NVIDIA Drivers
Install the latest NVIDIA drivers matched to your GPU model and CUDA version using Ubuntu's built-in Additional Drivers utility:
1. Open **Settings -> Software & Updates -> Additional Drivers**
2. Select the recommended driver under the NVIDIA heading
3. Click **Apply Changes** and **Reboot**
Verify the driver installation by running:
```bash
nvidia-smi
```
This should print details on your NVIDIA GPU and driver version.
## Step 2: Add the CUDA Repository
Add NVIDIA's official repository to your system to install CUDA:
1. Visit NVIDIA's CUDA Download Page and select "Linux", "x86_64", "Ubuntu", "22.04", "deb(network)"
2. Copy the repository installation commands for Ubuntu 22.04:
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
```
Run these commands to download repository metadata and add the apt source.
## Step 3: Install CUDA Toolkit
Install CUDA using apt:
```bash
sudo apt-get -y install cuda
```
Press **Y** to proceed and allow the latest supported version of the CUDA toolkit to install.
## Step 4: Configure Environment Variables
Update environment variables to recognize the CUDA compiler, tools, and libraries:
Create (or open) `/etc/profile.d/cuda.sh` as root and add the following configuration:
```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
Save changes and refresh environment variables:
```bash
source /etc/profile.d/cuda.sh
```
Alternatively, reboot to load the updated environment variables.
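If you prefer to create the file non-interactively, here is a minimal sketch using `tee` (it writes the same two exports shown above and requires sudo):
```bash
# Create /etc/profile.d/cuda.sh with the CUDA paths and load it into the current shell
sudo tee /etc/profile.d/cuda.sh > /dev/null << 'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
EOF
source /etc/profile.d/cuda.sh
```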
## Step 5: Verify Installation
Validate the installation:
1. Check the `nvcc` compiler version:
```bash
nvcc --version
```
This should display details on the CUDA compiler driver, including the installed version.
2. Verify GPU details with NVIDIA SMI:
```bash
nvidia-smi
```
# Optional: Setting Up cuDNN with CUDA: A Comprehensive Guide
This guide will walk you through downloading cuDNN from NVIDIA's official site, extracting it, copying the necessary files to the CUDA directory, and setting up environment variables for CUDA.
## Step 1: Download cuDNN
1. **Visit the NVIDIA cuDNN Archive**:
Navigate to the [NVIDIA cuDNN Archive](https://developer.nvidia.com/rdp/cudnn-archive).
2. **Select the Version**:
Choose the appropriate version of cuDNN compatible with your CUDA version. For this guide, we'll assume you are downloading `cudnn-linux-x86_64-8.9.7.29_cuda12-archive`.
3. **Download the Archive**:
Download the `tar.xz` file to your local machine.
## Step 2: Extract cuDNN
1. **Navigate to the Download Directory**:
Open a terminal and navigate to the directory where the archive was downloaded.
```bash
cd ~/Downloads
```
2. **Extract the Archive**:
Use the `tar` command to extract the contents of the archive.
```bash
tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
```
This will create a directory named `cudnn-linux-x86_64-8.9.7.29_cuda12-archive`.
## Step 3: Copy cuDNN Files to CUDA Directory
1. **Navigate to the Extracted Directory**:
Move into the directory containing the extracted cuDNN files.
```bash
cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive
```
2. **Copy Header Files**:
Copy the header files to the CUDA include directory.
```bash
sudo cp include/cudnn*.h /usr/local/cuda-12.5/include/
```
3. **Copy Library Files**:
Copy the library files to the CUDA lib64 directory.
```bash
sudo cp lib/libcudnn* /usr/local/cuda-12.5/lib64/
```
4. **Set Correct Permissions**:
Ensure the copied files have the appropriate permissions.
```bash
sudo chmod a+r /usr/local/cuda-12.5/include/cudnn*.h /usr/local/cuda-12.5/lib64/libcudnn*
```
## Step 4: Set Up Environment Variables
1. **Open Your Shell Profile**:
Open your `.bashrc` or `.bash_profile` file in a text editor.
```bash
nano ~/.bashrc
```
2. **Add CUDA to PATH and LD_LIBRARY_PATH**:
Add the following lines to set the environment variables for CUDA. This example assumes CUDA 12.5.
```bash
export PATH=/usr/local/cuda-12.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH
```
3. **Apply the Changes**:
Source the file to apply the changes immediately.
```bash
source ~/.bashrc
```
## Verification
1. **Check CUDA Installation**:
Verify that CUDA is correctly set up by running:
```bash
nvcc --version
```
2. **Check cuDNN Installation**:
Optionally, you can compile and run a sample program to ensure cuDNN is working correctly, or use the quick header check sketched below.
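A lighter-weight check is to read the version macros from the installed header; this assumes the CUDA 12.5 paths used earlier in this guide:
```bash
# Print the cuDNN version macros (major, minor, patch level)
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda-12.5/include/cudnn_version.h
```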
By following these steps, you will have downloaded and installed cuDNN, integrated it into your CUDA setup, and configured your environment variables for smooth operation. This ensures that applications requiring both CUDA and cuDNN can run without issues.

View File

@@ -0,0 +1,85 @@
# OPTIONAL: Setting NVIDIA GPU Power Limit at System Startup
## Overview
This guide explains how to set the power limit for NVIDIA GPUs at system startup using a systemd service. This ensures the power limit setting is persistent across reboots.
## Steps
### 1. Create and Configure the Service File
1. Open a terminal and create a new systemd service file:
```bash
sudo nano /etc/systemd/system/nvidia-power-limit.service
```
2. Add the following content to the file, replacing `270` with the desired power limit (e.g., 270 watts for your GPUs):
- For Dual GPU Setup:
```ini
[Unit]
Description=Set NVIDIA GPU Power Limit
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -i 0 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 1 -pl 270
[Install]
WantedBy=multi-user.target
```
- For Quad GPU Setup:
```ini
[Unit]
Description=Set NVIDIA GPU Power Limit
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -i 0 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 1 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 2 -pl 270
ExecStart=/usr/bin/nvidia-smi -i 3 -pl 270
[Install]
WantedBy=multi-user.target
```
Save and close the file.
### 2. Apply and Enable the Service
1. Reload the systemd manager configuration:
```bash
sudo systemctl daemon-reload
```
2. Enable the service to ensure it runs at startup:
```bash
sudo systemctl enable nvidia-power-limit.service
```
### 3. (Optional) Start the Service Immediately
To apply the power limit immediately without rebooting:
```bash
sudo systemctl start nvidia-power-limit.service
```
## Verification
Check the power limits using `nvidia-smi`:
```bash
nvidia-smi -q -d POWER
```
Look for the "Power Management" section to verify the new power limits.
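You can also confirm that the service itself ran without errors:
```bash
systemctl status nvidia-power-limit.service
```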
By following this guide, you can ensure that your NVIDIA GPUs have a power limit set at every system startup, providing consistent and controlled power usage for your GPUs.

View File

@@ -0,0 +1,103 @@
# Ollama & OpenWebUI Docker Setup
## Ollama with Nvidia GPU
Ollama makes it easy to get up and running with large language models locally.
To run Ollama using an Nvidia GPU, follow these steps:
### Step 1: Install the NVIDIA Container Toolkit
#### Install with Apt
1. **Configure the repository**:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
```
2. **Install the NVIDIA Container Toolkit packages**:
```bash
sudo apt-get install -y nvidia-container-toolkit
```
#### Install with Yum or Dnf
1. **Configure the repository**:
```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```
2. **Install the NVIDIA Container Toolkit packages**:
```bash
sudo yum install -y nvidia-container-toolkit
```
### Step 2: Configure Docker to Use Nvidia Driver
```bash
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Step 3: Start the Container
```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama ollama/ollama
```
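Optionally, confirm the Ollama API is reachable from the host before running any models; the `/api/tags` endpoint lists the locally installed models:
```bash
curl http://localhost:11434/api/tags
```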
## Running Multiple Instances with Specific GPUs
You can run multiple instances of the Ollama server and assign specific GPUs to each instance. On my server, I have 4 NVIDIA 3090 GPUs, which I use as described below:
### Ollama Server for GPUs 0 and 1
```bash
docker run -d --gpus '"device=0,1"' -v ollama:/root/.ollama -p 11435:11434 --restart always --name ollama1 --network ollama-network ollama/ollama
```
### Ollama Server for GPUs 2 and 3
```bash
docker run -d --gpus '"device=2,3"' -v ollama:/root/.ollama -p 11436:11434 --restart always --name ollama2 --network ollama-network ollama/ollama
```
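Both commands above attach the containers to a user-defined Docker network named `ollama-network`; if that network does not exist yet, create it first:
```bash
docker network create ollama-network
```
Note that both instances mount the same `ollama` volume, so models pulled by one server are available to the other.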
## Running Models Locally
Once the container is up and running, you can execute models using:
```bash
docker exec -it ollama ollama run llama3.1
```
```bash
docker exec -it ollama ollama run llama3.1:70b
```
```bash
docker exec -it ollama ollama run qwen2.5-coder:1.5b
```
```bash
docker exec -it ollama ollama run deepseek-v2
```
### Try Different Models
Explore more models available in the [Ollama library](https://github.com/ollama/ollama).
## OpenWebUI Installation
To install and run OpenWebUI, use the following command:
```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

View File

@@ -0,0 +1,48 @@
### Wiki: Updating Docker Containers for Ollama and OpenWebUI
This guide explains the steps to update Docker containers for **Ollama** and **OpenWebUI**. Follow the instructions below to stop, remove, pull new images, and run the updated containers.
---
## Ollama
### Steps to Update
1. **Stop Existing Containers**
2. **Remove Existing Containers**
3. **Pull the Latest Ollama Image**
4. **Run Updated Containers**
For GPU devices 0 and 1:
```bash
docker stop ollama
docker rm ollama
docker pull ollama/ollama
docker run -d --gpus '"device=0,1"' -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama -e OLLAMA_KEEP_ALIVE=1h ollama/ollama
```
For NVIDIA Jetson or CPU-only systems:
```bash
docker stop ollama
docker rm ollama
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --restart always --name ollama -e OLLAMA_KEEP_ALIVE=1h ollama/ollama
```
---
## OpenWebUI
```bash
docker stop open-webui
docker rm open-webui
docker pull ghcr.io/open-webui/open-webui:main
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
---
### Notes
- Make sure to adjust GPU allocation or port numbers as necessary for your setup.
- The `OLLAMA_KEEP_ALIVE` environment variable is set to `1h` so that loaded models stay in memory for an hour after the last request.

View File

@@ -0,0 +1,58 @@
# Running SearXNG with Custom Settings in Docker
## Overview
This guide walks you through the steps to run a SearXNG instance in Docker using a custom `settings.yml` configuration file. This setup is ideal for users who want to customize their SearXNG instance without needing to rebuild the Docker image every time they make a change.
## Prerequisites
- **Docker**: Ensure Docker is installed on your machine. Verify the installation by running `docker --version`.
- **Git**: For cloning the SearXNG repository, make sure Git is installed.
## Steps
### 1. Use the Official Image or Clone the SearXNG Repository
You can pull the official image directly from Docker Hub:
```bash
docker pull docker.io/searxng/searxng:latest
```
### 2. Customize `settings.yml`
Place your custom `settings.yml` file in the directory of your choice. Ensure that this file is configured according to your needs, including enabling JSON responses if required.
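As a minimal sketch, the following writes a `settings.yml` that inherits the upstream defaults and only enables JSON output alongside HTML (`use_default_settings` and `search.formats` are standard SearXNG options; adjust the rest to your needs):
```bash
# Write a minimal settings.yml that keeps the defaults and enables the JSON format
cat > settings.yml << 'EOF'
use_default_settings: true
search:
  formats:
    - html
    - json
EOF
```
Depending on your deployment, you may also need to set `server.secret_key`; see the SearXNG documentation for the full list of options.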
### 3. Run the SearXNG Docker Container
Run the Docker container using your custom `settings.yml` file. Choose the appropriate command based on whether you are using the official image or a custom build.
#### For the Official Image:
```bash
docker run -d -p 4000:8080 --restart always --name searxng -v ./settings.yml:/etc/searxng/settings.yml searxng/searxng:latest
```
#### Command Breakdown:
- `-d`: Runs the container in detached mode.
- `-p 4000:8080`: Maps port 8080 in the container to port 4000 on your host machine.
- `-v ./settings.yml:/etc/searxng/settings.yml`: Mounts the custom `settings.yml` file into the container.
- `searxng/searxng:latest` or `searxng/searxng`: The Docker image being used.
### 4. Access SearXNG
Once the container is running, you can access your SearXNG instance by navigating to `http://<hostname>:4000` in your web browser.
### 5. Testing JSON Output
To verify that the JSON output is correctly configured, you can use `curl` or a similar tool:
```bash
curl "http://<hostname>:4000/search?q=python&format=json"
```
This should return search results in JSON format.
### 6. Configuration URL for OpenWebUI
`http://<hostname>:4000/search?q=<query>`

File diff suppressed because it is too large

View File

@@ -0,0 +1,123 @@
# ComfyUI Docker Setup with GGUF Support and ComfyUI Manager
This guide provides detailed steps to build and run **ComfyUI** with **GGUF support** and **ComfyUI Manager** using Docker. The GGUF format is optimized for quantized models, and ComfyUI Manager is included for easy node management.
## Prerequisites
Before starting, ensure you have the following installed on your system:
- **Docker**
- **NVIDIA GPU with CUDA support** (if using GPU acceleration)
- **Directory structure for the repository, models, and checkpoints**, created as follows:
```bash
mkdir -p ~/dev-ai/vison/models/checkpoints
```
### 1. Clone the ComfyUI Repository
First, navigate to the `~/dev-ai/vison` directory and clone the ComfyUI repository to your local machine:
```bash
cd ~/dev-ai/vison
```
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
```
### 2. Create the Dockerfile
Copy the provided `Dockerfile` into the root of your ComfyUI directory. This file contains the necessary configuration for building the Docker container with GGUF support.
### 3. Build the Docker Image
```bash
docker build -t comfyui-gguf:latest .
```
This will create a Docker image named `comfyui-gguf:latest` with both **ComfyUI Manager** and **GGUF support** built in.
### 4. Run the Docker Container
Once the image is built, you can run the Docker container with volume mapping for your models.
```bash
docker run --name comfyui -p 8188:8188 --gpus all \
-v /home/mukul/dev-ai/vison/models:/app/models \
-d comfyui-gguf:latest
```
This command maps your local `models` directory to `/app/models` inside the container and exposes ComfyUI on port `8188`.
### 5. Download and Place Checkpoint Models
Download your Civitai checkpoint models and place them in the `checkpoints` directory (volume-mapped into the container), for example:
https://civitai.com/models/139562/realvisxl-v50
To use GGUF models or other safetensor models, follow the steps below to download them directly into the `checkpoints` directory.
1. **Navigate to the Checkpoints Directory**:
```bash
cd /home/mukul/dev-ai/vison/models/checkpoints
```
2. **Download `flux1-schnell-fp8.safetensors`**:
```bash
wget "https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors?download=true" -O flux1-schnell-fp8.safetensors
```
3. **Download `flux1-dev-fp8.safetensors`**:
```bash
wget "https://huggingface.co/Comfy-Org/flux1-dev/resolve/main/flux1-dev-fp8.safetensors?download=true" -O flux1-dev-fp8.safetensors
```
These commands will place the corresponding `.safetensors` files into the `checkpoints` directory.
### 6. Access ComfyUI
After starting the container, access the ComfyUI interface in your web browser:
```bash
http://<your-server-ip>:8188
```
Replace `<your-server-ip>` with your server's IP address or use `localhost` if you're running it locally.
### 7. Using GGUF Models
In the ComfyUI interface:
- Use the **UnetLoaderGGUF** node (found in the `bootleg` category) to load GGUF models.
- Ensure your GGUF files are correctly named and placed in the `/app/models/checkpoints` directory for detection by the loader node (see the sketch below).
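Because the models directory is volume-mapped, GGUF files placed in the host-side checkpoints folder appear inside the container automatically. A sketch with placeholders (substitute the real download URL and filename of the GGUF model you want):
```bash
cd /home/mukul/dev-ai/vison/models/checkpoints
# <gguf-model-url> and the output filename are placeholders
wget "<gguf-model-url>" -O your-model-Q4_K_S.gguf
```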
### 8. Managing Nodes with ComfyUI Manager
With **ComfyUI Manager** built into the image:
- **Install** missing nodes as needed when uploading workflows.
- **Enable/Disable** conflicting nodes from the ComfyUI Manager interface.
### 9. Stopping and Restarting the Docker Container
To stop the running container:
```bash
docker stop comfyui
```
To restart the container:
```bash
docker start comfyui
```
### 10. Logs and Troubleshooting
To view the container logs:
```bash
docker logs comfyui
```
This will provide details if anything goes wrong or if you encounter issues with GGUF models or node management.

View File

@@ -0,0 +1,33 @@
# Base image with Python 3.11 and CUDA 12.5 support
FROM nvidia/cuda:12.5.0-runtime-ubuntu22.04
# Install system dependencies
RUN apt-get update && apt-get install -y \
git \
python3-pip \
libgl1-mesa-glx \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy the cloned ComfyUI repository
COPY . /app
# Install Python dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
# Clone and install ComfyUI Manager
RUN git clone https://github.com/ltdrdata/ComfyUI-Manager.git /app/custom_nodes/ComfyUI-Manager && \
pip install -r /app/custom_nodes/ComfyUI-Manager/requirements.txt
# Clone and install GGUF support for ComfyUI
RUN git clone https://github.com/city96/ComfyUI-GGUF.git /app/custom_nodes/ComfyUI-GGUF && \
pip install --upgrade gguf
# Expose the port used by ComfyUI
EXPOSE 8188
# Run ComfyUI with the server binding to 0.0.0.0
CMD ["python3", "main.py", "--listen", "0.0.0.0"]

View File

@@ -0,0 +1,113 @@
## Running uncensored models on the NVIDIA Jetson Orin Nano Super Developer Kit
This guide is aimed at helping you set up uncensored models seamlessly on your Jetson Orin Nano, ensuring you can run powerful image generation models on this compact yet powerful device.
This tutorial will walk you through each step of the process. Even if you're starting from a fresh installation, following along should ensure everything is set up correctly. And if anything doesn't work as expected, feel free to reach out; I'll keep this guide updated to keep it running smoothly.
---
## Let's Dive In
### Step 1: Installing Miniconda and Setting Up a Python Environment
First, we need to install Miniconda on your Jetson Nano. This will allow us to create an isolated Python environment for managing dependencies. Let's set up our project environment.
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
chmod +x Miniconda3-latest-Linux-aarch64.sh
./Miniconda3-latest-Linux-aarch64.sh
conda update conda
```
Now, we create and activate a Python 3.10 environment for our project.
```bash
conda create -n comfyui python=3.10
conda activate comfyui
```
### Step 2: Installing CUDA, cuDNN, TensorRT, and Verifying nvcc
CUDA, cuDNN, and TensorRT come preconfigured with JetPack 6.1, so no separate installation is required.
Next, confirm that CUDA is installed correctly by checking the `nvcc` version.
```bash
nvcc --version
```
### Step 3: Installing PyTorch, TorchVision, and TorchAudio
Now let's install the essential libraries for image generation (PyTorch, TorchVision, and TorchAudio) from the Jetson wheel index [devpi - cu12.6](http://jetson.webredirect.org/jp6/cu126):
```bash
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/5cf/9ed17e35cb752/torch-2.5.0-cp310-cp310-linux_aarch64.whl
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/9d2/6fac77a4e832a/torchvision-0.19.1a0+6194369-cp310-cp310-linux_aarch64.whl
pip install https://pypi.jetson-ai-lab.dev/jp6/cu126/+f/812/4fbc4ba6df0a3/torchaudio-2.5.0-cp310-cp310-linux_aarch64.whl
```
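To confirm the wheels installed correctly and that PyTorch can see the GPU, run a quick check:
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```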
### Step 4: Cloning the Project Repository
Now, we clone the necessary source code for the project from GitHub. This will include the files for running uncensored models from civitai.com.
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
```
### Step 5: Installing Project Dependencies
Next, install the required dependencies for the project by running the `requirements.txt` file.
```bash
pip install -r requirements.txt
```
### Step 6: Resolving Issues with NumPy (if necessary)
If you encounter issues with NumPy, such as compatibility problems, you can fix it by downgrading to a version below 2.0.
```bash
pip install "numpy<2"
```
### Step 7: Running ComfyUI
Finally, we can run ComfyUI to check if everything is set up properly. Start the app with the following command:
```bash
python main.py --listen 0.0.0.0
```
---
## Great! Now that you've got ComfyUI up and running, it's time to load your first uncensored model.
1. Navigate to [civitai.com](https://civitai.com) and select a model. For example, you can choose the following model:
[RealVisionBabes v1.0](https://civitai.com/models/543456?modelVersionId=604282)
2. Download the model file: [realvisionbabes_v10.safetensors](https://civitai.com/api/download/models/604282?type=Model&format=SafeTensor&size=pruned&fp=fp16)
3. Place it inside the `models/checkpoints` folder.
4. Download the VAE file: [ClearVAE_V2.3_fp16.pt](https://civitai.com/api/download/models/604282?type=VAE)
5. Place it inside the `models/vae` folder.
---
## You're all set to launch your first run!
Visit the URL provided by ComfyUI (`http://jetson:8188`) in your browser.
Go to the [ControlNet reference demo](https://civitai.com/posts/3943573), download the workflow (also available in the repo as `workflow-api.json`), and import it into ComfyUI.
Hit the "Queue Prompt" button and watch the magic unfold!
Happy generating! 🎉

View File

@@ -0,0 +1,91 @@
You are an expert prompt generator. Your task is to transform user requests into detailed, vivid, and imaginative prompts that can be used to generate visually captivating images with a diffusion model. You should:
- **Analyze** the user's request carefully and extract the key visual elements.
- **Generate a prompt** that describes the image in clear and evocative terms, ensuring it is visually rich and imaginative.
- **Ensure details** are specific about the atmosphere, setting, colors, lighting, textures, and unique characteristics.
- **Use creative language** to enhance the visual quality, whether the style is realistic, surreal, or abstract.
- **Consider mood and style** (e.g., dark and moody, bright and lively, minimalist, detailed, etc.).
Here are some examples to guide you:
---
### **Few-Shot Examples:**
**Example 1:**
**User Request:**
"I want to see a futuristic cityscape at night with neon lights and flying cars."
**Generated Prompt:**
"A vibrant futuristic cityscape at night, with towering skyscrapers that stretch towards a starless sky. The buildings are covered in shimmering neon lights—bright blues, purples, and pinks—casting colorful reflections onto the glossy streets below. Flying cars zip through the air, leaving glowing trails in their wake, while holographic billboards advertise virtual products. The atmosphere is electric, bustling with energy and technology, as a soft mist rises from the ground, adding a touch of mystery to the scene."
---
**Example 2:**
**User Request:**
"Id love a serene mountain landscape with a calm lake and a small wooden cabin."
**Generated Prompt:**
"A tranquil mountain landscape at dawn, with majestic snow-capped peaks towering in the distance. A serene, glassy lake reflects the vibrant colors of the early morning sky—soft pinks, oranges, and purples. A small, rustic wooden cabin sits by the lakeshore, its smoke rising from the chimney, blending gently with the mist above the water. Pine trees surround the cabin, their dark green needles adding depth to the peaceful scene. The air is crisp, and the whole environment exudes a sense of quiet solitude."
---
**Example 3:**
**User Request:**
"I want a mystical creature in a dark enchanted forest."
**Generated Prompt:**
"A mystical creature standing tall in the heart of a dark, enchanted forest. The creature has the body of a lion, but its fur is deep indigo, shimmering with silver flecks like stars. Its eyes glow with an ethereal light, casting an otherworldly glow across the forest floor. The forest is dense with towering trees whose bark is twisted, covered in glowing moss. Fog weaves through the trees, and mysterious flowers glow faintly in the shadows. The atmosphere is magical, filled with the sense of an ancient, forgotten world full of wonder."
---
**Example 4:**
**User Request:**
"Can you create a vibrant sunset on a tropical beach with palm trees?"
**Generated Prompt:**
"A stunning tropical beach at sunset, where the sky is ablaze with fiery hues of red, orange, and pink, melting into the calm blue of the ocean. The golden sand is warm, and the gentle waves lap against the shore. Silhouetted palm trees frame the scene, their long leaves swaying in the soft breeze. The sun is just dipping below the horizon, casting a golden glow across the water. The atmosphere is peaceful yet vibrant, with the serene sounds of the ocean adding to the beauty of the moment."
---
**Example 5:**
**User Request:**
"Imagine an underwater scene with colorful coral reefs and exotic fish."
**Generated Prompt:**
"A vibrant underwater scene, where the sunlight filters down through crystal-clear water, illuminating the colorful coral reefs below. The corals are in shades of purple, pink, and yellow, teeming with life. Schools of exotic fish dart through the scene—brightly colored in hues of electric blue, orange, and green. The water is calm, with soft ripples distorting the light, while gentle seaweed sways with the current. The scene is peaceful and full of life, a kaleidoscope of color beneath the ocean's surface."
---
### **User Request:**
"Please create a scene with a magical waterfall in a forest."
**Generated Prompt:**
"A breathtaking magical waterfall cascading down from a high cliff, surrounded by an ancient forest. The water sparkles with iridescent hues, as if glowing with a soft, mystical light. Lush green foliage and towering trees frame the waterfall, with delicate vines hanging down like natures curtains. Mist rises from the base of the waterfall, creating a rainbow in the air. Sunlight filters through the canopy above, casting dappled light across the mossy rocks and the peaceful forest floor. The atmosphere is serene, almost dreamlike, filled with the sound of the waters soothing rush."
---
### **User Request:**
"I want to see an alien landscape on another planet with strange rock formations."
**Generated Prompt:**
"A surreal alien landscape on a distant planet, bathed in the pale light of two suns setting on the horizon. The ground is rocky, with bizarre rock formations that defy gravity, twisting and spiraling upward like ancient sculptures. The sky above is a vibrant shade of purple, dotted with swirling clouds and distant stars. The air is thick with an otherworldly mist, and strange, bioluminescent plants glow faintly in the twilight. The scene is alien and unearthly, with a sense of wonder and curiosity as the landscape stretches endlessly into the unknown."
---
### **User Request:**
"Could you create a winter scene with a frozen lake and a snowman?"
**Generated Prompt:**
"A peaceful winter scene with a frozen lake covered in a smooth sheet of ice, reflecting the soft pale blue of the overcast sky. Snow gently falls from the sky, coating the landscape in a thick layer of white. A cheerful snowman stands at the edge of the lake, its coal-black eyes and carrot nose adding a touch of whimsy to the quiet surroundings. Snow-covered pine trees line the shore, their branches weighed down by the snow. The air is crisp and fresh, and the entire scene feels calm, still, and full of the quiet beauty of winter."
---
### **End of Few-Shot Examples**

View File

@@ -0,0 +1,129 @@
{
"1": {
"inputs": {
"ckpt_name": "realvisionbabes_v10.safetensors"
},
"class_type": "CheckpointLoaderSimple",
"_meta": {
"title": "Load Checkpoint"
}
},
"2": {
"inputs": {
"stop_at_clip_layer": -1,
"clip": [
"1",
1
]
},
"class_type": "CLIPSetLastLayer",
"_meta": {
"title": "CLIP Set Last Layer"
}
},
"3": {
"inputs": {
"text": "amateur, instagram photo, beautiful face",
"clip": [
"2",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Prompt)"
}
},
"4": {
"inputs": {
"text": "Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, ugly, cropped, worst quality, low quality, mutation, poorly drawn, abnormal eye proportion, bad\nart, ugly face, messed up face, high forehead, professional photo shoot, makeup, photoshop, doll, plastic_doll, silicone, anime, cartoon, fake, filter, airbrush, 3d max, infant, featureless, colourless, impassive, shaders, two heads, crop,",
"clip": [
"2",
0
]
},
"class_type": "CLIPTextEncode",
"_meta": {
"title": "CLIP Text Encode (Prompt)"
}
},
"5": {
"inputs": {
"seed": 411040191827786,
"steps": 30,
"cfg": 3,
"sampler_name": "dpmpp_2m_sde",
"scheduler": "normal",
"denoise": 1,
"model": [
"1",
0
],
"positive": [
"3",
0
],
"negative": [
"4",
0
],
"latent_image": [
"6",
0
]
},
"class_type": "KSampler",
"_meta": {
"title": "KSampler"
}
},
"6": {
"inputs": {
"width": 512,
"height": 768,
"batch_size": 1
},
"class_type": "EmptyLatentImage",
"_meta": {
"title": "Empty Latent Image"
}
},
"7": {
"inputs": {
"samples": [
"5",
0
],
"vae": [
"8",
0
]
},
"class_type": "VAEDecode",
"_meta": {
"title": "VAE Decode"
}
},
"8": {
"inputs": {
"vae_name": "ClearVAE_V2.3_fp16.pt"
},
"class_type": "VAELoader",
"_meta": {
"title": "Load VAE"
}
},
"9": {
"inputs": {
"filename_prefix": "ComfyUI",
"images": [
"7",
0
]
},
"class_type": "SaveImage",
"_meta": {
"title": "Save Image"
}
}
}

View File

@@ -0,0 +1,316 @@
# Running DeepSeek-R1-0528 (FP8 Hybrid) with KTransformers
This guide provides instructions to run the DeepSeek-R1-0528 model locally using a hybrid FP8 (GPU) and Q4_K_M GGUF (CPU) approach with KTransformers, managed via Docker. This setup is optimized for high-end hardware (e.g., NVIDIA RTX 4090, high-core count CPU, significant RAM).
**Model Version:** DeepSeek-R1-0528
**KTransformers Version (Working):** `approachingai/ktransformers:v0.2.4post1-AVX512`
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Model Preparation](#model-preparation)
* [Step 2a: Download FP8 Base Model (Host)](#step-2a-download-fp8-base-model-host)
* [Step 2b: Download Q4\_K\_M GGUF Model (Host)](#step-2b-download-q4_k_m-gguf-model-host)
* [Step 2c: Merge Models (Inside Docker)](#step-2c-merge-models-inside-docker)
* [Step 2d: Set Ownership & Permissions (Host)](#step-2d-set-ownership--permissions-host)
3. [Running the Model with KTransformers](#running-the-model-with-ktransformers)
* [Single GPU (e.g., 1x RTX 4090)](#single-gpu-eg-1x-rtx-4090)
* [Multi-GPU (e.g., 2x RTX 4090)](#multi-gpu-eg-2x-rtx-4090)
4. [Testing the Server](#testing-the-server)
5. [Key Server Parameters](#key-server-parameters)
6. [Notes on KTransformers v0.3.1](#notes-on-ktransformers-v031)
7. [Available Optimize Config YAMLs (for reference)](#available-optimize-config-yamls-for-reference)
8. [Troubleshooting Tips](#troubleshooting-tips)
---
## 1. Prerequisites
* **Hardware:**
* NVIDIA GPU with FP8 support (e.g., RTX 40-series, Hopper series).
* High core-count CPU (e.g., Intel Xeon, AMD Threadripper).
* Significant System RAM (ideally 512GB for larger GGUF experts and context). The Q4_K_M experts for a large model can consume 320GB+ alone.
* Fast SSD (NVMe recommended) for model storage.
* **Software (on Host):**
* Linux OS (Ubuntu 24.04 LTS recommended).
* NVIDIA Drivers (ensure they are up-to-date and support your GPU and CUDA version).
* Docker Engine.
* NVIDIA Container Toolkit (for GPU access within Docker).
* Conda or a Python virtual environment manager.
* Python 3.9+
* `huggingface_hub` and `hf_transfer`
* Git (for cloning KTransformers if you need to inspect YAMLs or contribute).
---
## 2. Model Preparation
We assume your models will be downloaded and stored under `/home/mukul/dev-ai/models` on your host system. This path will be mounted into the Docker container as `/models`. Adjust paths if your setup differs.
### Step 2a: Download FP8 Base Model (Host)
Download the official DeepSeek-R1-0528 FP8 base model components.
```bash
# Ensure the required packages are installed. Conda is recommended for environment management.
pip install -U huggingface_hub hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1 # For faster downloads
```
```bash
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
BASE_MODEL_HF_ID="deepseek-ai/DeepSeek-R1-0528"
LOCAL_BASE_MODEL_PATH="${HOST_MODEL_DIR}/${BASE_MODEL_HF_ID}"
mkdir -p "${LOCAL_BASE_MODEL_PATH}"
echo "Downloading base model to: ${LOCAL_BASE_MODEL_PATH}"
huggingface-cli download --resume-download "${BASE_MODEL_HF_ID}" \
  --local-dir "${LOCAL_BASE_MODEL_PATH}"
```
### Step 2b: Download Q4_K_M GGUF Model (Host)
Download the Unsloth Q4_K_M GGUF version of DeepSeek-R1-0528 using the attached python script.
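If you prefer not to use the script, an equivalent download can be done with `huggingface-cli` directly; a sketch, reusing the host model directory defined above:
```bash
# Download only the Q4_K_M GGUF shards from the Unsloth repository
huggingface-cli download unsloth/DeepSeek-R1-0528-GGUF \
  --include "Q4_K_M/*" \
  --local-dir "${HOST_MODEL_DIR}/unsloth/DeepSeek-R1-0528-GGUF"
```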
### Step 2c: Merge Models (Inside Docker)
This step uses the KTransformers Docker image to merge the FP8 base and Q4\_K\_M GGUF weights.
```bash
docker stop ktransformers
docker run --rm --gpus '"device=1"' \
-v /home/mukul/dev-ai/models:/models \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
```bash
python merge_tensors/merge_safetensor_gguf.py \
--safetensor_path /models/deepseek-ai/DeepSeek-R1-0528 \
--gguf_path /models/unsloth/DeepSeek-R1-0528-GGUF/Q4_K_M \
--output_path /models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8
```
### Step 2d: Set Ownership & Permissions (Host)
After Docker creates the merged files, fix ownership and permissions on the host.
```bash
HOST_OUTPUT_DIR_QUANT="/home/mukul/dev-ai/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8" # As defined above
echo "Setting ownership for merged files in: ${HOST_OUTPUT_DIR_QUANT}"
sudo chown -R $USER:$USER "${HOST_OUTPUT_DIR_QUANT}"
sudo find "${HOST_OUTPUT_DIR_QUANT}" -type f -exec chmod 664 {} \;
sudo find "${HOST_OUTPUT_DIR_QUANT}" -type d -exec chmod 775 {} \;
echo "Ownership and permissions set. Verification:"
ls -la "${HOST_OUTPUT_DIR_QUANT}"
```
---
## 3. Running the Model with KTransformers
Ensure the Docker image `approachingai/ktransformers:v0.2.4post1-AVX512` is pulled.
### Single GPU (e.g., 1x RTX 4090)
**1. Start Docker Container:**
```bash
# Stop any previous instance
docker stop ktransformers || true # Allow if not running
docker rm ktransformers || true # Allow if not existing
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
TARGET_GPU="1" # Specify GPU ID, e.g., "0", "1", or "all"
docker run --rm --gpus "\"device=${TARGET_GPU}\"" \
-v "${HOST_MODEL_DIR}:/models" \
-p 10002:10002 \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
**2. Inside the Docker container shell, launch the server:**
```bash
# Set environment variable for PyTorch CUDA memory allocation
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
CONTAINER_MERGED_MODEL_PATH="/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8"
CONTAINER_BASE_MODEL_CONFIG_PATH="/models/deepseek-ai/DeepSeek-R1-0528"
# Launch server
python3 ktransformers/server/main.py \
--gguf_path "${CONTAINER_MERGED_MODEL_PATH}" \
--model_path "${CONTAINER_BASE_MODEL_CONFIG_PATH}" \
--model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
--cpu_infer 57 \
--max_new_tokens 16384 \
--cache_lens 24576 \
--cache_q4 true \
--temperature 0.6 \
--top_p 0.95 \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml \
--force_think \
--use_cuda_graph \
--host 0.0.0.0 \
--port 10002
```
*Note: The `--optimize_config_path` still refers to a `DeepSeek-V3` YAML. This V3 config is compatible and recommended.*
### Multi-GPU (e.g., 2x RTX 4090)
**1. Start Docker Container:**
```bash
# Stop any previous instance
docker stop ktransformers || true
docker rm ktransformers || true
# Define your host model directory
HOST_MODEL_DIR="/home/mukul/dev-ai/models"
TARGET_GPUS="0,1" # Specify GPU IDs
docker run --rm --gpus "\"device=${TARGET_GPUS}\"" \
-v "${HOST_MODEL_DIR}:/models" \
-p 10002:10002 \
--name ktransformers \
-itd approachingai/ktransformers:v0.2.4post1-AVX512
docker exec -it ktransformers /bin/bash
```
**2. Inside the Docker container shell, launch the server:**
```bash
# Set environment variable (optional for multi-GPU, but can be helpful)
# export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# Define container paths
CONTAINER_MERGED_MODEL_PATH="/models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8"
CONTAINER_BASE_MODEL_CONFIG_PATH="/models/deepseek-ai/DeepSeek-R1-0528"
# Launch server
python3 ktransformers/server/main.py \
--gguf_path "${CONTAINER_MERGED_MODEL_PATH}" \
--model_path "${CONTAINER_BASE_MODEL_CONFIG_PATH}" \
--model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
--cpu_infer 57 \
--max_new_tokens 24576 \
--cache_lens 32768 \
--cache_q4 true \
--temperature 0.6 \
--top_p 0.95 \
--optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml \
--force_think \
--use_cuda_graph \
--host 0.0.0.0 \
--port 10002
```
*Note: The `--optimize_config_path` still refers to a `DeepSeek-V3` YAML. This is intentional.*
---
## 4. Testing the Server
Once the server is running inside Docker (look for "Uvicorn running on http://0.0.0.0:10002"), open a **new terminal on your host machine** and test with `curl`:
```bash
curl http://localhost:10002/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "KVCache-ai/DeepSeek-R1-0528-q4km-fp8",
"messages": [{"role": "user", "content": "Explain the concept of Mixture of Experts in large language models in a simple way."}],
"max_tokens": 250,
"temperature": 0.6,
"top_p": 0.95
}'
```
A JSON response containing the model's output indicates success.
---
## 5. Key Server Parameters
* `--gguf_path`: Path inside the container to your **merged** hybrid model files.
* `--model_path`: Path inside the container to the **original base model's** directory (containing `config.json`, `tokenizer.json`, etc.). KTransformers needs this for model configuration.
* `--model_name`: Arbitrary name for the API endpoint. Used in client requests.
* `--cpu_infer`: Number of CPU threads for GGUF expert inference. Tune based on your CPU cores (e.g., `57` for a 56-core/112-thread CPU might leave some cores for other tasks, or you could try higher).
* `--max_new_tokens`: Maximum number of tokens the model can generate in a single response.
* `--cache_lens`: Maximum KV cache size in tokens. Directly impacts context length capacity and VRAM usage.
* `--cache_q4`: (Boolean) If `true`, quantizes the KV cache to 4-bit. **Crucial for saving VRAM**, especially with long contexts.
* `--temperature`, `--top_p`: Control generation randomness.
* `--optimize_config_path`: Path to the KTransformers YAML file defining the layer offloading strategy (FP8 on GPU, GGUF on CPU). **Essential for the hybrid setup.**
* `--force_think`: (KTransformers specific) Potentially related to how the model processes or plans.
* `--use_cuda_graph`: Enables CUDA graphs for potentially faster GPU execution by reducing kernel launch overhead.
* `--host`, `--port`: Network interface and port for the server.
* `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`: Environment variable to help PyTorch manage CUDA memory more flexibly and potentially avoid OOM errors.
---
## 6. Notes on KTransformers v0.3.1
As of 2025-06-02, the `approachingai/ktransformers:v0.3.1-AVX512` image was reported as **not working** with the provided single GPU or multi-GPU configuration.
**Attempted Docker Start Command (v0.3.1 - Non-Functional):**
```bash
# docker stop ktransformers # (if attempting to switch)
# docker run --rm --gpus '"device=0,1"' \
# -v /home/mukul/dev-ai/models:/models \
# -p 10002:10002 \
# --name ktransformers \
# -itd approachingai/ktransformers:v0.3.1-AVX512
#
# docker exec -it ktransformers /bin/bash
```
**Attempted Server Launch (v0.3.1 - Non-Functional):**
```bash
# # Inside the v0.3.1 Docker container shell
# PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 ktransformers/server/main.py \
# --gguf_path /models/mukul/DeepSeek-R1-0528-GGML-FP8-Hybrid/Q4_K_M_FP8 \
# --model_path /models/deepseek-ai/DeepSeek-R1-0528 \
# --model_name KVCache-ai/DeepSeek-R1-0528-q4km-fp8 \
# --cpu_infer 57 \
# --max_new_tokens 32768 \
# --cache_lens 65536 \
# --cache_q4 true \
# --temperature 0.6 \
# --top_p 0.95 \
# --optimize_config_path ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml \
# --force_think \
# --use_cuda_graph \
# --host 0.0.0.0 \
# --port 10002
```
Stick to `approachingai/ktransformers:v0.2.4post1-AVX512` for the configurations described above until compatibility issues with newer versions are resolved for this specific model and setup.
---
## 7. Available Optimize Config YAMLs (for reference)
The KTransformers repository contains various optimization YAML files. The ones used in this guide are for `DeepSeek-V3` but are being applied to `DeepSeek-R1-0528`. Their direct compatibility or optimality for R1-0528 should be verified. If KTransformers releases specific YAMLs for DeepSeek-R1-0528, those should be preferred.
Reference list of some `DeepSeek-V3` YAMLs (path `ktransformers/optimize/optimize_rules/` inside the container):
```
DeepSeek-V3-Chat-amx.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve-amx.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts-serve.yaml
DeepSeek-V3-Chat-fp8-linear-ggml-experts.yaml
DeepSeek-V3-Chat-multi-gpu-4.yaml
DeepSeek-V3-Chat-multi-gpu-8.yaml
DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
DeepSeek-V3-Chat-multi-gpu-marlin.yaml
DeepSeek-V3-Chat-multi-gpu.yaml
DeepSeek-V3-Chat-serve.yaml
DeepSeek-V3-Chat.yaml
```

View File

@@ -0,0 +1,65 @@
from huggingface_hub import hf_hub_download, list_repo_files
import os

# Configuration
repo_id = "unsloth/DeepSeek-R1-0528-GGUF"
folder_in_repo = "Q4_K_M"
file_extension = ".gguf"

# Expand the tilde (~) to the user's home directory
local_base_dir = os.path.expanduser("~/dev-ai/models/unsloth/DeepSeek-R1-0528-GGUF")

# Create the base directory (hf_hub_download would also create it when
# local_dir_use_symlinks=False, but explicit creation is fine)
os.makedirs(local_base_dir, exist_ok=True)

# Download files
print(f"Listing files from {repo_id} in folder {folder_in_repo} with extension {file_extension}...")
try:
    all_repo_files = list_repo_files(repo_id, repo_type='model')
    files_to_download = [
        f for f in all_repo_files
        if f.startswith(folder_in_repo + "/") and f.endswith(file_extension)
    ]
    if not files_to_download:
        print(f"No files found in '{folder_in_repo}' with extension '{file_extension}'.")
    else:
        print(f"Found {len(files_to_download)} file(s) to download.")
        for filename_in_repo in files_to_download:
            print(f"Downloading {filename_in_repo}...")
            # filename is the path of the file within the repository; with local_dir
            # set, the file keeps its repo path structure, e.g. "Q4_K_M/file.gguf"
            # is saved as local_base_dir/Q4_K_M/file.gguf.
            try:
                downloaded_file_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename_in_repo,
                    local_dir=local_base_dir,
                    local_dir_use_symlinks=False,
                    # Set resume_download=True to resume interrupted downloads
                    # resume_download=True,
                )
                print(f"Successfully downloaded and saved to: {downloaded_file_path}")
            except Exception as e:
                print(f"Error downloading {filename_in_repo}: {str(e)}")
except Exception as e:
    print(f"Error listing files from repository: {str(e)}")

print("Download process complete.")

View File

@@ -0,0 +1,137 @@
# **Port Forwarding Magic: Set Up Bolt.New with Remote Ollama Server and Qwen2.5-Coder:32B**
This guide demonstrates how to use **port forwarding** to connect your local **Bolt.New** setup to a **remote Ollama server**, solving issues with apps that don't allow full customization. We'll use the open-source [Bolt.New repository](https://github.com/coleam00/bolt.new-any-llm) as our example, and we'll even show you how to extend the context length for the popular **Qwen2.5-Coder:32B model**.
If you encounter installation issues, submit an [issue](https://github.com/coleam00/bolt.new-any-llm/issues) or contribute by forking and improving this guide.
---
## **What You'll Learn**
- Clone and configure **Bolt.New** for your local development.
- Use **SSH tunneling** to seamlessly forward traffic to a remote server.
- Extend the context length of AI models for enhanced capabilities.
- Run **Bolt.New** locally.
---
## **Prerequisites**
Download and install Node.js from [https://nodejs.org/en/download/](https://nodejs.org/en/download/), and make sure `pnpm` is available (for example, via `npm install -g pnpm`).
---
## **Step 1: Clone the Repository**
1. Open Terminal.
2. Clone the repository:
```bash
git clone https://github.com/coleam00/bolt.new-any-llm.git
```
---
## **Step 2: Stop Local Ollama Service**
If Ollama is already running on your machine, stop it to avoid conflicts with the remote server.
- **Stop the service**:
```bash
sudo systemctl stop ollama.service
```
- **OPTIONAL: Disable it from restarting**:
```bash
sudo systemctl disable ollama.service
```
---
## **Step 3: Forward Local Traffic to the Remote Ollama Server**
To forward all traffic from `localhost:11434` to your remote Ollama server (`ai.mtcl.lan:11434`), set up SSH tunneling:
1. Open a terminal and run:
```bash
ssh -L 11434:ai.mtcl.lan:11434 mukul@ai.mtcl.lan
```
- Replace `mukul` with your remote username.
- Replace `ai.mtcl.lan` with your server's hostname or IP.
2. Keep this terminal session running while using Bolt.New. This ensures your app communicates with the remote server as if it's local. A quick check that the tunnel works is shown below.
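With the tunnel open, you can verify from another local terminal that the remote Ollama instance responds; the `/api/tags` endpoint lists the models installed on the remote server:
```bash
curl http://localhost:11434/api/tags
```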
---
## **Step 4: OPTIONAL: Extend Ollama Model Context Length**
By default, Ollama models have a context length of 2048 tokens. For tasks requiring larger input, extend this limit for **Qwen2.5-Coder:32B**:
1. SSH into your remote server:
```bash
ssh mukul@ai.mtcl.lan
```
2. Access the Docker container running Ollama:
```bash
docker exec -it ollama /bin/bash
```
3. Create a `Modelfile`:
While inside the Docker container, run the following commands to create the Modelfile:
```bash
echo "FROM qwen2.5-coder:32b" > /tmp/Modelfile
echo "PARAMETER num_ctx 32768" >> /tmp/Modelfile
```
If you prefer, you can use `cat` to create the file directly:
```bash
cat > /tmp/Modelfile << EOF
FROM qwen2.5-coder:32b
PARAMETER num_ctx 32768
EOF
```
4. Create the new model:
```bash
ollama create -f /tmp/Modelfile qwen2.5-coder-extra-ctx:32b
```
5. Verify the new model:
```bash
ollama list
```
You should see `qwen2.5-coder-extra-ctx:32b` listed.
6. Exit the Docker container:
```bash
exit
```
---
## **Step 5: Run Bolt.New Without Docker**
1. **Install Dependencies**
Navigate to the cloned repository:
```bash
cd bolt.new-any-llm
pnpm install
```
2. **Start the Development Server**
Run:
```bash
pnpm run dev
```
---
## **Summary**
This guide walks you through setting up **Bolt.New** with a **remote Ollama server**, ensuring seamless communication through SSH tunneling. We've also shown you how to extend the context length for **Qwen2.5-Coder:32B**, making it ideal for advanced development tasks.
With this setup:
- You'll offload heavy computation to your remote server.
- Your local machine remains light and responsive.
- Buggy `localhost` configurations? No problem—SSH tunneling has you covered.
Credits: [Bolt.New repository](https://github.com/coleam00/bolt.new-any-llm).
Let's build something amazing! 🚀

View File

@@ -0,0 +1,107 @@
### **Guide to Set Up Bridge Networking on Ubuntu for Virtual Machines**
This guide explains how to configure bridge networking on Ubuntu to allow virtual machines (VMs) to directly access the network, obtaining their own IP addresses from the DHCP server.
By following this guide, you can successfully set up bridge networking, enabling your virtual machines to directly access the network as if they were standalone devices.
---
#### **Step 1: Identify Your Primary Network Interface**
The primary network interface is the one currently used by the server for network access. Identify it with the following command:
```bash
ip link show
```
Look for the name of the interface (e.g., `enp8s0`) with `state UP`.
---
#### **Step 2: Backup Your Current Network Configuration**
Before making any changes, back up the existing netplan configuration file:
```bash
sudo cp /etc/netplan/00-installer-config.yaml /etc/netplan/00-installer-config.yaml.bak
```
---
#### **Step 3: Configure the Bridge**
Edit the netplan configuration file:
```bash
sudo nano /etc/netplan/00-installer-config.yaml
```
Replace its content with the following, adjusted for your environment:
```yaml
network:
  version: 2
  ethernets:
    enp8s0:
      dhcp4: no
  bridges:
    br0:
      interfaces: [enp8s0]
      dhcp4: true
```
- `enp8s0`: Your physical network interface.
- `br0`: The new bridge interface that will be used by the virtual machines and the host.
Save and exit the file.
---
#### **Step 4: Apply the Configuration**
Apply the new network configuration to create the bridge:
```bash
sudo netplan apply
```
---
#### **Step 5: Verify the Bridge Configuration**
Check that the bridge `br0` is active and has an IP address:
```bash
ip addr show br0
```
You should see an output like this:
```plaintext
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 46:10:cc:63:f4:37 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 metric 100 brd 192.168.1.255 scope global dynamic br0
valid_lft 7102sec preferred_lft 7102sec
```
---
#### **Step 6: Configure Virtual Machines to Use the Bridge**
For VMs created with tools like `virt-manager` or `virsh`:
1. When configuring the VM's network interface, choose **Bridge** as the network source.
2. Set `br0` as the bridge interface.
3. The VM will now obtain an IP address dynamically from the same DHCP server as the host.
For `virt-manager`:
- Go to **Add Hardware > Network**.
- Choose **Bridge br0** as the source (a command-line alternative is sketched below).
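For command-line provisioning, the bridge can also be selected at install time with `virt-install`. A minimal sketch in which the VM name, resources, ISO path, and OS variant are placeholders:
```bash
virt-install \
  --name test-vm \
  --memory 4096 \
  --vcpus 2 \
  --disk size=20 \
  --cdrom ~/isos/ubuntu-22.04.iso \
  --network bridge=br0 \
  --osinfo ubuntu22.04
```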
---
#### **Step 7: Test the Setup**
1. Start a VM and ensure it obtains a dynamic IP address from the network.
2. Test connectivity by pinging the gateway or external servers from the VM.
---
### **Key Considerations**
1. **Dynamic IP for Host:** The host server's IP address will now be associated with the bridge (`br0`) instead of the physical interface (`enp8s0`). This is expected behavior.
2. **Backup Configuration:** Always maintain a backup of your original network configuration to revert changes if needed.
3. **Network Manager vs. Netplan:** Use only one method (`netplan` or `nmcli`) for managing network configurations to avoid conflicts.
4. **Alternative Access:** If you are working on a remote server, ensure alternative access (e.g., a second network interface) before applying network changes.