How to model LLM components in BaseFortify (so CVEs match)

Publication date: 2025-10-09
TIPS

TL;DR

 

This is Part 3 in our series on making LLMs safe and compliant in real organisations. In Part 1 we unpacked the EU AI Act’s phased obligations (bans from Feb 2025, GPAI duties from Aug 2025, broad transparency by Aug 2026) and how businesses should prepare. In Part 2 we showed where LLM stacks actually break (Triton, vLLM, Transformers, LangChain) and the LLM-specific mitigations that help. Here in Part 3 we focus on how to model your LLM stack inside BaseFortify using simple, CPE-ish vendor, product, version entries, so that when a CVE is published it automatically becomes a tracked threat on the right node.

 

Core idea

 

Building on Part 2’s findings (real-world RCE/DoS in inference servers, unsafe loaders in model toolchains, and injection via retrieval), treat “LLM” as a stack and record each layer as its own component. That means adding the inference runtime, SDKs/libraries, gateway/orchestration, vector/database layer, GPU stack, and OS/container runtime separately so CVE matches are precise.

 

    • Inference server/runtime (e.g., nvidia, triton_inference_server; vllm, vllm)
    • Frameworks/SDKs (e.g., huggingface, transformers; langchain, langchain)
    • Orchestration/gateway (e.g., anyscale, ray; nginx, nginx)
    • Vector & database (e.g., milvus, milvus; postgresql, postgresql)
    • GPU stack (e.g., nvidia, gpu_display_driver; nvidia, cuda; nvidia, cudnn)
    • OS & container runtime (e.g., canonical, ubuntu_linux; docker, docker)

 

Why this works: CVE feeds typically reference the runtime/server or library names—not “ChatGPT/Claude/DeepSeek”. Modeling the stack ensures BaseFortify can match those CVEs to your environment.

 

Naming conventions that work (CPE-ish)

 

    • vendor: lowercase, underscores (e.g., nvidia, huggingface, anthropic)
    • product: lowercase, underscores (e.g., triton_inference_server, transformers, langchain)
    • version: exact semantic or distro version (e.g., 25.07, 4.46.0, 22.04)

 

Example: nvidia, triton_inference_server, 25.07 — clean, comparable, and easy to update.
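
If you want to apply this convention programmatically, here is a minimal sketch; the helper name is made up for illustration, and it assumes you keep vendor and product as separate fields from the start:

    import re

    def to_cpeish(name: str) -> str:
        """Lowercase a vendor or product name and replace separators with underscores."""
        name = name.strip().lower()
        name = re.sub(r"[\s\-./]+", "_", name)   # spaces, dashes, dots, slashes -> _
        return re.sub(r"[^a-z0-9_]", "", name)   # drop anything else

    # Vendor and product normalised separately, version recorded as deployed:
    print(to_cpeish("NVIDIA"), to_cpeish("Triton Inference Server"), "25.07", sep=", ")
    # -> nvidia, triton_inference_server, 25.07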

 

A) Self-hosted: a concrete template (Linux node)

 

A lean baseline that still catches the bulk of LLM-related CVEs; swap vLLM for Triton if that’s your stack.

 

    • canonical, ubuntu_linux, 22.04
    • docker, docker, 24.0.7
    • nvidia, gpu_display_driver, 550.90.07
    • nvidia, cuda, 12.4.0
    • vllm, vllm, 0.9.0
    • pytorch, pytorch, 2.4.1
    • huggingface, transformers, 4.46.0
    • nginx, nginx, 1.24.0

 

Tip: If you run RAG, add your vector DB (e.g., postgresql, postgresql, … or milvus, milvus, …) separately.
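
To sanity-check an entry against NVD, the same triple maps onto a CPE 2.3 match string. A rough sketch follows; the wildcard fields and the helper itself are illustrative, BaseFortify only needs the vendor, product, version fields:

    def cpe23(vendor: str, product: str, version: str, part: str = "a") -> str:
        """Build a CPE 2.3 formatted string from a vendor, product, version triple."""
        return f"cpe:2.3:{part}:{vendor}:{product}:{version}:*:*:*:*:*:*:*"

    print(cpe23("canonical", "ubuntu_linux", "22.04", part="o"))  # "o" = operating system
    print(cpe23("vllm", "vllm", "0.9.0"))                         # "a" (application) is the default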

 

B) SaaS LLMs (Anthropic, DeepSeek, OpenAI, etc.)

 

Model two things: a service placeholder (documents usage; CVEs rarely match there) and the SDK/client library (CVEs may match here).

 

    • anthropic, claude_api_service, 2025-09  |  anthropic, anthropic_python_sdk, 0.34.0
    • deepseek, deepseek_api_service, 2025-09  |  (SDK if used)
    • openai, openai_python_sdk, 1.50.0

 

Rationale: hosted model services seldom get CVEs in NVD, but their SDKs (and your surrounding stack) do. Recording the SDKs ensures threats surface.
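
To pin the exact SDK versions without importing the packages, importlib.metadata reads the installed distribution metadata; a minimal sketch, assuming the usual PyPI distribution names (anthropic, openai):

    from importlib.metadata import version, PackageNotFoundError

    for dist in ("anthropic", "openai"):
        try:
            # prints a ready-to-paste "vendor, product, version" line
            print(f"{dist}, {dist}_python_sdk, {version(dist)}")
        except PackageNotFoundError:
            print(f"{dist}: not installed")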

 

C) RAG & file handling components worth tracking

 

RAG pipelines introduce parsers and tokenizers that process untrusted data; if a library is imported and touches untrusted input, model it. A short sketch after the list shows how to check which of these are installed.

 

    • python, python, 3.10.12 (base runtime; frequent security fixes)
    • Parsers you use (e.g., apache, tika, <version>; unstructured, unstructured, <version>)
    • Embedding libs if distinct (e.g., sentence_transformers, sentence_transformers, <version>)
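
The sketch below walks a candidate list of parser/embedding distributions and prints a line for whichever are actually installed; the package names are examples only, extend the list to match your pipeline:

    from importlib.metadata import version, PackageNotFoundError

    # (vendor, product, PyPI distribution name) -- illustrative entries only
    candidates = [
        ("unstructured", "unstructured", "unstructured"),
        ("sentence_transformers", "sentence_transformers", "sentence-transformers"),
    ]

    for vendor, product, dist in candidates:
        try:
            print(f"{vendor}, {product}, {version(dist)}")
        except PackageNotFoundError:
            pass  # not installed here, nothing to model on this node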

 

Modeling examples (with one-sentence explanations)

 
A) vLLM + PostgreSQL/pgvector stack

 

A lean self-hosted serving stack (vLLM) with a classic relational store plus vector embeddings via pgvector—great for RAG with simple ops and backups.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            docker, docker, 24.0.7
            nvidia, gpu_display_driver, 550.90.07
            nvidia, cuda, 12.4.0
            vllm, vllm, 0.9.0
            pytorch, pytorch, 2.4.1
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            postgresql, postgresql, 16.3
            nginx, nginx, 1.24.0
 
B) Triton + Milvus stack

 

A high-performance serving stack using NVIDIA Triton with a dedicated vector database (Milvus) built for large-scale similarity search.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            docker, docker, 24.0.7
            nvidia, gpu_display_driver, 550.90.07
            nvidia, cuda, 12.4.0
            nvidia, triton_inference_server, 25.07
            microsoft, onnx_runtime, 1.18.0
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            milvus, milvus, 2.4.4
            nginx, nginx, 1.24.0
 
C) SaaS (Anthropic) + light local tooling

 

A minimal footprint using a hosted LLM (Anthropic) while keeping just enough local SDK/tooling and gateway to integrate safely into your environment.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            anthropic, claude_api_service, 2025-09
            anthropic, anthropic_python_sdk, 0.34.0
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            nginx, nginx, 1.24.0

 

Handy commands to capture versions quickly

 

Use these to read exact versions from running systems so your BaseFortify entries are precise and match CVEs reliably; a consolidated script sketch that collects them in one pass follows the list.

 

    • OS: lsb_release -ds
    • Docker: docker --version
    • NVIDIA: nvidia-smi (driver/CUDA); nvcc --version
    • vLLM: python -c "import vllm; print(vllm.__version__)"
    • Triton: check the container tag you run (e.g., nvcr.io/nvidia/tritonserver:25.07-py3)
    • PyTorch: python -c "import torch; print(torch.__version__)"
    • Transformers: python -c "import transformers; print(transformers.__version__)"
    • LangChain:
           python - <<'PY'
           import importlib
           m = importlib.import_module('langchain')
           print(getattr(m, '__version__', 'unknown'))
           PY
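
If you would rather collect everything in one pass, the rough sketch below shells out to the same tools and reads installed Python package metadata, then prints lines in this article’s vendor, product, version format (the vendor/product names in the mapping follow this article and are assumptions for your own naming):

    import shutil
    import subprocess
    from importlib.metadata import version, PackageNotFoundError

    def run(cmd):
        """Run a command and return its stripped stdout, or '' if it fails."""
        try:
            return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            return ""

    # OS / container / GPU layer (only if the tools exist on this node)
    if shutil.which("lsb_release"):
        print(f"canonical, ubuntu_linux, {run(['lsb_release', '-rs'])}")
    if shutil.which("docker"):
        server_version = run(["docker", "version", "--format", "{{.Server.Version}}"])
        print(f"docker, docker, {server_version}")
    if shutil.which("nvidia-smi"):
        drivers = run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]).splitlines()
        if drivers:
            print(f"nvidia, gpu_display_driver, {drivers[0]}")

    # Python layer: (vendor, product, PyPI distribution name) as used in this article
    for vendor, product, dist in [
        ("vllm", "vllm", "vllm"),
        ("pytorch", "pytorch", "torch"),
        ("huggingface", "transformers", "transformers"),
        ("langchain", "langchain", "langchain"),
    ]:
        try:
            print(f"{vendor}, {product}, {version(dist)}")
        except PackageNotFoundError:
            pass  # not installed on this node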

Common pitfalls (and how to avoid them)

 

    • Only adding the model name (e.g., “Claude”, “DeepSeek R1”): won’t match many CVEs. Fix: add the runtime/server, SDKs, gateway, vector DB, and GPU stack as separate components.
    • Inconsistent names (e.g., “NVIDIA Triton”, “triton”, “triton-inference”): breaks matching and search. Fix: stick to a single CPE-ish format (e.g., nvidia, triton_inference_server); see the validator sketch after this list.
    • Missing versions: without a version, a match may not trigger. Fix: always record the exact version you deploy.
    • Lumping dependencies into one line: reduces match precision. Fix: one line per component.
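
A quick way to catch the naming and version pitfalls above before entries go in: a small validator sketch (the rules encode this article’s convention, not a BaseFortify requirement):

    import re

    NAME = re.compile(r"^[a-z0-9_]+$")            # lowercase letters, digits, underscores only
    VERSION = re.compile(r"^[0-9][0-9a-z._-]*$")  # starts with a digit, e.g. 25.07 or 4.46.0

    def check(line: str) -> list:
        """Return a list of problems for one 'vendor, product, version' line."""
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            return ["expected exactly three fields: vendor, product, version"]
        vendor, product, ver = parts
        problems = []
        if not NAME.match(vendor):
            problems.append(f"vendor '{vendor}' is not lowercase_with_underscores")
        if not NAME.match(product):
            problems.append(f"product '{product}' is not lowercase_with_underscores")
        if not VERSION.match(ver):
            problems.append(f"version '{ver}' is missing or malformed")
        return problems

    for entry in ["nvidia, triton_inference_server, 25.07", "NVIDIA Triton, triton"]:
        print(entry, "->", check(entry) or "ok")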

 

How BaseFortify helps—specifically

 

You record components as vendor, product, version (as shown above). When a CVE is published that matches those fields, BaseFortify creates a threat automatically on the affected node/device. You can then track remediation (upgrade/mitigate) and keep an audit-friendly history of changes and closures. You can register for free at https://basefortify.eu/register. With a free account you can set up a watch list of 3 devices of any type (desktop/laptop, server, mobile device) and 100 applications, more than enough to add every component mentioned in this article.

 

Further reading