How to model LLM components in BaseFortify (so CVEs match)

Publication date: 2025-10-09
TIPS

TL;DR

 

This is Part 3 in our series on making LLMs safe and compliant in real organisations. In Part 1 we unpacked the EU AI Act’s phased obligations (bans from Feb 2025, GPAI duties from Aug 2025, broad transparency by Aug 2026) and how businesses should prepare. In Part 2 we showed where LLM stacks actually break (Triton, vLLM, Transformers, LangChain) and the LLM-specific mitigations that help. Here in Part 3 we focus on how to model your LLM stack inside BaseFortify using simple, CPE-ish vendor, product, version entries, so that when a CVE is published it automatically becomes a tracked threat on the right node.

 

Core idea

 

Building on Part 2’s findings (real-world RCE/DoS in inference servers, unsafe loaders in model toolchains, and injection via retrieval), treat “LLM” as a stack and record each layer as its own component. That means adding the inference runtime, SDKs/libraries, gateway/orchestration, vector/database layer, GPU stack, and OS/container runtime separately so CVE matches are precise.

 

    • Inference server/runtime (e.g., nvidia, triton_inference_server; vllm, vllm)
    • Frameworks/SDKs (e.g., huggingface, transformers; langchain, langchain)
    • Orchestration/gateway (e.g., anyscale, ray; nginx, nginx)
    • Vector & database (e.g., milvus, milvus; postgresql, postgresql)
    • GPU stack (e.g., nvidia, gpu_display_driver; nvidia, cuda; nvidia, cudnn)
    • OS & container runtime (e.g., canonical, ubuntu_linux; docker, docker)

 

Why this works: CVE feeds typically reference the runtime/server or library names—not “ChatGPT/Claude/DeepSeek”. Modeling the stack ensures BaseFortify can match those CVEs to your environment.

 

Naming conventions that work (CPE-ish)

 

    • vendor: lowercase, underscores (e.g., nvidia, huggingface, anthropic)
    • product: lowercase, underscores (e.g., triton_inference_server, transformers, langchain)
    • version: exact semantic or distro version (e.g., 25.07, 4.46.0, 22.04)

 

Example: nvidia, triton_inference_server, 25.07 — clean, comparable, and easy to update.
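
If you want to apply this convention programmatically, here is a minimal sketch; the helper name is made up for illustration, and it assumes you keep vendor and product as separate fields from the start:

    import re

    def to_cpeish(name: str) -> str:
        """Lowercase a vendor or product name and replace separators with underscores."""
        name = name.strip().lower()
        name = re.sub(r"[\s\-./]+", "_", name)   # spaces, dashes, dots, slashes -> _
        return re.sub(r"[^a-z0-9_]", "", name)   # drop anything else

    # Vendor and product normalised separately, version recorded as deployed:
    print(to_cpeish("NVIDIA"), to_cpeish("Triton Inference Server"), "25.07", sep=", ")
    # -> nvidia, triton_inference_server, 25.07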

 

A) Self-hosted: a concrete template (Linux node)

 

A lean baseline that still catches the bulk of LLM-related CVEs; swap vLLM for Triton if that’s your stack.

 

    • canonical, ubuntu_linux, 22.04
    • docker, docker, 24.0.7
    • nvidia, gpu_display_driver, 550.90.07
    • nvidia, cuda, 12.4.0
    • vllm, vllm, 0.9.0
    • pytorch, pytorch, 2.4.1
    • huggingface, transformers, 4.46.0
    • nginx, nginx, 1.24.0

 

Tip: If you run RAG, add your vector DB (e.g., postgresql, postgresql, … or milvus, milvus, …) separately.
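
To sanity-check an entry against NVD, the same triple maps onto a CPE 2.3 match string. A rough sketch follows; the wildcard fields and the helper itself are illustrative, BaseFortify only needs the vendor, product, version fields:

    def cpe23(vendor: str, product: str, version: str, part: str = "a") -> str:
        """Build a CPE 2.3 formatted string from a vendor, product, version triple."""
        return f"cpe:2.3:{part}:{vendor}:{product}:{version}:*:*:*:*:*:*:*"

    print(cpe23("canonical", "ubuntu_linux", "22.04", part="o"))  # "o" = operating system
    print(cpe23("vllm", "vllm", "0.9.0"))                         # "a" (application) is the default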

 

B) SaaS LLMs (Anthropic, DeepSeek, OpenAI, etc.)

 

Model two things: a service placeholder (documents usage; CVEs rarely match there) and the SDK/client library (CVEs may match here).

 

    • anthropic, claude_api_service, 2025-09  |  anthropic, anthropic_python_sdk, 0.34.0
    • deepseek, deepseek_api_service, 2025-09  |  (SDK if used)
    • openai, openai_python_sdk, 1.50.0

 

Rationale: hosted model services seldom get CVEs in NVD, but their SDKs (and your surrounding stack) do. Recording the SDKs ensures threats surface.
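
To pin the exact SDK versions without importing the packages, importlib.metadata reads the installed distribution metadata; a minimal sketch, assuming the usual PyPI distribution names (anthropic, openai):

    from importlib.metadata import version, PackageNotFoundError

    for dist in ("anthropic", "openai"):
        try:
            # prints a ready-to-paste "vendor, product, version" line
            print(f"{dist}, {dist}_python_sdk, {version(dist)}")
        except PackageNotFoundError:
            print(f"{dist}: not installed")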

 

C) RAG & file handling components worth tracking

 

RAG pipelines introduce parsers and tokenizers that process untrusted data; if a library is imported and touches untrusted input, model it. A short sketch after the list shows how to check which of these are installed.

 

    • python, python, 3.10.12 (base runtime; frequent security fixes)
    • Parsers you use (e.g., apache, tika, <version>; unstructured, unstructured, <version>)
    • Embedding libs if distinct (e.g., sentence_transformers, sentence_transformers, <version>)
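
The sketch below walks a candidate list of parser/embedding distributions and prints a line for whichever are actually installed; the package names are examples only, extend the list to match your pipeline:

    from importlib.metadata import version, PackageNotFoundError

    # (vendor, product, PyPI distribution name) -- illustrative entries only
    candidates = [
        ("unstructured", "unstructured", "unstructured"),
        ("sentence_transformers", "sentence_transformers", "sentence-transformers"),
    ]

    for vendor, product, dist in candidates:
        try:
            print(f"{vendor}, {product}, {version(dist)}")
        except PackageNotFoundError:
            pass  # not installed here, nothing to model on this node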

 

Modeling examples (with one-sentence explanations)

 
A) vLLM + PostgreSQL/pgvector stack

 

A lean self-hosted serving stack (vLLM) with a classic relational store plus vector embeddings via pgvector—great for RAG with simple ops and backups.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            docker, docker, 24.0.7
            nvidia, gpu_display_driver, 550.90.07
            nvidia, cuda, 12.4.0
            vllm, vllm, 0.9.0
            pytorch, pytorch, 2.4.1
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            postgresql, postgresql, 16.3
            nginx, nginx, 1.24.0
 
B) Triton + Milvus stack

 

A high-performance serving stack using NVIDIA Triton with a dedicated vector database (Milvus) built for large-scale similarity search.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            docker, docker, 24.0.7
            nvidia, gpu_display_driver, 550.90.07
            nvidia, cuda, 12.4.0
            nvidia, triton_inference_server, 25.07
            microsoft, onnx_runtime, 1.18.0
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            milvus, milvus, 2.4.4
            nginx, nginx, 1.24.0
 
C) SaaS (Anthropic) + light local tooling

 

A minimal footprint using a hosted LLM (Anthropic) while keeping just enough local SDK/tooling and gateway to integrate safely into your environment.

 

Copy/paste set (comma-separated):

 

            canonical, ubuntu_linux, 22.04
            anthropic, claude_api_service, 2025-09
            anthropic, anthropic_python_sdk, 0.34.0
            huggingface, transformers, 4.46.0
            langchain, langchain, 0.3.6
            nginx, nginx, 1.24.0

 

Handy commands to capture versions quickly

 

Use these to read exact versions from running systems so your BaseFortify entries are precise and match CVEs reliably; a consolidated script sketch that collects them in one pass follows the list.

 

    • OS: lsb_release -ds
    • Docker: docker --version
    • NVIDIA: nvidia-smi (driver/CUDA); nvcc --version
    • vLLM: python -c "import vllm; print(vllm.__version__)"
    • Triton: check the container tag you run (e.g., nvcr.io/nvidia/tritonserver:25.07-py3)
    • PyTorch: python -c "import torch; print(torch.__version__)"
    • Transformers: python -c "import transformers; print(transformers.__version__)"
    • LangChain:
           python - <<'PY'
           import importlib
           m = importlib.import_module('langchain')
           print(getattr(m, '__version__', 'unknown'))
           PY
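
If you would rather collect everything in one pass, the rough sketch below shells out to the same tools and reads installed Python package metadata, then prints lines in this article’s vendor, product, version format (the vendor/product names in the mapping follow this article and are assumptions for your own naming):

    import shutil
    import subprocess
    from importlib.metadata import version, PackageNotFoundError

    def run(cmd):
        """Run a command and return its stripped stdout, or '' if it fails."""
        try:
            return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            return ""

    # OS / container / GPU layer (only if the tools exist on this node)
    if shutil.which("lsb_release"):
        print(f"canonical, ubuntu_linux, {run(['lsb_release', '-rs'])}")
    if shutil.which("docker"):
        server_version = run(["docker", "version", "--format", "{{.Server.Version}}"])
        print(f"docker, docker, {server_version}")
    if shutil.which("nvidia-smi"):
        drivers = run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]).splitlines()
        if drivers:
            print(f"nvidia, gpu_display_driver, {drivers[0]}")

    # Python layer: (vendor, product, PyPI distribution name) as used in this article
    for vendor, product, dist in [
        ("vllm", "vllm", "vllm"),
        ("pytorch", "pytorch", "torch"),
        ("huggingface", "transformers", "transformers"),
        ("langchain", "langchain", "langchain"),
    ]:
        try:
            print(f"{vendor}, {product}, {version(dist)}")
        except PackageNotFoundError:
            pass  # not installed on this node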

Common pitfalls (and how to avoid them)

 

    • Only adding the model name (e.g., “Claude”, “DeepSeek R1”): won’t match many CVEs. Fix: add the runtime/server, SDKs, gateway, vector DB, and GPU stack as separate components.
    • Inconsistent names (e.g., “NVIDIA Triton”, “triton”, “triton-inference”): breaks matching and search. Fix: stick to a single CPE-ish format (e.g., nvidia, triton_inference_server); see the validator sketch after this list.
    • Missing versions: without a version, a match may not trigger. Fix: always record the exact version you deploy.
    • Lumping dependencies into one line: reduces match precision. Fix: one line per component.
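
A quick way to catch the naming and version pitfalls above before entries go in: a small validator sketch (the rules encode this article’s convention, not a BaseFortify requirement):

    import re

    NAME = re.compile(r"^[a-z0-9_]+$")            # lowercase letters, digits, underscores only
    VERSION = re.compile(r"^[0-9][0-9a-z._-]*$")  # starts with a digit, e.g. 25.07 or 4.46.0

    def check(line: str) -> list:
        """Return a list of problems for one 'vendor, product, version' line."""
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            return ["expected exactly three fields: vendor, product, version"]
        vendor, product, ver = parts
        problems = []
        if not NAME.match(vendor):
            problems.append(f"vendor '{vendor}' is not lowercase_with_underscores")
        if not NAME.match(product):
            problems.append(f"product '{product}' is not lowercase_with_underscores")
        if not VERSION.match(ver):
            problems.append(f"version '{ver}' is missing or malformed")
        return problems

    for entry in ["nvidia, triton_inference_server, 25.07", "NVIDIA Triton, triton"]:
        print(entry, "->", check(entry) or "ok")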

 

How BaseFortify helps—specifically

 

You record components as vendor, product, version (as shown above). When a CVE is published that matches those fields, BaseFortify creates a threat automatically on the affected node/device. You can then track remediation (upgrade/mitigate) and keep an audit-friendly history of changes and closures. You can register for free at https://basefortify.eu/register. With a free account you can set up a watch list of 3 devices of any type (desktop/laptop, server, mobile device) and 100 applications, more than enough to add every component mentioned in this article.

 

Further reading