
How to model LLM components in BaseFortify (so CVEs match)
Publication date: 2025-10-09
TL;DR
This is Part 3 in our series on making LLMs safe and compliant in real organisations. In Part 1 we unpacked the EU AI Act's phased obligations (bans from Feb 2025, GPAI duties from Aug 2025, broad transparency by Aug 2026) and what businesses should do to prepare. In Part 2 we showed where LLM stacks actually break (Triton, vLLM, Transformers, LangChain) and the LLM-specific mitigations that help. Here in Part 3 we focus on how to model your LLM stack inside BaseFortify using simple, CPE-ish `vendor, product, version` entries, so that when a CVE is published it automatically becomes a tracked threat on the right node.
Core idea
Building on Part 2’s findings (real-world RCE/DoS in inference servers, unsafe loaders in model toolchains, and injection via retrieval), treat “LLM” as a stack and record each layer as its own component. That means adding the inference runtime, SDKs/libraries, gateway/orchestration, vector/database layer, GPU stack, and OS/container runtime separately so CVE matches are precise.
- Inference server/runtime (e.g., `nvidia, triton_inference_server`; `vllm, vllm`)
- Frameworks/SDKs (e.g., `huggingface, transformers`; `langchain, langchain`)
- Orchestration/gateway (e.g., `anyscale, ray`; `nginx, nginx`)
- Vector & database (e.g., `milvus, milvus`; `postgresql, postgresql`)
- GPU stack (e.g., `nvidia, gpu_display_driver`; `nvidia, cuda`; `nvidia, cudnn`)
- OS & container runtime (e.g., `canonical, ubuntu_linux`; `docker, docker`)
Why this works: CVE feeds typically reference the runtime/server or library names—not “ChatGPT/Claude/DeepSeek”. Modeling the stack ensures BaseFortify can match those CVEs to your environment.
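For a picture of the result, here is a minimal sketch (plain Python, with an invented node name and the article's example versions) of what "one entry per layer" looks like for a single inference node:

```python
# Minimal sketch: one node, one entry per layer, as (vendor, product, version).
# The node name is made up; the versions are this article's examples, not advice.
NODE = "llm-inference-01"
COMPONENTS = [
    ("canonical", "ubuntu_linux", "22.04"),         # OS
    ("docker", "docker", "24.0.7"),                 # container runtime
    ("nvidia", "gpu_display_driver", "550.90.07"),  # GPU stack
    ("nvidia", "cuda", "12.4.0"),
    ("vllm", "vllm", "0.9.0"),                      # inference runtime
    ("huggingface", "transformers", "4.46.0"),      # framework/SDK
    ("nginx", "nginx", "1.24.0"),                   # gateway
]

for vendor, product, ver in COMPONENTS:
    print(f"{NODE}: {vendor}, {product}, {ver}")
```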
Naming conventions that work (CPE-ish)
- vendor: lowercase, underscores (e.g., `nvidia`, `huggingface`, `anthropic`)
- product: lowercase, underscores (e.g., `triton_inference_server`, `transformers`, `langchain`)
- version: exact semantic or distro version (e.g., `25.07`, `4.46.0`, `22.04`)
Example: `nvidia, triton_inference_server, 25.07` is clean, comparable, and easy to update.
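If you want to normalise names mechanically, a small sketch like the following (our own helper, not a BaseFortify feature) gets you most of the way:

```python
import re

def cpe_ish(field: str) -> str:
    """Lowercase a single vendor or product field and collapse anything
    that is not alphanumeric into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", field.strip().lower()).strip("_")

print(cpe_ish("Triton Inference Server"))  # triton_inference_server
print(cpe_ish("Ubuntu Linux"))             # ubuntu_linux
print(cpe_ish("PostgreSQL"))               # postgresql

# Caveat: mechanical normalisation is only a starting point. NVD sometimes
# joins words (vendor "huggingface", not "hugging_face"), so check the
# official CPE dictionary when in doubt.
```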
A) Self-hosted: a concrete template (Linux node)
A lean baseline that still catches the bulk of LLM-related CVEs; swap vLLM for Triton if that’s your stack.
```
canonical, ubuntu_linux, 22.04
docker, docker, 24.0.7
nvidia, gpu_display_driver, 550.90.07
nvidia, cuda, 12.4.0
vllm, vllm, 0.9.0
pytorch, pytorch, 2.4.1
huggingface, transformers, 4.46.0
nginx, nginx, 1.24.0
```

Tip: If you run RAG, add your vector DB (e.g., `postgresql, postgresql, …` or `milvus, milvus, …`) separately.
B) SaaS LLMs (Anthropic, DeepSeek, OpenAI, etc.)
Model two things: a service placeholder (documents usage; CVEs rarely match there) and the SDK/client library (CVEs may match here).
- `anthropic, claude_api_service, 2025-09` | `anthropic, anthropic_python_sdk, 0.34.0`
- `deepseek, deepseek_api_service, 2025-09` | (SDK, if used: `openai, openai_python_sdk, 1.50.0`)
Rationale: hosted model services seldom get CVEs in NVD, but their SDKs (and your surrounding stack) do. Recording the SDKs ensures threats surface.
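To capture the SDK side precisely, a short sketch using Python's standard `importlib.metadata` (assuming the usual PyPI distribution names `anthropic` and `openai`) prints ready-to-paste entries:

```python
# Sketch: read the installed SDK versions so the entries you record
# match what actually runs on this host.
from importlib.metadata import PackageNotFoundError, version

for dist, product in (("anthropic", "anthropic_python_sdk"),
                      ("openai", "openai_python_sdk")):
    try:
        print(f"{dist}, {product}, {version(dist)}")
    except PackageNotFoundError:
        print(f"# {dist}: SDK not installed on this host")
```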
C) RAG & file handling components worth tracking
RAG pipelines introduce parsers and tokenizers that process untrusted data; if a library is imported and touches untrusted input, model it (a way to enumerate candidates is sketched after this list).
- `python, python, 3.10.12` (base runtime; frequent security fixes)
- Parsers you use (e.g., `apache, tika, <version>`; `unstructured, unstructured, <version>`)
- Embedding libs if distinct (e.g., `sentence_transformers, sentence_transformers, <version>`)
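As a starting point for that enumeration, this sketch scans installed distributions against a small, illustrative watchlist. It reuses the package name as vendor, which matches `unstructured` and `sentence_transformers` above but not `apache, tika`, so verify vendors before entering them:

```python
# Sketch: scan installed distributions for RAG-adjacent packages worth
# modeling. The watchlist names are illustrative; extend for your stack.
from importlib.metadata import distributions

WATCHLIST = {"unstructured", "sentence-transformers", "pypdf", "lxml", "tika"}

for dist in distributions():
    name = (dist.metadata["Name"] or "").lower()
    if name in WATCHLIST:
        cpe = name.replace("-", "_")
        print(f"{cpe}, {cpe}, {dist.version}")  # vendor may differ; verify
```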
Modeling examples (with one-sentence explanations)
A) vLLM + PostgreSQL/pgvector stack
A lean self-hosted serving stack (vLLM) with a classic relational store plus vector embeddings via pgvector—great for RAG with simple ops and backups.
Copy/paste set (comma-separated):
```
canonical, ubuntu_linux, 22.04
docker, docker, 24.0.7
nvidia, gpu_display_driver, 550.90.07
nvidia, cuda, 12.4.0
vllm, vllm, 0.9.0
pytorch, pytorch, 2.4.1
huggingface, transformers, 4.46.0
langchain, langchain, 0.3.6
postgresql, postgresql, 16.3
nginx, nginx, 1.24.0
```
B) Triton + Milvus stack
A high-performance serving stack using NVIDIA Triton with a dedicated vector database (Milvus) built for large-scale similarity search.
Copy/paste set (comma-separated):
```
canonical, ubuntu_linux, 22.04
docker, docker, 24.0.7
nvidia, gpu_display_driver, 550.90.07
nvidia, cuda, 12.4.0
nvidia, triton_inference_server, 25.07
microsoft, onnx_runtime, 1.18.0
huggingface, transformers, 4.46.0
langchain, langchain, 0.3.6
milvus, milvus, 2.4.4
nginx, nginx, 1.24.0
```
C) SaaS (Anthropic) + light local tooling
A minimal footprint using a hosted LLM (Anthropic) while keeping just enough local SDK/tooling and gateway to integrate safely into your environment.
Copy/paste set (comma-separated):
```
canonical, ubuntu_linux, 22.04
anthropic, claude_api_service, 2025-09
anthropic, anthropic_python_sdk, 0.34.0
huggingface, transformers, 4.46.0
langchain, langchain, 0.3.6
nginx, nginx, 1.24.0
```
Handy commands to capture versions quickly
Use these to read exact versions from running systems so your BaseFortify entries are precise and match CVEs reliably; a consolidated script follows the list.
- OS: `lsb_release -ds`
- Docker: `docker --version`
- NVIDIA: `nvidia-smi` (driver/CUDA); `nvcc --version`
- vLLM: `python -c "import vllm; print(vllm.__version__)"`
- Triton: check the container tag you run (e.g., `nvcr.io/nvidia/tritonserver:25.07-py3`)
- PyTorch: `python -c "import torch; print(torch.__version__)"`
- Transformers: `python -c "import transformers; print(transformers.__version__)"`
- LangChain:

  ```
  python - <<'PY'
  import importlib
  m = importlib.import_module('langchain')
  print(getattr(m, '__version__', 'unknown'))
  PY
  ```
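If you prefer one pass over the node, the commands above can be consolidated into a short Python script; the tool and package names are the ones from this article, so trim the lists to your stack:

```python
#!/usr/bin/env python3
# Sketch: the commands above, consolidated into one pass over a Linux node.
import shutil
import subprocess
from importlib.metadata import PackageNotFoundError, version

def first_line(cmd):
    """Run a command and return the first line of stdout, or None."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return out.stdout.strip().splitlines()[0]
    except (subprocess.CalledProcessError, IndexError):
        return None

print("OS:     ", first_line(["lsb_release", "-ds"]))
print("Docker: ", first_line(["docker", "--version"]))
print("Driver: ", first_line(["nvidia-smi", "--query-gpu=driver_version",
                              "--format=csv,noheader"]))

for pkg in ("vllm", "torch", "transformers", "langchain"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```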
Common pitfalls (and how to avoid them)
- Only adding the model name (e.g., "Claude", "DeepSeek R1"): won't match many CVEs. Fix: add the runtime/server, SDKs, gateway, vector DB, and GPU stack as separate components.
- Inconsistent names (e.g., "NVIDIA Triton", "triton", "triton-inference"): breaks matching and search. Fix: stick to a single CPE-ish format (e.g., `nvidia, triton_inference_server`); a sanity-check sketch follows this list.
- Missing versions: without a version, a match may not trigger. Fix: always record the exact version you deploy.
- Lumping dependencies into one line: reduces match precision. Fix: one line per component.
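As a final guard, a tiny sketch (our own regex, stricter than anything BaseFortify requires) can flag entries that violate the conventions before you enter them:

```python
import re

# Sanity check for component lines, catching the pitfalls above
# (mixed case, free-form names, missing version).
ENTRY = re.compile(r"^[a-z0-9_]+, [a-z0-9_]+, \S+$")

for line in ("nvidia, triton_inference_server, 25.07",
             "NVIDIA Triton",                       # mixed case, no version
             "huggingface, transformers, 4.46.0"):
    print("ok " if ENTRY.match(line) else "fix", "-", line)
```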
How BaseFortify helps—specifically
You record components as `vendor, product, version` (as shown above). When a CVE is published that matches those fields, BaseFortify automatically creates a threat on the affected node/device. You can then track remediation (upgrade/mitigate) and keep an audit-friendly history of changes and closures. You can register for free at https://basefortify.eu/register: a free account gives you a watch list of 3 devices of any type (desktop/laptop, server, mobile device) and 100 applications, more than enough to add every component mentioned in this article.
Further reading