QNAP’s QAI-h1290FX pairs a 16-core EPYC 7302P based on a six-year-old Zen 2 architecture with NVIDIA’s RTX PRO 6000 Blackwell carrying 96GB of VRAM, combining aging server silicon with one of the most powerful AI GPUs available today. The platform is built for running large language models locally while also acting as high-speed all-flash storage.
The EPYC 7302P comes with 16 cores and 32 threads and dates back to 2019. It still provides stable multi-threaded performance for data handling, virtualization, and inference pipelines, but it falls behind newer EPYC generations in single-core speed and memory bandwidth. In this setup, the CPU plays a supporting role, while most compute-heavy tasks shift to the GPU.
The RTX PRO 6000 Blackwell Max-Q is the main driver for AI workloads, taking over the compute-heavy acceleration that would otherwise fall to the CPU. With 96GB of GDDR7 memory, it is built for large models that need high memory capacity, including 70B-parameter-class LLMs. QNAP also offers an RTX PRO 4500 Blackwell option with 32GB of VRAM, aimed at smaller models around the 30B range.
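Why 96GB matters for 70B-class models comes down to simple arithmetic. The sketch below is a rough, illustrative estimate of weight storage at common precisions; it ignores KV cache and runtime overhead and is not based on QNAP's own figures.

```python
# Rough VRAM footprint estimate for LLM weights at common precisions.
# Illustrative only: real usage adds KV cache, activations, and runtime overhead.

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for params in (8, 30, 70):
    fp16 = weights_gb(params, 2.0)   # 16-bit weights
    q4 = weights_gb(params, 0.5)     # 4-bit quantized weights
    print(f"{params:>3}B model: ~{fp16:.0f} GB at FP16, ~{q4:.0f} GB at 4-bit")

# A 70B model needs ~140 GB at FP16 (too large even for 96 GB of VRAM),
# but ~35 GB at 4-bit, which fits comfortably alongside a sizeable KV cache.
# The 32 GB RTX PRO 4500 is similarly a fit for ~30B models once quantized.
```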
The NAS is built around an all-flash storage layout with twelve hot-swappable 2.5-inch U.2 NVMe or SATA bays. This setup is designed for fast access to datasets, embeddings, and model files, which is critical for AI workloads that depend on high I/O throughput.
Memory starts at 128GB of DDR4 ECC RDIMM and can scale up to 1TB depending on configuration. Expansion is handled through four PCIe Gen4 slots, arranged as three x16 and one x8, leaving room for additional networking cards, GPUs, or storage upgrades.
Networking focuses on high throughput, with dual 25GbE SFP28 ports alongside dual 2.5GbE connections. The platform also supports 100GbE upgrades through PCIe expansion, which is important for moving large datasets quickly during AI inference and training workflows.
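To put those link speeds in context, here is a back-of-envelope transfer-time estimate. The 140GB payload is an illustrative FP16 70B checkpoint, not a QNAP figure, and real-world throughput will land below line rate once protocol and storage overhead are counted.

```python
# Illustrative transfer-time estimate for moving large files over the NAS's links.
# Line-rate math only; protocol overhead and storage limits reduce real throughput.

LINKS_GBPS = {"2.5GbE": 2.5, "25GbE": 25, "100GbE (via PCIe card)": 100}
payload_gb = 140  # e.g. a 70B FP16 checkpoint, ~140 GB (assumed size)

for name, gbps in LINKS_GBPS.items():
    gb_per_s = gbps / 8          # convert gigabits/s to gigabytes/s
    minutes = payload_gb / gb_per_s / 60
    print(f"{name:<24} ~{gb_per_s:5.2f} GB/s -> ~{minutes:5.1f} min for {payload_gb} GB")
```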
The QAI-h1290FX is designed for fully local deployment. It supports container-based environments through Docker and LXD, along with built-in tools such as Ollama and vLLM for running models directly on the device. This allows organizations to keep sensitive data on-premises rather than sending it to cloud services.
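As a sketch of what on-box inference looks like in practice: both vLLM and Ollama can expose an OpenAI-compatible HTTP endpoint, so a client elsewhere on the LAN can query the NAS directly. The host, port, endpoint path, and model name below are placeholders for whatever a given deployment actually uses, not QNAP defaults.

```python
# Minimal sketch of querying a model served locally on the NAS.
# Assumes an OpenAI-compatible endpoint (vLLM's `vllm serve` or Ollama's /v1 API);
# the address, port, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://qai-h1290fx.local:8000/v1",  # hypothetical NAS address and port
    api_key="not-needed-for-local",               # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",               # whichever model is loaded on the GPU
    messages=[{"role": "user", "content": "Summarize last quarter's incident reports."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

The point of the pattern is that prompts and documents never leave the local network; the NAS holds both the data and the model that reads it.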

Performance varies depending on model size. Smaller models around 8B parameters can reach up to 170 tokens per second, while larger models such as 70B or 120B see lower throughput as memory demand increases. This scaling reflects the GPU’s role as the main compute engine, with VRAM capacity becoming the limiting factor.
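The drop-off with model size is largely a memory-bandwidth story: each generated token has to stream the active weights out of VRAM. The sketch below is a crude bandwidth-bound ceiling, with the GDDR7 bandwidth figure and the quantization choices supplied as assumptions for illustration rather than taken from QNAP's testing.

```python
# Crude upper bound on single-stream decode speed when generation is
# memory-bandwidth-bound: tokens/s <= VRAM bandwidth / bytes read per token.
# The bandwidth figure and precisions below are illustrative assumptions.

VRAM_BANDWIDTH_GBPS = 1600  # assumed effective GDDR7 bandwidth in GB/s

def max_tokens_per_s(params_billions: float, bytes_per_param: float) -> float:
    bytes_per_token = params_billions * 1e9 * bytes_per_param  # weights touched per token
    return VRAM_BANDWIDTH_GBPS * 1e9 / bytes_per_token

for params, bpp, label in [(8, 1.0, "8-bit"), (70, 0.5, "4-bit"), (120, 0.5, "4-bit")]:
    print(f"{params:>4}B @ {label}: <= ~{max_tokens_per_s(params, bpp):.0f} tokens/s")

# Real throughput lands below these ceilings once compute, KV-cache reads, and
# scheduling overhead are included, but the trend matches the quoted numbers:
# small models run fast, and throughput falls as parameter count grows.
```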
The overall design positions the QAI-h1290FX as a turnkey AI appliance for local workloads. Instead of assembling a workstation from separate CPU, GPU, storage, and networking components, everything is integrated into one unit. The trade-off is flexibility, along with reliance on an older CPU that may limit workloads needing stronger single-thread performance.
Pricing starts at $8,999 for the 64GB configuration, rising to $13,499 for 128GB and $15,999 for 256GB memory options. The NAS is available through QNAP’s enterprise channels.
Source: Reddit






