The AI Inference Extension enables large language models (LLMs) to run on Chromia network nodes instead of centralized cloud services such as AWS. Prompts and model outputs are handled as blockchain transactions, so each inference request is processed and its result validated by multiple nodes. The extension is currently available on Testnet and supports both CPU- and GPU-based models. This demo uses Qwen2.5-1.5B-Instruct, running on GPU-enabled nodes.