Selected Works

LLM Routing, Research, and AI Infra.

Papers and infrastructure work that treat routing as a systems problem.

Primary Track
vLLM Semantic Router

Co-Founder

Signal-driven decision routing for mixture-of-modality deployments.

16 papers / 2025-2026

Paper Archive

Papers on routing, systems, and inference optimization.

A research archive spanning semantic routing, agent behavior, and infrastructure efficiency.

Research Publication

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference

arXiv Technical Report / 2026

Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu

Proposes token-budget-aware pool routing that estimates each request's token budget online and dispatches it to short- or long-context serving pools, reducing GPU cost while improving stability for LLM inference.

Research Publication

Adaptive Vision-Language Model Routing for Computer Use Agents

arXiv Technical Report / 2026

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

Proposes Adaptive VLM Routing to estimate step difficulty in computer-use agents and route each action to the cheapest model that can still satisfy a target reliability threshold.

Research Publication

When to Reason: Semantic Router for vLLM

NeurIPS - MLForSys / 2025

Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen

Routes each prompt according to its reasoning requirements, so reasoning is invoked only when it pays off. This improves accuracy while cutting token usage and latency compared with always-on reasoning.

Research Publication

Category-Aware Semantic Caching for Heterogeneous LLM Workloads

arXiv Technical Report / 2025

Chen Wang, Xunzhuo Liu, Yue Zhu, Alaa Youssef, Priya Nagpurkar, Huamin Chen

Proposes category-aware semantic caching with category-specific similarity thresholds, TTLs, and quotas. It uses a hybrid design that keeps HNSW retrieval in memory and stores documents externally.

Open Source

Infrastructure and standards work beyond the router.

Maintainer, steering, and reviewer roles across gateways, service mesh, and inference infrastructure.

Node 01

Envoy Gateway

Steering Committee and Maintainer

Manages Envoy Proxy as a standalone or Kubernetes-based application gateway.

Node 02

Envoy AI Gateway

Maintainer

Built on Envoy Gateway to manage unified access to generative AI services.

Node 03

vLLM AIBrix

Maintainer

Cost-efficient and pluggable infrastructure components for GenAI inference.

Node 04

Higress

Approver

AI gateway and AI-native API gateway.

Node 05

Istio

Maintainer

Connects, secures, controls, and observes services.

Node 06

Kiali

Maintainer

Observability console for Istio with service mesh configuration and validation capabilities.

Node 07

Aeraki Mesh

Maintainer

Manages any layer-7 protocol in a service mesh.

Node 08

Merbridge

Maintainer

Uses eBPF to speed up service mesh data paths.

Node 09

Kubernetes Gateway API

Reviewer

Role-oriented, portable, and expressive interfaces for Kubernetes networking.

Node 10

Kubernetes Ingress2Gateway

Reviewer

Converts Ingress resources to Gateway API resources.