Selected Works

LLM Routing, Research, and AI Infra.

Papers and infrastructure work around routing as a systems problem.

Primary Track
vLLM Semantic Router

Co-Founder

Signal-driven decision routing for mixture-of-modality deployments.

17 papers / 2025-2026

Paper Archive

Papers on routing, systems, and inference optimization.

A research archive spanning semantic routing, agent behavior, and infrastructure efficiency.

Research Publication

Elephant Agent: Personal-Model-First Self-Evolution for Personal AI

Agentic Intelligence Lab / 2026

Xunzhuo Liu, Hao Wu, Huamin Chen, Xue Liu, Bowei He

Introduces a personal-model-first agent architecture where personal AI grows a correctable understanding of Identity, World, Pulse, and Journey through user-paced curiosity and reflection after each turn.

Research Publication

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference

arXiv Technical Report / 2026

Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu

Proposes token-budget-aware pool routing that estimates each request's token budget online and dispatches it to short- or long-context serving pools, reducing GPU cost while improving stability for LLM inference.
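As a rough illustration of the dispatch idea in this summary, the sketch below estimates a request's token budget and picks a pool. It is a minimal sketch, not the paper's method: the `Request` fields, the additive estimator, and the 4096-token cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int    # tokens already present in the prompt
    max_new_tokens: int   # output cap requested by the caller


def estimate_token_budget(req: Request) -> int:
    # Hypothetical estimator: prompt length plus the requested output cap.
    # A real system would likely predict output length online instead.
    return req.prompt_tokens + req.max_new_tokens


def choose_pool(req: Request, short_pool_limit: int = 4096) -> str:
    # Requests whose estimated budget fits the (assumed) short-context limit
    # go to the short pool; everything else goes to the long-context pool.
    if estimate_token_budget(req) <= short_pool_limit:
        return "short"
    return "long"


if __name__ == "__main__":
    print(choose_pool(Request(prompt_tokens=1000, max_new_tokens=512)))
    print(choose_pool(Request(prompt_tokens=30000, max_new_tokens=1024)))
```

Keeping the estimator separate from the pool decision means the estimate could be swapped for a learned predictor without touching the dispatch rule.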

Research Publication

Adaptive Vision-Language Model Routing for Computer Use Agents

arXiv Technical Report / 2026

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

Proposes Adaptive VLM Routing to estimate step difficulty in computer-use agents and route each action to the cheapest model that can still satisfy a target reliability threshold.
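The routing rule in this summary (cheapest model that still meets a reliability target) can be sketched as follows. The model names, costs, capability scores, and the linear success estimate are hypothetical placeholders chosen for illustration, not values or formulas from the paper.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost: float        # relative cost per action (assumed)
    capability: float  # assumed success rate on the hardest steps, in [0, 1]


# Hypothetical model pool for the example.
MODELS = [
    Model("small-vlm", cost=1.0, capability=0.60),
    Model("medium-vlm", cost=4.0, capability=0.85),
    Model("large-vlm", cost=15.0, capability=0.97),
]


def estimated_success(model: Model, difficulty: float) -> float:
    # Assumed toy estimate: easy steps (difficulty 0) always succeed,
    # the hardest steps (difficulty 1) succeed at the model's capability.
    return 1.0 - difficulty * (1.0 - model.capability)


def route(difficulty: float, target_reliability: float = 0.9) -> Model:
    # Try models from cheapest to most expensive; return the first that
    # meets the reliability target.
    for model in sorted(MODELS, key=lambda m: m.cost):
        if estimated_success(model, difficulty) >= target_reliability:
            return model
    # If no model meets the target, fall back to the most capable one.
    return max(MODELS, key=lambda m: m.capability)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        print(d, route(d).name)
```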

Research Publication

When to Reason: Semantic Router for vLLM

NeurIPS - MLForSys / 2025

Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen

Routes prompts by reasoning requirements so reasoning is only invoked when it pays off, improving accuracy while cutting token usage and latency versus always-on reasoning.

Research Publication

Category-Aware Semantic Caching for Heterogeneous LLM Workloads

arXiv Technical Report / 2025

Chen Wang, Xunzhuo Liu, Yue Zhu, Alaa Youssef, Priya Nagpurkar, Huamin Chen

Proposes category-aware semantic caching with category-specific similarity thresholds, TTLs, and quotas, using a hybrid split between in-memory HNSW retrieval and external document storage.
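To make the per-category policy idea concrete, here is a minimal sketch of a cache whose similarity threshold, TTL, and quota depend on the request category. It uses a linear scan over in-memory entries rather than HNSW, omits the external document store, and all class names and policy values are assumptions for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class CategoryPolicy:
    threshold: float    # minimum cosine similarity for a cache hit
    ttl_seconds: float  # how long an entry stays valid
    quota: int          # maximum entries kept for this category


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CategoryCache:
    def __init__(self, policies):
        self.policies = policies
        # category -> list of (embedding, response, expiry time)
        self.entries = {category: [] for category in policies}

    def put(self, category, embedding, response, now):
        policy = self.policies[category]
        bucket = self.entries[category]
        bucket.append((embedding, response, now + policy.ttl_seconds))
        if len(bucket) > policy.quota:
            bucket.pop(0)  # evict the oldest entry once the quota is exceeded

    def get(self, category, embedding, now):
        policy = self.policies[category]
        best, best_sim = None, policy.threshold
        for cached_emb, response, expiry in self.entries[category]:
            if expiry <= now:
                continue  # skip expired entries
            sim = cosine(cached_emb, embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best


if __name__ == "__main__":
    cache = CategoryCache({
        "code": CategoryPolicy(threshold=0.95, ttl_seconds=3600, quota=2),
        "chat": CategoryPolicy(threshold=0.80, ttl_seconds=60, quota=2),
    })
    cache.put("chat", [1.0, 0.0], "hi", now=0)
    print(cache.get("chat", [0.9, 0.1], now=10))
```

In this sketch a stricter threshold for the hypothetical "code" category trades hit rate for precision, while the shorter TTL on "chat" lets stale answers expire sooner.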

Open Source

Infrastructure and standards work beyond the router.

Maintainer, steering, and reviewer roles across gateways, service mesh, and inference infrastructure.

Node 01

Elephant Agent

Creator

Personal-model-first self-evolving AI agent that grows correctable understanding and gets curious at the user's pace.

Node 02

Envoy Gateway

Steering Committee and Maintainer

Manages Envoy Proxy as a standalone or Kubernetes-based application gateway.

Node 03

Envoy AI Gateway

Maintainer

Manages unified access to generative AI services built on Envoy Gateway.

Node 04

vLLM AIBrix

Maintainer

Cost-efficient and pluggable infrastructure components for GenAI inference.

Node 05

Higress

Approver

AI gateway and AI-native API gateway.

Node 06

Istio

Maintainer

Connects, secures, controls, and observes services.

Node 07

Kiali

Maintainer

Observability console for the Istio service mesh.

Node 08

Aeraki Mesh

Maintainer

Manages any layer-7 protocol in a service mesh.

Node 09

Merbridge

Maintainer

Uses eBPF to speed up service mesh data paths.

Node 10

Kubernetes Gateway API

Reviewer

Role-oriented, portable, and expressive interfaces for Kubernetes networking.

Node 11

Kubernetes Ingress2Gateway

Reviewer

Converts Ingress resources to Gateway API resources.