Claude Code Digest — 2026-03-28 20:10:04
What the docs reveal
Anthropic is quietly battling prompt bloat. Claude Code now enforces strict character limits on custom skills. It truncates each skill description at 250 characters. It also caps total skill metadata at a budget that defaults to one percent of the context window, with an 8,000-character floor.
The limits suggest that developers write very long tool descriptions. When you load dozens of custom tools into Claude Code, their metadata can consume tens of thousands of tokens before you even type a prompt. This starves the agent of reasoning headroom. It slows down time-to-first-token. It drives up API costs. By instituting hard truncation, Anthropic forces developers to write concise, precise tool definitions.
You must adapt your custom skills immediately. Stop writing paragraphs explaining what a tool does. Focus entirely on keywords, exact parameters, and strict return formats. If your workflow relies on embedding long instructions in tool definitions, your skills will break as Claude silently cuts them off at the 250-character mark.
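A quick audit can show which descriptions would be cut. The sketch below is a minimal example, not Claude Code's own logic: the function names and the `skills` dictionary are hypothetical, and it assumes the limits are counted with plain Python string length, which may differ from how Claude Code counts characters.

```python
from dataclasses import dataclass

# Figures quoted in this digest; how Claude Code counts characters is an assumption.
DESCRIPTION_LIMIT = 250
DEFAULT_TOTAL_BUDGET = 8_000


@dataclass
class SkillReport:
    name: str
    length: int
    over_by: int  # characters past the per-description limit (0 if within it)


def audit_descriptions(descriptions: dict[str, str],
                       limit: int = DESCRIPTION_LIMIT) -> list[SkillReport]:
    """Report each description's length and how far it exceeds the limit."""
    return [
        SkillReport(name, len(text), max(0, len(text) - limit))
        for name, text in descriptions.items()
    ]


def total_after_truncation(descriptions: dict[str, str],
                           limit: int = DESCRIPTION_LIMIT) -> int:
    """Sum description lengths as if each one were cut at the limit."""
    return sum(min(len(text), limit) for text in descriptions.values())


def within_budget(descriptions: dict[str, str],
                  budget: int = DEFAULT_TOTAL_BUDGET) -> bool:
    """Check the truncated total against a total metadata budget."""
    return total_after_truncation(descriptions) <= budget


if __name__ == "__main__":
    # Hypothetical skills, for illustration only.
    skills = {
        "deploy": "Deploy service to staging. Args: service, tag. Returns: JSON status.",
        "verbose": "x" * 400,
    }
    for report in audit_descriptions(skills):
        status = f"over by {report.over_by}" if report.over_by else "ok"
        print(f"{report.name}: {report.length} chars ({status})")
    print("fits default budget:", within_budget(skills))
```

Running a check like this before shipping a skill makes the truncation visible instead of silent.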
Anthropic did leave a blunt escape hatch. You can override the total metadata budget using the new SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable. Raising this value keeps large toolsets from falling out of the total budget, but it does not bypass the 250-character limit per description. Use this variable cautiously. Every character you add to the metadata budget is context the model can no longer spend on reasoning.
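One way to apply the override to a single launch rather than a whole shell profile is sketched below. The helper name is hypothetical, the value 16,000 is an arbitrary example, and the sketch assumes the variable accepts a plain integer number of characters.

```python
import os


def env_with_skill_budget(budget_chars: int) -> dict[str, str]:
    """Return a copy of the current environment with the budget variable set."""
    if budget_chars <= 0:
        raise ValueError("budget must be a positive number of characters")
    env = dict(os.environ)
    # Assumption: the variable is read as an integer count of characters.
    env["SLASH_COMMAND_TOOL_CHAR_BUDGET"] = str(budget_chars)
    return env


if __name__ == "__main__":
    env = env_with_skill_budget(16_000)  # arbitrary example value
    print(env["SLASH_COMMAND_TOOL_CHAR_BUDGET"])
    # To launch with the override, assuming the `claude` CLI is on PATH:
    # import subprocess; subprocess.run(["claude"], env=env)
```

Scoping the override to one process keeps the larger budget from becoming the default for every session on the machine.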
Enterprise adoption requires enterprise observability. Claude Code now injects the X-Claude-Code-Session-Id HTTP header into all outgoing requests. This allows corporate proxies and LLM gateways to track and aggregate API usage by terminal session without parsing the JSON request body.
Parsing large JSON payloads at the gateway layer introduces latency and wastes CPU cycles. Proxies run considerably faster when they route on headers alone. Before this change, tracing a token spike back to a specific developer's terminal session required deep body inspection or complex logging wrappers. Today, a simple HTTP header exposes the exact session origin.
Platform engineering teams are likely to adopt this quickly. If your company routes API traffic through systems like LiteLLM, Cloudflare AI Gateway, or an internal proxy, you can finally build accurate chargeback models per developer session. You gain precise visibility into exactly how much spend a specific coding session generates.
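The per-session aggregation a gateway would perform can be sketched in a few lines. This is an illustration under stated assumptions, not the API of any gateway product: the record shape (a headers mapping paired with a token count) and the function names are hypothetical, and a real proxy would take the token count from its own usage logs.

```python
from collections import defaultdict
from typing import Iterable, Mapping

SESSION_HEADER = "X-Claude-Code-Session-Id"


def session_id(headers: Mapping[str, str]) -> str:
    """Look up the session header case-insensitively, as HTTP header names are."""
    wanted = SESSION_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return "unknown"  # requests that carry no session header


def tokens_by_session(
    records: Iterable[tuple[Mapping[str, str], int]],
) -> dict[str, int]:
    """Sum token counts per session ID without touching any request body."""
    totals: dict[str, int] = defaultdict(int)
    for headers, tokens in records:
        totals[session_id(headers)] += tokens
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical log records: (request headers, tokens billed for the request).
    log = [
        ({"X-Claude-Code-Session-Id": "sess-a"}, 1200),
        ({"x-claude-code-session-id": "sess-a"}, 800),
        ({"X-Claude-Code-Session-Id": "sess-b"}, 500),
        ({}, 50),
    ]
    for session, total in tokens_by_session(log).items():
        print(session, total)
```

Multiplying each session total by a per-token rate turns this into a simple chargeback report.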