HIGH | CWE-770 | AI Security
Rate Limiting Not Enforced on LLM Endpoint
Description
LLM-powered endpoints that lack rate limiting let an attacker send an unbounded number of requests, exhausting your provider API quota and running up large usage bills.
How Vezraa Detects It
We send a rapid burst of requests to your LLM endpoints and check whether any are rejected with HTTP 429 (Too Many Requests), which indicates that rate limiting is enforced.
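The check described above can be approximated with a short script. This is only a sketch of the general idea, not Vezraa's actual scanner: the function names `probeRateLimit` and `summarizeProbe`, the burst size of 30, and the JSON request body are all assumptions made for illustration. The fetch implementation is injectable so the logic can be tried without touching a real endpoint.

```javascript
// Hypothetical probe: fire a burst of concurrent requests and count HTTP 429s.
// `fetchImpl` defaults to the global fetch (Node 18+ or a browser).
async function probeRateLimit(url, burst = 30, fetchImpl = globalThis.fetch) {
  const requests = Array.from({ length: burst }, () =>
    fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: 'ping' }), // assumed request shape
    }).then((res) => res.status)
  );
  const statuses = await Promise.all(requests);
  return summarizeProbe(statuses);
}

// Pure helper: the endpoint counts as rate limited if any response was a 429.
function summarizeProbe(statuses) {
  const limited = statuses.filter((status) => status === 429).length;
  return { total: statuses.length, limited, rateLimited: limited > 0 };
}
```

If `rateLimited` comes back `false` for a burst well above normal usage, the endpoint is probably not throttling. Only run a probe like this against endpoints you own, since every accepted request may incur LLM cost.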
Real-World Impact
Attackers can drain your OpenAI or Anthropic budget in minutes. A single unprotected endpoint can cost thousands of dollars per hour.
Fix Example
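// Rate limiting with express-rate-limit
const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

// Allow at most 10 requests per client IP per minute on the chat route
const limiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  max: 10, // requests allowed per window
  message: 'Too many requests' // body returned with the 429 response
});
app.use('/api/chat', limiter);
Affected Stacks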
OpenAI, Anthropic, LLM APIs, Next.js
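The fix example targets Express, but Next.js is also listed as an affected stack, and its route handlers do not use Express middleware. As a rough sketch of the same fixed-window idea in framework-neutral JavaScript, the helper below could be called at the top of any handler. The name `createFixedWindowLimiter` and its return shape are hypothetical, chosen for illustration, and the in-memory `Map` is an assumption that only holds for a single long-lived process.

```javascript
// Minimal in-memory fixed-window limiter (a sketch, not a production design).
// State lives in this process only, and stale keys are never evicted.
function createFixedWindowLimiter({ windowMs = 60_000, max = 10 } = {}) {
  const hits = new Map(); // key -> { count, resetAt }

  // `key` identifies the caller (for example an IP address or API key).
  // `now` is injectable so the behaviour can be tested deterministically.
  return function check(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now >= entry.resetAt) {
      // First request in a fresh window
      hits.set(key, { count: 1, resetAt: now + windowMs });
      return { allowed: true, remaining: max - 1, retryAfterMs: 0 };
    }
    entry.count += 1;
    const allowed = entry.count <= max;
    return {
      allowed,
      remaining: Math.max(0, max - entry.count),
      retryAfterMs: allowed ? 0 : entry.resetAt - now,
    };
  };
}
```

A handler would call `check(clientKey)` before forwarding anything to the LLM provider and, when `allowed` is `false`, respond with status 429 and a `Retry-After` header derived from `retryAfterMs`. On serverless or multi-instance deployments the counter would need to live in a shared store instead, because each instance would otherwise keep its own count.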