Why Your Rails Error Logs Are Full of Noise
Every public-facing Rails application is under constant reconnaissance. Automated bot scanners continuously probe for exposed credentials, configuration files, and common CMS admin panels — often attempting thousands of requests per day for paths like /.env, /wp-admin, /aws/credentials, and /.git/config.
These requests don't simply bounce off your application with a harmless 404. They penetrate deep into the Rails stack, triggering a cascade of unnecessary operations:
- Routing noise as Rails tries to match /xmlrpc.php or /phpmyadmin/ against your route definitions, generating 404 responses
- Middleware overhead as each probe traverses the full Rack stack — session loading, parameter parsing, and routing — before returning a 404
- Database queries executed by before_action filters in your ApplicationController (authentication checks, tenant lookups) that fire even for requests heading toward a 404
- Error monitoring noise that obscures real application errors in your exception tracker
The result? Error logs flooded with exceptions for paths your application never intended to serve, obscuring genuine errors that need investigation. Your monitoring tools count these as application errors, skewing your metrics and potentially triggering false alerts.
Key insight: By the time Rails returns a 404 for /.env.backup, your application has already spent milliseconds executing filters, checking sessions, and querying databases — wasted cycles repeated thousands of times daily across bot traffic.
Why Rack Middleware Is the Right Layer for This
Rack sits between your web server (Nginx, Puma) and your Rails application, processing every HTTP request before it touches ActionDispatch, the router, or any controller code. This makes it the ideal interception point for bot scanner traffic.
When a bot requests /wp-admin/install.php, rejecting it at the Rack layer means you avoid:
- Router pattern matching across dozens or hundreds of routes
- ActionDispatch middleware chain (session loading, parameter parsing, cookie handling)
- Controller instantiation and filter execution
- Database connection checkout
- ActiveRecord queries triggered by before_action callbacks
- View rendering and response serialisation
The request still enters the Rack middleware chain, but it never reaches routing, controllers, or your application logic. Compare this to alternatives:
Nginx rules block traffic even earlier — before it reaches Puma/Rack — and Nginx supports PCRE regex matching, conditionals, and graceful reloads. For pure performance, Nginx is cheaper. But Rack middleware is more portable: it ships with your app code, uses Ruby pattern matching, and deploys with your normal release process. For maximum protection, use both.
Controller filters run too late — the request has already consumed resources traversing the full middleware stack and matching routes.
External WAFs (Cloudflare, AWS WAF) add latency, monthly costs, and complexity. They excel at volumetric attacks but are overkill for simple pattern matching against known bot signatures.
Rack middleware gives you Rails-native pattern matching that stops rejected requests before routing, controllers, or database connections — minimal overhead compared to letting them traverse the full stack.
Designing the Middleware: What to Block and How
Effective bot blocking requires multiple detection layers — a single matching strategy will miss probe variants. The middleware uses three complementary techniques:
Extension matching catches technology-specific probes like .php, .asp, or .jsp files. Scanners blindly test for common web platforms, and these extensions have no legitimate place in a Rails application.
Dot-directory and dot-file detection blocks attempts to access hidden configuration directories (.git, .aws, .docker) and sensitive files (.env, credentials). This protects accidentally exposed repositories or credential files.
Filename prefix variants handle obfuscated naming patterns. Scanners don't just probe for .env — they also test .env.backup, .env-production, .env_local, and .env-script.js. Without the middleware, these all reach the Rails router, traverse the middleware stack, and return 404 — adding noise to logs and consuming resources unnecessarily.
def bot_filename?(path)
  basename = path.split("/").last.to_s

  BOT_FILENAMES.any? { |name|
    basename == name || basename.start_with?("#{name}.")
  } ||
    basename.start_with?(".env-") ||
    basename.start_with?(".env_") ||
    basename.start_with?("ftpsync")
end
The middleware returns a 404 immediately when any pattern matches — before routing, CSRF checks, database connections, or controller logic execute.
The Implementation: A Complete, Battle-Tested Middleware
Here's the complete middleware with inline commentary explaining each detection strategy:
class BotPathBlocker
  # Extensions commonly probed by PHP/ASP exploit scanners
  BOT_EXTENSIONS = %w[.php .asp .aspx .cgi .jsp .py .pl].freeze

  # Hidden directories scanners enumerate for credentials
  BOT_DOTDIRS = %w[.git .svn .aws .ssh .docker .kube].freeze

  # Config files with common naming variants
  BOT_FILENAMES = %w[.env .htaccess credentials wp-config].freeze

  # Exact paths frequently targeted (WordPress, CMS admin panels)
  BOT_PATHS = %w[/wp-admin /wp-login.php /admin /phpmyadmin].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    return block_request if bot_path?(env["PATH_INFO"].to_s)

    @app.call(env)
  end

  private

  def bot_path?(path)
    # Extension check: catches exploit scanners probing PHP/ASP endpoints
    return true if BOT_EXTENSIONS.any? { |ext| path.end_with?(ext) }

    # Dot-directory check: detects enumeration of version control and config dirs
    return true if path.split("/").any? { |segment| BOT_DOTDIRS.any? { |dir| segment.start_with?(dir) } }

    # Exact path check: blocks common CMS admin panel probes
    return true if BOT_PATHS.include?(path)

    # Filename check with prefix variants: handles .env, .env.backup, .env-production, .env_local
    bot_filename?(path)
  end

  def bot_filename?(path)
    basename = path.split("/").last.to_s

    BOT_FILENAMES.any? do |name|
      basename == name ||
        basename.start_with?("#{name}.") ||
        basename.start_with?("#{name}-") ||
        basename.start_with?("#{name}_")
    end
  end

  # Return a minimal 404 with no body to avoid leaking server information
  def block_request
    [404, { "Content-Type" => "text/plain" }, []]
  end
end
The key detail: dash and underscore separators (-, _) were edge cases discovered in production. Scanners probe .env-production.js and .config_backup.php to bypass naive matchers that check only dot separators.
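The middleware only helps if it runs ahead of the rest of the stack. In a Rails app that is a one-line registration — a sketch, where "YourApp" is a placeholder module name and the insertion index assumes you want the blocker at the very front of the Rack chain:

```ruby
# config/application.rb (sketch -- "YourApp" is a placeholder name)
module YourApp
  class Application < Rails::Application
    # Index 0 puts BotPathBlocker at the front of the Rack stack, so probes
    # are rejected before session, cookie, and parameter-parsing middleware run
    config.middleware.insert_before 0, BotPathBlocker
  end
end
```

Run bin/rails middleware after deploying to confirm the blocker appears first in the stack.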
Edge Cases Discovered in Production
While the initial middleware implementation blocked obvious patterns, production log analysis revealed sophisticated scanner behaviour that bypassed naive matchers. The most common gap appeared in config file naming conventions — requests for wp-config-sample.php and .env-production.js sailed through the middleware because they used dashes or underscores between segments rather than dots.
The original bot_filename? method only checked for dot-separated variants:
def bot_filename?(path)
  basename = path.split("/").last.to_s
  BOT_FILENAMES.any? { |n| basename == n || basename.start_with?("#{n}.") }
end
This caught .env.backup but missed .env-backup, .env_production, and config-script.js. Weekly log reviews showed these variants reaching the Rails router and generating 404 responses — adding noise to error monitoring and consuming resources through the full middleware stack unnecessarily.
The hardened version explicitly handles multiple separator patterns:
def bot_filename?(path)
  basename = path.split("/").last.to_s

  BOT_FILENAMES.any? do |n|
    basename == n ||
      basename.start_with?("#{n}.") ||
      basename.start_with?("#{n}-") ||
      basename.start_with?("#{n}_")
  end || basename.start_with?("ftpsync")
end
Key lesson: Extension-only matching is insufficient. Bot scanners actively probe separator variants (., -, _) to evade basic pattern filters.
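A standalone check makes the lesson concrete. This sketch reuses the separator-aware bot_filename? from the complete implementation and runs it against probe variants of the kind seen in production logs:

```ruby
BOT_FILENAMES = %w[.env .htaccess credentials wp-config].freeze

# Separator-aware matcher: dot, dash, and underscore variants all match
def bot_filename?(path)
  basename = path.split("/").last.to_s

  BOT_FILENAMES.any? do |name|
    basename == name ||
      basename.start_with?("#{name}.") ||
      basename.start_with?("#{name}-") ||
      basename.start_with?("#{name}_")
  end
end

probes = %w[/.env /.env.backup /.env-production.js /.env_local /wp-config-sample.php]
probes.all? { |p| bot_filename?(p) }  # => true: every separator variant is caught
bot_filename?("/dashboard")           # => false: legitimate paths pass through
```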
Logging and Observability: Knowing What You're Blocking
Visibility into blocked traffic is essential — not just to confirm your middleware is working, but to spot emerging scanner patterns that aren't yet on your blocklist. Structured logging allows you to aggregate blocked requests without drowning application logs in noise.
Add logging directly to your middleware using Rails' tagged logging:
class BotPathBlocker
  def call(env)
    path = env["PATH_INFO"].to_s

    if bot_path?(path)
      # Use REMOTE_ADDR as the primary source; only trust X-Forwarded-For
      # if your reverse proxy is configured to sanitise it
      ip = env["REMOTE_ADDR"]

      Rails.logger.tagged("BotBlocker") do
        Rails.logger.info("Blocked: #{env['REQUEST_METHOD']} #{path} from #{ip}")
      end

      return [404, {}, []]
    end

    @app.call(env)
  end
end
For production environments, consider incrementing metrics instead of (or alongside) log entries. A simple counter with path labels helps identify which patterns trigger most frequently:
BLOCKED_COUNTER = Prometheus::Client::Counter.new(
  :blocked_bot_requests_total,
  docstring: 'Blocked bot scanner requests',
  labels: [:pattern_type, :extension]
)
Review blocked request logs weekly to identify new scanner patterns. If you see repeated requests for .envs or .aws-credentials, add them to your blocklist before they multiply.
Keep blocked request logs in a separate namespace (e.g., tagged with [BotBlocker]) so they can be filtered out of standard application monitoring without losing visibility altogether.
Testing the Middleware
Testing your middleware thoroughly prevents both security gaps and false positives that might block legitimate application routes. Start with unit tests that verify the middleware in isolation from Rails:
RSpec.describe BotPathBlocker do
  let(:app) { ->(env) { [200, {}, ["OK"]] } }
  let(:middleware) { described_class.new(app) }

  def call_with_path(path)
    middleware.call({ "PATH_INFO" => path })
  end

  it "blocks .env file requests" do
    status, = call_with_path("/.env")
    expect(status).to eq(404)
  end

  it "blocks dash-separated config variants" do
    status, = call_with_path("/.env-production")
    expect(status).to eq(404)
  end

  it "allows legitimate application paths" do
    status, = call_with_path("/dashboard")
    expect(status).to eq(200)
  end

  it "blocks PHP exploit probes" do
    status, = call_with_path("/wp-admin/install.php")
    expect(status).to eq(404)
  end
end
Integration tests verify that blocked requests never reach your Rails router. Mount a test route that increments a counter, then confirm bot paths never trigger it:
it "prevents blocked paths from reaching Rails routing" do
  counter = 0

  Rails.application.routes.draw do
    get "/*path", to: ->(_) { counter += 1; [200, {}, []] }
  end

  get "/.aws/credentials"

  expect(response.status).to eq(404)
  expect(counter).to eq(0)
end
Critical: Test your blocklist against every route in rails routes. Patterns like /api/config or /files/.hidden might accidentally match bot detection rules, creating false positives in production.
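One way to make that check routine is a standalone audit of route paths against the blocklist. The rule sets below are duplicated from the middleware, and the route list is a hypothetical stand-in for what you would collect from Rails.application.routes:

```ruby
# Duplicated from the middleware so the audit runs on its own
BOT_EXTENSIONS = %w[.php .asp .aspx .cgi .jsp .py .pl].freeze
BOT_PATHS = %w[/wp-admin /wp-login.php /admin /phpmyadmin].freeze

def blocklisted?(path)
  BOT_EXTENSIONS.any? { |ext| path.end_with?(ext) } || BOT_PATHS.include?(path)
end

# Hypothetical routes; in a real audit, enumerate the application's route set
app_routes = %w[/dashboard /api/config /users /admin]

app_routes.select { |route| blocklisted?(route) }
# => ["/admin"] -- a legitimate-looking route that collides with the blocklist
```

Running this in CI surfaces collisions like /admin before they become production false positives.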
Deployment Considerations and Complementary Measures
Roll out bot-blocking middleware using a staged approach to avoid false positives. Start in monitoring mode by logging matched paths rather than blocking them:
def call(env)
  path = env["PATH_INFO"].to_s

  if bot_path?(path)
    Rails.logger.info("[BotBlock] Would block: #{path}")
    # return [404, {}, []] # Uncomment to enforce blocking
  end

  @app.call(env)
end
Review logs for a week, add legitimate patterns to an allowlist, then enable blocking.
Defence in depth: Rack middleware is your first line of defence, not your only one.
Combine it with upstream rules for maximum efficiency. Configure Nginx to reject bot patterns before they reach your application server:
# Nginx can use 403 or 444 (drop connection) — either is valid
# since Nginx is outside the app and not mimicking a "not found" response
location ~ /\.(env|git|aws) {
    return 444;
}

location ~ \.(php|asp|jsp)$ {
    return 444;
}
At the CDN level (Cloudflare, Fastly), create firewall rules using threat scores or known bot signatures. This blocks attacks at the edge, reducing bandwidth costs.
Add rate limiting as a complementary layer using Rack::Attack — middleware alone won't stop volumetric attacks from rotating IPs. For persistent offenders, implement IP-based blocking with temporary bans (e.g., 24-hour blocks after 100 rejected requests).
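That ban policy can be sketched with Rack::Attack's Fail2Ban helper. This is an initializer fragment; the thresholds and the trigger pattern are illustrative values, not tuned recommendations:

```ruby
# config/initializers/rack_attack.rb (sketch)
# Fail2Ban counts matching requests per discriminator; once maxretry
# requests land within findtime seconds, the IP is banned for bantime.
Rack::Attack.blocklist("ban-persistent-bot-scanners") do |req|
  Rack::Attack::Fail2Ban.filter("bot-scan-#{req.ip}",
                                maxretry: 100,      # rejected probes before a ban
                                findtime: 3600,     # counted within one hour
                                bantime: 86_400) do # then a 24-hour ban
    # Count only requests that look like scanner probes
    req.path.end_with?(".php") || req.path.include?("/.env")
  end
end
```

Fail2Ban needs a cache store (Rack::Attack.cache) backed by Redis or Memcached to track counts across processes.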
Monitor your middleware's effectiveness by tracking its blocked 404 responses in your analytics pipeline. If you see a probe pattern bypass detection, extend your matcher patterns immediately.