Rack Middleware as Your First Line of Defence: Blocking Bot Scanners Before They Hit Rails

Why Your Rails Error Logs Are Full of Noise

Every public-facing Rails application is under constant reconnaissance. Automated bot scanners continuously probe for exposed credentials, configuration files, and common CMS admin panels — often attempting thousands of requests per day for paths like /.env, /wp-admin, /aws/credentials, and /.git/config.

These requests don't simply bounce off your application with a harmless 404. They penetrate deep into the Rails stack, triggering a cascade of unnecessary operations:

Routing noise as Rails tries to match /xmlrpc.php or /phpmyadmin/ against your route definitions, generating 404 responses
Middleware overhead as each probe traverses the full Rack stack — session loading, parameter parsing, and routing — before returning a 404
Database queries executed by before_action filters in your ApplicationController (authentication checks, tenant lookups), which fire for requests heading toward a 404 whenever a catch-all route or custom not-found action hands the probe to a controller
Error monitoring noise that obscures real application errors in your exception tracker

The result? Error logs flooded with exceptions for paths your application never intended to serve, obscuring genuine errors that need investigation. Your monitoring tools count these as application errors, skewing your metrics and potentially triggering false alerts.

Key insight: By the time Rails returns a 404 for /.env.backup, the request has already passed through the whole middleware stack and the router, and, where a catch-all route hands it to a controller, through filters and database queries as well. Those wasted cycles are repeated thousands of times daily across bot traffic.

Why Rack Middleware Is the Right Layer for This

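Rack is the interface between your application server (Puma, typically behind Nginx) and your Rails application. A middleware inserted at the top of the stack (for example with config.middleware.insert_before 0, BotPathBlocker in config/application.rb, once the class file has been required) sees every HTTP request before the rest of the ActionDispatch chain, the router, or any controller code. This makes it the ideal interception point for bot scanner traffic.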

When a bot requests /wp-admin/install.php, rejecting it at the Rack layer means you avoid:

Router pattern matching across dozens or hundreds of routes
ActionDispatch middleware chain (session loading, parameter parsing, cookie handling)
Controller instantiation and filter execution
Database connection checkout
ActiveRecord queries triggered by before_action callbacks
View rendering and response serialisation

The request still enters the Rack middleware chain, but it never reaches routing, controllers, or your application logic. Compare this to alternatives:

Nginx rules block traffic even earlier — before it reaches Puma/Rack — and Nginx supports PCRE regex matching, conditionals, and graceful reloads. For pure performance, Nginx is cheaper. But Rack middleware is more portable: it ships with your app code, uses Ruby pattern matching, and deploys with your normal release process. For maximum protection, use both.

Controller filters run too late — the request has already consumed resources traversing the full middleware stack and matching routes.

External WAFs (Cloudflare, AWS WAF) add latency, monthly costs, and complexity. They excel at volumetric attacks but are overkill for simple pattern matching against known bot signatures.

Rack middleware gives you Rails-native pattern matching that stops rejected requests before routing, controllers, or database connections — minimal overhead compared to letting them traverse the full stack.
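The short-circuit itself can be shown without Rails at all. The sketch below is only an illustration: REACHED, RAILS_APP, and EARLY_FILTER are hypothetical names, and the lambda standing in for Rails represents everything downstream of the filter.

```ruby
# Records which layers a request reaches, to make the short-circuit visible.
REACHED = []

# Stand-in for everything downstream: sessions, router, controllers.
RAILS_APP = lambda do |_env|
  REACHED << :rails
  [200, { "content-type" => "text/plain" }, ["OK"]]
end

# Hypothetical early filter: rejects .php probes, passes everything else on.
EARLY_FILTER = lambda do |env|
  REACHED << :filter
  next [404, { "content-type" => "text/plain" }, []] if env["PATH_INFO"].to_s.end_with?(".php")

  RAILS_APP.call(env)
end

EARLY_FILTER.call({ "PATH_INFO" => "/wp-login.php" })
puts REACHED.inspect
```

For the probe path, only the filter appears in the trace; the downstream lambda is never invoked.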

Designing the Middleware: What to Block and How

Effective bot blocking requires multiple detection layers — a single matching strategy will miss probe variants. The middleware uses three complementary techniques:

Extension matching catches technology-specific probes like .php, .asp, or .jsp files. Scanners blindly test for common web platforms, and these extensions have no legitimate place in a Rails application.

Dot-directory and dot-file detection blocks attempts to access hidden configuration directories (.git, .aws, .docker) and sensitive files (.env, credentials). This protects accidentally exposed repositories or credential files.

Filename prefix variants handle obfuscated naming patterns. Scanners don't just probe for .env — they also test .env.backup, .env-production, .env_local, and .env-script.js. Without the middleware, these all reach the Rails router, traverse the middleware stack, and return 404 — adding noise to logs and consuming resources unnecessarily.

def bot_filename?(path)
  basename = path.split("/").last.to_s

  # Exact name or dot-separated variant (.env, .env.backup), then
  # dash/underscore variants of .env and ftpsync* settings files
  BOT_FILENAMES.any? { |name| basename == name || basename.start_with?("#{name}.") } ||
    basename.start_with?(".env-", ".env_", "ftpsync")
end

The middleware returns a 404 immediately when any pattern matches — before routing, CSRF checks, database connections, or controller logic execute.

Choosing the Right Layer

Rack middleware is the sweet spot for bot blocking in Rails: it ships with your app code, deploys with your normal release process, and rejects probes before routing, CSRF checks, or database queries run.

Nginx rules are cheaper still because they act before Puma/Rack, controller filters act too late, and external WAFs (Cloudflare, AWS WAF) bring latency, monthly costs, and complexity that simple pattern matching does not need.

For maximum protection, layer both: Rack middleware as your primary defence, Nginx rules upstream for defence in depth.

The Implementation: A Complete, Battle-Tested Middleware

Here's the complete middleware with inline commentary explaining each detection strategy:

class BotPathBlocker
  # Extensions commonly probed by PHP/ASP exploit scanners
  BOT_EXTENSIONS = %w[.php .asp .aspx .cgi .jsp .py .pl].freeze
  
  # Hidden directories scanners enumerate for credentials
  BOT_DOTDIRS = %w[.git .svn .aws .ssh .docker .kube].freeze
  
  # Config files with common naming variants
  BOT_FILENAMES = %w[.env .htaccess credentials wp-config].freeze
  
  # Exact paths frequently targeted (WordPress, CMS admin panels).
  # Remove any entry your app actually serves, such as /admin.
  BOT_PATHS = %w[/wp-admin /wp-login.php /admin /phpmyadmin].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    return block_request if bot_path?(env["PATH_INFO"].to_s)
    @app.call(env)
  end

  private

  def bot_path?(path)
    # Extension check: catches exploit scanners probing PHP/ASP endpoints
    return true if BOT_EXTENSIONS.any? { |ext| path.end_with?(ext) }
    
    # Dot-directory check: detects enumeration of version control and config dirs
    return true if path.split("/").any? { |segment| BOT_DOTDIRS.any? { |dir| segment.start_with?(dir) } }
    
    # Exact path check: blocks common CMS admin panel probes
    return true if BOT_PATHS.include?(path)
    
    # Filename check with prefix variants: handles .env, .env.backup, .env-production, .env_local
    bot_filename?(path)
  end

  def bot_filename?(path)
    basename = path.split("/").last.to_s
    BOT_FILENAMES.any? do |name|
      basename == name || 
      basename.start_with?("#{name}.") || 
      basename.start_with?("#{name}-") || 
      basename.start_with?("#{name}_")
    end
  end

  # Return minimal 404 with no body to avoid leaking server information.
  # Lowercase header names are required by Rack 3 and also work on Rack 2.
  def block_request
    [404, { "content-type" => "text/plain" }, []]
  end
end

The key insight: dash and underscore separators (-, _) were edge cases discovered in production. Scanners probe names such as .env-production.js and .env_local to bypass naive matchers that check only dot separators.

Edge Cases Discovered in Production

While the initial middleware implementation blocked obvious patterns, production log analysis revealed scanner behaviour that bypassed naïve matchers. The most common gap appeared in config file naming conventions: requests such as .env-production.js sailed through the middleware because they used dashes or underscores between segments rather than dots. (A probe like wp-config-sample.php was still caught, but only by the .php extension rule.)

The original bot_filename? method only checked for dot-separated variants:

def bot_filename?(path)
  basename = path.split("/").last.to_s
  BOT_FILENAMES.any? { |n| basename == n || basename.start_with?("#{n}.") }
end

This caught .env.backup but missed .env-backup, .env_production, and .env-script.js. Weekly log reviews showed these variants reaching the Rails router and generating 404 responses, adding noise to error monitoring and needlessly consuming resources through the full middleware stack.

The hardened version explicitly handles multiple separator patterns:

def bot_filename?(path)
  basename = path.split("/").last.to_s

  # Dot-separated variants for every listed name, plus the dash and
  # underscore variants of .env and ftpsync* settings files
  BOT_FILENAMES.any? { |n| basename == n || basename.start_with?("#{n}.") } ||
    basename.start_with?(".env-", ".env_", "ftpsync")
end

Key lesson: Matching only dot-separated variants is insufficient. Bot scanners actively probe separator variants (., -, _) to evade basic pattern filters.
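One way to avoid listing each separator by hand is to compile the filename list into a single anchored pattern. The sketch below is an illustration rather than the article's implementation: SENSITIVE_NAMES, SENSITIVE_NAME_PATTERN, and sensitive_filename? are hypothetical names, and the name list is copied from BOT_FILENAMES.

```ruby
# Hypothetical constant: mirrors BOT_FILENAMES from the middleware.
SENSITIVE_NAMES = %w[.env .htaccess credentials wp-config].freeze

# Matches a listed name on its own, or followed by ".", "-" or "_".
# Regexp.union escapes the literal dots in names such as ".env".
SENSITIVE_NAME_PATTERN = /\A(?:#{Regexp.union(SENSITIVE_NAMES).source})(?:[._-]|\z)/

# Returns true when the last path segment looks like a config-file probe.
def sensitive_filename?(path)
  basename = path.to_s.split("/").last.to_s
  SENSITIVE_NAME_PATTERN.match?(basename)
end

sensitive_filename?("/.env")             # true: exact name
sensitive_filename?("/.env-production")  # true: dash separator
sensitive_filename?("/app/.env_local")   # true: underscore separator
sensitive_filename?("/.envs")            # false: no separator after ".env"
sensitive_filename?("/environment")      # false: unrelated name
```

Requiring a separator or end-of-string after the name keeps the rule from matching unrelated filenames that merely share a prefix.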

Logging and Observability: Knowing What You're Blocking

Visibility into blocked traffic is essential — not just to confirm your middleware is working, but to spot emerging scanner patterns that aren't yet on your blocklist. Structured logging allows you to aggregate blocked requests without drowning application logs in noise.

Add logging directly to your middleware using Rails.logger with a [BotBlocker] prefix:

class BotPathBlocker
  def call(env)
    # to_s guards against a missing PATH_INFO, matching the base implementation
    path = env["PATH_INFO"].to_s

    if bot_path?(path)
      # Use REMOTE_ADDR as the primary source; only trust X-Forwarded-For
      # if your reverse proxy is configured to sanitise it
      ip = env["REMOTE_ADDR"]
      Rails.logger.info(
        "[BotBlocker] Blocked: #{env['REQUEST_METHOD']} #{path} from #{ip}"
      )
      return block_request
    end

    @app.call(env)
  end
end

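For production environments, consider incrementing metrics instead of (or alongside) log entries. A simple counter labelled by pattern type helps identify which rules trigger most frequently. Avoid using the raw path as a label, since every new probe path would create a new time series:

require "prometheus/client"

# registry.counter creates the counter and registers it for export
BLOCKED_COUNTER = Prometheus::Client.registry.counter(
  :blocked_bot_requests_total,
  docstring: "Blocked bot scanner requests",
  labels: [:pattern_type, :extension]
)
# When blocking: BLOCKED_COUNTER.increment(labels: { pattern_type: "extension", extension: ".php" })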

Review blocked request logs weekly to identify new scanner patterns. If you see repeated requests for .envs or .aws-credentials, add them to your blocklist before they multiply.

Keep blocked request logs in a separate namespace (e.g., tagged with [BotBlocker]) so they can be filtered out of standard application monitoring without losing visibility altogether.
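To make weekly reviews easier to aggregate, each blocked request can be emitted as one JSON line that records which rule fired. The following is a minimal sketch using only the standard library. classify_probe, blocked_log_line, and the field names are assumptions for illustration, and the rule lists are abbreviated copies of the middleware's constants.

```ruby
require "json"
require "time"

# Abbreviated, hypothetical copies of the middleware's rule lists.
PROBE_EXTENSIONS = %w[.php .asp .aspx .jsp].freeze
PROBE_DOTDIRS    = %w[.git .aws .ssh].freeze

# Returns a symbol naming the rule that matched, or nil for clean paths.
def classify_probe(path)
  return :extension if PROBE_EXTENSIONS.any? { |ext| path.end_with?(ext) }
  return :dotdir if path.split("/").any? { |seg| PROBE_DOTDIRS.any? { |dir| seg.start_with?(dir) } }

  basename = File.basename(path)
  return :filename if basename == ".env" || basename.start_with?(".env.", ".env-", ".env_")

  nil
end

# Builds one JSON log line from a Rack env hash.
def blocked_log_line(env, now: Time.now.utc)
  path = env["PATH_INFO"].to_s
  JSON.generate(
    tag: "BotBlocker",
    time: now.iso8601,
    method: env["REQUEST_METHOD"],
    path: path,
    ip: env["REMOTE_ADDR"],
    pattern_type: classify_probe(path)
  )
end

env = { "PATH_INFO" => "/.git/config", "REQUEST_METHOD" => "GET", "REMOTE_ADDR" => "203.0.113.7" }
puts blocked_log_line(env)
```

Inside the middleware, the line could be passed to Rails.logger.info, and the pattern_type value could double as the counter label.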

Testing the Middleware

Testing your middleware thoroughly prevents both security gaps and false positives that might block legitimate application routes. Start with unit tests that verify the middleware in isolation from Rails:

RSpec.describe BotPathBlocker do
  let(:app) { ->(env) { [200, {}, ["OK"]] } }
  let(:middleware) { described_class.new(app) }

  def call_with_path(path)
    middleware.call({ "PATH_INFO" => path })
  end

  it "blocks .env file requests" do
    status, = call_with_path("/.env")
    expect(status).to eq(404)
  end

  it "blocks dash-separated config variants" do
    status, = call_with_path("/.env-production")
    expect(status).to eq(404)
  end

  it "allows legitimate application paths" do
    status, = call_with_path("/dashboard")
    expect(status).to eq(200)
  end

  it "blocks PHP exploit probes" do
    status, = call_with_path("/wp-admin/install.php")
    expect(status).to eq(404)
  end
end

Integration tests verify that blocked requests never reach your Rails router. Mount a test route that increments a counter, then confirm bot paths never trigger it:

it "prevents blocked paths from reaching Rails routing" do
  counter = 0
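  Rails.application.routes.draw do
    get "/*path", to: ->(_env) { counter += 1; [200, {}, []] }
  end

  get "/.aws/credentials"
  expect(response.status).to eq(404)
  expect(counter).to eq(0)
ensure
  # routes.draw replaced the app's routes; restore them for later examples
  Rails.application.reload_routes!
end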

Critical: Test your blocklist against every route in rails routes. Patterns like /api/config or /files/.hidden might accidentally match bot detection rules, creating false positives in production.
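A quick audit along these lines can be scripted. The sketch below assumes you can obtain your application's paths as plain strings; in a Rails console, something like Rails.application.routes.routes.map { |r| r.path.spec.to_s } should produce them, though treat that call as an assumption to verify against your Rails version. false_positives and sample_matcher are hypothetical names, and the matcher is a simplified stand-in for the middleware's rules.

```ruby
# Returns the application paths that a blocklist matcher would reject.
# `matcher` is any callable taking a path string and returning true/false.
def false_positives(route_paths, matcher)
  route_paths
    .map { |spec| spec.sub("(.:format)", "") } # drop Rails' optional format suffix
    .uniq
    .select { |path| matcher.call(path) }
end

# Hypothetical stand-in for BotPathBlocker's matching logic.
sample_matcher = lambda do |path|
  %w[/wp-admin /admin /phpmyadmin].include?(path) ||
    path.split("/").any? { |segment| segment.start_with?(".git", ".aws") }
end

routes = ["/dashboard(.:format)", "/admin(.:format)", "/files/.github(.:format)"]
collisions = false_positives(routes, sample_matcher)
puts collisions.inspect
```

With these sample routes the audit reports /admin and /files/.github, the two paths that collide with the blocklist. An empty result is what you want before enabling blocking.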

Watch for Separator Variants

Scanners actively probe dash and underscore-separated config file variants like .env-production, .env_local, and .env-backup specifically to bypass naive filters that only match dot separators (e.g., .env.backup). Without the middleware, these variants reach the Rails router and generate 404 responses — adding noise to error monitoring and consuming resources through the full middleware stack for requests that could be rejected much earlier.

Deployment Considerations and Complementary Measures

Roll out bot-blocking middleware using a staged approach to avoid false positives. Start in monitoring mode by logging matched paths rather than blocking them:

def call(env)
  path = env["PATH_INFO"].to_s
  if bot_path?(path)
    Rails.logger.info("[BotBlocker] Would block: #{path}")
    # return [404, {}, []]  # Uncomment to enforce blocking
  end
  @app.call(env)
end

Review logs for a week, add legitimate patterns to an allowlist, then enable blocking.
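The middleware shown earlier has no allowlist, so here is one possible shape for it. This is a sketch under assumptions: AllowlistedBlocker, ALLOWED_PREFIXES, and the injected matcher are hypothetical, and the listed prefixes are only examples of paths an application might legitimately serve.

```ruby
# Hypothetical wrapper: consults an allowlist before applying bot rules.
class AllowlistedBlocker
  # Example entries only; fill this from your own monitoring-mode logs.
  ALLOWED_PREFIXES = %w[/.well-known/ /admin/reports].freeze

  # `matcher` is any callable returning true for paths that look like probes.
  def initialize(app, matcher:, allowed_prefixes: ALLOWED_PREFIXES)
    @app = app
    @matcher = matcher
    @allowed_prefixes = allowed_prefixes
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    return @app.call(env) if allowlisted?(path)
    return [404, { "content-type" => "text/plain" }, []] if @matcher.call(path)

    @app.call(env)
  end

  private

  def allowlisted?(path)
    @allowed_prefixes.any? { |prefix| path.start_with?(prefix) }
  end
end

app = ->(_env) { [200, {}, ["OK"]] }
matcher = ->(path) { path.include?("/.") || path.start_with?("/admin") }
blocker = AllowlistedBlocker.new(app, matcher: matcher)

blocker.call({ "PATH_INFO" => "/.well-known/security.txt" }).first # 200, allowlisted
blocker.call({ "PATH_INFO" => "/.env" }).first                     # 404, blocked
```

Checking the allowlist first means a broad rule can stay broad while the few legitimate exceptions remain explicit and easy to review.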

Defence in depth: Rack middleware is your first line of defence, not your only one.

Combine it with upstream rules for maximum efficiency. Configure Nginx to reject bot patterns before they reach your application server:

# Nginx can use 403 or 444 (drop connection) — either is valid
# since Nginx is outside the app and not mimicking a "not found" response
location ~ /\.(env|git|aws) {
  return 444;
}
location ~ \.(php|asp|jsp)$ {
  return 444;
}

At the CDN level (Cloudflare, Fastly), create firewall rules using threat scores or known bot signatures. This blocks attacks at the edge, reducing bandwidth costs.

Add rate limiting as a complementary layer using Rack::Attack — middleware alone won't stop volumetric attacks from rotating IPs. For persistent offenders, implement IP-based blocking with temporary bans (e.g., 24-hour blocks after 100 rejected requests).
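Rack::Attack is the tool to reach for in production. To illustrate only the bookkeeping behind "ban for 24 hours after 100 rejected requests", here is a self-contained, in-memory sketch. OffenderTracker and its method names are hypothetical, strike counts never decay, and the state is not shared across processes; a real deployment would keep these counters in a shared store.

```ruby
# Hypothetical in-memory tracker: bans an IP for a fixed period once it
# accumulates `limit` rejected requests.
class OffenderTracker
  def initialize(limit: 100, ban_seconds: 24 * 60 * 60, clock: -> { Time.now })
    @limit = limit
    @ban_seconds = ban_seconds
    @clock = clock
    @strikes = Hash.new(0) # ip => count of rejected requests
    @banned_until = {}     # ip => Time when the ban expires
    @lock = Mutex.new      # Puma serves requests from multiple threads
  end

  # Call each time the middleware rejects a request from `ip`.
  def record_rejection(ip)
    @lock.synchronize do
      @strikes[ip] += 1
      next unless @strikes[ip] >= @limit

      @banned_until[ip] = @clock.call + @ban_seconds
      @strikes.delete(ip)
    end
  end

  # True while the IP's ban has not yet expired.
  def banned?(ip)
    @lock.synchronize do
      expiry = @banned_until[ip]
      next false if expiry.nil?
      next true if @clock.call < expiry

      @banned_until.delete(ip)
      false
    end
  end
end

tracker = OffenderTracker.new(limit: 3)
3.times { tracker.record_rejection("198.51.100.9") }
puts tracker.banned?("198.51.100.9")
```

The injectable clock exists only so the expiry logic can be exercised in tests without waiting for a real ban period to elapse.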

Monitor your middleware's effectiveness by tracking the 404 responses it returns (for example via the [BotBlocker] log tag) in your analytics pipeline. If you see a pattern bypass detection, extend your matcher patterns immediately.
