Rack Middleware as Your First Line of Defence: Blocking Bot Scanners Before They Hit Rails

Posted on 15 March 2026

Why Your Rails Error Logs Are Full of Noise

Every public-facing Rails application is under constant reconnaissance. Automated bot scanners continuously probe for exposed credentials, configuration files, and common CMS admin panels — often attempting thousands of requests per day for paths like /.env, /wp-admin, /aws/credentials, and /.git/config.

These requests don't simply bounce off your application with a harmless 404. They penetrate deep into the Rails stack, triggering a cascade of unnecessary operations:

  • CSRF token verification failures when bots attempt to POST to admin panels that don't exist
  • Routing exceptions as Rails tries to match /xmlrpc.php or /phpmyadmin/ against your route definitions
  • Database queries executed by before_action filters in your ApplicationController (checking current user, loading site configuration, etc.) before Rails finally determines the route doesn't exist
  • ActionController instrumentation overhead for every single bot request that reaches your controller layer

The result? Error logs flooded with exceptions for paths your application never intended to serve, obscuring genuine errors that need investigation. Your monitoring tools count these as application errors, skewing your metrics and potentially triggering false alerts.

Key insight: By the time Rails returns a 404 for /.env.backup, your application has already spent milliseconds executing filters, checking sessions, and querying databases — wasted cycles repeated thousands of times daily across bot traffic.

Why Rack Middleware Is the Right Layer for This

Rack sits between your web server (Nginx, Puma) and your Rails application, processing every HTTP request before it touches ActionDispatch, the router, or any controller code. This makes it the ideal interception point for bot scanner traffic.

When a bot requests /wp-admin/install.php, rejecting it at the Rack layer means you avoid:

  • Router pattern matching across dozens or hundreds of routes
  • ActionDispatch middleware chain (CSRF protection, session loading, parameter parsing)
  • Controller instantiation and filter execution
  • Database connection checkout
  • ActiveRecord queries triggered by before_action callbacks
  • View rendering and response serialisation

The request never enters your application logic. Compare this to alternatives:

Nginx rules block traffic earlier but require deployment-specific configuration, regexp expertise, and lack access to Rails conventions (you can't easily match "any path ending with a list of 40 config filename variants"). Changes require web server reloads rather than application deploys.

Controller filters run too late — the request has already consumed resources traversing the middleware stack and matching routes. CSRF failures still generate exceptions and error reports.

External WAFs (Cloudflare, AWS WAF) add latency, monthly costs, and complexity. They excel at volumetric attacks but are overkill for simple pattern matching against known bot signatures.

Rack middleware gives you Rails-native pattern matching with zero application overhead for rejected requests.
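
The Rack contract itself is tiny: a middleware is any object with a call(env) method returning a [status, headers, body] triple. A minimal sketch of the interception pattern (class and parameter names here are illustrative, not the article's final implementation):

```ruby
# Minimal Rack middleware sketch: short-circuit matching requests,
# pass everything else down the stack untouched.
class PathRejector
  def initialize(app, blocked_prefix: "/.env")
    @app = app
    @blocked_prefix = blocked_prefix
  end

  def call(env)
    if env["PATH_INFO"].to_s.start_with?(@blocked_prefix)
      # Respond directly; the inner app (Rails) never sees the request.
      [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
    else
      @app.call(env)
    end
  end
end

# A lambda is a valid Rack app, which makes the pattern easy to exercise:
inner_app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["hello"]] }
middleware = PathRejector.new(inner_app)
status, = middleware.call({ "PATH_INFO" => "/.env" })
puts status # => 403
```

Because the middleware returns its own triple for matched paths, the inner application's call method is simply never invoked for them.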

Designing the Middleware: What to Block and How

Effective bot blocking requires multiple detection layers — a single matching strategy will miss probe variants. The middleware uses three complementary techniques:

Extension matching catches technology-specific probes like .php, .asp, or .jsp files. Scanners blindly test for common web platforms, and these extensions have no legitimate place in a Rails application.

Dot-directory and dot-file detection blocks attempts to access hidden configuration directories (.git, .aws, .docker) and sensitive files (.env, credentials). This protects accidentally exposed repositories or credential files.

Filename prefix variants handle obfuscated naming patterns. Scanners don't just probe for .env — they also test .env.backup, .env-production, .env_local, and .env-script.js. The dash-separated variant is particularly insidious: requests for .env-script.js will reach your Rails router as JavaScript format, bypass CSRF protection, and trigger unnecessary database queries before returning 404.

def bot_filename?(path)
  basename = path.split("/").last.to_s
  
  BOT_FILENAMES.any? { |name| 
    basename == name || basename.start_with?("#{name}.")
  } ||
  basename.start_with?(".env-") ||
  basename.start_with?(".env_") ||
  basename.start_with?("ftpsync")
end

The middleware returns 403 Forbidden immediately when any pattern matches — before CSRF tokens are checked, database connections are established, or any Rails code executes.

The Implementation: A Complete, Battle-Tested Middleware

Here's the complete middleware with inline commentary explaining each detection strategy:

class BotPathBlocker
  # Extensions commonly probed by PHP/ASP exploit scanners
  BOT_EXTENSIONS = %w[.php .asp .aspx .cgi .jsp .py .pl].freeze
  
  # Hidden directories scanners enumerate for credentials
  BOT_DOTDIRS = %w[.git .svn .aws .ssh .docker .kube].freeze
  
  # Config files with common naming variants
  BOT_FILENAMES = %w[.env .htaccess credentials wp-config].freeze
  
  # Exact paths frequently targeted (WordPress, CMS admin panels)
  BOT_PATHS = %w[/wp-admin /wp-login.php /admin /phpmyadmin].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    return block_request if bot_path?(env["PATH_INFO"].to_s)
    @app.call(env)
  end

  private

  def bot_path?(path)
    # Extension check: catches exploit scanners probing PHP/ASP endpoints
    return true if BOT_EXTENSIONS.any? { |ext| path.end_with?(ext) }
    
    # Dot-directory check: detects enumeration of version control and config dirs
    return true if path.split("/").any? { |segment| BOT_DOTDIRS.any? { |dir| segment.start_with?(dir) } }
    
    # Exact path check: blocks common CMS admin panel probes
    return true if BOT_PATHS.include?(path)
    
    # Filename check with prefix variants: handles .env, .env.backup, .env-production, .env_local
    bot_filename?(path)
  end

  def bot_filename?(path)
    basename = path.split("/").last.to_s
    BOT_FILENAMES.any? do |name|
      basename == name || 
      basename.start_with?("#{name}.") || 
      basename.start_with?("#{name}-") || 
      basename.start_with?("#{name}_")
    end
  end

  # Return a minimal 403 with an empty body to avoid leaking server information
  def block_request
    [403, { "Content-Type" => "text/plain" }, []]
  end
end

The key insight: dash and underscore prefixes (-, _) were edge cases discovered in production. Scanners probe .env-production.js and .config_backup.php to bypass naive matchers checking only dot separators.
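
To activate the middleware, register it early in the stack so bot paths are rejected before any other middleware runs. A sketch (the application module name is hypothetical, and whether index 0 is right for your app depends on what else you've inserted):

```ruby
# config/application.rb
module MyApp
  class Application < Rails::Application
    # Insert at the very top of the stack so blocked requests
    # never touch session loading, CSRF checks, or routing.
    config.middleware.insert_before 0, BotPathBlocker
  end
end
```

You can confirm the resulting position with bin/rails middleware, which prints the stack in execution order.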

Edge Cases Discovered in Production

While the initial middleware implementation blocked obvious patterns, production log analysis revealed sophisticated scanner behaviour that bypassed naive matchers. The most common gap appeared in config file naming conventions — requests for wp-config-sample.php and .env-production.js sailed through the middleware because they used dashes or underscores between segments rather than dots.

The original bot_filename? method only checked for dot-separated variants:

def bot_filename?(path)
  basename = path.split("/").last.to_s
  BOT_FILENAMES.any? { |n| basename == n || basename.start_with?("#{n}.") }
end

This caught .env.backup but missed .env-backup, .env_production, and config-script.js. Weekly log reviews showed these variants generating CSRF exceptions as they reached Rails controllers, triggering database queries and filter chains before ultimately failing.

The hardened version explicitly handles multiple separator patterns:

def bot_filename?(path)
  basename = path.split("/").last.to_s
  BOT_FILENAMES.any? { |n| basename == n || basename.start_with?("#{n}.") } ||
    basename.start_with?(".env-") ||
    basename.start_with?(".env_") ||
    basename.start_with?("ftpsync")
end

Key lesson: Extension-only matching is insufficient. Bot scanners actively probe separator variants (., -, _) to evade basic pattern filters.

Logging and Observability: Knowing What You're Blocking

Visibility into blocked traffic is essential — not just to confirm your middleware is working, but to spot emerging scanner patterns that aren't yet on your blocklist. Structured logging allows you to aggregate blocked requests without drowning application logs in noise.

Add logging directly to your middleware using Rails' tagged logging:

class BotPathBlocker
  def call(env)
    path = env["PATH_INFO"].to_s
    
    if bot_path?(path)
      Rails.logger.info(
        "[BotBlocker] Blocked: #{env['REQUEST_METHOD']} #{path} " \
        "from #{env['HTTP_X_FORWARDED_FOR'] || env['REMOTE_ADDR']}"
      )
      return [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
    end
    
    @app.call(env)
  end
end

For production environments, consider incrementing metrics instead of (or alongside) log entries. A simple counter with path labels helps identify which patterns trigger most frequently:

BLOCKED_COUNTER = Prometheus::Client::Counter.new(
  :blocked_bot_requests_total,
  docstring: 'Blocked bot scanner requests',
  labels: [:pattern_type, :extension]
)

Review blocked request logs weekly to identify new scanner patterns. If you see repeated requests for .envs or .aws-credentials, add them to your blocklist before they multiply.

Keep blocked request logs in a separate namespace (e.g., tagged with [BotBlocker]) so they can be filtered out of standard application monitoring without losing visibility altogether.
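
If you don't run Prometheus, even an in-process tally gives useful signal. A minimal sketch (the class name is illustrative; the counter is per-process and resets on restart, so it only suits rough trend-spotting, not durable metrics):

```ruby
# Thread-safe per-process counter keyed by the pattern type that matched.
class BlockTally
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Record one blocked request under a pattern type, e.g. :extension
  def increment(pattern_type)
    @mutex.synchronize { @counts[pattern_type] += 1 }
  end

  # Return a copy safe to read while other threads keep incrementing
  def snapshot
    @mutex.synchronize { @counts.dup }
  end
end

tally = BlockTally.new
tally.increment(:extension)
tally.increment(:extension)
tally.increment(:dot_directory)
puts tally.snapshot # => {:extension=>2, :dot_directory=>1}
```

Dumping the snapshot on a timer, or exposing it via an authenticated endpoint, is enough to spot which patterns fire most often.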

Testing the Middleware

Testing your middleware thoroughly prevents both security gaps and false positives that might block legitimate application routes. Start with unit tests that verify the middleware in isolation from Rails:

RSpec.describe BotPathBlocker do
  let(:app) { ->(env) { [200, {}, ["OK"]] } }
  let(:middleware) { described_class.new(app) }

  def call_with_path(path)
    middleware.call({ "PATH_INFO" => path })
  end

  it "blocks .env file requests" do
    status, = call_with_path("/.env")
    expect(status).to eq(403)
  end

  it "blocks dash-separated config variants" do
    status, = call_with_path("/.env-production")
    expect(status).to eq(403)
  end

  it "allows legitimate application paths" do
    status, = call_with_path("/developers/calculator")
    expect(status).to eq(200)
  end
end

Integration tests verify that blocked requests never reach your Rails router. Mount a test route that increments a counter, then confirm bot paths never trigger it:

it "prevents blocked paths from reaching Rails routing" do
  counter = 0
  Rails.application.routes.draw do
    get "/*path", to: ->(_) { counter += 1; [200, {}, []] }
  end

  get "/.aws/credentials"
  expect(response.status).to eq(403)
  expect(counter).to eq(0)
end

Critical: Test your blocklist against every route listed by rails routes. Patterns like /api/config or /files/.hidden might accidentally match bot detection rules, creating false positives in production.

Watch for Separator Variants

Scanners actively probe dash and underscore-separated config file variants like .env-production, .env_local, and .env-backup specifically to bypass naive filters that only match dot separators (e.g., .env.backup). In production, these variants were observed reaching the Rails router, triggering CSRF exceptions and unnecessary database queries before returning 404.

Your blocklist must explicitly test against multiple separator patterns (., -, _). Extension-only matching is insufficient — always test your middleware against all three separator types and review blocked request logs weekly to catch new evasion patterns before they multiply.
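
As a quick standalone sanity check of separator coverage, the generalised matcher from the complete implementation can be exercised outside Rails entirely:

```ruby
BOT_FILENAMES = %w[.env .htaccess credentials wp-config].freeze

# Generalised matcher: exact filename plus dot, dash, and underscore separators
def bot_filename?(path)
  basename = path.split("/").last.to_s
  BOT_FILENAMES.any? do |name|
    basename == name ||
      basename.start_with?("#{name}.", "#{name}-", "#{name}_")
  end
end

# Probe all three separator variants plus a legitimate path
%w[/.env.backup /.env-production /.env_local /developers/calculator].each do |path|
  puts "#{path}: #{bot_filename?(path) ? 'blocked' : 'allowed'}"
end
# => /.env.backup: blocked
# => /.env-production: blocked
# => /.env_local: blocked
# => /developers/calculator: allowed
```

Note that start_with? accepts multiple prefixes, which keeps the three separator checks in one call.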

Deployment Considerations and Complementary Measures

Roll out bot-blocking middleware using a staged approach to avoid false positives. Start in monitoring mode by logging matched paths rather than blocking them:

def call(env)
  path = env["PATH_INFO"].to_s
  if bot_path?(path)
    Rails.logger.info("[BotBlock] Would block: #{path}")
    # return [403, {}, ["Forbidden"]]  # Commented during monitoring
  end
  @app.call(env)
end

Review logs for a week, add legitimate patterns to an allowlist, then enable blocking.
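
The allowlist check belongs before any blocking rule, so legitimate routes that happen to match a pattern are never rejected. A sketch (AllowlistedBlocker and its ALLOWED_PATHS entries are hypothetical names you'd populate from your own monitoring-mode logs):

```ruby
# Sketch: allowlist consulted before any blocking rule fires.
class AllowlistedBlocker
  # Hypothetical entries gathered during monitoring-mode log review
  ALLOWED_PATHS = ["/legacy/report.php"].freeze
  BOT_EXTENSIONS = %w[.php .asp .jsp].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    # Known-good paths pass through even when they match a bot pattern
    return @app.call(env) if ALLOWED_PATHS.include?(path)
    return forbidden if BOT_EXTENSIONS.any? { |ext| path.end_with?(ext) }

    @app.call(env)
  end

  private

  def forbidden
    [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
  end
end
```

Here /legacy/report.php would normally be caught by the .php extension rule, but the allowlist check short-circuits first and hands the request to the application.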

Defence in depth: Rack middleware is your first line of defence, not your only one.

Combine it with upstream rules for maximum efficiency. Configure Nginx to reject bot patterns before they reach your application server:

location ~ /\.(env|git|aws) {
  return 403;
}
location ~ \.(php|asp|jsp)$ {
  return 403;
}

At the CDN level (Cloudflare, Fastly), create firewall rules using threat scores or known bot signatures. This blocks attacks at the edge, reducing bandwidth costs.

Add rate limiting as a complementary layer using Rack::Attack — middleware alone won't stop volumetric attacks from rotating IPs. For persistent offenders, implement IP-based blocking with temporary bans (e.g., 24-hour blocks after 100 rejected requests).
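
With Rack::Attack, the temporary-ban idea above can be expressed using its Fail2Ban-style filter. A sketch of the initializer (thresholds and the path patterns are illustrative, and Rack::Attack needs a cache store such as Rails.cache configured):

```ruby
# config/initializers/rack_attack.rb
class Rack::Attack
  # Ban an IP for 24 hours once it accumulates 100 requests
  # matching known bot paths within a one-hour window.
  blocklist("fail2ban:bot-scanners") do |req|
    Rack::Attack::Fail2Ban.filter(
      "bot-scanners-#{req.ip}",
      maxretry: 100,
      findtime: 3600,    # counting window, in seconds
      bantime: 86_400    # 24-hour ban, in seconds
    ) do
      # Count this request toward the ban only if it matches a bot pattern
      req.path.start_with?("/.env", "/wp-admin") || req.path.end_with?(".php")
    end
  end
end
```

This complements the Rack middleware: BotPathBlocker rejects individual probes cheaply, while Rack::Attack escalates persistent offenders to an outright IP ban.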

Monitor your middleware's effectiveness by tracking 403 responses in your analytics pipeline. If you see a pattern bypass detection, extend your matcher patterns immediately.