The Document Proxy Pattern: Coordinating File Access Across Integrated Services Without Duplicating Storage

Posted on 13 March 2026

Two Systems, One Document, Zero Shared Credentials

Consider a case study: Platform A generates client reports as PDFs. Platform B orchestrates the client engagement workflow. When Platform A generates a report, Platform B's users need to view it — but those users have accounts only on Platform B, with no credentials for Platform A.

The naive redirect approach fails immediately: sending users to Platform A's download URL returns a 401. Signed URLs with expiry tokens work temporarily but create a brittle dependency on token lifetimes and complicate browser caching.

The duplicate storage approach solves authentication but introduces new problems. Storing 10 MB PDFs in both systems doubles infrastructure costs. Worse, it creates synchronisation drift: if Platform A regenerates a corrected report, Platform B's copy remains stale unless you build a complex invalidation mechanism.

Neither approach handles credential isolation properly. Users shouldn't possess Platform A's API keys, and Platform A shouldn't issue per-user tokens for an external system it doesn't manage.

This is precisely where the document proxy pattern emerges: Platform B stores only the download URL, then provides its own endpoint that fetches documents using organisation-level credentials and streams them to authenticated users.

The Document Proxy Pattern: An Architectural Overview

The Document Proxy Pattern solves a common integration challenge: two platforms need to share documents, but users of each system lack credentials for the other. The architecture has three core components working in concert.

Platform A (the originating system) notifies Platform B via webhook when a document is uploaded. Platform B stores only the document's download URL—not the file itself. When a Platform B user requests the document, Platform B's proxy endpoint fetches it using stored organisation-level credentials and streams it back.

The data flow is linear: (1) Platform A fires webhook with document URL, (2) Platform B validates HMAC signature and stores URL in its database, (3) Platform B user clicks document link, (4) proxy endpoint authenticates user, (5) proxy fetches PDF from Platform A using API credentials, (6) proxy streams response to user. This maintains credential isolation—end users never see service-to-service API keys, and Platform A never authenticates individual Platform B users directly.

Side One: The Originating Platform's Webhook Notification

When Platform A detects a document upload event, the webhook notification serves as a lightweight metadata courier rather than a file transport mechanism. The triggering event typically fires after the file has been successfully attached and persisted — in Rails with Active Storage, this means explicitly enqueuing the webhook job from the controller action rather than relying on model callbacks, since ActiveStorage::Attachment records don't trigger after_commit on the parent model.

The payload construction follows a principle of minimal data transfer: include immutable metadata (record identifiers, filename, upload timestamp) and a fetchable URL, but never the file contents themselves. A typical payload looks like:

{
  "RecordID": "abc-123",
  "DocumentUrl": "https://platform-a.example/api/v1/records/abc-123/document",
  "DocumentFilename": "contract.pdf",
  "UploadedAt": "2024-03-15T14:32:00Z"
}

HMAC request signing ensures integrity and authenticity. Compute the signature over the JSON payload body using a shared secret, then attach it as an X-Webhook-Signature header. The consuming platform verifies this before processing.
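As a concrete sketch of the signing side (the hex-encoded HMAC-SHA256 scheme and the helper name are assumptions; only the X-Webhook-Signature header name comes from this article, so match whatever your API contract actually specifies):

```ruby
require "openssl"
require "json"

# Hypothetical signing helper: computes a hex HMAC-SHA256 over the raw
# JSON body using the shared secret.
def webhook_signature(secret, raw_body)
  OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
end

payload = JSON.generate(
  "RecordID"         => "abc-123",
  "DocumentUrl"      => "https://platform-a.example/api/v1/records/abc-123/document",
  "DocumentFilename" => "contract.pdf"
)

headers = {
  "Content-Type"        => "application/json",
  "X-Webhook-Signature" => webhook_signature("shared-secret", payload)
}
# POST `payload` with `headers` to Platform B's webhook endpoint.
```

Note that the signature must be computed over the exact bytes you send; re-serialising the JSON on either side can reorder keys and break verification.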

Design insight: Store the download URL in the payload, not a temporary signed URL. The consumer will fetch on-demand when users request the document, potentially hours or days later — signed URLs would expire.

Side Two: Receiving and Verifying the Webhook

Once Platform A dispatches its webhook, Platform B must verify authenticity, extract the payload, and persist the reference—crucially, storing the URL rather than fetching and re-storing the file. This decision eliminates duplicate storage, avoids Active Storage overhead on the receiving side, and establishes Platform A as the canonical document source.

The webhook receiver follows the established pattern: inherit HMAC verification from a base controller, parse the JSON payload, validate required fields (RecordID, DocumentUrl), and locate the target record via a cascading lookup scoped to the partner. This lookup typically checks three tiers in priority order: an explicit secondary ID (Platform B's own ID), the primary ID as Platform B's ID, then the primary ID as Platform A's reference stored in partner_record_id.

Idempotency is essential. Persist a document_received_at timestamp alongside the URL, allowing the controller to recognise repeat deliveries (webhooks may fire multiple times for a single upload). Use database constraints or upsert semantics where appropriate, and return 200 OK even for duplicate notifications—retries should not trigger errors.
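The idempotency check can be sketched without a database. The document_url and document_received_at attributes come from this article; the Struct and handler name below are stand-ins for an ActiveRecord model and controller action:

```ruby
require "time"

# Stand-in for an ActiveRecord model with the columns named in the article.
WebhookRecord = Struct.new(:document_url, :document_received_at)

# Returns :stored on first delivery and :duplicate on retries. Either way
# the controller responds 200 OK, so the sender's retry loop terminates.
def handle_document_webhook(record, payload, now: Time.now.utc)
  return :duplicate if record.document_received_at

  record.document_url = payload["DocumentUrl"]
  record.document_received_at = now
  :stored
end

record = WebhookRecord.new
payload = { "DocumentUrl" => "https://platform-a.example/api/v1/records/abc-123/document" }

handle_document_webhook(record, payload) # => :stored
handle_document_webhook(record, payload) # => :duplicate on redelivery
```

In a real Rails receiver, a unique index or upsert protects against two concurrent deliveries racing past this check.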

The Cascading Resource Lookup Pattern

When Platform B receives a document notification webhook from Platform A, it must locate the correct resource within its own database—but the webhook payload references Platform A's identifiers, not Platform B's internal IDs. The cascading resource lookup pattern resolves this mismatch by attempting multiple association paths in priority order, each scoped to the authenticated partner organisation.

Consider a document belonging to a record that belongs to a project that belongs to a user. Platform B's webhook receiver first attempts to match the incoming SecondaryID (Platform B's own ID, if the resource originated there). If not found, it tries PrimaryID as Platform B's internal ID. Finally, it checks PrimaryID against partner_record_id, the stored reference for resources originating on Platform A:

def find_record(payload)
  # All lookups are scoped to the authenticated partner organisation,
  # so valid IDs belonging to other organisations can never match.
  scope = Record.for_partner(partner)

  scope.find_by(id: payload["SecondaryID"]) ||             # Platform B's own ID
    scope.find_by(id: payload["PrimaryID"]) ||             # PrimaryID as B's internal ID
    scope.find_by(partner_record_id: payload["PrimaryID"]) # PrimaryID as A's reference
end

This three-tier cascade handles bidirectional resource ownership whilst the .for_partner(partner) scope ensures proper authorisation boundaries—Platform B never queries records belonging to other organisations, even if a malicious payload supplies valid IDs from elsewhere.
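The cascade's priority order can be exercised without a database. PartnerScope below is a runnable stand-in for the for_partner ActiveRecord relation; the record and ID values are illustrative:

```ruby
# In-memory stand-in for a partner-scoped ActiveRecord relation.
PartnerRecord = Struct.new(:id, :partner_record_id)

class PartnerScope
  def initialize(records)
    @records = records
  end

  # Mimics ActiveRecord's find_by: first record matching all conditions.
  def find_by(conditions)
    @records.find { |r| conditions.all? { |attr, val| r.public_send(attr) == val } }
  end
end

def find_record(scope, payload)
  scope.find_by(id: payload["SecondaryID"]) ||
    scope.find_by(id: payload["PrimaryID"]) ||
    scope.find_by(partner_record_id: payload["PrimaryID"])
end

scope = PartnerScope.new([
  PartnerRecord.new("b-1", nil),   # originated on Platform B
  PartnerRecord.new("b-2", "a-9")  # originated on Platform A
])

find_record(scope, "SecondaryID" => "b-1") # tier 1: B's own ID
find_record(scope, "PrimaryID" => "a-9")   # tier 3: A's reference
```

A payload carrying an ID from another organisation simply falls through all three tiers and returns nil, which the receiver treats as "record not found".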

Building the Proxy Endpoint That Streams Documents

The proxy endpoint bridges two authentication boundaries: verifying the requesting user belongs to an organisation with access, then using that organisation's stored credentials to fetch the document from Platform A. The controller action performs a cascading resource lookup to authorise access, fetches the PDF via HTTP with server-side credentials, and streams the response directly to the browser.

def show
  # Same cascading lookup as the webhook receiver, scoped to the
  # requesting user's organisation.
  record = find_record_via_cascade(params[:id])
  return render_not_found unless record && current_user.can_view?(record)

  # Fetched with organisation-level credentials, never the user's session.
  document_data = fetch_document_from_platform_a(record)
  return render_error unless document_data

  send_data document_data[:body],
    type: "application/pdf",
    disposition: "inline",
    filename: record.document_filename
end

Serving the document with send_data gives you complete control over headers, but note that the proxy holds the fetched PDF in memory for the duration of the request; for typical report-sized files this is fine, while very large files may warrant chunked streaming via ActionController::Live. The disposition: "inline" directive tells browsers to render the PDF in-tab rather than forcing a download. For error scenarios, return a 502 Bad Gateway status when Platform A is unreachable, and 404 Not Found when the resource doesn't exist or the user lacks permission: never leak existence information through differing error responses.
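The fetch_document_from_platform_a helper is not shown above; here is a minimal sketch with Net::HTTP, assuming an X-Api-Key header and explicit timeouts (the helper name, header name, and timeout values are illustrative):

```ruby
require "net/http"
require "uri"

# Hypothetical fetch helper for the proxy action. Returns a
# { body:, content_type: } hash on success, or nil when the upstream is
# unreachable or errors; the controller maps nil to a 502 response
# without leaking the raw failure to the end user.
def fetch_document(url, api_key:, timeout: 10)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = timeout
  http.read_timeout = timeout

  response = http.get(uri.request_uri, { "X-Api-Key" => api_key })
  return nil unless response.is_a?(Net::HTTPSuccess)

  { body: response.body, content_type: response["Content-Type"] }
rescue Net::OpenTimeout, Net::ReadTimeout, SocketError, SystemCallError
  nil
end
```

Setting both open_timeout and read_timeout matters: without them, a hung upstream ties up a web worker indefinitely rather than degrading to a 502.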

Active Storage Gotchas: When the Framework Fights the Pattern

Active Storage's design assumptions can create friction with the document proxy pattern. The framework expects to manage blob storage locally, which conflicts with our "store URL, not file" approach. If you're tempted to call attach(io: remote_io, filename: ...) to create an Active Storage blob from a partner's document, you'll end up duplicating the file — precisely what the pattern aims to avoid.

The proxy endpoint becomes critical here. Rather than generating signed Active Storage URLs (which expire and require redirect handling), use send_data to stream the PDF directly through your own endpoint. This gives you complete control over authentication and eliminates URL expiry concerns.

Another gotcha: Active Storage attachments don't trigger after_commit callbacks on the parent model because they're separate database records. If you need to notify a partner service after upload, explicitly enqueue the job in your controller action rather than relying on model callbacks. This keeps the control flow visible and debuggable, avoiding mysterious callback chains that fire on ActiveStorage::Attachment records instead of your domain models.

Security Considerations Across the Integration

When file access bridges two systems, security becomes a distributed problem. The HMAC signature on incoming webhooks provides the first line of defence—verify the signature before trusting any payload data, because an attacker who can forge notifications could inject malicious URLs or point your proxy at internal resources. The webhook receiver validates authenticity before parsing JSON, rejecting spoofed requests at the earliest possible moment.
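The receiving-side check can be sketched in plain Ruby. It assumes the same hex HMAC-SHA256 scheme as the sender (an assumption; follow your API contract), and uses a constant-time comparison so the check doesn't leak match position through timing differences:

```ruby
require "openssl"

# Hypothetical verifier for the X-Webhook-Signature header. Must be run
# against the raw request body, before any JSON parsing.
def valid_webhook_signature?(secret, raw_body, header_value)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
  # Constant-time comparison; a plain == would allow timing attacks.
  OpenSSL.secure_compare(expected, header_value.to_s)
end
```

In Rails this would typically live in a before_action on a base webhook controller, reading request.raw_post and rejecting the request with 401 before any payload handling.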

Credential isolation prevents lateral movement: users of the partner platform cannot call your API, and your users cannot call theirs. Each system authenticates with organisation-level credentials stored securely (environment variables, encrypted settings), never exposing these tokens to end users. The proxy endpoint enforces this boundary—it uses the stored API key to fetch documents, then streams them to authenticated users who've passed your own authorisation checks:

def show
  authorize @record  # Your platform's authorization
  # Fetch using service credentials, not user session
  pdf = fetch_document_from_organization(@record)
  send_data pdf[:body], type: "application/pdf"
end

Transport-layer security completes the picture: enforce TLS 1.2+ for all webhook and API traffic, rotate credentials quarterly, and log all cross-service requests for audit trails. The proxy pattern concentrates security decisions at clearly defined boundaries rather than scattering them across user sessions.

Testing the Full Integration

Testing a two-sided integration requires validating both the webhook sender and receiver independently before smoke-testing the full round-trip. Start with unit tests for HMAC signing and verification — these are pure functions that should never hit the network. On the sending side, verify that your signature generation matches the algorithm documented in your API contract. On the receiving side, test both valid signatures and tampered payloads to confirm rejection.

Integration tests for the webhook receiver should use fixture JSON payloads that match your real webhook structure. Test the happy path (valid payload creates/updates the record), missing required fields, and the cascading resource lookup logic with different ID combinations. Stub the remote fetch in your proxy endpoint tests using WebMock or similar:

before do
  stub_request(:get, "https://partner.example.com/api/v1/records/7/document")
    .with(headers: { "X-Api-Key" => "test-key" })
    .to_return(status: 200, body: "%PDF-1.0 test", 
               headers: { "Content-Type" => "application/pdf" })
end

Key insight: Test failure modes explicitly — 500 responses from the remote service, network timeouts, expired credentials. Your proxy should degrade gracefully rather than exposing raw errors to end users.

Finally, run end-to-end smoke tests in a staging environment: upload a real document, verify the webhook fires, confirm the URL is stored, and test the proxy endpoint returns the PDF. This catches integration issues that unit tests miss, such as mismatched API versions or certificate problems.

When to Use This Pattern and When to Reach for Something Else

This pattern excels in environments where credential isolation is paramount—when users in System A shouldn't have direct API access to System B, yet need to access documents stored there. It's particularly well-suited to moderate document volumes (hundreds to low thousands per month) where simplicity and maintainability outweigh peak throughput concerns.

When this pattern shines:

  • You're integrating two applications with distinct user bases and authentication boundaries
  • Document volume is predictable and moderate (typically < 5,000 fetches/day)
  • You want to avoid duplicating storage infrastructure and keep a single source of truth
  • Your team values straightforward request paths over complex distributed storage

When to consider alternatives:

  • High-volume document access (>100 req/sec): CDN with pre-signed URLs or edge caching
  • Multiple consuming applications: a dedicated document service with federated auth
  • Large files (>50 MB): shared object storage (S3) with temporary credentials
  • Offline or unreliable networks: local replication with eventual consistency

Key consideration: The proxy introduces a runtime dependency. If the originating platform experiences downtime, consuming applications lose document access immediately. For mission-critical documents, evaluate whether this coupling is acceptable or whether replication provides necessary resilience.

Simplicity as a Feature

The Document Proxy Pattern's greatest strength lies in what it doesn't require. There's no duplicated storage consuming disk space and bandwidth. No shared credentials creating security boundaries between systems. No complex synchronisation logic racing to keep copies aligned. Each application maintains full ownership of its concerns: the originating platform controls the document lifecycle, whilst the consuming platform manages its users' access patterns.

When integration feels heavy, look for what you can avoid storing rather than how cleverly you can replicate it.

This pattern appears wherever credential boundaries matter — partner APIs, multi-tenant platforms, microservices with separate authentication domains. The next time you're tempted to download, transform, and re-store a resource "just to be safe", consider whether a URL, a proxy endpoint, and a clear ownership model might serve you better. Simplicity scales. The fewer moving parts in your integration, the fewer places it can break, and the less cognitive load for the next developer maintaining it.
