Designing a Two-Tier Multi-Tenant Data Model: Shared Global Records with Tenant-Scoped Overrides

The Problem: Why Naive Multi-Tenancy Leads to Data Duplication

In a typical multi-tenant SaaS platform, the simplest approach is to duplicate all data per tenant. Need product catalogues for fifty customers? Create fifty copies of the same supplier data. This naïve strategy collapses quickly when you're dealing with supplier product feeds that are scraped or ingested centrally.

Consider a B2B procurement platform where suppliers publish catalogues of 10,000+ SKUs. If you scrape that data once and duplicate it across every tenant, you're storing redundant copies of base product information—name, barcode, pack size, brand—that rarely changes. When a supplier updates a product description, you must now propagate that change across hundreds of tenant databases. Worse, if a tenant manually edits a field (correcting a typo in the supplier's data), your next import risks overwriting their correction.

The real requirement is shared global records with tenant-scoped overrides: base supplier data lives in a central table, scraped once and reused by all tenants. Tenant-specific concerns (negotiated pricing, custom availability rules, internal notes) live in separate, scoped tables that layer on top. This two-tier model eliminates duplication whilst preserving tenant isolation where it matters. The challenge is designing a clean interface that transparently falls back from tenant overrides to global defaults without leaking the abstraction throughout your codebase.

Introducing the Two-Tier Data Model

The two-tier data model separates concerns by storing shared, canonical data in global tables and tenant-specific deltas in scoped tables. For a product catalogue platform, this means supplier product data (names, descriptions, images, categories) lives in a single global products table, scraped once from the manufacturer's website. Each tenant then owns rows in a tenant_products table, scoped by tenant_id, that hold only the fields it has customised: retailer-specific pricing, availability flags, or bespoke descriptions.

This architectural pattern eliminates redundant scraping and storage. If 50 tenants sell the same widget, you store the base product data once, not 50 times. The global tier is authoritative for static, manufacturer-provided data. The tenant tier is authoritative for commercial and operational overrides. When a tenant updates pricing or marks a product unavailable, that change lives exclusively in their scoped record—other tenants remain unaffected.

The link between tiers is typically a belongs_to :global_product, optional: true association. Private products (tenant-uploaded rather than supplier-sourced) skip the global layer entirely, with global_product_id left null. This optional relationship preserves the model's flexibility whilst maintaining clear data lineage.

The Effective Delegation Pattern: Transparent Fallback in Practice

The effective delegation pattern provides a clean, transparent interface for falling back to global records when tenant overrides don't exist. Instead of forcing consumers to check tenant_price.present? ? tenant_price : global_price, the model exposes a single effective_price method that handles the logic internally.

The implementation is straightforward—each tenant-scoped attribute gets a corresponding effective_* method that checks the local override first, then delegates to the associated global record:

class TenantProduct < ApplicationRecord
  belongs_to :global_product, optional: true

  def effective_price
    price.presence || global_product&.price
  end

  def effective_description
    description.presence || global_product&.description
  end
end

This approach favours explicit delegation over method_missing. Whilst method_missing could dynamically route undefined calls to the global record, it makes debugging harder (stack traces become opaque) and breaks introspection: respond_to? won't recognise the delegated methods unless you also implement respond_to_missing?. Explicit methods are self-documenting, appear in code completion, and make the fallback behaviour obvious when reading the model.

Key trade-off: Explicit delegation requires one method per attribute (slightly more boilerplate), but gains clarity, performance (calls resolve directly instead of failing a full method lookup before reaching method_missing), and maintainability. Reserve method_missing for truly dynamic scenarios where the attribute set is unknowable at definition time.
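If the one-method-per-attribute boilerplate grows, a small class macro can generate the explicit methods with define_method. The generated methods are still real methods, so respond_to? and stack traces behave normally without any method_missing. The sketch below is plain Ruby so it runs outside Rails; EffectiveDelegation, effective_attributes and the blank? helper (a rough stand-in for ActiveSupport's presence that treats nil and whitespace-only strings as missing) are hypothetical names, and Structs stand in for the ActiveRecord models.

```ruby
# Hypothetical macro that generates explicit effective_* methods.
module EffectiveDelegation
  # Rough stand-in for ActiveSupport's blank?: nil or whitespace-only strings.
  def self.blank?(value)
    value.nil? || (value.is_a?(String) && value.strip.empty?)
  end

  # Defines effective_<attr> for each attribute: return the local value if it
  # is present, otherwise ask the record returned by the `to:` method.
  def effective_attributes(*attrs, to:)
    attrs.each do |attr|
      define_method("effective_#{attr}") do
        local = public_send(attr)
        return local unless EffectiveDelegation.blank?(local)

        public_send(to)&.public_send(attr)
      end
    end
  end
end

# Struct-based stand-ins for the global and tenant-scoped models.
GlobalProduct = Struct.new(:price, :description)

TenantProduct = Struct.new(:price, :description, :global_product) do
  extend EffectiveDelegation
  effective_attributes :price, :description, to: :global_product
end

global = GlobalProduct.new(9.99, "Supplier description")
TenantProduct.new(nil, "", global).effective_description # => "Supplier description"
TenantProduct.new(12.5, nil, global).effective_price     # => 12.5
```

In a Rails app the same module could be extended into ApplicationRecord and called as effective_attributes :price, :description, to: :global_product, keeping one line per model instead of one method per attribute.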

Schema Design and Migration Strategy

The foundation of this pattern rests on three table types: global records that hold shared supplier data, tenant-scoped records that store retailer-specific overrides, and canonical product records that both can reference. The global_suppliers table contains fields such as name, website and status, whilst the tenant-scoped tenant_suppliers table mirrors these fields (prefixed local_*) and adds tenant_id, a nullable global_supplier_id foreign key, and tenant-specific pricing columns.

class CreateTenantSuppliers < ActiveRecord::Migration[7.1]
  def change
    create_table :tenant_suppliers do |t|
      t.references :tenant, null: false, foreign_key: true
      # Nullable: private (tenant-uploaded) suppliers have no global record
      t.references :global_supplier, foreign_key: { to_table: :global_suppliers }
      t.string :local_name
      t.decimal :cost_price, precision: 10, scale: 2
      t.timestamps
    end

    # At most one link per tenant per global supplier; unlinked rows are excluded
    add_index :tenant_suppliers, [:tenant_id, :global_supplier_id],
              unique: true, where: "global_supplier_id IS NOT NULL"
  end
end

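Key insight: The composite unique constraint prevents a tenant from linking to the same global supplier twice, whilst the partial WHERE clause leaves private (unlinked) rows out of the index, so each tenant can keep any number of private suppliers. Partial indexes like this are supported on PostgreSQL and SQLite but not on MySQL.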

Indexing strategy requires compound indices on the main lookup paths: (tenant_id, status) for scoped queries, and (tenant_id, barcode) and (tenant_id, sku) for normalisation matching. Foreign keys to global records also need their own single-column index, since effective delegation joins on them frequently; t.references creates this index by default.

Structure your migrations so the global and tenant layers evolve independently: global supplier schema changes don't force tenant-scoped migrations, and vice versa. Give each migration a name that identifies its tier (CreateGlobalSuppliers, AddWebsiteToGlobalSuppliers) to keep the boundaries clear as the system scales.

Normalisation Pipeline: From Raw CSV to Clean Global Records

The normalisation pipeline transforms raw supplier feeds—CSV or Excel files—into canonical records that can be reused across tenants. The pipeline consists of three stages: parsing, mapping, and normalisation. Uploaded files are parsed using the CSV or Roo gems, producing a hash-per-row representation. The column mapping UI then presents operators with a form showing detected CSV headers alongside a dropdown of target fields (name, SKU, barcode, price). Auto-detection uses regex patterns to pre-select likely matches:

AUTO_DETECT_PATTERNS = {
  "sku" => /\b(sku|item.?code|product.?code|stock.?code)\b/i,
  "barcode" => /\b(barcode|ean|upc|gtin)\b/i,
}
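The pre-selection step could be as simple as taking, for each target field, the first header its pattern matches. This is a plain-Ruby sketch; auto_detect_mapping is a hypothetical helper, and the block repeats the two patterns above so it runs on its own. It returns the { target_field => source_column } shape that the mapping form can pre-populate, leaving unmatched fields for the operator to choose.

```ruby
# Patterns from the text, frozen so they cannot be mutated at runtime.
AUTO_DETECT_PATTERNS = {
  "sku" => /\b(sku|item.?code|product.?code|stock.?code)\b/i,
  "barcode" => /\b(barcode|ean|upc|gtin)\b/i,
}.freeze

# Hypothetical helper: map each target field to the first header matching its
# pattern. Targets with no match are omitted so the operator picks them manually.
def auto_detect_mapping(headers)
  AUTO_DETECT_PATTERNS.each_with_object({}) do |(target, pattern), mapping|
    header = headers.find { |candidate| candidate.to_s.match?(pattern) }
    mapping[target] = header if header
  end
end

auto_detect_mapping(["Product Name", "Stock Code", "EAN-13", "RRP"])
# sku => "Stock Code", barcode => "EAN-13"
```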

Once mappings are confirmed, they are stored as { target_field => source_column } to simplify lookup during normalisation. The normalisation service iterates over the target fields and extracts the corresponding CSV value for each. It then matches an existing global product by barcode (the most reliable key), falls back to SKU, and creates a new record only when neither matches. To preserve tenant edits, the "fill blank fields only" strategy updates nil columns and never overwrites existing values.
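The matching and fill-blank rules can be sketched in plain Ruby against an in-memory array standing in for the global products table. CatalogueProduct, extract_attributes, find_match, fill_blank_fields and normalise_row are hypothetical names; values stay as strings because type casting is out of scope here.

```ruby
# In-memory stand-in for the global products table (hypothetical).
CatalogueProduct = Struct.new(:name, :sku, :barcode, :price)

# Pull the mapped values out of one parsed row, treating blank cells as nil.
def extract_attributes(row, mapping)
  mapping.to_h do |target, source|
    value = row[source].to_s.strip
    [target.to_sym, value.empty? ? nil : value]
  end
end

# Barcode first (most reliable), then SKU.
def find_match(catalogue, attrs)
  by_barcode = attrs[:barcode] && catalogue.find { |p| p.barcode == attrs[:barcode] }
  by_barcode || (attrs[:sku] && catalogue.find { |p| p.sku == attrs[:sku] })
end

# "Fill blank fields only": never overwrite a value that is already set.
def fill_blank_fields(record, attrs)
  attrs.each do |field, value|
    record[field] = value if record[field].nil? && !value.nil?
  end
  record
end

def normalise_row(catalogue, row, mapping)
  attrs = extract_attributes(row, mapping)
  record = find_match(catalogue, attrs)
  unless record
    record = CatalogueProduct.new
    catalogue << record
  end
  fill_blank_fields(record, attrs)
end
```

With ActiveRecord the same lookup order would typically be two find_by calls (barcode, then SKU) before falling back to new.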

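Idempotency comes from separating data assignment from metadata updates. The service assigns the product fields, records the result of changed?, and only then updates last_seen_at: Time.current. Because last_seen_at differs on every run, setting it first would make every record look modified; capturing changed? beforehand prevents false change detection when identical data is re-imported.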
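The ordering matters because last_seen_at is itself an attribute: once it is set, changed? is always true. This plain-Ruby sketch uses a hypothetical TrackedRecord with minimal dirty tracking in place of an ActiveRecord model; sync_record is likewise hypothetical.

```ruby
# Hypothetical record with minimal dirty tracking, standing in for ActiveRecord.
# Every attribute, including last_seen_at, counts towards changed?.
class TrackedRecord
  def initialize(attributes)
    @attributes = attributes.dup
    @persisted = attributes.dup
  end

  def [](key)
    @attributes[key]
  end

  def []=(key, value)
    @attributes[key] = value
  end

  def changed?
    @attributes != @persisted
  end

  def save!
    @persisted = @attributes.dup
    true
  end
end

# Assign data fields, capture changed? BEFORE touching metadata, then stamp
# last_seen_at. Stamping first would make every re-import look like a change.
def sync_record(record, fields, seen_at)
  fields.each { |key, value| record[key] = value }
  changed = record.changed?
  record[:last_seen_at] = seen_at
  record.save!
  changed ? :updated : :unchanged
end
```

With ActiveRecord the equivalent sequence would be assign_attributes, reading changed?, setting last_seen_at, then save!.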

Global Models Need Special Treatment

Global records deliberately exclude tenant_id and should live entirely outside the acts_as_tenant gem's automatic scoping. Remove the acts_as_tenant declaration from global models altogether, and wrap any reads of global data in explicit unscoped blocks to prevent queries from silently filtering them out.

Without these precautions, tenant-scoped default scopes will exclude global records from query results or raise validation errors on save—causing subtle bugs that are difficult to trace, especially in background jobs where tenant context may not be set.

Integrating with acts_as_tenant: Pitfalls and Workarounds

The popular acts_as_tenant gem enforces automatic scoping of queries to the current tenant via default_scope. This creates immediate friction with global records that deliberately lack a tenant_id: queries will silently exclude them or raise validation errors when attempting to save.

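The cleanest workaround for read operations is to load the global record inside a block that lifts tenant scoping. Because unscoped is a class-level method, it can't be called on an associated record; with acts_as_tenant, ActsAsTenant.without_tenant does the job:

def effective_name
  # Tenant scoping is disabled only while the block runs
  local_name.presence || ActsAsTenant.without_tenant { global_supplier&.name }
end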

For write operations, consider bypassing acts_as_tenant entirely for your global models. Remove acts_as_tenant from the model and handle tenant context manually in controllers where needed:

class GlobalSupplier < ApplicationRecord
  # No acts_as_tenant declaration
end

Key takeaway: Global records should live outside the gem's automatic scoping. Use explicit unscoped blocks for associations or exclude global models from acts_as_tenant altogether.

Test coverage becomes critical here: the validate_uniqueness_of matcher from shoulda-matchers will fail for tenant-scoped models because it attempts saves outside tenant context. Replace these with manual validation specs wrapped in with_tenant blocks to avoid misleading test failures.

Clear Tenant State in Tests

The acts_as_tenant gem enforces tenant scoping via a thread-local variable that can leak between RSpec examples if not explicitly cleared. Always add an after(:each) hook in your rails_helper.rb to reset the current tenant:

config.after(:each) { ActsAsTenant.current_tenant = nil }

Wrap all tenant-dependent specs in with_tenant blocks, and be aware that shoulda-matchers' validate_uniqueness_of will fail for tenant-scoped models unless run within a tenant context—replace these with manual validation checks instead.

Preserving Tenant Edits Across Global Data Refreshes

When global data is refreshed—whether through a nightly scrape or a monthly bulk update—tenant customisations must survive intact. This requires careful identifier management and a clear strategy for detecting and resolving conflicts.

The foundation is stable global record identifiers. Never rely on auto-incrementing IDs or row order. Instead, use business keys like GTIN barcodes, supplier SKUs, or unique catalogue codes. When your scraper encounters a product it's seen before, it must match against this immutable identifier to update the existing global record rather than creating a duplicate:

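class GlobalProductScraper
  def sync_product(scraped_data)
    gtin = scraped_data[:gtin].to_s.strip
    # Without a stable key we cannot match safely; refusing avoids attaching
    # the data to an arbitrary existing record whose gtin is NULL
    raise ArgumentError, "scraped product has no GTIN" if gtin.empty?

    # Match on the immutable business key, never on id or row order
    global_product = GlobalProduct.find_or_initialize_by(gtin: gtin)

    global_product.assign_attributes(
      name: scraped_data[:name],
      description: scraped_data[:description],
      last_scraped_at: Time.current
    )

    global_product.save!
  end
end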

For tenant overrides, the foreign key relationship (tenant_product.global_product_id) remains constant even as the global record's attributes change. Your effective_* delegation methods continue working correctly:

def effective_name
  local_name.presence || global_product&.name
end

If the global product's name changes from "Widget Pro" to "Widget Pro Plus", tenants who haven't overridden the name automatically inherit the update. Those with custom names keep them untouched.

Conflict resolution becomes necessary when a global field changes and a tenant has an outdated override. Consider tracking which fields each tenant has explicitly customised:

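class TenantProduct < ApplicationRecord
  belongs_to :global_product, optional: true

  # Per-field override flags, serialised as JSON in a text column
  store :overridden_fields, accessors: [:name_overridden, :price_overridden], coder: JSON

  def effective_name
    name_overridden ? name : (global_product&.name || name)
  end

  def update_name(new_name)
    # Matching the global value is not a customisation, so keep following global
    return reset_name! if global_product && new_name == global_product.name

    self.name = new_name
    self.name_overridden = true
    save!
  end

  # "Reset to global": clear the local value and the override flag
  def reset_name!
    self.name = nil
    self.name_overridden = false
    save!
  end
end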

This explicit flag prevents accidental overrides (setting a field to the same value as global shouldn't lock it) and enables "reset to global" functionality in your UI.
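One way to extend this towards actual conflict detection is to remember, alongside each override, the global value it was made against; after a refresh, any override whose base value no longer matches the global record is a candidate for review. The plain-Ruby sketch below assumes an in-memory hash of global attributes; TenantOverrides and its methods are hypothetical, not part of Rails' store API.

```ruby
# Hypothetical per-tenant override store for one product. Each override keeps
# the global value it was made against, so later global changes can be flagged.
class TenantOverrides
  Entry = Struct.new(:value, :base_value)

  def initialize(global_attributes)
    @global = global_attributes
    @entries = {}
  end

  # Setting a field to the current global value is not a customisation.
  def override(field, value)
    if value == @global[field]
      reset(field)
    else
      @entries[field] = Entry.new(value, @global[field])
    end
  end

  # "Reset to global": drop the override so the field follows the global tier.
  def reset(field)
    @entries.delete(field)
  end

  def overridden?(field)
    @entries.key?(field)
  end

  def effective(field)
    overridden?(field) ? @entries[field].value : @global[field]
  end

  # Called after a scrape or bulk refresh replaces the global attributes.
  def refresh_global(global_attributes)
    @global = global_attributes
  end

  # Overridden fields whose global value changed since the override was made.
  def stale_fields
    @entries.reject { |field, entry| entry.base_value == @global[field] }.keys
  end
end
```

A UI could list stale_fields next to the new global value and offer "keep mine" or "reset to global".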

Soft deletes handle discontinued products gracefully. When a product disappears from the supplier catalogue, mark the global record as archived_at: Time.current rather than destroying it. Tenant products referencing archived globals can display a warning ("This product is no longer available from the supplier") whilst preserving historical order data.
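Archiving can run as the final step of a complete scrape: any active global record whose GTIN did not appear in the feed gets an archived_at timestamp. The sketch below is plain Ruby over an in-memory array; ScrapedProduct, archive_missing and supplier_warning are hypothetical. In ActiveRecord the bulk step would roughly correspond to GlobalProduct.where(archived_at: nil).where.not(gtin: seen_gtins).update_all(archived_at: Time.current).

```ruby
require "set"

# Hypothetical in-memory stand-in for global product records.
ScrapedProduct = Struct.new(:gtin, :name, :archived_at)

SUPPLIER_WARNING = "This product is no longer available from the supplier".freeze

# Soft-delete every active record whose GTIN was absent from the latest feed.
# Nothing is destroyed, so tenant rows and order history keep their link.
def archive_missing(catalogue, seen_gtins, archived_at)
  seen = seen_gtins.to_set
  catalogue.each do |product|
    next if product.archived_at || seen.include?(product.gtin)

    product.archived_at = archived_at
  end
end

# Warning text a tenant view could show for an archived global record.
def supplier_warning(global_product)
  SUPPLIER_WARNING if global_product&.archived_at
end
```

Run it only after a scrape has finished successfully; a truncated feed would otherwise archive products that still exist.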

Key takeaway: Stable identifiers, explicit override tracking, and soft deletes form a three-part strategy that keeps tenant customisations safe whilst allowing global data to evolve independently.

Querying Across Tiers: Building Efficient Read Paths

Reading data that seamlessly combines global defaults with tenant overrides requires careful query design. The naive approach — fetching both tiers and merging in Ruby — performs poorly at scale. Instead, push the merge logic down to the database layer using COALESCE patterns and optimised indexing.

SQL-Level Merging with COALESCE

The most efficient approach uses COALESCE in a single query to fall back from tenant-scoped to global records:

scope :effective_data, -> {
  # NULLIF roughly mirrors Ruby's .presence: an empty local name falls back to global
  left_joins(:global_supplier).select(<<~SQL.squish)
    tenant_suppliers.*,
    COALESCE(NULLIF(tenant_suppliers.local_name, ''), global_suppliers.name) AS effective_name,
    COALESCE(tenant_suppliers.local_price, global_suppliers.price) AS effective_price
  SQL
}

This returns one result set with computed effective_* columns. For frequently accessed patterns, wrap this in a database view that predefines the COALESCE logic, then mount an ActiveRecord model on top.

Caching Strategies for Two-Tier Data

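Cache keys must incorporate both tiers so that a change to either record produces a new key. Build composite keys from both records' versioned cache keys; these nest cleanly inside Russian doll caches: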

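# cache_key alone omits the updated_at version when cache_versioning is on (Rails 5.2+)
cache ["supplier", @supplier.cache_key_with_version, @supplier.global_supplier&.cache_key_with_version] do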
  # render merged data
end

For API endpoints serving many tenants, maintain a global cache warming job that pre-computes effective values for popular records. Expire tenant-scoped caches on local updates; expire global caches on scrape runs.
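A cache-warming job can reuse the same composite key: a change to either tier's version yields a new key, so stale warmed entries are simply never read again. This plain-Ruby sketch uses a hash as the cache and a hypothetical CachedRecord struct whose version field plays the role of cache_key_with_version; effective_values uses nil-only fallback, like COALESCE.

```ruby
# Hypothetical stand-in for a record's cache identity plus two data fields.
CachedRecord = Struct.new(:table, :id, :version, :name, :price)

def tier_key(record)
  record && "#{record.table}/#{record.id}-#{record.version}"
end

# Combines both tiers; private records (no global) contribute one segment only.
def effective_cache_key(tenant_record, global_record)
  ["supplier", tier_key(tenant_record), tier_key(global_record)].compact.join("/")
end

# nil-only fallback, matching COALESCE semantics.
def effective_values(tenant_record, global_record)
  {
    name: tenant_record.name || global_record&.name,
    price: tenant_record.price || global_record&.price
  }
end

# Warming job body: precompute effective values for popular (tenant, global) pairs.
def warm_cache(cache, popular_pairs)
  popular_pairs.each do |tenant_record, global_record|
    key = effective_cache_key(tenant_record, global_record)
    cache[key] ||= effective_values(tenant_record, global_record)
  end
  cache
end
```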

Frequently Asked Questions About the Two-Tier Multi-Tenancy Pattern

When should I use the two-tier data model instead of schema isolation (e.g., the Apartment gem)?

Use the two-tier model when you have large, shared reference catalogues—like supplier product feeds scraped daily for many tenants—where duplicating data per tenant would be wasteful. If tenants rarely share data, or your team lacks experience with multi-tenancy patterns, simpler apartment-based schema isolation or per-tenant copying will be easier to debug and reason about.

How do I handle archived or discontinued products in the global catalogue?

Use soft deletes rather than destroying global records. When a product disappears from a supplier catalogue, mark the global record with an archived_at timestamp. Tenant products referencing archived globals can then display a warning like "This product is no longer available from the supplier" whilst preserving historical order data.

Can a tenant reset an overridden field back to the global default value?

Yes, if you implement explicit override tracking using a stored field flag (such as an overridden_fields JSON column with per-attribute flags like name_overridden). To reset, clear the tenant's local value and set the override flag back to false, so the effective_* delegation method will fall back to the global record's value.

What happens to tenant data when the global record is updated by a scrape or bulk refresh?

Tenants who haven't overridden a field automatically inherit the updated global value through the effective_* delegation methods. Tenants who have explicitly customised a field keep their local override untouched. The foreign key relationship between the tenant record and the global record remains constant even as the global record's attributes change.

How do I prevent duplicate global records from being created during scraping?

Use stable business keys—such as GTIN barcodes, supplier SKUs, or unique catalogue codes—rather than auto-incrementing IDs or row order. Your scraper should use find_or_initialize_by on the immutable identifier (e.g., GTIN) to match existing global records and update them rather than creating duplicates.

How does acts_as_tenant interact with global records, and what are the main pitfalls?

The acts_as_tenant gem enforces automatic tenant scoping via default_scope, which silently excludes global records that lack a tenant_id. The recommended workaround is to exclude global models from acts_as_tenant entirely and use explicit unscoped blocks for associations. In tests, be sure to clear the tenant context after each example and avoid shoulda-matchers' validate_uniqueness_of for tenant-scoped models.

Testing the Two-Tier Model

Factory Setup for Two-Tier Records

Structure your factories to reflect the global-vs-tenant split. Create separate factories for global records (no tenant association) and tenant-scoped records (with belongs_to :account):

FactoryBot.define do
  factory :global_supplier do
    name { "Acme Wholesale" }
    barcode { "5060123456789" }
    price { 9.99 } # Differs from local prices so fallback specs can't pass by accident
    # No account_id: deliberately global
  end

  factory :tenant_supplier do
    account
    global_supplier { nil } # Private by default; see the :linked_to_global trait
    local_price { 12.99 }

    trait :linked_to_global do
      global_supplier
      local_price { nil } # Will fall back to global
    end
  end
end

Use traits to test both private suppliers and those linked to global records. The :linked_to_global trait exercises the effective_* delegation pattern.

Testing Fallback Behaviour

Explicitly test that effective_* methods fall back correctly when local fields are nil. Create specs that verify both the presence case (local value takes precedence) and the fallback case:

describe "#effective_price" do
  context "when local_price is set" do
    let(:supplier) { create(:tenant_supplier, :linked_to_global, local_price: 14.99) }

    it "returns the local override" do
      expect(supplier.effective_price).to eq(14.99)
    end
  end

  context "when local_price is nil" do
    let(:supplier) { create(:tenant_supplier, :linked_to_global, local_price: nil) }

    it "falls back to global price" do
      expect(supplier.effective_price).to eq(supplier.global_supplier.price)
    end
  end
end

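Key insight: Always test both the override and fallback paths. Many bugs come from blank-but-not-nil values: an empty string is truthy in Ruby, so a plain || will not fall back, whereas .presence treats it as missing; a numeric 0, by contrast, counts as present either way.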

Preventing Tenant Context Leakage

The acts_as_tenant gem enforces tenant scoping via a thread-local variable. In RSpec, this can leak between examples if not explicitly cleared. Wrap tenant-dependent specs in with_tenant blocks and ensure your rails_helper.rb includes an after(:each) hook:

RSpec.configure do |config|
  config.after(:each) do
    ActsAsTenant.current_tenant = nil
  end
end

Common gotcha: shoulda-matchers' validate_uniqueness_of matcher attempts to save records. With tenant scoping active, this fails unless the test is wrapped in with_tenant. Replace with manual validation checks for tenant-scoped models:

it "enforces unique SKU per account" do
  ActsAsTenant.with_tenant(account) do
    create(:tenant_supplier, sku: "ABC-123")
    duplicate = build(:tenant_supplier, sku: "ABC-123")
    expect(duplicate).not_to be_valid
    expect(duplicate.errors[:sku]).to include("has already been taken")
  end
end

When testing background jobs that process tenant data, explicitly set the tenant context at the job's entry point — don't rely on controller-level filters carrying through.

When This Pattern Is (and Isn't) the Right Choice

This pattern excels when you have large, shared reference catalogues that would be wasteful to duplicate per tenant—think product databases scraped from suppliers, postal code lookups, or industry-standard taxonomies. If you're synchronising supplier data daily for 50 tenants, storing it once and letting tenants override pricing or availability delivers real storage and maintenance savings.

Choose this approach when:

Shared data changes frequently (automated scraping, API feeds) and re-distributing copies to every tenant would be expensive
Tenants need autonomy to override specific fields (pricing, descriptions) without breaking the sync pipeline
You have a clear separation between "canonical truth" (global) and "tenant preferences" (scoped)

Avoid this pattern when:

Tenants rarely share data—if 90% of records are tenant-specific, the two-tier split adds complexity for minimal benefit
The global dataset is small and stable (a handful of configuration options fits comfortably in tenant tables)
Your team lacks experience with multi-tenancy patterns—simpler apartment-based schema isolation or per-tenant copying will be easier to debug and reason about

The two-tier model trades code complexity for data efficiency. Make sure the savings justify the cognitive overhead before committing to it.
