JSONB Indexing Layer (Hybrid)

Goal: make querying base entities together with custom fields fast and simple by maintaining a flat, query-friendly JSONB index per entity record, without breaking module isomorphism or schema agility. See the Hybrid query engine for runtime usage.

What you get

Hybrid engine: uses a JSONB-backed index when available; falls back to the join-based engine otherwise.
Event-driven updates: created/updated/deleted events keep the index synced. Custom-field updates also trigger reindex via the DataEngine write layer.
Zero-churn evolution: new fields (base or custom) are available immediately in the JSON document; promote hot fields later if needed.

Storage

Table: entity_indexes
- id uuid (PK)
- entity_type text e.g., example:todo
- entity_id text base record id as text
- organization_id uuid null, tenant_id uuid null
- doc jsonb flattened document of base fields + cf:<key> values
- index_version int (default 1), created_at, updated_at, deleted_at
- organization_id_coalesced uuid generated always as (coalesce(organization_id, '00000000-0000-0000-0000-000000000000')) stored
Indexes
- gin(doc jsonb_path_ops) for JSONB containment/path queries
- Unique: unique(entity_type, entity_id, organization_id_coalesced)

Document shape

Base fields are stored under their snake_case DB names.
Custom fields are stored as cf:<key> values.
Arrays are preserved as arrays; singletons are scalars.

Indexer

Build: merges base row with custom field values into a single plain object.
Upsert: inserts or updates the JSONB row keyed by (entity_type, entity_id, organization_id_coalesced).
Delete: sets deleted_at on logical delete.

Events and consistency

Subscribers are registered for all entities that have custom field definitions:
- <module>.<entity>.created|updated → upsert index row.
- <module>.<entity>.deleted → mark index row deleted.
Custom-field writes done via the DataEngine emit <module>.<entity>.updated as well, reusing the same subscriber path.

Query routing

The HybridQueryEngine prefers the JSONB index when rows exist for the target entity; otherwise it falls back to the basic engine.
Supported in index path:
- Filters: base fields and cf:* via JSON path extraction and casting for common types.
- Sorting: base fields and cf:* using casted expressions when possible.
- Paging and multi-tenant scoping (organization_id, tenant_id) and soft-delete exclusion by default.

Backfill

CLI: mercato query_index rebuild --entity <module:entity> reindexes existing rows with a live progress bar.
- All orgs/tenants: mercato query_index rebuild --entity example:todo --global
- Scoped: mercato query_index rebuild --entity example:todo --org <orgId> --tenant <tenantId>
- Single record: mercato query_index rebuild --entity example:todo --record <entityId>
- Options: --withDeleted --limit <n> --offset <n> --batch <size>
CLI: mercato query_index rebuild-all walks every registered entity index using the same flags as rebuild, making it easy to refresh the entire corpus in one go.
CLI: mercato query_index reindex [--entity <module:entity>] [--tenant <tenantId>] [--force] (shorthand: yarn mercato reindex) now splits work into five partitions by default. Override the granularity with --partitions <n> or target a single shard with --partition-index <i>. Combine with --skipResetCoverage on partitions >0 when running several workers concurrently. The command continues to honour --batch <size> for the inner chunk size (defaults to 500 rows).
- Parallel run: start five terminals with mercato query_index reindex --entity example:todo --partitions 5 --partition-index <0-4>; the partition 0 process resets coverage, so the remaining workers should pass --skipResetCoverage.
- Single worker: omit --partition-index and the CLI will iterate every partition sequentially.
Each CLI run purges existing index rows before reindexing unless --skipPurge is provided, ensuring a clean rebuild.
The admin status view now tracks per-partition progress and marks jobs as stalled when no heartbeat is recorded for 60 seconds, making it easier to spot a wedged worker.
Scheduling via events remains available: mercato query_index reindex [...] and mercato query_index purge [...] enqueue background jobs when you don’t need immediate writes. API callers can pass partitionCount, partitionIndex, and batchSize to mirror the CLI behaviour; omitting partitionIndex emits one event per partition automatically.
Bulk imports: build and insert index payloads alongside your ETL transaction with the buildIndexDocument helper (packages/core/src/modules/query_index/lib/document.ts) so large seeders can skip the post-import reindex sweep. The CRM stress-test seeder now follows this pattern.

Performance tips

Start with gin(doc jsonb_path_ops).
Add expression indexes for hot JSON paths (filters/sorts) before promoting to typed columns:
- Example: create index on entity_indexes (((doc->>'cf:priority')::int));
Consider typed columns only for extreme hotspots.

Limitations and notes

The index row appears after create/update events or backfill. Queries still work via fallback when the index is missing.
Cross-module relations remain by foreign key id only; no cross-module joins.
When the hybrid engine falls back because of partial index coverage it emits a warning and, by default, schedules an asynchronous reindex for the affected entity/tenant scope. Toggle this with SCHEDULE_AUTO_REINDEX (defaults to true) if you prefer to manage reindex jobs manually. For smaller datasets (≤10k customers) keep this flag enabled together with OPTIMIZE_INDEX_COVERAGE_STATS=false so coverage updates remain instant.
OPTIMIZE_INDEX_COVERAGE_STATS=true enables cached coverage snapshots. Leave it false to recalculate counts on every read, which is ideal for fast feedback during development or on small installations.
Keep the indexed path even when gaps are detected by setting FORCE_QUERY_INDEX_ON_PARTIAL_INDEXES=true (enabled by default). When active, list responses include an x-om-partial-index response header (with entity, baseCount, indexedCount, and scope) so the admin UI can surface a banner prompting the user to reindex via /backend/query-indexes.
Coverage statistics are tracked per tenant to avoid cross-organization leakage. Admin views surface the tenant snapshot directly and never blend organization-level scopes.

Debugging SQL output

Standard debug logs honour LOG_VERBOSITY=debug (or a development NODE_ENV) and describe which path the hybrid engine picks.
To additionally print the SQL emitted for the count and data queries, set QUERY_ENGINE_DEBUG_SQL=true. The flag is disabled by default to keep development logs readable.

Related files

packages/core/src/modules/query_index/data/entities.ts
packages/core/src/modules/query_index/lib/indexer.ts
packages/core/src/modules/query_index/lib/engine.ts
packages/core/src/modules/query_index/di.ts
packages/core/src/modules/query_index/migrations/Migration20251001120000.ts
packages/core/src/modules/query_index/migrations/Migration20251001123000.ts

Debugging SQL output​

Debugging SQL output