JSONB Indexing Layer (Hybrid)
Goal: make querying base entities together with custom fields fast and simple by maintaining a flat, query-friendly JSONB index per entity record, without breaking module isomorphism or schema agility. See the Hybrid query engine for runtime usage.
What you get
- Hybrid engine: uses a JSONB-backed index when available; falls back to the join-based engine otherwise.
- Event-driven updates: created/updated/deleted events keep the index synced. Custom-field updates also trigger reindex via the DataEngine write layer.
- Zero-churn evolution: new fields (base or custom) are available immediately in the JSON document; promote hot fields later if needed.
Storage
- Table:
entity_indexesid uuid(PK)entity_type texte.g.,example:todoentity_id textbase record id as textorganization_id uuid null,tenant_id uuid nulldoc jsonbflattened document of base fields +cf:<key>valuesindex_version int(default 1),created_at,updated_at,deleted_atorganization_id_coalesced uuid generated always as (coalesce(organization_id, '00000000-0000-0000-0000-000000000000')) stored
- Indexes
gin(doc jsonb_path_ops)for JSONB containment/path queries- Unique:
unique(entity_type, entity_id, organization_id_coalesced)
Document shape
- Base fields are stored under their snake_case DB names.
- Custom fields are stored as
cf:<key>values. - Arrays are preserved as arrays; singletons are scalars.
Indexer
- Build: merges base row with custom field values into a single plain object.
- Upsert: inserts or updates the JSONB row keyed by
(entity_type, entity_id, organization_id_coalesced). - Delete: sets
deleted_aton logical delete.
Events and consistency
- Subscribers are registered for all entities that have custom field definitions:
<module>.<entity>.created|updated→ upsert index row.<module>.<entity>.deleted→ mark index row deleted.
- Custom-field writes done via the DataEngine emit
<module>.<entity>.updatedas well, reusing the same subscriber path.
Query routing
- The
HybridQueryEngineprefers the JSONB index when rows exist for the target entity; otherwise it falls back to the basic engine. - Supported in index path:
- Filters: base fields and
cf:*via JSON path extraction and casting for common types. - Sorting: base fields and
cf:*using casted expressions when possible. - Paging and multi-tenant scoping (
organization_id,tenant_id) and soft-delete exclusion by default.
- Filters: base fields and
Backfill
- CLI:
mercato query_index rebuild --entity <module:entity>reindexes existing rows with a live progress bar.- All orgs/tenants:
mercato query_index rebuild --entity example:todo --global - Scoped:
mercato query_index rebuild --entity example:todo --org <orgId> --tenant <tenantId> - Single record:
mercato query_index rebuild --entity example:todo --record <entityId> - Options:
--withDeleted--limit <n>--offset <n>--batch <size>
- All orgs/tenants:
- CLI:
mercato query_index rebuild-allwalks every registered entity index using the same flags asrebuild, making it easy to refresh the entire corpus in one go. - CLI:
mercato query_index reindex [--entity <module:entity>] [--tenant <tenantId>] [--force](shorthand:yarn mercato reindex) now splits work into five partitions by default. Override the granularity with--partitions <n>or target a single shard with--partition-index <i>. Combine with--skipResetCoverageon partitions >0 when running several workers concurrently. The command continues to honour--batch <size>for the inner chunk size (defaults to 500 rows).- Parallel run: start five terminals with
mercato query_index reindex --entity example:todo --partitions 5 --partition-index <0-4>; the partition0process resets coverage, so the remaining workers should pass--skipResetCoverage. - Single worker: omit
--partition-indexand the CLI will iterate every partition sequentially.
- Parallel run: start five terminals with
- Each CLI run purges existing index rows before reindexing unless
--skipPurgeis provided, ensuring a clean rebuild. - The admin status view now tracks per-partition progress and marks jobs as stalled when no heartbeat is recorded for 60 seconds, making it easier to spot a wedged worker.
- Scheduling via events remains available:
mercato query_index reindex [...]andmercato query_index purge [...]enqueue background jobs when you don’t need immediate writes. API callers can passpartitionCount,partitionIndex, andbatchSizeto mirror the CLI behaviour; omittingpartitionIndexemits one event per partition automatically. - Bulk imports: build and insert index payloads alongside your ETL transaction with the
buildIndexDocumenthelper (packages/core/src/modules/query_index/lib/document.ts) so large seeders can skip the post-import reindex sweep. The CRM stress-test seeder now follows this pattern.
Performance tips
- Start with
gin(doc jsonb_path_ops). - Add expression indexes for hot JSON paths (filters/sorts) before promoting to typed columns:
- Example:
create index on entity_indexes (((doc->>'cf:priority')::int));
- Example:
- Consider typed columns only for extreme hotspots.
Limitations and notes
- The index row appears after create/update events or backfill. Queries still work via fallback when the index is missing.
- Cross-module relations remain by foreign key id only; no cross-module joins.
- When the hybrid engine falls back because of partial index coverage it emits a warning and, by default, schedules an asynchronous reindex for the affected entity/tenant scope. Toggle this with
SCHEDULE_AUTO_REINDEX(defaults totrue) if you prefer to manage reindex jobs manually. For smaller datasets (≤10k customers) keep this flag enabled together withOPTIMIZE_INDEX_COVERAGE_STATS=falseso coverage updates remain instant. OPTIMIZE_INDEX_COVERAGE_STATS=trueenables cached coverage snapshots. Leave itfalseto recalculate counts on every read, which is ideal for fast feedback during development or on small installations.- Keep the indexed path even when gaps are detected by setting
FORCE_QUERY_INDEX_ON_PARTIAL_INDEXES=true(enabled by default). When active, list responses include anx-om-partial-indexresponse header (withentity,baseCount,indexedCount, andscope) so the admin UI can surface a banner prompting the user to reindex via/backend/query-indexes. - Coverage statistics are tracked per tenant to avoid cross-organization leakage. Admin views surface the tenant snapshot directly and never blend organization-level scopes.
Debugging SQL output
- Standard debug logs honour
LOG_VERBOSITY=debug(or a developmentNODE_ENV) and describe which path the hybrid engine picks. - To additionally print the SQL emitted for the count and data queries, set
QUERY_ENGINE_DEBUG_SQL=true. The flag is disabled by default to keep development logs readable.
Related files
packages/core/src/modules/query_index/data/entities.tspackages/core/src/modules/query_index/lib/indexer.tspackages/core/src/modules/query_index/lib/engine.tspackages/core/src/modules/query_index/di.tspackages/core/src/modules/query_index/migrations/Migration20251001120000.tspackages/core/src/modules/query_index/migrations/Migration20251001123000.ts