Search Ranking

ContextForge layers four ranking signals on top of semantic similarity. They surface live, anchored knowledge over stale wishes — without hiding anything.

TL;DR. If you have items in memory, ContextForge ranks them by relevance to your query and by how alive, recent, anchored to real code, and concrete they are. Pure semantic similarity is just the starting point.

The four signals

Each signal is layered on top of the base cosine similarity score returned by the embedding model. The two boosts are additive and bounded, and the one penalty is multiplicative. Apart from test data (signal 1), nothing is filtered out: items can still be retrieved, they just lose priority when something more trustworthy exists.

1. Test data exclusion

Items tagged test are excluded from default search results so test seed data never pollutes a real query. You can opt them back in by passing filters.tags: ["test"].

What this changes: a query like "auth flow" no longer returns the fixture rows your team created while building tests.
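The default exclusion and the opt-in can be sketched as a small filter. This is a minimal illustration, not ContextForge's actual code: the MemoryItem and SearchFilters shapes and the applyTestExclusion name are hypothetical, and the source does not say whether filters.tags: ["test"] also narrows results to test-tagged items, so this sketch only lifts the exclusion.

```typescript
// Hypothetical shapes for illustration; the real item schema is not shown in this page.
interface MemoryItem {
  id: string;
  tags: string[];
}

interface SearchFilters {
  tags?: string[];
}

// Drop items tagged "test" unless the caller explicitly asks for that tag.
function applyTestExclusion(
  items: MemoryItem[],
  filters: SearchFilters = {}
): MemoryItem[] {
  const requested = filters.tags ?? [];
  const wantsTest = requested.indexOf("test") !== -1;
  if (wantsTest) {
    return items; // opt-in: test items are allowed back into the candidate set
  }
  return items.filter((item) => item.tags.indexOf("test") === -1);
}
```

Called with no filters, the function removes fixture rows; called with { tags: ["test"] }, it leaves them in.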

2. Recency boost (time decay)

Fresh items get up to +15% on top of their similarity score. The boost decays with a half-life of 60 days, using updated_at (falling back to created_at).

Item age    Recency boost
Today       +15%
60 days     +7.5%
120 days    +3.75%

What this changes: when you have two items with similar relevance, the more recent one wins. Old items are not penalized — they just stop getting a free boost.
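The decay described above follows the standard half-life formula, factor = 0.5^(age / 60 days). Here is a minimal sketch under that reading; the function and constant names are hypothetical, and clamping future timestamps to an age of zero is an assumption.

```typescript
const HALF_LIFE_DAYS = 60;      // from the text: boost halves every 60 days
const MAX_RECENCY_BOOST = 0.15; // from the text: up to +15%
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Recency factor in (0, 1]: 1 for an item touched right now, 0.5 after one half-life.
function recencyFactor(updatedAt: Date | null, createdAt: Date, now: Date): number {
  const reference = updatedAt ?? createdAt; // fall back to created_at
  // Assumption: timestamps in the future are treated as age 0.
  const ageDays = Math.max(0, (now.getTime() - reference.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

// Boost expressed as a fraction of the similarity score (0.15 means +15%).
function recencyBoost(updatedAt: Date | null, createdAt: Date, now: Date): number {
  return MAX_RECENCY_BOOST * recencyFactor(updatedAt, createdAt, now);
}
```

With these constants an item updated 60 days ago gets a boost of about 0.075 and one updated 120 days ago about 0.0375, matching the figures listed above.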

3. Git correlation

When an item references a file path that has been touched in a tracked git repository within the last 30 days, it gets up to +20% on its score. The boost is proportional to how many of the item's referenced paths show up in recent commits.

Example. A decision note that mentions src/lib/auth.ts ranks above a generic plan when that file is actively being committed. Items with no file references stay neutral — there's nothing to correlate.

What this changes: knowledge that is anchored to live code surfaces above wishes that mention no specific code. Requires a connected git repository — set one up under Git Integration.
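One way to read "proportional to how many of the item's referenced paths show up in recent commits" is as the fraction of referenced paths that were recently touched. The sketch below assumes that normalization, which the text does not spell out; gitFactor and both parameter names are hypothetical, and the list of recently touched paths is assumed to be collected elsewhere from the connected repository.

```typescript
const MAX_GIT_BOOST = 0.2; // from the text: up to +20%

// Fraction (0..1) of the item's referenced paths that appear in recent commits.
// Assumption: exact string match on repository-relative paths.
function gitFactor(referencedPaths: string[], recentlyTouchedPaths: string[]): number {
  if (referencedPaths.length === 0) {
    return 0; // no file references: the item stays neutral
  }
  let hits = 0;
  for (const path of referencedPaths) {
    if (recentlyTouchedPaths.indexOf(path) !== -1) {
      hits += 1;
    }
  }
  return hits / referencedPaths.length;
}

// Boost expressed as a fraction of the similarity score (0.2 means +20%).
function gitBoost(referencedPaths: string[], recentlyTouchedPaths: string[]): number {
  return MAX_GIT_BOOST * gitFactor(referencedPaths, recentlyTouchedPaths);
}
```

Under this reading, a note that references src/lib/auth.ts and one other untouched file would receive half of the maximum boost while auth.ts is being committed.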

4. Proposal penalty

Items whose title or content starts with TODO:, FIXME:, WISH:, XXX:, HACK:, or IDEA: receive a 30% multiplicative penalty. The prefix must be at the very start, ignoring any leading whitespace; inline mentions don't trigger it.

What this changes: AI-generated TODO checklists no longer rank as if they were established facts. A real decision stored as a normal note will out-rank a wish that has the same topic and similar wording.
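The prefix rule can be approximated with an anchored regular expression. This is a sketch under two assumptions not confirmed by the text: matching is case-sensitive, and a "30% penalty" means the score is multiplied by 0.70. The names below are hypothetical.

```typescript
// Anchored at the start of the string, allowing leading whitespace.
// Assumption: prefixes are matched case-sensitively.
const PROPOSAL_PREFIX = /^\s*(TODO|FIXME|WISH|XXX|HACK|IDEA):/;

// True when either the title or the content begins with a proposal prefix.
function isProposal(title: string, content: string): boolean {
  return PROPOSAL_PREFIX.test(title) || PROPOSAL_PREFIX.test(content);
}

// Multiplier applied to the boosted score.
// Assumption: a 30% penalty keeps 70% of the score.
function proposalPenalty(title: string, content: string): number {
  return isProposal(title, content) ? 0.7 : 1.0;
}
```

Because the pattern is anchored with ^, a note whose body says "we should fix the TODO: later" is not penalized, while a title of "  TODO: add rate limiting" is.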

How they combine

The final score is computed once per candidate item:

final = min(
  1.0,
  (cosine + cosine*0.15*recency + cosine*0.20*git) * proposalPenalty
)

The two boosts stack additively (each capped at its own maximum: 15% for recency, 20% for git correlation) and the proposal penalty multiplies the whole thing. Even a perfect match loses real ground when it's flagged as a wish.
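The formula above can be written as a small function to check the arithmetic by hand. The finalScore name and parameter names are hypothetical; recency and git are assumed to be factors in the range 0 to 1, and the 0.7 used in the example assumes a 30% penalty keeps 70% of the score.

```typescript
// cosine:  base similarity from the embedding model
// recency: recency factor in [0, 1] (1 = updated today)
// git:     git correlation factor in [0, 1] (0 = no connected repo or no matching paths)
// penalty: 1.0 for normal items; assumed 0.7 for proposal-prefixed items
function finalScore(cosine: number, recency: number, git: number, penalty: number): number {
  const boosted = cosine + cosine * 0.15 * recency + cosine * 0.2 * git;
  return Math.min(1.0, boosted * penalty);
}

// Worked example: cosine 0.6, item 60 days old (recency 0.5), all paths recently committed (git 1).
//   0.6 + 0.6 * 0.15 * 0.5 + 0.6 * 0.20 * 1 = 0.6 + 0.045 + 0.12 = 0.765
const asFact = finalScore(0.6, 0.5, 1, 1.0); // about 0.765
const asWish = finalScore(0.6, 0.5, 1, 0.7); // about 0.5355
```

The same item scores roughly 0.765 as a normal note and roughly 0.5355 when it carries a proposal prefix, so a plain note with a noticeably weaker cosine match can still outrank it.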

Tips for items that rank well

  • Reference real file paths or function names in the content. Items that anchor to code your team is actively committing get a measurable boost over items that don't.
  • Update items when the underlying fact changes instead of creating a new one. Bumping updated_at keeps the recency boost alive.
  • Store wishes as tasks, not as knowledge items. Task tracking lives at /dashboard/tasks. If you must keep a TODO in memory, the prefix marks it as such — but it will rank below real facts.
  • Avoid the "dump 30 AI-generated TODOs" pattern. They'll all match similar queries with similar scores and crowd out the one real answer.

FAQ

Are any items hidden from results?

Only items tagged test are excluded by default, and they come back the moment you ask for them. Everything else is always retrievable — it just ranks differently.

Can I opt out of these signals?

Not yet from the public API. The signals run on every query right now. If you have a use case that's harmed by one of them, reach out and we'll consider exposing a knob.

Does git correlation work without a connected repo?

No — if you haven't connected a repository, the git factor is always 0 and items neither gain nor lose from this signal. The other three signals still apply.

How does this compare to other memory tools?

Most semantic memory products stop at cosine similarity. Git correlation in particular is uncommon — it requires the memory store to also ingest git activity, which ContextForge does natively. See Git Integration.

Related

  • Semantic Search — how the underlying embedding match works.
  • Git Integration — connect a repo so Signal 3 (git correlation) can fire.
  • Tasks — where to put wishes instead of as knowledge items.