Session 5 — Semantic Signatures

What We Built

Real meaning-based signatures replace random number generation. Similar concepts now produce similar signatures automatically. The brain understands relationships between ideas for the first time.


The Problem With Random Signatures

Before Session 5, every node's signature was 64 random numbers. The architecture worked perfectly but had no understanding — "dog" and "puppy" were as unrelated as "dog" and "quantum physics."


How Semantic Signatures Work

  1. Input — a word or phrase like "firewall"
  2. Encode — sentence-transformers converts it to a 384-dimension embedding vector
  3. Project — a fixed projection matrix reduces 384 → 64 dimensions
  4. Normalize — unit length normalization for cosine similarity
  5. Store — the 64-number vector becomes the node's signature

Similar concepts produce similar vectors automatically — no manual labeling needed.
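Steps 3–5 can be sketched with numpy. This is a simplified stand-in, not the code from embeddings.py: the 384-dim input would come from step 2's sentence-transformers encoder, and the scaling of the projection matrix is an assumption.

```python
import numpy as np

EMBED_DIM, SIG_DIM = 384, 64

# Step 3: a fixed projection matrix — seeded, so it is identical every run.
rng = np.random.default_rng(2025)
PROJECTION = rng.standard_normal((EMBED_DIM, SIG_DIM)) / np.sqrt(SIG_DIM)

def embedding_to_signature(embedding: np.ndarray) -> np.ndarray:
    """Steps 3-5: project 384 -> 64, then normalize to unit length."""
    sig = embedding @ PROJECTION          # step 3: reduce dimensions
    return sig / np.linalg.norm(sig)      # step 4: unit length for cosine similarity
```

Unit-length signatures make step 5's downstream comparisons cheap: cosine similarity reduces to a plain dot product.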


Key Concepts

Word Embeddings

Vectors where geometry encodes meaning. Similar meanings produce vectors that point in similar directions.
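A tiny illustration of "geometry encodes meaning," using hand-made 3-dim toy vectors rather than real embeddings:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: ~1.0 = same direction, ~0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors whose geometry mimics meaning.
dog     = np.array([0.9, 0.8, 0.1])
puppy   = np.array([0.8, 0.9, 0.2])   # points almost the same way as "dog"
quantum = np.array([0.1, 0.0, 0.9])   # points somewhere else entirely
```

With real embeddings the vectors have 384 dimensions, but the comparison works exactly the same way.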

The Famous Example

king − man + woman ≈ queen

That's not magic — it's just geometry. Meaning becomes math.
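The geometry can be shown with a toy 2-dim space where the axes are hand-picked (real embedding dimensions are learned, not labeled like this):

```python
import numpy as np

# Toy axes: dimension 0 = "royalty", dimension 1 = "gender" (+1 male, -1 female).
king  = np.array([1.0,  1.0])
queen = np.array([1.0, -1.0])
man   = np.array([0.0,  1.0])
woman = np.array([0.0, -1.0])

# king - man + woman: remove "male", add "female", keep "royalty".
result = king - man + woman
```

Subtracting `man` cancels the gender component of `king`; adding `woman` replaces it, landing on `queen`.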

Analogy

A map of meaning. Every concept gets coordinates. Nearby coordinates = similar meaning.

Dimension Reduction

384 dimensions → 64 dimensions via a fixed random projection matrix (seed 2025). Random projections approximately preserve angles between vectors, so similar concepts stay similar after projection, and the fixed seed means every run produces the same matrix.
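A quick check that a seed-2025 random projection preserves similarity ordering, using random stand-in "embeddings" rather than real ones (the matrix scaling here is an assumption):

```python
import numpy as np

# Fixed seed-2025 projection matrix, 384 -> 64.
rng = np.random.default_rng(2025)
P = rng.standard_normal((384, 64)) / np.sqrt(64)

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = rng.standard_normal(384)              # stand-in "embedding"
b = a + 0.3 * rng.standard_normal(384)    # a nearby, similar embedding
c = rng.standard_normal(384)              # an unrelated embedding

hi = cos(a @ P, b @ P)   # similar pair, after projection
lo = cos(a @ P, c @ P)   # unrelated pair, after projection
```

The similar pair stays far more similar than the unrelated pair even after dropping 320 dimensions.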

The Model

all-MiniLM-L6-v2 — small, fast, CPU-friendly. ~80MB download, cached locally after first use. No API key, no internet required after download.


Similarity Results (Verified)

Pair                     Score   Interpretation
dog vs puppy             0.814   Almost the same concept ✅
dog vs cat               0.773   Similar — both animals ✅
dog vs database          0.278   Barely related ✅
networking vs firewall   0.535   Related domain ✅
networking vs cooking    0.439   Less related ✅

New Methods Added

# New file — embeddings.py
concept_to_signature("firewall")            # single concept → 64-dim vector
concepts_to_signatures(["DNS", "VPN"])      # batch → faster
similarity("dog", "puppy")                  # test similarity between two concepts

# New methods in KnowledgeLandscape
brain.learn("firewall")                     # teach one concept
brain.learn_many(["DNS", "VPN", "router"])  # batch teach
brain.query_concept("network security")     # query by name
brain.concept_label(node_id)                # get human-readable label
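The real KnowledgeLandscape lives in index.py; as a rough sketch of how these methods could fit together, here is a minimal in-memory version. `fake_signature` is a deterministic stand-in for the embedding-based concept_to_signature, so unlike the real thing it only matches exact strings, not related phrasings.

```python
import zlib
import numpy as np

def fake_signature(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic stand-in for the real embedding-based signature."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class KnowledgeLandscape:
    def __init__(self):
        self.signatures = {}   # node_id -> unit-length signature
        self.labels = {}       # node_id -> human-readable label

    def learn(self, concept: str) -> int:
        node_id = len(self.signatures)
        self.signatures[node_id] = fake_signature(concept)
        self.labels[node_id] = concept
        return node_id

    def learn_many(self, concepts: list[str]) -> list[int]:
        return [self.learn(c) for c in concepts]

    def query_concept(self, concept: str) -> int:
        # Nearest node by cosine similarity; signatures are unit length,
        # so a plain dot product is the cosine.
        sig = fake_signature(concept)
        return max(self.signatures, key=lambda nid: self.signatures[nid] @ sig)

    def concept_label(self, node_id: int) -> str:
        return self.labels[node_id]
```

Swapping `fake_signature` for the real concept_to_signature is what makes query_concept("network security") land near "firewall" rather than requiring an exact string match.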

TurboQuant Context

Google published TurboQuant in March 2026 — a compression algorithm working in the same problem space as NCI. The key difference:

  • TurboQuant — compresses existing large models after the fact
  • NCI — compression is native to the architecture from the start

TurboQuant's researchers noted that their approach is nearing its theoretical limit. NCI starts from a different foundation entirely.


Key Files

  • embeddings.py — semantic signature generation
  • index.py — learn(), learn_many(), query_concept(), concept_label()