
Tree-sitter

Semble Architecture Analysis: How an Agent-Oriented RAG Solves Code Search with Static Embeddings

Semble is a Python library that splits code into chunks with tree-sitter, fuses Model2Vec static embeddings with BM25 via reciprocal rank fusion (RRF), and applies code-aware reranking, delivering millisecond code search on CPU alone. Where CodeGraph solves the same problem with an AST knowledge graph, Semble solves it through retrieval. This post analyzes the architecture by contrasting the two approaches.
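To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists, one standing in for the embedding ranking and one for BM25. The function name `rrf_fuse`, the chunk identifiers, and the constant `k=60` (a commonly used default for RRF) are illustrative assumptions, not Semble's actual API or settings.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from Semble.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids, as if returned by the two retrievers.
dense_hits = ["auth.py:login", "session.py:create", "utils.py:hash_pw"]
bm25_hits = ["utils.py:hash_pw", "auth.py:login", "db.py:connect"]

fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)
```

Because RRF uses only rank positions, the raw cosine similarities and BM25 scores never need to be put on a common scale, which is one reason it is a convenient way to combine a dense and a lexical retriever.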

CodeGraph Architecture Analysis: How Is the Code Intelligence Layer Beneath Coding Agents Built?

CodeGraph is a TypeScript tool that uses tree-sitter to parse source code, builds a local SQLite knowledge graph of symbols, edges, and files (with FTS5), and exposes that graph via an MCP server to coding agents like Claude Code, Cursor, Codex, and Hermes Agent. Instead of burning tokens navigating a codebase with grep/Read calls, agents get answers from a single codegraph_explore invocation. This post analyzes the architecture that makes that possible.
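As a rough illustration of what a local graph of files, symbols, and edges can look like in SQLite, the sketch below builds a tiny in-memory database and answers a "what does this function call?" question with one query. It is written in Python for brevity even though CodeGraph itself is TypeScript; the table layout, column names, sample rows, and the `callees` helper are all hypothetical, and the FTS5 index is left out to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per file, per symbol, and per relationship.
conn.executescript("""
CREATE TABLE files   (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE symbols (id INTEGER PRIMARY KEY,
                      file_id INTEGER REFERENCES files(id),
                      name TEXT, kind TEXT);
CREATE TABLE edges   (src INTEGER REFERENCES symbols(id),
                      dst INTEGER REFERENCES symbols(id),
                      kind TEXT);
""")

# Made-up sample data standing in for what a parser pass would extract.
conn.execute("INSERT INTO files VALUES (1, 'src/auth.ts')")
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [(1, 1, "login", "function"),
     (2, 1, "hashPassword", "function"),
     (3, 1, "verifyToken", "function")],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "calls"), (1, 3, "calls")],
)


def callees(conn, name):
    """Return the names of symbols that the named symbol calls."""
    rows = conn.execute(
        """
        SELECT callee.name
        FROM symbols AS caller
        JOIN edges ON edges.src = caller.id AND edges.kind = 'calls'
        JOIN symbols AS callee ON callee.id = edges.dst
        WHERE caller.name = ?
        ORDER BY callee.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(callees(conn, "login"))
```

A single query like this replaces several rounds of grep and file reads, which is the token-saving idea the summary describes.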