UModel builds agent-native code knowledge graphs using deterministic AST parsing and cross-domain associations for deeper AI code understanding.
Background
In recent years, AI agents (Cursor, Copilot, Claude Code, Codex, etc.) have become deeply involved in software development. From code completion to cross-file refactoring, from bug localization to architecture design, agent capabilities are growing stronger. From Prompt Engineering to Context Engineering to Harness Engineering, the ways to harness AI continue to evolve, and the capability boundaries of agents continue to expand.
However, when we hand a real enterprise-level project to an agent, an overlooked question begins to surface: Does the agent really understand your project?
The way agents currently understand code is diverging into two distinct schools:
● No-index school: Claude Code follows the Unix philosophy and performs no pre-indexing at all — it searches the file system in real time using grep, rg, and glob. Anthropic's internal tests found that agentic search outperformed retrieval-augmented generation across the board, by a wide margin. It is concise, real-time, and free of privacy issues, but each session starts from scratch and is costly for large repositories.
● Code-index school: Cursor, Windsurf, and Copilot follow the vector-index route: use tree-sitter for semantic text segmentation, generate embeddings and store them in a vector database (such as Turbopuffer), then use a Merkle tree for incremental synchronization. Qodo and Augment Code go a step further by overlaying a code dependency graph and a commit-history index on top of the vector index.
Both schools have their own strengths, but they still struggle with the following problems:
● I want to change the Adapter interface of pkg/a2a. What is the scope of impact?
Vector similarity search cannot find the dependency chain, and grep-based file-by-file search is inefficient and incomplete.
● In production, the vibeops-xxx SLO has been breached with a large number of pending requests. What is the cause? Is it a code change?
The code index only covers the code domain; O&M domain data is not in the graph.
● Are there any abnormal dependencies in the project that cross architecture boundaries?
Without architecture-level modeling, "crossing a boundary" cannot even be defined.
What these problems have in common is that they require deterministic structural relationships, cross-domain entity associations, and change history across the time dimension.
The author has worked in the observability field for more than ten years. Looking back at its development — especially as cloud-native and AI-native systems have grown more complex — observability has long been about more than "looking at a log and staring at a monitoring chart". It is about putting scattered objects such as applications, services, containers, databases, alarms, changes, and events back into the same context, and answering "who is related to whom", "how does the impact spread", and "when did the problem begin".
It is against this background that Alibaba Cloud Observability gradually evolved from collecting and displaying scattered data such as logs, metrics, and traces to unified modeling of objects, relationships, and time series. UModel crystallized out of this practice.
Code understanding today sits at a strikingly similar point on that trajectory. Observability evolved from viewing fragmented logs to the unified modeling of the UModel knowledge graph; yet code understanding, even with the most advanced code-index solutions, remains at the stage of helping agents find relevant snippets — the snippets are found, but the structure is not understood.
Five Paradigms of Code Understanding
Before diving into the technical solution, it is necessary to clarify the complete landscape of current code understanding. The five paradigms represent the evolution from stateless search to stateful inference.
Paradigm 1: Agentic Search (Claude Code School)
Claude Code is currently the most extreme index-free route. Anthropic founding engineer Boris Cherny has publicly shared the story behind this decision: early versions of Claude Code used retrieval-augmented generation plus a local vector index, but internal tests found that agentic search won comprehensively — by a wide margin, which came as a surprise.
Its approach is pure to the point of elegance:
Agent receives a question
→ Glob: pattern matching by file name (near-zero token cost)
→ Grep (ripgrep): regex search by content (low token cost)
→ Read: read the complete file (high token cost)
→ Evaluate → next round of search or provide an answer
Tools are tiered by token cost, and the agent independently determines the search policy — like an experienced developer using rg + cat in the terminal to troubleshoot issues. This Unix-philosophy method has several real advantages:
● Zero pre-processing: no index build time required — open the project and start working immediately
● Always Fresh: No index expiration issues. Every search reflects the real-time file system status.
● Privacy-Friendly: Code never leaves your local machine — no embeddings are generated, and nothing is uploaded to any server.
● Simple and Reliable: The dependency chain is extremely short: Agent + file system + ripgrep. No vector database to crash.
But the ceiling of this approach is equally clear:
● No Structure Awareness: rg HandleRequest can find all occurrences, but cannot distinguish definitions from invocations or comments. The Agent has to read the code itself to determine this.
● Start from Scratch Every Time: Dependencies analyzed in the previous session are entirely discarded in the next. There is no persistence of accumulated knowledge.
● Limited scale: A TypeScript project with 200 files is fine, but for an enterprise-level monorepo with 50,000 files, agentic search may require 30+ rounds of tool calling and tens of thousands of tokens to piece together a global dependency graph. In practice, it is impossible to construct a complete global graph — only partial views relevant to the current job can be assembled.
● Unable to perform global analysis: Cannot answer "list all invocations across architecture levels" because the architecture levels themselves have not been modeled.
Paradigm 2: CodeIndex / Vector Index (Cursor, Windsurf, and Copilot School)
This is the mainstream technical approach of current AI IDEs. Taking Cursor as an example, its technical architecture has been extensively analyzed in public:
Code Repository
→ Parse into AST with tree-sitter
→ Segment by semantic unit (function, class, logic block)
→ Generate vector embedding
→ Store in Turbopuffer vector database
→ Merkle Tree tracks changes for incremental synchronization
Cursor has achieved several elegant optimizations in engineering: it uses Merkle Tree root hash comparison to detect changes every 10 minutes and only re-embeds changed files; 92% codebase similarity among team members allows index reuse, reducing the initial indexing for new members from minutes to seconds; the index scope is controlled via .cursorignore.
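The Merkle-style change detection described above can be sketched in a few lines of Go — a minimal illustration under stated assumptions, not Cursor's actual implementation: a root hash derived from sorted per-file content hashes lets a client answer "did anything change?" with one comparison, then diff per-file hashes to find exactly what to re-embed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fileHashes maps a relative path to the SHA-256 hash of its content.
type fileHashes map[string]string

func hashContent(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// rootHash derives a single Merkle-style root from all file hashes, so two
// snapshots can be compared with a single string comparison.
func rootHash(files fileHashes) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths) // deterministic order
	h := sha256.New()
	for _, p := range paths {
		h.Write([]byte(p))
		h.Write([]byte(files[p]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// changedFiles returns the paths whose hash differs between two snapshots;
// only these need re-embedding.
func changedFiles(old, cur fileHashes) []string {
	var changed []string
	for p, h := range cur {
		if old[p] != h {
			changed = append(changed, p)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	old := fileHashes{"a.go": hashContent("v1"), "b.go": hashContent("v1")}
	cur := fileHashes{"a.go": hashContent("v1"), "b.go": hashContent("v2")}
	if rootHash(old) != rootHash(cur) { // cheap "did anything change?" check
		fmt.Println(changedFiles(old, cur)) // → [b.go]
	}
}
```

A real implementation would hash directory subtrees so unchanged subtrees can be skipped entirely; the flat root here only shows the short-circuit idea.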
Windsurf (Codeium) uses a similar retrieval-augmented generation architecture: 768-dimensional vector embedding + proprietary M-Query retrieval, but additionally overlays the Cascade context engine to track edit history, terminal commands, navigation patterns, and other session states. GitHub Copilot achieved sub-second semantic search indexing in March 2025.
The real value of CodeIndex is semantic search: the agent can find relevant code by describing intent in natural language without knowing the exact function name. This is something grep cannot do.
But CodeIndex has a fundamental limitation: vector similarity is text-level approximate matching, not structure-level relational reasoning.
● import pkg/a2a is a deterministic dependency in code, but in vector space it is merely a similarity signal of a text segment.
● Finding all modules that directly or indirectly depend on pkg/a2a requires graph traversal, not similarity search.
● Determining how many hops the impact of this interface change propagates along the invocation chain requires deterministic call relationships, not semantic similarity.
● Augment Code's evaluation shows that Cursor produces inconsistencies in cross-file refactoring across 50+ files: the first 30 files are modified correctly, but the last 20 contain faults due to context window overflow.
CodeIndex is essentially a smarter search engine: it helps agents find the correct snippets to insert into the context, but does not perform structured inference for agents.
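The contrast is easy to make concrete: finding every module that transitively depends on a package is a plain breadth-first traversal over deterministic import edges — no similarity score can substitute for it. A minimal Go sketch (module names are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// imports maps each module to the modules it imports — deterministic
// facts that would come from AST parsing, not embeddings.
var imports = map[string][]string{
	"pkg/api/handler": {"pkg/a2a"},
	"pkg/server":      {"pkg/a2a", "pkg/config"},
	"cmd/app":         {"pkg/api/handler", "pkg/server"},
	"pkg/config":      {},
}

// dependents walks the reversed import graph breadth-first and returns
// every module that directly or transitively depends on target.
func dependents(target string) []string {
	reverse := map[string][]string{}
	for mod, deps := range imports {
		for _, d := range deps {
			reverse[d] = append(reverse[d], mod)
		}
	}
	seen := map[string]bool{}
	queue := []string{target}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, up := range reverse[cur] {
			if !seen[up] {
				seen[up] = true
				queue = append(queue, up)
			}
		}
	}
	out := make([]string, 0, len(seen))
	for m := range seen {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	// cmd/app is reached transitively via pkg/server and pkg/api/handler.
	fmt.Println(dependents("pkg/a2a"))
	// → [cmd/app pkg/api/handler pkg/server]
}
```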
Paradigm 3: Code Graph + Retrieval-Augmented Generation Hybrid (Qodo and Augment Code School)
Qodo and Augment Code represent the next evolutionary direction of CodeIndex: layering code structure graphs on top of vector indexes.
Qodo's technology stack is particularly rigorous:
● Self-developed Qodo-Embed-1 code embedding model (1.5B parameters surpassing 7B competitors on the CoIR benchmark), capturing syntax, variable dependencies, control flow, API usage, and other code-specific semantics through synthetic data training
● Client-side code graph building: functions, classes, modules and their call graphs, inheritance relationships, and cross-language links
● Server-side maintenance of vector database + design documents + architecture diagrams + PR/commit history
● AST-aware segmentation policy: recursively chunk along AST node boundaries and backfill key context such as import statements and class definitions
Augment Code's Context Engine goes even further:
● Semantic index across repositories to understand how services connect and depend on each other
● Index beyond Code: commit history (why changes were made), codebase patterns, external documents, tickets, and even tribal knowledge
● Released Context Lineage in 2025 to index commit histories and diff summaries, enabling agents to understand the evolution of architectural decisions
● Open to any compatible agent via MCP protocol, with benchmarks showing 30–80% quality improvement
The key advancement of this school of thought is that code is not just text, but a structured graph. Augment, in particular, demonstrates the insight that understanding requires context, and context requires history.
However, even the most advanced code graph + retrieval-augmented generation hybrid solutions still have several systemic boundaries:
● The graph scope is limited to the code domain: It knows that A invokes B, but not what alerts the service corresponding to B has triggered in the production environment. The code graph and the O&M graph are disconnected.
● Limited graph query capabilities: Graphs serving retrieval-augmented generation typically support neighbor lookup and short-path queries, but do not support arbitrary-depth graph traversal, pattern matching, or aggregation and analysis.
● IDE-local, not team-global: The index is attached to a developer's IDE instance. Structural insights analyzed by one person cannot be directly reused by another.
● Lack of a standardized time dimension: Augment's Context Lineage has started incorporating commit history, but build logs, deployment logs, test logs, and event logs — the complete temporal memory — are not yet in the graph.
Paradigm 4: CodeWiki / LLM Document (DeepWiki School)
DeepWiki (GitHub 15.7k stars, produced by the team behind Cognition AI / Devin) represents another approach: Code Repository → LLM → polished Wiki document. Simply replace github.com in the URL with deepwiki.com to see the automatically generated architecture diagrams, module documents, and function annotations.
This provides an excellent experience for developers to quickly understand unfamiliar projects. DeepWiki also supports controlling the generation scope through the .devin/wiki.json configuration file, and provides tool interfaces such as ask_question, read_wiki_structure, and read_wiki_contents via the MCP Server.
But documents are essentially linear narratives optimized for human reading:
● Hard to verify: Descriptions generated by LLMs may hallucinate, and in code understanding, an incorrect "A invokes B" is more dangerous than no information at all.
● Hard to traverse: Documents cannot answer graph traversal queries such as "list all functions that invoke X."
● Difficult to infer: Multi-hop analysis is not supported: if A is changed, following the calls relationship for 3 hops, which entry points are affected?
● Difficult to maintain: Changing a single line of code requires full regeneration. Although DeepWiki supports badge-triggered auto-refresh, each time it invokes a full LLM call, resulting in high cost and latency.
● Not programmable: The MCP interface essentially asks a document a question, rather than executing a query on the graph.
The relationship between CodeWiki and CodeIndex is similar to that between materialized views and ad-hoc query engines in the database realm: documents are precomputed views that answer preset questions quickly, but cannot answer ad-hoc queries outside the view.
Paradigm 5: Code Knowledge Graph (Our Choice)
The five paradigms can be arranged along a single axis: from "stateless search" to "stateful inference".

If Agentic Search is a fresh on-site survey every time, CodeIndex is surveying with a high-definition map, Code Graph + retrieval-augmented generation is a map annotated with highways and railways, and CodeWiki is a commissioned local chronicle — then what we want to build is a living GIS system: you can query the path between any two points, overlay real-time traffic data, annotate the traffic history of each road, continuously update as the terrain changes, and run spatial analysis along any dimension.
The key difference is not better search, but a systematic combination of three dimensions:
1. Deterministic vs. probabilistic: CodeIndex gives you the most likely relevant snippets (vector similarity). Code Graph gives you structural relationships parsed from the AST (but query capability is limited by the retrieval-augmented generation framing). We give you deterministic AST extraction plus arbitrary SPL/graph-match queries: confidence-1.0 relationships plus a Turing-complete query language.
2. Code domain vs. cross-domain: From Agentic Search to Code Graph + retrieval-augmented generation, every solution stops at the code domain. Which functions does this module invoke: answerable. How many alerts did the production service corresponding to this module have last week: unanswerable. UModel's EntitySetLink can connect code.module to ops.service, event.alert, and req.issue. The agent infers along the link without needing to jump out of the graph.
3. Snapshot vs. timeline: CodeIndex is a snapshot index of the current code. Code Graph is starting to incorporate commit history. We provide a complete time dimension: commit_log, build_log, deploy_log, test_log, and incident_log. Each LogSet is associated with an EntitySet through DataLink. The agent not only knows what the current structure is, but also how it evolved to this point and how it performs in production.
From Personal Wiki to Code Wiki: One Paradigm, Different Certainty
The personal wiki flow is: source data → LLM extracts entities and relationships → snapping and normalization → UModel structure layer → wiki pages. The entire extraction procedure depends on the LLM, so each relationship is inherently uncertain: are Zhang Cheng and Yuan Yi the same person? Is this article related to that project? Both require LLM judgment and correction by the snap layer.
There is one fundamental difference in the code realm: the structural relationships of code are deterministic.
import pkg/a2a imports pkg/a2a, and func (s *Server) HandleRequest() is a method of the Server class: these do not require LLM inference — AST parsing can determine them with a confidence level of 1.0.
This means that code wikis can introduce a model layer deterministic guarantee on top of the personal wiki paradigm:
Personal Wiki: Source material → [LLM extraction] → Snap → UModel → Wiki page
↑ Entirely dependent on LLM, confidence level 0.4–0.9
Code Wiki: Code repository → [AST deterministic extraction] + [LLM semantic enhancement] → UModel → CLI query
↑ Structural relationships deterministic (1.0) ↑ Summary/attribution supplement (0.6–0.9)
This layer of determinism is critical to the agent's reasoning: when the agent performs RCA, it needs to trust every hop on the invocation chain. If a calls relationship is guessed by the LLM, the entire reasoning chain becomes unreliable. Relationships extracted from the AST are deterministic facts that the agent can trust unconditionally.
At the same time, the code wiki retains the LLM enhancement capabilities of the personal wiki: semantic-layer information such as module summaries, document-code associations, and component attributions is still generated by the LLM, annotated as INFERRED, and the agent can selectively accept it.
Entity + Log + Link: Not Just a Structure Graph
The core design of UModel in the observability realm is to describe the IT world with a graph composed of sets and links: EntitySet describes the current state of entities, LogSet describes time-series events, MetricSet describes metrics, and Link connects them into a network.
When we apply the same modeling methodology to the code realm, we get more than just a structure graph.
Entity: Current Code Structure
Five types of EntitySets describe the current state of the code and support the coexistence of multiple repositories through repo_id composite primary keys:

repo_id participates in the primary key calculation (Entity ID = md5(repo_id:pk_value)), so that modules with the same name in different repositories do not conflict, and a single graph can accommodate multiple projects simultaneously.
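A minimal Go sketch of the Entity ID = md5(repo_id:pk_value) scheme described above, showing that same-named modules in different repositories get distinct IDs:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// entityID scopes a primary key by repo_id so that modules with the same
// name in different repositories do not collide in a shared graph.
func entityID(repoID, pkValue string) string {
	sum := md5.Sum([]byte(repoID + ":" + pkValue))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := entityID("vibeops-agents", "pkg/a2a")
	b := entityID("starops-cli", "pkg/a2a") // same module path, different repo
	fmt.Println(a != b)                     // → true: no collision across repos
}
```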
Six types of EntitySetLink describe structural relationships: contains, imports, calls, extends, describes, and belongs_to. Each relationship is annotated with confidence and extraction_method (EXTRACTED / INFERRED / AMBIGUOUS).
Log: The Change History of Code
This is a critical watershed between Code Wiki and all pure graph tools.
In the observability realm, we look at not only the current status of a pod (Entity), but also its logs and metric trends. Code is the same: looking only at the structure without the history is like looking at a single screenshot.
Logs in the code realm go far beyond Git commits:
The value of logs lies in the associated query with entities:
● Who modified this module in the last week? → commit_log WHERE module_path = X AND time > now()-7d
● Have any new incidents occurred since the last deployment? → deploy_log JOIN incident_log ON time_window
● Has the build time increased after introducing this dependency? → build_log GROUP BY week, cross-referenced with dependency change times in commit_log
Each LogSet is associated with the corresponding EntitySet through DataLink. The agent can navigate from an entity to a log, or trace back from a log to an entity.
Cross-Domain Association: Code Is Not an Island
Code never exists in isolation. It serves requirements, reaches production through CI/CD, generates observability data at runtime, and is traced back to for troubleshooting when issues arise. In the current toolchain, each link is an island: requirements are in Jira, code is in Git, builds are in Jenkins, services run in K8s, and alerts are in the monitoring system.
When a production alert fires, how many systems must you jump through and how many pieces of info must you manually correlate to trace from the alert back to the code change?
The value of UModel is that all these entities can live in the same graph.
Technical Architecture: Dual-Track Fetch + Graph Build
Overall Pipeline

DETECT: Incremental Change Detection
A SHA256 content fingerprint is computed for each file and compared against the cache from the last build. For vibeops-agents (~2,375 Go files), an incremental build typically processes only dozens of changed files, reducing the time from minutes to seconds.
EXTRACT: AST + LLM Dual Track
AST track (tree-sitter): an incremental parser that supports 40+ languages. It uses tags.scm rules to consistently extract definitions, references, structural relationships, import relationships, invocation relationships, and inheritance relationships across languages. All extraction results have a confidence level of 1.0.
Notably, CodeIndex solutions such as Cursor also use tree-sitter. However, they use tree-sitter for semantic text segmentation (splitting code into chunks suitable for embedding), whereas we use tree-sitter for structure extraction (pulling out deterministic relationships such as definitions, references, invocations, and inheritance). The same parser serves completely different goals: the former produces vectors, the latter produces a graph.
LLM track: module summaries (agent context-injection segments, not human-readable documents), document-code associations, and component attribution. Each is annotated with extraction_method: INFERRED plus a confidence level. Agents can select a trust threshold by scenario: RCA prefers high confidence, while exploration scenarios can be relaxed.
RESOLVE: Cross-file Symbol Parsing
Single-file AST cannot resolve cross-file references. RESOLVE handles the following:
● Go import github.com/org/repo/pkg/a2a → module_path pkg/a2a
● Method receiver type (s *Server) → attribution to code.type pkg/server.Server
● Invocation s.HandleRequest() → pkg/server.Server.HandleRequest
● Interface implementation type Adapter struct implements Handler → extends relationship
Deterministic parsing, no dependency on LLM.
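Two of the resolution rules above can be sketched in Go. The string handling here is deliberately naive — the real resolver works on AST nodes, not raw strings — but it shows that the mapping is rule-based, with no LLM involved:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveImport maps a full Go import path to the module_path used as the
// graph primary key, given the repository prefix.
func resolveImport(importPath, repoPrefix string) string {
	return strings.TrimPrefix(importPath, repoPrefix+"/")
}

// receiverType extracts the type name from a receiver like "(s *Server)".
func receiverType(recv string) string {
	recv = strings.Trim(recv, "()")
	fields := strings.Fields(recv) // e.g. ["s", "*Server"]
	return strings.TrimPrefix(fields[len(fields)-1], "*")
}

// qualifyMethod attributes a method to its fully qualified type.
func qualifyMethod(pkg, recv, method string) string {
	return pkg + "." + receiverType(recv) + "." + method
}

func main() {
	fmt.Println(resolveImport("github.com/org/repo/pkg/a2a", "github.com/org/repo"))
	// → pkg/a2a
	fmt.Println(qualifyMethod("pkg/server", "(s *Server)", "HandleRequest"))
	// → pkg/server.Server.HandleRequest
}
```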
BUILD: Graph Assembly + Architecture Discovery
Architecture discovery is not simple community detection: Louvain/Leiden discovers clusters, not architectures. Complete flow:
Step 1: Graph construction
Modules as nodes, imports + calls + extends as directed edges
Edge weight: calls > imports > extends
Step 2: Hierarchical analysis
Compute dependency directionality: A→B and B↛A → A is above B
Detect top-level entries with indegree = 0 and underlying infrastructure with outdegree = 0
Step 3: Community detection
Leiden algorithm discovers functional clusters on directed graphs
Resolution parameter controls granularity (~150 modules → ~15 components)
Step 4: Annotation and naming
Annotate hierarchy based on dependency direction: API/Gateway, Service/Business, Infrastructure/Utility
LLM naming and description, cross-validation with project documents
The output is a hierarchical, directional, named architecture view. The agent can use this to determine whether an invocation crosses architecture layers.
SYNC: Synchronize to UModel
Entity write: starops umodel post-logs → __entity logstore
Topo write: starops umodel post-logs → __topo logstore
Schema synchronization: starops umodel sync (register EntitySet/Link definitions)
The UModel backend is based on the Simple Log Service storage engine and inherits capabilities such as high-throughput writes, second-level query, graph-match graph traversal, SQL aggregation, and full-text index.
SERVE: Engineering Details of the Query
Key patterns explored in practice:
Two-step query: graph-match returns entity_id without business fields. All graph traversal queries first traverse the topology to obtain the ID set, then pull business fields in batches:
Step 1: .topo | graph-match (n1:code@code.module {__entity_id__: '<id>'})
-[e]->(n2) project n1, e, n2
Step 2: .entity with(domain='code', name='code.module', ids=['id1','id2',...])
Aggregation via direct Simple Log Service (SLS) query: Statistical queries such as hot spot analysis directly run SQL against the __topo Logstore:
SELECT dest_entity_id, count(1) as import_count
FROM log WHERE relation_type = 'imports'
GROUP BY dest_entity_id
ORDER BY import_count DESC LIMIT 20
At the current multi-repository scale (~11,000 entities, ~19,000 edges, including the vibeops-agents and starops-cli projects), the end-to-end latency of a single query is in the hundreds of milliseconds.
Agent Interaction Layer: Command-Line Interface (CLI) + Skill
CLI Design
The agent's reasoning is progressive: search first, see the results, then decide the next step. The CLI's search → context → impact naturally matches this pattern and supports batch execution and pipeline composition.
code-wiki query <subcommand> # graph query
├── search <keyword> # entity search
├── context <name> # full context of a symbol
├── impact <path> # change impact analysis
├── callers / callees # invocation chain
├── deps / rdeps # dependencies / reverse dependencies
code-wiki check <subcommand> # administration check
├── arch # architecture violation scan
└── hotspots # coupling hot spots
code-wiki ingest # build/update graph
code-wiki status # health check
Subcommands are organized by agent intent. The agent does not need to know whether the underlying implementation is graph-match or Simple Log Service SQL — it simply runs impact to view the impact scope.
Output Format: Optimized for the Agent Context Window
The default --format brief output is optimized for the agent's token budget:
$ code-wiki query context pkg/a2a
Module: pkg/a2a
LOC: 1,247 | Language: Go | Component: a2a-protocol
Summary: A2A protocol implementation for agent-to-agent communication
Types (17): TaskStore(struct), A2AServer(struct), AgentCard(struct), ...
Functions (52): HandleA2ARequest[entry], StartA2AServer[entry], ...
Reverse dependencies (9): pkg/api/handler, pkg/server, cmd/vibeops-agents, ...
Component crossings: → api, → scheduler
The output of a query context is < 500 tokens. Use --format json when full data is required.
Skill: Scenario-based User Guide
Agent Skills are organized by scenario on top of the command-line interface (CLI). Agents do not need to learn SPL syntax:
## RCA: From alerting to code
code-wiki query search <keyword> # Locate module
code-wiki query context <module> # Understand structure
code-wiki query callers <function> # Trace invocation chain
## Development: Evaluate impact before changing code
code-wiki query impact <module> # Impact scope
code-wiki query deps / rdeps # Dependencies
## Administration: Regular checks
code-wiki check arch # Architecture violations
code-wiki check hotspots # Hot spot analysis
Case Study
Case 1: Agent Independently Completes Change Impact Assessment
We assign a job to a sub-agent:
Assess the impact scope of modifying the pkg/a2a module in the vibeops-agents project, including affected modules and entry points, upstream and downstream dependencies, and architecture threats.
The sub-agent completed the following inference using only the code-wiki command-line interface (CLI) and Skill, without reading any source code files:
$ code-wiki query search a2a
Found 9 entities matching a2a:
[module] pkg/a2a LOC:1247 A2A protocol core
[module] pkg/a2a/adapter LOC:834 A2A adapter implementations
[module] pkg/a2a/taskstore LOC:567 A2A task state management
...
$ code-wiki query context pkg/a2a
Module: pkg/a2a
Types (17): TaskStore, A2AServer, AgentCard, ...
Functions (52): HandleA2ARequest[entry], StartA2AServer[entry], ...
Reverse dependencies (9): pkg/api/handler, pkg/server, cmd/vibeops-agents, ...
$ code-wiki query impact pkg/a2a
Impact of pkg/a2a:
Direct dependents (9): pkg/api/handler, pkg/server, cmd/vibeops-agents, ...
Entry points affected: HandleA2ARequest, StartA2AServer, main
Component crossings: a2a-protocol → api, a2a-protocol → server
$ code-wiki check hotspots
Top hotspots by import count:
1. pkg/a2a/adapter (48 imports) ← High coupling warning
2. pkg/util/logger (35 imports)
3. pkg/config (32 imports)
Agent final output:
=== A2A Protocol Modification Impact Assessment ===
Scope of Impact:
- Directly dependent modules: 9 (pkg/api/handler, pkg/server, etc.)
- Entry points affected: 3 (HandleA2ARequest, StartA2AServer, main)
- Cross-component borders: 2 (a2a-protocol → api, a2a-protocol → server)
Threat points:
- pkg/a2a/adapter is imported by 48 modules, the highest coupling degree in the entire project
- bus factor = 1, consider adding a reviewer
Suggested execution order:
1. Modify the pkg/a2a core protocol definition first
2. Synchronize the interface implementation of pkg/a2a/adapter
3. Modify the invocation method of pkg/api/handler
4. Update the initialization logic of pkg/server
5. Regression testing on the cmd/vibeops-agents entry point
5 commands in total, each taking 1–3 seconds, with a total time of less than 15 seconds. The Agent did not read any source code files and completed a full impact assessment solely through graph queries.
Case 2: RCA: From Alerting to Code Root Cause
Production alerting: service-vibeops error_rate > 5%
# 1. Locate the code module from the O&M entity
$ code-wiki query context pkg/server
Module: pkg/server
Functions (23): StartServer[entry], handleRequest, applyMiddleware, ...
Dependencies (12): pkg/a2a, pkg/config, pkg/auth, ...
# 2. Trace the invocation chain to locate the potentially faulty downstream
$ code-wiki query callees pkg/server.handleRequest
Callees of handleRequest:
pkg/auth.ValidateToken [component: auth]
pkg/a2a.HandleA2ARequest [component: a2a-protocol]
pkg/scheduler.DispatchTask [component: scheduler]
# 3. Check commit_log and find that the a2a module was changed 2 hours ago
# author=xxx, message=refactor adapter interface
# 4. Confirm the impact of the change
$ code-wiki query impact pkg/a2a
Impact of pkg/a2a:
Direct dependents (9): pkg/api/handler, pkg/server, ...
Entry points affected: HandleA2ARequest, StartA2AServer
# → Root cause: The a2a interface refactoring affected the server invocation chain. Check interface compatibility.
Case 3: Architecture Administration: Detecting Architecture Decay
# 1. Scan for architecture violations
$ code-wiki check arch
Architecture violations:
pkg/util/logger calls pkg/api/handler.GetRequestID
[utility → api] The utility layer should not invoke the api layer
pkg/config calls pkg/scheduler.GetDefaultConfig
[infra → service] The infrastructure layer should not depend on the business layer
# 2. Identify coupling hot spots
$ code-wiki check hotspots
Top hotspots:
1. pkg/a2a/adapter 48 imports [HIGH]
2. pkg/util/logger 35 imports [NORMAL]
3. pkg/scheduler/queue 28 imports [MEDIUM]
# 3. Analyze the highly coupled module in depth
$ code-wiki query rdeps pkg/a2a/adapter
Reverse dependencies (48):
pkg/api/* (12 modules), pkg/server/* (8 modules), pkg/scheduler/* (6 modules), ...
# Agent suggests splitting into adapter/protocol, adapter/transform, and adapter/routing
Outlook
Comprehensive Quantitative Evaluation
We plan to build a standardized code comprehension evaluation benchmark covering core scenarios such as impact analysis, invocation chain tracing, architecture violation detection, and RCA root cause localization. On real codebases of varying scales, we will compare the performance of three paradigms — Model + Bash (Agentic Search), Model + CodeWiki (LLM document), and Model + UModel (knowledge graph) — across dimensions including accuracy, recall rate, number of inference steps, and token consumption.
SWE-bench-style quantitative evaluation makes the capability boundaries of each paradigm measurable and reproducible. Based on benchmark scores, we will then optimize the overall technical architecture, including iterative upgrades to the related Skills and the command-line interface (CLI).
Agent Self-Maintenance
Agents are not just graph consumers; they can also be maintainers:
● After a code schema evolution, the associated LLM-inferred relationships are marked for reevaluation
● Regularly inspect orphaned entities, missing relationships, and expired data
● On top of the above capabilities, a verification and quality assessment system is also needed to make self-maintenance controllable.
Architecture Guard Gate
Integrated into the CI flow, automatically run on PR:
code-wiki ingest --incremental # Incremental graph update
code-wiki check arch # Architecture violation check
code-wiki query impact <changed_files> # Change impact analysis
From Observable to Understandable
From modeling observability data to modeling code knowledge, from describing running systems with Entity + Log to describing code systems with Entity + Log — UModel is evolving from observing IT systems to understanding the code and processes that build them.
When agents truly understand the structure, history, and production performance of code simultaneously, genuinely AI-native software engineering becomes possible.