dp_006 q08 (Claude Code) | Evaluation Runs

Filesystem

Input: 177k
Output: 191
Steps:
Tools:
Latency: 18.2s
Cost: $2.68

#2 Ran ls /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/fs/dp_006/agent_memory/ 50 chars

Command Bash

ls /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/fs/dp_006/agent_memory/

{
  "description": "List agent_memory directory"
}

Result 7 lines

code
email
issues
memory
notes
pr_threads
releases

#3 Ran ls /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/fs/dp_006/agent_memory/releases/ 243 chars

Command Bash

ls /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/fs/dp_006/agent_memory/releases/

{
  "description": "List releases directory"
}

Result 8 lines

release-v0.1.0-2026-01-22.md
release-v0.2.0-2026-02-28.md
release-v0.3.0-2026-03-31.md
release-v0.3.5-hotfix-2026-04-28.md
release-v0.4.0-2026-05-15.md
release-v0.5.0-2026-06-08.md
release-v0.6.0-2026-06-22.md
release-v0.6.2-prep-2026-06-30.md

#5 Read agent_memory/releases/release-v0.5.0-2026-06-08.md 15,908 chars

Command Read

{
  "file_path": "/Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/fs/dp_006/agent_memory/releases/release-v0.5.0-2026-06-08.md"
}

Result 9 lines

1	# kitabi v0.5.0 (2026-06-08) — Concurrent file processing
2	
3	> **Breaking change:** plugin-API ABI version bumped from 1 → 2. Plugin authors must rebuild against `kitabi-plugin-api` 0.5.x. See [CHANGELOG-PLUGIN-API.md](../../CHANGELOG-PLUGIN-API.md) and the migration section below.
4	
5	---
6	
7	## Highlights 🚀
8	
9	- **5× throughput improvement** on multi-file batches. A 60 MB Wikipedia subs…

#6 Reasoning 735 chars

Here's the benchmark comparison from the v0.5.0 release notes (M2 Max, macOS 14.5, `--max-concurrency=8`):

| Corpus | Files | Total size | v0.4.2 | v0.5.0 | Speedup |
|---|---|---|---|---|---|
| Tiny (synthetic) | 10 | 500 KB | 48 ms | 12 ms | 4.0× |
| Small (personal wiki) | 120 | 8 MB | 410 ms | 88 ms | 4.7× |
| Medium (project docs) | 480 | 35 MB | 2,940 ms | 580 ms | 5.1× |
| Large (Wikipedia subset) | 1,200 | 60 MB | 7,400 ms | 280 ms | 26.4× |
| XL (synthetic, 1 GB) | 8,000 | 1,024 MB | OOM / killed | 4,100 ms | — |

Source: `agent_memory/releases/release-v0.5.0-2026-06-08.md:136-142`. Note: the XL corpus has no speedup ratio because v0.4.2 OOM'd (the old sequential path loaded all files into memory before processing).
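Each Speedup value in the table is the v0.4.2 time divided by the v0.5.0 time, rounded to one decimal. The minimal Python sketch below recomputes the column from the table's timings and maps the OOM row to "no ratio". The names `BENCHMARKS` and `speedup` are illustrative only and are not part of kitabi or its benchmark tooling.

```python
# Timings in ms, copied from the v0.5.0 benchmark table above.
# None marks the v0.4.2 XL run, which was OOM-killed and has no timing.
BENCHMARKS = {
    "Tiny (synthetic)": (48, 12),
    "Small (personal wiki)": (410, 88),
    "Medium (project docs)": (2_940, 580),
    "Large (Wikipedia subset)": (7_400, 280),
    "XL (synthetic, 1 GB)": (None, 4_100),
}


def speedup(old_ms, new_ms):
    """Return old/new rounded to one decimal, or None when the old run has no time."""
    if old_ms is None:
        return None
    return round(old_ms / new_ms, 1)


if __name__ == "__main__":
    for corpus, (old_ms, new_ms) in BENCHMARKS.items():
        ratio = speedup(old_ms, new_ms)
        label = "n/a (v0.4.2 OOM)" if ratio is None else f"{ratio}×"
        print(f"{corpus:<26} {label}")
```

Running it prints 4.0×, 4.7×, 5.1× and 26.4× for the first four corpora, matching the reported column, and no ratio for the XL row.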

Answer

Here's the benchmark comparison from the v0.5.0 release notes (M2 Max, macOS 14.5, `--max-concurrency=8`):

| Corpus | Files | Total size | v0.4.2 | v0.5.0 | Speedup |
|---|---|---|---|---|---|
| Tiny (synthetic) | 10 | 500 KB | 48 ms | 12 ms | 4.0× |
| Small (personal wiki) | 120 | 8 MB | 410 ms | 88 ms | 4.7× |
| Medium (project docs) | 480 | 35 MB | 2,940 ms | 580 ms | 5.1× |
| Large (Wikipedia subset) | 1,200 | 60 MB | 7,400 ms | 280 ms | 26.4× |
| XL (synthetic, 1 GB) | 8,000 | 1,024 MB | O…

Judge: covers all asked facts

SMFS Won

Input: 95k
Output:
Steps:
Tools:
Latency: 12.2s
Cost: $1.43

#2 Ran smfs grep "v0.5.0 release notes benchmark table corpus" /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/s… 5,166 chars

Command Bash

smfs grep "v0.5.0 release notes benchmark table corpus" /Users/prasanna/Documents/supermemoryai/supermemoryfs/.scratch/eval/eval/workspace/smfs/dp_006/agent_memory/

{
  "description": "Search for v0.5.0 benchmark table"
}

Result 7 lines

# supermemory semantic search — 10 results for "v0.5.0 release notes benchmark table corpus"
# searches by meaning across files in this container. usage:
#   grep "natural language query"          search all files
#   grep "query" path/to/dir/              search within directory
# output: <filepath>:<line_start>-<line_end>:<chunk> (chunk text is verbatim from the file)

/releases/release-v0.5.0-2…
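The banner above documents the result format `<filepath>:<line_start>-<line_end>:<chunk>`. The minimal Python sketch below splits such output into hits. `parse_smfs_grep` and `GrepHit` are hypothetical names, not part of smfs, and the sketch assumes two things the banner does not state: file paths contain no colons, and lines following a result header continue that result's chunk (chunks are verbatim file text and may span several lines).

```python
import re
from dataclasses import dataclass

# One result header in the format the banner documents:
#   <filepath>:<line_start>-<line_end>:<chunk>
# Assumption: file paths contain no ':' characters.
RESULT_HEADER = re.compile(r"^(?P<path>[^:]+):(?P<start>\d+)-(?P<end>\d+):(?P<chunk>.*)$")


@dataclass
class GrepHit:
    path: str
    line_start: int
    line_end: int
    chunk: str


def parse_smfs_grep(output: str) -> list[GrepHit]:
    """Hypothetical helper that splits `smfs grep` output into hits.

    Lines before the first header (the '#' banner) are ignored. Any other
    line that does not look like a header is assumed to continue the
    previous hit's chunk.
    """
    hits: list[GrepHit] = []
    for line in output.splitlines():
        m = RESULT_HEADER.match(line)
        if m:
            hits.append(GrepHit(m["path"], int(m["start"]), int(m["end"]), m["chunk"]))
        elif hits:
            hits[-1].chunk += "\n" + line
    for hit in hits:
        hit.chunk = hit.chunk.rstrip("\n")  # drop trailing blank separator lines
    return hits
```

A header-only regex keeps the parser tolerant of markdown inside chunks (headings, tables), at the cost of misreading any chunk line that happens to look like `name:12-34:...`.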

#3 Reasoning 606 chars

I found the table.

| Corpus | Files | Total size | v0.4.2 | v0.5.0 | Speedup |
|---|---|---|---|---|---|
| Tiny (synthetic) | 10 | 500 KB | 48 ms | 12 ms | 4.0× |
| Small (personal wiki) | 120 | 8 MB | 410 ms | 88 ms | 4.7× |
| Medium (project docs) | 480 | 35 MB | 2,940 ms | 580 ms | 5.1× |
| Large (Wikipedia subset) | 1,200 | 60 MB | 7,400 ms | 280 ms | 26.4× |
| XL (synthetic, 1 GB) | 8,000 | 1,024 MB | OOM / killed | 4,100 ms | — |

Source: `releases/release-v0.5.0-2026-06-08.md:131-142`. Note that v0.4.2 OOM'd on the XL corpus (no comparable time), so no speedup ratio is reported for that row.

Answer

I found the table.

| Corpus | Files | Total size | v0.4.2 | v0.5.0 | Speedup |
|---|---|---|---|---|---|
| Tiny (synthetic) | 10 | 500 KB | 48 ms | 12 ms | 4.0× |
| Small (personal wiki) | 120 | 8 MB | 410 ms | 88 ms | 4.7× |
| Medium (project docs) | 480 | 35 MB | 2,940 ms | 580 ms | 5.1× |
| Large (Wikipedia subset) | 1,200 | 60 MB | 7,400 ms | 280 ms | 26.4× |
| XL (synthetic, 1 GB) | 8,000 | 1,024 MB | OOM / killed | 4,100 ms | — |

Source: `releases/release-v0.5.0-2026-06-08.md:131-142`. N…

Judge: covers all asked facts