library-pipeline · the manual walk-through, scan to shelf
Skip this if C:\scans\<Title>\ already has your scanned pages in it.
Before anything else, get a copy of your scanned pages into C:\scans\<Title>\. This folder is permanent, untouched storage — the pipeline never writes to it, and you never sort or organize anything here. Just get the raw pages in.
Wherever your scans currently live (scanner output folder, a temp folder, a USB drive — anywhere), copy them — don't move them — so your true originals stay put no matter what happens later.
C:\scans\<Title>\ contains your scanned page images, and your original copies elsewhere are untouched.
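If you would rather script the copy than drag files by hand, the following is a minimal sketch. The helper name `copy_scans` and the list of image extensions are assumptions made for this example; they are not part of library-pipeline. It copies and never moves, so the originals stay where they are.

```python
import shutil
from pathlib import Path

# Assumed scan formats; adjust to whatever your scanner produces.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def copy_scans(source_dir, scans_dir):
    """Copy (never move) page images from source_dir into scans_dir.

    Returns the names of the files that were copied on this call.
    """
    source = Path(source_dir)
    target = Path(scans_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for page in sorted(source.iterdir()):
        if not page.is_file() or page.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip subfolders and non-image files
        destination = target / page.name
        if destination.exists():
            continue  # never overwrite a page that is already stored
        shutil.copy2(page, destination)  # copy2 also keeps file timestamps
        copied.append(destination.name)
    return copied
```

Call it with wherever your scans currently live as the first argument and your `C:\scans\<Title>\` folder as the second. Running it twice is harmless, because existing files are skipped.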
Skip this if C:\library-pipeline\<Title>\book_config.toml already exists.
"Temporary" means: chapter-folder sorting (below) goes away once automatic chapter detection is built. Until then, this is the manual setup every new book needs.
1a. Create the working folder. Use the exact same title you'll use everywhere else in this checklist — capitalization and punctuation included.
1b. Copy the sample config into it. This file always lives in the same spot for every book: inside that book's own working folder, never in the code repo.
1c. Open that new book_config.toml (Notepad or VS Code) and edit two lines, the title and the author. Everything else can stay as shipped.
1d. Set ignore_zones — ask Claude to measure it for you. Almost every book has some repeating header or footer (book title at the top, "Page X of Y" at the bottom, etc.) that you don't want baked into the cleaned text on every single page. Rather than measuring pixels by hand, upload 1–2 sample page images straight into this chat and ask Claude to find the coordinates:
Upload 1-2 sample pages from C:\scans\<Title>\, then ask: "Here are 1-2 sample pages from this book. Can you measure the pixel coordinates of any repeating header/footer band and give me an ignore_zones entry for book_config.toml?" Claude will measure the actual image (not guess from looking) and hand back a ready-to-paste ignore_zones block.
Paste that block into book_config.toml in place of the commented-out example, matching the indentation already there.
No header or footer on this book at all? Leave ignore_zones commented out — nothing to exclude.
1e. Organize scans into chapter folders. Create 00_source\Chapter_01\, Chapter_02\, etc. (zero-padded) inside the working folder, and copy pages from C:\scans\<Title>\ into the matching chapter folder. Front matter goes in Chapter_00\.
Not ready to sort by chapter? Put every page in a single Chapter_01\ folder instead, in correct reading order. You'll get one unbroken 02_cleaned.md with no chapter breaks, which you can split later. This only works cleanly if your filenames already sort into correct page order (sequential numbers or timestamps).
book_config.toml exists with the right title/author, and Chapter_01 (at minimum) has page images in it.
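The chapter sorting in 1e can also be scripted. This is a rough sketch, not part of library-pipeline: the helper name `sort_into_chapters` and its `first_pages` argument (the filename of the first page of Chapter 1, then of Chapter 2, and so on) are made up for illustration. It assumes your filenames already sort into page order, and it sends everything before the first listed page to Chapter_00 as front matter.

```python
import shutil
from pathlib import Path


def sort_into_chapters(scans_dir, working_dir, first_pages):
    """Copy pages into 00_source/Chapter_NN folders (zero-padded).

    first_pages[0] is the filename that starts Chapter 1, first_pages[1]
    starts Chapter 2, and so on. Pages before first_pages[0] land in
    Chapter_00. Returns {folder name: [page names]} for a quick review.
    """
    pages = sorted(p for p in Path(scans_dir).iterdir() if p.is_file())
    starts = list(first_pages)
    chapter = 0
    placed = {}

    for page in pages:
        # Move on to the next chapter when we reach its first page.
        if chapter < len(starts) and page.name == starts[chapter]:
            chapter += 1
        folder = Path(working_dir) / "00_source" / f"Chapter_{chapter:02d}"
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(page, folder / page.name)  # copy, so C:\scans stays untouched
        placed.setdefault(folder.name, []).append(page.name)
    return placed
```

Check the returned mapping before moving on: a mistyped filename in `first_pages` means that chapter never starts, and its pages stay in the previous folder.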
Open the Umi-OCR app (desktop shortcut, or double-click the .exe directly — see note below if you don't have a shortcut yet). Inside the app, toggle the HTTP service ON. It defaults to off every time you open it.
No desktop shortcut? Open C:\tools\Umi-OCR_Paddle_v2.1.5\ in File Explorer, find Umi-OCR.exe, right-click it → Send to → Desktop (create shortcut). One-time fix.
Sanity check it's actually listening — this works from anywhere, no folder matters:
You get a response back (even a "decode failed" error counts — that means the API is up).
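The same sanity check can be run from Python. This is a hedged sketch: the URL assumes Umi-OCR's usual local port (1224) and the `/api/ocr` path, and the JSON field name `base64` is likewise an assumption about the request shape, so adjust them if your install differs. The helper name `service_is_up` is made up for this example. Any HTTP answer counts as "up", which matches the note that even a "decode failed" error is a pass.

```python
import json
import urllib.error
import urllib.request

# Assumption: default local address of the Umi-OCR HTTP service.
UMI_OCR_URL = "http://127.0.0.1:1224/api/ocr"


def service_is_up(url=UMI_OCR_URL, timeout=5.0):
    """Return True if anything answers HTTP at url, False if nothing is listening."""
    # Deliberately bogus image data: we only care that the server replies.
    payload = json.dumps({"base64": "not-an-image"}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
        return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the HTTP service is probably off.
        return False
```

If this returns False, the most likely cause is that the HTTP service toggle inside the app is still off.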
Open these two files in a text editor (paths shown are exact, full paths — paste straight into File Explorer's address bar if that's easier than browsing):
In both files, find the line near the top that looks like this, and change the text in quotes to match your working-folder name exactly — same capitalization, same punctuation, no extra words:
BOOK_TITLE is identical in both files, and matches the folder name under C:\library-pipeline\ exactly.
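A mismatch between the two files is easy to miss by eye, so here is a small sketch that checks it. It assumes each script sets the title on a single line shaped like `BOOK_TITLE = "..."`; that line format, and the helper names `read_book_title` and `titles_match`, are assumptions for illustration only.

```python
import re
from pathlib import Path

# Assumed line shape: BOOK_TITLE = "Some Title" (single or double quotes).
TITLE_LINE = re.compile(r'^\s*BOOK_TITLE\s*=\s*(["\'])(.*?)\1', re.MULTILINE)


def read_book_title(script_path):
    """Return the quoted BOOK_TITLE value from a script, or None if absent."""
    text = Path(script_path).read_text(encoding="utf-8")
    match = TITLE_LINE.search(text)
    return match.group(2) if match else None


def titles_match(script_paths, pipeline_root):
    """True only if every script has the same title and a folder with
    exactly that name (same capitalization and punctuation) exists."""
    titles = {read_book_title(path) for path in script_paths}
    if len(titles) != 1 or None in titles:
        return False
    title = titles.pop()
    folders = {p.name for p in Path(pipeline_root).iterdir() if p.is_dir()}
    return title in folders  # exact, case-sensitive comparison
```

Pass it the two script paths from this step and `C:\library-pipeline` as the root; a False result means one of the three names differs.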
This command must run from the code folder — the place the actual program lives, not the per-book working folder. Both lines below go in the same PowerShell window, one after the other (or paste each into a brand-new window — either works, since the path is spelled out in full each time):
This reads every page in this book's 00_source\ (over in the data folder) and writes one OCR text file per page into 01_extracted\, right next to it. You don't need to be "in" that folder for this to work; the script finds it using the BOOK_TITLE you set in Step 3.
It prints "No failures." and "Done: N pages -> <output_dir>".
Same code folder as Step 4. If you closed your terminal, run the cd line again first:
This writes the pivot file, 02_cleaned.md, into this book's data folder, plus a book_metadata.toml with stats.
It prints "Written: <path>" and "Size: <N> bytes".
Open this exact file in Notepad — it's in the data folder, not the code folder:
Skim for red-underline spell-check marks — you're scanning, not reading every word. Some misses are fine.
If needed, add ## Front Matter as the very first line of the file, or the audio step will truncate the opening.
If a repeating header or footer leaked into the text, add an ignore_zones block to book_config.toml (same file as Step 1, data folder).
To re-run Step 5 over a hand-edited file, add force=True to the assemble_book(...) call in run_clean_assemble.py for this one run. Remove it again right after: leaving it on permanently disables the protection for edits you DO want to keep later.
If you hand-edit this file, re-running Step 5 afterward is blocked unless you pass force=True. That's intentional, to protect your edits. Don't force it unless you mean to throw the edits away.
The prose reads cleanly, and the first line is "## Front Matter" if needed.
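The first-line check can be automated with a short sketch. The checklist does not spell out when the heading is "needed", so this example assumes it is needed whenever the first non-blank line is not already a `## ` heading. That rule and both helper names are assumptions, not pipeline behavior.

```python
FRONT_MATTER = "## Front Matter"


def needs_front_matter(text):
    """True if the first non-blank line is not a '## ' heading (assumed rule)."""
    for line in text.splitlines():
        if line.strip():
            return not line.startswith("## ")
    return False  # an empty file has no opening to protect


def add_front_matter(text):
    """Return text with the Front Matter heading as its very first line,
    or the text unchanged if it already opens with a heading."""
    return FRONT_MATTER + "\n\n" + text if needs_front_matter(text) else text
```

Writing the result back to 02_cleaned.md counts as a hand edit, so the force=True protection described above applies to the next Step 5 run.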
This step leaves the new pipeline entirely. The audio renderer isn't built yet, so audio still comes from the old v1 script, which takes 02_cleaned.md as its input — the same file you just reviewed.
Once it runs, the audio gets packaged with ffmpeg. It must re-encode, not stream-copy:
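To show the difference between re-encoding and stream-copying, here is a minimal sketch that builds an ffmpeg argument list. The input filename, the AAC codec, and the 64k bitrate are all assumptions chosen for illustration; the v1 script's actual output format and the settings you want may differ. The point is only that the audio codec is named explicitly (`-c:a aac`) rather than set to `copy`.

```python
def build_m4b_command(input_audio, output_m4b, bitrate="64k"):
    """Build an ffmpeg command that re-encodes audio to AAC in an .m4b file.

    Stream-copying would use '-c copy' (or '-c:a copy') instead, which
    repackages the existing audio without re-encoding it.
    """
    return [
        "ffmpeg",
        "-i", str(input_audio),   # source audio (assumed name and format)
        "-vn",                    # ignore any video or cover-art stream
        "-c:a", "aac",            # re-encode the audio stream as AAC
        "-b:a", bitrate,          # target audio bitrate (assumed value)
        str(output_m4b),
    ]


# Example use (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(build_m4b_command("book.wav", "book.m4b"), check=True)
```

Passing the command as a list rather than one string avoids quoting problems with titles that contain spaces or apostrophes.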
If a partial .m4b or .mp3 already exists, the resume logic will skip the work. Delete the partial file first to force a redo.
A finished .m4b (or per-chapter .mp3 files) exists.
Copy the finished audio to the NAS, then trigger an Audiobookshelf rescan.
The title shows up in Audiobookshelf after the rescan.