installable web evidence capture

Install the crawler. Give agents receipts.

BitterCrawl gives the Bitter CLI a local web reader: download the CLI, install the crawl package, then turn URLs into markdown, source artifacts, hashes, and receipts an agent can cite.

01 / install

From zero to a cited crawl in one terminal session.

BitterCrawl is available today as a local package and standalone CLI. Install the repo on `BITTER_PLUGIN_PATH` and the Bitter CLI can dispatch `bitter crawl` to the local plugin.

1

Download the Bitter CLI

curl -fsSL https://install.bitter.sh | sh
~/.local/bin/bitter --no-update version
2

Download the plugin

mkdir -p ~/.bitter/plugins
git clone https://github.com/sheetgenius/crawl.git \
  ~/.bitter/plugins/bittercrawl-plugin
cd ~/.bitter/plugins/bittercrawl-plugin
bun install
3

Register the plugin path

export BITTER_PLUGIN_PATH="$HOME/.bitter/plugins"

Add that export to your shell profile if you want the plugin available in every terminal.

4

Crawl through Bitter

bitter crawl https://example.com \
  --out .bitter/crawl/runs \
  --json

This routes through the `bitter.crawl` plugin and writes markdown, raw HTML, headers, result JSON, and a receipt.

02 / why

No toll booth in front of an agent's web sense.

Agents need to open docs, inspect competitors, read changelogs, verify pricing pages, compare examples, and preserve evidence. That should be as ordinary as reading a local file or checking Git.

Hosted scrape APIs are useful escalation paths. They should not be the default economics of Bitter's research loop. BitterCrawl starts with local fetch and extraction, then records what happened so a later run can reuse and audit the source.

for builders

Read primary sources without slowing down the loop.

Docs, READMEs, changelogs, pricing pages, and competitor copy become local artifacts agents can reuse.

for operators

Turn web claims into inspectable evidence.

Every important fetch leaves behind the URL, timestamp, hashes, and files needed to reconstruct the claim.

01

Local by default

Normal public pages should not require an account, API key, credit balance, or hosted dependency.

02

Receipt-bearing

The durable output is not just markdown. It is a proof path: URL, fetch, artifacts, policy, and hashes.

03

Provider-pluggable

Static fetch first. Crawl4AI, browser rendering, and hosted providers become explicit adapters.

03 / cli

A small command surface with a stable envelope.

The first wedge is deliberately narrow: one URL in, markdown and a receipt out. The standalone `bittercrawl` command and the Bitter plugin route should converge on the same `crawl.read` primitive.

standalone read

bun run src/cli.ts https://example.com

Fetch a page, extract main content, write local artifacts, and print useful markdown.

agent envelope

bun run src/cli.ts https://example.com --json

Return `bitter.crawl.result.v0` for agents, scripts, and future method graph calls.

explicit custody

bun run src/cli.ts https://docs.example.com --out .bitter/crawl/runs

Control where run directories, raw HTML, headers, markdown, and receipts are written.

Bitter plugin projection

bitter crawl https://example.com --json

The route exposed by the local plugin. A bundled release can remove the `BITTER_PLUGIN_PATH` setup later.

04 / receipt

The receipt is the product charter.

A useful crawl should answer the questions a future agent will ask: what URL was read, when, by which adapter, what changed on redirect, what was saved, and which hashes identify the evidence.

schemabitter.crawl.receipt.v0
fetchrequested URL, final URL, status, content type, elapsed time
artifactspage.md, raw.html, headers.json, result.json, receipt.json
hashesmarkdown_sha256 and html_sha256
policyuser agent, cache status, robots posture

why this matters

Summaries are cheap to produce and easy to misremember. Receipts make a web claim reconstructable without trusting the agent's prose.

05 / path

Start boring. Add power where evidence demands it.

now

local-static

Native fetch, Readability, Turndown, markdown, raw artifacts, receipt.

next

crawl4ai

Optional local adapter for richer extraction when users install it.

later

browser

Crawlee or Playwright for rendered pages, screenshots, and sessions.

explicit

hosted

Firecrawl, Jina, or Bitter-hosted workers as configured fallback paths.

06 / faq

The default is local. The escape hatches stay explicit.

How do I install it today?

Install the Bitter CLI from `install.bitter.sh`, clone `sheetgenius/crawl` under `~/.bitter/plugins/bittercrawl-plugin`, run `bun install`, export `BITTER_PLUGIN_PATH="$HOME/.bitter/plugins"`, then use `bitter crawl https://example.com --json`.

Is BitterCrawl a hosted crawler?

Not first. The first version is a local-first CLI primitive that fetches a page, extracts markdown, writes artifacts, and emits a receipt. Hosted crawling belongs behind explicit provider flags.

Is this a Firecrawl replacement?

It is a different default. Firecrawl is a useful compatibility adapter and benchmark. BitterCrawl is the custody layer: evidence, cache discipline, source artifacts, and receipts for Bitter's agent loop.

What happens on JavaScript-heavy pages?

The local-static adapter covers normal documents first. Browser rendering, Crawl4AI, and hosted providers become explicit escalation paths when static fetch is not enough.

07 / first wedge

Start with a local command.

BitterCrawl Cloud can wait. The immediate product promise is simple: every agent can read a public page and leave a receipt.

$ bitter crawl https://example.com --json