Download the Bitter CLI
curl -fsSL https://install.bitter.sh | sh
~/.local/bin/bitter --no-update version
installable web evidence capture
BitterCrawl gives the Bitter CLI a local web reader: download the CLI, install the crawl package, then turn URLs into markdown, source artifacts, hashes, and receipts an agent can cite.
01 / install
BitterCrawl is available today as a local package and standalone CLI. Install the repo on `BITTER_PLUGIN_PATH` and the Bitter CLI can dispatch `bitter crawl` to the local plugin.
curl -fsSL https://install.bitter.sh | sh
~/.local/bin/bitter --no-update version
mkdir -p ~/.bitter/plugins
git clone https://github.com/sheetgenius/crawl.git \
~/.bitter/plugins/bittercrawl-plugin
cd ~/.bitter/plugins/bittercrawl-plugin
bun install
export BITTER_PLUGIN_PATH="$HOME/.bitter/plugins"
Add that export to your shell profile if you want the plugin available in every terminal.
bitter crawl https://example.com \
--out .bitter/crawl/runs \
--json
This routes through the `bitter.crawl` plugin and writes markdown, raw HTML, headers, result JSON, and a receipt.
02 / why
Agents need to open docs, inspect competitors, read changelogs, verify pricing pages, compare examples, and preserve evidence. That should be as ordinary as reading a local file or checking Git.
Hosted scrape APIs are useful escalation paths. They should not be the default economics of Bitter's research loop. BitterCrawl starts with local fetch and extraction, then records what happened so a later run can reuse and audit the source.
for builders
Docs, READMEs, changelogs, pricing pages, and competitor copy become local artifacts agents can reuse.
for operators
Every important fetch leaves behind the URL, timestamp, hashes, and files needed to reconstruct the claim.
Normal public pages should not require an account, API key, credit balance, or hosted dependency.
The durable output is not just markdown. It is a proof path: URL, fetch, artifacts, policy, and hashes.
Static fetch first. Crawl4AI, browser rendering, and hosted providers become explicit adapters.
03 / cli
The first wedge is deliberately narrow: one URL in, markdown and a receipt out. The standalone `bittercrawl` command and the Bitter plugin route should converge on the same `crawl.read` primitive.
standalone read
bun run src/cli.ts https://example.com
Fetch a page, extract main content, write local artifacts, and print useful markdown.
agent envelope
bun run src/cli.ts https://example.com --json
Return `bitter.crawl.result.v0` for agents, scripts, and future method graph calls.
explicit custody
bun run src/cli.ts https://docs.example.com --out .bitter/crawl/runs
Control where run directories, raw HTML, headers, markdown, and receipts are written.
Bitter plugin projection
bitter crawl https://example.com --json
The route exposed by the local plugin. A bundled release can remove the `BITTER_PLUGIN_PATH` setup later.
04 / receipt
A useful crawl should answer the questions a future agent will ask: what URL was read, when, by which adapter, what changed on redirect, what was saved, and which hashes identify the evidence.
why this matters
Summaries are cheap to produce and easy to misremember. Receipts make a web claim reconstructable without trusting the agent's prose.
05 / path
Native fetch, Readability, Turndown, markdown, raw artifacts, receipt.
Optional local adapter for richer extraction when users install it.
Crawlee or Playwright for rendered pages, screenshots, and sessions.
Firecrawl, Jina, or Bitter-hosted workers as configured fallback paths.
06 / faq
Install the Bitter CLI from `install.bitter.sh`, clone `sheetgenius/crawl` under `~/.bitter/plugins/bittercrawl-plugin`, run `bun install`, export `BITTER_PLUGIN_PATH="$HOME/.bitter/plugins"`, then use `bitter crawl https://example.com --json`.
Not first. The first version is a local-first CLI primitive that fetches a page, extracts markdown, writes artifacts, and emits a receipt. Hosted crawling belongs behind explicit provider flags.
It is a different default. Firecrawl is a useful compatibility adapter and benchmark. BitterCrawl is the custody layer: evidence, cache discipline, source artifacts, and receipts for Bitter's agent loop.
The local-static adapter covers normal documents first. Browser rendering, Crawl4AI, and hosted providers become explicit escalation paths when static fetch is not enough.
07 / first wedge
BitterCrawl Cloud can wait. The immediate product promise is simple: every agent can read a public page and leave a receipt.
$ bitter crawl https://example.com --json