14.76. DD 76: Paivana - Fighting AI Bots with GNU Taler
14.76.1. Summary
This design document describes the architecture of an AI Web firewall using GNU Taler, as well as new features that are required for the implementation.
14.76.2. Motivation
AI bots are causing enormous amounts of traffic by scraping sites like git forges. They neither respect robots.txt nor 5xx HTTP responses. Solutions like Anubis and IP-based blocking do not work anymore at this point.
14.76.3. Requirements
Must withstand high traffic from bots: handling requests that arrive before a payment has been made must be very cheap.
14.76.4. Proposed Solution
Architecture
paivana-httpd is a reverse proxy that sits between ingress HTTP(S) traffic and the protected upstream service.
paivana-httpd is configured with a particular merchant backend.
A payment template must be set up in the merchant backend (called {template_id} from here on).
Steps:
Browser visits git.taler.net
paivana-httpd checks for a signed cookie
If the cookie is set and valid, the request is reverse-proxied to upstream. Stop.
Otherwise, a paywall page is rendered, continue.
The browser (rendering the paywall page) generates a random session ID via JS.
Based on this session ID, a taler://pay-template/{merchant_backend}/{template_id}?session_id={session_id} URI is generated and rendered as a QR code and link.
The browser long-polls on a new {merchant_backend}/sessions/{session_id} endpoint that returns when an order with the given session ID has been paid for (regardless of the order ID, which is not known at this point).
A wallet now needs to instantiate the pay template and pay for the resulting order.
Once the long-poller has returned, the paywall makes a GET fetch request to {paivana_backend}/paivana-paid/{session_id}. Paivana checks the payment status with the merchant backend. If the payment has succeeded, it returns an HTTP response that sets a cookie. The browser reloads the page.
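The string construction in the steps above can be sketched as follows. This is only an illustration: in the design the session ID is generated by JavaScript in the browser, whereas the sketch uses Python to show the shape of the values. The function names, the 16-byte session ID length, and the example host and template names are assumptions, not part of the design. The sketch also assumes that {merchant_backend} in the taler:// URI is given without a scheme, while the long-poll URL uses a full base URL.

```python
import secrets
from urllib.parse import quote


def new_session_id(nbytes: int = 16) -> str:
    """Return a random, URL-safe session ID (length is an assumption)."""
    return secrets.token_urlsafe(nbytes)


def pay_template_uri(merchant_backend: str, template_id: str, session_id: str) -> str:
    """Build the taler://pay-template URI following the pattern in the steps.

    merchant_backend is assumed to be host (and optional path) without scheme.
    """
    backend = merchant_backend.strip("/")
    return (
        f"taler://pay-template/{backend}/{quote(template_id, safe='')}"
        f"?session_id={quote(session_id, safe='')}"
    )


def session_poll_url(merchant_base_url: str, session_id: str) -> str:
    """Build the URL of the proposed {merchant_backend}/sessions/{session_id} endpoint."""
    return f"{merchant_base_url.rstrip('/')}/sessions/{quote(session_id, safe='')}"


if __name__ == "__main__":
    sid = new_session_id()
    # Hypothetical backend host and template name, for illustration only.
    print(pay_template_uri("backend.example.com", "paivana", sid))
    print(session_poll_url("https://backend.example.com/", sid))
```

Percent-encoding the session ID keeps the URI well-formed even if a different ID alphabet is chosen later.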
Cookie: The session cookie is computed as exp_timestamp || H(client_ip || session_id || exp_timestamp)
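A minimal sketch of this cookie computation follows. The document does not say which function H is. Because the steps call the cookie "signed", the sketch assumes H is HMAC-SHA256 keyed with a server-side secret, which is an assumption rather than something the design states. The "|" field separator, the "." between timestamp and tag, and the names make_cookie and check_cookie are likewise hypothetical. The cookie as defined does not carry the session ID, so the sketch simply takes it as a parameter; how the verifier obtains it is left open here.

```python
import hashlib
import hmac
import time
from typing import Optional


def make_cookie(secret: bytes, client_ip: str, session_id: str, exp: int) -> str:
    """Return exp_timestamp || H(client_ip || session_id || exp_timestamp).

    Assumption: H is HMAC-SHA256 under `secret`; fields are joined with "|".
    """
    msg = f"{client_ip}|{session_id}|{exp}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return f"{exp}.{tag}"


def check_cookie(secret: bytes, cookie: str, client_ip: str,
                 session_id: str, now: Optional[int] = None) -> bool:
    """Check expiry and recompute the tag for the connecting client's IP."""
    now = int(time.time()) if now is None else now
    exp_str, sep, _tag = cookie.partition(".")
    if not sep:
        return False
    try:
        exp = int(exp_str)
    except ValueError:
        return False
    if exp <= now:
        return False  # cookie has expired
    expected = make_cookie(secret, client_ip, session_id, exp)
    # Constant-time comparison to avoid leaking tag prefixes via timing.
    return hmac.compare_digest(expected.encode(), cookie.encode())
```

Checking the cookie this way needs no per-session state and no call to the merchant backend, which fits the requirement that requests be cheap to handle.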
Problems:
A smart attacker might still create a lot of orders via the pay-template.
Solution A: Don’t care, unlikely to happen in the first place.
Solution B: Rate-limit template instantiation on a per-IP basis.
The long-polling might overwhelm the merchant backend.
Should paivana-httpd instead reverse-proxy the long-polling to allow rate-limiting the long-polling?
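Per-IP rate limiting, as in Solution B, could be sketched with a token bucket per client address. The same approach could apply to the long-polling requests if they were reverse-proxied through paivana-httpd. The document does not choose an algorithm, so the token bucket, the class name PerIpRateLimiter and the rate and burst parameters are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpRateLimiter:
    """Token bucket per client IP: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        # Maps IP -> (remaining tokens, time of last update).
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str) -> bool:
        """Return True and consume one token if `ip` is within its limit."""
        now = self._clock()
        tokens, last = self._buckets.get(ip, (self._burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = PerIpRateLimiter(rate=1.0, burst=2)
    print([limiter.allow("198.51.100.4") for _ in range(3)])
```

A real deployment would also need to evict idle buckets so that the table does not grow without bound under a distributed scrape.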
Implementation:
Merchant needs new endpoint to long-poll on session_id.
Merchant needs to support template instantiation with session_id.
Paivana component needs to be specified / implemented.
Wallet-core needs support for a session_id in pay templates.
14.76.5. Test Plan
Deploy it for git.taler.net
14.76.6. Definition of Done
N/A
14.76.7. Alternatives
14.76.8. Drawbacks
Requires JavaScript
Could be made to work without JS by returning some Paivana: ... header.
14.76.9. Discussion / Q&A
Do we introduce a new type of session_id for this or can/should we reuse the existing session_id feature?