18.76. DD 76: Paivana - Fighting AI Bots with GNU Taler
18.76.1. Summary
This design document describes the architecture of Paivana, a Web firewall against AI bots based on GNU Taler, as well as the new features required to implement it.
18.76.2. Motivation
AI bots generate enormous amounts of traffic by scraping sites such as git forges. They respect neither robots.txt nor 5xx HTTP responses. Existing countermeasures such as Anubis and IP-based blocking are no longer effective.
18.76.3. Requirements
Must withstand high traffic from bots: requests made before a payment has happened must be very cheap, both in terms of response generation and database interaction.
Should work not just for our paivana-httpd but also for Turnstile-style paywalls that need to work with purely static paywall pages without PHP sessions.
18.76.4. Proposed Solution
18.76.4.1. Architecture
paivana-httpd is a reverse proxy that sits between ingress HTTP(S) traffic and the protected upstream service.
paivana-httpd is configured with a particular merchant backend.
A payment template must be set up in the merchant backend (called {template_id} from here on).
Steps:
1. The browser visits {website} (for example, https://git.taler.net), where {domain} is the domain name of {website}. paivana-httpd acts as a reverse proxy for {website}. Whenever it is called for a non-whitelisted URL, it checks for the presence of a Paivana cookie that is valid for this client IP address and {website} at this time. The Paivana cookie is computed as: cur_time || '-' || H(website || client_ip || paivana_server_secret || cur_time). If such a cookie is set and valid, the request is reverse-proxied to upstream. Stop.
2. Otherwise, a static, non-cacheable paywall page is returned. It includes a machine-readable Paivana HTTP header containing the taler://pay-template/ URI without the client-computed {paivana_id} and fulfillment URL (see below). Continue.
3. The browser (rendering the paywall page) uses JavaScript to generate a random paivana ID from the current time (cur_time) in seconds since the Epoch, the current URL ({website}) and some freshly generated entropy ({nonce}): paivana_id := cur_time || '-' || H(nonce || website || cur_time). A non-JS client that processes the Paivana HTTP header (or a GNU Taler wallet running as a Web extension) could just as easily perform the same computation.
4. Based on this paivana ID, a taler://pay-template/{merchant_backend}/{template_id}?session_id={paivana_id}&fulfillment_url={website} URI is generated and rendered as a QR code and a link, prompting the user to pay for access to {website} using GNU Taler.
5. The JavaScript of the paywall page running in the browser (or the non-JS client) long-polls a new https://{merchant_backend}/sessions/{paivana_id} endpoint. This endpoint returns once an order with the given session ID has been paid for, regardless of the order ID, which the browser does not know.
6. A wallet now needs to instantiate the pay template, passing the session_id and the website as additional inputs to the order creation. The session ID works just like the existing use of session_id in session-bound payments, and the website serves as the fulfillment URL as usual. The wallet then pays for the resulting order by talking to the merchant backend.
7. When the long-poller returns and the payment has succeeded, the browser (still rendering the paywall page) also learns the order ID.
8. The JavaScript of the paywall page (or the non-JS client processing the Paivana HTTP header) then POSTs the order ID, nonce, cur_time and website to {domain}/.well-known/paivana. paivana-httpd recomputes the paivana ID and checks whether the given order ID was indeed paid recently under that paivana ID. If so, it generates an HTTP response that sets the Paivana cookie and redirects to the fulfillment URL (which is the original {website}).
9. The browser reloads the page with the correct Paivana cookie (see step 1).
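The paivana-httpd side of the steps above can be sketched in Python as follows. The sketch computes and checks the Paivana cookie and handles the POST to /.well-known/paivana. It is an illustration, not the actual paivana-httpd code. It assumes that H is SHA-512 with hex output and that || is plain concatenation of the UTF-8 encoded fields. The following are all hypothetical: the names make_cookie, cookie_valid and handle_well_known_post, the lifetimes COOKIE_LIFETIME_S and PAYMENT_MAX_AGE_S, the allowed_prefix parameter, the cookie name, the 303 redirect, and the order_paid callback, which stands in for the merchant backend lookup.

```python
import hashlib
import hmac
import time
from typing import Callable, Dict, Optional

# Hypothetical parameters; the design does not fix these values.
COOKIE_LIFETIME_S = 3600   # how long an issued Paivana cookie stays valid
PAYMENT_MAX_AGE_S = 600    # how far cur_time in a POST may be from "now"


def H(data: bytes) -> str:
    # Assumption: H is SHA-512 with hex output; the design leaves H abstract.
    return hashlib.sha512(data).hexdigest()


def make_cookie(website: str, client_ip: str, secret: bytes, cur_time: int) -> str:
    # cur_time || '-' || H(website || client_ip || paivana_server_secret || cur_time)
    ts = str(cur_time)
    return ts + "-" + H(website.encode() + client_ip.encode() + secret + ts.encode())


def cookie_valid(cookie: str, website: str, client_ip: str, secret: bytes,
                 now: int) -> bool:
    # Stateless check: recomputes one hash, no database access needed.
    ts, sep, _ = cookie.partition("-")
    if not sep or not (ts.isascii() and ts.isdigit()):
        return False
    issued = int(ts)
    if issued > now or now - issued > COOKIE_LIFETIME_S:
        return False
    expected = make_cookie(website, client_ip, secret, issued)
    return hmac.compare_digest(cookie.encode(), expected.encode())


def handle_well_known_post(form: Dict[str, str], client_ip: str, secret: bytes,
                           allowed_prefix: str,
                           order_paid: Callable[[str, str], bool],
                           now: Optional[int] = None) -> Dict[str, object]:
    """Handle POST {domain}/.well-known/paivana and return a response sketch.

    order_paid(order_id, session_id) is a hypothetical stand-in for asking the
    merchant backend whether order_id was paid under that session ID.
    """
    if now is None:
        now = int(time.time())
    try:
        order_id, nonce, website = form["order_id"], form["nonce"], form["website"]
        cur_time = int(form["cur_time"])
    except (KeyError, ValueError, TypeError):
        return {"status": 400}
    if not website.startswith(allowed_prefix):
        return {"status": 400}  # only protected URLs; avoids an open redirect
    if abs(now - cur_time) > PAYMENT_MAX_AGE_S:
        return {"status": 403}  # paivana ID too old to count as "recent"
    ts = str(cur_time)
    paivana_id = ts + "-" + H(nonce.encode() + website.encode() + ts.encode())
    if not order_paid(order_id, paivana_id):
        return {"status": 402}  # Payment Required
    cookie = make_cookie(website, client_ip, secret, now)
    # Assumed response shape: 303 redirect plus a cookie named "Paivana".
    return {"status": 303, "Location": website,
            "Set-Cookie": f"Paivana={cookie}; Path=/; HttpOnly; Secure"}
```

The cookie check only recomputes one hash, so requests made before payment cost no database access, which is in line with the requirements. Plain concatenation is ambiguous for variable-length fields, so a real implementation might prefer length-prefixed inputs or an HMAC keyed with paivana_server_secret.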
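A non-JS client processing the Paivana HTTP header could compute the paivana ID and the pay-template URI roughly as in the following Python sketch. It assumes that H is SHA-512 with hex output. The function names, the 32-byte hex nonce, and the merchant backend and template ID in the example run are hypothetical. urlencode percent-encodes both query values, so the fulfillment URL survives intact inside the taler:// URI.

```python
import hashlib
import secrets
import time
from typing import Optional, Tuple
from urllib.parse import urlencode


def H(data: bytes) -> str:
    # Assumption: H is SHA-512 with hex output; the design leaves H abstract.
    return hashlib.sha512(data).hexdigest()


def make_paivana_id(website: str, cur_time: Optional[int] = None,
                    nonce: Optional[str] = None) -> Tuple[str, str, int]:
    """Return (paivana_id, nonce, cur_time), where
    paivana_id := cur_time || '-' || H(nonce || website || cur_time).

    nonce and cur_time must be kept, because they are later POSTed to
    {domain}/.well-known/paivana so that paivana-httpd can recompute the ID.
    """
    if cur_time is None:
        cur_time = int(time.time())  # seconds since the Epoch
    if nonce is None:
        nonce = secrets.token_hex(32)  # fresh entropy (hypothetical size)
    ts = str(cur_time)
    paivana_id = ts + "-" + H(nonce.encode() + website.encode() + ts.encode())
    return paivana_id, nonce, cur_time


def pay_template_uri(merchant_backend: str, template_id: str,
                     paivana_id: str, website: str) -> str:
    # taler://pay-template/{merchant_backend}/{template_id}?session_id=...&fulfillment_url=...
    query = urlencode({"session_id": paivana_id, "fulfillment_url": website})
    return f"taler://pay-template/{merchant_backend}/{template_id}?{query}"


if __name__ == "__main__":
    # Hypothetical merchant backend and template ID, for illustration only.
    pid, nonce, cur_time = make_paivana_id("https://git.taler.net/")
    print(pay_template_uri("backend.example.com", "paivana", pid,
                           "https://git.taler.net/"))
```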
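The client's long-poll on the new sessions endpoint and the subsequent POST could look like the Python sketch below. The sessions endpoint is new and its interface is not specified here, so this sketch makes three assumptions. First, it accepts a timeout_ms query parameter like other merchant long-polling endpoints. Second, it answers 200 with an "order_id" field once paid. Third, the POST body is JSON. To avoid assuming any particular HTTP library, the transport is a hypothetical get callable supplied by the caller.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical transport: get(url) returns (http_status, parsed_json_or_None).
# Any HTTP client library could be plugged in here; none is assumed.
GetFn = Callable[[str], Tuple[int, Optional[dict]]]


def wait_for_payment(get: GetFn, merchant_backend: str, paivana_id: str,
                     timeout_ms: int = 30000, max_rounds: int = 20) -> Optional[str]:
    """Long-poll https://{merchant_backend}/sessions/{paivana_id}.

    Assumptions, since the endpoint is still to be specified: it honours a
    timeout_ms query parameter, answers 200 with {"order_id": ...} once an
    order for this session ID has been paid, and anything else means
    "not paid yet". Returns the order ID, or None after max_rounds polls.
    """
    url = f"https://{merchant_backend}/sessions/{paivana_id}?timeout_ms={timeout_ms}"
    for _ in range(max_rounds):
        status, body = get(url)  # the server holds the request up to timeout_ms
        if status == 200 and body and "order_id" in body:
            return body["order_id"]
        # A real client would back off on errors instead of retrying at once.
    return None


def well_known_payload(order_id: str, nonce: str, cur_time: int,
                       website: str) -> str:
    # Body for POST {domain}/.well-known/paivana; JSON is an assumed encoding.
    return json.dumps({"order_id": order_id, "nonce": nonce,
                       "cur_time": cur_time, "website": website})
```

Because the server holds each request open until payment or timeout, the client issues only a few requests while the user pays, rather than polling in a tight loop.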
18.76.4.2. Problems:
A smart attacker might still create a large number of orders by instantiating the pay template.
Solution A: Ignore the issue, as such an attack is unlikely to happen in the first place.
Solution B: Rate-limit template instantiation on a per-IP basis.
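Solution B could, for example, use a token bucket per client IP address. The Python sketch below shows one such limiter. The class name, the rate and burst values, and where it would be hooked into template instantiation are assumptions and not part of the design.

```python
import time
from typing import Callable, Dict, Tuple


class PerIpRateLimiter:
    """Token bucket per client IP (one possible way to implement Solution B).

    rate: tokens refilled per second; burst: bucket capacity. Both values
    are hypothetical and would need tuning for a real deployment.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last)

    def allow(self, ip: str) -> bool:
        """Return True if this IP may instantiate another template now."""
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A production version would also need to evict idle entries, since the dictionary grows with every new IP address.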
18.76.4.3. Implementation:
The merchant backend needs a way to look up order IDs under a session_id (DONE: e027e729..b476f8ae).
The merchant backend needs a way to instantiate templates with a given session_id and fulfillment_url. This also requires extending the allowed responses for templates in general.
The Paivana component needs to be implemented.
Wallet-core needs support for a session_id and fulfillment_url in pay templates.
18.76.5. Test Plan
Deploy Paivana for git.taler.net.
18.76.6. Definition of Done
N/A
18.76.7. Alternatives
Instead of re-using the session ID mechanism, introduce a new concept. This has the drawback of requiring additional tables and indices. Moreover, the existing use of the session ID closely parallels this one.
18.76.8. Drawbacks
This exposes an order ID to anyone who knows the session ID. This is clearly not an issue in this context. For the existing uses of the session ID, it also seems clear that an attacker who knows the session ID would already need access that easily reveals the order ID as well, so this seems harmless.