14.76. DD 76: Paivana - Fighting AI Bots with GNU Taler

14.76.1. Summary

This design document describes the architecture of a web firewall against AI scraper bots, built on GNU Taler, as well as the new features required to implement it.

14.76.2. Motivation

AI bots cause enormous amounts of traffic by scraping sites such as git forges. They respect neither robots.txt nor 5xx HTTP responses. Countermeasures such as Anubis and IP-based blocking no longer work.

14.76.3. Requirements

  • Must withstand high traffic from bots: handling requests before a payment has happened must be very cheap.

14.76.4. Proposed Solution

Architecture:

  • paivana-httpd is a reverse proxy that sits between ingress HTTP(S) traffic and the protected upstream service.

  • paivana-httpd is configured with a particular merchant backend.

  • A payment template must be set up in the merchant backend (called {template_id} from here on).

Steps:

  • A browser visits git.taler.net.

  • paivana-httpd checks for a signed cookie

    • If the cookie is set and valid, the request is reverse-proxied to upstream. Stop.

    • Otherwise, a paywall page is rendered; continue.

  • The browser (rendering the paywall page) generates a random session ID via JS.

  • Based on this session ID, a taler://pay-template/{merchant_backend}/{template_id}?session_id={session_id} URI is generated and rendered as a QR code and a link.

  • The browser long-polls on a new {merchant_backend}/sessions/{session_id} endpoint that returns when an order with the given session ID has been paid for (regardless of the order ID, which is not known at this point).

  • A wallet now needs to instantiate the pay template and pay for the resulting order.

  • Once the long-poll has returned, the paywall page issues a GET request (via fetch) to {paivana_backend}/paivana-paid/{session_id}. Paivana checks the payment status with the merchant backend. If the payment has succeeded, it returns an HTTP response that sets the cookie, and the browser reloads the page.
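The client-side part of the steps above can be sketched as follows. It is written in Python for illustration only; the paywall itself would run as JavaScript in the browser (for example using crypto.getRandomValues for the session ID). The helper names, the example backend value, the 32-byte session ID length, and the poll callback standing in for the proposed {merchant_backend}/sessions/{session_id} long-poll are all assumptions, as is the convention that {merchant_backend} is written without a URL scheme.

```python
import secrets
from typing import Callable
from urllib.parse import quote


def new_session_id() -> str:
    # 32 random bytes, URL-safe base64 without padding (43 characters).
    # Length and encoding are assumptions, not part of the design.
    return secrets.token_urlsafe(32)


def pay_template_uri(merchant_backend: str, template_id: str, session_id: str) -> str:
    # merchant_backend is assumed to be host[/path] without a scheme,
    # e.g. "backend.example.com/instances/default" (hypothetical).
    base = merchant_backend.rstrip("/")
    return (
        f"taler://pay-template/{base}/{quote(template_id, safe='')}"
        f"?session_id={quote(session_id, safe='')}"
    )


def wait_until_paid(
    poll: Callable[[str, int], bool],
    session_id: str,
    timeout_ms: int = 30_000,
    max_rounds: int = 20,
) -> bool:
    # poll(session_id, timeout_ms) is a hypothetical stand-in for one long-poll
    # request on {merchant_backend}/sessions/{session_id}: it blocks for up to
    # timeout_ms and returns True once an order with this session ID is paid.
    for _ in range(max_rounds):
        if poll(session_id, timeout_ms):
            return True
    # Give up after max_rounds long-polls; the page could then offer a retry.
    return False


def paid_url(paivana_backend: str, session_id: str) -> str:
    # URL the paywall fetches once the long-poll reports a payment.
    return f"{paivana_backend.rstrip('/')}/paivana-paid/{quote(session_id, safe='')}"
```

Each long-poll returns as soon as the payment is seen, so the loop only iterates again when a poll times out without a payment.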

Cookie: The session cookie is computed as exp_timestamp || H(client_ip || session_id || exp_timestamp), where || denotes concatenation.
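A minimal sketch of minting and checking the cookie in paivana-httpd follows. Because the steps speak of a signed cookie, and a plain hash over the client IP, session ID and timestamp could be recomputed by a bot without paying, the sketch assumes H is keyed: HMAC-SHA256 with a secret known only to paivana-httpd. The "." separator between exp_timestamp and the MAC, the length-prefixed encoding of the MAC input, the one-hour lifetime, and all function names are assumptions. The design above also does not say how paivana-httpd learns session_id when checking a cookie; here it is simply passed in (it could, for example, travel in a second cookie).

```python
import hashlib
import hmac
from typing import Callable, Optional

COOKIE_TTL_S = 3600  # assumed cookie lifetime in seconds


def _mac(secret: bytes, client_ip: str, session_id: str, exp: int) -> str:
    # Keyed H(client_ip || session_id || exp_timestamp); each field is
    # length-prefixed so that the concatenation is unambiguous.
    msg = b"".join(
        len(f).to_bytes(4, "big") + f
        for f in (client_ip.encode(), session_id.encode(), str(exp).encode())
    )
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def make_cookie(secret: bytes, client_ip: str, session_id: str,
                now: int, ttl: int = COOKIE_TTL_S) -> str:
    exp = now + ttl
    return f"{exp}.{_mac(secret, client_ip, session_id, exp)}"


def verify_cookie(secret: bytes, cookie: str, client_ip: str,
                  session_id: str, now: int) -> bool:
    try:
        exp_str, mac = cookie.split(".", 1)
        exp = int(exp_str)
    except ValueError:
        return False  # malformed cookie
    if exp <= now:
        return False  # expired
    expected = _mac(secret, client_ip, session_id, exp)
    # Constant-time comparison so timing does not reveal the correct MAC.
    return hmac.compare_digest(mac.encode(), expected.encode())


def decide(secret: bytes, cookie: Optional[str], client_ip: str,
           session_id: Optional[str], now: int) -> str:
    # Cheap path for unpaid traffic: one HMAC, no merchant backend request.
    if cookie and session_id and verify_cookie(secret, cookie, client_ip, session_id, now):
        return "proxy"
    return "paywall"


def paivana_paid(secret: bytes, session_id: str, client_ip: str,
                 is_paid: Callable[[str], bool], now: int) -> Optional[str]:
    # is_paid is a hypothetical stand-in for asking the merchant backend
    # whether an order with this session ID has been paid.
    if not is_paid(session_id):
        return None
    return make_cookie(secret, client_ip, session_id, now)
```

Binding the cookie to client_ip, as the formula does, means a client whose address changes (for example on a mobile network) has to pay again.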

Problems:

  • A smart attacker might still create many orders by instantiating the pay template.

    • Solution A: Ignore it; this is unlikely to happen in the first place.

    • Solution B: Rate-limit template instantiation on a per-IP basis.

  • The long-polling might overwhelm the merchant backend.

    • Should paivana-httpd instead reverse-proxy the long-polling requests so that it can rate-limit them?
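Solution B, and rate-limiting long-polls if paivana-httpd proxies them, could both use a per-IP token bucket such as the following sketch. The class name and its parameters are hypothetical, and the clock is injectable only so that the behaviour can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float   # requests currently allowed
    updated: float  # clock value of the last refill


class PerIpRateLimiter:
    """Token bucket per client IP: `rate` tokens per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, _Bucket] = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        b = self.buckets.get(client_ip)
        if b is None:
            # A new address starts with a full bucket.
            b = self.buckets[client_ip] = _Bucket(tokens=self.burst, updated=now)
        # Refill proportionally to elapsed time, capped at the burst size.
        b.tokens = min(self.burst, b.tokens + (now - b.updated) * self.rate)
        b.updated = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False
```

As written, the table of buckets grows with every new address; since scrapers rotate IP addresses, a deployment would need to evict idle buckets or bound the table.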

Implementation:

  • The merchant backend needs a new endpoint to long-poll on a session_id.

  • The merchant backend needs to support template instantiation with a session_id.

  • The Paivana component needs to be specified and implemented.

  • Wallet-core needs support for a session_id in pay templates.

14.76.5. Test Plan

  • Deploy it for git.taler.net

14.76.6. Definition of Done

N/A

14.76.7. Alternatives

14.76.8. Drawbacks

  • Requires JavaScript

    • Could be made to work without JS by returning a Paivana: ... header.

14.76.9. Discussion / Q&A

  • Do we introduce a new type of session_id for this, or can/should we reuse the existing session_id feature?