12.18. DD 18: Forgettable Data in JSON Contract Terms

12.18.1. Summary

This document defines concepts and algorithms for handling the JSON format of contract terms with forgettable data in Taler payments.

12.18.2. Motivation

The contract terms JSON format used in Taler describes various aspects of a payment request, such as the amount to be paid, accepted payment service providers, a human-readable summary, a list of products and shipping information.

To support data minimization, it is desirable that some pieces of information stored in the contract terms (in the merchant’s storage or in the customer’s wallet) can be deleted as soon as they are no longer strictly required.

However, the cryptographic hash of the contract terms is used throughout the Taler protocol as an opaque handle for the payment and associated processes. In an audit, a merchant might be asked to reveal the plain-text contract terms for a particular hash.

Thus, hashing the contract terms must take their forgettable parts into account: the contract terms hash must be the same before and after a forgettable part has been forgotten.

12.18.3. Proposed Solution

Members of objects can be marked as forgettable by adding metadata to the contract terms JSON. Before hashing the contract terms JSON, it is first scrubbed and canonicalized. Scrubbing replaces forgettable members with a salted hash of their (recursively scrubbed and canonicalized) value. To prevent attempts at guessing the value of forgotten members, a salt is generated and stored in the contract terms for each forgettable member.

12.18.3.1. Constraints on Contract Terms JSON

In order to make it easy to get a canonical representation for JSON contract terms, the following restrictions apply:

  • Member names are restricted: only strings matching the regular expression ^[0-9A-Z_a-z]+$ or the literal names $forgettable and $forgotten are allowed. This makes sorting object members easier: RFC 8785 requires sorting member names by their UTF-16 code units, and for ASCII-only names this order coincides with plain byte-wise order.

  • Floating point numbers are forbidden. Numbers must be integers in the range -(2**53 - 1) to 2**53 - 1.
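A minimal Python sketch of a validator for these constraints follows. It is illustrative only and not taken from any Taler implementation; the function name check_constraints is hypothetical, and it assumes the symmetric integer range from -(2**53 - 1) to 2**53 - 1, i.e. the integers that IEEE 754 doubles represent exactly.

```python
import re
from typing import Any

SPECIAL_NAMES = {"$forgettable", "$forgotten"}
# Assumption: the symmetric IEEE 754 "safe integer" range.
MIN_INT, MAX_INT = -(2**53 - 1), 2**53 - 1


def check_constraints(value: Any, path: str = "terms") -> None:
    """Raise ValueError if `value` violates the contract terms JSON constraints."""
    # bool must be tested before int, because bool is a subclass of int.
    if value is None or isinstance(value, (bool, str)):
        return
    if isinstance(value, float):
        raise ValueError(f"{path}: floating point numbers are forbidden")
    if isinstance(value, int):
        if not MIN_INT <= value <= MAX_INT:
            raise ValueError(f"{path}: integer out of range")
        return
    if isinstance(value, list):
        for i, item in enumerate(value):
            check_constraints(item, f"{path}[{i}]")
        return
    if isinstance(value, dict):
        for name, member in value.items():
            # fullmatch (unlike match with a "$" anchor) also rejects a trailing newline.
            if name not in SPECIAL_NAMES and not re.fullmatch(r"[0-9A-Z_a-z]+", name):
                raise ValueError(f"{path}: invalid member name {name!r}")
            check_constraints(member, f"{path}.{name}")
        return
    raise ValueError(f"{path}: unsupported type {type(value).__name__}")
```

For example, check_constraints({"amount": 1.5}) raises ValueError, while an object using only integers, strings, booleans, null and valid member names passes silently.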

12.18.3.2. Marking Members as Forgettable

A property is marked as forgettable by including the property name as a key in the special $forgettable field of the property’s parent object.

{
 "delivery_address": "...",
 "$forgettable": {
   "delivery_address": "<salt>"
 }
}

Clients that write contract terms might not be able to easily generate the salt value. Thus, the merchant backend must also allow the following syntax in the order creation request:

{
 "$forgettable": {
   "delivery_address": true
 }
}

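However, a JSON object whose $forgettable entries are true instead of salt strings must be considered an invalid contract terms object. The merchant backend is therefore expected to replace each true with a freshly generated salt before the object is used as contract terms.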
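The following Python sketch shows how a backend might turn an order creation request that uses true placeholders into valid contract terms. The helper names fill_salts, has_placeholders and new_salt are hypothetical, and the salt format (32 random bytes, hex-encoded) is an assumption; this section does not fix a salt length or encoding.

```python
import secrets
from typing import Any


def new_salt() -> str:
    # Assumption: 32 random bytes, hex-encoded. The salt format is not fixed here.
    return secrets.token_hex(32)


def fill_salts(value: Any) -> Any:
    """Copy `value`, replacing each `true` in a $forgettable object by a fresh salt."""
    if isinstance(value, list):
        return [fill_salts(item) for item in value]
    if not isinstance(value, dict):
        return value
    out = {}
    for name, member in value.items():
        if name == "$forgettable":
            out[name] = {key: new_salt() if salt is True else salt
                         for key, salt in member.items()}
        else:
            out[name] = fill_salts(member)
    return out


def has_placeholders(value: Any) -> bool:
    """True if some $forgettable entry is not a salt string, which makes the
    object invalid as contract terms."""
    if isinstance(value, list):
        return any(has_placeholders(item) for item in value)
    if not isinstance(value, dict):
        return False
    for name, member in value.items():
        if name == "$forgettable":
            if any(not isinstance(salt, str) for salt in member.values()):
                return True
        elif has_placeholders(member):
            return True
    return False
```

The input is copied rather than modified in place, so the original request can still be logged or compared after the salts have been filled in.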

12.18.3.3. Forgetting a Forgettable Member

To forget a forgettable member, it is removed from the parent object, and the salted hash of the member’s scrubbed and canonicalized value is put into the special $forgotten member of the parent object.

{
 ...props,
 "delivery_address": "...",
 "$forgettable": {
   "delivery_address": "<memb_salt>"
 }
}

=>

{
 ...props,
 "$forgotten": {
   "delivery_address": "<memb_salted_hash>"
 },
 "$forgettable": {
   "delivery_address": "<memb_salt>"
 }
}
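The forgetting step for a single member can be sketched in Python as below. The function forget_member is hypothetical. The salted hash computation is passed in as a callable, so the sketch only shows how the parent object changes; that callable is assumed to scrub and canonicalize the value before hashing it.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: salted_hash(value, salt) -> encoded hash string.
# It is assumed to scrub and canonicalize `value` before hashing.
SaltedHash = Callable[[Any, str], str]


def forget_member(obj: Dict[str, Any], name: str,
                  salted_hash: SaltedHash) -> Dict[str, Any]:
    """Return a copy of `obj` in which the forgettable member `name` is forgotten."""
    salts = obj.get("$forgettable", {})
    if name not in salts:
        raise ValueError(f"member {name!r} is not forgettable")
    if name not in obj:
        return dict(obj)  # already forgotten, nothing to do
    out = dict(obj)
    value = out.pop(name)
    forgotten = dict(out.get("$forgotten", {}))
    forgotten[name] = salted_hash(value, salts[name])
    out["$forgotten"] = forgotten
    # The $forgettable entry (and thus the salt) is kept.
    return out
```

Keeping the salt in $forgettable lets a holder of the forgotten value later prove that it matches the stored hash.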

The hash of a member value memb_val with salt memb_salt is computed as follows:

memb_val_canon = canonicalized_json(scrub(memb_val));

memb_salted_hash = hkdf_sha512({
  output_length: 64,
  input_key_material: memb_val_canon,
  salt: memb_salt,
});

When encoding memb_salted_hash with base32-crockford, the resulting output must be upper-case.
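A Python sketch of the salted member hash follows, using only the standard library. It implements HKDF as specified in RFC 5869 with HMAC-SHA512 and an empty info string, matching the pseudocode above, plus an upper-case Crockford base32 encoder (most significant bits first, no padding). Several details are assumptions not fixed by this section: the salt string is used as its UTF-8 bytes without a terminating 0-byte, and memb_val_canon is treated as opaque bytes. The Taler implementation may differ in such details, so this sketch is not guaranteed to reproduce the test vector hash.

```python
import hashlib
import hmac


def hkdf_sha512(ikm: bytes, salt: bytes, length: int = 64, info: bytes = b"") -> bytes:
    """HKDF (RFC 5869) instantiated with HMAC-SHA512."""
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(n) = HMAC(PRK, T(n-1) | info | n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def crockford_base32(data: bytes) -> str:
    """Upper-case Crockford base32, most significant bits first, no padding."""
    out, acc, nbits = [], 0, 0
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(CROCKFORD[(acc >> nbits) & 31])
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        out.append(CROCKFORD[(acc << (5 - nbits)) & 31])  # zero-pad the last group
    return "".join(out)


def memb_salted_hash(memb_val_canon: bytes, memb_salt: str) -> str:
    # Assumption: the salt is used as its UTF-8 bytes, without a 0-terminator.
    digest = hkdf_sha512(memb_val_canon, memb_salt.encode("utf-8"), length=64)
    return crockford_base32(digest)
```

A 64-byte output always encodes to 103 base32 characters, the same length as the hash given in the test vector.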

12.18.3.4. Scrubbing

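A JSON object is scrubbed by recursively identifying and forgetting all forgettable members. Because the salted hash of a member is computed over its scrubbed value, forgettable members nested inside a forgettable member are forgotten before the enclosing member.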
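Scrubbing can be sketched recursively in Python as follows. The function scrub is hypothetical and takes the salted hash computation as a parameter. Members of each object are scrubbed first, so a forgettable member is always hashed in scrubbed form; members that are already forgotten are left alone. The sketch also assumes that objects nested in arrays (for example, a product list) are scrubbed as well.

```python
from typing import Any, Callable

# Hypothetical signature: salted_hash(scrubbed_value, salt) -> encoded hash string.
SaltedHash = Callable[[Any, str], str]
MARKERS = ("$forgettable", "$forgotten")


def scrub(value: Any, salted_hash: SaltedHash) -> Any:
    """Return a copy of `value` with every forgettable member forgotten."""
    if isinstance(value, list):
        # Assumption: objects nested in arrays are scrubbed as well.
        return [scrub(item, salted_hash) for item in value]
    if not isinstance(value, dict):
        return value
    # Scrub nested values first, so forgettable members are hashed in scrubbed form.
    out = {
        name: member if name in MARKERS else scrub(member, salted_hash)
        for name, member in value.items()
    }
    forgotten = dict(out.get("$forgotten", {}))
    for name, salt in out.get("$forgettable", {}).items():
        if name in out:  # members that were already forgotten are skipped
            forgotten[name] = salted_hash(out.pop(name), salt)
    if forgotten:
        out["$forgotten"] = forgotten
    return out
```

Scrubbing is idempotent: scrubbing an already scrubbed object returns an equal object, which is what keeps the contract terms hash stable after members are forgotten.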

12.18.3.5. Canonicalized Hashing

A JSON object is canonicalized by converting it to a UTF-8 byte array with the algorithm specified in RFC 8785. The resulting bytes are terminated with a single 0-byte and then hashed with SHA-512.
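The canonicalization and hashing steps can be sketched in Python as follows. This is not a general RFC 8785 implementation: it relies on the restrictions of section 12.18.3.1 (integer-only numbers, ASCII member names), under which Python’s json.dumps with sorted keys, compact separators and ensure_ascii=False should produce the RFC 8785 serialization. The function names canonicalize and contract_hash are hypothetical, and the sketch only rejects floats; the other constraints are assumed to be checked separately.

```python
import hashlib
import json
from typing import Any


def _reject_floats(value: Any) -> None:
    if isinstance(value, float):
        raise ValueError("floating point numbers are forbidden")
    if isinstance(value, dict):
        for member in value.values():
            _reject_floats(member)
    elif isinstance(value, list):
        for item in value:
            _reject_floats(item)


def canonicalize(value: Any) -> bytes:
    """Canonical UTF-8 serialization for the restricted profile used here."""
    _reject_floats(value)
    # Sorting by code point equals RFC 8785's UTF-16 order for ASCII member names.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


def contract_hash(value: Any) -> bytes:
    """SHA-512 over the canonical bytes followed by a single 0-byte."""
    return hashlib.sha512(canonicalize(value) + b"\x00").digest()
```

For example, canonicalize({"b": 1, "a": [True, None, "x"]}) yields the bytes {"a":[true,null,"x"],"b":1}, and contract_hash appends the 0-byte before hashing them.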

12.18.3.6. Test vector

The following input contains top-level and nested forgettable fields, as well as booleans, integers, strings, objects and non-forgettable fields. It is thus suitable as a minimal interoperability test:

{
  "k1": 1,
  "_forgettable": {
    "k1": "SALT"
  },
  "k2": {
    "n1": true,
    "_forgettable": {
      "n1": "salt"
    }
  },
  "k3": {
    "n1": "string"
  }
}

Hashing the above contract results in the following Crockford base32 encoded hash 287VXK8T6PXKD05W8Y94QJNEFCMRXBC9S7KNKTWGH2G2J2D7RYKPSHNH1HG9NT1K2HRHGC67W6QM6GEC4BSN1DPNEBCS0AVDT2DBP5G.

Note that salt values must normally be chosen at random; this test uses static salt values only to make the result reproducible.

12.18.4. Discussion / Q&A

  • It is not completely clear which parts of the contract terms should be forgettable. This should be individually decided by the merchant based on applicable legislation.

  • Is it really necessary to have one salt per forgettable member? We could instead have a single “contract terms global” salt and use the combination of the global salt and the path of the forgettable member as the salt for hashing.

  • Why do we require the 0-termination in the hash / KDF input? It does not seem to match what, e.g., shasum does.

  • Why do we not supply any “info” string (= context chunks in the GNUNET_CRYPTO_kdf terminology) to the hkdf? Does it matter?

  • We could also delete the corresponding $forgettable entry after forgetting a member. This would save storage. But to prove that a certain forgettable info matches the contract terms, the prover would need to also store/provide the salt.

  • What validations should the wallet do? Should the wallet ever accept contract terms where fields are already forgotten?