Encoding for Robust Immutable Storage (E

Encoding for Robust Immutable Storage (ERIS)
Version:
1.0.0
Published:
23 August 2022
Encoding for Robust Immutable Storage (ERIS)
Abstract
This document describes the Encoding for Robust Immutable Storage (ERIS). ERIS is an encoding of arbitrary content into a set of uniformly sized, encrypted and content-addressed blocks as well as a short identifier that can be encoded as an URN. The content can be reassembled from the blocks only with this identifier. The encoding is defined independent of any storage and transport layer or any specific application. We illustrate how ERIS can be used as to build robust and decentralized applications.
Table of Contents
1.
Introduction
Unavailability of content on computer networks is a major cause for reduced reliability of networked services
Polleres20
Availability can be increased by caching content on multiple peers. However most content on the Internet is identified by its location. Caching location-addressed content is complicated as the content receives a new location when cached.
An alternative to identifying content by its location is to identify content by its content itself. This is called content-addressing. The hash of some content is computed and used as an unique identifier for the content.
Caching content-addressed content and making it available redundantly is much easier as the content is completely decoupled from any physical location. Integrity of content is automatically ensured with content-addressing (when using a cryptographic hash) as the identifier of the content can be recomputed to check that the content matches the requested identifier.
However, naive content-addressing has certain drawbacks:
Large content is stored as a large chunk of data. In order to optimize storage and network operations it is better to split up content into smaller uniformly sized blocks and reassemble blocks when needed.
Unencrypted: Content is readable by all peers involved in transporting, caching and storing content.
ERIS addresses these issues by splitting content into small uniformly sized and encrypted blocks. These blocks can be reassembled to the original content only with access to a short
read capability
, which can be encoded as an URN.
Encodings similar to ERIS are already widely-used in applications and protocols such as GNUNet (see
Section 1.3
), BitTorrent, Freenet
Freenet
, Gnutella, Direct Connect
THEX
and others. However, they all use slightly different encodings that are tied to the respective protocols and applications. ERIS defines an encoding independent of any specific protocol or application and decouples content from transport and storage layers. ERIS may be seen as a modest step towards Information-Centric Networking
RFC7927
1.1.
Objectives
The objectives of ERIS are:
Availability
Content encoded with ERIS can be easily replicated and cached to increase the availability of content.
Data integrity
Integrity of content is verified while decoding content from blocks.
Intermediary Peer Deniability
Intermediary peers, who are storing and transporting encoded blocks without access to a read capability, can claim that decrypting encoded content is infeasible for them.
Censorship Resistance
An adversary who does not have access to a read capability of some encoded content can not selectively block access to the content without blocking access to all content.
Deterministic Identifiers
The read capability can be used as a deterministic identifier of the encoded content.
URN reference
ERIS encoded content can be referenced with a single URN (the encoded read capability).
Storage efficiency
ERIS can be used to encode small content (< 1 kibibyte) as well as large content (> many gibibyte) with reasonable storage overhead.
Simplicity
The encoding should be as simple as possible in order to allow correct implementation on various platforms and in various languages.
Confidentiality is not an objective of ERIS and ERIS
SHOULD NOT
be used to ensure that content is kept secret from an adversary.
See
Section 4
for security considerations.
1.2.
Scope
ERIS describes how arbitrary content (sequence of bytes) can be encoded into a set of uniformly sized blocks and an identifier with which the content can be decoded from the set of blocks.
ERIS does not prescribe how the blocks should be stored or transported over network. The only requirement is that a block can be referenced and accessed (if available) by the hash value of the contents of the block. In
Section 3.1
we show how existing technology can be used to store and transport blocks.
There is also no support for grouping content or mutating content. In
Section 3.3
we describe how such functionality can be implemented on top of ERIS.
ERIS is an attempt to find a minimal common basis on which higher functionality, such as mutability, can be built.
1.3.
Previous Work
ERIS is inspired and based on the encoding used in the file-sharing application of
GNUNet
- Encoding for Censorship-Resistant Sharing (ECRS)
ECRS
ERIS differs from ECRS in following points:
Cryptographic primitives
ECRS itself does not specify any cryptographic primitives. The GNUNet implementation uses the SHA-512 hash and AES cipher. ERIS uses the Blake2b-256 cryptographic hash
RFC7693
and the ChaCha20 stream cipher
RFC8439
. This improves performance, storage efficiency (as hash references are smaller) and allows a convergence secret to be used (via Blake2b keyed hashing; see
Section 2.3
).
Block size
ECRS uses a fixed block size of 32 KiB. This can be very inefficient when encoding many small pieces of content. ERIS allows a block size of 1 KiB or 32 KiB, allowing efficient encoding of small and large content (see
Section 2.2
).
URN
ECRS does not specify an URN for referring to encoded content (this is specified as part of the GNUNet file-sharing application). ERIS specifies an URN for encoded content regardless of encoding application or storage and transport layer (see
Section 2.7
).
Namespaces
ECRS defines two mechanisms for grouping and discovering encoded content (SBlock and KBlock). ERIS does not specify any such mechanisms (see
Section 3.3
).
Other related projects include Tahoe-LAFS, Freenet and Datashards. The reader is referred to the ECRS paper
ECRS
for an in-depth explanation and comparison of related projects.
1.4.
Versioning
This specification is versioned according to
Semantic Versioning 2.0.0
The most recent version of the specification is published at
Patch version may be incremented with editorial fixes. Minor version may be incremented with new backwards compatible functionality. Major version will be incremented when backwards incompatible changes are necessary (e.g. update of cryptographic primitives).
See also
Appendix "Changelog"
1.5.
Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119
RFC2119
We use binary prefixes for multiples of bytes, i.e: 1024 bytes is 1 kibibyte (KiB), 1024 kibibytes is 1 mebibyte (MiB) and 1024 mebibytes is 1 gigibytes (GiB).
2.
Specification of ERIS
2.1.
Cryptographic Primitives
The cryptographic primitives used by ERIS are a cryptographic hash function, a symmetric key cipher and a padding algorithm. The hash function and cipher are readily available in open-source libraries such as
libsodium
or
Monocypher
. The padding algorithm can be implemented with reasonable effort.
2.1.1.
Cryptographic Hash Function
We use Blake2b
RFC7693
with output size of 256 bit (32 byte). The keying feature is used and we refer to the key used for keying Blake2b as the hashing key
. The hashing key always has a size of 256 bit (32 byte) (see
Section 2.3
).
The functions provided are
Blake2b-256-Keyed(INPUT, HASHING-KEY)
for keyed hashing and
Blake2b-256(INPUT)
for unkeyed hashing.
2.1.2.
Symmetric Key Cipher
We use the ChaCha20 (IETF variant)
RFC8439
stream cypher. The provided function is
ChaCha20(INPUT, KEY, NONCE)
, where
INPUT
is an arbitrary length byte sequence,
KEY
is the 256 bit encryption key and
NONCE
is the 96 bit nonce. The output is the encrypted byte sequence.
The 32 bit initial counter is set to null.
Decryption is done with the same function where
INPUT
is the encrypted byte sequence.
2.1.3.
Padding Algorithm
We use a byte padding scheme to ensure that input content size is a multiple of a block size. The procedures
Pad
and
Unpad
, as described below, provide the necessary functionality
The padding scheme used is the same as the one implemented in the
libsodium
cryptographic library.
2.1.3.1.
Pad
The procedure
Pad(input, block-size)
given
input
of length
adds a mandatory byte valued
0x80
(hexadecimal) to
input
followed by
m < block-size
bytes valued
0x00
such that
n + m + 1
is the smallest multiple of
block-size
Pad(input, block-size):
n := Length(input)

# the final modulo block-size ensures that if (n + 1) modulo block-size = 0 no unnecessary zeroes are added.
m := (block-size - (n + 1 modulo block-size)) modulo block-size

# add mandatory 0x80 followed by m bytes of zeroe (0x00)
padded := input ++ [ 0x80 ] ++ ZeroBytes(m)

assert (Length(padded) modulo block-size = 0)

return padded
Figure 1
Pad
procedure
2.1.3.2.
Unpad
The procedure
Unpad(input, block-size)
starts reading bytes from the end of
input
until a
0x80
is read and then returns bytes of
input
before the
0x80
. The procedure throws an error if a value other than
0x00
is read before reading
0x80
, if no
0x80
is read after reading
block-size
bytes from the end of
input
or if length of
input
is less than
block-size
Unpad(input, block-size):

# ensure that input is at least as large as block-size
if Length(input) < block-size:
throw "invalid padding"

n := Length(input)

for i in Range(0, block-size):

# read the ith byte from the end of input
byte = input[n - i]

if byte = 0x80:
# special marker is reached, return everyting before it
return input[0:(n-i)]
else if byte = 0x00:
# continue with next byte from right
continue
else
# padding must be 0x00 or 0x80
throw "invalid padding"

# no 0x80 has been read after reading block-size bytes from the right of input
throw "invalid padding"
Figure 2
Unpad
procedure
Implementations
MUST
check that padding is valid when unpadding. This is verified in the negative test vectors (see
Section 2.8.6.2
).
2.2.
Block Size
ERIS uses two block sizes: 1KiB (1024 bytes) and 32KiB (32768 bytes). The block size must be specified when encoding content.
Both block sizes can be used to encode content of arbitrary size. The block size of 1KiB is an optimization towards smaller content.
The block size is encoded in the read capability and the decoding process is capable of handling both block sizes.
Implementations
MUST
support encoding and decoding content with both block sizes (1KiB and 32KiB).
2.2.1.
Recommendation on Block Size Choice
Applications are
RECOMMENDED
to use a block size of 1KiB for content smaller than 16KiB and a block size of 32KiB for larger content.
When using block size 32KiB to encode content smaller than 1KiB, the content will be encoded in a 32KiB block. This is a storage overhead of over 3100%. When encoding very many pieces of small content (e.g. short messages or cartographic nodes) this overhead might not be acceptable. On the other hand, using block size 1KiB to encode large content is also not efficient, as the content is split into many small 1KiB blocks and must be reassembled using internal nodes (see
Section 2.4.3
). When encoding larger content it is more efficient to use a block size of 32KiB. Using 16KiB as a breaking point is reasonable for most applications.
Note that the best choice of block size may depend on other factors such as number of round-trips to the storage layer. Content larger than 1KiB encoded with block size 1KiB will always be encoded in multiple levels, requiring multiple calls to a storage layer. For certain applications it might be better to minimize the number of calls to the storage layer at the cost of higher storage overhead.
In other applications the size of the content to be encoded might not be known when encoding starts and block size must be chosen (see
Section 2.8.2
). In such cases applications should use appropriate heuristics.
2.3.
Convergence Secret
Using the hash of the content as key is called
convergent encryption
Because the hash of the content is deterministically computed from the content, the key will be the same when the same content is encoded twice. This results in de-duplication of content, as well as deterministic identifiers. Both are useful properties for certain applications.
However, convergent encryption suffers from two known attacks that allow adversaries to either confirm the presence of some encoded content or even learn the content when parts are predictable (see
Section 4.6
for details). A solution to both attacks is to use a
convergence secret
ERIS requires a 32 byte convergence secret to be specified when encoding some content. Using different convergence secrets to encode the same content will result in different blocks and different read capabilities. This prevents deterministic identifiers and de-duplication, but allows a slightly stronger form of censorship resistance (see
Section 4.5
).
The convergence secret only needs to be provided during encoding. The content can be decoded from blocks without access to the convergence secret (see
Section 2.5
).
Users who do not want to be vulnerable to the known attacks against convergent encryption and are ready to give up determistic identifiers and de-duplication
SHOULD
use a random (cryptographically secure) and unique convergent secret for every encoded content. Note that even while using random and unique convergent secrets ERIS
SHOULD NOT
be used to ensure confidentiality.
A group using a shared convergence secret can benefit from the advantages of convergent encryption (de-duplication and deterministic identifiers) while being safe against certain attacks from adversaries that do not know the convergence secret.
If the known attacks against convergent encryption are well understood and the advantages of deterministic identifiers and de-duplication outweigh, then the
null convergence secret
(32 bytes of zeroes)
MAY
be used.
The convergence secret is implemented as the keying feature of the Blake2 cryptographic hash
RFC7693
2.4.
Encoding
Inputs to the encoding process are:
content
An arbitrary length byte sequence of content to be encoded.
convergence-secret
A 256 bit (32 byte) byte sequence (see
Section 2.3
).
block-size
The block size used for encoding in bytes can be either 1024 (1KiB) or 32768 (32KiB) (see
Section 2.2
).
The output of the encoding process is a set of uniformly sized blocks and a read capability.
The encoding process constructs a tree of uniformly sized
nodes
. Leaf nodes at the bottom of the tree (level 0) correspond to the input content. Internal nodes (not leaf nodes) consist of references to nodes at lower levels
The maximum number of references to nodes of a lower level in a node is called the
arity
of the tree and depends on the block size. For block size 1KiB the arity of the tree is 16, for block size 32KiB the arity is 512.
A simplified view of a tree constructed by encoding some content that is split into four leaf nodes is given in
Figure 3
. For illustration purposes the tree is of arity 2 (instead of 16 or 512).
Figure 3
A tree of content that is split into four leaf nodes
Nodes are unencrypted elements of the tree
. The encoding process output
blocks
which are encrypted nodes
As nodes are always stored as encrypted blocks, it is necessary to have the reference to the encrypted block as well as the encryption key to decrypt the node from the block. A reference to a node is a pair consisting of a reference to an encrypted block and the key to decrypt the block. We call such a pair a
reference-key pair
. The reference is the unkeyed Blake2b hash of the encrypted block (32 bytes) and the key is the ChaCha20 key to decrypt the block (32 bytes). The concatenation of a reference-key pair is 64 bytes long (512 bits).
Internal nodes of the encoding tree (i.e. not leaf nodes) are concatenations of reference-key pairs (see
Section 2.4.3
).
The block size, the level of the root node and the reference-key pair to the root node (
root reference
and
root key
) are the necessary pieces of information required to decode content (see
Section 2.5
). The tuple consisting of block size, level, root reference and key is called the
read capability
The steps of the encoding process are:
Split content into leaf nodes (see
Section 2.4.1
).
Encrypt leaf nodes into blocks and reference-key pairs (see
Section 2.4.2
).
Recursively construct tree by making nodes from reference-key pairs to nodes at lower level (see
Section 2.4.3
) and encrypting nodes to blocks and reference-key pairs (see
Section 2.4.2
) until there is a single root node.
Construct the read capability from the tree level, used block size and reference-key pair to the root node.
A pseudocode implementation of the encoding process:
ERIS-Encode(content, convergence-secret, block-size):
# initialize empty set of blocks to be output
blocks := {}

# split content into leaf nodes
leaf-nodes := Split-Content(content, block-size)

# initialize a list of reference-key pairs
reference-key-pairs := []

# initialize level to 0
level := 0

# encrypt leaf nodes to blocks and reference-key pairs
for leaf-node in leaf-nodes:
block, reference, key := Encrypt-Node(leaf-node, level, convergence-secret)

# add block to set of blocks to output
blocks := blocks ∪ { block }

# add reference-key pair to list of reference-key pairs
reference-key-pairs := reference-key-pairs ++ [ (reference, key) ]

# we can be sure that there is at least a single reference-key pair
assert Length(reference-key-pairs) >= 1

# Construct higher levels until there is a single reference-key pair
while Length(reference-key-pairs) > 1:

# increment level
level := level + 1

# construct list of nodes at current level
nodes := Construct-Internal-Nodes(reference-key-pairs, level, convergence-secret)

# clear the reference-key pairs
reference-key-pairs := []

# encrypt nodes to blocks and reference-key pairs
for node in nodes:
block, reference, key := Encrypt-Node(node, level, convergence-secret)

# add block to set of blocks to output
blocks := blocks ∪ { block }

# add reference-key pair to list of reference-key pairs
reference-key-pairs := reference-key-pairs ++ [ (reference, key) ]

# after constructing tree there is a single reference key pair
assert Length(reference-key-pairs) = 1

# get the root reference-key pair (pointing to the root node)
root-reference, root-key := reference-key-pairs[0]

# construct the read capability
read-capability := block-size, level, root-reference, root-key

return read-capability, blocks
Figure 4
ERIS-Encode procedure
The sub-processes
Split-Content
Encrypt-Leaf-Node
Encrypt-Internal-Node
and
Construct-Internal-Nodes
are explained in the following sections.
Note that the pseudocode implementation is provided for illustration purposes. Implementers are
RECOMMENDED
to implement a more efficient streaming encoder (see
Section 2.8.2
).
2.4.1.
Split Content into Leaf Nodes
The input content is split into uniformly sized blocks of size
block-size
. In order that this is always possible, the content is first padded.
Figure 5
Content padded and split into three leaf nodes
A pseudo code implementation:
Split-Content(content, block-size):
# pad content
padded := Pad(content, block-size)

# length of padded content is a multiple of block size
assert Length(padded) mod block-size = 0

# split padded content
leaf-nodes := split padded into pieces of size exactly block-size

return leaf-nodes
Figure 6
Split-Content procedure
Note that when the length of the content is a multiple of the block size, then an entire leaf node will be created with padding. This is necessary as we do not encode the length of the content but require mandatory padding.
Figure 7
illustrates such a case.
Figure 7
Last leaf node is entirely padding
2.4.2.
Encrypt Node
Unencrypted nodes (leaf nodes or internal nodes) are encrypted into blocks. We use two slightly different procedures to encrypt leaf nodes and internal nodes. Leaf nodes use the convergence secret whereas internal nodes do not.
The
Encrypt-Leaf-Node
procedure is used to encrypt leaf nodes (level 0). It takes an unencrypted leaf node and the convergence secret and returns the encrypted block as well as a reference-key pair to the block:
Encrypt-Leaf-Node(node, convergence-secret):
# use the keyed Blake2b hash to compute the encryption key
key := Blake2b-256-Keyed(node, convergence-secret)

# nonce is 12 bytes of 0
nonce := [0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

# encrypt node to block
block := ChaCha20(node, key, nonce)

# compute reference to encrypted block using unkeyed Blake2b
reference := Blake2b-256(block)

return block, reference, key
Figure 8
Encrypt-Leaf-Node procedure
The
Encrypt-Internal-Node
procedure is used to encrypt internal nodes (level 1 and above). It takes an unencrypted node and the level of the node as input and returns the encrypted block as well as a reference-key pair to the block:
Encrypt-Internal-Node(node, level):
# level is always larger than 0
assert level > 0

# use the unkeyed Blake2b hash to compute the encryption key
key := Blake2b-256(node)

# first byte of nonce is level of the node followed by 11 bytes of zero
nonce := [level; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

# encrypt node to block
block := ChaCha20(node, key, nonce)

# compute reference to encrypted block using unkeyed Blake2b
reference := Blake2b-256(block)

return block, reference, key
Figure 9
Encrypt-Internal-Node procedure
For both
Encrypt-Leaf-Node
and
Encrypt-Internal-Node
the ChaCha20 nonce is set to a single byte containing the level of the block followed by 11 bytes of null (zero) (for
Encrypt-Leaf-Node
the level is 0). By doing this we have implicitly encoded the level of the block in the encrypted block. This prevents certain vulnerabilities where the level of a block is confused.
Note that we reuse the same nonce for all blocks at a certain level. This is safe as we always use different keys for different input content.
The convergence secret is only used to compute the encryption key of leaf nodes. This is a minor performance optimization and allows the read capability to be strongly verified in certain cases (see
Section 2.5, Paragraph 4
).
Note that the convergence secret is not used to compute the reference to the encrypted block. This allows blocks to be dereferenced and verified without knowing the convergence secret.
2.4.3.
Construct Internal Nodes
Internal nodes of the tree (not leaf nodes) are constructed by concatenating at most arity reference-key pairs to nodes of a the next lower level.
If there are less than arity number of references-key pairs to collect in a node, then the node is filled up to
block-size
with zeroes. This ensures that nodes (and blocks) always have the same size (
block-size
).
The procedure
Construct-Internal-Nodes
takes as input a non-empty list of reference-key pairs and the block size and returns a list of nodes. A pseudocode implementation is provided:
Construct-Internal-Nodes(reference-key-pairs, block-size):
# input is always non-empty
assert Length(reference-key-pairs) > 0

# compute arity
arity := block-size / 64

# initialize empty list of nodes to return
nodes := []

while Length(reference-key-pairs) > 0:
# take at most arity reference-key pairs from the left of reference-key-pairs
node-reference-key-pairs, rest :=
Take-At-Most-N-From-Left(reference-key-pairs, arity)

# concatenate all reference-key pairs to a node
node := Concat(node-reference-key-pairs)

# make sure node has size block-size by filling up with zeroes
if Length(node) < block-size:
node := Fill-with-zeroes(node, block-size)

# add node to list of nodes to return
nodes := nodes ++ [ node ]

# set reference-key-pairs to rest
reference-key-pairs := rest

return nodes
Figure 10
Construct-Internal-Nodes procedure
Note that the method for filling up internal nodes with zeroes to make sure they have size block-size is different than the padding algorithm used on content (see
Section 2.4.1
). We do not need the special marker byte (
0x80
; see
Section 2.1.3
) as structures in internal nodes (reference-key pairs) have fixed sizes. When decoding we process reference-key pairs until we reach a reference-key pair that is all zeroes (see
Section 2.5, Paragraph 3
).
An example node with three reference-key pairs and filled with zeroes is illustrated in
Figure 11
Figure 11
Internal node filled with zeroes
If in a tree of encoded content the number of nodes at any level is not a multiple of the arity, then there will be internal nodes that are filled with zeroes. Because internal nodes are always filled with reference-key pairs to the left, trees are always left-aligned.
Figure 12
illustrates such a left-aligned tree.
Figure 12
A tree with arity two and five leaf nodes
2.5.
Decoding
Given a read capability and access to blocks via a block storage the content can be decoded as shown in the procedure
ERIS-Decode
ERIS-Decode-Recurse(block-size, level, reference, key):

# dereference the node
node := Dereference-Node(reference, key, level, block-size)

if level = 0:
# node is a leaf node
return node
else:
# node is an internal node

# initialize an output
output := []

# iterate over all the reference-key-pairs in the node
for reference-key-pair in node:
# decode sub-tree and append to output
output := output ++ Eris-Decode-Recurse(level - 1, reference-key-pair)

return output

ERIS-Decode(block-size, level, root-reference, root-key):

# verify integrity of read capability key if level is larger than 0
if level > 0:
Verify-Key(level, root-reference, root-key)

padded := ERIS-Decode-Recurse(block-size, level, root-reference, root-key)

return Unpad(padded, block-size)
Figure 13
ERIS-Decode procedure
When iterating over reference-key pairs in nodes we read 64 bytes from the node. If the 64 bytes are non-zero they are split into a reference-key pair and iterated over. If 64 bytes of zeroes are encountered the rest of the node
MUST
be checked to be all zeroes (see
Section 2.4.3, Paragraph 5
). If rest of node is all zeros there are no more reference-key pairs in the node and iteration ends. If a non-zero byte is encountered after reading 64 bytes of zeroes then an error must be thrown indicating an invalid internal node.
Implementations
MUST
verify the key appearing in the read capability if level of encoded content is larger than 0.
The procedures
Dereference-Node
and
Verify-Key
are provided in the next section.
Note that the pseudocode implementation is provided for illustration purposes. Implementers are
RECOMMENDED
to implement a more efficient streaming or random-access decoder (see
Section 2.8.3
and
Section 2.8.4
).
2.5.1.
Dereference Nodes
A node can be dereferenced given a reference-key pair by first retrieving the block for the reference and decrypting the block using the key.
The block storage is represented by some procedure
Block-Storage-Get
that takes as input a reference and outputs the corresponding block, such that
Blake2b-256(block) = reference
The procedure
Dereference-Node
returns a node given inputs a reference-key pair and the level of the node to dereference. A pseudocode implementation is provided:
Dereference-Node(reference-key-pair, level, block-size):
# unpack reference-key pair
reference, key := reference-key-pair

# get block
block := Block-Storage-Get(reference)

# ensure that block has correct size
if not Length(block) = block-size:
throw "block has invalid size"

# ensure that block is valid for reference
if not Blake2b-256(block) = reference:
throw "block is invalid"

# first byte of nonce is level of the node followed by 11 bytes of zero
nonce := [level; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

node := ChaCha20(block, key, nonce)

return node
Figure 14
Dereference-Node
procedure
Implementations
MUST
ensure that blocks have expected size and are valid (i.e. that
Blake2b-256(block) = reference
).
2.5.2.
Verify Key
The key to an internal node is computed using unkeyed Blake2b without the convergence secret (see
Section 2.4.2
). This allows the key itself to be verified by recomputing the Blake2b hash of the node.
Verify-Key(level, reference, key):
# ensure that reference-key pair is to an internal node
assert level > 0

# dereference the node
node := Dereference-Node(reference, key, level)

# verify integrity of key
if not Blake2b-256(node) = key:
throw "key in read capability is invalid"
Figure 15
Verify-Read-Capability
procedure
Note that this check only works if the level of the encoded content is larger than one. If the level is 0 then the reference-key pair points to a leaf node for which the key can not be verified as it is computed by using the convergence secret (see
Section 2.4.2
).
2.6.
Binary Encoding of Read Capability
The read capability consisting of the block-size, level of root reference-key pair as well as the root reference-key pair form the necessary pieces of information required to decode content.
We specify an binary encoding of the read capability to 66 bytes:
Table 1
Binary Encoding of Read Capability
Byte offset
Content
Length (in bytes)
block size (
0x0a
for block size 1KiB and
0x0f
for block size 32KiB)
level of root reference-key pair as unsigned integer
root reference
32
34
root key
32
2.6.1.
CBOR Tag
The CBOR tag
276
is assigned for a ERIS binary read capability (see
Section 6.1
). This allows efficient references to ERIS encoded content from CBOR
RFC8949
2.6.2.
GNU Name System
The GNU Name System
LSD0001
is a decentralized and censorship-resistant name system that can be used to resolve memorable names to secure identifiers. A GNU Name System record type for ERIS read capabilities is defined (see
Section 7.1
) allowing any ERIS encoded content to be associated with memorable names.
2.7.
URN
A read capability can be encoded as an URN
RFC8141
using the namespace identifier
eris
and the unpadded Base32
RFC4648
encoding of the read capability as namespace specific string, i.e:
urn:eris:BASE32-READ-CAPABILITY
For example the ERIS URN of the UTF-8 encoded string "Hello world!" (with block size 1KiB and null convergence secret):
urn:urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M
2.7.1.
Block URN
Blocks are referenced by their Blake2b-256 hash (see
Section 2.4, Paragraph 9
). In certain circumstances it is useful to encode such a reference as an URN. Applications are
RECOMMENDED
to use the URN namespace identifier
blake2b
and the unpadded Base32
RFC4648
encoding of the reference as namespace specific string, i.e:
urn:blake2b:BASE32-REF
For example the URN encoding of a reference to a block might look like this:
urn:blake2b:H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ
2.8.
Implementation Notes
This section contains notes on implementing ERIS.
For further inspiration there are implementations available in programming languages such as
Guile Scheme
Nim
OCaml
Smalltalk
Python
or
Rust
. See also the
project website
for an updated list of existing implementations.
2.8.1.
Specification Version
Implementations
SHOULD
provide an interface that describes which version of the specification is implemented.
2.8.2.
Streaming Encoding
The pseudocode algorithms for encoding presented in
Section 2.4
assumes that the entire content is available in memory when encoding. This is not always practical or feasible. For efficiency purposes and in order to handle content that is larger than memory it is necessary to encode content in smaller pieces and eagerly emit partial results while maintaining a small amount of state. We call this
streaming encoding
ERIS allows streaming encoding and it is
RECOMMENDED
to implement such an encoding procedure.
A streaming encoder can be implemented by maintaining a list of reference-key pairs that have not yet been collected into a node at each level. As soon as enough reference-key pairs are available they are combined into a node, encrypted to a block and emitted. Such an encoding procedure needs to maintain an amount of state proportional to the level of tree. The level of the tree is bound by the logarithm of the content size (with base arity of the tree).
For an example, see
the reference Guile implementation
2.8.3.
Streaming Decoding
A streaming decoder can be implemented by eagerly emitting decoded leaf nodes instead of collecting them to a final output as done in the
Eris-Decode
(see
Figure 13
). The amount of state that needs to be maintained is proportional to the level of the tree the content is encoded to.
2.8.4.
Random-Access Decoding
It is possible to efficiently decode a fixed number of bytes from some given offset without decoding the entire content or de-referencing all blocks that the content is encoded to. This can be done by selectively descending down the tree of the encoded content to the leaf nodes that hold the requested bytes. A decoder that is capable of decoding small pieces of some encoded content efficiently and furthermore can seek to given positions for further reading is called a
random-access decoder
Applications of random-access decoders include seeking in encoded media files, reading individual files from an encoded file-system (see
ERIS-FS
) or identifying file-types based on magic bytes at fixed positions.
Implementers should consider implementing a decoder on which following procedures are defined:
Position()
Returns the current offset from the start of the encoded content.
Seek(p)
Sets current position to
Read(n)
Decodes and returns
bytes of the encoded content from current position.
Length()
Returns the length of the encoded content (without padding). The length of some encoded content can be found by descending down the tree by the right-most node and unpadding the last leaf node. Implementers should expose such a functionality to users as estimating the length without insight into the encoded tree is inefficient.
Implementing a random-access decoder seems to be the most challenging aspect of implementing ERIS. Implementers are encouraged to use extensive testing (e.g. property-based testing). Please don't hesitate to get in touch with the maintainers if you are implementing a random-access decoder (see
Section 5
). We are very curious to see your implementation!
The
Guile Scheme
and
OCaml
implementations have a random-access decoder.
2.8.5.
De-referencing Blocks
The procedure
Block-Storage-Get
as used in
Section 2.5
may fetch blocks from various transport and storage layers (see
Section 3.1
). Implementors of ERIS libraries
SHOULD
provide abstractions or interfaces that allow users to use their own custom transport and storage layers.
Implementations
SHOULD
also consider cacheing block de-references or allowing users to define suitable caching mechanisms. This is especially relevant for random-access decoders.
2.8.6.
Test Vectors
Three types of test vectors are provided that can be used to ensure correct implementation of the encoding: positive (see
Section 2.8.6.1
), negative (see
Section 2.8.6.2
) and large content (see
Section 2.8.6.3
).
Implementations
MUST
satisfy the positive and negative test vectors.
Implementations are
RECOMMENDED
to also satisfy the large content test vectors. However this requires implementing a streaming encoder (see
Section 2.8.2
) which might not be necessary or desirable for certain applications (e.g. constrained environments where size of content is always very small).
The positive and negative test vectors are provided as machine-readable JSON files in the archive
eris-test-vectors-v1.0.0.tar.gz
Following JSON field appear in both positive and negative test vectors:
id
Numeric identifier of the test vector.
type
Specifies the type of test. Values are either
positive
or
negative
spec-version
Version of ERIS specification
name
Short human readable name.
description
Human readable description of the test.
read-capability
JSON map containing the components of the read capability. This is not used in tests but is here as a help for developers.
urn
The ERIS URN of the content.
blocks
A JSON map of blocks required to decode the content given the URN. Key and field are encoded as Base32 strings.
Further fields are used in the positive test vectors.
2.8.6.1.
Positive
The positive test vectors can be used to ensure that implementations correctly encode content to a given read capability and can decode the same content from given blocks.
Following additional fields are used in the JSON files for positive test vectors:
content
The binary content to be encoded as Base32 (unpadded) string.
convergence-secret
The convergence secret to be used as Base32 string.
block-size
Block size that should be used for encoding in bytes (either 1024 or 32768).
The test vector
eris-test-vector-positive-00.json
is shown as example:
"id": 0,
"type": "positive",
"spec-version": "1.0.0",
"name": "short string (block size 1KiB)",
"description": "Encode the UTF-8 encoding of the string \"Hello world!\" with block-size 1KiB and null convergence-secret.",
"content": "JBSWY3DPEB3W64TMMQQQ",
"convergence-secret": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
"block-size": 1024,
"read-capability": {
"block-size": 1024,
"level": 0,
"root-reference": "H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ",
"root-key": "CPTDEZH4ALSLCBR7INTIBWLLGMJ7MNFFCGX7PMKEAA52KZDDFTNQ"
},
"urn": "urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M",
"blocks": {
"H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ": "EWZKXK73236ETFGMMFORFLMNIPE5V3S3WDVFECUPI47RFJBA5ZMBHH6HMOZCNFQKOTADCPJMPHTZJNEW4VOKHBSNABYVIZWQDV5GQPECUBDAULOPR2S7ITYQSGGVPPEWJVEZNIUKUFR4XE7GQPUDY3FPFSCUYIISZX6PWLLPNPI5V3RKWQGN2L6LLE5G7TZ5FVPAYUHOES4LGHRKSXYCNQF6IR5HLKX2C2EPVKSU2T6XOSAF5VHUZ2GTQS7BLT3VYP5BYI2WR4GJEYDWLY26TK6ZQ2DYZZBIYSVUIY557FE6QOV3L5X5HCAQEWPYCUKUADOOSMNU7EEONPRMBJU4XLQ66AOOVRQ66OJLHANVLNFDXXPLH6KDVCJBVQWWWI7PA6OGKGPU7ZZPZT2DIBOAUGWM6DVZWWX3DA3GHWS3VY6RQMLAKDXHZRQ6VDVMLMFSULJYHACC7G57CZ2SG7XB24XT3SLJG56PO3Z7YJJYEVP44F44YCZ5YS4NRZKWS4OTFMXGNF25G3GIGSV5NEVVTSO6J5EKEXWTX74X27HYI4UZ45YF675423AWYUVTPVLUWOJMGANQRDWYOPFE5QH6JUINCH5NYZUHYPZP6WHC4IVOLYFDAUNOWLRVR37BLT5E44VVJ6XDQZAS2XT6G2XM3RJUUQEYD2RRFBWGPNSOJ2RUPE654GKHRDCKUX2MZ6D43LKI2DKCF7QEPYWJWJH6EI74NQNOLCHEAUFEXH5ZXUXO6JJ5PKOXGL4RBOGCP2X2RYXOJOCT55BAGCRQHID2TRO7NPZWGQNMSSWHOAXY6JFCVFXXGR4JM62HHXZTKODD7NYXO7EUS3GMY2NDQFENM3XKNAI5MFNLL7ERMPSIXHAJ44ASIDZS7RPZ542SLH7XONZ6PMCPI4V66ALJJTTXMJAEU35YPH3UD7UHBCM4OI3SDGTUL3TQQWMDIFBNECJN7FNAWRXTWCXM6CIILVYAITWSEDIDEMLBKR5KIGE5SQTW2ITIIA725SNZO3PJMQCAPJI4H3QXVPKG4OZIOTENU2VW3W3PNAYVE65YJBQGPY6M6LRQYGPYYSEFTRPW3YXGGC2ICFROUD7FXCFXVD6OWA4B6LDFDX4LPF4H7525BVRBNW2ZLMXZUXCFZSZOSSP7VKBCWIDJ72XSR43YFKTL5TADVXDF3RN2HHAGKXWOXINMJJLRE4K72H54IOROFS4FD5QYXWSJWH4ENYC4PAOJ6JELRFYC6RMXP73VR745WY4ZOFQTRQ5ZEA2C3M7JTQUVKV26XGVVHBYA7NEMRPZNVRXHCKYN3CGJSICBUFGMHSSDBTRIF3BCPVMLRBU25DFGGM4LEEL4KTIAJITYY5XPR4XDRD55PEDVOUL342IXCNEBTTPPLMPV6EJYUFJS42R4XLDOT7NOFPLTZUBLWSLL7IVZNPNI6DZ4CR7YEQP72DDUWDJJTKACT35JLFPDW3M2VUOJF3CUWN6FYN5YJJSXYMXSVDZDVIAJYF2HOPQEHLMRF3MJAXMTLMCOIARLFZKAGRSW6PWQZ7ZJLCQAPSJTPNDA2SLUA3UHH34NWEPTAVWOBDPNTMT27TK5P4VKLE2YEJHKWE6SJA3V7A3UPQS24SWDJ2BPOV7JG23ZVIA"
Figure 16
Positive test vector
eris-test-vector-positive-00.json
Implementations
MUST
for all positive test vectors:
Verify that the content encodes to the specified URN given the convergent secret and block size.
Verify that the content decoded from the given URN and blocks is equal to the specified content.
2.8.6.2.
Negative
Negative test vectors are provided to help implementations ensure that they correctly verify content while decoding.
For all negative test vectors, implementations should attempt to decode content given the URN and blocks. The test passes if decoding fails. The reason for failure is described in the
description
field. Implementations
MUST
pass all negative test vectors.
The test vector
eris-test-vector-negative-13.json
is shown as example:
"id": 13,
"type": "negative",
"spec-version": "1.0.0",
"name": "no blocks",
"description": "Content can not be decoded because of missing blocks.",
"read-capability": {
"block-size": 1024,
"level": 0,
"root-reference": "H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ",
"root-key": "CPTDEZH4ALSLCBR7INTIBWLLGMJ7MNFFCGX7PMKEAA52KZDDFTNQ"
},
"urn": "urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M",
"blocks": {}
Figure 17
Negative test vector
eris-test-vector-negative-13.json
2.8.6.3.
Large Content
In order to verify implementations that encode content by streaming (see
Section 2.8.2
) URNs of large contents that are generated in a specified way are provided:
Table 2
Large content test vectors
Test name
Content size
Block size
Level of root reference
URN
100MiB (block size 1KiB)
100MiB
1KiB
urn:eris:BIC6F5EKY2PMXS2VNOKPD3AJGKTQBD3EXSCSLZIENXAXBM7PCTH2TCMF5OKJWAN36N4DFO6JPFZBR3MS7ECOGDYDERIJJ4N5KAQSZS67YY
1GiB (block size 32KiB)
1GiB
32KiB
urn:eris:B4BL4DKSEOPGMYS2CU2OFNYCH4BGQT774GXKGURLFO5FDXAQQPJGJ35AZR3PEK6CVCV74FVTAXHRSWLUUNYYA46ZPOPDOV2M5NVLBETWVI
256GiB (block size 32KiB)
256GiB
32KiB
urn:eris:B4B5DNZVGU4QDCN7TAYWQZE5IJ6ESAOESEVYB5PPWFWHE252OY4X5XXJMNL4JMMFMO5LNITC7OGCLU4IOSZ7G6SA5F2VTZG2GZ5UCYFD5E
Content is the ChaCha20 stream using a null nonce and the key which is the Blake2b hash of the UTF-8 encoded test name (e.g.
key := Blake2b-256("100MiB (block size 1KiB)")
). The ChaCha20 stream can be computed by encoding a null byte sequence (e.g.
chacha20-stream := ChaCha20(null_byte_stream, KEY)
).
3.
Applications
Traditionally encoding schemes similar to ERIS are used in peer-to-peer file-sharing applications (e.g. BitTorrent, GNUNet file-sharing, FreeNet). Given a performant block tranport layer, ERIS can be used for efficient file-sharing. However, we hope to motivate usage for a much wider scope of applications.
As part of the
openEngiadina
project we are using ERIS to encode small bits of information that constitute "local knowledge" (e.g. geographic information, social and cultural events, etc.) along with the social interactions that created and curated this information (using the ActivityStreams vocabulary). ERIS allows such information to be securely cached on multiple peers to increase the robustness of the system. The fact that ERIS encoded content can be referenced by an URN allows it to be embeded into existing data structures and protocols. In particular, ERIS encoded content can be referenced from RDF and RDF can be made content-addressable with ERIS (see
ERIS and RDF
).
ERIS can be used to share content in small communal and friend-to-friend networks. For this the ability to use convergence secrets is very useful for increased security (see
Section 4.6
). This use-case has been further research as part of the
DREAM
project.
ERIS can also be used to create larger content-delivery networks. In particular, we are working towards making software sources and pre-built packages for
Guix
more robustly available and peer-to-peer shareable.
ERIS identifiers have also been embedded in ELF binaries to reference shared libraries (see
Sigil OS
).
3.1.
Storage and Transport Layers
ERIS is defined independent of any storage and transport layer for blocks. The only requirement is that blocks can be accessed by their reference - the hash of the block content.
Possible storage layers include:
in-memory hash-map
key-value store and other databases
files on a file system
Transport mechanisms include:
HTTP: A simple HTTP endpoint can be used to de-reference blocks (see
ERIS over HTTP
).
Sneakernet: Blocks can be transported on a physical medium such as a USB stick.
More interesting transport and storage layers use the fact that blocks are content-addressed. For example the peer-to-peer network
IPFS
can be used to store and transport blocks. The major advantages over using IPFS directly are that blocks are encrypted and not readable to IPFS peers without the read capability and that identifier of blocks and encoded content are not tied to the IPFS network. Applications can transparently use IPFS or any other storage/transport mechanism.
Other protocols we are investigating for usage as ERIS transport layer include
Named Data Networking
, BitTorrent and GNUNet.
We are also researching transport protocols based on CoAP
RFC7252
3.2.
Authenticity of Content
While decoding ERIS encoded content the integrity of the content is verified. Content can not be tampered with without changing the identifier (read capability) of the content. To prove authenticity of encoded content it is sufficient to sign the read capability.
We have presented a concrete proposal on how this might be done using a RDF vocabulary and the Ed25519 cryptographic signature scheme
RDF-Signify
3.3.
Mutability and Namespaces
Encoded content is immutable in the sense that changing the encoded content results in a new identifier. Existing references to the old content need to be updated. This is a property that allows robust availabilty of content.
Nevertheless, there are applications where one wants to reference mutable content. Examples include user profiles or dynamic collections of content. Making small changes to a user profile or adding a piece of content to a collection should preserve the identifiers.
There are many ways of implementing such mutability or namespaces. ERIS does not specify any particular mechanism. Possible mechanisms include:
Centralized servers that returns a mutable list of reference to (immutable) content. This is how most HTTP services work.
Append-only logs where changes are securely appended with cryptographic signatures. The state is computed from the log of changes. This is how peer-to-peer systems such as
hypercore
or
Secure ScuttleButt
work. See also
ERIS Canons
for appendable logs optimized for usage with ERIS.
Petname system: A system where a dynamic local name can be mapped to a reference. Sophisticated systems that allow delegation of naming authority include the GNU Name System
LSD0001
(see
Section 2.6.2
).
Commutative Replicated Data Types (CRDTs) are distributed data structures similar to append-only logs with the advantage that the state of a mutable container can diverge and converge to consistent state eventually. Such structures seem especially suitable when control over a mutable container is shared by multiple parties. For an example see
Distributed Mutable Containers
We believe that the best suited mechanism for handling mutability depends on concrete applications and use-cases. A key value of ERIS is that it is agnostic of such mechanisms and can be used from any of them.
4.
Security Considerations
In this section we discuss security considerations when using ERIS as an abstract encoding as well as when used in conjunction with storage and transport layers (see
Section 3.1
).
We use terms for communication security as defined in RFC 3552
RFC3552
(e.g. CONFIDENTIALITY or PEER ENTITY AUTHENTICATION).
4.1.
Threat Model
We consider a setting with following entities:
Publisher
Wants to publish some content.
Audience
A group of entities that should be able to decode the content published by the publisher. They receive the ERIS read capability from the publisher over a channel that provides CONFIDENTIALITY, DATA INTEGRITY and PEER ENTITY AUTHENTICATION.
Intermediary Peers
A group of entities that assist in making the content available to the audience by storing and transporting blocks of the encoded content. There are no communication security requirements for the communication between the intermediary peers and publisher or audience. Note that the publisher as well as members of the audience can act as intermediary peers.
Censor
An adversary that wishes to prevent the audience from being able to decode some specific content. The censor does not have access to the read capability of the encoded content but may inspect, modify or drop communication between the intermediary peers and the audience. The censor does not have access and can not control the internal state of the publisher, audience or intermediary peers. The censor can impersonate a malicious intermediary peer.
See also the ECRS paper
ECRS
and the theoretical treatment of censorship resistance by Perng et al.
Perng2005
4.2.
Availability
The publisher can make the blocks of the encoded content available by replicating them to intermediary peers and audience directly over various different storage and transport layers (see
Section 3.1
).
4.3.
Data Integrity
Members of the audience can verify the integrity of the content while decoding by verifying the Blake2b hash of a block (see
Section 2.5
). Even when a malicious intermediary peer is distributing invalid blocks, this will be detected by the internal decoding process run by the audience.
Intermediary peers can pro-actively detect invalid blocks by checking the Blake2b hash of the block.
Note that the ERIS read capability can not be used to verify integrity of content when the content is given directly and not decoded from blocks. When the content is given in as a sequence of bytes, the only way to compute the read capability is to encode the content. However, the read capability to verify might have been computed using a convergence secret (see
Section 2.3
) that is not known, making it impossible to verify that the content corresponds to the read capability.
4.4.
Intermediary Peer Deniability
Intermediary peers do not need to have access to the read capability in order to store and transport blocks of encoded content. As the blocks only contain encrypted data, intermediary peers can plausibly deny being able to decode the content.
Note that in certain situations an active attack that can reveal parts of the encoded content is possible (see
Section 4.6
).
4.5.
Censorship Resistance
We define censorship resistance as the inability of a censor who does not have access to the read capability of some content to prevent members of the audience from decoding the specific content without preventing access from all content (i.e. DENIAL OF SERVICE).
This holds as a censor that does not have access to the read capability of some content can not decide if a given block is required to decode the specific content or is a block from the encoding of some other content.
Note that a censor can prevent the audience from decoding any content by dropping all communication to intermediary peers - the censor can perform a DENIAL OF SERVICE attack. Practically a DENIAL OF SERVICE attack can be made difficult by replicating blocks and using various different storage and transport layers (see
Section 4.2
).
Our definition of censorship resistance is slightly stronger than that of ECRS
ECRS
. In ECRS the censor may not know the exact content, whereas in ERIS the censor may not know the read capability. If the censor does know the exact content that should be censored than we can use a fresh convergence secret to create a read capability that the censor does not know (see
Section 2.3
).
4.6.
Known Attacks on Convergent Encryption
Convergent encryption allows de-duplication of content and deterministic identifiers. However, it also suffers from two known attacks
Zooko2008
The Confirmation Of A File Attack
An adversary who knows the read capability of some ERIS encoded content can enumerate all blocks that are required to decode the content. The adversary can now confirm that a member of the audience is accessing some content by observing which blocks are being accessed.
The Learn-The-Remaining-Information Attack
When encoding content where an adversary can predict parts of the content, the remaining information may be learned by brute forcing the remainder. For example, this might be a configuration file where the adversary knows the entire content except for a single field containing a password. The adversary can then brute force the password and confirm that a configuration with a certain password has been accessed. Note that the brute-forcing effort can be reused to efficiently learn many secrets, similar to how rainbow tables are used to crack password hashes.
ERIS is vulnerable to both when using a convergent secret that is known to the adversary. A defense against both attacks is to use a convergence secret that is not known by the adversary (see
Section 2.3
). Using a different convergence secret causes the same content to be encoded into different blocks and identifiers.
De-duplication and deterministic identifiers are both properties that may be important to certain applications and users. Users should be aware of the known attacks and must decide depending on application and context on whether mitigations are necessary.
4.7.
Observing Block Access
A passive adversary that only observes communication between audience and intermediary peers might be able to learn information about encoded content from the pattern in which blocks are accessed by members of the audience. For example an adversary might be able to infer that certain blocks are part of the encoding of a video with some resolution by observing that blocks are fetched with predictable intervals and in a fixed order.
A passive adversary may also log block transfers between audience and intermediary peers for extended periods of time. If a read capability is leaked to the adversary later, they will know who has accessed blocks for decoding content with the leaked read capability in the past.
Storage and transport layers
SHOULD
use encryption to prevent passive network attackers from being able to observe such patterns.
Members of the audience
MAY
use obfuscation tactics when getting blocks from storage and transport layers to prevent malicious intermediary peers from being able to observe such patterns.
5.
Contact
A mailing list for general discussion on ERIS is available at
~pukkamustard/eris@lists.sr.ht
. Ephemeral discussions take place in the
#eris channel
on the Libera IRC network. See also the
project page
for more information.
Please feel free to direct any questions or comments regarding the specification to the mailing list. You are also invited to share your implementations and use-cases.
Urgent and sensitive security issues may be addressed directly to the
ERIS maintainers
6.
IANA Considerations
6.1.
CBOR Tags Registry
This specification requires the assignment of a CBOR tag for a binary ERIS read capability. The tag is added to the CBOR Tags Registry as defined in RFC 8949
RFC8949
Table 3
CBOR Tag Registration for ERIS binary read capability
Tag
Data Item
Semantics
276
byte string
ERIS binary read capability (see
Section 2.6
7.
GANA Considerations
7.1.
GNU Name System record types registry
GANA
GANA
is requested to add an entry into the "GNU Name System record types" registry as follows:
Table 4
GNU Name System record types registry entry for ERIS read capability
Number
Name
Comment
References
65557
ERIS_READ_CAPABILITY
Encoding for Robust Immutable Storage (ERIS) binary read capability
8.
Acknowledgments and Final Words
Initial development of ERIS was done as part of the
openEngiadina
project and was supported by the
NLnet Foundation
trough the
NGI0 Discovery Fund
. Further development is being supported by the
NLnet Foundation
trough
NGI Assure
Our friend and fellow developer rustra is imprisoned as a victim of political repression in Belarus. Read his
last words in court
and an
interview with him
. Consider donating to the
Anarchist Black Cross Belarus
. Support victims of repression and resist any form of repression and oppression. Resistance is not futile.
9.
References
9.1.
Normative References
[GANA]
GNUnet e.V.
"GNUnet Assigned Numbers Authority (GANA)"
April 2020
[LSD0001]
Schanzenbach, M.
Grothoff, C.
, and
B. Fix
"The GNU Name System"
2021
[RFC2119]
Bradner, S.
"Key words for use in RFCs to Indicate Requirement Levels"
BCP 14
RFC 2119
DOI 10.17487/RFC2119
March 1997
[RFC3552]
Rescorla, E.
and
B. Korver
"Guidelines for Writing RFC Text on Security Considerations"
BCP 72
RFC 3552
DOI 10.17487/RFC3552
July 2003
[RFC4648]
Josefsson, S.
"The Base16, Base32, and Base64 Data Encodings"
RFC 4648
DOI 10.17487/RFC4648
October 2006
[RFC7693]
Saarinen, M-J., Ed.
and
J-P. Aumasson
"The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC)"
RFC 7693
DOI 10.17487/RFC7693
November 2015
[RFC8141]
Saint-Andre, P.
and
J. Klensin
"Uniform Resource Names (URNs)"
RFC 8141
DOI 10.17487/RFC8141
April 2017
[RFC8439]
Nir, Y.
and
A. Langley
"ChaCha20 and Poly1305 for IETF Protocols"
RFC 8439
DOI 10.17487/RFC8439
June 2018
[RFC8949]
Bormann, C.
and
P. Hoffman
"Concise Binary Object Representation (CBOR)"
STD 94
RFC 8949
DOI 10.17487/RFC8949
December 2020
9.2.
Informative References
[ECRS]
Grothoff, C.
Grothoff, K.
Horozov, T.
, and
J.T. Lindgren
"An Encoding for Censorship-Resistant Sharing"
2003
[Freenet]
Clarke, I.
Sandberg, O.
Wiley, B.
, and
T. Hong
"Freenet: A Distributed Anonymous Information Storage and Retrieval System"
Designing Privacy Enhancing Technologies pp. 46-66
DOI 10.1007/3-540-44702-4_4
2001
[Perng2005]
Perng, G.
Reiter, M.
, and
C. Wang
"Censorship Resistance Revisited"
Information Hiding pp. 62-76
DOI 10.1007/11558859_6
2005
[Polleres20]
Polleres, A.
Kamdar, M.
Fernández, J.
Tudorache, T.
, and
M. Musen
"A more decentralized vision for Linked Data"
Semantic Web Vol. 11, pp. 101-113
DOI 10.3233/SW-190380
January 2020
[RFC7252]
Shelby, Z.
Hartke, K.
, and
C. Bormann
"The Constrained Application Protocol (CoAP)"
RFC 7252
DOI 10.17487/RFC7252
June 2014
[RFC7927]
Kutscher, D., Ed.
Eum, S.
Pentikousis, K.
Psaras, I.
Corujo, D.
Saucez, D.
Schmidt, T.
, and
M. Waehlisch
"Information-Centric Networking (ICN) Research Challenges"
RFC 7927
DOI 10.17487/RFC7927
July 2016
[THEX]
Mohr, G.
"Tree Hash EXchange format (THEX)"
March 2003
[Zooko2008]
Wilcox-O'Hearn, Z.
"Drew Perttula and Attacks on Convergent Encryption"
2008
Changelog
1.0.0 (23. August 2022)
Permanent link
Changed
Change order of ChaCha20 nonce used to encrypt blocks by placing level in the first byte
Require convergence secret to be specified explicitly
Update URN scheme to
eris
Added
Section with implementation notes
Note on when random and unique convergence secrets should be used and when the null convergence secret may be used
0.4.0 (13. May 2022)
Permanent link
Changed
Encode the level of a block in the ChaCha20 nonce used to encrypt a block as a mitigation against attacks based on block level confusion
Use unkeyed Blake2b to compute encryption key of internal nodes. This allows the key in the read capability to be verified if tree level is larger than 0.
Add DATA INTEGRITY to requirement on channel between publisher and audience in threat model
Update URN scheme to
erisx3
Syntax of pseudocode uses less ALL CAPS
Clarify usage of lists and sets in pseudocode
Clarify usage of terms nodes and blocks
Restructure description of encoding and decoding algorithms
Reformat to IETF xml2rfc (see
RFC7991
Added
More figures to illustrate encoding and non-regular cases
Recommendation for encoding references to blocks as URNs
Negative test vectors
Pseudocode for padding algorithms
0.3.0 (11. January 2022)
Permanent link
Added
CBOR Tag for ERIS binary read capability
GANA GNU Name System record types entry
Security considerations
Fixed
Off-by-one errors in specification of
PAD
and
UNPAD
Simplify pseudocode implementation of
Split-Content
by reading from padded content
Changed
Encoding of block size in binary read capability: Use 0x0a for block size 1KiB (instead of 0x00) and 0x0f for block size 32KiB (instead of 0x01)
Remove confidentiality from objectives and add intermediary peer deniability and censorship resistance
Test vectors are provided in a tar archive
0.2.0 (7. December 2020)
Major update of encoding that removes the
verification capability
- ability to verify integrity of content without reading content.
Permanent link
0.1.0 (11. June 2020)
Initial version.
Permanent link
Index
arity
Section 2.4, Paragraph 5
block
Section 2.4, Paragraph 8
convergent encryption
Section 2.3, Paragraph 1
hashing key
Section 2.1.1, Paragraph 1
node
Section 2.4, Paragraph 4
Section 2.4, Paragraph 8
null convergence secret
Section 2.3, Paragraph 8
random-access decoder
Section 2.8.4, Paragraph 1
read capability
Section 2.4, Paragraph 11
reference-key pair
Section 2.4, Paragraph 9
streaming encoding
Section 2.8.2, Paragraph 1
Author's Address
Endo Renberg
Email:
~pukkamustard/eris@lists.sr.ht
URI: