ISCC-UNIT Instance-Code#
IEP: | 0009 |
---|---|
Title: | ISCC-UNIT Instance-Code |
Author: | Titusz Pan tp@iscc.foundation |
Comments: | https://github.com/iscc/iscc-ieps/issues/14 |
Status: | DRAFT |
Type: | Core |
License: | CC-BY-4.0 |
Created: | 2022-09-28 |
Updated: | 2024-01-05 |
Note
This document is a DRAFT contributed as input to ISO TC 46/SC 9/WG 18. The final version is developed at the International Organization for Standardization as ISO/DIS 24138
1. General#
- The Instance-Code shall be a checksum for any kind of data regardless of its media type.
- The Instance-Code shall match identical files and indicate data transmission errors or data manipulation.
- The Instance-Code shall be the prefix of a merkle root of a merkle tree as illustrated in Figure 11.
- The cryptographic hash algorithm used for Instance-Code calculation shall be updated if a security weakness is discovered.
- An update shall be indicated by a higher version number of the Instance-Code header.
NOTE 1
Depending on security and uniqueness requirements, applications can generate and store Instance-Codes up to a length of 256 bits in steps of 32 bits. Longer variants of an Instance-Code are extensions of the shorter versions (which function only as checksums) and remain compatible while improving the security of the hash.
NOTE 2
The use of tree-based hashing supports efficient containment-proofs, verified streaming and verification of data from trusted and untrusted sources. A data delivery application can provide cryptographic proofs for partial data and streams such that the receiving application can incrementally verify chunks of data against a given Instance-Code.
2. Format#
The Instance-Code shall have the data format illustrated in Figure 12:
EXAMPLE 1: 64-bit Instance-Code in its canonical form:
ISCC:IAAZ3NGA3HTIYUQD
EXAMPLE 2: 256-bit Instance-Code in its canonical form:
ISCC:IADZ3NGA3HTIYUQD3SGC737FF6S5KRTRXY5DEU7ANCEMVTT4MDS2OQY
3. Inputs#
The input for calculating the Instance-Code shall be the bytes of a file, without reference to their meaning or structure.
4. Outputs#
Instance-Code processing shall generate the following output elements:
iscc
: the Instance-Code in its canonical form (required);datahash
: a cryptographic hash of the input file encoded as multihash (optional);filesize
: the size of the input file in bytes (optional).
5. Processing#
An ISCC processor shall calculate the Instance-Code as follows:
- Apply the tree-based cryptographic hash function to the file input using the appropriate number of bytes for the output digest.
- Use the hash digest of output as the ISCC-BODY of the Instance-Code.
For further technical details see source-code in modules code_instance.py of the reference implementation.
NOTE
The merkle tree chunk size is defined by the BLAKE3 1 algorithm to be 1024 bytes.
6. Conformance#
An implementation of the Instance-Code algorithm shall be regarded as conforming to the standard if it creates the same Instance-Code as the reference implementation for the same data input.
NOTE
The ISCC reference implementation uses the open source BLAKE3 1 library for merkle tree hashing and appropriate use of this software will generate the same codes as the reference implementation.
7. Bibliography#
-
O’Connor, J., Aumasson, J.P., Neves, S., Wilcox-O’Hearn, Z., BLAKE3: one function, fast everywhere. Version 20211102173700, accessed July 2022. Available at https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf ↩↩