ISCC-CODE
| IEP: | 0010 |
|---|---|
| Title: | ISCC-CODE |
| Author: | Titusz Pan tp@iscc.foundation |
| Comments: | https://github.com/iscc/iscc-ieps/issues/15 |
| Status: | DRAFT |
| Type: | Core |
| License: | CC-BY-4.0 |
| Created: | 2022-09-28 |
| Updated: | 2024-01-05 |
Note
This document is a DRAFT contributed as input to ISO TC 46/SC 9/WG 18. The final version is being developed at the International Organization for Standardization as ISO/DIS 24138.
1. General
An ISCC-CODE shall be an ordered sequence of two or more headerless ISCC-UNITs of different MainTypes, all derived from one referent, prefixed with a common header.
2. Purpose
The ISCC-CODE shall support identification, clustering, discovery, and matching of files based on their metadata, content, and data similarity. Where appropriate, it shall be used together with other identifiers in accordance with the principles outlined in the Annex.
3. Format
The ISCC-CODE shall have the data format illustrated in Figure 13:
EXAMPLE 1: 128-bit ISCC-CODE with Data-Code and Instance-Code:
ISCC:KUAIFYXGML3SRNH25MIWPM3HVHBXQ
EXAMPLE 2: 256-bit ISCC-CODE with Meta-, Text-, Data-, and Instance-Code:
ISCC:KAC6HZYGQLBASTFMBJOS6NDLVKKFLAXC4ZRPOKFU7LVRCZ5TM6U4G6A
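As a rough illustration of how the two examples relate to this layout, the sketch below decodes them and separates header from body. It rests on assumptions that are not stated in this section: that the canonical form is RFC 4648 base32 without padding after the `ISCC:` prefix, and that the header consists of four fields (taken here to be MainType, SubType, Version, Length) that each fit into one 4-bit nibble, which holds for these two examples only. The helper name `split_iscc` is hypothetical.

```python
import base64


def split_iscc(iscc: str) -> tuple[list[int], bytes]:
    """Return (header nibbles, body bytes) for an ISCC with a 2-byte header."""
    code = iscc[5:] if iscc.startswith("ISCC:") else iscc
    # base64.b32decode expects input padded to a multiple of 8 characters.
    padded = code + "=" * (-len(code) % 8)
    raw = base64.b32decode(padded)
    # Assumption: a 2-byte header made of four 4-bit fields.
    header, body = raw[:2], raw[2:]
    nibbles = [header[0] >> 4, header[0] & 0x0F, header[1] >> 4, header[1] & 0x0F]
    return nibbles, body


for example in (
    "ISCC:KUAIFYXGML3SRNH25MIWPM3HVHBXQ",
    "ISCC:KAC6HZYGQLBASTFMBJOS6NDLVKKFLAXC4ZRPOKFU7LVRCZ5TM6U4G6A",
):
    nibbles, body = split_iscc(example)
    print(nibbles, len(body) * 8)
```

Under these assumptions the script prints `[5, 5, 0, 0] 128` and `[5, 0, 0, 5] 256`, so the body sizes agree with the 128-bit and 256-bit labels of EXAMPLE 1 and EXAMPLE 2.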
3.1 SubTypes
- If an ISCC-UNIT of type CONTENT is sequenced into an ISCC-CODE, the SubType of the ISCC-CODE shall be that of the Content-Code (see 4.2.2 SubTypes).
- If the ISCC-CODE is composed only of a sequence of the types META, DATA, and INSTANCE, the SubType shall be NONE.
- If the ISCC-CODE is composed only of a sequence of types DATA and INSTANCE, the SubType shall be SUM.
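The three rules can be read as a small decision function. The following is a minimal sketch, assuming the units are given as a mapping from MainType name to SubType name and that the inputs have already been validated; the function name `iscc_code_subtype` and the string constants are illustrative, not part of the specification. The DATA/INSTANCE-only case is tested before the META/DATA/INSTANCE case because the former is a subset of the latter.

```python
def iscc_code_subtype(units: dict[str, str]) -> str:
    """Pick the ISCC-CODE SubType from a mapping of MainType -> unit SubType."""
    main_types = set(units)
    if "CONTENT" in main_types:
        # A Content-Code is present: inherit its SubType.
        return units["CONTENT"]
    if main_types == {"DATA", "INSTANCE"}:
        # Only a Data-Code and an Instance-Code.
        return "SUM"
    if main_types <= {"META", "DATA", "INSTANCE"}:
        # Only META, DATA and INSTANCE units.
        return "NONE"
    # Combinations not covered by the three rules (for example a Semantic-Code
    # without a Content-Code) are deliberately left undecided in this sketch.
    raise ValueError(f"no SubType rule for units: {sorted(main_types)}")


print(iscc_code_subtype({"DATA": "NONE", "INSTANCE": "NONE"}))
```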
3.2 Length
- The length in bits of the ISCC-BODY of an ISCC-CODE shall be calculated as the number of data bits set in the Length field of the header, multiplied by 64, plus 128 bits.
- The ISCC-UNITs composed into an ISCC-CODE shall be ordered as follows: META, SEMANTIC, CONTENT, DATA, INSTANCE.
- The data bits of the Length field shall be the bits following the prefix bit(s) and they shall encode the composition of the ISCC-BODY of an ISCC-CODE as follows:
  - The first data bit shall signify the presence of a Meta-Code.
  - The second data bit shall signify the presence of a Semantic-Code.
  - The third data bit shall signify the presence of a Content-Code.
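The length rule and the bit assignments can be combined into one small calculation. This is a sketch under the assumption that the three data bits are passed as an integer from 0 to 7 and that the "first" data bit is the most significant of the three; the function name `describe_length_field` is hypothetical.

```python
# Optional units flagged by the three data bits, first data bit first.
OPTIONAL_UNITS = ("META", "SEMANTIC", "CONTENT")


def describe_length_field(data_bits: int) -> tuple[list[str], int]:
    """Return (units in ISCC-BODY order, body length in bits)."""
    if not 0 <= data_bits <= 0b111:
        raise ValueError("expected exactly three data bits")
    # Assumption: the first data bit is the most significant one (mask 0b100).
    present = [
        name for i, name in enumerate(OPTIONAL_UNITS) if data_bits & (0b100 >> i)
    ]
    # Data-Code and Instance-Code are always part of the body (the fixed 128 bits).
    units = present + ["DATA", "INSTANCE"]
    body_bits = len(present) * 64 + 128
    return units, body_bits


print(describe_length_field(0b000))  # (['DATA', 'INSTANCE'], 128)
print(describe_length_field(0b101))  # (['META', 'CONTENT', 'DATA', 'INSTANCE'], 256)
```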
4. Inputs
- The input for calculating an ISCC-CODE shall be a collection of ISCC-UNITs.
- The input shall include at least a Data-Code and an Instance-Code with a minimum of 64 bits each.
- The input shall include at most one ISCC-UNIT of each MainType.
- If both a Semantic-Code and a Content-Code are given as input, they shall be of the same SubType.
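These input constraints lend themselves to an up-front check. The sketch below assumes each ISCC-UNIT has already been decoded into a simple record holding its MainType, SubType, and body length in bits; the `Unit` type and `validate_inputs` function are illustrative names, not taken from the reference implementation.

```python
from collections import Counter
from typing import NamedTuple


class Unit(NamedTuple):
    main_type: str  # "META", "SEMANTIC", "CONTENT", "DATA" or "INSTANCE"
    sub_type: str
    bits: int       # length of the unit body in bits


def validate_inputs(units: list[Unit]) -> None:
    """Raise ValueError if the collection violates the input rules."""
    counts = Counter(u.main_type for u in units)
    duplicates = sorted(mt for mt, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"more than one ISCC-UNIT of MainType {duplicates}")

    by_type = {u.main_type: u for u in units}
    for required in ("DATA", "INSTANCE"):
        unit = by_type.get(required)
        if unit is None or unit.bits < 64:
            raise ValueError(f"{required} unit with at least 64 bits is required")

    semantic, content = by_type.get("SEMANTIC"), by_type.get("CONTENT")
    if semantic is not None and content is not None:
        if semantic.sub_type != content.sub_type:
            raise ValueError("Semantic-Code and Content-Code SubTypes differ")


validate_inputs([Unit("DATA", "NONE", 64), Unit("INSTANCE", "NONE", 256)])
```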
5. Outputs
ISCC-CODE processing shall generate the following output elements:
- `iscc`: the ISCC-CODE in its canonical form (required);
- `filename`: the name of the input file (optional);
- any other elements collected during processing of the individual ISCC-UNITs (optional).
6. Processing
An ISCC processor shall compose an ISCC-CODE as follows:
- Sort the ISCC-UNITs according to their predefined order (META, SEMANTIC, CONTENT, DATA, INSTANCE).
- Decode the ISCC-UNITs to binary. If both a Semantic-Code and a Content-Code are supplied and their SubTypes differ, halt.
- Remove the headers of the decoded ISCC-UNITs.
- Truncate the ISCC-UNITs by keeping only the first 64 bits.
- Concatenate the headerless and truncated ISCC-UNITs in sorted order to construct the final ISCC-BODY of the ISCC-CODE.
- Prefix the ISCC-BODY with the appropriate ISCC-HEADER and encode the result to its canonical form.
For further details see source-code in the module iscc_code.py of the reference implementation.
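The sorting, truncation, and concatenation steps can be sketched as follows. This is not the reference implementation: it assumes the units have already been decoded and stripped of their headers, so that the input is a mapping from MainType name to headerless body bytes, and it stops before the header is prefixed and the result is encoded, since the header format is defined outside this section. The name `compose_body` is hypothetical.

```python
# Predefined order of ISCC-UNITs inside an ISCC-CODE.
UNIT_ORDER = ("META", "SEMANTIC", "CONTENT", "DATA", "INSTANCE")


def compose_body(units: dict[str, bytes]) -> bytes:
    """Build an ISCC-BODY from headerless unit bodies keyed by MainType."""
    # Sort the MainTypes by their position in the predefined order.
    ordered = sorted(units, key=UNIT_ORDER.index)
    # Keep only the first 64 bits (8 bytes) of each unit, then concatenate.
    return b"".join(units[main_type][:8] for main_type in ordered)


body = compose_body(
    {
        "INSTANCE": bytes([0xCC]) * 32,  # e.g. a 256-bit Instance-Code body
        "DATA": bytes([0xBB]) * 8,
        "META": bytes([0xAA]) * 8,
    }
)
print(len(body) * 8, body.hex())
```

With one optional unit plus the Data-Code and Instance-Code, the resulting body is 192 bits long, in line with the length rule in 3.2.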
7. Comparing
To measure the similarity of two ISCC-CODEs, first check whether the Instance-Codes are identical. Then calculate the binary Hamming distance between the ISCC-BODYs of the other ISCC-UNITs that share the same MainType and SubType. Lower values of the Hamming distance indicate a higher probability of similarity; higher values indicate decreasing similarity. The threshold indicating identity will vary according to the MainType and the application.
For further details, see source-code of the function `iscc_compare` in the module `utils.py` of the reference implementation.
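The comparison described above can be illustrated with a short sketch. It does not reproduce `iscc_compare`; it assumes each ISCC-CODE has already been split into a mapping from MainType name to a `(SubType, body bytes)` pair, and the names `hamming_distance` and `compare_units` as well as the result keys are illustrative.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("bodies must have the same length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def compare_units(a: dict, b: dict) -> dict:
    """Compare two decomposed ISCC-CODEs (MainType -> (SubType, body))."""
    result = {}
    for main_type in a.keys() & b.keys():
        sub_a, body_a = a[main_type]
        sub_b, body_b = b[main_type]
        if main_type == "INSTANCE":
            # Instance-Codes are checked for identity, not similarity.
            result["instance_match"] = body_a == body_b
        elif sub_a == sub_b:
            # Only units with the same MainType and SubType are comparable.
            result[main_type] = hamming_distance(body_a, body_b)
    return result


code_a = {"DATA": ("NONE", bytes(8)), "INSTANCE": ("NONE", b"\x01" * 8)}
code_b = {"DATA": ("NONE", b"\x0f" + bytes(7)), "INSTANCE": ("NONE", b"\x01" * 8)}
print(compare_units(code_a, code_b))
```

How a given distance is interpreted is left to the caller, since the threshold depends on the MainType and the application.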
8. Annex
8.1 Principles
An ISCC-CODE shall be used for the identification of digital assets. It shall, where appropriate, be used in conjunction with existing identifier schemes (see Figure A.1).
An ISCC-CODE shall not be used as a replacement for other identifier schemes such as DOI, ISAN, ISBN, ISRC, ISSN, ISWC, and other commonly recognized identifiers.
8.2 Linking
If the referent of an ISCC is the manifestation of another entity that has an identifier within another identifier scheme, the relationship shall be indicated in the ISCC metadata element `identifier` (see IEP-0012).
NOTE
The `identifier` element used in the ISCC metadata schema can reference multiple identifiers.
8.3 Use of ISCC
Other identifier schemes that wish to integrate with ISCC shall:
- specify a metadata schema to be used as seed metadata for generating ISCCs in the context of their use cases;
- add ISCCs to their metadata in order to link their referents to related digital manifestations.
Seed metadata schema shall:
- define the descriptive elements to be used for matching referents based on similar metadata;
- use ISO/IEC 21778:2017 JSON format;
- use JSON-LD syntax;
- be defined by a JSON schema;
- be registered as an IANA media type.
If possible, seed metadata shall be embedded into a digital asset as described in IEP-0002 - Metadata embedding.