ISCC-UNIT Content-Code Mixed
| IEP: | 0007 |
|---|---|
| Title: | ISCC-UNIT Content-Code Mixed |
| Author: | Titusz Pan tp@iscc.foundation |
| Comments: | https://github.com/iscc/iscc-ieps/issues/12 |
| Status: | DRAFT |
| Type: | Core |
| License: | CC-BY-4.0 |
| Created: | 2022-09-28 |
| Updated: | 2024-01-02 |
Note
This document is a DRAFT contributed as input to ISO TC 46/SC 9/WG 18. The final version is being developed at the International Organization for Standardization as ISO/DIS 24138.
1. General
- The Content-Code Subtype Mixed (Mixed-Code) shall be a similarity-preserving hash of a collection of assets of the same or different media types combined into a single multimedia file.
- An ISCC processor that supports the creation of Mixed-Codes shall publicly document the supported file formats and the rules by which it divides the different parts of a multimedia file.
- The Mixed-Code shall be robust against format conversions, scaling, compression, and minor edits of the individual parts of the multimedia file.
2. Format
The Mixed-Code shall have the data format illustrated in Figure 9:
EXAMPLE 1: 64-bit Mixed-Code in its canonical form:
ISCC:EQASD57JXX7U73P7
EXAMPLE 2: 256-bit Mixed-Code in its canonical form:
ISCC:EQDSD57JXX7U73P7HPPH2P3U5OXZM7PL65T3HZ5JZ76H577P77NO5ZY
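The canonical form shown in the examples can be inspected with a few lines of Python. This is a minimal sketch, not the reference implementation. It assumes that the part after the `ISCC:` prefix is unpadded base32, that the ISCC-HEADER occupies two bytes made of four 4-bit fields (MainType, SubType, Version, Length), and that the length field encodes `(value + 1) * 32` bits. These details come from the general ISCC header specification rather than from this document, and the function name `decode_mixed_code` is hypothetical.

```python
import base64

MAINTYPE_CONTENT = 2  # assumed header value for MainType CONTENT
SUBTYPE_MIXED = 4     # assumed header value for SubType MIXED


def decode_mixed_code(iscc: str) -> dict:
    """Split a canonical ISCC-UNIT string into header fields and body."""
    code = iscc[5:] if iscc.startswith("ISCC:") else iscc
    # The canonical form carries no base32 padding; restore it for decoding.
    padded = code + "=" * (-len(code) % 8)
    raw = base64.b32decode(padded)
    header, body = raw[:2], raw[2:]  # assumption: two-byte header
    return {
        "maintype": header[0] >> 4,
        "subtype": header[0] & 0x0F,
        "version": header[1] >> 4,
        "bits": ((header[1] & 0x0F) + 1) * 32,
        "body": body,
    }
```

Under these assumptions, both examples above decode to MainType 2 and SubType 4, with body lengths of 64 and 256 bits respectively.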
3. Inputs
- The input for calculating the Mixed-Code shall be the Content-Codes of the individual parts of the multimedia file.
- At least two Content-Codes shall be required as input to calculate a Mixed-Code.
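The two input rules can be checked before any hashing takes place. The following sketch assumes that each input is a decoded ISCC-UNIT (header bytes followed by body bytes) and that MainType CONTENT is encoded as the value 2 in the upper four bits of the first header byte. That value is taken from the general ISCC header layout, not from this document, and the function name `validate_mixed_inputs` is hypothetical.

```python
from typing import List

MAINTYPE_CONTENT = 2  # assumed header value for MainType CONTENT


def validate_mixed_inputs(codes: List[bytes]) -> None:
    """Raise ValueError if the inputs cannot be used for a Mixed-Code."""
    if len(codes) < 2:
        raise ValueError("at least two Content-Codes are required")
    for code in codes:
        # The MainType is assumed to sit in the upper nibble of byte 0.
        if not code or code[0] >> 4 != MAINTYPE_CONTENT:
            raise ValueError("only Content-Codes are allowed as input")
```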
4. Outputs
Mixed-Code processing shall generate the following ISCC metadata output elements:
- iscc: the Mixed-Code in its canonical form (required);
- parts: the list of Content-Codes used for calculating the Mixed-Code (recommended);
- Additional metadata extracted from the multimedia file (optional).
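As an illustration, the output elements listed above could be assembled into a plain dictionary. This is only a sketch: the function name `mixed_code_metadata` and the `extra` parameter are hypothetical, and the way a real ISCC processor serializes its output is not specified here.

```python
from typing import Any, Dict, List, Optional


def mixed_code_metadata(
    iscc: str,
    parts: Optional[List[str]] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the Mixed-Code output elements into one dictionary."""
    result: Dict[str, Any] = {"iscc": iscc}  # required
    if parts:
        result["parts"] = list(parts)  # recommended
    # Optional metadata must not overwrite the elements set above.
    for key, value in (extra or {}).items():
        result.setdefault(key, value)
    return result
```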
5. Processing
An ISCC processor shall pre-process the multimedia file as follows:
- Generate individual Content-Codes for each part of the multimedia file according to the specifications in IEP-0003, IEP-0004, IEP-0005 and IEP-0006.
An ISCC processor shall calculate the Mixed-Code as follows:
1. Create a byte sequence from each Content-Code by concatenating the first byte of its ISCC-HEADER with the bytes of its ISCC-BODY.
2. Apply the similarity hash to the list of byte sequences from step 1 to calculate the ISCC-BODY of the Mixed-Code.
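The calculation can be sketched in Python as follows. This is a hedged illustration, not the reference implementation. It makes three assumptions that are not stated in this document: the similarity hash is a bitwise majority vote in which a tie sets the bit, the ISCC-HEADER of each input is two bytes long, and each byte sequence is truncated so that the result has the requested bit length. The function names `similarity_hash` and `mixed_code_body` are hypothetical.

```python
from typing import List


def similarity_hash(digests: List[bytes]) -> bytes:
    """Bitwise majority vote over equal-length byte sequences."""
    n_bytes = len(digests[0])
    if any(len(d) != n_bytes for d in digests):
        raise ValueError("all byte sequences must have the same length")
    n_bits = n_bytes * 8
    counts = [0] * n_bits
    for digest in digests:
        value = int.from_bytes(digest, "big")
        for i in range(n_bits):
            # Position i is counted from the most significant bit.
            counts[i] += (value >> (n_bits - 1 - i)) & 1
    threshold = len(digests) / 2  # assumption: a tie sets the bit
    result = 0
    for count in counts:
        result = (result << 1) | (1 if count >= threshold else 0)
    return result.to_bytes(n_bytes, "big")


def mixed_code_body(content_codes: List[bytes], bits: int = 64) -> bytes:
    """Compute a Mixed-Code body from decoded Content-Codes."""
    n_bytes = bits // 8
    # Step 1: first header byte plus body bytes (assumed two-byte header),
    # truncated so that every sequence is n_bytes long.
    sequences = [code[:1] + code[2:n_bytes + 1] for code in content_codes]
    # Step 2: similarity hash over the list of byte sequences.
    return similarity_hash(sequences)
```

Keeping the first header byte in each sequence means that the media type of every part contributes to the resulting hash, not only the body bits.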
6. Conformance
The normative behaviour of an ISCC processor in generating a Mixed-Code is specified only for Content-Code inputs. An implementation of the Mixed-Code algorithm shall be regarded as conforming to the standard if it creates the same Mixed-Code as the reference implementation for the same Content-Code inputs.
NOTE
For further technical details, see the source code in the modules code_content_mixed.py and simhash.py of the reference implementation.