| % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*- |
| %!TEX root = Vorbis_I_spec.tex |
| % $Id$ |
| \section{Bitpacking Convention} \label{vorbis:spec:bitpacking} |
| |
| \subsection{Overview} |
| |
| The Vorbis codec uses relatively unstructured raw packets containing |
| arbitrary-width binary integer fields. Logically, these packets are a |
| bitstream in which bits are coded one-by-one by the encoder and then |
| read one-by-one in the same monotonically increasing order by the |
| decoder. Most current binary storage arrangements group bits into a |
| native word size of eight bits (octets), sixteen bits, thirty-two bits |
| or, less commonly other fixed word sizes. The Vorbis bitpacking |
| convention specifies the correct mapping of the logical packet |
| bitstream into an actual representation in fixed-width words. |
| |
| |
| \subsubsection{octets, bytes and words} |
| |
| In most contemporary architectures, a 'byte' is synonymous with an |
| 'octet', that is, eight bits. This has not always been the case; |
| seven, ten, eleven and sixteen bit 'bytes' have been used. For |
| purposes of the bitpacking convention, a byte implies the native, |
| smallest integer storage representation offered by a platform. On |
| modern platforms, this is generally assumed to be eight bits (not |
| necessarily because of the processor but because of the |
| filesystem/memory architecture. Modern filesystems invariably offer |
| bytes as the fundamental atom of storage). A 'word' is an integer |
| size that is a grouped multiple of this smallest size. |
| |
| The most ubiquitous architectures today consider a 'byte' to be an |
| octet (eight bits) and a word to be a group of two, four or eight |
| bytes (16, 32 or 64 bits). Note however that the Vorbis bitpacking |
| convention is still well defined for any native byte size; Vorbis uses |
| the native bit-width of a given storage system. This document assumes |
| that a byte is one octet for purposes of example. |
| |
| \subsubsection{bit order} |
| |
| A byte has a well-defined 'least significant' bit (LSb), which is the |
| only bit set when the byte is storing the two's complement integer |
| value +1. A byte's 'most significant' bit (MSb) is at the opposite |
| end of the byte. Bits in a byte are numbered from zero at the LSb to |
| $n$ ($n=7$ in an octet) for the |
| MSb. |
| |
| |
| |
| \subsubsection{byte order} |
| |
| Words are native groupings of multiple bytes. Several byte orderings |
| are possible in a word; the common ones are 3-2-1-0 ('big endian' or |
| 'most significant byte first' in which the highest-valued byte comes |
| first), 0-1-2-3 ('little endian' or 'least significant byte first' in |
| which the lowest value byte comes first) and less commonly 3-1-2-0 and |
| 0-2-1-3 ('mixed endian'). |
| |
| The Vorbis bitpacking convention specifies storage and bitstream |
| manipulation at the byte, not word, level, thus host word ordering is |
| of a concern only during optimization when writing high performance |
| code that operates on a word of storage at a time rather than by byte. |
| Logically, bytes are always coded and decoded in order from byte zero |
| through byte $n$. |
| |
| |
| |
| \subsubsection{coding bits into byte sequences} |
| |
| The Vorbis codec has need to code arbitrary bit-width integers, from |
| zero to 32 bits wide, into packets. These integer fields are not |
| aligned to the boundaries of the byte representation; the next field |
| is written at the bit position at which the previous field ends. |
| |
| The encoder logically packs integers by writing the LSb of a binary |
| integer to the logical bitstream first, followed by next least |
| significant bit, etc, until the requested number of bits have been |
| coded. When packing the bits into bytes, the encoder begins by |
| placing the LSb of the integer to be written into the least |
| significant unused bit position of the destination byte, followed by |
| the next-least significant bit of the source integer and so on up to |
| the requested number of bits. When all bits of the destination byte |
| have been filled, encoding continues by zeroing all bits of the next |
| byte and writing the next bit into the bit position 0 of that byte. |
| Decoding follows the same process as encoding, but by reading bits |
| from the byte stream and reassembling them into integers. |
| |
| |
| |
| \subsubsection{signedness} |
| |
| The signedness of a specific number resulting from decode is to be |
| interpreted by the decoder given decode context. That is, the three |
| bit binary pattern 'b111' can be taken to represent either 'seven' as |
| an unsigned integer, or '-1' as a signed, two's complement integer. |
| The encoder and decoder are responsible for knowing if fields are to |
| be treated as signed or unsigned. |
| |
| |
| |
| \subsubsection{coding example} |
| |
| Code the 4 bit integer value '12' [b1100] into an empty bytestream. |
| Bytestream result: |
| |
| \begin{Verbatim}[commandchars=\\\{\}] |
| | |
| V |
| |
| 7 6 5 4 3 2 1 0 |
| byte 0 [0 0 0 0 1 1 0 0] <- |
| byte 1 [ ] |
| byte 2 [ ] |
| byte 3 [ ] |
| ... |
| byte n [ ] bytestream length == 1 byte |
| |
| \end{Verbatim} |
| |
| |
| Continue by coding the 3 bit integer value '-1' [b111]: |
| |
| \begin{Verbatim}[commandchars=\\\{\}] |
| | |
| V |
| |
| 7 6 5 4 3 2 1 0 |
| byte 0 [0 1 1 1 1 1 0 0] <- |
| byte 1 [ ] |
| byte 2 [ ] |
| byte 3 [ ] |
| ... |
| byte n [ ] bytestream length == 1 byte |
| \end{Verbatim} |
| |
| |
| Continue by coding the 7 bit integer value '17' [b0010001]: |
| |
| \begin{Verbatim}[commandchars=\\\{\}] |
| | |
| V |
| |
| 7 6 5 4 3 2 1 0 |
| byte 0 [1 1 1 1 1 1 0 0] |
| byte 1 [0 0 0 0 1 0 0 0] <- |
| byte 2 [ ] |
| byte 3 [ ] |
| ... |
| byte n [ ] bytestream length == 2 bytes |
| bit cursor == 6 |
| \end{Verbatim} |
| |
| |
| Continue by coding the 13 bit integer value '6969' [b110 11001110 01]: |
| |
| \begin{Verbatim}[commandchars=\\\{\}] |
| | |
| V |
| |
| 7 6 5 4 3 2 1 0 |
| byte 0 [1 1 1 1 1 1 0 0] |
| byte 1 [0 1 0 0 1 0 0 0] |
| byte 2 [1 1 0 0 1 1 1 0] |
| byte 3 [0 0 0 0 0 1 1 0] <- |
| ... |
| byte n [ ] bytestream length == 4 bytes |
| |
| \end{Verbatim} |
| |
| |
| |
| |
| \subsubsection{decoding example} |
| |
| Reading from the beginning of the bytestream encoded in the above example: |
| |
| \begin{Verbatim}[commandchars=\\\{\}] |
| | |
| V |
| |
| 7 6 5 4 3 2 1 0 |
| byte 0 [1 1 1 1 1 1 0 0] <- |
| byte 1 [0 1 0 0 1 0 0 0] |
| byte 2 [1 1 0 0 1 1 1 0] |
| byte 3 [0 0 0 0 0 1 1 0] bytestream length == 4 bytes |
| |
| \end{Verbatim} |
| |
| |
| We read two, two-bit integer fields, resulting in the returned numbers |
| 'b00' and 'b11'. Two things are worth noting here: |
| |
| \begin{itemize} |
| \item Although these four bits were originally written as a single |
| four-bit integer, reading some other combination of bit-widths from the |
| bitstream is well defined. There are no artificial alignment |
| boundaries maintained in the bitstream. |
| |
| \item The second value is the |
| two-bit-wide integer 'b11'. This value may be interpreted either as |
| the unsigned value '3', or the signed value '-1'. Signedness is |
| dependent on decode context. |
| \end{itemize} |
| |
| |
| |
| |
| \subsubsection{end-of-packet alignment} |
| |
| The typical use of bitpacking is to produce many independent |
| byte-aligned packets which are embedded into a larger byte-aligned |
| container structure, such as an Ogg transport bitstream. Externally, |
| each bytestream (encoded bitstream) must begin and end on a byte |
| boundary. Often, the encoded bitstream is not an integer number of |
| bytes, and so there is unused (uncoded) space in the last byte of a |
| packet. |
| |
| Unused space in the last byte of a bytestream is always zeroed during |
| the coding process. Thus, should this unused space be read, it will |
| return binary zeroes. |
| |
| Attempting to read past the end of an encoded packet results in an |
| 'end-of-packet' condition. End-of-packet is not to be considered an |
| error; it is merely a state indicating that there is insufficient |
| remaining data to fulfill the desired read size. Vorbis uses truncated |
| packets as a normal mode of operation, and as such, decoders must |
| handle reading past the end of a packet as a typical mode of |
| operation. Any further read operations after an 'end-of-packet' |
| condition shall also return 'end-of-packet'. |
| |
| |
| |
| \subsubsection{reading zero bits} |
| |
| Reading a zero-bit-wide integer returns the value '0' and does not |
| increment the stream cursor. Reading to the end of the packet (but |
| not past, such that an 'end-of-packet' condition has not triggered) |
| and then reading a zero bit integer shall succeed, returning 0, and |
| not trigger an end-of-packet condition. Reading a zero-bit-wide |
| integer after a previous read sets 'end-of-packet' shall also fail |
| with 'end-of-packet'. |
| |
| |
| |
| |
| |
| |