| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html> |
| <head> |
| |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> |
| <title>Ogg Documentation</title> |
| |
| <style type="text/css"> |
| body { |
| margin: 0 18px 0 18px; |
| padding-bottom: 30px; |
| font-family: Verdana, Arial, Helvetica, sans-serif; |
| color: #333333; |
| font-size: .8em; |
| } |
| |
| a { |
| color: #3366cc; |
| } |
| |
| img { |
| border: 0; |
| } |
| |
| #xiphlogo { |
| margin: 30px 0 16px 0; |
| } |
| |
| #content p { |
| line-height: 1.4; |
| } |
| |
| h1, h1 a, h2, h2 a, h3, h3 a { |
| font-weight: bold; |
| color: #ff9900; |
| margin: 1.3em 0 8px 0; |
| } |
| |
| h1 { |
| font-size: 1.3em; |
| } |
| |
| h2 { |
| font-size: 1.2em; |
| } |
| |
| h3 { |
| font-size: 1.1em; |
| } |
| |
| li { |
| line-height: 1.4; |
| } |
| |
| #copyright { |
| margin-top: 30px; |
| line-height: 1.5em; |
| text-align: center; |
| font-size: .8em; |
| color: #888888; |
| clear: both; |
| } |
| </style> |
| |
| </head> |
| |
| <body> |
| |
| <div id="xiphlogo"> |
| <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a> |
| </div> |
| |
| <h1>Ogg logical bitstream framing</h1> |
| |
| <h2>Ogg bitstreams</h2> |
| |
| <p>The Ogg transport bitstream is designed to provide framing, error |
| protection and seeking structure for higher-level codec streams that |
| consist of raw, unencapsulated data packets, such as the Vorbis audio |
| codec or Theora video codec.</p> |
| |
| <h2>Application example: Vorbis</h2> |
| |
| <p>Vorbis encodes short-time blocks of PCM data into raw packets of |
| bit-packed data. These raw packets may be used directly by transport |
| mechanisms that provide their own framing and packet-separation |
| mechanisms (such as UDP datagrams). For stream based storage (such as |
| files) and transport (such as TCP streams or pipes), Vorbis uses the |
| Ogg bitstream format to provide framing/sync, sync recapture |
| after error, landmarks during seeking, and enough information to |
| properly separate data back into packets at the original packet |
| boundaries without relying on decoding to find packet boundaries.</p> |
| |
| <h2>Design constraints for Ogg bitstreams</h2> |
| |
| <ol> |
| <li>True streaming; we must not need to seek to build a 100% |
| complete bitstream.</li> |
| <li>Use no more than approximately 1-2% of bitstream bandwidth for |
| packet boundary marking, high-level framing, sync and seeking.</li> |
| <li>Specification of absolute position within the original sample |
| stream.</li> |
| <li>Simple mechanism to ease limited editing, such as a simplified |
| concatenation mechanism.</li> |
| <li>Detection of corruption, recapture after error and direct, random |
| access to data at arbitrary positions in the bitstream.</li> |
| </ol> |
| |
| <h2>Logical and Physical Bitstreams</h2> |
| |
| <p>A <em>logical</em> Ogg bitstream is a contiguous stream of |
| sequential pages belonging only to the logical bitstream. A |
| <em>physical</em> Ogg bitstream is constructed from one or more |
| than one logical Ogg bitstream (the simplest physical bitstream |
| is simply a single logical bitstream). We describe below the exact |
| formatting of an Ogg logical bitstream. Combining logical |
| bitstreams into more complex physical bitstreams is described in the |
| <a href="oggstream.html">Ogg bitstream overview</a>. The exact |
| mapping of raw Vorbis packets into a valid Ogg Vorbis physical |
| bitstream is described in the Vorbis I Specification.</p> |
| |
| <h2>Bitstream structure</h2> |
| |
| <p>An Ogg stream is structured by dividing incoming packets into |
| segments of up to 255 bytes and then wrapping a group of contiguous |
| packet segments into a variable length page preceded by a page |
| header. Both the header size and page size are variable; the page |
| header contains sizing information and checksum data to determine |
| header/page size and data integrity.</p> |
| |
| <p>The bitstream is captured (or recaptured) by looking for the beginning |
| of a page, specifically the capture pattern. Once the capture pattern |
| is found, the decoder verifies page sync and integrity by computing |
| and comparing the checksum. At that point, the decoder can extract the |
| packets themselves.</p> |
| |
| <h3>Packet segmentation</h3> |
| |
| <p>Packets are logically divided into multiple segments before encoding |
| into a page. Note that the segmentation and fragmentation process is a |
| logical one; it's used to compute page header values and the original |
| page data need not be disturbed, even when a packet spans page |
| boundaries.</p> |
| |
| <p>The raw packet is logically divided into [n] 255 byte segments and a |
| last fractional segment of < 255 bytes. A packet size may well |
| consist only of the trailing fractional segment, and a fractional |
| segment may be zero length. These values, called "lacing values" are |
| then saved and placed into the header segment table.</p> |
| |
| <p>An example should make the basic concept clear:</p> |
| |
| <pre> |
| <tt> |
| raw packet: |
| ___________________________________________ |
| |______________packet data__________________| 753 bytes |
| |
| lacing values for page header segment table: 255,255,243 |
| </tt> |
| </pre> |
| |
| <p>We simply add the lacing values for the total size; the last lacing |
| value for a packet is always the value that is less than 255. Note |
| that this encoding both avoids imposing a maximum packet size as well |
| as imposing minimum overhead on small packets (as opposed to, eg, |
| simply using two bytes at the head of every packet and having a max |
| packet size of 32k. Small packets (<255, the typical case) are |
| penalized with twice the segmentation overhead). Using the lacing |
| values as suggested, small packets see the minimum possible |
| byte-aligned overhead (1 byte) and large packets, over 512 bytes or |
| so, see a fairly constant ~.5% overhead on encoding space.</p> |
| |
| <p>Note that a lacing value of 255 implies that a second lacing value |
| follows in the packet, and a value of < 255 marks the end of the |
| packet after that many additional bytes. A packet of 255 bytes (or a |
| multiple of 255 bytes) is terminated by a lacing value of 0:</p> |
| |
| <pre><tt> |
| raw packet: |
| _______________________________ |
| |________packet data____________| 255 bytes |
| |
| lacing values: 255, 0 |
| </tt></pre> |
| |
| <p>Note also that a 'nil' (zero length) packet is not an error; it |
| consists of nothing more than a lacing value of zero in the header.</p> |
| |
| <h3>Packets spanning pages</h3> |
| |
| <p>Packets are not restricted to beginning and ending within a page, |
| although individual segments are, by definition, required to do so. |
| Packets are not restricted to a maximum size, although excessively |
| large packets in the data stream are discouraged; the Ogg |
| bitstream specification strongly recommends nominal page size of |
| approximately 4-8kB (large packets are foreseen as being useful for |
| initialization data at the beginning of a logical bitstream).</p> |
| |
| <p>After segmenting a packet, the encoder may decide not to place all the |
| resulting segments into the current page; to do so, the encoder places |
| the lacing values of the segments it wishes to belong to the current |
| page into the current segment table, then finishes the page. The next |
| page is begun with the first value in the segment table belonging to |
| the next packet segment, thus continuing the packet (data in the |
| packet body must also correspond properly to the lacing values in the |
| spanned pages. The segment data in the first packet corresponding to |
| the lacing values of the first page belong in that page; packet |
| segments listed in the segment table of the following page must begin |
| the page body of the subsequent page).</p> |
| |
| <p>The last mechanic to spanning a page boundary is to set the header |
| flag in the new page to indicate that the first lacing value in the |
| segment table continues rather than begins a packet; a header flag of |
| 0x01 is set to indicate a continued packet. Although mandatory, it |
| is not actually algorithmically necessary; one could inspect the |
| preceding segment table to determine if the packet is new or |
| continued. Adding the information to the packet_header flag allows a |
| simpler design (with no overhead) that needs only inspect the current |
| page header after frame capture. This also allows faster error |
| recovery in the event that the packet originates in a corrupt |
| preceding page, implying that the previous page's segment table |
| cannot be trusted.</p> |
| |
| <p>Note that a packet can span an arbitrary number of pages; the above |
| spanning process is repeated for each spanned page boundary. Also a |
| 'zero termination' on a packet size that is an even multiple of 255 |
| must appear even if the lacing value appears in the next page as a |
| zero-length continuation of the current packet. The header flag |
| should be set to 0x01 to indicate that the packet spanned, even though |
| the span is a nil case as far as data is concerned.</p> |
| |
| <p>The encoding looks odd, but is properly optimized for speed and the |
| expected case of the majority of packets being between 50 and 200 |
| bytes (note that it is designed such that packets of wildly different |
| sizes can be handled within the model; placing packet size |
| restrictions on the encoder would have only slightly simplified design |
| in page generation and increased overall encoder complexity).</p> |
| |
| <p>The main point behind tracking individual packets (and packet |
| segments) is to allow more flexible encoding tricks that requiring |
| explicit knowledge of packet size. An example is simple bandwidth |
| limiting, implemented by simply truncating packets in the nominal case |
| if the packet is arranged so that the least sensitive portion of the |
| data comes last.</p> |
| |
| <h3>Page header</h3> |
| |
| <p>The headering mechanism is designed to avoid copying and re-assembly |
| of the packet data (ie, making the packet segmentation process a |
| logical one); the header can be generated directly from incoming |
| packet data. The encoder buffers packet data until it finishes a |
| complete page at which point it writes the header followed by the |
| buffered packet segments.</p> |
| |
| <h4>capture_pattern</h4> |
| |
| <p>A header begins with a capture pattern that simplifies identifying |
| pages; once the decoder has found the capture pattern it can do a more |
| intensive job of verifying that it has in fact found a page boundary |
| (as opposed to an inadvertent coincidence in the byte stream).</p> |
| |
| <pre><tt> |
| byte value |
| |
| 0 0x4f 'O' |
| 1 0x67 'g' |
| 2 0x67 'g' |
| 3 0x53 'S' |
| </tt></pre> |
| |
| <h4>stream_structure_version</h4> |
| |
| <p>The capture pattern is followed by the stream structure revision:</p> |
| |
| <pre><tt> |
| byte value |
| |
| 4 0x00 |
| </tt></pre> |
| |
| <h4>header_type_flag</h4> |
| |
| <p>The header type flag identifies this page's context in the bitstream:</p> |
| |
| <pre><tt> |
| byte value |
| |
| 5 bitflags: 0x01: unset = fresh packet |
| set = continued packet |
| 0x02: unset = not first page of logical bitstream |
| set = first page of logical bitstream (bos) |
| 0x04: unset = not last page of logical bitstream |
| set = last page of logical bitstream (eos) |
| </tt></pre> |
| |
| <h4>absolute granule position</h4> |
| |
| <p>(This is packed in the same way the rest of Ogg data is packed; LSb |
| of LSB first. Note that the 'position' data specifies a 'sample' |
| number (eg, in a CD quality sample is four octets, 16 bits for left |
| and 16 bits for right; in video it would likely be the frame number. |
| It is up to the specific codec in use to define the semantic meaning |
| of the granule position value). The position specified is the total |
| samples encoded after including all packets finished on this page |
| (packets begun on this page but continuing on to the next page do not |
| count). The rationale here is that the position specified in the |
| frame header of the last page tells how long the data coded by the |
| bitstream is. A truncated stream will still return the proper number |
| of samples that can be decoded fully.</p> |
| |
| <p>A special value of '-1' (in two's complement) indicates that no packets |
| finish on this page.</p> |
| |
| <pre><tt> |
| byte value |
| |
| 6 0xXX LSB |
| 7 0xXX |
| 8 0xXX |
| 9 0xXX |
| 10 0xXX |
| 11 0xXX |
| 12 0xXX |
| 13 0xXX MSB |
| </tt></pre> |
| |
| <h4>stream serial number</h4> |
| |
| <p>Ogg allows for separate logical bitstreams to be mixed at page |
| granularity in a physical bitstream. The most common case would be |
| sequential arrangement, but it is possible to interleave pages for |
| two separate bitstreams to be decoded concurrently. The serial |
| number is the means by which pages physical pages are associated with |
| a particular logical stream. Each logical stream must have a unique |
| serial number within a physical stream:</p> |
| |
| <pre><tt> |
| byte value |
| |
| 14 0xXX LSB |
| 15 0xXX |
| 16 0xXX |
| 17 0xXX MSB |
| </tt></pre> |
| |
| <h4>page sequence no</h4> |
| |
| <p>Page counter; lets us know if a page is lost (useful where packets |
| span page boundaries).</p> |
| |
| <pre><tt> |
| byte value |
| |
| 18 0xXX LSB |
| 19 0xXX |
| 20 0xXX |
| 21 0xXX MSB |
| </tt></pre> |
| |
| <h4>page checksum</h4> |
| |
| <p>32 bit CRC value (direct algorithm, initial val and final XOR = 0, |
| generator polynomial=0x04c11db7). The value is computed over the |
| entire header (with the CRC field in the header set to zero) and then |
| continued over the page. The CRC field is then filled with the |
| computed value.</p> |
| |
| <p>(A thorough discussion of CRC algorithms can be found in <a |
| href="http://www.ross.net/crc/download/crc_v3.txt">"A |
| Painless Guide to CRC Error Detection Algorithms"</a> by Ross |
| Williams <a href="mailto:ross@ross.net">ross@ross.net</a>.)</p> |
| |
| <pre><tt> |
| byte value |
| |
| 22 0xXX LSB |
| 23 0xXX |
| 24 0xXX |
| 25 0xXX MSB |
| </tt></pre> |
| |
| <h4>page_segments</h4> |
| |
| <p>The number of segment entries to appear in the segment table. The |
| maximum number of 255 segments (255 bytes each) sets the maximum |
| possible physical page size at 65307 bytes or just under 64kB (thus |
| we know that a header corrupted so as destroy sizing/alignment |
| information will not cause a runaway bitstream. We'll read in the |
| page according to the corrupted size information that's guaranteed to |
| be a reasonable size regardless, notice the checksum mismatch, drop |
| sync and then look for recapture).</p> |
| |
| <pre><tt> |
| byte value |
| |
| 26 0x00-0xff (0-255) |
| </tt></pre> |
| |
| <h4>segment_table (containing packet lacing values)</h4> |
| |
| <p>The lacing values for each packet segment physically appearing in |
| this page are listed in contiguous order.</p> |
| |
| <pre><tt> |
| byte value |
| |
| 27 0x00-0xff (0-255) |
| [...] |
| n 0x00-0xff (0-255, n=page_segments+26) |
| </tt></pre> |
| |
| <p>Total page size is calculated directly from the known header size and |
| lacing values in the segment table. Packet data segments follow |
| immediately after the header.</p> |
| |
| <p>Page headers typically impose a flat .25-.5% space overhead assuming |
| nominal ~8k page sizes. The segmentation table needed for exact |
| packet recovery in the streaming layer adds approximately .5-1% |
| nominal assuming expected encoder behavior in the 44.1kHz, 128kbps |
| stereo encodings.</p> |
| |
| <div id="copyright"> |
| The Xiph Fish Logo is a |
| trademark (™) of Xiph.Org.<br/> |
| |
| These pages © 1994 - 2005 Xiph.Org. All rights reserved. |
| </div> |
| |
| </body> |
| </html> |