| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
| <html> |
| <head> |
| |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/> |
| <title>Ogg Documentation</title> |
| |
| <style type="text/css"> |
| body { |
| margin: 0 18px 0 18px; |
| padding-bottom: 30px; |
| font-family: Verdana, Arial, Helvetica, sans-serif; |
| color: #333333; |
| font-size: .8em; |
| } |
| |
| a { |
| color: #3366cc; |
| } |
| |
| img { |
| border: 0; |
| } |
| |
| #xiphlogo { |
| margin: 30px 0 16px 0; |
| } |
| |
| #content p { |
| line-height: 1.4; |
| } |
| |
| h1, h1 a, h2, h2 a, h3, h3 a { |
| font-weight: bold; |
| color: #ff9900; |
| margin: 1.3em 0 8px 0; |
| } |
| |
| h1 { |
| font-size: 1.3em; |
| } |
| |
| h2 { |
| font-size: 1.2em; |
| } |
| |
| h3 { |
| font-size: 1.1em; |
| } |
| |
| li { |
| line-height: 1.4; |
| } |
| |
| #copyright { |
| margin-top: 30px; |
| line-height: 1.5em; |
| text-align: center; |
| font-size: .8em; |
| color: #888888; |
| clear: both; |
| } |
| </style> |
| |
| </head> |
| |
| <body> |
| |
| <div id="xiphlogo"> |
| <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a> |
| </div> |
| |
| <h1>Ogg bitstream overview</h1> |
| |
| <p>This document serves as a starting point for understanding the design |
| and implementation of the Ogg container format. If you're new to Ogg |
| or merely want a high-level technical overview, start reading here. |
| Other documents linked from the <a href="index.html">index page</a> |
| give distilled technical descriptions and references of the container |
| mechanisms. This document is intended to aid understanding.</p> |
| |
| <h2>Container format design points</h2> |
| |
| <p>Ogg is intended to be a simplest-possible container, concerned only |
| with framing, ordering, and interleave. It can be used as a stream delivery |
| mechanism, for media file storage, or as a building block toward |
| implementing a more complex, non-linear container (for example, see |
| the <a href="skeleton.html">Skeleton</a> or <a |
| href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>). |
| |
| <p>The Ogg container is not intended to be a monolithic |
| 'kitchen-sink'. It exists only to frame and deliver in-order stream |
| data and as such is vastly simpler than most other containers. |
| Elementary and multiplexed streams are both constructed entirely from a |
| single building block (an Ogg page) composed of eight fields |
| totalling twenty-seven bytes (the page header), a list of packet lengths |
| (up to 255 bytes), and payload data (up to 65025 bytes). The structure |
| of every page is the same. There are no optional fields or alternate |
| encodings. |
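<p>The page layout described above can be sketched in a few lines of
Python. This is an illustration rather than a reference implementation:
the names <code>HEADER</code>, <code>MAX_PAGE</code> and
<code>parse_header</code> are invented for this example, and the field
order is the one given in the framing specification.</p>

```python
import struct

# Fixed header fields, little-endian: capture pattern, version, flags,
# granule position, serial number, page sequence number, CRC, segment count.
HEADER = struct.Struct("<4sBBqIIIB")

MAX_SEGMENTS = 255                  # the length list holds at most 255 one-byte entries
MAX_PAYLOAD = MAX_SEGMENTS * 255    # each entry accounts for at most 255 payload bytes
MAX_PAGE = HEADER.size + MAX_SEGMENTS + MAX_PAYLOAD


def parse_header(buf):
    """Unpack the eight fixed header fields found at the start of a page."""
    magic, version, flags, granulepos, serial, seqno, crc, segments = \
        HEADER.unpack_from(buf)
    if magic != b"OggS":
        raise ValueError("buffer does not start at a page boundary")
    return {
        "version": version,
        "flags": flags,
        "granulepos": granulepos,
        "serial": serial,
        "sequence": seqno,
        "crc": crc,
        "segments": segments,
    }
```

<p>Under these assumptions <code>HEADER.size</code> is 27 and
<code>MAX_PAGE</code> is 65307, which is the "just under 64kB" page
limit used elsewhere in this document.</p>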
| |
| <p>Stream and media metadata is contained in Ogg and not built into |
| the Ogg container itself. Metadata is thus compartmentalized and |
| layered rather than part of a monolithic design, an especially good |
| idea as no two groups seem able to agree on what a complete or |
| complete-enough metadata set should be. In this way, the container and |
| container implementation are isolated from unnecessary design flux. |
| |
| <h3>Streaming</h3> |
| |
| <p>The Ogg container is primarily a streaming format, |
| encapsulating chronological, time-linear mixed media into a single |
| delivery stream or file. The design is such that an application can |
| always encode and/or decode all features of a bitstream in one pass |
| with no seeking and minimal buffering. Seeking to provide optimized |
| encoding (such as two-pass encoding) or interactive decoding (such as |
| scrubbing or instant replay) is neither disallowed nor discouraged; however, |
| no container feature requires nonlinear access of the bitstream. |
| |
| <h3>Variable Bit Rate, Variable Payload Size</h3> |
| |
| <p>Ogg is designed to contain any size data payload with bounded, |
| predictable efficiency. Ogg packets have no maximum size and a |
| zero-byte minimum size. There is no restriction on size changes from |
| packet to packet. Variable size packets do not require the use of any |
| optional or additional container features. There is no optimal |
| suggested packet size, though special consideration was paid to make |
| sure 50-200 byte packets were no less efficient than larger packet |
| sizes. The original design criterion was a 2% overhead at 50-byte |
| packets, dropping to a maximum working overhead of 1% with larger |
| packets, and a typical working overhead of 0.5-0.7% for most practical |
| uses. |
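<p>The overhead figures can be approximated with a little arithmetic.
The sketch below assumes pages that are packed as full as possible with
whole packets of one fixed size, which is a simplification: real
encoders usually flush pages earlier, so their overhead is somewhat
higher. The function names are invented for this example.</p>

```python
HEADER_BYTES = 27     # fixed page header
MAX_SEGMENTS = 255    # entries available in one page's length list


def lacing_entries(packet_len):
    """Length-list entries needed for one packet of packet_len bytes."""
    return packet_len // 255 + 1


def overhead(packet_len):
    """Framing bytes divided by payload bytes for a fully packed page."""
    if packet_len <= 0:
        raise ValueError("packet_len must be positive for a ratio")
    per_packet = lacing_entries(packet_len)
    packets = MAX_SEGMENTS // per_packet
    if packets == 0:
        raise ValueError("packet spans several pages; not modelled here")
    framing = HEADER_BYTES + packets * per_packet
    return framing / (packets * packet_len)
```

<p>With these assumptions, 50-byte packets cost (27 + 255) / 12750, or
about 2.2%, and 4000-byte packets cost (27 + 240) / 60000, or about
0.45%.</p>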
| |
| <h3>Simple pagination</h3> |
| |
| <p>Ogg is a byte-aligned container with no context-dependent, optional |
| or variable-length fields. Ogg requires no repacking of codec data. |
| The page structure is written out in-line as packet data is submitted |
| to the streaming abstraction. In addition, it is possible to |
| implement both Ogg mux and demux as MT-hot zero-copy abstractions (as |
| is done in the Tremor sourcebase). |
| |
| <h3>Capture</h3> |
| |
| <p>Ogg is designed for efficient and immediate stream capture with |
| high confidence. Although packets have no size limit in Ogg, pages |
| are a maximum of just under 64kB meaning that any Ogg stream can be |
| captured with confidence after seeing 128kB of data or less [worst |
| case; typical figure is 6kB] from any random starting point in the |
| stream. |
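<p>Capture can be sketched as a scan for the <code>OggS</code> capture
pattern followed by a checksum test of the candidate page. The code
below is a minimal illustration, not the libogg implementation; it
assumes the CRC parameters given in the framing specification
(polynomial 0x04c11db7, zero initial value, no bit reflection, checksum
field zeroed while summing). The function names are invented for this
example.</p>

```python
import struct


def _crc_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


CRC_TABLE = _crc_table()


def page_crc(page):
    """Checksum of a whole page whose CRC field has been set to zero."""
    crc = 0
    for byte in page:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[(crc >> 24) ^ byte]
    return crc


def capture(buf, start=0):
    """Return (offset, length) of the first verified page at or after start."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        if pos + 27 <= len(buf):
            table_end = pos + 27 + buf[pos + 26]      # byte 26: segment count
            if table_end <= len(buf):
                end = table_end + sum(buf[pos + 27:table_end])
                if end <= len(buf):
                    page = bytearray(buf[pos:end])
                    stored = struct.unpack_from("<I", page, 22)[0]
                    page[22:26] = b"\0\0\0\0"         # CRC is computed over zeros
                    if page_crc(page) == stored:
                        return pos, end - pos
        # False sync or incomplete page: keep scanning from the next byte.
        pos = buf.find(b"OggS", pos + 1)
    return None
```

<p>A false <code>OggS</code> match inside payload data fails the
checksum test and the scan simply continues, which is why capture from
a random starting point can be trusted.</p>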
| |
| <h3>Seeking</h3> |
| |
| <p>Ogg implements simple coarse- and fine-grained seeking by design. |
| |
| <p>Coarse seeking may be performed by simply 'moving the tone arm' to a |
| new position and 'dropping the needle'. Rapid capture with |
| accompanying timecode from any location in an Ogg file is guaranteed |
| by the stream design. From the acquisition of the first timecode, |
| all data needed to play back from that time code forward is ahead of |
| the stream cursor. |
| |
| <p>Ogg implements full sample-granularity seeking using an |
| interpolated bisection search built on the capture and timecode |
| mechanisms used by coarse seeking. As above, once a search finds |
| the desired timecode, all data needed to play back from that time code |
| forward is ahead of the stream cursor. |
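<p>The interpolated bisection search can be illustrated on a simulated
stream. In the sketch below a stream is a list of
<code>(byte_offset, granulepos)</code> pairs with increasing granule
positions, and <code>capture(position)</code> stands in for the capture
mechanism by returning the first page that starts at or after a byte
position. All names are invented for this example, and a production
implementation would add safeguards (for instance falling back to plain
bisection when interpolation converges slowly).</p>

```python
import bisect


def make_capture(pages):
    """Simulate capture over a list of (byte_offset, granulepos) pages."""
    offsets = [offset for offset, _ in pages]

    def capture(position):
        i = bisect.bisect_left(offsets, position)
        return pages[i] if i < len(pages) else None

    return capture


def seek(capture, length, end_granule, target):
    """Find the first page whose granule position is >= target.

    Returns (page, probes); page is None if target lies past the end.
    """
    lo, hi = 0, length            # pages starting in [lo, hi) are still unknown
    lo_g, hi_g = 0, end_granule   # granule estimates at lo and hi
    best, probes = None, 0
    while lo < hi:
        # Guess a byte position by linear interpolation on granule position.
        frac = (target - lo_g) / (hi_g - lo_g) if hi_g > lo_g else 0.5
        frac = min(max(frac, 0.0), 1.0)
        guess = min(lo + int((hi - lo) * frac), hi - 1)
        page = capture(guess)
        probes += 1
        if page is None or page[0] >= hi:
            hi = guess                          # no page starts in [guess, hi)
        elif page[1] >= target:
            best, hi, hi_g = page, guess, page[1]
        else:
            lo, lo_g = page[0] + 1, page[1]
    return best, probes
```

<p>No index is consulted at any point; each probe relies only on
capture and on the granule position stamped in the captured page.</p>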
| |
| <p>Both coarse and fine seeking use the page structure and sequencing |
| inherent to the Ogg format. All Ogg streams are fully seekable from |
| creation; seekability is unaffected by truncation or missing data, and |
| is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor |
| heuristic. |
| |
| <p>Seeking without use of an index is a major point of the Ogg |
| design. There are several reasons why Ogg forgoes an index: |
| |
| <ul> |
| |
| <li>It must be possible to create an Ogg stream in a single pass, and |
| an index requires either two passes to create, or the index must be |
| tacked onto the end of a live stream after the stream is finished. |
| Both methods run afoul of other design constraints. |
| |
| <li>An index is only marginally useful in Ogg for the complexity |
| added; it adds no new functionality and seldom improves performance |
| noticeably. Empirical testing shows that indexless interpolation |
| search does not require many more seeks in practice than using an |
| index would. |
| |
| <li>'Optional' indexes encourage lazy implementations that can seek |
| only when indexes are present, or that implement indexless seeking |
| only by building an internal index after reading the entire file |
| beginning to end. This has been the fate of other containers that |
| specify optional indexing. |
| |
| </ul> |
| |
| <h3>Simple multiplexing</h3> |
| |
| <p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a |
| multiplexed stream in time order. The multiplexed pages are not |
| altered. Muxing an Ogg AV stream out of separate audio, |
| video and data streams is akin to shuffling several decks of cards |
| together into a single deck; the cards themselves remain unchanged. |
| Demultiplexing is similarly simple (as the cards are marked). |
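<p>The card-shuffling analogy maps directly onto a merge of sorted
sequences. In this sketch a page is a plain dictionary with a
<code>serial</code> and a <code>time</code> key; both keys, and the
functions <code>mux</code> and <code>demux</code>, are assumptions made
for illustration (a real page carries a granule position that the codec
converts to a time).</p>

```python
import heapq
from collections import defaultdict


def mux(*streams):
    """Interleave whole pages from time-ordered elementary streams."""
    return list(heapq.merge(*streams, key=lambda page: page["time"]))


def demux(physical):
    """Sort pages back into logical streams using the serial number."""
    logical = defaultdict(list)
    for page in physical:
        logical[page["serial"]].append(page)
    return dict(logical)
```

<p>Because pages are passed through untouched, demultiplexing a
multiplexed stream returns exactly the elementary streams that went
in.</p>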
| |
| <p>The goal of this design is to make the mux/demux operation as |
| trivial as possible to allow live streaming systems to build and |
| rebuild streams on the fly with minimal CPU usage and no additional |
| storage or latency requirements. |
| |
| <h3>Continuous and Discontinuous Media</h3> |
| |
| <p>Ogg streams belong to one of two categories, "Continuous" streams and |
| "Discontinuous" streams. |
| |
| <p>A stream that provides a gapless, time-continuous media type with a |
| fine-grained timebase is considered to be 'Continuous'. A continuous |
| stream should never be starved of data. Examples of continuous data |
| types include broadcast audio and video. |
| |
| <p>A stream that delivers data in a potentially irregular pattern or |
| with widely spaced timing gaps is considered to be 'Discontinuous'. A |
| discontinuous stream may be best thought of as data representing |
| scattered events; although they happen in order, they are typically |
| unconnected data often located far apart. One example of a |
| discontinuous stream type would be captioning such as <a |
| href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's |
| possible to design captions as a continuous stream type, it's most |
| natural to think of captions as widely spaced pieces of text with |
| little happening between. |
| |
| <p>The fundamental reason for distinction between continuous and |
| discontinuous streams concerns buffering. |
| |
| <h3>Buffering</h3> |
| |
| <p>A continuous stream is, by definition, gapless. Ogg buffering is based |
| on the simple premise of never allowing an active continuous stream |
| to starve for data during decode; buffering works ahead until all |
| continuous streams in a physical stream have data ready and no further. |
| |
| <p>Discontinuous stream data is not assumed to be predictable. The |
| buffering design takes discontinuous data 'as it comes' rather than |
| working ahead to look for future discontinuous data for a potentially |
| unbounded period. Thus, the buffering process makes no attempt to fill |
| discontinuous stream buffers; their pages simply 'fall out' of the |
| stream when continuous streams are handled properly. |
| |
| <p>Buffering requirements in this design need not be explicitly |
| declared or managed in the encoded stream. The decoder simply reads as |
| much data as is necessary to keep all continuous stream types gapless |
| and no more, with discontinuous data processed as it arrives in the |
| continuous data. Buffering is implicitly optimal for the given |
| stream. Because all pages of all data types are stamped with absolute |
| timing information within the stream, inter-stream synchronization |
| timing is always maintained without the need for explicitly declared |
| buffer-ahead hinting. |
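<p>The buffering rule can be sketched as a loop that pulls pages only
until every continuous stream has something buffered. The page
representation (a dictionary with a <code>serial</code> key) and the
function <code>fill</code> are assumptions made for this example.</p>

```python
from collections import defaultdict, deque


def fill(pages, buffers, continuous):
    """Read pages until every continuous stream has data, and no further.

    pages      -- iterator over pages of the physical stream
    buffers    -- defaultdict(deque) keyed by stream serial number
    continuous -- serial numbers of the continuous streams
    Returns False if the stream ends before all continuous buffers fill.
    """
    while any(not buffers[serial] for serial in continuous):
        page = next(pages, None)
        if page is None:
            return False
        # Discontinuous pages are queued as they arrive; nothing looks
        # ahead on their behalf.
        buffers[page["serial"]].append(page)
    return True
```

<p>Here the caption page that happens to sit between audio and video
pages is buffered on the way past, but the loop never reads further in
search of the next caption.</p>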
| |
| <h3>Codec metadata</h3> |
| |
| <p>Ogg does not replicate codec-specific metadata into the mux layer |
| in an attempt to make the mux and codec layer implementations 'fully |
| separable'. Things like specific timebase, keyframing strategy, frame |
| duration, etc, do not appear in the Ogg container. The mux layer is, |
| instead, expected to query a codec through a standardized interface, |
| left to the implementation, for this data when it is needed. |
| |
| <p>Though modern design wisdom usually prefers to predict all possible |
| needs of current and future codecs then embed these dependencies and |
| the required metadata into the container itself, this strategy |
| increases container specification complexity, fragility, and rigidity. |
| The mux and codec implementations become more independent, but the |
| specifications become less independent. A codec can't do what a |
| container hasn't already provided for. New codecs are harder to |
| support, and you can do fewer useful things with the ones you've |
| already got (eg, try to make a good splitter without using any codecs. |
| You're stuck splitting at keyframes only, or building yet another new |
| mechanism into the container layer to mark what frames to skip |
| displaying). |
| |
| <p>Ogg's design goes the opposite direction, where the specification |
| is to be as simple, easy to understand, and 'proofed' against novel |
| codecs as possible. When an Ogg mux layer requires codec-specific |
| information, it queries the codec (or a codec stub). This trades a |
| more complex implementation for a simpler, more flexible |
| specification. |
| |
| <h3>Stream structure metadata</h3> |
| |
| <p>The Ogg container itself does not define a metadata system for |
| declaring the structure and interrelations between multiple media |
| types in a muxed stream. That is, the Ogg container itself does not |
| specify data like 'which stream is the subtitle stream?' or 'which |
| video stream is the primary angle?'. This metadata still exists, but |
| is stored in the Ogg container rather than being built into the Ogg |
| container. Xiph specifies the 'Skeleton' metadata format for Ogg |
| streams, but this decoupling of container and stream structure |
| metadata means it is possible to use Ogg with any metadata |
| specification without altering the container itself, or without stream |
| structure metadata at all. |
| |
| <h3>Frame accurate absolute position</h3> |
| |
| <p>Every Ogg page is stamped with a 64-bit 'granule position' that |
| serves as an absolute timestamp for mux and seeking. A few nifty |
| little tricks are usually also embedded in the granpos state, but |
| we'll leave those aside for the moment (strictly speaking, they're |
| part of each codec's mapping, not Ogg). |
| |
| <p>As mentioned above, granule positions are mapped into |
| absolute timestamps by the codec, rather than being a hard timestamp. |
| This allows maximally efficient use of the available 64 bits to |
| address every sample/frame position without approximation while |
| supporting new and previously unknown timebase encodings without |
| needing to extend or update the mux layer. When a codec needs a novel |
| timebase, it simply brings the code for that mapping along with it. |
| This is not a theoretical curiosity; new, wholly novel timebases were |
| deployed with the adoption of both Theora and Dirac. "Rolling INTRA" |
| (keyframeless video) also benefits from novel use of the granule |
| position. |
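<p>Two simplified granule position mappings illustrate the idea. The
first treats the granule position as a running sample count, in the
style of an audio codec; the second splits the 64 bits into a keyframe
index and an offset from that keyframe, in the style of a video codec.
Both are sketches with invented function names; the exact conventions
of any real codec mapping (including off-by-one details) are defined by
that codec's specification and are not modelled here.</p>

```python
from fractions import Fraction


def sample_count_time(granulepos, sample_rate):
    """Granule position is a sample count; time is count / rate."""
    return Fraction(granulepos, sample_rate)


def split_granule(granulepos, shift):
    """Split into (keyframe index, frames since that keyframe)."""
    keyframe = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return keyframe, offset


def keyframe_offset_time(granulepos, shift, fps_num, fps_den):
    """Time of a frame addressed as keyframe index plus offset."""
    keyframe, offset = split_granule(granulepos, shift)
    return Fraction((keyframe + offset) * fps_den, fps_num)
```

<p>The mux layer never needs to know which of these mappings is in use;
it asks the codec to convert a granule position to a time and works
with the result.</p>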
| |
| <h2>Ogg stream arrangement</h2> |
| |
| <h3>Packets, pages, and bitstreams</h3> |
| |
| <p>Ogg codecs use <em>packets</em>. Packets are octet payloads of |
| raw, compressed data, containing the data needed for a single |
| decompressed unit, eg, one video frame. Packets have no maximum size |
| and may be zero length. They do not have any high-level structure or |
| boundary information; strung together, the unframed packets form a |
| <em>logical bitstream</em> of apparently random bytes with no internal |
| landmarks. |
| |
| <p>Logical bitstream packets are grouped and framed into Ogg pages |
| along with a unique stream <em>serial number</em> to produce a |
| <em>physical bitstream</em>. An <em>elementary stream</em> is a |
| physical bitstream containing only the pages framing a single logical |
| bitstream. Each page is a self-contained entity, although a packet may |
| be split and encoded across one or more pages. The page decode |
| mechanism is designed to recognize, verify and handle single pages at |
| a time from the overall bitstream. |
| |
| <p><a href="framing.html">Ogg Bitstream Framing</a> specifies |
| the page format of an Ogg bitstream, the packet coding process |
| and elementary bitstreams in detail. |
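<p>The packet length encoding used in each page can be sketched as
follows: a packet is recorded as a run of 255-valued entries followed
by one entry below 255, so a zero-length packet is a single zero entry
and a packet that is an exact multiple of 255 bytes ends with a zero
entry. The function names are invented for this example.</p>

```python
def lacing_values(packet_len):
    """Length-list entries that encode one complete packet."""
    return [255] * (packet_len // 255) + [packet_len % 255]


def packet_lengths(segment_table):
    """Recover packet lengths from one page's length list.

    Returns (lengths, pending): lengths of the packets that finish on
    this page, and the byte count of a trailing packet that continues
    on the next page (0 if the last packet is complete).
    """
    lengths, running = [], 0
    for value in segment_table:
        running += value
        if value < 255:           # an entry below 255 terminates a packet
            lengths.append(running)
            running = 0
    return lengths, running
```

<p>A length list that ends in 255 therefore signals that the final
packet is continued on the following page.</p>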
| |
| <h3>Multiplexed bitstreams</h3> |
| |
| <p>Multiple logical/elementary bitstreams can be combined into a single |
| <em>multiplexed bitstream</em> by interleaving whole pages from each |
| contributing elementary stream in time order. The result is a single |
| physical stream that multiplexes and frames multiple logical streams. |
| Each logical stream is identified by the unique stream serial number |
| stamped in its pages. A physical stream may include a 'meta-header' |
| (such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its |
| own Ogg page at the beginning of the physical stream. A decoder |
| recovers the original logical/elementary bitstreams out of the |
| physical bitstream by taking the pages in order from the physical |
| bitstream and redirecting them into the appropriate logical decoding |
| entity. |
| |
| <p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies |
| proper multiplexing of an Ogg bitstream in detail. |
| |
| <h3>Chaining</h3> |
| |
| <p>Multiple Ogg physical bitstreams may be concatenated into a single new |
| stream; this is <em>chaining</em>. The bitstreams do not overlap; the |
| final page of a given logical bitstream is immediately followed by the |
| initial page of the next.</p> |
| |
| <p>Each logical bitstream in a chain must have a unique serial number |
| within the scope of the full physical bitstream, not only within a |
| particular <em>link</em> or <em>segment</em> of the chain.</p> |
| |
| <h3>Continuous and discontinuous streams</h3> |
| |
| <p>Within Ogg, each stream must be declared (by the codec) to be |
| continuous- or discontinuous-time. Most codecs treat all streams they |
| use as either inherently continuous- or discontinuous-time, although |
| this is not a requirement. A codec may, as part of its mapping, choose |
| according to data in the initial header. |
| |
| <p>Continuous-time pages are stamped by end-time; discontinuous pages |
| are stamped by begin-time. Pages in a multiplexed stream are |
| interleaved in order of the time stamp regardless of stream type. |
| Both continuous and discontinuous logical streams are used to seek |
| within a physical stream; however, only continuous streams are used to |
| determine buffering depth. Because discontinuous streams are stamped |
| by start time, they will always 'fall out' in time when buffering |
| tracks only the continuous streams. See 'Examples' for an |
| illustration of the buffering mechanism. |
| |
| <h2>Mapping Requirements</h2> |
| |
| <p>Each codec is allowed some freedom in deciding how its logical |
| bitstream is encapsulated into an Ogg bitstream (even if it is a |
| trivial mapping, eg, 'plop the packets in and go'). This is the |
| codec's <em>mapping</em>. Ogg imposes a few mapping requirements |
| on any codec. |
| |
| <p>The <a href="framing.html">framing specification</a> defines |
| 'beginning of stream' and 'end of stream' page markers via a header |
| flag (it is possible for a stream to consist of a single page). A |
| correct stream always consists of an integer number of pages, an easy |
| requirement given the variable size nature of pages.</p> |
| |
| <p>The first page of an elementary Ogg bitstream consists of a single, |
| small 'initial header' packet that must include sufficient information |
| to identify the exact CODEC type. From this initial header, the codec |
| must also be able to determine its timebase and whether or not it is a |
| continuous- or discontinuous-time stream. The initial header must fit |
| on a single page. If a codec makes use of auxiliary headers (for |
| example, Vorbis uses two auxiliary headers), these headers must follow |
| the initial header immediately. The last header finishes its page; |
| data begins on a fresh page. |
| |
| <p>As an example, Ogg Vorbis places the name and revision of the |
| Vorbis codec, the audio rate and the audio quality into this initial |
| header. Comments and detailed codec setup appear in the larger |
| auxiliary headers.</p> |
| |
| <h2>Multiplexing Requirements</h2> |
| |
| <p>Multiplexing requirements within Ogg are straightforward. When |
| constructing a single-link (unchained) physical bitstream consisting |
| of multiple elementary streams: |
| |
| <ol> |
| |
| <li> The initial header for each stream appears in sequence, each |
| header on a single page. All initial headers must appear with no |
| intervening data (no auxiliary header pages or packets, no data pages |
| or packets). Order of the initial headers is unspecified. The |
| 'beginning of stream' flag is set on each initial header. |
| |
| <li> All auxiliary headers for all streams must follow. Order |
| is unspecified. The final auxiliary header of each stream must flush |
| its page. |
| |
| <li>Data pages for each stream follow, interleaved in time order. |
| |
| <li>The final page of each stream sets the 'end of stream' flag. |
| Unlike initial pages, terminal pages for the logical bitstreams need |
| not occur contiguously; indeed it may not be possible for them to do so. |
| </ol> |
| |
| <p>Each grouped bitstream must have a unique serial number within the |
| scope of the physical bitstream.</p> |
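<p>The ordering rules above can be expressed as a small checker for a
single link. The sketch assumes each page has already been labelled as
<code>bos</code> (initial header), <code>header</code> (auxiliary
header), <code>data</code> or <code>eos</code>; only the first and last
labels correspond to page header flags, while telling auxiliary headers
from data requires the codec. A stream consisting of a single page is
not modelled. The function name <code>check_link</code> is invented for
this example.</p>

```python
PHASE = {"bos": 0, "header": 1, "data": 2, "eos": 2}


def check_link(pages):
    """Check page ordering for one link given (serial, kind) pairs."""
    phase, seen, ended = 0, set(), set()
    for serial, kind in pages:
        if PHASE[kind] < phase:
            return False          # e.g. an initial header after data began
        phase = PHASE[kind]
        if kind == "bos":
            if serial in seen:
                return False      # serial numbers must be unique
            seen.add(serial)
        elif serial not in seen or serial in ended:
            return False          # page for an unknown or finished stream
        if kind == "eos":
            ended.add(serial)
    return seen == ended          # every stream must terminate
```

<p>Note that the checker lets terminal pages appear at different points
in the data section, matching the rule that they need not be
contiguous.</p>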
| |
| <h3>Chaining and multiplexing</h3> |
| |
| <p>Multiplexed and/or unmultiplexed bitstreams may be chained |
| consecutively. Such a physical bitstream obeys all the rules of both |
| chained and multiplexed streams. Each link, when unchained, must |
| stand on its own as a valid physical bitstream. Chained streams do |
| not mix; a new segment may not begin until all streams in the |
| preceding segment have terminated. </p> |
| |
| <h2>Examples</h2> |
| |
| <em>[More to come shortly; this section is currently being revised and expanded]</em> |
| |
| <p>Below, we present an example of a multiplexed and chained bitstream:</p> |
| |
| <p><img src="stream.png" alt="stream"/></p> |
| |
| <p>In this example, we see pages from five total logical bitstreams |
| multiplexed into a physical bitstream. Note the following |
| characteristics:</p> |
| |
| <ol> |
| <li>Multiplexed bitstreams in a given link begin together; all of the |
| initial pages must appear before any data pages. When concurrently |
| multiplexed groups are chained, the new group does not begin until all |
| the bitstreams in the previous group have terminated.</li> |
| |
| <li>The ordering of pages of concurrently multiplexed bitstreams is |
| governed by timestamp (not shown here); there is no regular |
| interleaving order. Pages within a logical bitstream appear in |
| sequence order.</li> |
| </ol> |
| |
| <div id="copyright"> |
| The Xiph Fish Logo is a |
| trademark (™) of Xiph.Org.<br/> |
| |
| These pages © 1994 - 2010 Xiph.Org. All rights reserved. |
| </div> |
| |
| </body> |
| </html> |