| % -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*- |
| %!TEX root = Vorbis_I_spec.tex |
| % $Id$ |
| \section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg} |
| |
| \subsection{Overview} |
| |
| This document describes using Ogg logical and physical transport |
| streams to encapsulate Vorbis compressed audio packet data into file |
| form. |
| |
| The \xref{vorbis:spec:intro} provides an overview of the construction |
| of Vorbis audio packets. |
| |
| The \href{oggstream.html}{Ogg |
| bitstream overview} and \href{framing.html}{Ogg logical |
| bitstream and framing spec} provide detailed descriptions of Ogg |
| transport streams. This specification document assumes a working |
| knowledge of the concepts covered in these named backround |
| documents. Please read them first. |
| |
| \subsubsection{Restrictions} |
| |
| The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis |
| streams use Ogg transport streams in degenerate, unmultiplexed |
| form only. That is: |
| |
| \begin{itemize} |
| \item |
| A meta-headerless Ogg file encapsulates the Vorbis I packets |
| |
| \item |
| The Ogg stream may be chained, i.e., contain multiple, contigous logical streams (links). |
| |
| \item |
| The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link) |
| |
| \end{itemize} |
| |
| |
| This is not to say that it is not currently possible to multiplex |
| Vorbis with other media types into a multi-stream Ogg file. At the |
| time this document was written, Ogg was becoming a popular container |
| for low-bitrate movies consisting of DivX video and Vorbis audio. |
| However, a 'Vorbis I audio file' is taken to imply Vorbis audio |
| existing alone within a degenerate Ogg stream. A compliant 'Vorbis |
| audio player' is not required to implement Ogg support beyond the |
| specific support of Vorbis within a degenrate Ogg stream (naturally, |
| application authors are encouraged to support full multiplexed Ogg |
| handling). |
| |
| |
| |
| |
| \subsubsection{MIME type} |
| |
| The MIME type of Ogg files depend on the context. Specifically, complex |
| multimedia and applications should use \literal{application/ogg}, |
| while visual media should use \literal{video/ogg}, and audio |
| \literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear |
| in any of those types. RTP encapsulated Vorbis should use |
| \literal{audio/vorbis} + \literal{audio/vorbis-config}. |
| |
| |
| \subsection{Encapsulation} |
| |
| Ogg encapsulation of a Vorbis packet stream is straightforward. |
| |
| \begin{itemize} |
| |
| \item |
| The first Vorbis packet (the identification header), which |
| uniquely identifies a stream as Vorbis audio, is placed alone in the |
| first page of the logical Ogg stream. This results in a first Ogg |
| page of exactly 58 bytes at the very beginning of the logical stream. |
| |
| |
| \item |
| This first page is marked 'beginning of stream' in the page flags. |
| |
| |
| \item |
| The second and third vorbis packets (comment and setup |
| headers) may span one or more pages beginning on the second page of |
| the logical stream. However many pages they span, the third header |
| packet finishes the page on which it ends. The next (first audio) packet |
| must begin on a fresh page. |
| |
| |
| \item |
| The granule position of these first pages containing only headers is zero. |
| |
| |
| \item |
| The first audio packet of the logical stream begins a fresh Ogg page. |
| |
| |
| \item |
| Packets are placed into ogg pages in order until the end of stream. |
| |
| |
| \item |
| The last page is marked 'end of stream' in the page flags. |
| |
| |
| \item |
| Vorbis packets may span page boundaries. |
| |
| |
| \item |
| The granule position of pages containing Vorbis audio is in units |
| of PCM audio samples (per channel; a stereo stream's granule position |
| does not increment at twice the speed of a mono stream). |
| |
| |
| \item |
| The granule position of a page represents the end PCM sample |
| position of the last packet \emph{completed} on that |
| page. The 'last PCM sample' is the last complete sample returned by |
| decode, not an internal sample awaiting lapping with a |
| subsequent block. A page that is entirely spanned by a single |
| packet (that completes on a subsequent page) has no granule |
| position, and the granule position is set to '-1'. |
| |
| |
| Note that the last decoded (fully lapped) PCM sample from a packet |
| is not necessarily the middle sample from that block. If, eg, the |
| current Vorbis packet encodes a "long block" and the next Vorbis |
| packet encodes a "short block", the last decodable sample from the |
| current packet be at position (3*long\_block\_length/4) - |
| (short\_block\_length/4). |
| |
| |
| \item |
| The granule (PCM) position of the first page need not indicate |
| that the stream started at position zero. Although the granule |
| position belongs to the last completed packet on the page and a |
| valid granule position must be positive, by |
| inference it may indicate that the PCM position of the beginning |
| of audio is positive or negative. |
| |
| |
| \begin{itemize} |
| \item |
| A positive starting value simply indicates that this stream begins at |
| some positive time offset, potentially within a larger |
| program. This is a common case when connecting to the middle |
| of broadcast stream. |
| |
| \item |
| A negative value indicates that |
| output samples preceeding time zero should be discarded during |
| decoding; this technique is used to allow sample-granularity |
| editing of the stream start time of already-encoded Vorbis |
| streams. The number of samples to be discarded must not exceed |
| the overlap-add span of the first two audio packets. |
| |
| \end{itemize} |
| |
| |
| In both of these cases in which the initial audio PCM starting |
| offset is nonzero, the second finished audio packet must flush the |
| page on which it appears and the third packet begin a fresh page. |
| This allows the decoder to always be able to perform PCM position |
| adjustments before needing to return any PCM data from synthesis, |
| resulting in correct positioning information without any aditional |
| seeking logic. |
| |
| |
| \begin{note} |
| Failure to do so should, at worst, cause a |
| decoder implementation to return incorrect positioning information |
| for seeking operations at the very beginning of the stream. |
| \end{note} |
| |
| |
| \item |
| A granule position on the final page in a stream that indicates |
| less audio data than the final packet would normally return is used to |
| end the stream on other than even frame boundaries. The difference |
| between the actual available data returned and the declared amount |
| indicates how many trailing samples to discard from the decoding |
| process. |
| |
| \end{itemize} |