Commit 46138b1b authored by Tim-Philipp Müller

docs: design: move most design docs to gst-docs module

parent 49653b05
@@ -2,16 +2,5 @@ SUBDIRS =
 EXTRA_DIST = \
-	design-audiosinks.txt \
-	design-decodebin.txt \
-	design-encoding.txt \
-	design-orc-integration.txt \
-	draft-hw-acceleration.txt \
-	draft-keyframe-force.txt \
-	draft-subtitle-overlays.txt \
-	draft-va.txt \
-	part-interlaced-video.txt \
-	part-mediatype-audio-raw.txt \
-	part-mediatype-text-raw.txt \
-	part-mediatype-video-raw.txt \
-	part-playbin.txt
+	draft-va.txt
Audiosink design
----------------
Requirements:
- must operate chain based.
Most simple playback pipelines will push audio from the decoders
into the audio sink.
- must operate getrange based.
Most professional audio applications will operate in a mode where
the audio sink pulls samples from the pipeline. This is typically
done in a callback from the audiosink requesting N samples. The
callback is either scheduled from a thread or from an interrupt
from the audio hardware device.
- Exact sample accurate clocks.
The audiosink must be able to provide a clock that is sample
accurate even if samples are dropped or when discontinuities are
found in the stream.
- Exact timing of playback.
The audiosink must be able to play samples at their exact times.
- use DMA access when possible.
When the hardware can do DMA we should use it. This should also
work over bufferpools to avoid data copying to/from kernel space.
Design:
The design is based on a set of base classes and the concept of a
ringbuffer of samples.
+-----------+ - provide preroll, rendering, timing
+ basesink + - caps nego
+-----+-----+
|
+-----V----------+ - manages ringbuffer
+ audiobasesink + - manages scheduling (push/pull)
+-----+----------+ - manages clock/query/seek
| - manages scheduling of samples in the ringbuffer
| - manages caps parsing
|
+-----V------+ - default ringbuffer implementation with a GThread
+ audiosink + - subclasses provide open/read/close methods
+------------+
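To illustrate this split, below is a minimal sketch of an audiosink
subclass; the MySink GObject boilerplate is omitted and the vfunc names
follow the GstAudioSink class of GStreamer 1.x (for a sink, the "read"
slot of the diagram becomes a write method):

  #include <gst/audio/audio.h>

  /* MySink/MySinkClass are hypothetical; boilerplate omitted. */

  static gboolean
  my_sink_open (GstAudioSink * sink)
  {
    /* Open the audio device (e.g. an ALSA or OSS handle). */
    return TRUE;
  }

  static gint
  my_sink_write (GstAudioSink * sink, gpointer data, guint length)
  {
    /* Write one segment worth of samples to the device; the base
     * class calls this from the ringbuffer thread. */
    return length;
  }

  static gboolean
  my_sink_close (GstAudioSink * sink)
  {
    /* Release the device handle. */
    return TRUE;
  }

  static void
  my_sink_class_init (MySinkClass * klass)
  {
    GstAudioSinkClass *audio_class = GST_AUDIO_SINK_CLASS (klass);

    audio_class->open = my_sink_open;
    audio_class->write = my_sink_write;
    audio_class->close = my_sink_close;
  }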
The ringbuffer is a contiguous piece of memory divided into segtotal
segments of segsize bytes each.
play position
v
+---+---+---+-------------------------------------+----------+
| 0 | 1 | 2 | .... | segtotal |
+---+---+---+-------------------------------------+----------+
<--->
segsize bytes = N samples * bytes_per_sample.
The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.
The ringbuffer can be put to the PLAYING or STOPPED state.
In the STOPPED state no samples are played to the device and the play
pointer does not advance.
In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.
A write operation to the ringbuffer will put new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation will
block. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.
The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side so that low-latency operations are possible.
Whenever new samples are to be put into the ringbuffer, the position of the
play (read) pointer is sampled and the required write position is computed;
the difference between the two is then taken. If the difference is < 0,
the samples are too late. If the difference is bigger than segtotal, the
writing part has to wait for the play pointer to advance.
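In pseudo-C, the write-side check described above could look as follows
(the names are illustrative, not the actual implementation):

  gint diff = want_seg - play_seg;  /* want_seg: segment the new samples
                                     * map to; play_seg: segment currently
                                     * being played, read atomically */
  if (diff < 0) {
    /* too late: clip or drop the samples */
  } else if (diff >= segtotal) {
    /* no room yet: wait for the play pointer to advance */
  } else {
    /* copy the samples into segment (want_seg % segtotal) */
  }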
Scheduling:
- chain based mode:
In chain based mode, bytes are written into the ringbuffer. This operation
will eventually block when the ringbuffer is filled.
When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
time. This means that dropping samples or inserting silence is done
automatically, very accurately, and independently of the play pointer.
In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency between
audio entering the sink and it being played can be kept low, but at least
one context switch has to be made between read and write (see the worked
example after this list).
- getrange based mode
In getrange based mode, the audiobasesink will use the callback function
of the ringbuffer to pull a segment's worth of samples (segsize bytes)
from the peer element. These samples will then be placed in the ringbuffer
at the next play position.
It is assumed that the getrange function returns fast enough to fill the
ringbuffer before the play pointer reaches the write pointer.
In this mode, the ringbuffer is usually kept as empty as possible. There
is no context switch needed between the elements that create the samples
and the actual writing of the samples to the device.
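To make the latency trade-off concrete, here is a worked example with
assumed values (44100 Hz, 2 channels, 2 bytes per channel, segsize of
4096 bytes, segtotal of 8):

  int bytes_per_sample = 2 * 2;                   /* 4 bytes per frame   */
  int samples_per_seg  = 4096 / bytes_per_sample; /* 1024 frames         */
  int seg_ms   = samples_per_seg * 1000 / 44100;  /* ~23 ms per segment  */
  int total_ms = 8 * seg_ms;                      /* ~184 ms of buffering */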
DMA mode:
- Elements that can do DMA based access to the audio device have to subclass
from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
of GstRingBuffer.
The ringbuffer subclass should trigger a callback after writing or playing
each segment to the device. This callback can be triggered from a thread or
from a signal from the audio device.
Clocks:
The GstAudioBaseSink class will use the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
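As a sketch (the variable names are illustrative), the clock time could be
derived from the number of samples the device has consumed, corrected by
the device delay:

  guint64 samples_done = segments_played * samples_per_seg - device_delay;
  GstClockTime time =
      gst_util_uint64_scale_int (samples_done, GST_SECOND, rate);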
Decodebin design
----------------
GstDecodeBin
------------
Description:
Autoplug and decode to raw media
Input : single pad with ANY caps
Output : Dynamic pads
* Contents
_ a GstTypeFindElement connected to the single sink pad
_ optionally a demuxer/parser
_ optionally one or more DecodeGroup
* Autoplugging
The goal is to reach 'target' caps (by default raw media).
This is done by using the GstCaps of a source pad and finding the available
demuxers/decoders GstElement that can be linked to that pad.
The process starts with the source pad of typefind and stops when no more
non-target caps are left. It is commonly done while pre-rolling, but can also
happen whenever a new pad appears on any element.
Once a target caps has been found, that pad is ghosted and the
'pad-added' signal is emitted.
If no compatible elements can be found for a GstCaps, the pad is ghosted and
the 'unknown-type' signal is emitted.
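An application would typically connect to 'pad-added' to link the new
ghost pad; a minimal sketch (assuming a suitable sink element, error
handling omitted):

  static void
  on_pad_added (GstElement * decodebin, GstPad * pad, gpointer user_data)
  {
    GstPad *sinkpad =
        gst_element_get_static_pad (GST_ELEMENT (user_data), "sink");

    if (!gst_pad_is_linked (sinkpad))
      gst_pad_link (pad, sinkpad);
    gst_object_unref (sinkpad);
  }

  g_signal_connect (decodebin, "pad-added",
      G_CALLBACK (on_pad_added), sink);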
* Assisted auto-plugging
When starting the auto-plugging process for a given GstCaps, several signals
are emitted in the following way in order to allow the application/user to
assist or fine-tune the process.
- 'autoplug-continue' :
gboolean user_function (GstElement * decodebin, GstPad *pad, GstCaps * caps)
This signal is fired at the very beginning with the source pad GstCaps. If
the callback returns TRUE, the process continues normally. If the callback
returns FALSE, then the GstCaps are considered as a target caps and the
autoplugging process stops.
- 'autoplug-factories' :
GValueArray user_function (GstElement* decodebin, GstPad* pad,
GstCaps* caps);
Get a list of elementfactories for @pad with @caps. This function is used to
instruct decodebin2 of the elements it should try to autoplug. The default
behaviour when this function is not overridden is to get all elements that
can handle @caps from the registry, sorted by rank.
- 'autoplug-select' :
gint user_function (GstElement* decodebin, GstPad* pad, GstCaps* caps,
GValueArray* factories);
This signal is fired once autoplugging has obtained a list of compatible
GstElementFactory. The signal is emitted with the GstCaps of the source pad
and a pointer to the GValueArray of compatible factories.
The callback should return the index of the elementfactory in @factories
that should be tried next.
If the callback returns -1, the autoplugging process will stop as if no
compatible factories were found.
The default implementation of this function will try to autoplug the first
factory in the list.
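A minimal sketch of hooking up these signals, following the signatures
described above (the policies in the callbacks are illustrative only):

  static gboolean
  on_autoplug_continue (GstElement * dbin, GstPad * pad, GstCaps * caps,
      gpointer user_data)
  {
    /* return FALSE to treat @caps as target caps and stop here */
    return TRUE;
  }

  static gint
  on_autoplug_select (GstElement * dbin, GstPad * pad, GstCaps * caps,
      GValueArray * factories, gpointer user_data)
  {
    /* try the first (highest-ranked) factory; -1 would abort */
    return 0;
  }

  g_signal_connect (dbin, "autoplug-continue",
      G_CALLBACK (on_autoplug_continue), NULL);
  g_signal_connect (dbin, "autoplug-select",
      G_CALLBACK (on_autoplug_select), NULL);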
* Target Caps
The target caps are a read/write GObject property of decodebin.
By default the target caps are:
_ Raw audio : audio/x-raw
_ and raw video : video/x-raw
_ and Text : text/plain, text/x-pango-markup
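Overriding the defaults could then look like this (a sketch; it assumes
the property is exposed under the name "caps"):

  GstCaps *target = gst_caps_from_string ("audio/x-raw; video/x-raw");

  g_object_set (decodebin, "caps", target, NULL);
  gst_caps_unref (target);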
* media chain/group handling
When autoplugging, all streams coming out of a demuxer will be grouped in a
DecodeGroup.
All new source pads created on that demuxer after it has emitted the
'no-more-pads' signal will be put in another DecodeGroup.
Only one decodegroup can be active at any given time. If a new decodegroup is
created while another one exists, that decodegroup will be set as blocking until
the existing one has drained.
DecodeGroup
-----------
Description:
Streams belonging to the same group/chain of a media file.
* Contents
The DecodeGroup contains:
_ a GstMultiQueue to which all streams of the media group are connected.
_ the eventual decoders which are autoplugged in order to produce the
requested target pads.
* Proper group draining
The DecodeGroup takes care that all the streams in the group are completely
drained (EOS has come through all source ghost pads).
* Pre-roll and block
The DecodeGroup has a global blocking feature. If enabled, all the ghosted
source pads for that group will be blocked.
A method is available to unblock all blocked pads for that group.
GstMultiQueue
-------------
Description:
Multiple input-output data queue
The GstMultiQueue achieves the same functionality as GstQueue, with a few
differences:
* Multiple streams handling.
The element handles queueing data on more than one stream at once. To
achieve such a feature it has request sink pads (sink_%u) and 'sometimes' src
pads (src_%u).
When requesting a given sinkpad, the associated srcpad for that stream will
be created. Ex: requesting sink_1 will generate src_1 (see the sketch at
the end of this section).
* Non-starvation on multiple streams.
If more than one stream is used with the element, the streams' queues will
be dynamically grown (up to a limit), in order to ensure that no stream is
risking data starvation. This guarantees that at any given time there are at
least N bytes queued and available for each individual stream.
If an EOS event comes through a srcpad, the associated queue should be
considered as 'not-empty' in the queue-size-growing algorithm.
* Non-linked srcpads graceful handling.
A GstTask is started for each srcpad when going to GST_STATE_PAUSED.
The tasks block on a GCond which will be signalled in two
different cases:
_ When the associated queue has received a buffer.
_ When the associated queue was previously declared as 'not-linked' and the
first buffer of the queue is scheduled to be pushed synchronously in
relation to the order in which it arrived globally in the element (see
'Synchronous data pushing' below).
When woken up by the GCondition, the GstTask will try to push the next
GstBuffer/GstEvent on the queue. If pushing the GstBuffer/GstEvent returns
GST_FLOW_NOT_LINKED, then the associated queue is marked as 'not-linked'. If
pushing the GstBuffer/GstEvent succeeded the queue will no longer be marked as
'not-linked'.
If pushing on all srcpads returns a GstFlowReturn other than GST_FLOW_OK,
then all the srcpads' tasks are stopped and subsequent pushes on sinkpads
will return GST_FLOW_NOT_LINKED.
* Synchronous data pushing for non-linked pads.
In order to better support dynamic switching between streams, the multiqueue
(unlike the current GStreamer queue) continues to push buffers on non-linked
pads rather than shutting down.
In addition, to prevent a non-linked stream from very quickly consuming all
available buffers and thus 'racing ahead' of the other streams, the element
must ensure that buffers and inlined events for a non-linked stream are pushed
in the same order as they were received, relative to the other streams
controlled by the element. This means that a buffer cannot be pushed to a
non-linked pad any sooner than buffers in any other stream which were received
before it.
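As referenced above, a minimal sketch of the pad-request behaviour (error
handling omitted; gst_element_get_request_pad is the generic request-pad
API):

  GstElement *mq = gst_element_factory_make ("multiqueue", NULL);
  GstPad *sinkpad, *srcpad;

  sinkpad = gst_element_get_request_pad (mq, "sink_%u");  /* e.g. sink_0 */
  /* the matching 'sometimes' source pad appears immediately */
  srcpad = gst_element_get_static_pad (mq, "src_0");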
=====================================
Parsers, decoders and auto-plugging
=====================================
This section has DRAFT status.
Some media formats come in different "flavours" or "stream formats". These
formats differ in the way the setup data and media data is signalled and/or
packaged. An example for this is H.264 video, where there is a bytestream
format (with codec setup data signalled inline and units prefixed by a sync
code and packet length information) and a "raw" format where codec setup
data is signalled out of band (via the caps) and the chunking is implicit
in the way the buffers were muxed into a container, to mention just two of
the possible variants.
Especially on embedded platforms it is common that decoders can only
handle one particular stream format, and not all of them.
Where there are multiple stream formats, parsers are usually expected
to be able to convert between the different formats. This will, if
implemented correctly, work as expected in a static pipeline such as
... ! parser ! decoder ! sink
where the parser can query the decoder's capabilities even before
processing the first piece of data, and configure itself to convert
accordingly, if conversion is needed at all.
In an auto-plugging context this is not so straightforward though,
because elements are plugged incrementally, and not before the previous
element has processed some data and decided what exactly it will output
(unless the template caps are completely fixed, in which case auto-plugging
can continue right away; this is not always the case here though, see
below). A parser will thus have to decide on *some* output format so
auto-plugging can continue. It doesn't know anything about the available
decoders and their capabilities though, so it's possible that it will
choose a format that is not supported by any of the available decoders,
or by the preferred decoder.
If the parser had sufficiently concise but fixed source pad template caps,
decodebin could continue to plug a decoder right away, allowing the
parser to configure itself in the same way as it would with a static
pipeline. This is not an option, unfortunately, because often the
parser needs to process some data to determine e.g. the format's profile or
other stream properties (resolution, sample rate, channel configuration, etc.),
and there may be different decoders for different profiles (e.g. DSP codec
for baseline profile, and software fallback for main/high profile; or a DSP
codec only supporting certain resolutions, with a software fallback for
unusual resolutions). So if decodebin just plugged the highest-ranking
decoder, that decoder might not be able to handle the actual stream later
on, which would yield an error (this is a data flow error then, which would
be hard to intercept and avoid in decodebin). In other words, we can't solve
this issue by plugging a decoder right away with the parser.
So decodebin needs to communicate to the parser the set of available decoder
caps (which would contain the relevant capabilities/restrictions such as
supported profiles, resolutions, etc.), after the usual "autoplug-*" signal
filtering/sorting of course.
This is done by plugging a capsfilter element right after the parser, and
constructing a set of filter caps from the list of available decoders (one
appends at the end just the name(s) of the caps structures from the parser
pad template caps to function as an 'ANY other' caps equivalent). This lets
the parser negotiate to a supported stream format in the same way as with
the static pipeline mentioned above, but of course incurs some overhead
through the additional capsfilter element.
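A sketch of that filter-caps construction (decoder_caps and the
"video/x-h264" structure name are placeholders for what would be gathered
from the decoder factories and the parser pad template):

  GstCaps *filter_caps = gst_caps_new_empty ();
  GstElement *capsfilter;

  /* merge the sink caps of every acceptable decoder factory ... */
  gst_caps_append (filter_caps, gst_caps_copy (decoder_caps));
  /* ... then append the bare structure name(s) from the parser's source
   * pad template caps as the 'ANY other' fallback */
  gst_caps_append (filter_caps, gst_caps_new_empty_simple ("video/x-h264"));

  capsfilter = gst_element_factory_make ("capsfilter", NULL);
  g_object_set (capsfilter, "caps", filter_caps, NULL);
  gst_caps_unref (filter_caps);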
Encoding and Muxing
-------------------
Summary
-------
A. Problems
B. Goals
1. EncodeBin
2. Encoding Profile System
3. Helper Library for Profiles
I. Use-cases researched
A. Problems this proposal attempts to solve
-------------------------------------------
* Duplication of pipeline code for gstreamer-based applications
wishing to encode and/or mux streams, leading to subtle differences
and inconsistencies across those applications.
* No unified system for describing encoding targets for applications
in a user-friendly way.
* No unified system for creating encoding targets for applications,
resulting in duplication of code across all applications,
differences and inconsistencies that come with that duplication,
and applications hardcoding element names and settings resulting in
poor portability.
B. Goals
--------
1. Convenience encoding element
Create a convenience GstBin for encoding and muxing several streams,
hereafter called 'EncodeBin'.
This element will only contain a single property, which is a
profile.
2. Define an encoding profile system
3. Encoding profile helper library
Create a helper library to:
* create EncodeBin instances based on profiles, and
* help applications to create/load/save/browse those profiles.
1. EncodeBin
------------
1.1 Proposed API
----------------
EncodeBin is a GstBin subclass.
It implements the GstTagSetter interface, by which it will proxy the
calls to the muxer.
Only two introspectable properties (i.e. usable without extra API):
* A GstEncodingProfile*
* The name of the profile to use
When a profile is selected, encodebin will:
* Add REQUEST sinkpads for all the GstStreamProfile
* Create the muxer and expose the source pad
Whenever a request pad is created, encodebin will:
* Create the chain of elements for that pad
* Ghost the sink pad
* Return that ghost pad
This allows reducing the code to the minimum for applications
wishing to encode a source for a given profile:
...
encbin = gst_element_factory_make ("encodebin", NULL);
g_object_set (encbin, "profile", "N900/H264 HQ", NULL);
gst_element_link (encbin, filesink);
...
vsrcpad = gst_element_get_static_pad (source, "src1");
vsinkpad = gst_element_get_request_pad (encbin, "video_%u");
gst_pad_link (vsrcpad, vsinkpad);
...
1.2 Explanation of the various stages in EncodeBin
--------------------------------------------------
This describes the various stages which can happen in order to end
up with a multiplexed stream that can then be stored or streamed.
1.2.1 Incoming streams
The streams fed to EncodeBin can be of various types:
* Video
* Uncompressed (but maybe subsampled)
* Compressed
* Audio
* Uncompressed (audio/x-raw)
* Compressed
* Timed text
* Private streams
1.2.2 Steps involved for raw video encoding
(0) Incoming Stream
(1) Transform raw video feed (optional)
Here we modify the various fundamental properties of a raw video
stream to be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* width/height
This is done with a video scaler.
The DAR (Display Aspect Ratio) MUST be respected.
If needed, black borders can be added to comply with the target DAR.
* framerate
* format/colorspace/depth
All of this is done with a colorspace converter
(2) Actual encoding (optional for raw streams)
An encoder (with some optional settings) is used.
(3) Muxing
A muxer (with some optional settings) is used.
(4) Outgoing encoded and muxed stream
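For comparison, the equivalent static pipeline could be sketched like this
(the element choices — videoscale, videoconvert, theoraenc, oggmux — are
assumptions, not mandated by this design):

  GstElement *pipeline = gst_parse_launch (
      "videotestsrc ! videoscale ! videoconvert "
      "! theoraenc ! oggmux ! filesink location=out.ogg", NULL);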
1.2.3 Steps involved for raw audio encoding
This is roughly the same as for raw video, except for (1)
(1) Transform raw audio feed (optional)
We modify the various fundamental properties of a raw audio stream to
be compatible with the intersection of:
* The encoder GstCaps and
* The specified "Stream Restriction" of the profile/target
The fundamental properties that can be modified are:
* Number of channels
* Type of raw audio (integer or floating point)
* Depth (number of bits required to encode one sample)
1.2.4 Steps involved for encoded audio/video streams
Steps (1) and (2) are replaced by a parser if a parser is available
for the given format.
1.2.5 Steps involved for other streams
Other streams will just be forwarded as-is to the muxer, provided the
muxer accepts the stream type.
2. Encoding Profile System
--------------------------
This work is based on:
* The existing GstPreset system for elements [0]
* The gnome-media GConf audio profile system [1]
* The investigation done into device profiles by Arista and
Transmageddon [2 and 3]
2.2 Terminology
---------------
* Encoding Target Category
A Target Category is a classification of devices/systems/use-cases
for encoding.
Such a classification is required in order for:
* Applications with a very specific use-case to limit the number of
profiles they can offer the user. A screencasting application has
no use for the online services targets, for example.
* Offering the user some initial classification in the case of a
more generic encoding application (like a video editor or a
transcoder).
Ex:
Consumer devices
Online service
Intermediate Editing Format
Screencast
Capture