Commit 46138b1b authored by Tim-Philipp Müller

docs: design: move most design docs to gst-docs module

parent 49653b05
Makefile.am -- design docs removed from the dist list:

  design-audiosinks.txt
  design-decodebin.txt
  design-encoding.txt
  design-orc-integration.txt
  draft-hw-acceleration.txt
  draft-keyframe-force.txt
  draft-va.txt
  part-interlaced-video.txt
Audiosink design
- must operate chain based.
Most simple playback pipelines will push audio from the decoders
into the audio sink.
- must operate getrange based
Most professional audio applications will operate in a mode where
the audio sink pulls samples from the pipeline. This is typically
done in a callback from the audiosink requesting N samples. The
callback is either scheduled from a thread or from an interrupt
from the audio hardware device.
- Exact sample accurate clocks.
The audiosink must be able to provide a clock that is sample
accurate even if samples are dropped or when discontinuities are
found in the stream.
- Exact timing of playback.
The audiosink must be able to play samples at their exact times.
- use DMA access when possible.
When the hardware can do DMA we should use it. This should also
work over bufferpools to avoid data copying to/from kernel space.
The design is based on a set of base classes and the concept of a
ringbuffer of samples.
  +---------------+     - provide preroll, rendering, timing
  |   basesink    |     - caps nego
  +-------+-------+
          |
  +-------v-------+     - manages ringbuffer
  | audiobasesink |     - manages scheduling (push/pull)
  +-------+-------+     - manages clock/query/seek
          |             - manages scheduling of samples in the ringbuffer
          |             - manages caps parsing
  +-------v-------+     - default ringbuffer implementation with a GThread
  |   audiosink   |     - subclasses provide open/read/close methods
  +---------------+
The ringbuffer is a contiguous piece of memory divided into segtotal
segments. Each segment has segsize bytes.

     play position
          v
  +-----+-----+-----+------+----------+
  |  0  |  1  |  2  | .... | segtotal |
  +-----+-----+-----+------+----------+

  segsize bytes = N samples * bytes_per_sample
The ringbuffer has a play position, which is expressed in
segments. The play position is where the device is currently reading
samples from the buffer.
The ringbuffer can be put to the PLAYING or STOPPED state.
In the STOPPED state no samples are played to the device and the play
pointer does not advance.
In the PLAYING state samples are written to the device and the ringbuffer
should call a configurable callback after each segment is written to the
device. In this state the play pointer is advanced after each segment is
written.
A write operation to the ringbuffer will put new samples in the ringbuffer.
If there is not enough space in the ringbuffer, the write operation will
block. The playback of the buffer never stops, even if the buffer is
empty. When the buffer is empty, silence is played by the device.
The ringbuffer is implemented with lockfree atomic operations, especially
on the reading side so that low-latency operations are possible.
Whenever new samples are to be put into the ringbuffer, the position of the
read pointer is taken. The required write position is taken and the
difference is computed between the required and actual position. If the
difference is < 0, the sample is too late. If the difference is bigger than
segtotal, the writing part has to wait for the play pointer to advance.
- chain based mode:
In chain based mode, bytes are written into the ringbuffer. This operation
will eventually block when the ringbuffer is filled.
When no samples arrive in time, the ringbuffer will play silence. Each
buffer that arrives will be placed into the ringbuffer at the correct
time. This means that dropping samples or inserting silence is done
automatically, very accurately and independently of the play pointer.
In this mode, the ringbuffer is usually kept as full as possible. When
using a small buffer (small segsize and segtotal), the latency from when
audio arrives at the sink to when it is played can be kept low, but at
least one context switch has to be made between read and write.
- getrange based mode
In getrange based mode, the audiobasesink will use the callback function
of the ringbuffer to get a segment of samples from the peer element. These
samples will then be placed in the ringbuffer at the next play position.
It is assumed that the getrange function returns fast enough to fill the
ringbuffer before the play pointer reaches the write pointer.
In this mode, the ringbuffer is usually kept as empty as possible. There
is no context switch needed between the elements that create the samples
and the actual writing of the samples to the device.
DMA mode:
- Elements that can do DMA based access to the audio device have to subclass
from the GstAudioBaseSink class and wrap the DMA ringbuffer in a subclass
of GstRingBuffer.
The ringbuffer subclass should trigger a callback after writing or playing
each segment to the device. This callback can be triggered from a thread or
from a signal from the audio device.
The GstAudioBaseSink class will use the ringbuffer to act as a clock provider.
It can do this by using the play pointer and the delay to calculate the
clock time.
Orc Integration
- About Orc
- Fast memcpy()
- Normal Usage
- Build Process
- Testing
- Orc Limitations
About Orc
Orc code can be in one of two forms: in .orc files that are converted
by orcc to C code that calls liborc functions, or as C code that calls
liborc to create complex operations at runtime. The former is mostly
for functions with predetermined functionality. The latter is for
functionality that is determined at runtime, where writing .orc
functions for all combinations would be prohibitive. Orc also has
a fast memcpy and memset which are useful independently.
Fast memcpy()
*** This part is not integrated yet. ***
Orc has built-in functions orc_memcpy() and orc_memset() that work
like memcpy() and memset(). These are meant for large copies only.
A reasonable cutoff for using orc_memcpy() instead of memcpy() is
if the number of bytes is generally greater than 100. DO NOT use
orc_memcpy() if the typical size is less than 20 bytes, especially
if the size is known at compile time, as these cases are inlined by
the compiler.
(Example: sys/ximage/ximagesink.c)
Add $(ORC_CFLAGS) to libgstximagesink_la_CFLAGS and $(ORC_LIBS) to
libgstximagesink_la_LIBADD. Then, in the source file, add:
#ifdef HAVE_ORC
#include <orc/orc.h>
#else
#define orc_memcpy(a,b,c) memcpy(a,b,c)
#endif
Then switch relevant uses of memcpy() to orc_memcpy().
The above example works whether or not Orc is enabled at compile time.
Normal Usage
The following lines are added near the top of Makefile.am for plugins
that use Orc code in .orc files (this is for the volume plugin):
include $(top_srcdir)/common/
Also add the generated source file to the plugin build:
nodist_libgstvolume_la_SOURCES = $(ORC_SOURCES)
And of course, add $(ORC_CFLAGS) to libgstvolume_la_CFLAGS, and
$(ORC_LIBS) to libgstvolume_la_LIBADD.
The value assigned to ORC_BASE does not need to be related to
the name of the plugin.
Advanced Usage
The Holy Grail of Orc usage is to programmatically generate Orc code
at runtime, have liborc compile it into binary code at runtime, and
then execute this code. Currently, the best example of this is in
Schroedinger. An example of how this would be used is audioconvert:
given an input format, channel position manipulation, dithering and
quantizing configuration, and output format, a Orc code generator
would create an OrcProgram, add the appropriate instructions to do
each step based on the configuration, and then compile the program.
Successfully compiling the program would return a function pointer
that can be called to perform the operation.
This sort of advanced usage requires structural changes to current
plugins (e.g., audioconvert) and will probably be developed
incrementally. Moreover, if such code is intended to be used without
Orc as strict build/runtime requirement, two codepaths would need to
be developed and tested. For this reason, until GStreamer requires
Orc, I think it's a good idea to restrict such advanced usage to the
cog plugin in -bad, which requires Orc.
Build Process
The goal of the build process is to make Orc non-essential for most
developers and users. This is not to say you shouldn't have Orc
installed -- without it you will get the slow backup C code -- just that
people compiling GStreamer are not forced to switch from Liboil to
Orc immediately.
With Orc installed, the build process will use the Orc Compiler (orcc)
to convert each .orc file into a temporary C source (tmp-orc.c) and a
temporary header file (${base}orc.h if constructed from ${base}.orc).
The C source file is compiled and linked to the plugin, and the header
file is included by other source files in the plugin.
If 'make orc-update' is run in the source directory, the files
tmp-orc.c and ${base}orc.h are copied to ${base}orc-dist.c and
${base}orc-dist.h respectively. The -dist.[ch] files are disted
automatically and should be checked in to git whenever the .orc source
is changed and checked in. Example workflow:

  edit .orc file
  ... make, test, etc.
  make orc-update
  git add volume.orc volumeorc-dist.c volumeorc-dist.h
  git commit
At 'make dist' time, all of the .orc files are compiled, and then
copied to their -dist.[ch] counterparts, and then the -dist.[ch]
files are added to the dist directory.
Without Orc installed (or with --disable-orc given to configure), the
-dist.[ch] files are copied to tmp-orc.c and ${base}orc.h. When
compiling with Orc disabled, DISABLE_ORC is defined in config.h and
the C backup code is compiled instead. This backup code is pure C and
does not include Orc headers or require linking against liborc.
The common/ build method is limited by the inflexibility of
automake. The file tmp-orc.c must be a fixed filename; using ORC_NAME
to generate the filename does not work because it conflicts with
automake's dependency generation. Building multiple .orc files
is not possible due to this restriction.
If you create another .orc file, please add it to tests/orc/. This
causes automatic test code to be generated and run during 'make check'.
Each function in the .orc file is tested by comparing the results of
executing the run-time compiled code and the C backup function.
Orc Limitations
Orc doesn't have a mechanism for generating random numbers, which
prevents its use as-is for dithering. One way around this is to
generate suitable dithering values in one pass, then use those
values in a second Orc-based pass.
Orc doesn't handle 64-bit float, for no good reason.
Irrespective of Orc handling 64-bit float, it would be useful to
have a direct 32-bit float to 16-bit integer conversion.
audioconvert is a good candidate for programmatically generated
Orc code.
audioconvert enumerates functions in terms of big-endian vs.
little-endian. Orc's functions are "native" and "swapped".
Programmatically generating code removes the need to worry about this
distinction.
Orc doesn't handle 24-bit samples. Fixing this is not a priority
(for ds).
Orc doesn't handle horizontal resampling yet. The plan is to add
special sampling opcodes for nearest, bilinear, and cubic interpolation.
Lots of code in videotestsrc needs to be rewritten to be SIMD
(and Orc) friendly, e.g., stuff that uses oil_splat_u8().
A fast low-quality random number generator in Orc would be useful here.
Many of the comments on audioconvert apply here as well.
There are a bunch of FIXMEs in here that are due to misapplied patches.
Forcing keyframes
Consider the following use case:
We have a pipeline that performs video and audio capture from a live source,
compresses and muxes the streams and writes the resulting data into a file.
Inside the uncompressed video data we have a specific pattern inserted at
specific moments that should trigger a switch to a new file, meaning, we close
the existing file we are writing to and start writing to a new file.
We want the new file to start with a keyframe so that one can start decoding
the file immediately.
1) We need an element that is able to detect the pattern in the video stream.
2) We need to inform the video encoder that it should start encoding a keyframe
starting from exactly the frame with the pattern.
3) We need to inform the muxer that it should flush out any pending data and
start creating the start of a new file with the keyframe as a first video
frame.
4) We need to inform the sink element that it should start writing to the next
file. This requires application interaction to instruct the sink of the new
filename. The application should also be free to ignore the boundary and
continue to write to the existing file. The application will typically use
an event pad probe to detect the custom event.
The implementation would consist of generating a GST_EVENT_CUSTOM_DOWNSTREAM
event that marks the keyframe boundary. This event is inserted into the
pipeline by the application upon a certain trigger. In the above use case this
trigger would be given by the element that detects the pattern, in the form of
an element message.
The custom event would travel further downstream to instruct encoder, muxer and
sink about the possible switch.
The information passed in the event consists of:

  name: GstForceKeyUnit

  (G_TYPE_UINT64)"timestamp"     : the timestamp of the buffer that
                                   triggered the event.
  (G_TYPE_UINT64)"stream-time"   : the stream position that triggered
                                   the event.
  (G_TYPE_UINT64)"running-time"  : the running time of the stream when
                                   the event was triggered.
  (G_TYPE_BOOLEAN)"all-headers"  : send all headers, including those in
                                   the caps or those sent at the start
                                   of the stream.
  ....                           : optional other data fields.
Note that this event is purely informational, no element is required to
perform an action but it should forward the event downstream, just like any
other event it does not handle.
Elements understanding the event should behave as follows:
1) The video encoder receives the event before the next frame. Upon reception
of the event it schedules to encode the next frame as a keyframe.
Before pushing out the encoded keyframe it must push the GstForceKeyUnit
event downstream.
2) The muxer receives the GstForceKeyUnit event and flushes out its current state,
preparing to produce data that can be used as a keyunit. Before pushing out
the new data it pushes the GstForceKeyUnit event downstream.
3) The application receives the GstForceKeyUnit event in a pad probe on the
sink's sink pad and reconfigures the sink to make it perform new actions
after receiving the next buffer.
When using RTP, packets can get lost and receivers can be added at any
time; in both cases a new key frame may be requested. A downstream element
then sends an upstream "GstForceKeyUnit" event up the pipeline.
When an element produces some kind of key unit in output, but has
no such concept in its input (like an encoder that takes raw frames),
it consumes the event (doesn't pass it upstream), and instead sends
a downstream GstForceKeyUnit event and a new keyframe.
Interlaced Video
Video buffers have a number of states identifiable through a combination of caps
and buffer flags.
Possible states:
- Progressive
- Interlaced
- Plain
- One field
- Two fields
- Three fields - this should be a progressive buffer with a repeated 'first'
field that can be used for telecine pulldown
- Telecine
- One field
- Two fields
- Progressive
- Interlaced (a.k.a. 'mixed'; the fields are from different frames)
- Three fields - this should be a progressive buffer with a repeated 'first'
field that can be used for telecine pulldown
Note: It can be seen that the difference between the plain interlaced and
telecine states is that in the telecine state, buffers containing two fields may
be progressive.
Tools for identification:
- GstVideoInfo
- GstVideoInterlaceMode - enum - GST_VIDEO_INTERLACE_MODE_...
- Buffer flags - GST_VIDEO_BUFFER_FLAG_...
Identification of Buffer States
Note that flags are not necessarily interpreted in the same way in all
states, nor are they required or meaningful in all cases.
Progressive

If the interlace mode in the video info corresponding to a buffer is
"progressive", then the buffer is progressive.
Plain Interlaced
If the video info interlace mode is "interleaved", then the buffer is plain
interlaced.

GST_VIDEO_BUFFER_FLAG_TFF indicates whether the top or bottom field is to be
displayed first. The timestamp on the buffer corresponds to the first field.
GST_VIDEO_BUFFER_FLAG_RFF indicates that the first field (indicated by the TFF flag)
should be repeated. This is generally only used for telecine purposes but as the
telecine state was added long after the interlaced state was added and defined,
this flag remains valid for plain interlaced buffers.
GST_VIDEO_BUFFER_FLAG_ONEFIELD means that only the field indicated through the TFF
flag is to be used. The other field should be ignored.
Telecine

If the video info interlace mode is "mixed", then the buffers are in some
form of telecine state.

The TFF and ONEFIELD flags have the same semantics as for the plain
interlaced state.

GST_VIDEO_BUFFER_FLAG_RFF in the telecine state indicates that the buffer contains
only repeated fields that are present in other buffers and are as such
unneeded. For example, in a sequence of three telecined frames, we might have:
AtAb AtBb BtBb
In this situation, we only need the first and third buffers as the second
buffer contains fields present in the first and third.
Note that the following state can have its second buffer identified using the
ONEFIELD flag (and TFF not set):
AtAb AtBb BtCb
The telecine state requires one additional flag to be able to identify
progressive buffers.
The presence of the GST_VIDEO_BUFFER_FLAG_INTERLACED means that the buffer is an
'interlaced' or 'mixed' buffer that contains two fields that, when combined
with fields from adjacent buffers, allow reconstruction of progressive frames.
The absence of the flag implies that a buffer containing two fields is a
progressive frame.
For example in the following sequence, the third buffer would be mixed (yes, it
is a strange pattern, but it can happen):
AtAb AtBb BtCb CtDb DtDb
Media Types

audio/x-raw

format, G_TYPE_STRING, mandatory
The format of the audio samples, see the Formats section for a list
of valid sample formats.
rate, G_TYPE_INT, mandatory
The samplerate of the audio
channels, G_TYPE_INT, mandatory
The number of channels
channel-mask, GST_TYPE_BITMASK, mandatory for more than 2 channels
Bitmask of channel positions present. May be omitted for mono and
stereo. May be set to 0 to denote that the channels are unpositioned.
layout, G_TYPE_STRING, mandatory
The layout of channels within a buffer. Possible values are
"interleaved" (for LRLRLRLR) and "non-interleaved" (LLLLRRRR)
Use GstAudioInfo and related helper API to create and parse raw audio caps.
A matrix for downmixing multichannel audio to a lower number of channels.
Formats

The following values can be used for the format string property.
"S8" 8-bit signed PCM audio
"U8" 8-bit unsigned PCM audio
"S16LE" 16-bit signed PCM audio
"S16BE" 16-bit signed PCM audio
"U16LE" 16-bit unsigned PCM audio
"U16BE" 16-bit unsigned PCM audio
"S24_32LE" 24-bit signed PCM audio packed into 32-bit
"S24_32BE" 24-bit signed PCM audio packed into 32-bit
"U24_32LE" 24-bit unsigned PCM audio packed into 32-bit
"U24_32BE" 24-bit unsigned PCM audio packed into 32-bit
"S32LE" 32-bit signed PCM audio
"S32BE" 32-bit signed PCM audio
"U32LE" 32-bit unsigned PCM audio
"U32BE" 32-bit unsigned PCM audio
"S24LE" 24-bit signed PCM audio
"S24BE" 24-bit signed PCM audio
"U24LE" 24-bit unsigned PCM audio
"U24BE" 24-bit unsigned PCM audio
"S20LE" 20-bit signed PCM audio
"S20BE" 20-bit signed PCM audio
"U20LE" 20-bit unsigned PCM audio
"U20BE" 20-bit unsigned PCM audio
"S18LE" 18-bit signed PCM audio
"S18BE" 18-bit signed PCM audio
"U18LE" 18-bit unsigned PCM audio
"U18BE" 18-bit unsigned PCM audio
"F32LE" 32-bit floating-point audio
"F32BE" 32-bit floating-point audio
"F64LE" 64-bit floating-point audio
"F64BE" 64-bit floating-point audio
Media Types

text/x-raw

format, G_TYPE_STRING, mandatory
The format of the text, see the Formats section for a list of valid
format strings.

There are no common metas for this raw format yet.
Formats

"utf8" plain timed utf8 text (formerly text/plain)
Parsed timed text in utf8 format.
"pango-markup" plain timed utf8 text with pango markup (formerly text/x-pango-markup)
Same as "utf8", but text embedded in an XML-style markup language for
size, colour, emphasis, etc.
The purpose of this element is to decode and render the media contained in a
given generic uri. The element extends GstPipeline and is typically used in
playback situations.
Required features:
- accept and play any valid uri. This includes
- rendering video/audio
- overlaying subtitles on the video
- optionally read external subtitle files
- allow for hardware (non raw) sinks
- selection of audio/video/subtitle streams based on language.
- perform network buffering/incremental download
- gapless playback
- support for visualisations with configurable sizes
- ability to reject files that are too big, or of a format that would require
too much CPU/memory usage.
- be very efficient with adding elements such as converters to reduce the
amount of negotiation that has to happen.
- handle chained oggs. This includes having support for dynamic pad add and
remove from a demuxer.
* decodebin2
- performs the autoplugging of demuxers/decoders
- emits signals for steering the autoplugging
- to decide if a non-raw media format is acceptable as output
- to sort the possible decoders for a non-raw format
- see also decodebin2 design doc
* uridecodebin
- combination of a source to handle the given uri, an optional queueing element
and one or more decodebin2 elements to decode the non-raw streams.