Television and digital video standards summary

Broadcast terms

Digital Video Broadcasting (DVB), including DVB-C (cable), DVB-T (terrestrial), and DVB-S (satellite).

Each NTSC or PAL video frame consists of two "fields." When displaying video, an NTSC television draws one field every 1/60th of a second, and a PAL television draws one field every 1/50th of a second. Interlacing involves merging the alternating fields of an image into a single frame. The process of separating the single frame back into the original two fields is called deinterlacing.
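The relationship between fields and frames can be sketched as follows. This is a minimal illustration only, assuming a frame is a list of scan lines with an even line count and that the first field holds the even-numbered lines; the function names are hypothetical, and real deinterlacers also have to cope with motion between the two fields.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into its two fields."""
    top = frame[0::2]      # lines 0, 2, 4, ...
    bottom = frame[1::2]   # lines 1, 3, 5, ...
    return top, bottom


def weave_fields(top, bottom):
    """Interleave two fields into one frame, top-field line first."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame


frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)
rebuilt = weave_fields(top, bottom)
```

Splitting and then weaving returns the original frame, which is why the two operations are described as inverses of each other.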

Circuit-switched
A network in which a single fixed connection is established as the route.

National Television System Committee. The NTSC is responsible for setting television and video standards in the United States. The NTSC standard for television defines a composite video signal with a frame rate of 29.97 FPS. The NTSC standard also requires that these frames be interlaced.

Phase Alternating Line. The dominant television standard in Europe. The PAL standard delivers 25 FPS.

Packet-switched
Refers to protocols in which messages are divided into packets before they are sent. Each packet is then transmitted individually and can even follow a different route to its destination. Once all the packets forming a message arrive at the destination, they are reassembled into the original message.
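The split, out-of-order delivery, and reassembly steps can be sketched like this. It is a toy model, not any particular protocol: the sequence-number scheme, the packet size, and the function names are assumptions made for illustration.

```python
import random


def packetize(message: bytes, size: int):
    """Divide a message into (sequence_number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def reassemble(packets):
    """Rebuild the message by ordering packets on their sequence number."""
    return b"".join(payload for _, payload in sorted(packets))


packets = packetize(b"packet switched network", 5)
random.shuffle(packets)          # simulate packets arriving by different routes
message = reassemble(packets)
```

Because every packet carries its own sequence number, arrival order does not matter to the receiver.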

Séquentiel Couleur à Mémoire (Sequential Colour with Memory). Developed in France.

Most film content is created at 24 FPS. To meet the NTSC standard, extra frames are added to reach the 30 FPS (strictly 29.97 FPS) requirement. This is typically done by repeating fields in a fixed pattern, known as 3:2 pulldown, which produces intermediate frames that combine fields from two adjacent film frames. The conversion is known as telecine. The process that removes the frames that were added when 24 FPS film was converted to 30 FPS video is known as inverse telecine.
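One common way to perform this conversion is a pulldown cadence in which film frames are alternately held for two and three video fields. The sketch below models frames as labels rather than pixels; the 2-3 cadence default and the function names are assumptions, and a real inverse telecine has to detect the cadence from image data instead of comparing labels.

```python
def pulldown(film_frames, cadence=(2, 3)):
    """Expand film frames into video frames, each a pair of fields."""
    fields = []
    for index, frame in enumerate(film_frames):
        # Hold each film frame for 2 or 3 fields, alternating.
        fields.extend([frame] * cadence[index % len(cadence)])
    # Pair consecutive fields into interlaced video frames.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def inverse_telecine(video_frames):
    """Recover the film frames by dropping repeated fields."""
    film = []
    for pair in video_frames:
        for field in pair:
            if not film or film[-1] != field:
                film.append(field)
    return film


video = pulldown(["A", "B", "C", "D"])
restored = inverse_telecine(video)
```

Four film frames become five video frames, so 24 film frames become 30, and two of every five video frames mix fields from different film frames.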

Colour representation

CIE stands for Commission Internationale de l'Eclairage (International Commission on Illumination). The commission was founded in 1913 as an autonomous international board to provide a forum for the exchange of ideas and information and to set standards for all things related to lighting. As a part of this mission, CIE has a technical committee, Vision and Colour, that has been a leading force in colourimetry since it first met to set its standards in Cambridge, England, in 1931.

A different approach developed by Richard Hunter in 1942 that defines colours along two polar axes for colour (a and b) and a third for lightness (L).

A model composed in 1960 and revised in 1976. This model uses an altered and elongated form of the original chromaticity diagram in an attempt to correct its non-uniformity.

The CIEXYZ structure contains the x, y, and z coordinates of a specific colour in a specified colour space. The 1931 CIEXYZ colour space is widely used as the basis for colour space conversion. With the rise of the Internet, however, bandwidth considerations have made the XYZ colour space unwieldy. The exchange of images over the limited bandwidth of the Internet necessitates a more compact colour model.

CIE considered the tristimulus values for red, green, and blue to be undesirable for creating a standardized colour model.

Cyan, magenta, and yellow correspond roughly to the primary colours in art production: red, blue, and yellow.

One of the most influential colour-modeling systems was devised by Albert Henry Munsell, an American artist. Munsell desired to create a "rational way to describe colour" that would use clear decimal notation instead of a lot of colour names that he considered "foolish" and "misleading". His system, which he began in 1898 with the creation of his colour sphere, or tree, saw its full expression with his publication, A Colour Notation, in 1905. This work has been reprinted several times and is still a standard for colourimetry (the measuring of colour).

HSB/HLS are two variations of a very basic colour model for defining colours in desktop graphics programs that closely matches the way we perceive colour. This model is somewhat analogous to Munsell's system of hue, value, and chroma in that it uses three similar axes to define a colour. In HSB, these are hue, saturation, and brightness; in HLS, they are defined by hue, lightness, and saturation.
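As a small illustration of the two variants, Python's standard colorsys module offers both conversions. Note that it names HSB "HSV" (value instead of brightness) and works on components in the range 0.0 to 1.0; the sample colour is an arbitrary choice.

```python
import colorsys

# A saturated orange, with each RGB component in the range 0.0-1.0.
r, g, b = 1.0, 0.5, 0.0

# HSB: hue, saturation, brightness (called "value" by colorsys).
hue, saturation, brightness = colorsys.rgb_to_hsv(r, g, b)

# HLS: hue, lightness, saturation. Note the different component order.
hls_hue, lightness, hls_saturation = colorsys.rgb_to_hls(r, g, b)

hue_degrees = hue * 360   # about 30 degrees for this orange
```

Both models report the same hue for this colour but describe its intensity differently: brightness is 1.0 in HSB, while lightness is 0.5 in HLS.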

A colour model that describes colour information in terms of the red (R), green (G), and blue (B) elements that make up the colour.

RGB and its subset CMY form the most basic and well-known colour model. This model bears closest resemblance to how we perceive colour. It also corresponds to the principles of additive and subtractive colours.
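The additive and subtractive relationship can be sketched with the idealized complement rule, in which each CMY component is what remains after removing the matching RGB component from white. This assumes 8-bit components and ideal inks; real printing processes need device-specific correction, and the function names are hypothetical.

```python
def rgb_to_cmy(r, g, b):
    """Idealized conversion: each CMY value is the complement of an RGB value."""
    return 255 - r, 255 - g, 255 - b


def cmy_to_rgb(c, m, y):
    """The same complement rule applied in the other direction."""
    return 255 - c, 255 - m, 255 - y


red_as_cmy = rgb_to_cmy(255, 0, 0)       # no cyan, full magenta and yellow
white_as_cmy = rgb_to_cmy(255, 255, 255)  # no ink at all
```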

As a result of Internet bandwidth considerations, Hewlett-Packard and Microsoft have proposed the adoption of a standard predefined colour space known as sRGB (IEC 61966-2-1), so as to allow accurate colour mapping with very little data overhead.

YUV (Y'CbCr)
A colour model that describes colour information in terms of luminance (Y) and chrominance (U, V). YUV formats fall into two distinct groups, the packed formats where Y, U and V samples are packed together into macropixels which are stored in a single array, and the planar formats where each component is stored as a separate array, the final image being a fusing of the three separate planes.
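The difference between the two groups can be illustrated by unpacking one packed format into separate planes. The sketch below uses YUY2 as the example, a packed 4:2:2 format whose four-byte macropixel is Y0 U Y1 V. The choice of YUY2 and the function name are assumptions for illustration; other packed formats order the bytes differently.

```python
def yuy2_to_planar(packed: bytes):
    """Split packed YUY2 data (Y0 U Y1 V per macropixel) into Y, U, V planes."""
    y_plane = packed[0::2]   # every second byte is a luma sample
    u_plane = packed[1::4]   # one U sample per macropixel
    v_plane = packed[3::4]   # one V sample per macropixel
    return y_plane, u_plane, v_plane


# Two macropixels, describing four pixels in total.
packed = bytes([10, 100, 20, 200, 30, 110, 40, 210])
y_plane, u_plane, v_plane = yuy2_to_planar(packed)
```

The packed form keeps everything in one array, while the planar form yields three arrays in which the chroma planes are half the length of the luma plane.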

The notation YUV, and the term luminance, are widespread in digital video. However, digital video almost never uses Y'UV colour difference components, and never directly represents the luminance of colour science. Video engineers and computer graphics specialists should use the correct terms, almost always Y'CbCr and luma.

To cut a long story short, here are the formulae that I have used to do the conversions for PC video applications. The colour space in question is actually YCbCr and not YUV, which a video purist will tell you is, in fact, the colour scheme employed in PAL TV systems and is somewhat different (NTSC TVs use YIQ which is different again). Why the PC video fraternity adopted the term YUV is a mystery but I strongly suspect that it has something to do with not having to type subscripts.

The following 2 sets of formulae are taken from information from Keith Jack's excellent book "Video Demystified" (ISBN 1-878707-09-4).

RGB to YUV Conversion

Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16
Cr = V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128

YUV to RGB Conversion
B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)

In both these cases, you have to clamp the output values to keep them in the [0-255] range. Rumour has it that the valid range is actually a subset of [0-255] (I've seen an RGB range of [16-235] mentioned) but clamping the values into [0-255] seems to produce acceptable results to me.
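The two sets of formulae and the clamping step can be sketched together as follows. The coefficients are those quoted above; rounding to the nearest integer before clamping and the function names are assumptions of this sketch, not something the formulae prescribe.

```python
def clamp(value):
    """Round to the nearest integer and keep the result inside 0-255."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to 8-bit Y, Cb (U), Cr (V)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return clamp(y), clamp(cb), clamp(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Convert 8-bit Y, Cb (U), Cr (V) back to 8-bit RGB."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.391 * (cb - 128)
    b = 1.164 * (y - 16) + 2.018 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


black = rgb_to_ycbcr(0, 0, 0)
white = rgb_to_ycbcr(255, 255, 255)
round_trip = ycbcr_to_rgb(*rgb_to_ycbcr(200, 100, 50))
```

Black maps to Y = 16 and white to Y = 235, which shows where the restricted range mentioned above comes from. A round trip does not always reproduce the input exactly, because both directions round to integers.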

A C program is available to do these conversions; the source code can be downloaded as yuv2ppm.c.

Digital Video and picture formats use YUV because it resolves colour to a single number that is easier for numeric comparison of individual pixels and macroblocks.

Digital video terms

Adaptive Transform Acoustic Coding. High quality psycho-acoustic coding scheme developed by Sony for MiniDisc.

Annex D Graphics
Recommendation H.261 Annex D Graphic Transfer mode can be supported by reading four decoded pictures from the accelerator back onto the host and interleaving them there for display as a higher-resolution graphic picture.

DirectX
Microsoft's high-performance multimedia programming libraries.

DirectX VA
DirectX VA describes an Application Programming Interface (API) and a corresponding Device Driver Interface (DDI) for hardware acceleration of digital video decoding processing, with support of alpha blending for such purposes as DVD subpicture support. It provides an interface definition focused on support of MPEG-2 "main profile" video (formally ITU-T H.262 | ISO/IEC 13818-2), but is also intended to support other key video codecs (for example, ITU-T Recommendations H.263 and H.261, and MPEG-1 and MPEG-4).

Note: The DirectX VA specification is now located in the Microsoft Platform DDK. Developers of software decoders as well as device drivers should refer to that specification. The following section is primarily relevant for decoder developers. It provides detailed information on how a software MPEG-2 decoder uses the IAMVideoAccelerator Interface when communicating with a DXVA-enabled hardware device.

Digital Versatile Disc. High capacity storage disk used for video and data.

PAL:
Video: up to 9.8 Mbit/sec MPEG-2, 720 x 576 pixels, 25 frames/second
Audio: Dolby Digital, DTS, Stereo
NTSC:
Video: up to 9.8 Mbit/sec MPEG-2, 720 x 480 pixels, 29.97 frames/second (23.976 frames/second NTSC Film)
Audio: Dolby Digital, DTS, Stereo

This standard is formally titled "Video Codec for Audiovisual Services at px64 kbit/s," ITU-T Recommendation H.261. Recommendation H.261 contains the same basic design later used in other video codec standards. This standard uses 8-bit samples with Y, Cb, and Cr components, 4:2:0 sampling, 16x16 "macroblock"-based motion compensation, 8x8 IDCT, zig-zag inverse scanning of coefficients, scalar quantization, and variable-length coding of coefficients based on a combination of zero-valued run-lengths and quantization index values.
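The zig-zag inverse scan mentioned above can be sketched as follows. The code generates the conventional zig-zag order for an 8x8 block and uses it to place a one-dimensional coefficient list back into block positions. The function names are hypothetical, and the order shown is the classic zig-zag pattern rather than a transcription of the tables in the Recommendation.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run downwards (row increasing), even ones upwards.
        return diagonal, row if diagonal % 2 else -row
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def inverse_zigzag(coefficients, n=8):
    """Place a scanned coefficient list back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (row, col) in zip(coefficients, zigzag_order(n)):
        block[row][col] = value
    return block


first_indices = [row * 8 + col for row, col in zigzag_order()][:10]
block = inverse_zigzag(list(range(64)))
```

The scan visits low-frequency coefficients first, so the long runs of zeros that the run-length coding relies on tend to collect at the end of the list.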

All H.261 prediction blocks use forward-only prediction from the previous picture. H.261 does not have half-sample accurate prediction filters, but instead uses a type of low-pass filter called the "loop filter" (Section 3.2.3 of the H.261 specification) that can be turned off or on during motion compensation prediction for each macroblock.

ITU-T Recommendation H.263 is formally titled "Video Coding for Low Bit Rate Communication." Recommendation H.263 is a more recent video codec standard that offers improved compression performance relative to H.261, MPEG-1, and MPEG-2. This standard contains a "baseline" mode of operation that supports only the most basic form of H.263. It also contains a large number of optional enhanced modes of operation that can be used for various purposes. Baseline H.263 prediction operates in this interface using a subset of the MPEG-1 features. The baseline mode contains no bidirectional prediction, only forward prediction.

The H.323 protocol is the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) standard for real-time multimedia communications and conferencing over packet-based networks.

ITU Standard Codec and RTP Payload Handlers
NetMeeting ships with four ITU standard codec components: G.711, G.723, H.261, and H.263.

For streaming audio and video, there is a tight coupling between a codec and its RTP payload handler. RTP is the network transport protocol that NetMeeting uses for audio and video streams. An RTP payload handler organizes audio streams into packets for use by RTP. RTP is not yet a part of the Windows operating system. Windows has a general-purpose means for installing and using codecs, namely the Audio Compression Manager (ACM) and the Video Compression Manager (VCM). VCM is also known as the Installable Compression Manager (ICM). But Windows has no general-purpose means for installing RTP payload handlers.

NetMeeting 3 supports the H.323 standard, which calls for the use of RTP and the four standard codecs mentioned above. To get optimal performance and a quality end-user experience, NetMeeting 3 has gone beyond the limitations of the interfaces defined in ACM and VCM by linking the NetMeeting implementations of the standard codecs with specific RTP payload handlers built to use these implementations with the RTP and call control components in NetMeeting.

The MPEG-1 video standard is formally titled ISO/IEC 11172-2. This standard was developed not long after H.261 and borrowed significantly from it. The MPEG-1 standard does not have a loop filter; instead it uses a simple half-sample filter that attempts to resolve subpixel movement between frames. Two additional prediction modes, bidirectional and backward prediction, were added. These prediction modes require one additional reference frame to be buffered. The bidirectional prediction mode averages forward-predicted and backward-predicted prediction blocks. The arithmetic for averaging forward and backward prediction blocks is similar to that for creating a half-sampled "interpolated" prediction block. The basic structure is otherwise the same as H.261.
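The similarity between the two averaging operations can be sketched as follows. Both use the same rounded average of two samples; the round-half-up rule, written here as (a + b + 1) >> 1, and the function names are assumptions of this sketch, and only horizontal half-sample interpolation is shown.

```python
def average_blocks(forward, backward):
    """Bidirectional prediction: average two prediction blocks sample by sample."""
    return [[(f + b + 1) >> 1 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward, backward)]


def half_sample_row(row):
    """Horizontal half-sample interpolation between neighbouring samples."""
    return [(left + right + 1) >> 1 for left, right in zip(row, row[1:])]


forward_block = [[10, 20], [255, 0]]
backward_block = [[11, 20], [255, 1]]
bidirectional = average_blocks(forward_block, backward_block)
interpolated = half_sample_row([0, 1, 4])
```

In the first case the two inputs come from different reference pictures, and in the second from neighbouring positions in the same picture, but the arithmetic is the same.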

MPEG-2 (H.262)
The MPEG-2 standard is formally titled "Information technology - Generic coding of moving pictures and associated audio information: Video," ITU-T Recommendation H.262 | ISO/IEC 13818-2. This standard added only a basic 16x8 shape to the existing tools of MPEG-1 (from a very low layer perspective). From a slightly higher-layer perspective, MPEG-2 added many additional ways to combine predictions referenced from multiple fields in order to deal with interlaced video characteristics.

MPEG-4 was based heavily on H.263 for progressive-scan coding, and on MPEG-2 for support of interlace and colour sampling formats other than 4:2:0. The features that support H.263 and MPEG-2 can be used to support MPEG-4. MPEG-4 can support a sample accuracy of more than eight bits. DirectX VA includes a mechanism to support more than eight bits per pixel using the bBPPminus1 member of DXVA_PictureParameters.

Note: The features most unique to MPEG-4, such as shape coding, object orientation, face modeling, mesh objects, and sprites, are not supported in DirectX VA.

MP3
High quality audio compression method based on MPEG-1 Layer 3, developed initially by the Fraunhofer Institut Integrierte Schaltungen (IIS).

NetMeeting
Powerful application developed by Microsoft that supports real-time communication and collaboration over the Internet or an intranet, providing standards-based audio, video, and multipoint data conferencing support.

Real Media
Brand name for high quality, low bitrate audio and video streaming technology developed by RealNetworks Inc. Note that RealPlayer 4 supplied with Windows NT 4.0 has the following credits: 1994-1997 Universite de Sherbrooke / Sipro LabTelecom, Inc. RealVideo (Fractal) codec by Iterated Systems, Inc. Copyright 1996-1997 Iterated Systems, Inc.

SVCD stands for "Super VideoCD". An SVCD is very similar to a VCD. It can hold about 35-60 minutes of very good quality full-motion video on a 74/80 min CD, along with up to 2 stereo audio tracks and up to 4 removable subtitles.

PAL:
Video: max 2600 kbit/sec MPEG-2, 480 x 576 pixels, 25 frames/second with up to 4 Subtitles
Audio: from 32 - 384 kbit/sec MPEG-1 layer2 or MPEG-2 with up to 2 Audio Tracks
Extra: Menus and chapters. Still pictures 720x576, 352x288
NTSC:
Video: max 2600 kbit/sec MPEG-2, 480 x 480 pixels, 29.97 frames/second with up to 4 Subtitles
Audio: from 32 - 384 kbit/sec MPEG-1 layer2 or MPEG-2 with up to 2 Audio Tracks
Extra: Menus and chapters. Still pictures 720x480, 352x240

VCD stands for Video CD. A VCD is an ordinary CD with video on it. It can contain up to 74 minutes of video on one CD. A VCD can be played on almost all DVD players (although not all of them can play CD-Rs) and of course on all CD-ROM drives. This format was initially released by Philips in 1994.

PAL:
Video: MPEG-1, 352 x 288 pixels, 25 frames/second
Audio: from 32 - 384 kbit/sec MPEG-1
NTSC:
Video: MPEG-1, 352 x 240 pixels, 29.97 frames/second
Audio: from 32 - 384 kbit/sec MPEG-1

DVD Video Object. All DVD movies are stored in so-called VOB files. VOB files usually contain multiplexed Dolby Digital audio and MPEG-2 video. VOB files are named as follows: vts_XX_y.vob, where XX represents the title and y the part of the title. There can be 99 titles and 10 parts, although vts_XX_0.vob never contains any video, usually just menu or navigational information.

Not to be confused with the Rational ClearCase Version Object Base (VOB). That is a proprietary database.
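The DVD file naming scheme described above can be sketched as a small parser. The regular expression, the case-insensitive matching, and the returned field names are assumptions made for illustration; the pattern covers only the vts_XX_y.vob form given in the text.

```python
import re

# Two digits for the title, one digit for the part, as in vts_XX_y.vob.
VOB_NAME = re.compile(r"^vts_(\d{2})_(\d)\.vob$", re.IGNORECASE)


def parse_vob_name(filename):
    """Return title and part numbers for a VOB file name, or None if no match."""
    match = VOB_NAME.match(filename)
    if match is None:
        return None
    title, part = int(match.group(1)), int(match.group(2))
    return {
        "title": title,
        "part": part,
        # Part 0 is described above as menu or navigational information.
        "menu_or_navigation": part == 0,
    }


movie_part = parse_vob_name("vts_01_1.vob")
menu_part = parse_vob_name("VTS_02_0.VOB")
```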


CCITT
Consultative Committee for International Telegraph and Telephone

CIE
Commission Internationale de l'Eclairage (International Commission on Illumination)

IEEE
Institute of Electrical and Electronics Engineers

ISO
International Organization for Standardization

ITU-T
International Telecommunication Union Telecommunication Standardization Sector

JPEG
Joint Photographic Experts Group

MPEG
Moving Picture Experts Group


Adobe Technical Guide
MSDN Library
VCD Helper
Web Artz
