Two international study groups, VCEG (Video Coding Experts Group) of ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) and MPEG (Moving Picture Experts Group) of ISO/IEC, have researched video coding techniques for various applications of moving pictures since the early 1990s. ITU-T developed H.261 as the first video coding standard, targeting videoconferencing. The MPEG-1 video coding standard was developed for storage on compact disc, and the MPEG-2 standard (adopted by ITU-T as H.262) extended MPEG-1 to digital TV and HDTV. To cover a very wide range of applications, including arbitrarily shaped video objects as well as rectangular pictures, the MPEG-4 part 2 standard was developed; it also covers combinations of natural and synthetic video and audio with built-in interactivity. Meanwhile, ITU-T developed H.263 to improve on the compression performance of H.261, and the base coding model of H.263 was adopted as the core of some parts of MPEG-4 part 2. MPEG-1, MPEG-2 and MPEG-4 also cover audio coding.
To provide better video compression than previous standards, the H.264 / MPEG-4 part 10 video coding standard was recently developed by the JVT (Joint Video Team), which consists of experts from VCEG and MPEG. H.264 offers significantly improved coding efficiency, a simple syntax specification, and seamless integration of video coding into all current protocols and multiplex architectures. H.264 can therefore support applications such as video broadcasting, video streaming, and video conferencing over fixed and wireless networks and over different transport protocols.
The H.264 video coding standard has the same basic functional elements as previous standards (MPEG-1, MPEG-2, MPEG-4 part 2, H.261, H.263), i.e., a transform to reduce spatial correlation, quantization for bitrate control, motion-compensated prediction to reduce temporal correlation, and entropy coding to reduce statistical redundancy. To achieve better coding performance, however, H.264 changes the details of each functional element: it adds intra-picture prediction, a new 4x4 integer transform, multiple reference pictures, variable block sizes and quarter-pel precision for motion compensation, a deblocking filter, and improved entropy coding.
Improved coding efficiency comes at the expense of added complexity in the coder/decoder, so H.264 uses several methods to reduce implementation complexity. A multiplier-free integer transform is introduced, and the multiplications needed for the exact transform are combined with those of quantization.
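To make the multiplier-free idea concrete, the following is a minimal sketch of the 4x4 forward core transform computed with additions, subtractions and one-bit shifts only. The function names are illustrative and not taken from any reference software, and the scaling that completes the exact transform is assumed to be folded into quantization, so it is not shown.

```python
# Forward 4x4 core transform matrix of H.264; its entries are only +-1 and +-2,
# which is why no multiplier is needed.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def transform_1d(x):
    """Apply CF to a 4-element vector using only add, subtract and shift."""
    s0, s1 = x[0] + x[3], x[1] + x[2]   # sums of the outer and inner pairs
    d0, d1 = x[0] - x[3], x[1] - x[2]   # differences of the same pairs
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_transform_4x4(block):
    """Compute CF * block * CF^T by transforming rows, then columns."""
    rows = [transform_1d(r) for r in block]
    cols = [transform_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order
```

A flat block (all samples equal) produces a single nonzero coefficient at position (0, 0), which is the behaviour expected of a DCT-like transform.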
Noisy channels, such as wireless networks, prevent perfect reception of the coded video bitstream at the decoder. Decoding errors caused by lost data degrade the subjective picture quality and propagate to subsequent blocks or pictures. H.264 therefore provides several methods for error resilience against network noise: parameter sets, flexible macroblock ordering, switched slices, and redundant slices are added to the data partitioning used in previous standards.
Like some previous video standards, H.264 defines Profiles and Levels that specify restrictions on bitstreams for particular applications. Seven Profiles are defined, covering applications from wireless networks to digital cinema.
Besides H.264, other video coding techniques that use the same functional block diagram with some modifications have been developed. These are Microsoft Windows Media Video 9 (WMV-9), standardized through the Society of Motion Picture and Television Engineers (SMPTE), and AVS (Audio Video Coding Standard), developed in China.
2. Profiles and Levels
Each Profile specifies a subset of the entire bitstream syntax and the limits that shall be supported by all decoders conforming to that Profile. The first version defines three Profiles: Baseline, Main, and Extended. The Baseline Profile is intended for real-time conversational services such as video conferencing and videophone. The Main Profile is designed for digital storage media and television broadcasting. The Extended Profile is aimed at multimedia services over the Internet. In addition, the fidelity range extensions define four High Profiles for applications such as content contribution, content distribution, and studio editing and post-processing: High, High 10, High 4:2:2, and High 4:4:4. The High Profile supports 8-bit video with 4:2:0 sampling for high-resolution applications. The High 10 Profile supports 4:2:0 sampling with up to 10 bits of representation accuracy per sample. The High 4:2:2 Profile supports up to 4:2:2 chroma sampling and up to 10 bits per sample. The High 4:4:4 Profile supports up to 4:4:4 chroma sampling, up to 12 bits per sample, and an integer residual color transform for coding RGB signals.
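The sample-format limits of the four High Profiles can be summarized as a small lookup, sketched below. The table values come directly from the paragraph above; the helper name `profiles_supporting` is hypothetical, and the sketch assumes that "up to" means every lower bit depth and coarser chroma format is also allowed.

```python
# (maximum bits per sample, richest chroma sampling) for each High Profile,
# as stated in the text above.
HIGH_PROFILE_LIMITS = {
    "High":       (8,  "4:2:0"),
    "High 10":    (10, "4:2:0"),
    "High 4:2:2": (10, "4:2:2"),
    "High 4:4:4": (12, "4:4:4"),
}

# Chroma formats ordered from least to most chroma resolution.
CHROMA_ORDER = ["4:2:0", "4:2:2", "4:4:4"]


def profiles_supporting(bit_depth, chroma):
    """Return the High Profiles whose limits cover the requested video format."""
    wanted = CHROMA_ORDER.index(chroma)
    return [
        name
        for name, (max_bits, max_chroma) in HIGH_PROFILE_LIMITS.items()
        if bit_depth <= max_bits and wanted <= CHROMA_ORDER.index(max_chroma)
    ]
```

For example, `profiles_supporting(10, "4:2:2")` leaves only the High 4:2:2 and High 4:4:4 Profiles as candidates.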
o. Common Parts of All Profiles
- I slice (Intra-coded slice) : a slice coded using prediction only from decoded samples within the same slice.
- P slice (Predictive-coded slice) : a slice coded using inter prediction from previously decoded reference pictures, using at most one motion vector and reference index to predict the sample values of each block.
- CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding
o. Baseline Profile
- Flexible macroblock order : macroblocks need not be in raster scan order. A map assigns each macroblock to a slice group.
- Arbitrary slice order : the macroblock address of the first macroblock of a slice may be smaller than that of the first macroblock of some preceding slice of the same coded picture.
- Redundant slice : a slice carrying redundant coded data, obtained at the same or a different coding rate, for data of the same slice that has already been coded.
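As an illustration of flexible macroblock order, the sketch below builds a map that assigns macroblocks to two slice groups in a checkerboard pattern. The pattern and the function names are assumptions chosen for clarity; the standard defines several map types, and this sketch does not reproduce its syntax.

```python
def checkerboard_slice_group_map(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern."""
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def macroblocks_in_group(mb_map, group):
    """List the raster-scan addresses of the macroblocks assigned to one group."""
    width = len(mb_map[0])
    return [
        y * width + x
        for y, row in enumerate(mb_map)
        for x, assigned in enumerate(row)
        if assigned == group
    ]
```

With such a map, the macroblocks of one slice group are not contiguous in raster order, and if one group is lost every missing macroblock still has horizontal and vertical neighbours in the other group that a decoder could use for concealment.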
o. Main Profile
- B slice (Bi-directionally predictive-coded slice) : a slice coded using inter prediction from previously decoded reference pictures, using at most two motion vectors and reference indices to predict the sample values of each block.
- Weighted prediction : a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in a P or B slice.
- CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding
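The weighted prediction entry above can be sketched for a single prediction sample as follows. This is a simplified illustration modelled on the single-reference case with an explicit weight and offset; the parameter names (`weight`, `offset`, `log_wd`) are hypothetical, and the bi-predictive and implicit cases are not covered.

```python
def clip_sample(value, bit_depth=8):
    """Clamp a value to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def weighted_sample(pred, weight, offset, log_wd, bit_depth=8):
    """Scale one motion-compensated sample by weight / 2**log_wd, then add offset."""
    if log_wd >= 1:
        # Add half of the divisor before shifting so the result is rounded.
        scaled = (pred * weight + (1 << (log_wd - 1))) >> log_wd
    else:
        scaled = pred * weight
    return clip_sample(scaled + offset, bit_depth)
```

With `log_wd = 5`, a weight of 32 leaves the sample unchanged and a weight of 16 halves it, which is the kind of scaling that helps when a scene fades between pictures.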
o. Extended Profile
- Includes all parts of Baseline Profile : flexible macroblock order, arbitrary slice order, redundant slice
- SP slice : the specially coded slice for efficient switching between video streams, similar to coding of a P slice.
- SI slice : the switched slice, similar to coding of an I slice.
- Data partition : the coded data is placed in separate data partitions; each partition can be placed in a different layer unit.
- B slice
- Weighted prediction
o. High Profiles
- Includes all parts of Main Profile : B slice, weighted prediction, CABAC
- Adaptive transform block size : 4 x 4 or 8 x 8 integer transform for luma samples
- Quantization scaling matrices : different scaling is applied in the quantization process according to the specific frequency associated with each transform coefficient, to optimize the subjective quality
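The profile lists above can be collected into one structure, which makes it easy to check which coding tools a given Profile includes. The sketch below only restates the lists in this section; the names `PROFILE_TOOLS` and `profiles_with` are hypothetical, and the single "High" entry stands for the High Profiles as a family.

```python
# Tool sets transcribed from the lists above.
COMMON = {"I slice", "P slice", "CAVLC"}
BASELINE_PARTS = {"flexible macroblock order", "arbitrary slice order", "redundant slice"}
MAIN_PARTS = {"B slice", "weighted prediction", "CABAC"}

PROFILE_TOOLS = {
    "Baseline": COMMON | BASELINE_PARTS,
    "Main": COMMON | MAIN_PARTS,
    # Extended adds B slices and weighted prediction, but not CABAC.
    "Extended": COMMON | BASELINE_PARTS
    | {"SP slice", "SI slice", "data partition", "B slice", "weighted prediction"},
    "High": COMMON | MAIN_PARTS
    | {"adaptive transform block size", "quantization scaling matrices"},
}


def profiles_with(tool):
    """Return, in alphabetical order, the Profiles that include a coding tool."""
    return sorted(name for name, tools in PROFILE_TOOLS.items() if tool in tools)
```

For instance, `profiles_with("CABAC")` shows that arithmetic coding is listed only for the Main and High Profiles, while CAVLC is available in all of them.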