Open Source H.265 Encoder

November 12, 2013, AMD Developer Summit, San Jose, CA—Steve Borho from MulticoreWare detailed the developments on an H.265/HEVC video encoder project. This open source project will produce an encoder set that will eventually replace the H.264 encoders currently in place.

The background for the encoders covers much of the video landscape. In video, quality matters, and viewers greatly dislike the buffer underflow needed to compensate for bandwidth issues. Within a fixed bandwidth, a better encoder reduces bandwidth requirements and generates a better video stream. The need is clear. Mobile video consumes two-thirds of all bandwidth and is growing at over 66 percent per year.

In January of this year, the HEVC specification became official after extensive testing and refinements. The specification provides greater compression ratios, lower losses, and fewer compression artifacts than the previous H.264 standard. The improved performance is due to the internal changes in the approach to compression.

The first change is to use 64x64 computational units and quad trees to recursively reduce the area to as little as 8x8 compared to the H.264 16x16 fixed matrices. The predictor can be inter or intra and the blocks can be asymmetrically split as 1/4, 3/4, halves or all four quadrants. The filter taps increased and AMPV allows motion prediction from a list versus the implicit requirement previously.

The resulting lower complexity enables the larger blocks, which increases compression efficiency, improves the angle predictors, and reduces the number of special cases. The sample adaptive offset loop filter reduces compression artifacts.

The changes also allow more parallelism and better use with GPUs. The algorithms support wavefront parallel processing for higher throughput. For example, files for a frame can be split into regular rectangles and compressed in parallel. Then deblocking is done on 8x8 boundaries.

Generally, a larger block size reduces frame parallelism, leading to a 3-row (192) line lag between refreshes. Changing to a wavefront or file system reduces computation and lag , as will using more B versus P frames. The increasing serial operations for the larger blocks and more reconstruction for the individual pixels for adjacency corrections makes the block encode times differ which causes blockiness in the reconstructed frame.

Due to these issues, HEVC is from 5-10 times more compute intensive than H.264. This increase goes up even more for 4k and again for 8 k. Nevertheless, the H.265 encoder achieves at least 50 percent reduction in bits for a given bit rate, or an equivalent reduction in bit rate for the same number of bits.

The X265 open source project achieved a full 1080p at 15 frames per second in June. This effort used a 16 core SandyBridge Xeon processor array. The initial implementation used a lot of code and algorithms from the H.254 CX264 implementation. Current efforts are to rewrite critical portions in assembly code.

The encoding efforts are not well suited for GPU processing, since much of the coding is serial. The intra prediction needs the blocks above and to the left be completed before commencing, and inter prediction needs to fully analyze the similar blocks in scan order for completion. The complex data dependencies force the CPU and GPU to share structures at full speed.

In contrast, the APU and HSA offer a balance with their shared memory architectures. The GPU portions can process the parallel operations while the CPU handles the serial functions by using the shared memory.

Latest Images

Trending Articles

Latest Images