Πως περιγράφεται το bitstream Η.264, Κεφάλαιο 7.1, σελ. 37 p20
Η γλώσσα Flavor Formal Language for Audio-Visual Object Representation What does it do? Automatic mapping of media information represented in arbitrary ways to program data structures p21
Technical Approach Principle of separation Same core decoding algorithm can be used with different syntax Declarative, not procedural Power of abstraction Efficiency (opportunities for optimization) Extension of typing system of C++ and Java Seamless integration with users programs Rich typing system Easy to translate to regular C++ and Java Note: Where C++ and Java differ, we select the common denominator p22
Hello Bits! Hello World! in Flavor C++/Java class HelloBits { char data; void get(void){ data=getint(8); } }; Flavor class HelloBits { char(8) data; } p23
Using Flavor in Programs media.fl Translator flavorc Run-Time Library libflavor class foo { int(15) a; } media.h media.cpp user.h user.cpp class foo { public: int a; int get(...); int put(...); }; CC Program Compiler p24
Status Used for syntax description in MPEG-4 (Systems, Structured Audio) About 1,000 users in diverse fields (even VLSI design) Open-sourced since 10/2001 XFlavor XML support (with D. Hong) Winner 1 st ACM Multimedia Open Source Software Competition (ACM MM-2004) Flavor Web Site: http://flavor.sourceforge.net Download Version 5 software (source/binaries for Win32 and all UNIX) Extensive Documentation and Papers p25
H.264 SVC (Scalable Video Coding) Σηµ.: Μερικές διαφάνειες και διαγράµµατα είναι από τον Thomas Wiegand, HHI p26
Scalable Video Coding (SVC) Principle SVC encoder QCIF@ 7,5 Hz 128 kbit/s 256 kbit/s H.264/AVC decoder SVC decoder CIF@ 15 Hz 512 kbit/s SVC decoder CIF@ 30 Hz scene 1024 kbit/s SVC decoder TV@ 60 Hz p27
Scalability of Video - Modalities Temporal: change of frame rate 30 Hz 15 Hz 7.5 Hz Spatial: change of frame size QCIF CIF TV Fidelity: change of quality (e.g. SNR) p28
SVC Dependency Structure Spatial Resolution global bit-stream 4CIF CIF Bit Rate (Quality, SNR) QCIF high low Temporal Resolution 60 30 15 7.5 - In scalable video coding there are multiple scalability and dependency dimensions p29
Prior scalable video coding standards: MPEG-2, H.263, MPEG-4 failed because of technical shortcomings and Graceful degradation not needed: transmission channel works or not Bit rate adaptation not needed: transmission channel has fixed throughput Power adaptation not needed: no handheld devices Format adaptation not needed: just one single format standard definition Today, the situation has changed Internet and mobile transmission are becoming primary distribution mechanisms Shared resource systems with varying throughput and errors Graceful degradation, bit rate adaptation, power adaptation A variety of terminals and displays from QCIF, QVGA, VGA, SD, to HD Backwards compatible extension Format adaptation: QVGA VGA, 720p/1080i 1080p p30
N=1 Video Coding Experiment with H.264/AVC: Foreman, CIF 30Hz @ 132 kbit/s Temporal Scalability I P P P P P P P P N=2 Cascaded QP assignment I/P: best quality, B 0 : much worse quality than I/P, B 1 : slightly worse quality than B 0, I B 0 P B 0 P B 0 P B 0 P 33,2 33 32,8 Y-PSNR [db] I B 1 B 0 B 1 P B 1 B 0 B 1 P N=4 32,6 32,4 32,2 Same QP for all pictures 32 31,8 I B 2 B 1 B 2 B 0 B 2 B 1 B 2 P N=8 31,6 31,4 1 2 4 8 16 32 64 Distance between two P pictures N p31
SNR Scalability: Typical Encoding Hierarchical MCP & Intra prediction texture motion Base layer coding Inter-layer prediction: Intra Motion Residual Hierarchical MCP & Intra prediction texture motion Base layer coding Multiplex Scalable bit-stream Inter-layer prediction: Intra Motion Residual H.264/AVC MCP & Intra prediction texture motion Base layer coding H.264/AVC compatible encoder H.264/AVC-compatible base layer bit-stream p32
Layered Coding for SNR Scalability Scal. / Quant. Transform Coding Entropy 2nd SNR Enhancement Layer (EL) + - Inv. Transform Inv. Scaling Scal. / Quant. Transform Coding Entropy 1st SNR Enhancement Layer (EL) + + - Prediction MC // Intra + - Inv. Transform Inv. Scaling Scal. / Quant. Transform Coding Entropy SNR Base Layer (BL) Same resolution of pictures over all layers Residual coding of the quantization error between original pictures and their lower layer reconstruction Can be converted to H.264/AVC without transcoding at any bit rate p33
Drift in Past SNR Scalable Coding Source of drift: Non-synchronized motion compensation loops in encoder and decoder due to loss of enhancement layer information EL only control - MPEG-2 SNR - Drift in both BL and EL - Low complexity - Inefficient BL and efficient EL coding (when latter complete) BL only control - MPEG-4 FGS - No drift - Low complexity - Efficient BL and inefficient EL coding 2-loop control - No drift in BL - Drift in EL - High complexity - Efficient BL and medium efficient EL coding p34
PSNR [db] SNR Scalability Results: Past Codecs 37 36 35 H.264: Hier. B, Single Layer 2-loop control MPEG-2: EL-only Control MPEG-4 FGS: BL-only Control 34 33 32 31 30 0 50 100 150 200 City, CIF, 30Hz Bit-rate [kbit/s] p35
Drift Control in SVC Adaptive BL/EL only encoder control for hierarchical prediction structures Single motion compensation loop decoding Enhancement layer Base layer time p36
PSNR [db] SNR Scalability Results: New SVC 37 36 35 34 H.264: Hier. B, Single Layer 2-loop control MPEG-2: EL-only Control MPEG-4 FGS: BL-only Control SVC: Random Losses in EL SVC: "Optimum" Losses in EL How much is the bit rate overhead? 33 32 31 30 0 50 100 150 200 City, CIF, 30Hz Bit-rate [kbit/s] p37
Coder Control Constrained problem: min p D( p) s.t. R( p) R T Unconstrained Lagrangian formulation: p = argmin{ D( p) + λ R(p) } p with l controlling the rate-distortion trade-off D Distortion R Rate R T Target rate p Parameter Vector Rate-Constrained Mode Decision D 2 (M QP)+λ M R(M QP) Rate-Constrained Motion Estimation D 1 (m,d)+λ D R(m,D) M Evaluated macroblock mode out of a set of possible modes QP Value of quantizer for transform coefficients l M Lagrange parameter for mode decision D 2 Sum of squared differences (luminance & chrominance) R Number of bits associated with header, motion, transform coefficients m Motion vector containing spatial displacement D Picture reference parameter l D Lagrange parameter for motion estimation D 1 Sum of absolute differences (luminance) R Number of bits associated with motion information p38
Encoder Optimization: JSVM encoder control Lagrangian bit-allocation techniques: bottom-up encoding process Encode base layer: Coding parameters are optimized for the base layer p 0 = argmin[ D 0 (p 0 ) + λ 0 R 0 ( p 0 )] { } p 0 Encode enhancement layer Coding parameters are optimized for the enhancement layer p 1 = argmin[ D 1 (p 1 p 0 ) + λ 1 R 1 ( p 1 p 0 )] { } p 1 p 0 All enhancement layer decisions are conditioned on already determined base layer coding parameters p39
Encoder Optimization: Joint Control Jointly optimize base and enhancement layer coding parameters Consider enhancement layer during base layer encoding Modified Lagrangian approach for base layer encoding p 0 = arg min p 0, p 1 p 0 { } Weighting factor w [( 1 w) ( D 0 (p 0 ) + λ 0 R 0 ( p 0 )) + w ( D 1 (p 1 p 0 ) + λ 1 R 1 (p 1 p 0 ))] w = 0: Current JSVM encoder control w = 1: Single-loop encoder control (base layer is not controlled) 0 < w < 1: Partly consider enhancement layer during base layer encoding Enhancement layer encoding is not modified p40
Results for SNR Scalability Average bit rate overhead relative to single layer coding 40% 35% 30% 25% 20% 15% 10% 5% 0% -5% Average bit rate overhead (JSVM): 21 % Average bit rate overhead (optimized): 11 % Bus (JSVM) Bus (optimized) Foreman (JSVM) Foreman (optimized) Football (JSVM) Football (optimized) Mobile (JSVM) Mobile (optimized) City (JSVM) City (optimized) Crew (JSVM) Crew (optimized) Harbour (JSVM) Harbour (optimized) Soccer (JSVM) Soccer (optimized) Average (JSVM) Average (optimized) 0 0,5 1 1,5 2 2,5 3 PSNR difference [db] relative to JSVM base layer p41
Spatial Scalability: Typical Encoding Hierarchical MCP & Intra prediction texture motion Base layer coding Spatial decimation Inter-layer prediction: Intra Motion Residual Hierarchical MCP & Intra prediction texture motion Base layer coding Multiplex Scalable bit-stream Spatial decimation Inter-layer prediction: Intra Motion Residual H.264/AVC MCP & Intra prediction texture motion Base layer coding H.264/AVC compatible encoder H.264/AVC-compatible base layer bit-stream p42
Layered Coding for Spatial Scalability Layer n+1 Layer n Layered coding Oversampled pyramid for each resolution: QCIF, CIF, 4CIF, 16CIF Independent MC prediction structure in each layer Inter-layer prediction: Switchable prediction (with upsampling) Prediction of intra macroblocks Prediction of partitioning and motion information - NEW Prediction of residual data (prediction errors) - NEW Single motion-compensation loop for decoding - NEW Reconstruction signal for motion-compensated macroblock cannot be used for inter-layer prediction Loss between 0 0.5 db p43
Similar to MPEG-2 / H.262, H. 263, and MPEG-4 Additional intra macroblock type Switch between existing Intra_4x4, Intra_8x8, or Intra_16x16 and new Intra_BL mode Up-sampling Intra_BL: prediction signal is generated by up-sampling the colocated signal in the base layer Luma: 4-tap filter Chroma: bi-linear filter p44
Additional inter macroblock type: data are derived from base-layer base_mode_flag Macroblock partitioning Reference indices Motion vectors 8x8 8x4 4x8 4x4 Additional motion vector prediction mode from base layer for conventional macroblock types motion_pred_flag inter-layer motion predictor existing spatial predictor 16x16 8x16 16x8 8x8 p45
Inter-Layer Prediction: Residual (New) Flag for inter macroblock types Signalling that residual is predicted by up-sampled base layer residual signal Block-wise, bi-linear up-sampling Not blockwise upsampling Blockwise upsampling p46
Single-loop Decoding (New) Past spatial scalable video: Inter-layer intra prediction requires that base layer is completely reconstructed Decode multiple motion compensation loops (and deblocking filter) Complexity larger than simulcast: full decoding plus inter-layer prediction In SVC: Inter-layer intra prediction is restricted to macroblocks for which the co-located base layer signal is intra-coded No need to decode multiple motion compensation loops Complexity overhead comparable to single-layer coding p47
Impact on Coding Efficiency PSNR [db] 38 Crew, GOP16 (CIF 30Hz - 4CIF 30Hz) 37 36 35 34 33 32 31 30 base layer Single layer Simulcast SVC (intra only) SVC (intra & motion only) 0 500 1000 1500 2000 2500 bit-rate [kbit/s] SVC Multiple-loop intra only (as in old standards) Multiple-loop (not supported in SVC) p48
Enhancement Layer Bit rate Overhead for CIF/30Hz 4CIF/30Hz Spatial Scalability 60% Average bit rate overhead relative to single layer coding 50% 40% 30% 20% 10% 0% 331 566 962 1632 kbps 81 kbps 163 312 602 138 308 52 kbps 15 kbps City simulcast City JSVM City optimized Crew simulcast Crew JSVM Crew optimized Harbour simulcast Harbour JSVM Harbour optimized Soccer simulcast Soccer JSVM Soccer optimized Average simulcast Average JSVM Average optimized -10% BL: 331 SL: 594 BL: 566 SL: 1053 BL: 962 SL: 1881 BL: 1632 SL: 3518 Average bit rate [kbps] BL: base layer SL: single layer p49
Base Layer Bit rate Overhead in Joint Optimization for CIF/30Hz 4CIF/30Hz Spatial Scalability 30% Average bit rate overhead relative to single layer coding 20% 10% 0% BL overhead: 13 kbps EL saving: 66 kpbs BL: 29 EL: 111 BL: 57 EL: 174 BL: 106 EL: 294 City optimized Crew optimized Harbour optimized Soccer optimized Average optimized -10% BL: 331 SL: 594 BL: 566 SL: 1053 BL: 962 SL: 1881 BL: 1632 SL: 3518 Average bit rate [kbps] BL: base layer EL: enhancement layer p50
Summary of SVC H.264/AVC hierarchical B pictures allow better coding efficiency and temporal scalability Scalable Video Coding is the next step video applications SVC realized through layered extension of H.264/AVC Power adaptation: decode appropriate part of bit-stream Graceful degradation when the right parts of the bitstream get lost Format adaptation: backwards compatible extension in mobile TV: QVGA VGA Interlaced and other ratios than 2:1 are also specified, e.g.. 1.5:1 for 720p to 1080p What s next in SVC standardization? Bit-depth scalability (8bit 4:2:0 -> 10bit 4:2:0) Color format scalability (4:2:0 -> 4:4:4) p51