NTUA Multimedia Technology http://hscnl.ece.ntua.gr/index.php/teaching/undergraduate/multimedia-technology

Rate Distortion Theory: the distortion D may be the mean squared error (MSE) or some human-perceived measure of distortion.
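As a minimal sketch of the first option, the MSE between an original signal and its reconstruction can be computed as below. The function name `mse` and the sample values are illustrative assumptions, not taken from the slides.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of samples."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


# Illustrative values only: a short signal and a slightly distorted copy.
x = [52, 55, 61, 66]
y = [50, 56, 61, 69]
print(mse(x, y))  # (4 + 1 + 0 + 9) / 4 = 3.5
```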

Types of Lossy Compression: VBR (Variable Bit Rate) and CBR (Constant Bit Rate); discussed later.

Fig. 16-7, p. 356

Color components

Color image: three basic color components (primaries), e.g., RGB, YCbCr, ...
Video: a series of frames per second. Europe: PAL system, 25 fps. USA: NTSC system, about 30 fps. Adding more frames per second beyond these rates gives little further improvement in perceived video quality.
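The slide names RGB and YCbCr without giving the conversion between them. A minimal sketch, assuming the full-range conversion used by JFIF (JPEG files) with BT.601 luma weights; the function name `rgb_to_ycbcr` is hypothetical:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumes the full-range JFIF convention: Y in [0, 255],
    Cb and Cr centered on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Any gray pixel carries no color information: Cb and Cr stay at 128.
print(rgb_to_ycbcr(128, 128, 128))
```

Separating luminance (Y) from chrominance (Cb, Cr) lets a codec treat the components differently, for example with different quantization tables.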

Contents 2. Lesson 3: Transform Coding

Transform coding
In transform coding, the signal undergoes a mathematical transformation from the original time or space domain to an abstract domain that is better suited to compression: x(t) --> g(λ). This process is invertible, i.e., there is an inverse transform that restores the signal to its original form.

E.g., the Fourier transform
Let x(t) be a signal; its Fourier transform X(f) is defined as:
$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$
The inverse Fourier transform is defined as:
$$x(t) = \mathcal{F}^{-1}\{X(f)\} = \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df$$

Aside (not relevant here): systems, convolution, etc.

Fourier transform (2)
- Here the transform interests us for a different reason: we will transmit the Fourier coefficients instead of the signal (ignoring the small ones).
- In practice we will use a different transform, the discrete cosine transform (DCT), rather than the Fourier transform, because the DCT works better (on images that contain straight lines, etc.).
[Figure: a transform T maps the time or space domain (axis t) to the frequency domain (axis f). For many types of information the most significant coefficients are concentrated at the low frequencies; the remaining coefficients are ignored.]
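The idea of transmitting only the significant coefficients can be sketched with a discrete Fourier transform (DFT), used here as a discrete stand-in for the continuous transform on the slides. The test signal, the helper names and the choice of keeping two coefficients are all illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; undoes dft() up to floating-point error."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def keep_largest(X, count):
    """Zero all but the `count` largest-magnitude coefficients."""
    order = sorted(range(len(X)), key=lambda k: abs(X[k]), reverse=True)
    kept = set(order[:count])
    return [X[k] if k in kept else 0j for k in range(len(X))]


# Illustrative signal: a strong slow cosine plus a weak fast one.
n = 32
signal = [math.cos(2 * math.pi * t / n) + 0.05 * math.cos(2 * math.pi * 9 * t / n)
          for t in range(n)]

coeffs = dft(signal)
approx = [v.real for v in idft(keep_largest(coeffs, 2))]
max_error = max(abs(a - b) for a, b in zip(signal, approx))
print(max_error)  # about 0.05, the amplitude of the discarded component
```

Only 2 of the 32 coefficients are kept, yet the reconstruction differs from the original by no more than the amplitude of the weak component that was dropped.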

EXAMPLES (1)
Consider the fundamental frequency and its odd harmonics, with coefficients a_1 = 1, a_3 = 1/3, a_5 = 1/5 and a_7 = 1/7. Their sum approximates the square pulse.

[Figure: (fundamental + third + fifth) + seventh harmonic]

[Figure: (fundamental + third + fifth + seventh) + ninth harmonic]

[Figure: (fundamental + third + fifth + seventh + ninth) + eleventh harmonic]

[Figure: (fundamental + third + fifth + seventh + ninth + eleventh) + thirteenth harmonic]
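The harmonic sums in these figures can be reproduced numerically. The sketch below assumes the harmonics are sine terms, sin(k t)/k for odd k, which matches the coefficients 1, 1/3, 1/5, 1/7 given earlier; the slides do not state the phase convention, and the function name is hypothetical.

```python
import math


def partial_sum(t, highest_harmonic):
    """Sum of odd harmonics sin(k*t)/k for k = 1, 3, ..., highest_harmonic."""
    return sum(math.sin(k * t) / k for k in range(1, highest_harmonic + 1, 2))


# On the flat top of the pulse (0 < t < pi) the infinite series equals pi/4.
target = math.pi / 4
t = math.pi / 2
for highest in (1, 3, 5, 7, 9, 11, 13):
    value = partial_sum(t, highest)
    # Columns: highest harmonic used, partial sum, distance from the pulse level.
    print(highest, round(value, 4), round(abs(value - target), 4))
```

Each added harmonic moves the sum at the middle of the pulse closer to the flat level pi/4.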

Joint Photographic Experts Group (JPEG) compression algorithm
The JPEG compression algorithm is at its best on photographs and paintings of realistic scenes with smooth variations of tone and color. JPEG may not be as well suited for line drawings and other textual or iconic graphics, where there are sharp contrasts between adjacent pixels; such images may be better saved in a lossless graphics format such as TIFF, GIF, PNG, or a raw image format.

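The Discrete Cosine Transform (DCT)
The discrete cosine transform (DCT) helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal or image from the spatial domain to the frequency domain.
The general equation for a 1D (N data items) DCT, in its usual orthonormal (DCT-II) form, is:
$$F(u) = \sqrt{\frac{2}{N}}\, C(u) \sum_{i=0}^{N-1} f(i) \cos\left[\frac{(2i+1)\,u\,\pi}{2N}\right]$$
where
$$C(u) = \begin{cases} 1/\sqrt{2} & u = 0 \\ 1 & \text{otherwise} \end{cases}$$
The general equation for a 2D (N by M image) DCT is:
$$F(u,v) = \sqrt{\frac{2}{N}} \sqrt{\frac{2}{M}}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} f(i,j) \cos\left[\frac{(2i+1)\,u\,\pi}{2N}\right] \cos\left[\frac{(2j+1)\,v\,\pi}{2M}\right]$$
Of course, there is also the corresponding inverse transform, which recovers the signal.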
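A minimal sketch of the 1D DCT and its inverse, assuming the orthonormal DCT-II normalization (factor sqrt(2/N), with the u = 0 term additionally scaled by 1/sqrt(2)). The function names and the sample values are illustrative.

```python
import math


def dct_1d(f):
    """Forward 1D DCT (orthonormal DCT-II) of a sequence of N samples."""
    n = len(f)
    out = []
    for u in range(n):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n))
        out.append(math.sqrt(2 / n) * c * s)
    return out


def idct_1d(F):
    """Inverse 1D DCT; recovers the samples from the coefficients."""
    n = len(F)
    out = []
    for i in range(n):
        s = 0.0
        for u in range(n):
            c = 1 / math.sqrt(2) if u == 0 else 1.0
            s += c * F[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
        out.append(math.sqrt(2 / n) * s)
    return out


samples = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative 8-sample row
coeffs = dct_1d(samples)
print([round(c, 2) for c in coeffs])
print([round(v, 6) for v in idct_1d(coeffs)])  # matches `samples` up to rounding
```

With this normalization the transform is orthonormal, so the inverse uses the same cosine terms and scale factors.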

DCT on real image block

Properties of Transform Coding
The basic operation of the DCT is as follows:
- The input image is N by M; f(i,j) is the intensity of the pixel in row i and column j.
- F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
- For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT.
- Compression is achieved since the lower right values represent higher frequencies, and are often small enough to be neglected with little visible distortion.
- The DCT input is an 8 by 8 array of integers. This array contains each pixel's gray scale level; 8 bit pixels have levels from 0 to 255.
- The output array of DCT coefficients needs a wider range than the input: for 8-bit pixels the coefficients fit in a signed 12-bit range (within about -1024 to 1023 once JPEG's level shift by 128 is applied).
- It is computationally easier to regard the DCT as a set of basis functions which, given a known input array size (8 x 8), can be precomputed and stored. This involves simply computing the values of an 8 x 8 mask for each coefficient, which is multiplied element-wise with the block and summed. The values are calculated directly from the DCT formula.
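The precomputation described above can be sketched as follows, assuming the orthonormal DCT-II basis. The names `basis_matrix`, `C` and `dct_2d` and the ramp-shaped test block are illustrative, not from the slides.

```python
import math

N = 8


def basis_matrix(n=N):
    """Row u holds the u-th 1D DCT basis vector (orthonormal DCT-II)."""
    rows = []
    for u in range(n):
        c = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        rows.append([c * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n)])
    return rows


C = basis_matrix()  # computed once, reused for every 8 x 8 block


def dct_2d(block):
    """F(u, v) = sum over i, j of C[u][i] * block[i][j] * C[v][j]."""
    n = len(block)
    return [[sum(C[u][i] * block[i][j] * C[v][j] for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


# Illustrative smooth block: a gentle horizontal ramp, identical in every row.
block = [[100 + 2 * j for j in range(N)] for _ in range(N)]
F = dct_2d(block)
for row in F[:2]:
    print([round(c, 1) for c in row])
```

For this smooth block the energy sits in the first row of F, mostly in its leftmost entries, which illustrates the low-frequency concentration noted above.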

Discrete Cosine Transform (DCT): the 64 (8 x 8) DCT basis functions

Transform coding

Computing the 2D DCT
The problem can be reduced to a series of 1D DCTs:
- apply the 1D DCT vertically, to the columns;
- apply the 1D DCT horizontally, to the rows of the resulting vertical DCT.
Alternatively, apply the horizontal pass first and then the vertical one.
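The two passes can be sketched and checked against the direct double sum. The orthonormal DCT-II normalization is assumed, and the function names and test block are illustrative.

```python
import math


def dct_1d(f):
    """Orthonormal 1D DCT-II of a sequence."""
    n = len(f)
    return [math.sqrt((1 if u == 0 else 2) / n)
            * sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n))
            for u in range(n)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d_separable(block):
    # Pass 1: 1D DCT down each column (vertical pass).
    columns_done = transpose([dct_1d(col) for col in transpose(block)])
    # Pass 2: 1D DCT along each row of the intermediate result (horizontal pass).
    return [dct_1d(row) for row in columns_done]


def dct_2d_direct(block):
    """Direct evaluation of the 2D DCT double sum, for comparison."""
    n, m = len(block), len(block[0])

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    return [[math.sqrt(2 / n) * math.sqrt(2 / m) * c(u) * c(v)
             * sum(block[i][j]
                   * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                   * math.cos((2 * j + 1) * v * math.pi / (2 * m))
                   for i in range(n) for j in range(m))
             for v in range(m)] for u in range(n)]


block = [[52, 55, 61, 66],
         [63, 59, 66, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]  # illustrative 4 x 4 block
a = dct_2d_separable(block)
b = dct_2d_direct(block)
print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

For an N x N block the separable form needs on the order of N^3 multiplications instead of N^4 for the direct double sum.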

Fundamentals of JPEG
Encoder: DCT -> Quantizer -> Entropy coder -> Compressed image data
Decoder: Compressed image data -> Entropy decoder -> Dequantizer -> IDCT

Compression in JPEG ("Joint Photographic Experts Group")
JPEG works on 8 x 8 blocks:
- Extract an 8 x 8 block of pixels
- Convert to the DCT domain
- Quantize each coefficient, with a different step size for each coefficient, based on the sensitivity of the human visual system
- Order the coefficients in zig-zag order
- Entropy code the quantized values

Zig-Zag Scanning
Purpose of the zig-zag scan: to group the low-frequency coefficients at the start of the vector. It maps the 8 x 8 block to a 1 x 64 vector.

Zig-Zag Scanning and DCT coefficient variation

Approximation by DCT basis

Transform Coding

Quantization of DCT coefficients

Default quantization table in JPEG
A common quantization table (for the luminance component) is:
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

Example: quantized indices

Example: quantized coefficients

Example: reconstructed image

Zig-zag ordering (Fundamentals of JPEG)
Each entry gives the position of that coefficient in the scanned 1 x 64 vector:
  0   1   5   6  14  15  27  28
  2   4   7  13  16  26  29  42
  3   8  12  17  25  30  41  43
  9  11  18  24  31  40  44  53
 10  19  23  32  39  45  52  54
 20  22  33  38  46  51  55  60
 21  34  37  47  50  56  59  61
 35  36  48  49  57  58  62  63
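The ordering in this table can be generated rather than stored. A minimal sketch that walks the anti-diagonals of the block, alternating direction; the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col picks one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run from bottom-left up
        order.extend((r, s - r) for r in rows)
    return order


def index_table(n=8):
    """Table whose entry [r][c] is the scan position of coefficient (r, c)."""
    table = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        table[r][c] = k
    return table


def zigzag_scan(block):
    """Flatten a square block into a list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


for row in index_table():
    print(" ".join(f"{k:2d}" for k in row))
```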

Entropy coding (Fundamentals of JPEG)
- Run-length encoding followed by Huffman or arithmetic coding
- DC term treated separately: Differential Pulse Code Modulation (DPCM)
- 2-step process:
  1. Convert the zig-zag sequence to a symbol sequence
  2. Convert the symbols to a data stream
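Step 1 can be sketched in simplified form: the DC term becomes a difference from the previous block's DC value, and the remaining coefficients become (run of zeros, value) pairs ending in an end-of-block marker. This is an illustration only. Real JPEG symbols pair the run with a size category, add amplitude bits, and use a special symbol for runs of 16 zeros, all of which are omitted here. The function name `to_symbols` is hypothetical, and the input sequence is the zig-zag scan of the quantized block from the encoder example in these slides.

```python
EOB = "EOB"


def to_symbols(zigzag, previous_dc=0):
    """Turn one block's zig-zag sequence into a DC difference plus (run, value) pairs."""
    dc_diff = zigzag[0] - previous_dc          # DPCM: code the DC term as a difference
    symbols = []
    run = 0
    for value in zigzag[1:]:
        if value == 0:
            run += 1                           # count zeros preceding the next nonzero
        else:
            symbols.append((run, value))
            run = 0
    if run > 0:
        symbols.append(EOB)                    # trailing zeros collapse into one marker
    return dc_diff, symbols


sequence = [-26, -3, 1, -3, -2, -6, 2, -4, 1, -4, 1, 1, 5, 0, 2,
            0, 0, -1, 2, 0, 0, 0, 0, 0, -1, -1] + [0] * 38
dc_diff, symbols = to_symbols(sequence)
print(dc_diff)
print(symbols)
```

Step 2 would then map each symbol to a variable-length code, for example with a Huffman table.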

Modes (Fundamentals of JPEG)
- Sequential
- Progressive
  - Spectral selection: send lower-frequency coefficients first
  - Successive approximation: send lower precision first, and subsequently refine
- Lossless
- Hierarchical: send low-resolution image first

Transform coding
Encoder: Image block -> T (transform) -> Transform coefficients -> Q (quantization) -> Zigzag scan (2D->1D) -> Entropy coding -> Bitstream
Decoder: Bitstream -> Entropy decoding -> Inverse zigzag scan (1D->2D) -> Q^-1 (dequantization) -> Reconstructed transform coefficients -> T^-1 (inverse transform) -> Reconstructed image block

Why divide into blocks? Image -> blocks: block-based coding

Example of JPEG Coding (Encoder)
Transform coding (DCT) -> Quantization (e.g., -415/16 = -26) -> Zigzag scan (2D->1D) -> Entropy coding (number->binary bit stream)
[Figure: overlaid animation of the 8 x 8 pixel block, its DCT coefficients, the quantization table, the quantized indices and the output bit stream.]
Zig-zag scan output: -26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB

Example of JPEG Coding (Decoder)
Inverse entropy coding (bit stream, binary->number) -> Inverse zigzag scan (1D->2D) -> Inverse quantization (e.g., -26 x 16 = -416) -> Inverse transform coding (IDCT)
[Figure: overlaid animation of the input bit stream, the decoded indices, the dequantized DCT coefficients and the reconstructed 8 x 8 pixel block.]
Decoded zig-zag sequence: -26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB

Example of JPEG Coding (Encoder)
8 x 8 pixel block:
 52  55  61  66  70  61  64  73
 63  59  66  90 109  85  69  72
 62  59  68 113 144 104  66  73
 63  58  71 122 154 106  70  69
 67  61  68 104 126  88  68  70
 79  65  60  70  77  68  58  75
 85  71  64  59  55  61  65  83
 87  79  69  68  65  76  78  94
DCT coefficients (computed after subtracting 128 from each pixel value, which is why the DC term is negative):
-415  -29  -62   25   55  -20   -1    3
   7  -21  -62    9   11   -7   -6    6
 -46    8   77  -25  -30   10    7   -5
 -50   13   35  -15   -9    6    0    3
  11   -8  -13   -2   -1    1   -4    1
 -10    1    3   -3   -1    0    2   -1
  -4   -1    2   -1    2   -3    1   -2
  -1   -1   -1   -2   -1   -1    0   -1

Example of JPEG Coding (Encoder)
Each DCT coefficient is divided by the corresponding entry of the quantization table and rounded, e.g., -415/16 = -26.
DCT coefficients:
-415  -29  -62   25   55  -20   -1    3
   7  -21  -62    9   11   -7   -6    6
 -46    8   77  -25  -30   10    7   -5
 -50   13   35  -15   -9    6    0    3
  11   -8  -13   -2   -1    1   -4    1
 -10    1    3   -3   -1    0    2   -1
  -4   -1    2   -1    2   -3    1   -2
  -1   -1   -1   -2   -1   -1    0   -1
Quantization table:
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

Example of JPEG Coding (Encoder)
DCT coefficients:
-415  -29  -62   25   55  -20   -1    3
   7  -21  -62    9   11   -7   -6    6
 -46    8   77  -25  -30   10    7   -5
 -50   13   35  -15   -9    6    0    3
  11   -8  -13   -2   -1    1   -4    1
 -10    1    3   -3   -1    0    2   -1
  -4   -1    2   -1    2   -3    1   -2
  -1   -1   -1   -2   -1   -1    0   -1
Quantized indices:
-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
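The quantization step of this example can be reproduced with a short sketch. The coefficient values and the luminance table are copied from the slides; the function names `quantize` and `dequantize` are hypothetical. Rounding to the nearest integer is assumed, and Python's `round()` sends exact ties to the nearest even integer, so -20/40 = -0.5 becomes 0 as on the slide; other implementations may round ties differently.

```python
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

DCT_COEFFS = [  # DCT coefficients of the example block, as given on the slide
    [-415, -29, -62, 25, 55, -20, -1, 3],
    [7, -21, -62, 9, 11, -7, -6, 6],
    [-46, 8, 77, -25, -30, 10, 7, -5],
    [-50, 13, 35, -15, -9, 6, 0, 3],
    [11, -8, -13, -2, -1, 1, -4, 1],
    [-10, 1, 3, -3, -1, 0, 2, -1],
    [-4, -1, 2, -1, 2, -3, 1, -2],
    [-1, -1, -1, -2, -1, -1, 0, -1],
]


def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to an integer index."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(indices, table):
    """Decoder side: multiply each index by its step size."""
    return [[k * q for k, q in zip(krow, qrow)]
            for krow, qrow in zip(indices, table)]


indices = quantize(DCT_COEFFS, Q_LUMA)
for row in indices:
    print(row)
print(dequantize(indices, Q_LUMA)[0][0])  # -26 * 16 = -416, not the original -415
```

The difference between -415 and -416 is the quantization error, which is the lossy part of the scheme.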

General Transform coding
Encoder: Image block -> T (transform) -> Transform coefficients -> Q (quantization) -> Zigzag scan (2D->1D) -> Entropy coding -> Bitstream
Decoder: Bitstream -> Entropy decoding -> Inverse zigzag scan (1D->2D) -> Q^-1 (dequantization) -> Reconstructed transform coefficients -> T^-1 (inverse transform) -> Reconstructed image block
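A minimal sketch of the lossy core of this pipeline (T, Q, Q^-1, T^-1) applied to the example pixel block from these slides. The zig-zag scan and entropy coding stages are lossless and are left out. The level shift by 128 follows baseline JPEG practice and is an assumption here, as are all function names.

```python
import math

N = 8
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
BLOCK = [  # example pixel block as given on the slide
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 66, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
]

# Precomputed orthonormal DCT-II basis: C[u][i].
C = [[math.sqrt((1 if u == 0 else 2) / N) * math.cos((2 * i + 1) * u * math.pi / (2 * N))
      for i in range(N)] for u in range(N)]


def transform(block):
    """T: forward 2D DCT, F = C * block * C^T."""
    return [[sum(C[u][i] * block[i][j] * C[v][j] for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]


def inverse_transform(coeffs):
    """T^-1: inverse 2D DCT, block = C^T * F * C."""
    return [[sum(C[u][i] * coeffs[u][v] * C[v][j] for u in range(N) for v in range(N))
             for j in range(N)] for i in range(N)]


def encode(block):
    shifted = [[p - 128 for p in row] for row in block]   # level shift to [-128, 127]
    coeffs = transform(shifted)
    return [[round(c / q) for c, q in zip(cr, qr)] for cr, qr in zip(coeffs, Q_LUMA)]


def decode(indices):
    coeffs = [[k * q for k, q in zip(kr, qr)] for kr, qr in zip(indices, Q_LUMA)]
    pixels = inverse_transform(coeffs)
    # Undo the level shift, round, and clamp to the valid 8-bit range.
    return [[min(255, max(0, round(p + 128))) for p in row] for row in pixels]


indices = encode(BLOCK)
restored = decode(indices)
mse = sum((a - b) ** 2 for ra, rb in zip(BLOCK, restored) for a, b in zip(ra, rb)) / (N * N)
print(indices[0][0], mse)
```

The reconstruction is not exact: the error comes entirely from the quantizer, since the transform itself is invertible.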

Example of Zig Zag Scanning
Transform coding (DCT) -> Quantization -> Zigzag scan (2D->1D) -> Entropy coding (bit stream)
Quantized indices (2D):
-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
Zig-zag sequence (1D): -26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB

Effect of QP (quantization parameter)
Nearly all software implementations of JPEG permit user control over the compression ratio (as well as other optional parameters), allowing the user to trade off picture quality for smaller file size.
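The slides do not define how a quality setting maps to quantization step sizes. One widely used convention, that of the IJG libjpeg library, scales a base table by a factor derived from a quality value between 1 and 100. The sketch below follows that convention as an assumption about one implementation, not as part of the JPEG standard; the function name `scale_table` is hypothetical.

```python
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """Scale a base quantization table for a quality setting in 1..100.

    Assumed convention (IJG libjpeg style): quality 50 leaves the table
    unchanged, lower quality enlarges the steps, higher quality shrinks them.
    """
    quality = min(100, max(1, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Scale in percent with rounding, then clamp each step to 1..255.
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row] for row in base]


for quality in (10, 25, 50, 90):
    print(quality, scale_table(Q_LUMA, quality)[0])
```

Larger step sizes send more coefficients to zero, which shortens the entropy-coded stream at the cost of more visible distortion.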

Summary