Information Theory - Codes
Theodoros Giannakopoulos

Lab 3: Python examples for Information Theory (examples 1, 2 and 5)
inf_teiste_info_theory_lab: available on GitHub at github.com/tyiannak/inf_teiste_info_theory_lab, containing the basic functions and the examples.
ITlib.py: the basic functions (computation of entropy, marginal probabilities, channel capacity, etc.)
example1.py, example2.py, example3.py, ...: the examples; each one imports ITlib
Entropy - Maximum Entropy - Marginal Probabilities (1): ITlib.py (part)

import numpy

eps = 0.000000000001

# Compute entropy from an array of symbol probabilities
def computeEntropy(probs):
    if numpy.abs(probs.sum() - 1.0) < eps:                           # if the sum of probabilities is almost equal to 1
        probsNonZero = probs[probs > eps]                            # remove zero probabilities
        return -numpy.sum(probsNonZero * numpy.log2(probsNonZero))   # compute entropy
    else:
        raise ValueError("Probabilities must sum to unity!")         # raise an error if probabilities do not sum to 1

# Compute the maximum entropy of a source with n symbols
def computeMaxEntropy(n):
    return numpy.math.log(n, 2)

# Compute the marginal probabilities P(x), P(y) from the joint probabilities P(x,y)
def computePriorsFromJointP(jointP):
    if numpy.abs(jointP.sum() - 1.0) < eps:
        Px = jointP.sum(axis = 1)
        Py = jointP.sum(axis = 0)
        return Px, Py
    else:
        raise ValueError("Probabilities must sum to unity!")         # raise an error if probabilities do not sum to 1
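A minimal usage sketch of the three helpers (assuming ITlib.py is on the path; the expected values in the comments follow directly from the definitions above):

import numpy, ITlib

probs = numpy.array([0.5, 0.25, 0.25])
print "H =", ITlib.computeEntropy(probs)          # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
print "Hmax =", ITlib.computeMaxEntropy(3)        # log2(3) ~ 1.585 bits

jP = numpy.array([[0.125, 0.375], [0.25, 0.25]])
Px, Py = ITlib.computePriorsFromJointP(jP)        # row sums and column sums
print "P(x) =", Px, "P(y) =", Py                  # [ 0.5  0.5 ] [ 0.375  0.625 ]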
example1.py: entropy and conditional entropy
Example of using computeEntropy() and computePriorsFromJointP() to compute the entropy and the conditional entropy from a joint probability matrix.

import ITlib, numpy

def computeChannelEntropies(jP):
    [Px, Py] = ITlib.computePriorsFromJointP(jP)
    print "P(x)", Px
    print "P(y)", Py
    Hx = ITlib.computeEntropy(Px)
    Hy = ITlib.computeEntropy(Py)
    Hjoint = ITlib.computeEntropy(jP.flatten())
    Hcond = Hjoint - Hy
    print "H(x) = %.4f" % Hx
    print "H(y) = %.4f" % Hy
    print "H(x,y) = %.4f" % Hjoint
    print "H(x|y) = %.4f" % Hcond

jP = numpy.array([[1.0/4, 1.0/16, 0], [1.0/4, 1.0/8, 0], [0.0, 1.0/16, 1.0/4]])
print "Joint probability matrix is"
print jP
computeChannelEntropies(jP)

Output:
Joint probability matrix is
[[ 0.25    0.0625  0.    ]
 [ 0.25    0.125   0.    ]
 [ 0.      0.0625  0.25  ]]
P(x) [ 0.3125  0.375   0.3125]
P(y) [ 0.5   0.25  0.25]
H(x) = 1.5794
H(y) = 1.5000
H(x,y) = 2.3750
H(x|y) = 0.8750
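Here H(x|y) is obtained through the chain rule H(x,y) = H(y) + H(x|y). A minimal sketch (not part of example1.py) that instead computes H(x|y) directly from the column-wise conditional distributions P(x|y) and arrives at the same value:

import ITlib, numpy

jP = numpy.array([[1.0/4, 1.0/16, 0], [1.0/4, 1.0/8, 0], [0.0, 1.0/16, 1.0/4]])
Px, Py = ITlib.computePriorsFromJointP(jP)
Hcond = 0.0
for j in range(jP.shape[1]):                                  # for each value of y
    Hcond += Py[j] * ITlib.computeEntropy(jP[:, j] / Py[j])   # P(y=j) * H(x|y=j)
print "H(x|y) = %.4f" % Hcond                                 # 0.8750, same as H(x,y) - H(y)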
example2.py: entropy of a binary source
Example of computing and plotting the entropy of a source with 2 symbols, with probabilities p and 1-p.

import ITlib
import numpy
import matplotlib.pyplot as plt

step = 0.001
p1 = numpy.arange(0, 1 + step, step)
p2 = 1 - p1
H = numpy.zeros(p1.shape)
for i in range(p1.shape[0]):
    probs = numpy.array([p1[i], p2[i]])
    H[i] = ITlib.computeEntropy(probs)
plt.plot(p1, H)
plt.xlabel("p1")
plt.ylabel("entropy")
plt.show()
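The plotted curve is the binary entropy function h(p) = -p*log2(p) - (1-p)*log2(1-p), which is maximized at p = 0.5, where the two symbols are equally likely and h(0.5) = 1 bit. Appending two lines to the script above (an addition for illustration, not in the repo) confirms this numerically:

print "argmax at p1 = %.3f" % p1[numpy.argmax(H)]   # 0.500
print "max entropy = %.3f bits" % H.max()           # 1.000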
example5.py: computing entropy on texts
Input data: lyric excerpts from five songs. For each one, the entropy H, the maximum entropy Hmax and the redundancy Π are computed (results on the last slide).

Pink Floyd - Time
Ticking away the moments that make up a dull day Fritter and waste the hours in an offhand way. Kicking around on a piece of ground in your home town Waiting for someone or something to show you the way. Tired of lying in the sunshine staying home to watch the rain. You are young and life is long and there is time to kill today. And then one day you find ten years have got behind you. No one told you when to run, you missed the starting gun.

Lady Gaga - Bad Romance

Beatles - Love me do
Love, love me do You know I love you I'll always be true So please, love me do Whoa, love me do Love, love me do You know I love you I'll always be true So please, love me do Whoa, love me do

Eminem - Rap God
Look, I was gonna go easy on you and not to hurt your feelings But I'm only going to get this one chance Something's wrong, I can feel it (Six minutes, Slim Shady, you're on) Just a feeling I've got, like something's about to happen, but I don't know what If that means, what I think it means, we're in trouble, big trouble, And if he is as bananas as you say, I'm not taking any chances You were just what the doctor ordered I'm beginning to feel like a Rap God, Rap God All my people from the front to the back nod, back nod Now who thinks their arms are long enough to slap box, slap box? They said I rap like a robot, so call me Rapbot

Mos def - Mathematic
Ha ha You know the deal It's just me yo Beats by Su-Primo for all of my peoples, Negros and Latinos And even the gringos Yo, check it one for Charlie Hustle, two for Steady Rock Three for the fourth coming live, future shock It's five dimensions, six senses
example5.py: basic text analysis steps

document:
Ticking away the moments that make up a dull day Fritter and waste the hours in an offhand way. Kicking around on a piece of ground in your home town Waiting for someone or something to show you the way. Tired of lying in the sunshine staying home to watch the rain. You are young and life is long and there is time to kill today. And then one day you find ten years have got behind you. No one told you when to run, you missed the starting gun.

tokenization and lower case -> list of words:
ticking away the moments that make up a dull day fritter and waste the hours in ...

stop word removal -> reduced list of words:
ticking away moments make up dull day fritter waste hours ...

counting -> words + counts, normalize -> words + frequencies:

Term       Count  Freq
way        4      0.035
run        3      0.026
day        3      0.026
come       3      0.026
tired      2      0.018
sun        2      0.018
something  2      0.018
shorter    2      0.018
find       2      0.018
behind     2      0.018
away       2      0.018
around     2      0.018
again      2      0.018
young      1      0.009
years      1      0.009
year       1      0.009
watch      1      0.009
waste      1      0.009
warm       1      0.009
waiting    1      0.009

entropy: H = 6.48, Hmax = 6.58, Π = 1.60%
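For comparison, the same pipeline fits in a few lines with collections.Counter (a hypothetical rewrite for illustration, not the repo's code; the stopword list here is abbreviated):

import collections, re

stopwords = ["the", "of", "a", "in", "to", "you"]               # abbreviated stopword list
text = open("data/lyrics/pinkfloyd_time").read()
words = [w.lower() for w in re.findall(r"[\w']+", text)]        # tokenization + lower case
words = [w for w in words if w not in stopwords]                # stop word removal
counts = collections.Counter(words)                             # term counts
total = float(sum(counts.values()))
freqs = dict((t, n / total) for t, n in counts.items())         # normalized frequencies
print counts.most_common(5)                                     # the five most frequent terms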
example5.py

import ITlib, numpy, re

stopwords = ["i'm", "don't", "it's", "i'll", "you're", "the", "of"]

def readtxtandcomputeentropy(filename):
    print "filename: " + filename
    f = open(filename, "r")
    text = f.read()
    f.close()
    words = re.findall(r"[\w']+", text)                   # split to words (tokenization)
    words2 = [w.lower() for w in words]                   # convert to lower case
    words3 = [w for w in words2 if w not in stopwords]    # remove stopwords
    terms = []; weights = []
    for w in words3:                                      # for each word in the list
        if w in terms:                                    # if it is already included in the list of terms
            weights[terms.index(w)] += 1                  # increase its count by 1
        else:                                             # else
            terms.append(w)                               # add it to the list of terms and
            weights.append(1)                             # initialize its counter to 1
    terms2 = [t for (w, t) in sorted(zip(weights, terms), reverse = True)]  # sort terms by their weights (counts)
    weights2 = numpy.array(sorted(weights, reverse = True)).astype(float)   # the sorted weights (as floats)
    weights2 = weights2 / weights2.sum()                  # normalize by the sum (to convert to probabilities)
    H = ITlib.computeEntropy(weights2)                    # compute entropy
    Hmax = ITlib.computeMaxEntropy(float(len(terms2)))    # compute max entropy
    P = 1 - H / Hmax                                      # compute redundancy
    print "H=%.2f\tHmax=%.2f\tRed.=%.1f%%" % (H, Hmax, 100 * P)

readtxtandcomputeentropy("data/lyrics/pinkfloyd_time")
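The output on the next slide covers five lyric files, so the script presumably continues with the corresponding calls (filenames taken from that output):

readtxtandcomputeentropy("data/lyrics/eminem_rapgod")
readtxtandcomputeentropy("data/lyrics/ladygaga_badromance")
readtxtandcomputeentropy("data/lyrics/beatles_lovemedo")
readtxtandcomputeentropy("data/lyrics/mosdef_mathematic")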
example5.py: Example 5 results
Parse the lyrics from 5 songs and compute the entropies (after removing stopwords):

filename: data/lyrics/pinkfloyd_time        H=6.48  Hmax=6.58  Red.=1.6%
filename: data/lyrics/eminem_rapgod         H=8.74  Hmax=9.08  Red.=3.8%
filename: data/lyrics/ladygaga_badromance   H=4.43  Hmax=5.70  Red.=22.2%
filename: data/lyrics/beatles_lovemedo      H=2.57  Hmax=3.32  Red.=22.7%
filename: data/lyrics/mosdef_mathematic     H=8.15  Hmax=8.32  Red.=2.0%

Note how the repetitive pop lyrics (Lady Gaga, Beatles) show far higher redundancy than the wordy rap lyrics (Eminem, Mos Def), whose term distributions are close to uniform (H close to Hmax).