Γιώργος Γιαννακάκης, Non-Parametric Density Estimation, ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ, ΕΠΙΣΤΗΜΗΣ ΥΠΟΛΟΓΙΣΤΩΝ
Probability Density Function
If the random variable is denoted by X, its probability density function f has the property that
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
Non-Parametric Density Estimation
Problem: the pattern distribution is unknown, so the PDF must be estimated. When the pattern distribution is unknown, non-parametric techniques are employed: estimate the density directly from the data, through a generalization of the histogram.
[Figure: data samples and the underlying PDF]
Histogram
The simplest form of non-parametric density estimation is the histogram. It partitions the sampling space into small regions and approximates the density from the number of samples falling within each region.
Non-Parametric Density Estimation
Histogram methods partition the data space into distinct bins with widths $\Delta_i$ and count the number of observations, $n_i$, in each bin. Often, the same width is used for all bins, $\Delta_i = \Delta$. $\Delta$ acts as a smoothing parameter.
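A minimal sketch of a histogram density estimator in Python (assuming NumPy; the bin width delta and the sample data are illustrative choices, not from the slides):

```python
import numpy as np

def histogram_density(data, delta, x):
    """Histogram density estimate: count of samples in the bin containing x,
    normalized by the total number of samples N and the bin width delta."""
    data = np.asarray(data)
    n_total = len(data)
    # Bin edges covering the data range with constant width delta.
    edges = np.arange(data.min(), data.max() + delta, delta)
    counts, _ = np.histogram(data, bins=edges)
    # Locate the bin that contains the query point x.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(counts) - 1)
    return counts[idx] / (n_total * delta)

# Illustrative usage on samples drawn from a standard normal distribution.
samples = np.random.randn(1000)
print(histogram_density(samples, delta=0.5, x=0.0))  # should be close to 0.4
```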
Non-parametric density estimation
The probability $P$ that a vector $x$ drawn from the PDF $p(x)$ falls in a region $R$ is given by
$$P = P\{x \in R\} = \int_R p(x')\,dx'.$$
Suppose we have $N$ samples $x_1, x_2, \dots, x_N$ drawn from the distribution $p(x)$. The probability that exactly $k$ of these points fall in $R$ is then given by the binomial distribution
$$P\{k\} = \binom{N}{k} P^k (1 - P)^{N - k}.$$
When $N \to \infty$ this distribution becomes sharply peaked, so the fraction of samples falling in $R$ is a good estimate of $P$:
$$P \approx \frac{k}{N}.$$
Non-parametric density estimation
If we now assume that $p(x)$ is continuous and that the region $R$ is so small that $p$ does not vary appreciably within it, then
$$P = \int_R p(x')\,dx' \approx p(x)\,V,$$
where $V$ is the volume of $R$. Combining the two results,
$$\frac{k}{N} \approx P \approx p(x)\,V \quad\Rightarrow\quad p(x) \approx \frac{k}{N V}.$$
Density estimation becomes more accurate as the sample size $N$ increases.
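As a quick worked example of the relation above (the numbers are illustrative, not from the slides): with $N = 1000$ samples, if $k = 10$ of them fall in a small region of volume $V = 0.05$ around $x$, then
$$p(x) \approx \frac{k}{N V} = \frac{10}{1000 \times 0.05} = 0.2.$$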
Non-parametric density estimation
To estimate the density at $x$, we form a sequence of regions $R_1, R_2, \dots, R_n$, where $R_n$ is the region used when $n$ samples are available. Let $V_n$ be the volume of $R_n$, $k_n$ the number of samples falling in $R_n$, and $p_n(x)$ the $n$th estimate of $p(x)$:
$$p_n(x) = \frac{k_n}{n\,V_n}$$
$V_n$: volume of region $R_n$; $k_n$: number of samples within $R_n$; $n$: total number of samples.
Non-parametric density estimation
If the total number of samples $N$ is fixed, then to improve the accuracy of the density estimate we could shrink the volume, but then the region $R$ becomes so small that it contains practically no samples. A compromise must therefore be made: $V$ must be large enough to contain enough samples, yet small enough to support the hypothesis that $p(x)$ remains constant within $R$. Three conditions are required in order for $p_n(x) \to p(x)$, where $p_n(x) = k_n/(n V_n)$:
1) $\lim_{n \to \infty} V_n = 0$
2) $\lim_{n \to \infty} k_n = \infty$
3) $\lim_{n \to \infty} k_n / n = 0$
Non-parametric density estimation: example of the estimate $p_n(x) = k_n/(n V_n)$. [Figure]
Non-parametric density estimation
$$p_n(x) = \frac{k_n}{n\,V_n}$$
There are two common approaches for obtaining sequences of regions $R_n$ so that $p_n(x) \to p(x)$:
- Fix the volume $V_n$ and count the number of samples falling in it from the data (Parzen windows); a sketch of this approach follows below.
- Fix the number of samples $k_n$ and determine the corresponding volume $V_n$ from the data (k-nearest neighbours).
Both approaches converge to the true probability density function as $N \to \infty$, since the volume $V_n$ shrinks and $k_n$ grows with $N$.
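A minimal sketch of the fixed-volume (Parzen-window) approach in 1-D Python, assuming NumPy and a Gaussian kernel; the bandwidth h and the data are illustrative assumptions, not from the slides:

```python
import numpy as np

def parzen_gaussian_density(data, h, x):
    """Parzen-window estimate with a Gaussian kernel of bandwidth h (1-D).
    Each sample contributes a small Gaussian bump centred on it; the
    estimate is the average of all bumps evaluated at x."""
    data = np.asarray(data)
    n = len(data)
    kernel = np.exp(-0.5 * ((x - data) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kernel.sum() / n

samples = np.random.randn(500)
print(parzen_gaussian_density(samples, h=0.3, x=0.0))  # roughly 0.4 for N(0,1)
```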
K-nearest neighbors
In the k-nearest-neighbor approach we fix $k$ and find the volume $V$ that contains the $k$ points closest to $x$.
Algorithm:
- An initial region around $x$ is selected in order to estimate $p(x)$.
- The region grows until $k$ samples fall within it.
- These $k$ samples are the k nearest neighbors of $x$.
- The density is estimated using the formula
$$p(x) = \frac{k}{N\,V}$$
A sketch of this estimator is given below.
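A minimal sketch of the k-nearest-neighbor density estimate in 1-D Python (assuming NumPy; the "volume" is taken as the length of the interval spanned by the k nearest neighbors, and k = sqrt(N) follows the rule of thumb mentioned on a later slide):

```python
import numpy as np

def knn_density(data, k, x):
    """k-NN density estimate in 1-D: V is the length of the smallest
    interval centred on x that contains the k nearest samples."""
    data = np.asarray(data)
    n = len(data)
    # Distance from x to every sample, sorted ascending.
    dists = np.sort(np.abs(data - x))
    r_k = dists[k - 1]          # distance to the k-th nearest neighbor
    volume = 2 * r_k            # 1-D "volume" of the interval [x - r_k, x + r_k]
    return k / (n * volume)

samples = np.random.randn(1000)
k = int(np.sqrt(len(samples)))  # rule of thumb: k ~ sqrt(N)
print(knn_density(samples, k, x=0.0))  # roughly 0.4 for N(0,1)
```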
K-nearest neighbors
If the density is high near $x$, the cell will be relatively small, which leads to good resolution. If the density is low, the cell will grow large, but it will stop soon after it enters regions of higher density.
K-nearest neighbors
Choosing k for density estimation: a good rule of thumb is $k_n = \sqrt{n}$. Convergence to the true density can be proven as $n$ goes to infinity.
K-nearest neighbors
Choosing k:
1. k should be large so that the error rate is minimized; k too small will lead to noisy decision boundaries.
2. k should be small enough so that only nearby samples are included; k too large will lead to oversmoothed boundaries.
Balancing 1 and 2 is not trivial. This is a recurrent issue: we need to smooth the data, but not too much.
K-nearest neighbor classification
The k-nearest-neighbor classification problem.
Goal: classify a sample x by assigning it the label most frequently represented among its k nearest samples, using a voting scheme.
Idea: to determine the label of an unknown sample x, look at x's k nearest neighbors.
[Diagram: compute the distance from the test record to the training records, then choose the k nearest records.]
K-nearest neighbor classification
1. Load the data.
2. Initialize the value of k.
3. For each test point x_i:
   - Calculate the distance between the test point and each row of the training data.
   - Sort the calculated distances and identify the k nearest neighbors.
   - Find the most frequent class among the k nearest neighbors.
   - Classify x_i as the class with the largest number of votes.
k is chosen odd in order to ensure the vote yields a single resulting class (avoiding ties in two-class problems). A sketch of these steps is given below.
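A minimal sketch of these steps in Python (assuming NumPy and Euclidean distance; the toy data set is illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training samples."""
    # 1. Euclidean distance from the test point to every training row.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # 2. Indices of the k smallest distances (the k nearest neighbors).
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the neighbors' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Illustrative two-class toy data.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.1, 0.2]), k=3))  # -> 0
```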
K-nearest neighbor classification [illustrative figures]
k-Nearest neighbor method
Majority vote within the k nearest neighbors of a new point: with k = 1 the new point is labeled brown, with k = 3 it is labeled green. [Figure]
k-Nearest neighbor method
For k = 1, ..., 7 the point x is classified correctly (red class). For larger k, the classification of x becomes wrong (blue class). [Figure]
K-nearest neighbor algorithm [Figure]
How many neighbors to consider? Too few neighbors lead to noisy decision boundaries. [Figure]
k-Nearest neighbor method
k acts as a smoother. As $N \to \infty$, the error rate of the 1-nearest-neighbour classifier is never more than twice the optimal Bayes error (obtained from the true conditional class distributions).
The training error rate is an increasing function of k: at k = 1 it is always zero, because the closest point to any training point is the point itself. The validation error rate initially decreases, reaches a minimum, and then increases with increasing k. To find the optimal value of k, separate the initial dataset into training and validation sets; the k that minimizes the validation error should be used for all predictions. A sketch of this selection procedure follows below.
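A minimal sketch of selecting k from the validation error in Python (assuming scikit-learn is available; the synthetic data and the candidate range of k are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative two-class data, split into training and validation sets.
X, y = make_classification(n_samples=600, n_features=5, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Evaluate the validation error for a range of odd k values.
candidate_ks = list(range(1, 40, 2))
val_errors = []
for k in candidate_ks:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    val_errors.append(1.0 - clf.score(X_val, y_val))

best_k = candidate_ks[int(np.argmin(val_errors))]
print("best k on validation set:", best_k)
```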
k-NN variations
Value of k: a larger k increases confidence in the prediction; note that if k is too large, the decision may be skewed by distant samples.
Weighted evaluation of nearest neighbors: a plain majority vote may unfairly skew the decision, so revise the algorithm so that closer neighbors have greater vote weight (a sketch follows below).
Other distance measures can also be used.
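A minimal sketch of distance-weighted voting in Python (assuming NumPy; the 1/(distance + eps) weighting and the eps constant are illustrative choices, not prescribed by the slides):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_test, k=5, eps=1e-9):
    """k-NN vote where each neighbor's vote is weighted by 1 / (distance + eps),
    so closer neighbors influence the decision more than distant ones."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (dists[i] + eps)
    return max(scores, key=scores.get)
```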
k-Nearest neighbor method
k-NN belongs to the class of lazy algorithms: it processes the training data only when a classification request arrives, it answers the classification request by combining the stored training data, and it does not rely on rules or models derived beforehand from the training results.
Pros and Cons of k-NN
Pros: simple; good results; easy to add new training examples.
Cons: computationally expensive; to determine the nearest neighbor, every training sample must be visited, giving O(nd) work per query, where n = number of training samples and d = number of dimensions.