JOURNAL OF MANAGEMENT SCIENCES IN CHINA, Vol. 16, No. 3, Mar. 2013

Article ID: 1007-9807(2013)03-001-08. Document code: A.

Symbolic data analysis (SDA) [1]

Received: 2011-06-15; revised: 2012-10-19.
Supported by grants 7171147 and 7100307.
Author (born 1973), Email: guojp@tju.edu.cn
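Let the interval symbolic data matrix with $n$ objects and $p$ interval variables be

$$X = \begin{pmatrix} [a_{11}, b_{11}] & \cdots & [a_{1p}, b_{1p}] \\ \vdots & & \vdots \\ [a_{n1}, b_{n1}] & \cdots & [a_{np}, b_{np}] \end{pmatrix} \quad (1)$$

Distances used by existing clustering methods for interval data include Mahalanobis distances [7, 8], city-block distances [9], the Hausdorff distance, and a Wasserstein-based distance [10]; see also [6, 11].

For the $j$-th interval variable $X_j$, let $\mu_{kj}$ and $\sigma_{kj}$ denote the mean and the standard deviation of the point data contained in the interval $X_{kj}$ ($k = 1, \ldots, n$) [12]. The sample mean and sample variance of $X_j$ are

$$\bar{X}_j = \frac{1}{n} \sum_{k=1}^{n} \mu_{kj} \quad (2)$$

$$S_j^2 = \frac{1}{n} \sum_{k=1}^{n} \left[ \sigma_{kj}^2 + (\bar{X}_j - \mu_{kj})^2 \right] \quad (3)$$

For two intervals $A = [a, b]$ and $B = [c, d]$, the Hausdorff distance [9, 13] is

$$H(A, B) = |c_A - c_B| + |r_A - r_B| \quad (4)$$

where $c_X$ and $r_X$ denote the midpoint and the half-length of the interval $X$ ($X = A, B$); for example, $c_A = (a + b)/2$ and $r_A = (b - a)/2$.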
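The interval Hausdorff distance and the descriptive statistics of an interval variable can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the variance rule assumes that the variance of an interval variable is the average of the within-interval variances plus the squared deviations of the interval means from the overall mean.

```python
def hausdorff(a, b, c, d):
    """Hausdorff distance between [a, b] and [c, d] via midpoints and half-lengths."""
    center_gap = abs((a + b) / 2 - (c + d) / 2)
    radius_gap = abs((b - a) / 2 - (d - c) / 2)
    return center_gap + radius_gap


def hausdorff_endpoints(a, b, c, d):
    """Equivalent endpoint form: the larger of the two bound differences."""
    return max(abs(a - c), abs(b - d))


def interval_variable_stats(mus, sigmas):
    """Mean and variance of one interval variable.

    mus[k] and sigmas[k] are the mean and standard deviation of the points
    inside the k-th interval (assumed inputs, not computed here).
    """
    n = len(mus)
    mean = sum(mus) / n
    var = sum(s ** 2 + (mean - m) ** 2 for m, s in zip(mus, sigmas)) / n
    return mean, var


print(hausdorff(0, 16, 4, 10), hausdorff_endpoints(0, 16, 4, 10))
print(interval_variable_stats([1.0, 3.0], [0.0, 0.0]))
```

The two Hausdorff forms agree for any pair of intervals, which is why the midpoint and half-length form can be used without loss.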
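The Hausdorff distance uses only the bounds of the intervals. To take the point data contained in the intervals into account, the μσ distance between two intervals $A$ and $B$ is defined as

$$D_S(A, B) = |\bar{A} - \bar{B}| + \sqrt{3}\,|S_A - S_B| \quad (5)$$

where $\bar{A}$, $\bar{B}$ are the means and $S_A$, $S_B$ the standard deviations of the points in $A$ and $B$. If the points in an interval $X = [\underline{x}, \overline{x}]$ are uniformly distributed, then $\bar{X} = (\underline{x} + \overline{x})/2 = c_X$ and

$$S_X = \frac{\overline{x} - \underline{x}}{2\sqrt{3}} = \frac{r_X}{\sqrt{3}} \quad (6)$$

so the factor $\sqrt{3}$ in (5) makes the μσ distance coincide with the Hausdorff distance in the uniform case.

For $p$-dimensional interval vectors $X = (x_1, x_2, \ldots, x_p)^T$ and $Y = (y_1, y_2, \ldots, y_p)^T$, the μσ distance is

$$d_S(X, Y) = \sqrt{\sum_{i=1}^{p} d_S^2(x_i, y_i)} = \sqrt{\sum_{i=1}^{p} \left[ |\bar{x}_i - \bar{y}_i| + \sqrt{3}\,|S_{x_i} - S_{y_i}| \right]^2} \quad (7)$$

Example 1. For an interval $A$ and two intervals $B_1$ and $B_2$ with different interior points, the Hausdorff distance cannot tell $B_1$ and $B_2$ apart, $D_H(A, B_1) = D_H(A, B_2)$, whereas the μσ distance gives $D_S(A, B_1) < D_S(A, B_2)$.

Similar to K-means clustering based on the Hausdorff distance [13], K-means clustering based on the μσ distance partitions the objects $x_i$ ($i = 1, \ldots, n$) into $K$ classes $P_1, \ldots, P_K$ with prototypes $y_1, \ldots, y_K$ so as to minimize the criterion

$$J = \sum_{k=1}^{K} \sum_{i \in P_k} d_S^2(x_i, y_k)$$

Initialization: choose $K$ initial prototypes $y_1, \ldots, y_K$ and set test = 0.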
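The μσ distance between two intervals can be illustrated with a short Python sketch. It assumes that the points observed inside each interval are available as plain lists and that the population standard deviation is the intended spread measure; both choices, and the function names, are assumptions made for illustration.

```python
import statistics
from math import sqrt


def mu_sigma_distance(points_a, points_b):
    """|mean gap| + sqrt(3) * |std gap| for the points inside two intervals."""
    mean_gap = abs(statistics.mean(points_a) - statistics.mean(points_b))
    std_gap = abs(statistics.pstdev(points_a) - statistics.pstdev(points_b))
    return mean_gap + sqrt(3) * std_gap


def mu_sigma_uniform(a, b, c, d):
    """Same distance when points are uniform on [a, b] and [c, d].

    A uniform interval has mean (lo + hi) / 2 and std (hi - lo) / (2 * sqrt(3)),
    so the result equals the Hausdorff distance of the two intervals.
    """
    mean_gap = abs((a + b) / 2 - (c + d) / 2)
    std_gap = abs((b - a) / (2 * sqrt(3)) - (d - c) / (2 * sqrt(3)))
    return mean_gap + sqrt(3) * std_gap


# Two intervals with identical bounds [0, 16] but different interior points.
spread_out = [0, 8, 16]
at_the_ends = [0, 0, 16, 16]
print(mu_sigma_distance(spread_out, at_the_ends))  # positive: the spreads differ
print(mu_sigma_uniform(0, 16, 4, 10))              # matches the Hausdorff value
```

The first call shows the point of the construction: a bounds-only distance would report zero for two intervals with the same end points, while the μσ distance separates them by their interior spread.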
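Allocation: for each object $x_i$, find the class $P_{h^*}$ whose prototype is nearest to $x_i$ under the μσ distance. If $x_i \in P_h$ and $h \neq h^*$, set test = 1, $P_{h^*} = P_{h^*} \cup \{i\}$ and $P_h = P_h \setminus \{i\}$.

Representation: for $k = 1, \ldots, K$, recompute the prototype $y_k$ of class $P_k$, that is, its mean $\bar{y}_k$ and its standard deviation $S_{y_k}$.

Stopping rule: if test = 0, stop; otherwise set test = 0 and return to the allocation step.

In the simulation experiment, the points $(x, y)$ of three classes are drawn from bivariate normal distributions with mean vector $\mu = (\mu_1, \mu_2)^T$ and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

with class-specific parameters (data 1, data 2, data 3). Each point $(x, y)$ is the seed of a rectangle $[x - r_1, x + r_1] \times [y - r_2, y + r_2]$.

Clustering quality is measured by the corrected Rand (CR) index [14]. Let $U = \{u_1, \ldots, u_i, \ldots, u_R\}$ and $V = \{v_1, \ldots, v_j, \ldots, v_C\}$ be two partitions of the same $n$ objects. Then

$$CR = \frac{\sum_{i=1}^{R} \sum_{j=1}^{C} \binom{n_{ij}}{2} - \binom{n}{2}^{-1} \sum_{i=1}^{R} \binom{n_{i\cdot}}{2} \sum_{j=1}^{C} \binom{n_{\cdot j}}{2}}{\frac{1}{2} \left[ \sum_{i=1}^{R} \binom{n_{i\cdot}}{2} + \sum_{j=1}^{C} \binom{n_{\cdot j}}{2} \right] - \binom{n}{2}^{-1} \sum_{i=1}^{R} \binom{n_{i\cdot}}{2} \sum_{j=1}^{C} \binom{n_{\cdot j}}{2}} \quad (8)$$

where $n_{ij}$ is the number of objects that belong to both $u_i$ and $v_j$, $n_{i\cdot}$ is the number of objects in $u_i$, and $n_{\cdot j}$ is the number of objects in $v_j$. CR takes values in $[-1, 1]$, and CR = 1 indicates perfect agreement between the two partitions.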
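The allocation and representation loop of K-means under the μσ distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each object is summarized by one (mean, standard deviation) pair per variable, the per-variable distances are combined by a sum of squares, and the prototype of a class is taken to be the per-variable average of its members' means and standard deviations. All names are hypothetical.

```python
from math import sqrt

SQRT3 = sqrt(3.0)


def dist2(x, y):
    """Squared mu-sigma distance between two objects (lists of (mean, std) pairs)."""
    return sum((abs(mx - my) + SQRT3 * abs(sx - sy)) ** 2
               for (mx, sx), (my, sy) in zip(x, y))


def prototype(members):
    """Assumed prototype rule: average the member means and stds per variable."""
    n = len(members)
    p = len(members[0])
    return [(sum(m[j][0] for m in members) / n,
             sum(m[j][1] for m in members) / n) for j in range(p)]


def kmeans_mu_sigma(data, init_labels, k, max_iter=100):
    """Alternate representation and allocation until no object changes class."""
    labels = list(init_labels)
    for _ in range(max_iter):
        # Representation step: one prototype per non-empty class.
        protos = {}
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                protos[c] = prototype(members)
        # Allocation step: move each object to its nearest prototype.
        test = 0
        for i, x in enumerate(data):
            best = min(protos, key=lambda c: dist2(x, protos[c]))
            if best != labels[i]:
                labels[i] = best
                test = 1
        if test == 0:  # no reassignment, so the partition is stable
            break
    return labels


# Four one-variable objects forming two obvious groups, with a poor start.
data = [[(0.0, 1.0)], [(0.5, 1.0)], [(10.0, 1.0)], [(10.5, 1.0)]]
print(kmeans_mu_sigma(data, [0, 1, 0, 1], k=2))
```

Classes that become empty are simply skipped in this sketch; a production version would need a rule for re-seeding them.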
3 K 5 3 3 r 1 r 60 K CR 1 1 K CR Table 1 The average CR values of K-meas clusterig of ormally distributed iterval symbolic data r 1 r 1 Fig 1 The radom iterval data 3 1 16 0 90 510 4 0 779 131 74 1 0 0 90 49 46 0 773 38 38 50 K K 1 μ - 58σ μ + 58σ 99 73% CR 0 7% r K N x r 1 / 58 N y r / 58 r 1 r 3 1 4 1 8 1 1 1 16 1 0 5 CR K 5 60 350 50 K 5 1 Hausdorff 13 CR 1 4 0 95 77 67 0 783 69 47 1 8 0 94 783 18 0 775 969 05 1 1 0 919 936 31 0 775 379 41 ad uiformly distributed iterval symbolic data K Fig The iterative process of K-meas clusterig of geerally
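The corrected Rand (CR) index used to score the partitions can be computed from two label lists. The sketch below uses the standard Hubert and Arabie form of the index; the function name and the list-of-labels input format are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_u, labels_v):
    """Corrected Rand index between two partitions given as label lists."""
    n = len(labels_u)
    pair_counts = Counter(zip(labels_u, labels_v))  # n_ij
    row_counts = Counter(labels_u)                  # n_i.
    col_counts = Counter(labels_v)                  # n_.j

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_i = sum(comb(c, 2) for c in row_counts.values())
    sum_j = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_i * sum_j / comb(n, 2)  # agreement expected by chance
    maximum = (sum_i + sum_j) / 2          # largest attainable agreement
    return (sum_ij - expected) / (maximum - expected)


print(corrected_rand([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(corrected_rand([0, 0, 1, 1], [0, 1, 0, 1]))  # partitions that cut across each other
```

The index is invariant to how the classes are numbered, so the first call scores a relabeled copy of the same partition as perfect agreement. The denominator is zero when both partitions put every object in one class, a case this sketch does not handle.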
6 013 3 4 7 3 k = 3 K 3 8 7 K 7 0-100 3 Table Iterval symbolic data of the cars sample 0-100 /mm /mm /mm / /hp /s /km h -1 911 19 191 3 35 517 4 5 8 75 310 4 47 4 491 1 770 1 85 1 75 1 310 SL Z4 119 8 56 8 31 55 4 6 7 8 57 8 9 8 184 340 4 8 7 3 3 4 535 4 605 1 815 1 835 1 98 1 303 4 39 4 44 1 790 1 790 1 84 1 91 3 58 79 8 15 306 5 7 10 9 14 4 580 4 61 1 78 1 78 1 384 1 384 TT 50 9 70 8 00 7 5 6 4 37 4 178 4 198 1 84 1 84 1 345 1 358 3 49 1 69 8 15 306 5 4 9 7 18 4 580 4 61 1 78 1 78 1 395 1 395 Cayma 7 8 114 8 45 331 4 9 7 53 85 4 341 4 347 1 801 1 801 1 304 1 305 Boxster 68 8 106 45 30 5 7 51 74 4 39 4 34 1 801 1 801 1 9 1 31 11 8 16 8 105 131 9 8 1 8 174 00 4 608 4 608 1 743 1 743 1 465 1 465 13 8 18 58 101 161 8 8 16 180 0 4 544 4 544 1 760 1 760 1 461 1 464 GT 13 77 18 97 11 184 9 13 180 0 4 671 4 671 1 815 1 815 1 478 1 478 1 34 18 05 105 161 7 9 14 176 13 4 569 4 57 1 769 1 769 1 46 1 46 19 8 43 98 131 6 9 14 5 193 4 765 4 865 1 80 1 80 1 47 1 475 C5 17 69 40 38 140 0 8 6 10 9 00 30 4 745 4 805 1 780 1 860 1 458 1 476 10 89 15 99 117 184 8 7 13 9 180 5 4 598 4 598 1 797 1 797 1 477 1 477 A6L 35 5 85 33 170 350 6 4 9 3 0 65 5 01 5 035 1 855 1 855 1 485 1 485 5 41 6 79 76 156 30 6 11 8 10 E 46 5 71 184 45 8 5 9 1 30 45 73 45 41 41 450 6 1 9 7 39 4 981 5 039 1 846 1 860 1 471 1 477 5 01 5 01 1 855 1 855 1 464 1 466 5 175 5 175 1 903 1 903 1 450 1 450 G 38 8 73 41 35 351 5 6 8 9 10 49 4 653 4 780 1 773 1 85 1 394 1 455 7 89 8 39 8 59 544 4 6 7 8 45 5 179 5 1 1 90 1 90 1 484 1 478 S 93 59 8 31 517 4 6 8 3 44 5 06 5 30 1 871 1 871 1 473 1 485
Table 3 The comparison of precision of the two methods

Method                       Correctly classified   Precision
K-means with μσ distance     19                     86.36%
K-means with Hausdorff       17                     77.27%

References

[1] Hu Yan, Wang Huiwen. A new data mining method based on huge data and its application[J]. Journal of Beijing University of Aeronautics and Astronautics (Social Sciences Edition), 2004, 17: 40-44. (in Chinese)
[2] Bock H H, Diday E. Analysis of Symbolic Data[M]. New York: Springer-Verlag, 2000.
[3] Li Wenhua, Guo Junpeng. Methodology and application of regression analysis of interval-type symbolic data[J]. Journal of Management Sciences in China, 2010, 13(4): 38-43. (in Chinese)
[4] Diday E, Brito M P. Symbolic cluster analysis[C]// Conceptual and Numerical Analysis of Data. Eds. Opitz O. Heidelberg: Springer-Verlag, 1989: 45-84.
[5] De Carvalho F A T, Csernel M, Lechevallier Y. Clustering constrained symbolic data[J]. Pattern Recognition Letters, 2009, 30(11): 1037-1045.
[6] De Carvalho F A T, Brito P, Bock H H. Dynamic clustering for interval data based on L2 distance[J]. Computational Statistics, 2006, 21(2): 231-250.
[7] Tenorio C P, De Carvalho F A T, Pimentel J T. A partitioning fuzzy clustering algorithm for symbolic interval data based on adaptive Mahalanobis distances[C]// Proceedings of 7th International Conference on Hybrid Intelligent Systems, 2007: 174-179.
[8] De Carvalho F A T, Tenorio C P. Fuzzy K-means clustering algorithms for interval-valued data based on adaptive quadratic distances[J]. Fuzzy Sets and Systems, 2010, 161(23): 2978-2999.
[9] De Carvalho F A T, Lechevallier Y. Partitional clustering algorithms for symbolic interval data based on single adaptive distances[J]. Pattern Recognition, 2009, 42(7): 1223-1236.
[10] Irpino A, Verde R. Dynamic clustering of interval data using a Wasserstein-based distance[J]. Pattern Recognition Letters, 2008, 29(11): 1648-1658.
[11] Ren Shijin, Lü Junhuai. Genetic algorithm-based kernel function FCM clustering algorithm for interval numbers[J]. Journal of System Engineering, 2009, 4: 6-30. (in Chinese)
[12] Guo Junpeng, Li Wenhua, Gao Feng. Descriptive statistics and analysis of interval symbolic data with general distribution[J]. Systems Engineering - Theory & Practice, 2011, 31(12): 2367-2372. (in Chinese)
[13] Fan Bo. Spatial clustering mining method for site selection problem of emergency response center[J]. Journal of Management Sciences in China, 2008, 11(3): 16-28. (in Chinese)
[14] De Carvalho F A T, De Souza R M C R, Chavent M, et al. Adaptive Hausdorff distances and dynamic clustering of symbolic interval data[J]. Pattern Recognition Letters, 2006, 27(3): 167-179.

K-means clustering of generally distributed interval symbolic data

GUO Jun-peng, CHEN Ying, LI Wen-hua
College of Management and Economics, Tianjin University, Tianjin 300072, China

Abstract: Existing clustering methods for interval data mostly assume that the data are uniformly distributed across each interval. However, this assumption does not always hold in practice. Taking this into account, this paper studies K-means clustering of interval data with a general distribution. The definition of generally distributed interval data is proposed, and its descriptive statistics are derived from empirical distribution theory. On the basis of the Hausdorff distance, the paper puts forward a new distance for interval data that takes into account the point data contained in the intervals. Based on this distance, we present a K-means clustering algorithm for generally distributed interval symbolic data. A simulation experiment is conducted to evaluate the validity of the method. The results show that, compared with analysis methods for uniform interval symbolic data, the analysis methods for generally distributed interval symbolic data are more effective under all the conditions designed in the experiment. Finally, the method is illustrated with a real data example, which shows its advantages in practical application.

Key words: interval symbolic data; general distribution; symbolic data analysis; clustering analysis