Cost-Sensitive Margin Distribution Optimization for Software Bug Localization


ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
Journal of Software, 2017,28(11):3072-3079 [doi: 10.13328/j.cnki.jos.005345]. http://www.jos.org.cn. Tel: +86-10-62562563

XIE Zheng, LI Ming
(State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China)
E-mail: lim@lamda.nju.edu.cn

Citation: Xie Z, Li M. Cost-sensitive margin distribution optimization for software bug localization. Ruan Jian Xue Bao/Journal of Software, 2017,28(11):3072-3079 (in Chinese). http://www.jos.org.cn/1000-9825/5345.htm

Abstract: Identifying bugs among the numerous source code files of a large software project is costly, so locating bugs automatically and effectively is a worthwhile problem. Bug reports are among the most valuable sources of bug descriptions, and precisely locating the source code related to a bug report helps reduce software development cost. Most current research on bug localization with deep neural networks focuses on the design of network structures and pays little attention to the loss function, which significantly affects performance in prediction tasks. This paper proposes a cost-sensitive margin distribution optimization (CSMDO) loss function and applies it to deep neural networks. The method handles the class imbalance of software defect data sets and significantly improves accuracy.

Key words: software mining; machine learning; bug localization; convolutional neural network; margin distribution optimization

CLC number: TP311

Foundation item: National Natural Science Foundation of China (61422304, 61272217)

Received: 2017-05-09; Revised: 2017-06-16; Accepted: 2017-08-23

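Related work cited here includes Poshyvanyk et al. [5]; Lukins et al. [6], who use latent Dirichlet allocation (LDA); Gay et al. [7], who use the vector space model (VSM); and Zhou et al. [8], whose BugLocator is also based on VSM. Among deep learning methods, Lam et al. [9] use an autoencoder, and Huo et al. [10] learn unified features from natural and programming languages. The deep methods of [9-11] produce their predictions through a sigmoid (or softmax) output.

On the learning-theory side, the margin explanation of AdaBoost has been studied in [12,13], and Zhang and Zhou [14] proposed the optimal margin distribution machine (ODM). Building on ODM, this paper proposes cost-sensitive margin distribution optimization (CSMDO).

Let C = {c_1, c_2, ..., c_{n_1}} denote the set of source code files and R = {r_1, r_2, ..., r_{n_2}} the set of bug reports. The label y_ij ∈ Y = {+1, -1} indicates whether bug report r_i is related to source file c_j. The goal is to learn a prediction function f on R × C such that sgn(f(r_i, c_j)) ∈ Y = {+1, -1} predicts whether r_i and c_j are related. Learning is posed as

$$\min_{f}\; L(f) = \lambda \sum_{i,j} \ell\big(f(r_i, c_j),\, y_{ij}\big) + \Omega(f) \qquad (1)$$

where \ell(·,·) is the loss function, Ω(·) is the regularization term, and λ is the trade-off parameter between them.

Convolutional neural networks have been applied to text in [15,16]. The network structure used for bug localization follows [10]: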

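The network maps a bug report and a source file to a feature representation x_ij = φ(r_i, c_j), from which the prediction f(r_i, c_j) is computed.

Fig.1 Convolutional neural network structure for bug localization

The input text is encoded with one-hot vectors [17]. For example, given the vocabulary V = {"bug", "class", "contains", "file", "is", "this"} and the sentence S = "this file contains bug", each word becomes a row with a single 1 at the word's position in V:

$$x = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{matrix} \text{this} \\ \text{file} \\ \text{contains} \\ \text{bug} \end{matrix} \qquad (2)$$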
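The one-hot example in Eq.(2) can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `one_hot_encode` and the whitespace tokenization are assumptions made for the example, not details taken from the paper.

```python
def one_hot_encode(sentence, vocabulary):
    """Return one row per word; each row has a single 1 at the word's index in the vocabulary."""
    index = {word: i for i, word in enumerate(vocabulary)}
    rows = []
    for word in sentence.split():  # assumption: tokens are separated by whitespace
        row = [0] * len(vocabulary)
        row[index[word]] = 1
        rows.append(row)
    return rows


V = ["bug", "class", "contains", "file", "is", "this"]
S = "this file contains bug"
x = one_hot_encode(S, V)

for word, row in zip(S.split(), x):
    print(row, word)
```

Each printed row corresponds to one row of the matrix in Eq.(2), with the vocabulary taken in the order listed for V.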

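Margin theory was originally developed to explain the behavior of AdaBoost. Schapire et al. [19] gave a margin-based explanation of AdaBoost's effectiveness. Breiman [20] designed the boosting algorithm Arc-gv, and Reyzin and Schapire [21] later re-examined the Arc-gv experiments. Further analyses of AdaBoost's margin are given in [12,13]. Based on this line of work, Zhang and Zhou proposed ODM [14], which learns a weight vector w by optimizing the margin distribution rather than the minimum margin.

CSMDO extends this formulation with class-dependent costs. For m training examples (x_i, y_i) it solves

$$\min_{w,\xi,\epsilon}\; \frac{1}{2} w^\top w + \frac{C_+}{m} \sum_{y_i=+1} \xi_i^2 + \frac{C_-}{m} \sum_{y_i=-1} \xi_i^2 + \frac{C_0}{m} \sum_{i=1}^{m} \epsilon_i^2 \qquad (3)$$

$$\text{s.t.}\quad y_i w^\top x_i \ge 1 - D - \xi_i,\quad y_i w^\top x_i \le 1 + D + \epsilon_i,\quad \xi_i \ge 0,\ \epsilon_i \ge 0,\quad i = 1, \ldots, m.$$

The slack variables ξ_i penalize margins that fall below 1 - D, and ε_i penalize margins that exceed 1 + D. D controls how far a margin may deviate from 1 without penalty. C_+ and C_- are the costs for positive and negative examples; choosing C_+ > C_- penalizes margin violations on positive examples more heavily. C_0 weights the penalty on margins that are too large.

Problem (3) can be rewritten in unconstrained form:

$$\min_{w}\; L(w) = \frac{1}{2} w^\top w + \frac{C_+}{m} \sum_{y_i=+1} \max\{0,\, 1 - D - y_i w^\top x_i\}^2 + \frac{C_-}{m} \sum_{y_i=-1} \max\{0,\, 1 - D - y_i w^\top x_i\}^2 + \frac{C_0}{m} \sum_{i=1}^{m} \max\{0,\, y_i w^\top x_i - 1 - D\}^2 \qquad (4)$$

Its gradient is

$$\nabla L(w) = w + \frac{2C_+}{m} \sum_{i=1}^{m} (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = +1) + \frac{2C_-}{m} \sum_{i=1}^{m} (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = -1) + \frac{2C_0}{m} \sum_{i=1}^{m} (y_i w^\top x_i - D - 1)\, y_i x_i\, I(y_i w^\top x_i > 1 + D) \qquad (5)$$

For a single example (x_i, y_i), the stochastic gradient ∇L(w, x_i, y_i) is obtained from ∇L(w) by keeping only the i-th term of each sum and dropping the factor 1/m:

$$\nabla L(w, x_i, y_i) = w + 2C_+ (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = +1) + 2C_- (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = -1) + 2C_0 (y_i w^\top x_i - D - 1)\, y_i x_i\, I(y_i w^\top x_i > 1 + D)$$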
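To make the unconstrained objective (4) and its gradient (5) concrete, the following sketch evaluates both for a linear model on a tiny hand-made data set and compares the analytic gradient with a central finite difference. Everything in it is illustrative: the function names, the toy data and the parameter values (D = 0.3, C_+ = 4, C_- = 1, C_0 = 0.5) are assumptions, and the paper applies the loss to a deep network rather than to raw feature vectors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def csmdo_loss(w, X, y, D, c_pos, c_neg, c_zero):
    """Unconstrained CSMDO objective for a linear model, following Eq.(4)."""
    m = len(X)
    total = 0.5 * dot(w, w)
    for x_i, y_i in zip(X, y):
        margin = y_i * dot(w, x_i)
        cost = c_pos if y_i == 1 else c_neg  # class-dependent cost
        total += cost / m * max(0.0, 1.0 - D - margin) ** 2   # margin too small
        total += c_zero / m * max(0.0, margin - 1.0 - D) ** 2  # margin too large
    return total


def csmdo_grad(w, X, y, D, c_pos, c_neg, c_zero):
    """Gradient of csmdo_loss with respect to w, following Eq.(5)."""
    m = len(X)
    grad = list(w)  # gradient of the (1/2) w^T w term
    for x_i, y_i in zip(X, y):
        margin = y_i * dot(w, x_i)
        coeff = 0.0
        if margin < 1.0 - D:
            cost = c_pos if y_i == 1 else c_neg
            coeff += 2.0 * cost / m * (margin + D - 1.0)
        if margin > 1.0 + D:
            coeff += 2.0 * c_zero / m * (margin - D - 1.0)
        for k in range(len(w)):
            grad[k] += coeff * y_i * x_i[k]
    return grad


# Toy data (hypothetical): two positive and two negative examples.
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [6.0, 1.0]]
y = [1, -1, -1, 1]
w = [0.4, -0.2]
params = dict(D=0.3, c_pos=4.0, c_neg=1.0, c_zero=0.5)

analytic = csmdo_grad(w, X, y, **params)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = []
for k in range(len(w)):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[k] += eps
    w_minus[k] -= eps
    numeric.append(
        (csmdo_loss(w_plus, X, y, **params) - csmdo_loss(w_minus, X, y, **params)) / (2 * eps)
    )

print("analytic:", analytic)
print("numeric: ", numeric)
```

In this toy setting the first three examples have margins below 1 - D and the last one has a margin above 1 + D, so both kinds of penalty term contribute to the gradient.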

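Here I(·) is the indicator function, which equals 1 when its argument holds and 0 otherwise. When the index i is drawn uniformly from {1, ..., m}, the expectation of the single-example gradient ∇L(w, x_i, y_i) is

$$\begin{aligned} E[\nabla L(w, x_i, y_i)] &= w + 2C_+\, E[(y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = +1)] \\ &\quad + 2C_-\, E[(y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = -1)] \\ &\quad + 2C_0\, E[(y_i w^\top x_i - D - 1)\, y_i x_i\, I(y_i w^\top x_i > 1 + D)] \\ &= w + \frac{2C_+}{m} \sum_{i=1}^{m} (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = +1) \\ &\quad + \frac{2C_-}{m} \sum_{i=1}^{m} (y_i w^\top x_i + D - 1)\, y_i x_i\, I(y_i w^\top x_i < 1 - D)\, I(y_i = -1) \\ &\quad + \frac{2C_0}{m} \sum_{i=1}^{m} (y_i w^\top x_i - D - 1)\, y_i x_i\, I(y_i w^\top x_i > 1 + D) \\ &= \nabla L(w) \end{aligned} \qquad (6)$$

Thus ∇L(w, x_i, y_i) is an unbiased estimate of ∇L(w), and the objective can be optimized with mini-batch gradient descent.

The experiments use four data sets built from version control and bug tracking systems [22]: JDT (Java development tools), the Java IDE component of Eclipse; PF (Eclipse Platform), the core platform of Eclipse; PDE (plug-in development environment), the Eclipse tooling for plug-in development; and AspectJ, an aspect-oriented extension of Java.

Table 1 Statistics of the data sets
JDT: 2 826 2 272 4.39
PF: 4 893 02 6.79
PDE: 4 034 2 970 8.34
AspectJ: 734 36.73

Performance is measured by Top 10 Rank, AUC (area under ROC curve) and MAP (mean average precision) [8]. The compared methods include BugLocator [8], proposed by Zhou et al.; Two-Phase [18], proposed by Kim et al.; and HyLoc [9], proposed by Lam et al., which uses an autoencoder.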
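The unbiasedness argument in Eq.(6) extends from single examples to mini-batches: averaging the mini-batch gradient over all equally likely batches recovers the full gradient. The sketch below checks this numerically by enumerating every batch of size 2 on a toy data set. The helper names (`sample_grad`, `batch_grad`, `sgd_step`), the data and the parameter values are assumptions for illustration, and the model is linear rather than the deep network used in the paper.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_grad(w, x_i, y_i, D, c_pos, c_neg, c_zero):
    """Single-example gradient: the i-th summand of the full gradient without the 1/m factor."""
    margin = y_i * dot(w, x_i)
    coeff = 0.0
    if margin < 1.0 - D:
        cost = c_pos if y_i == 1 else c_neg
        coeff += 2.0 * cost * (margin + D - 1.0)
    if margin > 1.0 + D:
        coeff += 2.0 * c_zero * (margin - D - 1.0)
    return [w_k + coeff * y_i * x_k for w_k, x_k in zip(w, x_i)]


def batch_grad(w, X, y, batch, params):
    """Mini-batch gradient: the mean of the single-example gradients in the batch."""
    grads = [sample_grad(w, X[i], y[i], *params) for i in batch]
    return [sum(column) / len(batch) for column in zip(*grads)]


def sgd_step(w, X, y, batch, params, learning_rate):
    """One mini-batch gradient descent update."""
    g = batch_grad(w, X, y, batch, params)
    return [w_k - learning_rate * g_k for w_k, g_k in zip(w, g)]


# Toy data and parameters (hypothetical).
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [6.0, 1.0]]
y = [1, -1, -1, 1]
w = [0.4, -0.2]
params = (0.3, 4.0, 1.0, 0.5)  # D, C_+, C_-, C_0

# Full gradient: the batch that contains every example.
full = batch_grad(w, X, y, range(len(X)), params)

# Expectation of the mini-batch gradient over all equally likely batches of size 2.
pairs = list(combinations(range(len(X)), 2))
pair_grads = [batch_grad(w, X, y, batch, params) for batch in pairs]
mean_of_batches = [sum(column) / len(pairs) for column in zip(*pair_grads)]

print("full gradient:        ", full)
print("mean over mini-batches:", mean_of_batches)

w_next = sgd_step(w, X, y, pairs[0], params, learning_rate=0.01)
```

The two printed vectors agree up to floating-point rounding, which is the finite-sample counterpart of Eq.(6).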

Two further baselines differ only in their loss function: NP-CNN logistic uses the logistic (cross entropy) loss, and NP-CNN hinge uses the hinge loss of SVM. CSMDO denotes the method proposed in this paper.

The Top 10 Rank, MAP and AUC results are reported in Tables 2 to 4. Differences from CSMDO are tested for significance with the Wilcoxon test at the 0.05 level.

Table 2 Top 10 Rank of compared methods on all data sets
Data set   BugLocator  Two-Phase  HyLoc  NP-CNN logistic  NP-CNN hinge  CSMDO
JDT        0.742       0.642      0.789  0.938            0.933         0.935
PF         0.725       0.732      0.760  0.894            0.867         0.897
PDE        0.673       0.617      0.718  0.842            0.851         0.861
AspectJ    0.622       0.570      0.741  0.848            0.842         0.871
Avg.       0.691       0.640      0.752  0.880            0.873         0.891

Table 3 MAP of compared methods on all data sets
Data set   BugLocator  Two-Phase  HyLoc  NP-CNN logistic  NP-CNN hinge  CSMDO
JDT        0.441       0.372      0.455  0.522            0.525         0.533
PF         0.350       0.398      0.410  0.537            0.532         0.539
PDE        0.422       0.381      0.552  0.624            0.632         0.674
AspectJ    0.418       0.397      0.433  0.545            0.546         0.577
Avg.       0.408       0.387      0.463  0.557            0.559         0.581

Table 4 AUC of compared methods on all data sets
Data set   BugLocator  Two-Phase  HyLoc  NP-CNN logistic  NP-CNN hinge  CSMDO
JDT        0.781       0.724      0.813  0.881            0.917         0.927
PF         0.772       0.803      0.830  0.913            0.895         0.941
PDE        0.753       0.763      0.824  0.914            0.925         0.927
AspectJ    0.683       0.664      0.761  0.857            0.852         0.926
Avg.       0.747       0.739      0.807  0.891            0.897         0.930

CSMDO achieves the highest average on all three metrics. As Table 4 shows, compared with the three methods BugLocator, Two-Phase and HyLoc, CSMDO improves average AUC by 18.3%, 19.1% and 12.3%, respectively. Compared with NP-CNN logistic, CSMDO improves the three metrics Top 10 Rank, MAP and AUC by 1.1%, 2.4% and 3.9% on average; compared with NP-CNN hinge, which uses the hinge loss, the improvements are 1.8%, 2.2% and 3.3%. These figures are the differences between the corresponding Avg. rows.
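For readers unfamiliar with the three evaluation metrics, the sketch below shows how Top-k rank, MAP and AUC are commonly computed for ranking tasks of this kind. It follows the standard textbook definitions and is not taken from the paper's evaluation code; the function names and the toy rankings are assumptions.

```python
def top_k_rank(rankings, relevant_sets, k=10):
    """Fraction of bug reports with at least one relevant file among the top k results."""
    hits = sum(
        1 for ranked, relevant in zip(rankings, relevant_sets)
        if any(f in relevant for f in ranked[:k])
    )
    return hits / len(rankings)


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant file."""
    found = 0
    precision_sum = 0.0
    for rank, f in enumerate(ranked, start=1):
        if f in relevant:
            found += 1
            precision_sum += found / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevant_sets):
    """MAP: average precision averaged over all bug reports."""
    aps = [average_precision(r, s) for r, s in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)


def auc(scores, labels):
    """Probability that a positive pair is scored above a negative pair (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rankings of source files for two bug reports.
rankings = [["a", "b", "c", "d"], ["c", "a", "b"]]
relevant_sets = [{"b", "d"}, {"c"}]

top1 = top_k_rank(rankings, relevant_sets, k=1)
map_score = mean_average_precision(rankings, relevant_sets)
auc_score = auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1])
print(top1, map_score, auc_score)
```

Top 10 Rank corresponds to `top_k_rank` with k = 10; the example uses k = 1 only because the toy rankings are short.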

References:
[1] Li M, Huo X. Software defect mining based on semi-supervised learning. Journal of Data Acquisition and Processing, 2016,31(1):56-64 (in Chinese with English abstract). [doi: 10.16337/j.1004-9037.2016.01.005]
[2] Li M. Software defect mining. Chinese Association for Artificial Intelligence Communication, 2016,2(8):44-49 (in Chinese with English abstract).
[3] Tu WW, Li M, Zhou ZH. Mining software defect factor. Journal of Jilin University (Engineering and Technology Edition), 2012,42(s1):383-386 (in Chinese with English abstract).
[4] Ding H, Chen L, Qian J, Xu L, Xu BW. Fault localization method using information quantity. Ruan Jian Xue Bao/Journal of Software, 2013,24(7):1484-1494 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4294.htm [doi: 10.3724/SP.J.1001.2013.04294]
[5] Poshyvanyk D, Gueheneuc YG, Marcus A, Antoniol G, Rajlich VC. Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans. on Software Engineering, 2007,33(6):420-432. [doi: 10.1109/tse.2007.1016]
[6] Lukins SK, Kraft N, Etzkorn LH. Source code retrieval for bug localization using latent dirichlet allocation. In: Proc. of the 15th Working Conf. on Reverse Engineering. IEEE, 2008. 155-164. [doi: 10.1109/wcre.2008.33]
[7] Gay G, Haiduc S, Marcus A, Menzies T. On the use of relevance feedback in IR-based concept location. In: Proc. of the Int'l Conf. on Software Maintenance. 2009. 351-360. [doi: 10.1109/icsm.2009.5306315]
[8] Zhou J, Zhang H, Lo D. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In: Proc. of the 34th Int'l Conf. on Software Engineering. 2012. 14-24. [doi: 10.1109/icse.2012.6227210]
[9] Lam AN, Nguyen AT, Nguyen HA, Nguyen TN. Combining deep learning with information retrieval to localize buggy files for bug reports. In: Proc. of the 28th Int'l Conf. on Automated Software Engineering. 2015. [doi: 10.1109/ase.2015.73]
[10] Huo X, Li M, Zhou ZH. Learning unified features from natural and programming languages for locating buggy source code. In: Proc. of the 25th Int'l Joint Conf. on Artificial Intelligence (IJCAI 2016). New York, 2016. 1606-1612.
[11] Mou L, Li G, Zhang L, Wang T, Jin Z. Convolutional neural networks over tree structures for programming language processing. In: Proc. of the 30th AAAI Conf. on Artificial Intelligence. 2016. 1287-1293.
[12] Wang L, Sugiyama M, Jing Z, Yang C, Zhou ZH, Feng J. A refined margin analysis for boosting algorithms via equilibrium margin. Journal of Machine Learning Research, 2011,12:1835-1863.
[13] Gao W, Zhou ZH. On the doubt about margin explanation of boosting. Artificial Intelligence, 2013,203:1-18. [doi: 10.1016/j.artint.2013.07.002]
[14] Zhang T, Zhou ZH. Optimal margin distribution machine. arXiv:1604.03348, 2016.
[15] Johnson R, Zhang T. Effective use of word order for text categorization with convolutional neural networks. In: Proc. of the Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015. 103-112. [doi: 10.3115/v1/n15-1011]
[16] Zhang X, LeCun Y. Text understanding from scratch. arXiv:1502.01710, 2015.
[17] Kim Y. Convolutional neural networks for sentence classification. In: Proc. of the Conf. on Empirical Methods in Natural Language Processing. ACL, 2014. 1746-1751. [doi: 10.3115/v1/d14-1181]
[18] Kim D, Zeller A, Tao Y, Kim S. Where should we fix this bug? A two-phase recommendation model. IEEE Trans. on Software Engineering, 2013,39(11):1597-1610. [doi: 10.1109/tse.2013.24]
[19] Schapire RE, Freund Y, Bartlett P, Lee WS. Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher DH, ed. Proc. of the 14th Int'l Conf. on Machine Learning (ICML'97). Morgan Kaufmann Publishers, 1997. 322-330.
[20] Breiman L. Prediction games and arcing algorithms. Neural Computation, 1999,11(7):1493-1517. [doi: 10.1162/089976699300016106]
[21] Reyzin L, Schapire RE. How boosting the margin can also boost classifier complexity. In: Cohen WW, Moore A, eds. Proc. of the 23rd Int'l Conf. on Machine Learning (ICML 2006), Vol.148. 2006. 753-760. [doi: 10.1145/1143844.1143939]
[22] Fischer M, Pinzger M, Gall H. Populating a release history database from version control and bug tracking systems. In: Proc. of the Int'l Conf. on Software Maintenance. IEEE, 2003. 23-32. [doi: 10.1109/icsm.2003.1235403]