Stock Research Reports Classification Based on Sentiment Analysis



Transcript:

J. Wuhan Univ. (Nat. Sci. Ed.), Vol. 61, No. 2, Apr. 2015, pp. 124-130
DOI: 10.14188/j.1671-8836.2015.02.004
Article ID: 1671-8836(2015)02-0124-07. CLC number: TP 391. Document code: A.

Stock Research Reports Classification Based on Sentiment Analysis

PENG Min (1, 2), WANG Qing (1), HUANG Jimin (1), ZHOU Li (1), HU Xinhui (1)
1. School of Computer, Wuhan University, Wuhan 430072, Hubei, China
2. Shenzhen Institute of Wuhan University, Shenzhen 518057, Guangdong, China

Abstract: Stock research reports are an important source of professional investment advice. Building on web information extraction and retrieval, automatic analysis of the investment advice in large volumes of stock reports can significantly influence investor behavior. In this paper, we propose a classification strategy for stock research reports based on sentiment analysis. First, we extract integrated features from the reports. Second, we perform feature selection with an improved CHI statistic and classify the reports with SVM and Naive Bayes classifiers. Finally, we evaluate the classification results with respect to feature weighting, feature dimension, and sample size. On 14 000 research reports collected from www.eastmoney.com, the experiments show that combining integrated feature selection, dimension reduction, and training-set resampling achieves higher performance.

Key words: sentiment analysis; feature selection; SVM (support vector machine); naive Bayes; imbalanced data; stock research report

Received: 2014-07-08. Project numbers: 61472291, 61303115. E-mail: pengm@whu.edu.cn

0 Introduction

[The Chinese body text of this section did not survive extraction. The surviving fragments cite Schumaker et al. [1], O'Hare et al. [2], and references [3] and [4]; name the Naive Bayes, k-Nearest Neighbor, and Support Vector Machine (SVM) classifiers; and cite Pang et al. [5] and references [6] and [7].]

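[The Chinese body text of this page did not survive extraction. Surviving fragments: the vector space model (VSM) with uni-gram features [8]; the feature selection methods DF, IG, MI, and CHI, with reference to Qiu et al. [9]; class sizes of 12 000, 1 400, and 600 reports; reference [10] together with the counts 6 119, 3 874, 4 510, and 97 125; sample sizes of 6 000, 3 000, and 1 500; n-gram features.]

The term frequency of feature $b_k$ is summed over the documents $d_1, \dots, d_m$:

$$TF(b_k) = \sum_{i=1}^{m} tf(b_k, d_i) \tag{1}$$

where $tf(b_k, d_i)$ is the frequency of $b_k$ in document $d_i$.

The CHI statistic of term $t_k$ with respect to class $c_i$ is

$$\chi^2(t_k, c_i) = \frac{N (AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)} \tag{2}$$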
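As a minimal sketch of Eq. (2), the CHI statistic can be computed directly from four document counts. The function name `chi_square` and its argument names are illustrative, and the meaning of the counts follows the usual contingency-table convention (`a`: documents of the class that contain the term, `b`: documents of other classes that contain it, `c`: documents of the class that lack it, `d`: documents of other classes that lack it). That convention is an assumption here, not something taken from the authors' implementation.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI statistic of one term/class pair from a 2x2 table of document counts.

    Assumed convention (not confirmed by the source):
      a: in class, contains term      b: not in class, contains term
      c: in class, lacks term         d: not in class, lacks term
    """
    n = a + b + c + d  # total number of documents, N in Eq. (2)
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

A term that occurs in every document of the class and in no other document gets a large score, for example `chi_square(10, 0, 0, 10)` evaluates to 20.0, while a term spread evenly across classes, such as `chi_square(5, 5, 5, 5)`, scores 0.0.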

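In Eq. (2), $A$ is the number of documents in class $c_i$ that contain $t_k$, $B$ the number of documents outside $c_i$ that contain $t_k$, $C$ the number of documents in $c_i$ that do not contain $t_k$, $D$ the number of documents outside $c_i$ that do not contain $t_k$, and $N$ the total number of documents. When $t_k$ and $c_i$ are independent, $\chi^2(t_k, c_i) = 0$.

2.4 [Section title lost. The surviving fragments discuss the sign of $AD - BC$ in the CHI statistic and cite the improved CHI methods of [11], [12], and [13].]

The improved CHI statistic multiplies Eq. (2) by the sum of three factors, FI, CI, and DI, and discards negatively correlated terms:

$$\chi^2(t_k, c_i) = \begin{cases} \dfrac{N (AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)} \, (FI + CI + DI), & AD - BC > 0 \\ 0, & AD - BC \le 0 \end{cases} \tag{3}$$

$$FI = \frac{TF(t_k, c_i)}{A + C}, \qquad CI = \frac{A}{A + B}, \qquad DI = \frac{A}{A + C}$$

where $TF(t_k, c_i)$ is the frequency of $t_k$ in class $c_i$. The score of a term is its maximum over the $m$ classes:

$$S(t_k) = \max_{1 \le i \le m} \{ \chi^2(t_k, c_i) \} \tag{4}$$

Each document $d_i$ in the collection $D$ is represented as a weight vector

$$d_i = (w_{1i}, w_{2i}, \dots, w_{ki}) \tag{5}$$

where $w_{ki}$ is the weight of the $k$-th feature in $d_i$. Three weighting schemes are considered: TF, TF-IDF, and TF-CHI. The TF-IDF weight [14] is

$$w_{ki} = TF(t_{ki}) \times IDF(t_k) \tag{6}$$

where $N$ is the total number of documents, $n_{t_k}$ is the number of documents containing $t_k$, and $IDF(t_k)$ is a base-10 logarithm of a ratio of $N$ to $n_{t_k}$ with 0.5 smoothing terms (their exact placement is not recoverable from this transcript). The TF-CHI weight replaces the IDF factor with the term score of Eq. (4):

$$w_{ki} = TF(t_{ki}) \times S(t_k) = TF(t_{ki}) \times \max_{1 \le i \le m} \{ \chi^2(t_k, c_i) \} \tag{7}$$

2.5 [Section title lost. The section introduces the SVM and Naive Bayes classifiers.]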
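The improved CHI statistic of Eq. (3), the term score of Eq. (4), and the TF-CHI weight of Eq. (7) can be sketched as follows. This is a reading of the formulas under stated assumptions, not the authors' code: the function names are hypothetical, the counts `a`, `b`, `c`, `d` follow the usual contingency-table convention (in class with term, outside class with term, in class without term, outside class without term), and the factor `(FI + CI + DI)` is assumed to multiply the CHI value, with FI = TF(t_k, c_i)/(A + C), CI = A/(A + B), and DI = A/(A + C).

```python
from typing import Sequence


def improved_chi(a: int, b: int, c: int, d: int, tf_in_class: float) -> float:
    """Improved CHI of Eq. (3) for one term/class pair (assumed reading)."""
    if a * d - b * c <= 0:
        # Negatively correlated or independent terms are discarded.
        return 0.0
    # a*d > b*c >= 0 implies a > 0 and d > 0, so every factor below is non-zero.
    n = a + b + c + d
    base = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))
    fi = tf_in_class / (a + c)  # FI: term frequency in the class per class document
    ci = a / (a + b)            # CI: share of the term's documents that fall in the class
    di = a / (a + c)            # DI: share of the class's documents that contain the term
    return base * (fi + ci + di)


def term_score(per_class_chi: Sequence[float]) -> float:
    """Eq. (4): the score of a term is its largest CHI value over all classes."""
    return max(per_class_chi)


def tf_chi_weight(tf_in_doc: float, per_class_chi: Sequence[float]) -> float:
    """Eq. (7): TF-CHI weight of a term in one document."""
    return tf_in_doc * term_score(per_class_chi)
```

Taking the maximum over classes means a term is kept if it discriminates strongly for any single class, which matters when one class is much smaller than the others.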

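2.5.1 SVM

Given training samples $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i$ is a real-valued feature vector and $y_i \in \{1, -1\}$, the SVM solves

$$\min \left\{ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \zeta_i \right\} \tag{8}$$

$$\text{subject to } y_i [w \cdot x_i + b] \ge 1 - \zeta_i$$

where $C$ is the penalty parameter and $\zeta_i$ are the slack variables. A binary SVM is extended to multiple classes by the one-against-one or the one-against-all strategy [15]. For $k$ classes, one-against-one trains $k(k-1)/2$ binary SVMs and one-against-all trains $k$.

2.5.2 Naive Bayes

For a sample $X = (x_1, x_2, \dots, x_n)$, the posterior probability of class $c_i$ is

$$P(c_i \mid X) = \frac{P(X \mid c_i) \, P(c_i)}{P(X)} \tag{9}$$

Assuming the features are conditionally independent given the class,

$$P(X \mid c_i) = \prod_{k=1}^{n} p(x_k \mid c_i) \tag{10}$$

where $p(x_k \mid c_i)$ is the probability of feature $x_k$ in class $c_i$.

3 [Section title lost. Surviving fragments of Sections 3.1 and 3.2: the data set contains 14 000 reports in three classes of 600, 1 400, and 12 000 reports; the figures 80% and 20% appear in connection with the data split; word segmentation uses ICTCLAS; results are evaluated by precision (P), recall (R), F1, and Macro F1.]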
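Eqs. (9) and (10) and the classifier counts for the two multi-class SVM strategies can be illustrated with a short sketch. The function names, the dictionary layout, and the class and feature labels in the usage note are hypothetical. The sketch also assumes that every conditional probability is strictly positive (for example after smoothing), which the source does not discuss.

```python
import math
from typing import Dict, Iterable


def naive_bayes_posterior(
    priors: Dict[str, float],
    cond_probs: Dict[str, Dict[str, float]],
    features: Iterable[str],
) -> Dict[str, float]:
    """Posterior P(c | X) from Eqs. (9)-(10), computed in log space.

    priors[c] is P(c); cond_probs[c][x] is p(x | c), assumed to be > 0.
    """
    feature_list = list(features)
    log_scores = {}
    for label, prior in priors.items():
        # log P(c) + sum_k log p(x_k | c): the numerator of Eq. (9) via Eq. (10)
        log_scores[label] = math.log(prior) + sum(
            math.log(cond_probs[label][x]) for x in feature_list
        )
    # Subtract the largest log score before exponentiating to avoid underflow;
    # normalizing afterwards plays the role of dividing by P(X).
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


def binary_svm_count(k: int, scheme: str) -> int:
    """Number of binary SVMs needed for k classes under each strategy."""
    if scheme == "one-against-one":
        return k * (k - 1) // 2
    if scheme == "one-against-all":
        return k
    raise ValueError(f"unknown scheme: {scheme}")
```

With two equally likely classes and a feature that is four times as probable in one class as in the other (0.8 against 0.2), observing that feature once yields a posterior of about 0.8 for the first class. For the three classes used in the paper, both strategies need three binary SVMs, so the choice between them does not change the training cost here.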

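[The Chinese body text of this page did not survive extraction. The tables below are rebuilt from the surviving numbers; row labels that were in Chinese are lost, and table numbers are omitted because they cannot be assigned reliably. The experiments vary the weighting scheme (TF, TF-IDF, TF-CHI), the feature dimension (10%, 20%, 40%, 60%), and the per-class sample sizes (600 / 1 400 / 1 500, 600 / 1 400 / 3 000, and 600 / 1 400 / 6 000). The text also gives the figure 11.6% in connection with Macro F1 and compares the results with Pang et al. [5].]

Macro F1 of SVM and Naive Bayes by feature dimension:

Classifier     10%     20%     40%     60%
SVM            0.579   0.633   0.658   0.668
Naive Bayes    0.476   0.501   0.535   0.587

Macro F1 of SVM by weighting scheme and feature dimension:

Weighting      10%     20%     40%     60%
TF             0.579   0.633   0.658   0.668
TF-IDF         0.627   0.634   0.632   0.637
TF-CHI         0.546   0.591   0.624   0.616

Per-class F1 by feature dimension (TF weighting, SVM; class labels lost):

Class          10%     20%     40%     50%     60%
(row 1)        0.521   0.608   0.597   0.606   0.603
(row 2)        0.462   0.52    0.613   0.612   0.629
(row 3)        0.755   0.771   0.763   0.774   0.774

Per-class F1 at 40% feature dimension for three sampling configurations (class and column labels lost; the first column matches the 40% column of the previous table):

Class          (col 1)  (col 2)  (col 3)
(row 1)        0.597    0.569    0.490
(row 2)        0.613    0.475    0.303
(row 3)        0.763    0.896    0.874

Precision, recall, and F1 in % (row labels lost):

P       R       F1
72.8    78.0    75.3
71.2    82.2    76.3

Precision, recall, and F1 in % for two further groups of results (row and group labels lost):

P       R       F1      P       R       F1
63.4    49.7    55.8    74.0    50.1    59.7
62.3    55.6    58.7    64.7    58.2    61.3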
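The reported scores can be cross-checked with the standard definitions of F1 (the harmonic mean of precision and recall) and Macro F1 (the unweighted mean of the per-class F1 values). The sketch below uses hypothetical function names, and the sample values in the usage note are taken from the numbers that survive in this transcript.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_f1: Sequence[float]) -> float:
    """Macro F1 as the unweighted mean of the per-class F1 values."""
    return sum(per_class_f1) / len(per_class_f1)
```

For example, a precision of 72.8% and a recall of 78.0% give an F1 of about 75.3%, and the per-class F1 values 0.597, 0.613, and 0.763 average to about 0.658, which agrees with the Macro F1 reported for SVM with TF weighting at 40% feature dimension. Because Macro F1 weights every class equally, the two small classes count as much as the large one, which is why it suits imbalanced data.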

4 [Section title and Chinese body text lost; the surviving fragment mentions CHI.]

References

[1] Schumaker R P, Zhang Y, Huang C N, et al. Evaluating sentiment in financial news articles [J]. Decision Support Systems, 2012, 53(3): 458-464.
[2] O'Hare N, Davy M, Bermingham A, et al. Topic-dependent sentiment analysis of financial blogs [C]//Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion. New York: ACM, 2009: 9-16.
[3] Yang L G, Zhu J, Tang S P. Survey of text sentiment analysis [J]. Journal of Computer Applications, 2013, 33(6): 1574-1607 (Ch).
[4] Zhao Y Y, Qin B, Liu T. Sentiment analysis [J]. Journal of Software, 2010, 21(8): 1834-1848 (Ch).
[5] Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques [C]//Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. Stroudsburg: Association for Computational Linguistics, 2002: 79-86.
[6] Debole F, Sebastiani F. An analysis of the relative hardness of Reuters-21578 subsets [J]. Journal of the American Society for Information Science and Technology, 2005, 56(6): 584-596.
[7] Wang Z Q, Li S S, Zhu Q M, et al. Chinese sentiment classification on imbalanced data distribution [J]. Journal of Chinese Information Processing, 2012, 26(3): 33-37 (Ch).
[8] Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. A review of feature selection methods on synthetic data [J]. Knowledge and Information Systems, 2013, 34(3): 483-519.
[9] Qiu L Q, Zhao R Y, Zhou G, et al. An extensive empirical study of feature selection for text categorization [DB/OL]. [2014-02-03]. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=4529838.
[10] Drummond C, Holte R C. C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling [C]//Workshop on Learning from Imbalanced Datasets II. Washington D C: ICML, 2003: 11.
[11] Galavotti L, Sebastiani F, Simi M. Experiments on the use of feature selection and negative evidence in automated text categorization [C]//Research and Advanced Technology for Digital Libraries. Berlin, Heidelberg: Springer-Verlag, 2000: 59-68.
[12] Pei Y B, Liu X X. Study on improved CHI for feature selection in Chinese text categorization [J]. Computer Engineering and Applications, 2011, 47(4): 128-130 (Ch).
[13] Xiong Z Y, Zhang P Z, Zhang Y F. Improved approach to CHI in feature extraction [J]. Journal of Computer Applications, 2008, 28(2): 513-514 (Ch).
[14] Debole F, Sebastiani F. Supervised term weighting for automated text categorization [C]//Text Mining and Its Applications. Berlin, Heidelberg: Springer-Verlag, 2004: 81-97.
[15] Wang Z H, Wang Z Q, Li S S, et al. Feature selection for imbalanced sentiment classification [J]. Journal of Chinese Information Processing, 2013, 27(4): 113-118 (Ch).