Information processing for brand reputation management
Iason Demiros, Qualia
Athens University of Economics and Business, Natural Language Processing Group, 04/05/2011
in a nutshell
Qualia tracks, measures and analyzes thousands of online conversations on websites, blogs, microblogs, social networks, social news sites and forums. The company's technologies make this wealth of data searchable, so that clients can understand what is being said, identify threats and opportunities, develop creative strategies and decide what to do next.
09/05/2011 AUEB talk 2
architecture
[Diagram: information sources feed intelligent processing, which supports services.]
Information sources: TV, radio, web news, blogs, microblogs, social media, forums and discussion boards, consumer reviews, user-generated content
Intelligent processing: speech recognition, video text recognition, natural language processing, topic detection, names/brands/locations, influencers, terms & keywords, conceptual search, opinion
Services: reputation management, buzz monitoring, analytics, actionable intelligence, data and text mining, search, text analytics
information retrieval
Profile = a set of entities
Entity = a set of keywords, terms, names, phrases and AND/OR/NOT rules
Word sense disambiguation (WSD) based on a large text corpus
Cross-media indexing and weighted boolean search across all media: speech and on-screen text recognition for video (TV), speech recognition for radio, HTML indexing for the web
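Below is a minimal sketch of evaluating an entity definition against a document. It assumes a hypothetical rule format: a term string, or a nested tuple whose first item is "AND", "OR" or "NOT". It also assumes hypothetical per-term weights. The boolean rule decides whether the entity is mentioned at all, and the weights then score the match. The sketch handles single-word terms only and does no WSD or Greek lemmatization, so it is an illustration rather than aino's rule engine.

```python
import re

# Hypothetical rule format: either a term (str) or a tuple
# ("AND" | "OR" | "NOT", subrule, ...). NOT takes exactly one subrule.


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; Unicode-aware, so Greek words are kept."""
    return set(re.findall(r"\w+", text.lower()))


def matches(rule, tokens: set[str]) -> bool:
    """Evaluate an AND/OR/NOT rule tree against a document's tokens."""
    if isinstance(rule, str):
        return rule.lower() in tokens
    op, *args = rule
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op!r}")


def weighted_score(rule, weights: dict[str, float], text: str) -> float:
    """0.0 if the rule fails; otherwise the summed weights of matched terms."""
    tokens = tokenize(text)
    if not matches(rule, tokens):
        return 0.0
    return float(sum(w for term, w in weights.items() if term.lower() in tokens))


# Hypothetical entity: the brand "qualia"/"aino", excluding philosophy texts.
entity_rule = ("AND", ("OR", "qualia", "aino"), ("NOT", "philosophy"))
entity_weights = {"qualia": 1.0, "aino": 0.5}
```

The boolean filter is kept separate from the weighting. A document therefore either qualifies for an entity or not, and only qualifying documents are ranked.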
television & radio
[Screenshot: TV selected as the media source; TV results.]
speech recognition
Large-vocabulary ASR for Greek broadcast news
Audio extraction from video broadcasts
Audio segmentation (music, noise): keep speech segments only
Speech transcription in XML format with time stamps
Continuous training and lexicon enrichment with named entities (NEs) and terms
Success depends on the environment.
[Diagram: QUALIA ASR: resources, segmentation, feature extraction, training, recognition with HMM models, XML output.]
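Recognition with HMM models amounts to finding the most likely hidden state sequence for the observed features. The sketch below illustrates that decoding step; it is a toy, not the QUALIA recognizer. It runs the Viterbi algorithm in log space over a two-state discrete HMM whose state names, observation symbols and probabilities are all invented. Real ASR uses continuous acoustic features and far larger models.

```python
import math


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state path and its log-probability for a discrete HMM."""
    # delta[s]: best log-probability of any path ending in state s so far
    delta = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backptrs = []  # one dict per step after the first: state -> best predecessor
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            delta[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o])
            ptr[s] = best
        backptrs.append(ptr)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]


# Toy model: hidden "speech"/"music" states emitting abstract symbols "a"/"b"
states = ("speech", "music")
start_p = {"speech": 0.6, "music": 0.4}
trans_p = {"speech": {"speech": 0.8, "music": 0.2},
           "music": {"speech": 0.2, "music": 0.8}}
emit_p = {"speech": {"a": 0.9, "b": 0.1},
          "music": {"a": 0.2, "b": 0.8}}

path, log_prob = viterbi(["a", "a", "b", "b"], states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow when the observation sequence is long.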
video text recognition
[Figure: signal plots of a typical text area and a typical non-text area.]
Model the quasi-periodicity and use it to discriminate between the two classes.
Text areas -> OCR -> index
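The slide does not say how quasi-periodicity is measured. One plausible measure, assumed here, is the peak of the normalized autocorrelation of a 1-D profile at non-zero lags. Regularly spaced character strokes should score higher than irregular background. The threshold, the lag range and the toy profiles below are hypothetical.

```python
import math
import random


def periodicity_score(profile, min_lag=2):
    """Peak normalized autocorrelation over lags min_lag .. len/2 - 1.

    Returns a value <= 1; values near 1 indicate a strongly periodic profile.
    """
    n = len(profile)
    if n == 0:
        return 0.0
    mean = sum(profile) / n
    x = [v - mean for v in profile]
    energy = sum(v * v for v in x)
    if energy == 0:
        return 0.0
    # Biased estimate: each lag is divided by the zero-lag energy,
    # so longer lags (fewer overlapping samples) are naturally damped.
    scores = [sum(x[i] * x[i + k] for i in range(n - k)) / energy
              for k in range(min_lag, n // 2)]
    return max(scores, default=0.0)


def is_text_area(profile, threshold=0.5):
    """Hypothetical decision rule: text if the profile is periodic enough."""
    return periodicity_score(profile) >= threshold


# Toy profiles: regularly spaced strokes vs. random background texture
text_like = [math.sin(2 * math.pi * i / 10) for i in range(100)]
rng = random.Random(0)
background = [rng.gauss(0.0, 1.0) for _ in range(100)]
```

Regions that pass this test would then be handed to OCR and indexed.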
online news & social indexing
aino crawls and indexes all major Greek news sites and blogs, plus selected information sources from around the world. aino also tracks the millions of conversations taking place on Twitter, Facebook, YouTube, Google Buzz and other social networks.
social media buzz
[Chart: social media buzz for the Thessaloniki municipal election candidates.]
daily topics
All daily news topics are computed and measured automatically in real time. Topics are ranked using various impact metrics, and topic search is provided. All topics related to the monitored entities are identified. Filtering and measurement by category are also provided.
[Screenshot callouts: date, category and entity filters; topic search; impact metrics; topic measurement; topic summary; topic cloud.]
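The slide does not say which impact metrics aino uses or how they are combined. The minimal sketch below assumes three hypothetical per-topic counts and hypothetical weights. It ranks topics by a weighted sum of min-max normalized metrics.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    label: str             # must be unique among the ranked topics
    articles: int          # news items clustered into the topic
    sources: int           # distinct outlets covering it
    social_mentions: int   # social media posts referring to it


# Hypothetical weights: the slide does not say how aino combines its metrics.
WEIGHTS = {"articles": 0.4, "sources": 0.4, "social_mentions": 0.2}


def impact_scores(topics):
    """Weighted sum of min-max normalized metrics; each score lies in [0, 1]."""
    if not topics:
        return {}
    scores = {t.label: 0.0 for t in topics}
    for metric, weight in WEIGHTS.items():
        values = [getattr(t, metric) for t in topics]
        lo, hi = min(values), max(values)
        for t, v in zip(topics, values):
            # A metric with no spread carries no ranking information
            norm = (v - lo) / (hi - lo) if hi > lo else 0.0
            scores[t.label] += weight * norm
    return scores


def rank_topics(topics):
    """Topics sorted from highest to lowest impact score."""
    scores = impact_scores(topics)
    return sorted(topics, key=lambda t: scores[t.label], reverse=True)
```

Min-max normalization puts counts of very different magnitudes (articles vs. social posts) on a common scale before weighting. It is relative to the day's topics, so scores are comparable within a day but not across days.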
impact on daily news
Compare the media image of entities over a given period using the total value of the aino impact metric.
Compare each entity's media image with that of the day's top topic (the headline) using the aino impact metric.
opinion
Our goal is to measure the opinion expressed in a text, regardless of whom it concerns.
Lexical and morphosyntactic analysis
Lexical resources with polarity
Rules
We compute two scores for each text: positive and negative.
An overall score for the whole text, normalized to the interval [-1, +1]
We are working on opinion measurement focused on a specific entity.
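Below is a minimal lexicon-and-rules sketch of the two-score scheme. The tiny English lexicon, the single negation rule and the normalization (pos - neg) / (pos + neg) are assumptions for illustration. The slide states only that positive and negative scores are computed and that the overall score lies in [-1, +1]. The morphosyntactic analysis of Greek that the real system relies on is omitted.

```python
import re

# Hypothetical polarity lexicon; a real resource would be Greek and lemmatized.
LEXICON = {"excellent": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}


def opinion_scores(text):
    """Return (positive, negative, overall), with overall in [-1, +1]."""
    tokens = re.findall(r"\w+", text.lower())
    pos = neg = 0.0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0.0)
        # Example rule: a negator right before a polarity word flips its sign
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        if polarity > 0:
            pos += polarity
        elif polarity < 0:
            neg -= polarity
    total = pos + neg
    overall = (pos - neg) / total if total > 0 else 0.0
    return pos, neg, overall
```

Both scores are non-negative, so |pos - neg| <= pos + neg and the ratio always stays within [-1, +1].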
Elections analysis
[Dashboard panels: opinion polarity; pleasure, excitement; buzz; topics.]
rank
[Bubble chart: ranking influencers. x-axis: inner-network influence; y-axis: rank; each bubble is a Twitter account.]
Inner-network influence is a metric that computes the influence of a user within her personal network.
Rank is a combined metric that represents the user's popularity and authority within the global Twitter network.
The sphere diameter represents the number of the user's tweets that are related to an entity.
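The slide describes rank as combining popularity and authority within the global Twitter network but gives no formula. The sketch below illustrates only an authority-style component by swapping in PageRank, computed on a hypothetical retweet/mention graph. It is not aino's rank metric.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over directed (src, dst) edges.

    An edge u -> v is read as "u endorses v" (e.g. u retweets or mentions v).
    Returns a dict of scores that sums to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            # Dangling accounts (no outgoing endorsements) spread rank evenly
            targets = out[u] if out[u] else nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank
```

An inner-network measure would instead look only at a user's own followers and neighbours. A popularity signal such as follower count could then be blended with this global score.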
topics of discussion
Morphological, syntactic and statistical analysis of the texts
Noun phrase recognition
We filter out low-frequency candidates
Xuerui Wang, Andrew McCallum, Xing Wei. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), 2007.
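Below is a minimal sketch of the noun-phrase and frequency-filtering steps. It assumes the text has already been POS-tagged upstream with a hypothetical tag set (ADJ, NOUN, ...). It extracts maximal ADJ* NOUN+ spans and keeps only phrases that reach a hypothetical minimum count. It does not implement the topical n-gram model of Wang et al.

```python
from collections import Counter


def noun_phrases(tagged):
    """Maximal ADJ* NOUN+ spans from a list of (word, tag) pairs, lowercased."""
    phrases, adjs, nouns = [], [], []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last span
        if tag == "NOUN":
            nouns.append(word.lower())
        elif tag == "ADJ" and not nouns:
            adjs.append(word.lower())
        else:
            if nouns:
                phrases.append(" ".join(adjs + nouns))
            # An adjective after a noun run may start the next phrase
            adjs = [word.lower()] if tag == "ADJ" else []
            nouns = []
    return phrases


def frequent_phrases(tagged_docs, min_count=2):
    """Count noun phrases across documents and drop low-frequency ones."""
    counts = Counter(p for doc in tagged_docs for p in noun_phrases(doc))
    return {p: c for p, c in counts.items() if c >= min_count}
```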
topics of discussion: Greeks and food
When do Greeks talk about food on social media?
R&D topics
User profile: personalization, semantic filtering, social signals, ontologies
Opinion analysis: emotion, computation for a specific entity
Topics of discussion: fact/scenario extraction, who/what/where elements, NP and term filtering
Influencers: graph analysis
Good ideas are always welcome!
Instruments: undergraduate and postgraduate theses, internships, FP7, NSRF (ΕΣΠΑ), technology transfer
Get in touch!
W: www.qualia.gr
F: facebook.com/ainoqualia
T: twitter.com/ainoqualia
B: ainoqualia.tumblr.com/
P: +30-210-6202811