Monitoring Rare Categories in Sentiment and Opinion Analysis - Expo Milano 2015 on Twitter Platform.
Code:
02/2017
Title:
Monitoring Rare Categories in Sentiment and Opinion Analysis - Expo Milano 2015 on Twitter Platform.
Date:
Friday 13th January 2017
Author(s):
Arena, M.; Calissano, A.; Vantini, S.
Abstract:
This paper proposes a new aggregated classification scheme aimed
at supporting the implementation of text analysis methods in contexts
characterised by the presence of rare text categories. The proposed
approach starts from the aggregate supervised text classifier developed
by Hopkins and King and moves forward relying on rare event sampling
methods. In detail, it enables the analyst to enlarge the number
of text categories whose proportions can be estimated while preserving the
estimation accuracy of standard aggregate supervised algorithms and
reducing the working time compared with unconditionally increasing the size
of the random training set. The approach is applied to study the daily
evolution of the web reputation of Expo Milano 2015, before, during
and after the event. The data set consists of about 900,000
tweets in Italian and 260,000 tweets in English, posted about the event
between March 2015 and December 2015. The analysis provides an
interesting portrait of the evolution of Expo stakeholders’ opinions over
time and allows the main drivers of Expo reputation to be identified. The
algorithm will be implemented as a running option of the next release
of R package ReadMe.
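
As a rough illustration of what an aggregate supervised classifier estimates, the Python sketch below recovers category proportions in an unlabelled collection without classifying individual texts. It is a simplification and not the authors' algorithm or the ReadMe API: it assumes each text is a vector of binary word-stem indicators, matches feature means rather than full word-stem profiles, and leaves out the rare event sampling step. All function names are hypothetical.

```python
# Minimal sketch (assumption: texts are lists of 0/1 word-stem indicators).
# Idea: mean(x) in the unlabelled set ~ sum_j P(D = j) * mean(x | D = j),
# where the conditional means come from a hand-coded training set.

def class_feature_means(docs, labels, n_classes):
    """Estimate the mean feature vector of each category from labelled texts.

    Assumes every category appears at least once in the training set.
    """
    n_feat = len(docs[0])
    sums = [[0.0] * n_feat for _ in range(n_classes)]
    counts = [0] * n_classes
    for x, y in zip(docs, labels):
        counts[y] += 1
        for f, v in enumerate(x):
            sums[y][f] += v
    return [[s / counts[j] for s in sums[j]] for j in range(n_classes)]


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(cond_means, target_means, steps=2000):
    """Least-squares fit of category proportions by projected gradient descent.

    cond_means[j] is the mean feature vector of category j;
    target_means is the mean feature vector of the unlabelled collection.
    """
    k = len(cond_means)
    # Step size from the squared Frobenius norm, which bounds the
    # Lipschitz constant of the gradient, so each step is safe.
    lr = 1.0 / sum(c * c for row in cond_means for c in row)
    p = [1.0 / k] * k
    for _ in range(steps):
        resid = [
            sum(p[j] * cond_means[j][f] for j in range(k)) - m
            for f, m in enumerate(target_means)
        ]
        grad = [sum(r * c for r, c in zip(resid, cond_means[j])) for j in range(k)]
        p = project_to_simplex([pj - lr * g for pj, g in zip(p, grad)])
    return p


if __name__ == "__main__":
    # Toy conditional means for three hypothetical categories.
    cond = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.2, 0.2, 0.9]]
    true_p = [0.70, 0.25, 0.05]  # the third category is rare
    target = [sum(true_p[j] * cond[j][f] for j in range(3)) for f in range(3)]
    print(estimate_proportions(cond, target))
```

In this simplified setting, a rare category is hard to estimate because few hand-coded texts fall into it under random sampling, so its conditional means are noisy; this is the kind of situation in which enlarging the training set selectively, rather than unconditionally, becomes attractive.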