Combining Rules and CRF Learning for Opinion Source Identification in Spanish Texts

Type

Journal article

Year

2012

Publisher

Springer Berlin Heidelberg

Pages

452

Volume

7637

Abstract

In this work we present a system for the automatic annotation of opinions in Spanish texts. We focus mainly on the definition of a TFS-style model for opinion predicates and their arguments, on the creation of a lexicon of opinion predicates, and on two additional variants for identifying the source of opinions. The original system extracts opinions and all their elements (predicate, source, topic and message) using hand-coded rules. The first variant uses a CRF model to learn the source, assuming that the predicate is already tagged. The second variant is a combined version, in which the source recognized by the rule-based system is added as an additional attribute for training the CRF model. We found that this hybrid system performs better than either system evaluated separately. This work involved the construction of several resources for Spanish: a lexicon of opinion predicates, a 13,000-word corpus with complete opinion annotations and a 40,000-word corpus with annotations of opinion predicates and sources.

Authors

Jean-Luc Minel

Aiala Rosá

Dina Wonsever

Néstor D. Duque-Méndez

Juan Pavón

Rubén Fuentes-Fernández

Citekey

citeulike:12275311

doi

10.1007/978-3-642-34654-5_46

Keywords