Temporal Tagging
WhenTheFact
WhenTheFact is an event extractor for legal texts. It identifies events together with their type, date and subject. Additionally, it generates a timeline for judgments of the European Court of Human Rights. A demo is available online, and the work is described in a paper accepted at JURIX 2020, which also presents other approaches and an annotated corpus.
Filtz, E., Navas-Loro, M., Santos, C., Polleres, A., Kirrane, S. (2020). Events Matter: Extraction of Events from Court Decisions. In: Serena Villata, Jakub Harašta, Petr Křemen (eds) Frontiers in Artificial Intelligence and Applications, vol 334. IOS Press. JURIX 2020. pp. 33–42. doi: 10.3233/FAIA200847.
Añotador and Hourglass corpus
Añotador is a temporal tagger for English and Spanish that finds and normalizes time expressions such as dates, durations, times and sets. The Hourglass corpus is a dataset of annotated short texts developed for testing temporal taggers in Spanish. Information on both resources is available on the Añotador website, and a paper describing them has been published in a Special Issue of the Journal of Intelligent and Fuzzy Systems.
Navas-Loro, M., Rodríguez-Doncel, V. (2020). Annotador: a Temporal Tagger for Spanish. Journal of Intelligent & Fuzzy Systems 39(2), 1979–1991. doi: 10.3233/JIFS-179865.
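To make "finding and normalizing" time expressions concrete, the following is a minimal sketch of the idea, not Añotador's implementation. It assumes ISO 8601-style normalized values in the spirit of the TimeML TIMEX3 convention, handles only two toy English patterns, and every name in it (`MONTHS`, `DATE_RE`, `DURATION_RE`, `tag`) is hypothetical.

```python
import re

# Hypothetical toy rules; a real temporal tagger covers far more patterns
# (relative dates, times of day, sets such as "every Monday", Spanish, etc.).
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}

# "12 December 2018"
DATE_RE = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b", re.IGNORECASE
)
# "3 days", "2 years"
DURATION_RE = re.compile(r"\b(\d+) (day|week|month|year)s?\b", re.IGNORECASE)


def tag(text):
    """Return (expression, type, normalized value) triples in text order."""
    found = []
    for m in DATE_RE.finditer(text):
        day = int(m.group(1))
        month = MONTHS[m.group(2).lower()]
        year = int(m.group(3))
        value = f"{year:04d}-{month:02d}-{day:02d}"
        found.append((m.start(), m.group(0), "DATE", value))
    for m in DURATION_RE.finditer(text):
        value = f"P{int(m.group(1))}{UNIT_CODES[m.group(2).lower()]}"
        found.append((m.start(), m.group(0), "DURATION", value))
    # Sort by position in the text, then drop the position.
    return [item[1:] for item in sorted(found)]


if __name__ == "__main__":
    for triple in tag("The hearing on 12 December 2018 lasted 3 days."):
        print(triple)
```

The point of the sketch is the separation between detection (the regular expressions) and normalization (mapping the surface form to a machine-readable value); the actual rules and output format of Añotador are documented on its website.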
TempCourt
TempCourt is the first corpus of legal documents annotated with temporal expressions. It is the result of a collaboration with the Wirtschaftsuniversität Wien and has been published in the Knowledge Engineering Review. More information is available on its website.
Navas-Loro, M., Filtz, E., Rodríguez-Doncel, V., Polleres, A., Kirrane, S. (2019). TempCourt: Evaluation of Temporal Taggers on a new Corpus of Court Decisions. The Knowledge Engineering Review 34 (2019) e24. doi: 10.1017/S0269888919000195.
Analysis of events in the legal domain
In collaboration with Cristiana Santos, we carried out a first analysis of events in the legal domain. This work was presented at TeReCom 2018, and both the slides and the video are available.
Navas-Loro, M., Santos, C. (2018). Events in the legal domain: first impressions. In: Proceedings of the 2nd Workshop on Technologies for Regulatory Compliance co-located with the 31st International Conference on Legal Knowledge and Information Systems (JURIX 2018), Groningen, The Netherlands, December 12, 2018. pp. 45–57.
LawORDate
LawORDate is a web service that temporarily replaces the legal references in a Spanish text to facilitate its temporal annotation. The presentation given at TeReCom 2017 is available, and the service can be used online.
Navas-Loro, M. (2017). LawORDate: a Service for Distinguishing Legal References from Temporal Expressions. In: Proceedings of TeReCom 2017: Workshop on Technologies for Regulatory Compliance at JURIX (TeReCom 2017).
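The idea of replacing legal references before temporal annotation and putting them back afterwards can be sketched as follows. This is only an illustration under stated assumptions: the reference pattern, the `LEGALREF` placeholder format and the function names are hypothetical, and the source does not describe how LawORDate actually performs the replacement.

```python
import re

# Hypothetical pattern for references such as "Ley 39/2015" or
# "Real Decreto 1065/2007"; real Spanish legal citations are more varied.
LEGAL_REF_RE = re.compile(r"\b(?:Ley Orgánica|Real Decreto|Ley)\s+\d+/\d{4}\b")


def mask_references(text):
    """Replace each legal reference with a placeholder; keep the originals."""
    originals = []

    def _swap(match):
        originals.append(match.group(0))
        return f"LEGALREF{len(originals) - 1}"

    return LEGAL_REF_RE.sub(_swap, text), originals


def restore_references(text, originals):
    """Put the original legal references back in place of the placeholders."""
    return re.sub(r"LEGALREF(\d+)", lambda m: originals[int(m.group(1))], text)


if __name__ == "__main__":
    text = "La Ley 39/2015 entró en vigor el 2 de octubre de 2016."
    masked, refs = mask_references(text)
    print(masked)  # the year inside the law number is no longer visible
    # A temporal tagger would be run on `masked` at this point.
    print(restore_references(masked, refs))
```

With the reference masked, a temporal tagger no longer risks reading the "2015" in the law number as a date, while the genuine date in the sentence remains untouched.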
ContractFrames
ContractFrames is a framework created with the National Institute of Informatics in Tokyo to detect contract-related events in English texts. The software is available on GitHub, and the data model is also available online.
Navas-Loro, M., Satoh, K., Rodríguez-Doncel, V. ContractFrames: Bridging the Gap Between Natural Language and Logics in Contract Law. K. Kojima et al. (Eds.): JSAI-isAI 2018, LNAI 11717, pp. 1–14, 2019. https://doi.org/10.1007/978-3-030-31605-1_9
Sentiment Analysis
State of the art in corpora for Sentiment Analysis in Spanish
In 2019 we published a survey of the corpora available in Spanish for Sentiment Analysis, in which 20 different resources from various domains were analysed.
Navas-Loro, M., Rodríguez-Doncel, V. (2019). “Spanish corpora for sentiment analysis: a survey”. In: Language Resources and Evaluation. issn: 1574-0218. doi: 10.1007/s10579-019-09470-8.
Corpus SAB/MAS
Within the field of Emotion Analysis, we built a Spanish corpus of tweets expressing emotions towards specific products and brands. This corpus, called Spanish Corpus for Sentiment Analysis towards Brands (SAB), is annotated following a taxonomy of four emotions and their direct opposites, plus a neutral category. It is published as Linked Data, and a vocabulary has also been developed for it. It was later extended into the MAS corpus, which adds two categories to each tweet: Marketing Mix (the four Ps of marketing: Price, Place, Product and Promotion) and Purchase Funnel (the point of the purchase journey at which the opinion is given).
Navas-Loro, M., Rodríguez-Doncel, V., Santana, I., Fernández-Izquierdo, A., Sánchez, A. MAS: A Corpus of Tweets for Marketing in Spanish. In: The Semantic Web: ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018. Ed. by Aldo Gangemi, Anna Lisa Gentile, Andrea Giovanni Nuzzolese, Sebastian Rudolph, Maria Maleshkova, Heiko Paulheim, Jeff Z Pan and Mehwish Alam. Cham: Springer International Publishing, pp. 363–375. isbn: 978-3-319-98192-5. doi: 10.1007/978-3-319-98192-5_53
Navas-Loro, M., Rodríguez-Doncel, V., Santana, I., Sánchez, A. Spanish Corpus for Sentiment Analysis towards Brands. In: Proceedings of Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017. Ed. by Alexey Karpov, Rodmonga Potapova, and Iosif Mporas. Cham: Springer International Publishing, pp. 680–689. isbn: 978-3-319-66429-3. doi: 10.1007/978-3-319-66429-3_68
Resources: SAB vocabulary, SAB corpus, MAS corpus