A Crowd-Annotated Spanish Corpus for Humor Analysis

Santiago Castro, Luis Chiruzzo, Aiala Rosá, Diego Garat, and Guillermo Moncecchi

Grupo de Procesamiento de Lenguaje Natural (NLP Group), Universidad de la República — Uruguay

A crowd-annotated corpus of 27k tweets written in Spanish, each labeled for humor and rated for funniness on a scale from 1 to 5, created for humor analysis and natural language processing research.



The dataset consists of the following 2 files:

Aggregated version (one row per tweet with the sum of annotations for each category): All Annotations by Tweet CSV

Aggregated version, excluding the annotations from people who did not pass the test tweets: Annotations by Tweet CSV. This is the version used in the HAHA task on Humor Recognition and Funniness Detection.
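
Since both files hold one row per tweet with a count of annotations per category, a small script can turn those counts into a humor label and an average funniness score. The sketch below is only illustrative: the column names (`id`, `not_humor`, `stars_1` to `stars_5`) and the sample rows are hypothetical, so check the header of the actual CSV first. The majority-vote rule is likewise an assumption, not necessarily the aggregation used by the authors or by the HAHA task.

```python
import csv
import io

# Hypothetical column names; check the header of the actual CSV before use.
NOT_HUMOR = "not_humor"
STAR_COLUMNS = ["stars_1", "stars_2", "stars_3", "stars_4", "stars_5"]

# Made-up sample rows in the assumed layout (not real data from the corpus).
SAMPLE_CSV = """id,not_humor,stars_1,stars_2,stars_3,stars_4,stars_5
t1,1,0,1,2,1,0
t2,4,1,0,0,0,0
"""


def summarize(row):
    """Return (humor_votes, total_votes, average_funniness) for one tweet row."""
    star_counts = [int(row[column]) for column in STAR_COLUMNS]
    humor_votes = sum(star_counts)
    total_votes = humor_votes + int(row[NOT_HUMOR])
    if humor_votes:
        # Weighted mean of the star ratings, over annotators who voted "humor".
        weighted = sum(stars * n for stars, n in enumerate(star_counts, start=1))
        average = weighted / humor_votes
    else:
        average = None
    return humor_votes, total_votes, average


def load_summaries(file_obj):
    """Map each tweet id to its summary tuple."""
    return {row["id"]: summarize(row) for row in csv.DictReader(file_obj)}


summaries = load_summaries(io.StringIO(SAMPLE_CSV))
for tweet_id, (humor_votes, total_votes, average) in summaries.items():
    # Assumed rule: a tweet counts as humorous if most annotators said so.
    is_humor = humor_votes > total_votes / 2
    print(tweet_id, is_humor, average)
```

To run this on the real data, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle for the downloaded CSV and adjust the column names to match its header.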


If you publish work that uses this dataset, please cite as follows:
  title={A Crowd-Annotated Spanish Corpus for Humor Analysis},
  author={Castro, Santiago and Chiruzzo, Luis and Ros{\'a}, Aiala and Garat, Diego and Moncecchi, Guillermo},
  booktitle={Proceedings of SocialNLP 2018, The 6th International Workshop on Natural Language Processing for Social Media},


SocialNLP 2018 @ ACL slides