While Humor Understanding has been the focus of many shared tasks, Humor Generation remains an even more challenging and largely unexplored frontier. The HumorGen task at SemEval 2026 is the first of its kind, dedicated to advancing the state of the art in computational humor generation. We invite participants to develop systems capable of generating genuinely funny content under a variety of constraints.
Our goal is to push models beyond memorization and towards true humorous creativity. By using carefully designed constraints, we aim to ensure fairness in evaluation and encourage the generation of novel jokes. This task has significant implications for more engaging conversational AI, creative writing tools, and a deeper understanding of the complex nature of humor itself.

News
- July 15th, 2025: This website was created and the sample data was released.
Subtasks
Please Note:
The constraints and subtask conditions listed below are still subject to change.
Subtask A: Text-based Humor Generation
Given a set of text-based constraints, generate a joke. This subtask will be conducted in English, Spanish, and Chinese.
Constraints:
Each generated joke must satisfy a combination of three constraints, designed to make it difficult to simply retrieve existing jokes from the web (a small validation sketch follows the list below). Constraints include:
- Word Inclusion: Must contain a specific word (from a list of 100 words).
- Style: Must adhere to a specific style (Dialogue, Anecdote, Q&A, or Archaic Language).
- Topic: Must be about a specific topic (Sports, Tech, Travel, Education, Work, Daily Life, or Entertainment).
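For illustration, the snippet below shows how a submission pipeline might read constraint records and automatically verify the Word Inclusion constraint; the Style and Topic constraints are judged by humans and are not checked here. The JSONL field names (`word`, `style`, `topic`) are assumptions made for this sketch, not the official schema; consult the released sample data for the actual format.

```python
import json
import re

def load_constraints(path: str) -> list[dict]:
    """Read one constraint record per line from a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def satisfies_word_inclusion(joke: str, word: str) -> bool:
    """True if the required word appears as a whole word (case-insensitive).

    Note: whole-word matching is a simplification; for Chinese, a plain
    substring check may be more appropriate.
    """
    return re.search(rf"\b{re.escape(word)}\b", joke, flags=re.IGNORECASE) is not None

# Example with a made-up constraint record (field names are hypothetical):
constraint = {"word": "umbrella", "style": "Q&A", "topic": "Travel"}
joke = "Q: Why did the umbrella skip the beach trip? A: It only works on rainy days."
print(satisfies_word_inclusion(joke, constraint["word"]))  # True
```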
Subtask B: Multimodal Humor Generation with Images
This subtask explores humor in a multimodal context, combining visual inputs with text generation. It is conducted in English only.
Image-Based Caption Generation
Given an image, generate a free-form humorous caption (max 20 words) that enhances its comedic effect.
Data and Resources
No labels
In line with the task's focus on genuine generation over memorization, and given the diversity of humor and the difficulty of evaluating jokes, we will not provide labeled data; we will provide only inputs. Participants are encouraged to use any publicly available data, pre-trained models, APIs, or rule-based systems; a minimal baseline sketch is shown below.
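As one illustration of the kind of system participants might build, here is a minimal prompt-based baseline sketch. It assumes the OpenAI Python client (openai>=1.0), an API key in the environment, and a placeholder model name; any other hosted API, open-weight model, or rule-based generator could be substituted.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_joke(word: str, style: str, topic: str, language: str = "English") -> str:
    """Generate a joke that targets the three Subtask A constraints."""
    prompt = (
        f"Write an original joke in {language}.\n"
        f"- It must contain the word '{word}'.\n"
        f"- It must be written as a {style}.\n"
        f"- It must be about {topic}.\n"
        "Return only the joke."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response.choices[0].message.content.strip()

print(generate_joke("umbrella", "Q&A", "Travel"))
```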
Sample Data
To help participants understand the task format and constraints, we are providing a sample dataset. We provide 100 samples for Subtask A, along with a Spanish version. For Subtask B, we provide 20 samples; note that these currently use the same constraint format as Subtask A rather than the conditions specified above for Subtask B.
- Sample Data for Subtask A (JSONL)
- Sample Data for Subtask A in Spanish (JSONL)
- Sample Data for Subtask B (JSONL)
Evaluation
Evaluation will be based on human preference judgments. We will use a pairwise comparison setup ("battle"), where annotators choose the funnier of two generated texts produced under the same conditions. Annotations will be crowdsourced through a web interface inspired by Chatbot Arena, open to anyone on the Internet. Systems will be ranked on an Elo-based leaderboard.
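For intuition, a single "battle" feeds into the leaderboard roughly as in the classic Elo update sketched below. The initial rating of 1000 and the K-factor of 32 are illustrative assumptions; the official ranking procedure may use different parameters or aggregate comparisons differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that system A beats system B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Example: two systems start at 1000; A wins one battle.
print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```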
Important Dates
- "Training" data release: September 1, 2025
- Evaluation period starts: January 10, 2026
- Evaluation period ends: January 31, 2026
- System description paper submission: February 28, 2026
- Notification of acceptance: March 31, 2026
- Camera-ready papers due: April 30, 2026
- SemEval 2026 Workshop: July 2026
All dates are 23:59 UTC-12h ("anywhere on Earth").
Organizers
- Universidad de la República / Netflix
- Universidad de la República
- University of Edinburgh
- Universidad de la República
- University of Michigan
- University of Michigan
- Netflix
- Universidad de la República
- Universidad de la República
- Universidad de la República
- Universidad de la República
Contact
For all inquiries, please join our official Google Group or email the organizers.