Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of its (un)translatability. Wordplay is a common source of humour due to its attention-getting and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation, or the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation.
We invite you to submit both automatic and manual runs! Manual intervention should be reported.
Sign up at the CLEF website (https://clef2022-labs-registration.dei.unipd.it/). All team members should join the JOKER mailing list (https://groups.google.com/u/4/g/joker-project). After registration, you will receive an email with information on how to get access to the data.
The data is split into three folders corresponding to the shared tasks. Each task folder is further split into train and test data.
Metadata will be made available when the participants’ results are published.
Participants should put their run results into the folder Documents created for their user and submit them by email to contact@joker-project.com.
The email subject has to be in the format [CLEF TASK <NUMBER>] TEAM_ID.
Runs should be submitted as a ZIP archive of the corresponding JSON files. Manual runs may be submitted in CSV format.
A confirmation email will be sent within 2 days after the submission deadline.
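As a sketch of the packaging step, the following Python builds such a ZIP archive in memory with the standard library (the run name, file name, and record are hypothetical, for illustration only):

```python
import io
import json
import zipfile

# Hypothetical run name and a minimal record, for illustration only.
records = [{"RUN_ID": "team1_task_2_run1", "MANUAL": 0,
            "id": "noun_1", "en": "Ambipom", "fr": "Capidextre"}]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # One JSON file per run inside the archive.
    zf.writestr("team1_task_2_run1.json", json.dumps(records, ensure_ascii=False))

# buf.getvalue() now holds the ZIP bytes; write them to a file to attach to the email.
```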
Train data format: a list of classified wordplay instances in JSON format, or a CSV file for manual runs, with the following fields:
ID: a unique wordplay identifier
WORDPLAY: wordplay text
LOCATION: word(s) forming the wordplay, e.g. ambiguous words
INTERPRETATION: explanation of the wordplay
HORIZONTAL/VERTICAL: co-presence of the source and target of the wordplay. In horizontal wordplay, both the source and the target are given (ex. 1: “They’re called lessons because they lessen from day to day”); in vertical wordplay, source and target are collapsed into a single occurrence (ex. 2: “How do you make a cat drink? Easy: put it in a liquidizer”)
MANIPULATION_TYPE: Identity (source and target are formally identical, as in ex. 2 above); Similarity (source and target are not perfectly identical, but the resemblance is obvious, as in ex. 1 above); Permutation (the textual material is given a new order, as in anagrams or spoonerisms; ex. 3: “Dormitory = dirty room”); Abbreviation (an ad-hoc category for textual material whose initials form another meaning, as in acrostics or “funny” acronyms; ex. 4: “BRAINS: Biobehavioral Research Awards for Innovative New Scientists”)
MANIPULATION_LEVEL: most wordplay involves some kind of phonological manipulation, which is why Sound is the default category. Examples 1 and 2 involve a clear sound similarity (ex. 1) or identity (ex. 2). Only if this category cannot be applied should you look for another level of manipulation. First consider whether the manipulation involves Writing (as in ex. 3 and 4). If neither Sound nor Writing is manipulated, specify the level as Other. This level may arise, for instance, in chiasmus (ex. 5: “We shape our buildings, and afterwards our buildings shape us”)
CULTURAL_REFERENCE: a binary (True/False) category; to understand some instances of wordplay, one has to be aware of extra-linguistic factors
CONVENTIONAL_FORM: a binary (True/False) category, e.g. Tom Swifty (wellerism), Monsieur et Madame… ont un fils
OFFENSIVE (not evaluated): some wordplay instances are marked as offensive
Example:
[{"ID":"noun_1063","WORDPLAY":"Elimentaler","LOCATION":"Elimentaler","INTERPRETATION":"Emmental (cheese) + Eliminator","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null},{"ID":"pun_341","WORDPLAY":"Geologists can be sedimental about their work.","LOCATION":"sedimental","INTERPRETATION":"sentimental\/sediment","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null}]
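The train data can be read with Python's standard library; a minimal sketch that loads the two example records above (abridged to a few fields) and tallies their manipulation types:

```python
import json
from collections import Counter

# The two records from the example above, abridged to a few fields.
train_json = """[
  {"ID": "noun_1063", "WORDPLAY": "Elimentaler",
   "MANIPULATION_TYPE": "Similarity", "MANIPULATION_LEVEL": "Sound"},
  {"ID": "pun_341", "WORDPLAY": "Geologists can be sedimental about their work.",
   "MANIPULATION_TYPE": "Similarity", "MANIPULATION_LEVEL": "Sound"}
]"""

records = json.loads(train_json)
type_counts = Counter(r["MANIPULATION_TYPE"] for r in records)
print(type_counts)  # Counter({'Similarity': 2})
```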
Test data input format: a list of wordplay instances to classify, in JSON format or a CSV file (for manual runs), with the following fields:
ID: a unique wordplay identifier
WORDPLAY: wordplay text
Input example:
[{"ID":"noun_1","WORDPLAY":"Ambipom"},{"ID":"het_1011","WORDPLAY":"These are my parents, said Einstein relatively"}]
Test data output format: a list of classified wordplay instances in JSON format or a CSV file (for manual runs) with the following fields:
RUN_ID: run ID starting with team_id_task_id_ (as registered at the CLEF website)
MANUAL: whether the run is manual {0,1}
ID: a unique wordplay identifier from the input file
WORDPLAY: wordplay text
TARGET_WORD: word(s) forming the wordplay
DISAMBIGUATION: explanation of the wordplay
HORIZONTAL/VERTICAL: co-presence of the source and target of the wordplay (horizontal/vertical)
MANIPULATION_TYPE: Identity/Similarity/Permutation/Abbreviation
MANIPULATION_LEVEL: Sound/Writing/Other
CULTURAL_REFERENCE: a binary (True/False) category; to understand some instances of wordplay, one has to be aware of extra-linguistic factors
CONVENTIONAL_FORM: a binary (True/False) category, e.g. Tom Swifty (wellerism), Monsieur et Madame… ont un fils
OFFENSIVE (not evaluated): some wordplay instances are marked as offensive
Output example:
[{"RUN_ID":"RT_task_1_run1","MANUAL":1,"ID":"noun_1063","WORDPLAY":"Elimentaler","TARGET_WORD":"Elimentaler","DISAMBIGUATION":"Emmental (cheese) + Eliminator","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null},{"RUN_ID":"RT_task_1_run1","MANUAL":1,"ID":"pun_341","WORDPLAY":"Geologists can be sedimental about their work.","TARGET_WORD":"sedimental","DISAMBIGUATION":"sentimental\/sediment","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null}]
Output format checker
You can use this Python script to check the output format. It requires Python 3 and the pandas library: Download Python output checker
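This is not the official checker, but a minimal sketch of the kind of validation such a script might perform on Task 1 output records (field names taken from the output format above):

```python
# Fields required in every Task 1 output record, per the format above.
REQUIRED_FIELDS = {
    "RUN_ID", "MANUAL", "ID", "WORDPLAY", "TARGET_WORD", "DISAMBIGUATION",
    "HORIZONTAL/VERTICAL", "MANIPULATION_TYPE", "MANIPULATION_LEVEL",
    "CULTURAL_REFERENCE", "CONVENTIONAL_FORM", "OFFENSIVE",
}

def check_records(records):
    """Return a list of problems; an empty list means the run looks well-formed."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
        if rec.get("MANUAL") not in (0, 1):
            problems.append(f"record {i}: MANUAL must be 0 or 1")
    return problems
```

Running `check_records` on a record copied from the output example above returns an empty list; a record missing fields or with an invalid MANUAL flag produces one problem message per issue.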
Evaluation. Pilot Task 1 includes both classification and interpretation components. Classification performance will be evaluated with respect to accuracy, while interpretation performance will be evaluated semi-manually.
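As an illustration of the classification metric (a sketch, not the organizers' evaluation code), accuracy for one field is the fraction of instances whose predicted label matches the gold label:

```python
def field_accuracy(gold, pred, field):
    """Fraction of gold instances whose predicted value for `field` matches."""
    pred_by_id = {r["ID"]: r.get(field) for r in pred}
    hits = sum(1 for g in gold if pred_by_id.get(g["ID"]) == g[field])
    return hits / len(gold)

# Toy gold and predicted labels for illustration.
gold = [{"ID": "pun_341", "MANIPULATION_TYPE": "Similarity"},
        {"ID": "noun_1063", "MANIPULATION_TYPE": "Similarity"}]
pred = [{"ID": "pun_341", "MANIPULATION_TYPE": "Similarity"},
        {"ID": "noun_1063", "MANIPULATION_TYPE": "Identity"}]
print(field_accuracy(gold, pred, "MANIPULATION_TYPE"))  # 0.5
```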
Result submission. Participants should put their run results into the folder Documents created for their user and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 1] TEAM_ID.
Train data format: a list of translated wordplay instances in JSON format or a CSV file (for manual runs) with the following fields:
id: a unique wordplay identifier
en: wordplay text in English (source)
fr: wordplay text in French (target)
Example:
[{"id":"noun_1","en":"Ambipom","fr":"Capidextre"}]
Test data input format: a list of wordplay instances to translate, in JSON format or a CSV file (for manual runs), with the following fields:
id: a unique wordplay identifier
en: wordplay text in English (source)
Input example:
[{"id":"noun_1185","en":"Fungun"}]
Test data output format: a list of translated wordplay instances in JSON format or a CSV file (for manual runs) with the following fields:
RUN_ID: run ID starting with team_id_task_id_ (as registered at the CLEF website)
MANUAL: whether the run is manual {0,1}
id: a unique wordplay identifier
en: wordplay text in English (source)
fr: wordplay text in French (target)
Output example:
[{"RUN_ID":"OFFICIAL_task_2_run1","MANUAL":1,"id":"noun_1","en":"Ambipom","fr":"Capidextre"}]
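For manual runs prepared in CSV, the following is a sketch of converting to the required JSON list format (assuming the CSV header uses the same column names as the JSON fields):

```python
import csv
import io
import json

def csv_run_to_json(csv_text):
    """Convert a manual run in CSV form into the JSON list-of-records format."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["MANUAL"] = int(row["MANUAL"])  # JSON expects a number, not a string
    return json.dumps(rows, ensure_ascii=False)

# Toy one-row run for illustration.
csv_text = "RUN_ID,MANUAL,id,en,fr\nteam1_task_2_run1,1,noun_1,Ambipom,Capidextre"
print(csv_run_to_json(csv_text))
```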
Output format checker
You can use this Python script to check the output format. It requires Python 3 and the pandas library: Download Python output checker
Evaluation. Human evaluators will manually annotate the submitted translations according to both subjective measures and more concrete features, such as whether wordplay exists in the target text, whether it corresponds to the type used in the source text, and whether the target text preserves the semantic field.
Result submission. Participants should put their run results into the folder Documents created for their user and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 2] TEAM_ID.
Train data format: a list of translated wordplay instances in JSON format or a CSV file (for manual runs) with the following fields:
id: a unique wordplay identifier
en: wordplay text in English (source)
fr: wordplay text in French (target)
Example:
[{"id":"pun_724_1","en":"My name is Wade and I'm in swimming pool maintenance.","fr":" Je m\u2019appelle Jacques Ouzy, je m\u2019occupe de l\u2019entretien des piscines."}]
Test data input format: a list of wordplay instances to translate, in JSON format or a CSV file (for manual runs), with the following fields:
id: a unique wordplay identifier
en: wordplay text in English (source)
Input example:
[{"id":"het_713","en":"Ever since my mineral extraction facility was converted to parking, I've had a lot on my mine."}]
Test data output format: a list of translated wordplay instances in JSON format or a CSV file (for manual runs) with the following fields:
RUN_ID: run ID starting with team_id_task_id_ (as registered at the CLEF website)
MANUAL: whether the run is manual {0,1}
id: a unique wordplay identifier
en: wordplay text in English (source)
fr: wordplay text in French (target)
Output example:
[{"RUN_ID":"JCM_task_3_run1","MANUAL":1,"id":"pun_724_1","en":"My name is Wade and I'm in swimming pool maintenance.","fr":" Je m\u2019appelle Jacques Ouzy, je m\u2019occupe de l\u2019entretien des piscines."}]
Output format checker
You can use this Python script to check the output format. It requires Python 3 and the pandas library: Download Python output checker
Evaluation. Human evaluators will manually annotate the submitted translations according to both subjective measures and more concrete features, such as whether wordplay exists in the target text, whether it corresponds to the type used in the source text, and whether the target text preserves the semantic field.
Result submission. Participants should put their run results into the folder Documents created for their user and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 3] TEAM_ID.
By downloading and using JOKER data, you agree to the terms of use. Any use of the data for any purpose other than academic research would be in violation of the intended use of these data.
Therefore, by downloading and using these data you give the following assurances with respect to the JOKER data:
In case of violation of the conditions for access to the data for scientific purposes, access may be withdrawn from the research entity and/or from the researcher. The research entity may also be liable to pay compensation for damages to third parties, or be asked to take disciplinary action against the offending researcher.
If you extend or use this work, please cite the paper in which it was introduced:
Liana Ermakova, Tristan Miller, Fabio Regattin, Anne-Gwenn Bosser, Claudine Borg, Élise Mathurin, Gaëlle Le Corre, Sílvia Araújo, Radia Hannachi, Julien Boccou, Albin Digue, Aurianne Damoy & Benoît Jeanjean (2022). Overview of JOKER@CLEF 2022: Automatic Wordplay and Humour Translation Workshop. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 447–469). Springer, Cham.
1st Call for Participation (PDF)
This project has received a government grant managed by the French National Research Agency under the "Investissements d'avenir" programme, integrated into France 2030, under reference ANR-19-GURE-0001.
JOKER is supported by the Human Science Institute in Brittany (MSHB).