  • Xenophobia Victims:

Hace más de un siglo sufrían la misma clase de discursos y situaciones de exclusión los italianos y los irlandeses en los EE.UU.
Over a century ago, Italians and Irish people suffered the same sort of discourses and situations of exclusion in the USA.

 
  • Suffering Victims:

Pero existe un hecho objetivo: una persona cruza a nado 500 metros y llega a la orilla exhausta, quizá hambrienta, y arrastrando el dolor de una vida miserable.
But there is an objective fact: a person swims 500 meters and reaches the shore exhausted, perhaps hungry, and dragging the pain of a miserable life.

 
  • Economic Resource:

Las labores del campo han recaído sobre las espaldas de migrantes y gracias a ellos los supermercados han estado abastecidos.
The work of the fields has fallen on the backs of migrants and thanks to them the supermarkets have been supplied.
 
  • Migration Control:

En un país medianamente serio, serían expulsadas y aquí en este país de pandereta se premia la ilegalidad. De pena.
In a moderately serious country, they would be expelled, but here, in this "tambourine country", illegality is rewarded. Lame.
 
​
  • Cultural and Religious Differences:

Lo malo de la educación es que es un proceso largo y lento, y es difícil de llevar a países atrasados.
The drawback of education is that it is a long and slow process, and it is difficult to bring to backward countries.
 
  • Benefits:
     
El 78 por ciento de las mujeres musulmanas no trabajan, son apoyadas por el estado + vivienda gratuita.
78 percent of Muslim women do not work, they are supported by the state + free housing.
 
  • Public Health:

Estos inmigrantes... seguro que además vienen con desnutriciones, anemias, piojos, infecciones varias y vaya usted a saber con qué extrañas enfermedades.
These immigrants... surely they also come with malnutrition, anemia, lice, various infections and who knows what strange diseases.
 
  • Security:

Al final lo de siempre, sólo una minoría de musulmanes matan, pero la gran mayoría son fundamentalistas.
In the end it is always the same: only a minority of Muslims kill, but the vast majority are fundamentalists.
 
  • Dehumanization:

En estos casos, lo mejor es dejarlos caer antes de aterrizar.
In these cases, it is best to drop them before landing.
 
  • Others:

El gobierno comunista y socialista quiere importar cuantos mas ilegales pueda, son votos futuros.
The communist and socialist government wants to import as many illegals as possible, they are future votes.

Description of the task

​

The aim of the task is to detect and classify stereotypes in sentences from comments posted in Spanish in response to online news articles related to immigration. A sentence can contain multiple stereotypes belonging to different categories and may therefore carry multiple labels, all of which need to be detected. This scenario is known in the literature as a multi-label classification problem. To accommodate participants with different interests, the task is designed hierarchically by chaining two subtasks: participants can either model the simple binary scenario or complete the entire pipeline by modeling the more complex multi-label classification problem. Both subtasks are described below:

​

  • Subtask 1: This subtask follows the SemEval 2021 Task 12 (Uma et al., 2021) proposal on learning with disagreements, whose authors state that a single gold label does not necessarily exist for every sample in a dataset. This becomes evident when multiple contradictory annotations arise at the data labelling stage due “to debatable, difficult, or linguistic ambiguity”. Participants who tackle this subtask will therefore have to determine whether the sentences in a comment contain at least one stereotype (positive example) or none (negative example), considering the full distribution of labels provided by the annotators. The actual gold label of this subtask serves as a proxy to determine the subset of sentences that will be evaluated in the subsequent subtask.

​

  • Subtask 2: This subtask consists of determining whether a sentence contains at least one stereotype (positive example) or none (negative example) and assigning the sentences marked as positive (with stereotypes) to ten categories that present immigrants as: 1) ‘victims of xenophobia’, 2) ‘suffering victims’, 3) ‘economic resources’, 4) a problem of ‘migration control’, 5) people with ‘cultural and religious differences’, 6) people who take ‘benefits’ from our social policy, 7) a problem for ‘public health’, 8) a threat to ‘security’, 9) ‘dehumanization’ and 10) ‘other’ types of stereotypes. These categories can be grouped into positive and negative stereotypes according to Sánchez-Junquera et al. (2021). Since a sentence can contain multiple stereotypes belonging to different categories, this subtask is presented as a multi-label hierarchical classification problem.
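
As a rough illustration of how the labels for the two subtasks could be represented, the sketch below derives a soft label (the share of annotators who marked a sentence as containing a stereotype) and a majority-vote hard label for Subtask 1, then encodes the Subtask 2 categories as a multi-hot vector that is all zeros when the sentence is not marked as positive. The category identifiers, the function names, the three-annotator example and the tie-breaking rule are assumptions made for illustration; they are not taken from the task's data format or evaluation setup.

```python
# Short identifiers for the ten Subtask 2 categories.
# The identifiers themselves are hypothetical; only the categories come from the task.
CATEGORIES = [
    "xenophobia_victims", "suffering_victims", "economic_resource",
    "migration_control", "culture_religion", "benefits",
    "public_health", "security", "dehumanization", "others",
]


def soft_label(votes):
    """Fraction of annotators who marked the sentence as containing a stereotype."""
    if not votes:
        raise ValueError("at least one annotation is required")
    return sum(votes) / len(votes)


def hard_label(votes):
    """Majority vote over binary annotations (ties count as positive: an assumption)."""
    return int(soft_label(votes) >= 0.5)


def multi_hot(categories, is_stereotype):
    """Encode category names as a 0/1 vector with one slot per category.

    Sentences not marked as stereotypes get an all-zero vector, which
    mirrors the hierarchical chaining of the two subtasks.
    """
    if not is_stereotype:
        return [0] * len(CATEGORIES)
    unknown = set(categories) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [int(name in categories) for name in CATEGORIES]


# Hypothetical binary annotations from three annotators for one sentence.
votes = [1, 1, 0]
print(soft_label(votes))
print(hard_label(votes))
print(multi_hot({"security", "culture_religion"}, hard_label(votes)))
```

Keeping the soft label next to the hard one preserves the annotators' disagreement instead of collapsing it into a single gold value, in the spirit of the learning with disagreements setting that Subtask 1 follows.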

 

      The examples listed at the top of this page, one per category, correspond to the ten categories proposed for Subtask 2. They are extracted from the dataset.

Teams will be allowed (and encouraged) to submit multiple runs (max. 5).

 

Expected target community, and actual or potential industrial take up

 

The present task is aimed at participants interested in the detection and classification of racial, national, or ethnic stereotypes, a relevant and relatively novel area of research given its impact on modern society. Furthermore, the annotated dataset is a valuable resource for exploratory linguistic analysis, as well as for comparing deep learning and classical machine learning models on Spanish stereotyped expressions under the recently introduced learning with disagreements paradigm (Basile et al., 2021; Uma et al., 2021). Moreover, the characteristics of the dataset make it suitable for further research on data augmentation, pre-training and fine-tuning techniques to achieve state-of-the-art results in tasks with few labeled examples.

​

To sum up, the texts in the dataset are in Spanish, the dataset size lends itself to data augmentation and transfer learning techniques, and the evaluation focus is application-oriented. For these reasons, this proposal should be an attractive choice for beginner, intermediate and advanced NLP scientists.

​

References

​

Basile, V., Fell, M., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M. and Uma, A. (2021). ‘We Need to Consider Disagreement in Evaluation’. In Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future, pp. 15-21. Association for Computational Linguistics.

​

Sánchez-Junquera, J., Chulvi, B., Rosso, P. and Ponzetto, S. P. (2021). ‘How Do You Speak about Immigrants? Taxonomy and StereoImmigrants Dataset for Identifying Stereotypes about Immigrants’. Applied Sciences, 11(8), 3610. https://doi.org/10.3390/app11083610

​

Uma, A., Fornaciari, T., Dumitrache, A., Miller, T., Chamberlain, J., Plank, B., Simpson, E. and Poesio, M. (2021). ‘SemEval-2021 Task 12: Learning with Disagreements’. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), pp. 338-347. Association for Computational Linguistics. DOI: 10.18653/v1/2021.semeval-1.41
