Hate Speech Detection: A Lifelong Learning Challenge
While hate speech is a common topic in the news, it is not a new phenomenon. It has been used for centuries to incite violence, discrimination, and prejudice against individuals and groups. Online platforms such as social media have made it easier for hate speech to spread and reach a wide audience in a short amount of time.
The anonymity that social media provides lets individuals engage in hate speech with little fear of retribution. This has led to a tremendous increase in online hate speech and has made it harder to combat, especially with traditional methods of manual moderation and reporting. We therefore need new tools and techniques that can automatically detect hate speech on online platforms. In this blog post, we explore the challenges of detecting hate speech and discuss some of the current approaches used to combat it online.
According to the Cambridge Dictionary, the definition of hate speech is:
public speech that expresses hate or encourages violence towards a person or group based on something such as race, religion, sex, or sexual orientation.
Numerous studies have adopted similar definitions and built detection techniques on top of them, drawing on machine learning, natural language processing, and deep learning. The simplest approach is to match text against a dictionary of hateful words and phrases. However, this approach is not very effective, because hate speech can be subtle and does not always contain explicit hate words. More sophisticated, data-driven methods based on machine learning and deep learning are therefore needed. Recently, with the popularity of pretrained language models such as BERT, researchers have been able to build more accurate hate speech detection models. Traditionally, once a model is trained, it is deployed as-is to detect hate speech in real time.
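To make the limitation of the dictionary approach concrete, here is a minimal Python sketch. The lexicon entries (`slur_a`, `slur_b`), the function names, and the two example messages are hypothetical placeholders chosen for illustration; they are not taken from any real lexicon or from our work.

```python
import re

# Hypothetical placeholder lexicon; a real system would load a curated list.
HATE_LEXICON = {"slur_a", "slur_b"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def lexicon_flag(text, lexicon=HATE_LEXICON):
    """Flag a message if any of its tokens appears in the lexicon."""
    return any(token in lexicon for token in tokenize(text))


# An explicit message that contains a lexicon term.
explicit = "You are a slur_a"
# An implicitly hostile message that contains no lexicon term.
implicit = "People like you should go back to where you came from"

print(lexicon_flag(explicit))  # flagged: a lexicon term is present
print(lexicon_flag(implicit))  # not flagged: no lexicon term is present
```

The second message goes undetected even though it is hostile in tone, which is the kind of gap that data-driven models are meant to close.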
However, hate speech is a dynamic phenomenon that constantly evolves as new terminology for expressing hate emerges and new groups become targets. For example, suppose a model is trained to detect hate speech that targets gender (such as women, men, and transgender people) and race (such as Black people). Once deployed, it may need to detect hate speech that targets religion, for example Muslims. It may also need to understand hate speech in languages or dialects that it was not trained on. In all such cases, it is important to build models that can adapt to new forms of hate speech. This is where lifelong (or continual) learning comes to the rescue and can play a crucial role in building effective solutions.
Lifelong learning is a machine learning paradigm that aims to build models that can learn continuously from new data. The challenge with lifelong learning in deep learning is that when a model is trained on new data, it may forget the older data distribution. This is known as catastrophic forgetting (or catastrophic interference). It shows up as declining performance on inputs with specific aspects (identities) as a model is trained on new aspects (see Figure 1).
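One common way to quantify this effect is to record accuracy on every earlier aspect after each training stage, then compare the best earlier accuracy with the final one. The Python sketch below computes that per-aspect drop. The aspect names follow the example in this post, but the accuracy values are made-up numbers for illustration only; they are not results from our experiments, and the function name `forgetting_per_task` is our own choice for this sketch.

```python
def forgetting_per_task(acc, names):
    """Return, for each earlier task, best earlier accuracy minus final accuracy.

    acc[i][j] is the accuracy on task j measured after training stage i.
    Entries for tasks not yet seen (j > i) are None and are never read.
    """
    last_stage = len(acc) - 1
    result = {}
    for j in range(last_stage):
        # Best accuracy on task j at any stage before the final one.
        best_earlier = max(acc[i][j] for i in range(j, last_stage))
        result[names[j]] = best_earlier - acc[last_stage][j]
    return result


# Training order: gender first, then race, then religion.
names = ["gender", "race", "religion"]

# Made-up accuracies for illustration only (not experimental results).
acc = [
    [0.90, None, None],  # after training on gender
    [0.80, 0.88, None],  # after training on race
    [0.70, 0.82, 0.86],  # after training on religion
]

for name, drop in forgetting_per_task(acc, names).items():
    print(f"{name}: accuracy dropped by {drop:.2f}")
```

A larger drop means more forgetting on that aspect; a lifelong learning method aims to keep these drops small while still learning the new aspect well.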
In our research under the SOCYTI project, we explore this idea and extend state-of-the-art methods for continual learning.
Reference
Senarath, Y., & Purohit, H. (2024, September 18). Lifelong Learning Framework for Multilingual Hate Speech Detection in Social Media Streams. 17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation.
By Yasas Senarath and Hemant Purohit