Telugu language hate speech detection using deep learning transformer models: Corpus generation and evaluation
In today’s digital era, social media has become a new tool for communication and sharing information, with the availability of high-speed internet it tends to reach the masses much faster. Lack of regulations and ethics have made advancement in the proliferation of abusive language and hate speech has become a growing concern on social media platforms in the form of posts, replies, and comments towards individuals, groups, religions, and communities. However, the process of classification of hate speech manually on online platforms is cumbersome and impractical due to the excessive amount of data being generated. Therefore, it is crucial to automatically filter online content to identify and eliminate hate speech from social media. Widely spoken resource-rich languages like English have driven the research and achieved the desired result due to the accessibility of large corpora, annotated datasets, and tools. Resource-constrained languages are not able to achieve the benefits of advancement due to a lack of data corpus and annotated datasets. India has diverse languages that change with demographics and languages that have limited data availability and semantic differences. Telugu is one of the low-resource Dravidian languages spoken in the southern part of India….