The rapid proliferation of social media platforms has transformed global communication and democratized the sharing of information. However, this expansion has also driven an alarming rise in cyberbullying, hate speech, and toxic discourse. Manual moderation is no longer viable given the volume, velocity, and linguistic complexity of user-generated data. Furthermore, the digital landscape of 2026 is increasingly characterized by code-mixed languages (such as Hinglish) and sophisticated obfuscation techniques, which in principle mimic the complexity of LLM-generated toxicity and easily evade standard keyword-based filters. This research presents the design, mathematical formulation, and implementation of an AI-based content moderation system for the automated detection of hate speech and toxic language. The study evaluates traditional machine learning algorithms and modern multilingual baselines, such as IndicBERT, against a novel hybrid DistilBERT-BiLSTM architecture. To address the extreme class imbalance inherent in real-world toxic comment datasets without corrupting discrete text features, a Class-Weighted Binary Cross-Entropy loss is applied. Performance is evaluated using Accuracy, F1-score, Matthews Correlation Coefficient (MCC), and inference latency. Experimental results across multiple random seeds show that the proposed hybrid architecture achieves a mean MCC of 0.904 ± 0.003 and an F1-score of 0.913 ± 0.002, with an inference latency of 49.5 milliseconds, making it well suited to real-time enterprise deployment. Ablation studies and hyperparameter optimization validate the architectural choices. Additionally, integrating Explainable AI via SHAP enables interpretable decision-making by identifying the specific linguistic tokens responsible for toxicity predictions.
The research contributes a scalable, transparent, and high-performance AI moderation framework that promotes safer digital ecosystems while preserving freedom of expression and adhering to India's stringent digital governance frameworks.
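As a rough illustration of two quantities named above, the following sketch computes a class-weighted binary cross-entropy and the Matthews Correlation Coefficient on toy data. It is not the implementation evaluated in this study: the inverse-frequency weighting scheme, the function names (`class_weights`, `weighted_bce`, `mcc`), and the toy labels are all assumptions made for illustration, since the weighting formula is not specified here.

```python
import math

def class_weights(labels):
    """Assumed scheme: inverse-frequency weights n / (2 * count_c) per class."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return n / (2 * n_neg), n / (2 * n_pos)  # (w_neg, w_pos)

def weighted_bce(labels, probs, w_neg, w_pos, eps=1e-7):
    """Mean class-weighted binary cross-entropy over a batch of probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(labels)

def mcc(labels, preds):
    """Matthews Correlation Coefficient from binary labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical toy batch: 8 non-toxic comments (0) and 2 toxic comments (1).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_preds = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
w_neg, w_pos = class_weights(toy_labels)

# A confidently missed toxic comment is penalized more than an equally
# confident false alarm, because the minority class carries the larger weight.
missed_toxic = weighted_bce([1], [0.1], w_neg, w_pos)
false_alarm = weighted_bce([0], [0.9], w_neg, w_pos)
toy_mcc = mcc(toy_labels, toy_preds)
```

With these weights, errors on the rare toxic class contribute more to the loss while the input text itself is left unchanged, which is the property the class-weighted formulation relies on. MCC uses all four cells of the confusion matrix, so it stays informative when one class dominates.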
Artificial Intelligence, Content Moderation, Hate Speech Detection, Toxic Language Detection, Natural Language Processing, DistilBERT, BiLSTM, IndicBERT, Explainable AI, Class-Weighted Loss, Matthews Correlation Coefficient, Code-Mixed Text.
International Journal of Trend in Scientific Research and Development (IJTSRD), online ISSN 2456-6470. IJTSRD is an Open Access, Peer-Reviewed International Journal.