LyubomirT's Toxicity Detector Model v17

Welcome to LyubomirT's Toxicity Detector repository! This project contains a Jupyter notebook that demonstrates how to train and use a lightweight BERT-based model for detecting toxicity in text.

Overview

This notebook fine-tunes a BERT model to classify comments into six toxicity categories, using the "Severity of Toxic Comments" dataset. The model is implemented with the transformers and torch libraries.

Features

  • Model Training: Instructions on how to train the BERT model on the toxicity dataset.
  • Inference: Methods to use the pre-trained model for making predictions on new text.
  • Pre-trained Weights: Option to download pre-trained model weights for quick inference.

Getting Started

Prerequisites

Before running the notebook, make sure you have the following Python packages installed:

  • pandas
  • transformers
  • torch
  • tqdm
  • scikit-learn

You can install these packages using pip:

pip install pandas transformers torch tqdm scikit-learn

Training the Model

  1. Run the Notebook: Run all of the cells in order from the top of the notebook.
  2. GPU Requirement: Training the model efficiently requires a GPU; make sure one is available before you start (a quick check is sketched below).
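
A quick way to confirm that torch can actually see a GPU before committing to a training run (this snippet is a sanity check, not a step from the notebook itself):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training will run on: {device}")  # expect "cuda" for practical training times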

Quick Inference

If you prefer to skip the training process and test the model directly:

  1. Download Pre-trained Weights: Obtain the trained model weights from the release page.
  2. Upload Weights: Upload the weights file to the notebook environment (a loading sketch follows this list).
  3. Run Inference: Navigate to the "Inference" section in the notebook. You can input text and observe the model's predictions.
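
A minimal loading sketch, assuming the checkpoint was saved with torch.save(model.state_dict(), ...); the file name here is hypothetical, so use the actual asset name from the release page:

import torch
from transformers import BertForSequenceClassification

WEIGHTS_PATH = "toxicity_model.pt"  # hypothetical name; use the actual release asset

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)
model.load_state_dict(torch.load(WEIGHTS_PATH, map_location="cpu"))
model.eval()  # disable dropout for inference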

Dataset

The model is trained on the "Severity of Toxic Comments" dataset, which includes labels such as toxic, severe_toxic, obscene, threat, insult, and identity_hate. The dataset consists of comments from Wikipedia's talk page edits.
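
A loading sketch, assuming the CSV layout of the underlying Jigsaw/Kaggle release (a comment_text column plus one binary column per label); the file and column names are assumptions to check against the dataset you download:

import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("train.csv")          # assumed file name
texts = df["comment_text"].tolist()    # assumed column name (standard in this dataset)
labels = df[LABELS].values             # multi-label targets: 0/1 per category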

Model Details

  • Model Architecture: Utilizes BERT (bert-base-uncased) with a classification head for toxicity detection.
  • Training Configuration: The model is trained with the AdamW optimizer at a learning rate of 2e-5, using mixed precision training and gradient accumulation for efficiency (sketched below).
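
A minimal training-step sketch of that configuration. Assumptions: train_loader is a hypothetical DataLoader yielding dicts with input_ids, attention_mask, and float labels (one binary value per category), and the number of accumulation steps is illustrative; the optimizer, learning rate, mixed precision, and gradient accumulation come from the configuration above.

import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification

ACCUM_STEPS = 4  # illustrative; the notebook may use a different value

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=6,
    problem_type="multi_label_classification",  # BCE loss, one sigmoid per label
).cuda()
optimizer = AdamW(model.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for mixed precision

model.train()
for step, batch in enumerate(train_loader):  # train_loader assumed, see above
    batch = {k: v.cuda() for k, v in batch.items()}
    with torch.cuda.amp.autocast():               # fp16 forward pass where safe
        loss = model(**batch).loss / ACCUM_STEPS  # scale loss for accumulation
    scaler.scale(loss).backward()                 # accumulate scaled gradients
    if (step + 1) % ACCUM_STEPS == 0:
        scaler.step(optimizer)                    # unscale gradients, then step
        scaler.update()
        optimizer.zero_grad()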

Inference

To perform inference:

  1. Model Loading: The notebook includes code to load the pre-trained model weights.
  2. Prediction Function: Use the predict_toxicity function to evaluate the toxicity of input text (a sketch of such a function appears below).
  3. Output: The function returns a dictionary with the toxicity probabilities for each category.

In short: download the weights, run all of the cells, and use the final cell to test the model on your own text. You can also train the model on your own dataset by following the instructions in the notebook.
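
The notebook defines the actual predict_toxicity; the following is only a plausible sketch, assuming sigmoid outputs over the six independent labels and a tokenizer matching bert-base-uncased:

import torch
from transformers import BertTokenizer

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def predict_toxicity(text, model, device="cpu"):
    enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**enc).logits
    probs = torch.sigmoid(logits).squeeze(0)  # independent probability per category
    return {label: probs[i].item() for i, label in enumerate(LABELS)}

For example, predict_toxicity("have a nice day", model) returns a dictionary with one probability per category, matching the output format described above.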

Contributing

Feel free to contribute to this project by submitting issues or pull requests. Your feedback and improvements are welcome!

License

This project is licensed under the MIT License. See the LICENSE file for details.