---
license: apache-2.0
base_model: bert-large-uncased
tags:
- generated_from_trainer
- phishing
- BERT
metrics:
- accuracy
- precision
- recall
model-index:
- name: bert-finetuned-phishing
  results: []
widget:
- text: https://www.verif22.com
  example_title: Phishing URL
- text: Dear colleague, An important update about your email has exceeded your
    storage limit. You will not be able to send or receive all of your messages.
    We will close all older versions of our Mailbox as of Friday, June 12, 2023.
    To activate and complete the required information click here (https://ec-ec.squarespace.com).
    Account must be reactivated today to regenerate new space. Management Team
  example_title: Phishing Email
- text: You have access to FREE Video Streaming in your plan. REGISTER with your email, password and
    then select the monthly subscription option. https://bit.ly/3vNrU5r
  example_title: Phishing SMS
- text: if(data.selectedIndex > 0){$('#hidCflag').val(data.selectedData.value);};;
    var sprypassword1 = new Spry.Widget.ValidationPassword("sprypassword1");
    var sprytextfield1 = new Spry.Widget.ValidationTextField("sprytextfield1", "email");
  example_title: Phishing Script
- text: Hi, this model is really accurate :)
  example_title: Benign message
datasets:
- ealvaradob/phishing-dataset
language:
- en
pipeline_tag: text-classification
---
# BERT FINETUNED ON PHISHING DETECTION

This model is a fine-tuned version of [bert-large-uncased](https://huggingface.co/bert-large-uncased) on a [phishing dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset),
capable of detecting phishing in its four most common forms: URLs, emails, SMS messages, and websites.

It achieves the following results on the evaluation set:

- Loss: 0.1953
- Accuracy: 0.9717
- Precision: 0.9658
- Recall: 0.9670
- False Positive Rate: 0.0249

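For reference, the metrics listed above follow the standard confusion-matrix definitions. The sketch below assumes phishing is the positive class, which the card does not state explicitly. The function name and the example counts are hypothetical and are not taken from the evaluation set.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the four reported metrics from confusion-matrix counts.

    tp/fn: phishing samples classified correctly/incorrectly (assumed positive class).
    tn/fp: benign samples classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(example)
```

A low false positive rate matters in this setting because every false positive is a benign message flagged as phishing.
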
## Model description

BERT is a transformer model pretrained on a large corpus of English data in a self-supervised fashion.
This means it was pretrained on raw text only, with no human labelling (which is why it can use large
amounts of publicly available data), using an automatic process to generate inputs and labels from
those texts.

This model has the following configuration:

- 24 layers
- 1024 hidden dimensions
- 16 attention heads
- 336M parameters

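As a rough sanity check on the configuration above, the parameter count can be reproduced from the layer sizes. This is a sketch under assumptions: the vocabulary size (30522), maximum position count (512), token-type vocabulary (2) and feed-forward size (4096) are the usual `bert-large-uncased` values and are not stated in this card. The function name is hypothetical. The number of attention heads does not affect the count, because the heads split the same 1024-dimensional projections.

```python
def bert_parameter_count(layers=24, hidden=1024, intermediate=4096,
                         vocab=30522, max_positions=512, type_vocab=2):
    """Count the weights and biases of a BERT encoder with a pooler."""
    # Word, position and token-type embeddings, plus one LayerNorm (scale and bias).
    embeddings = (vocab + max_positions + type_vocab) * hidden + 2 * hidden
    # Query, key, value and output projections (with biases), plus one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward projections (with biases), plus one LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + 2 * hidden
    )
    # Dense pooler layer applied to the first token.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_parameter_count()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the count comes to about 335M, in line with the roughly 336M quoted above. A two-class classification head would add only `1024 * 2 + 2` further parameters.
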
## Motivation and Purpose

Phishing is one of the most frequent and most expensive cyber-attacks according to several security reports.
This model aims to help prevent phishing attacks against individuals and organizations by detecting them
efficiently and accurately. To achieve this, BERT was fine-tuned on a diverse dataset containing URLs,
SMS messages, emails, and websites, which extends the model's detection capability beyond a single input
type and allows it to be used in various contexts.

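Because the model is tagged as a `text-classification` pipeline, any of these input types can be passed to it as plain text. The following is a minimal sketch, not an official usage recipe. The repository id is an assumption inferred from the dataset owner and the model name in the metadata, and the helper names are hypothetical. The label names returned depend on the model's configuration and are not documented in this card.

```python
from typing import Callable, Dict, List

# Assumption: repository id inferred from the card metadata; adjust if it differs.
MODEL_ID = "ealvaradob/bert-finetuned-phishing"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so the helpers below work without `transformers` installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(classifier: Callable, texts: List[str]) -> List[Dict]:
    """Classify texts, truncating anything longer than BERT's 512-token limit."""
    return classifier(texts, truncation=True, max_length=512)


def top_labels(predictions: List[Dict]) -> List[str]:
    """Extract the predicted label from each pipeline result."""
    return [prediction["label"] for prediction in predictions]
```

A typical call would be `classify(load_classifier(), ["https://www.verif22.com"])`, which returns one dictionary per input containing a `label` and a `score`. Long inputs such as full web pages are truncated to the first 512 tokens in this sketch, so content further down the page is not seen by the model.
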
### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4

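To illustrate what the `linear` scheduler does with these settings, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup steps, since none are listed above, and takes the total of 15464 steps from the final row of the training results. The function name is hypothetical.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 15464,
              warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule with optional linear warmup."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (3866 steps per epoch).
for epoch in range(5):
    step = epoch * 3866
    print(f"step {step:>5}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate starts at 2e-05, is halved by the end of epoch 2, and reaches zero at the last step.
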
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:-------------------:|
| 0.1487        | 1.0   | 3866  | 0.1454          | 0.9596   | 0.9709    | 0.9320 | 0.0203              |
| 0.0805        | 2.0   | 7732  | 0.1389          | 0.9691   | 0.9663    | 0.9601 | 0.0243              |
| 0.0389        | 3.0   | 11598 | 0.1779          | 0.9683   | 0.9778    | 0.9461 | 0.0156              |
| 0.0091        | 4.0   | 15464 | 0.1953          | 0.9717   | 0.9658    | 0.9670 | 0.0249              |

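The per-epoch results can be compared programmatically. The sketch below copies the table rows into a list, finds the best epoch by validation loss and by accuracy, and derives an F1 score from precision and recall. F1 is not reported in this card; it is computed here only as an illustration.

```python
# Rows copied from the training results table:
# (epoch, training loss, validation loss, accuracy, precision, recall, false positive rate)
RESULTS = [
    (1, 0.1487, 0.1454, 0.9596, 0.9709, 0.9320, 0.0203),
    (2, 0.0805, 0.1389, 0.9691, 0.9663, 0.9601, 0.0243),
    (3, 0.0389, 0.1779, 0.9683, 0.9778, 0.9461, 0.0156),
    (4, 0.0091, 0.1953, 0.9717, 0.9658, 0.9670, 0.0249),
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


best_by_val_loss = min(RESULTS, key=lambda row: row[2])
best_by_accuracy = max(RESULTS, key=lambda row: row[3])

print("lowest validation loss at epoch", best_by_val_loss[0])
print("highest accuracy at epoch", best_by_accuracy[0])
for epoch, *_, precision, recall, _fpr in RESULTS:
    print(f"epoch {epoch}: F1 = {f1(precision, recall):.4f}")
```

Validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, which often indicates the start of overfitting. Accuracy and recall are nonetheless highest at epoch 4, which matches the evaluation results reported at the top of this card.
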
### Framework versions

- Transformers 4.34.1
- PyTorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
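
When reproducing the results, it can help to compare the local environment against the versions listed above. This is a minimal sketch using only the standard library. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, and the function name is hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.34.1",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def installed_versions(expected=EXPECTED) -> dict:
    """Report the expected and installed version of each package."""
    report = {}
    for package, wanted in expected.items():
        try:
            found = metadata.version(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[package] = {
            "expected": wanted,
            "installed": found,
            "match": found == wanted,
        }
    return report


for package, info in installed_versions().items():
    print(f"{package}: expected {info['expected']}, installed {info['installed']}")
```

A mismatch does not necessarily prevent the model from loading; the check only flags differences from the training environment.
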