---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
tags:
- information extraction
- entity linking
- named entity recognition
- relation extraction
- text-to-text generation
---
# T5-for-information-extraction

This is an encoder-decoder model that was trained on various information extraction tasks, including text classification, named entity recognition, relation extraction and entity linking.

### How to use:
First, initialize the model and tokenizer:
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")

model = T5ForConditionalGeneration.from_pretrained("knowledgator/t5-for-ie").to(device)
```

You need to prepend a task prompt to the input text before passing it to the model. Below are examples of how to use it for different tasks:

**named entity recognition**
```python
input_text = "Extract entity types from the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

**text classification**
```python
input_text = "Classify the following text into the most relevant categories: Kyiv is the capital of Ukraine"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

**relation extraction**
```python
input_text = "Extract relations between entities in the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```
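All of these tasks follow the same tokenize-generate-decode pattern, so you may find it convenient to wrap it in a small helper. The `run_task` function below is just an illustrative sketch, not part of the model's API:
```python
def run_task(prompt: str, text: str, max_new_tokens: int = 64) -> str:
    """Combine a task prompt with the input text and decode the generation,
    stripping special tokens such as <pad> and </s>."""
    input_ids = tokenizer(f"{prompt} {text}", return_tensors="pt").input_ids.to(device)
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(run_task(
    "Extract relations between entities in the text:",
    "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>.",
))
```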
### Unlimited-classifier
With our [unlimited-classifier](https://github.com/Knowledgator/unlimited_classifier) you can use `t5-for-ie` to classify text into millions of categories. It applies constrained generation, which is especially helpful when structured and deterministic outputs are needed.

To install it, run the following command:

```bash
pip install -U unlimited-classifier
```

You can try it with the following example:
```python
from unlimited_classifier import TextClassifier

labels = [
    "e1 - capital of Ukraine",
    "e1 - capital of Poland",
    "e1 - European city",
    "e1 - Asian city",
    "e1 - small country",
]

classifier = TextClassifier(
    labels=["default"],  # placeholder; the actual labels are set via initialize_labels_trie below
    model=model,
    tokenizer=tokenizer,
    device=device,  # reuses the device selected during initialization
)
classifier.initialize_labels_trie(labels)

text = "<e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."

output = classifier.invoke(text)
print(output)
```
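If you are curious how this kind of constrained generation works, a similar effect can be approximated in plain `transformers` with `prefix_allowed_tokens_fn`. The sketch below is illustrative only and is not unlimited-classifier's actual implementation (which builds a trie over the labels for efficiency):
```python
# Minimal sketch of label-constrained decoding, reusing `model`, `tokenizer`,
# `device`, `labels`, and `text` from the examples above.
label_token_ids = [
    tokenizer(label, add_special_tokens=False).input_ids for label in labels
]

def allowed_tokens(batch_id, generated):
    # Only permit next tokens that keep the output on a prefix of some label.
    prefix = generated.tolist()[1:]  # drop the decoder start token
    candidates = set()
    for ids in label_token_ids:
        if ids[: len(prefix)] == prefix and len(prefix) < len(ids):
            candidates.add(ids[len(prefix)])
    # Once a full label has been generated, only end-of-sequence is allowed.
    return list(candidates) or [tokenizer.eos_token_id]

input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(
    input_ids,
    prefix_allowed_tokens_fn=allowed_tokens,
    max_new_tokens=32,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```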
### Turbo T5

We recommend using this model on GPU with our [TurboT5 package](https://github.com/Knowledgator/TurboT5); it uses custom CUDA kernels that accelerate computation and allow much longer sequences.

First, install the package:

```bash
pip install turbot5 -U
```
Then you can import different heads for various purposes. We released additional encoder heads for tasks such as token classification, question answering and text classification, as well as encoder-decoder heads for conditional generation:

```python
from turbot5 import T5ForConditionalGeneration
from turbot5 import T5Config
from transformers import T5Tokenizer
import torch

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained(
    "knowledgator/t5-for-ie",
    attention_type="flash",  # choose the attention implementation you want to use
    use_triton=True,
).to("cuda")
```
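The TurboT5 heads follow the familiar `transformers` interface, so the earlier examples should work unchanged; for instance, reusing the NER prompt from above (this assumes a CUDA device is available, since TurboT5 targets GPU):
```python
# Quick sanity check with the TurboT5-loaded model.
input_text = "Extract entity types from the text: <e1>Kyiv</e1> is the capital of <e2>Ukraine</e2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```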
### Feedback
We value your input! Share your feedback and suggestions to help us improve our models.
Fill out the feedback [form](https://forms.gle/5CPFFuLzNWznjcpL7).

### Join Our Discord
Connect with our community on Discord for news, support, and discussion about our models.
Join [Discord](https://discord.gg/dkyeAgs9DG).