Simple QA with BERT



This is one of the simplest question-answering examples you can find.
The workflow goes something like this:

1. Use the tokenizer to convert the text into token IDs (later wrapped in torch tensors). The input must begin with [CLS] and separate the segments with [SEP]. The statement or knowledge segment is encoded with 1; the question segment with 0.

2. Create the BERT QA model and run it.

3. Get the scores out. Interpreting the scores is key: you get the start and end indices of the answer, represented by start_scores and end_scores below. It is like the model taking a highlighter and marking the answer right in front of us.
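The segment encoding from step 1 can be illustrated with a tiny worked example. The token IDs for the words here are illustrative placeholders; only the special-token IDs are real for bert-base-uncased ([CLS] = 101, [SEP] = 102):

```python
# token type ids: 0 for the question segment (up to and including the
# first [SEP]), 1 for the knowledge segment that follows it
input_ids = [101, 2040, 2001, 102, 3958, 27227, 102]  # [CLS] who was [SEP] jim henson [SEP]
sep_index = input_ids.index(102)  # position of the first [SEP]
token_type_ids = [0 if i <= sep_index else 1 for i in range(len(input_ids))]
print(token_type_ids)  # [0, 0, 0, 0, 1, 1, 1]
```

This is exactly the list comprehension used in the code below, just shown on a hand-built ID list so the 0/1 split is visible.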


## initialize the model
import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

### Please note here that encodings are given as
#                 0 0 0 0 0 0 0 (denotes question) while 1 1 1 1 1 1 1 (denotes answer / knowledge)
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"

input_ids = tokenizer.encode(input_text)

## doing what we defined above: 0 up to and including the first [SEP] (id 102), 1 after it

token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))]

## torch.tensor wraps the python lists in tensors with a batch dimension of 1

start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))

all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
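To finish step 3, the highest-scoring start and end positions pick out the answer span from all_tokens. Here is a minimal sketch of that extraction; the helper name extract_answer is my own, and it takes plain Python lists so it can be shown without the model (with real model outputs you would call torch.argmax on the score tensors first):

```python
def extract_answer(all_tokens, start_scores, end_scores):
    # take the argmax of each score list to get the span boundaries
    start = max(range(len(start_scores)), key=lambda i: start_scores[i])
    end = max(range(len(end_scores)), key=lambda i: end_scores[i])
    # join the tokens in the span, merging any "##" word-piece fragments
    return " ".join(all_tokens[start:end + 1]).replace(" ##", "")

tokens = ["[CLS]", "who", "was", "jim", "henson", "?", "[SEP]",
          "jim", "henson", "was", "a", "nice", "puppet", "[SEP]"]
# fabricated scores that peak at "a" (start) and "puppet" (end)
starts = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 9, 1, 1, 0]
ends   = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 9, 0]
print(extract_answer(tokens, starts, ends))  # a nice puppet
```

The highlighter analogy from the workflow is literal here: the start/end argmaxes are the two ends of the highlighted stretch.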


The complete code can be found here:

https://colab.research.google.com/drive/1TcbYjjiQQE9UDYoTtdl8dIjUAQKz11xf


