Simple QA with BERT
This is one of the simplest question-answering (QA) examples you will find.
The workflow goes something like this:
1. Use the tokenizer to convert the text into a torch tensor. The input must be encoded as `[CLS] question [SEP] context [SEP]`; the statement or knowledge (the context) is encoded with segment id 1, the question with 0.
2. Create the BERT QA model and run it.
3. Get the scores out. Interpreting the scores is key: they give you the start and end indices of the answer, represented by `start_scores` and `end_scores` below. It is like the model taking a highlighter and marking the answer for us.
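Step 1's encoding layout can be illustrated with a tiny stand-alone sketch. The token ids below are made up for illustration, except 101 and 102, which are BERT's real `[CLS]` and `[SEP]` ids:

```python
# Toy illustration of the [CLS] question [SEP] context [SEP] layout.
# Ids other than 101 ([CLS]) and 102 ([SEP]) are placeholders.
input_ids = [101, 2040, 2001, 102, 3958, 27227, 102]

# Everything up to and including the first [SEP] belongs to the question
# (segment 0); everything after it is context/knowledge (segment 1).
sep_index = input_ids.index(102)
token_type_ids = [0 if i <= sep_index else 1 for i in range(len(input_ids))]
print(token_type_ids)  # [0, 0, 0, 0, 1, 1, 1]
```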
## initialize the model
import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
### Please note here that encodings are given as
# 0 0 0 0 0 0 0 (denotes question) while 1 1 1 1 1 1 1 (denotes answer / knowledge)
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"
input_ids = tokenizer.encode(input_text)
## build the segment ids described above: 0 up to the first [SEP] (token id 102), 1 after it
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))]
## torch.tensor wraps each python list into a tensor with a batch dimension of 1
## recent versions of transformers return an output object rather than a tuple
outputs = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
start_scores, end_scores = outputs.start_logits, outputs.end_logits
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
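To actually read the answer off (step 3), take the argmax of each score vector and join the tokens in between. The sketch below uses made-up stand-ins for `start_scores`, `end_scores`, and `all_tokens` so it runs without the model; with the real model outputs you would use `torch.argmax` on the logits instead:

```python
# Stand-in tokens and scores mimicking the example above.
all_tokens = ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]',
              'jim', 'henson', 'was', 'a', 'nice', 'puppet', '[SEP]']
start_scores = [0.1] * 10 + [6.2, 0.3, 0.4, 0.0]  # peaks at "a" (index 10)
end_scores   = [0.1] * 12 + [7.1, 0.0]            # peaks at "puppet" (index 12)

# Argmax of each score vector gives the answer span boundaries.
start = max(range(len(start_scores)), key=start_scores.__getitem__)
end   = max(range(len(end_scores)),   key=end_scores.__getitem__)
answer = ' '.join(all_tokens[start:end + 1])
print(answer)  # a nice puppet
```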
The complete code can be found here:
https://colab.research.google.com/drive/1TcbYjjiQQE9UDYoTtdl8dIjUAQKz11xf