Simple QA with BERT
This is one of the simplest question-answering (QA) examples you will find.
The workflow goes something like this:
1. Use the tokenizer to convert the text into a torch tensor. The input must be encoded as `[CLS] question [SEP] context [SEP]`; the statement or knowledge (the context) is encoded with segment id 1, the question with 0.
2. Create the BERT QA model and run it.
3. Get the scores out. Interpreting the scores is key: they give you the start and end indices of the answer, represented by `start_scores` and `end_scores` below. It is like the model taking a highlighter and marking the answer for us.
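Step 1's encoding layout can be illustrated with a tiny stand-alone sketch. The token ids below are made up for illustration, except 101 and 102, which are BERT's real `[CLS]` and `[SEP]` ids:

```python
# Toy illustration of the [CLS] question [SEP] context [SEP] layout.
# Ids other than 101 ([CLS]) and 102 ([SEP]) are placeholders.
input_ids = [101, 2040, 2001, 102, 3958, 27227, 102]

# Everything up to and including the first [SEP] belongs to the question
# (segment 0); everything after it is context/knowledge (segment 1).
sep_index = input_ids.index(102)
token_type_ids = [0 if i <= sep_index else 1 for i in range(len(input_ids))]
print(token_type_ids)  # [0, 0, 0, 0, 1, 1, 1]
```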
## initialize the model
import torch
from transformers import BertTokenizer, BertForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
### Please note here that encodings are given as
# 0 0 0 0 0 0 0 (denotes question) while 1 1 1 1 1 1 1 (denotes answer / knowledge)
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
input_text = "[CLS] " + question + " [SEP] " + text + " [SEP]"
input_ids = tokenizer.encode(input_text)
## build the segment ids described above: 0 up to the first [SEP] (token id 102), 1 after it
token_type_ids = [0 if i <= input_ids.index(102) else 1 for i in range(len(input_ids))]
## torch.tensor wraps each python list into a tensor with a batch dimension of 1
## recent versions of transformers return an output object rather than a tuple
outputs = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))
start_scores, end_scores = outputs.start_logits, outputs.end_logits
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
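To actually read the answer off (step 3), take the argmax of each score vector and join the tokens in between. The sketch below uses made-up stand-ins for `start_scores`, `end_scores`, and `all_tokens` so it runs without the model; with the real model outputs you would use `torch.argmax` on the logits instead:

```python
# Stand-in tokens and scores mimicking the example above.
all_tokens = ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]',
              'jim', 'henson', 'was', 'a', 'nice', 'puppet', '[SEP]']
start_scores = [0.1] * 10 + [6.2, 0.3, 0.4, 0.0]  # peaks at "a" (index 10)
end_scores   = [0.1] * 12 + [7.1, 0.0]            # peaks at "puppet" (index 12)

# Argmax of each score vector gives the answer span boundaries.
start = max(range(len(start_scores)), key=start_scores.__getitem__)
end   = max(range(len(end_scores)),   key=end_scores.__getitem__)
answer = ' '.join(all_tokens[start:end + 1])
print(answer)  # a nice puppet
```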
The complete code can be found here:
https://colab.research.google.com/drive/1TcbYjjiQQE9UDYoTtdl8dIjUAQKz11xf