Answering questions based on domain knowledge (like internal documentation, contracts, books, etc.) is challenging because it requires working with large documents. In this article, we explore an advanced technique for question answering on large texts with great accuracy, mixing semantic search and text generation with models like GPT-3, GPT-J, or GPT-NeoX.
The Challenges About Answering Questions On Domain Knowledge
Question answering on domain knowledge requires that you first send some context to the AI model, and then ask a question about it.
For example you could send the following context:
All NLP Cloud plans can be stopped anytime. You only pay for the time you used the service. In case of a downgrade, you will get a discount on your next invoice.
Now you might want to ask the following question:
When can plans be stopped?
The AI would answer something like this:
Anytime
For more details, see our documentation about question answering here.
The problem with this approach is that the size of your context is limited. On modern AI models (based on the Transformer architecture), the maximum size of such a context is typically 2048 tokens (roughly equivalent to 1,500 words).
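To make this limit concrete, here is a rough way to check whether a piece of text is likely to fit in such a context window. This is only a sketch: the 1.3 tokens-per-word ratio is a common rule of thumb for English subword tokenizers, not an exact figure.

```python
# Rough sketch: estimate whether a text fits in a 2048-token context.
# Subword tokenizers usually produce about 1.3 tokens per English word,
# so a whitespace split gives a quick, approximate check.
MAX_TOKENS = 2048

def estimate_tokens(text, tokens_per_word=1.3):
    """Approximate token count from the word count."""
    return int(len(text.split()) * tokens_per_word)

def fits_in_context(text):
    """True if the text is likely to fit in a 2048-token context."""
    return estimate_tokens(text) <= MAX_TOKENS

short_doc = "All NLP Cloud plans can be stopped anytime."
long_doc = " ".join(["word"] * 5000)  # stands in for a ~5000-word documentation page

print(fits_in_context(short_doc))  # True
print(fits_in_context(long_doc))   # False
```

For a real application, you would count tokens with the model's own tokenizer, but even this crude estimate shows how quickly a full documentation set blows past the limit.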
Let's say that you want to build a support chatbot that knows everything about your product documentation, so end users can ask any product-related question to the chatbot without contacting a real support agent. Most likely, your documentation is made up of several thousand words, or even millions of words, far beyond what fits in a single context...
Let's explore how to overcome this limitation and perform question answering on very large documents.
Semantic Search vs. Text Generation
When it comes to question answering, two kinds of technologies can be used: text generation and semantic search.
The first one, text generation, is basically what I just showed above. It usually relies on an advanced text generation model like GPT-J, GPT-NeoX, GPT-3, OPT, Bloom... Such a model is able to understand a human question and respond like a human too. However, it does not work on large documents. One strategy could be to fine-tune your own text generation model with your own domain knowledge. For more details, see our documentation about fine-tuning here. This strategy sometimes works well, but in some cases the text generation model tends to "forget" some of the facts mentioned in the dataset. So text generation answers questions more fluently, but it is also a bit less reliable when dealing with open knowledge.
Semantic search is basically about searching a document the same way Google does, but based on your own domain knowledge. In order to achieve that, you need to train your own semantic search model with your own domain knowledge. Once your model is created, you can ask questions in natural language, and your AI model will return the entry in your dataset that best answers your question.
Semantic search is usually very fast and relatively cheap. It is also more reliable than the text generation fine-tuning strategy. But it is not able to properly "answer" a question: it simply returns a piece of text that contains the answer. It is then up to the user to read the whole piece of text in order to find the answer to their question.
For more details, see our documentation about semantic search here.
The good news is that it is possible to combine semantic search and text generation in order to achieve advanced results!
Question Answering Mixing Semantic Search And Text Generation
In order to answer questions on domain knowledge, the strategy we prefer at NLP Cloud is the following: first, make a semantic search request to retrieve the resource that best answers your question, and then use text generation to answer the question based on this resource, like a human would.
Let's say that we are an HP printer reseller, and we want to answer our customers' questions on our website.
First we will need to create our own semantic search model. Here it will be made of 3 examples only, but in real life you can include up to 1 million examples when using semantic search on NLP Cloud. We simply create a CSV file and put the following inside:
HP® LaserJets have unmatched printing speed, performance and reliability that you can trust. Enjoy Low Prices and Free Shipping when you buy now online.
Every HP printer comes with at least a one-year HP commercial warranty (or HP Limited Warranty). Some models automatically benefit from a three-year warranty, which is the case of the HP Color LaserJet Plus, the HP Color LaserJet Pro, and the HP Color LaserJet Expert.
HP LaserJet ; Lowest cost per page on mono laser printing. · $319.99 ; Wireless options available. · $109.00 ; Essential management features. · $209.00.
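For illustration, such a CSV file can be assembled with a short script. This is only a sketch: the `dataset.csv` filename and the one-example-per-row, single-column layout are assumptions for this example, and the texts are truncated stand-ins for the full paragraphs above.

```python
import csv

# The domain-knowledge examples, truncated here for readability.
examples = [
    "HP® LaserJets have unmatched printing speed, performance and reliability...",
    "Every HP printer comes with at least a one-year HP commercial warranty...",
    "HP LaserJet ; Lowest cost per page on mono laser printing. ...",
]

# One example per row, single column (assumed layout for this sketch).
with open("dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for example in examples:
        writer.writerow([example])
```

Using the `csv` module rather than plain string concatenation keeps commas and quotes inside the examples properly escaped.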
We then upload our CSV dataset to NLP Cloud and click "Create model". After a while, our own semantic search model containing our own domain knowledge will be ready and we will receive a private API URL in order to use it.
Let's ask a question to our brand new model using the NLP Cloud Python client:
import nlpcloud
# We use a fake model name and a fake API key for illustration reasons.
client = nlpcloud.Client("custom-model/5d8e6s8w5", "poigre5754gaefdsf5486gdsa56", gpu=True)
client.semantic_search("How long is the warranty on the HP Color LaserJet Pro?")
The model quickly returns the following:
{
"search_results": [
{
"score": 0.99,
"text": "Every HP printer comes with at least a one-year HP commercial warranty (or HP Limited Warranty). Some models automatically benefit from a three-year warranty, which is the case of the HP Color LaserJet Plus, the HP Color LaserJet Pro, and the HP Color LaserJet Expert."
},
{
"score": 0.74,
"text": "All consumer PCs and printers come with a standard one-year warranty. Care packs provide an enhanced level of support and/or an extended period of coverage for your HP hardware. All commercial PCs and printers come with either a one-year or three-year warranty."
},
{
"score": 0.68,
"text": "In-warranty plan · Available in 2-, 3-, or 4-year extension plans · Includes remote problem diagnosis support and Next Business Day Exchange Service."
}
]
}
Now we retrieve the answer that has the highest score: "Every HP printer comes with at least a one-year HP commercial warranty (or HP Limited Warranty). Some models automatically benefit from a three-year warranty, which is the case of the HP Color LaserJet Plus, the HP Color LaserJet Pro, and the HP Color LaserJet Expert."
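Programmatically, picking the top-scoring entry from the response is straightforward. The `response` dictionary below is a hand-copied, shortened version of the output above, used purely for illustration.

```python
# Shortened copy of the semantic search response shown above.
response = {
    "search_results": [
        {"score": 0.99, "text": "Every HP printer comes with at least a one-year HP commercial warranty..."},
        {"score": 0.74, "text": "All consumer PCs and printers come with a standard one-year warranty..."},
        {"score": 0.68, "text": "In-warranty plan..."},
    ]
}

# The results happen to be sorted already, but max() keeps the code
# robust if the ordering ever changes.
best = max(response["search_results"], key=lambda result: result["score"])
print(best["score"])  # 0.99
```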
This response is correct, but it is not very user-friendly, since the user needs to read quite a long piece of text to get the answer. So now we ask the same question again, this time to our question answering endpoint, using the GPT-J model, with the semantic search response as the context:
import nlpcloud
client = nlpcloud.Client("fast-gpt-j", "poigre5754gaefdsf5486gdsa56", gpu=True)
client.question(
"""How long is the warranty on the HP Color LaserJet Pro?""",
context="""Every HP printer comes with at least a one-year HP commercial warranty (or HP Limited Warranty). Some models automatically benefit from a three-year warranty, which is the case of the HP Color LaserJet Plus, the HP Color LaserJet Pro, and the HP Color LaserJet Expert."""
)
It returns the following answer:
{
"answer": "Three years"
}
Pretty good, isn't it?
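To recap, the whole two-step flow can be wrapped in one small helper. This is only a sketch: the two client objects are assumed to expose the `semantic_search` and `question` methods used in the snippets above, and the helper itself is not part of the NLP Cloud client library.

```python
def answer_question(search_client, qa_client, question):
    """Two-step QA: retrieve the best-matching passage with semantic
    search, then let a generative model phrase a concise answer."""
    results = search_client.semantic_search(question)
    # Use the passage with the highest similarity score as the context.
    best = max(results["search_results"], key=lambda r: r["score"])
    answer = qa_client.question(question, context=best["text"])
    return answer["answer"]
```

In production, `search_client` would be the custom semantic search model and `qa_client` the `fast-gpt-j` client created earlier.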
Conclusion
Despite the recent progress made on Transformer-based AI models like GPT-J, GPT-3, etc., the limited request size makes it impossible to use these great models on specific domain knowledge for question answering. Unfortunately, fine-tuning these models does not always work well for such a use case either...
A good strategy is to first answer your question with your own semantic search model trained on large documents, and then use a regular question answering model in order to return a human answer to the initial question.
If you want to implement this strategy, don't hesitate to create your own semantic search model!