GPT-3's text generation process cannot guarantee accurate information.
The text generated by GPT-3 is certainly impressive: it strings together elegant words and clever turns of phrase. But the question remains whether such polished sentences and paragraphs actually tell the truth.
This issue is well known to the research community, which frames it in terms of grounding. Currently, most generative text models have no ability to ground their claims in reality, or at least to attribute them to an external source; they often hallucinate plausible-sounding falsehoods. This limits their applicability to creative fields (fiction writing, games, entertainment, etc.) and makes them dangerous wherever truthfulness must be the first priority (news, scientific articles, education, etc.).

The situation becomes even more worrying in ill-intentioned hands. Generative text models give bad actors a tool for lying at scale. They can flood social platforms with overwhelming amounts of content that seems true simply because of its sheer volume. They can also target individuals, framing lies in a way tailored to convince each particular person based on their social media profile.

The way researchers currently try to address grounding is by incorporating an additional retrieval step into the generation process: before producing the output text, the model is trained to search an external database and gather evidence to support its claims.
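The retrieve-then-generate idea can be sketched in a few lines. This is a minimal illustration, not any production system: `search_corpus` is a toy word-overlap retriever, and `generate` is a hypothetical stand-in for a real language model conditioned on evidence.

```python
def search_corpus(query: str, corpus: list[str]) -> str:
    """Return the document sharing the most words with the query (toy retriever)."""
    query_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(query_words & set(doc.lower().split())))

def generate(prompt: str, evidence: str) -> str:
    """Stand-in for a language model conditioned on retrieved evidence."""
    return f"Answer to '{prompt}', supported by: {evidence}"

def grounded_generate(prompt: str, corpus: list[str]) -> str:
    evidence = search_corpus(prompt, corpus)  # gather evidence first
    return generate(prompt, evidence)         # then produce the output text
```

The key design point is the ordering: evidence is fetched *before* the output is produced, so the generator can condition on it rather than inventing unsupported claims.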
LaMDA's attempt at grounding
Google's state-of-the-art conversational model, LaMDA, uses a whole toolset to ground its answers: a retrieval component, a calculator, and a translator. Based on the conversation history, LaMDA first produces an unconstrained response (ungrounded at this stage). It then assesses whether adjustments are required (i.e., whether to use any of its tools). When LaMDA decides that supporting evidence is needed, it (a) produces a query, (b) issues the query to retrieve a supporting document, and (c) rewrites the previous answer so that it is consistent with the retrieved source. Steps (a) through (c) are repeated until LaMDA decides that no further adjustments are required; note that this may take multiple retrieval rounds. While this method is shown to improve the groundedness of a base model without retrieval (from 60% to just under 80%), it is still far from human performance (~95%, even when people do not have access to an information retrieval system).
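The iterative loop described above can be sketched as follows. All the callables here (`needs_evidence`, `make_query`, `retrieve`, `rewrite`) are hypothetical stand-ins for illustration, not LaMDA's actual interfaces.

```python
def ground_response(draft: str, needs_evidence, make_query, retrieve, rewrite,
                    max_rounds: int = 3) -> str:
    """Iteratively revise a draft response until it needs no more evidence."""
    response = draft
    for _ in range(max_rounds):              # bound the loop for safety
        if not needs_evidence(response):     # model decides: adjust or stop
            break
        query = make_query(response)         # (a) produce a query
        document = retrieve(query)           # (b) fetch a supporting document
        response = rewrite(response, document)  # (c) revise for consistency
    return response
```

The loop terminates either when the model judges the response sufficiently grounded or when a round limit is hit, mirroring the "repeat until no further adjustments are required" behavior.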
The road ahead is still long and complicated
Defining and measuring groundedness is a challenge in itself. In a recent paper from Google, researchers distinguish between fact-checking (i.e., judging whether a statement is universally true) and attribution (i.e., identifying a supporting document). The latter is somewhat more tractable, as it postpones the question of whether the identified source is credible. But even after decoupling these two concepts, reasoning about attribution is not trivial. Another challenge in ensuring groundedness is the lack of training data: to teach a model about attribution, you need to provide positive and negative pairs of claims and supporting sources, and such labeled pairs are scarce.
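What such training pairs look like can be illustrated concretely. This is a toy sketch under loose assumptions: the word-overlap score stands in for a real attribution model, and the example pairs are made up for illustration.

```python
def attribution_score(claim: str, source: str) -> float:
    """Fraction of claim words appearing in the source (toy overlap heuristic)."""
    claim_words = set(claim.lower().split())
    source_words = set(source.lower().split())
    return len(claim_words & source_words) / len(claim_words)

# A positive pair: the source supports the claim.
claim = "water boils at 100 degrees celsius at sea level"
positive_source = "at sea level water boils at 100 degrees celsius"

# A negative pair: the source is unrelated to the claim.
negative_source = "the stock market closed higher on friday"
```

A trained attribution model would replace the heuristic, but the data shape is the same: (claim, source) pairs labeled as supporting or not.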
Most text-generation models (including the GPT family) are prone to making false statements because of their inability to ground their answers in the outside world. This is because they have been trained to sound truthful, not to be truthful. Models such as LaMDA try to address this issue by incorporating a retrieval component (i.e., searches over an external database) and iteratively refining the model's response until it is consistent with the evidence. Although promising, this strategy is still far from reliable. It will be interesting to see how the community responds to this pressing challenge.