AI startup Hugging Face and Facebook have open-sourced RAG (Retrieval Augmented Generation), a language processing model that retrieves and understands context in order to complete a variety of tasks. Team Zuckerberg says RAG can attain state-of-the-art results by altering its internal knowledge on the fly, which lets researchers control what the model does and doesn’t know without wasting time or compute power.
Starting now, RAG is open-sourced and available as part of the Hugging Face Transformers library. The new software integrates with the Datasets library, which provides the knowledge source that RAG relies on.
Cutting-edge work in natural language understanding has produced general-purpose models, but most efforts so far have applied these models to tasks that a human could solve without needing background knowledge.
In contrast, RAG uses the input to retrieve a set of relevant documents from a source such as Wikipedia. For example, given “When did the first reptile appear on Earth?,” RAG might retrieve documents for “reptile,” “history of Earth,” and “evolution of animals.” The retrieved documents are then combined with the original input as context and fed into a model that generates the relevant output text.
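The retrieval step can be sketched with a toy keyword scorer. To be clear, this is not Facebook's actual retriever (RAG uses dense vector similarity over a full Wikipedia index); the tiny corpus and the `score` and `retrieve` helpers below are illustrative stand-ins:

```python
import math
import re
from collections import Counter

# Toy corpus standing in for a Wikipedia-scale knowledge source.
DOCS = {
    "reptile": "Reptiles are tetrapod animals; the first reptiles appeared "
               "roughly 312 million years ago during the Carboniferous period.",
    "history of earth": "Earth formed about 4.5 billion years ago; life "
                        "emerged in the oceans long before land animals.",
    "evolution of animals": "Animal evolution produced amphibians, then "
                            "amniotes such as reptiles, birds and mammals.",
    "cooking": "Baking bread requires flour, water, yeast and salt.",
}

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())

def score(query, doc):
    """Crude bag-of-words overlap, length-normalized; a stand-in for the
    dense inner-product similarity a real retriever would compute."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    overlap = sum((q & d).values())
    return overlap / math.sqrt(len(tokens(doc)))

def retrieve(query, k=2):
    """Return the titles of the top-k documents most relevant to the query."""
    ranked = sorted(DOCS.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return [title for title, _ in ranked[:k]]

print(retrieve("When did the first reptile appear on Earth?"))
```

A real system would then concatenate the retrieved passages with the question and hand that context to a seq2seq generator, rather than returning titles.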
Facebook says the open-sourced RAG leverages a form of ‘late fusion’ to integrate knowledge from the retrieved documents: it makes a prediction for each question-document pair, then aggregates those predictions into a final answer. Under this scheme, the model can draw on documents that contain clues to the answer but don’t state the answer directly.
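The aggregation can be illustrated numerically: each retrieved document votes for an answer in proportion to how relevant the retriever believes it is. All of the probabilities below are made-up numbers for illustration, not output from the actual model:

```python
# Hypothetical retrieval probabilities p(doc | question) for three documents.
doc_probs = {"reptile": 0.6, "history of earth": 0.3, "evolution of animals": 0.1}

# Hypothetical per-document generator probabilities p(answer | question, doc)
# for two candidate answers.
answer_probs = {
    "reptile":              {"312 million years ago": 0.70, "65 million years ago": 0.05},
    "history of earth":     {"312 million years ago": 0.20, "65 million years ago": 0.10},
    "evolution of animals": {"312 million years ago": 0.40, "65 million years ago": 0.02},
}

def late_fusion(doc_probs, answer_probs):
    """Marginalize over documents: p(answer) = sum over docs of
    p(doc) * p(answer | doc). Per-document predictions are made
    separately and only fused at the end."""
    fused = {}
    for doc, p_doc in doc_probs.items():
        for answer, p_ans in answer_probs[doc].items():
            fused[answer] = fused.get(answer, 0.0) + p_doc * p_ans
    return fused

fused = late_fusion(doc_probs, answer_probs)
best = max(fused, key=fused.get)
print(best, round(fused[best], 3))  # "312 million years ago" wins with 0.52
```

Note how the correct answer wins even though two of the three documents assign it only modest probability: their evidence is pooled rather than judged in isolation.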
Facebook says that in its evaluations, RAG shows a strong tendency to generate correct answers even in cases where the answer does not appear verbatim in any retrieved document. RAG also excels at knowledge-intensive natural language questions.