Brief Review — Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Fusion-in-Decoder, Text Retrieval for Question Answering

Sik-Ho Tsang
3 min read · Dec 29, 2023
An approach to open domain question answering. First, it retrieves support text passages from an external source of knowledge such as Wikipedia. Then, a generative encoder-decoder model produces the answer, conditioned on the question and the retrieved passages.

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering, by Facebook AI Research; ENS, PSL University; Inria
2021 EACL, Over 600 Citations (Sik-Ho Tsang @ Medium)

Dense Text Retrieval
2019 [Sentence-BERT (SBERT)] 2020 [Retrieval-Augmented Generation (RAG)] [Dense Passage Retriever (DPR)]

  • Fusion-in-Decoder is proposed, in which the encoder processes each passage independently, allowing the model to scale to a large number of contexts.
  • On the other hand, processing the passages jointly in the decoder allows the model to better aggregate evidence from multiple passages.


  1. Fusion-in-Decoder
  2. Results

1. Fusion-in-Decoder

  • Fusion-in-Decoder proceeds in two steps. First, supporting passages are retrieved using either sparse or dense representations.
  • Then, a sequence-to-sequence (Seq2Seq) model generates the answer, taking the retrieved passages as input in addition to the question.

1.1. Retrieval

  • Two methods are considered: BM25 (Robertson et al., 1995) and DPR.
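
To make the sparse option concrete, here is a minimal sketch of BM25 ranking in plain Python. It is only an illustration, not the retrieval setup used in the paper: the whitespace tokenizer, the `k1` and `b` values, the non-negative IDF variant, and the function names `bm25_scores` and `retrieve` are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with a basic BM25 formula.

    Assumptions for this sketch: lowercase whitespace tokenization,
    k1=1.5, b=0.75, and a non-negative IDF variant.
    """
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, passages, top_k=2):
    """Return the top_k passages ranked by BM25 score (highest first)."""
    scores = bm25_scores(query, passages)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:top_k]]


passages = [
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
    "The Eiffel Tower is in Paris",
]
top = retrieve("capital of France", passages, top_k=2)
```

In this toy corpus the first passage matches all three query terms, so it ranks first. DPR would instead rank passages by the inner product between dense question and passage embeddings.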

1.2. Reading

  • The reader is a pretrained T5 model, in its base and large sizes, containing 220M and 770M parameters respectively.

Each retrieved passage and its title are concatenated with the question, and processed independently from other passages by the encoder.

  • The special tokens question:, title: and context: are added before the question, the title and the text of each passage, respectively.
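
As a rough illustration of this input format, the sketch below builds one encoder input string per retrieved passage. The function name `format_inputs`, the dictionary keys `title` and `text`, and the exact spacing between fields are assumptions for this example, not details taken from the paper.

```python
def format_inputs(question, passages):
    """Build one encoder input string per retrieved passage.

    Each passage is a dict with hypothetical keys "title" and "text".
    The question is repeated in every input, so each passage can be
    encoded without seeing the others.
    """
    return [
        f"question: {question} title: {p['title']} context: {p['text']}"
        for p in passages
    ]


inputs = format_inputs(
    "who wrote hamlet",
    [
        {"title": "Hamlet", "text": "Hamlet is a tragedy by William Shakespeare."},
        {"title": "Macbeth", "text": "Macbeth is another tragedy."},
    ],
)
```

With 100 retrieved passages, this yields 100 separate input strings, each passed through the encoder on its own.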

Finally, the decoder performs attention over the concatenation of the resulting representations of all the retrieved passages. The model thus performs evidence fusion in the decoder only.
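
The fusion step can be sketched with a toy example in plain Python. This is only a shape-level illustration under loud assumptions: `toy_encode` is a made-up stand-in for the T5 encoder (it is not learned), attention is single-head with one decoder query, and all function names are hypothetical.

```python
import math


def toy_encode(tokens, dim=4):
    """Stand-in for the encoder: one vector per token (hypothetical, not learned)."""
    return [
        [math.sin((i + 1) * (j + 1) + len(tok)) for j in range(dim)]
        for i, tok in enumerate(tokens)
    ]


def fuse(encoded_passages):
    """Concatenate per-passage encoder outputs along the sequence axis."""
    return [vec for passage in encoded_passages for vec in passage]


def cross_attention(query, memory):
    """Single-head dot-product attention of one decoder query over the memory."""
    scale = math.sqrt(len(query))
    logits = [sum(q * m for q, m in zip(query, vec)) / scale for vec in memory]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * vec[j] for w, vec in zip(weights, memory))
        for j in range(len(query))
    ]
    return weights, context


inputs = [
    "question: who wrote hamlet title: Hamlet context: a tragedy by Shakespeare",
    "question: who wrote hamlet title: Macbeth context: another tragedy",
]
# Each passage is encoded on its own; no passage sees another one here.
encoded = [toy_encode(text.split()) for text in inputs]
# The decoder attends over the concatenation of all encoder outputs.
memory = fuse(encoded)
weights, context = cross_attention([1.0, 0.0, 0.0, 0.0], memory)
```

The attention weights span every token position of every passage, which is where evidence from different passages can be combined.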

1.3. Difference From RAG Model

  • By processing passages independently in the encoder, but jointly in the decoder, this method differs from Min et al. (2020) and RAG.

2. Results

2.1. SOTA Comparisons

Comparison to state-of-the-art

While conceptually simple, this method outperforms existing work on the NaturalQuestions and TriviaQA benchmarks.

  • On NaturalQuestions, the closed-book T5 model obtains 36.6% accuracy with 11B parameters, while the proposed approach obtains 44.1% with 770M parameters, using BM25 retrieval over Wikipedia.

2.2. Scaling with Number of Passages

Increasing the number of passages from 10 to 100 leads to a 6% improvement on TriviaQA and a 3.5% improvement on NaturalQuestions.

2.3. Impact of the Number of Training Passages

  • The reported performance is obtained by training with different numbers of passages, while testing with 100 passages.

It is observed that reducing the number of training passages leads to a decrease in accuracy.

  • Further, the authors propose to finetune the previous models using 100 passages for 1000 steps. This reduces the accuracy gap while using significantly fewer computational resources.


