Brief Review — Bidirectional Recurrent Neural Networks

Bidirectional RNN (BRNN), One Forward, One Backward

Sik-Ho Tsang
3 min read · Aug 21, 2022

Bidirectional Recurrent Neural Networks (Bidirectional RNN, BRNN), by ATR Interpreting Telecommunications Research Laboratory
1997 TSP, Over 7000 Citations (Sik-Ho Tsang @ Medium)
Recurrent Neural Network, RNN, Sequence Model

  • A regular recurrent neural network (RNN) is extended to a bidirectional recurrent neural network (BRNN) by training it simultaneously in the positive and negative time directions.

Outline

  1. Bidirectional Recurrent Neural Network (BRNN)
  2. Results

1. Bidirectional Recurrent Neural Network (BRNN)

1.1. Conventional Uni-Directional RNN

General structure of a regular unidirectional RNN shown (a) with a delay line and (b) unfolded in time for two time steps
  • The above figure shows the basic RNN architecture, (a) with a delay line and (b) unfolded in time for two time steps.

Predictions always come from past information only.
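
To make the time-direction point concrete, here is a minimal NumPy sketch of such a unidirectional RNN; the function name, shapes, and tanh nonlinearity are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Vanilla RNN unrolled in the positive time direction:
    the state at time t only sees inputs x_1 .. x_t."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)  # shape: (T, hidden_dim)
```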

1.2. Proposed Bidirectional RNN (BRNN)

General structure of the bidirectional recurrent neural network (BRNN) shown unfolded in time for three time steps

By using a bidirectional RNN (BRNN), future information can also be included in the prediction.
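
Continuing the sketch above, a BRNN can be built from two such unidirectional passes, one over the reversed sequence, whose states are combined at every time step. The concatenation and linear read-out below are assumptions for illustration; the paper combines both state sets at the output layer:

```python
def brnn_forward(x_seq, fwd_params, bwd_params, W_hy, b_y):
    """Run one RNN forward and one backward over the same input,
    then combine both state sequences at every time step."""
    h_fwd = rnn_forward(x_seq, *fwd_params)              # sees x_1 .. x_t
    h_bwd = rnn_forward(x_seq[::-1], *bwd_params)[::-1]  # sees x_t .. x_T
    h_cat = np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2 * hidden_dim)
    return h_cat @ W_hy.T + b_y  # every output sees the whole input sequence
```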

2. Results

TIMIT Phoneme Classification Results
  • The TIMIT phoneme database is a well-established database consisting of 6300 sentences spoken by 630 speakers.
  • The training set consists of 3696 sentences from 462 speakers, and the test set consists of 1344 sentences from 168 speakers.
  • The data is segmented, which gives 142910 phoneme segments for training and 51681 for testing.
  • MERGE merges the results of FOR-RNN (forward-only) and BACK-RNN (backward-only); a minimal sketch of this merging is shown below.

BRNN outperforms all the MLP and RNN variants.
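
For illustration, the MERGE baseline might look like the following sketch, assuming the two networks output per-frame class posteriors and that the combination is a simple average (the exact merging rule is an assumption here, not taken from the paper):

```python
def merge_predictions(p_forward, p_backward):
    """MERGE baseline (sketch): combine per-frame class posteriors of a
    forward-only RNN and a backward-only RNN. Simple averaging is an
    assumption; the paper's exact combination rule may differ."""
    p = 0.5 * (p_forward + p_backward)
    return p.argmax(axis=-1)  # predicted phoneme class per frame
```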

Modified bidirectional recurrent neural network structure shown here with extensions for the forward posterior probability estimation
  • An enhanced BRNN is also proposed.
  • First, instead of connecting the forward and backward states to the current output states, they are connected to the next and previous output states, respectively, and the inputs are directly connected to the outputs (see the sketch after this list).
  • Second, if in the resulting structure the first L weight connections from the inputs to the backward states and from the inputs to the outputs are cut, then only input information from the past can be used to make predictions, which allows the forward posterior probability to be estimated.
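
As an illustration of the first modification, the output computation might look like the following sketch, where the output at time t receives the forward state from t-1, the backward state from t+1, and the raw input x_t; the zero vectors at the sequence boundaries and all parameter names are assumptions:

```python
def modified_brnn_outputs(x_seq, h_fwd, h_bwd, W_f, W_b, W_x, b_y):
    """Modified BRNN output (sketch): combine the forward state from t-1,
    the backward state from t+1, and the input x_t at every time step."""
    T = len(x_seq)
    outputs = []
    for t in range(T):
        h_prev = h_fwd[t - 1] if t > 0 else np.zeros_like(h_fwd[0])
        h_next = h_bwd[t + 1] if t < T - 1 else np.zeros_like(h_bwd[0])
        outputs.append(W_f @ h_prev + W_b @ h_next + W_x @ x_seq[t] + b_y)
    return np.stack(outputs)  # shape: (T, output_dim)
```
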
TIMIT Phoneme Classification Results

The combination of the forward and backward modified BRNN structures results in much better performance than the individual structures.

This should be one of the earliest RNN papers proposing the bidirectional RNN idea.

Reference

[1997 TSP] [Bidirectional RNN (BRNN)]
Bidirectional Recurrent Neural Networks

Language Model / Sequence Model

1997 [Bidirectional RNN (BRNN)] 2020 [ALBERT] [GPT-3] [T5] [Pre-LN Transformer] [MobileBERT] [TinyBERT]

My Other Previous Paper Readings
