Deep Gesture Recognition Utility (DeepGRU)

DeepGRU is a neural network architecture created by Mehran Maghoumi and Joseph J. LaViola Jr. It was originally designed for gesture recognition, but is applicable to general sequence classification tasks.

The architecture combines a recurrent neural network encoder with an attention module that learns to focus on the sub-sequences that matter most for classification.
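The encoder-plus-attention idea can be illustrated with a minimal sketch. This is not the DeepGRU attention module or the sequentia implementation: the class name TinyAttentiveGRU, the single GRU layer and the linear scoring layer are all assumptions chosen to keep the example short.

```python
import torch
import torch.nn as nn


class TinyAttentiveGRU(nn.Module):
    """Hypothetical toy model: GRU encoder + softmax attention pooling."""

    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()
        # Unidirectional GRU encoder over the time dimension.
        self.encoder = nn.GRU(n_features, hidden_size, batch_first=True)
        # One scalar relevance score per time step.
        self.score = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        states, _ = self.encoder(x)                          # (B, T, H)
        weights = torch.softmax(self.score(states), dim=1)   # (B, T, 1)
        # Weighted sum of hidden states: time steps with larger
        # weights contribute more to the sequence summary.
        context = (weights * states).sum(dim=1)              # (B, H)
        log_probs = torch.log_softmax(self.classifier(context), dim=1)
        return log_probs, weights.squeeze(-1)


torch.manual_seed(0)
model = TinyAttentiveGRU(n_features=3, hidden_size=8, n_classes=4)
scores, weights = model(torch.randn(2, 5, 3))  # batch of 2, 5 time steps
```

The attention weights sum to one across the time steps of each sequence, so they can be read as a distribution over where the model is focusing.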

Rather than the commonly used long short-term memory (LSTM) unit, the authors opt for the gated recurrent unit (GRU), which has fewer parameters and therefore makes the network faster to train. Interestingly, the encoder network used in DeepGRU is not bidirectional, even though bidirectional recurrent networks are the usual choice for sequence classification and sequence-to-sequence modelling. The authors found that a unidirectional encoder was sufficient: it was faster to train and performed similarly to a bidirectional one.
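The parameter savings can be checked directly in PyTorch. The sketch below counts the parameters of single-layer torch.nn.GRU and torch.nn.LSTM modules. The input and hidden sizes are arbitrary values chosen for illustration, not the DeepGRU dimensions.

```python
import torch.nn as nn


def count_parameters(module: nn.Module) -> int:
    """Total number of scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters())


input_size, hidden_size = 16, 32  # arbitrary illustrative sizes

gru = nn.GRU(input_size, hidden_size, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
bi_gru = nn.GRU(input_size, hidden_size, batch_first=True, bidirectional=True)

gru_params = count_parameters(gru)        # 3 gate blocks
lstm_params = count_parameters(lstm)      # 4 gate blocks
bi_gru_params = count_parameters(bi_gru)  # two directions

print(gru_params, lstm_params, bi_gru_params)
```

For the same sizes, a GRU layer holds three quarters of the parameters of an LSTM layer, and a bidirectional layer doubles the count of a unidirectional one.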

DeepGRU

The DeepGRU class is a PyTorch implementation of the DeepGRU architecture.

A utility function collate_fn() is also provided. It can be passed to a torch.utils.data.DataLoader to specify how batches are formed from the provided observation sequences.

Note

The existing preprocessing methods in sequentia.preprocessing are currently only applicable to lists of numpy.ndarray objects, and therefore cannot be applied as transformations for torch.Tensor objects.

Unfortunately this means that the preprocessing methods can only be used to preprocess data for sequentia.classifiers.knn.KNNClassifier and sequentia.classifiers.hmm.HMMClassifier, and not sequentia.classifiers.rnn.DeepGRU.

API Reference

class sequentia.classifiers.rnn.DeepGRU(n_features, n_classes, dims={'fc': 256, 'gru1': 512, 'gru2': 256, 'gru3': 128}, device=None)[source]

A modular PyTorch implementation of the DeepGRU (Deep Gesture Recognition Utility) recurrent neural network architecture designed by Maghoumi & LaViola Jr. [1], originally for gesture recognition, but applicable to general sequence classification tasks.

Parameters
n_features: int

The number of features that each observation within a sequence has.

n_classes: int

The number of different sequence classes.

dims: dict

A dictionary containing the dimension configuration for the GRUs and fully-connected layers.

Note

Values for the keys 'gru1', 'gru2', 'gru3' and 'fc' must be set.

device: str, optional

The device to send the model parameters to for computation.

If no device is specified, an available CUDA device is used if one is found; otherwise the CPU is used.
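The device fallback described above is a common PyTorch pattern and might look like the following sketch. The helper name resolve_device is hypothetical and is not part of sequentia.

```python
import torch


def resolve_device(device=None):
    """Hypothetical helper: pick CUDA when available, else fall back to CPU."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.device(device)


default_device = resolve_device()        # depends on the machine
explicit_device = resolve_device("cpu")  # an explicit choice is respected
```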

Notes

[1] Mehran Maghoumi & Joseph J. LaViola Jr. “DeepGRU: Deep Gesture Recognition Utility”. Advances in Visual Computing, 14th International Symposium on Visual Computing, ISVC 2019, Lake Tahoe, NV, USA, October 7–9, 2019, Proceedings, Part I (pp. 16–31).

forward(x, x_lengths)[source]

Passes the batched input sequences through the encoder network, attention module and classifier to generate log-softmax scores.

Note

Since log-softmax scores are returned, it is advised to use the negative log-likelihood loss torch.nn.NLLLoss.
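To see why torch.nn.NLLLoss is the natural pairing, the sketch below checks that the negative log-likelihood loss applied to log-softmax scores matches torch.nn.CrossEntropyLoss applied to the raw logits. The logits and targets are made-up values, not DeepGRU outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)             # made-up scores: B=4 sequences, C=3 classes
targets = torch.tensor([0, 2, 1, 2])   # made-up integer class labels

# What a log-softmax output layer would produce.
log_probs = F.log_softmax(logits, dim=1)

nll = nn.NLLLoss()(log_probs, targets)        # expects log-probabilities
ce = nn.CrossEntropyLoss()(logits, targets)   # expects raw logits
```

Passing log-softmax scores to CrossEntropyLoss instead would apply the log-softmax a second time, which is why NLLLoss is advised here.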

Parameters
x: torch.nn.utils.rnn.PackedSequence

A packed representation of a batch of input observation sequences.

x_lengths: torch.Tensor (long/int)

A tensor of the sequence lengths of the batch in descending order.
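As background on packed sequences, the sketch below packs a small padded batch with torch.nn.utils.rnn.pack_padded_sequence and unpacks it again. By default this function requires lengths in descending order (enforce_sorted=True), which is consistent with the ordering described above. The tensors are toy data, and how DeepGRU handles packing internally is not shown here.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy padded batch: B=3 sequences, T_max=4 time steps, D=2 features.
lengths = torch.tensor([4, 3, 1])  # descending order
padded = torch.zeros(3, 4, 2)
for i, n in enumerate(lengths.tolist()):
    padded[i, :n] = i + 1          # fill the real (non-padding) time steps

# Packing drops the padding so a recurrent layer skips those time steps.
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# Unpacking restores the padded layout and the original lengths.
unpacked, unpacked_lengths = pad_packed_sequence(packed, batch_first=True)
```

Here packed.data holds one row per real time step (8 in total), and packed.batch_sizes records how many sequences are still active at each time step.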

Returns
log_softmax: torch.Tensor (float)

\(B\times C\) tensor of \(C\) log-softmax scores (class predictions) for each observation sequence in the batch.
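A short sketch of how a \(B\times C\) tensor of log-softmax scores can be turned into class predictions and probabilities. The scores are made-up values rather than the output of a trained model.

```python
import torch

# Made-up log-softmax scores for B=2 sequences and C=3 classes.
log_softmax = torch.log_softmax(
    torch.tensor([[2.0, 0.5, 0.1],
                  [0.2, 0.3, 3.0]]),
    dim=1,
)

# The predicted class is the index of the largest score in each row.
predictions = log_softmax.argmax(dim=1)

# Exponentiating recovers class probabilities that sum to one per row.
probabilities = log_softmax.exp()
```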

Batching and collation

sequentia.classifiers.rnn.collate_fn(batch)[source]

Collects together univariate or multivariate sequences into a single batch, arranged in descending order of length.

Also returns the corresponding lengths and labels as torch.LongTensor objects.

Parameters
batch: list of tuple(torch.FloatTensor, int)

Collection of \(B\) sequence-label pairs, where the \(n^\text{th}\) sequence is of shape \((T_n \times D)\) or \((T_n,)\) and the label is an integer.

Returns
padded_sequences: torch.Tensor (float)

A tensor of size \(B \times T_\text{max} \times D\) containing all of the sequences in descending length order, padded to the length of the longest sequence in the batch.

lengths: torch.Tensor (long/int)

A tensor of the \(B\) sequence lengths in descending order.

labels: torch.Tensor (long/int)

A tensor of the \(B\) sequence labels, ordered to match the sequences (descending sequence length).
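The behaviour described for collate_fn can be approximated with the sketch below, which sorts a batch by descending length, pads it, and returns the lengths and labels in the same order. The function sketch_collate_fn is a hypothetical stand-in written for illustration, not the sequentia implementation, and the data is made up.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def sketch_collate_fn(batch):
    """Hypothetical stand-in for a length-sorting, padding collate function."""
    # Sort the (sequence, label) pairs by descending sequence length.
    batch = sorted(batch, key=lambda pair: len(pair[0]), reverse=True)
    # Treat univariate sequences of shape (T,) as (T, 1).
    sequences = [s if s.dim() == 2 else s.unsqueeze(-1) for s, _ in batch]
    lengths = torch.tensor([len(s) for s in sequences], dtype=torch.long)
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    # Zero-pad every sequence to the longest length in the batch.
    padded = pad_sequence(sequences, batch_first=True)  # (B, T_max, D)
    return padded, lengths, labels


# Made-up dataset: three sequences of different lengths with D=3 features.
data = [(torch.ones(2, 3), 0), (torch.ones(5, 3), 1), (torch.ones(3, 3), 2)]
loader = DataLoader(data, batch_size=3, collate_fn=sketch_collate_fn)
padded, lengths, labels = next(iter(loader))
```

Because the labels are reordered together with the sequences, each label still lines up with its own sequence after sorting.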