Description
Hi!
I have a small question about the "Attention module" part of your code.
Before passing through the final attention linear layer, tanh is used for the non-linearity rather than ReLU.
And the output of the final attention layer is multiplied by "flattened.shape[1]**-0.5".
Is there a special reason for using tanh instead of ReLU?
And why is the result multiplied by that factor?
The original code is below.
att = self.f_att(torch.tanh(att_enc+att_dec))*flattened.shape[1]**-0.5 # att.shape = (batch, locations, 1)
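For context, here is a minimal sketch of an additive (Bahdanau-style) attention block that is consistent with that line. The layer names f_enc/f_dec/f_att, the dimensions, and the shapes of "flattened" and "hidden" are my assumptions for illustration, not the repository's actual code. The scaling factor resembles the 1/sqrt(d) term in scaled dot-product attention, which keeps the logits from growing with size before the softmax, but that is only my guess at the rationale.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch of additive attention with tanh and a 1/sqrt(locations) scale.

    Assumed shapes (not taken from the original repo):
      flattened: (batch, locations, enc_dim)  # flattened encoder feature map
      hidden:    (batch, dec_dim)             # decoder hidden state
    """
    def __init__(self, enc_dim, dec_dim, att_dim):
        super().__init__()
        self.f_enc = nn.Linear(enc_dim, att_dim)  # projects encoder features
        self.f_dec = nn.Linear(dec_dim, att_dim)  # projects decoder state
        self.f_att = nn.Linear(att_dim, 1)        # final attention linear layer

    def forward(self, flattened, hidden):
        att_enc = self.f_enc(flattened)            # (batch, locations, att_dim)
        att_dec = self.f_dec(hidden).unsqueeze(1)  # (batch, 1, att_dim)
        # tanh bounds the pre-score activations in [-1, 1]; the factor
        # flattened.shape[1]**-0.5 rescales the logits by 1/sqrt(locations).
        att = self.f_att(torch.tanh(att_enc + att_dec)) * flattened.shape[1] ** -0.5
        weights = torch.softmax(att, dim=1)        # (batch, locations, 1)
        context = (weights * flattened).sum(dim=1) # (batch, enc_dim)
        return context, weights
```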