
Question about Attention Module #2


Description

@kjae0

Hi!

I have a small question about the "Attention module" part of your code.

Before the final attention linear layer, tanh is used as the non-linearity instead of ReLU.
The output of the final attention layer is then multiplied by "flattened.shape[1]**-0.5".

Is there a special reason for using tanh rather than ReLU?
And why is the output multiplied by that factor?

The original code is below:
att = self.f_att(torch.tanh(att_enc+att_dec))*flattened.shape[1]**-0.5 # att.shape = (batch, locations, 1)
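
To make sure I'm reading it correctly, here is a minimal sketch of how I understand the module: a Bahdanau-style additive attention step, showing where the tanh and the scaling factor sit. Only f_att, att_enc, att_dec, and flattened come from your code; the other layer names, dimensions, and the surrounding class are my own assumptions for illustration, not your actual implementation.

```python
import torch
import torch.nn as nn

class AdditiveAttentionSketch(nn.Module):
    """Sketch of the additive attention step in question.
    Names other than f_att / att_enc / att_dec / flattened are assumptions."""

    def __init__(self, enc_dim, dec_dim, hidden_dim):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, hidden_dim)  # assumed: projects encoder features
        self.dec_proj = nn.Linear(dec_dim, hidden_dim)  # assumed: projects decoder state
        self.f_att = nn.Linear(hidden_dim, 1)           # final attention linear layer

    def forward(self, flattened, dec_hidden):
        # flattened: (batch, locations, enc_dim), dec_hidden: (batch, dec_dim)
        att_enc = self.enc_proj(flattened)                # (batch, locations, hidden_dim)
        att_dec = self.dec_proj(dec_hidden).unsqueeze(1)  # (batch, 1, hidden_dim)

        # tanh keeps the pre-attention activations bounded in (-1, 1);
        # the flattened.shape[1]**-0.5 factor scales the logits before softmax,
        # which looks similar in spirit to the 1/sqrt(d_k) scaling in
        # dot-product attention -- this is what I would like to confirm.
        att = self.f_att(torch.tanh(att_enc + att_dec)) * flattened.shape[1] ** -0.5
        # att.shape = (batch, locations, 1)

        weights = torch.softmax(att, dim=1)               # attention over locations
        context = (weights * flattened).sum(dim=1)        # (batch, enc_dim)
        return context, weights
```

If that reading is wrong, I'd appreciate a correction.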
