{"version":"1.0","author_name":"derwind","blog_url":"https://randommemory.hatenablog.com/","provider_name":"Hatena Blog","height":"190","blog_title":"\u3089\u3093\u3060\u3080\u306a\u8a18\u61b6","width":"100%","provider_url":"https://hatena.blog","html":"<iframe src=\"https://hatenablog-parts.com/embed?url=https%3A%2F%2Frandommemory.hatenablog.com%2Fentry%2F2022%2F03%2F05%2F180312\" title=\"Self-attention - \u3089\u3093\u3060\u3080\u306a\u8a18\u61b6\" class=\"embed-card embed-blogcard\" scrolling=\"no\" frameborder=\"0\" style=\"display: block; width: 100%; height: 190px; max-width: 500px; margin: 10px 0px;\"></iframe>","url":"https://randommemory.hatenablog.com/entry/2022/03/05/180312","author_url":"https://blog.hatena.ne.jp/derwind/","type":"rich","categories":["machine_learning"],"description":"Looking at the implementation of the self-attention block in torch.nn.Transformer, it reads as follows at transformer.py#L352. # self-attention block def _sa_block(self, x: Tensor, attn_mask: Optional[Tensor], key_padding_mask: Optional[Tensor]) -> Tensor: x = self.self_attn(x, x, x, Other sample implementations do the same. In other words, the same tensor is passed as Q, K, and V. On the effect of this, Attention (4) - \u3089\u3093\u3060\u3080\u306a\u8a18\u2026","image_url":null,"title":"Self-attention","published":"2022-03-05 18:03:12"}