nanowell

optimizer.step() carefully

World

GitHub Data

Followers 39

Following 5

Links

AI Project

Public repos: 34

Public gists: 0

Differential-Transformer-PyTorch

PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.

star: 52

fork: 5

language: Python

created at: 2024-10-08

updated at: 2025-02-06
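
The Differential Attention mechanism named in the description can be illustrated with a small sketch. This is not the repository's code: it is a dependency-free Python illustration of the general idea (subtracting a second causal softmax attention map, scaled by a factor, from the first before applying the values). The function names and the fixed scalar `lam` are assumptions made for this example; in practice the scaling factor is usually a learned parameter and the computation runs on tensors across multiple heads.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_map(q, k):
    # Row-wise softmax of scaled dot products with a causal mask:
    # position i attends only to positions 0..i (decoder-only setting).
    scale = 1.0 / math.sqrt(len(q[0]))
    rows = []
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, k[j]))
                  for j in range(i + 1)]
        # Masked (future) positions get zero weight.
        rows.append(softmax(scores) + [0.0] * (len(k) - i - 1))
    return rows

def differential_attention(q1, k1, q2, k2, v, lam):
    # Hypothetical helper: (A1 - lam * A2) @ V, where A1 and A2 are two
    # separate causal attention maps and lam is a fixed scalar here.
    a1 = causal_attention_map(q1, k1)
    a2 = causal_attention_map(q2, k2)
    out = []
    for r1, r2 in zip(a1, a2):
        weights = [x - lam * y for x, y in zip(r1, r2)]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

With `lam = 0` this reduces to ordinary causal softmax attention; when both maps are identical and `lam = 1`, the two maps cancel and the output is all zeros.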