Articles
14
Tags
12
Categories
6
Magnicord

Transformer: From Principle to Implementation
Created 2025-01-11 | Updated 2025-03-07 | NLP | deep-learning • LLM • NLP • Python • Pytorch
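Overview
The Transformer was proposed mainly to address three limitations of RNNs, with the following goals:
  • Minimize the computational complexity per layer.
  • Minimize the path length between any pair of words: an RNN encodes the sequence from left to right, so distant words need $\mathcal{O}(N)$ steps to interact. Because of gradient problems, this makes it hard for an RNN to learn long-range dependencies.
  • Maximize the amount of parallelizable computation: both the forward and the backward pass of an RNN contain $\mathcal{O}(N)$ steps that cannot be parallelized, so an RNN cannot fully exploit GPUs, TPUs, and similar hardware.
Let $N$ be the sequence length and $D$ the representation dimension. The per-layer complexity of recurrent and self-attention layers is:
Layer Type | Complexity per Layer
Self-Attention | $\mathcal{O}(N^{2} \cdot D)$
Recurrent | $\mathcal{O}(N \cdot D^{2})$
When $N \ll D$, the Transformer has lower per-layer complexity than an RNN. Taking machine translation as an example, T...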
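To make the complexity comparison in the Transformer preview concrete, here is a minimal sketch that evaluates the two leading-order terms, $N^{2} \cdot D$ and $N \cdot D^{2}$, for a few sizes. The function names and the chosen values of N and D are illustrative assumptions, not taken from the article, and constant factors are ignored.

```python
def self_attention_cost(n: int, d: int) -> int:
    """Leading-order per-layer cost of self-attention: N^2 * D."""
    return n * n * d


def recurrent_cost(n: int, d: int) -> int:
    """Leading-order per-layer cost of a recurrent layer: N * D^2."""
    return n * d * d


# Illustrative sizes only (assumed, not from the article).
for n, d in [(64, 512), (512, 512), (4096, 512)]:
    attn = self_attention_cost(n, d)
    rnn = recurrent_cost(n, d)
    if attn < rnn:
        cheaper = "self-attention"
    elif rnn < attn:
        cheaper = "recurrent"
    else:
        cheaper = "tie"
    print(f"N={n:5d} D={d}: attention={attn:,} recurrent={rnn:,} -> {cheaper}")
```

The ratio of the two terms is $N / D$, so in this simplified count self-attention is cheaper exactly when the sequence is shorter than the representation dimension.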
Linear Regression: From Principle to Implementation
Created 2024-01-17 | Updated 2025-03-05 | Deep Learning Basics | deep-learning • Python • Pytorch
Introduction
Suppose we have a dataset giving the area and age of some houses. How can we predict future house prices? We introduce linear regression to tackle this prediction problem. The linear regression model assumes that:
$$\textrm{price} = w_{\textrm{area}} \cdot \textrm{area} + w_{\textrm{age}} \cdot \textrm{age} + b$$
Example | Concepts
$\textrm{area}$, $\textrm{age}$ | features (a.k.a. inputs)
$\textrm{price}$ | ...
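As a small illustration of the model assumption in the linear regression preview, the sketch below evaluates $\textrm{price} = w_{\textrm{area}} \cdot \textrm{area} + w_{\textrm{age}} \cdot \textrm{age} + b$ for one house. The function name and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not fitted parameters from the article.

```python
def predict_price(area: float, age: float,
                  w_area: float, w_age: float, b: float) -> float:
    """Linear model: weighted sum of the features plus a bias term."""
    return w_area * area + w_age * age + b


# Hypothetical weights and bias (assumed for illustration only).
price = predict_price(area=100.0, age=10.0, w_area=3.0, w_age=-0.5, b=20.0)
print(price)  # 3.0 * 100.0 + (-0.5) * 10.0 + 20.0 = 315.0
```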
Python Basic Data Types: Dictionary
Created 2023-10-21 | Updated 2025-03-05 | Python Basics | Python • data-structure
This note mainly summarizes material from Corey Schafer’s Python Tutorial. A dictionary is a collection of key-value pairs.
Creating Dictionaries
We use curly-brace notation to represent a dictionary.
    empty_dict = {}  # create an empty dictionary
    student = {'name': 'John', 'age': 25, 'course': ['Math', 'CompSci']}
    print(student)
    {'name': 'John', 'age': 25, 'co...
Python Basic Data Types: Lists, Tuples and Sets
Created 2023-10-21 | Updated 2025-03-05 | Python Basics | Python • data-structure
This note mainly summarizes material from Corey Schafer’s Python Tutorial.
Lists
A list is a collection that is:
  • ordered
  • changeable
Creating Lists
We use square-bracket notation to represent a list.
    empty_list = []  # create an empty list
    courses = ['History', 'Math', 'Physics', 'CompSci']
    print(courses)
    ['History', 'Math', 'Physics', 'CompSci']
Similar to strings, we can use len to get the length o...
Re: Deep Learning From Scratch
Recent Posts
  • UV: The Definitive Solution for PyTorch, flash-attn, VeRL and OpenRLHF (2025-08-10)
  • Policy Gradient Algorithms: From REINFORCE to PPO (2025-07-15)
  • The Evolution of Policy Optimization for Enhancing LLM Reasoning: From PPO to GRPO Variants (2025-07-15)
  • An Introduction to PPO in RLHF (2025-07-15)
  • X-Enhanced Contrastive Decoding Strategies for Large Language Models (2025-05-30)
Categories
  • Deep Learning Basics (1)
  • NLP (5)
  • Python Basics (2)
  • RL4LLM (2)
  • Reinforcement Learning Basics (3)
  • dev-tools (1)
Tags
data-structure • Pytorch • LLM • uv • reasoning • environment • RLHF • reinforcement-learning • PEFT • deep-learning • Python • NLP
Archives
  • August 2025 (1)
  • July 2025 (3)
  • May 2025 (1)
  • March 2025 (2)
  • January 2025 (4)
  • January 2024 (1)
  • October 2023 (2)
Website Info
Article Count: 14
Total Word Count: 71.1k
© 2023 - 2025 By Magnicord
Welcome to the Journey of Deep Learning