Basic Function with Return Value: int add(int a, int b) { return a + b; } int result = add(5, 3); References, addresses, and pointers are vital and fundamental concepts in C and C++! A variable is a named storage location in memory that holds a value. It has a specific data type, which determines what … Continue reading Practicing C C++
Versatile “Kernels”
The concept of kernel is everywhere. Here’s a comparison of the various contexts in which the term "kernel" is used, organized in a table format:

| Context | Definition | Functionality | Common Characteristics |
| --- | --- | --- | --- |
| GPU Computing | A function executed in parallel on the GPU. | Performs data processing tasks efficiently using parallelism. | Written in CUDA/OpenCL, optimized for large datasets. |
| Linear Algebra | The set of vectors mapped to zero by … | | |

Continue reading Versatile “Kernels”
LLM.c Codes from Andrej’s github
/* GPT-2 Transformer Neural Net training loop. See README.md for usage. */ #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <string> #include <string_view> #include <sys/stat.h> #include <sys/types.h> // ----------- CPU utilities ----------- // defines: fopenCheck, freadCheck, fcloseCheck, fseekCheck, mallocCheck // defines: create_dir_if_not_exists, find_max_step, ends_with_bin #include "llmc/utils.h" // defines: tokenizer_init, tokenizer_decode, tokenizer_free #include "llmc/tokenizer.h" // … Continue reading LLM.c Codes from Andrej’s github
CUDA Programming
Suppose I have this very simple task to accomplish, written in CUDA: #include <iostream> #include <cuda_runtime.h> // CUDA kernel for adding two integers __global__ void addKernel(int* a, int* b, int* c) { *c = *a + *b; } int main() { // Create two integers for host int h_a = 5, h_b = 7, … Continue reading CUDA Programming
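The elided host-side steps typically follow the standard allocate/copy/launch/copy-back pattern. This is a sketch of that usual pattern, not the post's exact code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA kernel for adding two integers
__global__ void addKernel(int* a, int* b, int* c) { *c = *a + *b; }

int main() {
    int h_a = 5, h_b = 7, h_c = 0;
    int *d_a, *d_b, *d_c;

    // 1. Allocate device memory for the two inputs and the output.
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));
    cudaMalloc(&d_c, sizeof(int));

    // 2. Copy the host inputs to the device.
    cudaMemcpy(d_a, &h_a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &h_b, sizeof(int), cudaMemcpyHostToDevice);

    // 3. Launch one block with one thread: this task has no parallelism.
    addKernel<<<1, 1>>>(d_a, d_b, d_c);

    // 4. Copy the result back (this cudaMemcpy also synchronizes).
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", h_a, h_b, h_c);

    // 5. Free device memory.
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```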
Reproduce GPT2 (124M) by Andrej Karpathy LLM.c
In his latest talk at the CUDA event, Andrej showcased his work on replicating the GPT-2 LLM using C and CUDA, effectively eliminating reliance on PyTorch and all dependencies except one. The key takeaway is profound: PyTorch, once considered a massive and indispensable package for LLM and AI programming, is essentially a crutch for when … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy LLM.c
Reproduce GPT2 (124M) by Andrej Karpathy 2 Self-Attention Transformer
The key content here comes from the 2017 paper "Attention Is All You Need". So what is attention? Attention is a communication mechanism. It can be seen as nodes in a directed graph looking at each other and aggregating information with a weighted sum from all nodes that point to them, with data-dependent weights. But … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 2 Self-Attention Transformer
Reproduce GPT2 (124M) by Andrej Karpathy 2 Weights and Bias Initials Normalization, BatchNorm and BackProp in Makemore
Diving deeper into the Makemore code to illustrate subtle details affecting the nn output. For example, the concept of "dead neurons": if the squashing function, say tanh, squashes too many inputs to the extreme points of -1 and +1, the gradients of the preceding neurons get killed. By resetting the scale of the initialized weights and biases, the … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 2 Weights and Bias Initials Normalization, BatchNorm and BackProp in Makemore
Some Neural Network Concepts
What is residual pathway? A residual pathway (or residual connection) is a mechanism in neural networks that allows the original input of a layer to be added directly to its output. It was introduced in ResNet (Residual Networks) and has since become a fundamental component in modern architectures like Transformers. In a Transformer block, the … Continue reading Some Neural Network Concepts
Reproduce GPT2 (124M) by Andrej Karpathy 3 Tokenization
Tokenization is the process of breaking down text into smaller units (tokens) such as words, subwords, or characters. Different tokenization methods are used based on the task, language, and requirements of the model. Word-Based Tokenization. Character-Based Tokenization, which increases computation cost a lot. Subword-Based Tokenization like Byte Pair Encoding (BPE), WordPiece of BERT, SentencePiece. Sentence … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 3 Tokenization
Reproduce GPT2 (124M) by Andrej Karpathy 2 Makemore
""" you give this script some words (one per line) and it will generate more things like it. uses super state of the art Transformer AI tech this code is intended to be super hackable. tune it to your needs. Changes from minGPT: - I removed the from_pretrained function where we init with GPT2 weights … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 2 Makemore