Versatile “Kernels”

The concept of a kernel is everywhere. Here's a comparison of the various contexts in which the term "kernel" is used, organized in a table:

| Context        | Definition                                  | Functionality                                                 | Common Characteristics                                |
| GPU Computing  | A function executed in parallel on the GPU. | Performs data-processing tasks efficiently using parallelism. | Written in CUDA/OpenCL, optimized for large datasets. |
| Linear Algebra | The set of vectors mapped to zero by …      |                                                               |                                                       |

… Continue reading Versatile “Kernels”
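To make the linear-algebra sense of "kernel" concrete, here is a minimal sketch (the matrix and the use of numpy are illustrative assumptions, not from the post): the kernel (null space) of a matrix A is the set of vectors x with A·x = 0, and it can be read off from the SVD.

```python
import numpy as np

# The kernel (null space) of A is the set of vectors x with A @ x = 0.
# Illustrative matrix: rank 1, so its kernel in R^2 is 1-dimensional.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Rows of Vt whose singular values are (numerically) zero span the kernel.
U, s, Vt = np.linalg.svd(A)
null_mask = s < 1e-10
kernel_basis = Vt[null_mask].T   # columns form a basis of the kernel

# Verify: A maps every kernel basis vector to (numerically) zero.
print(np.allclose(A @ kernel_basis, 0))  # → True
```

The same "things mapped to zero" idea is what distinguishes this usage from the GPU sense of kernel, where the word just means a function launched across many threads.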

LLM.c Codes from Andrej’s github

/* GPT-2 Transformer Neural Net training loop. See README.md for usage. */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#include <string_view>
#include <sys/stat.h>
#include <sys/types.h>
// ----------- CPU utilities -----------
// defines: fopenCheck, freadCheck, fcloseCheck, fseekCheck, mallocCheck
// defines: create_dir_if_not_exists, find_max_step, ends_with_bin
#include "llmc/utils.h"
// defines: tokenizer_init, tokenizer_decode, tokenizer_free
#include "llmc/tokenizer.h"
// … Continue reading LLM.c Codes from Andrej’s github

Reproduce GPT2 (124M) by Andrej Karpathy LLM.c

In his latest talk at the CUDA event, Andrej showcased his work on replicating the GPT-2 LLM using C and CUDA, effectively eliminating reliance on PyTorch and all dependencies except one. The key takeaway is profound: PyTorch, once considered a massive and indispensable package for LLM and AI programming, is essentially a crutch for when … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy LLM.c

Reproduce GPT2 (124M) by Andrej Karpathy 2 Self-Attention Transformer

The key content here comes from the 2017 paper "Attention Is All You Need". So what is attention? Attention is a communication mechanism: it can be seen as nodes in a directed graph looking at each other and aggregating information with a weighted sum over all nodes that point to them, with data-dependent weights. But … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 2 Self-Attention Transformer
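The weighted-sum view above can be sketched in a few lines of numpy (the shapes and variable names here are illustrative, not taken from llm.c): each position aggregates the value vectors of the positions that point to it, with weights produced by a softmax over data-dependent scores.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                      # sequence length, head dimension (illustrative)
Q = rng.standard_normal((T, d))  # queries
K = rng.standard_normal((T, d))  # keys
V = rng.standard_normal((T, d))  # values

# Data-dependent affinities between positions, scaled by sqrt(d).
scores = Q @ K.T / np.sqrt(d)    # (T, T)

# Causal mask: node t may only aggregate from nodes <= t (the edges pointing to it).
mask = np.tril(np.ones((T, T), dtype=bool))
scores = np.where(mask, scores, -np.inf)

# Row-wise softmax turns affinities into weights that sum to 1.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                # each row is a weighted sum of value vectors
print(out.shape)                 # (4, 8)
```

Note that position 0 can only attend to itself, so its weight row is exactly [1, 0, 0, 0] — the "directed graph" structure of the mask is visible directly in the weight matrix.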

Reproduce GPT2 (124M) by Andrej Karpathy 2 Weights and Bias Initials Normalization, BatchNorm and BackProp in Makemore

Diving deeper into the Makemore code to illustrate subtle details that affect the network's output. For example, "dead neurons" arise when the squashing function — say, tanh — squashes too many inputs to the saturation points of -1 and +1, killing the gradients flowing back to the previous layer. By rescaling the initial weights and biases, the … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 2 Weights and Bias Initials Normalization, BatchNorm and BackProp in Makemore
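The saturation effect is easy to demonstrate numerically (a minimal sketch with made-up layer sizes, not the Makemore code itself): since d tanh(z)/dz = 1 − tanh(z)², any unit pinned near ±1 passes almost no gradient back, and downscaling the initial weights sharply reduces how many units saturate.

```python
import numpy as np

rng = np.random.default_rng(1)
fan_in = 100
x = rng.standard_normal((1000, fan_in))       # incoming activations (illustrative)

def saturated_fraction(scale):
    """Fraction of tanh outputs pinned near ±1 for a given weight-init scale."""
    W = rng.standard_normal((fan_in, 200)) * scale
    h = np.tanh(x @ W)
    # d tanh/dz = 1 - tanh(z)^2 is near 0 when |h| ~ 1: the gradient dies there.
    return np.mean(np.abs(h) > 0.99)

naive = saturated_fraction(1.0)                    # unit-variance init
scaled = saturated_fraction(1.0 / np.sqrt(fan_in)) # downscaled (Xavier-style) init
print(naive, scaled)   # most units saturated vs. almost none
```

With unit-variance weights the pre-activations have standard deviation ≈ √fan_in = 10, so tanh is almost always pinned; dividing by √fan_in brings the pre-activations back to unit scale and the saturation fraction collapses.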

Reproduce GPT2 (124M) by Andrej Karpathy 3 Tokenization

Tokenization is the process of breaking text down into smaller units (tokens) such as words, subwords, or characters. Different tokenization methods are used depending on the task, the language, and the requirements of the model: word-based tokenization; character-based tokenization, which greatly increases computation cost; and subword-based tokenization such as Byte Pair Encoding (BPE), WordPiece (used by BERT), and SentencePiece. Sentence … Continue reading Reproduce GPT2 (124M) by Andrej Karpathy 3 Tokenization
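The core move of BPE can be sketched in a few lines (a toy corpus and helper names of my own, not GPT-2's actual tokenizer): start from raw bytes, find the most frequent adjacent pair of tokens, and merge it into a new token id, shortening the sequence.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent (token, token) pair in the sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"
ids = list(text.encode("utf-8"))   # start from raw bytes, as GPT-2's BPE does
pair = most_frequent_pair(ids)     # the byte pair ('a', 'a') occurs most often
ids = merge(ids, pair, 256)        # 256 = first id beyond the 0..255 byte range
print(len(text), len(ids))         # 11 bytes -> 9 tokens after one merge
```

Repeating this merge step builds up the vocabulary; subword tokenizers like this sit between the character-based extreme (long sequences, high compute) and the word-based extreme (huge vocabulary, no handling of unseen words).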