DeepSeek Code and Papers

On the GitHub page, three repos are pinned: DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of … Continue reading DeepSeek Code and Papers

Who is the Inventor of DeepSeek and What are the Key Innovations of the DeepSeek V3 and R1 Models

DeepSeek originated as a research initiative within High-Flyer, a Chinese quantitative hedge fund known for its AI-driven trading strategies. In April 2023, High-Flyer established DeepSeek as an independent entity dedicated to advancing artificial general intelligence (AGI), explicitly separating its research from the firm's financial operations (Wikipedia). Since its inception, DeepSeek has developed several notable AI … Continue reading Who is the Inventor of DeepSeek and What are the Key Innovations of the DeepSeek V3 and R1 Models

Ability to Understand Essence of Things

The ability to grasp the essence—the INSIGHT—is everything. You can outsource or delegate many tasks, but the insight must remain your own. Insight drives conviction, and conviction fuels perseverance. So, how do you develop INSIGHT? From my experience, it starts with mastering the highest forms of human knowledge: math, physics, and engineering. These disciplines capture … Continue reading Ability to Understand Essence of Things

Versatile “Kernels”

The concept of a kernel is everywhere. Here's a comparison of the various contexts in which the term "kernel" is used, organized as a table:

Context        | Definition                                  | Functionality                                                 | Common Characteristics
GPU Computing  | A function executed in parallel on the GPU. | Performs data processing tasks efficiently using parallelism. | Written in CUDA/OpenCL, optimized for large datasets.
Linear Algebra | The set of vectors mapped to zero by … Continue reading Versatile “Kernels”
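To make the GPU Computing row concrete, here is a minimal CUDA kernel sketch (not from the original post; the SAXPY example and every name in it are illustrative): one small function body is launched across many threads, and each thread processes one array element in parallel.

#include <cstdio>
#include <cuda_runtime.h>

// A GPU "kernel": one function body executed by many threads in parallel.
// Each thread computes one element of y = a*x + y (SAXPY).
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // guard the tail
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);  // kernel launch
    cudaDeviceSynchronize();

    printf("y[0] = %f (expect 5.0)\n", y[0]);
    cudaFree(x); cudaFree(y);
    return 0;
}

The same mental model transfers across the table's rows: the kernel is the small core object (a function, a set of vectors) from which the larger behavior is built.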

llm.c Code from Andrej’s GitHub

/* GPT-2 Transformer Neural Net training loop. See README.md for usage. */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#include <string_view>
#include <sys/stat.h>
#include <sys/types.h>
// ----------- CPU utilities -----------
// defines: fopenCheck, freadCheck, fcloseCheck, fseekCheck, mallocCheck
// defines: create_dir_if_not_exists, find_max_step, ends_with_bin
#include "llmc/utils.h"
// defines: tokenizer_init, tokenizer_decode, tokenizer_free
#include "llmc/tokenizer.h"
// … Continue reading llm.c Code from Andrej’s GitHub
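The llmc/utils.h helpers listed in those comments are checked wrappers around stdio/stdlib calls that abort with context instead of returning error codes. A minimal sketch of the pattern behind fopenCheck (the real definition lives in llmc/utils.h; treat the exact signature here as an assumption):

#include <stdio.h>
#include <stdlib.h>

// Checked fopen: on failure, report the call site and exit, instead of
// returning NULL for the caller to (maybe) check.
FILE* fopen_check(const char* path, const char* mode, const char* file, int line) {
    FILE* fp = fopen(path, mode);
    if (fp == NULL) {
        fprintf(stderr, "fopen failed: %s (mode %s) at %s:%d\n", path, mode, file, line);
        exit(EXIT_FAILURE);
    }
    return fp;
}
#define fopenCheck(path, mode) fopen_check(path, mode, __FILE__, __LINE__)

freadCheck, fcloseCheck, fseekCheck, and mallocCheck follow the same shape, which keeps the training loop itself free of error-handling clutter.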

Reproduce GPT-2 (124M) by Andrej Karpathy: llm.c

In his latest talk at the CUDA event, Andrej showcased his work replicating the GPT-2 LLM in C and CUDA, eliminating the reliance on PyTorch and all dependencies except one. The key takeaway is profound: PyTorch, once considered a massive and indispensable package for LLM and AI programming, is essentially a crutch for when … Continue reading Reproduce GPT-2 (124M) by Andrej Karpathy: llm.c
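One concrete illustration of what dropping PyTorch means: optimizer steps that torch.optim.AdamW would otherwise supply must be written as raw kernels. Below is a hedged sketch of an AdamW-style parameter update in CUDA (llm.c ships its own version; the names, launch shape, and hyperparameter handling here are illustrative assumptions):

#include <cuda_runtime.h>
#include <math.h>

// Illustrative AdamW update, one thread per parameter (assumed layout;
// llm.c's actual kernel differs in details). t is the 1-based step count.
__global__ void adamw_update(float* params, const float* grads,
                             float* m, float* v, int n, int t,
                             float lr, float beta1, float beta2,
                             float eps, float weight_decay) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float g = grads[i];
    // exponential moving averages of the gradient and its square
    m[i] = beta1 * m[i] + (1.0f - beta1) * g;
    v[i] = beta2 * v[i] + (1.0f - beta2) * g * g;
    // bias-corrected moment estimates
    float m_hat = m[i] / (1.0f - powf(beta1, (float)t));
    float v_hat = v[i] / (1.0f - powf(beta2, (float)t));
    // decoupled weight decay plus the Adam step
    params[i] -= lr * (m_hat / (sqrtf(v_hat) + eps) + weight_decay * params[i]);
}

Written this way, the "optimizer" is roughly twenty lines of arithmetic rather than an imported framework.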

Reproduce GPT-2 (124M) by Andrej Karpathy, Part 2: Self-Attention Transformer

The key content here comes from the 2017 paper "Attention Is All You Need". So what is attention? Attention is a communication mechanism. It can be seen as nodes in a directed graph looking at each other and aggregating information via a weighted sum over all nodes that point to them, with data-dependent weights. But … Continue reading Reproduce GPT-2 (124M) by Andrej Karpathy, Part 2: Self-Attention Transformer
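To ground that description, here is a minimal sketch of causal scaled dot-product attention as a naive CUDA kernel (not from the post; the single head, one thread per query position, and the T <= 1024 score buffer are all simplifying assumptions). Each query node t scores the keys that point to it (positions j <= t), softmaxes those scores into data-dependent weights, and takes the weighted sum of the values:

#include <math.h>
#include <cuda_runtime.h>

// Naive causal self-attention for a single head with head size d.
// Q, K, V, out are (T, d) row-major. Launch with >= T threads total, e.g.
// attention_forward<<<(T + 255) / 256, 256>>>(Q, K, V, out, T, d);
__global__ void attention_forward(const float* Q, const float* K,
                                  const float* V, float* out,
                                  int T, int d) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // query position
    if (t >= T) return;
    const float scale = 1.0f / sqrtf((float)d);

    // 1) scores s_j = (q_t . k_j) / sqrt(d) for j <= t (causal mask)
    float scores[1024];     // sketch assumes T <= 1024
    float maxs = -1e30f;    // track the max for a stable softmax
    for (int j = 0; j <= t; j++) {
        float s = 0.0f;
        for (int i = 0; i < d; i++) s += Q[t * d + i] * K[j * d + i];
        s *= scale;
        scores[j] = s;
        if (s > maxs) maxs = s;
    }
    // 2) softmax turns the scores into data-dependent weights
    float sum = 0.0f;
    for (int j = 0; j <= t; j++) {
        scores[j] = expf(scores[j] - maxs);
        sum += scores[j];
    }
    // 3) aggregate: out_t = sum_j w_j * v_j over the nodes pointing at t
    for (int i = 0; i < d; i++) {
        float acc = 0.0f;
        for (int j = 0; j <= t; j++) acc += (scores[j] / sum) * V[j * d + i];
        out[t * d + i] = acc;
    }
}

The "directed graph" in the prose is exactly the causal mask: node t only receives edges from nodes j <= t, and the weights on those edges depend on the data through the query-key dot products.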