DeepSeek Code and Papers

On the GitHub page, three repos are pinned: DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL. DeepSeek-VL2 demonstrates superior capabilities across various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. Our model series is composed of … Continue reading DeepSeek Code and Papers

Who is the Inventor of DeepSeek and What are the Key Innovations of the DeepSeek V3 and R1 Models

DeepSeek originated as a research initiative within High-Flyer, a Chinese quantitative hedge fund known for its AI-driven trading strategies. In April 2023, High-Flyer established DeepSeek as an independent entity dedicated to advancing artificial general intelligence (AGI), explicitly separating its research from the firm's financial operations (Wikipedia). Since its inception, DeepSeek has developed several notable AI … Continue reading Who is the Inventor of DeepSeek and What are the Key Innovations of the DeepSeek V3 and R1 Models

Ability to Understand Essence of Things

The ability to grasp the essence—the INSIGHT—is everything. You can outsource or delegate many tasks, but the insight must remain your own. Insight drives conviction, and conviction fuels perseverance. So, how do you develop INSIGHT? From my experience, it starts with mastering the highest forms of human knowledge: math, physics, and engineering. These disciplines capture … Continue reading Ability to Understand Essence of Things

Versatile “Kernels”

The concept of a kernel is everywhere. Here's a comparison of the various contexts in which the term "kernel" is used, organized as a table:

Context        | Definition                                  | Functionality                                                 | Common Characteristics
GPU Computing  | A function executed in parallel on the GPU. | Performs data processing tasks efficiently using parallelism. | Written in CUDA/OpenCL, optimized for large datasets.
Linear Algebra | The set of vectors mapped to zero by … Continue reading Versatile “Kernels”
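To make the GPU Computing row concrete, here is a minimal CUDA kernel sketch (not from the original post; the SAXPY example and every name in it are illustrative): one small function body is launched across many threads, and each thread processes one array element in parallel.

#include <cstdio>
#include <cuda_runtime.h>

// A GPU "kernel": one function body executed by many threads in parallel.
// Each thread computes one element of y = a*x + y (SAXPY).
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // guard the tail
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);  // kernel launch
    cudaDeviceSynchronize();

    printf("y[0] = %f (expect 5.0)\n", y[0]);
    cudaFree(x); cudaFree(y);
    return 0;
}

The same mental model transfers across the table's rows: the kernel is the small core object (a function, a set of vectors) from which the larger behavior is built.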

llm.c Code from Andrej’s GitHub

/* GPT-2 Transformer Neural Net training loop. See README.md for usage. */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#include <string_view>
#include <sys/stat.h>
#include <sys/types.h>
// ----------- CPU utilities -----------
// defines: fopenCheck, freadCheck, fcloseCheck, fseekCheck, mallocCheck
// defines: create_dir_if_not_exists, find_max_step, ends_with_bin
#include "llmc/utils.h"
// defines: tokenizer_init, tokenizer_decode, tokenizer_free
#include "llmc/tokenizer.h"
// … Continue reading llm.c Code from Andrej’s GitHub
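The llmc/utils.h helpers listed in those comments are checked wrappers around stdio/stdlib calls that abort with context instead of returning error codes. A minimal sketch of the pattern behind fopenCheck (the real definition lives in llmc/utils.h; treat the exact signature here as an assumption):

#include <stdio.h>
#include <stdlib.h>

// Checked fopen: on failure, report the call site and exit, instead of
// returning NULL for the caller to (maybe) check.
FILE* fopen_check(const char* path, const char* mode, const char* file, int line) {
    FILE* fp = fopen(path, mode);
    if (fp == NULL) {
        fprintf(stderr, "fopen failed: %s (mode %s) at %s:%d\n", path, mode, file, line);
        exit(EXIT_FAILURE);
    }
    return fp;
}
#define fopenCheck(path, mode) fopen_check(path, mode, __FILE__, __LINE__)

freadCheck, fcloseCheck, fseekCheck, and mallocCheck follow the same shape, which keeps the training loop itself free of error-handling clutter.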

Reproduce GPT-2 (124M) by Andrej Karpathy: llm.c

In his latest talk at the CUDA event, Andrej showcased his work replicating the GPT-2 LLM in C and CUDA, eliminating the reliance on PyTorch and all dependencies except one. The key takeaway is profound: PyTorch, once considered a massive and indispensable package for LLM and AI programming, is essentially a crutch for when … Continue reading Reproduce GPT-2 (124M) by Andrej Karpathy: llm.c
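One concrete illustration of what dropping PyTorch means: optimizer steps that torch.optim.AdamW would otherwise supply must be written as raw kernels. Below is a hedged sketch of an AdamW-style parameter update in CUDA (llm.c ships its own version; the names, launch shape, and hyperparameter handling here are illustrative assumptions):

#include <cuda_runtime.h>
#include <math.h>

// Illustrative AdamW update, one thread per parameter (assumed layout;
// llm.c's actual kernel differs in details). t is the 1-based step count.
__global__ void adamw_update(float* params, const float* grads,
                             float* m, float* v, int n, int t,
                             float lr, float beta1, float beta2,
                             float eps, float weight_decay) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float g = grads[i];
    // exponential moving averages of the gradient and its square
    m[i] = beta1 * m[i] + (1.0f - beta1) * g;
    v[i] = beta2 * v[i] + (1.0f - beta2) * g * g;
    // bias-corrected moment estimates
    float m_hat = m[i] / (1.0f - powf(beta1, (float)t));
    float v_hat = v[i] / (1.0f - powf(beta2, (float)t));
    // decoupled weight decay plus the Adam step
    params[i] -= lr * (m_hat / (sqrtf(v_hat) + eps) + weight_decay * params[i]);
}

Written this way, the "optimizer" is roughly twenty lines of arithmetic rather than an imported framework.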

Reproduce GPT-2 (124M) by Andrej Karpathy, Part 2: Self-Attention Transformer

The key content here comes from the 2017 paper "Attention Is All You Need". So what is attention? Attention is a communication mechanism. It can be seen as nodes in a directed graph looking at each other and aggregating information via a weighted sum over all nodes that point to them, with data-dependent weights. But … Continue reading Reproduce GPT-2 (124M) by Andrej Karpathy, Part 2: Self-Attention Transformer
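To ground that description, here is a minimal sketch of causal scaled dot-product attention as a naive CUDA kernel (not from the post; the single head, one thread per query position, and the T <= 1024 score buffer are all simplifying assumptions). Each query node t scores the keys that point to it (positions j <= t), softmaxes those scores into data-dependent weights, and takes the weighted sum of the values:

#include <math.h>
#include <cuda_runtime.h>

// Naive causal self-attention for a single head with head size d.
// Q, K, V, out are (T, d) row-major. Launch with >= T threads total, e.g.
// attention_forward<<<(T + 255) / 256, 256>>>(Q, K, V, out, T, d);
__global__ void attention_forward(const float* Q, const float* K,
                                  const float* V, float* out,
                                  int T, int d) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // query position
    if (t >= T) return;
    const float scale = 1.0f / sqrtf((float)d);

    // 1) scores s_j = (q_t . k_j) / sqrt(d) for j <= t (causal mask)
    float scores[1024];     // sketch assumes T <= 1024
    float maxs = -1e30f;    // track the max for a stable softmax
    for (int j = 0; j <= t; j++) {
        float s = 0.0f;
        for (int i = 0; i < d; i++) s += Q[t * d + i] * K[j * d + i];
        s *= scale;
        scores[j] = s;
        if (s > maxs) maxs = s;
    }
    // 2) softmax turns the scores into data-dependent weights
    float sum = 0.0f;
    for (int j = 0; j <= t; j++) {
        scores[j] = expf(scores[j] - maxs);
        sum += scores[j];
    }
    // 3) aggregate: out_t = sum_j w_j * v_j over the nodes pointing at t
    for (int i = 0; i < d; i++) {
        float acc = 0.0f;
        for (int j = 0; j <= t; j++) acc += (scores[j] / sum) * V[j * d + i];
        out[t * d + i] = acc;
    }
}

The "directed graph" in the prose is exactly the causal mask: node t only receives edges from nodes j <= t, and the weights on those edges depend on the data through the query-key dot products.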