# What An Algorithm Worker Needs To Grasp

I stumbled upon a training course offered by a self-styled expert who trains "algorithm workers," claiming they will earn extraordinary salaries once qualified. So I jotted down the syllabus and filled in what I know accordingly.

1. Language features – versatile, easy to pick up

2. Programming environment

Introduction to the Anaconda environment – Anaconda is an open-source, easy-to-install, high-performance Python and R distribution, with the conda package and environment manager and a collection of 1,000+ open-source packages backed by free community support (from the Anaconda distribution documentation).

3. Syntax basics

4. Common functions

5. Statement structure

6. Overview of common libraries

NumPy, Pandas, Matplotlib, etc.

7. Object-oriented methods

Case studies

1. Algorithm analysis and introduction to Big O – Big O notation describes the performance or complexity of an algorithm, specifically in the worst-case scenario.

2. Big O case studies

3. Big O of Python data structures
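These complexity classes show up directly in Python's built-in containers. A quick illustrative sketch of my own (not from the course): membership testing is O(n) for a list (linear scan) but O(1) on average for a set (hash lookup), and the gap is easy to measure.

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)

# Searching for an absent element is the worst case for the list:
# every one of the n items must be scanned.
t_list = timeit.timeit(lambda: -1 in data_list, number=100)
t_set = timeit.timeit(lambda: -1 in data_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```

The set lookup should be faster by several orders of magnitude at this size.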

1. Features

2. Functions and methods

1. Introduction to time-series processing

2. DataFrame and Series

3. Common methods and functions

4. Database-style queries

1. The purpose of visualizations and how to produce them

2. Applications of Matplotlib, Seaborn, and Pandas plotting

3. Object properties

Python implementation

1. Greedy algorithms: principles and examples – In contrast to the brute-force approach to optimizing assignment problems, a greedy approach sorts the tasks and pairs the longest remaining task with the shortest.

```python
A = [6, 3, 2, 7, 5, 5]
A = sorted(A)
print(A)
# pair the i-th smallest with the i-th largest (A[~i] is A[-i - 1])
for i in range(len(A) // 2):
    print(A[i], A[~i])
```

2. Recursion and traversal

3. Common sorting algorithms: principles and examples

4. Introduction to dynamic programming: principles and example use cases

5. Hash functions: principles and applications of hash tables

A hash table (hash map) stores key-value pairs; transforming a key into a small index number is the job of the hashing algorithm, or hash function.
Collision-resolution strategies:
- linear probing
- rehashing (e.g. plus-3 rehash)
- double hashing
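A minimal sketch of my own showing linear probing: on a collision, step to the next slot (wrapping around) until a free slot or the matching key is found. A real table would also resize before it fills up.

```python
class LinearProbingTable:
    """Illustrative hash table using open addressing with linear probing."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size  # each slot holds a (key, value) pair or None

    def _index(self, key):
        # The hash function maps an arbitrary key to a small index number.
        return hash(key) % self.size

    def put(self, key, value):
        i = self._index(key)
        # Collision resolution: probe the next slot (wrapping) until we
        # find a free slot or a slot already holding this key.
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.size
        self.slots[i] = (key, value)

    def get(self, key):
        i = self._index(key)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        raise KeyError(key)

table = LinearProbingTable()
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    table.put(k, v)
print(table.get("b"))  # 2
```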

1. Foundations of probability and statistics

2. Bayes' theorem

3. The maximum-likelihood principle

4. Overview of the machine-learning "arsenal"

1. Predictive models and least squares: (multiple) linear regression

2. The Lagrange method: a portfolio-management case study – Maximizing a function subject to a constraint is common in economic settings, which is where Lagrange multipliers come into play.
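A toy example of my own (not the course's portfolio case): maximize f(x, y) = x·y subject to x + y = 10. Setting the gradient of the Lagrangian L = xy − λ(x + y − 10) to zero gives y = λ and x = λ, hence x = y = 5 and a maximum of 25, which a scan along the constraint line confirms.

```python
# Lagrangian: L = x*y - lam*(x + y - 10)
# dL/dx = y - lam = 0 and dL/dy = x - lam = 0  =>  x = y = lam  =>  x = y = 5
x_opt = y_opt = 5.0
f_opt = x_opt * y_opt  # 25.0

# Numerical check: evaluate f at points on the constraint line x + y = 10.
best = max((x * (10 - x), x) for x in [i / 100 for i in range(1001)])
print(f_opt, best)  # the scan also peaks at f = 25.0, x = 5.0
```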

3. Newton's method, steepest descent, and their variants

1. Principles of logistic (logit) regression

2. Loss functions

3. Bias and variance

4. Underfitting and overfitting

5. Evaluation metrics and methods

6. Case studies

1. The EM-algorithm idea: K-means and related algorithms – K-means clustering is an unsupervised machine-learning algorithm. It starts with randomly selected centroids, then performs iterative calculations to optimize the centroids' positions.
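The two alternating steps can be sketched in a few lines; here is a minimal 1-D version of my own (real use would reach for `sklearn.cluster.KMeans`):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D K-means: random initial centroids, then iterative refinement."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from randomly selected centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]
print(kmeans(data, k=2))  # roughly [1.0, 9.1]
```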

2. Tree-based algorithms: impurity measures: entropy and the Gini index
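Both impurity measures are one-liners over the class proportions p: entropy is −Σ p·log₂(p) and Gini is 1 − Σ p². A small sketch of my own:

```python
from math import log2

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over the class proportions."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * log2(p) for p in probs)

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return 1 - sum(p * p for p in probs)

pure, mixed = ["a"] * 4, ["a", "a", "b", "b"]
print(entropy(pure), gini(pure))    # 0.0 0.0  (a pure node has no impurity)
print(entropy(mixed), gini(mixed))  # 1.0 0.5  (a 50/50 split is maximally impure)
```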

Ensemble principles: Boosting, Bagging, Stacking –

Bagging, boosting, and stacking are the three main terms describing the ensemble (combination) of several models into one more effective model:
1. Bagging is shorthand for the combination of bootstrapping and aggregating. Bootstrapping resamples the training set with replacement, to the same cardinality as the original set, which helps decrease the variance of the classifier and reduce overfitting. The resulting model should be less overfitted than a single individual model.
2. Boosting adds additional models to the overall ensemble sequentially.
3. Stacking trains a new model on the combined predictions of two (or more) previous models. The predictions from the models are used as inputs for each sequential layer and combined to form a new set of predictions.

Citing the essay by Scott Fortmann-Roe: “Bagging and other resampling techniques can be used to reduce the variance in model predictions. In bagging (Bootstrap Aggregating), numerous replicates of the original data set are created using random selection with replacement. Each derivative data set is then used to construct a new model and the models are gathered together into an ensemble. To make a prediction, all of the models in the ensemble are polled and their results are averaged.

One powerful modeling algorithm that makes good use of bagging is Random Forests. Random Forests works by training numerous decision trees each based on a different resampling of the original training data. In Random Forests the bias of the full model is equivalent to the bias of a single decision tree (which itself has high variance). By creating many of these trees, in effect a “forest”, and then averaging them the variance of the final model can be greatly reduced over that of a single tree. In practice the only limitation on the size of the forest is computing time as an infinite number of trees could be trained without ever increasing bias and with a continual (if asymptotically declining) decrease in the variance.”

GBDT, Random Forest – GBMs and RFs are both ensemble learning methods; they differ in the way the trees are built. GBMs build trees one at a time and are more sensitive to overfitting when the data is noisy. RFs train each tree independently on a random sample of the data and are less likely to overfit the training data.
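The bagging mechanics described above (bootstrap, fit, poll, average) can be sketched without any library. This is my own toy example: the "learner" is a deliberately crude threshold stump (classify x as 1 when x exceeds the sample mean), and the ensemble majority-votes 25 bootstrapped copies of it.

```python
import random

def bootstrap(data, rng):
    """Resample with replacement to the same cardinality as the original set."""
    return [rng.choice(data) for _ in data]

def fit_threshold(sample):
    """A deliberately crude 'model': the decision threshold is the sample mean of x."""
    return sum(x for x, _ in sample) / len(sample)

def bagged_predict(models, x):
    """Poll every model in the ensemble and take the majority vote."""
    votes = [1 if x > t else 0 for t in models]
    return round(sum(votes) / len(votes))

rng = random.Random(42)
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = [fit_threshold(bootstrap(train, rng)) for _ in range(25)]
print(bagged_predict(models, 0.15), bagged_predict(models, 0.85))  # 0 1
```

Each bootstrapped threshold varies, but averaging 25 of them gives a more stable decision than any single one, which is the point of bagging.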

3. Clustering and dimensionality-reduction algorithms: PCA, SVD, t-SNE

4. Support vector machines (SVM)

5. Feature engineering and practical techniques

Using the scikit-learn library

K-fold cross-validation

1. Signal decomposition and time-frequency analysis

2. Filtering and reconstruction

3. ARIMA models

Python implementation

4. GARCH models: principles and Python implementation – An ARCH model is appropriate when the error variance in a time series follows an autoregressive (AR) model; if an autoregressive moving-average (ARMA) model is assumed for the error variance instead, the model is a generalized autoregressive conditional heteroskedasticity (GARCH) model.
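To make the "error variance follows an AR model" idea concrete, here is a simulation of my own of an ARCH(1) process with hypothetical parameters (a GARCH(1,1) would add a β·σ²ₜ₋₁ term to the variance equation):

```python
import random

# ARCH(1): sigma_t^2 = omega + alpha * eps_{t-1}^2,  eps_t = sigma_t * z_t,
# with z_t drawn from N(0, 1). Parameters below are illustrative.
rng = random.Random(0)
omega, alpha = 0.2, 0.5
eps, series = 0.0, []
for _ in range(1000):
    sigma2 = omega + alpha * eps ** 2  # variance depends on the last squared error
    eps = (sigma2 ** 0.5) * rng.gauss(0, 1)
    series.append(eps)

# The unconditional variance of an ARCH(1) process is omega / (1 - alpha) = 0.4,
# so the sample variance should land near that value.
var = sum(e * e for e in series) / len(series)
print(round(var, 2))
```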

5. Stochastic processes: theory, random sampling, and Monte Carlo methods

6. Case studies

ARIMA stock-price prediction – ARIMA is an acronym for AutoRegressive Integrated Moving Average, a class of models that captures a suite of standard temporal structures in time-series data.
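In practice one would fit this with `statsmodels`; as a dependency-free sketch of my own, here is just the "AR" part of ARIMA, fitting an AR(1) model y_t = φ·y_{t−1} + e_t by least squares on a synthetic series:

```python
import random

# Simulate an AR(1) series with a known coefficient.
rng = random.Random(1)
phi_true = 0.8
y = [0.0]
for _ in range(500):
    y.append(phi_true * y[-1] + rng.gauss(0, 1))

# Least-squares estimate: phi = sum(y_t * y_{t-1}) / sum(y_{t-1}^2)
num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi_hat = num / den

forecast = phi_hat * y[-1]  # one-step-ahead prediction
print(round(phi_hat, 2))    # should recover a value near 0.8
```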

1. Activation functions

2. Gradient descent

3. Forward and backward propagation

1. Basics of the TensorFlow, Keras, and Theano libraries

2. A hands-on walkthrough of the underlying code

1. Dropout

2. Batch normalization

3. Activation-function optimization

4. Architecture optimization

1. Image filtering and features

2. Computing output feature-map sizes

3. Parameter tuning

1. Recurrent neural networks (RNNs)

2. LSTM, GRU, etc.

3. New techniques and learning methods

1. Statistics and probability questions

2. Brain teasers

3. Databases and SQL

4. An overview of classic algorithm problems

5. Machine-learning algorithm questions

6. Practical algorithm-design questions
