

…
When we need a classifier that distinguishes multiple classes rather than just two, the tool is softmax regression.
To understand softmax, see the equations below.
Softmax regression generalizes logistic regression from 2 classes to C classes. If C is 2, the output is only two numbers; since they add up to 1, one of them is redundant, and softmax regression reduces to logistic regression.
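As a minimal sketch of the idea, softmax maps C scores z_1..z_C to probabilities exp(z_i) / sum_j exp(z_j). The function names and the example scores below are illustrative assumptions, not taken from the notes. The second part checks the C = 2 case, where the first output equals the logistic sigmoid of the difference between the two scores.

```python
import math

def softmax(z):
    """Map a list of C real-valued scores to C probabilities that sum to 1."""
    m = max(z)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    """Logistic function used by ordinary (2-class) logistic regression."""
    return 1.0 / (1.0 + math.exp(-v))

# C = 4 classes (example scores are made up): one probability per class.
probs = softmax([5.0, 2.0, -1.0, 3.0])

# C = 2 classes: the two outputs sum to 1, so the second one carries no extra information.
p2 = softmax([1.5, -0.5])
same_as_logistic = sigmoid(1.5 - (-0.5))  # sigmoid of the score difference
```

Here `p2[0]` and `same_as_logistic` agree up to floating-point error, which is the sense in which the second softmax output is redundant when C is 2.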



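A basic RNN struggles with long-range dependencies such as “the cat … was full”, where “was” depends on the much earlier “cat”, because of the “vanishing gradient” phenomenon. To address this, we apply the GRU (gated recurrent unit) approach.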
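The following is a rough sketch of one GRU step, reduced to scalars so it runs with the standard library only. It assumes one common formulation (an update gate and a relevance gate acting on a memory cell c); the weight names and values in `params` are hypothetical and chosen only to show the gate behaviour, not learned or taken from the notes.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(c_prev, x, p):
    """One step of a scalar GRU cell; p holds hypothetical weights and biases."""
    gate_u = sigmoid(p["wu_c"] * c_prev + p["wu_x"] * x + p["bu"])  # update gate
    gate_r = sigmoid(p["wr_c"] * c_prev + p["wr_x"] * x + p["br"])  # relevance gate
    # Candidate memory, computed from the (gated) previous memory and the input.
    c_cand = math.tanh(p["wc_c"] * gate_r * c_prev + p["wc_x"] * x + p["bc"])
    # Blend: gate_u near 0 keeps the old memory, gate_u near 1 overwrites it.
    return gate_u * c_cand + (1.0 - gate_u) * c_prev

# Assumed parameters: a strongly negative update-gate bias keeps the gate near 0.
params = dict(wu_c=0.0, wu_x=0.0, bu=-20.0,
              wr_c=0.0, wr_x=0.0, br=0.0,
              wc_c=1.0, wc_x=1.0, bc=0.0)

c = 1.0  # memory set early in the sequence (e.g. "subject is singular")
for x in [0.3, -0.7, 0.5] * 20:  # 60 later inputs
    c = gru_step(c, x, params)
kept = c  # stays very close to 1.0

# With the update gate near 1, the memory is replaced by the candidate instead.
c_over = gru_step(1.0, 0.3, dict(params, bu=20.0))
```

Because the memory can pass through many steps almost unchanged when the update gate is near 0, information from early in the sequence survives, which is what helps with the vanishing gradient problem.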

LSTM (long short-term memory) aims to solve the same vanishing gradient issue:
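For comparison, here is a minimal scalar sketch of one LSTM step, assuming the common formulation with separate update, forget, and output gates. All parameter names and values are hypothetical and picked only to illustrate how the cell state can be carried forward.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(a_prev, c_prev, x, p):
    """One step of a scalar LSTM cell; returns (activation a, cell state c)."""
    c_cand = math.tanh(p["wc_a"] * a_prev + p["wc_x"] * x + p["bc"])  # candidate
    gate_u = sigmoid(p["wu_a"] * a_prev + p["wu_x"] * x + p["bu"])    # update gate
    gate_f = sigmoid(p["wf_a"] * a_prev + p["wf_x"] * x + p["bf"])    # forget gate
    gate_o = sigmoid(p["wo_a"] * a_prev + p["wo_x"] * x + p["bo"])    # output gate
    # Unlike the GRU blend, keeping and writing are controlled by separate gates.
    c = gate_u * c_cand + gate_f * c_prev
    a = gate_o * math.tanh(c)
    return a, c

# Assumed parameters: forget gate near 1 and update gate near 0 preserve the cell state.
params = dict(wc_a=1.0, wc_x=1.0, bc=0.0,
              wu_a=0.0, wu_x=0.0, bu=-20.0,
              wf_a=0.0, wf_x=0.0, bf=20.0,
              wo_a=0.0, wo_x=0.0, bo=0.0)

a, c = 0.0, 1.0
for x in [0.3, -0.7, 0.5] * 20:  # 60 time steps
    a, c = lstm_step(a, c, x, params)
```

After the loop, `c` is still very close to its initial value of 1.0, and `a` is the output gate (0.5 here) times tanh(c).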

Deep RNN