Learning
Supervised
- Dataset with inputs manually annotated for the desired output
- Desired output = supervisory signal
- Manual annotation = ground truth
- Annotated with the correct categories
 
 
Split data
- Training set
- Test set
- Don’t test on training data
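A minimal sketch of the split, using only the standard library; the 80/20 `test_fraction` and the fixed `seed` are illustrative choices, not from the notes:

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the dataset and hold out a fraction as the test set.

    test_fraction and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # Evaluate only on the held-out portion — never on training data.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 8 2
```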
Top-K Accuracy
- Whether correct answer appears in the top-k results
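A small sketch of the metric; the dict-of-scores layout is just one convenient representation (an assumption), not a prescribed format:

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label appears among the k
    highest-scoring predicted classes.

    scores: list of {class: score} dicts (illustrative layout).
    labels: list of true class names, same order as scores.
    """
    hits = 0
    for score, label in zip(scores, labels):
        ranked = sorted(score, key=score.get, reverse=True)[:k]
        if label in ranked:
            hits += 1
    return hits / len(labels)

scores = [{"cat": 0.6, "dog": 0.3, "fox": 0.1},
          {"cat": 0.2, "dog": 0.5, "fox": 0.3}]
print(top_k_accuracy(scores, ["fox", "dog"], k=2))  # 0.5
```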
Confusion Matrix
Each sample is described by a feature vector
The dataset forms a matrix (one row per sample, one column per feature)
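A minimal confusion-matrix sketch; rows-as-true-class, columns-as-predicted-class is one common convention (assumed here, not stated in the notes):

```python
def confusion_matrix(true_labels, predicted, classes):
    """Count matrix: rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted):
        m[idx[t]][idx[p]] += 1
    return m

m = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
print(m)  # [[1, 1], [0, 2]] — one "a" was misclassified as "b"
```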

Unsupervised
- No example outputs given, learns how to categorise
- No teacher or critic
Harder
- Must identify relevant distinguishing features
- Must decide on number of categories
Reinforcement Learning
- No teacher; a critic instead
- Continued interaction with the environment
- Minimise a scalar performance index

- Critic
- Converts primary reinforcement into heuristic reinforcement
- Both are scalar signals
 
- Delayed reinforcement
- System observes a temporal sequence of stimuli
- Results in the generation of a heuristic reinforcement signal
 
- Minimise the cost-to-go function
- Expectation of the cumulative cost of actions taken over a sequence of steps
- Instead of just the immediate cost
- Earlier actions may have been good
- Identify these and feed back to the environment
 
 
- Closely related to dynamic programming
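The cost-to-go idea can be sketched with a simple backward recursion, the same structure dynamic programming uses. The discount factor is a common convention added here for illustration; the notes do not specify one:

```python
def cost_to_go(costs, discount=0.9):
    """Cumulative (discounted) cost evaluated from each step onward:
    J[t] = costs[t] + discount * J[t+1].

    This is the quantity minimised instead of just the immediate cost.
    The discount factor is an assumption for this sketch.
    """
    J = [0.0] * (len(costs) + 1)
    # Work backwards from the final step — later costs feed earlier ones.
    for t in range(len(costs) - 1, -1, -1):
        J[t] = costs[t] + discount * J[t + 1]
    return J[:-1]

print(cost_to_go([1.0, 1.0, 1.0], discount=0.5))  # [1.75, 1.5, 1.0]
```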
Difficulties
- No teacher to provide desired response
- Must solve the temporal credit assignment problem
- Need to know which actions were the good ones
 
Fitting
- Over-fitting
- Classifier too specific to the training set
- Can’t adequately generalise
 
- Under-fitting
- Too general; not enough detail inferred
- Learns non-discriminative or undesired patterns
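Both effects can be seen by fitting polynomials of different degree to noisy data; the sine target, noise level, and degrees below are all illustrative choices, not from the notes:

```python
import numpy as np

# Toy regression: noisy samples of a sine wave (all values illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

train_error = {}
for degree in (1, 3, 15):  # under-fit, reasonable, over-fit
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    train_error[degree] = float(np.mean(residuals ** 2))

# Training error shrinks as degree grows, but the degree-15 fit is
# memorising noise and would generalise poorly to new samples.
print(train_error)
```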
 
ROC
Receiver Operating Characteristic curve
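A minimal sketch of how ROC points are produced: sweep a threshold over the classifier scores and record the false-positive and true-positive rates at each one. The toy scores and labels below are made up for illustration:

```python
def roc_points(scores, labels):
    """(FPR, TPR) at each score threshold; plotting these traces
    the ROC curve. labels are 1 (positive) or 0 (negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 0)
        points.append((fp / neg, tp / pos))
    return points

pts = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
print(pts)  # [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```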
