Wednesday, June 30, 2010

Problem Simplification

I will mention how different AI problems are related, ordered from the most general to the most specific. Each more specific problem also inherits the assumptions from the previous simplifications.

The problems:

1) General Intelligence

The goal is to choose an action to maximize the future total reward:

best_y(observations) = argmax_action future_total_reward(observations, action)

This problem is considered by AIXI.
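The formula can be illustrated with a minimal sketch. The action set and the future_total_reward function below are made up; in the real problem that function is unknown and has to be predicted from the observations:

```python
# Hypothetical sketch of best_y: pick the action that maximizes a
# (here, fully known and made-up) future total reward function.
def best_y(observations, actions, future_total_reward):
    return max(actions, key=lambda a: future_total_reward(observations, a))

choice = best_y(observations=[0.9],
                actions=["left", "right"],
                future_total_reward=lambda obs, a: obs[0] if a == "left" else 1 - obs[0])
```

Here "left" yields reward 0.9 and "right" only 0.1, so the argmax picks "left".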

2) Reinforcement Learning


  1. The environment is stationary. I.e., P(Trajectory) is a fixed probability distribution.


  • We can talk about the expected value of a function with respect to the probability distribution.
  • A fixed policy is enough for the fixed environment.
best_policy = argmax_policy E[total_reward(Trajectory)]

best_y(observations) = a draw from best_policy(Y|observations)

Note that the expected total reward itself does not need to be computed. Its gradient is enough.
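A minimal sketch of that idea, under made-up assumptions: a two-armed problem with fixed rewards and a Bernoulli policy parameterized by a single number. The REINFORCE-style update follows a stochastic estimate of the gradient and never computes the expected total reward:

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical sketch: stochastic gradient ascent on the policy parameter.
# The policy picks arm 1 with probability sigmoid(theta).
# Arm 0 pays reward 0.2, arm 1 pays reward 0.8 (made-up numbers).
def reinforce(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p = sigmoid(theta)
        action = 1 if rng.random() < p else 0
        reward = 0.8 if action == 1 else 0.2
        grad_log_pi = action - p  # gradient of log pi(action) for a Bernoulli policy
        theta += lr * reward * grad_log_pi  # stochastic gradient estimate
    return sigmoid(theta)
```

After training, reinforce() returns a probability close to 1, i.e., the policy strongly prefers the better arm.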

3) Contextual Bandits


  1. An action does not affect the future observations. The sequence of contexts to be seen is fixed in advance.
  2. The reward depends only on the used action and the context.


  • There is no delayed future reward. Only the immediate reward is caused by the action, so there is no confusion about what caused it. That simplifies training.
  • A stochastic policy is not needed. It cannot help us escape being stuck, because we cannot affect the observations. The chosen decision can be deterministic.
best_y = argmax_y E[total_reward(Contexts, y)]
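A sketch under made-up assumptions (two contexts, two actions, a known deterministic reward, and epsilon-greedy exploration, none of which come from the text above): the learner averages the observed rewards per (context, action) pair and ends with a deterministic decision rule.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a contextual-bandit learner keeping a running
# mean reward per (context, action) pair.
def run_bandit(reward_fn, contexts, actions, rounds=2000, eps=0.1, seed=0):
    rng = random.Random(seed)
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(rounds):
        ctx = rng.choice(contexts)   # contexts arrive independently of our actions
        if rng.random() < eps:       # explore to estimate rewards of all actions
            a = rng.choice(actions)
        else:                        # otherwise act greedily per context
            a = max(actions, key=lambda act: total[ctx, act] / max(count[ctx, act], 1))
        r = reward_fn(ctx, a)
        total[ctx, a] += r
        count[ctx, a] += 1
    # the final decision rule is deterministic
    return {ctx: max(actions, key=lambda act: total[ctx, act] / max(count[ctx, act], 1))
            for ctx in contexts}

policy = run_bandit(lambda ctx, a: 1.0 if a == ctx % 2 else 0.0,
                    contexts=[0, 1], actions=[0, 1])
```

With this made-up reward, the learned rule maps context 0 to action 0 and context 1 to action 1.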

4) Supervised Regression

The agent is presented with (x, target) examples.


  1. The seen examples are independent. The probability of an example does not depend on the previously seen examples.
  2. The examples are identically distributed. They share a P(Target, X) probability distribution.
  3. The reward function is known. After seeing a target, the rewards of all possible predictions can be computed.


  • The maximum of the expected total reward is at the same point as the maximum of the expected reward from a single example.
  • No exploration is needed. We cannot affect the future observations, and we can compare the rewards of all possible predictions.
best_y = argmax_y E[reward(Target, y(X))]

The distribution P(X,Target) is still unknown.
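Because the distribution is unknown, the expectation can be approximated by the average over the seen examples. A sketch with made-up (x, target) pairs and a finite grid of candidate constant predictions:

```python
# Hypothetical sketch: approximate E[reward(Target, y)] by the empirical
# average over seen examples, and pick the best candidate prediction.
def best_constant(examples, candidates, reward):
    def empirical_reward(y):
        return sum(reward(target, y) for _, target in examples) / len(examples)
    return max(candidates, key=empirical_reward)

examples = [(0, 1.0), (0, 3.0), (0, 2.0)]   # made-up (x, target) pairs
candidates = [0.0, 1.0, 2.0, 3.0]
# squared-loss reward, as in the next section
best = best_constant(examples, candidates, lambda t, y: -(t - y) ** 2)
```

For these examples the empirical average of targets is 2.0, and the candidate 2.0 wins.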

5) Squared Loss


  1. The reward function is known to be:
reward(target, y(x)) = - constant * (target - y(x))**2


  • We can find the argmax_y explicitly. The derivative of the expected reward is a linear function of y, which leads to the following solution:
best_y(x) = E[Target|x]
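A quick numeric check of this solution on made-up targets: among all constant predictions, the mean of the targets maximizes the expected squared-loss reward.

```python
# Made-up targets drawn from some P(Target|x) for a fixed x.
targets = [1.0, 2.0, 6.0]

def expected_reward(y):
    # sample estimate of E[-(Target - y)**2 | x], with constant = 1
    return -sum((t - y) ** 2 for t in targets) / len(targets)

mean = sum(targets) / len(targets)  # estimate of E[Target|x], here 3.0
# expected_reward(mean) >= expected_reward(y) for every other y
```

Moving y away from the mean in either direction only lowers the expected reward.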

Other simplifications could go in different directions. For example, planning assumes a known deterministic environment.

Used Offline Resources

The supervised learning problem is described in chapter 1 of Bishop's Pattern Recognition and Machine Learning.

Thursday, June 24, 2010

Statistical Analysis Community

Do you want to teach yourself machine learning? The following links may guide you:

The above articles were written by the brain behind FlightCaster. That gives them some credibility.

There is also a proposal to create a Q&A site for statistics, data analysis, data mining and data visualization. Join the community if you plan to ask and answer statistical questions. We need more people to start the beta phase. Programmers already have their Q&A site at Stack Overflow.