RL-05-02-Structure-Q-Table

A Q-Table stores $Q(s,a)$ for every state-action pair, as an array of shape $|S| \times |A|$. See RL-03-02-Algorithm-Q-Learning.
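For reference, this is the standard tabular Q-Learning update that writes into the table. Here $r$ is the reward, $\gamma$ the discount factor, $\alpha$ the step size and $s'$ the next state. These symbols follow the usual Q-Learning notation and are not defined in this note. The target $r + \gamma \max_{a'} Q(s',a')$ is what the code calls `td_target`. For a terminal $s'$ it is usually taken as just $r$.

$$
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
$$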

1. Dense table

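import numpy as np
Q = np.zeros((n_states, n_actions), dtype=np.float32)  # one row per state, one column per action
# update (Q-Learning: td_target = r + gamma * Q[s_next].max())
Q[s, a] += alpha * (td_target - Q[s, a])
# greedy action (argmax returns the first index on ties)
a = Q[s].argmax()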
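Here is a minimal end-to-end sketch of the dense table in use. It assumes a hypothetical 5-state chain environment. `step`, `train` and all hyperparameter values are illustrative and do not come from this note. The sketch fills the table with the Q-Learning update and picks actions epsilon-greedily. Ties among equal Q-values are broken at random, so an all-zero row does not always select action 0.

```python
import numpy as np

# Hypothetical toy environment (assumption, not from the note):
# a 1-D chain of 5 states, action 0 = left, action 1 = right,
# reward 1.0 on entering the rightmost (terminal) state.
N_STATES, N_ACTIONS = 5, 2
TERMINAL = N_STATES - 1


def step(s: int, a: int) -> tuple[int, float, bool]:
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, TERMINAL)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward, s_next == TERMINAL


def train(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS), dtype=np.float32)  # dense |S| x |A| table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy with random tie-breaking among the best actions
            if rng.random() < epsilon:
                a = int(rng.integers(N_ACTIONS))
            else:
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s_next, r, done = step(s, a)
            # Q-Learning target; no bootstrap from a terminal state
            td_target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q


Q = train()
print(Q.argmax(axis=1))  # greedy action per state (terminal row is never updated)
```

The terminal row stays at zero because episodes end on entering it. Only the non-terminal rows carry a meaningful greedy action.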

Memory: $|S| \times |A| \times 4$ bytes (float32, 4 bytes per entry).
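You can check the memory formula with a small helper. `dense_q_bytes` is a hypothetical name, and the table sizes are illustrative. NumPy's `ndarray.nbytes` reports the same figure for an allocated array. Switching to float64 doubles it.

```python
import numpy as np


def dense_q_bytes(n_states: int, n_actions: int, dtype=np.float32) -> int:
    """Bytes for a dense |S| x |A| table: |S| * |A| * itemsize."""
    return n_states * n_actions * np.dtype(dtype).itemsize


print(dense_q_bytes(16, 4))                          # 16 * 4 * 4 = 256 bytes
print(dense_q_bytes(10**6, 4) / 2**20)               # 10^6 states, 4 actions: about 15.26 MiB
print(dense_q_bytes(10**6, 4, np.float64) / 2**20)   # float64 doubles it

# Cross-check against an actually allocated array.
Q = np.zeros((16, 4), dtype=np.float32)
assert Q.nbytes == dense_q_bytes(16, 4)
```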

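When the state space is large but only a small fraction of states is ever visited, store rows lazily in a dictionary instead: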

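from collections import defaultdict
import numpy as np
Q = defaultdict(lambda: np.zeros(n_actions, dtype=np.float32))  # row created on first access to Q[s]
Q[s][a] += alpha * delta  # delta = td_target - Q[s][a]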
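One pitfall with `defaultdict` is that even reading `Q[s]` inserts a zero row for an unseen state. This happens, for example, when taking a greedy action or a bootstrap max. The table then grows with every state that is merely looked at. The sketch below keeps reads side-effect-free. It assumes hashable states and a fixed action count. `q_row`, `update`, `N_ACTIONS` and the hyperparameters are hypothetical.

```python
from collections import defaultdict

import numpy as np

N_ACTIONS = 4  # assumed action count, for illustration only

# Rows are created lazily, only for states that are actually updated.
Q: defaultdict = defaultdict(lambda: np.zeros(N_ACTIONS, dtype=np.float32))


def q_row(s) -> np.ndarray:
    """Read-only lookup: an unseen state returns zeros WITHOUT being inserted."""
    row = Q.get(s)
    return row if row is not None else np.zeros(N_ACTIONS, dtype=np.float32)


def update(s, a: int, r: float, s_next, done: bool,
           alpha: float = 0.1, gamma: float = 0.99) -> None:
    td_target = r if done else r + gamma * q_row(s_next).max()
    delta = td_target - Q[s][a]  # Q[s] inserts a zero row if s is new (intended here)
    Q[s][a] += alpha * delta


# States can be any hashable key, e.g. tuples of grid coordinates.
update((0, 0), a=1, r=1.0, s_next=(0, 1), done=False)
print(len(Q))                                # only (0, 0) has a row
print(int(q_row((5, 5)).argmax()), len(Q))  # the lookup did not insert (5, 5)
```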

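Alternatively, use a flat dict[(s, a)] -> float. The catch is that the TD update needs $\max_{a'} Q(s', a')$, and the greedy action needs an argmax over actions. Both must loop over every action for the same state, because the keys are (state, action) pairs.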
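This is a sketch of the flat `dict[(s, a)] -> float` variant. It assumes a small fixed action set `ACTIONS`, which is hypothetical, as are the helper names. Missing keys read as 0.0. Both the greedy action and the bootstrap term loop over every action, and that loop is the extra cost of this layout.

```python
ACTIONS = (0, 1, 2, 3)  # assumed discrete action set

Q: dict[tuple, float] = {}


def q(s, a) -> float:
    """Missing (s, a) entries read as 0.0 without being inserted."""
    return Q.get((s, a), 0.0)


def max_q(s) -> float:
    # Keys are (s, a) pairs, so max_a' Q(s, a') must loop over every action.
    return max(q(s, a) for a in ACTIONS)


def greedy(s) -> int:
    # max() returns the first action with the highest value on ties.
    return max(ACTIONS, key=lambda a: q(s, a))


def update(s, a, r, s_next, done, alpha=0.1, gamma=0.99) -> None:
    td_target = r if done else r + gamma * max_q(s_next)
    Q[(s, a)] = q(s, a) + alpha * (td_target - q(s, a))


update("s0", 2, r=1.0, s_next="s1", done=True)
print(Q, greedy("s0"))
```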

import matplotlib.pyplot as plt
# assumes the dense array Q from section 1, with 16 states laid out as a 4x4 grid
plt.imshow(Q.max(axis=1).reshape(4, 4))
plt.colorbar()
plt.title("max_a Q(s,a)")
plt.show()
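Where matplotlib is unavailable, or for a quick terminal check, NumPy alone can print the same state values plus the greedy action per cell. This sketch assumes a row-major 4x4 grid, so state index = row * 4 + col. It also assumes the action order left, down, right, up, which is Gymnasium FrozenLake's convention. Adjust `ARROWS` for other environments. The random Q below is a stand-in for illustration only.

```python
import numpy as np

ARROWS = np.array(["<", "v", ">", "^"])  # assumed action order: left, down, right, up


def value_and_policy_grids(Q: np.ndarray, shape=(4, 4)):
    """Return max_a Q(s, a) and the greedy-action arrows, reshaped to the grid."""
    values = Q.max(axis=1).reshape(shape)
    policy = ARROWS[Q.argmax(axis=1)].reshape(shape)
    return values, policy


rng = np.random.default_rng(0)
Q = rng.random((16, 4)).astype(np.float32)  # stand-in values, illustration only
values, policy = value_and_policy_grids(Q)
print(np.round(values, 2))
print("\n".join(" ".join(row) for row in policy))
```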