Deep Reinforcement Learning with NumPy
This repo contains a set of reinforcement learning applications and algorithms implemented from scratch with numpy. The algorithms include Q-learning, deep-neural-network-based REINFORCE, Actor-Critic, PPO, and others.
.
├── core
│   ├── bandit.py # EpsilonGreedy/UCB/LinUCB/ThompsonSampling algorithm
│   ├── smab.py # stochastic Multi-Armed Bandit (sMAB)
│   ├── cmab.py # contextual Multi-Armed Bandit (cMAB) based on Thompson Sampling
│   ├── onlineCluster.py # online k-means using Lloyd's algorithm
│   ├── pg.py # REINFORCE algorithm
│   ├── deep_q_learning.py # Deep Neural Network based Q-learning
│   ├── ac.py # Actor-Critic algorithm
│   ├── ppo.py # Proximal Policy Optimization
│   ├── DynaQ.py # Dyna-Q algorithm
│   └── DynaQ_plus.py # Time-based model for planning in Dyna-Q+
├── preprocessing
│   ├── feature_transformer.py # OneHotEncoder/TargetEncoder
│   ├── scaler.py # StandardScaler/MinMaxScaler/MaxAbsScaler
│   └── stats.py # runningReward
├── common
│   ├── net.py # Common-deep-network
│   └── optim.py # Optimizer
└── README.md
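
To give a feel for the from-scratch numpy style the repo describes, here is a minimal tabular Q-learning sketch. It is not taken from the repo's files: the toy chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions made for illustration only.

```python
import numpy as np


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = int(rng.integers(2))
            else:
                best = np.flatnonzero(q[s] == q[s].max())
                a = int(rng.choice(best))
            # Assumed dynamics: deterministic moves, wall at state 0.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Q-learning target: bootstrap from the greedy value of s_next.
            target = r if done else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
            if done:
                break
    return q


q_table = q_learning_chain()
greedy_policy = q_table.argmax(axis=1)
```

`greedy_policy` holds the greedy action for each state. The row for the terminal state stays at zero because that state is never updated, so only the non-terminal entries are meaningful.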
