AlgoLink/minirl


Deep Reinforcement Learning with NumPy


This repo contains a set of reinforcement learning applications and algorithms implemented from scratch in NumPy. The algorithms include Q-learning, deep-neural-network-based REINFORCE, Actor-Critic, and PPO, among others.
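To illustrate what "from scratch in NumPy" can look like for the simplest algorithm named above, here is a minimal tabular Q-learning sketch. It is not taken from this repo: the toy chain environment, the function name `q_learning_chain`, and all hyperparameter defaults are assumptions made for illustration only.

```python
import numpy as np


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain (hypothetical environment).

    Actions: 0 = left, 1 = right. Reaching the last state pays reward 1
    and ends the episode; every other step pays 0.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = int(rng.integers(2))  # explore: uniform random action
            else:
                # Exploit, breaking ties between equal Q-values at random
                best = np.flatnonzero(q[s] == q[s].max())
                a = int(rng.choice(best))
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # The terminal state contributes no future value
            target = r if s_next == goal else r + gamma * q[s_next].max()
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q


q_table = q_learning_chain()
# Greedy action per non-terminal state (the terminal row is never updated)
greedy_policy = q_table[:-1].argmax(axis=1)
```

After training, `greedy_policy` holds the preferred action for each non-terminal state; on this chain the learned policy should move right toward the rewarding end.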

Project structure

.
├── core
    ├── bandit.py               # EpsilonGreedy/UCB/LinUCB/ThompsonSampling algorithm
    ├── smab.py                 # stochastic Multi-Armed Bandit (sMAB)
    ├── cmab.py                 # contextual Multi-Armed Bandit (cMAB) based on Thompson Sampling
    ├── onlineCluster.py        # online k-means using Lloyd's algorithm
    ├── pg.py                   # REINFORCE algorithm
    ├── deep_q_learning.py      # Deep Neural Network based Q-learning
    ├── ac.py                   # Actor-Critic algorithm
    ├── ppo.py                  # Proximal Policy Optimization
    ├── DynaQ.py                # Dyna-Q algorithm
    ├── DynaQ_plus.py           # Time-based model for planning in Dyna-Q+
├── preprocessing
    ├── feature_transformer.py  # OneHotEncoder/TargetEncoder
    ├── scaler.py               # StandardScaler/MinMaxScaler/MaxAbsScaler
    ├── stats.py                # runningReward
├── common
    ├── net.py                  # Common-deep-network
    ├── optim.py                # Optimizer
└── README.md
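As a rough idea of the kind of algorithm listed for `bandit.py`, the following is a minimal epsilon-greedy bandit sketch in plain NumPy. It does not reproduce the API of `core/bandit.py`: the function name `epsilon_greedy_bandit`, its parameters, and the Bernoulli arm probabilities are all hypothetical.

```python
import numpy as np


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms (illustrative, not the repo's API)."""
    rng = np.random.default_rng(seed)
    n_arms = len(arm_probs)
    counts = np.zeros(n_arms, dtype=int)  # pulls per arm
    values = np.zeros(n_arms)             # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore
        else:
            arm = int(np.argmax(values))     # exploit current estimates
        reward = float(rng.random() < arm_probs[arm])
        counts[arm] += 1
        # Incremental mean update: avoids storing the reward history
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Assumed arm success probabilities, chosen only for this example
counts, values = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

Here `counts` shows how often each arm was pulled and `values` the estimated mean reward per arm; with a fixed exploration rate the agent keeps sampling every arm occasionally while concentrating pulls on the arm with the highest estimate.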

Technical architecture
