OAC

class OAC(model, gamma=None, tau=None, alpha=None, beta=None, delta=None, actor_lr=None, critic_lr=None)[source]

Bases: parl.core.paddle.algorithm.Algorithm

__init__(model, gamma=None, tau=None, alpha=None, beta=None, delta=None, actor_lr=None, critic_lr=None)[source]

OAC algorithm.

Parameters:
    model (parl.Model) – forward network of the actor and critic.
    gamma (float) – discount factor for reward computation.
    tau (float) – decay coefficient used when updating the weights of self.target_model with self.model.
    alpha (float) – temperature parameter that determines the relative importance of the entropy term against the reward.
    beta (float) – determines the relative importance of sigma_Q in the optimistic value estimate.
    delta (float) – determines the allowed change of the exploration policy's mean.
    actor_lr (float) – learning rate of the actor model.
    critic_lr (float) – learning rate of the critic model.
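The role of beta and delta can be illustrated with the optimistic exploration rule from the OAC paper (Ciosek et al., 2019), which this algorithm is based on: the exploration mean is shifted from the target policy's mean along the gradient of the upper-confidence estimate Q_UB = mu_Q + beta * sigma_Q, and delta bounds the size of that shift. The sketch below assumes a diagonal policy covariance; the function name is illustrative, not part of this API:

```python
import numpy as np

def shifted_exploration_mean(mu_t, var, grad_q_ub, delta):
    """Shift the target-policy mean ``mu_t`` along ``grad_q_ub``,
    the action-gradient of the optimistic estimate
    Q_UB = mu_Q + beta * sigma_Q.

    ``var`` is the diagonal of the policy covariance Sigma. The shift
    is scaled so that KL(shifted || target) == delta.
    """
    scaled = var * grad_q_ub            # Sigma @ grad for diagonal Sigma
    norm = np.sqrt(grad_q_ub @ scaled)  # Sigma-weighted gradient norm
    if norm < 1e-8:                     # no gradient signal: keep the mean
        return np.array(mu_t, dtype=float)
    return mu_t + np.sqrt(2.0 * delta) * scaled / norm
```

The KL divergence between the shifted Gaussian and the target Gaussian equals delta exactly, so a larger delta permits a more optimistic exploration policy.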

learn(obs, action, reward, next_obs, terminal)[源代码]

Define the loss function and create an optimizer to minimize the loss.
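The critic side of the update can be sketched with the SAC-style soft Bellman target that OAC builds on (an assumption about the loss structure; variable names are illustrative, not this class's internals):

```python
import numpy as np

def soft_td_target(reward, terminal, q1_next, q2_next, log_prob_next,
                   gamma, alpha):
    """Soft Bellman backup used to train the critics.

    Takes the minimum of the two target-critic values (clipped
    double-Q) and subtracts the entropy term alpha * log pi(a'|s').
    The target is zeroed past terminal states.
    """
    min_q = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - terminal) * (min_q - alpha * log_prob_next)
```

The critic loss is then the mean squared error between each critic's Q(obs, action) and this target.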

predict(obs)[source]

Refine the predicting process, e.g., use the policy model to predict actions.

sample(obs)[source]

Define the sampling process. This function returns an action with noise to perform exploration.
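The contrast between predict and sample can be sketched with a toy squashed-Gaussian policy: predict returns the deterministic mean action for evaluation, while sample adds noise for exploration. The function names and the tanh squashing are illustrative assumptions, not this class's internals:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_action(mean):
    # Deterministic evaluation action: squash the policy mean.
    return np.tanh(mean)

def sample_action(mean, std):
    # Exploratory action: add Gaussian noise, then squash to [-1, 1].
    noise = rng.normal(size=np.shape(mean))
    return np.tanh(mean + std * noise)
```

In OAC, the noise for sample is drawn around the shifted (optimistic) mean rather than the target policy's mean.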