TD3
- class TD3(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)
Bases:
parl.core.paddle.algorithm.Algorithm
- __init__(model, gamma=None, tau=None, actor_lr=None, critic_lr=None, policy_noise=0.2, noise_clip=0.5, policy_freq=2)
TD3 algorithm.
Parameters:
- model (parl.Model): forward network of the actor and critic.
- gamma (float): discount factor for reward computation.
- tau (float): decay coefficient when updating the weights of self.target_model with self.model.
- actor_lr (float): learning rate of the actor model.
- critic_lr (float): learning rate of the critic model.
- policy_noise (float): noise added to the target policy during the critic update.
- noise_clip (float): range to which the target policy noise is clipped.
- policy_freq (int): frequency of delayed policy updates.
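To make the role of each hyperparameter concrete, here is a minimal plain-Python sketch of the standard TD3 update rules (target policy smoothing, clipped double-Q target, soft target update, delayed actor update). It is not PARL's implementation. The helper names, the `max_action` bound, and the default values for `gamma` and `tau` are hypothetical and chosen only for illustration. The sketch also assumes `tau` is the fraction of online weights mixed into the target, as in the original TD3 formulation; the documentation above does not state which convention PARL uses.

```python
import random


def smoothed_target_action(target_action, policy_noise=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to each action dimension.

    `max_action` is a hypothetical action bound, not a TD3 constructor argument.
    """
    noisy = []
    for a in target_action:
        noise = rng.gauss(0.0, policy_noise)               # std dev = policy_noise
        noise = max(-noise_clip, min(noise_clip, noise))   # clip to [-noise_clip, noise_clip]
        noisy.append(max(-max_action, min(max_action, a + noise)))
    return noisy


def td_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller target-critic estimate."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def soft_update(target_weights, weights, tau=0.005):
    """Soft target update, assuming tau weights the online network."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_weights, weights)]


def should_update_actor(step, policy_freq=2):
    """Delayed policy update: train the actor once every policy_freq critic updates."""
    return step % policy_freq == 0
```

With `policy_freq=2`, the critic is updated on every step while the actor and the target networks are updated on every second step, which is the delay the parameter controls.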