
📝 The tricks in Learn to optimize in TNCO sycamore #135

@Yonv1943


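Below, theta denotes the "solution" of the TNCO (tensor network contraction order) task: a tensor that encodes the order in which the quantum circuit is contracted. Computing theta.argsort() yields an ordered list of edge_ids, where each entry is the next edge to contract.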
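A minimal sketch of the decoding step, written in plain Python so it runs without PyTorch. It assumes theta is a 1-D vector holding one real value per edge (the values below are made up). In PyTorch the same result comes from theta.argsort(), which sorts in ascending order by default.

```python
def contraction_order(theta):
    """Return edge ids sorted by their theta value (ascending),
    mirroring theta.argsort() on a 1-D tensor."""
    return sorted(range(len(theta)), key=lambda edge_id: theta[edge_id])


theta = [0.7, -1.2, 0.3, 2.5]  # hypothetical: one value per edge
order = contraction_order(theta)
print(order)  # [1, 2, 0, 3]: contract edge 1 first, then 2, 0, 3
```

Because only the relative order of the entries matters, many different theta vectors decode to the same contraction order. Tied values may be ordered differently here than in PyTorch.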

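  1. Maintain two ReplayBuffers: one stores the thetas produced in real time as the model iterates, and the other stores the thetas that score well. This keeps good thetas from being evicted by the ReplayBuffer's FIFO rule.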
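The two-buffer idea can be sketched as follows. This is a hypothetical illustration, not the code in TNCO_H2O.py. Here, recent is a plain FIFO (a deque with maxlen), and buffer0 (the name used in this issue) keeps only the best-scoring thetas regardless of age. The sketch assumes a lower score (e.g. contraction cost) is better.

```python
from collections import deque


class DualReplayBuffer:
    """Hypothetical sketch: `recent` is a FIFO of the latest thetas,
    `buffer0` keeps the best-scoring thetas no matter how old they are.
    Assumes lower score (e.g. contraction cost) is better."""

    def __init__(self, recent_size, keep_size):
        self.recent = deque(maxlen=recent_size)  # oldest entry dropped first
        self.buffer0 = []  # (score, theta) pairs, best first
        self.keep_size = keep_size

    def add(self, theta, score):
        self.recent.append((score, theta))
        self.buffer0.append((score, theta))
        self.buffer0.sort(key=lambda item: item[0])  # compare scores only
        del self.buffer0[self.keep_size:]  # drop the worst beyond capacity


buf = DualReplayBuffer(recent_size=2, keep_size=1)
for step, score in enumerate([0.5, 3.0, 4.0]):
    buf.add(theta=[step], score=score)
print([s for s, _ in buf.recent])   # [3.0, 4.0]: FIFO evicted the best theta
print([s for s, _ in buf.buffer0])  # [0.5]: but it survives in buffer0
```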

Compute keep_score:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L418-L419

Use keep_score to find the thetas that should be saved to buffer0:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L372-L375
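The linked lines contain the actual computation. The sketch below assumes one plausible rule, which is hypothetical: keep_score is the keep_size-th best score over the current buffer0 plus the new batch, and every batch theta whose score reaches that threshold is saved. Lower scores are again assumed to be better, and placeholder strings stand in for theta tensors.

```python
def select_for_buffer0(thetas, scores, buffer0_scores, keep_size):
    """Hypothetical rule: keep_score is the keep_size-th best score among
    buffer0 and the new batch; return it together with the batch thetas
    that reach it. Assumes lower score is better, keep_size >= 1 and a
    non-empty batch."""
    pool = sorted(list(buffer0_scores) + list(scores))
    keep_score = pool[min(keep_size, len(pool)) - 1]
    saved = [t for t, s in zip(thetas, scores) if s <= keep_score]
    return keep_score, saved


keep_score, saved = select_for_buffer0(
    ["a", "b", "c"], [2.0, 0.8, 1.5], buffer0_scores=[1.0, 3.0], keep_size=3
)
print(keep_score, saved)  # 1.5 ['b', 'c']
```

A threshold computed once per batch lets the selection be done with a single comparison per theta (a boolean mask, in tensor code) instead of inserting items one at a time.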

  2. Start the search near better solutions

historical_theta is a theta with a good score. We add noise to it and start iterating from its neighborhood:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L402-L408
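Searching near a good solution can be sketched like this, assuming theta is a list of floats and the noise is Gaussian with a hypothetical noise_std. The noise type and scale used in the repository may differ. Small noise flips only a few pairwise comparisons, so theta.argsort() of each perturbed start tends to stay close to the good contraction order.

```python
import random


def perturb_around(historical_theta, num_samples, noise_std, seed=None):
    """Hypothetical sketch: build num_samples starting points by adding
    independent Gaussian noise (std = noise_std) to a good theta."""
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, noise_std) for x in historical_theta]
        for _ in range(num_samples)
    ]


historical_theta = [0.7, -1.2, 0.3, 2.5]  # hypothetical good theta
starts = perturb_around(historical_theta, num_samples=4, noise_std=0.1, seed=0)
```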

We also start iterating from thetas chosen at random from buffer0, which stores the better thetas:
https://github.com/AI4Finance-Foundation/ElegantRL_Solver/blob/41a58d0ecb9daeddfa635d19a3741b0a29162342/rlsolver/rlsolver_learn2opt/tensor_train/TNCO_H2O.py#L410-L419
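Sampling starting points from buffer0 can be sketched as follows. It assumes uniform sampling with replacement, which is a hypothetical choice; the linked code may weight or deduplicate differently. Each sample is copied so that later updates do not modify the stored theta.

```python
import random


def sample_starts(buffer0, num_samples, seed=None):
    """Hypothetical sketch: pick starting thetas uniformly at random
    (with replacement) from the buffer of good solutions, returning copies."""
    rng = random.Random(seed)
    return [list(rng.choice(buffer0)) for _ in range(num_samples)]


buffer0 = [[0.7, -1.2, 0.3, 2.5], [0.1, 0.9, -0.4, 1.8]]  # hypothetical good thetas
starts = sample_starts(buffer0, num_samples=3, seed=0)
```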

  3. After training the iterator, use the iterator for inference
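Inference with the trained iterator can be sketched as repeatedly applying it from several starting thetas and keeping the best result. The iterator and objective below are hypothetical stand-ins (a step that halves theta and a sum-of-squares cost) so the loop runs on its own. In the real setup the iterator would be the trained network and the objective the contraction cost of theta.argsort(), with lower assumed to be better.

```python
from typing import Callable, List, Optional, Tuple

Theta = List[float]


def run_inference(iterator: Callable[[Theta], Theta],
                  objective: Callable[[Theta], float],
                  starts: List[Theta],
                  num_steps: int) -> Tuple[float, Optional[Theta]]:
    """Apply an already-trained iterator num_steps times from each start and
    return the best (score, theta) seen, including the starts themselves.
    Assumes lower score is better."""
    best_score, best_theta = float("inf"), None
    for theta in starts:
        for step in range(num_steps + 1):
            if step > 0:
                theta = iterator(theta)  # one update from the iterator
            score = objective(theta)
            if score < best_score:
                best_score, best_theta = score, theta
    return best_score, best_theta


def halve(theta: Theta) -> Theta:
    """Hypothetical stand-in for the trained iterator."""
    return [0.5 * x for x in theta]


def sum_sq(theta: Theta) -> float:
    """Hypothetical stand-in for the contraction-cost objective."""
    return sum(x * x for x in theta)


best_score, best_theta = run_inference(halve, sum_sq, [[2.0, -2.0], [1.0, 0.0]], num_steps=3)
print(best_score, best_theta)  # 0.015625 [0.125, 0.0]
```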

As can be seen:
