Here is the solution: https://github.com/philtabor/Multi-Agent-Deep-Deterministic-Policy-Gradients/issues/2#issuecomment-912548033