Open
Description
Environment:
- Python version 3.7
- Spark version 2.4
- TensorFlow version 2.5
- TensorFlowOnSpark version 2.2.3
- Cluster version hadoop
Describe the bug:
The evaluator node stops working after some time, while the training nodes keep running normally and the cluster as a whole does not crash. Training is configured for 80,000 total steps, but the evaluator only evaluates up to a little over 10,000 steps. After that, it produces no further log output.