Hi, thanks for your nice work.
I ran the training script provided in this repo without changing any code, but there is a significant performance gap between this code and your paper (for example, 0.588 vs. 0.715 AUROC on Tiny ImageNet).
Should I tune some of the training hyperparameters to close this gap? I have already tried adjusting the learning rate and the number of epochs (roughly as sketched below), but it did not help. I am looking forward to your suggestions. The full command, config, and training log follow.
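For reference, this is roughly how I swept the learning rate and epoch count. It is only a minimal sketch: the entry-point name and flag names (`--lr`, `--epochs`, `--dataset`) are my assumptions based on the config keys printed below, not the repo's confirmed CLI.

```python
import itertools
import subprocess

# Hypothetical sweep; replace "main.py" and the flag names with the
# repo's actual entry point and CLI arguments.
for lr, epochs in itertools.product([5e-5, 1e-4, 2e-4], [100, 200]):
    subprocess.run(
        ["python", "main.py",
         "--dataset", "tiny_imagenet",
         "--lr", str(lr),
         "--epochs", str(epochs)],
        check=True,
    )
```

None of these settings changed the test AUROC meaningfully.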
(cvaecaposr) xx@xxx:~/cvaecaposr$ sh ./scripts/train_tinyimagenet.sh
{
"data_base_path": "./data",
"val_ratio": 0.2,
"seed": 1234,
"known_classes": [
2,
3,
13,
30,
44,
45,
64,
66,
76,
101,
111,
121,
128,
130,
136,
158,
167,
170,
187,
193
],
"unknown_classes": [
0,
1,
4,
5,
6,
7,
8,
9,
10,
11,
12,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
65,
67,
68,
69,
70,
71,
72,
73,
74,
75,
77,
78,
79,
80,
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93,
94,
95,
96,
97,
98,
99,
100,
102,
103,
104,
105,
106,
107,
108,
109,
110,
112,
113,
114,
115,
116,
117,
118,
119,
120,
122,
123,
124,
125,
126,
127,
129,
131,
132,
133,
134,
135,
137,
138,
139,
140,
141,
142,
143,
144,
145,
146,
147,
148,
149,
150,
151,
152,
153,
154,
155,
156,
157,
159,
160,
161,
162,
163,
164,
165,
166,
168,
169,
171,
172,
173,
174,
175,
176,
177,
178,
179,
180,
181,
182,
183,
184,
185,
186,
188,
189,
190,
191,
192,
194,
195,
196,
197,
198,
199
],
"split_num": 0,
"batch_size": 32,
"num_workers": 0,
"dataset": "tiny_imagenet",
"z_dim": 128,
"lr": 5e-05,
"t_mu_shift": 10.0,
"t_var_scale": 0.01,
"alpha": 1.0,
"beta": 0.01,
"margin": 10.0,
"in_dim_caps": 16,
"out_dim_caps": 32,
"checkpoint": "",
"mode": "train",
"epochs": 100
}
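One aside on this config: `num_workers` is 0, which is what triggers the dataloader warnings further down. As far as I know this only affects throughput, not the final AUROC, but for completeness this is the standard PyTorch way to raise it (a toy dataset stands in for the repo's Tiny ImageNet loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy 64x64 tensors standing in for the Tiny ImageNet dataset.
dataset = TensorDataset(torch.randn(256, 3, 64, 64),
                        torch.randint(0, 20, (256,)))

# num_workers > 0 spawns worker processes for loading; this changes
# training speed only, not accuracy.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=8)
```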
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
| Name | Type | Params
--------------------------------------
0 | enc | ResNet34 | 21.3 M
1 | vae_cap | VaeCap | 23.5 M
2 | fc | Linear | 10.5 M
3 | dec | Decoder | 760 K
4 | t_mean | Embedding | 51.2 K
5 | t_var | Embedding | 51.2 K
--------------------------------------
56.1 M Trainable params
0 Non-trainable params
56.1 M Total params
224.552 Total estimated model params size (MB)
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Epoch 20: 100% 312/313 [00:38<00:00, 8.21it/s, loss=4.99e+03, v_num=0, train_acc=0.938, validation_acc=0.456]
Epoch 21: reducing learning rate of group 0 to 2.5000e-05.
Epoch 28: 100% 312/313 [00:37<00:00, 8.35it/s, loss=3.31e+03, v_num=0, train_acc=0.906, validation_acc=0.460]
Epoch 29: reducing learning rate of group 0 to 1.2500e-05.
Epoch 42: 100% 312/313 [00:38<00:00, 8.20it/s, loss=2.32e+03, v_num=0, train_acc=0.906, validation_acc=0.459]
Epoch 43: reducing learning rate of group 0 to 6.2500e-06.
Epoch 48: 100% 312/313 [00:37<00:00, 8.35it/s, loss=1.88e+03, v_num=0, train_acc=0.969, validation_acc=0.474]
Epoch 49: reducing learning rate of group 0 to 3.1250e-06.
Epoch 54: 100% 312/313 [00:37<00:00, 8.25it/s, loss=2.41e+03, v_num=0, train_acc=0.938, validation_acc=0.465]
Epoch 55: reducing learning rate of group 0 to 1.5625e-06.
Epoch 60: 100% 312/313 [00:37<00:00, 8.35it/s, loss=1.72e+03, v_num=0, train_acc=1.000, validation_acc=0.468]
Epoch 61: reducing learning rate of group 0 to 7.8125e-07.
Epoch 66: 100% 312/313 [00:37<00:00, 8.35it/s, loss=1.88e+03, v_num=0, train_acc=1.000, validation_acc=0.471]
Epoch 67: reducing learning rate of group 0 to 3.9063e-07.
Epoch 72: 100% 312/313 [00:38<00:00, 8.21it/s, loss=1.62e+03, v_num=0, train_acc=1.000, validation_acc=0.466]
Epoch 73: reducing learning rate of group 0 to 1.9531e-07.
Epoch 78: 100% 312/313 [00:37<00:00, 8.35it/s, loss=1.15e+03, v_num=0, train_acc=1.000, validation_acc=0.472]
Epoch 79: reducing learning rate of group 0 to 9.7656e-08.
Epoch 84: 100% 312/313 [00:40<00:00, 7.73it/s, loss=1.48e+03, v_num=0, train_acc=0.969, validation_acc=0.470]
Epoch 85: reducing learning rate of group 0 to 4.8828e-08.
Epoch 90: 100% 312/313 [00:38<00:00, 8.04it/s, loss=1.68e+03, v_num=0, train_acc=0.938, validation_acc=0.472]
Epoch 91: reducing learning rate of group 0 to 2.4414e-08.
Epoch 96: 100% 312/313 [00:38<00:00, 8.09it/s, loss=1.82e+03, v_num=0, train_acc=0.938, validation_acc=0.471]
Epoch 97: reducing learning rate of group 0 to 1.2207e-08.
Epoch 99: 100% 313/313 [00:38<00:00, 8.10it/s, loss=1.76e+03, v_num=0, train_acc=1.000, validation_acc=0.468]
Saving latest checkpoint...
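One pattern I notice in the log above: the learning rate is halved on every plateau (5e-05 → 2.5e-05 → ... → 1.2207e-08 by epoch 97) while validation accuracy stays flat around 0.47. The "reducing learning rate of group 0" message is the one printed by PyTorch's ReduceLROnPlateau, so I assume something like the sketch below is running; only factor=0.5 is evident from the log, the patience value is my guess.

```python
import torch

# Minimal sketch of the scheduler behavior visible in the log above.
# mode="max" tracks validation accuracy; factor=0.5 matches the halving
# seen in the log; patience=5 is an assumption.
model = torch.nn.Linear(8, 2)
opt = torch.optim.Adam(model.parameters(), lr=5e-5)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, mode="max", factor=0.5, patience=5, verbose=True
)

for epoch in range(100):
    val_acc = 0.47  # plateaued metric, as in my run
    sched.step(val_acc)  # halves the LR each time val_acc stops improving
```

Since the LR has already decayed to ~1e-08 well before epoch 100, simply raising `epochs` cannot change the result, which may explain why my tuning attempts had no effect.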
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3,4,5,6,7]
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: The dataloader, test dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Testing: 100% 312/313 [00:23<00:00, 13.07it/s]
/xx/anaconda3/envs/cvaecaposr/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:52: UserWarning: Metric `AUROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
warnings.warn(*args, **kwargs)
Testing: 100% 313/313 [00:23<00:00, 13.10it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_auroc': 0.5880855321884155}
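For anyone comparing against the paper: as I understand it, `test_auroc` is the standard binary AUROC for known-vs-unknown detection in the open-set protocol. A toy illustration of the metric (hypothetical scores, not the repo's code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# is_unknown: 1 for samples drawn from the unknown classes, 0 for known.
# scores: the model's per-sample "unknownness" score.
is_unknown = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.2, 0.8, 0.3, 0.9])
print(roc_auc_score(is_unknown, scores))  # ~0.89 on this toy data
```

An AUROC of 0.588 is only slightly above chance (0.5), which is why the gap to the reported 0.715 concerns me.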