Task 3039289

Name: rand_automata_0049-1606103205-5788-1_3
Workunit: 1375715
Created: 23 Nov 2020, 7:18:24 UTC
Sent: 23 Nov 2020, 7:19:26 UTC
Report deadline: 30 Nov 2020, 7:19:26 UTC
Received: 23 Nov 2020, 7:30:55 UTC
Server state: Over
Outcome: Computation error
Client state: Aborted by user
Exit status: 203 (0x000000CB) EXIT_ABORTED_VIA_GUI
Computer ID: 4754
Run time: 37 sec
CPU time: 32 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 3,339.24 GFLOPS
Application version: Machine Learning Dataset Generator (test) v9.80 (cuda10200) x86_64-pc-linux-gnu
Peak working set size: 2.19 GB
Peak swap size: 9.99 GB
Peak disk usage: 2.98 GB

Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 -c -a LSTM --lr 0.001 -w 64 -b 2 -s 32 --maxepoch 192 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: GeForce GTX 1650)
[2020-11-23 02:28:06	                main:442]	:	INFO	:	Set logging level to 1
[2020-11-23 02:28:06	                main:448]	:	INFO	:	Running in BOINC Client mode
[2020-11-23 02:28:06	                main:451]	:	INFO	:	Resolving all filenames
[2020-11-23 02:28:06	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5 (exists = 1)
[2020-11-23 02:28:06	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-1606103205-5788-1_3_r710964226_1 (exists = 0)
[2020-11-23 02:28:06	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-1606103205-5788-1_3_r710964226_0 (exists = 0)
[2020-11-23 02:28:06	                main:459]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-1606103205-5788-1 (exists = 1)
[2020-11-23 02:28:06	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2020-11-23 02:28:06	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5
[2020-11-23 02:28:06	                main:481]	:	INFO	:	Configuration: 
[2020-11-23 02:28:06	                main:482]	:	INFO	:	    Model type: LSTM
[2020-11-23 02:28:06	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2020-11-23 02:28:06	                main:484]	:	INFO	:	    Max Epochs: 192
[2020-11-23 02:28:06	                main:485]	:	INFO	:	    Batch Size: 32
[2020-11-23 02:28:06	                main:486]	:	INFO	:	    Learning Rate: 0.001
[2020-11-23 02:28:06	                main:487]	:	INFO	:	    Patience: 10
[2020-11-23 02:28:06	                main:488]	:	INFO	:	    Hidden Width: 64
[2020-11-23 02:28:06	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2020-11-23 02:28:06	                main:490]	:	INFO	:	    # Backend Layers: 2
[2020-11-23 02:28:06	                main:491]	:	INFO	:	    # Threads: 1
[2020-11-23 02:28:06	                main:493]	:	INFO	:	Preparing Dataset
[2020-11-23 02:28:06	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5 into memory
[2020-11-23 02:28:06	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5 into memory
[2020-11-23 02:28:11	                load:106]	:	INFO	:	Successfully loaded dataset of 4096 examples into memory.
[2020-11-23 02:28:11	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5 into memory
[2020-11-23 02:28:11	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/rand_automata_0049-train-val-dataset.hdf5 into memory
[2020-11-23 02:28:11	                load:106]	:	INFO	:	Successfully loaded dataset of 512 examples into memory.
[2020-11-23 02:28:11	                main:501]	:	INFO	:	Creating Model
[2020-11-23 02:28:11	                main:514]	:	INFO	:	Preparing config file
[2020-11-23 02:28:11	                main:526]	:	INFO	:	Creating new config file
[2020-11-23 02:28:11	                main:545]	:	INFO	:	This is a continuation WU, loading previous network
[2020-11-23 02:28:11	                main:566]	:	INFO	:	Loading DataLoader into Memory
[2020-11-23 02:28:11	                main:569]	:	INFO	:	Starting Training
[2020-11-23 02:28:15	                main:581]	:	INFO	:	Epoch 1 | loss: 0.000134859 | val_loss: 0.000125692 | Time: 4381.29 ms
[2020-11-23 02:28:19	                main:581]	:	INFO	:	Epoch 2 | loss: 0.000121145 | val_loss: 0.000125004 | Time: 3184.23 ms
[2020-11-23 02:28:22	                main:581]	:	INFO	:	Epoch 3 | loss: 0.000121253 | val_loss: 0.000127373 | Time: 3082.89 ms
[2020-11-23 02:28:25	                main:581]	:	INFO	:	Epoch 4 | loss: 0.000120517 | val_loss: 0.000124272 | Time: 3262.51 ms
[2020-11-23 02:28:28	                main:581]	:	INFO	:	Epoch 5 | loss: 0.000119474 | val_loss: 0.000123944 | Time: 3287.47 ms
[2020-11-23 02:28:32	                main:581]	:	INFO	:	Epoch 6 | loss: 0.000118302 | val_loss: 0.000122249 | Time: 3265.82 ms
[2020-11-23 02:28:35	                main:581]	:	INFO	:	Epoch 7 | loss: 0.000117152 | val_loss: 0.000120391 | Time: 3082.16 ms
[2020-11-23 02:28:38	                main:581]	:	INFO	:	Epoch 8 | loss: 0.000115891 | val_loss: 0.000118105 | Time: 3114.23 ms
[2020-11-23 02:28:41	                main:581]	:	INFO	:	Epoch 9 | loss: 0.00011539 | val_loss: 0.000117459 | Time: 3202.89 ms

</stderr_txt>
]]>


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)