Task 3507180

Name ParityModified-1606530440-31170-8_3
Workunit 1695353
Created 2 Jan 2021, 19:25:16 UTC
Sent 2 Jan 2021, 20:15:30 UTC
Report deadline 9 Jan 2021, 20:15:30 UTC
Received 2 Jan 2021, 21:03:56 UTC
Server state Over
Outcome Computation error
Client state Aborted by user
Exit status 203 (0x000000CB) EXIT_ABORTED_VIA_GUI
Computer ID 7050
Run time 1 min 31 sec
CPU time 1 min 29 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 4,233.48 GFLOPS
Application version Machine Learning Dataset Generator (GPU) v9.80 (cuda10200) x86_64-pc-linux-gnu
Peak working set size 1.83 GB
Peak swap size 13.39 GB
Peak disk usage 2.98 GB

Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 -c --maxepoch 1024 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: GeForce GTX 1060 3GB)
[2021-01-02 21:46:23	                main:442]	:	INFO	:	Set logging level to 1
[2021-01-02 21:46:23	                main:448]	:	INFO	:	Running in BOINC Client mode
[2021-01-02 21:46:23	                main:451]	:	INFO	:	Resolving all filenames
[2021-01-02 21:46:23	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-01-02 21:46:23	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1606530440-31170-8_3_r306304352_1 (exists = 0)
[2021-01-02 21:46:23	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1606530440-31170-8_3_r306304352_0 (exists = 0)
[2021-01-02 21:46:23	                main:459]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1606530440-31170-8 (exists = 1)
[2021-01-02 21:46:23	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-01-02 21:46:23	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-01-02 21:46:23	                main:481]	:	INFO	:	Configuration: 
[2021-01-02 21:46:23	                main:482]	:	INFO	:	    Model type: GRU
[2021-01-02 21:46:23	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2021-01-02 21:46:23	                main:484]	:	INFO	:	    Max Epochs: 1024
[2021-01-02 21:46:23	                main:485]	:	INFO	:	    Batch Size: 128
[2021-01-02 21:46:23	                main:486]	:	INFO	:	    Learning Rate: 0.01
[2021-01-02 21:46:23	                main:487]	:	INFO	:	    Patience: 10
[2021-01-02 21:46:23	                main:488]	:	INFO	:	    Hidden Width: 12
[2021-01-02 21:46:23	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2021-01-02 21:46:23	                main:490]	:	INFO	:	    # Backend Layers: 4
[2021-01-02 21:46:23	                main:491]	:	INFO	:	    # Threads: 1
[2021-01-02 21:46:23	                main:493]	:	INFO	:	Preparing Dataset
[2021-01-02 21:46:23	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-01-02 21:46:23	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-01-02 21:46:24	                load:106]	:	INFO	:	Successfully loaded dataset of 2048 examples into memory.
[2021-01-02 21:46:24	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-01-02 21:46:24	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-01-02 21:46:24	                load:106]	:	INFO	:	Successfully loaded dataset of 512 examples into memory.
[2021-01-02 21:46:24	                main:501]	:	INFO	:	Creating Model
[2021-01-02 21:46:24	                main:514]	:	INFO	:	Preparing config file
[2021-01-02 21:46:24	                main:526]	:	INFO	:	Creating new config file
[2021-01-02 21:46:24	                main:545]	:	INFO	:	This is a continuation WU, loading previous network
[2021-01-02 21:46:25	                main:566]	:	INFO	:	Loading DataLoader into Memory
[2021-01-02 21:46:25	                main:569]	:	INFO	:	Starting Training
[2021-01-02 21:46:27	                main:581]	:	INFO	:	Epoch 1 | loss: 0.0313184 | val_loss: 0.0311949 | Time: 1886.54 ms
[2021-01-02 21:46:28	                main:581]	:	INFO	:	Epoch 2 | loss: 0.0311461 | val_loss: 0.0311439 | Time: 1649.76 ms
[2021-01-02 21:46:30	                main:581]	:	INFO	:	Epoch 3 | loss: 0.0311273 | val_loss: 0.0311394 | Time: 1652.28 ms
[2021-01-02 21:46:31	                main:581]	:	INFO	:	Epoch 4 | loss: 0.0311244 | val_loss: 0.0311373 | Time: 1616.81 ms
[2021-01-02 21:46:33	                main:581]	:	INFO	:	Epoch 5 | loss: 0.031119 | val_loss: 0.0311353 | Time: 1657.28 ms
[2021-01-02 21:46:35	                main:581]	:	INFO	:	Epoch 6 | loss: 0.0311183 | val_loss: 0.0311396 | Time: 1650.92 ms
[2021-01-02 21:46:36	                main:581]	:	INFO	:	Epoch 7 | loss: nan | val_loss: nan | Time: 1649.45 ms
[2021-01-02 21:46:38	                main:581]	:	INFO	:	Epoch 8 | loss: nan | val_loss: nan | Time: 1648.15 ms
[2021-01-02 21:46:40	                main:581]	:	INFO	:	Epoch 9 | loss: nan | val_loss: nan | Time: 1649.92 ms
[2021-01-02 21:46:41	                main:581]	:	INFO	:	Epoch 10 | loss: nan | val_loss: nan | Time: 1654.78 ms
[2021-01-02 21:46:43	                main:581]	:	INFO	:	Epoch 11 | loss: nan | val_loss: nan | Time: 1647.22 ms
[2021-01-02 21:46:45	                main:581]	:	INFO	:	Epoch 12 | loss: nan | val_loss: nan | Time: 1649.81 ms
[2021-01-02 21:46:46	                main:581]	:	INFO	:	Epoch 13 | loss: nan | val_loss: nan | Time: 1647.6 ms
[2021-01-02 21:46:48	                main:581]	:	INFO	:	Epoch 14 | loss: nan | val_loss: nan | Time: 1649.09 ms
[2021-01-02 21:46:50	                main:581]	:	INFO	:	Epoch 15 | loss: nan | val_loss: nan | Time: 1647.33 ms
[2021-01-02 21:46:51	                main:581]	:	INFO	:	Epoch 16 | loss: nan | val_loss: nan | Time: 1648.68 ms
[2021-01-02 21:46:53	                main:581]	:	INFO	:	Epoch 17 | loss: nan | val_loss: nan | Time: 1647.41 ms
[2021-01-02 21:46:55	                main:581]	:	INFO	:	Epoch 18 | loss: nan | val_loss: nan | Time: 1647.04 ms
[2021-01-02 21:46:56	                main:581]	:	INFO	:	Epoch 19 | loss: nan | val_loss: nan | Time: 1647.69 ms
[2021-01-02 21:46:58	                main:581]	:	INFO	:	Epoch 20 | loss: nan | val_loss: nan | Time: 1649 ms
[2021-01-02 21:46:59	                main:581]	:	INFO	:	Epoch 21 | loss: nan | val_loss: nan | Time: 1648.07 ms
[2021-01-02 21:47:01	                main:581]	:	INFO	:	Epoch 22 | loss: nan | val_loss: nan | Time: 1646.52 ms
[2021-01-02 21:47:03	                main:581]	:	INFO	:	Epoch 23 | loss: nan | val_loss: nan | Time: 1648.33 ms
[2021-01-02 21:47:04	                main:581]	:	INFO	:	Epoch 24 | loss: nan | val_loss: nan | Time: 1646.63 ms
[2021-01-02 21:47:06	                main:581]	:	INFO	:	Epoch 25 | loss: nan | val_loss: nan | Time: 1647.81 ms
[2021-01-02 21:47:08	                main:581]	:	INFO	:	Epoch 26 | loss: nan | val_loss: nan | Time: 1649.33 ms
[2021-01-02 21:47:09	                main:581]	:	INFO	:	Epoch 27 | loss: nan | val_loss: nan | Time: 1647.25 ms
[2021-01-02 21:47:11	                main:581]	:	INFO	:	Epoch 28 | loss: nan | val_loss: nan | Time: 1647.12 ms
[2021-01-02 21:47:13	                main:581]	:	INFO	:	Epoch 29 | loss: nan | val_loss: nan | Time: 1649.7 ms
[2021-01-02 21:47:14	                main:581]	:	INFO	:	Epoch 30 | loss: nan | val_loss: nan | Time: 1655.7 ms
[2021-01-02 21:47:16	                main:581]	:	INFO	:	Epoch 31 | loss: nan | val_loss: nan | Time: 1646.46 ms
[2021-01-02 21:47:18	                main:581]	:	INFO	:	Epoch 32 | loss: nan | val_loss: nan | Time: 1648.44 ms
[2021-01-02 21:47:19	                main:581]	:	INFO	:	Epoch 33 | loss: nan | val_loss: nan | Time: 1650.46 ms
[2021-01-02 21:47:21	                main:581]	:	INFO	:	Epoch 34 | loss: nan | val_loss: nan | Time: 1649.04 ms
[2021-01-02 21:47:23	                main:581]	:	INFO	:	Epoch 35 | loss: nan | val_loss: nan | Time: 1648.13 ms
[2021-01-02 21:47:24	                main:581]	:	INFO	:	Epoch 36 | loss: nan | val_loss: nan | Time: 1647.3 ms
[2021-01-02 21:47:26	                main:581]	:	INFO	:	Epoch 37 | loss: nan | val_loss: nan | Time: 1648.7 ms
[2021-01-02 21:47:27	                main:581]	:	INFO	:	Epoch 38 | loss: nan | val_loss: nan | Time: 1647.15 ms
[2021-01-02 21:47:29	                main:581]	:	INFO	:	Epoch 39 | loss: nan | val_loss: nan | Time: 1647.04 ms
[2021-01-02 21:47:31	                main:581]	:	INFO	:	Epoch 40 | loss: nan | val_loss: nan | Time: 1648.85 ms
[2021-01-02 21:47:32	                main:581]	:	INFO	:	Epoch 41 | loss: nan | val_loss: nan | Time: 1646.74 ms
[2021-01-02 21:47:34	                main:581]	:	INFO	:	Epoch 42 | loss: nan | val_loss: nan | Time: 1648.5 ms
[2021-01-02 21:47:36	                main:581]	:	INFO	:	Epoch 43 | loss: nan | val_loss: nan | Time: 1649.1 ms
[2021-01-02 21:47:37	                main:581]	:	INFO	:	Epoch 44 | loss: nan | val_loss: nan | Time: 1649.35 ms
[2021-01-02 21:47:39	                main:581]	:	INFO	:	Epoch 45 | loss: nan | val_loss: nan | Time: 1645.62 ms
[2021-01-02 21:47:41	                main:581]	:	INFO	:	Epoch 46 | loss: nan | val_loss: nan | Time: 1649.23 ms
[2021-01-02 21:47:42	                main:581]	:	INFO	:	Epoch 47 | loss: nan | val_loss: nan | Time: 1656.66 ms
[2021-01-02 21:47:44	                main:581]	:	INFO	:	Epoch 48 | loss: nan | val_loss: nan | Time: 1648.08 ms
[2021-01-02 21:47:46	                main:581]	:	INFO	:	Epoch 49 | loss: nan | val_loss: nan | Time: 1647.67 ms
[2021-01-02 21:47:47	                main:581]	:	INFO	:	Epoch 50 | loss: nan | val_loss: nan | Time: 1648.66 ms
[2021-01-02 21:47:49	                main:581]	:	INFO	:	Epoch 51 | loss: nan | val_loss: nan | Time: 1649.52 ms
[2021-01-02 21:47:51	                main:581]	:	INFO	:	Epoch 52 | loss: nan | val_loss: nan | Time: 1646.78 ms
[2021-01-02 21:47:52	                main:581]	:	INFO	:	Epoch 53 | loss: nan | val_loss: nan | Time: 1648.48 ms

</stderr_txt>
]]>
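
The stderr output above reports finite losses through epoch 6 and `nan` for both loss and val_loss from epoch 7 until the task was aborted. The following is a minimal sketch, assuming only the epoch line format seen in this log, of how such output could be scanned for the first non-finite loss. The names `EPOCH_RE`, `parse_epochs`, and `first_nonfinite_epoch` are hypothetical helpers for illustration; they are not part of the MLC@Home application or the BOINC client.

```python
import math
import re

# Assumption: epoch records look like the lines in the stderr output above, e.g.
# "Epoch 7 | loss: nan | val_loss: nan | Time: 1649.45 ms"
EPOCH_RE = re.compile(
    r"Epoch (\d+) \| loss: (\S+) \| val_loss: (\S+) \| Time: (\S+) ms"
)


def parse_epochs(lines):
    """Yield (epoch, loss, val_loss) for each line containing an epoch record."""
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            # float() accepts the literal string "nan", so no special casing is needed.
            yield int(m.group(1)), float(m.group(2)), float(m.group(3))


def first_nonfinite_epoch(lines):
    """Return the first epoch whose loss or val_loss is NaN or infinite, else None."""
    for epoch, loss, val_loss in parse_epochs(lines):
        if not (math.isfinite(loss) and math.isfinite(val_loss)):
            return epoch
    return None


# Two records copied from the log above (leading whitespace simplified).
sample = [
    "[2021-01-02 21:46:35 main:581] : INFO : Epoch 6 | loss: 0.0311183 | val_loss: 0.0311396 | Time: 1650.92 ms",
    "[2021-01-02 21:46:36 main:581] : INFO : Epoch 7 | loss: nan | val_loss: nan | Time: 1649.45 ms",
]
print(first_nonfinite_epoch(sample))
```

Run against the full stderr text (for example, `first_nonfinite_epoch(text.splitlines())`), this kind of check would flag the epoch at which training diverged without reading every line by hand.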


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)