Task 3317878

Name ParityModified-1608176525-21512-0_0
Workunit 1584810
Created 17 Dec 2020, 3:42:10 UTC
Sent 18 Dec 2020, 2:17:53 UTC
Report deadline 26 Dec 2020, 2:17:53 UTC
Received 18 Dec 2020, 2:42:52 UTC
Server state Over
Outcome Computation error
Client state Aborted by user
Exit status 203 (0x000000CB) EXIT_ABORTED_VIA_GUI
Computer ID 5489
Run time 54 sec
CPU time 53 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 6,857.74 GFLOPS
Application version Machine Learning Dataset Generator (GPU) v9.80 (cuda10200) x86_64-pc-linux-gnu
Peak working set size 1.83 GB
Peak swap size 13.43 GB
Peak disk usage 2.98 GB

Stderr output

<core_client_version>7.16.14</core_client_version>
<![CDATA[
<message>
aborted by user</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 --maxepoch 2048 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: GeForce GTX 1070)
[2020-12-17 21:21:02	                main:442]	:	INFO	:	Set logging level to 1
[2020-12-17 21:21:02	                main:448]	:	INFO	:	Running in BOINC Client mode
[2020-12-17 21:21:02	                main:451]	:	INFO	:	Resolving all filenames
[2020-12-17 21:21:02	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2020-12-17 21:21:02	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1608176525-21512-0_0_r144292394_1 (exists = 0)
[2020-12-17 21:21:02	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1608176525-21512-0_0_r144292394_0 (exists = 0)
[2020-12-17 21:21:02	                main:459]	:	INFO	:	Resolved: model-input.pt => model-input.pt (exists = 0)
[2020-12-17 21:21:02	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2020-12-17 21:21:02	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2020-12-17 21:21:02	                main:481]	:	INFO	:	Configuration: 
[2020-12-17 21:21:02	                main:482]	:	INFO	:	    Model type: GRU
[2020-12-17 21:21:02	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2020-12-17 21:21:02	                main:484]	:	INFO	:	    Max Epochs: 2048
[2020-12-17 21:21:02	                main:485]	:	INFO	:	    Batch Size: 128
[2020-12-17 21:21:02	                main:486]	:	INFO	:	    Learning Rate: 0.01
[2020-12-17 21:21:02	                main:487]	:	INFO	:	    Patience: 10
[2020-12-17 21:21:02	                main:488]	:	INFO	:	    Hidden Width: 12
[2020-12-17 21:21:02	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2020-12-17 21:21:02	                main:490]	:	INFO	:	    # Backend Layers: 4
[2020-12-17 21:21:02	                main:491]	:	INFO	:	    # Threads: 1
[2020-12-17 21:21:02	                main:493]	:	INFO	:	Preparing Dataset
[2020-12-17 21:21:02	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-17 21:21:02	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-17 21:21:04	                load:106]	:	INFO	:	Successfully loaded dataset of 2048 examples into memory.
[2020-12-17 21:21:04	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-17 21:21:04	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-17 21:21:04	                load:106]	:	INFO	:	Successfully loaded dataset of 512 examples into memory.
[2020-12-17 21:21:04	                main:501]	:	INFO	:	Creating Model
[2020-12-17 21:21:04	                main:514]	:	INFO	:	Preparing config file
[2020-12-17 21:21:04	                main:526]	:	INFO	:	Creating new config file
[2020-12-17 21:21:05	                main:566]	:	INFO	:	Loading DataLoader into Memory
[2020-12-17 21:21:05	                main:569]	:	INFO	:	Starting Training
[2020-12-17 21:21:08	                main:581]	:	INFO	:	Epoch 1 | loss: 0.0478791 | val_loss: 0.0323086 | Time: 3231.54 ms
[2020-12-17 21:21:11	                main:581]	:	INFO	:	Epoch 2 | loss: 0.0317861 | val_loss: 0.031511 | Time: 2611.13 ms
[2020-12-17 21:21:13	                main:581]	:	INFO	:	Epoch 3 | loss: 0.0313577 | val_loss: 0.0312794 | Time: 2609.33 ms
[2020-12-17 21:21:16	                main:581]	:	INFO	:	Epoch 4 | loss: 0.0312699 | val_loss: 0.0312669 | Time: 2605.74 ms
[2020-12-17 21:21:18	                main:581]	:	INFO	:	Epoch 5 | loss: 0.0312569 | val_loss: 0.0312535 | Time: 2614.11 ms
[2020-12-17 21:21:21	                main:581]	:	INFO	:	Epoch 6 | loss: 0.0312543 | val_loss: 0.0312622 | Time: 2607.34 ms
[2020-12-17 21:21:24	                main:581]	:	INFO	:	Epoch 7 | loss: 0.0312573 | val_loss: 0.0312577 | Time: 2606.03 ms
[2020-12-17 21:21:26	                main:581]	:	INFO	:	Epoch 8 | loss: 0.031253 | val_loss: 0.0312529 | Time: 2607.83 ms
[2020-12-17 21:21:29	                main:581]	:	INFO	:	Epoch 9 | loss: 0.0312519 | val_loss: 0.0312512 | Time: 2598.38 ms
[2020-12-17 21:21:32	                main:581]	:	INFO	:	Epoch 10 | loss: 0.0312524 | val_loss: 0.0312515 | Time: 2611.66 ms
[2020-12-17 21:21:34	                main:581]	:	INFO	:	Epoch 11 | loss: 0.0312509 | val_loss: 0.0312518 | Time: 2608.9 ms
[2020-12-17 21:21:37	                main:581]	:	INFO	:	Epoch 12 | loss: 0.0312534 | val_loss: 0.0312508 | Time: 2606.59 ms
[2020-12-17 21:21:39	                main:581]	:	INFO	:	Epoch 13 | loss: 0.0312507 | val_loss: 0.0312509 | Time: 2604.27 ms
[2020-12-17 21:21:42	                main:581]	:	INFO	:	Epoch 14 | loss: 0.0312507 | val_loss: 0.0312529 | Time: 2602.74 ms
[2020-12-17 21:21:45	                main:581]	:	INFO	:	Epoch 15 | loss: 0.0312528 | val_loss: 0.0312504 | Time: 2595.94 ms
[2020-12-17 21:21:47	                main:581]	:	INFO	:	Epoch 16 | loss: 0.0312507 | val_loss: 0.0312524 | Time: 2609.39 ms
[2020-12-17 21:21:50	                main:581]	:	INFO	:	Epoch 17 | loss: 0.0312517 | val_loss: 0.0312503 | Time: 2608.38 ms
[2020-12-17 21:21:52	                main:581]	:	INFO	:	Epoch 18 | loss: 0.0312512 | val_loss: 0.0312514 | Time: 2597.81 ms
[2020-12-17 21:21:55	                main:581]	:	INFO	:	Epoch 19 | loss: 0.031253 | val_loss: 0.0312502 | Time: 2606.5 ms
SIGSEGV: segmentation violation

</stderr_txt>
]]>
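In the stderr output above, the trainer prints one `Epoch N | loss | val_loss | Time` line per epoch before the run ends with `SIGSEGV`. The sketch below is a minimal Python example for pulling those values out of a saved copy of the log, for instance to see how far the run was from the configured `Validation Loss Threshold: 0.0001`. The helpers `parse_epochs` and `epochs_since_best` are hypothetical and are not part of MLC@Home or BOINC. Reading `Patience: 10` as "epochs since the lowest `val_loss`" is an assumption, not confirmed by the log.

```python
import re
from typing import List, NamedTuple

# Matches the per-epoch lines in the stderr log, e.g.
# "... Epoch 19 | loss: 0.031253 | val_loss: 0.0312502 | Time: 2606.5 ms"
EPOCH_RE = re.compile(
    r"Epoch (\d+) \| loss: ([0-9.eE+-]+) \| val_loss: ([0-9.eE+-]+) \| Time: ([0-9.]+) ms"
)


class Epoch(NamedTuple):
    index: int
    loss: float
    val_loss: float
    time_ms: float


def parse_epochs(log_text: str) -> List[Epoch]:
    """Return one Epoch record per matching line; all other lines are ignored."""
    return [
        Epoch(int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in EPOCH_RE.finditer(log_text)
    ]


def epochs_since_best(epochs: List[Epoch]) -> int:
    """Hypothetical patience counter: epochs logged after the lowest val_loss."""
    best = min(range(len(epochs)), key=lambda i: epochs[i].val_loss)
    return len(epochs) - 1 - best


if __name__ == "__main__":
    # Four epoch lines copied from the log above, plus the crash line.
    sample = "\n".join([
        "Epoch 15 | loss: 0.0312528 | val_loss: 0.0312504 | Time: 2595.94 ms",
        "Epoch 16 | loss: 0.0312507 | val_loss: 0.0312524 | Time: 2609.39 ms",
        "Epoch 17 | loss: 0.0312517 | val_loss: 0.0312503 | Time: 2608.38 ms",
        "Epoch 18 | loss: 0.0312512 | val_loss: 0.0312514 | Time: 2597.81 ms",
        "SIGSEGV: segmentation violation",
    ])
    eps = parse_epochs(sample)
    print(len(eps), epochs_since_best(eps))  # prints "4 1"
```

Under these assumptions none of the sampled epochs comes close to the 0.0001 threshold, and the patience counter is well below 10.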


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)