Task 3362770

Name                  : ParityModified-1605453698-22768-5_1
Workunit              : 1542111
Created               : 20 Dec 2020, 3:46:53 UTC
Sent                  : 20 Dec 2020, 4:55:16 UTC
Report deadline       : 27 Dec 2020, 4:55:16 UTC
Received              : 21 Dec 2020, 13:52:14 UTC
Server state          : Over
Outcome               : Computation error
Client state          : Compute error
Exit status           : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
Computer ID           : 5242
Run time              : 2 hours 48 min 40 sec
CPU time              : 35 sec
Validate state        : Invalid
Credit                : 0.00
Device peak FLOPS     : 1,508.42 GFLOPS
Application version   : Machine Learning Dataset Generator (GPU) v9.75 (cuda10200) windows_x86_64
Peak working set size : 1.55 GB
Peak swap size        : 3.49 GB
Peak disk usage       : 1.53 GB
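The exit status EXIT_TIME_LIMIT_EXCEEDED comes with the client message "exceeded elapsed time limit 10111.70 (1000000.00G/98.90G)". The Python sketch below checks that arithmetic. It assumes, as the message format suggests, that the limit is the workunit's FLOP bound divided by the client's speed estimate for this app version, and that the limit applies to elapsed (wall-clock) time rather than CPU time. The constant and function names are illustrative only.

```python
# Sketch, assuming the client's elapsed-time limit = FLOP bound / estimated speed,
# as the "(1000000.00G/98.90G)" part of the logged message suggests.
FPOPS_BOUND_GFLOP = 1_000_000.00  # "1000000.00G" in the message
EST_SPEED_GFLOPS = 98.90          # "98.90G" in the message (printed rounded to 2 decimals)
LOGGED_LIMIT_S = 10111.70         # limit reported by the client, in seconds


def elapsed_limit_seconds(fpops_bound_gflop, speed_gflops):
    """Maximum elapsed (wall-clock) seconds before the task is aborted."""
    return fpops_bound_gflop / speed_gflops


def hms_to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


limit = elapsed_limit_seconds(FPOPS_BOUND_GFLOP, EST_SPEED_GFLOPS)
# The printed speed is rounded, so recover the unrounded estimate from the logged limit.
implied_speed = FPOPS_BOUND_GFLOP / LOGGED_LIMIT_S
run_time = hms_to_seconds(2, 48, 40)  # "Run time 2 hours 48 min 40 sec"

print(f"limit from rounded speed: {limit:.2f} s (logged: {LOGGED_LIMIT_S:.2f} s)")
print(f"implied speed estimate: {implied_speed:.4f} GFLOPS")
print(f"run time {run_time} s > limit: {run_time > LOGGED_LIMIT_S}")
```

Using the rounded 98.90 GFLOPS gives roughly 10111.2 s rather than the logged 10111.70 s, which is consistent with the speed having been rounded for display. Either way, the 10120 s run time is past the limit. Note that the client's speed estimate is far below the device's listed peak of 1,508.42 GFLOPS.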

Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 10111.70 (1000000.00G/98.90G)</message>
<stderr_txt>
Machine Learning Dataset Generator v9.75 (Windows/x64) (libTorch: release/1.6 GPU: GeForce GTX 960M)
[2020-12-21 02:07:42	                main:435]	:	INFO	:	Set logging level to 1
[2020-12-21 02:07:42	                main:441]	:	INFO	:	Running in BOINC Client mode
[2020-12-21 02:07:42	                main:444]	:	INFO	:	Resolving all filenames
[2020-12-21 02:07:42	                main:452]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2020-12-21 02:07:42	                main:452]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1605453698-22768-5_1_r1476732121_1 (exists = 0)
[2020-12-21 02:07:42	                main:452]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1605453698-22768-5_1_r1476732121_0 (exists = 0)
[2020-12-21 02:07:42	                main:452]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1605453698-22768-5 (exists = 1)
[2020-12-21 02:07:42	                main:452]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2020-12-21 02:07:42	                main:472]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2020-12-21 02:07:42	                main:474]	:	INFO	:	Configuration: 
[2020-12-21 02:07:42	                main:475]	:	INFO	:	    Model type: GRU
[2020-12-21 02:07:42	                main:476]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2020-12-21 02:07:42	                main:477]	:	INFO	:	    Max Epochs: 1024
[2020-12-21 02:07:42	                main:478]	:	INFO	:	    Batch Size: 128
[2020-12-21 02:07:42	                main:479]	:	INFO	:	    Learning Rate: 0.01
[2020-12-21 02:07:42	                main:480]	:	INFO	:	    Patience: 10
[2020-12-21 02:07:42	                main:481]	:	INFO	:	    Hidden Width: 12
[2020-12-21 02:07:42	                main:482]	:	INFO	:	    # Recurrent Layers: 4
[2020-12-21 02:07:42	                main:483]	:	INFO	:	    # Backend Layers: 4
[2020-12-21 02:07:42	                main:484]	:	INFO	:	    # Threads: 1
[2020-12-21 02:07:42	                main:486]	:	INFO	:	Preparing Dataset
[2020-12-21 02:07:42	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-21 02:09:19	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-21 02:10:15	                load:106]	:	INFO	:	Successfully loaded dataset of 2048 examples into memory.
[2020-12-21 02:10:15	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-21 02:10:15	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2020-12-21 02:11:04	                load:106]	:	INFO	:	Successfully loaded dataset of 512 examples into memory.
[2020-12-21 02:11:04	                main:494]	:	INFO	:	Creating Model
[2020-12-21 02:11:04	                main:507]	:	INFO	:	Preparing config file
[2020-12-21 02:11:04	                main:519]	:	INFO	:	Creating new config file
[2020-12-21 02:11:04	                main:538]	:	INFO	:	This is a continuation WU, loading previous network
[2020-12-21 02:11:06	                main:559]	:	INFO	:	Loading DataLoader into Memory
[2020-12-21 02:11:06	                main:562]	:	INFO	:	Starting Training
[2020-12-21 02:15:53	                main:574]	:	INFO	:	Epoch 1 | loss: 0.0311342 | val_loss: 0.0311585 | Time: 287409 ms
[2020-12-21 02:21:36	                main:574]	:	INFO	:	Epoch 2 | loss: 0.0311136 | val_loss: 0.031147 | Time: 341974 ms
[2020-12-21 02:29:00	                main:574]	:	INFO	:	Epoch 3 | loss: 0.0311065 | val_loss: 0.0311478 | Time: 444074 ms
[2020-12-21 02:35:48	                main:574]	:	INFO	:	Epoch 4 | loss: 0.0311046 | val_loss: 0.0311451 | Time: 408447 ms
[2020-12-21 02:42:43	                main:574]	:	INFO	:	Epoch 5 | loss: 0.0311037 | val_loss: 0.0311499 | Time: 414392 ms
[2020-12-21 02:49:20	                main:574]	:	INFO	:	Epoch 6 | loss: 0.0311011 | val_loss: 0.031154 | Time: 397765 ms
[2020-12-21 02:56:15	                main:574]	:	INFO	:	Epoch 7 | loss: 0.0311004 | val_loss: 0.0311465 | Time: 414400 ms
[2020-12-21 03:06:13	                main:574]	:	INFO	:	Epoch 8 | loss: 0.0311011 | val_loss: 0.0311445 | Time: 598449 ms
[2020-12-21 03:13:54	                main:574]	:	INFO	:	Epoch 9 | loss: 0.0310989 | val_loss: 0.0311523 | Time: 460716 ms
[2020-12-21 03:23:47	                main:574]	:	INFO	:	Epoch 10 | loss: 0.0310977 | val_loss: 0.0311498 | Time: 592500 ms
[2020-12-21 03:33:36	                main:574]	:	INFO	:	Epoch 11 | loss: 0.0310972 | val_loss: 0.0311524 | Time: 588955 ms
[2020-12-21 03:44:30	                main:574]	:	INFO	:	Epoch 12 | loss: 0.0310959 | val_loss: 0.0311522 | Time: 654273 ms
[2020-12-21 03:51:33	                main:574]	:	INFO	:	Epoch 13 | loss: 0.0311004 | val_loss: 0.031151 | Time: 422678 ms
[2020-12-21 03:57:54	                main:574]	:	INFO	:	Epoch 14 | loss: 0.0311031 | val_loss: 0.031148 | Time: 381168 ms
[2020-12-21 04:04:14	                main:574]	:	INFO	:	Epoch 15 | loss: 0.0311048 | val_loss: 0.031157 | Time: 379962 ms
[2020-12-21 04:10:15	                main:574]	:	INFO	:	Epoch 16 | loss: 0.0311005 | val_loss: 0.0311523 | Time: 360969 ms
[2020-12-21 04:19:52	                main:574]	:	INFO	:	Epoch 17 | loss: 0.0311034 | val_loss: 0.0311586 | Time: 577078 ms
[2020-12-21 04:28:12	                main:574]	:	INFO	:	Epoch 18 | loss: 0.0311002 | val_loss: 0.0311497 | Time: 499900 ms
[2020-12-21 04:41:09	                main:574]	:	INFO	:	Epoch 19 | loss: 0.0310965 | val_loss: 0.0311539 | Time: 776574 ms


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFBB8619AD2

Engaging BOINC Windows Runtime Debugger...

[2020-12-21 04:51:36	                main:574]	:	INFO	:	Epoch 20 | loss: 0.0310952 | val_loss: 0.0311527 | Time: 626977 ms


********************


BOINC Windows Runtime Debugger Version 7.17.0


Dump Timestamp    : 12/21/20 04:51:37
Install Directory : C:\Program Files\BOINC\
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : 
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll

</stderr_txt>
]]>
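The epoch lines in the stderr show training loss and validation loss levelling off around 0.0311 over 20 epochs, far above the configured Validation Loss Threshold of 0.0001. The Python sketch below parses those lines and summarizes them. The regular expression is written against the line format seen in this log. Treating the threshold as a `val_loss <= threshold` stopping test is an assumption. The extrapolation to Max Epochs is likewise a naive estimate that assumes later epochs take about as long as these ones.

```python
import re

# Epoch lines copied from the stderr above (timestamp/source prefix omitted;
# the regex uses search(), so full log lines with the prefix also match).
EPOCH_LINES = """\
Epoch 1 | loss: 0.0311342 | val_loss: 0.0311585 | Time: 287409 ms
Epoch 2 | loss: 0.0311136 | val_loss: 0.031147 | Time: 341974 ms
Epoch 3 | loss: 0.0311065 | val_loss: 0.0311478 | Time: 444074 ms
Epoch 4 | loss: 0.0311046 | val_loss: 0.0311451 | Time: 408447 ms
Epoch 5 | loss: 0.0311037 | val_loss: 0.0311499 | Time: 414392 ms
Epoch 6 | loss: 0.0311011 | val_loss: 0.031154 | Time: 397765 ms
Epoch 7 | loss: 0.0311004 | val_loss: 0.0311465 | Time: 414400 ms
Epoch 8 | loss: 0.0311011 | val_loss: 0.0311445 | Time: 598449 ms
Epoch 9 | loss: 0.0310989 | val_loss: 0.0311523 | Time: 460716 ms
Epoch 10 | loss: 0.0310977 | val_loss: 0.0311498 | Time: 592500 ms
Epoch 11 | loss: 0.0310972 | val_loss: 0.0311524 | Time: 588955 ms
Epoch 12 | loss: 0.0310959 | val_loss: 0.0311522 | Time: 654273 ms
Epoch 13 | loss: 0.0311004 | val_loss: 0.031151 | Time: 422678 ms
Epoch 14 | loss: 0.0311031 | val_loss: 0.031148 | Time: 381168 ms
Epoch 15 | loss: 0.0311048 | val_loss: 0.031157 | Time: 379962 ms
Epoch 16 | loss: 0.0311005 | val_loss: 0.0311523 | Time: 360969 ms
Epoch 17 | loss: 0.0311034 | val_loss: 0.0311586 | Time: 577078 ms
Epoch 18 | loss: 0.0311002 | val_loss: 0.0311497 | Time: 499900 ms
Epoch 19 | loss: 0.0310965 | val_loss: 0.0311539 | Time: 776574 ms
Epoch 20 | loss: 0.0310952 | val_loss: 0.0311527 | Time: 626977 ms
"""

VAL_LOSS_THRESHOLD = 0.0001  # "Validation Loss Threshold" from the configuration block
MAX_EPOCHS = 1024            # "Max Epochs" from the configuration block

EPOCH_RE = re.compile(
    r"Epoch (\d+) \| loss: ([\d.]+) \| val_loss: ([\d.]+) \| Time: (\d+) ms"
)


def parse_epochs(text):
    """Return a list of (epoch, loss, val_loss, seconds) tuples."""
    rows = []
    for line in text.splitlines():
        m = EPOCH_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)),
                         float(m.group(3)), int(m.group(4)) / 1000.0))
    return rows


rows = parse_epochs(EPOCH_LINES)
best_epoch, _, best_val, _ = min(rows, key=lambda r: r[2])  # lowest val_loss
total_s = sum(r[3] for r in rows)
mean_s = total_s / len(rows)

print(f"{len(rows)} epochs, best val_loss {best_val} at epoch {best_epoch}")
print(f"threshold reached (assumed <= test): {best_val <= VAL_LOSS_THRESHOLD}")
print(f"training time {total_s:.0f} s, mean {mean_s:.0f} s/epoch")
print(f"naive estimate for {MAX_EPOCHS} epochs: {MAX_EPOCHS * mean_s / 3600:.0f} h")
```

The summed epoch times (about 9629 s) agree with the timestamps between "Starting Training" at 02:11:06 and epoch 20 at 04:51:36, and together with roughly 4 minutes of dataset loading they account for nearly all of the 10120 s run time.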


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)