Task 5096526

Name ParityModified-1607233792-4279-56_2
Workunit 3057528
Created 27 Apr 2021, 1:00:14 UTC
Sent 27 Apr 2021, 1:08:17 UTC
Report deadline 4 May 2021, 1:08:17 UTC
Received 27 Apr 2021, 1:11:49 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 193 (0x000000C1) EXIT_SIGNAL
Computer ID 218
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 11,403.35 GFLOPS
Application version Machine Learning Dataset Generator (test) v9.80 (amdrocm) x86_64-pc-linux-gnu
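
The exit status appears in three notations on this page: decimal 193, hexadecimal 0x000000C1 in the summary, and "0xc1, -63" in the stderr message. A minimal sketch of how these relate, assuming the negative figure is simply the low byte read as a signed 8-bit value (the helper name describe_exit_status is hypothetical and not part of BOINC):

```python
def describe_exit_status(status: int) -> str:
    """Format an exit status as decimal, 32-bit hex, low-byte hex, and signed 8-bit."""
    low_byte = status & 0xFF  # keep only the lowest 8 bits
    # Assumption: the negative figure is the low byte interpreted as two's complement.
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return f"{status} (0x{status:08X}, 0x{low_byte:02x}, {signed})"


print(describe_exit_status(193))  # prints: 193 (0x000000C1, 0xc1, -63)
```

Under that assumption all three notations describe the same value, which the summary labels EXIT_SIGNAL.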

Stderr output

<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm -c --maxepoch 128 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
21:08:56 (1259613): start_timer_thread(): pthread_create(): 22
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: Vega 10 XL/XT [Radeon RX Vega 56/64])
[2021-04-26 21:08:56	                main:442]	:	INFO	:	Set logging level to 1
[2021-04-26 21:08:56	                main:448]	:	INFO	:	Running in BOINC Client mode
[2021-04-26 21:08:56	                main:451]	:	INFO	:	Resolving all filenames
[2021-04-26 21:08:56	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-04-26 21:08:56	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_2_r73522781_1 (exists = 0)
[2021-04-26 21:08:56	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_2_r73522781_0 (exists = 0)
[2021-04-26 21:08:56	                main:459]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56 (exists = 1)
[2021-04-26 21:08:56	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-04-26 21:08:56	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-04-26 21:08:56	                main:481]	:	INFO	:	Configuration: 
[2021-04-26 21:08:56	                main:482]	:	INFO	:	    Model type: GRU
[2021-04-26 21:08:56	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2021-04-26 21:08:56	                main:484]	:	INFO	:	    Max Epochs: 128
[2021-04-26 21:08:56	                main:485]	:	INFO	:	    Batch Size: 128
[2021-04-26 21:08:56	                main:486]	:	INFO	:	    Learning Rate: 0.01
[2021-04-26 21:08:56	                main:487]	:	INFO	:	    Patience: 10
[2021-04-26 21:08:56	                main:488]	:	INFO	:	    Hidden Width: 12
[2021-04-26 21:08:56	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2021-04-26 21:08:56	                main:490]	:	INFO	:	    # Backend Layers: 4
[2021-04-26 21:08:56	                main:491]	:	INFO	:	    # Threads: 1
[2021-04-26 21:08:56	                main:493]	:	INFO	:	Preparing Dataset
[2021-04-26 21:08:56	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-26 21:08:57	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
free(): invalid pointer
SIGABRT: abort called
Stack trace (31 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x37f44c)[0x55a98fd7f44c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x141f0)[0x7f744336a1f0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f741215cfbb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x116)[0x7f7412142864]
/lib/x86_64-linux-gnu/libc.so.6(+0x89736)[0x7f74121a5736]
/lib/x86_64-linux-gnu/libc.so.6(+0x9208c)[0x7f74121ae08c]
/lib/x86_64-linux-gnu/libc.so.6(+0x93aac)[0x7f74121afaac]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x68)[0x7f74121b37a8]
./libamdhip64.so.3(+0x115a4e)[0x7f74113c6a4e]
./libamdhip64.so.3(+0x119e83)[0x7f74113cae83]
./libamdhip64.so.3(+0x11a000)[0x7f74113cb000]
./libamdhip64.so.3(+0x615de)[0x7f74113125de]
./libamdhip64.so.3(hipMemcpyWithStream+0x100)[0x7f741136fa70]
./libtorch_hip.so(+0x107c1d4)[0x7f74137b51d4]
./libtorch_cpu.so(+0x86b028)[0x7f743d57c028]
./libtorch_cpu.so(+0x869b07)[0x7f743d57ab07]
./libtorch_cpu.so(_ZN2at6native5copy_ERNS_6TensorERKS1_b+0x54)[0x7f743d57bf34]
./libtorch_cpu.so(_ZNK3c1010Dispatcher19callWithDispatchKeyIRN2at6TensorEJS4_RKS3_bEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEENS_11DispatchKeyESA_+0xb0)[0x7f743dd31e10]
./libtorch_cpu.so(_ZNK2at6Tensor5copy_ERKS0_b+0xd5)[0x7f743de568b5]
./libtorch_cpu.so(+0x2516fe8)[0x7f743f227fe8]
./libtorch_cpu.so(_ZNK3c1010Dispatcher19callWithDispatchKeyIRN2at6TensorEJS4_RKS3_bEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEENS_11DispatchKeyESA_+0xb0)[0x7f743dd31e10]
./libtorch_cpu.so(_ZNK2at6Tensor5copy_ERKS0_b+0xd5)[0x7f743de568b5]
./libtorch_cpu.so(_ZN2at6native2toERKNS_6TensorERKN3c1013TensorOptionsEbbNS4_8optionalINS4_12MemoryFormatEEE+0xea8)[0x7f743d828578]
./libtorch_cpu.so(+0x10d62f5)[0x7f743dde72f5]
./libtorch_cpu.so(+0x623379)[0x7f743d334379]
./libtorch_cpu.so(+0xe2673d)[0x7f743db3773d]
./libtorch_cpu.so(_ZNK2at6Tensor2toERKN3c1013TensorOptionsEbbNS1_8optionalINS1_12MemoryFormatEEE+0x306)[0x7f743de6ed56]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0xc12b8)[0x55a98fac12b8]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x89e04)[0x55a98fa89e04]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5)[0x7f7412144565]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x86efa)[0x55a98fa86efa]

Exiting...

</stderr_txt>
]]>


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)