Task 5096382

Name ParityModified-1607233792-4279-56_0
Workunit 3057528
Created 27 Apr 2021, 0:21:03 UTC
Sent 27 Apr 2021, 0:21:18 UTC
Report deadline 4 May 2021, 0:21:18 UTC
Received 27 Apr 2021, 0:30:42 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 193 (0x000000C1) EXIT_SIGNAL
Computer ID 10429
Run time 2 sec
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 9,222.22 GFLOPS
Application version Machine Learning Dataset Generator (test) v9.80 (amdrocm)
x86_64-pc-linux-gnu
Peak disk usage 2.25 GB

Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm -c --maxepoch 128 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
02:21:55 (368772): start_timer_thread(): pthread_create(): 22
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: Vega 20 [Radeon VII])
[2021-04-27 02:21:55	                main:442]	:	INFO	:	Set logging level to 1
[2021-04-27 02:21:55	                main:448]	:	INFO	:	Running in BOINC Client mode
[2021-04-27 02:21:55	                main:451]	:	INFO	:	Resolving all filenames
[2021-04-27 02:21:55	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-04-27 02:21:55	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_0_r1669757652_1 (exists = 0)
[2021-04-27 02:21:55	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_0_r1669757652_0 (exists = 0)
[2021-04-27 02:21:55	                main:459]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56 (exists = 1)
[2021-04-27 02:21:55	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-04-27 02:21:55	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-04-27 02:21:55	                main:481]	:	INFO	:	Configuration: 
[2021-04-27 02:21:55	                main:482]	:	INFO	:	    Model type: GRU
[2021-04-27 02:21:55	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2021-04-27 02:21:55	                main:484]	:	INFO	:	    Max Epochs: 128
[2021-04-27 02:21:55	                main:485]	:	INFO	:	    Batch Size: 128
[2021-04-27 02:21:55	                main:486]	:	INFO	:	    Learning Rate: 0.01
[2021-04-27 02:21:55	                main:487]	:	INFO	:	    Patience: 10
[2021-04-27 02:21:55	                main:488]	:	INFO	:	    Hidden Width: 12
[2021-04-27 02:21:55	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2021-04-27 02:21:55	                main:490]	:	INFO	:	    # Backend Layers: 4
[2021-04-27 02:21:55	                main:491]	:	INFO	:	    # Threads: 1
[2021-04-27 02:21:55	                main:493]	:	INFO	:	Preparing Dataset
[2021-04-27 02:21:55	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 02:21:56	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 02:21:56	                load:106]	:	INFO	:	Successfully loaded dataset of 2048 examples into memory.
[2021-04-27 02:21:56	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 02:21:56	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 02:21:57	                load:106]	:	INFO	:	Successfully loaded dataset of 512 examples into memory.
[2021-04-27 02:21:57	                main:501]	:	INFO	:	Creating Model
[2021-04-27 02:21:57	                main:514]	:	INFO	:	Preparing config file
[2021-04-27 02:21:57	                main:526]	:	INFO	:	Creating new config file
[2021-04-27 02:21:57	                main:545]	:	INFO	:	This is a continuation WU, loading previous network
[2021-04-27 02:21:57	                main:566]	:	INFO	:	Loading DataLoader into Memory
[2021-04-27 02:21:57	                main:569]	:	INFO	:	Starting Training
sh: 1: /opt/rocm-3.9.0/bin/clang-ocl: not found
MIOpen Error: /root/driver/MLOpen/src/tmp_dir.cpp:47: Can't execute cd /tmp/miopen-MIOpenSubTensorOpWithScalarKernel.cl-23a1-fdc0-dfd1-e69c; /opt/rocm-3.9.0/bin/clang-ocl  -DSUBTENSOR_OP_WITH_SCALAR=SUBTENSOR_OP_WITH_SCALAR_SET -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DWORK_LENGTH_0=65536 -mcpu=gfx906 -Wno-everything -Xclang -target-feature -Xclang +code-object-v3 MIOpenSubTensorOpWithScalarKernel.cl -o /tmp/miopen-MIOpenSubTensorOpWithScalarKernel.cl-23a1-fdc0-dfd1-e69c/MIOpenSubTensorOpWithScalarKernel.cl.o
terminate called after throwing an instance of 'at::native::miopen_exception'
  what():  miopenStatusUnknownError
SIGABRT: abort called
Stack trace (34 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x37f44c)[0x5588f98b644c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ff3e7d4e3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7ff3b6b3f18b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7ff3b6b1e859]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x135)[0x5588f9967ca5]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x399cf6)[0x5588f98d0cf6]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x399d41)[0x5588f98d0d41]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x397d74)[0x5588f98ced74]
./libtorch_hip.so(+0x35e8d8)[0x7ff3b747a8d8]
./libtorch_hip.so(_ZN2at6native10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS1_EElS3_S3_lllbdbbNS5_IlEES3_+0xc73)[0x7ff3b748f0e3]
./libtorch_hip.so(+0x3fbf3d)[0x7ff3b7517f3d]
./libtorch_hip.so(+0x44f7d4)[0x7ff3b756b7d4]
./libtorch_hip.so(+0x441200)[0x7ff3b755d200]
./libtorch_cpu.so(+0x10038f0)[0x7ff3e26f78f0]
./libtorch_cpu.so(_ZN2at10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS0_EElS2_RKNS3_8optionalIS0_EElllbdbbNS4_IlEES9_+0x1f6)[0x7ff3e260bab6]
./libtorch_cpu.so(+0x1c53a8f)[0x7ff3e3347a8f]
./libtorch_cpu.so(+0x1c62f30)[0x7ff3e3356f30]
./libtorch_cpu.so(+0x10038f0)[0x7ff3e26f78f0]
./libtorch_cpu.so(_ZN2at10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS0_EElS2_RKNS3_8optionalIS0_EElllbdbbNS4_IlEES9_+0x1f6)[0x7ff3e260bab6]
./libtorch_hip.so(+0x36e6c3)[0x7ff3b748a6c3]
./libtorch_hip.so(+0x36ea5a)[0x7ff3b748aa5a]
./libtorch_cpu.so(+0xa43ad3)[0x7ff3e2137ad3]
./libtorch_cpu.so(_ZN2at6native3gruERKNS_6TensorES3_N3c108ArrayRefIS1_EEbldbbb+0x1ca)[0x7ff3e212e6ca]
./libtorch_cpu.so(+0x1095ca6)[0x7ff3e2789ca6]
./libtorch_cpu.so(+0x10d31e9)[0x7ff3e27c71e9]
./libtorch_cpu.so(+0x102ab12)[0x7ff3e271eb12]
./libtorch_cpu.so(_ZN2at3gruERKNS_6TensorES2_N3c108ArrayRefIS0_EEbldbbb+0x174)[0x7ff3e26375a4]
./libtorch_cpu.so(_ZN5torch2nn7GRUImpl14forward_helperERKN2at6TensorES5_S5_lS3_+0x43a)[0x7ff3e3d46baa]
./libtorch_cpu.so(_ZN5torch2nn7GRUImpl7forwardERKN2at6TensorES3_+0xb0)[0x7ff3e3d46d10]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x930ae)[0x5588f95ca0ae]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x9a276)[0x5588f95d1276]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x8ae9d)[0x5588f95c1e9d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ff3b6b200b3]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x86efa)[0x5588f95bdefa]

Exiting...

</stderr_txt>
]]>


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)