| Name | ParityModified-1607233792-4279-56_1 |
| Workunit | 3057528 |
| Created | 27 Apr 2021, 0:30:45 UTC |
| Sent | 27 Apr 2021, 0:43:03 UTC |
| Report deadline | 4 May 2021, 0:43:03 UTC |
| Received | 27 Apr 2021, 1:00:13 UTC |
| Server state | Over |
| Outcome | Computation error |
| Client state | Compute error |
| Exit status | 193 (0x000000C1) EXIT_SIGNAL |
| Computer ID | 11140 |
| Run time | 3 sec |
| CPU time | |
| Validate state | Invalid |
| Credit | 0.00 |
| Device peak FLOPS | 13,358.86 GFLOPS |
| Application version | Machine Learning Dataset Generator (test) v9.80 (amdrocm) x86_64-pc-linux-gnu |
| Peak disk usage | 2.25 GB |
<core_client_version>7.16.14</core_client_version>
<![CDATA[
<message>process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm -c --maxepoch 128
nthreads: 1
gpudev: 0
Re-exec()-ing to set environment correctly
20:59:28 (382828): start_timer_thread(): pthread_create(): 22Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: Vega 10 XL/XT [Radeon RX Vega 56/64])
[2021-04-26 20:59:28 main:442] : INFO : Set logging level to 1
[2021-04-26 20:59:28 main:448] : INFO : Running in BOINC Client mode
[2021-04-26 20:59:28 main:451] : INFO : Resolving all filenames
[2021-04-26 20:59:28 main:459] : INFO : Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-04-26 20:59:28 main:459] : INFO : Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_1_r1924839405_1 (exists = 0)
[2021-04-26 20:59:28 main:459] : INFO : Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_1_r1924839405_0 (exists = 0)
[2021-04-26 20:59:28 main:459] : INFO : Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56 (exists = 1)
[2021-04-26 20:59:28 main:459] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-04-26 20:59:28 main:479] : INFO : Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-04-26 20:59:28 main:481] : INFO : Configuration:
[2021-04-26 20:59:28 main:482] : INFO : Model type: GRU
[2021-04-26 20:59:28 main:483] : INFO : Validation Loss Threshold: 0.0001
[2021-04-26 20:59:28 main:484] : INFO : Max Epochs: 128
[2021-04-26 20:59:28 main:485] : INFO : Batch Size: 128
[2021-04-26 20:59:28 main:486] : INFO : Learning Rate: 0.01
[2021-04-26 20:59:28 main:487] : INFO : Patience: 10
[2021-04-26 20:59:28 main:488] : INFO : Hidden Width: 12
[2021-04-26 20:59:28 main:489] : INFO : # Recurrent Layers: 4
[2021-04-26 20:59:28 main:490] : INFO : # Backend Layers: 4
[2021-04-26 20:59:28 main:491] : INFO : # Threads: 1
[2021-04-26 20:59:28 main:493] : INFO : Preparing Dataset
[2021-04-26 20:59:28 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-26 20:59:28 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-26 20:59:29 load:106] : INFO : Successfully loaded dataset of 2048 examples into memory.
[2021-04-26 20:59:29 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-26 20:59:29 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-26 20:59:29 load:106] : INFO : Successfully loaded dataset of 512 examples into memory.
[2021-04-26 20:59:29 main:501] : INFO : Creating Model
[2021-04-26 20:59:29 main:514] : INFO : Preparing config file
[2021-04-26 20:59:29 main:526] : INFO : Creating new config file
[2021-04-26 20:59:29 main:545] : INFO : This is a continuation WU, loading previous network
[2021-04-26 20:59:29 main:566] : INFO : Loading DataLoader into Memory
[2021-04-26 20:59:29 main:569] : INFO : Starting Training
sh: /opt/rocm-3.9.0/bin/clang-ocl: No such file or directory
MIOpen Error: /root/driver/MLOpen/src/tmp_dir.cpp:47: Can't execute cd /tmp/miopen-MIOpenSubTensorOpWithScalarKernel.cl-141e-04bb-ec47-d6b7; /opt/rocm-3.9.0/bin/clang-ocl -DSUBTENSOR_OP_WITH_SCALAR=SUBTENSOR_OP_WITH_SCALAR_SET -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_INT8=0 -DMIOPEN_USE_INT8x4=0 -DMIOPEN_USE_BFP16=0 -DMIOPEN_USE_INT32=0 -DMIOPEN_USE_RNE_BFLOAT16=1 -DWORK_LENGTH_0=65536 -mcpu=gfx900 -Wno-everything -Xclang -target-feature -Xclang +code-object-v3 MIOpenSubTensorOpWithScalarKernel.cl -o /tmp/miopen-MIOpenSubTensorOpWithScalarKernel.cl-141e-04bb-ec47-d6b7/MIOpenSubTensorOpWithScalarKernel.cl.o
terminate called after throwing an instance of 'at::native::miopen_exception'
what(): miopenStatusUnknownError
SIGABRT: abort called
Stack trace (34 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x37f44c)[0x55d64057f44c]
/lib64/libpthread.so.0(+0x13140)[0x7fb4cd73c140]
/lib64/libc.so.6(gsignal+0x10a)[0x7fb49c56902a]
/lib64/libc.so.6(abort+0x119)[0x7fb49c55253a]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x135)[0x55d640630ca5]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x399cf6)[0x55d640599cf6]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x399d41)[0x55d640599d41]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x397d74)[0x55d640597d74]
./libtorch_hip.so(+0x35e8d8)[0x7fb49ce818d8]
./libtorch_hip.so(_ZN2at6native10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS1_EElS3_S3_lllbdbbNS5_IlEES3_+0xc73)[0x7fb49ce960e3]
./libtorch_hip.so(+0x3fbf3d)[0x7fb49cf1ef3d]
./libtorch_hip.so(+0x44f7d4)[0x7fb49cf727d4]
./libtorch_hip.so(+0x441200)[0x7fb49cf64200]
./libtorch_cpu.so(+0x10038f0)[0x7fb4c80fe8f0]
./libtorch_cpu.so(_ZN2at10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS0_EElS2_RKNS3_8optionalIS0_EElllbdbbNS4_IlEES9_+0x1f6)[0x7fb4c8012ab6]
./libtorch_cpu.so(+0x1c53a8f)[0x7fb4c8d4ea8f]
./libtorch_cpu.so(+0x1c62f30)[0x7fb4c8d5df30]
./libtorch_cpu.so(+0x10038f0)[0x7fb4c80fe8f0]
./libtorch_cpu.so(_ZN2at10miopen_rnnERKNS_6TensorEN3c108ArrayRefIS0_EElS2_RKNS3_8optionalIS0_EElllbdbbNS4_IlEES9_+0x1f6)[0x7fb4c8012ab6]
./libtorch_hip.so(+0x36e6c3)[0x7fb49ce916c3]
./libtorch_hip.so(+0x36ea5a)[0x7fb49ce91a5a]
./libtorch_cpu.so(+0xa43ad3)[0x7fb4c7b3ead3]
./libtorch_cpu.so(_ZN2at6native3gruERKNS_6TensorES3_N3c108ArrayRefIS1_EEbldbbb+0x1ca)[0x7fb4c7b356ca]
./libtorch_cpu.so(+0x1095ca6)[0x7fb4c8190ca6]
./libtorch_cpu.so(+0x10d31e9)[0x7fb4c81ce1e9]
./libtorch_cpu.so(+0x102ab12)[0x7fb4c8125b12]
./libtorch_cpu.so(_ZN2at3gruERKNS_6TensorES2_N3c108ArrayRefIS0_EEbldbbb+0x174)[0x7fb4c803e5a4]
./libtorch_cpu.so(_ZN5torch2nn7GRUImpl14forward_helperERKN2at6TensorES5_S5_lS3_+0x43a)[0x7fb4c974dbaa]
./libtorch_cpu.so(_ZN5torch2nn7GRUImpl7forwardERKN2at6TensorES3_+0xb0)[0x7fb4c974dd10]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x930ae)[0x55d6402930ae]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x9a276)[0x55d64029a276]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x8ae9d)[0x55d64028ae9d]
/lib64/libc.so.6(__libc_start_main+0x103)[0x7fb49c553f03]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x86efa)[0x55d640286efa]
Exiting...
</stderr_txt>
]]>
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)