| Name | ParityModified-1607233792-4279-56_3 |
| Workunit | 3057528 |
| Created | 27 Apr 2021, 1:11:50 UTC |
| Sent | 27 Apr 2021, 1:24:40 UTC |
| Report deadline | 4 May 2021, 1:24:40 UTC |
| Received | 27 Apr 2021, 21:37:33 UTC |
| Server state | Over |
| Outcome | Computation error |
| Client state | Compute error |
| Exit status | 193 (0x000000C1) EXIT_SIGNAL |
| Computer ID | 11158 |
| Run time | 1 min 26 sec |
| CPU time | |
| Validate state | Invalid |
| Credit | 0.00 |
| Device peak FLOPS | 13,837.92 GFLOPS |
| Application version | Machine Learning Dataset Generator (test) v9.80 (amdrocm) x86_64-pc-linux-gnu |
| Peak working set size | 1.61 GB |
| Peak swap size | 8.18 GB |
| Peak disk usage | 2.25 GB |
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm -c --maxepoch 128
nthreads: 1
gpudev: 0
Re-exec()-ing to set environment correctly
14:35:03 (48648): start_timer_thread(): pthread_create(): 22
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: Vega 20 [Radeon VII])
[2021-04-27 14:35:03 main:442] : INFO : Set logging level to 1
[2021-04-27 14:35:03 main:448] : INFO : Running in BOINC Client mode
[2021-04-27 14:35:03 main:451] : INFO : Resolving all filenames
[2021-04-27 14:35:03 main:459] : INFO : Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-04-27 14:35:03 main:459] : INFO : Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_3_r2051608797_1 (exists = 0)
[2021-04-27 14:35:03 main:459] : INFO : Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_3_r2051608797_0 (exists = 0)
[2021-04-27 14:35:03 main:459] : INFO : Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56 (exists = 1)
[2021-04-27 14:35:03 main:459] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-04-27 14:35:03 main:479] : INFO : Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-04-27 14:35:03 main:481] : INFO : Configuration:
[2021-04-27 14:35:03 main:482] : INFO : Model type: GRU
[2021-04-27 14:35:03 main:483] : INFO : Validation Loss Threshold: 0.0001
[2021-04-27 14:35:03 main:484] : INFO : Max Epochs: 128
[2021-04-27 14:35:03 main:485] : INFO : Batch Size: 128
[2021-04-27 14:35:03 main:486] : INFO : Learning Rate: 0.01
[2021-04-27 14:35:03 main:487] : INFO : Patience: 10
[2021-04-27 14:35:03 main:488] : INFO : Hidden Width: 12
[2021-04-27 14:35:03 main:489] : INFO : # Recurrent Layers: 4
[2021-04-27 14:35:03 main:490] : INFO : # Backend Layers: 4
[2021-04-27 14:35:03 main:491] : INFO : # Threads: 1
[2021-04-27 14:35:03 main:493] : INFO : Preparing Dataset
[2021-04-27 14:35:03 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 14:35:04 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 14:35:06 load:106] : INFO : Successfully loaded dataset of 2048 examples into memory.
[2021-04-27 14:35:06 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 14:35:06 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yv from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-27 14:35:06 load:106] : INFO : Successfully loaded dataset of 512 examples into memory.
[2021-04-27 14:35:06 main:501] : INFO : Creating Model
[2021-04-27 14:35:06 main:514] : INFO : Preparing config file
[2021-04-27 14:35:06 main:526] : INFO : Creating new config file
[2021-04-27 14:35:06 main:545] : INFO : This is a continuation WU, loading previous network
[2021-04-27 14:35:07 main:566] : INFO : Loading DataLoader into Memory
[2021-04-27 14:35:07 main:569] : INFO : Starting Training
/src/external/hip-on-vdi/rocclr/hip_global.cpp:69: guarantee(false && "Cannot find Symbol")
SIGABRT: abort called
Stack trace (30 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x37f44c)[0x565165b2a44c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fea1adf03c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fe9e9be118b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fe9e9bc0859]
./libamdhip64.so.3(+0x15f92f)[0x7fe9e8e8f92f]
./libamdhip64.so.3(+0x8b380)[0x7fe9e8dbb380]
./libamdhip64.so.3(+0x8bac7)[0x7fe9e8dbbac7]
./libamdhip64.so.3(+0x5df5d)[0x7fe9e8d8df5d]
./libamdhip64.so.3(+0x1055ec)[0x7fe9e8e355ec]
./libamdhip64.so.3(hipLaunchKernel+0x172)[0x7fe9e8e18242]
./libtorch_hip.so(+0x165ca89)[0x7fe9eb81aa89]
./libtorch_hip.so(_ZN2at6native17gpu_reduce_kernelIffLi4ENS0_14func_wrapper_tIfZNS0_11sum_functorIfffEclERNS_14TensorIteratorEEUlffE_EEdEEvS6_RKT2_T3_PNS0_18AccumulationBufferEl+0xb3c)[0x7fe9eb82d2cc]
./libtorch_hip.so(+0x16594a9)[0x7fe9eb8174a9]
./libtorch_cpu.so(_ZN2at6native7sum_outERNS_6TensorERKS1_N3c108ArrayRefIlEEbNS5_8optionalINS5_10ScalarTypeEEE+0x130)[0x7fea1520ec50]
./libtorch_cpu.so(_ZN2at6native3sumERKNS_6TensorEN3c108ArrayRefIlEEbNS4_8optionalINS4_10ScalarTypeEEE+0x5b)[0x7fea1520f1cb]
./libtorch_hip.so(+0x3fec54)[0x7fe9ea5bcc54]
./libtorch_hip.so(+0x440eef)[0x7fe9ea5feeef]
./libtorch_cpu.so(+0xfff171)[0x7fea15795171]
./libtorch_cpu.so(_ZN2at3sumERKNS_6TensorEN3c108ArrayRefIlEEbNS3_8optionalINS3_10ScalarTypeEEE+0x100)[0x7fea156a9a50]
./libtorch_cpu.so(+0x1c1355e)[0x7fea163a955e]
./libtorch_cpu.so(+0x62217f)[0x7fea14db817f]
./libtorch_cpu.so(+0xfff171)[0x7fea15795171]
./libtorch_cpu.so(_ZNK2at6Tensor3sumEN3c108ArrayRefIlEEbNS1_8optionalINS1_10ScalarTypeEEE+0x100)[0x7fea158e8dc0]
./libtorch_cpu.so(+0x1fa1158)[0x7fea16737158]
./libtorch_cpu.so(_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE+0x4fc)[0x7fea1673d59c]
./libtorch_cpu.so(_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE+0x4e9)[0x7fea1673f219]
./libtorch_cpu.so(_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb+0x99)[0x7fea16736529]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x3e48cf)[0x565165b8f8cf]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fea1ade4609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fe9e9cbd293]
Exiting...
</stderr_txt>
]]>
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)