| Name | ParityModified-1607233792-4279-56_4 |
| Workunit | 3057528 |
| Created | 27 Apr 2021, 21:37:38 UTC |
| Sent | 27 Apr 2021, 22:34:56 UTC |
| Report deadline | 4 May 2021, 22:34:56 UTC |
| Received | 27 Apr 2021, 22:41:13 UTC |
| Server state | Over |
| Outcome | Computation error |
| Client state | Compute error |
| Exit status | 193 (0x000000C1) EXIT_SIGNAL |
| Computer ID | 11060 |
| Run time | 1 sec |
| CPU time | |
| Validate state | Invalid |
| Credit | 0.00 |
| Device peak FLOPS | 13,837.92 GFLOPS |
| Application version | Machine Learning Dataset Generator (test) v9.80 (amdrocm) x86_64-pc-linux-gnu |
| Peak disk usage | 2.25 GB |
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm -c --maxepoch 128
nthreads: 1
gpudev: 0
Re-exec()-ing to set environment correctly
00:35:45 (119699): start_timer_thread(): pthread_create(): 22
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: Vega 20 [Radeon VII])
[2021-04-28 00:35:45 main:442] : INFO : Set logging level to 1
[2021-04-28 00:35:45 main:448] : INFO : Running in BOINC Client mode
[2021-04-28 00:35:45 main:451] : INFO : Resolving all filenames
[2021-04-28 00:35:45 main:459] : INFO : Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-04-28 00:35:45 main:459] : INFO : Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_4_r1510431337_1 (exists = 0)
[2021-04-28 00:35:45 main:459] : INFO : Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56_4_r1510431337_0 (exists = 0)
[2021-04-28 00:35:45 main:459] : INFO : Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1607233792-4279-56 (exists = 1)
[2021-04-28 00:35:45 main:459] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-04-28 00:35:45 main:479] : INFO : Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-04-28 00:35:45 main:481] : INFO : Configuration:
[2021-04-28 00:35:45 main:482] : INFO : Model type: GRU
[2021-04-28 00:35:45 main:483] : INFO : Validation Loss Threshold: 0.0001
[2021-04-28 00:35:45 main:484] : INFO : Max Epochs: 128
[2021-04-28 00:35:45 main:485] : INFO : Batch Size: 128
[2021-04-28 00:35:45 main:486] : INFO : Learning Rate: 0.01
[2021-04-28 00:35:45 main:487] : INFO : Patience: 10
[2021-04-28 00:35:45 main:488] : INFO : Hidden Width: 12
[2021-04-28 00:35:45 main:489] : INFO : # Recurrent Layers: 4
[2021-04-28 00:35:45 main:490] : INFO : # Backend Layers: 4
[2021-04-28 00:35:45 main:491] : INFO : # Threads: 1
[2021-04-28 00:35:45 main:493] : INFO : Preparing Dataset
[2021-04-28 00:35:45 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-04-28 00:35:45 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
free(): invalid pointer
SIGABRT: abort called
Stack trace (31 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x37f44c)[0x5576c6b7f44c]
/usr/lib/libpthread.so.0(+0x13960)[0x7fdada6b0960]
/usr/lib/libc.so.6(gsignal+0x145)[0x7fdaa94c7ef5]
/usr/lib/libc.so.6(abort+0x116)[0x7fdaa94b1862]
/usr/lib/libc.so.6(+0x7ef38)[0x7fdaa9509f38]
/usr/lib/libc.so.6(+0x86bea)[0x7fdaa9511bea]
/usr/lib/libc.so.6(+0x87fbc)[0x7fdaa9512fbc]
/usr/lib/libc.so.6(cfree+0x68)[0x7fdaa9516ca8]
./libamdhip64.so.3(+0x115a4e)[0x7fdaa8736a4e]
./libamdhip64.so.3(+0x119e83)[0x7fdaa873ae83]
./libamdhip64.so.3(+0x11a000)[0x7fdaa873b000]
./libamdhip64.so.3(+0x615de)[0x7fdaa86825de]
./libamdhip64.so.3(hipMemcpyWithStream+0x100)[0x7fdaa86dfa70]
./libtorch_hip.so(+0x107c1d4)[0x7fdaaab051d4]
./libtorch_cpu.so(+0x86b028)[0x7fdad48cc028]
./libtorch_cpu.so(+0x869b07)[0x7fdad48cab07]
./libtorch_cpu.so(_ZN2at6native5copy_ERNS_6TensorERKS1_b+0x54)[0x7fdad48cbf34]
./libtorch_cpu.so(_ZNK3c1010Dispatcher19callWithDispatchKeyIRN2at6TensorEJS4_RKS3_bEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEENS_11DispatchKeyESA_+0xb0)[0x7fdad5081e10]
./libtorch_cpu.so(_ZNK2at6Tensor5copy_ERKS0_b+0xd5)[0x7fdad51a68b5]
./libtorch_cpu.so(+0x2516fe8)[0x7fdad6577fe8]
./libtorch_cpu.so(_ZNK3c1010Dispatcher19callWithDispatchKeyIRN2at6TensorEJS4_RKS3_bEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEENS_11DispatchKeyESA_+0xb0)[0x7fdad5081e10]
./libtorch_cpu.so(_ZNK2at6Tensor5copy_ERKS0_b+0xd5)[0x7fdad51a68b5]
./libtorch_cpu.so(_ZN2at6native2toERKNS_6TensorERKN3c1013TensorOptionsEbbNS4_8optionalINS4_12MemoryFormatEEE+0xea8)[0x7fdad4b78578]
./libtorch_cpu.so(+0x10d62f5)[0x7fdad51372f5]
./libtorch_cpu.so(+0x623379)[0x7fdad4684379]
./libtorch_cpu.so(+0xe2673d)[0x7fdad4e8773d]
./libtorch_cpu.so(_ZNK2at6Tensor2toERKN3c1013TensorOptionsEbbNS1_8optionalINS1_12MemoryFormatEEE+0x306)[0x7fdad51bed56]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0xc12b8)[0x5576c68c12b8]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x89e04)[0x5576c6889e04]
/usr/lib/libc.so.6(__libc_start_main+0xd5)[0x7fdaa94b2b25]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__rocm(+0x86efa)[0x5576c6886efa]
Exiting...
</stderr_txt>
]]>
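
The message line above reports the exit code three ways: 193, 0xc1, and -63. These appear to be the same 8-bit value shown as unsigned decimal, hexadecimal, and signed decimal. A minimal Python sketch of that arithmetic follows; the helper name `describe_exit_code` is hypothetical and is not part of BOINC or the application.

```python
def describe_exit_code(code):
    """Return (unsigned, hex, signed 8-bit) views of an exit status byte.

    Hypothetical helper for illustration only; not a BOINC API.
    """
    unsigned = code & 0xFF  # keep only the low byte
    # Reinterpret the byte as a two's-complement signed value.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


print(describe_exit_code(193))  # (193, '0xc1', -63)
```
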
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)