| Name | ParityModified-1631730375-25333-1_1 |
| Workunit | 5219302 |
| Created | 4 Oct 2021, 23:13:34 UTC |
| Sent | 4 Oct 2021, 23:36:23 UTC |
| Report deadline | 11 Oct 2021, 23:36:23 UTC |
| Received | 6 Oct 2021, 16:39:52 UTC |
| Server state | Over |
| Outcome | Computation error |
| Client state | Compute error |
| Exit status | 193 (0x000000C1) EXIT_SIGNAL |
| Computer ID | 15437 |
| Run time | 7 sec |
| CPU time | 1 sec |
| Validate state | Invalid |
| Credit | 0.00 |
| Device peak FLOPS | 9,221.50 GFLOPS |
| Application version | Machine Learning Dataset Generator (GPU) v9.80 (cuda10200) x86_64-pc-linux-gnu |
| Peak working set size | 74.48 MB |
| Peak swap size | 2.72 GB |
| Peak disk usage | 2.98 GB |
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 -c --maxepoch 2048
nthreads: 1
gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: GeForce RTX 2070 SUPER)
[2021-10-05 14:26:31 main:442] : INFO : Set logging level to 1
[2021-10-05 14:26:31 main:448] : INFO : Running in BOINC Client mode
[2021-10-05 14:26:31 main:451] : INFO : Resolving all filenames
[2021-10-05 14:26:31 main:459] : INFO : Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-10-05 14:26:31 main:459] : INFO : Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1_1_r652322843_1 (exists = 0)
[2021-10-05 14:26:31 main:459] : INFO : Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1_1_r652322843_0 (exists = 0)
[2021-10-05 14:26:31 main:459] : INFO : Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1 (exists = 1)
[2021-10-05 14:26:31 main:459] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-10-05 14:26:31 main:479] : INFO : Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-10-05 14:26:31 main:481] : INFO : Configuration:
[2021-10-05 14:26:31 main:482] : INFO : Model type: GRU
[2021-10-05 14:26:31 main:483] : INFO : Validation Loss Threshold: 0.0001
[2021-10-05 14:26:31 main:484] : INFO : Max Epochs: 2048
[2021-10-05 14:26:31 main:485] : INFO : Batch Size: 128
[2021-10-05 14:26:31 main:486] : INFO : Learning Rate: 0.01
[2021-10-05 14:26:31 main:487] : INFO : Patience: 10
[2021-10-05 14:26:31 main:488] : INFO : Hidden Width: 12
[2021-10-05 14:26:31 main:489] : INFO : # Recurrent Layers: 4
[2021-10-05 14:26:31 main:490] : INFO : # Backend Layers: 4
[2021-10-05 14:26:31 main:491] : INFO : # Threads: 1
[2021-10-05 14:26:31 main:493] : INFO : Preparing Dataset
[2021-10-05 14:26:31 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-10-05 14:26:32 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: invalid device ordinal
Exception raised from exchangeDevice at /home/mlcbuild/git/pytorch-build/build-cuda/pytorch-prefix/src/pytorch/c10/cuda/impl/CUDAGuardImpl.h:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f317e4d199b in ./libc10.so)
frame #1: <unknown function> + 0x13e29 (0x7f30ff280e29 in ./libc10_cuda.so)
frame #2: <unknown function> + 0x89f6e0 (0x7f3100c016e0 in ./libtorch_cuda.so)
frame #3: <unknown function> + 0x87bbca (0x7f3100bddbca in ./libtorch_cuda.so)
frame #4: <unknown function> + 0x896055 (0x7f3100bf8055 in ./libtorch_cuda.so)
frame #5: <unknown function> + 0xcd9db7 (0x7f3179004db7 in ./libtorch_cpu.so)
frame #6: <unknown function> + 0xcdb675 (0x7f3179006675 in ./libtorch_cpu.so)
frame #7: at::empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x114 (0x7f3179126664 in ./libtorch_cpu.so)
frame #8: <unknown function> + 0x21c2266 (0x7f317a4ed266 in ./libtorch_cpu.so)
frame #9: <unknown function> + 0xcdb675 (0x7f3179006675 in ./libtorch_cpu.so)
frame #10: at::empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x114 (0x7f3179126664 in ./libtorch_cpu.so)
frame #11: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0xc66 (0x7f3178d6e5d6 in ./libtorch_cpu.so)
frame #12: <unknown function> + 0xe9476a (0x7f31791bf76a in ./libtorch_cpu.so)
frame #13: <unknown function> + 0x2260e0a (0x7f317a58be0a in ./libtorch_cpu.so)
frame #14: <unknown function> + 0xcdbd62 (0x7f3179006d62 in ./libtorch_cpu.so)
frame #15: at::Tensor::to(c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const + 0x162 (0x7f3179294c42 in ./libtorch_cpu.so)
frame #16: <unknown function> + 0xc0db8 (0x5633a08c0db8 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #17: <unknown function> + 0x89c77 (0x5633a0889c77 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #18: __libc_start_main + 0xf3 (0x7f30ffd660b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: <unknown function> + 0x8675a (0x5633a088675a in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
SIGABRT: abort called
Stack trace (27 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x37df9c)[0x5633a0b7df9c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f317e45f3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f30ffd8518b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f30ffd64859]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x135)[0x5633a0c2f7f5]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398846)[0x5633a0b98846]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398891)[0x5633a0b98891]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x3968c4)[0x5633a0b968c4]
./libc10_cuda.so(+0x13e90)[0x7f30ff280e90]
./libtorch_cuda.so(+0x89f6e0)[0x7f3100c016e0]
./libtorch_cuda.so(+0x87bbca)[0x7f3100bddbca]
./libtorch_cuda.so(+0x896055)[0x7f3100bf8055]
./libtorch_cpu.so(+0xcd9db7)[0x7f3179004db7]
./libtorch_cpu.so(+0xcdb675)[0x7f3179006675]
./libtorch_cpu.so(_ZN2at13empty_stridedEN3c108ArrayRefIlEES2_RKNS0_13TensorOptionsE+0x114)[0x7f3179126664]
./libtorch_cpu.so(+0x21c2266)[0x7f317a4ed266]
./libtorch_cpu.so(+0xcdb675)[0x7f3179006675]
./libtorch_cpu.so(_ZN2at13empty_stridedEN3c108ArrayRefIlEES2_RKNS0_13TensorOptionsE+0x114)[0x7f3179126664]
./libtorch_cpu.so(_ZN2at6native2toERKNS_6TensorERKN3c1013TensorOptionsEbbNS4_8optionalINS4_12MemoryFormatEEE+0xc66)[0x7f3178d6e5d6]
./libtorch_cpu.so(+0xe9476a)[0x7f31791bf76a]
./libtorch_cpu.so(+0x2260e0a)[0x7f317a58be0a]
./libtorch_cpu.so(+0xcdbd62)[0x7f3179006d62]
./libtorch_cpu.so(_ZNK2at6Tensor2toERKN3c1013TensorOptionsEbbNS1_8optionalINS1_12MemoryFormatEEE+0x162)[0x7f3179294c42]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0xc0db8)[0x5633a08c0db8]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x89c77)[0x5633a0889c77]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f30ffd660b3]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x8675a)[0x5633a088675a]
Exiting...
</stderr_txt>
]]>
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)