| Name | ParityModified-1645997191-17300-3-0_0 |
| Workunit | 10049937 |
| Created | 11 Mar 2022, 7:20:52 UTC |
| Sent | 11 Mar 2022, 15:58:03 UTC |
| Report deadline | 19 Mar 2022, 15:58:03 UTC |
| Received | 11 Mar 2022, 20:29:24 UTC |
| Server state | Over |
| Outcome | Computation error |
| Client state | Compute error |
| Exit status | 193 (0x000000C1) EXIT_SIGNAL |
| Computer ID | 13796 |
| Run time | 6 sec |
| CPU time | 4 sec |
| Validate state | Invalid |
| Credit | 0.00 |
| Device peak FLOPS | 5,490.01 GFLOPS |
| Application version | Machine Learning Dataset Generator (GPU) v9.80 (cuda10200) x86_64-pc-linux-gnu |
| Peak disk usage | 2.99 GB |
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 -c --maxepoch 2048
nthreads: 1
gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: NVIDIA GeForce GTX 1660 Ti)
[2022-03-11 17:31:28 main:442] : INFO : Set logging level to 1
[2022-03-11 17:31:28 main:448] : INFO : Running in BOINC Client mode
[2022-03-11 17:31:28 main:451] : INFO : Resolving all filenames
[2022-03-11 17:31:28 main:459] : INFO : Resolved: dataset.hdf5 => dataset.hdf5 (exists = 1)
[2022-03-11 17:31:28 main:459] : INFO : Resolved: model.cfg => model.cfg (exists = 0)
[2022-03-11 17:31:28 main:459] : INFO : Resolved: model-final.pt => model-final.pt (exists = 0)
[2022-03-11 17:31:28 main:459] : INFO : Resolved: model-input.pt => model-input.pt (exists = 1)
[2022-03-11 17:31:28 main:459] : INFO : Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2022-03-11 17:31:28 main:479] : INFO : Dataset filename: dataset.hdf5
[2022-03-11 17:31:28 main:481] : INFO : Configuration:
[2022-03-11 17:31:28 main:482] : INFO : Model type: GRU
[2022-03-11 17:31:28 main:483] : INFO : Validation Loss Threshold: 0.0001
[2022-03-11 17:31:28 main:484] : INFO : Max Epochs: 2048
[2022-03-11 17:31:28 main:485] : INFO : Batch Size: 128
[2022-03-11 17:31:28 main:486] : INFO : Learning Rate: 0.01
[2022-03-11 17:31:28 main:487] : INFO : Patience: 10
[2022-03-11 17:31:28 main:488] : INFO : Hidden Width: 12
[2022-03-11 17:31:28 main:489] : INFO : # Recurrent Layers: 4
[2022-03-11 17:31:28 main:490] : INFO : # Backend Layers: 4
[2022-03-11 17:31:28 main:491] : INFO : # Threads: 1
[2022-03-11 17:31:28 main:493] : INFO : Preparing Dataset
[2022-03-11 17:31:28 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xt from dataset.hdf5 into memory
[2022-03-11 17:31:28 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yt from dataset.hdf5 into memory
[2022-03-11 17:31:30 load:106] : INFO : Successfully loaded dataset of 2048 examples into memory.
[2022-03-11 17:31:30 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Xv from dataset.hdf5 into memory
[2022-03-11 17:31:30 load_hdf5_ds_into_tensor:28] : INFO : Loading Dataset /Yv from dataset.hdf5 into memory
[2022-03-11 17:31:30 load:106] : INFO : Successfully loaded dataset of 512 examples into memory.
[2022-03-11 17:31:30 main:501] : INFO : Creating Model
[2022-03-11 17:31:31 main:514] : INFO : Preparing config file
[2022-03-11 17:31:31 main:526] : INFO : Creating new config file
[2022-03-11 17:31:31 main:545] : INFO : This is a continuation WU, loading previous network
[2022-03-11 17:31:32 main:566] : INFO : Loading DataLoader into Memory
[2022-03-11 17:31:32 main:569] : INFO : Starting Training
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Exception raised from createCublasHandle at /home/mlcbuild/git/pytorch-build/build-cuda/pytorch-prefix/src/pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f3a9507099b in ./libc10.so)
frame #1: <unknown function> + 0x84963d (0x7f3a1772463d in ./libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0xd86 (0x7f3a17725796 in ./libtorch_cuda.so)
frame #3: <unknown function> + 0x833a42 (0x7f3a1770ea42 in ./libtorch_cuda.so)
frame #4: at::native::(anonymous namespace)::addmm_out_cuda_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) + 0xfca (0x7f3a18979bba in ./libtorch_cuda.so)
frame #5: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0xc5 (0x7f3a1897bc65 in ./libtorch_cuda.so)
frame #6: <unknown function> + 0x866150 (0x7f3a17741150 in ./libtorch_cuda.so)
frame #7: <unknown function> + 0x5d3704 (0x7f3a8f477704 in ./libtorch_cpu.so)
frame #8: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xd0 (0x7f3a8fce8100 in ./libtorch_cpu.so)
frame #9: at::mm(at::Tensor const&, at::Tensor const&) + 0x5b (0x7f3a8fc36b7b in ./libtorch_cpu.so)
frame #10: <unknown function> + 0x22c45d2 (0x7f3a911685d2 in ./libtorch_cpu.so)
frame #11: <unknown function> + 0x5d3704 (0x7f3a8f477704 in ./libtorch_cpu.so)
frame #12: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xd0 (0x7f3a8fce8100 in ./libtorch_cpu.so)
frame #13: at::Tensor::mm(at::Tensor const&) const + 0x5b (0x7f3a8fde93db in ./libtorch_cpu.so)
frame #14: <unknown function> + 0x89008d (0x7f3a8f73408d in ./libtorch_cpu.so)
frame #15: at::native::matmul(at::Tensor const&, at::Tensor const&) + 0x4a (0x7f3a8f734a0a in ./libtorch_cpu.so)
frame #16: <unknown function> + 0xea5480 (0x7f3a8fd49480 in ./libtorch_cpu.so)
frame #17: <unknown function> + 0x2211d1d (0x7f3a910b5d1d in ./libtorch_cpu.so)
frame #18: <unknown function> + 0x5d3704 (0x7f3a8f477704 in ./libtorch_cpu.so)
frame #19: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xd0 (0x7f3a8fce8100 in ./libtorch_cpu.so)
frame #20: at::Tensor::matmul(at::Tensor const&) const + 0x5b (0x7f3a8fde92fb in ./libtorch_cpu.so)
frame #21: torch::nn::LinearImpl::forward(at::Tensor const&) + 0xe1 (0x7f3a91b96ab1 in ./libtorch_cpu.so)
frame #22: <unknown function> + 0x93014 (0x558df9d65014 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #23: <unknown function> + 0x9a0e6 (0x558df9d6c0e6 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #24: <unknown function> + 0x8ad10 (0x558df9d5cd10 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #25: __libc_start_main + 0xeb (0x7f3a1690d09b in /lib/x86_64-linux-gnu/libc.so.6)
frame #26: <unknown function> + 0x8675a (0x558df9d5875a in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)

SIGABRT: abort called
Stack trace (33 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x37df9c)[0x558dfa04ff9c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f3a95008730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f3a169207bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7f3a1690b535]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x135)[0x558dfa1017f5]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398846)[0x558dfa06a846]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398891)[0x558dfa06a891]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(__cxa_rethrow+0x49)[0x558dfa068919]
./libtorch_cuda.so(_ZN2at4cuda24getCurrentCUDABlasHandleEv+0x13d4)[0x7f3a17725de4]
./libtorch_cuda.so(+0x833a42)[0x7f3a1770ea42]
./libtorch_cuda.so(_ZN2at6native84_GLOBAL__N__60_tmpxft_000025a8_00000000_12_LinearAlgebra_compute_75_cpp1_ii_5e5bd7fb19addmm_out_cuda_implERNS_6TensorERKS2_S5_S5_N3c106ScalarES7_+0xfca)[0x7f3a18979bba]
./libtorch_cuda.so(_ZN2at6native7mm_cudaERKNS_6TensorES3_+0xc5)[0x7f3a1897bc65]
./libtorch_cuda.so(+0x866150)[0x7f3a17741150]
./libtorch_cpu.so(+0x5d3704)[0x7f3a8f477704]
./libtorch_cpu.so(_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEES9_+0xd0)[0x7f3a8fce8100]
./libtorch_cpu.so(_ZN2at2mmERKNS_6TensorES2_+0x5b)[0x7f3a8fc36b7b]
./libtorch_cpu.so(+0x22c45d2)[0x7f3a911685d2]
./libtorch_cpu.so(+0x5d3704)[0x7f3a8f477704]
./libtorch_cpu.so(_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEES9_+0xd0)[0x7f3a8fce8100]
./libtorch_cpu.so(_ZNK2at6Tensor2mmERKS0_+0x5b)[0x7f3a8fde93db]
./libtorch_cpu.so(+0x89008d)[0x7f3a8f73408d]
./libtorch_cpu.so(_ZN2at6native6matmulERKNS_6TensorES3_+0x4a)[0x7f3a8f734a0a]
./libtorch_cpu.so(+0xea5480)[0x7f3a8fd49480]
./libtorch_cpu.so(+0x2211d1d)[0x7f3a910b5d1d]
./libtorch_cpu.so(+0x5d3704)[0x7f3a8f477704]
./libtorch_cpu.so(_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_S5_EEET_RKNS_19TypedOperatorHandleIFS6_DpT0_EEES9_+0xd0)[0x7f3a8fce8100]
./libtorch_cpu.so(_ZNK2at6Tensor6matmulERKS0_+0x5b)[0x7f3a8fde92fb]
./libtorch_cpu.so(_ZN5torch2nn10LinearImpl7forwardERKN2at6TensorE+0xe1)[0x7f3a91b96ab1]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x93014)[0x558df9d65014]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x9a0e6)[0x558df9d6c0e6]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x8ad10)[0x558df9d5cd10]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f3a1690d09b]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x8675a)[0x558df9d5875a]
Exiting...
</stderr_txt>
]]>
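The task aborted at the very first GPU matrix multiply: `torch::nn::LinearImpl::forward` dispatched to `at::cuda::getCurrentCUDABlasHandle()`, and `cublasCreate(handle)` returned `CUBLAS_STATUS_NOT_INITIALIZED`, which commonly indicates that the GPU could not allocate memory for the handle or that the CUDA driver/runtime environment was not usable at that moment. The sketch below is a minimal, standalone diagnostic (it is not part of the MLDS application and the file name is hypothetical) that exercises the same call in isolation.

```cpp
// cublas_check.cpp -- hedged diagnostic sketch, assuming a local CUDA toolkit.
// It only reproduces the call named in the c10::Error above; it does not
// replicate any MLC@Home / MLDS logic.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    int device_count = 0;
    cudaError_t cuda_err = cudaGetDeviceCount(&device_count);
    if (cuda_err != cudaSuccess || device_count == 0) {
        std::fprintf(stderr, "No usable CUDA device: %s\n",
                     cudaGetErrorString(cuda_err));
        return 1;
    }

    // cublasCreate() is the call that threw in the stderr log; it reports
    // CUBLAS_STATUS_NOT_INITIALIZED when the runtime cannot set up the
    // handle (e.g. insufficient free GPU memory or a broken driver state).
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);
    if (status != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasCreate failed with status %d\n",
                     static_cast<int>(status));
        return 1;
    }

    std::printf("cuBLAS handle created successfully on device 0\n");
    cublasDestroy(handle);
    return 0;
}
```

Built with something like `nvcc cublas_check.cpp -lcublas -o cublas_check`, a failure here on the same host would point to the GPU/driver environment rather than to this particular work unit.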
©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)