Task 7830648

Name ParityModified-1631730375-25333-1_1
Workunit 5219302
Created 4 Oct 2021, 23:13:34 UTC
Sent 4 Oct 2021, 23:36:23 UTC
Report deadline 11 Oct 2021, 23:36:23 UTC
Received 6 Oct 2021, 16:39:52 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 193 (0x000000C1) EXIT_SIGNAL
Computer ID 15437
Run time 7 sec
CPU time 1 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 9,221.50 GFLOPS
Application version Machine Learning Dataset Generator (GPU) v9.80 (cuda10200)
x86_64-pc-linux-gnu
Peak working set size 74.48 MB
Peak swap size 2.72 GB
Peak disk usage 2.98 GB
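
The exit status appears in this report in three forms: 193, 0x000000C1, and (in the stderr message) -63. A minimal sketch of how these are the same one-byte value read as unsigned, hexadecimal, and signed 8-bit. The helper name `exit_code_forms` is hypothetical and is not part of BOINC or the project application.

```python
from typing import Tuple


def exit_code_forms(code: int) -> Tuple[int, str, int]:
    """Return (unsigned, hex, signed 8-bit) views of a process exit code."""
    unsigned = code & 0xFF  # keep only the low byte
    # Values of 128 and above wrap to negative when read as a signed byte.
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


print(exit_code_forms(193))  # (193, '0xc1', -63)
```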

Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
DEBUG: Args: ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200 -c --maxepoch 2048 
nthreads: 1 gpudev: 0
Re-exec()-ing to set environment correctly
Machine Learning Dataset Generator v9.80 (Linux/x86_64) (libTorch: release/1.7 GPU: GeForce RTX 2070 SUPER)
[2021-10-05 14:26:31	                main:442]	:	INFO	:	Set logging level to 1
[2021-10-05 14:26:31	                main:448]	:	INFO	:	Running in BOINC Client mode
[2021-10-05 14:26:31	                main:451]	:	INFO	:	Resolving all filenames
[2021-10-05 14:26:31	                main:459]	:	INFO	:	Resolved: dataset.hdf5 => ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 (exists = 1)
[2021-10-05 14:26:31	                main:459]	:	INFO	:	Resolved: model.cfg => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1_1_r652322843_1 (exists = 0)
[2021-10-05 14:26:31	                main:459]	:	INFO	:	Resolved: model-final.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1_1_r652322843_0 (exists = 0)
[2021-10-05 14:26:31	                main:459]	:	INFO	:	Resolved: model-input.pt => ../../projects/www.mlcathome.org_mlcathome/ParityModified-1631730375-25333-1 (exists = 1)
[2021-10-05 14:26:31	                main:459]	:	INFO	:	Resolved: snapshot.pt => snapshot.pt (exists = 0)
[2021-10-05 14:26:31	                main:479]	:	INFO	:	Dataset filename: ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5
[2021-10-05 14:26:31	                main:481]	:	INFO	:	Configuration: 
[2021-10-05 14:26:31	                main:482]	:	INFO	:	    Model type: GRU
[2021-10-05 14:26:31	                main:483]	:	INFO	:	    Validation Loss Threshold: 0.0001
[2021-10-05 14:26:31	                main:484]	:	INFO	:	    Max Epochs: 2048
[2021-10-05 14:26:31	                main:485]	:	INFO	:	    Batch Size: 128
[2021-10-05 14:26:31	                main:486]	:	INFO	:	    Learning Rate: 0.01
[2021-10-05 14:26:31	                main:487]	:	INFO	:	    Patience: 10
[2021-10-05 14:26:31	                main:488]	:	INFO	:	    Hidden Width: 12
[2021-10-05 14:26:31	                main:489]	:	INFO	:	    # Recurrent Layers: 4
[2021-10-05 14:26:31	                main:490]	:	INFO	:	    # Backend Layers: 4
[2021-10-05 14:26:31	                main:491]	:	INFO	:	    # Threads: 1
[2021-10-05 14:26:31	                main:493]	:	INFO	:	Preparing Dataset
[2021-10-05 14:26:31	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Xt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
[2021-10-05 14:26:32	load_hdf5_ds_into_tensor:28]	:	INFO	:	Loading Dataset /Yt from ../../projects/www.mlcathome.org_mlcathome/ParityModified-train-val-dataset.hdf5 into memory
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: invalid device ordinal
Exception raised from exchangeDevice at /home/mlcbuild/git/pytorch-build/build-cuda/pytorch-prefix/src/pytorch/c10/cuda/impl/CUDAGuardImpl.h:31 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f317e4d199b in ./libc10.so)
frame #1: <unknown function> + 0x13e29 (0x7f30ff280e29 in ./libc10_cuda.so)
frame #2: <unknown function> + 0x89f6e0 (0x7f3100c016e0 in ./libtorch_cuda.so)
frame #3: <unknown function> + 0x87bbca (0x7f3100bddbca in ./libtorch_cuda.so)
frame #4: <unknown function> + 0x896055 (0x7f3100bf8055 in ./libtorch_cuda.so)
frame #5: <unknown function> + 0xcd9db7 (0x7f3179004db7 in ./libtorch_cpu.so)
frame #6: <unknown function> + 0xcdb675 (0x7f3179006675 in ./libtorch_cpu.so)
frame #7: at::empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x114 (0x7f3179126664 in ./libtorch_cpu.so)
frame #8: <unknown function> + 0x21c2266 (0x7f317a4ed266 in ./libtorch_cpu.so)
frame #9: <unknown function> + 0xcdb675 (0x7f3179006675 in ./libtorch_cpu.so)
frame #10: at::empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::TensorOptions const&) + 0x114 (0x7f3179126664 in ./libtorch_cpu.so)
frame #11: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) + 0xc66 (0x7f3178d6e5d6 in ./libtorch_cpu.so)
frame #12: <unknown function> + 0xe9476a (0x7f31791bf76a in ./libtorch_cpu.so)
frame #13: <unknown function> + 0x2260e0a (0x7f317a58be0a in ./libtorch_cpu.so)
frame #14: <unknown function> + 0xcdbd62 (0x7f3179006d62 in ./libtorch_cpu.so)
frame #15: at::Tensor::to(c10::TensorOptions const&, bool, bool, c10::optional<c10::MemoryFormat>) const + 0x162 (0x7f3179294c42 in ./libtorch_cpu.so)
frame #16: <unknown function> + 0xc0db8 (0x5633a08c0db8 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #17: <unknown function> + 0x89c77 (0x5633a0889c77 in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)
frame #18: __libc_start_main + 0xf3 (0x7f30ffd660b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: <unknown function> + 0x8675a (0x5633a088675a in ../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200)

SIGABRT: abort called
Stack trace (27 frames):
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x37df9c)[0x5633a0b7df9c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f317e45f3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f30ffd8518b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f30ffd64859]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x135)[0x5633a0c2f7f5]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398846)[0x5633a0b98846]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x398891)[0x5633a0b98891]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x3968c4)[0x5633a0b968c4]
./libc10_cuda.so(+0x13e90)[0x7f30ff280e90]
./libtorch_cuda.so(+0x89f6e0)[0x7f3100c016e0]
./libtorch_cuda.so(+0x87bbca)[0x7f3100bddbca]
./libtorch_cuda.so(+0x896055)[0x7f3100bf8055]
./libtorch_cpu.so(+0xcd9db7)[0x7f3179004db7]
./libtorch_cpu.so(+0xcdb675)[0x7f3179006675]
./libtorch_cpu.so(_ZN2at13empty_stridedEN3c108ArrayRefIlEES2_RKNS0_13TensorOptionsE+0x114)[0x7f3179126664]
./libtorch_cpu.so(+0x21c2266)[0x7f317a4ed266]
./libtorch_cpu.so(+0xcdb675)[0x7f3179006675]
./libtorch_cpu.so(_ZN2at13empty_stridedEN3c108ArrayRefIlEES2_RKNS0_13TensorOptionsE+0x114)[0x7f3179126664]
./libtorch_cpu.so(_ZN2at6native2toERKNS_6TensorERKN3c1013TensorOptionsEbbNS4_8optionalINS4_12MemoryFormatEEE+0xc66)[0x7f3178d6e5d6]
./libtorch_cpu.so(+0xe9476a)[0x7f31791bf76a]
./libtorch_cpu.so(+0x2260e0a)[0x7f317a58be0a]
./libtorch_cpu.so(+0xcdbd62)[0x7f3179006d62]
./libtorch_cpu.so(_ZNK2at6Tensor2toERKN3c1013TensorOptionsEbbNS1_8optionalINS1_12MemoryFormatEEE+0x162)[0x7f3179294c42]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0xc0db8)[0x5633a08c0db8]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x89c77)[0x5633a0889c77]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f30ffd660b3]
../../projects/www.mlcathome.org_mlcathome/mlds-gpu_9.80_x86_64-pc-linux-gnu__cuda10200(+0x8675a)[0x5633a088675a]

Exiting...

</stderr_txt>
]]>
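
The trace above shows the abort originating in at::Tensor::to with "CUDA error: invalid device ordinal", an error CUDA reports when the requested device index does not correspond to a valid device for the process. The following is a minimal sketch, not the project's code, of a guard that validates a device index against the number of visible devices before any tensor is moved. The function name `pick_device` is hypothetical, and the device count is passed in as a plain integer so the example runs without a GPU; in a PyTorch program that count could come from `torch.cuda.device_count()`.

```python
def pick_device(requested: int, visible_count: int) -> str:
    """Map a requested GPU index to a device string, or fail with a clear message.

    Assumption: visible_count is the number of CUDA devices visible to
    this process, supplied by the caller.
    """
    if visible_count <= 0:
        # No GPU is visible at all: fall back to the CPU.
        return "cpu"
    if not 0 <= requested < visible_count:
        raise ValueError(
            f"GPU index {requested} is out of range; "
            f"{visible_count} device(s) visible"
        )
    return f"cuda:{requested}"


print(pick_device(0, 1))
print(pick_device(0, 0))
```

Checking the index up front turns an uncaught exception and SIGABRT into an error message that names the requested index and the visible device count.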


©2022 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)