[TWIM Notes] Nov 2 2020

pianoman [MLC@Home Admin]
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Jun 20
Posts: 462
Credit: 21,406,548
RAC: 0
Message 758 - Posted: 3 Nov 2020, 3:14:07 UTC

This Week in MLC@Home
Notes for Nov 2 2020
A weekly summary of news and notes for MLC@Home

If you're in the US, and you haven't already taken advantage of early voting, please VOTE tomorrow.

Summary
GPU week(s), part 2. The downside of giving weekly updates is that sometimes you don't have a lot to report. Sadly, most of this week was lost attempting to debug a strange crash that occurs only in the client when compiled for Linux/CUDA. It turns out it has nothing to do with our code; it's a strange interaction between the threading library used in the pre-compiled PyTorch libraries and the BOINC library. It took several days for us to realize it wasn't our code causing the issue. Now it'll take a few more to find a solution, which will likely require a custom-compiled PyTorch, and that will take a few more days to debug.

GPU speedups are worth it, but please bear with us as we work through these issues in the testing queue, and keep churning away on the main research queue!

News:

  • We expect to have most CUDA issues ironed out and to have CUDA support in general (non-beta) use by next week. ROCm support would be next, but it is a lower priority.
  • Datasets 1, 2, and 3 continue crunching away. GREAT progress so far!
  • We will back off on the WU FLOPS estimates for any newly issued WUs starting this week. This should solve the problems with overestimated run times.
  • Spent a little time working on DS4 and the DS3 paper, but GPU debugging again took up the majority of free time.
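
For anyone wondering why lowering the FLOPS estimate changes the predicted run time: the BOINC client's initial estimate is, roughly, a work unit's estimated operation count (the `rsc_fpops_est` field) divided by the host's estimated speed. The sketch below only illustrates that division. The function name and all numbers are hypothetical, not the project's actual settings, and the real client applies further corrections.

```python
def estimated_runtime_seconds(fpops_est: float, host_flops: float) -> float:
    """Rough run-time estimate: estimated operations divided by host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_est / host_flops


# Hypothetical values for illustration only (not MLC@Home's real estimates).
host_flops = 4.0e9  # assumed host speed, in FLOPS
old_estimate = estimated_runtime_seconds(fpops_est=4.0e13, host_flops=host_flops)
new_estimate = estimated_runtime_seconds(fpops_est=1.0e13, host_flops=host_flops)

print(f"old estimate: {old_estimate:.0f} s, new estimate: {new_estimate:.0f} s")
```

Cutting the operation estimate by some factor cuts the predicted run time by the same factor, which is why backing off the estimate should bring predictions closer to what hosts actually see.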



Project status snapshot:
(note these numbers are approximations)

Tasks
  Tasks ready to send            41034
  Tasks in progress               9231
Users
  With credit                     1080
  Registered in past 24 hours       39
Hosts
  With recent credit              2106
  Registered in past 24 hours       24
  Current GigaFLOPS           29333.56

Dataset 1 and 2 progress:

SingleDirectMachine      10002/10004
EightBitMachine          10001/10006
SingleInvertMachine      10001/10003
SimpleXORMachine         10000/10002
ParityMachine              912/10005

ParityModified             289/10005
EightBitModified          6597/10006
SimpleXORModified        10005/10005
SingleDirectModified     10004/10004
SingleInvertModified     10002/10002 

Dataset 3 progress:
Overall (so far): 44466/50557
Milestone 1, 100x100:   10000/10000
Milestone 2, 100x1000:  44466/100000
Milestone 3, 100x10000: 44466/1000000
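
To put the Dataset 3 counts in perspective, here is a small sketch that converts the milestone figures above into completion percentages. The numbers are copied from this post; the dictionary layout and the helper name are just illustrative choices.

```python
# Dataset 3 milestone counts as (completed, target), taken from this post.
ds3_progress = {
    "Milestone 1, 100x100": (10000, 10000),
    "Milestone 2, 100x1000": (44466, 100000),
    "Milestone 3, 100x10000": (44466, 1000000),
}


def percent_complete(done: int, total: int) -> float:
    """Return completion as a percentage of the target count."""
    return 100.0 * done / total


for name, (done, total) in ds3_progress.items():
    print(f"{name}: {percent_complete(done, total):.1f}%")
```

Milestone 1 is finished, Milestone 2 is a bit under half done, and Milestone 3 is still in the single digits.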


Last week's TWIM Notes: Oct 27 2020

Thanks again to all our volunteers!

-- The MLC@Home Admins
bozz4science

Joined: 9 Jul 20
Posts: 142
Credit: 11,536,204
RAC: 3
Message 760 - Posted: 3 Nov 2020, 16:21:21 UTC

Thanks for the update! Custom-compiling pytorch surely sounds like a lot of work... Wish you the best of luck for this!


©2024 MLC@Home Team
A project of the Cognition, Robotics, and Learning (CORAL) Lab at the University of Maryland, Baltimore County (UMBC)