Volunteer Computing is important.
Many areas of science and engineering have benefited from volunteer computing, a type of distributed computing that sources compute power from the public. Platforms such as BOINC, Folding@home and others have opened up the world of citizen science to computer owners in the most passive way possible. Your computer rarely maxes out its computing potential; it typically runs at 40-60% of capacity, leaving the remaining potential unused. So while you're working on your next novel or Facebook post, you can help solve the world's most pressing issues.
Through volunteer computing, scientists have gained access to precious compute power for projects that would otherwise have been unable to proceed due to time or cost limitations. Every second of compute performed on these platforms brings us closer to finding a cure for cancer, understanding the fabric of the universe, or improving agriculture.
So, what’s the volunteer computing fallacy?
Volunteer computing systems rely on volunteers to contribute their devices' CPU time for others to use. For every task completed, the device owner incurs costs in the form of electricity and bandwidth.
The fallacy is counting all of the compute performed as compute used effectively. BOINC is the most popular platform for volunteer computing projects, and many spin-off programs, such as Boid for monetizing volunteer computing, build on BOINC technology. BOINC and its spin-offs proudly report the FLOPS produced or compute hours generated. Yet for every 10 hours of compute generated, only about 3 hours are actually spent on new problems. The rest of the processing goes to validating others' calculations.
A minimum of 66% of the computations for these projects is being wasted.
Why is so much compute wasted?
Faults periodically occur in a network, such as unresponsive devices or corrupted data. Corrupted data may sound malicious, but it can also result from disruptions in the network. A returned result may also fall outside the tolerated variance for a calculation: because of slight differences in computer architectures, not all devices return exactly the same result.
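This is why validators compare results within a tolerance rather than bit-for-bit. A minimal sketch using Python's `math.isclose`; the `1e-6` relative tolerance is an illustrative assumption, not a platform default:

```python
import math

# Two results that differ only by architectural floating-point noise
# should be treated as equivalent; a large discrepancy is a real fault.
print(math.isclose(1.0000001, 1.0000002, rel_tol=1e-6))  # True: acceptable variance
print(math.isclose(1.0, 1.5, rel_tol=1e-6))              # False: a genuine fault
```

Real platforms apply project-specific validation logic, but the principle is the same: equality up to a tolerated variance, not exact equality.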
VC platforms typically tackle this issue by distributing the same task to multiple devices, reasoning that at least one of them will return an acceptable result even if a fault occurs. All the results are then collected, and the majority result is taken to be the correct answer. One of the correct answers is saved and the rest are discarded.
At a minimum, a majority evaluation needs 3 results returned, with at least 2 of them matching. If no results match, the task can be handled in a number of ways, including reissuing it to more devices, which increases the number of results to compare and the chance of finding a match.
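The replication-and-quorum scheme above can be sketched in a few lines. This is a minimal illustration assuming a quorum of 3 results with at least 2 matching; the function name is mine, not any platform's API:

```python
from collections import Counter

def majority_result(results):
    """Return the majority value among returned results, or None if
    no value reaches a majority (signalling a reissue to more devices)."""
    if len(results) < 3:
        return None  # below the minimum quorum of 3 results
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None

print(majority_result([42, 42, 41]))  # 42 — two matching results agree
print(majority_result([40, 41, 42]))  # None — no match, reissue the task
```

Note that even in the successful case, two of the three results exist only to confirm the third: that is the overhead the rest of this section quantifies.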
We are also assuming that all results come back within the required time. If a result is not received within a specified window, the task can be sent to a new device, in the hope of receiving a result on that attempt.
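The timeout-and-reissue loop can be sketched as below. This is a simulation under stated assumptions: each device is modelled as a callable that returns a result or `None` for a timeout, and the `max_attempts` cap is my own addition, not a platform setting:

```python
def run_with_reissue(devices, max_attempts=5):
    """Hand the task to one device at a time; if no result arrives
    (simulated here by None), reissue to the next available device."""
    for attempt, device in enumerate(devices[:max_attempts], start=1):
        result = device()
        if result is not None:
            return result, attempt
    return None, max_attempts

slow = lambda: None   # device that never responds before the deadline
fast = lambda: 3.14   # device that returns a result in time
print(run_with_reissue([slow, slow, fast]))  # (3.14, 3) — two reissues first
```

Every reissue adds more compute spent on the same task, compounding the waste from replication.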
So if only 1 in 3 results received for a task is actually used, roughly 66% of the compute spent on that task is wasted. And while the total processing power reported by VC platforms is genuinely needed by this architecture, at most a third of it ends up in a usable result. This reduces the effective compute available to about 33% of what is stated. If my project requires 100 TFLOPS over a year, the compute actually consumed would be at least 300 TFLOPS. Considering this would cost on average $225,000 with BOINC, that's a big difference; put another way, the same cost buys at most 33 TFLOPS of useful work.
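The arithmetic works out as follows, using the illustrative numbers from the paragraph above (a replication factor of 3, i.e. three results requested per task with one kept):

```python
required_tflops = 100   # compute the project actually needs
replication = 3         # results requested per task, one of which is kept

# Compute consumed to deliver the required useful work:
consumed = required_tflops * replication
print(consumed)  # 300 TFLOPS actually consumed

# Equivalently, of every 100 TFLOPS a platform reports, the useful share is:
reported = 100
useful = reported / replication
print(round(useful, 1))  # 33.3 TFLOPS of new science per 100 reported
```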
Can that be fixed?
LiteSolve has developed an architecture that raises the compute available to near 100%, with only a small share of processing used for security and identity management. This distributed compute platform has been built from the ground up to address the many inefficiencies that have persisted from early architectures. Above all, we want to use the capabilities of today's technology to advance science. Work is ongoing on the distribution algorithms, using AI and ML to further increase the platform's efficiency, and we are also assessing various methods of identifying 'green power' for the computations.