distributed.net Faq-O-Matic: What kinds of problems are well-suited for distributed computing?

What kinds of problems are well-suited for distributed computing?

Distributed computing can be used in a variety of situations where a large amount of computation is needed, but a number of factors may limit its effectiveness. Although it is possible to parallelize nearly any computational problem, if the "overhead" of distributing a problem is very expensive (in terms of time and/or network bandwidth), then it may actually be quicker to simply compute it on one machine instead. In general, the goal of distributed computing is to minimize the overall time required to complete a problem, from start to finish.
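The trade-off between distribution overhead and total completion time can be illustrated with a very rough model. The sketch below is only an illustration and is not taken from any distributed.net software: the function name `distributed_time`, its parameters, and the assumption that every work unit pays a fixed overhead are all hypothetical simplifications.

```python
import math


def distributed_time(work_seconds, machines, overhead_per_unit_seconds, units):
    """Estimate wall-clock time when the work is split into equal units.

    Assumptions (hypothetical, for illustration only):
    - the work divides evenly into `units` pieces,
    - every unit pays a fixed overhead (transfer, scheduling) on top of
      its share of the computation,
    - all machines are equally fast and run their units one after another.
    """
    units_per_machine = math.ceil(units / machines)
    compute_per_unit = work_seconds / units
    return units_per_machine * (compute_per_unit + overhead_per_unit_seconds)


# One hour of work, 100 units, 10 machines.
cheap = distributed_time(3600, machines=10, overhead_per_unit_seconds=1, units=100)
costly = distributed_time(3600, machines=10, overhead_per_unit_seconds=500, units=100)

print(cheap < 3600)   # low overhead: distributing is faster than one machine
print(costly > 3600)  # high overhead: one machine would have finished sooner
```

Under these assumptions, the same problem on the same ten machines is either a large win or a net loss depending only on the per-unit overhead.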

data-to-compute ratio

One of the most significant factors is whether or not a problem has a low "data-to-compute" ratio. This ratio reflects the amount of network communication that each machine must exchange with the servers, relative to the amount of computation that is performed on each machine. Problems that require a lot of data to be sent before any computation can begin may spend more time delivering the information required to begin computation than actually performing it. All of the projects that have been selected by distributed.net have had the property that only a few hundred bytes of data need to be transmitted for several hours of computation.
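To make the ratio concrete, the following sketch estimates what fraction of a work unit's total time is spent on the network. The function name `communication_fraction`, the link speed of 7,000 bytes per second, and the example workloads are assumptions chosen for illustration; only the "few hundred bytes for several hours" shape comes from the text above.

```python
def communication_fraction(bytes_transferred, bandwidth_bytes_per_s, compute_seconds):
    """Fraction of a work unit's total time spent moving data.

    A value near 0 means a low data-to-compute ratio (good for
    Internet-scale distribution); a value near 1 means the machine
    mostly waits on the network.
    """
    transfer_seconds = bytes_transferred / bandwidth_bytes_per_s
    return transfer_seconds / (transfer_seconds + compute_seconds)


SLOW_LINK = 7_000  # bytes per second; an assumed dial-up-class connection

# A few hundred bytes exchanged for three hours of computation.
key_search = communication_fraction(500, SLOW_LINK, 3 * 3600)

# A hypothetical job that ships 1 GB of input for one minute of computation.
data_heavy = communication_fraction(10**9, SLOW_LINK, 60)

print(f"{key_search:.6%} of the time is spent communicating")
print(f"{data_heavy:.2%} of the time is spent communicating")
```

In the first case the network time is negligible; in the second, nearly all of the elapsed time is spent delivering data rather than computing.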

inter-node communication

Another factor that is partially related to the previous item is how much "inter-node communication" is required for a problem. Some types of distributed computing problems require that machines occasionally synchronize, coordinate, or exchange information with the other machines that are also working on the problem. Depending on the choice of implementation, this communication can take place directly between machines ("peer-to-peer" communication), via server-based coordination, or through other techniques.

Problems that do not require any type of coordination or synchronization between machines, nor any ordering in the computation of work, are commonly called "embarrassingly parallel". These types of problems are very well suited for very large-scale distributed computing, since the individual pieces of work can be completed in any order and can be redistributed to other machines if any single result fails to be returned. Depending on the type of computation being done, not all of the workunits necessarily need to be completed if certain "key" results are received, or if a certain threshold has been met. All projects that distributed.net has worked on generally fit in this category.

[Figure: compute-split-merge.png (19.4 K)]
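The properties described above (work units that can be processed in any order, reissued when a result is lost, and abandoned once a "key" result arrives) can be sketched as a small simulation. This is not distributed.net's actual server logic; the helper names `make_work_units`, `search_unit`, `run_project`, and the `worker_returns_result` callback are hypothetical, and the "workers" are simulated in a single process.

```python
from collections import deque


def make_work_units(keyspace_size, unit_size):
    """Split the range [0, keyspace_size) into independent (start, stop) units."""
    return [(start, min(start + unit_size, keyspace_size))
            for start in range(0, keyspace_size, unit_size)]


def search_unit(unit, is_key):
    """Work done by one machine: test every candidate in its unit."""
    start, stop = unit
    for candidate in range(start, stop):
        if is_key(candidate):
            return candidate
    return None


def run_project(units, is_key, worker_returns_result):
    """Hand out units until the key is found or all units are done.

    `worker_returns_result(attempt_number)` simulates an unreliable
    volunteer machine: when it returns False the result was never sent
    back, so the unit is queued again for another machine. (If it always
    returned False, this loop would never finish.)
    """
    pending = deque(units)
    attempts = 0
    while pending:
        unit = pending.popleft()
        attempts += 1
        if not worker_returns_result(attempts):
            pending.append(unit)  # lost result: redistribute later
            continue
        found = search_unit(unit, is_key)
        if found is not None:
            return found, attempts  # "key" result: remaining units are skipped
    return None, attempts


units = make_work_units(keyspace_size=1000, unit_size=100)
found, attempts = run_project(
    units,
    is_key=lambda k: k == 742,                 # hypothetical target
    worker_returns_result=lambda n: n % 3 != 0,  # every third result is lost
)
print(found, attempts)
```

No unit depends on any other, so lost results only cost a retry, and the project stops as soon as the unit containing the target is returned.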

Programming libraries such as MPI (Message Passing Interface), PVM (Parallel Virtual Machine), BSPlib (Bulk Synchronous Parallel library), and others are commonly used in scientific applications that require a high amount of precise intercommunication and synchronization between machines. These libraries are generally only usable in cluster environments, because they need very fast network connectivity between machines due to the large amount of nearly continuous network traffic involved. The fact that network communication takes place between all machines in the computation also means that there generally cannot be firewalls or other highly restrictive network blocking between machines. They also generally require that all machines remain available from the start to the finish of the problem. Because of these constraints, problems that require these libraries (or otherwise need a high degree of interconnect) are generally not well suited for large-scale Internet-based computing.

[Figure: compute-cluster.png (15.2 K)]

In multi-pass, server-based coordination schemes, all of the intermediate results are collected by a central server, data coordination is performed by that server, and the data is then redistributed back to the machines again, if needed. Problems that rely on this technique still need to have a low "data-to-compute" ratio, since the amount of time that is spent communicating with the server and getting back another piece of work can become expensive.

[Figure: compute-multi-split.png (33.7 K)]
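As a small illustration of a multi-pass scheme, the sketch below computes a variance in two passes: workers return partial sums, the server merges them into a mean, and that mean is sent back out for a second pass. The example problem and all function names (`split`, `worker_partial_sum`, `worker_partial_sq_dev`, `server_variance`) are hypothetical, and the workers are simulated by ordinary function calls rather than real network clients.

```python
def split(data, n_workers):
    """Server side: cut the data into at most n_workers contiguous chunks."""
    size = -(-len(data) // n_workers)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker_partial_sum(chunk):
    """Pass 1 on a worker: needs only its own chunk."""
    return sum(chunk), len(chunk)


def worker_partial_sq_dev(chunk, mean):
    """Pass 2 on a worker: needs the merged result of pass 1 (the mean)."""
    return sum((x - mean) ** 2 for x in chunk)


def server_variance(data, n_workers):
    """Coordinate two passes; assumes non-empty data and n_workers >= 1."""
    chunks = split(data, n_workers)

    # Pass 1: collect intermediate results and merge them on the server.
    partials = [worker_partial_sum(c) for c in chunks]
    count = sum(n for _, n in partials)
    mean = sum(s for s, _ in partials) / count

    # Pass 2: redistribute the merged value and collect again.
    sq_devs = [worker_partial_sq_dev(c, mean) for c in chunks]
    return mean, sum(sq_devs) / count


mean, variance = server_variance([2, 4, 4, 4, 5, 5, 7, 9], n_workers=3)
print(mean, variance)
```

Each pass costs every worker a round trip to the server, which is why the data-to-compute ratio still matters: if the per-pass computation is small, the round trips dominate.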

public attraction and incentives

In public Internet projects like distributed.net, the machines that perform the computational work are generally volunteered machines that are owned by other people. Therefore, in order to get people to install the client on their machines, your problem will need to have some level of appeal or attraction that will make people want to donate their computer's power toward the effort. Projects that have a potential benefit for the common good tend to have a high amount of appeal (such as encouraging higher levels of data security, or searching for cures for cancer and other diseases). Projects that have technical appeal can also be popular among some groups of people, but it is sometimes difficult for them to attract wide interest (such as proving large prime numbers, or identifying mathematical oddities).

Some people are willing to allow their machines to be used for arbitrary purposes if there is the potential for significant compensation in return. (Some groups of people are even willing to permit the use of their machines for commercial purposes that would not benefit themselves or any public entity as long as they are personally compensated in some way.) Sometimes this type of compensation may mean direct compensation for every unit of time utilized, while other times it may mean a potential for a larger compensation that is awarded in a lottery-style selection process.

Other groups of people are motivated by the community competition that is made possible by "statistics" and rankings of the amount of computational time contributed. These groups of people enjoy "seeing their name in lights", especially if their name is ranked higher than someone else's name.

This document is: http://faq.distributed.net/?file=280
