What is Distributed Computing?
Distributed Computing is the sharing of tasks across many computers, which can be located anywhere in the world. It works by breaking a large problem down into small parts so that different computers can work on those parts at the same time. The complexity lies in the communication with, and coordination of, all the devices.
Distributed computing is typically associated with High-Performance Computing (HPC), where complex calculations are required to solve a problem. When HPC takes the form of a supercomputer, all of the hardware (processors, memory, storage and networking) is housed in a single location with its own power supply, backups and administration needs. With distributed computing this does not need to be the case: the connected machines can be located anywhere in the world, as long as they have an internet connection.
Why is Distributed Computing important?
Single-processor chip design has been pushed to its limits, and not much more performance can be squeezed out of it. So for years, computer manufacturers have been putting more processing units onto a single chip. Processing work simultaneously is where the gains in performance are found. We now have processors with multiple cores, and clever ways of making a processor appear to have even more cores than it physically has. When individual processors hit their limit and we want to go faster, the only way is to break a problem down into parts and process them in parallel on different processors.
Introduction to parallel processing (https://computing.llnl.gov/tutorials/parallel_comp/)
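To make this concrete, here is a minimal Python sketch of splitting one large calculation into parts and running them in parallel on separate cores. The function names and the toy workload are my own illustration, not taken from any particular library:

```python
# Minimal sketch: splitting a single large sum across CPU cores.
from multiprocessing import Pool

def partial_sum(bounds):
    """Work on one part of the problem: sum the integers in [start, end)."""
    start, end = bounds
    return sum(range(start, end))

if __name__ == "__main__":
    n = 10_000_000
    workers = 4
    step = n // workers
    # Break the single large problem into smaller, independent parts.
    parts = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        # Each part is processed simultaneously on a different core.
        results = pool.map(partial_sum, parts)
    print(sum(results))  # same answer as sum(range(n)), computed in parallel
```

The same idea scales from cores on one chip to machines spread across a network; the hard part, as noted above, is the communication and coordination.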
Distributed computing can be used to work on a very large task by breaking it into parts, so that computing power can be drawn from many locations. It is also applied to large datasets to transform them into insights. It is what allows the hyper-scale clouds, and the applications built on them, to scale in proportion to the demands placed on them at any given time. There may be several master computers, each responsible for a particular type of problem, that feed into the main system. Google Mail, for example, is not just one server handling all of your mail. Clusters of computers each perform specific parts of the mail system, and the master computers simply check which servers are available and send whatever processing job they have to an available device. In cloud environments, these are usually servers housed in massive data centres.
With many more IoT devices anticipated, far more data will be produced, both from human interactions and from machines themselves. That data is already coming, and will increasingly come, from everything from watches to street lights. The insights we seek are formed by processing that data, and the computing power to do so all has to come from somewhere.
So if you are hoping that the next high-end product you buy, from a car to a child’s car seat, has been through enough testing to make sure it is both safe and cost-efficient, you can bet that computer simulation has been used somewhere in the process.
How does Distributed Computing work?
The goal of a distributed computing system is to use many machines on a network, working concurrently, as if they were one. The connected machines can be of varying types (servers, laptops, mobiles), and if one of them fails the system should keep running.
Distributed Computing typically uses a master computer to organise the participating devices. The master splits a problem into small parts, and each part is worked on by a different computer. As each device completes its computation, it returns the result and awaits a new part to work on.
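As a rough, single-process illustration of that loop (queues stand in for the network here, and all the names are my own assumptions rather than any real platform's API), a master might split the data, hand the parts to workers and combine what comes back:

```python
# Toy sketch of the master/worker pattern; real systems send tasks over a network.
import queue
import threading

TASKS = queue.Queue()
RESULTS = queue.Queue()

def process_part(part):
    """The work each device performs on its small part of the problem."""
    return sum(part)

def worker(worker_id):
    # Each worker takes a part, computes it, returns the result,
    # then waits for the next part to work on.
    while True:
        part = TASKS.get()
        if part is None:                 # master signals there is no more work
            break
        RESULTS.put((worker_id, process_part(part)))

def master(data, n_workers=3, chunk_size=4):
    # The master splits the large problem into small parts...
    parts = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for w in workers:
        w.start()
    for part in parts:                   # ...and hands each part out to be worked on.
        TASKS.put(part)
    for _ in workers:                    # one stop signal per worker
        TASKS.put(None)
    for w in workers:
        w.join()
    # Combine the results the workers sent back.
    return sum(RESULTS.get()[1] for _ in parts)

print(master(list(range(20))))           # 190, the same as sum(range(20))
```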
The master looks at each returned result and assesses its integrity. Using anything from simple to more complex checks, it verifies that the file returned is as expected. If it is not, the master reissues the same part to another device. This guards against errors in the computation and against malicious interference with the file and its data.
Some distributed computing platforms send the same part to multiple devices and compare the responses, choosing the majority result. Because some computations can take hours or days to complete, this tactic also protects against a device getting disconnected partway through the processing.
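Here is a small sketch of those two safeguards together: checking returned results and keeping the answer most devices agree on. The checksum-and-vote scheme below is purely illustrative and far simpler than what a real platform would use:

```python
# Sketch of result verification, assuming each part was sent to several devices.
import hashlib
from collections import Counter

def checksum(result: bytes) -> str:
    """A simple integrity check: hash the returned bytes."""
    return hashlib.sha256(result).hexdigest()

def majority_result(responses):
    """Pick the answer most devices agree on; None means no clear majority."""
    counts = Counter(checksum(r) for r in responses)
    digest, votes = counts.most_common(1)[0]
    if votes > len(responses) // 2:
        # Return one of the responses matching the winning checksum.
        return next(r for r in responses if checksum(r) == digest)
    return None  # disagreement: the master would reissue this part

# Three devices were given the same part; one returned a corrupted file.
responses = [b"answer-42", b"answer-42", b"answer-41"]
print(majority_result(responses))            # b'answer-42' -> accepted
print(majority_result([b"a", b"b", b"c"]))   # None -> reissue the part
```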
Distributed Computing Use Cases
Distributed systems have many advantages over centralised systems, including scalability and redundancy. It is very easy to add another device to the network, because the network can accommodate a mix of operating systems, hardware and communication protocols. This is horizontal scaling, as opposed to vertical scaling, which means upgrading the hardware of a single device. Hardware upgrades can only go so far, whereas horizontal scaling has no such hard limit.
When a device is unavailable for any reason, its work can be taken up by another machine. Failure in one part of the system does not bring down the whole system.
Like HPC, distributed computing is used in a variety of industries, from healthcare to finance. It speeds up the completion of a single task by distributing it amongst many computers. Where demand is unpredictable and fluctuates, resources can be scaled up and contracted as needed. This is particularly true of hyperscale cloud platforms such as Google, Salesforce, Amazon and the like; these companies could not operate as they do without distributed computing.
Types of Tasks
The types of tasks that can be performed are varied, but they fall into two main categories. The first is a single complex computation that needs a lot of compute power to solve. The second is a large set of data on which calculations need to be performed for each of its numerous entries. These apply to all sorts of computationally intensive problems, from weather models to rendering scenes for movies or games.
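As a hypothetical example of the second category, the same calculation can be applied to chunks of a large dataset in parallel and the partial results combined. The word-count workload and names below are purely illustrative:

```python
# Toy example: one calculation applied to many entries of a dataset, in chunks.
from concurrent.futures import ProcessPoolExecutor
from collections import Counter

def count_words(lines):
    """Run the calculation on one chunk of the dataset."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

if __name__ == "__main__":
    # Stand-in for a dataset far too large for one machine.
    dataset = ["the quick brown fox", "the lazy dog", "the fox"] * 1000
    chunk_size = 500
    chunks = [dataset[i:i + chunk_size] for i in range(0, len(dataset), chunk_size)]
    with ProcessPoolExecutor() as pool:
        partials = pool.map(count_words, chunks)   # each chunk processed in parallel
    totals = sum(partials, Counter())              # combine the partial results
    print(totals.most_common(3))
```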
Machine data, such as the data that comes from fitness trackers, phones, IoT devices and apps, is being stored so that it can be processed and made sense of. The ability to process this data, and the costs involved, used to be exclusive to big organisations focused on their own objectives. With cloud providers and the ability to distribute computations across a network, costs have decreased and the barrier to entry has lowered, opening the field up to more competition. Even so, Mark Zuckerberg has slammed AWS for the high cost of the compute supporting his philanthropic ventures in scientific research.
For new companies looking to grow, computing costs can now scale with the business’s success. Peak compute capacity can expand and contract as needed, in line with business operations.
Closing
Distributed computing is a way to tap into more computational resources than any single machine can provide. The concepts, and partial implementations, have been around for a very long time, but it is only now that the hardware, software and networks are all at a stage where a profound leap can be made.
With the computations that can now be run, new treatments for cancer can be explored. We can forecast the impact of our actions on the climate, and improve energy efficiency and sustainability in production processes.
For distributed computing to really make an impact on the world, there needs to be a commercial benefit. Cost has been the biggest barrier to entry, closely followed by the steep learning curve involved in running a distributed computing project. The big players in cloud (Google, Microsoft, AWS and IBM) all provide their own flavours of distributed compute services, and they are also looking to simplify the process of creating and deploying a compute project. Other distributed compute solutions look to use existing resources: not only the data centres run by the big players, but also volunteer computing, which taps into personal devices. I’ll talk more about volunteer computing in a following post.
The possibilities, and the benefits for energy efficiency and problem-solving for humanity, are what led us to start LiteSolve. We are excited to get it fully up and running, and we will need the help of a lot of people for its potential to be realised. We hope you will join us!