IBM Blue Gene/Q supercomputer
5D Torus interconnection
Even one of the highest-performance supercomputers is built on an architecture similar to that of a PC.
The difference lies in how the nodes (each of which can be a computer by itself) are interconnected. This is internal connectivity, not an “Internet” connection.
Of course, the design can incorporate a faster internal bus, faster I/O, high availability, fail-over, fault resilience, and so on.
However, that kind of supercomputer is very expensive. When processing units were not as fast as they are today, designers had to build the fastest architecture they could. But as you can see from the Blue Gene/Q diagram, you can practically build a cost-effective cluster supercomputer out of a group of computers.
Of course, the maximum performance may be lower than that of a supercomputer designed from scratch for maximum performance, but because processing units are fast nowadays, the fastest possible performance is not always required. If a job takes a long time to process, it can be good enough to distribute the data to some or all of the computers in the group, let them process it at the same time, and collect the results over “TCP/IP”.
(This works when the required processing time is much longer than the time spent moving the data over the network.) Analyzing big data, processing large amounts of video, and rendering photo-realistic 3D animation are examples.
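To make the condition in the parentheses concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions that are mine, not measurements: the work splits evenly across nodes, all data passes once over a single shared link, and latency and result collection are ignored. The function names and the numbers are hypothetical.

```python
def cluster_time(compute_seconds, data_bytes, nodes, bandwidth_bytes_per_s):
    """Estimated wall-clock time when a job is split evenly across nodes.

    Assumption (for illustration only): the data is pushed once over one
    shared link, so transfer time does not shrink as nodes are added.
    """
    transfer_seconds = data_bytes / bandwidth_bytes_per_s
    return compute_seconds / nodes + transfer_seconds


def speedup(compute_seconds, data_bytes, nodes, bandwidth_bytes_per_s):
    """How many times faster the cluster is than one machine doing it all."""
    return compute_seconds / cluster_time(
        compute_seconds, data_bytes, nodes, bandwidth_bytes_per_s
    )


GIGABIT = 125e6  # 1 Gbit/s expressed in bytes per second

# A one-hour job on 1 GB of data: 360 s of compute per node + 8 s of transfer.
long_job = speedup(3600, 1e9, nodes=10, bandwidth_bytes_per_s=GIGABIT)

# A ten-second job on the same data: 1 s of compute per node + 8 s of transfer.
short_job = speedup(10, 1e9, nodes=10, bandwidth_bytes_per_s=GIGABIT)

print(f"long job speedup:  {long_job:.2f}x")
print(f"short job speedup: {short_job:.2f}x")
```

With these made-up numbers, the long job gets close to the ideal 10x from ten nodes, while the short job barely benefits because the network transfer dominates.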
So cluster solutions like Beowulf are a cost-effective choice.
In other words, there are alternative ways to achieve good-enough performance at a lower price in certain specific situations.
Even the S/W design and architecture affect this, and can help reconcile a low-cost requirement with fast turn-around for submitted tasks.
For example, instead of working directly on the original uncompressed video data, the software can present a smaller version of the video, which users can load and edit quickly. Each editing step, such as which effect to use or where to cut from and to, can then be applied to the original full-size video data.
At the very least, this reduces the time needed to load, edit, and play back for confirmation. If we consider that actual processing time is governed not only by computing power but also by human interaction and any work involving operators, working on a smaller version of the data that stands in for its large original can, depending on the situation, actually make up for low network performance.
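The small-version idea can be sketched in Python as follows. This is a toy model, not how any particular editing product works: a clip is just a list of frame labels, the smaller version is made by keeping every Nth frame, and the only edit is a cut. All names (Clip, Cut, make_proxy, apply_cuts) are hypothetical. The point it illustrates is that edits are recorded in seconds rather than frame numbers, so the same edit list can be replayed on the original.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    fps: int       # frames per second
    frames: list   # stand-in for real frame data


@dataclass
class Cut:
    start_s: float  # start of the removed range, in seconds
    end_s: float    # end of the removed range (exclusive), in seconds


def make_proxy(clip, step):
    """Build a lighter stand-in clip by keeping every `step`-th frame."""
    return Clip(fps=clip.fps // step, frames=clip.frames[::step])


def apply_cuts(clip, cuts):
    """Return a new clip without the frames that fall inside any cut."""
    kept = [
        frame
        for index, frame in enumerate(clip.frames)
        if not any(c.start_s <= index / clip.fps < c.end_s for c in cuts)
    ]
    return Clip(fps=clip.fps, frames=kept)


# Ten seconds of "video" at 24 fps, and a proxy with a quarter of the frames.
original = Clip(fps=24, frames=list(range(240)))
proxy = make_proxy(original, step=4)

# The user decides on the edit while working with the small proxy...
edits = [Cut(start_s=2.0, end_s=5.0)]
preview = apply_cuts(proxy, edits)

# ...and the same edit list is later replayed on the full-size original.
final = apply_cuts(original, edits)

print(len(preview.frames) / preview.fps, "seconds in preview")
print(len(final.frames) / final.fps, "seconds in final")
```

Only the short edit list has to travel between the user's machine and the machines holding the original data, which is why this kind of design can tolerate a slower network.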
So, when designing a high-performance system and solution, you should consider the S/W architecture of the solution as well as the H/W architecture. If either one is left out, you may find that the system you want can’t be designed at all, or that you have to pay a lot to buy a dream machine.