Parallel Computer Communications Topology



 

3 Topology

The individual processors of a multiprocessor computer need to communicate by some means. This is effected by connecting the processors to some form of network or to a global memory bus. A communications network may be either static or dynamic. Static networks have a fixed pattern ofconnections, whilst dynamic networks have switching elements which can change the pattern of connections under the computer's control.

The two main parameters of a communications network are the data transfer rate and the shape of the network. 'Grain size' is the amount of computation a node does relative to the amount it can communicate with other nodes, see section 7.5.4. A large grain size indicates the node does a lot of computation while communicating infrequently. A small grain size indicates that only a small amount of computation is done between communications. The network's data transfer rate is as important as the performance of the processors in determining the optimal grain size for a particular computer.

In describing the shape of the network it is the pattern of connections that is important rather than the physical lay out of the wires and circuits. The mathematical term for the study of connectedness is 'topology'. Networks investigated in the literature includes: chains, grids, trees, hypercubes, omega networks, delta networks, indirect binary n-cubes, Banyan networks, hyper toruses, twisted toruses, k-folded toruses, x-trees, shuffle exchange networks, k-way shuffles, Batcher networks, Clos networks, De Brujn networks, reverse exchange networks and butterfly networks. Communications networks are important not only in the design of parallel computers but also in the design of telecommunications systems.

Topologies differ in at least one of the following properties: diameter, extendability, uniformity, multiplicity of paths, routing algorithm complexity, coupling, link complexity, node complexity and degree. A particular choice of topology will favour certain algorithms over others. The favoured algorithms have a high correspondence between the flow of data in the algorithm and the paths of the network. Many networks are closely related; some are subsets of others and some are isomorphic.

The final choice of interconnection network will determine which algorithms and grain sizes can be executed efficiently. Hence, the interconnection network plays a crucial role in determining suitable applications for a given parallel computer. Furthermore, it is usually the choice of communications network that determines how much a parallel computer can be scaled up.

3.1 The Bus

This is a simple topology in which all the processors are connected to a single common bus, see figure 3.1. The bus can be either a dedicated communications bus or the memory bus. Memory bus systems allow all the processors to access a global memory. Communication between processors is achieved by the sender writing its message to a reserved memory area and the receiver reading it as though it were a normal memory access.

Bus

Figure 3.1 Bus

Alternatively, dedicated communication buses 'frame' messages with an address specifying the processor that is to receive it. All processors are able to read the message, however, only the processor addressed in the frame is permitted to do so. A particular advantage of dedicated communications buses is that one address can be reserved for use as a global address for the whole network. A message with this address would permit every node in the network to read the message.

The bus allows new processors to be added without the need to add more routers or to modify the hardware of existing nodes. Routing algorithms are not necessary as all the processors are able to read every message. Only one processor can send a message at a time, therefore, there is an arbiter that decides which processor is to be allowed to transmit on the bus. A simple way to implement an arbiter is to cycle around the available processors offering each one, in turn, control of the bus for 1ms.

If several processors wish to send a message at the same time then all, bar the one selected, have to remain idle until they get their own turn on the bus. The extent to which one processor effects the behaviour of the others in the network is known as the level of 'coupling'. Buses induce a medium level of coupling between the processors. This induced coupling is the most serious disadvantage of the bus system. Unfortunately, the problem becomes worse as more processors are added to the system. As general communications networks, buses are limited to systems with a small number of processors that communicate infrequently with small messages.

Bus systems are often used as secondary communications systems in parallel computers. Bus systems are very effective where global message passing is required, for example from a host processor or controller.

 

3.2 Static Networks

Static networks have some form of resource connected to every node. The resource could be a memory bank, a peripheral, or a processor. Dynamic networks are made up of a block of switching nodes. Only the switching nodes at the 'edges' of the block are connected to a resource. Internal switching nodes are only connected to other switching nodes, and are used to redirect messages from one path to another. The variety and complexity of dynamic networks places them beyond the scope of this thesis, and the following sections refer only to static networks.

3.2.1 Chains

A chain comprises of a linear array of processors which have direct connections only with their immediate neighbours, see figure 3.2.1 The number of connections a node makes is called its 'degree' or 'valency'. Nodes in a chain have degree two, except the terminal nodes whose degree is one. As more nodes are added to a chain its degrees do not change, therefore, chains are said to have a 'fixed degree'. It is proposed that the number of links in a network, as a function of the number of nodes, be called the 'link complexity' of the network. Chains have one less link than they have nodes so the link complexity is N-1 - where N is the number of nodes.

Chain

Figure 3.2.1 Chain

An important limitation of this topology stems from the fact that as more processors are added the number of nodes that have to relay messages increases. The length of the longest direct path in a network is called its 'diameter', and the smaller this quantity the better. In chains the longest path is from one end to the other, giving a diameter of N, which is not only high relative to most other networks, but the highest theoretically possible.

For every processor to be able to communicate with every other, information must flow both upstream and downstream. Chains have only one route between any two nodes, hence, the routing algorithm is trivial. As in the bus system, messages are framed by an address. A processor receives a message from 'downstream' and accepts it if the address is correct, otherwise, the message is re-transmitted 'upstream'. Chains differ from buses in that not all of a chain's processors are able to read every message. Specifically, the processors downstream of the source and upstream of the destination will not have the opportunity to read a message. This makes it more difficult to implement global message passing. A further difference from buses is that chains allow several message packets to flow at the same time, providing they do not overlap.

Chains are relatively easy to extend. The hardware of existing nodes does not need to be modified and the routing algorithm remains essentially the same. It is proposed that the number of nodes needed to extend a network, so that it has the same topology but of a higher order, be called the 'growth complexity' of the network. Many topologies have a growth complexity of N, ie the number of nodes has to be doubled every time the network is extended. The growth complexity of a chain is one which is the smallest possible growth complexity for any topology.

The lack of multiple paths and the linear increase of diameter have several unfortunate consequences which restrict the practical length of chains. Should one node fail, the network will be cut into two and all traffic between the two parts will be blocked - chains have poor fault-tolerance. The maximum volume of traffic on a network (called its 'bandwidth') remains almost constant as extra nodes are added. However, as more processors are added there will come a point where the volume of traffic wishing to use the network exceeds its bandwidth. The network becomes saturated with messages, and whilst it will not crash, many processors will have to remain idle waiting for their communications to cross the network. Chains have a medium level of coupling.

3.2.1.1 Rings

A useful modification of the chain involves linking its ends and forming a ring, see figure 3.2.1.1. All the processors are now able to communicate even if the messages can only flow in one direction. For a given number of processors N, a ring has one more link than a chain, so the link complexit is N. If a particular ring implementation allows messages to flow in both directions there are two routes between any two processors. Typically there will be a long route and a short route. There is now scope for including a more sophisticated routing algorithm that can select the shortest route if it is free. The diameter of a ring is |(N+2)/2|.

Ring

Figure 3.2.1.1 Ring

As there are no longer any terminal nodes, all the nodes are topologically equivalent, so the topology is said to be 'uniform'. Computers based on uniform topologies are easier to program because there is only one type of node to write programs for. The degree of every node is two. Rings have very similar properties to chains and are not suitable for parallel computers with a large number of processors for the same reasons. Ring topologies are, however, popular for local area networks, which usually have a relatively small number of processors that are loosely-coupled and communicate infrequently.

3.2.1.1.1 Three-Dimensional Rings

As well as simply inserting more nodes into a two-dimensional ring, the ring topology can be extended by moving from two dimensions into three. Several identical two-dimensional rings are laid flat and stacked one on top of one another so that links can by made between corresponding nodes of adjacent layers. Figure 3.2.1.1.1 shows a three-dimensional ring thus formed. In this particular example there are four processors in both horizontal and vertical rings. Three-dimensional rings are uniform, ie all the nodes are in topologically equivalent sites.

3-D Ring

Figure 3.2.1.1.1 3-D Ring

Now that there are multiple paths between any two processors the routing algorithm must choose the optimal route for a message. There may be several different paths with the shortest length. There will also be many longer paths. Longer paths can not be dismissed as useless or irrelevant. The shortest path may not always be the best because other traffic on the network may slow down that route. All these details have to be resolved by a fairly sophisticated routing algorithm.

Should a node fail in a chain or a two-dimensional ring the network would be catastrophically affected. With its multiple paths the same is not true of three-dimensional rings, where, failed nodes can be routed around. The multiplicity of paths also reduces the levels of coupling between processors. Three-dimensional rings can be used for linking together the processors in a loosely coupled parallel computer.

A ring made by stacking p two-dimensional rings each with p nodes results in a three-dimensional ring with p2 nodes (N). The diameter is 2|(N/4)½|. Every node has degree four, and the link complexity is 2 * p2 = 2N. Like chains, three-dimensional rings are relatively easy to extend. There is no need to modify the existing hardware and only 2p + 1 nodes have to be added to achieve the next level. Rewriting this in N, we get 2N½ + 1 for the growth complexity.

It will be shown later that three-dimensional rings are isomorphic with toroidal grids.

3.2.1.1.2 Chordal Rings

Another way to extend the two-dimensional ring is to add extra connections between all pairs of nodes that are a set distance apart. The topology thus formed is called a 'chordal ring'. A 24 node chordal ring with chords set 6 nodes apart is illustrated in figure 3.2.1.1.2. (Chord is the mathematical term for a straight line connecting two points on a curve.)

Chordal Ring

Figure 3.2.1.1.2 A 24 Node Choordal Ringwith Chords of length 6

The most obvious advantage of a chordal ring over a simple two-dimensional ring is that there is a 'short-cut' in the path between various nodes. The diameter of a chordal ring is a function of the number of nodes in the ring and the 'length' of the chord. There are now several routes a message can take to reach its destination, but unfortunately, this adds to the complexity of the routing algorithm. At 2N the number of links in a chordal ring is twice that of a simple ring. The topology can be extended by adding a segment to the ring with the same number of nodes as the 'length' of the chord.

A 64 node chordal ring was adopted by the designers of the Illiac-IV parallel computer; in this case the chords connect pairs of nodes nine apart. The Illiac-IV is described in section 6.2 and its chordal ring illustrated in figure 6.2.2.1.

 

3.2.2 Lattices

Lattice topologies can be defined as groups of nodes arranged in two- or three-dimensional periodic patterns.

3.2.2.1 Grids

Chains can be used as building blocks for many different topologies. Here they are stacked one on top of another, so the processors are connected both vertically and horizontally. Figure 3.2.2.1-1 shows the resulting topology which is called a 'grid' or a 'mesh'.

Mesh Grid

Figure 3.2.2.1-1 Mesh / Grid

As with three-dimensional rings, there are multiple paths between any two nodes, and so the network is tolerant of the failure of specific nodes. The multiplicity of paths decouples the processors and allows the topology to simultaneously support many messages. The routing algorithm is similar to that of the three-dimensional ring and can take advantage of the multiple paths to avoid particularly busy routes. Square grids with p nodes along one side have p2 nodes in total. The link complexity is 2(N - N½). Nodes found in the corners of the grid have degree two, the other nodes along the edges have degree three, and the internal nodes have degree four.

If the number of nodes is increased the variety of degrees remains the same so the degree is fixed. The fact that there is a range of degrees indicates that there are differing topological sites and so the topology is not uniform. Non-uniform topologies increase the level of difficulty in a routing algorithm which must cope with a range of topological sites. A disadvantage of the grid topology is its relatively large diameter -- 2(N½ - 1).

The grid is relatively easy to extend. The growth complexity is (2N½) + 1. Grids can be extended without the need to change the hardware of the existing nodes.

Any application problem that involves calculations in a space of n-dimensions, is well suited to the grid topology. Fortunately, this is true for many important problems in science and engineering. Such problems includes: finite element analysis, fluid dynamics, electrodynamics, quantum chromodynamics, and image processing. The advantages bestowed by the high level of correspondence between topology and problem more than makes up for the relatively poor diameter of grids.

Many of the properties elaborated about the square mesh also hold for a hexagonal mesh, which is illustrated in figure 3.2.2.1-2. The differences are that: all nodes have degree three, with the exception of every second node at the edge of the structure whose degree is two. The node complexity N = 2p2 + 4p, where p is the number of hexagons along an edge. The link complexity is = N + p2 - 1 = 3p2 + 4p - 1. The diameter = 4p - 1. Every increase in p adds 4p + 6 extra nodes.

hexagonal grids

Figure 3.2.2.1-2 Family of Hexagonal Grids

Figure 2.4-4 in the section on systolic arrays, shows a different hexagonal mesh and figure 3.2.2.1-3 shows a mesh which is less uniform than the square and hexagonal grids described above.

Mesh

Figure 3.2.2.1-3 Mesh

3.2.2.1.1 Toruses

Grids can be extended by connecting the loose ends together in the same way the chain is converted to a ring, see figure 3.2.2.1.1. This topology is called a 'toroidally connected grid' or 'torus', and is isomorphic with three-dimensional rings. (The properties of toruses are discussed in the section on three-dimensional rings, 3.2.1.1.1). An important feature of toroidally connected grids is that the grid area can be regarded as very large or even infinite by pumping data round and round the rings. All nodes in a torus have degree four and the topology is uniform which simplifies the routing algorithm.

toroidal grids

Figure 3.2.2.1.1 Two Representations of a Toroidal Grids

3.2.2.2 Three-Dimensional Girds

Another way the grid can be extended is to lay them flat and stack several identical grids on top of each other. Vertical connections are added to the horizontal ones which already exist, see figure 3.2.2.2. The topology thus formed is called a three-dimensional grid or 'lattice'.

3-D grid

Figure 3.2.2.2 3-D Grid

The total number of nodes N is p3, where p is the number of nodes along one edge. The corner nodes have degree three, the edge nodes degree four, the face nodes have degree five, and internal nodes degree six. The link complexity L is 3p2(p - 1) and the diameter is 3(p - 1). The growth complexity G as a function of p is 3p + 3p + 1.

There is a larger number of routes between any two nodes in the three-dimensional lattice than in the two-dimensional lattice. This means that more routes must be evaluated in order to select an optimal one. As the routing algorithm must do more computations the bandwidth of a path will be slightly lower than in two-dimensional grids. For a given number of nodes the three-dimensional lattice has more links than a two-dimensional lattice so the total network bandwidth is higher. As before, the optimal route is selected on the criterion of length and current traffic loading.

The three dimensional grid's extra paths further reduce the coupling between processors. This makes it easier to implement an asynchronous multiprocessor computer. More multiple paths also means higher system fault tolerance.

It is possible to go on and build four and five-dimensional grids, however, these are difficult to visualise and have not been illustrated. Computational problems based on two, three, four or more-dimensional arrays map well onto two-dimensional grid. A three-dimensional lattice will complete a problem at least as quickly as a two-dimensional lattice. However, for two-dimensional problems it may not always be possible to use all of a three-dimensional lattice's nodes. If nodes have to remain idle the network is not being used efficiently. As the number of dimensions increases, the lattice becomes suited to fewer-and-fewer problems.

 

3.2.3 Trees

'Trees' are hierarchical structures that have some resemblance to natural trees. Trees start with a node at the top called the root, this node is connected to other nodes by 'edges' or 'branches'. These nodes may spawn further nodes forming a multilayered structure. Nodes at one level can only connect to nodes in adjacent levels, furthermore, a node may only stem from one other node (ie, it may only have one parent), even though it may give rise to several nodes (children). The connections are such that the branches are disjoint and there are no loops in the structure. Figures 3.2.3-1 and 3.2.3-2 show 'binary' and 'ternary' trees respectively. Nodes that do not have any children are called 'terminal nodes'. The number of processors in a chain between the root and the deepest terminating node is called the 'depth' of the tree.

Binary Tree

Figure 3.2.3-1 Binary Tree

Ternary Tree

Figure 3.2.3-2 Ternary Tree

By examining the figures it can be seen that there is only one path between any two nodes. Even so, the nature of the tree makes the routing algorithm slightly more complex than that for chains. A message from one terminal node to another terminal node has to be routed back up the tree to the first node that is common to both the sender and the receiver. Once the message arrives at the common parent it can then travel back down the tree to the receiving node. A single message path leaves most of the communications links free so the tree structure can support many messages at the same time. This requires that the routing algorithm be extended to cope with messages clashing on particular routes. The extended algorithm must also make sure that two frequently communicating processors do not completely block out other processors - a situation called lockout.

Binary trees allow a simplification of the general tree routing algorithm. The binary representation of a node's address can be used in determining the message path. Multiplying the current node address by two gives the address of the node below and to the left. Multiplying by two and adding one gives the address for the node below and to the right.

An important step in finding the message path involves finding the first node that is common to both sender and receiver. This can be done by generating two lists of successive parents all the way up to the root. One list for the sender and one for the receiver. The parent of the current node can be found by dividing the address by two and taking the modulus. (This is equivalent to an 'arithmetic shift right by one bit', which can be found as a single machine code instruction on most processors.) The formula is repeated until the root node is reached (ie the address is 00001).

Path finding can be illustrated with reference to figure 3.2.3-3. Let node four be the sender and node ten the receiver. The list of successive parents is as follows:

Message Passing in Binary Tree

Figure 3.2.3-3 Message Passed from Node 4 to Node 10 in a Binary Tree

Table 3.2.3 Path from Node to Root
SenderReceiver
BinaryDecimalBinaryDecimal
01004101010
0010201015
00011 (root)00102
00011 (root)

It can be seen that the first node to appear in both lists is 2. The path is generated by traversing down the sender list as far as the common node (in this case 2), and then up the receiver list from the common node to the top.

Path { 4, 2, 5, 10 }

Given the common parent the following algorithm will find the path between any two nodes of a binary tree:

find_path(sender, receiver, common_parent)
STACK    path_list, receiver_list
DUMMY    bin
push sender to path_list
if sender <> common_parent    			--traverse up tree to
    parent = sender                      	--common parent
    repeat until parent = common_parent
        parent = |parent/2|                	--arithmetic right shift
        push parent onto path_list      	--this section will also
    :                                   	--push the common parent
:
if receiver <> common_parent
    parent = receiver
    repeat until parent = common_parent
        parent = |parent/2|
        push parent onto receiver_list
    :
:
pop receiver_list into bin          		-- dump common_parent 
                                    		-- which is already in list
repeat until stack empty
    pop receiver_list into path_list
:
RESULT path_list

Listing 3.2.3 Pseudo-Code for Generating the Path Between Two Nodes of a Binary Tree

Trees have a fixed degree and their diameter, (log2N and log3N for binary and ternary trees respectively), is relatively small.

Fat Tree

Figure 3.2.3-4 Fat Tree

The major disadvantage of this topology is that nodes higher up the tree tend to get congested if several processors further down wish to communicate. One way to alleviate this communication problem is to add additional connections between levels. The resulting structure is called a 'fat-tree', see figure 3.2.3-4. Another possible solution to the problem of congestion is to add links between branches at the same level. This structure is called an 'x-tree', see figure 3.2.3-5. Both of these solutions increase the complexity of the routing algorithm. (Fat-trees and x-trees are no longer proper trees as there are loops in the structure and all the branches are no longer disjoint.)

X Tree

Figure 3.2.3-5 X Tree

Several important problems map themselves very well on to tree topologies. These include searches and sorts. Search algorithms map particularly well onto the diamond tree structure illustrated in figure 3.2.3-6.

Diamond Tree

Figure 3.2.3-6 Diamond Tree

The number of nodes in a tree is given by the mathematical formula for the sum of a geometric progression:

N = (dw - 1)/(d - 1)

The number of nodes can be increased by either increasing the depth of the tree 'w', or by increasing the fan-out of the nodes 'd'. The number of links in a tree is given by the formula:

L = (dw - d)/(d - 1)

The growth complexity G is given by the formula:

G = (d-1)/(N + 1)

When a ternary tree is enlarged another twice as many plus one nodes must be added in order to complete the next level. The growth complexity of trees is high compared to other topologies. Figure 3.2.3-7 shows a non-uniform tree structure where the fan-out of the root node is three, and the fan-out of the other nodes is two.

Non-Uniform Tree Structure

Figure 3.2.3-7 Non-Uniform Tree Structure

3.2.3.1 Pyramids

Pyramid topologies are a subset of the tree topology. The pyramid in figure 3.2.3.1 can be arrived at by redrawing a quaternary-tree. All the topological properties of the pyramid are the same as those of the quaternary-tree;

Quaterrnary Tree Pyramid

Figure 3.2.3.1 Quaternary Tree and its Pyramid

Topological Characteristics of Pyramids
CharacteristicFormula
Node ComplexityN = (4w - 1)/3          where w is the depth
Link ComplexityL = (4w - 4)/3
DiameterD = 2w = 2log4[(3N + 1)/4]
DegreeE = 1 for the terminal branches
E = 4 for the root node
E = 5 for the rest
Growth ComplexityG = 3N + 1

3.2.3.2 Stars

The star topology is the degenerate case of a tree. It is a tree of depth two and usually has a high fan out, see figure 3.2.3.2. This figure also shows the same star in its more familiar form with processors arranged around the root node. The central node acts as a switching network, routing messages from one processor to another, rather like a telephone exchange.

Star Topology

Figure 3.2.3.2 Two Representations of the Star Topology

This arrangement can be useful as it matches the way peripherals are configured about a computer. Usually a computer will have only one screen, printer and backing store. With star topologies these resources would be placed at the central node, thereby bestowing even access by all processors. Stars have a mixed degree. The terminal nodes are only linked to the central node so their degree is one. The central node is linked to all the other processors, so its degree is N-1. If the degree is a function of N, as in this case, it is said to be variable. The routing algorithm is simple, and need only reside at the central node. It receives messages from a port and simply redirects them to the port corresponding to the destination.

Extending the star would involves increasing the fan-out of the central node rather than the depth, as with a tree. This makes the growth complexity one, which is not only the better than most other topologies but also the best theoretically possible. The central node has to be modified in order to cope with the extra node, however, all the other nodes can remain the same.

The longest path starts at a terminal node and is along the branch to the central node and then back out and down to another terminal node; the diameter is, therefore, three. As the diameter is not a function of N it is said to be 'fixed'. A disadvantages of this topology lies with the central processor. This can become a communication bottleneck, consequently star networks tend to be heterogeneous, with the central processor being designed with a much higher communications bandwidth than the others.

Another, more serious, problem is that should the central processor fail, the whole network fails, along with all access to peripherals. Again the central processor design needs to be different from the others. The reliability of the central processor can be improved by using higher grade components and adding error tolerant memory. The bottleneck at the central node makes it impractical to have many processors in such a computer. However, star networks are commonly found in local area networks.

 

3.2.4 Hypercubes

nCubes

Figure 3.2.4-1 Family of nCubes

Simple cubes have three dimensions, 'hypercubes'[1], [2] are produced by increasing the number of dimensions in a cube. A four-dimensional cube is called a 'tesseract' and can be thought of as two three-dimensional cubes whose corresponding vertices have been connected, see figure 3.2.4-1. Figure 3.2.4-2 shows the tesseract in various other forms. The term hypercube refers to any cube with four or more dimensions.

Tessaract

Figure 3.2.4-2 Various Representations of a Tesseract

A hypercube of dimension n can be subdivided into two hypercubes of dimension n-1. These two subcubes can, in turn, be divided into four subcubes of dimension n-2 and so on until the number of nodes reaches one. Conversely, every time another dimension is added the number of nodes in the hypercube doubles.

An n-dimensional hypercube contains 2n nodes, ie the node complexity N = 2n. Each node is directly connected to exactly n neighbours. The degree is, therefore, n or log2N. Each node in a hypercube can be uniquely labelled by an n-bit address. The n neighbours of a particular node have an address that differs from the node's address by exactly one bit. For example, if n=4 and the node has the binary address (0010) then its neighbours have the addresses (1010), (0110), (0000) and (0011). This property is the basis of a sophisticated routing algorithm. Hypercubes have a relatively low diameter - log2N.

All the nodes are in topologically equivalent sites so the hypercube is an example of a uniform topology. Each node has n links associated with it, there are therefore n2n node-link associations in the network as a whole. Each link is shared by two nodes. Therefore, each node can be though of as possessing n/2 links. The link complexity is:

L = (n2n)/2 = (Nlog2N)/2

There are many paths between any two nodes. However, only a few are of minimal length. The multiplicity of paths makes it less likely that a message will be blocked by dense traffic on a particular route. However, this also makes the routing algorithm more complex. The existence of independent multiple paths allows many messages to be carried at the same time. The multiplicity of paths also means there is fault tolerance and a low degree of coupling.

As already mentioned, the number of nodes doubles when a hypercube is extended from one level to the next. The growth complexity is, therefore, N, which is relatively high. As each node must accommodate another new link, the degree of all the nodes also increases. The need to modify each node in order to extend the network is a major disadvantage of this topology.

The hypercube is of interest because it has a topology that is suited to several important groups of algorithms. In particular, it is well-suited to problems involving the evaluation of fast Fourier transforms (FFTs). Many topologies are subsets of the hypercube. Examples include: lower dimensional cubes, trees, rings, cylinders, meshes and butterfly networks.

A computer called the Cosmic cube was the first practical hypercube multiprocessor and was built at 'Caltech' in 1983. Each of its 64 processing elements had a 8086 microprocessor with an 8087 floating-point coprocessor.

Topological Characteristics of Hypercubes
CharacteristicFormula
DiameterD = log2N = n          where n is the dimension
Node ComplexityN = 2n
DegreeE = log2N
Link ComplexityL = (Nlog2N)/2
Average diameterB = n/2
Growth ComplexityG = N
 

3.2.5 Fully Connected Topologies

One way to get around the need to route messages through several nodes is to connect every node to every other node, see figure 3.2.5. This is called a fully connected network and has degree N-1. Unfortunately there are a few serious disadvantages to this topology. For a given number of processors N, there are N*(N-1)/2 communications links.

Fully Connected Topology

Figure 3.2.5 Fully Collected Topology with 5,8 and 16 Nodes

Table 3.2.5 Number of Links Compared to the Number of Nodes for a Fully Connected Network
Number of NodesNumber of Links
N[N*(N-1)/2]
10
21
36
1045
100495
1,000499,500

As the number of nodes increases, the computer contains more and more links, see Table 3.2.5. Fairly quickly, most of the computer becomes communication links rather than computing elements. Furthermore, as a node is unlikely to be able to cope with simultaneous communication on every link, most links will be idle most of the time.

Despite these handicaps there are a few advantages to fully connected networks. The network is uniform and the node complexity is one. The growth complexity is also one which compares favourably with the growth complexity of N for hypercubes. Unfortunately, every node has to be modified every time the network is extended.

There is no need for a routing algorithm: a message is simply sent to the port linked to the relevant node. As there are no intermediate nodes that have to pass messages on, the diameter of this network is two, which is the best theoretically possible. There is no delay caused by placing a message on the network, because there is no contention for the use of any link or path. The network can cope with multiple messages.

Fully connected networks are only suitable for parallel computers with a small number of processors, due to the rapid growth of communications links.

Topological Characteristics of Fully Connected Networks
CharacteristicFormula
DiameterD = 2
Node ComplexityN = 1
DegreeE = N-1
Link Complexity L = [N*(N-1)]/2
Average diameterB = 2
Growth ComplexityG = 1
 

3.2.6 Dedicated Topologies

Dedicated topologies are designed to match the communications requirements of a particular algorithm. They are not regular and do not correspond to uniform topologies, as represented by symmetrical shapes, see figure 3.2.6.

Dedicated Topology

Figure 3.2.6 Dedicated Topology

These topologies are too diverse for many characteristic features to be identified. It is not likely that they are easily extended, as they already match the algorithm they were built to execute. They are similar to analogue computers in that they are dedicated to one problem and have to be rewired to solve another one. Advantages include a high degree of correspondence between topology and algorithm and little or no communication link contention.

The routing algorithm is complicated as every node must possess a 'map' of the whole network. However, as the network was designed for a specific algorithm most communications only occur between adjacent nodes and a routing algorithm may not even be necessary.

 

3.2.7 Hash-Nets

An anarchic approach to the choice of topology for a parallel computer is to connect everything randomly, see figure 3.2.7. Somewhat surprisingly, random networks are reported to perform fairly well for general computation. This surprise reflects how little is know about the relationship between computation and communication topology in parallel computers.

Randomly Connected Topology Hash Nets

Figure 3.2.7 Randomly Collected Topology or Hash-Net

An important advantage of hash-nets is that there is little effect on the system if one node fails - they are fault-tolerant. The growth complexity is one and adding extra nodes to the network is a relatively simple procedure. However, the routing algorithm has to be fairly sophisticated. Further analysis of their properties has to be done statistically rather than analytically.

Ideally, parallel computers would have a diameter of two, ie every node would be connected to every other node. However, it has already pointed out that this would make large parallel computers impractical, due to the exponential growth in the number of links. An alternative method of achieving an effective diameter of two is to write programs whose communications patterns are only with immediate neighbours. This means finding algorithms that match particular topologies. It is only through the coupling of software with the underlying hardware that parallel computers can approach their theoretical performance. It is much easier to find algorithms which only communicate with immediate neighbours for regular topologies than for irregular ones.

General information for this chapter also came from references [3], [4], [5] and [6].

 

References

[1] Modi JJ, Parallel Algorithms and Matrix Computation, Clarendon Press, Oxford, 1988, 0-19-859670-7, p29-30
[2] Seigel HJ & McMillen RJ, The Multistage Cube: A Versatile Interconnection Network, Computer, IEEE, 14(12), December 1981, p65-76
[3] Feng T, A Survey of Interconnection Networks, Computer, IEEE, 14(12), December 1981, p12-27
[4] Hillis WD, The Connection Machine, The MIT Press, London, 1985, 0-262-58097-7
[5] Hockney RW & Jesshope CR, Parallel Computers 2, Adam Hilger/IOP Publishing, Bristol, 1988, 0-85274-812-4
[6] Hwang K & Briggs FA, Computer Architecture and Parallel Processing, McGraw Hill, London, 1984, 0-07-031556-6, p333-350, p481-502