High-Performance Networking Unleashed



- 26 -

Clustering

by Mark Sportack

Clustering is a computing technique that has slowly but steadily been gaining popularity at all levels of computing. Originally a data center-grade technology, clustering has, over the last 20 years, been extended to mid-range UNIX processors and, today, to low-end client/server computing architectures. For all its heritage, clustering remains a vague concept that all but defies definition. This chapter examines clustering techniques, identifies their relative strengths and weaknesses, and explores the ways that high-performance networking can be used to support clusters.

Overview of Clustering

The concept of clustering computers dates back to 1982, when the Digital Equipment Corporation (DEC) introduced its VAXCluster. The VAXCluster offered more economical computing by decoupling the input/output (I/O) devices from any single CPU. Instead, all CPUs could access the devices, and their contents, by way of a star topology bus and coupling device, as shown in Figure 26.1. This simple form of clustering is still useful today, although it has been refocused to provide scalability and/or fault tolerance rather than simple device sharing.

FIGURE 26.1. The typical VAXCluster configuration.

The original clustering product, the DEC VAXCluster, allowed "clusters" of VAX systems to share I/O devices.

From this rather humble beginning, clustering has grown into a confusing aspect of parallel computing that almost defies definition. Numerous factors contribute to this. First, there is no standard for "clustering" computers. Clusters can be implemented in many different ways. They can be designed and engineered to solve many different business problems, assuming numerous different topologies in the process.

There is also no standard platform to build a cluster on. Uniprocessor and multiprocessor machines from all vendors can be mixed and matched in clusters. Even the choice of microprocessor is not limited: clusters can be built from Reduced Instruction Set Computing (RISC), Complex Instruction Set Computing (CISC), or even Very Long Instruction Word (VLIW) processors.

RISC processors and UNIX-based clustering products have been available for some time. It is the relatively recent introduction of products based on the "Wintel" platform that is generating excitement in the marketplace. Given the relatively low cost of the x86 CISC microprocessors, and the broad knowledge base that the various Microsoft Windows operating systems enjoy, clustering software for this platform can greatly reduce both the acquisition and operation costs of a cluster without compromising scalability, availability, or functionality. Thus, clustering appears poised for the mass market.

But, you may point out, we haven't really identified what a cluster is yet! All we've done is identify some of the potential physical platforms that clusters can be built upon. Clusters are a distributed form of parallel computing. Implementations and topologies can vary significantly in the degree of parallelism, functionality, physical platform, operating system, networks, and so on.

Not surprisingly, clusters are frequently confused with two other forms of parallel computing: Symmetric Multiprocessors (SMPs) and Massively Parallel Processors (MPPs). As Figure 26.2 illustrates, clusters demonstrate a significant overlap with both SMPs and MPPs. This is to be expected, given that they are all forms of parallel computing, yet they are not completely interchangeable.

This diagram, though not scientifically derived, visually demonstrates the partial overlap of clustered computers with uniprocessors, symmetric multiprocessors, and massively parallel processors relative to each one's trade-offs between scalability and availability. Clusters are capable of broader simultaneous support for scalability and availability.

Despite the functional similarities, there is one important architectural distinction between clusters and both SMPs and MPPs. Clusters are distributed. SMPs and MPPs are self-contained within a single computer. Therefore, even though they can redistribute workloads internally in the event of a CPU failure, they are vulnerable to downtime from failures in other parts of the computer. Clusters are capable of greater availability rates because they have fewer single points of failure. They distribute the processing across multiple separate computers that are networked together.

FIGURE 26.2. Functional overlap with SMPs and MPPs.

System architects who are considering clusters must also decide whether one of the myriad commercial cluster products will suffice, or whether they need to cobble together their own cluster from a pastiche of hardware, software, and networking products. The availability of "canned" cluster solutions only adds to the confusion about what clustering means, because these products differ so greatly from one another. Vendors intentionally differentiate their clustered solutions in the marketplace, either by focusing on specific niches or by competing on features and/or performance.

The physical separation and redundancy of computers within a cluster lends itself to architectural creativity. Clusters can be implemented in so many different ways, and for so many different purposes, that one would be hard pressed to find anything in common between some types of clusters.

In short, there is no single, coherent definition for clustering. It is, rather, a generic concept for configuring multiple computers to perform the same set of tasks. Consequently, many people use clusters and cluster products every day, without recognizing them for what they are.

Basic Cluster Architectures

Given that there is no consensus on the proper way to design, or even use, a cluster, it is not surprising that numerous topologies have appeared. Examining some of the potential cluster topologies reveals their strengths and weaknesses, and an understanding of those strengths and weaknesses is essential to developing effective clustered computing solutions.

Contemporary clusters tend to embrace one of two architectures: shared disk or shared nothing. Both of these architectures are subject to a seemingly infinite array of variation and combination. The following figures demonstrate some of the more common examples, albeit in an intentionally generic manner. The network technologies indicated in the following figures are somewhat arbitrary, but functional.

Shared-Disk Clustering

Shared-disk clustering is a close relative of the I/O-sharing VAXCluster. Ostensibly, the primary difference is that the computers illustrated in Figure 26.3 are all performing the same application work, although this is not an absolute. Because the computers in that figure are likely to be sharing the same data, an access manager is needed to coordinate the access, modification, and deletion of the shared data.

FIGURE 26.3. Shared-disk clustering.

Clusters that share disks, and their contents, are directly descended from the original VAXClusters. Unlike the hosts in Figure 26.1, these hosts are dedicated to the same task and must coordinate access to, and modification of, the data. This requires interhost communication that can be satisfied through the local area network (LAN) that connects them to their wide area network (WAN).

Several companies have introduced products based on the shared-disk cluster configuration, albeit with a slight variation: the clustered hosts access the shared disks directly, without a physical disk access management device. This variant is illustrated in Figure 26.4. Disk access management is still critical to the successful operation of the cluster, but it is performed in the application layer rather than embodied in a physical device.

FIGURE 26.4. Shared-disk clustering without access management.

Single-application clusters that access shared disks rely directly upon database management or other software to coordinate access to disks and data.

Shared-disk clustering, in general, excels at satisfying I/O-intensive requirements, maximizing aggregate system performance, and load balancing. This approach is often used in conjunction with other mechanisms that provide auto-recovery from failures within the cluster.

Shared-disk clusters, in general, are the least scalable of all clusters. Their limiting factor is the coordination of sharing disks and, more importantly, their contents. The coordination chore becomes increasingly complex as more computers are added to the cluster. Another limitation is that it may not be possible to share disks over great distances. Technologies exist that can network geographically dispersed disk drives for sharing, but they may be expensive to implement and operate. Geographic dispersion of the data also complicates and increases the time required to coordinate access to the data.
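To make the coordination chore concrete, the following sketch shows, in Python, the kind of application-layer access manager implied by Figure 26.4: before a host reads or modifies shared data, it must be granted a lock by a coordinator that every host in the cluster consults. This is a minimal, single-process illustration of the idea only; the class and resource names are invented, and a real clustering product would implement the equivalent function as a distributed lock or disk access manager.

    import threading

    class AccessManager:
        """Grants exclusive access to named resources on the shared disks."""

        def __init__(self):
            self._locks = {}                 # resource name -> owning host
            self._guard = threading.Lock()

        def acquire(self, resource, host):
            with self._guard:
                owner = self._locks.get(resource)
                if owner is None:
                    self._locks[resource] = host
                    return True              # host may now read or modify the resource
                return owner == host         # already held by this host?

        def release(self, resource, host):
            with self._guard:
                if self._locks.get(resource) == host:
                    del self._locks[resource]

    # Example: two clustered hosts contend for the same customer record.
    mgr = AccessManager()
    print(mgr.acquire("customers/1042", "hostA"))   # True  -- granted
    print(mgr.acquire("customers/1042", "hostB"))   # False -- must wait or retry
    mgr.release("customers/1042", "hostA")
    print(mgr.acquire("customers/1042", "hostB"))   # True  -- now granted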

Shared-Nothing Clustering

The second cluster architecture is known as a shared-nothing cluster. This type of cluster, despite its somewhat oxymoronic name, is more scalable and has greater potential for delivering fault tolerance and auto-recovery from failures than shared-disk clusters. To achieve scalability and availability, shared-nothing clusters may compromise application performance. This type of cluster can be built in two major variations: close proximity and geographically dispersed. Figure 26.5 presents the close proximity form of a shared-nothing cluster. The geographically dispersed version is illustrated in Figure 26.6.

FIGURE 26.5. Shared-nothing clustering.

Shared-nothing clusters eliminate shared drives as a single point of failure. This degree of redundancy increases both the cost and complexity of building and operating a cluster.

The shared-nothing approach to clustering eliminates the scalability problems faced by shared-disk clusters. Each computer in the cluster has its own disk(s). Whether a client request for data can be resolved locally or requires an I/O request to another host in the cluster depends on how the application and its database subsystem were designed and implemented.

Shared-nothing clustering also enables clusters to be distributed geographically, which takes availability to an extreme by creating options for disaster recovery.

Shared-nothing clusters can also be dispersed geographically. This provides a degree of availability that is impossible to achieve with any system, clustered or not, that is wholly contained in a single location. It is also possible for the hosts in this figure to be clusters, rather than individual computers.

The flexibility of a geographically distributed cluster comes at the cost of performance. The WAN facilities will, almost certainly, be slower than any LAN technology. Attempts to use geographically dispersed clustered hosts for load balancing of a single application are doomed, as the participating hosts may need to ship I/O requests to other computers within the cluster. For example, if Host A in Figure 26.6 receives a request from a client for data that resides on Host B's disk, it must ship the I/O request across the relatively low-bandwidth wide area network to Host B for fulfillment. This is a substantially slower process than retrieving data from a local disk drive.

This negative performance delta is probably sufficient to warrant limiting this clustered arrangement to batch processing, disaster recovery, or other applications that are tolerant of long waits for I/O requests.
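The Host A/Host B decision just described can be reduced to a few lines of routing logic. The Python sketch below assumes a static table mapping data partitions to the hosts that own them; if the data is local, the request is served from the local disk, otherwise the I/O request is shipped to the owning host over the WAN. The table, host names, and helper functions are illustrative assumptions, not part of any particular product.

    # Which host owns which data partition (an assumed, static mapping).
    OWNER = {"orders": "hostA", "inventory": "hostB"}

    LOCAL_HOST = "hostA"

    def read_local(partition, key):
        # Placeholder for a local disk read -- the fast path.
        return f"{partition}:{key} (read from local disk on {LOCAL_HOST})"

    def ship_request(owner, partition, key):
        # Placeholder for sending the I/O request across the WAN and
        # waiting for the reply -- the slow path described in the text.
        return f"{partition}:{key} (shipped over WAN to {owner})"

    def handle_client_request(partition, key):
        owner = OWNER[partition]
        if owner == LOCAL_HOST:
            return read_local(partition, key)
        return ship_request(owner, partition, key)

    print(handle_client_request("orders", 7))       # fast: local I/O
    print(handle_client_request("inventory", 7))    # slow: remote I/O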

FIGURE 26.6. Geographically dispersed shared-nothing clustering.

Topological Variations

Other topological variations on the shared-disk or shared-nothing architectures can be developed by customizing clusters for specific purposes. For example, contemporary clustering products typically focus on providing specific functionality, such as scalability, fault tolerance, failure recovery, performance, and so on. This functionality is provided by varying either the shared-disk or shared-nothing architecture, or even by offering a combination of the two. Although numerous subtle differences exist, the key difference between the various cluster products lies in the specialized software that coordinates activity between the clustered hosts.

Fail-Over Clustering

There are numerous specialized, high-availability clustering products designed for automatic failure recovery. These are known as fail-over systems. In a fail-over configuration, two or more computers (or clusters, for that matter!) serve as functional backups for each other. If one should fail, the other automatically takes over the processing normally performed by the failed system, thus eliminating downtime. Needless to say, fail-over clusters are highly desirable for supporting mission-critical applications. Figure 26.7 presents the basic fail-over cluster topology.

FIGURE 26.7. Fail-over clustering.

The shared-disk fail-over cluster typically requires an extra, dedicated high-performance network that is used solely for communication and coordination between the clustered hosts.

Traditionally, building a fault tolerant system meant buying two complete systems. One would be used to actively support the application while the second sat almost unused, except for keeping its copy of the application software and data up to date. In the event of a failure, the idle system would be pressed into service, buying time for the stricken host to be repaired. This approach tended to be fairly expensive and, depending upon how it was implemented, could still result in downtime and/or the loss of data.

Fail-over clusters take a more active approach to redundancy. The redundant hosts continuously monitor each other's health and have contingencies in place that will allow the cluster to recover from a failed host almost immediately. This monitoring mechanism also permits system architects to establish some load balancing between the hosts, provided that CPU utilization remains below 50 percent on all hosts.
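The mutual health monitoring described above is usually implemented as a simple heartbeat exchange over a dedicated interconnect. The following Python sketch shows one plausible form of it, using UDP datagrams: each host periodically announces that it is alive and declares its partner failed after several consecutive missed heartbeats. The address, port, interval, and miss threshold are arbitrary assumptions for illustration, not values taken from any particular product.

    import socket
    import time

    PEER_ADDR = ("192.0.2.11", 9999)    # assumed address of the partner host
    INTERVAL = 1.0                      # seconds between heartbeats
    MISS_LIMIT = 3                      # missed beats before declaring failure

    def monitor_peer():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", 9999))           # listen for the peer's heartbeats
        sock.settimeout(INTERVAL)
        missed = 0
        while missed < MISS_LIMIT:
            sock.sendto(b"alive", PEER_ADDR)        # our own heartbeat
            try:
                sock.recvfrom(64)                   # any datagram from the peer counts
                missed = 0
                time.sleep(INTERVAL)                # pace the exchange
            except socket.timeout:
                missed += 1                         # no heartbeat within the interval
        # The peer is presumed dead: begin taking over its workload here.
        print("peer failed -- initiating fail-over")

    if __name__ == "__main__":
        monitor_peer()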

Regardless of how they are implemented, fail-over clusters can be designed to provide high levels of availability without high levels of hardware inactivity.

Scalable Clustering

Designing clusters for scalability, too, has direct implications on the cluster's topology and functionality. Commercial clustering products that emphasize scalability tend to have stronger cluster management software. They also are more aggressive at load balancing than fail-over products. This requires all clustered hosts to have equal access to all data, regardless of where the data resides, or how that access is provided.


NOTE: Highly scalable clusters absolutely require the elimination of systemic performance bottlenecks. Given the current combinations of technologies in any given "system," slight mismatches are inevitable. The impacts of these mismatches are magnified by scale. The most obvious bottleneck component is I/O. Therefore, highly scalable clusters made from low-end processors will remain unattainable until technological advances close the gap between I/O speeds and processor speeds.

Despite the availability of commercial products, designing scalable clusters can be difficult. The biggest trap awaiting anyone designing a scalable cluster is compromising aggregate system performance for the sake of future scalability. Essentially, the cluster must be designed so that managing access to shared disks and data is not compromised by increases in usage volumes, or by the cluster growth that should follow any such increases. Given that managing disk and file access becomes more complex as additional computers are added to the cluster, one easy way to avoid this trap is to build the cluster using expandable (that is, not fully configured) SMPs. This enables the entire cluster, regardless of architecture, to scale upward by simply adding microprocessors to the existing SMPs.

While this solution may seem somewhat glib, consider the architectural alternatives and their risks. Servers in a poorly designed shared-disk cluster spend an unacceptable amount of time negotiating for access to disk files as the cluster grows. Using a shared-nothing architecture may provide erratic performance, as perceived by the clients: requests for I/O that can be satisfied locally are typically fulfilled very quickly, whereas requests for I/O that must be shipped to other servers in the cluster take considerably longer to fulfill.

Another alternative may be available if the system or application being clustered lends itself to task separation. In such cases, the cluster may be designed so that certain hosts have primary responsibility for specific functions. If this task separation enables a similar separation of data, this type of cluster is best implemented with a shared-nothing architecture, as (under normal circumstances) most requests for I/O would not have to be shipped to a different host. Applications that offer task separation, but not data separation, would probably perform best in a shared-disk cluster.
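A minimal sketch of the task-separation idea follows, assuming a hypothetical three-host cluster in which each host has primary responsibility for one class of work. Requests are dispatched by function so that, under normal circumstances, each host touches only its own data; the function names and host names are invented for the example.

    # Assumed assignment of functions to primary hosts.
    PRIMARY = {
        "billing": "hostA",
        "reporting": "hostB",
        "order_entry": "hostC",
    }

    def dispatch(function, payload):
        host = PRIMARY.get(function)
        if host is None:
            raise ValueError(f"no host is responsible for {function!r}")
        # In a real cluster this would forward the request to the primary host;
        # here we simply show where it would go.
        return f"{function} request routed to {host}: {payload}"

    print(dispatch("billing", {"account": 1042}))
    print(dispatch("reporting", {"month": "1997-06"}))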

Properly designed, scalable clusters offer numerous benefits. Technological change, upgrades, and even maintenance can occur without disrupting service. This genre of cluster tends to have a greater need for high-speed I/O and access to shared storage devices.

Multitiered Clustering

Clusters can also be designed to satisfy numerous other objectives and can be implemented in combinations of the illustrated models. For example, multiple logical tiers of clustering functionality can be added through commercially available software without altering the physical cluster topology. Relational database management software, transaction processing management software, queuing management software, and so on, usually contain some provisions for either fail-over or load balancing.

A physically distinct topological variation of the multitiered cluster can best be described as a client/server/server. As Figure 26.8 demonstrates, a cluster of application servers can share access to a server that "owns" the data and database subsystem. This configuration enables one class of machines to focus on database management and another to be dedicated to performing application work.

Using a two-tiered server model physically decouples the application from the data, while permitting scalable growth of the application and its host. The small, private FDDI ring that interconnects the three servers is used to segregate inbound traffic from I/O requests. This network may also be used for interhost communications, if a fail-over mechanism was installed.

Given that each type of work imposes different requirements on its host, this arrangement offers system architects the ability to customize each server's configuration according to its specialized function. For example, the application servers can be optimized for either transaction processing or computation, depending upon the nature of the application. Similarly, the data server would be equipped with high-speed I/O capabilities and large disks.
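The division of labor just described can be sketched as follows. In this simplified Python illustration, the application servers perform the application work themselves but delegate every database query and update to the single data server; in a real cluster the delegation would travel over the private interconnect shown in Figure 26.8. The class names, table, and account numbers are invented for illustration.

    class DataServer:
        """Owns the shared database and performs all disk I/O."""

        def __init__(self):
            self._tables = {"accounts": {1042: 150.00, 2001: 75.25}}

        def query(self, table, key):
            return self._tables[table].get(key)

        def update(self, table, key, value):
            self._tables[table][key] = value

    class AppServer:
        """Performs application work; delegates every I/O to the data server."""

        def __init__(self, name, data_server):
            self.name = name
            self.data = data_server

        def post_payment(self, account, amount):
            balance = self.data.query("accounts", account)       # I/O request
            self.data.update("accounts", account, balance - amount)
            return f"{self.name}: account {account} balance is now {balance - amount:.2f}"

    # Two application servers share one data server, as in Figure 26.8.
    db = DataServer()
    app1 = AppServer("appserver1", db)
    app2 = AppServer("appserver2", db)
    print(app1.post_payment(1042, 50.00))
    print(app2.post_payment(1042, 25.00))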

For applications that cannot afford a single point of failure, the cluster's data server may be clustered, too. Thus, two or more fully redundant servers could function interchangeably as the cluster's server for database management.

FIGURE 26.8. Client/server/server clustering.

In Figure 26.9, the single point of failure evident in Figure 26.8 is eliminated by introducing a fail-over cluster in the role of the primary cluster's data server. Depending upon usage volumes, a more robust and/or separate LAN may be required to further segregate "keep alive" communications from I/O requests.

The last variation of a basic cluster topology that this chapter addresses is a form of the remotely distributed, shared-nothing cluster. Properly planned and implemented, this topology can provide application-level disaster recovery, but it requires that the clustered servers, and their storage facilities, meet certain criteria.

FIGURE 26.9. Client/server/server clustering, with internal cluster.

Figure 26.10 depicts a typical disaster recovery cluster.

Using a variation of the remotely distributed, shared-nothing cluster enables system architects to accommodate disaster recovery requirements, without incurring the costs normally associated with fully redundant, emergency backup systems.

FIGURE 26.10. Clustering for disaster recovery.

Summary of Cluster Varieties

This collection of typical clustering configurations, though by no means complete, should adequately convey the degree of flexibility that clustering affords. Topologies and their variations can be mixed and matched and even nested together to accommodate business requirements. Individual computers within a cluster can also be tailored to meet specific performance and functional requirements.

A slightly more subtle purpose of these examples, however, is to demonstrate the extent to which data networking supports clustering. If data networking were a homogeneous quantity, this chapter could end here. Alas, it is not. Networks are almost as varied in their design and implementation as clusters.

Selection of Network Technologies

Given that interhost communications are essential to all clustering configurations, network technologies must be selected to ensure an optimal fit. This requires an understanding of the cluster's mechanics and performance requirements, as well as the various network technologies' performance specifications. Failure to use the right networking tool for the right job directly, and negatively, affects the performance of the cluster.

Network Functional Areas

Clustering may require network connectivity for one or more of three network functional areas: client-to-cluster host connectivity, cluster host-to-cluster host connectivity, and cluster host-to-shared-storage connectivity.

A fourth network functional area, intracomputer communications, also exists; the choice of technology for that function is determined by the computer's manufacturer.

Each network technology must be evaluated relative to the unique demands and constraints of each of these functional capacities.

Client-to-cluster connectivity depends a great deal on where the clients are located relative to the cluster. If, for example, the clients and the cluster are all in the same location, WAN technologies need not be discussed. On the other hand, if the users are geographically dispersed, a combination of LAN and WAN technologies will be required. Fortunately, for the purposes of clustering, this aspect is likely to impose the least stringent network performance requirements.

Cluster host-to-cluster host connectivity also depends on whether or not the hosts are co-located. The performance requirements for this network component depend directly on the nature of the cluster. For example, fail-over clusters will require this to be a high-speed, non-contention-based network that is dedicated solely to interhost communication. This enables the quickest possible identification of a failed host and, subsequently, the quickest auto-recovery.


NOTE: Of all the network functional areas, cluster host-to-cluster host networking has the most impact on the aggregate performance of the cluster. This highly specialized network is best described as a Cluster Area Network (CAN). Technological innovations will soon render this network functional area a semi-internal function of the clustered processors. Such new technologies will bypass the I/O bus and feature direct memory-to-memory connectivity of the clustered nodes.

Scalable clusters, depending upon the actual implementation, typically impose more demanding performance requirements for I/O than they do for intracluster host communications. As a result, a scalable cluster may be able to use the same network used for client-to-cluster host connectivity for all of its interhost communications.

Finally, cluster host-to-shared-storage connections always require high-speed connectivity. The distances between the cluster hosts and the storage devices, as well as the expected number of hosts that comprise the cluster, dictate the choice of network technology.

Network Technologies

Some network and/or bus technologies that can be used in a cluster are 10Mbps Ethernet, 100Mbps Ethernet, FDDI and CDDI, ATM, ESCON, SCSI-II and SCSI-III, and Fibre Channel, as well as numerous other LAN, WAN, and bus technologies. Each has its own strengths, weaknesses, and peculiarities that must be weighed against the clustered system's performance requirements. A quick survey should reveal just how different these network technologies can be, and their strengths and weaknesses should amply demonstrate how critical the selection of a network technology is for each network functional area of a cluster.

Given that all network technologies must abide by the laws of physics, each one represents a different balance struck between speed and distance, varying with the physical media over which it is implemented. Two other differences between network technologies that also serve as key metrics for comparison are sustainable throughput and latency. Both of these are a function of the network's protocols for media access and packet handling. These are the metrics that should be applied to each technology to assess its viability in the cluster configuration.

10Mbps Ethernet

10Mbps Ethernet is an extremely mature and stable technology. As defined in the IEEE's specification 802.3, there are four different physical layer specifications for transmission media. Table 26.1 presents the distance limits and data rates that can be achieved with each of the physical layer specifications.

Table 26.1. Ethernet's distance limitations.

Physical Media Max. Distance Data Rate
10base-2 thin coaxial cable Up to 185 meters 10Mbps
10base-5 thick coaxial cable Up to 500 meters 10Mbps
10base-T unshielded twisted pair Up to 100 meters 10Mbps
10base-FL fiber-optic cable Up to 2000 meters 10Mbps

Ethernet uses a variable-length frame to transmit data; the frame can range from 64 to 1,518 octets, carrying a payload of 46 to 1,500 octets. This is an efficient means of transporting bulk data, because the fixed framing overhead is amortized over larger and larger payloads. The same variability, however, prevents Ethernet from providing consistently low latency for time-sensitive traffic.
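A quick back-of-the-envelope calculation makes the point. The Python sketch below assumes 18 octets of fixed framing overhead (addresses, type/length field, and checksum) and ignores the preamble and interframe gap, so the figures are slightly optimistic; still, it shows how the payload fraction improves as frames grow.

    HEADER_AND_FCS = 18     # destination, source, type/length, and checksum octets
    MIN_PAYLOAD = 46
    MAX_PAYLOAD = 1500

    def frame_efficiency(payload_octets):
        """Fraction of the frame that carries user data (preamble and gaps ignored)."""
        frame = payload_octets + HEADER_AND_FCS
        return payload_octets / frame

    print(f"small frame: {frame_efficiency(MIN_PAYLOAD):.1%} payload")   # roughly 72%
    print(f"large frame: {frame_efficiency(MAX_PAYLOAD):.1%} payload")   # roughly 99%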

For its access method, Ethernet uses Carrier Sense Multiple Access with Collision Detection, affectionately known as CSMA/CD. This is a contention-based media access method: devices connected to the LAN must compete for access to the shared medium. This competition can result in collisions, especially if the network is heavily used, and collided frames must be retransmitted.

The net effect of Ethernet's variable frame sizes and contention for bandwidth is a protocol that is incapable of gracefully accommodating heavy traffic volumes. At utilization rates in excess of 20 percent, performance begins to degrade quickly. Thus, its sustainable throughput is limited to less than 3Mbps. Using switched Ethernet bolsters the performance of this technology by increasing the amount of bandwidth available to switch-connected devices.

100Mbps Ethernet

100Mbps Ethernet (or Fast Ethernet) is a recent extension to the 802.3 specification. It presents a graceful migration path from, and ready interoperability with, its slower sibling. Unfortunately, it also retains all of Ethernet's shortcomings, albeit at a faster transmission rate. Table 26.2 shows the distance limits and data rates for 100Mbps Ethernet.

Table 26.2. Fast Ethernet's distance limitations.

Physical Media Max. Distance Data Rate
62.5 micron multimode fiber-optic cabling (100base-FX) Up to 412 meters 100Mbps
Category 3 unshielded twisted pair (100base-T4) Up to 100 meters 100Mbps
Category 5 unshielded twisted pair (100base-TX) Up to 100 meters 100Mbps

Given an order-of-magnitude-faster clock than its predecessor, Fast Ethernet is capable of sustaining approximately an order of magnitude more throughput, that is, an aggregate of about 20Mbps to 30Mbps, before it begins experiencing performance degradation. Implementing port-switched Fast Ethernet effectively reduces contention on a segment to just two devices: the hub port and the computer that it serves. This means that each switch-connected device can use at least 40Mbps to 60Mbps, rather than competing for that bandwidth with all the other devices, and their respective hub ports, on the LAN.

FDDI and CDDI

FDDI and CDDI are 100Mbps token-passing local area networks that use a ring topology. Network access is highly deterministic as it is governed by a "token" that passes around the FDDI loop. Decreasing network latency is easily accomplished by reducing the size of the ring, that is, the fewer devices connected, the more frequently each device gets the token.

FDDI also features a dual, counter-rotating, ring topology that can "splice" logically to heal a broken cable. The drawback to this self-healing capability is a sudden increase in propagation delay in the event of a cable break. This is a minor price to pay for a network that can auto-recover. Table 26.3 shows the distance limits and data rates for FDDI.

Table 26.3. FDDI and CDDI's distance limitations.

Physical Media Max. Distance Data Rate
62.5 micron multimode fiber-optic cabling 200 total kilometers (100 per "ring") 100Mbps
Category 5 unshielded twisted pair Up to 100 meters 100Mbps
Type 1 shielded twisted pair Up to 100 meters 100Mbps

FDDI, and its wire-based sibling CDDI, are designed to provide high levels of sustainable throughput, approximately 60Mbps to 80Mbps. This is due largely to the regulated media access method.

ATM

Asynchronous Transfer Mode, ATM's proper name, was originally developed as an asynchronous transfer mechanism for the Broadband Integrated Services Digital Network (B-ISDN). It is a high-bandwidth switched protocol that uses a fixed-length, 53-byte cell consisting of 48 bytes of payload and a 5-byte header.

Although ATM is inherently a connection-oriented protocol, mechanisms have been implemented that enable it to carry the connectionless traffic typical of LANs. ATM was initially touted as a grand unifier in the world of networking, capable of seamlessly integrating LANs and WANs. Predictably, ATM was implemented in numerous data rates that were designed specifically for a LAN environment. For example, data rates as low as 25.6Mbps were developed for client connectivity, whereas 155.52Mbps was intended initially as a LAN backbone technology, as well as for servers and high-end client connectivity.

This protocol is, in theory, scalable up to approximately 2.4 gigabits per second, although LAN products are currently only available up to 622Mbps. The norm for host connectivity is the 155.52Mbps interface. Distance limits and data rates for ATM appear in Table 26.4.

Table 26.4. ATM's distance limitations and data rates.

Physical Media Distance Data Rate
Category 3 unshielded twisted pair Up to 100 meters 25.6Mbps
Category 5 unshielded twisted pair Up to 100 meters 155.52Mbps
62.5 micron multimode fiber-optic cabling Up to 2 kilometers 155.52Mbps

Given that ATM is an inherently switched technology, its sustainable throughput should be fairly close to its data rate. ATM also uses a fixed-length cell, which makes it a low-latency protocol, ideally suited to time-sensitive applications. Conversely, its ratio of header overhead to payload is comparatively high. Thus, it might not be as efficient at bulk data transport as a protocol with variable-length frames, yet it operates at a higher data rate than either Fast Ethernet or FDDI.
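The fixed overhead is easy to quantify. The short Python calculation below compares the 5-byte header to the 48-byte payload and estimates the best-case payload rate on a 155.52Mbps interface; it deliberately ignores SONET framing and adaptation-layer overhead, so the real figure would be somewhat lower.

    CELL = 53
    HEADER = 5
    PAYLOAD = CELL - HEADER             # 48 bytes of payload per cell

    LINE_RATE_MBPS = 155.52             # the common host interface rate

    overhead_fraction = HEADER / CELL                   # about 9.4 percent
    payload_rate = LINE_RATE_MBPS * PAYLOAD / CELL      # about 140.9 Mbps

    print(f"per-cell overhead: {overhead_fraction:.1%}")
    print(f"best-case payload rate at {LINE_RATE_MBPS}Mbps: {payload_rate:.1f}Mbps")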

ESCON

ESCON (Enterprise Systems Connection) is an IBM channel technology. It provides roughly 17.1MBps (megabytes per second) of sustainable throughput. Due to its protocol and packet structure, ESCON excels at bulk data transfer. It does not handle short transactions or interactivity well at all. Attempts to use ESCON for a high volume of small transfers result in a premature deterioration of its performance, causing it to fall well short of its potential. The distance limits and data rates for ESCON appear in Table 26.5.

Table 26.5. ESCON's distance limitations.

Physical Media Max. Distance Data Rate
50 micron multimode fiber-optic cabling Up to 3 kilometers 200Mbps
62.5 micron multimode fiber-optic cabling Up to 3 kilometers (9 with repeaters) 200Mbps

SCSI-II

Small Computer Systems Interface, version 2, known as SCSI-II, is a moderately high-bandwidth bus technology. It was designed for peer-to-peer connectivity of peripheral devices and at least one host. Its major limitations are the number of devices that can be connected and the short distance that the bus can span. These limitations make SCSI-II useful only for connecting cluster hosts to storage devices. Table 26.6 shows the distance limits and data rates for SCSI-II.

Table 26.6. SCSI-II distance limitations.

Physical Media Max. Total Distance Data Rate
Ribbon cable (16-bit SCSI-II) 25 meters 10MBps
50-pin shielded cable (16-bit SCSI-II) 25 meters 10MBps
Ribbon cable (32-bit SCSI-II) 25 meters 40MBps
50-pin shielded cable (32-bit SCSI-II) 25 meters 40MBps

For both asynchronous and synchronous transmissions, the actual data rates achieved are a function of aggregate cable length and the particular SCSI implementation. For example, 32-bit SCSI-II buses are capable of transfer speeds of up to 40MB per second, and 16-bit SCSI-II can transfer up to 10MB per second.

Fibre Channel

Fibre Channel was originally developed by IBM as an optical channel technology for mainframes. Its specification provided for a transmission rate of one gigabit per second! Given that mainframes are not likely to support time-sensitive applications like voice and videoconferencing any time soon, flexible-length packets were used.

Fibre Channel has since been implemented as a LAN technology. The physical layer specification for this technology provides for a variety of speed and distance trade-offs over most common transmission media. Table 26.7 shows the distance limits and data rates for Fibre Channel.

Table 26.7. Fibre Channel's distance limitations.

Physical Media Distance Data Rate
9 micron single-mode fiber-optic cabling Up to 10 kilometers 1062.5Mbaud
50 micron multimode fiber-optic cabling Up to 1 kilometer 531.25Mbaud
50 micron multimode fiber-optic cabling Up to 2 kilometers 265.6Mbaud
62.5 micron multimode fiber-optic cabling Up to 500 meters 265.6Mbaud
62.5 micron multimode fiber-optic cabling Up to 1 kilometer 132.8Mbaud
Video coaxial cabling Up to 25 meters 1062.5Mbaud
Video coaxial cabling Up to 50 meters 531.25Mbaud
Video coaxial cabling Up to 75 meters 265.6Mbaud
Video coaxial cabling Up to 100 meters 132.8Mbaud
Miniature coaxial cabling Up to 10 meters 1062.5Mbaud
Miniature coaxial cabling Up to 20 meters 531.25Mbaud
Miniature coaxial cabling Up to 30 meters 265.6Mbaud
Miniature coaxial cabling Up to 40 meters 132.8Mbaud
Shielded twisted pair Up to 50 meters 265.6Mbaud
Shielded twisted pair Up to 100 meters 132.8Mbaud

This technology provides for an automatic scaling back of the clock rate if it begins to experience transmission errors. Thus, the values listed in Table 26.7 should be considered the maximum data rates that can be supported.

Summary of Network Technologies

Once the technologies have been identified for each required networking function, equal due diligence must be paid to their implementation. Networks must be considered a functional extension of the cluster. Thus, it is critical that they be implemented so as to reinforce the intended functionality of the cluster. For example, if the cluster supports a mission-critical application designed for 100 percent availability, the network, too, must be capable of 100 percent availability. This means selecting hardware that supports hot-swapping of circuit boards, protecting all the network's electrical components with uninterruptible power supplies (UPSs), and having fully redundant network paths to all clustered hosts. Otherwise, the cluster may find itself experiencing service outages that it cannot fix with a fail-over.

Similarly, if the cluster is designed for scalability, the network must be designed to scale upwards as gracefully and easily as the cluster. Ideally, the networks would be over-engineered relative to the cluster's initial loads. Thus, the additional ports for connectivity and bandwidth would be available in advance to support the cluster's growth as it materializes.

Another important implementation issue is addressing. Many of the network technologies mentioned are also available in a switched form. Switching is a technique that improves network performance by dividing the LAN into smaller segments, thereby providing greater overall bandwidth across the network. This performance gain brings with it additional cost and complexity in addressing. Is switching necessary and/or desirable in terms of the cluster's projected network requirements?

Switches work like high-speed bridges. They use tables to collate physical ports with addresses. Host naming and addressing must be worked out so that it can support the functionality of the planned cluster. For example, you need to ensure that, when a fail-over occurs, all network routing and switching tables are updated automatically to reflect the failed host's unavailability. Ideally, clients access the cluster using a single mnemonic and do not need to know any specific addresses. Properly planned and implemented, the network automatically routes around failed hosts without the clients' even knowing a failure occurred.
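To illustrate the client's point of view, the following Python sketch assumes that the cluster's single mnemonic name resolves to the addresses of all of its hosts, and that the client simply tries each address until one answers. The addresses, the "alive" set, and the connection helper are stand-ins invented for the example; a production client would use real name resolution and TCP connection attempts.

    # Addresses the cluster name would resolve to (assumed for illustration).
    CLUSTER_ADDRESSES = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

    # Stand-in for hosts that are actually reachable at the moment.
    ALIVE = {"192.0.2.11", "192.0.2.12"}

    def try_connect(address):
        """Placeholder for a real TCP connection attempt."""
        if address not in ALIVE:
            raise ConnectionError(f"{address} did not answer")
        return f"connected to {address}"

    def connect_to_cluster(addresses):
        # The client walks the list until one host answers; it never needs
        # to know which host failed or which one took over its work.
        for address in addresses:
            try:
                return try_connect(address)
            except ConnectionError:
                continue
        raise ConnectionError("no host in the cluster is reachable")

    print(connect_to_cluster(CLUSTER_ADDRESSES))    # succeeds on 192.0.2.11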

Summary

Clustering uniprocessors and SMPs is an exciting aspect of computer parallelism. It can be used to provide both fault tolerance and disaster recovery for mission-critical applications, as easily as it can support graceful scalability. By distributing the processing workload, clustering enables smaller, less expensive computers to approximate the processing power of larger, more expensive computers. Their distributed nature makes clusters absolutely dependent upon data networking technologies. If the wrong network technology is chosen, even in just one of the three identified functional areas, the effect on aggregate cluster performance can be disastrous.

Matching network technologies to functional areas may seem trivial. After all, it is relatively easy to examine network technologies from an academic perspective and to select the ideal candidates for each network component of a planned cluster. This is especially true if the comparison is based on a limited number of obvious criteria, that is, the distance limitations and transmission speeds for various physical media. These are good, tangible criteria for a "paper" evaluation that can be used by anyone with a passing familiarity with data networking.

However, network selection criteria cannot stop there. There are many more subtle, yet significant criteria to consider. For example, is the network's maximum sustainable throughput capable of accommodating the traffic load? Will the network(s) scale upward with the projected growth of the cluster? Are there any hardware and/or software dependencies dictated by your cluster's platform, that is, availability of network interface cards (NICs) and software drivers, that would preclude the use of any network technologies? Is the network capable of living up to the cluster's expected availability rate?

In real life, numerous other variables, many of them subjective and/or non-technical, must also be factored into the selection of network technologies. Existing skill sets, training costs, budgetary constraints, vendor relationships, and so on, are all important constraints that must be considered when selecting technologies.

This may begin to resemble an impossible dilemma and, in fact, that's partially true. There is no single correct answer. Don't despair, the correct answers vary from project to project, and from company to company. To find your correct answers, start with a fundamental appreciation for the business goals and performance requirements of your cluster. Identify the expected intensity with which the cluster's resources, that is, CPU cycles or I/O, will be consumed by each cluster component.

Next, identify all the network technologies that may be appropriate for each network functional area required. Then, use the realities of your particular situation, including the local techno-politics and all the other soft criteria, as the final criteria for matching your short list of network technologies to your cluster's performance requirements.



