THE MOST POWERFUL SUPERCOMPUTERS IN THE WORLD

K COMPUTER SUPERCOMPUTER

The K Computer supercomputer, which previously occupied first place, has been pushed down to third. Its performance is 11.28 Pflops (see Figure 1). Recall that FLOPS (FLoating-point Operations Per Second) is a unit of measurement of computer performance that shows how many floating-point operations per second a given computing system can perform.
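
To give the unit a concrete feel, here is a rough single-core sketch (our own illustration, unrelated to any machine described below) that estimates a flops rate by timing a loop of floating-point multiply-adds; the iteration count and constants are arbitrary:

```c
#include <stdio.h>
#include <time.h>

int main(void) {
    const long n = 200000000L;              /* 2e8 iterations, 2 flops each */
    double x = 0.0, y = 1.000000001;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        x = x * y + 1.0;                    /* one multiply + one add = 2 flops */
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("x = %f (printed so the loop is not optimized away)\n", x);
    printf("roughly %.1f Mflops on one core\n", 2.0 * n / seconds / 1e6);
    return 0;
}
```

On an ordinary desktop core this typically prints somewhere from hundreds to a few thousand Mflops, which puts the petaflops figures quoted in this article into perspective.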

K Computer is a joint development of RIKEN (the Institute of Physical and Chemical Research) and Fujitsu. It was created as part of the High-Performance Computing Infrastructure initiative led by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT). The supercomputer is installed at the RIKEN Advanced Institute for Computational Science in the Japanese city of Kobe.

The supercomputer is based on a distributed memory architecture. The system consists of more than 80,000 compute nodes housed in 864 racks, each of which accommodates 96 compute nodes and 6 I/O nodes. The nodes, each containing one processor and 16 GB of RAM, are interconnected according to a six-dimensional mesh/torus topology. The system uses a total of 88,128 eight-core SPARC64 VIIIfx processors (705,024 cores) manufactured by Fujitsu using 45 nm technology.

This general purpose supercomputer provides high levels of performance and support for a wide range of applications. The system is used to conduct research in the fields of climate change, disaster prevention and medicine.

The unique water cooling system reduces the likelihood of equipment failure and reduces overall energy consumption. Energy savings are achieved through the use of highly efficient equipment, a heat and electricity cogeneration system and an array of solar panels. In addition, the mechanism for reusing waste water from the cooler reduces the negative impact on the environment.

The building in which K Computer is located is earthquake-resistant and can withstand tremors of intensity 6 or higher on the Japanese seismic intensity scale (0-7). To accommodate equipment racks and cables more efficiently, the third floor, measuring 50 × 60 m, is completely free of load-bearing columns. Modern construction technologies have made it possible to ensure an acceptable floor load (up to 1 t/m2) for the installation of racks, which can weigh as much as 1.5 tons.

SEQUOIA SUPERCOMPUTER

The Sequoia supercomputer, installed at Lawrence Livermore National Laboratory, has a performance of 16.32 Pflops and ranks second (see Figure 2).

This petaflop supercomputer, developed by IBM based on Blue Gene/Q, was created for the US National Nuclear Security Administration (NNSA) as part of the Advanced Simulation and Computing program.

The system consists of 96 racks and 98,304 compute nodes (1024 nodes per rack). Each node includes a 16-core PowerPC A2 processor and 16 GB of DDR3 RAM. In total, 1,572,864 processor cores and 1.6 PB of memory are used. The nodes are connected to each other in accordance with the “five-dimensional torus” topology. The area occupied by the system is 280 m2. Total energy consumption is 7.9 MW.

The Sequoia supercomputer was the first in the world to carry out scientific calculations that required more than 10 Pflops of computing power. Thus, the HACC cosmology simulation system required about 14 Pflops when running in the 3.6 trillion particle mode, and when running the Cardiod project code for simulating the electrophysiology of the human heart, performance reached almost 12 Pflops.

TITAN SUPERCOMPUTER

The Titan supercomputer, installed at the Oak Ridge National Laboratory (ORNL) in the USA, was recognized as the world's fastest supercomputer. In Linpack benchmark tests, its performance was 17.59 Pflops.

Titan implements a hybrid CPU-GPU architecture (see Figure 3). The system consists of 18,688 nodes, each equipped with a 16-core AMD Opteron processor and an Nvidia Tesla K20X graphics accelerator, for a total of 560,640 processor cores. Titan is an update to ORNL's previously operated Jaguar supercomputer and occupies the same server cabinets (total area of 404 m2).

The ability to use existing power and cooling systems saved approximately $20 million during construction. The power consumption of the supercomputer is 8.2 MW, which is 1.2 MW more than the Jaguar, while its performance in floating point operations is almost 10 times higher.

Titan will primarily be used to conduct research in materials science and nuclear energy, as well as research related to improving the efficiency of internal combustion engines. In addition, it will be used to model climate change and analyze potential strategies to address its negative impacts.

THE "GREENEST" SUPERCOMPUTER

In addition to the Top500 rating, which identifies the highest-performance systems, there is the Green500 rating, which recognizes the "greenest" supercomputers on the basis of their energy efficiency (Mflops/W). At the time of writing (the latest release of the rating is from November 2012), the leader of the Green500 is the Beacon supercomputer (253rd place in the Top500), with an energy efficiency of 2,499 Mflops/W.

Beacon combines Intel Xeon Phi 5110P coprocessors with Intel Xeon E5-2670 processors; its peak performance reaches 112,200 Gflops with a total power consumption of 44.9 kW. The Xeon Phi 5110P coprocessors provide high performance at low power: each delivers 1 Tflops of double-precision performance and carries 8 GB of GDDR5 memory with 320 GB/s of bandwidth.
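
The Green500 figure quoted above can be reproduced directly from the numbers in this paragraph; the short check below (our own arithmetic, not vendor code) simply divides peak performance by total power:

```c
#include <stdio.h>

int main(void) {
    double peak_gflops = 112200.0;   /* Beacon peak performance, from the text */
    double power_kw    = 44.9;       /* Beacon total power draw, from the text */
    /* convert Gflops -> Mflops and kW -> W, then divide */
    double mflops_per_watt = (peak_gflops * 1000.0) / (power_kw * 1000.0);
    printf("%.0f Mflops/W\n", mflops_per_watt);   /* prints ~2499 */
    return 0;
}
```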

The Xeon Phi 5110P's passive cooling system is rated at 225W TDP, which is ideal for high-density servers.

SUPERCOMPUTER EURORA

However, in February 2013, reports emerged that the Eurora supercomputer, located in Bologna (Italy), surpassed Beacon in energy efficiency (3,150 Mflops/W versus 2,499 Mflops/W).

Eurora was built by Eurotech and consists of 64 nodes, each of which includes two Intel Xeon E5-2687W processors, two Nvidia Tesla K20 GPU accelerators and other hardware. Each node is no larger than a laptop, yet delivers roughly 30 times the performance while consuming 15 times less power.

High energy efficiency in Eurora is achieved through a combination of several technologies, of which water cooling makes the greatest contribution. Each supercomputer node is a kind of sandwich: the main electronics at the bottom, a water heat exchanger in the middle, and another electronics board on top (see Figure 4).

Such results are achieved thanks to materials with good thermal conductivity and an extensive network of cooling channels. When a new computing module is installed, its channels connect to those of the cooling system, which makes it possible to change the supercomputer's configuration to suit specific needs. According to the manufacturer, the risk of leaks is eliminated.

The Eurora supercomputer's components are powered from 48 V DC supplies, whose introduction has reduced the number of power conversions. Finally, the warm water removed from the computing equipment can be reused for other purposes.

CONCLUSION

The supercomputer industry is developing rapidly and keeps setting new records for performance and energy efficiency. It is worth noting that liquid cooling and 3D modeling are used in this industry as nowhere else, since specialists face the task of assembling a super-powerful computing system that can operate in a limited volume with minimal energy losses.

Yuri Khomutsky is Chief Project Engineer at I-Teco. He can be contacted at [email protected]. The article uses materials from the Internet portal about data centers "www.AboutDC.ru - Solutions for Data Centers".


Supercomputers

Andrey Borzenko

Supercomputers are the fastest computers. Their main difference from mainframes is the following: all the resources of such a computer are usually aimed at solving one or at least several tasks as quickly as possible, while mainframes, as a rule, perform a fairly large number of tasks that compete with each other. The rapid development of the computer industry determines the relativity of the basic concept - what ten years ago could be called a supercomputer, today no longer falls under this definition. There is also a humorous definition of a supercomputer: it is a device that reduces the computing problem to an input-output problem. However, there is some truth in it: often the only bottleneck in a high-speed system is the I/O devices. You can find out which supercomputers currently have the maximum performance from the official list of the five hundred most powerful systems in the world - Top500 (http://www.top500.org), which is published twice a year.

In any computer, all the main parameters are closely related. It is difficult to imagine a universal computer that has high performance and scanty RAM, or huge RAM and a small disk space. For this reason, supercomputers are currently characterized not only by maximum performance, but also by the maximum amount of RAM and disk memory. Providing such technical characteristics is quite expensive - the cost of supercomputers is extremely high. What tasks are so important that they require systems costing tens or hundreds of millions of dollars? As a rule, these are fundamental scientific or engineering computing problems with a wide range of applications, the effective solution of which is possible only with the availability of powerful computing resources. Here are just a few areas where this type of problem arises:

  • predictions of weather, climate and global changes in the atmosphere;
  • materials science;
  • construction of semiconductor devices;
  • superconductivity;
  • structural biology;
  • development of pharmaceuticals;
  • human genetics;
  • quantum chromodynamics;
  • astronomy;
  • automotive industry;
  • transport tasks;
  • hydro- and gas dynamics;
  • controlled thermonuclear fusion;
  • efficiency of fuel combustion systems;
  • oil and gas exploration;
  • computational problems in ocean sciences;
  • speech recognition and synthesis;
  • image recognition.

Supercomputers calculate very quickly thanks not only to the most modern element base but also to new solutions in system architecture. The main place here is occupied by the principle of parallel data processing, which embodies the idea of simultaneous (parallel) execution of several actions. Parallel processing has two varieties: pipelining and actual parallelism. The essence of pipelining is to split a general operation into separate stages; each stage, having completed its work, passes its result to the next one while simultaneously accepting a new portion of input data. The gain in processing speed comes from overlapping operations that previously had to run one after another.

If a certain device performs one operation per unit of time, then it will perform a thousand operations in a thousand units. If there are five identical independent devices capable of operating simultaneously, then a system of five devices can perform the same thousand operations not in a thousand, but in two hundred units of time. Similarly, a system of N devices will perform the same work in 1000/N units of time.
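
A tiny sketch of this idealized scaling (our own illustration; real systems fall short of the 1000/N figure because of communication overhead and serial code sections):

```c
#include <stdio.h>

int main(void) {
    const double ops = 1000.0;             /* the thousand operations from the text */
    int devices[] = {1, 5, 10, 100};       /* example device counts */
    for (int i = 0; i < 4; i++)
        printf("%3d device(s): %7.1f time units\n",
               devices[i], ops / devices[i]);   /* 1000/N units of time */
    return 0;
}
```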

Of course, today few people are surprised by parallelism in computer architecture. All modern microprocessors use some form of parallel processing, even within the same chip. At the same time, these ideas themselves appeared a very long time ago. Initially, they were implemented in the most advanced, and therefore single, computers of their time. Here, special credit goes to IBM and Control Data Corporation (CDC). We are talking about such innovations as bit-parallel memory, bit-parallel arithmetic, independent input-output processors, command pipeline, pipeline independent functional units, etc.

Usually the word "supercomputer" is associated with Cray computers, although today this is far from the whole picture. The developer and chief designer of the first supercomputer was Seymour Cray, one of the most legendary figures in the computer industry. In 1972, he left CDC and founded his own company, Cray Research. The first supercomputer, CRAY-1, was developed four years later (in 1976) and had a vector-pipeline architecture with 12 pipelined functional units. The Cray-1's peak performance was 160 Mflops (with a 12.5 ns clock), and its memory of 64-bit words (expandable to 1 Mword, i.e. 8 MB) had a cycle time of 50 ns. The main innovation was, of course, the introduction of vector instructions that work with entire arrays of independent data and allow efficient use of the pipelined functional units.
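
To show what a vector instruction set operates on, here is the classic kind of loop (a generic "axpy" written by us for illustration, not Cray code) whose independent iterations a vector machine, or a modern auto-vectorizing compiler, can process as whole-array operations:

```c
#include <stdio.h>

/* y = a*x + y over whole arrays: every iteration is independent of the
   others, which is exactly the pattern vector instructions exploit by
   streaming many elements through pipelined functional units at once. */
void axpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    double y[8] = {0};
    axpy(8, 2.0, x, y);
    printf("y[7] = %f\n", y[7]);   /* prints 16.0 */
    return 0;
}
```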

Throughout the 1960s-1980s, the attention of the world's leading supercomputer manufacturers was focused on producing computing systems that were good at solving large-volume floating-point problems. There was no shortage of such tasks: almost all were related to nuclear research and aerospace modeling and were carried out in the interests of the military. The desire to achieve maximum performance in the shortest possible time meant that the criterion for assessing the quality of a system was not its price but its performance. For example, the Cray-1 supercomputer then cost from 4 to 11 million dollars, depending on the configuration.

At the turn of the 1980s and 1990s, the Cold War ended and military orders were replaced by commercial ones. By that time, the industry had made great strides in the production of off-the-shelf processors, which had roughly the same computing power as custom ones but were significantly cheaper. The use of standard components and a variable number of processors made it possible to solve the scalability problem: as the computing load grew, the performance of the supercomputer and its peripheral devices could be increased by adding new processors and I/O devices. Thus, in 1990 the Intel iPSC/860 supercomputer appeared with 128 processors and showed a performance of 2.6 Gflops on the LINPACK test.

Last November, the 18th edition of the list of the 500 most powerful computers in the world, the Top500, was published. The leader of the list is still IBM Corporation (http://www.ibm.com), which accounts for 32% of installed systems and 37% of total performance. Interesting news was the appearance of Hewlett-Packard in second place in terms of the number of systems (30%); since all these systems are relatively small, their total performance is only 15% of the entire list. Following the merger with Compaq, the new company is expected to dominate the list. Next in terms of the number of computers on the list are SGI, Cray and Sun Microsystems.

The most powerful supercomputer in the world was still the ASCI White system (we will return to it later), installed at the Livermore Laboratory (USA) and showing a performance of 7.2 Tflops on the LINPACK test (58% of peak performance). In second place was the Compaq AlphaServer SC system installed at the Pittsburgh Supercomputing Center with a performance of 4 Tflops. The Cray T3E system closes the list with LINPACK performance of 94 Gflops.

It is worth noting that the list already included 16 systems with a performance of more than 1 teraflops, half of which were installed by IBM. The number of systems that are clusters of small SMP blocks is steadily increasing - there are now 43 such systems on the list. However, the majority of the list is still for massively parallel systems (50%), followed by clusters consisting of large SMP systems (29%).

Types of architectures

The main parameter for classifying parallel computers is the presence of shared or distributed memory. Something in between are architectures where memory is physically distributed but logically shared. From a hardware point of view, two main schemes suggest themselves for implementing parallel systems. The first is several separate systems, with local memory and processors, interacting in some environment by sending messages. The second is systems that interact through shared memory. Without going into technical details for now, let's say a few words about the types of architectures of modern supercomputers.

The idea of ​​massively parallel systems with distributed memory (Massively Parallel Processing, MPP) is quite simple. For this purpose, ordinary microprocessors are taken, each of which is equipped with its own local memory and connected through some kind of switching medium. There are many advantages to such an architecture. If you need high performance, you can add more processors, and if finances are limited or the required computing power is known in advance, then it is easy to select the optimal configuration. However, MPP also has disadvantages. The fact is that interaction between processors is much slower than data processing by the processors themselves.

In parallel computers with shared memory, all the RAM is shared among several identical processors. This removes the problems of the previous class, but adds new ones. The fact is that the number of processors with access to shared memory cannot be made large for purely technical reasons.

The main features of vector-pipeline computers are, of course, pipeline functional units and a set of vector instructions. Unlike the traditional approach, vector commands operate on entire arrays of independent data, which allows efficient loading of available pipelines.

The last direction, strictly speaking, is not independent, but rather a combination of the previous three. A computing node is formed from several processors (traditional or vector-pipeline) and their common memory. If the resulting computing power is not enough, several nodes are combined with high-speed channels. As is well known, such an architecture is called a cluster.

MPP systems

Massively parallel scalable systems are designed to solve application problems that require a large amount of computing and data processing. Let's take a closer look at them. As a rule, they consist of homogeneous computing nodes, including:

  • one or more central processing units;
  • local memory (direct access to the memory of other nodes is not possible);
  • communications processor or network adapter;
  • sometimes hard drives and/or other input/output devices.

In addition, special I/O nodes and control nodes can be added to the system. They are all connected through some communication medium (high-speed network, switch, etc.). As for the OS, there are two options. In the first case, a full-fledged OS runs only on the control machine, while each node runs a greatly reduced version of the OS, providing only the operation of the branch of the parallel application located in it. In another case, each node runs a full-fledged UNIX-like OS.

The number of processors in distributed memory systems is theoretically unlimited. Using such architectures, it is possible to build scalable systems whose performance increases linearly with the number of processors. By the way, the term “massively parallel systems” itself is usually used to refer to such scalable computers with a large number (tens and hundreds) of nodes. Scalability of a computing system is necessary to proportionally speed up calculations, but, alas, it is not enough. To obtain an adequate gain in solving a problem, a scalable algorithm is also required that can load all the processors of a supercomputer with useful calculations.

Let us recall that there are two models of program execution on multiprocessor systems: SIMD (single instruction stream - multiple data streams) and MIMD (multiple instruction streams - multiple data streams). The first assumes that all processors execute the same command, but each on its own data. In the second, each processor processes its own command stream.

In systems with distributed memory, to transfer information from processor to processor, a mechanism for passing messages over a network connecting computing nodes is required. To abstract from the details of the functioning of communication equipment and program at a high level, message passing libraries are usually used.
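
As a concrete illustration, the minimal sketch below (our own example; the buffer contents and process count are arbitrary) uses MPI, the most widely used such library, to send an array from one process to another:

```c
#include <mpi.h>
#include <stdio.h>

/* Compile with mpicc, run with: mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    int rank;
    double buf[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* process 0 sends its local data to process 1 */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* process 1 receives the message into its own local memory */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %.1f %.1f %.1f %.1f\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}
```

The explicit send/receive pair is what hides behind the phrase "message passing": no process ever touches another node's memory directly.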

Intel Supercomputers

Intel Corporation (http://www.intel.com) is well known in the world of supercomputers. Its distributed-memory Paragon multiprocessor computers have become as classic as Cray Research's vector-pipeline computers.

Intel Paragon uses five i860 XP processors with a clock frequency of 50 MHz in one node. Sometimes processors of different types are placed in one node: scalar, vector and communication. The latter serves to relieve the main processor from performing operations related to message transmission.

The most significant characteristic of the new parallel architecture is the type of communication equipment. The two most important indicators of a supercomputer’s operation depend on it: the speed of data transfer between processors and the overhead of transmitting one message.

The interconnect is designed to provide high messaging speeds with minimal latency. It provides the connection of more than a thousand heterogeneous nodes along a two-dimensional rectangular lattice topology. However, for most application development, any node can be considered to be directly connected to all other nodes. The interconnect is scalable: its throughput increases with the number of nodes. When designing, the developers sought to minimize the participation in message transmission of those processors that execute user processes. For this purpose, special message processing processors have been introduced, which are located on the node board and are responsible for processing the messaging protocol. As a result, the main processors of the nodes are not distracted from solving the problem. In particular, there is no expensive switching from task to task, and the solution of applied problems occurs in parallel with the exchange of messages.

The actual transmission of messages is carried out by a routing system based on the components of the router of network nodes (Mesh Router Components, MRC). For MRC access of a given node to its memory, the node also has a special interface network controller, which is a custom VLSI that provides simultaneous transmission to and from the node’s memory, as well as monitoring errors during message transmission.

The modular design of Intel Paragon does more than just support scalability. It allows us to count on the fact that this architecture will serve as the basis for new computers based on other microprocessors or using new messaging technologies. Scalability also relies on balancing the various blocks of a supercomputer at a variety of levels; otherwise, as the number of nodes increases, a bottleneck may appear somewhere in the system. Thus, the speed and memory capacity of the nodes are balanced with the bandwidth and latency of the interconnect, and the performance of the processors inside the nodes is balanced with the bandwidth of the cache memory and RAM, etc.

Until recently, one of the fastest computers was Intel ASCI Red - the brainchild of the Accelerated Strategic Computing Initiative ASCI (Accelerated Strategic Computing Initiative). The three largest US national laboratories (Livermore, Los Alamos and Sandia) participate in this program. Built for the US Department of Energy in 1997, ASCI Red combines 9152 Pentium Pro processors, has 600 GB of total RAM and a total performance of 1800 billion operations per second.

IBM supercomputers

When universal systems with scalable parallel architecture SP (Scalable POWER parallel) from IBM Corporation (http://www.ibm.com) appeared on the computer market, they quickly gained popularity. Today, such systems operate in a variety of application areas, such as computational chemistry, accident analysis, electronic circuit design, seismic analysis, reservoir modeling, decision support, data analytics, and online transaction processing. The success of SP systems is determined primarily by their versatility, as well as the flexibility of the architecture, based on a distributed memory model with message passing.

Generally speaking, an SP supercomputer is a scalable, massively parallel general-purpose computing system consisting of a set of RS/6000 base stations connected by a high-performance switch. Indeed, who doesn’t know, for example, the supercomputer Deep Blue, which managed to beat Garry Kasparov at chess? But one of its modifications consists of 32 nodes (IBM RS/6000 SP), based on 256 P2SC (Power Two Super Chip) processors.

The RS/6000 family is IBM's second generation of computers based on the reduced instruction set computing (RISC) architecture developed by the corporation in the late 1970s. With this concept, a very simple set of commands is used to do all the work in a computer system. Because the commands are simple, they can be executed at very high speed and also allow a more efficient implementation of the executable program. The RS/6000 family is based on the POWER architecture (Performance Optimization With Enhanced RISC) and its derivatives: PowerPC, P2SC, POWER3, etc. Because the POWER architecture combines RISC concepts with some more traditional ones, the result is a system with good overall performance.

The RS/6000 SP system provides the power of multiple processors for solving the most complex computing problems. The SP switching system is IBM's innovation in high-bandwidth, low-latency interprocessor communication for efficient parallel computing. Several types of processor nodes, variable frame (rack) sizes and a variety of additional I/O capabilities make it possible to select the most suitable system configuration. SP is supported by leading software vendors in areas such as parallel databases and real-time transaction processing, as well as by major technical software vendors in areas such as seismic processing and engineering design.

IBM RS/6000 SP enhances application capabilities with parallel processing. The system removes performance limitations and helps avoid problems associated with scaling and the presence of indivisible, separately executed fragments. With over a thousand customers installed worldwide, SPs provide solutions for complex and high-volume technical and commercial applications.

The SP main unit is a processor node that has an RS/6000 workstation architecture. There are several types of SP nodes: Thin, Wide, High, differing in a number of technical parameters. For example, High nodes based on POWER3-II include up to 16 processors and up to 64 GB of memory, but Thin nodes allow no more than 4 processors and 16 GB of memory.

The system is scalable up to 512 nodes, and it is possible to combine different types of nodes. The nodes are installed in racks (up to 16 nodes in each). SP can scale disks almost linearly along with processors and memory, allowing true access to terabytes of memory. This increase in power makes it easier to build and expand the system.

The nodes are interconnected by a high-performance switch (IBM high-performance switch), which has a multi-stage structure and operates with packet switching.

Each SP node runs a full AIX operating system, allowing you to leverage thousands of pre-existing AIX applications. In addition, system nodes can be combined into groups. For example, several nodes can act as Lotus Notes servers, while all the others process a parallel database.

Managing large systems is always a challenging task. SP uses a single graphical console for this purpose, which displays hardware and software states, running tasks, and user information. The system administrator, using such a console (control workstation) and the PSSP (Parallel Systems Support Programs) software product attached to the SP, solves management tasks, including managing password protection and user permissions, accounting for performed tasks, print management, system monitoring, launching and turning off the system.

The best

As already noted, according to Top500 (table), the most powerful supercomputer of our time is ASCI White, which occupies an area the size of two basketball courts and is installed at the Livermore National Laboratory. It includes 512 SMP nodes based on 64-bit POWER3-II processors (for a total of 8192 processors) and uses new Colony communications technology with a throughput of approximately 500 MB/s, which is almost four times faster than the SP high-performance switch.

Top ten Top500 (18th edition)

Position | Manufacturer | Computer | Where installed | Country | Year | Number of processors
1 | IBM | ASCI White | Livermore National Laboratory | USA | 2000 | 8192
2 | Compaq | AlphaServer SC | Pittsburgh Supercomputing Center | USA | 2001 | 3024
3 | IBM | SP Power3 | NERSC Energy Research Institute | USA | 2001 | 3328
4 | Intel | ASCI Red | Sandia National Laboratory | USA | 1999 | 9632
5 | IBM | ASCI Blue Pacific | Livermore National Laboratory | USA | 1999 | 5808
6 | Compaq | AlphaServer SC | - | USA | 2001 | 1536
7 | Hitachi | SR8000/MPP | Tokyo University | Japan | 2001 | 1152
8 | SGI | ASCI Blue Mountain | Los Alamos National Laboratory | USA | 1998 | 6144
9 | IBM | SP Power3 | Oceanographic Center NAVOCEANO | USA | 2000 | 1336
10 | IBM | SP Power3 | German weather service | Germany | 2001 | 1280

The architecture of the new supercomputer is based on the proven massively parallel RS/6000 architecture and provides performance of 12.3 teraflops (trillion operations per second). The system includes a total of 8 TB of RAM distributed across 16-processor SMP nodes and 160 TB of disk memory. Delivering the system from IBM laboratories in New York state to Livermore, California, required 28 truck-trailers.

All system nodes run the AIX OS. The supercomputer is being used by US Department of Energy scientists to run complex 3D models to keep nuclear weapons safe. Actually, ASCI White is the third step in ASCI's five-stage program, which plans to create a new supercomputer in 2004. Generally speaking, ASCI White consists of three separate systems, of which White is the largest (512 nodes, 8192 processors), and there is also Ice (28 nodes, 448 processors) and Frost (68 nodes, 1088 processors).

The predecessor of ASCI White was the Blue Pacific supercomputer (another name for ASCI Blue), which included 1464 four-processor nodes based on PowerPC 604e/332 MHz chips. The nodes are connected into a single system using cables totaling nearly five miles, and the computer room occupies 8 thousand square feet. The ASCI Blue system consists of a total of 5856 processors and provides a peak performance of 3.88 teraflops. The total amount of RAM is 2.6 TB.

A supercomputer consists of kilometers of cables.

The US National Center for Atmospheric Research (NCAR) has selected IBM as the supplier of the world's most powerful supercomputer designed to predict climate change. The system, known as Blue Sky, will increase NCAR's climate modeling capabilities by an order of magnitude when fully operational this year. The core of Blue Sky will be the IBM SP supercomputer and IBM eServer p690 systems, which together are expected to achieve a peak performance of almost 7 Tflops, with an IBM SSA disk subsystem of 31.5 TB.

The supercomputer called Blue Storm is being created by order of the European Centre for Medium-Range Weather Forecasts (ECMWF). Blue Storm will be twice as powerful as ASCI White. Its construction requires 100 IBM eServer p690 servers, also known as Regatta. Each system unit, the size of a refrigerator, contains more than a thousand processors. In 2004, Blue Storm will be equipped with new-generation p960 servers, which will make it twice as powerful again. The supercomputer will run the AIX OS. Initially, the total capacity of Blue Storm's drives will be 1.5 petabytes, and its computing power will be about 23 teraflops. The system will weigh 130 tons and will be 1,700 times more powerful than the Deep Blue chess supercomputer.

IBM researchers are working with Livermore National Laboratory on the Blue Gene/L and Blue Gene/C computers. These machines are part of the five-year Blue Gene project for protein research, which began back in 1999 and in which $100 million was invested. The creation of the new Blue Gene/L supercomputer (200 teraflops) is to be completed in 2004, six months to a year earlier than the expected completion of work on the more powerful Blue Gene/C computer (1000 teraflops). The design performance of Blue Gene/L will thus exceed the combined performance of the 500 most powerful computers in the world, while the new supercomputer will occupy an area equal to only half a tennis court. IBM engineers have also worked on reducing energy consumption and managed to cut it by a factor of 15.

Notes

LINPACK tests.
LINPACK benchmarks are based on solving a system of linear equations with a dense matrix of coefficients over a real number field using Gaussian elimination. Real numbers are usually represented with full precision. Due to the large number of operations on real numbers, LINPACK results are considered to be the benchmark for the performance of hardware and software configurations in areas that intensively use complex mathematical calculations.
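
To show how a LINPACK figure is obtained in practice, the sketch below applies the operation count conventionally used for dense Gaussian elimination, 2/3*n^3 + 2*n^2, to a problem size and run time that are purely hypothetical (both numbers are invented for illustration):

```c
#include <stdio.h>

int main(void) {
    double n = 100000.0;        /* hypothetical matrix dimension */
    double seconds = 3600.0;    /* hypothetical wall-clock solve time */
    /* conventional flop count for solving a dense n-by-n system */
    double flop = 2.0 / 3.0 * n * n * n + 2.0 * n * n;
    printf("%.2f Gflops\n", flop / seconds / 1e9);
    return 0;
}
```

The reported performance of a machine is exactly this ratio, measured on the largest system of equations it can handle efficiently.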

Earth Simulator.
According to New Scientist magazine, in the new, 19th edition of the Top500 list, the supercomputer built for NEC Corporation's Earth Simulator project will take first place. It is installed at the Yokohama Institute for Earth Sciences in Yokohama, Kanagawa Prefecture. The developers claim that its peak performance can reach 40 Tflops.

The Earth Simulator supercomputer is designed to simulate climate change based on data received from satellites. According to NEC representatives, the high performance is achieved through the use of specially designed vector processors. The system is based on 5120 such processors, combined into 640 SX-6 nodes (8 processors each). The supercomputer runs the SUPER-UX OS. Development tools include compilers for the C/C++, Fortran 90 and HPF languages, as well as automatic vectorization tools, an implementation of the MPI-2 interface and the ASL/ES mathematical library. The entire machine occupies the area of three tennis courts (50 by 65 m) and uses several kilometers of cable.


The first Atlas supercomputer appeared in the early 60s and was installed at the University of Manchester. It was several times less powerful than modern home computers. Our review contains the “ten” most powerful supercomputers in history. True, due to the rapidly developing technologies in this area, these powerful machines become obsolete in an average of 5 years.

The performance of modern supercomputers is measured in petaflops, a unit that shows how many floating-point operations a computer can perform per second (one petaflops corresponds to 10^15 such operations). Today we will talk about the ten most expensive modern supercomputers.

1. IBM Roadrunner (USA)


$130 million
Roadrunner was built by IBM in 2008 for the Los Alamos National Laboratory (New Mexico, USA). It became the world's first computer whose sustained performance exceeded 1 petaflops, while it was designed for a maximum performance of 1.7 petaflops. According to the Green500 list, Roadrunner was the fourth most energy-efficient supercomputer in the world in 2008. Roadrunner was retired on March 31, 2013, after which it was replaced by a smaller, more energy-efficient supercomputer called Cielo.

2. Vulcan BlueGene/Q (USA)


$100 million
Vulcan is a supercomputer consisting of 24 individual rack units that was built by IBM for the Department of Energy and installed at the Lawrence Livermore National Laboratory in California. It has a peak performance of 5 petaflops and is currently the ninth fastest supercomputer in the world. Vulcan entered service in 2013 and is now used by the Livermore National Laboratory for research in biology, plasma physics, climate change, molecular systems, and more.

3. SuperMUC (Germany)

$111 million
SuperMUC is currently the 14th fastest supercomputer in the world. In 2013 it was 10th, but technology development does not stand still. However, it is currently the second fastest supercomputer in Germany. SuperMUC is run by the Leibniz Supercomputing Center at the Bavarian Academy of Sciences near Munich.

The system was created by IBM, runs on Linux, contains more than 19,000 Intel Xeon processors (including Westmere-EX chips), and has a peak performance of just over 3 petaflops. SuperMUC is used by European researchers in the fields of medicine, astrophysics, quantum chromodynamics, computational fluid dynamics, computational chemistry, genome analysis and earthquake modeling.

4. Trinity (USA)

$174 million
One would expect that a supercomputer like this (given what it is being built for) would be insanely expensive, but advances in technology have made it possible to reduce the price of Trinity. The US government intends to use Trinity to maintain the efficiency and security of America's nuclear arsenal.

Trinity, currently under construction, will be a joint project between Sandia National Laboratory and Los Alamos National Laboratory as part of the National Nuclear Security Administration's Predictive Modeling and Computational Data Science program.

5. Sequoia BlueGene/Q (USA)


$250 million
The Sequoia BlueGene/Q-class supercomputer was developed by IBM for the National Nuclear Security Administration as part of its Predictive Modeling and Computational Data Processing program. It went into operation in June 2012 at the Livermore National Laboratory and became the fastest supercomputer in the world at that time. It now ranks third in the world in speed (Sequoia's theoretical peak performance is 20 petaflops, i.e. 20 quadrillion calculations per second).

The computer operates stably at about 10 petaflops. Sequoia is used to support a variety of scientific applications: studies of astronomy, energy, the human genome, climate change and nuclear weapons.

6. ASC Purple and BlueGene / L (USA)


$290 million
These two supercomputers worked together. They were built by IBM and installed in 2005 at the Livermore National Laboratory. They were taken out of service in 2010. At the time of its inception, ASC Purple was ranked 66th in speed on the list of top 500 supercomputers, and BlueGene/L was the previous generation of the BlueGene/Q model.

ASCI Purple was built for the fifth phase of the US Department of Energy and National Nuclear Security Administration's Predictive Modeling and Computational Data Processing program. Its purpose was to simulate and replace real tests of weapons of mass destruction. BlueGene/L was used to predict global climate change.

7. Sierra and Summit (USA)


$325 million
Nvidia and IBM will soon help America regain its leadership position in ultra-fast supercomputing technology, scientific research, and economic and national security. Both computers will be finished in 2017.

Currently, the fastest supercomputer in the world is China's Tianhe-2, which is capable of reaching 55 petaflops, twice as much as the device in second place on the list. Sierra will deliver more than 100 petaflops, while Summit will be able to reach 300 petaflops.

Sierra, which will be installed at Livermore National Laboratory, will ensure the safety and efficiency of the nation's nuclear program. Summit will replace the aging Titan supercomputer at Oak Ridge National Laboratory and will be designed to test and support scientific applications around the world.

8. Tianhe-2 (China)

$390 million
China's Tianhe-2 (which translates to "Milky Way 2") is the world's fastest supercomputer. The computer, developed by a team of 1,300 scientists and engineers, is located at the National Supercomputing Center in Guangzhou. It was built by the Chinese People's Liberation Army Defense Science and Technology University. Tianhe-2 is capable of performing 33,860 trillion calculations per second. For example, one hour of supercomputer calculations is equivalent to 1,000 years of work by 1.3 billion people. The machine is used to model and analyze government security systems.

9. Earth Simulator (Japan)


$500 million
Earth Simulator was developed by the Japanese government back in 1997. The project cost is 60 billion yen, or approximately $500 million. Earth Simulator was completed in 2002 for the Japan Aerospace Exploration Agency, the Japan Atomic Energy Research Institute, and the Japan Center for Marine and Land Research and Technology.

ES was the fastest supercomputer in the world from 2002 to 2004, and it still serves today for working with global climate models, assessing the effects of global warming and assessing problems in the geophysics of the Earth's crust.

10. Fujitsu K (Japan)

$1.2 billion
The world's most expensive supercomputer is only the fourth fastest in the world (11 petaflops). In 2011, it was the fastest supercomputer in the world. Fujitsu K, located at the RIKEN Institute of Advanced Computing Technology, is approximately 60 times faster than Earth Simulator. Its maintenance costs about $10 million a year, and the supercomputer uses 9.89 MW (the same amount of electricity used by 10,000 country houses or one million personal computers).


In recent years, computer design and production companies have been working tirelessly. As a result, the amount of technology in the world is growing exponentially.

The most powerful computers

Just recently, the world did not know about DirectX10, and the graphics of FarCry or NFS Underground 2 seemed to be the pinnacle of computer capabilities. Once upon a time, a disk capable of storing 600 megabytes of information seemed like a miracle of technology, but now terabyte memory cards are freely available.

In the field of supercomputers, much the same thing happens. In 1993, University of Tennessee professor Jack Dongarra came up with the idea of ​​creating a ranking of the most powerful computers in the world. Since then, this list, called the TOP500, has been updated twice a year: in June and November.

Time passes, and the leaders of the supercomputer ratings of the early 1990s are hopelessly outdated even by the standards of ordinary PC users. Thus, the leader in 1993 was the CM-5/1024, assembled by Thinking Machines: 1024 processors with a clock frequency of 32 MHz and a computing speed of 59.7 gigaflops, only slightly faster than an ordinary 8-core PC under your desk. What is the best computer today?


Sunway TaihuLight

Just five years ago, the lead in terms of computing power was consistently held by supercomputers made in the USA. In 2013, Chinese scientists seized the leadership and, apparently, are not going to give it up.

At the moment, the most powerful computer in the world is considered to be the Sunway TaihuLight (translated as “The Divine Light Power of Lake Taihu”), a grandiose machine with a computing speed of 93 petaflops (maximum speed - 125.43 petaflops). This is 2.5 times more powerful than the previous record holder - the Tianhe-2 supercomputer, which was considered the most powerful until June 2016.


Sunway TaihuLight has 10.5 million cores (40,960 processors, each with 256 computing cores and 4 control cores).

This is what the most powerful computer of 2016 looks like

All the equipment was developed and manufactured in China, whereas the processors of the previous record holder were produced by the American company Intel. The cost of Sunway TaihuLight is estimated at $270 million. The supercomputer is located at the National Supercomputing Center in Wuxi.

Record holders of past years

Until June 2016 (and the TOP500 list is updated every June and November), the most powerful and fastest computer was the Tianhe-2 supermachine (translated from Chinese as “Milky Way”), developed in China at the Defense Science and Technology University in Changsha with the help of the company Inspur.


Tianhe-2 delivers 33.86 petaflops (33,860 trillion operations per second), with a peak performance of 54.9 petaflops. The Chinese machine topped this ranking from its launch in 2013, an incredibly impressive run!

Supercomputer Tianhe-2

The characteristics of Tianhe-2 are as follows: 16 thousand nodes; 32 thousand 12-core Intel Xeon E5-2692 processors and 48 thousand 57-core Intel Xeon Phi 31S1P accelerators, i.e. 3,120,000 cores in total; 256 thousand DDR3 RAM modules of 4 GB each and 176,000 GDDR5 modules of 8 GB each, i.e. 2,432,000 GB of RAM in total. Hard disk capacity is more than 13 million GB. However, you won't be able to play games on it: it is intended solely for computing, and Milky Way 2 has no video card installed. Among other things, it helps with calculations for laying subway lines and for urban development.
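
As a quick cross-check of the totals quoted above, the snippet below (our own arithmetic, using only the per-component figures from this paragraph) recomputes the core and RAM counts:

```c
#include <stdio.h>

int main(void) {
    /* 32,000 x 12 Xeon cores plus 48,000 x 57 Xeon Phi cores */
    long cores = 32000L * 12 + 48000L * 57;       /* = 3,120,000 */
    /* 256,000 x 4 GB DDR3 plus 176,000 x 8 GB GDDR5 */
    long ram_gb = 256000L * 4 + 176000L * 8;      /* = 2,432,000 GB */
    printf("cores: %ld, RAM: %ld GB\n", cores, ram_gb);
    return 0;
}
```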

Jaguar

For a long time, Jaguar, a supercomputer from the USA, was at the top of the ranking. How is it different from the others and what are its technical advantages?


The supercomputer called Jaguar consists of a large number of independent cells divided into two sections, XT4 and XT5. The latter section contains exactly 18,688 compute cells. Each cell holds two quad-core AMD Opteron 2356 processors clocked at 2.3 GHz, 16 GB of DDR2 RAM, and a SeaStar 2+ router. Even one cell from this section would be enough to build the most powerful gaming computer. In total, the section contains 149,504 computing cores, a huge amount of RAM (more than 300 TB), a performance of 1.38 petaflops and more than 6 petabytes of disk space.

Building a computer monster

The XT4 section contains 7,832 cells with more modest characteristics than the XT5 section: each cell holds one quad-core processor clocked at 2.1 GHz, 8 GB of RAM and a SeaStar 2 router. In total, the section has 31,328 computing cores and more than 62 TB of memory, a peak performance of 263 Tflops and more than 600 TB of disk space. The Jaguar supercomputer runs its own operating system, the Cray Linux Environment.

Breathing down Jaguar's neck is another machine, IBM's Roadrunner. This computing monster is capable of performing up to a quadrillion (10^15) operations per second. It was developed specifically for the Department of Energy's National Nuclear Security Administration at Los Alamos. With the help of this supercomputer they planned to monitor the operation of all nuclear installations located in the United States.


Roadrunner's peak processing speed is about 1.5 petaflops. We are talking about a total of 3,456 custom tri-blade servers, each capable of about 400 billion operations per second (i.e. 400 gigaflops). Inside Roadrunner there are about 20 thousand high-performance processors: 12,960 IBM Cell Broadband Engine chips (IBM's own design) and 6,948 dual-core AMD Opteron processors. The supercomputer has 80 terabytes of system memory.

So how much space does this miracle of technology take up? The machine occupies an area of 560 square meters, and all the equipment is packed into servers of a custom architecture. The equipment weighs about 23 tons, so transporting it required the National Nuclear Security Administration to use at least 21 large tractor-trailers.

A few words about what a petaflops is. One petaflops is approximately equal to the total computing power of 100 thousand modern laptops; if you try to picture them, they could pave a road almost two and a half kilometers long. Another accessible comparison: it would take the entire population of the planet, armed with calculators, 46 years to do the calculations that Roadrunner can perform in one day. Can you imagine how little time Sunway TaihuLight, the leader of our rating, would need?

Titan

In 2012, the US Department of Energy's Oak Ridge National Laboratory launched the Titan supercomputer, which is rated at 20 petaflops, meaning it can perform 20 quadrillion floating-point operations in one second.


Titan was developed by Cray. In addition to Titan, American specialists have produced two more supercomputers in recent years: Mira, intended for industrial and scientific research needs, and Sequoia, used to simulate nuclear weapons tests. IBM Corporation is behind both of these systems.

The most powerful computer in Russia

Alas, the Russian development Lomonosov-2, recognized as the most powerful computer in Russia, is only in 41st place in the TOP500 (as of June 2016). It is based at the Research Computing Center of Moscow State University. The performance of the domestic supercomputer is 1.849 petaflops, with a peak of about 2.5 petaflops. Number of cores: 42,688.




Supercomputer is a fairly flexible and very broad term. In the general sense, a supercomputer is a computer that is much more powerful than the computers commonly available on the market. Some engineers jokingly call any computer weighing more than a ton a supercomputer, and although most modern supercomputers do weigh more than a ton, not every such heavyweight deserves the name: the Mark I and ENIAC were heavyweights too, but they are not considered supercomputers even for their time.

The speed of technological progress is so great that today's supercomputer will be inferior to a home computer in 5-10 years. The term "supercomputing" appeared in the 1920s, and the term "supercomputer" in the 1960s, but it became widespread largely thanks to Seymour Cray and his Cray-1 and Cray-2 supercomputers, although Cray himself preferred not to use the term and called his machines simply computers.

In 1972, S. Cray left CDC and founded his own company, Cray Research, which in 1976 released the first vector-pipeline computer, the CRAY-1: clock time 12.5 ns, 12 pipelined functional units, peak performance of 160 million operations per second, RAM of up to 1 Mword (a word is 64 bits), memory cycle time 50 ns. The main innovation was the introduction of vector instructions that work with entire arrays of independent data and allow efficient use of the pipelined functional units.

The Cray-1 is considered one of the first supercomputers. Its processor had a huge (for that time) set of registers, divided into groups, each with its own functional purpose: a block of address registers responsible for addressing memory, a block of vector registers, and a block of scalar registers.

Assembling a Cray-1 computer

Computer Cray-2

The first Soviet supercomputer

At the very beginning, the emergence of supercomputers was driven by the need for rapid processing of large amounts of data and for complex mathematical and analytical calculations. Therefore, the first supercomputers differed little in architecture from conventional computers; only their power was many times greater than that of standard workstations. Initially, supercomputers were equipped with vector processors instead of conventional scalar ones, and by the 1980s they had moved to several vector processors working in parallel. However, this development path turned out not to be rational, and supercomputers switched to scalar processors working in parallel.

Massively parallel processors became the basis for supercomputers: thousands of processing elements were combined to create a powerful computing platform. Most parallel processors were created on the basis of the RISC architecture. RISC (Reduced Instruction Set Computing) is an approach in which simpler instructions are executed faster; it reduces the cost of processor production while increasing performance.

The need for powerful computing solutions grew rapidly, but supercomputers are very expensive, so an alternative was needed, and clusters took their place (although today powerful computers are still called supercomputers). A cluster is a set of servers connected by a network and working on one task. Such a group of servers has far higher performance than the same number of servers working separately. A cluster also provides high reliability: the failure of one server will not bring the entire system to an emergency stop but will only slightly affect its performance, and a server can be replaced without stopping the whole system. There is also no need to spend a huge sum on a supercomputer all at once: a cluster can be expanded gradually, which spreads the enterprise's costs over time.

University cluster

Goals of the Supercomputer

1. Maximum arithmetic performance of the processor;

2. Efficiency of the operating system and ease of interaction with it for the programmer;

3. Efficiency of translation from high-level languages and elimination of the need to write programs in autocode (assembly language);

4. Efficiency of parallelizing algorithms for parallel architectures;

5. Increased reliability.

Architecture of modern supercomputers

Computer architecture covers a significant range of problems associated with creating a complex of hardware and software, taking into account a large number of determining factors, the main ones being cost, scope of application, functionality and ease of use; the hardware is considered one of the main components of the architecture. Computer architecture includes both the structure, which reflects the composition of the machine, and the software and mathematical support. The structure of a computer is the set of its elements and the connections between them. The basic principle behind all modern computers is program control.

All computers are divided into four classes depending on the number of command and data streams.

The first class (von Neumann serial computers) comprises conventional scalar uniprocessor systems: single instruction stream, single data stream (SISD). A personal computer has the SISD architecture, regardless of whether it uses pipelines to speed up operations.

The second class is characterized by a single instruction stream but multiple data streams (SIMD). Single-processor vector, or more precisely vector-pipeline, supercomputers such as the Cray-1 belong to this architectural class. In this case we are dealing with one stream of (vector) instructions but many data streams: each element of a vector belongs to a separate data stream. Matrix (array) processors, such as the once-famous ILLIAC-IV, belong to the same class of computer systems: they also have vector instructions and implement vector processing, but not through pipelines as in vector supercomputers, rather by means of processor arrays.

The third class, MIMD, covers systems that have multiple instruction streams and multiple data streams. It includes not only multiprocessor vector supercomputers but multiprocessor computers in general. The vast majority of modern supercomputers have the MIMD architecture.

The fourth class in Flynn's taxonomy, MISD, is of no practical interest, at least for the computers analyzed here. Recently, the term SPMD (single program, multiple data) has also come into frequent use in the literature. It refers not to a computer architecture but to a model of program parallelization and is not an extension of Flynn's taxonomy. SPMD usually applies to MPP (i.e., MIMD) systems and means that multiple copies of the same program are run.
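
A rough way to see two of these classes side by side on an ordinary multicore node (our own sketch; the array size and the suggested compiler flags are assumptions): the loop body is the SIMD pattern that a vectorizing compiler can map onto vector registers, while the OpenMP directive creates several independent instruction streams over different parts of the data, i.e. MIMD-style execution.

```c
/* Build with, e.g., gcc -O2 -fopenmp flynn_sketch.c */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* MIMD: several threads, each with its own instruction stream */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];      /* SIMD-friendly body: same operation, many data */

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```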

Supercomputer tasks

As noted above, the emergence of supercomputers was driven by the need for rapid processing of large amounts of data and for complex mathematical and analytical calculations. Supercomputers are machines for large-scale tasks:

1. Solving complex and large scientific problems, in management and in resource exploration;

2. The latest architectural developments built on a modern element base and arithmetic accelerators;

3. Design and simulation;

4. Improved productivity;

5. Centralized information storage;

6. Assessment of the complexity of the problems solved in practice.

Supercomputer at the Technical University of Munich

Second generation supercomputer located at VNIIEF

Supercomputer performance characteristics

Over half a century, computer performance has increased more than seven hundred million times, yet the gain associated with reducing the clock cycle time from 2 microseconds to 1.8 nanoseconds accounts for only a factor of about 1000. The rest comes from new solutions in computer architecture, chief among them the principle of parallel data processing, which embodies the idea of simultaneous (parallel) execution of several actions. Parallel data processing has two varieties: pipelining and actual parallelism.

Parallel processing. If a certain device performs one operation per unit of time, then it will perform a thousand operations in a thousand units. If we assume that there are five identical independent devices capable of operating simultaneously, then a system of five devices can perform the same thousand operations not in a thousand, but in two hundred units of time. Similarly, a system of N devices will perform the same work in 1000/N units of time. Similar analogies can be found in life: if one soldier digs up a garden in 10 hours, then a company of fifty soldiers with the same abilities, working simultaneously, will cope with the same work in 12 minutes - the principle of parallelism in action!

Pipeline processing. Adding two real numbers in floating-point form involves a whole series of small operations: comparing the exponents, aligning the exponents, adding the mantissas, normalizing the result, and so on. The processors of the first computers performed all these "micro-operations" for each pair of arguments sequentially, one after another, until they reached the final result, and only then proceeded to process the next pair of operands.
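
The toy function below (written for illustration only; real hardware works on bit fields rather than library calls) walks through exactly these micro-steps for two doubles, which is what a pipelined adder overlaps across successive pairs of operands:

```c
#include <stdio.h>
#include <math.h>

/* Toy decomposition of a floating-point addition into pipeline-style stages. */
double toy_fadd(double a, double b) {
    int ea, eb;
    double ma = frexp(a, &ea);        /* split into mantissa and exponent */
    double mb = frexp(b, &eb);
    if (ea < eb) {                    /* stage 1: compare exponents */
        ma = ldexp(ma, ea - eb);      /* stage 2: align the smaller operand */
        ea = eb;
    } else {
        mb = ldexp(mb, eb - ea);
    }
    double m = ma + mb;               /* stage 3: add mantissas */
    return ldexp(m, ea);              /* stage 4: repack/normalize the result */
}

int main(void) {
    printf("%g\n", toy_fadd(3.25, 0.0625));   /* prints 3.3125 */
    return 0;
}
```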

All the very first computers (EDSAC, EDVAC, UNIVAC) had bit-serial memory, from which words were read sequentially, bit by bit. The first commercially available computer to use bit-parallel memory (on CRTs) and bit-parallel arithmetic was the IBM 701, and the most popular model was the IBM 704 (150 copies sold), which, in addition to the above, was the first to use ferrite-core memory and hardware floating-point arithmetic.

Memory hierarchy. The memory hierarchy is not directly related to parallelism, but it certainly belongs to those features of computer architecture that are of great importance for increasing performance (it smooths out the difference between processor speed and memory access time). The main levels are registers, cache memory, RAM and disk memory. The access time decreases, and the cost per word (byte) increases, as one moves from disk memory toward the registers. Currently, such a hierarchy is supported even on personal computers.
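
A small experiment (our own sketch; the array size is arbitrary and the timings will vary by machine) hints at this hierarchy: summing the same array with stride 1 keeps the caches useful, while a large stride defeats them and usually takes noticeably longer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)          /* 16M doubles, about 128 MB */

/* Sum all N elements, visiting them with the given stride. */
double sum_strided(const double *a, long stride) {
    double s = 0.0;
    for (long start = 0; start < stride; start++)
        for (long i = start; i < N; i += stride)
            s += a[i];
    return s;
}

int main(void) {
    double *a = malloc((size_t)N * sizeof *a);
    if (!a) return 1;
    for (long i = 0; i < N; i++) a[i] = 1.0;

    long strides[] = {1, 16, 1024};
    for (int k = 0; k < 3; k++) {
        clock_t t0 = clock();
        double s = sum_strided(a, strides[k]);
        double dt = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("stride %5ld: sum=%.0f, %.3f s\n", strides[k], s, dt);
    }
    free(a);
    return 0;
}
```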

Currently used:

1. Vector-pipeline computers: pipelined functional units and a vector instruction set.

2. Massively parallel computers with distributed memory.

3. Parallel computers with shared memory: all the RAM of such computers is shared by several identical processors.

4. The use of parallel computing systems.

List of the most powerful computers in the world

Organization where the computer is installed | Computer type / year | Number of computing cores | Maximum performance, Tflops | Power consumption, kW
Oak Ridge National Laboratory | Jaguar - Cray XT5-HE, Opteron Six Core 2.6 GHz / 2009 | 224162 | 1759.00 | 6950.60
National Supercomputing Center in Shenzhen (NSCS) | Nebulae - Dawning TC3600 Blade, Intel X5650, NVidia Tesla C2050 GPU / 2010 | 120640 | 1271.00 | 2984.30
DOE/NNSA/LANL | Roadrunner - BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband / 2009 | 122400 | 1042.00 | 2345.50
National Institute for Computational Sciences / University of Tennessee | Kraken XT5 - Cray XT5-HE, Opteron Six Core 2.6 GHz / 2009 | 98928 | 831.70 | 2569
Forschungszentrum Juelich (FZJ) | JUGENE - Blue Gene/P Solution / 2009 | 294912 | 825.50 | 2268.00
National SuperComputer Center in Tianjin / NUDT | Tianhe-1 - NUDT TH-1 Cluster, Xeon E5540/E5450, ATI Radeon HD 4870 X2, Infiniband / 2009 | 71680 | 563.10 | 2578
DOE/NNSA/LLNL | BlueGene/L - eServer Blue Gene Solution / 2007 | 212992 | 478.20 | 2329.60
Argonne National Laboratory | Intrepid - Blue Gene/P Solution / 2007 | 163840 | 458.61 | 1260
Sandia National Laboratories / National Renewable Energy Laboratory | Red Sky - Sun Blade x6275, Xeon X55xx 2.93 GHz, Infiniband (Sun Microsystems) / 2010 | 42440 | 433.50 | 1254
Texas Advanced Computing Center / Univ. of Texas | Ranger - SunBlade x6420, Opteron QC 2.3 GHz, Infiniband (Sun Microsystems) / 2008 | 62976 | 433.20 | 2000.00
DOE/NNSA/LLNL | Dawn - Blue Gene/P Solution / 2009 | 147456 | 415.70 | 1134
Moscow State University - Research Computing Center (Russia) | Lomonosov - T-Platforms T-Blade2, Xeon 5570 2.93 GHz, Infiniband QDR (T-Platforms) / 2009 | 35360 | 350.10 | 1127
Forschungszentrum Juelich (FZJ) | JUROPA - Sun Constellation / NovaScale R422-E2, Intel Xeon X5570 2.93 GHz, Sun M9 / Mellanox QDR Infiniband, Partec Parastation / 2009 | 26304 | 274.80 | 1549.00
KISTI Supercomputing Center | TachyonII - Sun Blade x6048/x6275, IB QDR M9 switch, Sun HPC stack Linux edition (Sun Microsystems) / 2009 | 26232 | 274.20 | 307.80
University of Edinburgh | HECToR - Cray XT6m 12-Core 2.1 GHz / 2010 | 43660 | 274.70 | 1189.80
NERSC/LBNL | Franklin - Cray XT4 QuadCore 2.3 GHz / 2008 | 38642 | 266.30 | 1150.00
Grand Equipement National de Calcul Intensif - Centre Informatique National de l'Enseignement Supérieur (GENCI-CINES) | Jade - SGI Altix ICE 8200EX, Xeon E5472 3.0 / X5560 2.8 GHz / 2010 | 23040 | 237.80 | 1064.00
Institute of Process Engineering, Chinese Academy of Sciences | Mole-8.5 - Mole-8.5 Cluster, Xeon L5520 2.26 GHz, nVidia Tesla, Infiniband (IPE, nVidia Tesla C2050, Tyan) / 2010 | 33120 | 207.30 | 1138.44
Oak Ridge National Laboratory | Jaguar - Cray XT4 QuadCore 2.1 GHz / 2008 | 30976 | 205.00 | 1580.71

 
