Unseen City: The SciNet Supercomputer

One of the earliest innovators in the realm of high-performance computing was Gaspard Clair François Marie Riche de Prony. In 1791―as David Alan Grier tells it in his book, When Computers Were Human―de Prony, a middle-aged civil engineer, was asked by the newly instated revolutionary government of France to create a detailed set of trigonometric tables (which are good for making the types of precise angle measurements necessary for large engineering projects). De Prony realized that the computational demands of the task at hand were too great for him to handle on his own; he was going to need help, and lots of it. Taking a cue from Adam Smith, he hired a staff of around 90 workers to do the arithmetic, many of them former wig-makers left jobless after the revolution. (Members of the aristocratic classes were about to lose their heads and were therefore in less need of fashionable things to put on top of them.) The work was finished after about a decade, by which point it was too late: de Prony’s government had grown indifferent to the project, and his publisher had gone bankrupt.

This has been a roundabout way of establishing the following: if de Prony had possessed a computer as brawny as the one operated by Toronto’s SciNet consortium, he would have finished his decade-long computation in seconds. SciNet’s machine can do the work of trillions of wig-makers.

The SciNet supercomputer was the sixteenth most powerful in the world when it went online last year, but has since fallen to twenty-eighth (an inevitability in a field ruled by Moore’s Law). It remains Canada’s single most powerful piece of computing equipment, and in fact it almost single-handedly puts Canada on the map of supercomputing superpowers, neck-and-neck with Sweden and Switzerland. The main part of the system is capable of a theoretical peak performance of 306 teraflops. A teraflop is one trillion floating-point operations per second.

Because the supercomputer requires a tremendous amount of space and power, housing it near SciNet’s McCaul Street main office was an impossibility. As a result, the SciNet data centre, where all the supercomputer’s hardware is physically located, is in an unmarked unit in a strip mall in Vaughan, next to a car and truck rental outlet and something called The World’s Greatest Art and Mirror Store.

The data centre was designed and built by IBM, on behalf of the SciNet consortium, using money from federal and provincial grants, and from the University of Toronto. The SciNet consortium consists of the University of Toronto and a few hospitals near the St. George campus.
Making supercomputers isn’t particularly profitable, but manufacturers compete for opportunities to build them anyway, for much the same reasons that big automakers build race cars: the company name benefits from being associated with so much raw horsepower. IBM’s pride in their involvement with SciNet is part of the reason we were able to tour the data centre: they like it when media outlets publicize their high-end technical prowess. Our cause was also helped by the fact that SciNet happened to be hosting this year’s High Performance Computing Symposium, and was already giving data centre tours to attendees. We accompanied a group of HPCS 2010 delegates on a chartered bus from U of T to the strip mall―an hour-long trip in traffic.

On the bus, we were seated next to Jamie Pinto, a jolly, soft-spoken guy in his forties who has been a sysadmin with SciNet since the computer went online. (He began campaigning for a job with the consortium as soon as he heard that a new supercomputer was being built.) In his time maintaining the rig, he’s developed a theory about the function of supercomputers. They enable scientists, he said, to “compress time.” In other words, they act as prophylactics against sad stories like de Prony’s, of brilliant lives frittered away on scientific scutwork. To Pinto’s way of thinking, fantastically powerful computers increase the amount of social good researchers can do while they’re still alive―not just a little, but by orders of magnitude.

The SciNet supercomputer is used to perform calculations in cutting-edge biology, aerospace engineering, particle physics, and even climate science. There are currently about five hundred users, and any qualified researcher at a Canadian university can get an account. (Researchers outside of Canada can sometimes get accounts, as well.)
Pinto, who travels to the data centre on a weekly basis, tries to lower our expectations by warning us that there’s nothing particularly flashy inside for us to photograph: “It’s big. It’s cold. And it’s noisy,” he says.
True.

[Photo: The data centre’s many servers are thick with network cabling.]

Inside the data centre, the air smells like Home Depot. Since the facility has only been operational for a year, the white paint on the walls is still fresh and unblemished―almost hospital-clean. Everything in the facility―not just the computer hardware, but the walls and even the hinges on the doors―was built to IBM’s exact specifications. “When IBM first came onto this site, we had nothing,” says Neil Bunn, our tour guide.
Bunn, one of three IBM engineers who helped design the data centre, is a youngish guy in a paisley shirt, with a crew cut so precise it might have been laser-sculpted by a piece of IBM office equipment. He explains that the design team made all their decisions with energy efficiency in mind. The building has its own four-megawatt power system, complete with a giant transformer room. The approximate annual cost of electricity is one million dollars. That’s considered cheap.

The computers suck up prodigious amounts of power, but they also produce incredible amounts of heat, and so a large portion of the data centre’s electricity draw is devoted solely to keeping everything cold enough to function. Most of the computer hardware is serviced by liquid coolant, pumped from a 735-ton, fully automated coolant plant that occupies its own room. It was built off-site and delivered ready to install, in a single piece.
There is also air conditioning, to supplement the liquid cooling system. The air circulators perform the minority of the cooling, but they produce the majority of the facility’s noise. Thanks to them, the entire place is slightly chilly, and constantly suffused with a dull, white roar.

The cooling system has failed, on occasion. When this happens, the air temperature in the equipment room can rise from its usual, relatively balmy twenty-five degrees to a scorching fifty in under twelve minutes. “I think the hottest we ever measured during one of the tests was sixty-two degrees,” says Bunn. “I was in a suit when I came in, so you can tell how much of a ‘test’ it really was.”

Cooling is so much of a concern that it even informed the floor plan. The space consists of a square outer hallway surrounding a square inner room. The outer hallway has some offices, the coolant plant room, the transformer room―and it also serves as storage space for, among other things, what looks like a year’s supply of toilet paper. The inner room is where all the computer hardware is, because it’s compact and therefore requires less energy to keep cold. The entire facility is, essentially, a walk-in PC case.

After we’ve made the circuit of the outer hall, Bunn leads our group into the inner room, to see the computer hardware. The floors are made of large beige tiles, and the walls are white. There are multiple rows of black, rectangular cabinets, all of them tall as basketball players, and each emblazoned with an IBM logo. Each cabinet contains a stack of server nodes (black, about the size and shape of a DVD player), each of which is festooned with yellow, orange, and blue cabling―some of it roped, like musculature. These nodes are where the actual computation happens. Each one has dozens of indicator lights that thrum with constant activity. (The lights are the only evidence of work being done, since the supercomputer’s users log in remotely from their home or office computers.) The pipes that carry the liquid coolant to each tower of components run beneath the floor. Bunn has some suction cups attached to a metal handle, which he uses to pull up tiles, so he can show off the plumbing.
For the technically minded, the setup is as follows. Non-technicals, meanwhile, might appreciate the cadences of IT jargon. Say it three times fast:

The main cluster consists of 30,240 cores of Intel Xeon E5540, clocked at 2.53 gigahertz, with two gigabytes of RAM per core. All of this is spread across 3,780 IBM iDataPlex 2U dx360 nodes, linked together with hybrid Gigabit Ethernet and InfiniBand. There’s also another cluster used for certain kinds of high-intensity tasks. It has 3,328 cores of IBM POWER6, clocked at 4.7 gigahertz, with four gigabytes of RAM per core, for a total of 104 nodes, all of which are linked with 4x DDR InfiniBand. Because none of the system’s nodes have hard drives (data is stored in a separate, 1.5 petabyte array), the rig is capable of an interesting computational magic trick known as “dynamic provisioning,” which allows users to switch operating systems on the fly, to anything except Mac OS. The diskless nature of the system, according to Pinto, enables it to boot in roughly five to seven minutes.
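
For readers who want to check the math, the 306-teraflop figure quoted earlier squares with these specs if you assume each Xeon core can complete four floating-point operations per clock cycle (typical for processors of that generation, and our assumption rather than anything SciNet or IBM told us). A quick back-of-envelope sketch in Python:

# Rough check of the main cluster's quoted 306-teraflop theoretical peak.
# The flops-per-cycle figure is our assumption, not an official spec.
cores = 30240             # 3,780 nodes x 8 cores each
clock_hz = 2.53e9         # 2.53 gigahertz
flops_per_cycle = 4       # assumed floating-point operations per core per cycle

peak_teraflops = cores * clock_hz * flops_per_cycle / 1e12
print(round(peak_teraflops))   # prints 306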

Our toes start to get cold as we take notes, because all the chilled air from the circulators travels along the floor. Bunn tells the group that SciNet burns out a stick of RAM every couple of days, and the eyes of a few computer scientists widen slightly. “It’s just the limits of what SRAM can handle,” he says. So, not IBM’s fault. Naturally.

The SciNet supercomputer, too, has limits. As ridiculously fast as it is, the sad truth is that it, like all computers, has been staring into the maw of obsolescence since the moment it shed its shrink-wrap. Complicating matters further is the fact that the federal and provincial grant money that currently covers most of the project’s expenses runs out in 2013, as does SciNet’s lease on the strip mall unit. Since it’s not certain that funds will be available, there’s currently no official upgrade plan.
“Basically, though, it’s very likely that we will get more funding,” Jillian Dempsey, SciNet’s project coordinator, tells us. “Must always keep your fingers crossed though!”
