SANTA CLARA, Calif. — Jeff Rothschild’s machines at Facebook had a problem
he knew he had to solve immediately. They were about to melt.
The company had been packing a 40-by-60-foot rental space
here with racks of computer servers that were needed to store and process
information from members’ accounts. The electricity pouring into the computers
was overheating Ethernet sockets and other crucial components.
Thinking fast, Mr. Rothschild, the company’s engineering
chief, took some employees on an expedition to buy every fan they could find —
“We cleaned out all of the Walgreens in the area,” he said — to blast cool air
at the equipment and prevent the Web site from going down.
That was in early 2006, when Facebook had a quaint 10 million
or so users and the one main server site. Today, the information generated by
nearly one billion people requires outsize versions of these facilities, called
data centers, with rows and rows of servers spread over hundreds of thousands
of square feet, and all with industrial cooling systems.
They are a mere fraction of the tens of thousands of data
centers that now exist to support the overall explosion of digital information.
Stupendous amounts of data are set in motion each day as, with an innocuous
click or tap, people download movies on iTunes, check credit card balances
through Visa’s Web site, send Yahoo e-mail with files
attached, buy products on Amazon, post on Twitter or
read newspapers online.
A yearlong examination by The New York Times has revealed
that this foundation of the information industry is sharply at odds with its
image of sleek efficiency and environmental friendliness.
Most data centers, by design, consume vast amounts of energy
in an incongruously wasteful manner, interviews and documents show. Online
companies typically run their facilities at maximum capacity around the clock,
whatever the demand. As a result, data centers can waste 90 percent or more of
the electricity they pull off the grid, The Times found.
To guard against a power failure, they further rely on banks
of generators that emit diesel exhaust. The pollution from data centers has
increasingly been cited by the authorities for violating clean air regulations,
documents show. In Silicon Valley, many data centers appear on the state
government’s Toxic
Air Contaminant Inventory, a roster of the area’s top stationary diesel polluters.
Worldwide, the digital warehouses use about 30 billion watts
of electricity, roughly equivalent to the output of 30 nuclear power plants,
according to estimates industry experts compiled for The Times. Data centers in
the United States account for one-quarter to one-third of that load, the
estimates show.
“It’s staggering for most people, even people in the
industry, to understand the numbers, the sheer size of these systems,” said
Peter Gross, who helped design hundreds of data centers. “A single data center
can take more power than a medium-size town.”
Energy efficiency varies widely from company to company. But
at the request of The Times, the consulting firm McKinsey & Company
analyzed energy use by data centers and found that, on average, they were using
only 6 percent to 12 percent of the electricity powering their servers to
perform computations. The rest was essentially used to keep servers idling and
ready in case of a surge in activity that could slow or crash their operations.
A server is a sort of bulked-up desktop computer, minus a
screen and keyboard, that contains chips to process data. The study sampled
about 20,000 servers in about 70 large data centers spanning the commercial
gamut: drug companies, military contractors, banks, media companies and
government agencies.
“This is an industry dirty secret, and no one wants to be
the first to say mea culpa,” said a senior industry executive who asked not to
be identified to protect his company’s reputation. “If we were a manufacturing
industry, we’d be out of business straightaway.”
These physical realities of data are far from the mythology
of the Internet: where lives are lived in the “virtual” world and all manner of
memory is stored in “the cloud.”
The inefficient use of power is largely driven by a
symbiotic relationship between users who demand an instantaneous response to
the click of a mouse and companies that put their business at risk if they fail
to meet that expectation.
Even running electricity at full throttle has not been
enough to satisfy the industry. In addition to generators, most large data
centers contain banks of huge, spinning flywheels or thousands of lead-acid
batteries — many of them similar to automobile batteries — to power the
computers in case of a grid failure as brief as a few hundredths of a second,
an interruption that could crash the servers.
“It’s a waste,” said Dennis P. Symanski, a senior researcher
at the Electric Power
Research Institute, a nonprofit industry group. “It’s too many insurance
policies.”
At least a dozen major data centers have been cited for
violations of air quality regulations in Virginia and Illinois alone, according
to state records. Amazon was cited for more than 24 violations over a
three-year period in Northern Virginia, including running some of its
generators without a basic environmental permit.
A few companies say they are using extensively re-engineered
software and cooling systems to decrease wasted power. Among them are Facebook
and Google, which also have
redesigned their hardware. Still, according to recent disclosures, Google’s
data centers consume nearly 300 million watts and Facebook’s about 60 million
watts.
Many of these solutions are readily available, but in a
risk-averse industry, most companies have been reluctant to make wholesale
change, according to industry experts.
Improving or even assessing the field is complicated by the
secretive nature of an industry that is largely built around accessing other
people’s personal data.
For security reasons, companies typically do not even reveal
the locations of their data centers, which are housed in anonymous buildings
and vigilantly protected. Companies also guard their technology for competitive
reasons, said Michael Manos, a longtime industry executive. “All of those
things play into each other to foster this closed, members-only kind of group,”
he said.
That secrecy often extends to energy use. To further
complicate any assessment, no single government agency has the authority to track
the industry. In fact, the federal government was unable to determine how much
energy its own data centers consume, according to officials involved in a
survey completed last year.
The survey did discover that the number of federal data
centers grew from 432 in 1998 to 2,094 in 2010.
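That growth can be restated as a compound annual rate. The following is a minimal sketch in Python, assuming the count grew at a steady pace between the two survey figures, which it almost certainly did not. The function name is illustrative and is not drawn from the survey.

```python
def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures from the federal survey: 432 centers in 1998, 2,094 in 2010.
rate = compound_annual_growth(432, 2094, 2010 - 1998)
print(f"Implied growth: {rate:.1%} per year")
```

Under that smoothing assumption, the federal inventory grew by roughly 14 percent a year.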
To investigate the industry, The Times obtained thousands of
pages of local, state and federal records, some through freedom of information
laws, that are kept on industrial facilities that use large amounts of energy.
Copies of permits for generators and information about their emissions were
obtained from environmental agencies, which helped pinpoint some data center
locations and details of their operations.
In addition to reviewing records from electrical utilities,
The Times also visited data centers across the country and conducted hundreds
of interviews with current and former employees and contractors.
Some analysts warn that as the amount of data and energy use
continue to rise, companies that do not alter their practices could eventually
face a shake-up in an industry that has been prone to major upheavals,
including the bursting of the first Internet bubble in the late 1990s.
“It’s just not sustainable,” said Mark Bramfitt, a former
utility executive who now consults for the power and information technology
industries. “They’re going to hit a brick wall.”
Bytes by the Billions
Wearing an FC Barcelona T-shirt and plaid Bermuda shorts,
Andre Tran strode through a Yahoo data center in Santa Clara where he was the
site operations manager. Mr. Tran’s domain — there were servers assigned to
fantasy sports and photo sharing, among other things — was a fair sample of the
countless computer rooms where the planet’s sloshing tides of data pass through
or come to rest.
Aisle after aisle of servers, with amber, blue and green
lights flashing silently, sat on a white floor punctured with small round holes
that spit out cold air. Within each server were the spinning hard drives that
store the data. The only hint that the center was run by Yahoo, whose name was
nowhere in sight, could be found in a tangle of cables colored in the company’s
signature purple and yellow.
“There could be thousands of people’s e-mails on these,” Mr.
Tran said, pointing to one storage aisle. “People keep old e-mails and
attachments forever, so you need a lot of space.”
This is the mundane face of digital information — player
statistics flowing into servers that calculate fantasy points and league
rankings, snapshots from nearly forgotten vacations kept forever in storage
devices. It is only when the repetitions of those and similar transactions are
added up that they start to become impressive.
Each year, chips in servers get faster, and storage media
get denser and cheaper, but the furious rate of data production goes a notch
higher.
Jeremy Burton, an expert in data storage, said that when he
worked at a computer technology company 10 years ago, the most data-intensive customer
he dealt with had about 50,000 gigabytes in its entire database. (Data storage
is measured in bytes. The letter N, for example, takes 1 byte to store. A
gigabyte is a billion bytes of information.)
Today, roughly a million gigabytes are processed and stored
in a data center during the creation of a single 3-D animated movie, said Mr.
Burton, now at EMC,
a company focused on the management and storage of data.
Just one of the company’s clients, the New York Stock
Exchange, produces up to 2,000 gigabytes of data per day that must be stored
for years, he added.
EMC and the International
Data Corporation together estimated that more than 1.8 trillion
gigabytes of digital information were created globally last year.
“It is absolutely a race between our ability to create data
and our ability to store and manage data,” Mr. Burton said.
About three-quarters of that data, EMC estimated, was
created by ordinary consumers.
With no sense that data is physical or that storing it uses
up space and energy, those consumers have developed the habit of sending huge
data files back and forth, like videos and mass e-mails with photo attachments.
Even seemingly mundane actions, like running an app to find an Italian
restaurant in Manhattan or a taxi in Dallas, require servers to be turned on
and ready to process the information instantaneously.
The complexity of a basic transaction is a mystery to most
users: Sending a message with photographs to a neighbor could involve a trip
through hundreds or thousands of miles of Internet conduits and multiple data
centers before the e-mail arrives across the street.
“If you tell somebody they can’t access YouTube or download
from Netflix, they’ll tell you it’s a God-given right,” said Bruce Taylor, vice
president of the Uptime Institute,
a professional organization for companies that use data centers.
To support all that digital activity, there are now more
than three million data centers of widely varying sizes worldwide, according to
figures from the International Data Corporation.
Nationwide, data centers used about 76 billion
kilowatt-hours in 2010, or roughly 2 percent of all electricity used in the
country that year, based on an analysis by Jonathan G. Koomey, a research
fellow at Stanford University who has been studying data center energy use for
more than a decade. DatacenterDynamics, a London-based firm, derived similar
figures.
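The annual figure can be converted into an average continuous draw and set beside the worldwide estimate of about 30 billion watts. The sketch below, in Python, assumes a flat load over an 8,760-hour year and treats the two estimates as comparable, although they come from different analyses. The variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def average_power_watts(kwh_per_year: float) -> float:
    """Convert annual energy in kilowatt-hours to average power in watts."""
    return kwh_per_year * 1000 / HOURS_PER_YEAR


us_kwh_2010 = 76e9    # Mr. Koomey's estimate for U.S. data centers in 2010
global_watts = 30e9   # worldwide estimate compiled for The Times

us_watts = average_power_watts(us_kwh_2010)
us_share = us_watts / global_watts
# If 76 billion kilowatt-hours is about 2 percent, the national total follows.
implied_us_total_kwh = us_kwh_2010 / 0.02

print(f"Average U.S. draw: {us_watts / 1e9:.1f} billion watts")
print(f"Share of worldwide load: {us_share:.0%}")
print(f"Implied U.S. consumption: {implied_us_total_kwh / 1e9:,.0f} billion kWh")
```

On those assumptions, American data centers drew a little under 9 billion watts on average, which falls inside the one-quarter to one-third share of the worldwide load given in the estimates compiled for The Times.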
The industry has long argued that computerizing business
transactions and everyday tasks like banking and reading library books has the
net effect of saving energy and resources. But the paper industry, which some
predicted would be replaced by the computer age, consumed 67 billion
kilowatt-hours from the grid in 2010, according to Census Bureau figures
reviewed by the Electric Power Research Institute for The Times.
Direct comparisons between the industries are difficult:
paper uses additional energy by burning pulp waste and transporting products.
Data centers likewise involve tens of millions of laptops, personal computers
and mobile devices.
Chris Crosby, chief executive of the Dallas-based Compass Datacenters, said there
was no immediate end in sight to the proliferation of digital infrastructure.
“There are new technologies and improvements,” Mr. Crosby
said, “but it still all runs on a power cord.”
‘Comatose’ Power Drains
Engineers at Viridity
Software, a start-up that helped companies manage energy resources, were
not surprised by what they discovered on the floor of a sprawling data center
near Atlanta.
Viridity had been brought on board to conduct basic
diagnostic testing. The engineers found that the facility, like dozens of
others they had surveyed, was using the majority of its power on servers that
were doing little except burning electricity, said Michael Rowan, who was
Viridity’s chief technology officer.
A senior official at the data center already suspected that
something was amiss. He had previously conducted his own informal survey,
putting red stickers on servers he believed to be “comatose” — the term
engineers use for servers that are plugged in and using energy even as their
processors are doing little if any computational work.
“At the end of that process, what we found was our data
center had a case of the measles,” said the official, Martin Stephens, during a
Web seminar with Mr. Rowan. “There were so many red tags out there it was
unbelievable.”
The Viridity tests backed up Mr. Stephens’s suspicions: in
one sample of 333 servers monitored in 2010, more than half were found to be
comatose. All told, nearly three-quarters of the servers in the sample were
using less than 10 percent of their computational brainpower, on average, to
process data.
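In outline, a survey of this kind sorts servers by how much of their processing capacity they use on average. The Python sketch below is an illustration only: the server names, the readings and the cutoff for "comatose" are hypothetical, and it is not a description of Viridity's method. The 10 percent line mirrors the threshold mentioned in the findings.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    avg_utilization: float  # fraction of processing capacity in use, 0.0 to 1.0


# Hypothetical cutoffs, chosen for illustration.
COMATOSE_BELOW = 0.01
LOW_BELOW = 0.10


def classify(server: Server) -> str:
    """Label a server by its average utilization."""
    if server.avg_utilization < COMATOSE_BELOW:
        return "comatose"
    if server.avg_utilization < LOW_BELOW:
        return "low"
    return "active"


def summarize(servers: list[Server]) -> dict[str, float]:
    """Return the share of servers that falls into each category."""
    if not servers:
        raise ValueError("need at least one server")
    counts = {"comatose": 0, "low": 0, "active": 0}
    for server in servers:
        counts[classify(server)] += 1
    return {label: n / len(servers) for label, n in counts.items()}


# Made-up readings, not data from the survey described above.
sample = [
    Server("db-01", 0.000),
    Server("web-02", 0.004),
    Server("app-03", 0.070),
    Server("batch-04", 0.350),
]
shares = summarize(sample)
print(shares)
```

A real audit would also have to decide how long to observe each machine before tagging it, since a server that idles for weeks may still carry a monthly batch job.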
The data center’s operator was not some seat-of-the-pants
app developer or online gambling company, but LexisNexis, the database
giant. And it was hardly unique.
In many facilities, servers are loaded with applications and
left to run indefinitely, even after nearly all users have vanished or new
versions of the same programs are running elsewhere.
“You do have to take into account that the explosion of data
is what aids and abets this,” said Mr. Taylor of the Uptime Institute. “At a
certain point, no one is responsible anymore, because no one, absolutely no
one, wants to go in that room and unplug a server.”
Kenneth Brill, an engineer who in 1993 founded the Uptime
Institute, said low utilization began with the field’s “original sin.”
In the early 1990s, Mr. Brill explained, software operating
systems that would now be considered primitive crashed if they were asked to do
too many things, or even if they were turned on and off. In response, computer
technicians seldom ran more than one application on each server and kept the
machines on around the clock, no matter how sporadically that application might
be called upon.
So as government energy watchdogs urged consumers to turn
off computers when they were not being used, the prime directive at data
centers became running computers at all cost.
A crash or a slowdown could end a career, said Michael
Tresh, formerly a senior official at Viridity. A field born of cleverness and
audacity is now ruled by something else: fear of failure.
“Data center operators live in fear of losing their jobs on
a daily basis,” Mr. Tresh said, “and that’s because the business won’t back
them up if there’s a failure.”
In technical terms, the fraction of a computer’s brainpower
being used on computations is called “utilization.”
McKinsey & Company, the consulting firm that analyzed
utilization figures for The Times, has been monitoring the issue since at least
2008, when it published a report that received little notice outside the field.
The figures have remained stubbornly low: the current findings of 6 percent to
12 percent are only slightly better than those in 2008. Because of
confidentiality agreements, McKinsey is unable to name the companies that were
sampled.
David Cappuccio, a managing vice president and chief of
research at Gartner,
a technology research firm, said his own recent survey of a large sample of
data centers found that typical utilizations ran from 7 percent to 12 percent.
“That’s how we’ve overprovisioned and run data centers for
years,” Mr. Cappuccio said. “ ‘Let’s overbuild just in case we need it’ —
that level of comfort costs a lot of money. It costs a lot of energy.”
Servers are not the only components in data centers that
consume energy. Industrial cooling systems, circuitry to keep backup batteries
charged and simple dissipation in the extensive wiring all consume their share.
In a typical data center, those losses combined with low
utilization can mean that the energy wasted is as much as 30 times the amount
of electricity used to carry out the basic purpose of the data center.
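One way to see how a ratio of that size can arise is to combine server utilization with facility overhead. The Python sketch below assumes a deliberately simple model: only the utilized fraction of the electricity reaching the servers counts as useful, and the building as a whole draws a fixed multiple of what the servers consume. That multiple, here called `overhead_factor`, resembles the industry metric known as power usage effectiveness; the value of 2.0 is a hypothetical round number, not a figure from the reporting.

```python
def waste_ratio(utilization: float, overhead_factor: float) -> float:
    """Return wasted energy divided by useful energy under the simple model."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if overhead_factor < 1:
        raise ValueError("overhead_factor must be at least 1")
    # Share of all electricity entering the building that does computation.
    useful_fraction = utilization / overhead_factor
    return (1 - useful_fraction) / useful_fraction


# The 6 and 12 percent utilization figures come from the McKinsey analysis.
for utilization in (0.06, 0.12):
    ratio = waste_ratio(utilization, overhead_factor=2.0)
    print(f"utilization {utilization:.0%}: waste is {ratio:.1f} times useful energy")
```

With utilization at 6 percent and the assumed overhead, waste comes to about 32 times the useful energy; at 12 percent it falls to about 16 times.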
Some companies, academic organizations and research groups
have shown that vastly more efficient practices are possible, although it is
difficult to compare different types of tasks.
A company called Power
Assure, based in Santa Clara, markets a technology that enables commercial
data centers to safely power down servers when they are not needed — overnight,
for example.
But even with aggressive programs to entice its major
customers to save energy, Silicon Valley Power has
not been able to persuade a single data center to use the technique in Santa
Clara, said Mary Medeiros McEnroe, manager of energy efficiency programs at the
utility.
“It’s a nervousness in the I.T. community that something
isn’t going to be available when they need it,” Ms. McEnroe said.
The streamlining of the data center done by Mr. Stephens for
LexisNexis Risk Solutions is an illustration of the savings that are possible.
In the first stage of the project, he said that by
consolidating the work in fewer servers and updating hardware, he was able to
shrink a 25,000-square-foot facility into 10,000 square feet.
Of course, data centers must have some backup capacity
available at all times, and achieving 100 percent utilization is not possible.
They must be prepared to handle surges in traffic.
Mr. Symanski, of the Electric Power Research Institute, said
that such low efficiencies made sense only in the obscure logic of the digital
infrastructure.
“You look at it and say, ‘How in the world can you run a
business like that,’ ” Mr. Symanski said. The answer is often the same, he
said: “They don’t get a bonus for saving on the electric bill. They get a bonus
for having the data center available 99.999 percent of the time.”
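The availability target Mr. Symanski cites translates into a very small downtime allowance. A minimal Python sketch, assuming a 365-day year and that availability is measured as a simple fraction of total time (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by a target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


downtime = allowed_downtime_minutes(99.999)
print(f"{downtime:.1f} minutes per year")
```

At 99.999 percent, a data center may be unavailable for only a little more than five minutes in an entire year.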
The Best-Laid Plans
In Manassas, Va., the retailing colossus Amazon runs servers
for its cloud amid a truck depot, a defunct grain elevator, a lumberyard and
junk-strewn lots where machines compress loads of trash for recycling.
The servers are contained in two Amazon data centers run out
of three buildings shaped like bulky warehouses with green, corrugated sides.
Air ducts big enough to accommodate industrial cooling systems sprout along the
rooftops; huge diesel generators sit in rows around the outside.
The term “cloud” is often used loosely to describe a data
center’s functions. More specifically, it refers to a service for leasing
computing capacity. These facilities are primarily powered from the national
grid, but generators and batteries are nearly always present to provide
electricity if the grid goes dark.
Virginia’s Department of Environmental Quality is on familiar terms with Amazon. As a result
of four inspections beginning in October 2010, the company was told it would be
fined $554,476 by the agency for installing and repeatedly running diesel
generators without obtaining standard environmental permits required to operate
in Virginia.
Even if there are no blackouts, backup generators still emit
exhaust because they must be regularly tested.
After months of negotiations, the penalty was reduced to
$261,638. In a “degree of culpability” judgment, all 24 violations were given
the ranking “high.”
Drew Herdener, an Amazon spokesman, agreed that the company
“did not get the proper permits” before the generators were turned on. “All of
these generators were all subsequently permitted and approved,” Mr. Herdener
said.
The violations came in addition to a series of lesser
infractions at one of Amazon’s data centers in Ashburn, Va., in 2009, for which
the company paid $3,496, according to the department’s records.
Of all the things the Internet was expected to become, it is
safe to say that a seed for the proliferation of backup diesel generators was
not one of them.
Terry Darton, a former manager at Virginia’s environmental
agency, said permits had been issued to enough generators for data centers in
his 14-county corner of Virginia to nearly match the output of a nuclear power
plant.
“It’s shocking how much potential power is available,” said
Mr. Darton, who retired in August.
No national figures on environmental violations by data
centers are available, but a check of several environmental districts suggests
that the centers are beginning to catch the attention of regulators across the
country.
Over the past five years in the Chicago area, for example,
the Internet powerhouses Savvis and Equinix received violation notices,
according to records from the Illinois
Environmental Protection Agency. Aside from Amazon, Northern Virginia
officials have also cited data centers run by Qwest, Savvis, VeriSign and NTT
America.
Despite all the precautions — the enormous flow of
electricity, the banks of batteries and the array of diesel generators — data
centers still crash.
Amazon, in particular, has had a series of failures in
Northern Virginia over the last several years. One, in May 2010 at a facility
in Chantilly, took businesses dependent on Amazon’s cloud offline for what the
company said was more than an hour — an eternity in the data business.
Pinpointing the cause became its own information glitch.
Amazon announced that the failure “was triggered when a
vehicle crashed into a high-voltage utility pole on a road near one of our data
centers.”
As it turns out, the car accident was mythical, a
misunderstanding passed from a local utility lineman to a data center worker to
Amazon headquarters. Instead, Amazon said that its backup gear mistakenly shut
down part of the data center after what Dominion Virginia Power said was a
short on an electrical pole that set off two momentary failures.
Mr. Herdener of Amazon said the backup system had been
redesigned, and that “we don’t expect this condition to repeat.”
The Source of the Problem
Last year in the Northeast, a $1 billion feeder line for the
national power grid went into operation, snaking roughly 215 miles from
southwestern Pennsylvania, through the Allegheny Mountains in West Virginia and
terminating in Loudoun County, Va.
The work was financed by millions of ordinary ratepayers.
Steven R. Herling, a senior official at PJM
Interconnection, a regional authority for the grid, said the need to feed
the mushrooming data centers in Northern Virginia was the “tipping point” for
the project in an otherwise down economy.
Data centers in the area now consume almost 500 million
watts of electricity, said Jim Norvelle, a spokesman for Dominion Virginia
Power, the major utility there. Dominion estimates that the load could rise to
more than a billion watts over the next five years.
Data centers are among utilities’ most prized customers.
Many utilities around the country recruit the facilities for their almost
unvarying round-the-clock loads. Large, steady consumption is profitable for
utilities because it allows them to plan their own power purchases in advance
and market their services at night, when demand by other customers plummets.
Mr. Bramfitt, the former utility executive, said he feared
that this dynamic was encouraging a wasteful industry to cling to its
pedal-to-the-metal habits. Even with all the energy and hardware pouring into
the field, others believe it will be a challenge for current methods of storing
and processing data to keep up with the digital tsunami.
Some industry experts believe a solution lies in the cloud:
centralizing computing among large and well-operated data centers. Those data
centers would rely heavily on a technology called virtualization, which in
effect allows servers to merge their identities into large, flexible computing
resources that can be doled out as needed to users, wherever they are.
One advocate of that approach is Mr. Koomey, the Stanford
data center expert. But he said that many companies that try to manage their
own data centers, either in-house or in rental spaces, are still unfamiliar
with or distrustful of the new cloud technology. Unfortunately, those companies
account for the great majority of energy usage by data centers, Mr. Koomey
said.
Others express deep skepticism of the cloud, saying that the
sometimes mystical-sounding belief in its possibilities is belied by the
physicality of the infrastructure needed to support it.
Using the cloud “just changes where the applications are
running,” said Hank Seader, managing principal for research and education at
the Uptime Institute. “It all goes to a data center somewhere.”
Some wonder if the very language of the Internet is a
barrier to understanding how physical it is, and is likely to stay. Take, for
example, the issue of storing data, said Randall H. Victora, a professor of
electrical engineering at the University of Minnesota who does research on
magnetic storage devices.
“When somebody says, ‘I’m going to store something in the
cloud, we don’t need disk drives anymore’ — the cloud is disk drives,” Mr.
Victora said. “We get them one way or another. We just don’t know it.”
Whatever happens within the companies, it is clear that
among consumers, what are now settled expectations largely drive the need for
such a formidable infrastructure.
“That’s what’s driving that massive growth — the end-user expectation
of anything, anytime, anywhere,” said Mr. Cappuccio of Gartner, the
technology research firm.
“We’re what’s causing the problem.”