Los Alamos PRObE Center Allows Computer Scientists to Test Supercomputing and Big Data Systems Software at Scale
The New Mexico Consortium (NMC), Los Alamos National Laboratory (LANL), Carnegie Mellon University (CMU), and the National Science Foundation (NSF) have partnered to create the PRObE Center (Parallel Reconfigurable Observational Environment), a one-of-a-kind computer systems research center located at the Los Alamos Research Park.
A grand opening celebration is scheduled from 1-3 p.m. Thursday, Oct. 18, at the Research Park with U.S. Rep. Ben Ray Lujan, LANL Director Charlie McMillan and NSF representative Dr. Keith Marzullo among the keynote speakers for the event.
Using $10 million provided by the NSF, along with 2,048 recently retired computers from LANL, the PRObE Center will be the world’s first facility where computer systems researchers have access to a dedicated large-scale supercomputer where disruptive -- and even destructive -- testing can be done.
Currently, high-performance computing researchers are limited to using small clusters, or renting virtual machines in large, shared cloud clusters, to test the systems they develop.
When supercomputers are built, they go directly into production for their intended application, such as climate modeling, leaving no time for systems researchers to try out new concepts in systems software like operating systems, network software, and storage software.
There have been no supercomputers available for computer scientists to work out issues with the operating systems or other systems software.
As supercomputers get larger and more complex, providing reliable and efficient system-level software becomes a daunting task; without a place to test new concepts, advancement in high-performance computing will be delayed.
Several years ago, LANL’s deputy division leader for High Performance Computing Gary Grider was working to decommission some old supercomputer hardware when it occurred to him there might be a better solution.
“I realized our retired machines might still have some value for other purposes than providing cycles for science,” Grider said. “I had this idea that there ought to be a way to reuse these things. One way would be to help systems researchers who never get a chance at a dedicated large resource.”
After developing this idea with colleagues at Carnegie Mellon University, Grider started a conversation with the NSF, which agreed that this was a great opportunity to collaborate and build something that would have enormous, positive implications worldwide.
With a $10 million grant from the NSF, the PRObE team got to work on creating the Center, housed at the New Mexico Consortium’s offices in Los Alamos.
PRObE is a joint effort of LANL, Carnegie Mellon University and the New Mexico Consortium.
After two years of construction, moving computers and testing operations, the PRObE Center is now ready to begin hosting researchers with its 2,048-core cluster, located at the NMC in Los Alamos. A smaller cluster is located at Carnegie Mellon in Pittsburgh, PA.
All of these are recycled machines donated by LANL. The testing management software is provided by the University of Utah’s Emulab Group.
Katharine Chartrand, the Executive Director of the New Mexico Consortium, who has worked on the project with Grider and the NSF from the very beginning, is thrilled to see the culmination of their efforts.
“Re-purposing a supercomputer is hard. Building the first computer science research environment of this scale is an experiment in itself. I am amazed at the commitment of this partnership -- the institutions, the individuals, the NSF -- to making this resource available to the nation. It takes a tremendous, sustained and shared commitment to make a program of this complexity work,” Chartrand said.
Dr. Garth Gibson, a professor of computer science and electrical and computer engineering at Carnegie Mellon University (CMU), has been an integral part of the collaboration.
CMU is a renowned leader in computer systems research; Gibson immediately saw the value in the PRObE proposal, and was eager to lend his expertise to the project.
“Unless they leave universities for government or industry jobs, researchers and students rarely have access to these expensive large-scale clusters,” Gibson explained. “That means they don’t get the training and education necessary to develop innovations for the fast-approaching era of exascale computing.”
“Moreover, when a supercomputer is new, it’s immediately needed for applications research,” Gibson said. “So even when they do get permission to use larger clusters, systems scientists can’t run experiments on low-level hardware and purposely break these machines to see what happens.”
Researchers will be given dedicated use of the PRObE clusters for days, even weeks at a time.
They will be allowed to replace any and all of the code and even inject faults that might be destructive to some equipment.
The idea for PRObE originated at LANL, which approached the New Mexico Consortium to operate the Center.
“LANL isn’t in the business of supporting academic research and didn’t have authority to foot the bill to house, power, cool and maintain these old systems,” Grider explained. “We run a supercomputer complex to do nuclear weapons calculations. This is an offshoot thing that isn’t in our mission.”
PRObE includes a focused educational component targeting undergraduate students nationally.
The Computer System, Cluster, and Networking Summer Institute is a nine-week technical intensive that emphasizes practical skill development in setting up, configuring, administering, testing, monitoring, and scheduling computer systems, supercomputer clusters, and computer networks through a variety of activities including hands-on technical training, lectures, professional development seminars, and tours of LANL facilities.
This innovative and highly successful program was developed and piloted by LANL's Information Science and Technology Institute (ISTI).
It has been incorporated into the PRObE Center and is now jointly managed by ISTI and the NMC.
The NSF awarded a $10 million grant for PRObE operations to the New Mexico Consortium in October of 2010.
The supercomputer facility was completed just over a year later, in December 2011, with the help of local high school and college students. The computers are now configured, and researchers worldwide may start applying to use the 2,048-node facility.
The New Mexico Consortium is a non-profit partnership between the University of New Mexico, New Mexico Institute of Mining and Technology, and New Mexico State University.
The NMC facilitates research and educational collaborations between universities and Los Alamos National Laboratory.