SGI 'Big Brain' computer depicts world according to Wikipedia
Researcher uses SGI supercomputer to visualise history based on data from Wikipedia
A researcher at the University of Illinois has created the first ever graphical visualisation of modern history according to Wikipedia, using SGI's new “Big Brain” computer.
Kalev H Leetaru loaded the entire English-language edition of Wikipedia onto SGI's new UV2 supercomputer and ran algorithms to identify every mention of a location or date in the text of every Wikipedia entry. More than 80 million locations and 42 million dates were extracted, averaging 19 locations and 11 dates per article.
The connections between these datasets were also captured, allowing Leetaru to perform near-real-time analysis and create visual maps of how history has unfolded over the past thousand years. He identified four periods of growth in Wikipedia's historical coverage: 1001–1500 (Middle Ages), 1501–1729 (Early Modern Period), 1730–2003 (Age of Enlightenment) and 2004–2011 (Wikipedia Era).
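The article does not describe Leetaru's extraction algorithms, so the following is only a toy sketch of the general idea: pull date mentions out of article text and tally them against the four periods he reported. It assumes, naively, that a "date" is any standalone four-digit number; the function names `period_for` and `count_periods` and the sample sentences are hypothetical and purely illustrative.

```python
import re
from collections import Counter

# Period boundaries as reported in the article (inclusive year ranges).
PERIODS = [
    (1001, 1500, "Middle Ages"),
    (1501, 1729, "Early Modern Period"),
    (1730, 2003, "Age of Enlightenment"),
    (2004, 2011, "Wikipedia Era"),
]

# Naive assumption: any standalone four-digit number is treated as a year.
YEAR_RE = re.compile(r"\b(\d{4})\b")


def period_for(year):
    """Return the period label for a year, or None if it falls outside 1001-2011."""
    for start, end, label in PERIODS:
        if start <= year <= end:
            return label
    return None


def count_periods(articles):
    """Count year mentions per period across an iterable of article texts."""
    counts = Counter()
    for text in articles:
        for match in YEAR_RE.finditer(text):
            label = period_for(int(match.group(1)))
            if label is not None:
                counts[label] += 1
    return counts


# Hypothetical sample "articles" for illustration only.
sample = [
    "The Battle of Hastings was fought in 1066.",
    "Newton published the Principia in 1687 and died in 1727.",
    "Wikipedia was launched in 2001 and kept growing into 2012.",
]
print(count_periods(sample))
```

A real system would need to tell years apart from other four-digit numbers and handle full dates, centuries and ranges; the regex shortcut here is a simplification chosen to keep the example short.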
Leetaru was also able to establish that the site's continued growth is focused on enhancing its coverage of historical events rather than on documenting more of the present. He also found that Wikipedia does not suffer from the “copyright gap” that blanks out most of the twentieth century in digitised print collections: its coverage shows steady exponential growth from 1924 to the present.
“The one-way nature of connections in Wikipedia, the lack of links, and the uneven distribution of Infoboxes, all point to the limitations of metadata-based data mining of collections like Wikipedia. With SGI UV2, the large shared memory available allowed me to ask questions of the entire dataset in near-real time,” said Leetaru.
“With a huge amount of cache-coherent shared memory at my fingertips, I could simply write a few lines of code and run it across the entire dataset. This isn’t possible with a scale-out computing approach. It’s very similar to using a word processor instead of using a typewriter – I can conduct my research in a completely different way, focusing on the outcomes, not the algorithms.”
SGI's UV2 supercomputer is billed as the world's largest in-memory system for data-intensive problems, offering up to 4,096 cores and up to 64 terabytes of coherent main memory. Built using Intel's Xeon E5-4600 processors, it can scale to eight petabytes of shared memory at a peak I/O rate of four terabytes per second (14 PB/hour).
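The "14 PB/hour" figure is simply the 4 TB/s peak I/O rate multiplied over an hour. The short check below assumes decimal (SI) units, as storage and I/O rates are normally quoted; the function name is illustrative.

```python
# Decimal (SI) units, as I/O rates are normally quoted.
TB = 10**12  # bytes
PB = 10**15  # bytes


def tb_per_s_to_pb_per_hour(rate_tb_per_s):
    """Convert a sustained transfer rate from TB/s to PB/hour."""
    bytes_per_hour = rate_tb_per_s * TB * 3600  # 3,600 seconds in an hour
    return bytes_per_hour / PB


rate = tb_per_s_to_pb_per_hour(4)
print(f"4 TB/s = {rate:.1f} PB/hour")  # prints "4 TB/s = 14.4 PB/hour"
```

The exact result is 14.4 PB/hour, which the article rounds to 14 PB/hour.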
The focus of the new system is on usability. SGI UV2 works like a workstation, in that any given program has access to all the cores and all the memory of the system. This means it is less complex to manage than traditional scale-out systems with many nodes, and applications can scale without the complexity of multi-instance software.
“A lot of scientists are used to using PCs in terms of their environment,” said Bill Mannel, vice president of product marketing for SGI, speaking to Techworld at the International Supercomputing Conference in Hamburg.
“UV2 has only one operating system (Linux) and one interface – it’s very easy to use. It’s not like a big cluster where you’ve got all these little parts you have to worry about. And on top of it, we can also integrate graphics directly in with compute and memory, so now you can actually look at the data interactively.”
SGI announced availability of the UV2 at the conference yesterday, with prices starting at $30,000 (£20,000). The systems will start shipping in August 2012.
The first UV2 supercomputer, integrated with Intel's Many Integrated Core (MIC) technology, has been sent to Professor Stephen Hawking's COSMOS consortium. Hawking said that COSMOS will use UV2 to test its mathematical theories and create computer simulations of the universe.
Mannel added that although some UV2s will ship with Intel MIC technology (recently rebranded as Xeon Phi), not every problem requires a massively parallel architecture, so some systems will simply use standard CPUs.
“If there are a lot of independent particles that don’t share a lot of data between them, you could use each of those little cores to do a portion of that problem. But in some cases it takes a lot of work to actually make this thing work on an accelerator,” said Mannel.
“It depends on how much work you want to do to make it work, and then what kind of problem it is. Some problems in structural mechanics, they have a lot of communication between them, so they don’t work well on these kind of Xeon Phi architectures.”
SGI UV2 also supports NVIDIA Quadro GPUs and Tesla accelerators.