SpiNNaker, the Million-Core Supercomputer, Finally Switched On

After 12 years in the making, the “brain computer” designed at the University of Manchester is finally switched on. What does this computer do? How is it made? And who is Steve Furber?

AI systems have been rapidly developed in the past decade with the use of deep learning, neural networks, and large computers to try and simulate neurons. But AI is not the only area of interest when using such techniques; scientists and engineers alike are also keen to try and simulate the human brain to better understand how it works and why.

Simulating the brain is no trivial task. The complexity of the human brain is difficult to replicate, which is part of why the SpiNNaker computer is important.

The Challenges of Simulating a Brain

One of the first fundamental differences between the brain and computers is how their “smallest units” function. Brain neurons can have multiple connections and react to impulses in a range of different ways. Computer transistors, by comparison, are switches that, while can be connected to other transistors, can only have one of two states.

Neurons are also able to forge links between other neurons and react to stimuli differently (which is one definition of “learning”), whereas transistor connections are fixed.

Because of these differences, scientists have to “simulate” neurons and connections in software rather than in hardware, which severely impacts the number of neurons and links that can be simulated simultaneously.

What about simulation neurons in hardware?

Neurons and transistors share little in common but a better comparison would be simple microcontrollers and FPGAs; microcontrollers are akin to neurons in that they can process outside signals quickly while being comparatively simple in architecture while FPGAs provide the ability to break and create connections between microcontrollers.

Could hardware simulation be the key? One team of researchers believes so and has spent the last 12 years on the idea.

The SpiNNaker

A research team at the University of Manchester have spent the last 12 years creating a computer that will simulate neurons and connections with the use of many simple cores all interconnected on a massive parallel system and the computer, called SpiNNaker, was finally turned on.

The million-core computer is designed to simulate up to a billion neurons in real-time to allow scientists to study neural networks and pathways in a realistic manner by using hardware as opposed to software.

Unlike traditional methods for simulating neurons, SpiNNaker has individual processors that each simulate up to 1000 neurons that transmit and receive small packets of data to and from many other neurons simultaneously.

Hexagonal topology between processors and a 48-processor SpiNNaker 
computer - Image courtesy University of Manchester

The Spiking Neural Network Architecture system (SpiNNaker) consists of 10 19-inch computer racks with each rack containing 100,000 ARM cores. This core density is achieved with the use of a custom IC that contain up to 18 cores. Each board in a rack has 48 chips, which results in each board containing 864 processors.

Unlike typical software systems, the cores are arranged in a hexagonal pattern with data transmission handled entirely in hardware. It is this topology that allows for the system to simulate one billion neurons in real-time. The system uses ARM9 processors containing a total of 7TB of RAM and 57K nodes while each processor has an off-die 128MB SDRAM and each core has 32KB ROM and 64KB data tightly-coupled memory DTCM …



3D-printed Deep Learning neural network uses light instead of electrons

It’s a novel idea, using light diffracted through numerous plates instead of electrons. And to some, it might seem a little like replacing a computer with an abacus, but researchers at UCLA have high hopes for their quirky, shiny, speed-of-light artificial neural network.

Coined by Rina Dechter in 1986, Deep Learning is one of the fastest-growing methodologies in the machine learning community and is often used in face, speech and audio recognition, language processing, social network filtering and medical image analysis as well as addressing more specific tasks, such as solving inverse imaging problems.

Traditionally, deep learning systems are implemented on a computer to learn data representation and abstraction and perform tasks, on par with or better than – the performance of humans. However the team led by Dr. Aydogan Ozcan, the Chancellor’s Professor of electrical and computer engineering at UCLA, didn’t use a traditional computer set-up, instead choosing to forgo all those energy-hungry electrons in favor of light waves. The result was its all-optical Diffractive Deep Neural Network (D2NN) architecture.

The setup uses 3D-printed translucent sheets, each with thousands of raised pixels, which deflect light through each panel in order to perform set tasks. By the way, these tasks are performed without the use of any power, except for the input light beam.

The UCLA team’s all-optical deep neural network – which looks like the guts of a solid gold car battery – literally operates at the speed of light, and will find applications in image analysis, feature detection and object classification. Researchers on the team also envisage possibilities for D2NN architectures performing specialized tasks in cameras. Perhaps your next DSLR might identify your subjects on the fly and post the tagged image to your Facebook timeline.

view gallery

“Using passive components that are fabricated layer by layer, and connecting these layers to each other via light diffraction created a unique all-optical platform to perform machine learning tasks at the speed of light,” said Dr. Ozcan.

For now though, this is a proof of concept, but it shines a light on some unique opportunities for the machine learning industry.

The research has been published in the journal Science.


Volta Tensor Core GPU Achieves New AI Performance Milestones

Artificial intelligence powered by deep learning now solves challenges once thought impossible, such as computers understanding and conversing in natural speech and autonomous driving. Inspired by the effectiveness of deep learning to solve a great many challenges, the exponentially growing complexity of algorithms has resulted in a voracious appetite for faster computing. NVIDIA designed the Volta Tensor Core architecture to meet these needs.

NVIDIA and many other companies and researchers have been developing both computing hardware and software platforms to address this need. For instance, Google created their TPU (tensor processing unit) accelerators which have generated good performance on the limited number of neural networks that can run on TPUs.

In this blog, we share some of our recent advancements which deliver dramatic performance gains on GPUs to the AI community. We have achieved record-setting ResNet-50 performance for a single chip and single server with these improvements. Recently, fast.ai also announced their record-setting performance on a single cloud instance.

Our results demonstrate that:

  • A single V100 Tensor Core GPU achieves 1,075 images/second when training ResNet-50, a 4x performance increase compared to the previous generation Pascal GPU.
  • A single DGX-1 server powered by eight Tensor Core V100s achieves 7,850 images/second, almost 2x the 4,200 images/second from a year ago on the same system.
  • A single AWS P3 cloud instance powered by eight Tensor Core V100s can train ResNet-50 in less than three hours, 3x faster than a TPU instance.


Volta Tensor Core GPU ResNet-50 record
Figure 1. Volta Tensor Core GPU Achieves Speed Records In ResNet-50 (AWS P3.16xlarge instance consists of 8x Tesla V100 GPUs).

Massive parallel processing performance on a diversity of algorithms makes NVIDIA GPUs naturally great for deep learning. We didn’t stop there. Tapping our years of experience and close collaboration with AI researchers all over the world, we created a new architecture optimized for the many models of deep learning – the NVIDIA Tensor Core GPU.

Combined with high-speed NVLink interconnect plus deep optimizations within all current frameworks, we achieve state-of-the-art performance. NVIDIA CUDA GPU programmability ensures performance for the large diversity of modern networks, as well as provides a platform to bring up emerging frameworks and tomorrow’s deep network inventions  …..
<read continue>

High-speed light-based systems could replace supercomputers for certain ‘deep learning’ calculations

Low power requirements for photons (instead of electrons) may make deep learning more practical in future self-driving cars and mobile consumer devices

(a) Optical micrograph of an experimentally fabricated on-chip optical interference unit; the physical region where the optical neural network program exists is highlighted in gray. A programmable nanophotonic processor uses a field-programmable gate array (similar to an FPGA integrated circuit ) — an array of interconnected waveguides, allowing the light beams to be modified as needed for a specific deep-learning matrix computation. (b) Schematic illustration of the optical neural network program, which performs matrix multiplication and amplification fully optically. (credit: Yichen Shen et al./Nature Photonics)

A team of researchers at MIT and elsewhere has developed a new approach to deep learning systems — using light instead of electricity, which they say could vastly improve the speed and efficiency of certain deep-learning computations.

Deep-learning systems are based on artificial neural networks that mimic the way the brain learns from an accumulation of examples. They can enable technologies such as face- and voice-recognition software, or scour vast amounts of medical data to find patterns that could be useful diagnostically, for example.

But the computations these systems carry out are highly complex and demanding, even for supercomputers. Traditional computer architectures are not very efficient for calculations needed for neural-network tasks that involve repeated multiplications of matrices (arrays of numbers). These can be computationally intensive for conventional CPUs or even GPUs.

Programmable nanophotonic processor

Instead, the new approach uses an optical device that the researchers call a “programmable nanophotonic processor.” Multiple light beams are directed in such a way that their waves interact with each other, producing interference patterns that “compute” the intended operation.

The optical chips using this architecture could, in principle, carry out dense matrix multiplications (the most power-hungry and time-consuming part in AI algorithms) for learning tasks much faster, compared to conventional electronic chips. The researchers expect a computational speed enhancement of at least two orders of magnitude over the state-of-the-art and three orders of magnitude in power efficiency.

“This chip, once you tune it, can carry out matrix multiplication with, in principle, zero energy, almost instantly,” says Marin Soljacic, one of the MIT researchers on the team.

To demonstrate the concept, the team set the programmable nanophotonic processor to implement a neural network that recognizes four basic vowel sounds. Even with the prototype system, they were able to achieve a 77 percent accuracy level, compared to about 90 percent for conventional systems. There are “no substantial obstacles” to scaling up the system for greater accuracy, according to Soljacic.

The team says is will still take a lot more time and effort to make this system useful. However, once the system is scaled up and fully functioning, the low-power system should find many uses, especially for situations where power is limited, such as in self-driving cars, drones, and mobile consumer devices. Other uses include signal processing for data transmission and computer centers.

The research was published Monday (June 12, 2017) in a paper in the journal Nature Photonics (open-access version available on arXiv).

The team also included researchers at Elenion Technologies of New York and the Université de Sherbrooke in Quebec. The work was supported by the U.S. Army Research Office through the Institute for Soldier Nanotechnologies, the National Science Foundation, and the Air Force Office of Scientific Research.

Abstract of Deep learning with coherent nanophotonic circuits

Artificial neural networks are computational network models inspired by signal processing in the brain. These models have dramatically improved performance for many machine-learning tasks, including speech and image recognition. However, today’s computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made towards developing electronic architectures tuned to implement artificial neural networks that exhibit improved computational speed and accuracy. Here, we propose a new architecture for a fully optical neural network that, in principle, could offer an enhancement in computational speed and power efficiency over state-of-the-art electronics for conventional inference tasks. We experimentally demonstrate the essential part of the concept using a programmable nanophotonic processor featuring a cascaded array of 56 programmable Mach–Zehnder interferometers in a silicon photonic integrated circuit and show its utility for vowel recognition.

References from:

Yichen Shen et al. Deep learning with coherent nanophotonic circuits. Nature Photonics (2017) doi:10.1038/nphoton.2017.93