8th Aurora Forum

NEC HPC Solution Special Site for SC21

NEC has developed the Vector Engine (VE) for accelerated computing through vectorization. The concept: the full application runs on the high-performance VE, while operating-system tasks are handled by the Vector Host (VH), a standard x86 server. This is the first time the NEC SX-series vector processor has been integrated transparently into a Linux software environment, allowing the VE to concentrate on delivering the best application performance. The SX-Aurora TSUBASA VE is flexible in sizing, can be water- or air-cooled, offers an outstanding 48 GB of HBM2 memory with a bandwidth of up to 1.53 TB/s, and has comparatively low energy consumption, with even better performance expected from the upcoming third generation of VE cards. Over the years a large number of applications from different areas have been covered, with especially good performance in simulation, weather forecasting, disaster prevention, resource exploration, and oil and gas seismic imaging. Beyond these applications, the VE strongly supports Artificial Intelligence, Machine Learning, Big Data Analytics, and Deep Learning, to name a few.
On this page you'll find some examples explained in technical articles and in our upcoming webinar.

NEC Events at SC21

Aurora Forum Webinar

09:00 – 12:00 on Nov 15 CST

In the Aurora Forum, we will show you the latest developments of the SX-Aurora TSUBASA Vector Engine. In addition, our invited users will share their experiences enhancing their research and daily work with the vector architecture.

Here you can find the VODs of the presentations mentioned below.

SX-Aurora TSUBASA - PCIe card type vector computer -
Presenter: Tsutomu Takeda, Manager, NEC Corporation

SX-Aurora TSUBASA is a PCIe-card-type vector supercomputer that inherits the technology of the SX vector supercomputers first released in 1983. The presentation will cover the features of SX-Aurora TSUBASA and announce some new functions; the launch of the card business and channel sales will be shared as well. We will also look at the value proposition of the Vector Engine, which delivers high performance not only for traditional HPC applications but also for a variety of other workloads. The "NEC Vector Annealing Service," a quantum-inspired simulated-annealing service that uses SX-Aurora TSUBASA and will be launched in Japan in November 2021, will be touched upon as well. The talk will also cover the SX-Aurora TSUBASA roadmap, including information on Vector Engine 3.0, and our new partnerships (e.g. with Graphcore) will be announced.

Introduction of a new qubit concept expressed by the extended Riemann sphere
Presenter: Kosuke Tomonaga, Collaborative Researcher, The University of Tokyo / SoftBank Robotics Corp.

We propose a new concept of qubit expressed by an extended Riemann sphere and introduce our approach to demonstrate the new qubit on the SX-Aurora TSUBASA.
On the extended Riemann sphere, we don’t need to fear singularities because we can naturally handle infinities and divisions by zero on the sphere. Even resonance and emergent phenomena at an ideal point can be processed.
In the near future, we will demonstrate an artificial ego that sympathizes with people and autonomously creates new algorithms for them, because our qubits can naturally process people's resonance and emergent phenomena.
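The claim that the sphere tames division by zero can be made concrete with a minimal sketch (illustrative only; the function names here are hypothetical, not the speaker's actual code). Under stereographic projection, every complex number, plus a single point at infinity, maps to a point on the unit sphere, so 1/0 simply lands at the north pole rather than raising an error:

```python
def to_sphere(z):
    """Stereographic projection of complex z onto the unit Riemann sphere.
    None encodes the point at infinity, which maps to the north pole (0,0,1)."""
    if z is None:
        return (0.0, 0.0, 1.0)
    x, y = z.real, z.imag
    d = x * x + y * y + 1.0
    return (2 * x / d, 2 * y / d, (x * x + y * y - 1.0) / d)

def safe_inv(z):
    """Inversion on the Riemann sphere: 1/0 = infinity, 1/infinity = 0."""
    if z is None:          # 1/infinity
        return 0j
    if z == 0:             # 1/0 is well-defined on the sphere
        return None
    return 1.0 / z

print(to_sphere(safe_inv(0j)))  # → (0.0, 0.0, 1.0), the north pole
```

On the sphere, inversion is just a rotation exchanging the north and south poles, which is why the singularity at zero disappears.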

Introduction of native Spark SQL support for SX-Aurora TSUBASA
Presenter: Manami Abe, Assistant Manager, NEC Corporation

Nowadays, Machine Learning and Deep Learning are a major focus area for improving the performance of data analytics, and even in HPC the convergence of HPC and AI is a frequent topic. However, looking at the entire data-analysis process, fast data preparation is also very important: in order to generate and retain large amounts of fresh, high-quality data, the data-preparation process needs to be repeated at frequent intervals. In this session, NEC will introduce Spark SQL on SX-Aurora TSUBASA, which enables high-speed data preparation and contributes to faster data analysis with high-quality data.

Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA
Presenter: Dr. Hatem Ltaief, Principal Research Scientist, King Abdullah University of Science and Technology

With the aim of imaging subsurface discontinuities, seismic data recorded at the surface of the Earth must be numerically re-positioned at the locations in the subsurface where reflections originated, a process generally referred to by the geophysical community as redatuming (Berryhill, 1984). Historically, this has been carried out by numerically time-reversing the data recorded along an open boundary of surface receivers into the subsurface. Despite its simplicity, such an approach can only handle seismic energy from primary arrivals (i.e., waves that interact only once with the medium discontinuities) and fails to explain multi-scattering in the subsurface; as a result, seismic images are contaminated by artificial reflectors unless the data are pre-processed prior to imaging so that multiples are removed. In the last decade, a novel family of methods has emerged under the name of Marchenko redatuming (Broggini et al., 2012; Wapenaar et al., 2014; Ravasi et al., 2016); such methods allow accurate redatuming of the full-wavefield recorded seismic data, including multiple arrivals. This is achieved by solving an inverse problem whose adjoint modeling can be shown to be equivalent to the standard single-scattering redatuming method of Berryhill (1984).

We accelerate the dense matrix-vector multiplication (MVM) that represents one of the most time-consuming operations in the forward and adjoint passes of the inverse problem. We identify and leverage the data-sparsity structure of each frequency operator. We present the impact of tile low-rank (TLR) matrix approximations on time-to-solution for the MVM using different accuracy thresholds and assess the resulting subsurface seismic image quality. We provide a performance evaluation on the NEC SX-Aurora TSUBASA and achieve performance improvements of up to two orders of magnitude for TLR-MVM compared to the regular vendor-optimized dense MVM, without deteriorating image quality.
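The core idea behind TLR-MVM can be sketched in a few lines of NumPy (a minimal illustration of the general technique, not KAUST's actual implementation; the tile size, rank, and test kernel below are arbitrary choices): each tile of a data-sparse matrix is compressed with a truncated SVD, and the matrix-vector product is then assembled from the small per-tile factors.

```python
import numpy as np

def compress_tiles(A, tile=64, rank=8):
    """Compress each tile of A with a rank-k truncated SVD, storing (U*s, Vt)."""
    n = A.shape[0]
    factors = {}
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            U, s, Vt = np.linalg.svd(A[i:i+tile, j:j+tile], full_matrices=False)
            factors[(i, j)] = (U[:, :rank] * s[:rank], Vt[:rank, :])
    return factors

def tlr_mvm(factors, x, n):
    """y = A @ x using only the compressed factors: per tile, y_i += U (Vt x_j)."""
    y = np.zeros(n)
    for (i, j), (U, Vt) in factors.items():
        y[i:i+U.shape[0]] += U @ (Vt @ x[j:j+Vt.shape[1]])
    return y

# Demo on a smooth kernel matrix, whose tiles are numerically low-rank.
n = 256
pts = np.linspace(0.0, 1.0, n)
A = np.exp(-5.0 * (pts[:, None] - pts[None, :]) ** 2)
x = np.random.default_rng(0).standard_normal(n)
factors = compress_tiles(A)
rel_err = np.linalg.norm(tlr_mvm(factors, x, n) - A @ x) / np.linalg.norm(A @ x)
print(f"relative error: {rel_err:.2e}")
```

Per b-by-b tile, storage and flops drop from O(b^2) to O(b*k) for rank k, which is where the reported speedups come from; the accuracy threshold controls the rank kept per tile and hence the accuracy/performance trade-off.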

VGL: a high-performance graph-processing framework for the NEC SX-Aurora TSUBASA architecture
Presenter: Dr. Ilya Afanasyev, Researcher, Research Computing Center, Lomonosov Moscow State University

This talk presents the Vector Graph Library (VGL), a novel graph-processing framework that targets NEC SX-Aurora TSUBASA vector engines and provides relatively simple computational and data abstractions, allowing framework users to easily express various graph algorithms. These abstractions incorporate many vector-oriented optimization strategies into a high-level programming model, enabling quick implementation of new graph algorithms with a small amount of code and minimal knowledge of the features of vector systems. In this talk, I will discuss the basic principles behind the VGL API, show sample implementations of fundamental graph algorithms using VGL, and, finally, evaluate VGL's performance on several widely used graph-processing problems. The comparative performance analysis demonstrates that the VGL-based implementations achieve significant acceleration over existing high-performance frameworks and libraries: up to 14x speedup over multicore CPU frameworks (Ligra, Galois, GAPBS) and up to 3x speedup over NVIDIA GPU implementations (Gunrock, nvGRAPH). At the end, I will also discuss porting VGL to other architectures with high-bandwidth memory, such as NVIDIA GPUs and the A64FX, in an attempt to develop an architecture-independent and portable API.
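The abstract does not show VGL's API, but the vector-friendly style it describes, whole-array gathers and scatter-adds over edge lists instead of per-vertex scalar loops, can be sketched with a hypothetical NumPy PageRank (illustrative only; VGL's actual abstractions differ):

```python
import numpy as np

def pagerank(src, dst, n, d=0.85, iters=50):
    """PageRank over an edge list, expressed as whole-array (vectorizable) ops."""
    deg = np.bincount(src, minlength=n).astype(float)
    deg[deg == 0] = 1.0                      # guard against zero out-degree
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        contrib = r[src] / deg[src]          # gather along edges
        r_new = np.zeros(n)
        np.add.at(r_new, dst, contrib)       # scatter-add to destinations
        r = (1.0 - d) / n + d * r_new
    return r

# Tiny 4-node graph: 0->1, 0->2, 1->2, 2->0, 3->2
src = np.array([0, 0, 1, 2, 3])
dst = np.array([1, 2, 2, 0, 2])
ranks = pagerank(src, dst, 4)
print(ranks.argmax())  # → 2: the node with the most incoming rank
```

The whole iteration body is a handful of gather/scatter array operations, exactly the shape of computation a vector engine (or a framework hiding the vector intrinsics) executes efficiently.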

Introduction of innovative AI processor, IPU and IPU-POD system
Presenter: Mamoru Nakano, President, Graphcore Japan KK

The IPU (Intelligence Processing Unit) is a highly flexible, easy-to-use parallel processor designed from the ground up specifically for machine-intelligence workloads, built on technologies such as MIMD execution, in-processor memory, and Bulk Synchronous Parallel (BSP) communication. The IPU-POD system delivers the flexibility to maximize all available space and power in an AI production system, scaling from a POD4 with 4 IPUs to an EXA-POD with 64,000 IPUs. It is highly regarded in many fields such as life science, drug discovery, finance, telecommunications, internet business, and HPC & AI. This talk will cover how the IPU can achieve the next breakthrough in machine learning, accelerating both current AI algorithms and the next generation of large-scale machine-learning models.