Frovedis, a development platform for AI / ML

Jun 1, 2021 Shoichiro Yokotani, Application Development Expert, AI Platform division

1. AI on HPC platform
In recent years, HPC platforms have increasingly been used not only for conventional scientific and technical simulation, but also for running AI / machine learning (ML) workloads and for combining AI / ML with simulation. One example is hybrid execution, in which supervised learning complements a fluid dynamics simulation. [*1]
To respond to this trend, NEC is developing an AI / ML framework so that its HPC platform can also be used in the AI / ML domain. This article introduces Frovedis, a high-performance middleware for AI / ML that runs on SX-Aurora TSUBASA.

[*1] https://www.hpcwire.com/2021/01/21/researchers-train-fluid-dynamics-neural-networks-on-supercomputers/

2. Comparison of HPC and AI / ML development methods
In scientific and technical simulation, application development proceeds by writing the program in Fortran or C/C++, compiling it into an executable file, and then running that executable. AI / ML application development often takes a different approach: frameworks such as scikit-learn and TensorFlow are commonly used, and processing is executed by calling their APIs from Python. For this reason, an AI / ML development framework is provided for SX-Aurora TSUBASA so that machine learning and deep learning processing can be executed at high speed on a vector processor.

3. AI Development Framework
Let's take a look at the specific contents of the AI / ML development framework.
Data cleaning and preprocessing are the first steps before data analytics in AI / ML. First, the data stored in a database is extracted with SQL, and tables are joined or merged if necessary. Null-value handling and scaling or standardization are then applied to the extracted data. NumPy and Pandas are used for these steps. This preprocessing work is said to account for a large proportion of the total time spent on machine learning. After preprocessing is complete, the data is loaded into an AI / ML framework for training. For example, scikit-learn, one of these frameworks, provides many algorithms for tasks such as classification, clustering, and regression analysis, so machine learning can be performed with an algorithm suited to each analysis.
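
The preprocessing flow described above can be sketched in a few lines of Python. This is only a minimal illustration: the in-memory SQLite database, the `customers` and `orders` tables, and all column names are hypothetical and were chosen for the example. Filling nulls with the column mean is just one of several possible strategies.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical in-memory database standing in for a production data store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, age REAL);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 34), (2, NULL), (3, 52);
    INSERT INTO orders VALUES (1, 120.0), (2, 80.0), (3, NULL);
    """
)

# Step 1: extract the data with SQL.
customers = pd.read_sql_query("SELECT * FROM customers", conn)
orders = pd.read_sql_query("SELECT * FROM orders", conn)
conn.close()

# Step 2: join the tables on their shared key.
df = customers.merge(orders, on="customer_id", how="inner")

# Step 3: handle null values, here by filling with the column mean.
features = df[["age", "amount"]]
features = features.fillna(features.mean())

# Step 4: standardize each column to zero mean and unit variance.
X = features.to_numpy(dtype=np.float64)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std)
```

The resulting array `X_std` is in the form that a framework such as scikit-learn expects as training input.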

4. What is Frovedis?
Frovedis is a machine learning framework for SX-Aurora TSUBASA and a Spark-compatible middleware. It is published as open source.
With Frovedis, users can use the data analysis machine learning library of Apache Spark (Spark) without having to be aware of the vector processor of SX-Aurora TSUBASA.

5. Frovedis APIs
You can call the Frovedis APIs from Python. For example, when the dataset is small, around a few gigabytes, simply calling the Frovedis API from Python is a better choice for data processing and analytics than deploying the dataset to the Spark framework. By offloading processing from Python to a vector processor through the Frovedis API, processing time can be significantly reduced compared to CPU execution. As a result, the trial-and-error cycle of selecting the best machine learning algorithm becomes much faster and easier.

The Frovedis machine learning APIs for Python can be broadly divided into three groups: scikit-learn APIs, Frovedis original APIs, and graph algorithms.

a. scikit-learn API
b. Frovedis original API
c. Frovedis.graph algorithm

a. scikit-learn is a set of machine learning libraries for Python. Using a scikit-learn-based machine learning library makes it less costly to build statistics-based machine learning applications than coding them from scratch with NumPy and SciPy. Through the Frovedis API, machine learning processing can be offloaded to a vector processor. The APIs provided by Frovedis are compatible with the parameters and attributes of the scikit-learn API. It is therefore easy to convert Python code written for the CPU into a program that offloads processing to a vector processor, without rewriting the parts that call scikit-learn.
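
As a rough sketch of what this compatibility means in practice, the following program trains a logistic regression classifier with scikit-learn on the CPU. The commented-out lines indicate how the same program might be pointed at Frovedis. The module paths `frovedis.exrpc.server` and `frovedis.mllib.linear_model`, the `FROVEDIS_SERVER` environment variable, and the process count are assumptions about the Frovedis Python package and should be checked against the installed version. The dataset and the `max_iter` value are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# To offload to the vector processor, the LogisticRegression import above would
# be replaced by the Frovedis counterpart and a Frovedis server would be started
# first (assumed API, not executed here):
#
#   import os
#   from frovedis.exrpc.server import FrovedisServer
#   from frovedis.mllib.linear_model import LogisticRegression
#   FrovedisServer.initialize("mpirun -np 8 " + os.environ["FROVEDIS_SERVER"])

# Load a small sample dataset bundled with scikit-learn and standardize it.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The estimator is created, fitted, and queried the same way in both versions.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
accuracy = float((model.predict(X) == y).mean())
print(f"training accuracy: {accuracy:.3f}")

# With Frovedis, the server would be stopped at the end (assumed API):
#   FrovedisServer.shut_down()
```

Only the import and the server start and stop differ between the two versions. The calls to `fit` and `predict` stay as written.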

b. The Frovedis Original API group consists of the following four APIs.
Frovedis.mllib.fm.FactorizationMachineClassifier
Frovedis.mllib.recommendation.ALS
Frovedis.mllib.fpm.FPGrowth
Frovedis.mllib.feature.Word2Vec

c. The graph algorithm API has an interface similar to that of NetworkX. Graph algorithms can be used for analyses such as extracting communities within social networks. Currently, eight graph algorithms can be executed on a vector processor.
Frovedis.graph.pagerank
Frovedis.graph.connected_components
Frovedis.graph.single_source_shortest_path
Frovedis.graph.bfs_edges
Frovedis.graph.bfs_tree
Frovedis.graph.bfs_predecessors
Frovedis.graph.bfs_successors
Frovedis.graph.descendants_at_distance
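
Because the Frovedis graph functions are described as similar to the NetworkX interface, the NetworkX functions of the same names give a feel for how these algorithms are used. The sketch below runs on the CPU with NetworkX only, on a small made-up graph. Whether the Frovedis versions accept exactly the same arguments and return the same types is an assumption to verify against the Frovedis documentation.

```python
import networkx as nx

# A small hypothetical social network made of two separate groups.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 4), (5, 6)])

# connected_components: the groups of mutually reachable nodes.
components = sorted(sorted(c) for c in nx.connected_components(G))

# single_source_shortest_path: one shortest path from node 1 to each reachable node.
paths = nx.single_source_shortest_path(G, 1)

# bfs_edges: edges in the order a breadth-first search from node 1 follows them.
bfs = list(nx.bfs_edges(G, 1))

# bfs_predecessors / bfs_successors: parent and children of each node in the BFS tree.
predecessors = dict(nx.bfs_predecessors(G, 1))
successors = dict(nx.bfs_successors(G, 1))

# descendants_at_distance: nodes exactly two hops away from node 1.
two_hops = nx.descendants_at_distance(G, 1, 2)

print(components)
print(paths)
print(bfs)
print(two_hops)
```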

In addition to the APIs for Python, Frovedis provides a Spark-compatible API, an interface for performing Pandas-like DataFrame operations on the vector processor of SX-Aurora TSUBASA, and an API for using ScaLAPACK and PBLAS from Python.

Next time, we will introduce specific usage examples of data preprocessing with Frovedis DataFrame and NLCPy, and of training with the machine learning APIs.
