MN-1: The GPU cluster behind 15-min ImageNet

doipfn

2017-11-30 11:00:05

Preferred Networks, Inc. has completed ImageNet training in 15 minutes [1,2]. This is the fastest time ever achieved for a 90-epoch ImageNet training. Let me describe the MN-1 cluster used for this accomplishment.

Preferred Networks’ MN-1 cluster started operation this September [3]. It consists of 128 nodes with 8 NVIDIA P100 GPUs each, for 1024 GPUs in total. Each GPU has a theoretical peak of 4.7 TFLOPS in double-precision floating point, so the cluster’s total theoretical peak exceeds 4.7 PFLOPS (including the CPUs as well). The nodes are connected with two FDR InfiniBand links (2 × 56 Gbps). PFN has exclusive use of the cluster, which is located in an NTT datacenter.

[Image: MN-1 Cluster in an NTT Datacenter]

On the TOP500 list published this November, the MN-1 cluster is listed as the 91st most powerful supercomputer, with approximately 1.39 PFLOPS of maximum performance on the LINPACK benchmark [4]. Compared to traditional supercomputers, MN-1’s computational efficiency (28%) is not high. One of the performance bottlenecks is the interconnect. Unlike typical supercomputers, MN-1 is connected as a thin tree rather than a fat tree: each group of sixteen nodes is connected to a redundant pair of InfiniBand switches, there are eight such groups in the cluster, and the links between groups are aggregated into another redundant pair of InfiniBand switches. Thus, whenever a process needs to communicate with a node in a different group, the inter-group links become a bottleneck, which lowers the LINPACK benchmark score.
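As a rough back-of-the-envelope check of that efficiency figure, here is a small sketch that uses the GPU-only theoretical peak; the official TOP500 Rpeak also counts the CPUs, so the true ratio is slightly lower:

```python
# Back-of-the-envelope check of MN-1's LINPACK efficiency.
# Rpeak here counts only the 1024 P100 GPUs (4.7 TFLOPS each in FP64);
# the official TOP500 Rpeak also includes the CPUs, so the actual
# efficiency is a little lower than this estimate.
rmax_tflops = 1390.0               # measured LINPACK result, 1.39 PFLOPS
rpeak_tflops = 1024 * 4.7          # 4812.8 TFLOPS, GPU-only peak
print(f"efficiency ~ {rmax_tflops / rpeak_tflops:.1%}")  # ~28.9%
```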

Distributed Learning in ChainerMN

However, as stated at the beginning of this article, MN-1 can perform ultra-fast deep learning (DL) training. This is because ChainerMN does not require bottleneck-free communication for DL training. During training, ChainerMN collects and redistributes parameter updates among all nodes. In the 15-minute trial, we used the ring allreduce algorithm, in which each node communicates only with its adjacent nodes in a ring topology: the parameter updates are accumulated during the first pass around the ring, and the accumulated result is distributed during the second pass. Since a ring can be laid out without crossing the inter-group bottleneck on a full-duplex network, the MN-1 cluster can efficiently finish the ImageNet training in 15 minutes with 1024 GPUs. A minimal simulation of the algorithm follows below.
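Below is a minimal, self-contained Python simulation of ring allreduce, with NumPy arrays standing in for per-GPU gradient buffers. It is an illustrative sketch of the two passes described above, not ChainerMN’s actual MPI/NCCL-based implementation.

```python
# A minimal simulation of ring allreduce over n "nodes", using NumPy
# arrays in place of per-GPU gradient buffers.
import numpy as np

def ring_allreduce(grads):
    """Allreduce (sum) a list of equal-length 1-D arrays, one per node."""
    n = len(grads)
    # Each node's buffer is split into n chunks, indexed by position.
    chunks = [np.array_split(g.astype(np.float64), n) for g in grads]

    # Pass 1 (reduce-scatter): in step s, node i sends chunk (i - s) to
    # node i + 1, which adds it to its own copy. After n - 1 steps,
    # node i holds the fully accumulated chunk (i + 1) mod n.
    for s in range(n - 1):
        sent = [chunks[i][(i - s) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i - s) % n] += sent[i]

    # Pass 2 (allgather): circulate the accumulated chunks around the
    # ring so that every node ends up with all of them.
    for s in range(n - 1):
        sent = [chunks[i][(i + 1 - s) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i + 1 - s) % n] = sent[i]

    return [np.concatenate(c) for c in chunks]

# Example: 4 nodes, each holding a different gradient vector.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(8) for _ in range(4)]
reduced = ring_allreduce(grads)
expected = np.sum(grads, axis=0)
assert all(np.allclose(r, expected) for r in reduced)
```

Note that every node sends and receives one chunk per step, so both passes keep all links in the ring busy simultaneously; no single link has to carry the whole buffer.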

[Figure: Scalability of ChainerMN up to 1024 GPUs]

[1] https://arxiv.org/abs/1711.04325

[2] https://www.preferred-networks.jp/en/news/pr20171110

[3] https://www.preferred-networks.jp/en/news/pr20170920

[4] https://www.preferred-networks.jp/en/news/pr20171114

IROS 2017 Report

jethrotan

2017-11-06 10:30:04

Writers: Ryoma Kawajiri, Jethro Tan

Preferred Networks (PFN) attended the 30th IEEE/RSJ IROS conference held in Vancouver, Canada. IROS is known as the second-biggest robotics conference in the world after ICRA (see here for our report on this year’s ICRA), with 2797 total registrants and 2164 submitted papers, of which 970 were accepted (an acceptance rate of 44.82%). With no fewer than 18 sessions held in parallel, our members had a hard time deciding which ones to attend.


2018 Intern Results at Preferred Networks (Part 1)

hido

2017-10-18 07:44:24

This summer, Preferred Networks accepted a record number of interns in Tokyo from all over the world. They tackled challenging tasks around artificial intelligence together with PFN mentors. We appreciate their passion, focus, and dedication to the internship.

In this post, we would like to share some of their great work (more to come).


Guest blog with Weihua, a former intern at PFN

hido

2017-09-11 16:29:13

This is a guest post in an interview style with Weihua Hu, a former intern at Preferred Networks from the University of Tokyo. His internship research from last year was later extended and accepted at ICML 2017.

“Learning Discrete Representations via Information Maximizing Self-Augmented Training,” Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, and Masashi Sugiyama; Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1558-1567, 2017. (Link)


ACL 2017 Report

Yuta Kikuchi

2017-09-08 13:54:22

Writers: Yuta Kikuchi, Sosuke Kobayashi

Preferred Networks (PFN) attended the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) in Vancouver, Canada. ACL is one of the largest conferences in the Natural Language Processing (NLP) field.

As in other machine learning research fields, the use of deep learning in NLP is increasing. The most popular topic in NLP deep learning is sequence-to-sequence learning, in which a model receives a sequence of discrete symbols (words) and learns to output a correct sequence conditioned on the input; a minimal sketch follows below.
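To make the idea concrete, here is a hedged, minimal encoder-decoder sketch in Chainer. The vocabulary size, unit count, and toy batch are illustrative placeholders, not the configuration of any specific paper discussed at the conference.

```python
# A minimal encoder-decoder (sequence-to-sequence) sketch in Chainer.
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class Seq2seq(chainer.Chain):
    def __init__(self, n_vocab, n_units):
        super(Seq2seq, self).__init__()
        with self.init_scope():
            self.embed_x = L.EmbedID(n_vocab, n_units)   # source embeddings
            self.embed_y = L.EmbedID(n_vocab, n_units)   # target embeddings
            self.encoder = L.NStepLSTM(1, n_units, n_units, 0.1)
            self.decoder = L.NStepLSTM(1, n_units, n_units, 0.1)
            self.out = L.Linear(n_units, n_vocab)        # next-token scores

    def __call__(self, xs, ys_in, ys_out):
        # Encode each source sequence; h and c summarize the inputs.
        h, c, _ = self.encoder(None, None, [self.embed_x(x) for x in xs])
        # Decode conditioned on the encoder state.
        _, _, os = self.decoder(h, c, [self.embed_y(y) for y in ys_in])
        # Predict each next target token and average the loss.
        return F.softmax_cross_entropy(
            self.out(F.concat(os, axis=0)), F.concat(ys_out, axis=0))

# Toy usage with token id sequences; 0 = <bos>, 1 = <eos>.
model = Seq2seq(n_vocab=100, n_units=64)
xs = [np.array([5, 6, 7], dtype=np.int32)]
ys_in = [np.array([0, 8, 9], dtype=np.int32)]   # decoder input: <bos> + target
ys_out = [np.array([8, 9, 1], dtype=np.int32)]  # decoder target: target + <eos>
loss = model(xs, ys_in, ys_out)
```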



Deep Reinforcement Learning Bootcamp: Event Report

hido

2017-09-04 14:49:39

Preferred Networks proudly sponsored an exciting two-day event, Deep Reinforcement Learning Bootcamp, held on August 26–27 at UC Berkeley.

The instructors of this event included famous researchers in this field, such as Vlad Mnih (DeepMind, creator of DQN), Pieter Abbeel (OpenAI/UC Berkeley), Sergey Levine (Google Brain/UC Berkeley), Andrej Karpathy (Director of AI at Tesla), and John Schulman (OpenAI), as well as up-and-coming researchers such as Chelsea Finn, Rocky Duan, and Peter Chen (UC Berkeley).



ICML 2017 Report

Brian Vogel

2017-08-25 10:09:00

Preferred Networks (PFN) attended the International Conference on Machine Learning (ICML) in Sydney, Australia. The first ICML was held in 1980 in Pittsburgh; last year’s conference was in New York, and ICML 2018 will be held in Stockholm. ICML is one of the largest machine learning conferences, with approximately 2400 people attending this year. There were 434 accepted submissions spanning nearly all areas of machine learning.


CVPR 2017: Conference Report

Tommi Kerola

2017-08-17 10:00:50

Writers: Richard Calland, Tommi Kerola

Preferred Networks (PFN) attended the CVPR 2017 conference in Honolulu, U.S., one of the flagship conferences for discussing research and applications in computer vision and pattern recognition. Computer vision is of major importance for our activities at PFN, including applications for autonomous driving, robotics, and of course products such as PaintsChainer. Modern computer vision is largely based on deep learning, which is relevant for our continued research and product development. In this blog post, we will briefly summarize trends from this conference, focusing on a few papers relevant to each topic.


ChainerCV Release

Yusuke Niitani

2017-08-14 11:07:10

We released ChainerCV, a utility library for computer vision in deep learning. This library aims to make it easier to train and apply deep learning models for computer vision using Chainer. It contains high-quality implementations of computer vision models and the tools necessary to conduct research in this field; a short usage sketch follows the links below.

GitHub page: https://github.com/chainer/chainercv
Documentation: http://chainercv.readthedocs.io/en/stable/
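As a brief illustration of the intended workflow, here is a hedged sketch using one of the detection models shipped with ChainerCV (SSD300 with PASCAL VOC pretrained weights); the image path is a placeholder:

```python
# Object detection with a pretrained model from ChainerCV.
# 'input.jpg' is a placeholder path for a local image file.
from chainercv.links import SSD300
from chainercv.utils import read_image

model = SSD300(pretrained_model='voc0712')  # weights trained on PASCAL VOC
img = read_image('input.jpg')               # CHW, float32, RGB
bboxes, labels, scores = model.predict([img])
print(bboxes[0], labels[0], scores[0])      # detections for the first image
```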


IETF 99 Report

Hirochika Asai

2017-08-08 14:01:13

Hello! I’m Hirochika Asai, a researcher at Preferred Networks (PFN); I joined PFN this April. In this post, I briefly report on the 99th meeting of the Internet Engineering Task Force (IETF 99), held at the Hilton Prague in Prague, Czech Republic. It was PFN’s first attendance at an IETF meeting.
