Open-set Few-shot Speaker Identification

tsuboiyuta

2019-10-21 15:56:12

This is a guest blog from an ex-intern, Nontawat Charoenphakdee.

About me

I am Nontawat Charoenphakdee from Bangkok, Thailand. I am currently a second-year PhD student (since September 2018) working on machine learning at the Sugiyama-Sato-Honda lab at the University of Tokyo, where I also received my master's degree. My hobbies are listening to music, karaoke, and playing games. More information about me can be found here: https://nolfwin.github.io/. This blog entry introduces my work during my summer internship (August-September 2019) at PFN.

Introduction: a system that can recognize you by your voice

Speaking is a natural way for humans to communicate. As we can see from recent developments in speech technology, the way we communicate with robots is getting closer and closer to the way we talk to people [1, 2, 3, 4]. For example, PFN’s interactive robot can receive voice commands from humans and follow their orders (see PFN’s ICRA-2018 paper and Autonomous Tidying-up Robot System at CEATEC2018 for more information).

Currently, several voice assistant applications focus on understanding voice commands without verifying the identity of the speaker [2]. However, knowing a speaker's identity can enhance the security of an application. Intuitively, we definitely do not want just anyone to be able to give orders to a robot, especially in a critical application (for example, see “Amazon’s Alexa started ordering people dollhouses after hearing its name on TV” and “How A Few Words To Apple’s Siri Unlocked A Man’s Front Door“).

Beyond security, we believe that being able to recognize a speaker's identity will lead to more exciting and useful applications of the technology we already have. Two examples are given below.

First, a robot can provide an appropriate response for each speaker. For example, a robot teacher may adjust an explanation according to the student, or a personal robot may interact with its owner and their friends differently according to its knowledge about each user. Another example is that we can design a permission level for each user. This can also prevent a robot from accepting a command from an unknown speaker who wants to use it in an inappropriate way.

Second, we can communicate with a robot more naturally. Consider a scenario where a person standing behind the robot says “Take my cup”. Although the robot cannot see that person, it can associate the word “my” with the identity of the speaker and carry out the command accordingly. It would be less natural to say “Take [person_name]’s cup” when the context is clear. Intuitively, knowing who you are talking to gives you a better understanding of the current context of the conversation.

For these reasons, being able to recognize a speaker allows a personal robot to support a wider range of applications. Thus, this study explores speaker identification in the situation where we have only a few training examples for each target speaker (few-shot). The main motivation is that we do not want our customers/users to spend too much time teaching our robots. Moreover, for safety reasons, the system should be able to detect, at test time, unknown speakers that are not in the training data (open-set), so that we can avoid any potential damage they might cause. We built an application and tested it using real-world few-shot speech data (collected from PFN members and interns; thank you for your cooperation!).

Problem setting: Open-set Few-shot Speaker Identification

Without any prior knowledge of the task, it is difficult to apply machine learning when only very few examples are available (e.g., two data points per class), because the model is prone to overfitting. In our speaker identification task, we are working with human speech. Therefore, in addition to a small dataset of our target speakers (target data), we may incorporate a large labeled speech dataset even though it does not contain our target speakers (source data). Our problem setting can be informally stated as follows:

Given:

  1. A large labeled dataset (speech-speaker pairs) from benchmark datasets.
  2. A small labeled dataset (speech-speaker pairs) from the target speakers.

Goal: 

Learn a classifier that, given a new speech input, can determine which target speaker it comes from, or whether it comes from none of the target speakers at all. To evaluate the performance of a classifier, we use three evaluation metrics: accuracy (ACC), balanced accuracy (BAC, i.e., class-averaged accuracy), and F1-measure (F1) (see [5] for more information on each metric).

Source data

We used the LibriSpeech dataset [6] as the source data. It is a well-known, freely available speech corpus consisting of more than 1000 hours of English speech from almost 2500 speakers.

Figure 1: Statistics of LibriSpeech dataset


Target data

Figure 2:  An invitation to join this project used in the interim presentation


Figure 3:  An instruction for collecting target speaker data


In this internship program, we collected two datasets from PFN members. The first is a 4-speaker dataset recorded in the “Banana” meeting room at PFN (we call this dataset PFN-Banana). Banana is a meeting room for up to 10 people, and the recording environment was clean, i.e., without noise. The other is a 14-speaker dataset recorded in PFN’s cafe (we call this dataset PFN-Cafe), a large room that can host a party of more than 100 people. Since PFN-Cafe was recorded during the interim poster presentation, many people were speaking at the same time (e.g., other presenters were presenting their work next to my poster), so the collected data were quite noisy. Furthermore, because I was worried that the data could be too noisy, I asked the speakers to speak loudly; as a result, the recordings were too loud and became clipped (audio clipping is a type of waveform distortion).

Figure 4:  Audio clipping issue in PFN-Cafe dataset


We also held out one portion of LibriSpeech (the test-other folder), which contains 33 speakers, to use as additional target data.

Method

Data preprocessing

We used a sampling rate of 16,000 Hz. Each audio file is trimmed so that it starts from the speech content. We then extracted log filterbank features (n_filter = 24) from the speech and stacked 5 adjacent filterbank frames together (n_dim = n_filter x 5 = 120). Note that we work with variable-length data (each speech input does not need to have the same number of frames): for each input, we keep up to 80 stacked filterbank frames, so the possible input shapes range from (1, 120) to (80, 120). This preprocessing is similar in spirit to that of the x-vector (see [7] for more information) but not identical, because we found our scheme empirically worked better in our experiments.
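
As a rough illustration, the following sketch reproduces this pipeline with librosa and NumPy. It uses librosa's mel filterbank as a stand-in for the log filterbank features, so the exact values will differ from what we actually used.

import numpy as np
import librosa

def extract_features(path, n_filter=24, n_stack=5, max_frames=80):
    # Load and resample to 16 kHz, then trim leading/trailing silence.
    signal, sr = librosa.load(path, sr=16000)
    signal, _ = librosa.effects.trim(signal)
    # Mel filterbank energies as an approximation of the 24-dim log filterbank features.
    fbank = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_filter)
    log_fbank = np.log(fbank + 1e-6).T                      # shape (T, 24)
    # Stack 5 adjacent frames into 120-dim vectors, then keep at most 80 stacked frames.
    usable = (len(log_fbank) // n_stack) * n_stack
    stacked = log_fbank[:usable].reshape(-1, n_filter * n_stack)
    return stacked[:max_frames].astype(np.float32)          # shape between (1, 120) and (80, 120)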

The model choice for training a feature extractor

We used a very simple LSTM for our experiments, implemented with Chainer [8] (chainer.links.NStepLSTM), with 10% dropout [13] for regularization. For optimization, we used AdamW [14] with a weight decay rate of 1e-4. One may try a more complicated model to get better performance. We also implemented the x-vector [7], but found that it takes a long time to train, which left little room for trial and error. The d-vector [9] is another choice, but it does not support variable-length inputs. The i-vector [10] is yet another alternative for speaker identification, and was widely used before deep neural networks became popular. Although we used a relatively small network (a one-layer LSTM) that can be trained within a day on 1 GPU, we still obtained good performance on the source data. Table 5 shows the performance on unseen utterances from the source data (but from the 2451 seen speakers).
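
For concreteness, here is a minimal Chainer sketch of such a model; the layer sizes follow the description above, while the training loop and data feeding are omitted. Note that in Chainer, AdamW-style decoupled weight decay is obtained through Adam's weight_decay_rate argument.

import chainer
import chainer.functions as F
import chainer.links as L

class SpeakerLSTM(chainer.Chain):
    def __init__(self, n_units=200, n_speakers=2451, dropout=0.1):
        super(SpeakerLSTM, self).__init__()
        with self.init_scope():
            self.lstm = L.NStepLSTM(1, 120, n_units, dropout)  # one-layer LSTM over stacked filterbanks
            self.fc = L.Linear(n_units, n_speakers)            # discarded after pre-training

    def extract(self, xs):
        # xs: list of float32 arrays, each of shape (T_i, 120) with 1 <= T_i <= 80
        hy, _, _ = self.lstm(None, None, xs)
        return hy[-1]                                          # (batch, n_units) utterance embedding

    def forward(self, xs, ts):
        return F.softmax_cross_entropy(self.fc(self.extract(xs)), ts)

model = SpeakerLSTM()
optimizer = chainer.optimizers.Adam(weight_decay_rate=1e-4)    # AdamW-style decoupled weight decay
optimizer.setup(model)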

The simple method that worked best in my experiments

We tried several methods, but found that the following highly simple approach achieved the best performance, and we used it in our demo at the final poster presentation.

  1. Using the source data, train a neural network to classify the source speakers with the cross-entropy loss.
  2. Remove the final linear layer and the softmax layer, and use the remaining network as a feature extractor.
  3. To deal with the open-set scenario, simply use unseen LibriSpeech data (the test-other folder) as a background class.
  4. Learn a new linear layer and softmax layer on the target data together with the background-class data.

(Optional) In step 4, we may also fine-tune the feature extractor on the target speakers; however, we have to be careful to avoid overfitting since we have very few target data. We fine-tuned for a few epochs and observed a small improvement. A rough sketch of this step is given below.
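
The sketch below illustrates steps 2-4, reusing the SpeakerLSTM sketch above (and its imports): the pre-trained extractor is kept fixed and only a new head over the target speakers plus one background class is learned. Dropping the unchain_backward call would fine-tune the extractor as well.

class FewShotHead(chainer.Chain):
    def __init__(self, extractor, n_target_speakers):
        super(FewShotHead, self).__init__()
        with self.init_scope():
            self.extractor = extractor                       # pre-trained SpeakerLSTM from above
            self.fc = L.Linear(None, n_target_speakers + 1)  # +1 for the background class

    def forward(self, xs, ts):
        feats = self.extractor.extract(xs)
        feats.unchain_backward()                             # freeze the feature extractor
        return F.softmax_cross_entropy(self.fc(feats), ts)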

One interesting but explainable finding is that a one-layer LSTM with 200 units performed best in our experiments, even though its performance on the source data is not as good as that of a two-layer LSTM with the same number of units. In a preliminary experiment, we found that the model that is best on the source data suffers from overfitting in the few-shot setting. Figure 6 shows the difference between the one-layer and two-layer LSTMs with the same number of units (200): the one-layer LSTM outperformed the two-layer LSTM in the few-shot learning scenario, although the two-layer LSTM is better when evaluated on the source data (see Table 5). Note that this method discards the final linear layer and softmax layer after training on the source domain. In our opinion, exploring how to incorporate this discarded information to improve few-shot learning is an interesting direction for future work.

Table 5: We can achieve 99 percent test accuracy on 2451-speaker classification (unseen test utterances, but from the 2451 seen speakers) on the LibriSpeech dataset using our preprocessing method and simply training with the cross-entropy loss (200 epochs).

LSTM layers | Number of units | Test accuracy on source data with 2451 classes (%)
1 | 50 | 84.60
1 | 100 | 96.23
1 | 200 | 97.75
2 | 100 | 97.1
2 | 200 | 99.04
Figure 6: Performance of 2-shot learning on PFN-Cafe without open-set scenario as the number of target speakers increases. Left: one-layer, 200 number of units LSTM. Right: two-layer, 200 number of units LSTM.


Related work

Baseline++

The paper “A Closer Look at Few-shot Classification” [11] proposed a simple method (Baseline++), based on cosine similarity, that performed well in their experiments. However, we found that this method did not work well when the number of pre-training classes is large (2451 in our case). Its implementation is not entirely straightforward: the authors introduce a scale factor that needs to be adjusted appropriately depending on the task (see the code from the original paper). We also tried adjusting this scale factor to improve the performance, but it still did not work well with 1000+ classes (2451 in our case).

Prototypical network

We also tried the well-known prototypical network [12] for our problem. However, it did not work well in our preliminary experiments, and it is not straightforward to extend a prototypical network to support open-set classification. Making this possible is an interesting research direction.

Results

We present our results on three datasets: first, the benchmark dataset (LibriSpeech); second, PFN-Banana, which consists of speech recorded without noise from four PFN members; and third, PFN-Cafe, which consists of speech recorded in the cafe during the interim presentation. Figure 7 shows an overview of how we evaluate the results.

Figure 7: An overview of the evaluation procedure.


Result on few-shot learning in LibriSpeech

We excluded the test-other folder from the pre-training set (source) because we use it for evaluating few-shot learning performance; the target speakers are therefore not present in the source data. Experiments show that we can achieve over 99% ACC/F1/BAC for 10-shot learning with 33 target speakers.

Figure 8: Performance of 10-shot learning on LibriSpeech without open-set scenario as the number of target speakers increases.


Figure 9:  Performance of 33-speaker learning on LibriSpeech without open-set scenario as the number of shots increases.


Although these 33 speakers were never observed in the pre-training phase, we can obtain highly accurate predictions in the few-shot learning scenario. This suggests that our simple pre-training method can extract information that is useful for identifying a speaker. However, one may argue that this is still the same dataset collected under similar conditions, and that the method may not work on a different dataset. Motivated by this argument, we collected real-world data and tested the method in completely different environments (PFN-Banana, PFN-Cafe).

PFN-Banana

Figure 10 shows the performance of our method on the PFN-Banana dataset. In the closed-set scenario (i.e., without the open-set condition), we achieve over 90 percent on this dataset. In the open-set scenario, the performance dropped by around 6-7%. It is not surprising that ACC remains very high in the open-set scenario, because we add a lot of open-set data in the test phase, which makes the test data highly imbalanced between in-distribution and out-of-distribution samples.

Figure 10:  Performance of 2-shot learning on PFN-Banana as the number of target speakers increases. Left: Closed-set scenario. Right: Open-set scenario.

PFN-Cafe

We report the performance on the PFN-Cafe dataset, for which we conducted the experiment in the same way as on PFN-Banana, in Figure 11. Although the data is quite noisy and the audio amplitude is clipped, our method still performed reasonably well on PFN-Cafe.

Figure 11:  Performance of 2-shot learning on PFN-Cafe as the number of target speakers increases. Left: Closed-set scenario. Right: Open-set scenario.


Demo (final presentation):

The final presentation was held in the room “Forest”, where we had never collected any data. Nevertheless, our simple method classified speakers reasonably well. Unfortunately, we did not record the exact performance on that day. The system recognized many target speakers very accurately, but there was one target speaker whom our classifier almost always failed to recognize. It also detected unknown speakers fairly well, although there were a few misclassifications as target speakers. We also found that the first prediction result users see strongly affects their first impression of the application, which is understandable and something developers should keep in mind.

Figure 12:  An invitation to test our system in the final lightning talk


Figure 13: Testing a demo (Yuya Unno (left), Nontawat Charoenphakdee (right))


Discussion

It is important to know the limitations of this technology. For example, if a target speaker is sick and their voice sounds different from usual, can the system still identify that person accurately? Is there a good and cheap data augmentation method to alleviate this problem, given that it is impractical to record someone's voice in every condition? In practice, we may also incorporate visual information to handle this problem, but visual information alone is sometimes insufficient, since we cannot always see everything in range. For example, we may not be able to see something behind us, or something may block our view. In such cases, hearing is very helpful.

Acknowledgment

My mentors were Yuta Tsuboi (main) and Katsuhiko Ishiguro (sub), and I received tremendous support from them. I would also like to thank Toru Taniguchi for teaching me a lot, especially during the first week, from preprocessing speech data to introducing several interesting state-of-the-art papers in the field of speech processing. Moreover, I would like to thank Takashi Masuko, who actively attended my weekly meetings and gave me useful comments. Finally, I would like to thank the Human-Robot Interface team, the Intelligent Information Processing team, and everyone who provided data for this project.

References

[1] Fong, T., Nourbakhsh, I., & Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and autonomous systems, 42(3-4), 143-166.
[2] Hoy, M. B. (2018). Alexa, Siri, Cortana, and more: an introduction to voice assistants. Medical reference services quarterly, 37(1), 81-88.
[3] Kepuska, V., & Bohouta, G. (2018). Next-generation of virtual personal assistants (microsoft cortana, apple siri, amazon alexa and google home). In 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 99-103). IEEE.
[4] Hatori, J., Kikuchi, Y., Kobayashi, S., Takahashi, K., Tsuboi, Y., Unno, Y., Ko, W. & Tan, J. (2018). Interactively picking real-world objects with unconstrained spoken language instructions. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3774-3781). IEEE.
[5] Hossin, M., & Sulaiman, M. N. (2015). A review on evaluation metrics for data classification evaluations. International Journal of Data Mining & Knowledge Management Process, 5(2), 1.
[6] Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015). Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5206-5210). IEEE.
[7] Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., & Khudanpur, S. (2018). X-vectors: Robust dnn embeddings for speaker recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5329-5333). IEEE.
[8] Tokui, S., Okuta, R., Akiba, T., Niitani, Y., Ogawa, T., Saito, S., Suzuki, S., Uenishi, K., Vogel, B. & Yamazaki Vincent, H. (2019). Chainer: A deep learning framework for accelerating the research cycle. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (SIGKDD) (pp. 2002-2011). ACM.
[9] Variani, E., Lei, X., McDermott, E., Moreno, I. L., & Gonzalez-Dominguez, J. (2014). Deep neural networks for small footprint text-dependent speaker verification. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4052-4056). IEEE.
[10] Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P., & Ouellet, P. (2010). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788-798.
[11] Chen, W. Y., Liu , Y. C., Kira, Z., Wang, Y. C. F. & Huang, J. B. (2019). A Closer Look at Few-shot Classification. In Proceedings of International Conference on Learning Representations (ICLR).
[12] Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 4077-4087).
[13] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929-1958.
[14] Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. In Proceedings of International Conference on Learning Representations (ICLR).

A Brief History of Object Detection – from Haar-like features to losing anchors

Tommi Kerola

2019-09-27 14:50:00

Hello, my name is Tommi Kerola, an engineer at Preferred Networks. I would like to share some slides about recent research in object detection that was presented at an internal PFN seminar. We are making the slides publicly available with the hope that others may find it interesting or useful for research purposes.

Object detection is an important computer vision technique with applications in several domains, such as autonomous driving and personal and industrial robotics. The slides below cover the history of object detection from before deep learning up to recent research, as well as future directions and some guidelines for choosing which type of object detector to use for your own project.

A Brief History of Object Detection / Tommi Kerola from Preferred Networks

 

References:

  • T. Akiba et al. Pfdet:  2nd place solution to open images challenge 2018 object detection track. arXiv preprint arXiv:1809.00778, 2018.
  • N. Bodla et al. Soft-nms–improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, pages 5561–5569, 2017.
  • N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In International Conference on computer vision & Pattern Recognition (CVPR’05) , volume 1, pages 886–893. IEEE Computer Society, 2005.
  • K. Duan et al. Centernet:  Object detection with keypoint triplets. arXiv preprint arXiv:1904.08189, 2019.
  • P. Felzenszwalb et al. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.
  • R. Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015
  • R. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.
  • J. Hosang et al. Learning non-maximum suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4507–4515, 2017.
  • H. Hu et al. Relation networks for object detection.In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3588–3597, 2018.
  • A. Kirillov et al. Panoptic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • A. Krizhevsky et al. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • H. Law and J. Deng. Cornernet:  Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), pages 734–750, 2018
  • Y. Li et al. Scale-aware trident networks for object detection. arXiv preprint arXiv:1901.01892, 2019.
  • T.-Y. Lin et al. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017a.
  • T.-Y. Lin et al. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017b.
  • W. Liu et al. Ssd:  Single shot multibox detector. In European conference on computer vision, pages 21–37. Springer, 2016.
  • X. Lu et al. Grid r-cnn. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • A. Newell and J. Deng. Pixels to graphs by associative embedding. In Advances in neural information processing systems, pages 2171–2180, 2017.
  • A. Newell et al. Associative embedding: End-to-end learning for joint detection and grouping. In Advances in Neural Information Processing Systems, pages 2277–2287, 2017.
  • J. Redmon and A. Farhadi. Yolo9000:  better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 7263–7271, 2017.
  • J. Redmon et al. You only look once:  Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  • S. Ren et al. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
  • B. Singh et al. Sniper: Efficient multi-scale training. In Advances in Neural Information Processing Systems, pages 9310–9320, 2018.
  • Z. Tian et al. Fcos: Fully convolutional one-stage object detection. arXiv preprint arXiv:1904.01355, 2019.
  • J. Uijlings et al. Selective search for object recognition. International journal of computer vision , 104(2):154–171, 2013.
  • P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR, 2001.
  • S. Zhang et al. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 4203–4212, 2018.
  • H. Zhou et al. Cad:  Scale invariant framework for real-time object detection. In The IEEE International Conference on Computer Vision (ICCV) Workshops , Oct 2017.
  • X. Zhou et al. Bottom-up object detection by grouping extreme and center points. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019a.
  • X. Zhou et al. Objects as points. arXiv preprint arXiv:1904.07850 , 2019b
  • Z. Zou et al. Object detection in 20 years:  A survey. arXiv preprint arXiv:1905.05055, 2019

GraphNVP

Kosuke Nakago

2019-07-16 11:59:50

This post is contributed by Mr. Kaushalya Madhawa, who was an intern and a part-time engineer at PFN. Japanese version is available here.

In this post we introduce our recent paper “GraphNVP: An Invertible Flow Model for Generating Molecular Graphs”. Our code can be accessed from our GitHub repo.

Molecule Generation

Discovery of new molecules with desirable pharmacological properties is a crucial problem in computational drug discovery. Traditionally, this task is performed by synthesizing candidate chemical compounds and running experiments on them. However, due to the sheer size of chemical space, synthesizing molecules and extensively experimenting on them is extremely time consuming. Instead of searching through the space of molecules for those with desirable properties, de novo drug design aims to design new chemical compounds with the properties we are interested in.

Recent advancements in deep learning, especially deep generative models, have proved invaluable in de novo drug design.

The choice of molecule representation

An important step in applying deep learning to molecule generation is deciding how chemical compounds are represented. Earlier models relied on a string-based representation called SMILES. RNN-based language models or variational autoencoders (VAEs) are used to generate SMILES strings, which are then converted into molecules. A major issue with SMILES strings is that they are not robust to minor changes: two almost identical strings can correspond to drastically different molecules. These problems prompted recent research to rely on more expressive graph representations of molecules, and the problem thus became known as molecular graph generation.

A molecule is represented by an undirected graph in which atoms and bonds are represented as nodes and edges, respectively. The structure of a molecule is described by an adjacency tensor \(A\), and a node feature matrix \(X\) represents the type of each atom (e.g., oxygen, fluorine, etc.). The molecule generation problem then reduces to generating graphs that represent valid molecules. This is a problem in which deep generative models such as GANs or VAEs can be leveraged. We can classify previous work into two categories based on how a graph is generated: some models generate molecular graphs sequentially, adding nodes (atoms) and edges (bonds) step by step; the alternative is to generate the whole graph in a single step, in a manner similar to image generation models.
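
As a concrete, hypothetical illustration of this representation, the small helper below builds \(A\) and \(X\) for a single molecule with RDKit; the atom and bond type lists here are just examples, not the exact ones used in the paper.

import numpy as np
from rdkit import Chem

ATOM_TYPES = (6, 7, 8, 9)  # C, N, O, F (example atom set)
BOND_TYPES = (Chem.BondType.SINGLE, Chem.BondType.DOUBLE, Chem.BondType.TRIPLE)

def mol_to_graph(smiles, max_atoms=9):
    # Node feature matrix X: one-hot atom types; adjacency tensor A: one slice per bond type.
    mol = Chem.MolFromSmiles(smiles)
    X = np.zeros((max_atoms, len(ATOM_TYPES)), dtype=np.float32)
    A = np.zeros((len(BOND_TYPES), max_atoms, max_atoms), dtype=np.float32)
    for atom in mol.GetAtoms():
        X[atom.GetIdx(), ATOM_TYPES.index(atom.GetAtomicNum())] = 1.0
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        c = BOND_TYPES.index(bond.GetBondType())
        A[c, i, j] = A[c, j, i] = 1.0
    return A, X

A, X = mol_to_graph('CCO')  # ethanol: two carbons and one oxygen, two single bonds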

The importance of reversibility

A significant advantage of invertible flow-based models is that they perform exact likelihood maximization, unlike VAEs or GANs. We believe exact optimization is crucial in molecule generation for drugs, since they are highly sensitive to the replacement of even a single atom (node). An additional advantage of flow models is that, since they are invertible by design, perfect reconstruction is guaranteed and no time-consuming procedures are needed: simply running the reverse pass of the model on a latent vector yields a molecular graph. Moreover, the lack of an encoder in GAN models makes it hard to control sample generation. For example, it is not straightforward to use a GAN to generate molecules that are similar to a query molecule (e.g., lead optimization for drug discovery), while this is easy for flow-based models.

Our model

Our proposed model, GraphNVP, is shown above. GraphNVP is the first invertible flow-based graph generation model that follows a one-shot generation strategy. We introduce two latent representations, one for node assignments and one for the adjacency tensor, to capture the unknown distributions of the graph structure and its node assignments, respectively. We use two new types of coupling layers, adjacency coupling and node feature coupling, to obtain these two latent representations. During graph generation, we first generate an adjacency tensor, and then the node feature tensor is generated using graph convolutional networks.

Qualitative results

We randomly select a molecule from the training set and encode it into a latent vector \(z_0\) using our proposed model. Then we choose two random axes which are orthogonal to each other. We decode latent points lying on a 2-dimensional grid spanned by those two axes and with \(z_0\) as the origin. The visualization below indicates that the learned latent space is smooth such that neighboring latent points correspond to molecules with minor variations.

Figure: Molecules decoded from a 2D grid of latent points around \(z_0\) (zinc_neighborhood_2d)

Comments from mentors

We, Nakago and Ishiguro, were responsible for mentoring Kaushalya. We started this research during the 2018 summer internship. Research on deep graph generative models was attracting attention, and many kinds of models had been proposed at that time. However, no flow-based model had been proposed yet, and we started this research based on Kaushalya's suggestion.

It was the first application of flow models to graph generation, and flow-based neural networks tend to need deeper layers, which requires large computational resources. It took some time to complete the research, but we are glad that we could finally publish the paper as well as the code.

Many projects are running at PFN, not only in “Drug Discovery / Material Discovery” but also in various other fields. Please check our job list if you are interested!

Git Ghost: A command line tool to execute a program with local modifications without losing reproducibility

Daisuke Taniwaki

2019-05-13 08:00:17

Overview

We’re happy to open-source Git Ghost, which is developed by Shingo Omura and Daisuke Taniwaki. By using this tool, you can run ML jobs of your git-managed code with locally made modifications without losing reproducibility. You can go back to the code of a specific run anytime during the trial-and-error phase!

Motivation

Running an ML job for trial and error while waiting for other jobs to finish is a very common use case. Before Git Ghost, the simplest way to do this was to manage source code with git and use rsync to synchronize locally modified source code to run ML jobs in our Kubernetes cluster. Then we realized that we often want to revert the code to the state that produced good results. However, while git provides versioning of your code, synchronizing code with rsync breaks this versioning because the synchronized code itself is not versioned, so it was hard to get back to such code.

One idea we came up with first was to simply commit local modifications and push them to a remote. However, it is cumbersome to commit and push many times just to run a job with a modification of a few characters, and of course you do not want to clutter your remote repository. So we came up with the idea for this tool.

Usage

Assume you want to send a local modification that changes the content of a file foo from a to b to a directory on a remote server.

First, create a patch of the local modification.

$ git ghost push
xxxxxxx yyyyyyy
$ git ghost show yyyyyyy
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b

Then, you can sync the local modification in a remote server.

$ git ghost diff HEAD
$ git ghost pull yyyyyyy
$ git ghost diff HEAD
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b

There you go! You can see that the modifications in your local machine were synchronized to the remote server.

Although Git Ghost is a very simple tool as shown above, it performs brilliantly when it is integrated with other tools. For example, you can send modifications into a Kaniko container to build Docker images with local modifications. Here’s an example using Argo to execute a job with local modifications in a reproducible manner.

Architecture

The idea is simple. The tool creates a patch from your locally made commits and modifications, together with the information of a base commit that exists in your remote repository, and pushes the patch to another remote repository. Then, on the remote side, it checks out the base commit and applies the patch. A small trick here is that we separate the patch of locally made commits from the patch of locally made modifications; with this separation, locally made modifications can be reused even after locally made commits are pushed to the remote repository.

The reason why we chose a git repository for the patch storage is that it doesn’t require extra tools and credentials.

Although we’re going to use this tool in a Kubernetes cluster, we believe using this tool is not limited to Kubernetes clusters. You can use it to send changes from your laptop to an on-premise server if you want to track changes.

Please try it and give us your feedback on GitHub!

k8s-cluster-simulator: A simulator for evaluating Kubernetes schedulers

Daisuke Taniwaki

2019-04-11 08:00:53

Overview

We’re happy to release an open-source Kubernetes cluster simulator called k8s-cluster-simulator. The simulator is in alpha release and was created by Hidehito Yabuuchi, a PFN internship student in 2018 and part-time employee, along with his mentors, Daisuke Taniwaki and Shingo Omura. It simulates the workloads and the clock of a Kubernetes cluster so you can evaluate your Kubernetes scheduler without actually deploying it to a production site.

Motivation

We have large on-premise GPU clusters, in which researchers run ML jobs of various durations via Kubernetes. One of our goals is to maximize GPU utilization for cost-effectiveness while giving all researchers reasonable access. To do this, we developed our own private Kubernetes scheduler and extender (e.g., kube-throttler). However, it is hard to evaluate new logic in production, because researchers are running jobs and we should not change the scheduling logic and fairness too often. Of course, we cannot deploy a buggy scheduler that would stop the researchers' work, and it is not desirable to pause research just to test new scheduling logic in large clusters. Therefore, we started to develop a scheduler simulator for Kubernetes.

Design

We believe the simulator should have the following properties.

  • Require as few changes on scheduler’s implementation and interface as possible.
  • Simulate the clock to accelerate evaluations and to evaluate scheduling logic without being affected by system latencies such as network delays and internal processing.
  • Simulate workloads as flexibly as possible.
  • Support various output formats for further analysis.

Architecture

Here’s the simple flow diagram.

The idea is simple. The simulator simulates the clock and ticks it at each step of the loop. At each step, the simulator asks the submitters whether they have pods that should be submitted or deleted at this clock, and passes the submitted pods to the scheduler. The scheduler returns bind and delete events so that the simulator can simulate resource management. Finally, the simulator writes the metrics of the simulation through metrics loggers.
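
To make the flow concrete, here is a purely illustrative Python sketch of that loop; the actual simulator is written in Go and its real interfaces differ from these stubs.

from datetime import datetime, timedelta

class IdleSubmitter:
    def submit(self, clock, cluster_state):
        return []  # pods to submit/delete at this simulated clock (none in this stub)

class NoopScheduler:
    def schedule(self, pending_pods, cluster_state):
        return []  # bind/delete events for the pending pods (none in this stub)

def run_simulation(submitters, scheduler, loggers, start, tick, steps):
    clock, cluster_state, pending = start, {}, []
    for _ in range(steps):
        for submitter in submitters:                          # ask submitters for new pods
            pending.extend(submitter.submit(clock, cluster_state))
        events = scheduler.schedule(pending, cluster_state)   # let the scheduler decide
        # ...apply bind/delete events to cluster_state and clear handled pods here...
        for log in loggers:                                   # write metrics for this step
            log(clock, cluster_state, events)
        clock += tick                                         # advance the simulated clock

run_simulation([IdleSubmitter()], NoopScheduler(), [lambda c, s, e: None],
               start=datetime(2019, 4, 1), tick=timedelta(seconds=1), steps=10)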

And here’s the high-level class diagram.

We provide the following two points of customizations for scheduling simulations.

Submitters

Multiple users can be simulated by adding any number and combination of submitters, with time and number of pods submitted fully customizable through the simulator interface. For example, assume user A tends to submit more pods in the morning and user B tends to submit more pods in the evening. A submitter can be created for each user and plugged into the simulator.
Moreover, as submitters receive metrics from the simulator, they can change their behavior based on the state of the cluster, such as whether it is crowded or not.

Scheduler

You have two options for scheduler extensions, depending on the style of your Kubernetes scheduler customization. The first mimics the normal Kubernetes scheduler (kube-scheduler) and can be extended with Prioritizer, Extender, and Predicate plugins. If you customize your scheduling logic through these kube-scheduler extension points, this is the best approach. Since the Kubernetes scheduler is queue-based, you may want to implement more complicated scheduling logic that does not fit a queue-based scheduler, for example scheduling a new set of pods together immediately after receiving multiple pod submissions. For this case, we provide an option to evaluate a scheduler through the interface defined in Kubernetes with a thin wrapper function.

Roadmap

We’re implementing the following features before the beta phase to support more realistic cluster environments simulations.

  • More isolation between components (e.g. supporting RPC interface for a scheduler and submitter)
  • Provide common submitter implementations (e.g. typical probabilistic distributions(Uniform, Binomial, Poisson, etc.))
  • Support various cluster events (node failures, accidental pods failures, node addition/removal, etc.)
  • Support plottable output formats in popular plotter tools (matplotlib, gnuplot etc.)

Please try it out! We’re waiting for your feedback!

ChainerRL Visualizer: Deep RL Agent Visualization Library

ofk

2019-03-19 12:40:46

This post is contributed by Mr. Takahiro Ishikawa, who was an intern last year and now works as a part-time engineer at PFN.

We have released ChainerRL Visualizer, which visualizes the behaviors of agents trained by ChainerRL on the Web browser.

My name is Takahiro Ishikawa. I participated in the PFN 2018 internship and currently work as a part-time engineer.

This library was developed with the aim of “making debugging of deep RL implementations easier” and “contributing to understanding of how deep RL agents work”.
It enables you to interactively observe the behaviors of trained deep RL agents in a web browser.
The library is easy to use: all you have to do is pass the `agent` object implemented in ChainerRL and an `env` object that satisfies a specific interface to the `launch_visualizer` function provided by this library, along with a few options.

from chainerrl_visualizer import launch_visualizer

# Prepare agent and env object here
#

# Prepare dictionary which explains meanings of each action
ACTION_MEANINGS = {
  0: 'hoge',
  1: 'fuga',
  ...
}

launch_visualizer(
    agent,                           # required
    env,                             # required
    ACTION_MEANINGS,                 # required
    port=5002,                       # optional (default: 5002)
    log_dir='log_space',             # optional (default: 'log_space')
    raw_image_input=False,           # optional (default: False)
    contains_rnn=False,              # optional (default: False)
)

After executing this script, a local Web server will be launched and the following features will be provided on the Web browser.

1. Roll-out one episode (or specified steps)

You can tell the agent to run one episode (or specified steps) from the UI, then the outputs of the agent model will be visualized in chronological order.
In the following video, the probabilities of the next action and the state value of the agent trained with A3C are visualized.

2. Tick timestep and visualize the behaviors of environment and agent

In the following video, the agent can be moved back and forth and the outputs in each step are visualized along with the behavior of the environment. The pie chart on bottom-right shows the probabilities of the next action of each step.

3. Saliency map

If the input of the model is raw pixels, the UI can visualize a saliency map, which shows the specific sub-areas to which the agent pays attention. This feature is implemented based on the paper Visualizing and Understanding Atari Agents.
In the following video, saliency maps of an agent trained with CategoricalDQN are visualized over the image of the environment.
For now, this feature lets you specify the number of steps for which saliency maps are created, because computing saliency maps is very expensive.

4. Miscellaneous visualizations

Various ways of visualization for each type of agent are supported.
For example, the value distributions of the agent trained with CategoricalDQN are visualized in the following video.

Quickstart guides are provided.

Until now, most visualization tools in deep learning have focused on visualizing scores and other metrics over the course of model training. ChainerRL Visualizer is unique among them in that it can interactively and dynamically visualize the behaviors of the deep RL agent and the environment themselves.

Atari Zoo, released by Uber Research, was developed for a similar purpose. It aims at accelerating research on understanding deep RL agents by providing trained models and analysis tools for those frozen models. It enables researchers to participate in research on understanding deep RL agents even if they do not have enough computing resources.

ChainerRL Visualizer differs from Atari Zoo in that “all” kinds of agents in ChainerRL can be analyzed dynamically, even during training, while Atari Zoo only visualizes already-trained models in its repository, and those models are limited to ALE environments and specific algorithms whose architecture looks like ( raw image => Conv => Conv .. => FC .. ).

There are also other visualization tools with similar motivations, such as DQNViz, which visualizes the behaviors of DQN agents and their various metrics during training.

Although much effort has been dedicated to improving the performance of deep RL algorithms on benchmark tasks, less effort has been devoted to understanding what deep RL agents learn and analyzing how the trained agents behave. We expect research on understanding deep RL agents, as seen in the visualizations above, to expand in the future.

ChainerRL Visualizer is now in beta and still does not have sufficient features for deeply analyzing deep RL agents, so continuous development is needed in order to contribute to the emerging research area of understanding deep RL agents. We welcome you to participate in the development of ChainerRL Visualizer, adding new features or improving existing ones through OSS collaboration.

Experimental toolchain to compile and run Chainer models

Shinichiro Hamaji

2019-01-25 16:19:34

Hello, my name is Shinichiro Hamaji, an engineer at Preferred Networks. I would like to introduce an experimental project named Chainer-compiler today. Although not ready for end users, we are making it publicly available with the hope that others may find it interesting or useful for research purposes in its current state.

https://github.com/pfnet-research/chainer-compiler

Late last year, Preferred Networks released a beta version of ChainerX. The three goals of ChainerX were

  • Optimize the speed of running models
  • Make models deployable to environments without Python
  • Make it easier to port models to non-CPU/GPU environments

while keeping the flexibility of Chainer. The goal of chainer-compiler is to go further with ChainerX. Currently, it has the following three main components:

  1. Translate Python AST to extract computation graphs as extended ONNX format.
  2. Modify the extended ONNX graph for optimization, auto-differentiation, etc. It then generates deployable code.
  3. Run the exported code with ChainerX’s C++ API.

Here are expected use-cases of this project:

  • Unlike the imperative execution model of Chainer/CuPy/ChainerX (a.k.a. define-by-run), the first step extracts a computation graph containing multiple operations, which gives a chance to apply inter-operation optimization techniques such as operation fusion.
  • By running steps 1 and 2 on the host machine and deploying only step 3, you can easily deploy your model to Python-free environments.
  • If you add targets to the code generator, your model can run on optimized model executors or domain-specific chips like MN-Core.
  • By using steps 2 and 3, you can run ONNX models generated by other tools such as ONNX-Chainer.

Other than the above, we would like to continue conducting experimental research.

Like other areas around deep learning, deep learning compilers are a crowded, competitive field. Different compilers have different strengths and focuses, which, in my opinion, makes research on deep learning compilers very interesting. In this project, we are trying not to hurt the flexibility of Chainer. For that reason, the toolchain does not assume that the model is static: it can handle tensors without static dimensions, control-flow primitives of Python, and Python lists. This could be one of the unique strengths of this toolchain.

In this article, we have introduced chainer-compiler, an experimental project that compiles and runs Chainer models. We still have a huge number of TODOs, but they are challenging and fun to work on. If you are interested in working with us, please consider applying to Preferred Networks. Any questions or feedback are really appreciated.

Lastly, I would like to thank everyone who helped us. I would especially like to thank Sato-san, an intern who implemented the Python-to-ONNX compiler.

Optuna: An Automatic Hyperparameter Optimization Framework

Takuya Akiba

2018-12-03 14:00:28

Preferred Networks has released a beta version of an open-source, automatic hyperparameter optimization framework called Optuna. In this blog, we will introduce the motivation behind the development of Optuna as well as its features.

 


 

What is a hyperparameter?

A hyperparameter is a parameter to control how a machine learning algorithm behaves. In deep learning, the learning rate, batch size, and number of training iterations are hyperparameters. Hyperparameters also include the numbers of neural network layers and channels. They are not, however, just numerical values. Things like whether to use Momentum SGD or Adam in training are also regarded as hyperparameters.

It is almost impossible to make a machine learning algorithm do the job without tuning hyperparameters. The number of hyperparameters tends to be high, especially in deep learning, and it is believed that performance largely depends on how we tune them. Most researchers and engineers that use deep learning technology manually tune these hyperparameters and spend a significant amount of their time doing so.

What is Optuna?

Optuna is a software framework for automating the optimization process of these hyperparameters. It automatically searches for and finds optimal hyperparameter values by trial and error for excellent performance. Currently, the software can be used in Python.

Optuna uses a history record of trials to determine which hyperparameter values to try next. Using this data, it estimates a promising area and tries values in that area. Optuna then estimates an even more promising region based on the new result. It repeats this process using the history data of trials completed thus far. Specifically, it employs a Bayesian optimization algorithm called Tree-structured Parzen Estimator.

What is its relationship with Chainer?

Chainer is a deep learning framework and Optuna is an automatic hyperparameter optimization framework. To optimize hyperparameters for training a neural network with Chainer, the user writes code that receives hyperparameters from Optuna within the Chainer code. Given this code, Optuna repeatedly calls it, and the neural network is trained with different hyperparameter values until good values are found.

Optuna is used together with Chainer in most of the use cases at PFN, but this does not mean that Optuna and Chainer are closely coupled. Users can use Optuna with other machine learning software as well; we have prepared sample code that uses scikit-learn, XGBoost, and LightGBM as well as Chainer. In fact, Optuna can cover a broad range of use cases beyond machine learning (program acceleration, for instance), as long as the problem provides an interface that receives hyperparameters and returns an evaluation value, as in the sketch below.
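
A minimal sketch of that interface, in the style of the Optuna examples: the objective function receives hyperparameters through a trial object and returns a value to minimize.

import optuna

def objective(trial):
    # Hyperparameters are requested inside the user code, not declared up front.
    x = trial.suggest_uniform('x', -10, 10)
    return (x - 2) ** 2  # the evaluation value to minimize

study = optuna.create_study()
study.optimize(objective, n_trials=100)
print(study.best_params)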

Why did PFN develop Optuna?

Why did we develop Optuna even though there were already established automatic hyperparameter optimization frameworks like Hyperopt, Spearmint, and SMAC?

When we tried the existing alternatives, we found that they did not work or were unstable in some of our environments, and that the algorithms had lagged behind recent advances in hyperparameter optimization. We wanted a way to specify which hyperparameters should be tuned within the python code, instead of having to write separate code for the optimizer.

Key Features

Define-by-Run style API

Optuna provides a novel Define-by-Run style API that enables the user to optimize hyperparameters, even if the user code is complex, while maintaining higher modularity than other frameworks. It can also optimize hyperparameters in a complex space like no other framework could express before.

There are two paradigms in deep learning frameworks: Define-and-Run and Define-by-Run. In the early days, Caffe and other Define-and-Run frameworks were dominant players. Then, PFN-developed Chainer appeared as the first advocate of the Define-by-Run paradigm, followed by the release of PyTorch, and later, eager mode becoming the default in TensorFlow 2.0. Now the Define-by-Run paradigm is well recognized and appears to be gaining momentum to become the standard.

Is the Define-by-Run paradigm useful only in the domain of deep learning frameworks? We realized that a similar approach can be applied to automatic hyperparameter optimization frameworks as well. Under this view, all existing automatic hyperparameter optimization frameworks are classified as Define-and-Run. Optuna, on the other hand, is based on the Define-by-Run concept and provides users with a new style of API that is very different from other frameworks. This makes it possible to give high modularity to a user program and to express complex hyperparameter spaces, among other things.
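
For example, the search space can branch depending on values sampled earlier, which is hard to express with a fixed, up-front configuration. Here is a sketch using scikit-learn; the hyperparameter names and ranges are illustrative.

import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

def objective(trial):
    X, y = sklearn.datasets.load_iris(return_X_y=True)
    classifier = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if classifier == 'SVC':
        # This branch of the search space only exists when SVC is chosen.
        c = trial.suggest_loguniform('svc_c', 1e-5, 1e2)
        model = sklearn.svm.SVC(C=c, gamma='auto')
    else:
        depth = trial.suggest_int('rf_max_depth', 2, 32)
        model = sklearn.ensemble.RandomForestClassifier(max_depth=depth, n_estimators=10)
    score = sklearn.model_selection.cross_val_score(model, X, y, cv=3).mean()
    return 1.0 - score  # minimize the validation error

study = optuna.create_study()
study.optimize(objective, n_trials=50)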

Pruning of trials using learning curves

When iterative algorithms like deep learning and gradient boosting are used, a rough prediction of the final result of training can be made from the learning curve. Using these predictions, Optuna can halt unpromising trials before training finishes. This is Optuna's pruning feature.

Existing frameworks such as Hyperopt, Spearmint, and SMAC do not have this functionality. Recent studies show that the pruning technique using learning curves is highly effective.  The following graph indicates its effectiveness in performing a sample deep learning task. While the optimization engines of both Optuna and Hyperopt utilize the same TPE, thanks to pruning, the optimization performed by Optuna is more efficient.

Figure: Pruning of unpromising trials
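
In code, pruning hinges on reporting intermediate values and asking Optuna whether to stop. A rough sketch follows; the inner training step is a toy stand-in for a real epoch of training.

import optuna

def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-1)
    error = None
    for epoch in range(100):
        # Toy stand-in for "train one more epoch and compute the validation error".
        error = (lr - 0.01) ** 2 + 1.0 / (epoch + 1)
        trial.report(error, epoch)        # hand the intermediate value to Optuna
        if trial.should_prune():          # the pruner judges the trial unpromising
            raise optuna.TrialPruned()
    return error

study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)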

Parallel distributed optimization

Deep learning is computationally intensive, and each training run is very time-consuming. Therefore, for automatic hyperparameter optimization in practical use cases, it is essential that the user can easily run efficient and stable parallel distributed optimization. Optuna supports asynchronous distributed optimization, which runs multiple trials simultaneously on multiple nodes. Parallelization can make the optimization process even faster, as shown in the following figure: changing the number of workers from 1 to 2, 4, and 8 confirms that parallelization accelerates the optimization.

Optuna also has a functionality to work with ChainerMN, allowing the user to optimize training that requires distributed processing without difficulty. By making use of a combination of these functionalities, the user can execute objective functions that include distributed processing in a parallel, distributed manner.

Figure: Parallel distributed optimization
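
A common pattern for such distributed optimization (a sketch; the study name and storage URL are placeholders) is to share one study through a storage backend and run the same script from several worker processes, each of which picks up new trials from the shared study.

import optuna

def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    return (x - 2) ** 2

# Every worker joins the same study via shared storage; for workers on multiple
# machines, a server-based database (e.g., MySQL) would be used instead of SQLite.
study = optuna.create_study(study_name='shared-example',
                            storage='sqlite:///example.db',
                            load_if_exists=True)
study.optimize(objective, n_trials=100)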

Visualized optimization on dashboard (under development)

Optuna has a dashboard that provides a visualized display of the optimization process. With this, the user can obtain useful information from experimental results. The dashboard can be accessed by connecting via a Web browser to an HTTP server which can be started by one command. Optuna also has a functionality to export optimization processes in a pandas dataframe, for systematic analysis.


Conclusions

Optuna is already in use by several projects at PFN. Among them is the project to compete in the Open Images Challenge 2018, in which we finished in second place. We will continue to aggressively develop Optuna to improve its integrity as well as prototyping and implementing advanced functionalities. We believe Optuna is ready for use, so we would love to receive your candid feedback.

Our objective is to speed up deep learning related R&D activities as much as possible. Our effort into automatic hyperparameter optimization is an important step toward this end. Additionally, we have begun working on other important technologies such as neural architecture search and automatic feature extraction. PFN is looking for potential full-time members and interns who are enthusiastic about working with us in these fields and activities.

New HCI group + upcoming papers and demos at UIST and ISS 2018

Fabrice Matulic

2018-10-15 09:11:17

Creation of HCI group

At PFN, we aspire to create next-generation “intelligent” systems and services, powered by cutting-edge AI technology, but we also recognise that humans will remain essential actors in the design and usage of such systems and therefore it is paramount to think about how the dialogue occurs. Human-Computer Interaction (HCI) approaches, which focus on bridging the gap between people and machines, can considerably contribute to improving intricate machine-learning processes requiring human intervention. With the creation of a dedicated HCI group at PFN, we aim to advance user-centred design for AI and machines and make sure the “humans in the loop” are supported with powerful tools when working with such systems.

Broadly, there are three main lines of research that the team would like to pursue:

  • HCI for machine learning: Utilise HCI methods to facilitate complex or tedious machine-learning processes in which people are involved (such as data gathering, labelling, pre-processing, augmentation; neural network engineering, deployment, and management)
  • Machine-learning for HCI: Use deep learning to enhance existing or enable new interaction techniques (e.g. advanced gesture recognition, activity recognition, multimodal input, sensor fusion, embodied interaction, collaboration between AI, robots and humans, generative model to create interactive content etc.)
  • Human-Robot Interaction (HRI): Make communication and interaction between smart robots and their users natural, intuitive and hopefully even fun!

Of course, HCI does not necessarily involve machine learning or robots and we are also generally interested in creating novel and exciting interactive experiences.

The HCI group will benefit from the expertise of Prof. Takeo Igarashi, of The University of Tokyo, who has been hired as an external consultant. In addition to his wide experience in HCI and HRI, Prof. Igarashi has recently started a JST CREST project on “HCI for machine learning” at his lab, which very much aligns with our research interests. We look forward to a long and fruitful collaboration.

Papers and demos at UIST and ISS 2018

Although the group has only just been officially created, we have already been active in HCI research for the past several months, and we will present two papers on recent work: one at UIST this week and one at ISS next month.

The first project, which was started at the University of Waterloo with Drini Cami and Prof. Dan Vogel, proposes using different ways of holding a stylus pen while writing on a tablet to trigger different UI actions. The technique applies machine learning to the raw touch input data to detect these different pen grips when the user's hand contacts the surface. The advantage of our technique is that it allows the user to rapidly switch between various pen modes with the same hand that writes, without resorting to cumbersome UI widgets.

In addition to the paper presentation, Drini will also be showing the technique at UIST’s popular demo session.

Our second contribution is the interactive projection mapping system for PaintsChainer that we showed at the Winter Comiket last year. For those of you who missed it, ColourAIze (as we call it in the paper) works directly with drawings and art on paper. Specifically, it projects colour fills determined by PaintsChainer directly onto the paper drawing, superimposing the colouring on the line art. As with the web version of PaintsChainer, local colour hints to influence the colourisation can be specified through simple (digital) pen strokes.

As with the pen-posture project above, we will both present our paper and do a demo of the system at the conference. If you’d like to try the fun experience of having your paper sketches, drawings and mangas coloured by AI, come and see us at ISS in Tokyo in November!

Last but not least, we are looking for talented HCI researchers to join our team, so if you think you can contribute in the areas mentioned above, please check the details of the position on our jobs page and apply!

 

[PFN Day] BoF session: How to Improve Sharing of Software Components and Best Practices

tgpfeiffer

2018-08-24 12:54:26

Hello, my name is Tobias Pfeiffer, I am a Lead Engineer at Preferred Networks. On July 24, 2018, PFN held an internal tech conference called “PFN Day”, in which I hosted a Birds-of-a-Feather session named “How to Improve Sharing of Software Components and Best Practices”. I want to give a short report on what we did in that session and what the outcomes were.

Reusability of both code and best practices is one of the core goals of software engineering – some people would go as far as saying that achieving a high level of reusability is the sole reason why software engineering exists. Avoiding reinventing the wheel, fighting the "not-invented-here" syndrome, and understanding the real cost of rewriting something from scratch as opposed to learning how to use an existing code base are essential if time is to be spent on developing truly new functionality.

In a company such as PFN that develops highly specialized code based on the latest research results, it is all the more important that work which cannot be regarded as something "only PFN can do" is minimized and shared as much as possible. However, in a rapidly growing organization it is often difficult to keep an overview of which member of which group may already have implemented the same functionality, or whom to ask for help with a certain problem. Under the impression that PFN still has room to improve when it comes to code reuse, I decided to hold a session on that topic and asked people from various projects to join.

First, everyone contributed a number of examples from their respective teams that looked like duplicated work. One person reported that wrappers for a certain library to work with a certain data format are written again and again by different members because they do not know this work has already been done. Someone else reported that some algorithms for a certain computer vision task are implemented separately in many teams, simply because nobody who did so took the step of packaging that code into a reusable unit. Also, perhaps more an example of sharing best practices than program code, it was said that getting started with Continuous Integration (CI) systems is difficult because it takes a lot of time initially, and special requirements like hardware-specific code (GPU, Infiniband, …) or nested Docker containers are complex to set up.

As a next step, we thought about reasons for the low level of reuse and possible impediments. We found that on the producer side, when someone has implemented new functionality that could be useful for other members, the hurdles might be that this person does not know how to package code, that it is not trivial to design a good interface for a software library, and that it takes time to extract the core functionality, package the code, and write usable documentation; this time cannot then be used for further development, and the effort does not come with an immediate benefit.

On the consumer side, the problems might be that it is not easy to find code that implements a required functionality in the internal repositories, that copy/paste combined with editing individual lines of code often provides a faster solution than using the code as a library and contributing missing functionality back upstream, and that it is hard to judge the quality of code found "somewhere" in an internal repository.

So we concluded that in order to increase code reuse we should try to (1) increase the motivation to share code and (2) lower the technical barriers to both creating and using shared code. In concrete terms, we thought that the following steps might be a good approach, starting with Python developers in mind:

  • Create an internal PyPI server for code that cannot be released as open source code.
  • Create a template repository with skeleton files that encode the best practices for sharing Python code as a library. For example, there could be a setup.py template, a basic documentation template, and CI settings that run all the tests, build the documentation, and publish to the internal package server (see the sketch after this list).
  • Form an expert group that can assist with any questions related to packaging, teach best practices, and that can perform some QA tasks.
  • Create an overview of which internal libraries exist for which purpose, where people can add their libraries. This listing could also show some metric of how many people are using a particular library, serving as a motivation for the author and a quality indicator for potential users.
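To make the template-repository idea a bit more concrete, its skeleton could contain little more than a minimal setup.py along the following lines (the package name and metadata are placeholders, not an actual internal library):

    # setup.py -- minimal packaging skeleton a template repository could provide.
    from setuptools import setup, find_packages

    setup(
        name="pfn-example-library",  # placeholder package name
        version="0.1.0",
        description="One-line description of what the library does.",
        packages=find_packages(exclude=["tests"]),
        install_requires=[
            # runtime dependencies go here, e.g. "numpy>=1.15",
        ],
        python_requires=">=3.6",
    )

A package built from such a skeleton could then be published to the internal PyPI server and installed by other teams by pointing pip's --extra-index-url option at that server.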

As the host of that 80-minute session I was happy about the deep and focused discussion that we had. With members from different teams it was possible to get input from various viewpoints and also to learn about issues that are specific to certain work styles. I think we developed some concrete and actionable ideas, and we will try to follow up and actually realize these ideas after PFN Day in order to increase code reuse and improve collaboration across all of our teams.