[PFN Day] BoF session: How to Improve Sharing of Software Components and Best Practices

tgpfeiffer

2018-08-24 12:54:26

Hello, my name is Tobias Pfeiffer, and I am a Lead Engineer at Preferred Networks. On July 24, 2018, PFN held an internal tech conference called “PFN Day”, at which I hosted a Birds-of-a-Feather session named “How to Improve Sharing of Software Components and Best Practices”. I want to give a short report on what we did in that session and what the outcomes were.

Reusability of both code and best practices is one of the core goals of software engineering; some people would go as far as saying that achieving a high level of reusability is the sole reason why software engineering exists. Not reinventing the wheel, fighting the “not-invented-here syndrome”, and understanding the real cost of rewriting something from scratch as opposed to learning how to use an existing code base are essential if time is to be spent on developing truly new functionality.

In a company such as PFN that develops highly specialized code based on the latest research results, it is all the more important that work which cannot be regarded as something that “only PFN can do” is minimized and shared as much as possible. However, in a rapidly growing organization it is often difficult to keep track of which member of which group may already have implemented the same functionality, or whom to ask for help with a certain problem. Under the impression that PFN still has room to improve when it comes to code reuse, I decided to hold a session on that topic and asked people from various projects to join.

First, everyone contributed a number of examples of efforts from their respective teams that seemed like duplicate work. One person reported that wrappers that make a certain library work with a certain data format are written again and again by different members because they don’t know that this work has been done already. Someone else reported that some algorithms for a certain computer vision task are implemented separately in many teams, simply because nobody who did so took the step of packaging that code into a reusable unit. Also, as an example of sharing best practices rather than program code, it was said that getting started with Continuous Integration (CI) systems is difficult because it takes a lot of time initially, and special requirements like hardware-specific code (GPU, InfiniBand, …) or nested Docker containers are complex to set up.

As a next step, we thought about reasons for the low level of reuse and possible impediments. On the producer side, when someone has implemented new functionality that could be useful to other members, we found several possible hurdles: the person may not know how to package code; it is not trivial to design a good interface for a software library; and it takes time to extract the core functionality, package the code, and write usable documentation. That time is then not available for further development, and the effort does not come with an immediate benefit.

On the consumer side, problems might be that it is not easy to find code in the internal repositories that implements a required functionality, that copy/paste combined with editing individual lines of code often provides a faster solution than using code as a library and contributing missing functionality back upstream, and that it is hard to judge the quality of code found “somewhere” in an internal repository.

So we concluded that in order to increase code reuse we should try to (1) increase the motivation to share code and (2) lower the technical barriers to both creating and using shared code. In concrete terms, we thought that the following steps might be a good approach, starting with Python developers in mind:

  • Create an internal PyPI server for code that cannot be released as open source code.
  • Create a template repository with skeleton files that encode the best practices for sharing Python code as a library. For example, there could be a setup.py template, a basic documentation template, and CI settings that run all the tests, build the documentation, and publish to the internal package server.
  • Form an expert group that can assist with any questions related to packaging, teach best practices, and perform some QA tasks.
  • Create an overview of which internal libraries exist for which purpose, where people can add their libraries. This listing could also show some metric of how many people are using a particular library, serving as a motivation for the author and a quality indicator for potential users.
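
To make the template repository idea a little more concrete, here is a minimal sketch of how skeleton files for a new Python library could be rendered from templates. This is not what we built at PFN; the file names, the contents of each skeleton file, and the function names render_skeleton and write_skeleton are all assumptions made up for illustration. CI settings and publishing to an internal package server are left out because they depend on the systems in use.

```python
from pathlib import Path
from string import Template

# Hypothetical skeleton files; names and contents are illustrative only.
SKELETON = {
    "setup.py": Template(
        "from setuptools import setup, find_packages\n"
        "\n"
        "setup(\n"
        "    name='$name',\n"
        "    version='0.1.0',\n"
        "    description=$description_literal,\n"
        "    packages=find_packages(exclude=['tests']),\n"
        ")\n"
    ),
    "README.md": Template("# $name\n\n$description\n"),
    "tests/test_smoke.py": Template(
        "import $name\n"
        "\n"
        "\n"
        "def test_import():\n"
        "    assert $name is not None\n"
    ),
}


def render_skeleton(name, description):
    """Return a mapping of relative path -> file content for a new library."""
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid Python package name")
    values = {
        "name": name,
        "description": description,
        # repr() yields a safely quoted Python string literal for setup.py.
        "description_literal": repr(description),
    }
    return {path: tpl.substitute(values) for path, tpl in SKELETON.items()}


def write_skeleton(target_dir, name, description):
    """Write the rendered skeleton below target_dir and return the created paths."""
    root = Path(target_dir)
    files = render_skeleton(name, description)
    # Add an empty package so that find_packages() has something to find.
    files[f"{name}/__init__.py"] = ""
    created = []
    for rel_path, content in files.items():
        out = root / rel_path
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(content)
        created.append(out)
    return created
```

Calling write_skeleton("some/dir", "mylib", "Example library") would create a small installable package layout in that directory. In practice a ready-made tool for project templates may be a better choice than a hand-written script; the point is only that the best practices live in one place and do not have to be rediscovered by every author.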

As the host of that 80-minute session, I was happy about the deep and focused discussion that we had. With members from different teams, it was possible to get input from various viewpoints and also to learn about issues that are specific to certain work styles. I think we developed some concrete and actionable ideas, and we will try to follow up and actually realize them after PFN Day to increase code reuse and improve sharing across all of our teams.