Git Ghost: A command line tool to execute a program with local modifications without losing reproducibility

Daisuke Taniwaki

2019-05-13 08:00:17

Overview

We’re happy to open-source Git Ghost, a tool developed by Shingo Omura and Daisuke Taniwaki. With it, you can run ML jobs on your git-managed code with local modifications without losing reproducibility. You can go back to the code of a specific run at any time during the trial-and-error phase!

Motivation

Running an ML job for trial and error while waiting for other jobs to finish is a very common use case. Before Git Ghost, the simplest approach was to manage source code with git and use rsync to synchronize the code, including local modifications, to our Kubernetes cluster, where the ML jobs run. We then realized that we often wanted to revert the code to the state that had produced good results. However, although git versions your code, rsync keeps no history of what it synchronizes, so the exact code behind a given run was hard to recover.

The first idea we came up with was simply to commit local modifications and push them to a remote. However, it’s cumbersome to commit and push many times just to run a job with a change of a few characters, and of course you don’t want to clutter your remote repository. So we came up with the idea for this tool.

Usage

Suppose you have changed the content of a file foo from a to b on your local machine and want to send that modification to a directory on a remote server.

First, create a patch of the local modification.

$ git ghost push
xxxxxxx yyyyyyy
$ git ghost show yyyyyyy
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b

Then, you can sync the local modification on the remote server.

$ git ghost diff HEAD
$ git ghost pull yyyyyyy
$ git ghost diff HEAD
diff --git a/foo b/foo
index 7898192..6178079 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-a
+b

There you go! You can see that the modifications on your local machine were synchronized to the remote server.

Although Git Ghost is a very simple tool, as shown above, it is most useful when integrated with other tools. For example, you can send modifications into a Kaniko container to build Docker images with local modifications. Here’s an example using Argo to execute a job with local modifications in a reproducible manner.

Architecture

The idea is simple. The tool creates patches from your locally made commits and modifications, records the base commit that already exists in your remote repository, and pushes the patches to another remote repository. On the remote side, it checks out the base commit and applies the patches. A small trick here is that we keep the patch of locally made commits separate from the patch of locally made modifications. With this separation, the patch of locally made modifications can be reused even after the locally made commits are pushed to the remote repository.
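The two-patch idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Git Ghost is implemented: a "tree" is a plain dict from file path to content, and `make_patch`, `apply_patch`, and the example file `train.py` are hypothetical names introduced for illustration. It shows why keeping the commits patch and the modifications patch separate lets the second one be reused once the local commit has been pushed.

```python
# Toy model (assumption): a "tree" is a dict mapping file path -> content.
# A "patch" maps path -> (old_content, new_content); None means the file is absent.

def make_patch(old_tree, new_tree):
    """Record every path whose content differs between two trees."""
    patch = {}
    for path in sorted(set(old_tree) | set(new_tree)):
        old, new = old_tree.get(path), new_tree.get(path)
        if old != new:
            patch[path] = (old, new)
    return patch


def apply_patch(tree, patch):
    """Return a new tree with the patch applied; fail if the base does not match."""
    result = dict(tree)
    for path, (old, new) in patch.items():
        if result.get(path) != old:
            raise ValueError(f"patch does not apply cleanly to {path}")
        if new is None:
            result.pop(path, None)
        else:
            result[path] = new
    return result


# Local machine: a base commit that already exists on the remote, one unpushed
# commit on top of it, and an uncommitted edit on top of that commit.
base_commit = {"foo": "a", "train.py": "lr = 0.1"}
local_head = {"foo": "a", "train.py": "lr = 0.01"}    # locally made commit
working_tree = {"foo": "b", "train.py": "lr = 0.01"}  # locally made modification

commits_patch = make_patch(base_commit, local_head)   # base commit -> local HEAD
diff_patch = make_patch(local_head, working_tree)     # local HEAD -> working tree

# Remote side: start from the base commit, then apply both patches in order.
remote_tree = apply_patch(apply_patch(base_commit, commits_patch), diff_patch)

# Once the local commit has been pushed, the remote side can start from
# local_head directly and reuse the same diff_patch on its own.
remote_after_push = apply_patch(local_head, diff_patch)
```

In both cases the remote tree ends up identical to the local working tree, and only the modifications patch is needed in the second case.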

The reason why we chose a git repository for the patch storage is that it doesn’t require extra tools and credentials.

Although we’re going to use this tool in a Kubernetes cluster, we believe its use is not limited to Kubernetes clusters. You can use it to send changes from your laptop to an on-premise server if you want to track changes.

Please try it and give us your feedback on GitHub!

k8s-cluster-simulator: A simulator for evaluating Kubernetes schedulers

Daisuke Taniwaki

2019-04-11 08:00:53

Overview

We’re happy to release an open-source Kubernetes cluster simulator called k8s-cluster-simulator. The simulator is currently in alpha and was created by Hidehito Yabuuchi, a 2018 PFN internship student and part-time employee, along with his mentors, Daisuke Taniwaki and Shingo Omura. It simulates the workloads and the clock of a Kubernetes cluster so that you can evaluate your Kubernetes scheduler without actually deploying it in production.

Motivation

We have large on-premise GPU clusters, in which researchers run ML jobs of various durations via Kubernetes. One of our goals is to maximize the utilization of the GPUs for cost-effectiveness while giving all researchers reasonable access. To do this, we developed our own private Kubernetes scheduler and extender (e.g. kube-throttler). However, it’s hard to evaluate new logic in production, because researchers are running jobs and we should not change the scheduling logic and fairness too often. Of course, we cannot deploy a buggy scheduler that stops the researchers’ work. Moreover, it is not desirable to stop research in order to test new scheduling logic in large clusters. Therefore, we started to develop a scheduler simulator for Kubernetes.

Design

We believe the simulator should have the following properties.

  • Require as few changes to the scheduler’s implementation and interface as possible.
  • Simulate clock time to accelerate evaluations and to evaluate scheduling logic without being affected by system latencies such as network and internal processing delays.
  • Simulate workloads as flexibly as possible.
  • Support various output formats for further analysis.

Architecture

Here’s the simple flow diagram.

The idea is simple. The simulator maintains a simulated clock and advances it by one tick at each step of the loop. At each step, the simulator asks the submitters whether they have pods to submit or delete at the current tick and passes the submitted pods to the scheduler. The scheduler returns bind and delete events so that the simulator can simulate resource management. Finally, the simulator writes the metrics of the simulation through metrics loggers.
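The loop described above can be sketched as follows. This is a minimal illustration in Python, not the simulator's actual API: `Pod`, `FixedSubmitter`, `FifoScheduler`, and `simulate` are hypothetical names, resources are reduced to a single GPU count, and pod deletion by submitters is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    gpus: int
    duration: int  # number of ticks the pod runs once it is bound


class FixedSubmitter:
    """Hypothetical submitter: submits pods at predetermined ticks."""

    def __init__(self, schedule):
        self.schedule = schedule  # tick -> list of pods

    def submit(self, clock, metrics):
        return self.schedule.get(clock, [])


class FifoScheduler:
    """Hypothetical scheduler: binds queued pods in order while GPUs are free."""

    def schedule(self, queue, free_gpus):
        bound = []
        for pod in queue:
            if pod.gpus > free_gpus:
                break  # strict FIFO: never skip the head of the queue
            free_gpus -= pod.gpus
            bound.append(pod)
        return bound


def simulate(submitters, scheduler, total_gpus, ticks):
    queue, running, log = [], [], []
    metrics = {"clock": 0, "free_gpus": total_gpus, "pending": 0}
    for clock in range(ticks):  # each iteration is one tick of the simulated clock
        # 1. Release the resources of pods that have finished by this tick.
        running = [(pod, end) for pod, end in running if end > clock]
        # 2. Ask every submitter for the pods to submit at this tick.
        for submitter in submitters:
            queue.extend(submitter.submit(clock, metrics))
        # 3. Let the scheduler decide which pending pods to bind.
        free = total_gpus - sum(pod.gpus for pod, _ in running)
        for pod in scheduler.schedule(queue, free):
            queue.remove(pod)
            running.append((pod, clock + pod.duration))
        # 4. Record the metrics for this tick.
        free = total_gpus - sum(pod.gpus for pod, _ in running)
        metrics = {"clock": clock, "free_gpus": free, "pending": len(queue)}
        log.append(metrics)
    return log


submitter = FixedSubmitter({
    0: [Pod("a", gpus=2, duration=2), Pod("b", gpus=2, duration=1)],
    1: [Pod("c", gpus=1, duration=1)],
})
log = simulate([submitter], FifoScheduler(), total_gpus=3, ticks=4)
```

Because the clock is just a loop counter, a long period of cluster activity can be evaluated without waiting in real time, and the results do not depend on network or process latencies.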

And here’s the high-level class diagram.

We provide the following two customization points for scheduling simulations.

Submitters

Multiple users can be simulated by adding any number and combination of submitters, and the timing and number of submitted pods are fully customizable through the simulator interface. For example, assume user A tends to submit more pods in the morning and user B tends to submit more pods in the evening. A submitter can be created for each user and plugged into the simulator. Moreover, because submitters receive metrics from the simulator, they can change their behavior based on the state of the cluster, such as whether it is crowded.
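As an illustration of the morning and evening users above, here is a hedged sketch of a per-user submitter. The class name `TimeOfDaySubmitter`, its parameters, and the `pending` metrics key are assumptions made for this example and do not reflect the simulator's real interface.

```python
class TimeOfDaySubmitter:
    """Hypothetical submitter for a user who is active during certain hours."""

    def __init__(self, user, active_hours, pods_per_hour, max_pending):
        self.user = user
        self.active_hours = active_hours
        self.pods_per_hour = pods_per_hour
        self.max_pending = max_pending

    def submit(self, clock_hour, metrics):
        """Return the names of the pods to submit at this simulated hour."""
        if clock_hour % 24 not in self.active_hours:
            return []
        # React to cluster state: back off when the cluster already looks crowded.
        if metrics["pending"] >= self.max_pending:
            return []
        return [f"{self.user}-{clock_hour}-{i}" for i in range(self.pods_per_hour)]


# User A submits in the morning, user B in the evening.
user_a = TimeOfDaySubmitter("a", active_hours=range(8, 12), pods_per_hour=3, max_pending=10)
user_b = TimeOfDaySubmitter("b", active_hours=range(18, 22), pods_per_hour=2, max_pending=10)
```

Both submitters would be plugged into the same simulation, so the scheduler sees the combined load of the two users.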

Scheduler

You have two options for scheduler extensions, depending on how you customize your Kubernetes scheduler. The first option mimics the normal Kubernetes scheduler (kube-scheduler) and can be extended with Prioritizer, Extender and Predicate. If you customize your scheduling logic through these kube-scheduler extension points, this is the best approach. However, kube-scheduler is a queue-based scheduler, and you may want to implement more complicated scheduling logic that doesn’t fit a queue-based scheduler, for example, scheduling a new set of pods immediately after receiving multiple pod submissions. For this case, the second option lets you evaluate a scheduler that implements the interface defined in Kubernetes, using a thin wrapper function.
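To make the first option concrete, here is a small sketch of queue-based scheduling with a predicate and a prioritizer. It is written in Python with plain dicts for pods and nodes, and every function name is hypothetical; it only mirrors the filter-then-score idea and is not kube-scheduler's or the simulator's API.

```python
def fits_gpus(pod, node):
    """Predicate: the node must have enough free GPUs for the pod."""
    return node["free_gpus"] >= pod["gpus"]


def least_free_gpus(pod, node):
    """Prioritizer: prefer the tightest fit to reduce GPU fragmentation."""
    return -(node["free_gpus"] - pod["gpus"])


def schedule_one(pod, nodes, predicates, prioritizers):
    """Pick a node for one pod: filter with predicates, then rank by summed scores."""
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # the pod stays pending
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in prioritizers))
    return best["name"]


def schedule_queue(queue, nodes, predicates, prioritizers):
    """Queue-based scheduling: handle pods one at a time, in submission order."""
    bindings = {}
    for pod in queue:
        node_name = schedule_one(pod, nodes, predicates, prioritizers)
        if node_name is not None:
            node = next(n for n in nodes if n["name"] == node_name)
            node["free_gpus"] -= pod["gpus"]  # account for the bound pod
            bindings[pod["name"]] = node_name
    return bindings


nodes = [{"name": "n1", "free_gpus": 4}, {"name": "n2", "free_gpus": 2}]
queue = [
    {"name": "p1", "gpus": 2},
    {"name": "p2", "gpus": 4},
    {"name": "p3", "gpus": 1},
]
bindings = schedule_queue(queue, nodes, [fits_gpus], [least_free_gpus])
```

Each pod is placed in isolation here, which is exactly the limitation that motivates the second option: logic that needs to look at several pending pods at once does not fit this one-pod-at-a-time shape.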

Roadmap

We’re implementing the following features before the beta phase to support more realistic simulations of cluster environments.

  • Provide more isolation between components (e.g. supporting an RPC interface for schedulers and submitters)
  • Provide common submitter implementations (e.g. typical probability distributions such as uniform, binomial, and Poisson)
  • Support various cluster events (node failures, accidental pod failures, node addition/removal, etc.)
  • Support output formats that can be plotted with popular plotting tools (matplotlib, gnuplot, etc.)

Please try it out! We’re waiting for your feedback!