Visualizing the Taskcluster Index

2015-09-15 by Mike Shal, tagged as graphing, indexing, mozharness, mozilla, taskcluster

I've been working on trying to organize the Taskcluster index in bug 1133074. The index is analogous to the directory structure that we have on ftp.mozilla.org, but it is much easier to organize things based on how we'd like to access them. In FTP, our upload scripts actually ssh into the ftp server and copy files around into the layout we want. But with Taskcluster indexing, we can just add a new route and now we can organize things by revision, platform, branch, or however we'd like.

However, the sheer amount of things we build still makes this tricky. I pulled down the build logs for all the mozilla-central builds I could find, and ended up with 106 builds from buildbot/mozharness (including nightlies and l10n), and 16 builds from Taskcluster. That's a lot of routes to get organized into a usable hierarchy!

Although I have no delusions that the current set of routes is in any sense final, I wanted to at least get a reasonable and usable first-pass attempt at them. This is hard to do just by guessing at a possible route name and testing it on a few platforms - there are many corner cases to handle. For example, mozharness' internal config uses "macosx64" as the platform name both for an OSX desktop build and an OSX mulet build. If you just use the platform name as the key, multiple things will end up routing to the same place. Similarly, a PGO build isn't a separate platform, but rather it looks exactly like a regular build with an extra "--enable-pgo" flag. This flag must be accounted for somehow in the routes, otherwise PGO builds will be intermixed with their non-PGO counterparts.
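
One way to catch this kind of corner case is to check mechanically for routes that more than one build claims. The following is a minimal sketch of that check, not the code used for the bug: `find_collisions` is a made-up name, and the build names and routes in the usage example are hypothetical.

```python
from collections import defaultdict


def find_collisions(routes_by_build):
    """Return {route: [builds...]} for every route claimed by 2+ builds.

    routes_by_build maps a build name to the list of routes it would
    publish. (Hypothetical helper, for illustration only.)
    """
    builds_by_route = defaultdict(list)
    for build, routes in routes_by_build.items():
        for route in routes:
            builds_by_route[route].append(build)
    return {
        route: builds
        for route, builds in builds_by_route.items()
        if len(builds) > 1
    }


# Hypothetical input: both builds key their route on "macosx64" alone.
example = {
    "OS X desktop build": ["index.gecko.v2.mozilla-central.latest.macosx64"],
    "OS X mulet build": ["index.gecko.v2.mozilla-central.latest.macosx64"],
}
for route, builds in find_collisions(example).items():
    print(route, "<-", ", ".join(builds))
```

Any route that shows up in the result needs an extra distinguishing component (such as the product or a "pgo" suffix) before it is usable.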

By adding some debug output to mozharness and taskcluster, I was able to dump out the list of potential routes for all build types. This results in a huge list like this:

Android 4.2 x86 mozilla-central build: ['python', '/home/mshal/buildbot/mozharness/scripts/fx_desktop_build.py', '--config', 'builds/releng_base_android_64_builds.py', '--custom-build-variant-cfg', 'x86', '--config', 'balrog/production.py', '--branch', 'mozilla-central', '--build-pool', 'production']
 +  index.gecko.v2.mozilla-central.revision.abcdef12345.mobile.android-x86-opt
 +  index.gecko.v2.mozilla-central.latest.mobile.android-x86-opt

Android 4.2 x86 mozilla-central nightly: ['python', '/home/mshal/buildbot/mozharness/scripts/fx_desktop_build.py', '--config', 'builds/releng_base_android_64_builds.py', '--custom-build-variant-cfg', 'x86', '--config', 'balrog/production.py', '--branch', 'mozilla-central', '--build-pool', 'production', '--enable-nightly']
 +  index.gecko.v2.mozilla-central.nightly.2015.07.14.revision.abcdef12345.mobile.android-x86-opt
 +  index.gecko.v2.mozilla-central.nightly.2015.07.14.latest.mobile.android-x86-opt
 +  index.gecko.v2.mozilla-central.nightly.revision.abcdef12345.mobile.android-x86-opt
 +  index.gecko.v2.mozilla-central.nightly.latest.mobile.android-x86-opt

... (hundreds more)
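
The route patterns in the two examples above can be reproduced with a small helper. This is only a sketch inferred from those two Android entries: `build_routes` and its parameter names are made up for illustration and are not the actual mozharness or Taskcluster code.

```python
def build_routes(branch, product, job, revision, nightly_date=None):
    """Return the index routes for one build (hypothetical helper).

    nightly_date is None for a regular build, or a (year, month, day)
    tuple of ints for a nightly.
    """
    prefix = f"index.gecko.v2.{branch}"
    suffix = f"{product}.{job}"
    if nightly_date is None:
        bases = [prefix]
    else:
        year, month, day = nightly_date
        bases = [
            f"{prefix}.nightly.{year}.{month:02d}.{day:02d}",
            f"{prefix}.nightly",
        ]
    routes = []
    for base in bases:
        # Each base gets a route pinned to the revision and a "latest" alias.
        routes.append(f"{base}.revision.{revision}.{suffix}")
        routes.append(f"{base}.latest.{suffix}")
    return routes


for route in build_routes("mozilla-central", "mobile", "android-x86-opt",
                          "abcdef12345", nightly_date=(2015, 7, 14)):
    print(" + ", route)
```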

While this is a useful way to show what the routes for each individual build may look like, the list is so huge that it doesn't give much of an idea of what the index will actually look like when used. For example, we may want to ensure that each level in the route hierarchy contains a reasonable number of sub-routes (i.e., not so many that it's hard to browse, and not just a single route that requires a user to click more than necessary). To get this view, it helps to combine the list of routes into a tree format. For example, the routes for all the Linux builds would look like this:

+  index
  + gecko
    + v2
      + mozilla-central
        + latest
          + firefox
            + linux-debug
            + linux-opt
            + linux-pgo
            + linux64-asan
            + linux64-asan-debug
            + linux64-br-haz
            + linux64-debug
            + linux64-opt
            + linux64-pgo
            + linux64-st-an-debug
        + nightly
          + 2015
            + 07
              + 14
                + latest
                  + firefox
                    + linux-opt
                    + linux64-asan
                    + linux64-asan-debug
                    + linux64-opt
                + revision
                  + abcdef12345
                    + firefox
                      + linux-opt
                      + linux64-asan
                      + linux64-asan-debug
                      + linux64-opt
          + latest
            + firefox
              + linux-opt
              + linux64-asan
              + linux64-asan-debug
              + linux64-opt
          + revision
            + abcdef12345
              + firefox
                + linux-opt
                + linux64-asan
                + linux64-asan-debug
                + linux64-opt
        + revision
          + abcdef12345
            + firefox
              + linux-debug
              + linux-opt
              + linux-pgo
              + linux64-asan
              + linux64-asan-debug
              + linux64-br-haz
              + linux64-debug
              + linux64-opt
              + linux64-pgo
              + linux64-st-an-debug
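
A tree like the one above can be built by splitting each route on "." and inserting the pieces into nested dictionaries. The sketch below assumes that no route component contains a dot itself; `build_tree` and `format_tree` are illustrative names, not part of mozharness or Taskcluster.

```python
def build_tree(routes):
    """Turn dotted route strings into a nested dict of {name: subtree}."""
    tree = {}
    for route in routes:
        node = tree
        for part in route.split("."):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return tree


def format_tree(tree, depth=0):
    """Return the tree as indented '+ name' lines, siblings sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + "+ " + name)
        lines.extend(format_tree(tree[name], depth + 1))
    return lines


routes = [
    "index.gecko.v2.mozilla-central.latest.firefox.linux-opt",
    "index.gecko.v2.mozilla-central.latest.firefox.linux-debug",
]
print("\n".join(format_tree(build_tree(routes))))
```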

However, the actual tree is quite large and it's still hard to see the overall structure. Even just the Linux builds produce a graph that is too complicated:

Fortunately, we visited this problem earlier in the Combining Nodes in Graphviz post. We can make up some rules on what nodes in the tree are "similar" and group them. As a first step, we can combine nodes that only have one child. For example, we can transform the first tree below into the second:

+  index
  + gecko
    + v2
      + ...

+  index.gecko.v2
  + ...
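
As a rough sketch of this first rule, assuming the tree is stored as nested dictionaries of {name: subtree} (a representation chosen here for illustration, with `collapse_chains` as a made-up name), a node with exactly one child can be folded into that child by joining their names with a dot:

```python
def collapse_chains(tree):
    """Merge every node that has exactly one child with that child."""
    collapsed = {}
    for name, children in tree.items():
        # Collapse the subtree first, so chains fold from the bottom up.
        children = collapse_chains(children)
        if len(children) == 1:
            (child_name, grandchildren), = children.items()
            name = f"{name}.{child_name}"
            children = grandchildren
        collapsed[name] = children
    return collapsed


tree = {"index": {"gecko": {"v2": {"mozilla-central": {}, "try": {}}}}}
print(collapse_chains(tree))
```

The "try" branch in the usage example is only there to give "v2" more than one child; it is not taken from the route dump.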

This starts to tidy up the graph, but it's still a mess:

Next, we can combine sibling nodes that have the same subtrees. Leaf nodes are a simple example of this - they have no subtrees, so we can easily say that they are similar to their neighbors. This results in the following transformation:

+  firefox
   + linux-debug
   + linux-opt
   + linux-pgo
   + linux64-asan
   + linux64-asan-debug
   + linux64-br-haz
   + linux64-debug
   + linux64-opt
   + linux64-pgo
   + linux64-st-an-debug

+  firefox
   + linux-debug, linux-opt, linux-pgo, ...

This also works for non-leaf nodes - if two nodes have the same subtrees (i.e., the same children, grandchildren, etc. all the way to the leaf nodes), then they can be combined:

+  nightly
  + latest.firefox
    + linux-opt
    + linux64-asan
    + linux64-asan-debug
    + linux64-opt
  + revision.abcdef12345.firefox
    + linux-opt
    + linux64-asan
    + linux64-asan-debug
    + linux64-opt

+  nightly
  + latest.firefox, revision.abcdef12345.firefox
    + linux-opt, linux64-asan, linux64-asan-debug, linux64-opt
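
A minimal sketch of this second rule, again assuming a nested-dictionary tree and using made-up function names: siblings are merged bottom-up, and two siblings are grouped when their (already merged) subtrees are structurally identical. It may differ in detail from the approach in the Combining Nodes in Graphviz post.

```python
def signature(tree):
    """Hashable description of a subtree: its names and their subtrees."""
    return tuple(sorted((name, signature(sub)) for name, sub in tree.items()))


def merge_siblings(tree):
    """Combine sibling nodes whose subtrees are identical."""
    # Merge inside each child first, so leaves are grouped before parents.
    merged = {name: merge_siblings(sub) for name, sub in tree.items()}
    groups = {}  # subtree signature -> (list of sibling names, subtree)
    for name in sorted(merged):
        sub = merged[name]
        groups.setdefault(signature(sub), ([], sub))[0].append(name)
    return {", ".join(names): sub for names, sub in groups.values()}


leaves = ["linux-opt", "linux64-asan", "linux64-asan-debug", "linux64-opt"]
tree = {
    "nightly": {
        "latest.firefox": {leaf: {} for leaf in leaves},
        "revision.abcdef12345.firefox": {leaf: {} for leaf in leaves},
    }
}
print(merge_siblings(tree))
```

Because all leaves have an empty subtree, they always share a signature, which is why leaf siblings collapse into a single comma-separated node.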

Now the graph of Linux routes is actually somewhat visible:

Adding in the rest of the routes still produces a somewhat readable graph (this graph has just 418 routes in total... I didn't even add all the locales, and it only has a single date and revision!)

There are still some optimizations that could be performed, such as combining the 'latest.firefox / revision.abcdef12345.firefox' subtrees under 'nightly' and '2015.07.14', but for me it was good enough to get an idea of what browsing through the index would look like before landing it. Most of the builds are now indexed, so take a look at the gecko.v2 namespace and browse around!

Future Work

In a future post, I'd like to chat a bit about how best to query the index, grab artifacts from it, and how to use it in combination with pushlog. We're still working through all the details though, so that will have to wait for another post. Thoughts & patches are always welcome!
