Megazine: A stratified social links dashboard

January 03, 2012 by Tim Cuthbertson

Note: We've upgraded megazine to use the latest StratifiedJS (0.14) since this article was originally posted. The code snippets in this article have been updated accordingly.

To celebrate the release of apollo 0.13, we've put together a small app to show the latest links from various news sources, including:

  • the front page of hacker news,
  • your twitter feed (twitter has since shut down the @anywhere API),
  • and any RSS feed you want

It's called megazine, and you can try it out here. It loads links from the relevant news source, grabs content from the links such as page summaries and important images, then displays an easily-scannable view of the most recent links. Here's a screenshot:

[Image: megazine screenshot]

The underlying APIs can be a bit flaky (the unofficial hackernews feed seems to return 500 errors often), but you can typically click "retry" a little later and things will start working again. It has been tested on chrome and firefox, and should work in most modern browsers.

It's got a few notable features that make it a good demonstration of stratified JS functionality. Note that the code examples shown below are not exact extracts of the megazine code; some have been simplified slightly to focus on the constructs used rather than the actual functionality of the app.

Article processing

Each article goes through the following process:

  • fetch items from the underlying source (hackernews, twitter, etc.)
  • fetch important information about the linked URL (title, description & images) using yql
  • figure out the biggest image on the page (using a few simple mechanisms like style and html attributes, falling back to actually loading them)
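
The "declared size" part of the last step could be sketched in plain JavaScript as below. The `pickLargestDeclared` name and the `{url, width, height}` shape are assumptions made for illustration, not megazine's actual code. Images without usable dimensions are handed back separately so that a caller could fall back to loading them.

```javascript
// Hypothetical helper: choose the largest image by declared width * height.
// Images lacking usable numeric dimensions are collected in `unknown`, so the
// caller can fall back to actually loading them to measure their size.
function pickLargestDeclared(images) {
  let best = null;
  const unknown = [];
  for (const img of images) {
    const width = Number(img.width);
    const height = Number(img.height);
    if (!(width > 0 && height > 0)) {
      unknown.push(img);
      continue;
    }
    if (best === null || width * height > best.width * best.height) {
      best = { url: img.url, width, height };
    }
  }
  return { best, unknown };
}
```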

In addition to the above, tweet contents are first scanned for URLs which are then expanded (using the longurl.org API) to figure out what page they're actually linking to.
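
Scanning a tweet for URLs can be approximated with a regular expression. This is a minimal sketch, with `extractUrls` being a hypothetical name; it only finds the links, and leaves the expansion request itself out.

```javascript
// Hypothetical helper: find http(s) URLs in a tweet's text.
function extractUrls(text) {
  const matches = text.match(/https?:\/\/\S+/g) || [];
  // Trim punctuation that commonly trails a URL in running text.
  return matches.map(url => url.replace(/[.,;:!?)\]]+$/, ""));
}
```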

All of these steps need to be performed before the article gets displayed on the dashboard. Despite the fact that each step involves an async request, each step is completely dependent on the previous one, making it a perfect fit for SJS' sequential syntax.

Note that sequential syntax doesn't mean we're actually waiting any more than we would in a vanilla JavaScript implementation using copious callbacks. Making use of the sequence module, we have easy control over which parts happen in sequence and which in parallel. So while each of the above steps can't progress before the previous one is complete, we use the parallel iteration methods provided by sequence to process as many tweets as we can at the same time to minimise waiting. We also use the rate-limiting functions provided by the cutil module to prevent overloading any APIs (twitter, longurl, yql) or choking the browser's connections.
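
For readers more familiar with plain JavaScript, the idea of "sequential per item, parallel across items, with a cap on concurrent requests" can be sketched with async functions. `mapLimited` is a hypothetical helper that stands in for the sequence and cutil functions; it is not their API.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in input order.
async function mapLimited(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++; // claim an index; nothing awaits between check and claim
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

The worker passed in would itself await each of its steps in order, so a single article never skips ahead while several articles are in progress at once. Unlike SJS, a failure in one lane here does not retract the others.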

Offline storage

To make the app more responsive, it also utilizes local storage (via lawnchair.js) to cache the results of:

  • url expansion
  • article contents (title, summary, primary image URL)

This means that after an article has been processed once, none of the above work is needed to redisplay it when you come back to the page - it'll be displayed almost immediately and without any network traffic. It also means we can persist some state along with each article - you can click the (hide article) link underneath each article to mark it as hidden, and it won't be displayed again for as long as you use the same browser.

We also use the local storage to save the name & URL of any RSS feeds you add, so they're still there when you return.
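
The caching pattern itself is a simple read-through cache. The sketch below is plain JavaScript with an in-memory Map standing in for browser storage; `memoryStore` and `cached` are hypothetical names, and none of this reflects lawnchair.js' actual API.

```javascript
// In-memory stand-in for a persistent key/value store.
function memoryStore() {
  const data = new Map();
  return {
    async get(key) { return data.has(key) ? data.get(key) : undefined; },
    async save(key, value) { data.set(key, value); },
  };
}

// Return the stored value for `key`, computing and saving it on a miss.
async function cached(store, key, compute) {
  const hit = await store.get(key);
  if (hit !== undefined) return hit; // processed before: no network work needed
  const value = await compute(key);
  await store.save(key, value);
  return value;
}
```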

Templating

As an experiment, we've used angular.js for the UI of megazine. Angular.js is an interesting framework that allows you to use declarative templates with plain JavaScript objects, and the data is kept in sync. When you add or remove (or modify!) an article in the array of articles, angular will update the display immediately without you having to add / remove HTML elements yourself. This makes for a clean separation of template and model code, and typically a lot less "glue" code keeping the two in sync - eliminating entire classes of bugs where the UI goes out of sync with the model.

While you may not use angular, the lessons learned are likely relevant to most plain JavaScript frameworks that you might want to use with stratified code. The main thing to remember is that JavaScript is incapable of waiting for suspending SJS code to return: if you call an SJS function from JS and the SJS function suspends, then the SJS function will return to JS, but happily continue working in the background. It's as if it had been spawned.

So if you want your suspending code to work the same way whether it is called from JS or SJS, you will want to do something like:

function start() {
  this.init();
  spawn(this.run());
}

Assuming start is called from angular.js somewhere, the start function will return immediately while this.run is kicked off in the background (or more accurately, it will run until it suspends, and then resume sometime after the current event loop has ended).
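
The ordering described above can be observed in plain JavaScript too, with an async function standing in for a suspending SJS function. This is an analogy rather than SJS code, and the `events` log exists only to make the order visible.

```javascript
const events = [];
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function run() {
  events.push("run: started"); // runs synchronously, up to the first await
  await sleep(10);             // suspends; control goes back to the caller
  events.push("run: resumed");
}

function start() {
  events.push("start: init");
  // Not awaited: the work carries on in the background.
  run().catch(err => events.push("run: failed " + err.message));
  events.push("start: returned");
}

start();
// At this point `events` is ["start: init", "run: started", "start: returned"].
// "run: resumed" is appended later, once the timer has fired.
```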

What does this.run() do? When working with angular.js, it ends up looking somewhat similar to threaded code - but without all the locking concerns. For megazine, the run function for each news source looks roughly like this:

run: function() {
  this.errorCondition.clear();
  try {
    waitfor {
      // keep loading new items every 2 mins
      while(true) {
        this.updateTitle();
        using(this.workItem()) {
          waitfor {
            var items = this.loadNewItems();
          } or {
            // give up if loading takes longer than loadTimeout (ms)
            hold(this.loadTimeout);
            throw new Error(this.type + " items not received within " +
                            Math.round(this.loadTimeout / 1000) +
                            " seconds");
          }
        }
        var newItems = this.filterNewItems(items);
        this.processItems(newItems);
        hold(1000 * 60 * 2);
      }
    } or {
      // stop the loading loop as soon as an error is flagged elsewhere
      this.errorCondition.wait();
    }
  } catch(e) {
    this.setError(e);
  }
}

Importantly, this.processItems will call this.$root.$eval() as soon as a new article is added to the list of current articles. This is an angular.js method, similar to redraw() in a typical GUI framework - it tells angular the data may have changed, and to update the UI as needed. Since angular.js is plain JavaScript, it can't wait for the return of our article processing code - so this is how we tell it that some new data has finished being processed and it's time to update the UI.

Error handling in the face of concurrency

You'll note that the above method has two paths:

  • the data processing loop: update the page title, load new items, process the pages they point to, and insert summaries. Sleep for 2 minutes, and do it all again.
  • the errorCondition.wait path

These are joined with SJS' parallel or construct - as soon as one of the paths completes, the other will be retracted and cease running. For our run method's case, that means that the data loading loop will stop when this.errorCondition.wait() returns. For simplicity's sake, the app stops loading new data as soon as any error occurs. Since SJS allows try/catch to work across suspending code (unlike callback-based code), this is very simple - we just surround the entire data loading loop with a try/catch block. But this is not the only place that an error can occur. For example, hiding an article could fail if the article can't be found in local storage (this shouldn't happen any more, but it did during development).
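
Plain JavaScript has no direct equivalent of retraction: Promise.race picks a winner but leaves the losing branch running. A rough analogy of the load-with-timeout pattern therefore needs an explicit cancellation signal. The sketch below is an assumption-laden illustration (`withTimeout` is a hypothetical helper), not how SJS implements the construct.

```javascript
// Resolve with the task's result, or reject if it takes longer than `ms`.
// The task receives an AbortSignal so that it can stop early when it loses.
async function withTimeout(task, ms, label) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(label + " timed out after " + ms + " ms")),
      ms
    );
  });
  try {
    return await Promise.race([task(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // stop the timer if the task won
    controller.abort();  // signal the task to stop (harmless if it already finished)
  }
}
```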

The click handler for hiding an article follows a similar pattern to the run code, since it is called by angular. It spawns the work to happen in the background, and wraps the execution in a try/catch. When an error occurs, it sets this.errorCondition (this happens in this.setError()). errorCondition is a cutil.Condition object (also new to apollo 0.13), and as soon as it is set(), all strata that have called wait() will be resumed. For us, that means the data processing part of the run loop above will be retracted, and new data will stop loading. setError() will cause the error message to be displayed to the user, allowing them to restart the run loop and try loading new data again. Here's the hideArticle function from megazine:

hideArticle: function(article, column) {
  spawn((function() {
    try {
      // persist "hidden" state in local storage:
      article.hidden = true;
      this.cache.save(article);
      // and remove the article from the model
      angular.Array.remove(column, article);
    } catch (e) {
      this.setError(e);
    }
    this.redraw();
  }).call(this));
}

We don't have to stop the main loop from loading more data when an error happens somewhere else, but it's a good way to ensure that errors don't get lost: as soon as any error occurs, background processing is halted and the app displays an error message, and the user has to press a button to start processing new data again.
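
To make the set() / wait() behaviour concrete, here is a tiny promise-based stand-in for a condition object in plain JavaScript. It mirrors the behaviour described above (every waiter is resumed once the condition is set), but it is only a sketch and not cutil's implementation.

```javascript
// Minimal condition object: wait() resolves once set() has been called.
class Condition {
  constructor() {
    this.isSet = false;
    this.value = undefined;
    this.waiters = [];
  }
  clear() {
    this.isSet = false;
    this.value = undefined;
  }
  set(value) {
    if (this.isSet) return;
    this.isSet = true;
    this.value = value;
    // Resume everything that is currently waiting.
    for (const resolve of this.waiters.splice(0)) resolve(value);
  }
  wait() {
    if (this.isSet) return Promise.resolve(this.value);
    return new Promise(resolve => this.waiters.push(resolve));
  }
}
```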

Error handling is a big benefit of stratified code. With plain JavaScript, every asynchronous callback is independent of the code that created it - a try/catch around the code that registers a callback can't catch an error thrown when that callback runs later. In stratified code, try/catch works exactly as expected, greatly simplifying error handling logic while ensuring you don't get "runaway" code - callbacks that continue to run despite an error elsewhere in the code, compounding and complicating the effects of an earlier error. This is especially important when working with external APIs that are outside of your control, as is often the case in client-side JavaScript.

And these aspects of SJS are not limited to error handling - the top level application loop watches for changes in the document's hash (using angular.js' route service) like so:

var currentStratum;
while (true) {
  waitfor() { route.onChange(resume); }
  if(currentStratum) {
    currentStratum.abort();
  }
  // initialize the new scope and run it in the background
  currentStratum = spawn( ... );
};

The ability to abort a running stratum means that when the news source is changed (by clicking on the twitter tab, for example), no further work happens to process hacker news articles that wouldn't actually be displayed.
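
In plain JavaScript, a similar "only one source runs at a time" rule has to be built by hand, for example with an AbortController that each background task checks. The names below (`switchSource`, `runSource`) are hypothetical, and the task must cooperate by checking the signal, whereas an aborted stratum is simply stopped.

```javascript
// Keep at most one background task alive; starting a new one cancels the old.
let current = null;

function switchSource(name, runSource) {
  if (current) current.abort(); // stop work for the previous source
  const controller = new AbortController();
  current = controller;
  // Run in the background; runSource is expected to check signal.aborted.
  runSource(name, controller.signal).catch(err => {
    if (!controller.signal.aborted) console.error(err);
  });
  return controller.signal;
}
```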

The code for megazine is all up on github, if you're interested in taking a closer look or adding more news sources: megazine on github.