0001-01-01

---
type: "page"
draft: true
title: "SmartFlow Conception"
description: "SmartFlow Conception"
weight: 99999
usemathjax: false
usesmartdown: true
categories:
  - galaxy
  - smartdown
  - workflow
outputs:
  - "html"
  - "Smartdown"
---

Smartdown Meets Distributed Workflow

Ever since I added reactive variables to Smartdown, and began writing data science notebooks with it, I’ve written too much code to tie my reactive Smartdown document to the external data sources I need to digest. Usually, these external sources are REST APIs or static file content stored in S3 or GitHub, and I use XMLHttpRequest to bring the remote data into my Smartdown context in the browser.

I’ve mused over the idea of using the reactive data-binding to tie in remote and/or asynchronous sources in a more convenient and concise manner. For example, a syntax (or API) could be developed to allow a Smartdown variable to be bound to an external process. The current Smartdown calcHandlers feature is one way to obtain this, and I’ve exploited the Falcor syntax to minimize the syntax load when performing a query (see the External Data Query example where WikiData is consulted via a calcHandler). However, if I want to build a notebook that performs both read and write operations into external data, I’d like to be able to exploit all of Smartdown’s reactivity and dependencies with minimal syntax.

Coincidentally, I’ve been becoming more familiar with the Galaxy Project and am trying to write a Smartdown Notebook exploring their API. So I decided to try using Galaxy’s backend infrastructure as a reactive substrate, with Smartdown Notebooks serving as the workflow definition, interaction, and visualization tools.

My idea is to create literate Galaxy Workflows by combining Smartdown’s prose-orientation with Galaxy’s workflow system, to produce an ensemble that uses Smartdown as front-end, and Galaxy’s API and Engine as backend. This solution would obviate the need for the Galaxy Authoring Environment for some use cases and users.

More importantly, it would massively lighten the work and payload required to share and evolve Galaxy Workflows, by enabling them to be managed within a traditional version control system like GitHub.

Trying out the Galaxy Project API.

SmartFlow - Literate Workflows with Smartdown and Galaxy

My rough idea is to create something that could exist, and see what it might be useful for. This is how most of my stuff gets generated.

  • Markdown-based text document (Smartdown) extended to support embedding of a sequence of labelled Galaxy Workflow Steps, as well as the declaration of metadata to tie it into a server instance and a Workflow Name.
  • Compilation of this prose-heavy document with embedded workflow steps results in the extraction of the steps from the text, and the injection (via the Galaxy API) of the workflow into a Galaxy instance.
  • Compilation also results in the rendering of basic UI controls to play/pause the workflow. Playing the workflow initiates the creation of a new Galaxy History, which is bound to the SmartFlow doc, but also retrievable as a MostRecentHistory variable.
  • Workflow intermediates (files in the current Galaxy history) are transparently made available as Smartdown reactive variables within the Browser, enabling easy visualization of intermediate and final results.
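To make the extraction step in the second bullet a little more concrete, here is a minimal sketch of how labelled steps might be pulled out of a document. Everything about the syntax is an assumption made for illustration: the `galaxy <label>` fence convention, the `extractWorkflowSteps` name, and the step bodies are all hypothetical, since no SmartFlow syntax has been defined yet.

```javascript
// Sketch only: the "galaxy <label>" fence convention below is invented for
// illustration; the post has not settled on a syntax for embedded steps.
const FENCE = '`'.repeat(3); // three backticks, built indirectly to keep this listing tidy

function extractWorkflowSteps(documentText) {
  const opener = new RegExp('^' + FENCE + 'galaxy\\s+(\\S+)\\s*$');
  const steps = [];
  let current = null;

  documentText.split('\n').forEach((line) => {
    const match = opener.exec(line);
    if (current === null && match) {
      // Start of a labelled step.
      current = { label: match[1], lines: [] };
    } else if (current !== null && line.trim() === FENCE) {
      // End of the step: record it in document order.
      steps.push({ label: current.label, body: current.lines.join('\n') });
      current = null;
    } else if (current !== null) {
      current.lines.push(line);
    }
  });
  return steps;
}

// A made-up document: prose interleaved with two labelled steps.
// The tool names in the step bodies are placeholders.
const sampleDocument = [
  'First we fetch the data.',
  FENCE + 'galaxy fetch',
  'tool: upload1',
  FENCE,
  'Then we would filter it.',
  FENCE + 'galaxy filter',
  'tool: some_filter_tool',
  FENCE,
].join('\n');

console.log(extractWorkflowSteps(sampleDocument).map((s) => s.label));
```

The prose between the steps is ignored by the extractor, so the same file could still be rendered as an ordinary Smartdown document.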

The Galaxy Authoring environment would not be used, except perhaps for debugging, where it is very useful to determine the state of a workflow and whether anything got stuck. The Authoring Environment is also wildly convenient for documentation and understanding Tool parameters.

Initial Prototyping

We’re gonna start by exploring the Galaxy API using Smartdown’s basic JavaScript Playable. Once I’m comfortable with its capabilities, I’ll try to create a Smartdown document that constructs and executes a desired workflow within Galaxy, binding the Galaxy data to Smartdown variables. If that works, then a proper galaxy playable type will be written, which will enable the easy textual specification of Galaxy workflow steps, data references, and control/monitoring.

Smartdown Variables for configuration

The following Smartdown variables are used within subsequent playables, which will reactively update if these values are changed.

  • Galaxy API URL
  • Galaxy API Key
  • History Name
  • CORS Proxy Prefix

smartdown.setVariable('sourceCSVURL', 'https://data.oregon.gov/api/views/esjy-u4fc/rows.csv?accessType=DOWNLOAD');


smartdown.setVariable('galaxyAPIKey', '810d0409aa37f60c4871e3f76d463679');
smartdown.setVariable('galaxyAPIURL', 'https://usegalaxy.org');

// smartdown.setVariable('galaxyAPIKey', 'a5442cad7f64eeaf08c4503c0bbb2c70');
// smartdown.setVariable('galaxyAPIURL', 'https://test.galaxyproject.org');

// smartdown.setVariable('fixCORSPrefix', 'https://crossorigin.me/');
// smartdown.setVariable('fixCORSPrefix', 'https://corsproxy.our.buildo.io/');
// smartdown.setVariable('fixCORSPrefix', 'https://thingproxy.freeboard.io/fetch/');
smartdown.setVariable('fixCORSPrefix', 'https://ekswhyzee.xyz?url=');
// smartdown.setVariable('fixCORSPrefix', '');
smartdown.setVariable('historyName', 'MySmartFlowHistory');
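Each playable in this post builds its request URL the same way: the Galaxy base URL, an API path, the key as a query parameter, and the CORS proxy prefix in front. The sketch below gathers that into one place. `galaxyURL` is a hypothetical helper, not part of Smartdown or Galaxy, and the config values are placeholders.

```javascript
// Hypothetical helper (not part of Smartdown or Galaxy): builds a Galaxy API
// URL the same way the playables in this post do, by appending the API key
// as a query parameter and prepending the optional CORS proxy prefix.
function galaxyURL(config, path) {
  const base = config.galaxyAPIURL.replace(/\/+$/, ''); // drop trailing slashes
  const separator = path.includes('?') ? '&' : '?';
  const direct = `${base}${path}${separator}key=${encodeURIComponent(config.galaxyAPIKey)}`;
  return {
    direct: direct,
    proxied: `${config.fixCORSPrefix || ''}${direct}`,
  };
}

// Placeholder values for illustration only.
const exampleConfig = {
  galaxyAPIURL: 'https://usegalaxy.org/',
  galaxyAPIKey: 'YOUR_API_KEY',
  fixCORSPrefix: '',
};
const historiesURLs = galaxyURL(exampleConfig, '/api/histories');
console.log(historiesURLs.proxied);
```

Returning both the direct and the proxied form mirrors the way the history contents playable exposes two variables, one for each.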

Ensure the desired history exists

We’ll be associating a Galaxy History value with the current Smartdown document, and using that History as the source (and target) of any Smartdown variables that are bound to that history. How these variables are bound is still to be determined, but presumably I’ll just create a special type of Smartdown variable that represents this binding.

I’m going to first use the Histories API to determine whether the desired named History exists, and if not, I’ll create a new History by that name.


this.dependOn = ['galaxyAPIKey', 'galaxyAPIURL', 'fixCORSPrefix', 'historyName'];
this.depend = function() {
  var galaxyAPIKey = env.galaxyAPIKey;
  var galaxyAPIURL = env.galaxyAPIURL;
  var fixCORSPrefix = env.fixCORSPrefix;
  var historyName = env.historyName;

  var historiesURL = `${galaxyAPIURL}/api/histories?key=${galaxyAPIKey}`;
  smartdown.setVariable('currentHistoryURL', historiesURL);

  function handleResponse() {
    var response = JSON.parse(this.response);
    // console.log('handleResponse', historiesURL);
    // console.log(JSON.stringify(response, null, 2));

    var currentHistory = null;
    response.forEach(h => {
      if (h.name === historyName) {
        currentHistory = h;
      }
    });
    if (currentHistory) {
      smartdown.setVariable('currentHistoryID', currentHistory.id);
    }
    else {
      function handleCreationResponse() {
        // console.log('handleCreationResponse', this);
        var response = JSON.parse(this.response);
        smartdown.setVariable('currentHistoryID', response.id);
        // console.log('handleCreationResponse', historiesURL);
        // console.log(JSON.stringify(response, null, 2));
      }

      var newHistoryJSON = {
        name: historyName,
      };

      var request = new XMLHttpRequest();
      request.withCredentials = false;
      request.url = `${fixCORSPrefix}${historiesURL}`;
      request.addEventListener("load", handleCreationResponse);

      request.onreadystatechange = function () {
        if (request.readyState === XMLHttpRequest.DONE && request.status === 200) {
          console.log('onreadystatechange', request, this);
        }
      };

      request.open('POST', request.url, true);
      request.setRequestHeader("Content-Type", "application/json");
      request.send(JSON.stringify(newHistoryJSON));
    }
  }

  var request = new XMLHttpRequest();
  request.withCredentials = false;
  request.url = `${fixCORSPrefix}${historiesURL}`;
  request.addEventListener("load", handleResponse);
  request.open("GET", request.url);
  request.send();
};

Current History ID Current History URL
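The find-or-create logic above can also be written as a single promise-returning function, which may be easier to reuse once a dedicated playable type exists. This is a sketch under assumptions: `ensureHistory` and `makeFakeGalaxy` are hypothetical names, and the function takes a fetch-compatible `fetchImpl` argument so it can be tried without a live Galaxy server. Unlike the playable, it reuses the first history with a matching name rather than the last.

```javascript
// A sketch of the find-or-create logic, written against a fetch-compatible
// function passed in as `fetchImpl` (an assumption made here so the logic can
// be exercised without a live Galaxy server).
async function ensureHistory(fetchImpl, historiesURL, historyName) {
  // List the histories visible to this API key.
  const listResponse = await fetchImpl(historiesURL);
  const histories = await listResponse.json();

  // Reuse the first history whose name matches.
  const existing = histories.find((h) => h.name === historyName);
  if (existing) {
    return { id: existing.id, created: false };
  }

  // Otherwise create one by POSTing its name to the same endpoint.
  const createResponse = await fetchImpl(historiesURL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: historyName }),
  });
  const created = await createResponse.json();
  return { id: created.id, created: true };
}

// Stand-in for a server, for illustration only: it keeps histories in memory
// and answers GET with the list and POST with the newly created entry.
function makeFakeGalaxy(initialHistories) {
  const histories = initialHistories.slice();
  return async function fakeFetch(url, options) {
    if (options && options.method === 'POST') {
      const history = {
        id: `fake-${histories.length + 1}`,
        name: JSON.parse(options.body).name,
      };
      histories.push(history);
      return { json: async () => history };
    }
    return { json: async () => histories };
  };
}
```

In the browser, `fetchImpl` could simply be `fetch`, with the proxied histories URL passed as `historiesURL`; the resulting `id` would then be handed to `smartdown.setVariable('currentHistoryID', ...)` as in the playable.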

Get our History Contents

In the same way that we created a named History above if none existed, we’ll next ensure that our History has a specific named value corresponding to a CSV file. If the file is not in our History, we’ll have Galaxy create that content by loading it from its source URL.

The particular CSV file doesn’t matter, so we’ll start with a freely available CSV provided by the State of Oregon that shows new businesses registered in the last month:

New Businesses Registered Last Month

Here is the CSV file URL from which Galaxy will obtain data (on our behalf) and upload that into our Galaxy History:

Original CSV from data.oregon.gov

this.dependOn = ['currentHistoryID'];
this.depend = function() {
  var historyID = env.currentHistoryID;
  var galaxyAPIURL = env.galaxyAPIURL;

  var historyContentsURL = `${galaxyAPIURL}/api/histories/${historyID}/contents?key=${env.galaxyAPIKey}`;
  var historyContentsURLProxied = `${env.fixCORSPrefix}${historyContentsURL}`;
  smartdown.setVariable('historyContentsURL', historyContentsURL);
  smartdown.setVariable('historyContentsURLProxied', historyContentsURLProxied);

  smartdown.axios.get(`${env.fixCORSPrefix}${historyContentsURL}`)
  .then(function(historyContents) {
    const data = historyContents.data;
    console.log('historyContents');
    console.log(JSON.stringify(historyContents.data, null, 2));
    smartdown.setVariable('historyContents', data, 'json');
  });
};

History Contents URL Proxied History Contents URL History Contents

Detect whether our CSV is in the History already

We’ll look at the historyContents variable. If it doesn’t have our desired CSV file in it, we’ll set the transferFileToGalaxy variable, which another playable is awaiting, to initiate a transfer. Otherwise, we’ll set the uploadResultOutputURL variable directly so that the contents can be displayed.

this.dependOn = ['historyContents'];
this.depend = function() {
  var historyContents = env.historyContents;
  // console.log(JSON.stringify(historyContents, null, 2));
  let found = null;
  historyContents.forEach((h) => {
    console.log('h', h.name, env.sourceCSVURL);
    if (h.name === env.sourceCSVURL) {
      found = h.id;
    }
  });

  if (found) {
    smartdown.setVariable('uploadResultOutputURL', `${env.galaxyAPIURL}/datasets/${found}/display`);
  }
  else {
    smartdown.setVariable('transferFileToGalaxy', true);
  }
};
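One wrinkle the playable above does not handle is a dataset that has been deleted from the History but still appears in the listing under the same name. The sketch below shows one way to skip such entries. It assumes each entry may carry a boolean `deleted` field, which should be checked against the actual Galaxy response; `findDatasetByName` is a hypothetical helper name and the sample data is made up.

```javascript
// Sketch: pick the dataset in a history-contents listing whose name matches
// the source URL. The `deleted` field is an assumption about the shape of
// Galaxy's response; entries without it are treated as live.
function findDatasetByName(historyContents, name) {
  const match = historyContents.find((h) => h.name === name && !h.deleted);
  return match ? match.id : null;
}

// Made-up listing for illustration.
const sampleContents = [
  { id: 'd1', name: 'https://example.org/data.csv', deleted: true },
  { id: 'd2', name: 'https://example.org/data.csv', deleted: false },
  { id: 'd3', name: 'something else' },
];
console.log(findDatasetByName(sampleContents, 'https://example.org/data.csv')); // prints "d2"
console.log(findDatasetByName(sampleContents, 'https://example.org/missing.csv')); // prints null
```

A `null` result corresponds to the branch that sets `transferFileToGalaxy`; any other result is the dataset id used to build `uploadResultOutputURL`.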

Initiate the Upload of CSV via and to Galaxy

Uploading a file to a new HDA with the history contents API

If we’ve detected that our desired CSV file is not present in the history, we’ll direct Galaxy to copy the CSV contents from data.oregon.gov to our History container. This operation is asynchronous, and it may take a while.

transferFileToGalaxy

this.dependOn = ['galaxyAPIURL', 'currentHistoryID', 'transferFileToGalaxy'];
this.depend = function() {
  var galaxyAPIURL = env.galaxyAPIURL;
  var galaxyAPIKey = env.galaxyAPIKey;
  var currentHistoryID = env.currentHistoryID;
  var fixCORSPrefix = env.fixCORSPrefix;

  var inputs = {
        'file_count': 1,
        'dbkey': '?',
        'ajax_upload': true,
        'files_0|type': 'upload_dataset',
        'files_0|space_to_tab': null,
        'files_0|to_posix_lines': 'Yes',
        'files_0|dbkey': '?',
        'files_0|file_type': 'csv',
        'files_0|url_paste': env.sourceCSVURL, // the URL Galaxy fetches on our behalf
  };

  var data = {
      'key': galaxyAPIKey,
      'tool_id': 'upload1',
      'history_id': currentHistoryID,
      inputs: inputs,
      // payload: payload,
      // files: files,
      // error_message: null
  };


  function handleResponse() {
    console.log('handleResponse', this.response);
    var response = JSON.parse(this.response);
    var outputId = response.outputs[0].id;
    smartdown.setVariable('uploadResult', response);
    smartdown.setVariable('uploadResultOutputId', outputId);
    smartdown.setVariable('uploadResultOutputURL', `${galaxyAPIURL}/datasets/${outputId}/display?to_ext=csv`);
  }

  var uploadURL = `${galaxyAPIURL}/api/tools?key=${galaxyAPIKey}`;
  var request = new XMLHttpRequest();
  request.withCredentials = false;
  request.url = `${fixCORSPrefix}${uploadURL}`;
  request.addEventListener("load", handleResponse);
  request.open('POST', request.url);
  // request.setRequestHeader("Content-Type", "application/json");
  var dataS = JSON.stringify(data);
  console.log('dataS', dataS);
  request.send(dataS);
};

uploadResult

uploadResultOutputId uploadResultOutputURL
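Because the upload is asynchronous, the output URL may not be readable the moment the tool request returns. A simple way to cope is to poll the new dataset until Galaxy reports that it is finished. The sketch below is written under assumptions: `waitForDataset` is a hypothetical helper, `getDataset` is a caller-supplied function that resolves to the dataset's JSON, and that JSON is assumed to carry a `state` field whose terminal values include `'ok'` and `'error'`.

```javascript
// Sketch: poll a dataset until Galaxy reports a terminal state.
// Assumptions: `getDataset` is a caller-supplied function returning a promise
// for the dataset's JSON, and that JSON has a `state` field whose terminal
// values include 'ok' and 'error'.
async function waitForDataset(getDataset, options) {
  const intervalMs = (options && options.intervalMs) || 2000;
  const maxAttempts = (options && options.maxAttempts) || 30;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const dataset = await getDataset();
    if (dataset.state === 'ok') {
      return dataset;
    }
    if (dataset.state === 'error') {
      throw new Error('Galaxy reported an error for this dataset');
    }
    // Not finished yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Dataset not ready after ${maxAttempts} attempts`);
}
```

In a playable, `getDataset` could wrap a `smartdown.axios.get` call against the dataset's entry in the history contents API, and `uploadResultOutputURL` would only be set once the promise resolves. The exact URL and response shape should be confirmed against the Galaxy API documentation.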

View the resulting CSV Data

Using Smartdown’s D3 integration, we can obtain the CSV data from our Galaxy History and display it in Smartdown.

this.dependOn = ['uploadResultOutputURL'];
this.depend = function() {
  const url = `${env.fixCORSPrefix}${env.uploadResultOutputURL}`;
  console.log('url', url);

  smartdown.axios.get(url)
  .then(function(csvContents) {
    const data = csvContents.data;
    // console.log('data');
    // console.log(JSON.stringify(data, null, 2));
    smartdown.setVariable('csvContents', data, 'json');
  });
};

csv