MDI Data Packages
The MDI exploits staged data analysis, which means that processed data files flow from a Stage 1 Pipeline to a Stage 2 App. Such data, with potentially multiple distinct files, are carried in a zipped data package generated by the MDI pipelines framework as instructed by a pipeline.yml file.
All apps are launched from a generic framework page that supports the upload of both data packages and bookmark files as the user’s entry point to your app. In other words, users typically load your app for the first time by uploading a compatible data package.
As a result, in addition to an app name, description, and possibly version, your app’s configuration must declare the types of data packages it accepts and the files it expects those packages to contain.
This matching method creates a many-to-many relationship between data packages and apps, e.g., multiple apps might be willing to work on the same data package. More typically, you will design a pipeline and an app as one-to-one partners.
Upload type declaration in <app>/config.yml
The data package types your app accepts are declared in its config.yml file, following the instructions found in the _template app’s file:
# <app>/config.yml
name: <appName>
description: ...
version: v0.0.0
uploadTypes:
    <uploadTypeName>: # as declared in pipeline.yml
        contentFileTypes:
            <contentFileTypeName>:
                required: true/false
As an example:
# <pipeline>/pipeline.yml
package:
    do:
        uploadType: myUploadType
        files:
            fileType_1:
                type: tab-delimited
                file: $DATA_FILE_PREFIX.xxx.txt
# <app>/config.yml
name: myApp
uploadTypes:
    myUploadType: # matches uploadType declared above
        contentFileTypes:
            fileType_1:
                required: true
            fileType_2: # OK to be missing from pipeline.yml above
                required: false
In this example, myApp will be offered to users as a way of analyzing data packages with an uploadType of myUploadType that carry a file of contentFileType fileType_1.
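The matching rule described above can be illustrated with a short Python sketch. It is not the framework's own code: the function names `accepts_package` and `compatible_apps`, the second app `otherApp`, and the plain-dictionary representation of the two YAML files are all assumptions made for illustration. The sketch assumes an app is offered when it declares the package's uploadType and the package carries every contentFileType the app marks as required.

```python
# Hypothetical sketch of package-to-app matching; names and data layout are
# illustrative assumptions, not the MDI framework's actual implementation.

def accepts_package(app_config, package):
    """Return True if the app declares the package's uploadType and the
    package carries every contentFileType the app marks as required."""
    declared = app_config.get("uploadTypes", {}).get(package["uploadType"])
    if declared is None:
        return False  # the app does not accept this upload type
    content_file_types = declared.get("contentFileTypes", {})
    required = {
        name
        for name, spec in content_file_types.items()
        if spec.get("required", False)
    }
    # Optional file types (required: false) may be absent from the package.
    return required <= set(package["files"])


def compatible_apps(app_configs, package):
    """List the names of all apps willing to work on the package."""
    return [cfg["name"] for cfg in app_configs if accepts_package(cfg, package)]


# Mirrors the 'package' section of the example pipeline.yml.
package = {
    "uploadType": "myUploadType",
    "files": {
        "fileType_1": {"type": "tab-delimited", "file": "$DATA_FILE_PREFIX.xxx.txt"},
    },
}

# Mirrors the example config.yml.
my_app = {
    "name": "myApp",
    "uploadTypes": {
        "myUploadType": {
            "contentFileTypes": {
                "fileType_1": {"required": True},
                "fileType_2": {"required": False},
            }
        }
    },
}

# A second, invented app that accepts a different upload type.
other_app = {
    "name": "otherApp",
    "uploadTypes": {
        "otherUploadType": {"contentFileTypes": {"fileType_1": {"required": True}}}
    },
}

print(compatible_apps([my_app, other_app], package))  # ['myApp']
```

Because the check runs per app, several apps can accept the same package and one app can accept several upload types, which is the many-to-many relationship noted earlier.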