Batch Updating Plugins

Batch updating lets your plugin set or update one or more user properties in the current project's data in large batches. An example is a plugin that predicts, for every user in the current project, how likely that user is to convert.

For each user it saves a number from 0.0 to 1.0, based on how likely it is that this user will convert. This property can then be re-used everywhere on the platform, such as segments, report filters and more.

Configuring a plugin as batch updating

To make your plugin support batch updating, one of the plugin's JSON result stages needs to output an object with the key batches on the root object. This can be added to the output JSON from the initial stage or any additional stage. If multiple stages output a batches object, the one from the last stage is used.

Here is a basic batches object specification that is added on the trainMore stage:

{
  "data": {
    ... // Some model metrics that can be displayed in the plugin JSX visualization
  },
  "status": {
    "code": "success",
    "title": null,
    "explanation": null,
    "backtrace": null
  },
  "batches": {
    "maxBatchSize": 5000,
    "options": {"arg1": 1, "arg2": "abc"} // optional
  }
}

As can be seen in the example above, the batches object is pretty straightforward:

  • maxBatchSize: The number of user objects to score in a single batch. Defaults to 10,000; the minimum is 1,000 and the maximum is 10,000,000.
  • options: Optional. This object is passed on to the manifest of the batch stage, so you can easily share parameters from, for example, the model training stage with the batch stage.
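As a minimal sketch of how a stage could assemble such a result in Python: the helper name build_stage_result and the metrics value are placeholders for illustration, and the bounds come from the maxBatchSize description above. Note that json.dumps writes Python's None as JSON null.

```python
import json

# Bounds taken from the maxBatchSize description above.
MIN_BATCH_SIZE = 1_000
MAX_BATCH_SIZE = 10_000_000


def build_stage_result(metrics, max_batch_size=10_000, options=None):
    """Return a stage result dict that includes a batches object."""
    if not MIN_BATCH_SIZE <= max_batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"maxBatchSize must be between {MIN_BATCH_SIZE} and {MAX_BATCH_SIZE}"
        )
    batches = {"maxBatchSize": max_batch_size}
    if options is not None:
        # Forwarded to the batch stage manifest under "options".
        batches["options"] = options
    return {
        "data": metrics,
        "status": {
            "code": "success",
            "title": None,
            "explanation": None,
            "backtrace": None,
        },
        "batches": batches,
    }


# Placeholder metrics, for illustration only.
result = build_stage_result(
    {"accuracy": 0.91},
    max_batch_size=5000,
    options={"arg1": 1, "arg2": "abc"},
)
print(json.dumps(result, indent=2))
```

Validating the batch size locally is a design choice in this sketch; it surfaces an out-of-range value before the result is handed back to the platform.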

Batch updating plugin manifest

When a plugin supports batch updating, it again receives a JSON manifest for the plugin to read, just like in any other stage such as the initial one.

During local development, get your batch stage manifest with a GET request like the one below. Note that the last path segment, which names the stage, needs to be batch:

curl "https://www.stormly.com/api/developer/get_manifest/batch" \
-H 'X-Project-Key: abcd12345' \
-H 'X-Dataset-Key: abcdefghjklm12346789'

Store the result in a file called batch-manifest.json, as we'll use it in the next step to launch the plugin's batch process.

Here is what the JSON manifest will look like:

{
  "stage": "batch",
  "dataUrls": {
    "initial": "http://dev.stormly.com/api/plugin/query_dataset/sadfj88e7?range_start_gt_or_eq=0.0&range_end_lt=0.2",
    "latest": "http://dev.stormly.com/api/plugin/query_dataset/oefi129?range_start_gt_or_eq=0.0&range_end_lt=0.2",
    "train_0": "http://dev.stormly.com/api/plugin/query_dataset/mvmdo9188?range_start_gt_or_eq=0.0&range_end_lt=0.2",
    "train_6270": "http://dev.stormly.com/api/plugin/query_dataset/spqpp3883?range_start_gt_or_eq=0.0&range_end_lt=0.2",
    "train_12464": "http://dev.stormly.com/api/plugin/query_dataset/ffhgy737?range_start_gt_or_eq=0.0&range_end_lt=0.2",
    "train_322978": "http://dev.stormly.com/api/plugin/query_dataset/cxnnv134?range_start_gt_or_eq=0.0&range_end_lt=0.2"
  },
  "downloadUrls": {
    "initial": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/1234",
    "trainMore": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/5678"
  },
  "getUploadUrls": {
    "batch": "https://www.stormly.com/api/developer/upload_url/wxyz/6789"
  },
  "options": {
    "arg1": 1,
    "arg2": "abc"
  },
  "metadata": {
    ...
  }
}

First note that the stage will always be batch for the batch updating part of a plugin. When the plugin loads the JSON manifest and finds that manifest.stage == 'batch', it simply starts the batch scoring part of the plugin code.
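The stage check can be sketched in Python as below. The function names run_batch_stage and run_regular_stage are hypothetical placeholders for your own code paths; only the "stage" key comes from the manifest shown above.

```python
import json


def load_manifest(path):
    """Read a JSON manifest file such as batch-manifest.json."""
    with open(path) as f:
        return json.load(f)


def run_batch_stage(manifest):
    # Hypothetical placeholder: fetch the batch, score it, write data.json.
    return "batch"


def run_regular_stage(manifest):
    # Hypothetical placeholder for the initial and any additional stages.
    return "regular"


def dispatch(manifest):
    """Pick the code path based on the manifest's stage."""
    if manifest["stage"] == "batch":
        return run_batch_stage(manifest)
    return run_regular_stage(manifest)


print(dispatch({"stage": "batch"}))      # batch
print(dispatch({"stage": "trainMore"}))  # regular
```

In a main.py entry point this would typically be called as dispatch(load_manifest(sys.argv[1])).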

Data URLs are available under dataUrls, because we need to get a batch of, say, 1000 user records with their features and score each of them. Any of the datasets requested in the initial or additional stages are available here. Note that each data URL has range_start_gt_or_eq and range_end_lt query parameters appended automatically, so that each batch contains roughly the number of users requested through the batches object in an earlier stage.
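If you want to log which slice a batch run covers, the range can be read back from a data URL with the standard library. This is only a sketch: batch_range is an illustrative name, and it assumes the two parameters are always present and numeric, as in the sample manifest.

```python
from urllib.parse import parse_qs, urlsplit


def batch_range(data_url):
    """Return (range_start_gt_or_eq, range_end_lt) from a data URL as floats."""
    query = parse_qs(urlsplit(data_url).query)
    start = float(query["range_start_gt_or_eq"][0])
    end = float(query["range_end_lt"][0])
    return start, end


url = (
    "http://dev.stormly.com/api/plugin/query_dataset/sadfj88e7"
    "?range_start_gt_or_eq=0.0&range_end_lt=0.2"
)
print(batch_range(url))  # (0.0, 0.2)
```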

To update the actual user properties, we need to upload a JSON file with the updated properties for each user. This is done via the batch key of getUploadUrls, which is covered under Running the plugin.

You can download files that previous stages uploaded to storage by taking the matching entry in downloadUrls and appending the file path used during uploading, for example ${manifest['downloadUrls']['initial']}/model.pkl.
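A short sketch of that URL composition in Python. The helper names are illustrative, and it assumes a plain GET on the composed URL returns the stored file; model.pkl is the example file name used above.

```python
import shutil
import urllib.request


def stored_file_url(manifest, stage, file_path):
    """Join a stage's download URL with the path used when uploading."""
    base = manifest["downloadUrls"][stage].rstrip("/")
    return f"{base}/{file_path.lstrip('/')}"


def download_stored_file(manifest, stage, file_path, destination):
    """Download a stored file to a local path (assumes a plain GET works)."""
    url = stored_file_url(manifest, stage, file_path)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


manifest = {
    "downloadUrls": {
        "initial": "https://s3.eu-central-1.amazonaws.com/storage.stormly.com/abcd/1234"
    }
}
print(stored_file_url(manifest, "initial", "model.pkl"))
```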

The options object is an exact copy of what was specified in the previous stage under the batches object. The metadata is the exact same object as for any previous stages.

Batch updating user properties

A batch updating plugin should generate two files.

The first is the usual results.json. Unlike in regular plugin runs, it has no data or other properties and contains only the status object, as shown below:

{
  "status": {
    "code": "success",
    "title": null,
    "explanation": null,
    "backtrace": null
  }
}

If any of the batches doesn't return a success code, the plugin run as a whole will fail.
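Writing this status-only results.json could look like the sketch below. This section only documents the "success" code, so the failure code is left as a parameter you must supply; how title, explanation and backtrace are filled on failure is likewise an assumption of this example.

```python
import json
import traceback


def write_results(path, code="success", title=None, explanation=None, backtrace=None):
    """Write a results.json that contains only the status object."""
    status = {
        "code": code,
        "title": title,
        "explanation": explanation,
        "backtrace": backtrace,
    }
    with open(path, "w") as f:
        json.dump({"status": status}, f, indent=2)
    return status


def run_batch_safely(score_batch, results_path, failure_code):
    """Run score_batch() and record the outcome in results.json.

    failure_code is whatever non-success code your plugin uses; it is not
    specified in this section, so it is passed in by the caller.
    """
    try:
        score_batch()
    except Exception as exc:
        return write_results(
            results_path,
            code=failure_code,
            title=type(exc).__name__,
            explanation=str(exc),
            backtrace=traceback.format_exc(),
        )
    return write_results(results_path)
```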

The other file is a data.json file, which contains the actual users and property values that should be updated/set:

{
  "category": "Predictions",
  "properties": [
    "Class", "Rating"
  ],
  "updates": [
    ["user_1", "A", 0.43],
    ["user_2", "A", 0.59],
    ["user_3", "B", 0.37],
    ["user_4", "C", 0.01]
  ]
}

As can be seen in the example above the data.json file is pretty straightforward:

  • category: Optional, but recommended. Each time this plugin runs, the properties are saved under the name of the plugin plus the timestamp of the run. If category is given, those properties end up under their own sub-menu in the user properties menu. If it is omitted, the properties appear at the root of the user properties.
  • properties: An array of strings. Each element is the name of a user property being set or updated. Needs to have a minimum of one element.
  • updates: An array of arrays. Each element starts with the user_id (see Dataset and features), and each value after that corresponds, in order, to a name in the properties array. So in this example the values "A", "A", "B", "C" are for the "Class" property, while 0.43, 0.59, 0.37, 0.01 are for the "Rating" property. The first element thus updates user_1 and sets Class for that user to A and Rating to 0.43.
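The structure described above can be built from per-user predictions as in this sketch. The helper name build_data_json and the shape of the scores input (a dict from user_id to a tuple of values) are assumptions for illustration; the output keys follow the data.json example.

```python
import json


def build_data_json(properties, scores, category=None):
    """Build the data.json payload.

    scores maps each user_id to a sequence of values, one per property,
    in the same order as the properties list.
    """
    if not properties:
        raise ValueError("properties needs at least one element")
    updates = []
    for user_id, values in scores.items():
        if len(values) != len(properties):
            raise ValueError(
                f"{user_id}: expected {len(properties)} values, got {len(values)}"
            )
        # Each row is the user_id followed by one value per property.
        updates.append([user_id, *values])
    payload = {"properties": list(properties), "updates": updates}
    if category is not None:
        payload = {"category": category, **payload}
    return payload


payload = build_data_json(
    ["Class", "Rating"],
    {"user_1": ("A", 0.43), "user_2": ("A", 0.59)},
    category="Predictions",
)
with open("data.json", "w") as f:
    json.dump(payload, f)
```

Checking the row length against the properties list catches misaligned values before the file is uploaded.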

Running the plugin

Once you're ready to test your batch updating plugin, start it with the command below, where batch-manifest.json was generated in the previous step:

python main.py batch-manifest.json results.json

Once the run has finished, you should have a results.json and a data.json file.

Your code should automatically upload the data.json file using the batch key of getUploadUrls, so you'll have to add that part before your plugin is complete. First get a signed upload URL by doing a GET on the batch URL from getUploadUrls with /data.json appended. Then make a PUT request to that signed upload URL as-is, sending the data.json file. Here's the flow using curl:

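# Get the URL from the JSON manifest using key `batch` of `getUploadUrls`.
GET_UPLOAD_URL='https://www.stormly.com/api/developer/upload_url/wxyz/6789'
# The response body is the signed upload URL.
SIGNED_UPLOAD_URL="$(curl "$GET_UPLOAD_URL/data.json")"
# --upload-file sends the file with a PUT request; quoting keeps the URL intact.
curl --upload-file data.json "$SIGNED_UPLOAD_URL"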

You can either implement this using http request libraries for your language, or just use system exec and curl from your plugin code, as curl is available in your plugin runner environment.
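The "system exec and curl" option could be sketched in Python as below. The function name upload_batch_file and the run parameter (there so the call can be stubbed in tests) are illustrative; the sketch assumes, as the shell flow implies, that the GET returns the signed URL as the plain response body.

```python
import subprocess


def upload_batch_file(manifest, local_path="data.json", run=subprocess.run):
    """Upload data.json via the two-step signed URL flow, using curl."""
    endpoint = manifest["getUploadUrls"]["batch"].rstrip("/") + "/data.json"

    # Step 1: GET the signed upload URL (assumed to be the response body).
    fetched = run(
        ["curl", "--silent", "--fail", endpoint],
        capture_output=True,
        text=True,
        check=True,
    )
    signed_url = fetched.stdout.strip()

    # Step 2: PUT the file to the signed URL as-is.
    run(
        ["curl", "--silent", "--fail", "--upload-file", local_path, signed_url],
        check=True,
    )
    return signed_url
```

Passing the arguments as a list avoids shell quoting problems with the query string in the signed URL, and check=True makes a failed curl call raise instead of passing silently.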