MLOps for Conversational AI with Rasa, DVC, and CML (Part II)





In the previous post I gave an introduction to DVC, CML, and Rasa, and explained why we might want to use them together. In this post I'll get down to business and describe how to put it all together. All the code is available in this GitHub repo.

Rasa basics

This post isn't really a Rasa tutorial, so I won't go deep into how Rasa works, but the rest won't make much sense unless I at least share a few simple commands.

First off, Rasa trains the machine learning models that power its chatbots: typically entity extraction and intent recognition (combined in Rasa's DIET model), plus a dialogue management model. In our repo, the training data is stored in ./data/stories.yml. It is divided into 'stories' representing example conversations, for example:
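The original example didn't survive formatting here, but a minimal story follows the structure of Rasa's standard moodbot example (the intent and action names below are illustrative, not necessarily the ones in the repo):

```yaml
version: "2.0"
stories:
- story: happy path
  steps:
  - intent: greet          # what the user means
  - action: utter_greet    # how the bot responds
  - intent: mood_great
  - action: utter_happy
```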

Intents are the classes that our intent model will classify, and actions are the responses that our chatbot will return; both are defined in ./domain.yml.
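A matching domain file might look something like this (again a sketch; the actual repo's intents and responses will differ):

```yaml
version: "2.0"
intents:
  - greet
  - goodbye
responses:
  utter_greet:
  - text: "Hey! How are you?"
  utter_goodbye:
  - text: "Bye!"
```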

At this point I should shout out rasa-converter by Nick Sorros, a useful little tool for converting between Rasa's format and a JSONL-style format more commonly used in NLP model building, in case you want to train a model in a framework external to Rasa.

Now that we know where the training data lives, the three commands we need to know are:

  • rasa train, which will train a new version of the models (e.g. following an update to the training data),
  • rasa test, which takes test stories from ./tests and evaluates how well our model performs against them (think evaluate more than test), and
  • rasa shell, which creates an interactive shell in which we can have a live conversation with the AI.

A simple workflow with Rasa would therefore look something like this:

  • Create some training data (either through invention or real conversations — see Rasa X)
  • Train a model using rasa train
  • Evaluate the model using rasa test
  • Deploy the model somewhere
  • Repeat

The DVC pipeline

Now that we have a better understanding of what a Rasa workflow looks like, let’s see how we can use DVC to make this more reproducible.

The first step is to create a DVC pipeline. DVC can be used simply to version control data (as the name suggests) but it really becomes useful when you start to use DVC pipelines to create directed acyclic graphs (DAGs).

Below we set out a simple DVC pipeline for training and testing our Rasa model. This is stored in ./dvc.yaml.
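The pipeline file itself didn't survive formatting here, so below is a sketch of what ./dvc.yaml looks like, reconstructed from the stage descriptions that follow (the exact metrics filenames are assumptions based on rasa test's default results/ output):

```yaml
stages:
  train:
    cmd: >-
      rasa data validate &&
      rasa train
    params:
      - config.yml:      # track the whole Rasa config file
    deps:
      - data             # any change here re-triggers training
    outs:
      - models           # cached by DVC locally and remotely
  test:
    cmd: rasa test
    params:
      - config.yml:
    deps:
      - models           # the output of the train stage
    metrics:
      - results/intent_report.json
      - results/story_report.json
```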

The train stage

The cmd attribute specifies the commands for the stage. I've used >- to allow the command to flow over multiple lines; I still need && to ensure that the second command only runs if the first succeeds. Here I first run the built-in rasa data validate command to make sure there aren't any issues with the training data, before running the rasa train command.

The params attribute specifies parameters used in this stage, a change to which requires the stage to be repeated. Here I have specified Rasa's ./config.yml file, which is where the configuration for a Rasa model is usually stored. If you don't explicitly specify a filename here, DVC defaults to a file called ./params.yaml (we don't have one in our repo).

The deps attribute specifies the dependencies for this stage, which in our case is the entire ./data folder (though it could equally be one or more individual files). Any change in the data folder will force this stage to be re-run.

Finally, the outs attribute specifies which outputs should be tracked by DVC. This has two implications. Firstly, the outputs in ./models will be stored by DVC in a local cache (and, in our case, a remote one on S3). Secondly, this folder is deleted and recreated whenever the train stage runs, and any change to the ./models folder not produced by running the stage with dvc repro will cause the stage to be re-run; this is why you don't see the folder in our example repository.

The test stage

This time we call the built-in rasa test command to test the model against a number of test conversations. We use the same configuration as in the train stage, and set models (the output from the train stage) as our dependency. Note that we might also want to set the ./tests folder as a dependency here: that would ensure our test results stay up to date if we change the test conversations.
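If we wanted that, the test stage's deps would simply grow by one entry (a sketch, with the cmd abbreviated):

```yaml
  test:
    cmd: rasa test
    deps:
      - models
      - tests   # re-run the tests whenever the test stories change
```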

This time there is no outs attribute; instead we list a number of JSON files as metrics, which are tracked in the same way as outs but with the added benefit that we can explicitly ask DVC for changes to the metrics with dvc metrics diff.

In summary, our DVC pipeline will:

  • Check for updates in the data or changes to the model configuration or hyper-parameters. If found, it will…
  • Validate the data format and retrain the model, and…
  • Test the model against the test conversations (since the model has now changed)

DVC in action

Now that this is all in place, let's test the pipeline. I'm assuming here that you have just cloned the GitHub repo.

Syncing with DVC

Let's start by getting the latest version of the model from DVC with dvc pull (make sure you first build and activate the virtualenv with make virtualenv and source ./build/virtualenv/bin/activate; you'll also need an AWS account to be able to access our S3 bucket). This fetches the model corresponding to the hashes stored in the ./dvc.lock file. Once we have that model, we can chat with our bot right away with rasa shell. This is a pretty stupid bot, so it's not going to be a great conversationalist.

Making a change

Ok, so let's imagine that you are (rightly) underwhelmed by this bot's performance. Let's try to get it to do a better job by tweaking the model slightly: why not increase the number of epochs used in the DIET classifier from 100 to 150 (you can find this setting in ./config.yml)? If we now run dvc status, we can see that DVC has detected a change in ./config.yml and wants to re-run all the stages that depend on this file:
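The output will look roughly like this (the exact formatting varies between DVC versions, and both stages appear because both declare config.yml in their params):

```
train:
    changed deps:
        modified:           config.yml
test:
    changed deps:
        modified:           config.yml
```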

Let's appease DVC by running dvc repro to reproduce our pipeline. This will take a little time as the various models train, but you'll notice that the first model now runs for 150 epochs. DVC will then run the test stage, and finally tell us that it has updated ./dvc.lock and that we should commit these changes to git to capture the latest state.

Since we have registered the various test JSON files as our metrics, we can track the change that resulted from the increase in epochs by running dvc metrics diff. This gives us a breakdown of the metric changes relative to the previous version of the model (which is probably the version checked out on main if you are working through this post).

It looks like there are a few changes, and some new metrics recorded where we see that the intent goodbye is no longer confused with the intent great. Your results will likely differ, since training is stochastic.

Git will now tell us that two files have changed, the ./config.yml that we edited to change the number of epochs, and the ./dvc.lock which records the current state of the workspace.

If I were happy with these results, I could now commit the dvc.lock to git and push this model to our remote storage with dvc push.

Note that DVC has a new feature called experiments that allows you to compare lots of parameter changes without having to commit them to git, while still keeping a record of them in DVC. You can even queue up multiple model runs with small parameter changes and just leave it to churn through the various models. This is super useful, but not the subject of this blog.

Remote storage

By default DVC stores artefacts in .dvc/cache within the local repository, but we can also specify remote storage (for example an S3 bucket). This is specified in the .dvc/config file:
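A remote configured for S3 looks something like this (the bucket name and path here are placeholders; yours would be created with dvc remote add):

```ini
[core]
    remote = s3
['remote "s3"']
    url = s3://our-dvc-bucket/rasa-models
```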

Here you can see we have set up an S3 bucket to act as our remote storage. Once set, calling dvc push will send changes to the S3 bucket (assuming you have permission to write to it).

Next time...

This was the second (here's the first one) of three blog posts about using DVC with Rasa. In the final blog post we'll talk about how to use Continuous Machine Learning (CML) to automatically retrain our model using a GitHub Action.
