Channel 9 Implements The Azure Machine Learning Recommendations API

News

Extraordinary Robot
You may have noticed a link that appears next to any recommended entries or sessions:

[Image: screenshot of the link next to a recommended entry]


This is because we have implemented the Azure Machine Learning Recommendations API. Now, whenever you see recommended videos to watch (which appear at the end of each video as well as in a sidebar on each video's page), the recommendations come from a model built by this service, based on Channel 9 usage data dating back to July 2014. The service has worked out perfectly for us and is an improvement over our old recommendations algorithm. You can learn more in the Azure Machine Learning Marketplace video on Channel 9.

In this post, I want to discuss our implementation in a bit more detail. The Recommendations API is quite user-friendly and well documented, and the sample app proved invaluable, especially the classes that handle all the XML parsing and XPath syntax -- I've been doing nothing but JSON for so long that my XML skills were a bit rusty!

I want to call out that they have a very handy web UI for getting information about your model. A link to this UI is located on your account page in the DataMarket portal itself. Here's what it looks like:

[Image: screenshot of the Recommendations model web UI]


So, here's a bit more about our implementation. It borrows heavily from the sample app, with a little customization. We have a WebJob that runs weekly and handles everything. This job does four things:

  1. Uploads the latest catalog, meaning all published Channel 9 videos.
  2. Uploads the latest usage data, meaning all anonymized playback info (i.e., this unique user watched this unique video).
  3. Builds and replaces the model, meaning we have the service rebuild the model with the latest catalog and usage data, and we replace the old model with the new one.
  4. Rebuilds the cache, meaning we cache all the recommendations on our servers.

I'll discuss each step below.
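Before digging in, here's a rough sketch of what the job's entry point amounts to. The Program class and method names below are placeholders of mine, not the actual Channel 9 code:

using System;

public class Program
{
    public static void Main()
    {
        // The four steps of the weekly job, in order.
        UploadCatalog();        // 1. all published Channel 9 videos
        UploadUsageData();      // 2. anonymized playback data that hasn't been uploaded yet
        BuildAndReplaceModel(); // 3. rebuild the model and swap in the new build
        RebuildCache();         // 4. cache recommendations for every video on our own servers
    }

    // Bodies omitted; each step is covered in its own section below.
    static void UploadCatalog() { }
    static void UploadUsageData() { }
    static void BuildAndReplaceModel() { }
    static void RebuildCache() { }
}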

Uploading Catalog Data

Because there is constantly new content being added to Channel 9, we have to continually update the model with the latest entries. Otherwise, the recommendation engine doesn't know about the new content and can't make any recommendations. Updating the catalog is pretty straightforward. We call the ImportCatalogFile API, passing the unique IDs of all the videos on Channel 9. I pretty much used the code from the sample app as is for this.

The only gotcha with the ImportCatalogFile API is that it takes a comma-delimited file that doesn't support quotes and has no way of escaping commas, so watch out for that. I just nuke all the commas, since that field is never seen in our UI.
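To make that concrete, here's a minimal sketch of building the catalog file contents before calling ImportCatalogFile. It assumes a simple id,title column layout and a hypothetical list of (id, title) pairs pulled from our publishing data; the exact columns should follow the sample app:

using System;
using System.Collections.Generic;
using System.Text;

public static class CatalogBuilder
{
    // Builds the comma-delimited catalog file contents from (video id, title) pairs.
    // The format has no quoting or escaping, so commas are stripped from the title.
    public static string BuildCatalogCsv(IEnumerable<KeyValuePair<Guid, string>> videos)
    {
        var sb = new StringBuilder();
        foreach (var video in videos)
        {
            var safeTitle = (video.Value ?? string.Empty).Replace(",", " ");
            sb.AppendLine(string.Format("{0},{1}", video.Key, safeTitle));
        }
        return sb.ToString();
    }
}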

Uploading Usage Data

The next step in using a recommendation engine is to provide usage data to inform the model. By usage data, I mean all the anonymized playback data from Channel 9 -- this unique user watched this unique video. We provide this data to the model as a bulk upload, using the ImportUsageFile API. We chose this over their other way of collecting data, which is to post to an API every time there is a usage event. In our case, it made more sense to grab our daily logs and send them up.
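As a sketch, the usage file we build is just one row per playback: a user identifier and a video identifier. The PlaybackRecord type below is a hypothetical stand-in for whatever your anonymized log records look like, and the actual ImportUsageFile call follows the sample app:

using System;
using System.Collections.Generic;
using System.Text;

// A hypothetical anonymized playback record from the daily logs.
public class PlaybackRecord
{
    public Guid AnonymousUserId { get; set; }
    public Guid VideoId { get; set; }
}

public static class UsageFileBuilder
{
    // Builds the comma-delimited usage file contents: one "user,item" row per playback.
    public static string BuildUsageCsv(IEnumerable<PlaybackRecord> records)
    {
        var sb = new StringBuilder();
        foreach (var record in records)
        {
            sb.AppendLine(string.Format("{0},{1}", record.AnonymousUserId, record.VideoId));
        }
        return sb.ToString();
    }
}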

The one thing I had to customize with the ImportUsageFile API was a way to make sure that I wasn't uploading a usage file that I had already uploaded. Because an automated WebJob does the bulk upload, I needed to diff what had already been uploaded against what hadn't. This query isn't available in the API directly, but it can be built. To get the usage files, you can call GetAllModels and then extract the UsageFileNames element, which contains a comma-delimited list of file names. Here's the code for that:

public string GetRecommendationUsageFiles(string modelId)
{
    string usageFiles = null;

    using (var request = new HttpRequestMessage(HttpMethod.Get, RecommendationUris.GetAllModels))
    using (var response = httpClient.SendAsync(request).Result)
    {
        if (!response.IsSuccessStatusCode)
        {
            Console.WriteLine(String.Format("Error {0}: Failed to get usage file names for model {1}, \n reason {2}",
                response.StatusCode, modelId, response.ReasonPhrase));
            return null;
        }

        // Process the response if successful: pull the UsageFileNames element for this model
        // out of the returned feed.
        var node = XmlUtils.ExtractXmlElement(
            response.Content.ReadAsStreamAsync().Result,
            string.Format("//a:entry/a:content/m:properties[d:Id='{0}']/d:UsageFileNames", modelId));

        if (node != null)
        {
            usageFiles = node.InnerText;
        }
    }

    return usageFiles;
}

Building And Replacing The Model

With the service now having the latest catalog and usage data, the next step is to build a new model. This is done using the BuildModel API. This API returns immediately, so you have to poll the service using the GetModelBuildsStatus API. Typically, our model takes about 20 minutes to build. I wrapped the method that polls for the model status in a background thread so that I wasn't hanging the WebJob (which also does some other things) while it waited for the build to complete.

The only gotcha here is that you can only kick off one build at a time. If you try to kick off a second build before the first one completes, you'll get a cryptic error back.

Once GetModelBuildsStatus returns a status of BuildStatus.Success, I call UpdateModel, passing the BuildId of the newly built model. Nothing fancy here.
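Put together, the build step looks roughly like the sketch below. StartBuild, GetBuildStatus and ActivateBuild are hypothetical wrappers of mine around the BuildModel, GetModelBuildsStatus and UpdateModel calls (the sample app has the actual HTTP plumbing), and the BuildStatus members other than Success are assumed:

using System;
using System.Threading;
using System.Threading.Tasks;

public class ModelBuilder
{
    // Build states; Success is the one we check for, the other members are assumed.
    public enum BuildStatus { Running, Success, Error, Cancelled }

    // Kicks off a build and, on a background task, polls until it completes,
    // then makes the new build the active one.
    public Task BuildAndSwapAsync(string modelId)
    {
        return Task.Run(() =>
        {
            // Only one build can run at a time, so this assumes none is already in flight.
            string buildId = StartBuild(modelId);

            while (true)
            {
                var status = GetBuildStatus(modelId, buildId);
                if (status == BuildStatus.Success)
                {
                    // Point the model at the new build so it serves the fresh recommendations.
                    ActivateBuild(modelId, buildId);
                    break;
                }
                if (status == BuildStatus.Error || status == BuildStatus.Cancelled)
                {
                    Console.WriteLine("Build {0} for model {1} ended with status {2}", buildId, modelId, status);
                    break;
                }
                // Our builds usually take around 20 minutes, so a slow poll is fine.
                Thread.Sleep(TimeSpan.FromMinutes(1));
            }
        });
    }

    // Thin wrappers around the BuildModel, GetModelBuildsStatus and UpdateModel REST calls;
    // bodies omitted here.
    string StartBuild(string modelId) { throw new NotImplementedException(); }
    BuildStatus GetBuildStatus(string modelId, string buildId) { throw new NotImplementedException(); }
    void ActivateBuild(string modelId, string buildId) { throw new NotImplementedException(); }
}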

Building A Cache

The final thing I do is build a cache of all the recommendations. You do have the option of querying the API directly to get recommendations for a given item in the catalog, but we chose to build our own local cache. So, once I have the latest, greatest model built, I walk through all of the Channel 9 video IDs and call ItemRecommend, storing the results on our own server. To speed this up (since there are quite a few videos on Channel 9!), I wrap the calls to ItemRecommend in a Parallel.ForEach, using a ConcurrentBag to store the results:

// 'i' here comes from an outer loop (not shown) that walks the catalog in batches of 100.
var resultCollection = new ConcurrentBag<string>();
Parallel.ForEach(catalogGuids.Skip(i * 100).Take(100), guid => { resultCollection.Add(GetRec(guid)); });
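GetRec isn't shown above; it's just the per-item call into the service. Here's a sketch of what it might look like, following the same pattern as the GetAllModels code earlier. The RecommendationUris.ItemRecommend format string and its parameters are assumptions on my part, and modelId, numberOfResults and httpClient are assumed to be fields on the same class:

// A sketch only: assumes a RecommendationUris.ItemRecommend helper analogous to
// RecommendationUris.GetAllModels, plus modelId, numberOfResults and httpClient fields.
private string GetRec(Guid videoId)
{
    var uri = string.Format(RecommendationUris.ItemRecommend, modelId, videoId, numberOfResults);

    using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
    using (var response = httpClient.SendAsync(request).Result)
    {
        if (!response.IsSuccessStatusCode)
        {
            Console.WriteLine("Error {0}: failed to get recommendations for {1}", response.StatusCode, videoId);
            return null;
        }

        // Return the raw response body; what you store (raw XML or parsed fields) is up to you.
        return response.Content.ReadAsStringAsync().Result;
    }
}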

Final Thoughts

If you are in need of a powerful, smart recommendation service, I'd recommend using the Azure engine. There are so many scenarios that it could be used for. Machine Learning in the cloud!
