Quickstart Guide

This guide gives a quick introduction to implementing a recsystem using the high-level interface of the reference implementation.

It shows how a trivial recommendation system can be written in a single function, and how the renewal_recsystem Python package provides a simple command-line interface for running your recsystem.

Prerequisites

You must install the renewal_recsystem package for Python. Please refer to the installation instructions.

In order to connect your recsystem to the Renewal Backend, you must also have obtained an authentication/authorization token. Currently, this is provided directly to you by the administrator of your contest. In the future it will be provided through a registration site.

Bare minimal “working” recsystem

To get started on your recsystem you will create a .py file containing at a minimum a function named recommend(). This function is called every time recommendations for a user are requested from your recsystem.

It may import any other modules as needed, whether they’re your own modules (if you have decided to split your code among multiple files) or third-party packages like NumPy and pandas. Either way, this file is the entry point to your recsystem.

For this tutorial we’ll call it my_recsystem.py. Create a file with that name and containing the following code:

def recommend(state, user, articles, min_articles, max_articles):
    return []

The recommend() function must have exactly the same signature as given here.

This is the bare minimum “working” recsystem insofar as it will run and connect to the backend. You can start it by running:

$ python -m renewal_recsystem -t <path-to-your-token> my_recsystem.py

It should output some log messages that look like:

2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO starting up basic recsystem on https://api.renewal-research.com/v1/
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing articles cache with 1000 articles
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing websocket connection to wss://api.renewal-research.com/v1/event_stream
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO ping() -> 'pong'

But beyond that it will never provide useful recommendations because it simply returns an empty list. We want it to return a list of articles to recommend to the user. In particular, our recommend() function should return a list of article IDs of the best articles we want to recommend to the user.

Testing the recsystem

In the previous section we defined a function called recommend() which takes some arguments. But how do we actually call that function in order to test it? What arguments does it take?

Actually, we never call this function directly. It is called for us whenever our recsystem receives a request from the backend for recommendations for a user.

Under normal operation, this means we would have to wait around for our recsystem to be assigned some users, and for those users to generate activity (i.e. fetching news recommendations in the mobile app).

However, for testing and development of our recsystem, there is a separate utility that allows our recsystem to be called “on demand” with test data. The results of these test calls are not used by the backend, and do not in any way impact the performance of our recsystem in a contest.

To test remote calls to your recsystem, leave it running, open a separate terminal, and invoke the test utility like:

$ python -m renewal_recsystem.test -t <token> <command>

We pass this command the same <token> as when running the actual recsystem, in order to authenticate to the backend. Then <command> is the name of any method we want the backend to call on our recsystem. For example:

$ python -m renewal_recsystem.test -t <token> recommend
[]

This prints [] which is the return value of the recommend() function we just implemented. If we look at the logs of our recsystem we also see something like:

2021-06-03 17:52:55 MyComputer my_recsystem.py[29595] INFO recommend(user_id='fake-user', max_articles=200, min_articles=15) -> []

The candidate articles

When you wrote the stub for your recommend() function it took a number of arguments: state, user, etc. Let’s take a look at what those look like by augmenting the function to log their values:

import logging

log = logging.getLogger(__name__)

def recommend(state, user, articles, min_articles, max_articles):
    log.info(f'state: {state}')
    log.info(f'user: {user}')
    log.info(f'articles:\n{articles}')
    log.info(f'min_articles: {min_articles}')
    log.info(f'max_articles: {max_articles}')
    return []

Restart your recsystem (hit Ctrl-C if it’s still running) and try making another test call:

$ python -m renewal_recsystem.test -t <token> recommend

In the logs we should see something like:

2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO state: {}
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO user: User(uid='fake-user', interactions=defaultdict(<class 'dict'>, {}))
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO articles:
                                               authors                 date  ...                                              title                                                url
article_id                                                                   ...
48573                              [Par, La Rédaction]  2021-06-02T19:56:29  ...  France-Galles : les Bleus mènent à la pause gr...  https://sport24.lefigaro.fr/football/euro-2020...
48572                         [Par Le Figaro Avec Afp]  2021-06-02T20:00:38  ...  Israël: le parti arabe Raam formalise son appu...  https://www.lefigaro.fr/international/israel-l...
48571                                  [Kenneth Chang]  2021-06-02T20:04:00  ...  New NASA Missions Will Study Venus, a World Ov...  https://www.nytimes.com/2021/06/02/science/nas...
48570         [Stéphany Gardier, Par Stéphany Gardier]  2021-06-02T19:10:32  ...  Un retour d’expérience rassurant sur des milli...  https://www.lefigaro.fr/sciences/un-retour-d-e...
48569       [Vincent Bordenave, Par Vincent Bordenave]  2021-06-02T19:11:14  ...  Covid-19: protéger les plus jeunes pour attein...  https://www.lefigaro.fr/sciences/covid-19-prot...
...                                                ...                  ...  ...                                                ...                                                ...
47578               [Paul Carcenac, Par Paul Carcenac]  2021-05-25T21:01:28  ...  Breton, belge, californien... Le cercle des va...  https://www.lefigaro.fr/sciences/breton-belge-...
47577                         [Par Le Figaro Avec Afp]  2021-05-26T06:01:27  ...  Livraisons de vaccins : l'UE et AstraZeneca s'...  https://www.lefigaro.fr/societes/livraisons-de...
47576                         [Par Le Figaro Avec Afp]  2021-05-25T17:12:15  ...  Covid-19 : l'Académie de médecine préconise de...  https://www.lefigaro.fr/sciences/covid-19-l-ac...
47575               [Elsa Bembaron, Par Elsa Bembaron]  2021-05-25T16:27:21  ...  Le carnet de rappel sera obligatoire à l'entré...  https://www.lefigaro.fr/secteur/high-tech/le-c...
47574                         [Par Le Figaro Avec Afp]  2021-05-26T17:13:00  ...  Covid-19 : le variant indien présent dans 53 t...  https://www.lefigaro.fr/sciences/covid-19-le-v...

[1000 rows x 11 columns]
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO min_articles: 15
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO max_articles: 200

For this section we are focusing on the last 3 arguments: articles, min_articles, and max_articles.

Of these, the last two are simply integers giving hints as to how many articles the backend wants your recsystem to return. Usually these will be the same values each time, but they may change as they are adjustable parameters. Your recsystem should return a minimum of min_articles recommendations on each call to recommend() in order for your recommendations to be considered.
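As a minimal sketch of how these two hints might be honored, the function below returns at most max_articles candidate IDs and logs a warning when fewer than min_articles are available. The "take the first N candidates" strategy, the warning, and the four-row demo DataFrame are illustrative assumptions, not behavior prescribed by renewal_recsystem.

```python
import logging

import pandas as pd

log = logging.getLogger(__name__)


def recommend(state, user, articles, min_articles, max_articles):
    # Never return more than max_articles IDs.
    article_ids = articles.index[:max_articles].tolist()

    # With fewer than min_articles the result may not be considered.
    if len(article_ids) < min_articles:
        log.warning('only %d candidates, fewer than min_articles=%d',
                    len(article_ids), min_articles)

    return article_ids


# Tiny hand-made stand-in for the real candidate DataFrame.
demo = pd.DataFrame({'title': ['a', 'b', 'c', 'd']},
                    index=pd.Index([4, 3, 2, 1], name='article_id'))
print(recommend({}, None, demo, min_articles=2, max_articles=3))
```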

The articles argument, on the other hand, is a Pandas DataFrame containing a collection of candidate articles to recommend to the user. This includes a backlog of past articles and new articles sent to your recsystem when it started running.

Articles that have already been recommended to the user are filtered out of it beforehand.

Note

The pre-filtering of already recommended articles is not always perfect, as there may be race conditions. However, the candidates should mostly be articles the user has not yet been recommended. As long as your recsystem returns a good number of results (well above min_articles but below max_articles), there should be more than enough recommendations to be considered for the user.

Further exploring the articles data

The exact format of the articles DataFrame is not fully documented here.

In order to more easily explore it, you could add something like:

articles.to_csv('articles.csv')

to your recommend() function to save the articles to a file. Then in a separate Python prompt or Jupyter Notebook open it like:

>>> import pandas
>>> articles = pandas.read_csv('articles.csv', index_col='article_id')
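Note that a round trip through CSV stores each metrics dict as its string representation, so after read_csv the metrics column holds strings rather than dicts. Here is a minimal sketch of converting them back with ast.literal_eval from the standard library. It uses a two-row stand-in for the real data (the values are invented) and an in-memory buffer in place of articles.csv.

```python
import ast
import io

import pandas as pd

# Stand-in for the real articles DataFrame; the values are made up.
articles = pd.DataFrame(
    {'metrics': [{'clicks': 3, 'likes': 1}, {'clicks': 0, 'likes': 0}]},
    index=pd.Index([48573, 48572], name='article_id'))

# Write to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
articles.to_csv(buffer)
buffer.seek(0)

loaded = pd.read_csv(buffer, index_col='article_id')
print(type(loaded.metrics.iloc[0]))  # the dicts came back as strings

# literal_eval parses Python literals (such as dicts) without running code.
loaded['metrics'] = loaded['metrics'].apply(ast.literal_eval)
print(loaded.metrics.iloc[0]['clicks'])
```

Rows whose metrics value is missing would need to be dropped (for example with dropna) before applying literal_eval.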

Note

In order to debug your function, it can be helpful to insert breakpoint() at the beginning of your recommend() function, then call python -m renewal_recsystem.test -t <token> recommend. This drops you into PDB, the Python debugger, in which you can explore the value of articles interactively. However, for this to work you must also change the function definition from def recommend(...): to async def recommend(...):. This will be explained in a future chapter.

One of the more interesting columns is articles.metrics:

>>> articles.metrics
article_id
48573    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48572    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48571    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48570    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48569    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
                               ...
47578    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47577    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47576    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47575    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47574    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
Name: metrics, Length: 1000, dtype: object

For each article this gives a tally of all user interactions with that article: how many users have clicked on it, liked it, and so on.

We will use this for the example in the next section.
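Before moving on, it can help to view these tallies as ordinary columns. Here is a minimal sketch on made-up data that expands the dicts into one column per metric. The sample values are invented, and the real metrics dicts may contain more keys than the four shown here.

```python
import pandas as pd

# Two invented rows standing in for the real articles DataFrame.
articles = pd.DataFrame(
    {'metrics': [
        {'bookmarks': 0, 'clicks': 12, 'dislikes': 1, 'likes': 4},
        {'bookmarks': 2, 'clicks': 30, 'dislikes': 0, 'likes': 9},
    ]},
    index=pd.Index([48573, 48572], name='article_id'))

# One column per dict key, keeping the article_id index.
metrics = pd.DataFrame(articles['metrics'].tolist(), index=articles.index)
print(metrics.sort_values('clicks', ascending=False))
```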

Simple popularity-based recsystem

What we’ve learned so far is enough to build a recsystem that actually makes some recommendations. For starters we’ll add to my_recsystem.py a very simple function that measures the “popularity” of a single article given its metrics dict, using a very naïve metric (which you can take your own time to enhance):

def popularity(metrics):
    """
    Returns a measure of an article's popularity.

    The formula is ``max(clicks, 1) * ((likes - dislikes) or 1)``.

    You could replace this with a more sophisticated measure of popularity.
    """
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)

    return max(1, clicks) * ((likes - dislikes) or 1)

Basically the articles with the most clicks are the most “popular”, though it is “weighted” by the number of likes minus the number of dislikes.
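To make the formula concrete, here are a few hand-computed values. The function is repeated so that the snippet runs on its own, and the metrics dicts are invented for illustration.

```python
def popularity(metrics):
    # Same formula as above, repeated so this snippet is self-contained.
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)


print(popularity({}))                                         # 1
print(popularity({'clicks': 10, 'likes': 3, 'dislikes': 1}))  # 20
print(popularity({'clicks': 10, 'likes': 2, 'dislikes': 2}))  # 10
print(popularity({'clicks': 5, 'likes': 1, 'dislikes': 4}))   # -15
```

An article with as many likes as dislikes gets a neutral weight of 1, while an article with more dislikes than likes gets a negative score and therefore sorts below articles with no interactions at all.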

Now we can sort our candidate articles from greatest to least popularity like:

def recommend(state, user, articles, min_articles, max_articles):
    # Drop articles that don't have a 'metrics' dict
    articles = articles.dropna(subset=['metrics'])

    # Sort articles by most to least popular according to the
    # `popularity` function applied to their metrics dicts.
    articles = articles.sort_values(
        'metrics',
        key=lambda m: m.apply(popularity),  # type: ignore
        ascending=False)

    # Take the top `max_articles` most popular
    articles = articles.iloc[:max_articles]
    return list(articles.index)

Here articles.sort_values sorts the articles according to their metrics. It takes as a sort key a function that applies the popularity() function to each article.

At the end we return list(articles.index). The articles table is indexed by article_id, so this results in a list of the article IDs of the articles we want to recommend. Let’s test it:

$ python -m renewal_recsystem.test -t <token> recommend
[48573, 47902, 47915, 47914, 47913, 47912, 47911, 47910, 47909, ...]

This should return a long list of article IDs. If you look at the logs for your recsystem you should see something similar logged.
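The sorting logic can also be sanity-checked without the backend by calling recommend() directly on a small hand-built DataFrame. This is only a sketch: the fake articles, the empty state dict, and passing None for user are assumptions made for illustration, not how the backend invokes the function.

```python
import pandas as pd


def popularity(metrics):
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)


def recommend(state, user, articles, min_articles, max_articles):
    articles = articles.dropna(subset=['metrics'])
    articles = articles.sort_values(
        'metrics', key=lambda m: m.apply(popularity), ascending=False)
    return list(articles.iloc[:max_articles].index)


# Made-up candidates; article 3 has no metrics and should be dropped.
fake_articles = pd.DataFrame(
    {'metrics': [
        {'clicks': 2, 'likes': 1, 'dislikes': 0},    # popularity 2
        {'clicks': 10, 'likes': 5, 'dislikes': 1},   # popularity 40
        None,
        {'clicks': 7, 'likes': 0, 'dislikes': 2},    # popularity -14
    ]},
    index=pd.Index([1, 2, 3, 4], name='article_id'))

result = recommend({}, None, fake_articles, min_articles=1, max_articles=2)
assert result == [2, 1]  # most popular first, capped at max_articles
```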

Note

Make sure you’ve restarted your recsystem after making the code changes. Hot reloading isn’t implemented yet!

The example we’ve seen here is actually used for one of the baseline recsystems: renewal_recsystem.baseline.popularity