Quickstart Guide
This guide provides a quick introduction to implementing a recsystem using the high-level interface to the reference implementation.
It demonstrates how a trivial recommendation system can be written in a single function, and how the renewal_recsystem Python package provides a simple command-line interface for running your recsystem.
Prerequisites
You must install the renewal_recsystem package for Python. Please refer to the installation instructions.
In order to connect your recsystem to the Renewal Backend, you must also have obtained an authentication/authorization token. Currently, this is provided directly to you by the administrator of your contest. In the future it will be provided through a registration site.
Bare minimal “working” recsystem
To get started on your recsystem you will create a .py file containing at a minimum a function named recommend(). This function is called every time recommendations for a user are requested from your recsystem.
The file may import any other modules as needed, whether they’re your own modules (if you have decided to split your code among multiple files) or third-party packages like NumPy and pandas. But at a minimum this file is the entry point to your recsystem.
For this tutorial we’ll call it my_recsystem.py. Create a file with that name containing the following code:
def recommend(state, user, articles, min_articles, max_articles):
    return []
The recommend() function must have exactly the same signature as given here.
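One way to catch a mismatched signature before starting the recsystem is to inspect the function yourself. The following is a minimal sketch and not part of renewal_recsystem: the helper name check_signature and the EXPECTED tuple are hypothetical, and the expected parameter names are simply copied from the stub above.

```python
import inspect

# Parameter names copied from the recommend() stub in this guide.
EXPECTED = ('state', 'user', 'articles', 'min_articles', 'max_articles')


def check_signature(func):
    """Return True if func takes exactly the expected parameters, in order.

    Hypothetical helper for local checks; not provided by renewal_recsystem.
    """
    params = tuple(inspect.signature(func).parameters)
    return params == EXPECTED


def recommend(state, user, articles, min_articles, max_articles):
    return []


def bad_recommend(user, articles):
    return []


print(check_signature(recommend))      # True
print(check_signature(bad_recommend))  # False
```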
This is the bare minimum “working” recsystem insofar as the recsystem will run and connect to the backend. You can start it by running:
$ python -m renewal_recsystem -t <path-to-your-token> my_recsystem.py
It should output some log messages that look like:
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO starting up basic recsystem on https://api.renewal-research.com/v1/
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing articles cache with 1000 articles
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing websocket connection to wss://api.renewal-research.com/v1/event_stream
2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO ping() -> 'pong'
But beyond that it will never provide useful recommendations, because it simply returns an empty list. Instead, our recommend() function should return a list of the article IDs of the best articles we want to recommend to the user.
Testing the recsystem
In the previous section we defined a function called recommend() which takes some arguments. But how do we actually call that function in order to test it? What arguments does it take?
Actually, we never call this function directly. It is called for us whenever our recsystem receives a request from the backend for recommendations for a user.
Under normal operation, this means we would have to wait around for our recsystem to be assigned some users, and for those users to generate activity (i.e. fetching news recommendations in the mobile app).
However, for testing and development of our recsystem, there is a separate utility that allows our recsystem to be called “on demand” with test data. The results of these test calls are not used by the backend, and do not in any way impact the performance of our recsystem in a contest.
To test remote calls to your recsystem, open a separate terminal while your recsystem is running and use the test utility like:
$ python -m renewal_recsystem.test -t <token> <command>
We pass this command the same <token> as when running the actual recsystem, in order to authenticate to the backend. Then <command> is the name of any method we want the backend to call on our recsystem. For example:
$ python -m renewal_recsystem.test -t <token> recommend
[]
This prints [], which is the return value of the recommend() function we just implemented. If we look at the logs of our recsystem we also see something like:
2021-06-03 17:52:55 MyComputer my_recsystem.py[29595] INFO recommend(user_id='fake-user', max_articles=200, min_articles=15) -> []
The candidate articles
When you wrote the stub for your recommend() function it took a number of arguments: state, user, etc. Let’s take a look at what those look like by augmenting the function to log their values:
import logging

log = logging.getLogger(__name__)


def recommend(state, user, articles, min_articles, max_articles):
    log.info(f'state: {state}')
    log.info(f'user: {user}')
    log.info(f'articles:\n{articles}')
    log.info(f'min_articles: {min_articles}')
    log.info(f'max_articles: {max_articles}')
    return []
Restart your recsystem (hit Ctrl-C if it’s still running) and try making another test call:
$ python -m renewal_recsystem.test -t <token> recommend
In the logs we should see something like:
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO state: {}
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO user: User(uid='fake-user', interactions=defaultdict(<class 'dict'>, {}))
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO articles:
authors date ... title url
article_id ...
48573 [Par, La Rédaction] 2021-06-02T19:56:29 ... France-Galles : les Bleus mènent à la pause gr... https://sport24.lefigaro.fr/football/euro-2020...
48572 [Par Le Figaro Avec Afp] 2021-06-02T20:00:38 ... Israël: le parti arabe Raam formalise son appu... https://www.lefigaro.fr/international/israel-l...
48571 [Kenneth Chang] 2021-06-02T20:04:00 ... New NASA Missions Will Study Venus, a World Ov... https://www.nytimes.com/2021/06/02/science/nas...
48570 [Stéphany Gardier, Par Stéphany Gardier] 2021-06-02T19:10:32 ... Un retour d’expérience rassurant sur des milli... https://www.lefigaro.fr/sciences/un-retour-d-e...
48569 [Vincent Bordenave, Par Vincent Bordenave] 2021-06-02T19:11:14 ... Covid-19: protéger les plus jeunes pour attein... https://www.lefigaro.fr/sciences/covid-19-prot...
... ... ... ... ... ...
47578 [Paul Carcenac, Par Paul Carcenac] 2021-05-25T21:01:28 ... Breton, belge, californien... Le cercle des va... https://www.lefigaro.fr/sciences/breton-belge-...
47577 [Par Le Figaro Avec Afp] 2021-05-26T06:01:27 ... Livraisons de vaccins : l'UE et AstraZeneca s'... https://www.lefigaro.fr/societes/livraisons-de...
47576 [Par Le Figaro Avec Afp] 2021-05-25T17:12:15 ... Covid-19 : l'Académie de médecine préconise de... https://www.lefigaro.fr/sciences/covid-19-l-ac...
47575 [Elsa Bembaron, Par Elsa Bembaron] 2021-05-25T16:27:21 ... Le carnet de rappel sera obligatoire à l'entré... https://www.lefigaro.fr/secteur/high-tech/le-c...
47574 [Par Le Figaro Avec Afp] 2021-05-26T17:13:00 ... Covid-19 : le variant indien présent dans 53 t... https://www.lefigaro.fr/sciences/covid-19-le-v...
[1000 rows x 11 columns]
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO min_articles: 15
2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO max_articles: 200
For this section we are focusing on the last 3 arguments: articles, min_articles, and max_articles.
Of these, the last two are simply integers giving hints as to how many articles the backend wants your recsystem to return. Usually these will be the same values each time, but they may change as they are adjustable parameters. Your recsystem should return at least min_articles recommendations on each call to recommend() in order for your recommendations to be considered.
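The role of the two limits can be sketched with a small helper. This is an illustration only: clamp_recommendations is a hypothetical name that is not part of renewal_recsystem, and logging a warning when fewer than min_articles results are available is a choice made for this sketch rather than documented backend behavior.

```python
import logging

log = logging.getLogger(__name__)


def clamp_recommendations(article_ids, min_articles, max_articles):
    """Truncate a ranked list of article IDs to at most max_articles.

    Hypothetical helper: warns (but still returns the list) when there
    are fewer than min_articles results.
    """
    result = list(article_ids)[:max_articles]
    if len(result) < min_articles:
        log.warning('only %d recommendations, fewer than min_articles=%d',
                    len(result), min_articles)
    return result


ranked = list(range(300))  # stand-in for 300 ranked article IDs
print(len(clamp_recommendations(ranked, 15, 200)))  # 200
print(clamp_recommendations(ranked[:3], 15, 200))   # [0, 1, 2], plus a warning
```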
The articles argument, on the other hand, is a pandas DataFrame containing a collection of candidate articles to recommend to the user. This includes a backlog of past articles and new articles sent to your recsystem when it started running.
Articles that the user has already been recommended have been filtered out in advance.
Note
The pre-filtering of already recommended articles is not always perfect, as there may be race conditions. However, the candidates should mostly be articles the user has not yet been recommended. As long as your recsystem returns a good number of results (well above min_articles but below max_articles) it should have more than enough recommendations to be considered for the user.
Further exploring the articles data
The exact format of the articles DataFrame is not fully documented here. In order to more easily explore it, you could add something like:
articles.to_csv('articles.csv')
to your recommend() function to save the articles to a file. Then in a separate Python prompt or Jupyter Notebook open it like:
>>> import pandas
>>> articles = pandas.read_csv('articles.csv')
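Note that a plain read_csv call returns article_id as an ordinary column, and any column holding Python objects (such as the lists in the authors column) comes back as text. The following is a minimal sketch of restoring both, assuming the CSV was written by articles.to_csv as above; the two-row sample and the in-memory buffer are made up for illustration.

```python
import ast
import io

import pandas as pd

# Made-up sample standing in for the real articles DataFrame.
articles = pd.DataFrame(
    {'authors': [['Kenneth Chang'], ['Par', 'La Rédaction']],
     'title': ['New NASA Missions', 'France-Galles']},
    index=pd.Index([48571, 48573], name='article_id'))

# Write to an in-memory buffer instead of 'articles.csv' so the
# sketch leaves no file behind.
buf = io.StringIO()
articles.to_csv(buf)
buf.seek(0)

# index_col restores article_id as the index; the converter turns the
# text "['Kenneth Chang']" back into a Python list.
restored = pd.read_csv(
    buf, index_col='article_id',
    converters={'authors': ast.literal_eval})

print(restored.loc[48571, 'authors'])  # ['Kenneth Chang']
```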
Note
In order to debug your function, it might be a good idea to insert breakpoint() at the beginning of your recommend() function, then call python -m renewal_recsystem.test recommend. This will drop you into PDB, the Python debugger, in which you can explore the value of articles interactively. However, in order for this to work you
must also change the function definition from async def
recommend(...):
. This will be explained in a future chapter.
One of the more interesting columns is articles.metrics:
>>> articles.metrics
article_id
48573 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48572 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48571 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48570 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
48569 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
...
47578 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47577 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47576 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47575 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
47574 {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
Name: metrics, Length: 1000, dtype: object
For each article this gives a tally of all user interactions with that article: how many users have clicked on it, liked it, etc.
We will use this for the example in the next section.
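To work with individual tallies it can help to expand the dicts into one column per metric. The following is a minimal sketch on made-up data, assuming every metrics value is a dict; the key names bookmarks, clicks, and dislikes are taken from the output above, while the numbers are invented.

```python
import pandas as pd

# Invented two-row sample standing in for the real articles DataFrame.
articles = pd.DataFrame(
    {'metrics': [
        {'bookmarks': 0, 'clicks': 12, 'dislikes': 1},
        {'bookmarks': 2, 'clicks': 3, 'dislikes': 0},
    ]},
    index=pd.Index([48573, 48572], name='article_id'))

# Build a new DataFrame with one column per key in the metrics dicts,
# keeping the article_id index so rows still line up with `articles`.
metrics = pd.DataFrame(articles['metrics'].tolist(), index=articles.index)

# The article_id of the most-clicked article in the sample.
print(metrics['clicks'].idxmax())  # 48573
```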
Simple popularity-based recsystem
What we’ve learned so far is enough to build a recsystem that actually makes some recommendations. For starters we’ll add to my_recsystem.py a very simple function that measures the “popularity” of a single article given its metrics dict, using a very naïve metric (which you can take your own time to enhance):
def popularity(metrics):
    """
    Returns a measure of an article's popularity.

    The formula is ``max(clicks, 1) * ((likes - dislikes) or 1)``.

    You could replace this with a more sophisticated measure of popularity.
    """

    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)
Basically, the articles with the most clicks are the most “popular”, weighted by the number of likes minus the number of dislikes.
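A few worked values make the behavior of this formula concrete, including the edge cases handled by max(1, clicks) and by or 1. The function is repeated here so the example runs on its own; the input dicts are invented.

```python
def popularity(metrics):
    """Same formula as above: max(clicks, 1) * ((likes - dislikes) or 1)."""
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)


# Ordinary case: 10 clicks weighted by a net of 2 likes.
print(popularity({'clicks': 10, 'likes': 3, 'dislikes': 1}))  # 20
# Empty metrics: every count defaults to 0, so the result is 1 * 1.
print(popularity({}))                                          # 1
# Likes equal dislikes: the `or 1` replaces the zero weight with 1.
print(popularity({'clicks': 0, 'likes': 2, 'dislikes': 2}))   # 1
# More dislikes than likes: the weight, and so the score, is negative.
print(popularity({'clicks': 5, 'likes': 1, 'dislikes': 4}))   # -15
```

An article with more dislikes than likes gets a negative score whose magnitude grows with its clicks, so heavily clicked but disliked articles sort last.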
Now we can sort our candidate articles from greatest to least popularity like:
def recommend(state, user, articles, min_articles, max_articles):
    # Drop articles that don't have a 'metrics' dict
    articles = articles.dropna(subset=['metrics'])
    # Sort articles by most to least popular according to the
    # `popularity` function applied to their metrics dicts.
    articles = articles.sort_values(
        'metrics',
        key=lambda m: m.apply(popularity),  # type: ignore
        ascending=False)
    # Take the top `max_articles` most popular
    articles = articles.iloc[:max_articles]
    return list(articles.index)
Here articles.sort_values sorts the articles according to their metrics. It takes as a sort key a function that receives the metrics column and applies the popularity() function to each article’s metrics dict.
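The sort can be checked on a small, invented table. This is a sketch rather than part of the recsystem itself: the three rows and their article IDs are made up, and the key parameter of sort_values requires pandas 1.1 or later.

```python
import pandas as pd


def popularity(metrics):
    """Same formula as above: max(clicks, 1) * ((likes - dislikes) or 1)."""
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)


# Invented sample; the article IDs are arbitrary.
articles = pd.DataFrame(
    {'metrics': [
        {'clicks': 2, 'likes': 1, 'dislikes': 0},   # popularity 2
        {'clicks': 10, 'likes': 3, 'dislikes': 1},  # popularity 20
        {'clicks': 5, 'likes': 1, 'dislikes': 4},   # popularity -15
    ]},
    index=pd.Index([101, 102, 103], name='article_id'))

# The key function receives the whole 'metrics' column as a Series and
# returns a Series of scores, which pandas then uses for ordering.
ranked = articles.sort_values(
    'metrics', key=lambda m: m.apply(popularity), ascending=False)

print(list(ranked.index))  # [102, 101, 103]
```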
At the end we return list(articles.index). The articles table is indexed by article_id, so this results in a list of the article IDs of the articles we want to recommend. Let’s test it:
$ python -m renewal_recsystem.test -t <token> recommend
[48573, 47902, 47915, 47914, 47913, 47912, 47911, 47910, 47909, ...]
This should return a long list of article IDs. If you look at the logs for your recsystem you should see something similar logged.
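Because recommend() is an ordinary function, it can also be exercised locally, without the backend, on a hand-built table. The following is a sketch under stated assumptions: the fake_articles data is invented, and passing None for state and user only works because this particular recommend() ignores both arguments.

```python
import pandas as pd


def popularity(metrics):
    """Same formula as above: max(clicks, 1) * ((likes - dislikes) or 1)."""
    clicks = metrics.get('clicks', 0)
    likes = metrics.get('likes', 0)
    dislikes = metrics.get('dislikes', 0)
    return max(1, clicks) * ((likes - dislikes) or 1)


def recommend(state, user, articles, min_articles, max_articles):
    # Same logic as the popularity-based recommend() in this guide.
    articles = articles.dropna(subset=['metrics'])
    articles = articles.sort_values(
        'metrics', key=lambda m: m.apply(popularity), ascending=False)
    return list(articles.iloc[:max_articles].index)


# Invented candidates; article 2 has no metrics and is dropped.
fake_articles = pd.DataFrame(
    {'metrics': [
        {'clicks': 4, 'likes': 2, 'dislikes': 0},  # popularity 8
        None,                                       # removed by dropna
        {'clicks': 9, 'likes': 1, 'dislikes': 0},  # popularity 9
        {'clicks': 1, 'likes': 0, 'dislikes': 0},  # popularity 1
    ]},
    index=pd.Index([1, 2, 3, 4], name='article_id'))

print(recommend(None, None, fake_articles, min_articles=1, max_articles=2))
# [3, 1]
```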
Note
Make sure you’ve restarted your recsystem after making the code changes. Hot reloading isn’t implemented yet!
The example we’ve seen here is actually used for one of the baseline recsystems: renewal_recsystem.baseline.popularity