Quickstart Guide
================

This provides a quick introduction to implementing a recsystem using the
high-level interface to the reference implementation.  It demonstrates how
a trivial recommendation system can be written in a single function, and
how the ``renewal_recsystem`` Python package provides a simple command-line
interface for running your recsystem.


Prerequisites
-------------

You must install the ``renewal_recsystem`` package for Python.  Please refer
to the :ref:`installation instructions `.

In order to connect your recsystem to the Renewal Backend, you must also
have obtained an authentication/authorization token.  Currently, this is
provided directly to you by the administrator of your contest.  In the
future it will be provided through a registration site.


Bare minimal "working" recsystem
--------------------------------

To get started on your recsystem, create a ``.py`` file containing at a
minimum a function named ``recommend()``.  This function is called every
time recommendations for a user are requested from your recsystem.

The file may import any other modules as needed, whether they're your own
modules (if you have decided to split your code among multiple files) or
third-party packages like Numpy and Pandas.  Either way, this file is the
entry point to your recsystem.

For this tutorial we'll call it ``my_recsystem.py``.  Create a file with
that name containing the following code:

.. code-block:: python

    def recommend(state, user, articles, min_articles, max_articles):
        return []

The ``recommend()`` function must have exactly the same signature as given
here.

This is the bare minimum "working" recsystem insofar as the recsystem will
run and connect to the backend.  You can start it by running:

.. code-block:: shell

    $ python -m renewal_recsystem -t my_recsystem.py

It should output some log messages that look like::

    2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO starting up basic recsystem on https://api.renewal-research.com/v1/
    2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing articles cache with 1000 articles
    2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO initializing websocket connection to wss://api.renewal-research.com/v1/event_stream
    2021-06-01 17:03:45 MyComputer my_recsystem.py[16167] INFO ping() -> 'pong'

Beyond that, however, it will never provide useful recommendations because
it simply returns an empty list.  We want it to return a list of articles to
recommend to the user.  In particular, our ``recommend()`` function should
return a list of *article IDs* of the best articles we want to recommend to
the user.


Testing the recsystem
---------------------

In the previous section we defined a function called ``recommend()`` which
takes some arguments.  But how do we actually *call* that function in order
to test it?  What arguments does it take?

Actually, we never call this function directly.  It is called for us
whenever our recsystem receives a request from the backend for
recommendations for a user.  Under normal operation, this means we would
have to wait around for our recsystem to be assigned some users, and for
those users to generate activity (i.e. fetching news recommendations in the
mobile app).

However, for testing and development of our recsystem, there is a separate
utility that allows our recsystem to be called "on demand" with test data.
The results of these test calls are not used by the backend, and do not in
any way impact the performance of our recsystem in a contest.

To test remote calls to your recsystem, open a separate terminal while your
recsystem is running and use the test utility like::

    $ python -m renewal_recsystem.test -t

We pass this command the same ``-t`` option as when running the actual
recsystem, in order to authenticate to the backend.  It is followed by the
name of any method we want the backend to call on our recsystem.  For
example:

.. code-block:: bash

    $ python -m renewal_recsystem.test -t recommend
    []

This prints ``[]``, which is the return value of the ``recommend()``
function we just implemented.  If we look at the logs of our recsystem we
also see something like::

    2021-06-03 17:52:55 MyComputer my_recsystem.py[29595] INFO recommend(user_id='fake-user', max_articles=200, min_articles=15) -> []


The candidate articles
----------------------

When you wrote the stub for your ``recommend()`` function it took a number
of arguments: ``state``, ``user``, etc.  Let's take a look at what those
look like by augmenting the function to log their values:

.. code-block:: python

    import logging

    log = logging.getLogger(__name__)

    def recommend(state, user, articles, min_articles, max_articles):
        log.info(f'state: {state}')
        log.info(f'user: {user}')
        log.info(f'articles:\n{articles}')
        log.info(f'min_articles: {min_articles}')
        log.info(f'max_articles: {max_articles}')
        return []

Restart your recsystem (hit Ctrl-C if it's still running) and try making
another test call::

    $ python -m renewal_recsystem.test -t recommend

In the logs we should see something like::

    2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO state: {}
    2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO user: User(uid='fake-user', interactions=defaultdict(, {}))
    2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO articles:
                                                   authors                 date  ...                                              title                                                url
    article_id                                                                   ...
    48573                              [Par, La Rédaction]  2021-06-02T19:56:29  ...  France-Galles : les Bleus mènent à la pause gr...  https://sport24.lefigaro.fr/football/euro-2020...
    48572                         [Par Le Figaro Avec Afp]  2021-06-02T20:00:38  ...  Israël: le parti arabe Raam formalise son appu...  https://www.lefigaro.fr/international/israel-l...
    48571                                  [Kenneth Chang]  2021-06-02T20:04:00  ...  New NASA Missions Will Study Venus, a World Ov...  https://www.nytimes.com/2021/06/02/science/nas...
    48570         [Stéphany Gardier, Par Stéphany Gardier]  2021-06-02T19:10:32  ...  Un retour d’expérience rassurant sur des milli...  https://www.lefigaro.fr/sciences/un-retour-d-e...
    48569       [Vincent Bordenave, Par Vincent Bordenave]  2021-06-02T19:11:14  ...  Covid-19: protéger les plus jeunes pour attein...  https://www.lefigaro.fr/sciences/covid-19-prot...
    ...                                                ...                  ...  ...                                                ...                                                ...
    47578               [Paul Carcenac, Par Paul Carcenac]  2021-05-25T21:01:28  ...  Breton, belge, californien... Le cercle des va...  https://www.lefigaro.fr/sciences/breton-belge-...
    47577                         [Par Le Figaro Avec Afp]  2021-05-26T06:01:27  ...  Livraisons de vaccins : l'UE et AstraZeneca s'...  https://www.lefigaro.fr/societes/livraisons-de...
    47576                         [Par Le Figaro Avec Afp]  2021-05-25T17:12:15  ...  Covid-19 : l'Académie de médecine préconise de...  https://www.lefigaro.fr/sciences/covid-19-l-ac...
    47575               [Elsa Bembaron, Par Elsa Bembaron]  2021-05-25T16:27:21  ...  Le carnet de rappel sera obligatoire à l'entré...  https://www.lefigaro.fr/secteur/high-tech/le-c...
    47574                         [Par Le Figaro Avec Afp]  2021-05-26T17:13:00  ...  Covid-19 : le variant indien présent dans 53 t...  https://www.lefigaro.fr/sciences/covid-19-le-v...

    [1000 rows x 11 columns]
    2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO min_articles: 15
    2021-06-03 18:02:37 MyComputer my_recsystem.py[30460] INFO max_articles: 200

For this section we are focusing on the last three arguments: ``articles``,
``min_articles``, and ``max_articles``.  Of these, the last two are simply
integers giving hints as to how many articles the backend wants your
recsystem to return.  Usually these will be the same values each time, but
they may change as they are adjustable parameters.
Your recsystem should return a minimum of ``min_articles`` recommendations
on each call to ``recommend()`` in order for your recommendations to be
considered.

The ``articles`` argument, on the other hand, is a
:ref:`Pandas DataFrame ` containing a collection of candidate articles to
recommend to the user.  This includes a backlog of past articles and new
articles sent to your recsystem when it started running.  Articles that the
user has already been recommended are filtered out in advance.

.. note::

    The pre-filtering of already recommended articles is not always perfect
    as there may be race conditions.  However, the candidates should be
    mostly unique.  As long as your recsystem returns a good number of
    results (well above ``min_articles`` but below ``max_articles``) it
    should have more than enough recommendations to be considered for the
    user.

Further exploring the articles data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The exact format of the articles `~pandas.DataFrame` is not fully documented
here.  In order to more easily explore it, you could add something like:

.. code-block:: python

    articles.to_csv('articles.csv')

to your ``recommend()`` function to save the articles to a file.  Then in a
separate Python prompt or Jupyter Notebook open it like:

.. code-block:: python

    >>> import pandas
    >>> # Restore article_id as the index, as in the original table
    >>> articles = pandas.read_csv('articles.csv', index_col='article_id')

.. note::

    In order to debug your function, it might be a good idea to insert
    ``breakpoint()`` at the beginning of your ``recommend()`` function, then
    call ``python -m renewal_recsystem.test recommend``.  This will drop you
    into `PDB: the Python debugger ` in which you can explore the value of
    ``articles`` interactively.  However, in order for this to work you must
    also change the function definition to ``async def recommend(...):``.
    This will be explained in a future chapter.

One of the more interesting columns is ``articles.metrics``:

.. code-block:: python

    >>> articles.metrics
    article_id
    48573    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    48572    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    48571    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    48570    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    48569    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
                                                           ...
    47578    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    47577    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    47576    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    47575    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    47574    {'bookmarks': 0, 'clicks': 0, 'dislikes': 0, '...
    Name: metrics, Length: 1000, dtype: object

For each article this gives a tally of all user interactions with that
article: how many users have clicked on it, liked it, etc.  We will use this
for the example in the next section.


Simple popularity-based recsystem
---------------------------------

What we've learned so far is enough to build a recsystem that actually makes
some recommendations.  For starters we'll add to ``my_recsystem.py`` a very
simple function that measures the "popularity" of a single article given its
``metrics`` dict, using a very naïve metric (which you can take your own
time to enhance):

.. code-block:: python

    def popularity(metrics):
        """
        Returns a measure of an article's popularity.

        The formula is ``max(clicks, 1) * ((likes - dislikes) or 1)``.

        You could replace this with a more sophisticated measure of
        popularity.
        """

        clicks = metrics.get('clicks', 0)
        likes = metrics.get('likes', 0)
        dislikes = metrics.get('dislikes', 0)
        return max(1, clicks) * ((likes - dislikes) or 1)

Basically, the articles with the most clicks are the most "popular", with
the click count "weighted" by the number of likes minus the number of
dislikes.  Now we can sort our candidate articles from greatest to least
popularity like:

.. code-block:: python

    def recommend(state, user, articles, min_articles, max_articles):
        # Drop articles that don't have a 'metrics' dict
        articles = articles.dropna(subset=['metrics'])
        # Sort articles by most to least popular according to the
        # `popularity` function applied to their metrics dicts.
        articles = articles.sort_values(
            'metrics', key=lambda m: m.apply(popularity),  # type: ignore
            ascending=False)
        # Take the top `max_articles` most popular
        articles = articles.iloc[:max_articles]
        return list(articles.index)

Here `articles.sort_values ` sorts the articles according to their
``metrics``.  It takes as a sort key a function that applies the
``popularity()`` function to each article.

At the end we return ``list(articles.index)``.  The articles table is
indexed by ``article_id``, so this results in a list of the article IDs of
the articles we want to recommend.  Let's test it::

    $ python -m renewal_recsystem.test -t recommend
    [48573, 47902, 47915, 47914, 47913, 47912, 47911, 47910, 47909, ...]

This should return a long list of article IDs.  If you look at the logs for
your recsystem you should see something similar logged.

.. note::

    Make sure you've restarted your recsystem after making the code changes.
    Hot reloading isn't implemented yet!

The example we've seen here is actually used for one of the baseline
recsystems: `renewal_recsystem.baseline.popularity `_
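
To get a feel for how the popularity ranking behaves before connecting to
the backend, the two functions can be tried on a small hand-built table.
The following is only a minimal sketch: it assumes nothing more than what
was shown above (a table indexed by ``article_id`` with a ``metrics`` column
of dicts), the article IDs and counts in ``toy_articles`` are made up, and
calling ``recommend()`` directly like this is not a feature of the
``renewal_recsystem`` test utility.

.. code-block:: python

    import pandas as pd


    def popularity(metrics):
        """Same naive formula as in the tutorial."""
        clicks = metrics.get('clicks', 0)
        likes = metrics.get('likes', 0)
        dislikes = metrics.get('dislikes', 0)
        return max(1, clicks) * ((likes - dislikes) or 1)


    def recommend(state, user, articles, min_articles, max_articles):
        # Drop rows without metrics, sort by popularity, keep the top ones
        articles = articles.dropna(subset=['metrics'])
        articles = articles.sort_values(
            'metrics', key=lambda m: m.apply(popularity), ascending=False)
        return list(articles.iloc[:max_articles].index)


    # Hypothetical toy data: only the index name and the 'metrics' column
    # mirror the real articles table; the IDs and counts are invented.
    toy_articles = pd.DataFrame(
        {'metrics': [
            {'clicks': 10, 'likes': 3, 'dislikes': 1},  # 10 * (3 - 1) = 20
            {'clicks': 0, 'likes': 0, 'dislikes': 0},   # 1 * 1 = 1
            None,                                       # removed by dropna()
            {'clicks': 5, 'likes': 1, 'dislikes': 4},   # 5 * (1 - 4) = -15
        ]},
        index=pd.Index([101, 102, 103, 104], name='article_id'))

    # `state` and `user` are unused by this recsystem, so placeholders do
    result = recommend({}, None, toy_articles, min_articles=1, max_articles=3)
    print(result)  # [101, 102, 104]

Note that the score becomes negative when an article has more dislikes than
likes, so with this formula such an article ranks below one that has had no
interactions at all.  Whether that is desirable is one of the things you
might revisit when refining the metric.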