Recsystem Interface Documentation
=================================

If you've worked through the :doc:`quickstart` you've seen the basics of how
to write and run a recsystem for Renewal competitions.  That guide introduced
the main function all recsystems have to implement: ``recommend()``.  But it
did not delve into all the details that might be needed to implement a
non-trivial recsystem.

This guide lists all of the other special "hook" functions you can write to
make your recsystem respond to user activity from the mobile apps, and also
explains how you can use the :ref:`state <recsystem-state>` dict to hold data
specific to your recommendation algorithm.


The hook module
---------------

This is the main entry-point to your recommendation system.  The
`renewal_recsystem` package handles all the heavy-lifting of managing the
network protocols and concurrency, so that you can just focus on writing a
few functions that instruct the package on how you want to provide
recommendations to users, collect data on users' activity, and perform any
additional background work.

The hook module is the Python file you pass as the argument to the
``python -m renewal_recsystem`` script.  If your code is complex enough that
it needs to grow beyond one file, your hook module is free to import code
from other files, as well as from any Python packages installed on your
system.

See the next section for a complete list of the functions understood by
`renewal_recsystem` that you can include in your hook module.


.. _recsystem-hooks:

Available hook functions
------------------------

The following hook functions are pre-defined by the system.  Some of them
have exact names and some of them have naming patterns you can follow.  All
of them should be implemented with the exact call signatures defined here.

.. contents::
    :local:


.. _recsystem-recommend-hook:

``recommend``
^^^^^^^^^^^^^

.. code-block:: python

    def recommend(state, user, articles, min_articles, max_articles):
        ...
        return []

This is the only function that is *required* to be implemented in your hook
module.  It is called every time one of the users assigned to your recsystem
requests a list of news recommendations.  In a future version it also may be
called periodically by the Backend in order to pre-queue recommendations for
users, but from the perspective of your recsystem the two cases are no
different (except that you should make sure to return unique sets of
recommendations on each call, for each user).

* The ``state`` argument is explained in :ref:`recsystem-state`.

* The ``user`` argument is a `User` object representing the user you are
  making recommendations for.

* The ``articles`` argument is a `pandas.DataFrame` representing candidate
  articles to recommend to the user.  To the extent possible (outside race
  conditions) this contains articles not already recommended to that user by
  any recsystem.

* The ``min_articles`` and ``max_articles`` arguments are *hints* specifying
  how many recommendations the function should return.  If fewer than
  ``min_articles`` are returned, this recommendation list will be discarded
  by the backend, as there are not enough recommendations to make a
  meaningful head-to-head contest with the other recsystem(s) assigned to
  the user.  Returning more than ``max_articles`` does not currently
  disqualify your recommendations, but recommendations beyond the first
  ``max_articles`` returned will be discarded.


.. _recsystem-article-interaction-hook:

``article_interaction``
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    def article_interaction(state, user, articles, article_id, interaction):
        ...

This *optional* hook function is called every time one of your recsystem's
*assigned* users interacts with an article in any way.  This can be used to
make real-time updates to your model of the user, e.g. any statistics you are
keeping of the user's preferences.
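
As a rough sketch of such a real-time update, the hook below counts clicks per
article.  The ``click_counts`` key is a hypothetical name chosen for this
illustration, and the sketch assumes that a nested state update is merged into
the existing state as described in :ref:`recsystem-state-updates`:

```python
def article_interaction(state, user, articles, article_id, interaction):
    """Tally clicks per article and report only the entry that changed."""
    # Ignore interactions other than clicks (e.g. ratings or bookmarks).
    if not interaction.get('clicked'):
        return None

    # 'click_counts' is a hypothetical key used only in this sketch.
    previous = state.get('click_counts', {}).get(article_id, 0)

    # Return just the changed part of the state, not the whole state dict.
    return {'click_counts': {article_id: previous + 1}}
```

The hook reads from ``state`` but never modifies it in place; the returned
dict names only the single counter that changed.
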

The `User.interactions` attribute also keeps a tally of all the user's
interactions with all articles the user has seen.  This is updated
automatically whenever a user interacts with an article, before your
``article_interaction`` function is called.  For example, if a user clicked
on article ``1234``, then the following will be true:

.. code-block:: python

    user.interactions[1234]['clicked'] == True

So you don't have to keep track of these basic metrics yourself.

You can see an example implementation of ``article_interaction`` in the
`keywords-based example recsystem `_.  This recsystem keeps scores for
keywords found in articles that each user interacts with.

* The ``state`` argument is explained in :ref:`recsystem-state`.  In the case
  of ``article_interaction`` you might use the ``state`` dict to track
  running statistics of the user's preferences, such as similarity scores.

* The ``user`` argument is the `User` object representing the user.

* The ``articles`` argument is the `pandas.DataFrame` containing the corpus
  of articles available to your recsystem.  This is similar to the
  ``articles`` argument to :ref:`recsystem-recommend-hook` except it also
  contains articles the user has not interacted with yet.

* The ``article_id`` argument is the ID of the article the user interacted
  with.  Thus, you can look up the full record for that article by using:

  .. code-block:: python

      article = articles.loc[article_id]

* The ``interaction`` argument is a `dict` specifying the type of interaction
  that took place.  Typically it has one or two keys specifying the type of
  interaction.  Here are the current possibilities:

  * ``{'recommended': True}`` this is a special case that just means the
    user recently refreshed the app and received this article as a
    recommendation (but has not yet clicked on it or rated it).

  * ``{'clicked': True}`` the user clicked on the article to read it.

  * ``{'rating': 1, 'prev_rating': 0}`` the user "rated" the article's
    interest to them (whether or not they read it).  The rating can be either
    ``-1`` (the article is not interesting), ``1`` (the article is
    interesting), or ``0`` (no opinion).  You will only ever see
    ``{'rating': 0}`` if a user previously rated the article and then changed
    their mind.  The ``'prev_rating'`` is the user's previous rating of the
    article.  This can be used to recalculate scores in case a user rates an
    article, but then later changes their mind (for example they might rate
    it ``1``, but then read the article, decide it wasn't interesting, and
    change their rating to ``-1``).

  * ``{'bookmarked': True}`` the user added the article to their bookmarks.

More interaction details will be added in a future version, including the
percent-read of the article, and geolocation details (if the user has allowed
geolocation).


.. _recsystem-lifecycle-hooks:

``initialize`` and ``shutdown``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    def initialize(state, users, articles):
        ...

.. code-block:: python

    def shutdown(state):
        ...

These are lifecycle hooks that are called shortly after your recsystem starts
up, and before it exits cleanly (where "cleanly" means it is not terminated
forcefully such as with ``kill -9``).

These hooks can be used for any additional steps you want to perform at the
startup of your recsystem, such as initializing the
:ref:`state <recsystem-state>`, or saving the state at shutdown.

See :ref:`interface:State persistence` and the `keywords recsystem `_ for an
example of how you can load and save your state dict from a `pickle` file.
In the future, however, state persistence will be handled automatically (see
`issue #15 `_).


.. _recsystem-background-hooks:

``background_*``
^^^^^^^^^^^^^^^^

.. code-block:: python

    def background_(state, users, articles):
        ...


If you define *any* function whose name begins with ``background_`` (the rest
of the name is up to you) that function is run repeatedly in the background
in an infinite loop.  For example if you have a function named
``background_work`` it is run (schematically) like:

.. code-block:: python

    while True:
        update = background_work(state, users, articles)
        state = apply_state_update(state, update)

This can be used for example to perform intensive calculations that take a
long time, and that would otherwise introduce too much latency into functions
like ``recommend()``.  For example, it could be performing running updates of
similarity calculations between articles.

.. warning::

    Be careful about using ``background_`` functions for work that completes
    very "fast" (e.g. in less than a few milliseconds).  See
    `How to Profile Your Code in Python`_ for tips on how to measure the
    execution speed of your functions.  This is because every
    ``background_`` function is called repeatedly in an infinite loop, and
    could create a bottleneck if it is being called too often.  For tasks
    that might be short but that you still want to call periodically, see
    :ref:`recsystem-every-hooks`.


.. _recsystem-every-hooks:

``every_``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    def every_(state, users, articles):
        ...

*or*

.. code-block:: python

    def every__(state, users, articles):
        ...

These are like ``background_`` but allow you to define hook functions that
are scheduled periodically.  For example, if you write a function named
``every_minute_calculate_scores`` that function will be called once every
minute.  Alternatively, you can use a name scheme like
``every_30_seconds_calculate_scores`` to run the function every 30 seconds.
The time units "seconds", "minutes", "hours", and "days" are available.

The function is re-scheduled after its last call completes.  So for example
if you have a function that is called every second, but it takes more than a
second to complete, its next call will be one second after it completed.  In
other words, you won't have multiple calls of the same periodic hook running
simultaneously.  So you might choose a period that represents an upper bound
on the time *performance* of the hook function.


.. _recsystem-state:

Recsystem state
---------------

Here we explain the use of the ``state`` argument that is passed as the first
argument to all hook functions.

The ``state`` argument is a Python `dict` which may contain any number of
nested dicts.  It's your recsystem's own work area where it can store any
data specific to your recsystem's functionality.

For example, say you are performing sentiment analysis on articles.  You
would like to periodically compute sentiment scores for articles, and you
will need a place to save these scores (in order to avoid recomputing them).
You could add a key to your ``state`` named ``"article_sentiments"``
containing a dictionary mapping article IDs to the sentiment analysis
results.  In this case the state (or this portion of it) could look something
like:

.. code-block:: python

    {
        "article_sentiments": {
            12345: "happy",
            12346: "sad",
            12347: "neutral"
        }
    }

.. note::

    Technical note for the curious: You may ask "Why do I need to pass this
    ``state`` argument around?  Why can't I just use a global variable?"  In
    many cases using a global variable will not work, because in order to
    keep your recsystem able to handle many events concurrently, your hook
    functions may be run in some separate processes.  If you use a global
    variable for this, changes you make to its value will not be propagated
    correctly to the whole system.  This is also why your hook functions
    should return :ref:`recsystem-state-updates`.

The keys in the ``state`` dict may be any type that can be used as a
dictionary key in Python (strings, integers, tuples, etc).
However, keys and values must be able to be `pickled `.  Fortunately, this is
true for most types you will likely encounter in Python data science, such as
Numpy arrays and Pandas DataFrames, etc.


.. _recsystem-state-updates:

State updates
^^^^^^^^^^^^^

Most of the :ref:`hook functions <recsystem-hooks>` defined may ``return`` a
value referred to as a "state update" performed by that function.  It informs
the system which parts of the state you want your hook function to modify.

The state update is also a `dict`, but you *should not* simply modify the
original ``state`` dict and return it.  This could result in your hook
functions overstepping each other and clobbering each other's results.

Instead, each call to a hook function should only return a `dict`
representing the parts of the state changed by that call.  This update will
be automatically merged into the "real" state that will be passed to future
hook calls.

Returning to the previous example, if you have a function
``every_10_seconds_perform_sentiment_analysis`` to update the sentiment
analysis for new articles, and it finds a new article with ID ``12456``, the
hook function should return a state update like:

.. code-block:: python

    {
        "article_sentiments": {
            12456: "mixed"
        }
    }

This informs the system that there is a new key/value pair to add to
``"article_sentiments"`` and that no other part of the state needs to be
touched.

Special case: ``recommend()``
"""""""""""""""""""""""""""""

With one exception, the return value of every hook function is a state update
(or no return value if you have a hook function that does not update the
state).  The ``shutdown()`` hook is also a corner case since any state update
it returns will be ignored, as the system is shutting down.

However, the ``recommend()`` function normally returns a `list` of article
IDs, not a state update.  If you want your ``recommend()`` function to also
update the state (e.g. to keep some statistics on how many recommendations
it has made to each user) it can return a tuple:
``(recommendations, state_update)``.

State persistence
^^^^^^^^^^^^^^^^^

Currently (though this might change in the future) the ``state`` is not
*persisted* automatically.  That is, when you shut down your recsystem and
start it up again, it will always start with an empty state (``{}``).

Naturally, you will probably want to be able to keep your recsystem's data
over the course of a competition.  Currently, the best way to do that is to
define some :ref:`recsystem-lifecycle-hooks` hooks like:

.. code-block:: python

    import logging
    import os
    import pickle

    log = logging.getLogger(__name__)

    STATE_FILENAME = __name__ + '.pickle'


    def initialize(state, users, articles):
        # This runs every time your recsystem starts up
        if os.path.isfile(STATE_FILENAME):
            with open(STATE_FILENAME, 'rb') as fobj:
                state = pickle.load(fobj)
            log.info(f'loaded previous state from {STATE_FILENAME}')

        return state


    def shutdown(state):
        # This runs every time you stop the recsystem cleanly and in most
        # cases if it crashes
        save_state(state)


    def save_state(state):
        pickled_state = pickle.dumps(state)
        with open(STATE_FILENAME, 'wb') as fobj:
            fobj.write(pickled_state)

        log.info(f'saved updated state to {STATE_FILENAME}')

To protect against unfortunate catastrophes (e.g. your computer crashes) you
might also want to periodically save state updates:

.. code-block:: python

    def every_30_seconds(state, users, articles):
        save_state(state)

Alternatives
^^^^^^^^^^^^

The ``state`` dict is provided as a quick and convenient space to store your
recsystem's data at runtime.  Its use is purely optional.  For example, some
contestants might choose instead to use an external storage method for their
recsystem's data, such as a database (SQLite, MongoDB, Redis, etc.).  This is
perfectly allowed.  A combination of the two can also be used, such as using
``state`` as a cache, but using a database for longer-term persistence.  The
choice is yours!


Async hooks functions
---------------------

.. todo::

    Explain how to write hook functions with ``async def`` instead of
    ``def``, what this means, and when and why to use it.


.. include:: links.txt