How Recsystems Work

The ultimate goal of a recsystem is to recommend news articles to a user, with the aim of providing the articles that will interest that user most.

How you, as a contest participant, accomplish this is entirely up to you. You can use any algorithm you like.

Todo

List some examples of how a user might implement a recommendation system.

You can also write your recommendation system in any programming language(s), and use any auxiliary tools, e.g. for training your algorithm on data provided by the Renewal backend.

However, plugging your recommendation algorithm into the rest of the Renewal platform will require a minimal understanding of how the Renewal platform works, and how it connects to and interacts with your recsystem.

Your recsystem must include a “real-time” component: a software service that runs continuously for the duration of the contest in order to provide news recommendations to users of the Renewal app. When users of the app refresh their news feed, their app makes a request to the backend. The backend in turn assigns the user to two or more recsystems that will be placed head-to-head in competition for the user’s eyeballs. At this point the backend asks each recsystem assigned to the user for a batch of recommendations for that user. The recsystems return these to the backend, which passes them on to the user’s app.

Todo

Add a simplified sequence diagram showing what happens when a user refreshes their news feed. It can be based on the one at https://gitlri.lri.fr/renewal/Renewal#appendix-sequence-diagrams but simplified to omit most of the backend details.

Again, this “real-time” service can be written in almost any programming language, since it communicates with the Renewal backend using standard Web technologies such as WebSockets and JSON-RPC, which have implementations for most popular languages, including Python, JavaScript, R, Go, etc.

The renewal_recsystem Python package also provides a base recsystem implementation in Python which takes care of all the boilerplate programming such as handling the WebSocket connection, so that you can focus on the parts that matter to your recsystem, simply by providing implementations for a few stub functions. However, whether you use the renewal_recsystem package or roll your own is entirely your choice. For details on how to use this package to implement your recsystem, see the Quickstart Guide.

Note

Contributions either to the renewal_recsystem package or of new boilerplate recsystems in other languages are also highly encouraged.

The rest of this documentation focuses on implementing the “real-time” component of your recommendation system; that is, the software that responds in real time to recommendation requests. It may be the only component of your recsystem, or you may have additional code that runs “offline” for training your models.

Lifecycle of a recsystem

The primary purpose of each recsystem is to respond to requests for news by users of the mobile app. Every time a user refreshes the app, the app (via communication with the backend) will request news recommendations for that user from two or more recsystems. Each recsystem connected to the backend (including yours!) is assigned one or more users for which it is currently providing recommendations. The user assignments are rotated on an occasional basis (e.g. once per week).

Responses to requests for recommendations should be fast, typically under one second, in order not to keep the user waiting. How this is done is up to you: for example, you can build a model of each user’s preferences in the background, and then use that model to decide which news articles to send the user when they make a request.

In order to accomplish this, your recsystem is responsible for managing a few things:

  • A set of users currently assigned to your recsystem.

    • If you wish, you may also build models around users not currently assigned to your recsystem, in case they are later assigned to you.

  • A corpus of news articles to use in building your models. These are news articles provided to you by the backend, including metadata such as the article’s news source (i.e. what website/newspaper it came from), title, text, keywords, etc.

  • User interactions: For example, when a user clicks on and reads or rates an article, you will want to know about that in order to build your model of that user’s preferences.

As such, a typical lifecycle for a recsystem is as follows:

1. Initialization

When first starting up, your recsystem will want to know:

  1. To which users am I currently assigned?

  2. What are some articles I can work with?

This can be accomplished by making a couple of requests to the Renewal backend using its API. For example, to request your assigned users, make an HTTP GET request to https://api.renewal-research.com/v1/user_assignments.

Currently this just returns a list of opaque user IDs like:

["Mhkc4xuaFPWnmbFomIv8drAtsn13","ct4LvjwHDOXdIGH1kJUtAvVQgmv1"]

This will be updated in the near future to return other details about the user that they have opted in to sharing with the app such as their age, location, gender, etc. It will not contain other personally identifying information such as their names or e-mail addresses.

You will also want to have some news articles that you can recommend to users. You can fetch a list of recently crawled news articles from the backend by making an HTTP GET request to https://api.renewal-research.com/v1/articles. This returns a list of article documents that look something like:

{
  "article_id": 10999,
  "authors": [
    "Brooks Barnes"
  ],
  "date": "2020-09-30T01:14:41",
  "image_url": "https://static01.nyt.com/images/2020/09/25/business/25virus-disneyparks-3/25virus-disneyparks-3-facebookJumbo.jpg",
  "keywords": [
    "workers",
    "unionized",
    "world",
    "newsom",
    "disneyland",
    "mr",
    "theme",
    "quarter",
    "lays",
    "florida",
    "park",
    "disney",
    "restrictions"
  ],
  "lang": "en",
  "metrics": {
    "bookmarks": 0,
    "clicks": 0,
    "dislikes": 0,
    "likes": 0
  },
  "site": {
    "icon_url": "http://localhost:8080/v1/images/icons/5f68e3404b19bc8dd873ef25",
    "name": "NYTimes",
    "url": "www.nytimes.com"
  },
  "summary": "In Florida, where government officials have ...",
  "text": "Disneyland in California has remained closed ...",
  "title": "Disney Lays Off a Quarter of U.S. Theme Park Workers",
  "url": "https://www.nytimes.com/2020/09/29/business/disney-theme-park-workers-layoffs.html"
}

where every article has a unique integer article_id. You may store these article documents however you like, whether in memory, or your own database of articles.
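As one possible illustration of the in-memory option, the sketch below indexes article documents by their article_id. The ArticleStore class and its method names are hypothetical and not part of the renewal_recsystem package; it only assumes that each document carries the article_id and date fields shown in the example above.

```python
class ArticleStore:
    """Hypothetical in-memory store of article documents, keyed by article_id."""

    def __init__(self):
        self._by_id = {}

    def add(self, article):
        # A later copy of the same article_id replaces the earlier one.
        self._by_id[article["article_id"]] = article

    def add_many(self, articles):
        for article in articles:
            self.add(article)

    def get(self, article_id):
        # Returns None when the article is unknown.
        return self._by_id.get(article_id)

    def newest(self, limit=10):
        # ISO 8601 timestamps such as "2020-09-30T01:14:41" sort
        # chronologically when compared as plain strings.
        ranked = sorted(self._by_id.values(),
                        key=lambda a: a["date"], reverse=True)
        return ranked[:limit]

    def __len__(self):
        return len(self._by_id)
```

A store like this could be filled once at start-up from the list of article documents returned by the backend, and then extended as further articles arrive.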

While the recsystem is running it is not necessary to make frequent requests for more articles. Instead, every time the backend scrapes a new article it will be sent to your recsystem. See the next step in the lifecycle.

To view a working example of initializing a recsystem see the source code for BasicRecsystem.initialize.

2. Event loop

After your recsystem is initialized it will enter an event loop, in which it listens for and in some cases responds to events sent to it by the backend (using JSON-RPC).

The most important such event will be recommend requests: This happens when a user assigned to your recsystem requests new articles through the app. Your recsystem will respond to this request by returning a list of article_ids based on your recommendation model for that user.
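To make this concrete, here is a deliberately simple sketch of what might sit behind a recommend request. It is not the algorithm used by the renewal_recsystem package: the function name, its parameters, and the idea of scoring by keyword overlap are all assumptions chosen for illustration. It assumes you keep a dict of article documents keyed by article_id and a set of article_ids the user has already clicked.

```python
from collections import Counter


def recommend(articles, clicked_ids, limit=10):
    """Return up to `limit` article_ids for one user (illustrative only).

    articles    -- dict mapping article_id to article document
    clicked_ids -- set of article_ids the user has already clicked
    """
    # Build a crude interest profile: how often each keyword appears
    # in the articles the user has clicked.
    profile = Counter()
    for article_id in clicked_ids:
        article = articles.get(article_id)
        if article is not None:
            profile.update(article.get("keywords", []))

    def score(article):
        return sum(profile[k] for k in article.get("keywords", []))

    # Do not recommend articles the user has already read.
    candidates = [a for aid, a in articles.items() if aid not in clicked_ids]
    # Highest keyword overlap first; newer articles win ties.
    ranked = sorted(candidates,
                    key=lambda a: (score(a), a.get("date", "")),
                    reverse=True)
    return [a["article_id"] for a in ranked[:limit]]
```

Because the profile can be kept up to date in the background, the work done at request time is a single sort over the candidate articles, which helps keep response times short.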

Most other events do not require a response, and are merely to notify your recsystem of something interesting. In particular:

  • article_interaction: received when a user interacts with an article in any way, such as clicking on it or rating it. This notification will be in the form of an interaction record. You can use this event to tune your recommendation model for that user.

  • new_article: received every time the backend crawls a new news article. You can add this to your existing database of articles in order to always return the freshest news to users.

  • assigned_user: received every time the backend assigns a new user to your recsystem; similarly unassigned_user.

The full list of events your recsystem should be able to handle is documented in the JSON-RPC API.

For the rest of the time it is running, your recsystem is simply waiting for and responding to new events.
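The notification events listed above could be handled along the following lines. This is only a sketch: the RecsystemState class, the handle_notification function, and the parameter names (user_id, article, interaction) are hypothetical, so check the JSON-RPC API documentation for the actual parameters of each event.

```python
class RecsystemState:
    """Hypothetical container for the data a recsystem keeps up to date."""

    def __init__(self):
        self.users = set()       # IDs of users currently assigned to us
        self.articles = {}       # article_id -> article document
        self.interactions = []   # interaction records, in arrival order

    def assigned_user(self, user_id):
        self.users.add(user_id)

    def unassigned_user(self, user_id):
        # discard() does not raise if the user was never assigned.
        self.users.discard(user_id)

    def new_article(self, article):
        self.articles[article["article_id"]] = article

    def article_interaction(self, interaction):
        self.interactions.append(interaction)


# Only these method names may be invoked remotely.
NOTIFICATIONS = ("assigned_user", "unassigned_user",
                 "new_article", "article_interaction")


def handle_notification(state, method, params):
    """Apply one notification to `state`; return False if it is unknown."""
    if method not in NOTIFICATIONS:
        return False
    getattr(state, method)(**params)
    return True
```

Restricting dispatch to an explicit tuple of names avoids accidentally exposing other attributes of the state object to remote calls.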

How it works

Hosting requirements

The basic networking requirements for running a recsystem are minimal. The recsystem does not act as a server; rather, it only makes outgoing connections to the Renewal backend over the standard port 443 used for secure Web connections. This means that your recsystem can run on almost any computer with an internet connection. There is no need to open a firewall for incoming connections: if your recsystem is running on a computer that can connect to websites, it can connect to Renewal.

Once connected, bi-directional communication between your recsystem and the backend is achieved using WebSockets.

Thus, the main requirement is to run your recsystem on a computer with reliable uptime and a reliable internet connection, since when your recsystem is down it cannot respond to recommendation requests, and its overall ranking will diminish.

Todo

Maybe provide a list of some hosting options, either at the university or publicly available.

Note

In the future the Renewal project may provide hosting for recsystems, but presently does not have the infrastructure set up.

Connecting to the backend

Your recsystem has two methods of communicating with the Renewal backend. The first is the HTTP API: at any time, whether while running in real time or for “offline” data analysis and model training, your recsystem can access the HTTP API to download data on articles and users from the database.

The current URL for the backend API is:

https://api.renewal-research.com/v1/

So all connections to the backend start with HTTPS requests to endpoints under that URL.

The second method, which carries the majority of your recsystem’s communication, is the “event stream”. Over it your recsystem receives notifications about events on the system (when new articles become available, when users click on articles, when users are assigned to your recsystem, etc.) and responds to requests for article recommendations for users. Your recsystem will connect to the WebSocket interface via the URI:

wss://api.renewal-research.com/v1/event_stream

All messages sent by the backend to the recsystem over WebSockets are in the form of JSON-RPC requests, and all messages sent by your recsystem will be in response to certain JSON-RPC requests. Your recsystem must implement the full JSON-RPC API documented here.

See the next section for an introduction to WebSockets and JSON-RPC if you are unfamiliar with these technologies. However, if you build a recsystem in Python on top of the renewal_recsystem.RenewalRecsystem class provided by this package, it is not necessary to fully understand how to use these protocols, as it implements all the details, and all you need to provide are implementations of some of the functions called on your recsystem via RPC.

Recsystem as WebSockets client

Because your recsystem initiates requests to the Renewal backend server, including when making a WebSockets connection, it acts as a client to the backend’s WebSockets server. See the WebSockets primer for more details.

Recsystem as JSON-RPC server

WebSockets are merely a transport mechanism which can carry any type of message. For effective communication between two ends of a WebSocket connection an additional protocol is needed. The Renewal backend uses JSON-RPC for this.

However, in the JSON-RPC context your recsystem acts as a JSON-RPC server. That is, it provides implementations of a set of methods or “procedures” which the Renewal backend calls remotely on your recsystem, and to which your recsystem returns responses. So once the WebSocket connection is established, all communications between the two ends are initiated by the backend in the form of RPC calls, and the only messages your recsystem sends are responses to (some of) these RPC calls.

Any messages sent by your recsystem that are not RPC responses are ignored. See the JSON-RPC primer for more details.
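The server role described above can be sketched without any WebSocket code at all: given the text of one incoming message, decide which method to call and whether a response is due. The sketch assumes JSON-RPC 2.0 message framing and handles single (non-batch) requests only; handle_message and the shape of the methods mapping are hypothetical names, not part of the renewal_recsystem package.

```python
import json


def _error(code, message, request_id):
    return json.dumps({"jsonrpc": "2.0",
                       "error": {"code": code, "message": message},
                       "id": request_id})


def handle_message(raw, methods):
    """Process one JSON-RPC 2.0 message.

    raw     -- the message text received from the backend
    methods -- dict mapping method names to Python callables
    Returns the response text, or None when no response should be sent.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error(-32700, "Parse error", None)
    if not isinstance(msg, dict):
        # Batch requests are out of scope for this sketch.
        return _error(-32600, "Invalid Request", None)

    # A request without an "id" is a notification: never answer it.
    is_notification = "id" not in msg
    func = methods.get(msg.get("method"))
    if func is None:
        if is_notification:
            return None
        return _error(-32601, "Method not found", msg["id"])

    # JSON-RPC allows params by position (list) or by name (object).
    params = msg.get("params", [])
    result = func(**params) if isinstance(params, dict) else func(*params)
    if is_notification:
        return None
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": msg["id"]})
```

Whatever WebSocket library you use then only has to pass each received message to handle_message and send back any non-None return value.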

Authentication

Almost all requests made by your recsystem to the backend must be authenticated, including when connecting to the WebSocket interface.

Authentication is performed by passing an authentication/authorization token in the form of a JSON Web Token (JWT), which an administrator will provide to you when you register your recsystem.

The token should be provided along with each request in the Authorization HTTP header using the Bearer scheme. That is, each request must send a header in the form:

Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.e30.uJKHM4XyWv1bC_-rpkjK19GUy0Fgrkm_pGHi8XghjWM

where the string after Bearer in this case is an example JWT to be replaced with your actual token.
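As an illustration, the following sketch attaches the header to an HTTP API request using only the Python standard library. The helper names build_request and fetch_json are hypothetical; the base URL and the user_assignments endpoint are the ones given in this documentation, and the token value is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://api.renewal-research.com/v1/"


def build_request(endpoint, token):
    """Build an authenticated GET request for an endpoint under API_BASE."""
    return urllib.request.Request(
        API_BASE + endpoint,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )


def fetch_json(endpoint, token, timeout=10):
    """Send the request and decode the JSON body of the response."""
    request = build_request(endpoint, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_json("user_assignments", token) would request the list of users assigned to your recsystem, where token is a string you load from an environment variable or a secrets file rather than hard-coding it in your source.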

Warning

The JWT acts both to identify and to authenticate your recsystem. Treat it as you would any password: take every step to keep it private, as having this token will allow anyone to identify themselves as your recsystem.

If working in a team, you should strongly consider using a secure password manager for teams in order to share the token.

If the token is lost or compromised, contact the Renewal administrator who registered you to have the token revoked and to obtain a new one.

Todo

In the future it will be possible to revoke/regenerate recsystem tokens through the website.