
Personalisation is everywhere these days: think of social media feeds curating content based on your profile, or ads that uncannily match products you Googled only moments ago. We've all become accustomed to these practices, in which a lot of money is made by tools that carefully transform who we are and what we like into fat dollar signs.
But the voices demanding better privacy protection and ownership of our own data are growing louder; there is a backlash against these practices.
At Xayn, this is exactly our mission. We started out building a mobile news app that would feature personalisation without user data ever leaving the device. Based on user interactions, such as explicitly liking an article or implicitly spending long enough reading one, we would keep adjusting the news feed so that each new batch of articles aligned more and more closely with the user's interests.
This was no simple feat: to maintain absolute privacy, we couldn't rely on a server-side solution to process the user's interactions (or the data would have left the device).
Personalisation means tracking what users like. In our case, we transform an article the user engages with into a centre of interest, or, if an existing centre is found to be close enough, we merge with and evolve that centre instead. You can think of a centre of interest as a collection of keywords derived from longer texts, but represented in a purely mathematical way as an embedding vector. These centres allow us to better understand what a user likes and to present better-matching articles going forward.
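To make this concrete, here is a minimal sketch of what such an update step could look like in Rust. The struct, the similarity threshold and the blending rule are illustrative assumptions for this article, not our production code:

```rust
// Minimal sketch of the centre-of-interest update step. The names, the
// similarity threshold and the blending factor are illustrative assumptions.

/// A centre of interest: an embedding vector plus a weight tracking how many
/// articles have contributed to it.
struct CentreOfInterest {
    embedding: Vec<f32>,
    weight: f32,
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Either evolve the closest existing centre towards the new article's
/// embedding, or spawn a new centre if nothing is close enough.
fn update_centres(centres: &mut Vec<CentreOfInterest>, article: &[f32], threshold: f32) {
    // Find the closest existing centre, if any.
    let best = centres
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine_similarity(&c.embedding, article)))
        .max_by(|(_, s1), (_, s2)| s1.total_cmp(s2));

    match best {
        Some((i, similarity)) if similarity >= threshold => {
            // Evolve: move the existing centre a little towards the new embedding.
            let centre = &mut centres[i];
            let alpha = 1.0 / (centre.weight + 1.0);
            for (c, a) in centre.embedding.iter_mut().zip(article) {
                *c = (1.0 - alpha) * *c + alpha * a;
            }
            centre.weight += 1.0;
        }
        _ => centres.push(CentreOfInterest {
            embedding: article.to_vec(),
            weight: 1.0,
        }),
    }
}

fn main() {
    let mut centres: Vec<CentreOfInterest> = Vec::new();
    // Toy 3-dimensional "embeddings" standing in for real model output.
    update_centres(&mut centres, &[1.0, 0.0, 0.0], 0.8);
    update_centres(&mut centres, &[0.9, 0.1, 0.0], 0.8);
    println!("number of centres: {}", centres.len());
}
```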
To optimise the feed, we then compute semantic similarity between candidate articles and the centres of interest, and present the top-matching articles to the user.
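Continuing the sketch above (and reusing its `CentreOfInterest` and `cosine_similarity`), ranking candidate articles against the centres might look roughly like this; scoring each article by its best match against any centre is an assumption made for the example, not necessarily our exact scheme:

```rust
// Sketch of the ranking step, building on the previous snippet.

struct Article {
    id: String,
    embedding: Vec<f32>,
}

/// Rank candidate articles by their best similarity to any centre of interest
/// and keep the `top_k` matches.
fn rank_articles<'a>(
    articles: &'a [Article],
    centres: &[CentreOfInterest],
    top_k: usize,
) -> Vec<(&'a Article, f32)> {
    let mut scored: Vec<(&Article, f32)> = articles
        .iter()
        .map(|article| {
            let score = centres
                .iter()
                .map(|c| cosine_similarity(&c.embedding, &article.embedding))
                .fold(f32::NEG_INFINITY, f32::max);
            (article, score)
        })
        .collect();

    // Highest similarity first.
    scored.sort_by(|(_, s1), (_, s2)| s2.total_cmp(s1));
    scored.truncate(top_k);
    scored
}
```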
Since we don't use a server-side solution, we bundled our AI model and related logic with the app and began an uphill battle to reach an acceptable level of performance; running a model is generally an expensive task. Solutions like ChatGPT, for example, require a great deal of energy to run: some estimates place it at 23 kgCO2e per day.
Granted, we were not embedding our own chatbot on a device. Nevertheless, our model had to be as small as possible, and the token length for our embeddings low enough that the device could calculate similarities within an acceptable time.
We chose a tiny, lightweight student model and further reduced the layer sizes to an intermediate size of 1512 and an embedding size of 128. Next, we reduced the sequence length from 512 to 128 to lower inference latency.
Via distillation, we then transferred the behaviour of a large teacher model onto our student model.
Finally, we converted the model to ONNX so that we could run it in Tract.
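As a rough illustration of that last step, loading an ONNX model and running it with Tract looks something like the sketch below. The file name, the two-input layout (token ids plus attention mask) and the fixed shapes are assumptions for the example rather than our exact setup:

```rust
// Minimal sketch of running an ONNX embedding model with Tract.
// File name, input layout and shapes are illustrative assumptions.
use tract_onnx::prelude::*;

fn main() -> TractResult<()> {
    let seq_len: usize = 128;

    // Load the ONNX graph, pin the input shapes to one sequence of 128 tokens,
    // optimise the graph and turn it into a runnable plan.
    let model = tract_onnx::onnx()
        .model_for_path("model.onnx")?
        .with_input_fact(0, InferenceFact::dt_shape(i64::datum_type(), tvec!(1, seq_len)))?
        .with_input_fact(1, InferenceFact::dt_shape(i64::datum_type(), tvec!(1, seq_len)))?
        .into_optimized()?
        .into_runnable()?;

    // Dummy token ids and attention mask standing in for a tokenised article.
    let token_ids = Tensor::from(tract_ndarray::Array2::<i64>::zeros((1, seq_len)));
    let attention_mask = Tensor::from(tract_ndarray::Array2::<i64>::ones((1, seq_len)));

    // Run inference; the first output holds the embedding we use downstream.
    let outputs = model.run(tvec!(token_ids.into(), attention_mask.into()))?;
    println!("output shape: {:?}", outputs[0].shape());
    Ok(())
}
```

Pinning the input shapes up front lets Tract optimise the graph for exactly the 128-token sequences fed to it.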
Tract is a neural network inference engine; think of it as a lightweight version of TensorFlow. It's maintained as an open-source project by Sonos and is written for the Rust programming language.
Rust itself is well known for its excellent performance and memory safety. This alone already made it an ideal candidate for our project; however, speed was not the only reason to choose Rust. The mobile market is dominated by Android and iOS, and the two platforms still run on a range of different CPU architectures.
With Rust, we could compile for the most common architectures, even targeting WebAssembly if we wanted to. We could also optimise our binary further, using link-time optimisation (LTO), for example.
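For illustration, enabling LTO in a Rust project only takes a few lines in `Cargo.toml`; the values below are generic examples, not our production profile:

```toml
# Illustrative release profile; generic example values, not our actual settings.
[profile.release]
lto = true          # link-time optimisation across all crates
codegen-units = 1   # trade compile time for better optimisation
opt-level = 3       # optimise for speed (use "z" to optimise for size)
strip = true        # strip symbols to shrink the binary
```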
Today, we are no longer building this app; we've taken what we learned and instead offer our personalisation as a service for B2B clients. Our goal has shifted to providing a service that is reliable, affordable, and, of course, performs well on benchmarks.
With Rust, we were able to build a robust web application on top of the existing framework we had developed for mobile devices, and we expose our functionality via APIs specified in the OpenAPI standard.
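Purely as an illustration of the shape of such an endpoint (the framework, route and payload below are hypothetical and not taken from our actual API), a minimal handler in Rust could look like this:

```rust
// Hypothetical sketch of a semantic-search endpoint using axum;
// the route, payload shape and handler are illustrative only.
use axum::{routing::post, Json, Router};
use serde::{Deserialize, Serialize};

#[derive(Deserialize)]
struct SearchRequest {
    query: String,
    top_k: usize,
}

#[derive(Serialize)]
struct SearchHit {
    article_id: String,
    score: f32,
}

async fn semantic_search(Json(req): Json<SearchRequest>) -> Json<Vec<SearchHit>> {
    // A real service would embed `req.query` and rank stored articles;
    // this stub just returns an empty result set.
    let _ = (req.query, req.top_k);
    Json(Vec::new())
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/semantic_search", post(semantic_search));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```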
Given all the optimisations mentioned in this article, moving to the cloud gave us a clear competitive edge: we could offer semantic similarity and semantic search at a very low price, and we are continuing our efforts to provide more NLP-related functionality down the line.
At the same time, we are working hard to keep improving our model and to score better on benchmarks.