In early 2020, we built a content curation system to automate and improve the process of building online communities using social media.
This system was part of our first venture, Infinitesax - a long-term vision to democratise live concerts for amateur jazz musicians.
Our first goal was to form an organic community around talented amateur saxophonists globally via an Instagram account, @infinitesax. We focussed on developing a dynamic, low-touch curation system that identified amateur musicians on social media and re-posted model-curated content.
Under the hood, the infrastructure was largely like any other project - a data storage service and a pipeline to ingest, process & move data to that storage service. We used Python for scripting and MongoDB as our storage service.
Most of the value of the project came from a less common feat - our MLOps pipeline. This pipeline allowed us to perform automatic hyper-parameter optimisation and subsequent model selection on two common unsupervised learning families, K-Means and DBSCAN. Where model selection would traditionally require human judgement, we relied on constraint-based selection, which freed up our time without losing confidence in the performance of our system.
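To make the idea concrete, here is a minimal sketch of constraint-based model selection across the two families, written with scikit-learn. It is an illustration rather than our production pipeline: the search grids, the silhouette score as the ranking metric, the two constraints (a minimum cluster count and a cap on DBSCAN noise) and the names `SEARCH_SPACE` and `select_model` are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid

# Hypothetical search space: these grids are illustrative assumptions.
SEARCH_SPACE = {
    "kmeans": (KMeans, {"n_clusters": [2, 3, 4, 5, 6], "n_init": [10], "random_state": [0]}),
    "dbscan": (DBSCAN, {"eps": [0.3, 0.5, 0.8, 1.2], "min_samples": [5, 10]}),
}


def select_model(X, min_clusters=2, max_noise_fraction=0.2):
    """Return (score, family, params, labels) for the best candidate
    that satisfies the constraints, or None if no candidate does."""
    best = None
    for family, (estimator, grid) in SEARCH_SPACE.items():
        for params in ParameterGrid(grid):
            labels = estimator(**params).fit_predict(X)

            clustered = labels != -1  # DBSCAN labels noise points as -1
            n_clusters = len(set(labels[clustered]))
            noise_fraction = 1.0 - clustered.mean()

            # Constraints stand in for human judgement (assumed thresholds).
            # The silhouette score also needs at least two clusters.
            if n_clusters < max(2, min_clusters) or noise_fraction > max_noise_fraction:
                continue

            # Rank surviving candidates on the non-noise points only.
            score = silhouette_score(X[clustered], labels[clustered])
            if best is None or score > best[0]:
                best = (score, family, params, labels)
    return best


# Synthetic data stands in for real content features.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
result = select_model(X)
if result is not None:
    score, family, params, _ = result
    print(f"selected {family} with {params} (silhouette={score:.3f})")
```

The constraints are applied before any score is compared, so a degenerate result (a single cluster, or mostly noise) can never be selected however well it scores. That is what lets the selection run unattended.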
Like much of the live entertainment world, global disruptions have meant that Infinitesax is currently on hold. But for the same reason, we're even more proud to have brought 100k+ views to amateur artists globally via our novel content curation model.