To build robust quantitative trading strategies, we need a strong foundation to rely on: data.

When building quantitative models, a common pitfall is to invest a lot of time in algorithms while neglecting the foundation: the input data. The result is often an over-optimistic model whose performance cannot be reproduced in real life. Even with reputable vendors, data may be revised in non-obvious and non-intuitive ways. Keynum has set out to create a solution for robust data and has started its own data initiative and API.

Here, we emphasize building a stable core of usable datasets that are frozen, in the sense that we can, at any point in time, recreate a dataset in its original form. We also emphasize datasets that are likely to be of general interest but hard to get.
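To make the idea of a frozen dataset concrete, here is a minimal sketch in Python of one common way to achieve it: every value is stored together with the date on which it became known, and revisions are appended rather than overwritten. This is an illustration under our own assumptions, not a description of the Keynum API; the names `Record`, `known_on` and `as_of`, as well as the series keys and values, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    key: str        # hypothetical identifier: series name plus reference period
    value: float    # made-up illustrative value
    known_on: date  # date on which this version of the value became available


def as_of(records, when):
    """Return, per key, the latest value that was known on or before `when`."""
    latest = {}
    # Process versions in publication order so later revisions overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r.known_on):
        if rec.known_on <= when:
            latest[rec.key] = rec.value
    return latest


# Append-only history: the second entry revises the first instead of replacing it.
history = [
    Record("series_a:2023Q1", 1.1, date(2023, 4, 27)),
    Record("series_a:2023Q1", 1.3, date(2023, 5, 25)),  # later revision
    Record("series_a:2023Q2", 2.4, date(2023, 7, 27)),
]

first_view = as_of(history, date(2023, 5, 1))  # dataset as seen before the revision
later_view = as_of(history, date(2023, 8, 1))  # dataset as seen after the revision
```

Because nothing is ever overwritten, a model trained on `first_view` only sees what was actually available on that date, which is what guards against the over-optimistic results described above.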


In God we trust. All others must bring data. (W. Edwards Deming)

Data pipelines

Our data is organized into data libraries, each specific to one type of data. This allows us to quickly check datasets for completeness and obvious errors. From data ingestion to data preparation, our processes use well-defined pipelines to ensure unbiased replication of datasets for model building.

While our datasets are cutting edge and updated frequently, the search for ways to organize data so that it is easily accessible dates back many hundreds, even thousands, of years. We believe that in some ways technology has allowed us to become lazier in how we care for and update data. For quantitative models, however, we prefer quality of data to quantity. We have automated most data acquisition processes and run regular quality and plausibility checks on our datasets.
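As an illustration of what a quality and plausibility check can look like, the following Python sketch scans a price-like series for missing values, non-positive values and implausibly large jumps. It is a simplified example under our own assumptions and not the actual checks we run; the function name `check_series`, the `max_jump` threshold of 25% and the sample values are all hypothetical.

```python
import math


def check_series(values, max_jump=0.25):
    """Return a list of (index, issue) pairs for a price-like series.

    `max_jump` is a hypothetical threshold: the largest relative change
    between consecutive valid observations that is still considered plausible.
    """
    issues = []
    previous = None  # last valid observation seen so far
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            issues.append((i, "missing"))
            continue
        if v <= 0:
            issues.append((i, "non-positive"))
            continue
        if previous is not None and abs(v / previous - 1) > max_jump:
            issues.append((i, "jump"))
        previous = v
    return issues


# Made-up sample: one gap, one spike (likely a misplaced decimal), one negative value.
sample = [100.0, 101.5, None, 102.0, 1020.0, 103.0, -1.0]
report = check_series(sample)
```

Note that the isolated spike at index 4 is flagged twice, once on the way up and once on the way back down. A check like this only flags suspicious points for review; deciding whether to correct, drop or keep a value is a separate step.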

Data is the new oil. (Clive Humby)