It is not immediately obvious that a foundation model for time-series forecasting is possible. Unlike in NLP, there is no well-defined vocabulary or grammar for time series. Additionally, such a model would need to support forecasting with varying history lengths (context), prediction lengths (horizon), and time granularities. Furthermore, unlike the huge volume of public text data available for pretraining language models, vast amounts of time-series data are not readily available.
The key elements of our foundation models are twofold:
- a large-scale time-series corpus built from both real-world data (mostly time series from web search queries and Wikipedia page visits) and synthetic data, which provides the volume and diversity needed to train our foundation model
- a decoder-style attention architecture with input patching that can be effectively pretrained on this time-series corpus.
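Input patching means the model consumes a time series not point by point but in contiguous patches, each of which plays the role of a token in an LLM. The following is a minimal sketch of this idea; the patch length of 32 and the function name are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def patch_series(series, patch_len=32):
    """Split a 1-D time series into non-overlapping patches.

    Each patch becomes one input "token" for the decoder-style
    transformer, analogous to a word token in an LLM.
    (patch_len=32 is an illustrative choice, not necessarily
    the setting used by the model described here.)
    """
    n = (len(series) // patch_len) * patch_len  # drop any ragged tail
    return series[:n].reshape(-1, patch_len)

# A context of 256 timepoints becomes 8 patch tokens of length 32.
context = np.arange(256, dtype=np.float32)
patches = patch_series(context)
print(patches.shape)  # (8, 32)
```

Patching shortens the effective sequence seen by the attention layers, which both reduces compute and lets a fixed-size decoder handle the varying context lengths mentioned above.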
Compared to the latest LLMs, our time-series foundation model is much smaller in both parameter count (200M parameters) and pretraining data size (O(100B) timepoints); yet we show that even at such scales, it is possible to pretrain a practical foundation model for forecasting whose zero-shot performance comes close to the accuracy of fully supervised approaches on a diverse set of time-series data. Unlike other works that propose LLMs such as GPT-3 and Llama-2 as out-of-the-box zero-shot forecasters, a foundation model trained from scratch exclusively on time-series data can obtain much better zero-shot performance at a tiny fraction of their cost.
Existing works: …