Caravan - A global community dataset for large-sample hydrology

This paper introduces the Caravan dataset, a global large-sample hydrology dataset that builds on cloud computing to be extensible by anyone.

Abstract

High-quality datasets are essential to support hydrological science and modeling. Several CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) datasets exist for specific countries or regions; however, these datasets lack standardization, which makes global studies difficult. This paper introduces a dataset called Caravan (a series of CAMELS) that standardizes and aggregates seven existing large-sample hydrology datasets. Caravan includes meteorological forcing data, streamflow data, and static catchment attributes (e.g., geophysical, sociological, climatological) for 2532 catchments. Most importantly, Caravan is both a dataset and open-source software that allows members of the hydrology community to extend the dataset to new locations by extracting forcing data and catchment attributes in the cloud. Our vision is for Caravan to democratize the creation and use of globally standardized large-sample hydrology datasets. Caravan is a truly global open-source community resource.

Paper

Kratzert, F., Nearing, G., Addor, N., Erickson, T., Gauch, M., Gilon, O., Gudmundsson, L., Hassidim, A., Klotz, D., Nevo, S., Shalev, G., and Matias, Y.: Caravan - A global community dataset for large-sample hydrology, EarthArxiv, https://doi.org/10.31223/X50S70, in review, 2022.

Contributing and further resources

Our main vision for Caravan is that the dataset will grow over time. Anyone with as little as streamflow records and catchment boundaries for one or more basins can contribute by extending Caravan to new regions. The GitHub repository is your main source of information if you would like to contribute to Caravan.

The discussion forum on GitHub also acts as a community hub to share news and updates on Caravan, and for anyone to share extensions of Caravan to new regions.
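
As a rough illustration of the minimum inputs named above (streamflow records and catchment boundaries), the following Python sketch checks that both are present for a basin before a contribution is prepared. It is not part of the Caravan code: the `BasinContribution` class, its field names, and the `check_contribution` function are hypothetical, and the actual file formats and requirements are described in the GitHub repository.

```python
from dataclasses import dataclass


@dataclass
class BasinContribution:
    """Hypothetical container for the minimum inputs of one basin."""
    basin_id: str
    streamflow: dict[str, float]         # ISO date string -> streamflow value
    boundary: list[tuple[float, float]]  # (longitude, latitude) polygon vertices


def check_contribution(basin: BasinContribution) -> list[str]:
    """Return a list of problems; an empty list means both inputs are present."""
    problems = []
    if not basin.streamflow:
        problems.append("no streamflow records")
    # A polygon needs at least three vertices to enclose an area.
    if len(basin.boundary) < 3:
        problems.append("boundary needs at least three vertices")
    return problems
```

A basin with at least one streamflow record and a boundary of three or more vertices yields an empty list; anything else yields a short description of what is missing.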

Code to produce the dataset

Static snapshot of the dataset at the time of the submission

Citation

@article{kratzert2022caravan,
author = {Kratzert, F. and Nearing, G. and Addor, N. and Erickson, T. and Gauch, M. and Gilon, O. and Gudmundsson, L. and Hassidim, A. and Klotz, D. and Nevo, S. and Shalev, G. and Matias, Y.},
title = {Caravan - A global community dataset for large-sample hydrology},
journal = {EarthArxiv},
year = {2022},
doi = {10.31223/X50S70}
}