Aims Continuous observations of water, heat, and carbon fluxes measured by the eddy covariance technique are an important basis for accurately assessing regional carbon sequestration and water-holding capacity. However, gaps in flux datasets are frequent for various reasons, and differences among gap-filling methods increase the uncertainty of related studies. The aim of this study is to introduce the boosted regression trees (BRT) model, a recent machine learning algorithm, and to test its applicability to gap-filling flux datasets.
Methods Using a published valid dataset of water, heat, and CO2 fluxes and the main environmental factors (air temperature, atmospheric water vapor pressure, wind speed, solar shortwave radiation, topsoil temperature, and topsoil water content) from an alpine Potentilla fruticosa scrubland on the northeastern Qingzang Plateau for 2003 to 2005, BRT models were trained to fill gaps in the flux data, and the results were compared with the corresponding data series provided by the Chinese Flux Observation and Research Network (ChinaFLUX).
Important findings The BRT performed well with large sample sizes (N > 10 000): the regression slopes of observed against predicted values were between 1.01 and 1.05, with R2 > 0.80. The BRT revealed that the daytime 30-min CO2 flux (net ecosystem CO2 exchange, NEE) in the growing season (i.e., May to October) was mainly controlled by solar shortwave radiation and atmospheric vapor pressure, whose relative contributions to NEE variability were up to 74.7%. Topsoil temperature was the dominant control on NEE at night during the growing season and throughout the day during the non-growing season, with a relative contribution of 68.5%. The 30-min sensible heat flux (H) and latent heat flux (LE) were both linearly related to solar radiation, whose relative contributions were above 58.6%. The gap-filled 30-min flux values from the BRT were significantly lower than those from ChinaFLUX. Daily NEE did not differ significantly between the two methods (p = 0.14), whereas daily gross ecosystem CO2 exchange (GEE), ecosystem respiration (RES), H, and LE from the BRT were significantly lower than those from ChinaFLUX, by 17.5%, 21.0%, 2.7%, and 2.2%, respectively. Nevertheless, the daily fluxes for 2003-2005 interpolated by the BRT and by ChinaFLUX were reasonably consistent because the differences in magnitude were small (the regression slopes between the two data series ranged from 0.95 to 1.17). At the monthly scale, GEE and RES differed between the two methods, whereas NEE, H, and LE showed no significant difference (p > 0.09). Compared with the ChinaFLUX gap-filling method, the BRT can simulate nonlinear relationships between fluxes and environmental factors without complicated mathematical expressions and can quantify the relative contributions of environmental factors to flux variability, making it a feasible technique for the integrated analysis of flux data.
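
The general idea of filling flux gaps with boosted regression trees and reading off relative contributions can be illustrated with a minimal sketch. It is not the implementation used in this study: the data are synthetic, the trees are depth-1 stumps fitted to squared-error residuals, and the names `fit_stump`, `fit_brt`, `predict`, and `relative_contribution`, as well as the number of trees and the learning rate, are assumptions chosen for illustration. Relative contribution is approximated here as each driver's share of the total squared-error reduction.

```python
import math
import random

random.seed(0)

def fit_stump(X, residuals):
    """Best single-feature split of the residuals (largest squared-error reduction)."""
    n, total = len(X), sum(residuals)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_brt(X, y, n_trees=200, learning_rate=0.1):
    """Gradient boosting with stumps; hyperparameters are illustrative assumptions."""
    intercept = sum(y) / len(y)
    pred = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        _, j, thr, left, right = stump
        for i, row in enumerate(X):
            pred[i] += learning_rate * (left if row[j] <= thr else right)
        stumps.append(stump)
    return intercept, learning_rate, stumps

def predict(model, row):
    intercept, lr, stumps = model
    return intercept + sum(lr * (l if row[j] <= thr else r) for _, j, thr, l, r in stumps)

def relative_contribution(model, n_features):
    """Percentage of the total squared-error reduction attributed to each driver."""
    gains = [0.0] * n_features
    for gain, j, _, _, _ in model[2]:
        gains[j] += gain
    return [100.0 * g / sum(gains) for g in gains]

# Synthetic half-hourly records (NOT the study's data):
# radiation, topsoil temperature, and one irrelevant driver.
X, y = [], []
for _ in range(600):
    rad = random.uniform(0.0, 1000.0)
    tsoil = random.uniform(-5.0, 15.0)
    other = random.uniform(0.0, 1.0)
    flux = -10.0 * rad / (rad + 300.0) + 0.5 * math.exp(0.1 * tsoil) + random.gauss(0.0, 0.3)
    X.append([rad, tsoil, other])
    y.append(flux)

# Punch artificial gaps so filled values can be checked against withheld observations.
gap = [random.random() < 0.2 for _ in y]
model = fit_brt([x for x, g in zip(X, gap) if not g],
                [yi for yi, g in zip(y, gap) if not g])
filled = [predict(model, x) if g else yi for x, yi, g in zip(X, y, gap)]

obs = [yi for yi, g in zip(y, gap) if g]
est = [fi for fi, g in zip(filled, gap) if g]
mean_obs = sum(obs) / len(obs)
r2 = 1.0 - sum((o - e) ** 2 for o, e in zip(obs, est)) / sum((o - mean_obs) ** 2 for o in obs)
shares = relative_contribution(model, 3)
print("R2 on artificial gaps:", round(r2, 3))
print("relative contributions (%):", [round(s, 1) for s in shares])
```

Withholding known observations as artificial gaps is one simple way to check a gap-filling model, since filled values can then be compared directly with measurements. A production analysis would more likely rely on an established library, deeper trees, and cross-validated hyperparameters.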