Our proposed cyber-innovated global partnerships consist of an integrated cyberinfrastructure (CI) and an inclusive global team of scientists and data contributors. Through four core functions, the CI supports FAIR access to in situ forest inventory data, advanced analysis tools and research codes, and a secure team collaboration platform. The team collaborations produce end-products (e.g., research papers, open-source data products, information co-production, and team growth), which feed back to increase the number of research projects and the resources available, ensuring healthy and sustainable ecosystem-wide growth. The four core functions are as follows (Figure 1):
[Data Ingestion] The stringent and complex multi-tier and multi-source data-sharing policies in forest sciences pose an obstacle to collaborative forest research. The CI incorporates multiple data-sharing policies – pre-defined by data contributors of local raw datasets – in a dynamic system so that these policies are enforced throughout the data lifecycle. The CI first collects local raw forest inventory data from around the world, and converts them into standard global datasets after error-checking and harmonization. According to the data-sharing policies pre-defined by data contributors, the CI distributes these global datasets to a number of ongoing research projects in the system. This data ingestion service enables data contributors to register and contribute data resources in the proposed CI, with a customizable data processing tool designed for both streaming and batch data ingestion.
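The error-checking and harmonization step of data ingestion can be illustrated with a minimal sketch. It assumes a hypothetical standard schema (`TreeRecord`), hypothetical raw field names (`plot`, `species`, `dbh`, `year`), and a small set of diameter units; none of these are taken from the CI itself, whose actual schema and checks may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unit conversions; real contributed data may use other units.
CM_PER_UNIT = {"cm": 1.0, "mm": 0.1, "in": 2.54}


@dataclass(frozen=True)
class TreeRecord:
    """One harmonized tree measurement in the (assumed) global schema."""
    plot_id: str
    species: str
    dbh_cm: float
    year: int


def harmonize(raw: dict, dbh_unit: str) -> Optional[TreeRecord]:
    """Error-check one raw row and convert it to the standard schema.

    Returns None when the row fails a check, so that the caller can report
    it back to the data contributor instead of silently dropping it.
    """
    if dbh_unit not in CM_PER_UNIT:
        raise ValueError(f"unknown DBH unit: {dbh_unit!r}")
    try:
        dbh_cm = float(raw["dbh"]) * CM_PER_UNIT[dbh_unit]
        year = int(raw["year"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric field
    species = str(raw.get("species", "")).strip()
    if dbh_cm <= 0 or not species or "plot" not in raw:
        return None  # implausible diameter or missing identifiers
    return TreeRecord(str(raw["plot"]), species, round(dbh_cm, 2), year)


# Two illustrative raw rows recorded in inches; the second has a negative DBH.
rows = [
    {"plot": "A1", "species": " Picea sitchensis ", "dbh": "12.0", "year": "2019"},
    {"plot": "A1", "species": "Tsuga heterophylla", "dbh": "-3", "year": "2019"},
]
clean = [r for r in (harmonize(row, "in") for row in rows) if r is not None]
```

In this sketch, rows that fail a check are returned as `None` rather than raising an error, which suits batch ingestion where a single bad row should not stop the whole upload.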
[Data Discovery and Publication] The data discovery service and associated web user interface enable data users in the CI to discover processed datasets that have been generated from the contributed datasets. Depending on the inferred privacy level of a particular processed dataset, users will either be able to visualize point-level, heat-map, and contour views on a map (for open datasets), or only basic metadata such as the data coverage and number of datasets used in generation (for restricted datasets). Research products and datasets that are generated by users from their analysis of these processed datasets can be made public with an associated DOI using the existing data publication process.
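The privacy-dependent discovery behavior described above can be sketched as follows. The rule used to infer the privacy level (a processed dataset is as restricted as its most restricted source) is an assumption made for illustration, as are all names in the code; the text does not specify how the CI infers the level, nor whether open datasets also show the basic metadata.

```python
from enum import IntEnum


class Privacy(IntEnum):
    OPEN = 0
    RESTRICTED = 1


# The three map views named in the text; the identifiers are hypothetical.
MAP_VIEWS = ("point", "heatmap", "contour")


def infer_privacy(source_levels):
    """Assumed rule: the strictest source dataset determines the level."""
    return max(source_levels, default=Privacy.RESTRICTED)


def discovery_view(coverage, source_levels):
    """Return what a data user may see for one processed dataset."""
    level = infer_privacy(source_levels)
    view = {
        "privacy": level.name.lower(),
        "coverage": coverage,
        "n_source_datasets": len(source_levels),
    }
    # Map visualizations are offered for open datasets only.
    if level is Privacy.OPEN:
        view["map_views"] = MAP_VIEWS
    return view


open_view = discovery_view("Europe", [Privacy.OPEN, Privacy.OPEN])
restricted_view = discovery_view("Europe", [Privacy.OPEN, Privacy.RESTRICTED])
```

Here `open_view` carries the map views alongside the basic metadata, while `restricted_view` carries the basic metadata only.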
[Project Management] Projects are the key mechanism for research collaboration and access control in the proposed CI. Users gain access to processed datasets by proposing a research project and requesting the datasets it needs. New projects can be created either via the web user interface or when selecting a processed dataset to use in a new project. When a processed dataset is requested, an approval request is sent to all the requisite contributors, who are identified from the dataset’s metadata. Contributors can review a project’s description before granting approval, and are automatically added as collaborators on the project once they approve. Once approval has been obtained from all the relevant contributors, the processed dataset is added to the project storage.
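The approval logic for a requested dataset can be sketched in a few lines. The class and attribute names below are hypothetical and the sketch keeps all state in memory; it only illustrates the two rules stated above, namely that approving contributors become collaborators and that the dataset reaches project storage only after every requisite contributor has approved.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRequest:
    dataset_id: str
    required: frozenset  # contributors who must all approve
    approved: set = field(default_factory=set)
    rejected: set = field(default_factory=set)

    def record(self, contributor: str, approve: bool) -> None:
        """Store one contributor's latest decision."""
        if contributor not in self.required:
            raise PermissionError(
                f"{contributor} is not a contributor to {self.dataset_id}"
            )
        (self.approved if approve else self.rejected).add(contributor)
        (self.rejected if approve else self.approved).discard(contributor)

    @property
    def granted(self) -> bool:
        return not self.rejected and self.approved == set(self.required)


@dataclass
class Project:
    name: str
    collaborators: set = field(default_factory=set)
    storage: set = field(default_factory=set)

    def apply(self, request: DatasetRequest) -> None:
        # Approving contributors join the project as collaborators.
        self.collaborators |= request.approved
        # The dataset is added only once every contributor has approved.
        if request.granted:
            self.storage.add(request.dataset_id)
```

A request with two requisite contributors therefore leaves `Project.storage` unchanged after the first approval and adds the dataset only after the second.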
[Data Analysis & Machine Learning Toolkit] Team-wide collaboration is supported by a secure project sandbox that applies machine learning (ML) models to a wide range of tasks such as visual search, predictor variable importance ranking, forecasting, and anomaly detection. Data analysis of the processed datasets will be carried out in an interactive computing environment (e.g., RStudio). The CI includes an ML toolkit that will make it easier for non-experts to understand and apply ML techniques such as classification, regression, prediction, and clustering to their research problems without having to write any code. The ML toolkit will also include ready-to-use models for widely used ML algorithms such as Convolutional Neural Networks (CNN), Support Vector Machines (SVM), k-Means, Decision Trees, and Random Forests.
Security and Privacy
The CI supergroups can restrict access to registered users who have also been approved as members of the supergroup by an administrator. In addition, access to each processed dataset will be controlled via a fine-grained access control policy based on pre-defined privacy levels. Existing security policy around project creation and team-member-based access will be employed to ensure that projects are accessible only to authorized team members. The CI’s plugin infrastructure supports the definition of custom event handlers for various project-related events such as project creation and project file addition. Custom event handlers will be developed to implement the approval chain required for access to a processed dataset in a project. Specifically, an email will be generated for each requisite data contributor, along with a secure URL that the contributor can visit to review the project’s description, approve or disapprove the dataset’s inclusion, and provide optional comments. Only after approval by all requisite data contributors will the processed dataset be made available in the project’s file storage.
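One way to picture a custom event handler that prepares the per-contributor review links is sketched below. The event name `dataset_requested`, the registry functions `on` and `emit`, and the base URL are all hypothetical stand-ins rather than the CI's plugin API, and the sketch only builds the messages; it does not send email or store tokens.

```python
import secrets
from collections import defaultdict
from typing import Callable

# Hypothetical in-memory registry: event name -> list of handlers.
_handlers = defaultdict(list)


def on(event: str) -> Callable:
    """Register a handler for a named project event."""
    def register(func: Callable) -> Callable:
        _handlers[event].append(func)
        return func
    return register


def emit(event: str, **payload) -> list:
    """Call every handler registered for the event and collect the results."""
    return [handler(**payload) for handler in _handlers[event]]


REVIEW_BASE = "https://ci.example.org/review"  # placeholder, not a real endpoint


@on("dataset_requested")
def notify_contributors(project: str, dataset: str, contributors: list) -> dict:
    """Build one review message per contributor, each with its own token."""
    messages = {}
    for email in contributors:
        token = secrets.token_urlsafe(32)  # unguessable random token
        messages[email] = {
            "subject": f"Review request: {dataset} in project {project}",
            "url": f"{REVIEW_BASE}/{token}",
        }
    return messages


results = emit(
    "dataset_requested",
    project="demo",
    dataset="gfi-subset",
    contributors=["ana@example.org", "ben@example.org"],
)
```

Giving each contributor a separate random token means that one contributor's link cannot be used to act on behalf of another; a real deployment would also need to store the tokens server-side and expire them.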
Integrity, Provenance, Transparency, Reproducibility, and Usability
Provenance will be captured for both the raw datasets and the processed data products in the CI and serialized as metadata attached to these files. Checksums will also be generated to enable future data validation as needed. The Schema.org metadata schema will be used to encode the file metadata and will include the data contributor information, contributor-specified data descriptions, as well as any derived spatio-temporal extents. Metadata on processed data products will be retained when copied to a project’s file storage, along with the necessary checksums. Generated data products, the corresponding analysis code and ML models, and their associated metadata can be published with an associated DOI and persistent URL to support discovery and reproducibility. Resource-intensive simulations can be off-loaded to high-performance computing (HPC) resources. Web access to projects, and scalable, containerized tools will ensure that users across the world have low-barrier access to research data contribution and usage.
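A minimal sketch of checksum and provenance metadata attached to a file is given below. It encodes the metadata as a Schema.org `Dataset` in JSON-LD form; the particular choice of properties (`creator`, `isBasedOn`, and a `DataDownload` distribution carrying `sha256`) is one plausible mapping assumed for illustration, not the encoding the CI necessarily uses, and the function names are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Checksum used for later validation of the file contents."""
    return hashlib.sha256(data).hexdigest()


def dataset_metadata(name, description, contributor, data, derived_from=()):
    """Describe a file as a Schema.org Dataset (JSON-LD as a Python dict)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Person", "name": contributor},
        # Provenance: names of the datasets this product was derived from.
        "isBasedOn": list(derived_from),
        "distribution": {"@type": "DataDownload", "sha256": sha256_hex(data)},
    }


def verify(data: bytes, metadata: dict) -> bool:
    """Check that the bytes still match the checksum stored in the metadata."""
    return sha256_hex(data) == metadata["distribution"]["sha256"]


payload = b"plot_id,species,dbh_cm\nA1,Picea sitchensis,30.48\n"
meta = dataset_metadata(
    "Example processed plots",
    "Illustrative processed data product",
    "Example Contributor",
    payload,
    derived_from=["Example raw inventory"],
)
serialized = json.dumps(meta, indent=2)
```

Because the checksum travels with the metadata, a copy placed in a project's file storage can be re-validated with `verify` at any later time.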
Knowledge Co-production Among Stakeholders
Knowledge co-production will feed back to the CI to ensure healthy and sustainable growth. In forest sciences, the CI will connect forest inventory personnel and policy makers under one roof, so that forest inventories funded by carbon initiatives such as REDD+ can be incorporated into forest tree monitoring. Meanwhile, the severe shortage of experts and facilities, especially in indigenous regions and low-income countries, poses a major hurdle for global monitoring of forest trees(9). The education and training provided through this research project, especially with indigenous communities, can bring tangible benefits to global forest tree monitoring while also improving local economies. Furthermore, the collective human experiences of rural communities embedded within these forested landscapes are strongly tied to the surrounding forest types. From the Sitka spruce–western hemlock forests in the Pacific Northwest to the tropical rainforests in Amazonia, climate-induced change in native forests is threatening the customs, identities, and culture of indigenous and other local communities, jeopardizing the supply of non-timber forest products and overall environmental justice(10). The multi-stakeholder team in the proposed CI co-produces knowledge and policies on climate-resilient forest management and conservation, which will help local communities adapt their cultural norms and relationships to the changes in surrounding forests.
With the proposed cyber-innovated global partnership model, we have established Science-i as a global research metaverse that seeks to accelerate forest science by empowering underrepresented communities in global research and knowledge co-production. The beta testing site went online in April 2022. To date, more than 320 members have registered from 55 countries (13 from Africa, 8 from Asia, 22 from Europe, 2 from Oceania, 3 from North America, and 6 from South America). The global member team of Science-i covers a broad range of expertise areas, including applied ecology (175), forestry (163), computer science (12), economics (8), anthropology (4), and many more. Fifty-five members are female, and 113 members are from developing countries. Drawing on this diverse and comprehensive global network team, which has contributed more than 200 in situ forest inventory datasets, we have compiled a ground-sourced global forest inventory database (GFI-3D), which consists of 1.5 million sample plots distributed across the global forest range (Figure 2).
Figure 2: GFI-3D consists of in situ tree-level measurements from 1.5 million forest sample plots. Each plot is underpinned by exhaustive community-wide tree survey records including tree species, status, expansion factor(27), measurement year, and diameter-at-breast-height, as well as key plot-level attributes. Global forest extent is indicated with green shading.
Utilizing the foregoing global database and a prototype CI based on our proposed cyber-innovated global partnership model, Science-i is currently supporting 14 forest research projects(12), spanning a broad spectrum of subject areas, including biogeography, macroecology, forestry, ecological modeling, and microbial ecology. Among these ongoing projects, ten are led by female principal investigators. All but one of the studies are led by graduate students, postdoctoral researchers, or other early-career scientists. It is particularly noteworthy that this article was developed with Science-i. From October 2022 to April 2023, the service team of Science-i hosted a series of global webinars (Globinar) in which invited guest speakers and hundreds of participants engaged in inclusive and constructive discussions on reducing inequality in forest research. Using the CI of Science-i, every author was able to contribute to this article throughout the entire life cycle of the project, from conceiving the idea to proofreading the text.
The cyberinfrastructure (CI) described here reshapes global partnership to reduce inequality in forest research through two features: user-specific, customizable data-sharing policies enforced throughout the data lifecycle, and transparent, real-time, worldwide co-production and cross-validation achieved by including data contributors in the research lifecycle. The CI will also significantly reduce the complexity of connecting workflows to the requisite data and technologies. The CI engages underrepresented communities, including undergraduate indigenous researchers, by granting them access to global research data, toolkits for data analyses, and a platform to co-produce and cross-validate research findings with experts worldwide. These benefits also extend to forest landowners and conservation and restoration professionals, helping them build global communities to find solutions that will significantly shape the future of forest ecosystems worldwide for climate change mitigation and poverty alleviation. The reduced inequality will also improve workforce diversity in forest research, which in turn increases scientific output: mounting evidence shows that the diversity of scientists underpins the quality and productivity of scientific research and elevates our ability to translate science to society and policy(13).
1. UN General Assembly, “Transforming our world: the 2030 Agenda for Sustainable Development,” (United Nations, New York, USA, 2015).
2. NSF, NSB, “Publications Output: U.S. Trends and International Comparisons,” (National Science Board, 2021).
3. J.-P. O. d. Sardan, Promouvoir la recherche face à la consultance. Autour de l’expérience du lasdel (Niger-Bénin). Cahiers d’études africaines 51, 511-528 (2011).
4. K. Hüfner, in A concise encyclopedia of the United Nations. (Brill Nijhoff, 2000), pp. 557-560.
5. K. Marou Sama, R. d’Aiglepierre, S. Botton, Recherches africaines et rôles de l’aide internationale: le cas des sciences sociales. Rapports techniques, (2019).
6. C. Doumenge, F. Palla, I. Madzous, G. Ludovic. (OFAC, 2021).
7. FAO, UNEP, “The State of the World’s Forests 2020. Forests, biodiversity and people,” (United Nations, Rome, Italy, 2020).
8. J. Liang, J. G. P. Gamarra, The importance of sharing global forest data in a world of crises. Scientific Data 7, 424 (2020).
9. E. O. Wilson, A Global Biodiversity Map. Science 289, 2279-2279 (2000).
10. J. Fleetwood, Social justice, food loss, and the sustainable development goals in the era of COVID-19. Sustainability 12, 5027 (2020).
11. J. Liang et al., Co-limitation towards lower latitudes shapes global forest diversity gradients. Nature Ecology & Evolution 8, 10.1038/s41559-022-01831-x (2022).
12. Science-i team. Science-i Ongoing Research Projects, <https://science-i.org/groups/> (Accessed April 2023).
13. J. L. Graves, M. Kearney, G. Barabino, S. Malcom, Inequality in science and the case for a new agenda. Proceedings of the National Academy of Sciences 119, e2117831119 (2022).