Biodiversity Information Science and Standards, 2018
Quentin Groom, Henry Engledow, Ann Bogaerts, Nuno Veríssimo Pereira, Sofie De Smedt
LicenseType:Unknown |
Many, if not most, countries have several official or widely used languages, and most, if not all, of these countries have herbaria. Furthermore, specimens have been exchanged between herbaria in many countries, so herbaria are often multilingual collections. It is therefore useful to have label transcription systems that can attract users proficient in a wide variety of languages. Belgium is a typical multilingual country at the boundary between the Romance and Franconian languages (French, Dutch & German). Yet there are currently few non-English transcription platforms for citizen science. This is why in Belgium we built DoeDat, based on the DigiVol system of the Atlas of Living Australia. We will demonstrate DoeDat and its multilingual features. We will explain how we enter translations, both for the user interface and for the dynamic parts of the website. We will share our experiences of running a multilingual site and the challenges it brings. Translating and running such a website requires skilled personnel and patience. However, our experience has been positive, and the number and quality of our volunteer transcriptions have been rewarding. We look forward to the further use of DoeDat to transcribe data in many other languages. There is no longer any reason to exclude willing volunteers, whatever their language.
Biodiversity Information Science and Standards, 2018
Henry Engledow, Sofie De Smedt, Ann Bogaerts, Quentin Groom
LicenseType:Unknown |
There are many ways to capture data from herbarium specimen labels. Here we compare the results of in-house versus out-sourced data transcription, with the aim of evaluating the pros and cons of each approach and guiding future projects that want to do the same. In 2014 Meise Botanic Garden (BR) embarked on a mass digitization project. We digitally imaged some 1.2 million herbarium specimens from our African and Belgian Herbaria. The minimal data for a third of these images were transcribed in-house, while the remainder was out-sourced to a commercial company. The minimal data comprised the following fields: specimen’s herbarium location, barcode, filing name, family, collector, collector number, country code and phytoregion (for the Democratic Republic of Congo, Rwanda & Burundi). The out-sourced data capture consisted of three types: (1) additional label information for central African specimens having minimal data; (2) complete data for the remaining African specimens; and (3) species filing name information for African and Belgian specimens without minimal data. As part of the preparation for out-sourcing, a strict protocol had to be established, setting out the criteria for acceptable data quality levels. Several lookup tables for data entry also had to be created to improve data quality. During the start-up phase all the data were checked, feedback was given, compromises were made and the protocol was amended. After this phase, an agreed-upon subsample of each batch was quality controlled. If the error score exceeded the agreed level, the batch was returned for retyping. The data underwent three quality control checks during the process: by the data capturers, by the contractor’s project managers and by ourselves. Data quality was analysed and compared between the in-house and out-sourced modes of data capture. The error rates of our staff and the external company were comparable. The types of error that occurred were often linked to the specific field in question.
These errors included problems of interpretation, legibility and foreign languages, as well as typographic errors. A significant amount of data cleaning and post-capture processing was required prior to import into our database, despite the data being of good quality according to the protocol (error < 1%). Improving the workflow and field definitions could notably reduce the effort needed in the “data cleaning” phase. The initial motivation for capturing some data in-house was financial. However, after analysis, this may not have been the most cost-effective approach. Many lessons have been learned from this first mass digitization project that will be implemented in similar projects in the future.
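The batch acceptance step described in this abstract (quality control a subsample, return the batch if the error score exceeds the agreed level) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the protocol actually used: the function name `batch_passes_qc`, the `has_error` checker, the sample size and the way the error score is computed (fraction of sampled records with at least one error) are all hypothetical. The 1% default merely echoes the "error < 1%" figure quoted above.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def batch_passes_qc(
    records: Sequence[object],
    has_error: Callable[[object], bool],
    sample_size: int,
    max_error_rate: float = 0.01,  # assumption: mirrors the "error < 1%" figure
    seed: Optional[int] = None,
) -> Tuple[bool, float]:
    """Check a random subsample of a transcribed batch.

    Returns (passes, observed_error_rate). A batch that does not pass
    would be sent back for retyping. All names here are hypothetical.
    """
    if not records:
        raise ValueError("batch is empty")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")

    rng = random.Random(seed)
    # Never try to sample more records than the batch contains.
    k = min(sample_size, len(records))
    sample = rng.sample(list(records), k)

    # Assumed error score: share of sampled records flagged by the checker.
    errors = sum(1 for record in sample if has_error(record))
    rate = errors / k

    # The batch is rejected only when the rate exceeds the agreed level.
    return rate <= max_error_rate, rate
```

In practice `has_error` would stand in for a human reviewer comparing the transcription against the label image, and the subsample size would be whatever the contract parties agreed on.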
Biodiversity Information Science and Standards, 2018
Sofie De Smedt, Ann Bogaerts, Quentin Groom, Henry Engledow
LicenseType:Unknown |
The botanicalcollections.be website (http://www.botanicalcollections.be) is the culmination of the three-year Digitale Ontsluiting Erfgoedcollecties (DOE!) project. Over this period we have digitally imaged 1.2 million African and Belgian herbarium specimens and captured much of their label data. All these data are freely available on our new virtual herbarium, www.botanicalcollections.be. For this we have to thank a generous grant from the Flemish Government. The site was officially launched on 23 March 2018, at the Fourth Annual Meeting of Plant Ecology and Evolution held at Bouchout Castle in Meise Botanic Garden (https://sites.google.com/plantentuinmeise.be/ampee4/). Before developing the website we conducted a user requirements analysis (Vissers et al. 2017). These requirements formed the basis for development, from initial design to the finished product. Many features were incorporated to make the site as user-friendly and usable as possible: persistent URIs, zoomable and downloadable images, and access to data. Each specimen can be annotated and is available in a machine-readable format. The goal of the botanicalcollections.be website is not only to make digitized specimens from the Botanic Garden available, but also to centralize and display the herbarium specimens from other Belgian herbaria. A cooperation agreement will make collaboration easy and transparent. The benefits to herbaria of participating in this virtual herbarium include greater publicity, the ability to show how their specimens contribute to overall knowledge, and a mechanism for identifying where to focus future collecting efforts, all of which help validate their worth to institutional administrators. In addition, such cooperation helps build professional relationships between people who, because of disparate interests and obligations, might not normally connect with each other.
Biodiversity Information Science and Standards, 2018
Quentin Groom, Sofie De Smedt, Nuno Veríssimo Pereira, Ann Bogaerts, Henry Engledow
LicenseType:Unknown |
Herbarium specimens hold a wealth of data about plants: where they come from, where they were collected and by whom. Once digitized, these data can be searched, mapped and compared. However, the information on specimens is often handwritten and even the best software systems cannot read it. This is where we get real value from citizen involvement. Digitizing these data is only possible with the aid of human intelligence. DoeDat is a multilingual open-source platform for transcription, based upon the DigiVol program of the Australian Museum and Atlas of Living Australia. DoeDat is a product of our digitization project Digital Access to Cultural Heritage Collections (DOE!), funded by the Flemish Government. DoeDat is about creating data; ‘Doe Dat’ also means ‘do that’ in Dutch. DoeDat will help us digitize our collections, and will also give the public the chance to take an active part in the process. We aim to build a community of enthusiastic online volunteers who will help us liberate botanical data from specimen labels and documents. We launched the platform on Science Day and within two months, more than one hundred volunteers had transcribed more than 4,000 specimens.
Biodiversity Information Science and Standards, 2018
Henry Engledow, Sofie De Smedt, Quentin Groom, Ann Bogaerts, Piet Stoffelen, Marc Sosef, Paul Van Wambeke
LicenseType:Unknown |
Mass digitization is a large undertaking for a collection. It is disruptive of routine and can challenge long-held practices. Having been through the procedure and survived, we feel we have a lot of experience to share with other institutions that are considering taking on this challenge. The changes that digitization has made to our institution are positive and the digitization was a success, but that is not to say that we would not have done some things differently, were we to repeat the exercise. In 2015 Meise Botanic Garden received a grant from the Flemish Government to upgrade its digitization infrastructure and mass digitize 1.2 million specimens from its African and Belgian Herbaria. The new infrastructure improved our workflow significantly, enabling us to digitize specimens five to ten times faster while also improving their quality. The mass digitization part of the project was split into two parts, imaging and transcription. The contract was awarded to Picturae, who started imaging in May 2016 using a conveyor belt installation. Prior to starting, a significant amount of preparation was required at the herbarium. Within one year, 1.2 million specimens were imaged. The images were captured as TIFF files and stored in triplicate at The Flemish Institute for Archiving (VIAA), while smaller derived JPEG 2000 and JPEG files were generated for day-to-day use. The second part of the project was label transcription. A third of the specimens were transcribed in-house to capture minimal data (barcode, filing name, collector, collector number & country of origin). This was partly done to reduce costs, but also allowed us to compare in-house to out-sourced transcription. Some 500,000 specimens were transcribed, either completely or partially, by Alembo (subcontracted by Picturae). The remaining 200,000 specimens from our Belgian Herbarium are being transcribed using crowdsourcing.
The latter is being realized through the citizen science platform DoeDat (www.doedat.be), which was launched in November 2017. Many lessons have been learnt with respect to implementing mass digitization, both practically and sociologically. Many of the problems encountered during the project could have been avoided by changing the workflow. The addition of extra control points during the process could have reduced problems encountered later in the data capture process. Solving these problems at a later stage was time-consuming. Trying to “save money” can result in a disruptive workflow, which may lead to a number of costly errors. Mass digitization has fundamentally changed the workflow in our collections and the way in which our herbarium is managed. All images for the African and Belgian collections may now be found on our new virtual herbarium, www.botanicalcollections.be.