straditize: Digitizing stratigraphic diagrams

Type of publication Peer-reviewed
Publication form Original article (peer-reviewed)
Author Sommer Philipp, Rech Dilan, Chevalier Manuel, Davis Basil
Project HORNET Holocene Climate Reconstruction for the Northern Hemisphere Extra-tropics

Journal Journal of Open Source Software
Volume (Issue) 4(34)
Page(s) 1216 - 1216
Title of proceedings Journal of Open Source Software
DOI 10.21105/joss

Open Access

Type of Open Access Publisher (Gold Open Access)


In an age of digital data analysis, gaining access to data from the pre-digital era (or any data that is only available as a figure on a page) remains a problem, and such data remain an underutilized scientific resource. Whilst there are numerous programs available that allow the digitization of scientific data in a simple x-y graph format, we know of no semi-automated program that can deal with data plotted with multiple horizontal axes that share the same vertical axis, such as pollen diagrams (see image below) and other stratigraphic figures that are common in the Earth sciences. Straditize (Stratigraphic Diagram Digitizer) (Sommer, 2019) fills this gap. It is an open-source program that allows stratigraphic figures to be digitized in a single semi-automated operation. It is designed to detect multiple plots of variables analyzed along the same vertical axis, whether this is a sediment core or any similar depth/time series. The program supports mixtures of many different diagram types, such as bar plots and line plots, as well as shaded, stacked, and filled area plots. Other features of straditize include text recognition to interpret the names of the different plotted variables, the automatic and semi-automatic recognition of picture artifacts, as well as an automatic measurement finder that reproduces exactly the data used to create the diagram. Straditize is written in the programming language Python and is available for Windows, Linux and macOS. Implemented in the visualization framework psyplot (Sommer, 2017), it provides an extensively documented graphical user interface for point-and-click handling of the semi-automatic process, but it can also be scripted or used from the command line. The visualization is based on matplotlib (Hunter, 2007), and most of the detection algorithms use image recognition functions from the scikit-image package (Walt et al., 2014) and numeric routines from scipy (Jones, Oliphant, Peterson, & others, 2001) and numpy (T. E. Oliphant, 2006).
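To make the idea of detecting several plots along one shared vertical axis more concrete, the following is a minimal sketch in Python using only numpy. It is not straditize's own algorithm or API. The function names `find_columns` and `digitize` are hypothetical. The sketch assumes an already binarized image in which the sub-diagrams are filled area plots anchored at their left edge and separated by fully blank pixel columns.

```python
import numpy as np


def find_columns(binary):
    """Return (start, end) pixel ranges of sub-diagrams.

    Hypothetical helper: sub-diagrams are assumed to be separated by
    pixel columns that contain no filled pixel at all.
    """
    occupied = binary.any(axis=0)
    # Pad with False so that every run has a detectable start and end.
    padded = np.concatenate(([False], occupied, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def digitize(binary):
    """Count the filled pixels per row for each detected sub-diagram.

    For a filled area plot anchored at its left edge, this count is the
    plotted value in pixel units (assumption: no gaps within a row).
    """
    return [
        binary[:, start:end].sum(axis=1)
        for start, end in find_columns(binary)
    ]


# Synthetic image: 4 depth levels (rows), two plots with a blank gap between.
widths_a = [1, 3, 4, 2]
widths_b = [2, 1, 0, 3]
image = np.zeros((4, 10), dtype=bool)
for row, (a, b) in enumerate(zip(widths_a, widths_b)):
    image[row, 0:a] = True
    image[row, 6:6 + b] = True

columns = find_columns(image)
values = digitize(image)
```

Here the rows play the role of the shared vertical axis (depth or time), and each detected column range corresponds to one plotted variable. Converting pixel counts to data units would additionally require the axis ranges, which this sketch leaves out.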