Tables are ubiquitous in the geoscience industry, appearing in countless documents and spreadsheets. They hold a wealth of structured data that can help us understand the subsurface. However, the number of tables created over the years is huge, and having domain experts read each one to determine what kind of data it contains requires enormous manual effort. An automated approach would be far more efficient, but tables vary greatly in style and layout, which makes them difficult for a machine to interpret. A first step towards automatically extracting data from tables and spreadsheets is therefore to identify the role each cell plays, a task called table cell classification. In this work, we explore machine learning techniques for performing this task.
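To make the task concrete, the following is a minimal sketch of table cell classification on a toy table. It is not the method used in this work, which the abstract does not specify. The role labels ("header" and "data"), the hand-crafted features, the nearest-centroid classifier, and the example tables are all assumptions chosen purely for illustration.

```python
from collections import defaultdict


def cell_features(text, row, col):
    """Hand-crafted features for one cell (illustrative choices only)."""
    stripped = text.strip()
    n = len(stripped) or 1  # avoid division by zero for empty cells
    digits = sum(ch.isdigit() for ch in stripped)
    alphas = sum(ch.isalpha() for ch in stripped)
    return [
        digits / n,                 # fraction of digit characters
        alphas / n,                 # fraction of alphabetic characters
        1.0 if row == 0 else 0.0,   # cell sits in the first row
        1.0 if col == 0 else 0.0,   # cell sits in the first column
    ]


def fit_centroids(samples):
    """Average the feature vectors of each label: label -> centroid."""
    sums = {}
    counts = defaultdict(int)
    for feats, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, feats):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], feats))


# Hypothetical labelled table: first row is headers, the rest is data.
train_table = [
    ["Depth (m)", "Porosity", "Lithology"],
    ["1200", "0.21", "Sandstone"],
    ["1250", "0.18", "Shale"],
]
train_samples = [
    (cell_features(text, r, c), "header" if r == 0 else "data")
    for r, row in enumerate(train_table)
    for c, text in enumerate(row)
]
centroids = fit_centroids(train_samples)

# Hypothetical unseen table to classify cell by cell.
test_table = [
    ["Well", "Temp (C)"],
    ["A-1", "85.5"],
]
predictions = [
    [predict(centroids, cell_features(text, r, c)) for c, text in enumerate(row)]
    for r, row in enumerate(test_table)
]
print(predictions)
```

Real tables are far less regular than this toy case (multi-row headers, merged cells, notes, and units scattered across the sheet), which is why a learned model with richer features is likely to be needed in practice.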
Publications
EAGE - European Association of Geoscientists and Engineers
Authors
Chin Hang Lun, Thomas Hewitt, Song Hou