Description

Abstract: Quality assurance of biomedical terminologies such as the National Cancer Institute (NCI) Thesaurus is an essential part of the terminology management lifecycle. We investigate a structural-lexical approach based on non-lattice subgraphs to automatically identify missing hierarchical relations and missing concepts in the NCI Thesaurus. We mine six structural-lexical patterns exhibited in non-lattice subgraphs: containment, union, intersection, union-intersection, inference-contradiction, and inference-union. Each pattern indicates a specific type of potential error and suggests a corresponding type of remediation. We found 809 non-lattice subgraphs with these patterns in the NCI Thesaurus (version 16.12d). Domain experts evaluated a random sample of 50 small non-lattice subgraphs, of which 33 were confirmed to contain errors for which the suggested remediations were correct (33/50 = 66%). Of the 25 evaluated subgraphs exhibiting multiple patterns, 22 were verified correct (22/25 = 88%). These results indicate that our structural-lexical-pattern-based approach is effective in detecting errors and suggesting remediations in the NCI Thesaurus.

Learning Objective 1: Mine lexical and structural patterns among the concepts of non-lattice subgraphs in biomedical terminologies to identify certain types of errors and suggest remediations.
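
As a rough illustration of the structural side of the approach, the sketch below flags non-lattice pairs in a toy IS-A hierarchy. It assumes a definition the abstract does not spell out: two concepts form a non-lattice pair when they share more than one maximal common descendant. The hierarchy, the concept names, and the function names are hypothetical, and this is not the authors' implementation. The six lexical patterns are not covered here.

```python
from itertools import combinations

# Hypothetical toy hierarchy: each concept maps to its direct parents (IS-A).
PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"C"},
}


def strict_descendants(node, children):
    """Return every concept strictly below `node` in the hierarchy."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def non_lattice_pairs(parents):
    """Map each non-lattice pair to its maximal common descendants.

    Assumed definition: a pair is non-lattice when it has more than one
    maximal lower bound (a concept counts as a lower bound of itself).
    """
    # Invert the parent map so the hierarchy can be walked downward.
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)

    below = {n: strict_descendants(n, children) for n in parents}
    result = {}
    for a, b in combinations(sorted(parents), 2):
        # Lower bounds shared by a and b, including a or b themselves.
        common = (below[a] | {a}) & (below[b] | {b})
        # Keep only those not lying under another shared lower bound.
        maximal = {c for c in common if not any(c in below[o] for o in common)}
        if len(maximal) > 1:
            result[(a, b)] = maximal
    return result


if __name__ == "__main__":
    print(non_lattice_pairs(PARENTS))
```

In this toy hierarchy, A and B share the descendants C, D, and E; C and D are both maximal, so (A, B) is flagged. Under the approach described in the abstract, a pair like this would be the starting point for building a non-lattice subgraph and checking it against the lexical patterns.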

Authors:

Rashmie Abeysinghe (Presenter)
University of Kentucky

Michael Brooks, University of Kentucky
Jeffery Talbert, University of Kentucky
Licong Cui, University of Kentucky
