Abstract: Annotated biomarker data used to develop natural language processing and machine learning algorithms can improve system performance and ultimately improve patient outcomes. However, the annotation process for developing a validation benchmark for these systems is often time-consuming and resource-intensive. We aim to develop a metric that estimates these two constraints, time and resources, and allows for more efficient resource allocation to accelerate knowledge discovery. The metric incorporates only variables that are known prior to annotation: token counts and metadata field counts.

Learning Objective 1: Formulate a predictive metric that forecasts the level of effort for an annotation task, allowing for efficient allocation of organizational resources.


Glenn Abastillas (Presenter)
Information Management Services, Inc.

Spencer Morris, National Cancer Institute
Jessica Boten, National Cancer Institute
Tumenbayar Tumurchudur, Information Management Services, Inc.
Kinjal Vora, Vasta Global
Paul Fearn, National Cancer Institute
