Active learning strategies for extracting phrase-level topics from scientific literature

Tao Yue*, Yu Li, Zhang Runjie

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

2 引用 (Scopus)

摘要

[Objective] This paper explores methods of extracting information from scientific literature with the help of active learning strategies, aiming to address the issue of lacking annotated corpus. [Methods] We constructed our new model based on three representative active learning strategies (MARGIN, NSE, MNLP) and one novel LWP strategy, as well as the neural network model (namely CNN-BiLSTM-CRF). Then, we extracted the task and method related information from texts with much fewer annotations. [Results] We examined our model with scientific articles with 10%~30% selectively annotated texts. The proposed model yielded the same results as those of models with 100% annotated texts. It significantly reduced the labor costs of corpus construction. [Limitations] The number of scientific articles in our sample corpus was small, which led to low precision issues. [Conclusions] The proposed model significantly reduces its reliance on the scale of annotated corpus. Compared with the existing active learning strategies, the MNLP yielded better results and normalizes the sentence length to improve the model’s stability. In the meantime, MARGIN performs well in the initial iteration to identify the low-value instances, while LWP is suitable for dataset with more semantic labels.

源语言英语
页(从-至)134-143
页数10
期刊Data Analysis and Knowledge Discovery
4
10
DOI
出版状态已出版 - 2020
已对外发布

指纹

探究 'Active learning strategies for extracting phrase-level topics from scientific literature' 的科研主题。它们共同构成独一无二的指纹。

引用此