Published in Proceedings of the 30th Conference on Intelligent Systems for Molecular Biology, Madison, USA, July 10-14, 2022
Motivation: Literature-based Gene Ontology Annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of inconsistencies between the literature cited as evidence and the annotated GO terms can be identified, and these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and cannot keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This paper presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection.
Results: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported.
Conclusion: This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows.
Recommended citation: Jiyu Chen, Benjamin Goudey, Justin Zobel, Nicholas Geard, and Karin Verspoor. "Exploring Automatic Inconsistency Detection for Literature-based Gene Ontology Annotation." In Proceedings of the 30th Conference on Intelligent Systems for Molecular Biology, Madison, USA, July 10-14, 2022. https://www.iscb.org/cms_addon/conferences/ismb2022/proceedings.php/