Study and development of methods for automatic detection and correction of errors and inconsistencies in syntactically annotated corpora

Grant number: 13/18090-6
Support type: Scholarships in Brazil - Post-Doctorate
Effective date (Start): January 01, 2014
Effective date (End): July 31, 2016
Field of knowledge: Interdisciplinary
Principal Investigator: Charlotte Marie Chambelland Galves
Grantee:
Home Institution: Instituto de Estudos da Linguagem (IEL). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated research grant: 12/06078-9 - Portuguese in time and space: linguistic contact, grammars in competition and parametric change, AP.TEM
Associated scholarship(s): 14/17172-1 - Study and application of explicit grammatical formalisms in the detection of inconsistency in treebanks, BE.EP.PD
Abstract
The main goal of this research project is to study, apply, and further develop computational methods for the automatic detection and correction of errors and inconsistencies in syntactically annotated corpora (treebanks), such as the method proposed by Kato & Matsubara (2010), which is based on Synchronous Tree Substitution Grammar (Shieber & Schabes, 1990). The proposal is tied to the research project "Portuguese in time and space: linguistic contact, grammars in competition and parametric change" (Thematic Grants, FAPESP 12/06078-9) and will contribute to its efforts, particularly those aimed at extending and consolidating the Tycho Brahe Corpus (CTB), since the method to be developed will be evaluated on and applied to that corpus. The expected results are: (i) a new method for the automatic detection and correction of inconsistencies that outperforms current state-of-the-art methods; (ii) the incorporation of the method into the syntactic annotation process and its application to the CTB, so that a revised version of the current corpus can be made available; and (iii) an update of the manual annotation guidelines, based on an analysis of the most recurrent types of errors detected by the method, in order to better train the team of annotation reviewers. (AU)