(Reference retrieved automatically from Web of Science, based on the FAPESP funding information and the corresponding grant number included in the publication by the authors.)

Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network

Author(s):
Rossi, Rafael Geraldeli [1] ; Lopes, Alneu de Andrade [1] ; Faleiros, Thiago de Paulo [1] ; Rezende, Solange Oliveira [1]
Total number of authors: 4
Author affiliation(s):
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP - Brazil
Total number of affiliations: 1
Document type: Scientific article
Source: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY; v. 29, n. 3, p. 361-375, MAY 2014.
Web of Science citations: 5
Abstract

Algorithms for numeric data classification have been applied to text classification. Text collections are usually represented with the vector space model, but characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can be used to represent text collections instead, avoiding the high sparsity and making it possible to model relationships among the different objects that compose a text collection. Such network-based representations can improve the quality of classification results. One of the simplest ways to represent a text collection as a network is a bipartite heterogeneous network, composed of objects that represent the documents connected to objects that represent the terms. Bipartite heterogeneous networks do not require computing similarities or relations among the objects and can be used to model any type of text collection. Given these advantages, in this article we present a text classifier that builds a classification model using the structure of a bipartite heterogeneous network. The algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent the terms for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms. (AU)
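
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published procedure: the function names, the whitespace tokenization, the learning rate, and the error-correction update rule are all assumptions made for illustration. The sketch links each document to its terms (edges weighted by term frequency) and then learns one weight per term and class, so that a document's score for a class is the frequency-weighted sum of its terms' weights.

```python
from collections import Counter


def build_bipartite_network(docs):
    """Return, for each document, its edges to term objects (term -> frequency)."""
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return [Counter(text.lower().split()) for text in docs]


def class_score(doc_edges, weights, cls):
    """Frequency-weighted sum of the term weights for one class."""
    return sum(freq * weights.get((term, cls), 0.0) for term, freq in doc_edges.items())


def induce_term_weights(edges, labels, classes, rate=0.1, epochs=50):
    """Learn one weight per (term, class) pair.

    Assumption: a simple error-correction rule is used here; the abstract
    does not specify how the weights are induced.
    """
    weights = {}
    for _ in range(epochs):
        for doc_edges, label in zip(edges, labels):
            for cls in classes:
                target = 1.0 if cls == label else 0.0
                error = target - class_score(doc_edges, weights, cls)
                # Push each connected term's weight in the direction that reduces the error.
                for term, freq in doc_edges.items():
                    key = (term, cls)
                    weights[key] = weights.get(key, 0.0) + rate * freq * error
    return weights


def classify(text, weights, classes):
    """Assign the class whose term weights give the highest score."""
    doc_edges = Counter(text.lower().split())
    return max(classes, key=lambda cls: class_score(doc_edges, weights, cls))


# Hypothetical toy collection, for illustration only.
train_docs = [
    "goal match team",
    "team goal score",
    "stock market price",
    "market price trade",
]
train_labels = ["sports", "sports", "finance", "finance"]
class_names = ["sports", "finance"]

network = build_bipartite_network(train_docs)
term_weights = induce_term_weights(network, train_labels, class_names)
```

Because the induced model is only a table of term weights per class, a new document can be classified from its own terms without adding it to the network, which is what makes such a model inductive.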

FAPESP grant: 11/23689-9 - Propagation in bipartite graphs for topic extraction in data streams
Grantee: Thiago de Paulo Faleiros
Support type: Scholarships in Brazil - Doctorate
FAPESP grant: 11/19850-9 - Hierarchical clustering methods for automatic organization of search engine results
Grantee: Solange Oliveira Rezende
Support type: Research Grants - Regular
FAPESP grant: 11/12823-6 - Extracting patterns from textual document collections using heterogeneous networks
Grantee: Rafael Geraldeli Rossi
Support type: Scholarships in Brazil - Doctorate