A Study on Automatic Chinese Text Classification

http://hdl.handle.net/10076/12727
a8390755-3d80-45c4-bcb6-503d6c397135
File: 2010M251.pdf (1.6 MB)
Item type: Thesis or Dissertation (1)
Release date: 2013-06-11
Title
Title: A Study on Automatic Chinese Text Classification
Language: en
Language
Language: eng
Resource type
Resource type identifier: http://purl.org/coar/resource_type/c_46ec
Resource type: thesis
Author: LUO, XI
Abstract
Description type: Abstract
Description: Automatic text classification (ATC) is the task of automatically assigning one or more appropriate categories to a document according to its content or topic. Traditionally, text classification has been carried out by human experts, because it requires a certain level of vocabulary recognition and knowledge processing. With the rapid growth of digital text and online information, text classification has become an important research area, driven by the need to handle and organize text collections automatically. Its applications are manifold, including automatic indexing for information retrieval systems, document organization, text filtering, spam filtering, and even hierarchical categorization of web pages. Many standard machine learning techniques have been applied to automatic text classification, and k-nearest neighbor (kNN) and support vector machines (SVM) have been reported as the top-performing methods for English text classification. In Chinese text classification, however, perfect precision cannot be reached, and the errors inherent in word segmentation remain a persistent problem. The purpose of this research is to evaluate the effectiveness of feature extraction, feature transformation, and dimension reduction techniques, and to improve the accuracy of Chinese text classification using various techniques. In this thesis, we represent documents by N-gram frequency features (uni-gram, bi-gram, and mixed uni-gram/bi-gram) instead of word frequency features, and propose using mixed uni-gram/bi-gram features after feature transformation. We further propose a serial approach based on feature transformation and dimension reduction to improve performance. We then compare the results of three different types of SVM kernel functions.
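To make the N-gram representation concrete, here is a minimal sketch, assuming character-level N-grams taken directly from unsegmented Chinese text (which is how N-gram features sidestep word-segmentation errors). It builds uni-gram, bi-gram, or mixed uni-gram/bi-gram frequency vectors and evaluates three kernel functions on them. The abstract does not name the three SVM kernels that were compared; linear, polynomial, and RBF are assumed here only because they are common choices, and the parameter defaults (`degree`, `coef0`, `gamma`) are hypothetical. The thesis's feature transformation and dimension reduction steps are not reproduced; raw counts are used instead.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams of length n (whitespace is dropped first).

    No word segmentation is performed: single characters or pairs of
    adjacent characters are used directly as feature terms.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, mode="mixed"):
    """Sparse frequency vector (term -> count) of uni-grams, bi-grams, or both."""
    if mode == "uni":
        grams = char_ngrams(text, 1)
    elif mode == "bi":
        grams = char_ngrams(text, 2)
    elif mode == "mixed":
        grams = char_ngrams(text, 1) + char_ngrams(text, 2)
    else:
        raise ValueError("mode must be 'uni', 'bi', or 'mixed'")
    return Counter(grams)


def dot(u, v):
    """Inner product of two sparse frequency vectors."""
    return sum(count * v.get(term, 0) for term, count in u.items())


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    # Assumed kernel; degree and coef0 are illustrative, not from the thesis.
    return (dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.1):
    # Assumed kernel; gamma is illustrative, not from the thesis.
    terms = set(u) | set(v)
    sq_dist = sum((u.get(t, 0) - v.get(t, 0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)


# Usage: two short, made-up Chinese snippets as mixed uni/bi-gram vectors.
doc_a = ngram_features("中文文本分类")
doc_b = ngram_features("文本分类方法")
similarity = linear_kernel(doc_a, doc_b)  # shared terms: 文 本 分 类 文本 本分 分类
```

In a full pipeline these kernel values would be supplied to an SVM solver; the sketch stops at the feature and kernel level so it stays dependency-free.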
Experimental results show that the proposed approach is efficient and effective in improving the performance of Chinese text classification. Furthermore, we propose a novel feature selection method based on part-of-speech analysis. Based on the composition of Chinese texts, we use the part-of-speech (POS) attributes of words to filter out many meaningless features. The results show that a suitable combination of parts of speech can lead to better classification performance.
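The POS-based feature selection can be sketched as follows, assuming the text has already been segmented and tagged by some Chinese POS tagger (not shown). The tag set below is hypothetical and illustrative, and the abstract does not say which POS combination performed best, so the sketch only filters tokens by a chosen tag set and enumerates candidate combinations that would each be scored with a classifier (also not shown).

```python
from collections import Counter
from itertools import combinations


def pos_filter(tagged_tokens, keep_tags):
    """Count only words whose POS tag is in keep_tags; all others are discarded."""
    return Counter(word for word, tag in tagged_tokens if tag in keep_tags)


def candidate_tag_sets(tags):
    """Every non-empty combination of POS tags, smallest combinations first."""
    ordered = sorted(tags)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


# Hypothetical segmenter/tagger output. Illustrative tags:
# n = noun, v = verb, r = pronoun, u = particle, w = punctuation.
tokens = [("我们", "r"), ("研究", "v"), ("中文", "n"), ("文本", "n"),
          ("的", "u"), ("分类", "v"), ("。", "w")]

# Keep nouns and verbs; the pronoun, particle, and punctuation are dropped.
features = pos_filter(tokens, {"n", "v"})
```

Enumerating combinations exhaustively is only practical for a small set of tags; with a large tag set, a greedy search over tags would be a cheaper alternative.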
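Description
Description type: Other
Description: Master's Program (first stage of the doctoral program), Division of Information Engineering, Graduate School of Engineering, Mie University
Description
Description type: Other
Description: 4, 28
Bibliographic information
Date of issue: 2011-01-01
Format
Description type: Other
Description: application/pdf
Author version flag
Version type: VoR
Version type resource: http://purl.org/coar/version/c_970fb48d4fbd8a85
Publisher
Publisher: Mie University (三重大学)
Master's thesis supervisor
Contributor identifier scheme: WEKO
Contributor identifier: 22700
Name: Kimura, Fumitaka
Language: en
Resource type (Mie University)
Value: Master's Thesis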
Powered by WEKO3