Tokenizing a piece of text with Lucene 3 and IKAnalyzer

Posted by houtizong, 3 years ago
```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Tokenizes a piece of text with IKAnalyzer.
 * @author Administrator
 */
public class IkAnalyzerWord {

    private String resource;
    private List<String> result = new ArrayList<String>();

    public IkAnalyzerWord(String resource) throws IOException {
        this.resource = resource;
        analyzer();
    }

    private void analyzer() throws IOException {
        Analyzer analyzer = new IKAnalyzer();
        TokenStream ts = analyzer.tokenStream("*", new StringReader(resource));
        // addAttribute(Class) returns the instance already registered in this
        // AttributeSource, or creates, registers and returns a new one.
        // TermAttribute is the Lucene 3.0 API (deprecated in 3.1 in favor of CharTermAttribute).
        TermAttribute ta = ts.addAttribute(TermAttribute.class);
        try {
            ts.reset();
            // Each incrementToken() call updates the same TermAttribute instance in place.
            while (ts.incrementToken()) {
                result.add(ta.term()); // term() returns the token's term text
            }
            ts.end();
        } finally {
            ts.close();
        }
    }

    public List<String> getResult() {
        return this.result;
    }

    public static void main(String[] args) throws IOException {
        // "Today's strong wind has finally eased, but fog has rolled in again", written twice.
        IkAnalyzerWord ik = new IkAnalyzerWord("今天的大风终于小了,但是又起雾了今天的大风终于小了,但是又起雾了");
        System.out.println(ik.getResult());
    }
}
```
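To give a feel for what a dictionary-based Chinese analyzer does inside tokenStream(), here is a toy forward-maximum-matching segmenter in plain Java. This is a swapped-in, much simpler technique, not IKAnalyzer's actual algorithm; the class name ForwardMaxMatch, the maxLen parameter and the toy dictionary are all hypothetical, and the sketch assumes the text contains only BMP characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy segmenter using forward maximum matching.
 * Not IKAnalyzer's algorithm; for illustration only.
 */
public class ForwardMaxMatch {

    /** Greedily takes the longest dictionary word (up to maxLen chars) at each position. */
    public static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(i + Math.max(maxLen, 1), text.length());
            // Shrink the window until it is a dictionary word or a single character.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            String piece = text.substring(i, end);
            // Drop punctuation and whitespace; keep words and unknown single characters.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                tokens.add(piece);
            }
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary; IKAnalyzer ships its own, much larger one.
        Set<String> dict = new HashSet<>(Arrays.asList("今天", "大风", "终于", "小了", "但是", "起雾"));
        System.out.println(segment("今天的大风终于小了,但是又起雾了", dict, 4));
    }
}
```

With this toy dictionary it prints [今天, 的, 大风, 终于, 小了, 但是, 又, 起雾, 了]: characters not covered by any dictionary word (的, 又, 了) fall out as single-character tokens, and the comma is dropped.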


Output (I had configured a stopword dictionary):

[大风, 终于, 小了, 雾, 大风, 终于, 小了, 雾]
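Words such as 今天, 的 and 但是 are missing from the output above, presumably because the stopword dictionary removed them. The filtering step itself is simple. Here is a minimal sketch, assuming a stopword file with one word per line; the class StopwordFilterSketch, its methods and the token list are hypothetical, not IKAnalyzer's API or its real unfiltered output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stopword filter; not part of IKAnalyzer. */
public class StopwordFilterSketch {

    /** Parses dictionary text with one word per line (assumed format), skipping blank lines. */
    public static Set<String> parseStopwords(String dicContents) {
        Set<String> words = new HashSet<>();
        for (String line : dicContents.split("\\r?\\n")) {
            String w = line.trim();
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Keeps tokens not in the stopword set, preserving order and duplicates. */
    public static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stopword file contents, inlined so the sketch needs no files.
        Set<String> stop = parseStopwords("今天\n的\n但是\n又\n了\n");
        // Hypothetical unfiltered tokens, not IKAnalyzer's real segmentation.
        List<String> tokens = Arrays.asList("今天", "的", "大风", "终于", "小了", "但是", "又", "起雾", "了");
        System.out.println(removeStopwords(tokens, stop));
    }
}
```

This prints [大风, 终于, 小了, 起雾]. It does not reproduce the 雾 in the real output because the token list is made up; the point is only the filtering step. To read a real file, one could pass new String(Files.readAllBytes(path), StandardCharsets.UTF_8) to parseStopwords.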

Author: HouTiZong · 侯体宗的博客 (HouTiZong's Blog) · © 2020 zongscan.com