Tokenizing a piece of text with Lucene 3 and IKAnalyzer
Programming Techniques / posted by houtizong 3 years ago
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Tokenizes a piece of text.
 * @author Administrator
 */
public class IkAnalyzerWord {
    private String resource;
    private List<String> result = new ArrayList<String>();

    public IkAnalyzerWord(String resource) throws IOException {
        this.resource = resource;
        analyzer();
    }

    private void analyzer() throws IOException {
        Analyzer analyzer = new IKAnalyzer();
        TokenStream ts = analyzer.tokenStream("*", new StringReader(resource));

        // public <A extends Attribute> A addAttribute(Class<A> attClass)
        // The caller must pass in a Class<? extends Attribute> value.
        // This method first checks if an instance of that class is already in this
        // AttributeSource and returns it. Otherwise a new instance is created,
        // added to this AttributeSource and returned.
        ts.addAttribute(TermAttribute.class);

        while (ts.incrementToken()) {
            // public <A extends Attribute> A getAttribute(Class<A> attClass)
            // The caller must pass in a Class<? extends Attribute> value.
            // Returns the instance of the passed in Attribute contained in this AttributeSource.
            TermAttribute ta = ts.getAttribute(TermAttribute.class);

            // term() returns the Token's term text.
            result.add(ta.term());
        }
    }

    public List<String> getResult() {
        return this.result;
    }

    public static void main(String[] args) throws IOException {
        IkAnalyzerWord ik = new IkAnalyzerWord("今天的大风终于小了,但是又起雾了今天的大风终于小了,但是又起雾了");
        System.out.println(ik.getResult());
    }
}
Output:
[大风, 终于, 小了, 雾, 大风, 终于, 小了, 雾]
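The comments in the listing describe the contract of addAttribute and getAttribute. As a rough illustration only, that contract can be modelled with a map keyed by class. Everything in the sketch below (AttributeSourceSketch, TermAttr) is a hypothetical stand-in written for this post; it is not Lucene's implementation, it uses only the JDK, and it omits the Attribute interface bound from the real signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the addAttribute/getAttribute contract; NOT Lucene's real code. */
public class AttributeSourceSketch {

    /** Hypothetical stand-in for Lucene's TermAttribute. */
    public static class TermAttr {
        private String term = "";
        public String term() { return term; }
        public void setTerm(String term) { this.term = term; }
    }

    // One instance per attribute class, as the quoted Javadoc describes.
    private final Map<Class<?>, Object> attributes = new LinkedHashMap<Class<?>, Object>();

    /** Returns the existing instance for attClass, or creates, registers and returns a new one. */
    public <A> A addAttribute(Class<A> attClass) {
        Object existing = attributes.get(attClass);
        if (existing == null) {
            try {
                existing = attClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot instantiate " + attClass.getName(), e);
            }
            attributes.put(attClass, existing);
        }
        return attClass.cast(existing);
    }

    /** Returns the registered instance; fails if nothing was added for attClass. */
    public <A> A getAttribute(Class<A> attClass) {
        Object existing = attributes.get(attClass);
        if (existing == null) {
            throw new IllegalArgumentException("No attribute registered for " + attClass.getName());
        }
        return attClass.cast(existing);
    }

    public static void main(String[] args) {
        AttributeSourceSketch source = new AttributeSourceSketch();
        TermAttr first = source.addAttribute(TermAttr.class);
        TermAttr second = source.addAttribute(TermAttr.class);
        first.setTerm("wind");
        // Both calls hand back the same object, so the value set above is visible here.
        System.out.println(first == second);
        System.out.println(source.getAttribute(TermAttr.class).term());
    }
}
```

If the real API behaves as its quoted Javadoc says, the value returned by addAttribute could be stored once before the loop and reused, instead of calling getAttribute on every token. Whether that makes a measurable difference here has not been tested.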