[Original] Lucene Series: Near-Real-Time Search

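Near-real-time (NRT) search lets you search content that the IndexWriter has not yet committed. It sits between immediate and eventual visibility of updates, and is useful when the data set is fairly large and updated frequently. With Lucene's NRT support you control the interval after which updates become visible to searches.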

Basic process:

  • Open an IndexWriter
  • Obtain an IndexReader from the IndexWriter
  • Create an IndexSearcher
  • Check whether anything has changed; if so, create a new reader/searcher

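This relies on DirectoryReader.openIfChanged. It returns a new reader that reflects the changes (including deletions, when applyAllDeletes is true), sharing resources with the old reader where possible, or null if nothing has changed. The code below is based on Lucene 4.10.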

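    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class NrtReaderTest {
        private Directory ramDir = new RAMDirectory();

        public void testNrt() throws IOException {
            // IndexWriter
            IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_4_10_0, new StandardAnalyzer());
            IndexWriter writer = new IndexWriter(ramDir, writerConfig);

            Document doc = new Document();
            doc.add(new TextField("title", "lucene", Field.Store.YES));
            doc.add(new TextField("author", "zhangsan", Field.Store.YES));
            writer.addDocument(doc);
            // NRT reader opened from the IndexWriter, so it sees the uncommitted document
            DirectoryReader reader = DirectoryReader.open(writer, true);
            IndexSearcher searcher = new IndexSearcher(reader);
            TermQuery query = new TermQuery(new Term("title", "lucene"));
            TopDocs docs = searcher.search(query, 10);
            System.out.println(docs.totalHits);

            for (int i = 0; i < 10; ++i) {
                doc = new Document();
                doc.add(new TextField("title", "lucene " + i, Field.Store.YES));
                doc.add(new TextField("author", "zhangsan " + i, Field.Store.YES));
                writer.addDocument(doc);
            }
            // openIfChanged opens a new reader if there are committed or uncommitted
            // changes, and returns null if nothing has changed
            DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
            if (newReader != null) {
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }

            docs = searcher.search(query, 10);
            System.out.println(docs.totalHits);

            // Release resources
            reader.close();
            writer.close();
        }
    }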

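In a real application, searching, indexing, and opening new readers all happen concurrently, so thread safety has to be considered. Lucene provides SearcherManager (which extends ReferenceManager<IndexSearcher>) to make it safe to share searchers and the index across threads.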

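  • acquire: obtains a reference to the most recent IndexSearcher
  • release: releases a searcher that is no longer needed, so that SearcherManager can close old IndexSearchers that are no longer in use
  • maybeRefresh: start a background thread that calls this method periodically, so that a new IndexSearcher containing the changes becomes available

Tracing through the maybeRefresh code shows that it still calls DirectoryReader.openIfChanged to obtain the new reader; SearcherManager simply adds thread-safety control and encapsulates the creation of the reader/searcher.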
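
The background refresh mentioned in the list could be scheduled roughly as follows. This is a minimal, JDK-only sketch assuming Java 8 or later; Refreshable and PeriodicRefresher are hypothetical names introduced here for illustration and are not part of Lucene. SearcherManager.maybeRefresh() has a compatible shape (it returns boolean and may throw IOException), so a method reference to it should fit the interface.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical interface (not a Lucene type): anything with a maybeRefresh() method.
interface Refreshable {
    boolean maybeRefresh() throws IOException;
}

// Hypothetical helper: calls maybeRefresh() on a single daemon thread at a fixed delay.
final class PeriodicRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "nrt-refresh");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });

    PeriodicRefresher(Refreshable target, long intervalMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                target.maybeRefresh();
            } catch (Exception e) {
                // An exception escaping the task would cancel all future runs,
                // so it is caught here; real code would log it properly.
                System.err.println("refresh failed: " + e);
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop refreshing
    }
}
```

With a SearcherManager named searcherManager, something like new PeriodicRefresher(searcherManager::maybeRefresh, 1000) would refresh about once per second. The interval is the knob that trades indexing-to-search latency against the cost of reopening readers.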

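    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherFactory;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class NrtSearcherManagerTest {
        private Directory ramDir = new RAMDirectory();

        public void testNrt() throws IOException {
            // IndexWriter
            IndexWriterConfig writerConfig = new IndexWriterConfig(Version.LUCENE_4_10_0, new StandardAnalyzer());
            IndexWriter writer = new IndexWriter(ramDir, writerConfig);

            Document doc = new Document();
            doc.add(new TextField("title", "lucene", Field.Store.YES));
            doc.add(new TextField("author", "zhangsan", Field.Store.YES));
            writer.addDocument(doc);

            // A custom SearcherFactory can be used to configure each new IndexSearcher, e.g. setSimilarity
            SearcherManager searcherManager = new SearcherManager(writer, true, new SearcherFactory());
            TermQuery query = new TermQuery(new Term("title", "lucene"));

            // Get the current searcher
            IndexSearcher searcher = searcherManager.acquire();
            try {
                TopDocs docs = searcher.search(query, 10);
                System.out.println(docs.totalHits);
            } finally {
                // Decrement the searcher's reference count. This relies on the IndexReader's
                // refCount: the reader is closed when refCount reaches 0
                searcherManager.release(searcher);
            }

            for (int i = 0; i < 10; ++i) {
                doc = new Document();
                doc.add(new TextField("title", "lucene " + i, Field.Store.YES));
                doc.add(new TextField("author", "zhangsan " + i, Field.Store.YES));
                writer.addDocument(doc);
            }
            // Calls DirectoryReader.openIfChanged internally to obtain a new searcher
            searcherManager.maybeRefresh();

            searcher = searcherManager.acquire();
            try {
                TopDocs docs = searcher.search(query, 10);
                System.out.println(docs.totalHits);
            } finally {
                searcherManager.release(searcher);
            }
            // Release resources
            searcherManager.close();
            writer.close();
        }
    }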

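The comment in the code above notes that release works through the IndexReader's refCount. The idea can be modeled with a small JDK-only sketch. RefCounted and SimpleManager are hypothetical names used only for illustration; this is a simplified model of the acquire/release/swap pattern, not Lucene's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher/reader: "closed" once its count reaches zero.
final class RefCounted {
    final String name;
    // Starts at 1: the reference held by the manager itself.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    RefCounted(String name) {
        this.name = name;
    }

    // Increments the count only if the resource is still alive.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false; // count already hit zero, resource is closed
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // the last user closes the resource
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical, simplified manager: hands out the current resource and swaps in new ones.
final class SimpleManager {
    private volatile RefCounted current;

    SimpleManager(RefCounted initial) {
        this.current = initial;
    }

    RefCounted acquire() {
        RefCounted ref;
        do {
            ref = current;
        } while (!ref.tryIncRef()); // retry if it was swapped out and closed meanwhile
        return ref;
    }

    void release(RefCounted ref) {
        ref.decRef();
    }

    synchronized void swap(RefCounted fresh) {
        RefCounted old = current;
        current = fresh;
        old.decRef(); // drop the manager's own reference to the old resource
    }
}
```

In this model, a resource that is swapped out while a search still holds it stays open until that search calls release, which is why every acquire needs a matching release.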
Author: whuqin, posted 2015/1/20 20:01:33