Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/05/27 15:37:54 UTC

Solr/Lucene Analyzer That Writes To File

Hi;

I want to use Solr for academic research. As one step, I want to store
tokens in a file (I will move them to a database later) without indexing
them. For this kind of purpose, should I use core Lucene or Solr? Is there
an example of writing a custom analyzer that just stores tokens in a file?

Re: Solr/Lucene Analyzer That Writes To File

Posted by Roman Chyla <ro...@gmail.com>.
You can store the values and then run different analyzer chains on them
(the field only needs to be stored, it doesn't need to be indexed).

I'd probably use the collector pattern:


    se.search(new MatchAllDocsQuery(), new Collector() {
      private AtomicReader reader;

      @Override
      public boolean acceptsDocsOutOfOrder() {
        return true;
      }

      @Override
      public void collect(int doc) {
        try {
          // doc is relative to the current segment reader
          Document d = reader.document(doc, fieldsToLoad);
          for (String f : fieldsToLoad) {
            for (String s : d.getValues(f)) {
              TokenStream ts = analyzer.tokenStream(targetAnalyzer,
                  new StringReader(s));
              try {
                ts.reset();
                while (ts.incrementToken()) {
                  // do something with the analyzed tokens
                }
                ts.end();
              } finally {
                ts.close();
              }
            }
          }
        } catch (IOException e) {
          // pass
        }
      }

      @Override
      public void setNextReader(AtomicReaderContext context) {
        this.reader = context.reader();
      }

      @Override
      public void setScorer(org.apache.lucene.search.Scorer scorer) {
        // Do nothing
      }
    });

    // or persist the data here if one of your components knows to
    // write to disk, but there is no api...
    TokenStream ts = analyzer.tokenStream(data.targetField,
        new StringReader("xxx"));
    ts.reset();
    ts.reset();
    ts.reset();

  }



Re: Solr/Lucene Analyzer That Writes To File

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

Take a look at custom posting formats. For example, here is a nice post
showing what you can do with the Lucene SimpleText codec:
http://blog.mikemccandless.com/2010/10/lucenes-simpletext-codec.html

However, please remember that it is not advised to use that codec in a
production environment.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch



Re: Solr/Lucene Analyzer That Writes To File

Posted by Chris Hostetter <ho...@fucit.org>.
: I want to use Solr for an academical research. One step of my purpose is I
: want to store tokens in a file (I will store it at a database later) and I

you could absolutely write a java program which accesses the analyzers
directly and does whatever you want with the results of analysing a piece
of text that you feed in.

Alternatively, you could use something like the 
FieldAnalysisRequestHandler in solr, so that you could have an arbitrary 
client send data to solr asking it to analyze it for you and break it down 
into tokens, per your schema.xml...

http://localhost:8983/solr/collection1/analysis/field?analysis.fieldvalue=The%20quick%20brown%20fox%20jumped%20over%20the%20lazy%20dog&analysis.fieldtype=text_en&wt=json&indent=true

(this is exactly how the Analysis page in the admin UI works, the
javascript powering that page hits this same URL)

https://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/handler/FieldAnalysisRequestHandler.html
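
For the request handler route, a client could look roughly like the sketch
below. It assumes a Solr instance at the example URL above with a text_en
field type; the class name FieldAnalysisDump, its helper methods and the
output file tokens.json are made up for illustration. It only saves the raw
JSON response to a file; pulling the individual tokens out of that JSON is
left to whichever JSON parser you prefer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of Solr or Lucene.
final class FieldAnalysisDump {

  // URL-encodes one query parameter value as UTF-8 (spaces become '+').
  static String enc(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  // Builds the /analysis/field URL using the parameters from the example URL.
  static String buildUrl(String coreUrl, String fieldType, String text) {
    return coreUrl + "/analysis/field"
        + "?analysis.fieldvalue=" + enc(text)
        + "&analysis.fieldtype=" + enc(fieldType)
        + "&wt=json&indent=true";
  }

  // Fetches the response and copies it to the given file unchanged.
  static void dump(String url, Path out) throws IOException {
    try (InputStream in = URI.create(url).toURL().openStream()) {
      Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed core URL and field type, taken from the example URL.
    String url = buildUrl("http://localhost:8983/solr/collection1", "text_en",
        "The quick brown fox jumped over the lazy dog");
    dump(url, Paths.get("tokens.json"));
  }
}
```

Running main needs a live Solr at that address; buildUrl on its own can be
checked without one.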


-Hoss