Posted to solr-user@lucene.apache.org by Yuchen Wang <yu...@trulia.com> on 2009/07/05 06:43:50 UTC

Problem in parsing non-string dynamic field by using IndexReader

I have a task to parse all documents in a Solr index. I use a Lucene
IndexReader to read the index and go through each field of every document.
However, for float or int dynamic fields, the stringValue() call always
returns some special characters. I tried tokenStreamValue(), binaryValue(),
and readerValue(), and they all return null.

Following is my method to parse the Solr index. My question is, how can I
get the values from non-string dynamic fields properly?

    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/my/index/directory");

        int total = reader.numDocs();
        System.out.println("Total documents: " + total);

        for (int i = 0; i < 1; i++) {
            Document d = reader.document(i);

            List<Field> fields = d.getFields();

            for (Field f : fields) {
                String name = f.name();
                String val = f.stringValue();

                System.out.println("get field / value: [" + name + "=" + val + "]");
            }
        }

        reader.close();
    }

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Yuchen Wang <yu...@trulia.com>.

That works perfectly! Thanks a lot!

On Mon, Jul 6, 2009 at 2:12 PM, Chris Hostetter <ho...@fucit.org> wrote:

> : OK, here is my latest code to get the IndexReader from the solr core.
> : However, it still printed out the non-string fields as special chars. I do
> : use the schema file here. Please help.
>
> you'll want to use the IndexSchema object to get the FieldType
> object for your field name.  then use the FieldType to convert the values
> in the index to readable values.
>
> Take a look at the javadocs for IndexSchema and FieldType for more
> details.
>
> if you look at code like the XMLResponseWriter you'll see examples of
> iterating over all the fields in a Document and using those methods.
>
>
>
> -Hoss
>
>

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Chris Hostetter <ho...@fucit.org>.

: OK, here is my latest code to get the IndexReader from the solr core.
: However, it still printed out the non-string fields as special chars. I do
: use the schema file here. Please help.

you'll want to use the IndexSchema object to get the FieldType 
object for your field name.  then use the FieldType to convert the values 
in the index to readable values.

Take a look at the javadocs for IndexSchema and FieldType for more
details.

if you look at code like the XMLResponseWriter you'll see examples of 
iterating over all the fields in a Document and using those methods.



-Hoss
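
Editor's sketch of the idea in the reply above: look up a type by field name,
then let that type turn the stored value into a readable one. Everything here
is a hypothetical stand-in (DemoFieldType, PlainStringType, SortableIntType,
DemoSchema and the "_i" / "_s" suffixes are invented for illustration) and the
three-char packing is only an assumption about how a sortable int type might
store its values; it is not Solr's code. In Solr itself the corresponding
calls are, as far as I know, IndexSchema.getFieldType(String) and
FieldType.toExternal(...), so check the javadocs for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a schema field type (not a Solr class). */
interface DemoFieldType {
    /** Readable value -> form kept in the index. */
    String toInternal(String readable);
    /** Form kept in the index -> readable value. */
    String toExternal(String stored);
}

/** Strings are stored as-is, so no conversion is needed. */
class PlainStringType implements DemoFieldType {
    public String toInternal(String readable) { return readable; }
    public String toExternal(String stored) { return stored; }
}

/** Packs an int into three chars whose string order matches numeric order. */
class SortableIntType implements DemoFieldType {
    public String toInternal(String readable) {
        // Adding MIN_VALUE flips the sign bit so negatives sort before positives.
        int v = Integer.parseInt(readable) + Integer.MIN_VALUE;
        char[] out = {
            (char) (v >>> 24),            // top 8 bits
            (char) ((v >>> 12) & 0x0fff), // middle 12 bits
            (char) (v & 0x0fff)           // low 12 bits
        };
        return new String(out);
    }

    public String toExternal(String stored) {
        // Reassemble the 32 bits, then undo the sign-bit flip.
        int v = (stored.charAt(0) << 24) | (stored.charAt(1) << 12) | stored.charAt(2);
        return Integer.toString(v - Integer.MIN_VALUE);
    }
}

/** Hypothetical schema: resolves dynamic fields by name suffix. */
class DemoSchema {
    private final Map<String, DemoFieldType> bySuffix = new LinkedHashMap<>();

    DemoSchema() {
        bySuffix.put("_i", new SortableIntType());
        bySuffix.put("_s", new PlainStringType());
    }

    DemoFieldType getFieldType(String fieldName) {
        for (Map.Entry<String, DemoFieldType> e : bySuffix.entrySet()) {
            if (fieldName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("No type for field: " + fieldName);
    }
}

class FieldTypeDecodeDemo {
    public static void main(String[] args) {
        DemoSchema schema = new DemoSchema();
        DemoFieldType type = schema.getFieldType("price_i");

        // What a raw stringValue() call would hand back: unprintable chars.
        String stored = type.toInternal("42");
        System.out.println("raw chars: " + stored.length());

        // Going through the field type recovers the number.
        System.out.println("price_i=" + type.toExternal(stored));
    }
}
```

The point of the design is that the reading loop never needs to know how a
value was packed; it only asks the schema for the type and calls the decode
method, which is the same shape as the loop in the original post with one
extra lookup per field.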

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Yuchen Wang <yu...@trulia.com>.

OK, here is my latest code to get the IndexReader from the solr core.
However, it still printed out the non-string fields as special chars. I do
use the schema file here. Please help.

    public static void main(String[] args) throws Exception {
        SolrConfig config = new SolrConfig("/Users/yuchen/Work/data/", "solrconfig.xml", null);
        IndexSchema schema = new IndexSchema(config, "schema.xml", null);

        CoreContainer container = new CoreContainer(
                new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
        CoreDescriptor dcore = new CoreDescriptor(container, "solr0",
                config.getResourceLoader().getInstanceDir());
        dcore.setConfigName(config.getResourceName());
        dcore.setSchemaName(schema.getResourceName());
        SolrCore core = new SolrCore("solr0", "/Users/yuchen/Work/data", config, schema, dcore);
        container.register("solr0", core, false);

        IndexReader reader = core.getSearcher().get().getReader();

        FieldCache cache = FieldCache.DEFAULT;

        int total = reader.numDocs();
        System.out.println("Total documents: " + total);

        for (int i = 0; i < 1; i++) {
            System.out.println("\n=============== Got Node: " + i + " =================");
            Document d = reader.document(i);

            List<Field> fields = d.getFields();

            for (Field f : fields) {
                String name = f.name();
                String val = f.stringValue();
                System.out.println("get field / value: [" + name + "=" + val + "]");
            }
        }

        reader.close();
    }


On Sun, Jul 5, 2009 at 7:58 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

>
> Yuchen,
>
> schema.xml is a Solr configuration file that you can find in a conf
> directory under Solr home.  Please go through the Solr tutorial on the site
> first.
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yuchen,

schema.xml is a Solr configuration file that you can find in a conf directory under Solr home.  Please go through the Solr tutorial on the site first.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Yuchen Wang <yu...@trulia.com>.

Thanks for the reply. However, in the code I posted, where should I load the
schema.xml? I just created a Lucene IndexReader directly.

On Sun, Jul 5, 2009 at 9:31 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

>
> Yuchen,
>
> Make sure the fields you are trying to read are stored (stored="true" in
> schema.xml)
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Problem in parsing non-string dynamic field by using IndexReader

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yuchen,

Make sure the fields you are trying to read are stored (stored="true" in schema.xml).

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch