Posted to solr-user@lucene.apache.org by Ken Stanley <do...@gmail.com> on 2010/10/16 00:42:45 UTC

SOLR DateTime and SortableLongField field type problems

Hello all,

I am using SOLR-1.4.1 with the DataImportHandler, and I am trying to follow
the advice from
http://www.mail-archive.com/solr-user@lucene.apache.org/msg11887.html about
converting date fields to SortableLong fields for better memory efficiency.
However, whenever I try to do this using the DateFormatTransformer, I get an
exception during indexing for every row that tries to create my sortable
fields.

In my schema.xml, I have the following definitions for the fieldType and
dynamicField:

<fieldType name="sdate" class="solr.SortableLongField" indexed="true"
           stored="false" sortMissingLast="true" omitNorms="true" />
<dynamicField name="sort_date_*" type="sdate" stored="false" indexed="true" />

In my dih.xml, I have the following definitions:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
        <entity
            name="xml_stories"
            rootEntity="false"
            dataSource="null"
            processor="FileListEntityProcessor"
            fileName="legacy_stories.*\.xml$"
            recursive="false"
            baseDir="/usr/local/extracts"
            newerThan="${dataimporter.xml_stories.last_index_time}"
        >
            <entity
                name="stories"
                pk="id"
                dataSource="xml_stories"
                processor="XPathEntityProcessor"
                url="${xml_stories.fileAbsolutePath}"
                forEach="/RECORDS/RECORD"
                stream="true"
                transformer="DateFormatTransformer,HTMLStripTransformer,RegexTransformer,TemplateTransformer"
                onError="continue"
            >
                <field column="_modified_date"
                       xpath="/RECORDS/RECORD/PROP[@NAME='R_ModifiedTime']/PVAL" />
                <field column="modified_date"
                       sourceColName="_modified_date"
                       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="_df_date_published"
                       xpath="/RECORDS/RECORD/PROP[@NAME='R_StoryDate']/PVAL" />
                <field column="df_date_published"
                       sourceColName="_df_date_published"
                       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

                <field column="sort_date_modified"
                       sourceColName="modified_date"
                       dateTimeFormat="yyyyMMddhhmmss" />
                <field column="sort_date_published"
                       sourceColName="df_date_published"
                       dateTimeFormat="yyyyMMddhhmmss" />
            </entity>
        </entity>
    </document>
</dataConfig>

The fields in question are in the formats:

<RECORDS>
<RECORD>
    <PROP NAME="R_StoryDate">
        <PVAL>2001-12-04T00:00:00Z</PVAL>
    </PROP>
    <PROP NAME="R_ModifiedTime">
        <PVAL>2001-12-04T19:38:01Z</PVAL>
    </PROP>
</RECORD>
</RECORDS>

The exception that I am receiving is:

Oct 15, 2010 6:23:24 PM org.apache.solr.handler.dataimport.DateFormatTransformer transformRow
WARNING: Could not parse a Date field
java.text.ParseException: Unparseable date: "Wed Nov 28 21:39:05 EST 2007"
    at java.text.DateFormat.parse(DateFormat.java:337)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.process(DateFormatTransformer.java:89)
    at org.apache.solr.handler.dataimport.DateFormatTransformer.transformRow(DateFormatTransformer.java:69)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.applyTransformer(EntityProcessorWrapper.java:195)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:241)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

I know that it has to be the SortableLong fields, because if I remove just
those two sort_date_* lines from my dih.xml, everything imports as I expect
it to. Am I doing something wrong? Misusing the SortableLong and/or
DateFormatTransformer? Is this not supported in my version of SOLR? I'm not
very experienced with Java, so digging into the code would be a lost cause
for me right now. I was hoping that somebody here might be able to point me
in the right direction.

It should be noted that the modified_date and df_date_published fields index
just fine (so long as I do it as I've defined above).

Thank you,

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
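
As a point of reference for the goal described in the message above, here is a minimal sketch in plain Java, outside of DIH, of turning one of the sample timestamps into a number that sorts chronologically. The class and method names are hypothetical. The UTC time zone is an assumption based on the trailing "Z" in the sample data, and the patterns use HH (24-hour clock) rather than hh because the sample values include hours such as 19.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SortableDateSketch {

    // Builds a formatter pinned to UTC so results do not depend on the JVM's default zone.
    private static SimpleDateFormat utcFormat(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format;
    }

    // Converts e.g. "2001-12-04T19:38:01Z" into the long 20011204193801.
    public static long toSortableLong(String isoTimestamp) {
        try {
            // Step 1: text -> Date, using the layout of the source data.
            Date parsed = utcFormat("yyyy-MM-dd'T'HH:mm:ss'Z'").parse(isoTimestamp);
            // Step 2: Date -> compact digits -> long.
            return Long.parseLong(utcFormat("yyyyMMddHHmmss").format(parsed));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO-style timestamp: " + isoTimestamp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSortableLong("2001-12-04T00:00:00Z")); // 20011204000000
        System.out.println(toSortableLong("2001-12-04T19:38:01Z")); // 20011204193801
    }
}
```

Note that this is a two-step conversion (parse, then format). Whether a single dateTimeFormat attribute in DIH can perform the second step is exactly what the question is about, so the sketch makes no claim about that.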

Re: SOLR DateTime and SortableLongField field type problems

Posted by Ken Stanley <do...@gmail.com>.
On Mon, Oct 18, 2010 at 7:52 AM, Michael Sokolov <so...@ifactory.com> wrote:

> I think if you look closely you'll find the date quoted in the Exception
> report doesn't match any of the declared formats in the schema.  I would
> suggest, as a first step, hunting through your data to see where that date
> is coming from.
>
> -Mike
>
>
[Note: Re-sending this because apparently, in my sleepy stupor, I clicked the
wrong Reply button and never sent this to the list (it's a Monday) :)]

I've noticed that date anomaly as well, and I've discovered that it is one of
the gotchas of DIH: it seems to modify my date to that format. All of the
dates in the data are in the correct "yyyy-MM-dd'T'hh:mm:ss'Z'" format. Once
a value is run through dateTimeFormat, I assume it is converted into a date
object; trying to use that date object in any other way (e.g., using a
template, or even another dateTimeFormat) results in the exception I've
described (displaying the date in the incorrect format).

Thanks,

Ken Stanley
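
The behaviour described above can be reproduced outside of Solr. The sketch below assumes, as the stack trace suggests but the thread does not confirm, that the second dateTimeFormat is applied to the text form of an already-parsed java.util.Date. The class and method names are hypothetical, and nothing here calls Solr code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateReparseDemo {

    // Parses the ISO-style text from the source XML into a java.util.Date (UTC assumed).
    public static Date parseIso(String raw) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return iso.parse(raw);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO-style timestamp: " + raw, e);
        }
    }

    // Returns true if the text can be parsed with the given pattern, false on ParseException.
    public static boolean parsesWith(String pattern, String text) {
        try {
            new SimpleDateFormat(pattern, Locale.US).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date modified = parseIso("2001-12-04T19:38:01Z");

        // Date.toString() yields a "Wed Nov 28 21:39:05 EST 2007" style layout,
        // rendered in the JVM's default time zone.
        String asText = modified.toString();
        System.out.println(asText);

        // The compact pattern expects digits first, so the day name makes parsing fail.
        System.out.println(parsesWith("yyyyMMddhhmmss", asText));
    }
}
```

If the assumption holds, the failing input is not the raw XML value at all but the stringified Date produced by the earlier field, which would explain why the text in the exception matches none of the declared formats.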

RE: SOLR DateTime and SortableLongField field type problems

Posted by Michael Sokolov <so...@ifactory.com>.
I think if you look closely you'll find the date quoted in the Exception
report doesn't match any of the declared formats in the schema.  I would
suggest, as a first step, hunting through your data to see where that date
is coming from.

-Mike
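
One way to follow up on that suggestion is to check whether a suspicious string has the layout that java.util.Date.toString() produces, which would point to an intermediate Date object rather than to the source files. The sketch below is only an illustration: the class and method names are hypothetical, and the pattern is the one documented for Date.toString().

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class ExceptionDateCheck {

    // Layout used by java.util.Date.toString(), e.g. "Wed Nov 28 21:39:05 EST 2007".
    static final String TO_STRING_PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    // Returns true when the text parses with the Date.toString() layout.
    public static boolean looksLikeDateToString(String text) {
        SimpleDateFormat format = new SimpleDateFormat(TO_STRING_PATTERN, Locale.US);
        try {
            format.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Text quoted in the exception report.
        System.out.println(looksLikeDateToString("Wed Nov 28 21:39:05 EST 2007"));
        // A raw value as it appears in the source XML.
        System.out.println(looksLikeDateToString("2001-12-04T19:38:01Z"));
    }
}
```

If the check succeeds for the text in the exception and fails for the raw values, the string most likely comes from a column that was already converted, not from the extract files themselves.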


Re: SOLR DateTime and SortableLongField field type problems

Posted by Ken Stanley <do...@gmail.com>.
Just following up to see if anybody might have some words of wisdom on the
issue?

Thank you,

Ken


