Posted to solr-user@lucene.apache.org by jayantu <ja...@yahoo.com> on 2012/08/17 23:42:43 UTC

Solr RSS DIH dateTimeFormat does work

I have a DIH configuration for indexing RSS feeds. The problem is that Solr
expects dates in the form 1995-12-31T23:59:59Z, while the dates in the RSS
feed look like 'Wed, 15 Aug 2012 14:11:27 EDT'. So I set dateTimeFormat on
the field like this:
<field column="pubDate" xpath="/rss/channel/item/pubDate"
dateTimeFormat="EEE, dd MMM yyyy HH:mm:ss z" locale="en"
dataSource="dsurl"/>

The field "pubDate" is defined as type date in the schema file. Even with
this in place, I still get this Solr exception:
org.apache.solr.common.SolrException: ERROR:
[doc=rss.cnn.com/~r/rss/cnn_allpolitics/~3/R2L1CPDPBJU/] Error adding field
'pubDate'='Wed, 15 Aug 2012 14:11:27 EDT' 

What might be the issue?
            



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-RSS-DIH-dateTimeFormat-does-work-tp4001911.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr RSS DIH dateTimeFormat does work

Posted by jayantu <ja...@yahoo.com>.
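Yes, I have, and the format parses correctly in plain Java:

// Needs java.text.ParseException, java.text.SimpleDateFormat, java.util.Date
String myDateStr = "Wed, 15 Aug 2012 14:11:27 EDT";
SimpleDateFormat sdt = new SimpleDateFormat("EEE',' dd MMM yyyy HH:mm:ss z");
try {
    Date myDate = sdt.parse(myDateStr);
    System.out.println("Parsed date: " + myDate);
} catch (ParseException e) {
    // The string did not match the pattern
    e.printStackTrace();
}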
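
To make the target format concrete, here is a minimal sketch that parses the RSS-style date with the same pattern and prints it in the form Solr expects (1995-12-31T23:59:59Z). The class name RssDateToSolr and the method toSolrDate are hypothetical, and the use of Locale.ENGLISH and a UTC output zone are my assumptions. It only uses the plain JDK and is not a claim about what DIH does internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, for illustration only.
public class RssDateToSolr {
    // Pattern taken from the dateTimeFormat attribute in this thread.
    static final String RSS_PATTERN = "EEE, dd MMM yyyy HH:mm:ss z";
    // Solr's canonical date form, e.g. 1995-12-31T23:59:59Z (UTC).
    static final String SOLR_PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    static String toSolrDate(String rssDate) {
        // Assumption: English day and month names, as in the feed sample.
        SimpleDateFormat in = new SimpleDateFormat(RSS_PATTERN, Locale.ENGLISH);
        SimpleDateFormat out = new SimpleDateFormat(SOLR_PATTERN, Locale.ROOT);
        // The literal 'Z' in the output is only correct if we format in UTC.
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parsed = in.parse(rssDate);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable RSS date: " + rssDate, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("Wed, 15 Aug 2012 14:11:27 EDT"));
    }
}
```

If this prints a UTC timestamp for the sample value, the pattern itself is fine and the problem is more likely in how the DIH config applies it.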





Re: Solr RSS DIH dateTimeFormat does work

Posted by Lance Norskog <go...@gmail.com>.
Have you checked this date time format in Java code?




-- 
Lance Norskog
goksron@gmail.com

Re: Solr RSS DIH dateTimeFormat does work

Posted by Jack Krupansky <ja...@basetechnology.com>.
The enclosing "entity" must specify the transformer, as in:

<entity name="slashdot"
        pk="link"
        url="http://rss.slashdot.org/Slashdot/slashdot"
        processor="XPathEntityProcessor"
        forEach="/RDF/channel | /RDF/item"
        transformer="DateFormatTransformer">

Compare your DIH config with the wiki:
http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example
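
The check described above can also be done mechanically. Here is a rough sketch, not part of Solr or DIH, that scans a data-config XML string for fields that set dateTimeFormat while their enclosing entity does not name DateFormatTransformer. The class DihTransformerCheck, the method fieldsMissingTransformer, and the sample config (including the example.com URL and the entity name) are hypothetical. It assumes the field is a direct child of its entity element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical diagnostic helper; it does not use any Solr API.
public class DihTransformerCheck {

    // Returns the column names of fields that use dateTimeFormat although
    // their parent element has no transformer naming DateFormatTransformer.
    static List<String> fieldsMissingTransformer(String configXml) {
        List<String> problems = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (!field.hasAttribute("dateTimeFormat")) {
                    continue; // not a date field, nothing to check
                }
                Node parent = field.getParentNode();
                // getAttribute returns "" when the attribute is absent.
                boolean declared = parent instanceof Element
                        && ((Element) parent).getAttribute("transformer")
                                .contains("DateFormatTransformer");
                if (!declared) {
                    problems.add(field.getAttribute("column"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse DIH config", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical config modeled on the field from the question;
        // the entity has no transformer attribute.
        String config =
              "<dataConfig><document>"
            + "<entity name='rss' url='http://example.com/feed.rss'"
            + " processor='XPathEntityProcessor' forEach='/rss/channel/item'>"
            + "<field column='pubDate' xpath='/rss/channel/item/pubDate'"
            + " dateTimeFormat='EEE, dd MMM yyyy HH:mm:ss z' locale='en'/>"
            + "</entity></document></dataConfig>";
        System.out.println(fieldsMissingTransformer(config)); // prints [pubDate]
    }
}
```

A non-empty result means the raw feed string would reach the date field unconverted, which matches the "Error adding field" exception in the question.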

-- Jack Krupansky
