Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/04/26 11:52:03 UTC
[jira] [Created] (NUTCH-986) Dedup fails due to date format (long)
Dedup fails due to date format (long)
-------------------------------------
Key: NUTCH-986
URL: https://issues.apache.org/jira/browse/NUTCH-986
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.3
Reporter: Markus Jelsma
Fix For: 1.3
As already mentioned on the list, dedup also fails because of invalid date formats.
Apr 19, 2011 10:34:50 AM org.apache.solr.request.BinaryResponseWriter$Resolver getDoc
WARNING: Error reading a field from document : SolrDocument[{digest=7ff92a31c58e43a34fd45bc6d87cda03}]
java.lang.NumberFormatException: For input string: "2011-04-19T08:16:31.675Z"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:419)
    at java.lang.Long.valueOf(Long.java:525)
    at org.apache.solr.schema.LongField.toObject(LongField.java:82)
    ....
Strangely enough, Solr seems to allow updates of long fields with a formatted
date. In Nutch 1.2 the tstamp field is actually a long, but in 1.3 the field is
a valid Solr date format. This exception is only triggered when using the javabin
response writer, so there's something weird in Solr too.
We need to either change the tstamp field back to a long or update the Solr
example schema and fix SolrDeleteDuplicates to use the formatted date instead
of the long.
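
A minimal sketch of the mismatch described above, using only the JDK and no Solr or Nutch classes. The class and method names are hypothetical, and the numeric sample value assumes the 1.2-style long is an epoch-millisecond timestamp. It shows that Long.valueOf, the call at the top of the stack trace, accepts a plain number but rejects the formatted date string from the log.

```java
public class TstampParseDemo {
    // Hypothetical helper: true if the value can be read as a long,
    // which is what a long-typed field needs when the value is read back.
    public static boolean parsesAsLong(String value) {
        try {
            Long.valueOf(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A plain numeric timestamp (assumed to be epoch milliseconds).
        System.out.println(parsesAsLong("1303200991675"));
        // The formatted date string taken from the log above.
        System.out.println(parsesAsLong("2011-04-19T08:16:31.675Z"));
    }
}
```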
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-986.
-------------------------------
[jira] [Resolved] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-986.
---------------------------------
Resolution: Fixed
Committed for 1.3 in rev. 1097390 and for trunk in rev. 1097391.
[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-986:
--------------------------------
Attachment: NUTCH-986-trunk-1.patch
Patch for trunk!
[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026255#comment-13026255 ]
Markus Jelsma commented on NUTCH-986:
-------------------------------------
Recommitted for 1.3 in rev. 1097410 and for trunk in rev. 1097411. Apologies!
[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025869#comment-13025869 ]
Markus Jelsma commented on NUTCH-986:
-------------------------------------
If there are no objections, I'll commit this one tomorrow, along with NUTCH-991 (dedup must commit).
[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-986:
--------------------------------
Affects Version/s: 2.0
Fix Version/s: 2.0
Assignee: Markus Jelsma
[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056984#comment-13056984 ]
Hudson commented on NUTCH-986:
------------------------------
Integrated in Nutch-trunk #1530 (See [https://builds.apache.org/job/Nutch-trunk/1530/])
[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-986:
--------------------------------
Attachment: NUTCH-986-1.3-1.patch
Here's a patch. It leaves the existing code intact and only converts the incoming formatted date to the internally used long. Tested and confirmed to work as expected.
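
The patch itself is not reproduced in this thread, so the following is only a rough sketch of the kind of conversion described, not the committed code. It assumes the incoming value is a UTC date string in the form seen in the log, and it uses java.time.Instant, which is newer than the Java version available when this issue was filed. The class and method names are hypothetical.

```java
import java.time.Instant;

public class TstampConverter {
    // Hypothetical helper: turns a UTC date string such as
    // "2011-04-19T08:16:31.675Z" into epoch milliseconds.
    public static long toEpochMillis(String formattedDate) {
        return Instant.parse(formattedDate).toEpochMilli();
    }

    public static void main(String[] args) {
        // The date string taken from the exception message in this issue.
        System.out.println(toEpochMillis("2011-04-19T08:16:31.675Z"));
    }
}
```

Converting at the point where the value is read leaves any code that compares timestamps as longs unchanged, which matches the intent stated above.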
[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-986:
--------------------------------
Patch Info: [Patch Available]
[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)
Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-986:
--------------------------------
Attachment: NUTCH-986-1.3-2.patch
NUTCH-986-trunk-2.patch
The previous patch was incorrect but was committed anyway. It did deduplicate, but it threw exceptions. These patches fix that.