Posted to commits@oodt.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2011/02/25 06:58:38 UTC

[jira] Commented: (OODT-148) crawler returns error if we use capital letters in the name of the mimetypes or if we don't use "product/" at the beginning of the name.

    [ https://issues.apache.org/jira/browse/OODT-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999224#comment-12999224 ] 

Chris A. Mattmann commented on OODT-148:
----------------------------------------

Hey Faranak, I think this might be an issue with Tika. You may want to join user@tika.apache.org and ask your question there.

> crawler returns error if we use capital letters in the name of the mimetypes or if we don't use "product/" at the beginning of the name.
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OODT-148
>                 URL: https://issues.apache.org/jira/browse/OODT-148
>             Project: OODT
>          Issue Type: Bug
>          Components: crawler
>    Affects Versions: 0.3
>         Environment: unix
>            Reporter: faranak davoodi
>             Fix For: 0.2
>
>
> Naming mimetypes in the crawler requires a specific format that is not documented anywhere. It seems the name must start with "product/" and be all lower case, or the crawler returns an error. The error is also so general that you cannot tell what the problem is.
> I used "product/dadsL0" as the name for a mimetype and got the errors below:
> Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0
> Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0
> Feb 24, 2011 9:20:49 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]
> After changing the mimetype name to "product/dadsl0" I got:
> Feb 24, 2011 9:25:46 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: Successfully ingested product: [/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]: product id: b2c6deec-409f-11e0-9885-3f3332df0e68
> Feb 24, 2011 9:25:46 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Successful ingest of product: [/usr/local/carve/support/filemgr_lucene/carveFiles/20110209183453.dadsL0]
> I wish the format for mimetype names were not this sensitive. If such a format is necessary, it should be documented in the crawler's user guide to avoid hours of confusion.
> Thanks.
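
Until the naming rules are documented, the two constraints the reporter observed (a "product/" prefix and an all-lower-case name) could be checked before a name goes into the crawler configuration. The following is a minimal sketch in Python under that assumption only. The function name check_mimetype_name is hypothetical, it is not part of the OODT or Tika API, and the real rule enforced by the crawler (or by Tika) may differ.

```python
def check_mimetype_name(name):
    """Return a list of problems with a crawler mimetype name.

    The two rules below are assumptions taken from the behaviour described
    in this report; they are not read from OODT or Tika.
    """
    problems = []
    # Assumed rule 1: the name starts with "product/".
    if not name.startswith("product/"):
        problems.append('name does not start with "product/"')
    # Assumed rule 2: the name contains no upper-case letters.
    if name != name.lower():
        problems.append("name contains upper-case letters; try %r" % name.lower())
    return problems


# The name from the report that failed, then the one that worked.
for candidate in ("product/dadsL0", "product/dadsl0"):
    issues = check_mimetype_name(candidate)
    print(candidate, "->", "ok" if not issues else "; ".join(issues))
```

An empty list means the name satisfies both assumed rules. For "product/dadsL0" the check reports the upper-case letter and suggests "product/dadsl0", which is the name that ingested successfully in the log above.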

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira