Posted to dev@oodt.apache.org by Konstantinos Mavrommatis <km...@celgene.com> on 2016/04/03 07:34:48 UTC

Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Hi,
I am trying to replicate a fully functional service that I set up a long time ago with OODT 0.6, but I am running into a problem that prevents me from ingesting files. When I try to ingest files with the extension fastq.gz, I get this line:
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
and, of course, the file is not ingested. The same process works without problems with OODT 0.6 on a different server.

The crawler command I am running is:
./crawler_launcher \
--operation \
--launchAutoCrawler \
--productPath $FILEPATH \
--filemgrUrl $OODT_FILEMGR_URL \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
--mimeExtractorRepo ../policy/mime-extractor-map.xml \
--noRecur \
--crawlForDirs 2>&1



I have set up OODT 0.12 on a server that runs the File Manager (FM), listening on port 9000.
From a client machine, I have verified that I can use the FM to ingest products.
I am now trying to use the crawler to crawl and ingest all files in a directory. Since these directories contain non-standard MIME types, I have done the following:
1. Added my own MIME types to policy/mimetypes.xml, e.g.:
  <mime-type type="text/fastq">
    <glob pattern="*.fastq"/>
    <glob pattern="*.fastq.gz"/>
    <glob pattern="*.fastq.bz"/>
    <glob pattern="*.fastq.bz2"/>
    <glob pattern="*.fastq.bzip"/>
    <glob pattern="*.fq"/>
    <glob pattern="*.fq.gz"/>
    <glob pattern="*.fq.bz"/>
    <glob pattern="*.fq.bz2"/>
    <glob pattern="*.fq.bzip"/>
  </mime-type>
2. Created the file policy/mime-extractor-map.xml:

  <mime type="text/fastq">
    <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
      <config file="/apache-oodt/crawler/bin/fastq.config"/>
      <preCondComparators>
        <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
      </preCondComparators>
    </extractor>
  </mime>

3. Created the file fastq.config:
<?xml version="1.0" encoding="UTF-8"?>
<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>
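To sanity-check the glob list in step 1, the following Python sketch tests file names taken from the crawler log against those patterns using plain shell-style matching. It is a simplified, hypothetical model (not OODT or Tika code) of name-based matching only; the real detector may also inspect file content such as gzip magic bytes, so a glob match here does not guarantee that the crawler will assign text/fastq.

```python
import fnmatch

# Glob patterns copied from the text/fastq entry in policy/mimetypes.xml (step 1).
FASTQ_GLOBS = (
    "*.fastq", "*.fastq.gz", "*.fastq.bz", "*.fastq.bz2", "*.fastq.bzip",
    "*.fq", "*.fq.gz", "*.fq.bz", "*.fq.bz2", "*.fq.bzip",
)


def matching_globs(filename: str, globs=FASTQ_GLOBS) -> list[str]:
    """Return the patterns that match the bare file name.

    Hypothetical, name-only model: case-sensitive shell-style matching,
    with no content (magic-byte) detection at all.
    """
    return [g for g in globs if fnmatch.fnmatchcase(filename, g)]


if __name__ == "__main__":
    # File names that appear in the crawler log.
    for name in (
        "E837642_R1.fastq.gz",
        "E837642_R1.fastq.gz.met",
        "cas-crawler-04-02-16.log.gz",
        "test",
    ):
        print(f"{name:30} -> {matching_globs(name) or 'no text/fastq glob'}")
```

Under this model only the .fastq.gz data files match a text/fastq glob; the .met sidecars, the .log.gz/.tar.gz archives, and the extension-less test file do not. That suggests the patterns themselves are syntactically fine and that the mismatch happens elsewhere in detection.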



MetExtractorNGS.pl is a small Perl script that opens the file to be ingested, extracts some information, and stores it in the corresponding .met file. I have verified manually that it works as expected and produces the correct .met file.
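The ExternMetExtractor configuration in step 3 can be read as a recipe for a command line. The following Python sketch is a hypothetical reconstruction inferred from the "Executing command line" entries in the crawler log in the PS, not from ExternMetExtractor's source. It puts extractorBinPath first, substitutes the path of the crawled file for any arg marked isDataFile="true", keeps the other args verbatim, and assumes the script leaves "<data file>.met" beside the data file. The names build_command and expected_met_file are made up for illustration.

```python
import xml.etree.ElementTree as ET

# fastq.config from step 3, copied verbatim (no whitespace before <?xml).
FASTQ_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>
"""


def build_command(config_xml: str, data_file: str) -> list[str]:
    """Hypothetical reconstruction of the extractor command line.

    Binary path first, then each <arg> in document order; an arg marked
    isDataFile="true" is replaced by the path of the file being crawled.
    """
    # Parse bytes so the encoding declaration is handled by the parser.
    root = ET.fromstring(config_xml.encode("utf-8"))
    exec_el = root.find("exec")  # child elements carry no namespace prefix
    if exec_el is None:
        raise ValueError("no <exec> element in extractor config")
    cmd = [(exec_el.findtext("extractorBinPath") or "").strip()]
    for arg in exec_el.iter("arg"):
        if arg.get("isDataFile", "false").strip().lower() == "true":
            cmd.append(data_file)
        else:
            cmd.append((arg.text or "").strip())
    return cmd


def expected_met_file(data_file: str) -> str:
    """Assumption: the script writes <data file>.met next to the data file."""
    return data_file + ".met"


if __name__ == "__main__":
    data = "/data/E837642_R1.fastq.gz"  # hypothetical path
    print(build_command(FASTQ_CONFIG, data))
    print(expected_met_file(data))
```

Under this reading, fastq.config should produce "MetExtractorNGS.pl <file> fastq", whereas the log shows "<file>.met text". That difference may mean the files that reached the extractor were matched by a mime-extractor-map entry other than text/fastq. The script then ignored the .met inputs, as its own warning says, which would explain why ExternMetExtractor reported that it failed to create the metadata file.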

What am I missing here? Any ideas, comments, or suggestions would be greatly appreciated.
Thanks in advance for any help,
Kostas



PS1 The full output from running the crawler command follows:


Setting property 'StdProductCrawler.filemgrUrl'
Setting property 'MetExtractorProductCrawler.filemgrUrl'
Setting property 'AutoDetectProductCrawler.filemgrUrl'
Setting property 'StdProductCrawler.clientTransferer'
Setting property 'MetExtractorProductCrawler.clientTransferer'
Setting property 'AutoDetectProductCrawler.clientTransferer'
Setting property 'StdProductCrawler.noRecur'
Setting property 'MetExtractorProductCrawler.noRecur'
Setting property 'AutoDetectProductCrawler.noRecur'
Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
Setting property 'StdProductCrawler.productPath'
Setting property 'MetExtractorProductCrawler.productPath'
Setting property 'AutoDetectProductCrawler.productPath'
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.noRecur' set to value [true]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
        at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
        at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
        at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
        at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
        at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
        at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
        at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
        at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
        at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
        at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)

Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
        at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
        at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
        at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
        at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
        at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
        at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
        at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
        at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
        at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
        at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)

Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
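One way to read a long crawler log like the one above is to group its messages by the file being handled. The Python sketch below does that with a few regular expressions keyed on message texts copied from this log ("Handling file", "No extractor specs specified", "Failed to get metadata for product", "Successful ingest of product"). The function name and outcome labels are hypothetical, and other OODT versions may word these messages differently.

```python
import re

# Message prefixes copied from the OODT 0.12 crawler log in this thread.
HANDLING = re.compile(r"^INFO: Handling file (?P<path>\S+)")
OUTCOMES = [
    ("no extractor spec", re.compile(r"^WARNING: No extractor specs specified for ")),
    ("met extraction failed", re.compile(r"^SEVERE: Failed to get metadata for product")),
    ("ingested", re.compile(r"^INFO: Successful ingest of product")),
]


def summarize(log_text: str) -> dict[str, str]:
    """Map each handled file path to the first recognised outcome (hypothetical helper)."""
    results: dict[str, str] = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        handled = HANDLING.match(line)
        if handled:
            current = handled.group("path")
            results.setdefault(current, "unknown")
            continue
        if current is None or results[current] != "unknown":
            continue  # not inside a file's messages, or outcome already recorded
        for label, pattern in OUTCOMES:
            if pattern.match(line):
                results[current] = label
                break
    return results


if __name__ == "__main__":
    import sys

    # Usage: pipe the crawler output into this script.
    for path, outcome in summarize(sys.stdin.read()).items():
        print(f"{outcome:22} {path}")
```

Applied to the log above, this would label the two .fastq.gz files and the three archives "no extractor spec", the two .met files "met extraction failed", and test "ingested".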



Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Chris Mattmann <ch...@gmail.com>.
Thanks, Val.

—
Chris Mattmann
chris.mattmann@gmail.com







On 4/6/16, 7:15 PM, "Mallder, Valerie" <Va...@jhuapl.edu> wrote:

>I haven't had a chance to study this yet, but after a first pass through this email thread I suspect that Kostas may be running into the same problem I hit when Tika was either introduced or upgraded to a much newer version than the one previously in the system. I ended up having to modify my mimetypes.xml file to work around that problem. I will look at this in detail tomorrow and compare it against my debugging history from moving through versions 0.6, 0.7, 0.8, 0.9, and 0.10 to see whether it is the same problem. However, I am staying at 0.10, so I can't speak for going up to 0.12.
>
>Val
>
>
>
>________________________________
>From: Chris Mattmann <ch...@gmail.com>
>Sent: Wednesday, April 6, 2016 9:58:15 PM
>To: dev@oodt.apache.org
>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
>Thanks, Kostas. They are wire-compatible, and this is a good
>use case.
>
>The crawler should not have changed much (perhaps not at
>all) since 0.6, so I am not exactly sure why you were seeing
>issues with it. CAS-PGE has definitely been upgraded since 0.6,
>and maybe that's what you were running into.
>
>
>—
>Chris Mattmann
>chris.mattmann@gmail.com
>
>
>
>
>
>
>
>On 4/6/16, 6:47 PM, "Konstantinos Mavrommatis" <km...@celgene.com> wrote:
>
>>I am giving up on this...
>>I originally used [1] to set up OODT (v0.6 back then), and my setup on the new system is identical to the old one.
>>I could not make much out of [0]. Among other things, I tried copying the files in the old crawler/policy to the new crawler/policy, which included legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried reinstalling the full OODT on the client side, but it still did not work.
>>
>>I ended up reverting to the older version (0.6), which I run on my client. The server (which runs the FM) is still on 0.12, but the combination seems to be working fine.
>>
>>K
>>
>>-----Original Message-----
>>From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>>Sent: Tuesday, April 05, 2016 3:33 AM
>>To: dev@oodt.apache.org
>>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>>
>>Hi K,
>>OK, so I did a bit of searching and located a bunch of files that are defined as legacy. You can check out the search results here: https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>>I would also urge you to look at the AutoDetectProductCrawler Javadoc in the master branch [0] to see whether you have everything required.
>>Finally, I came across some documentation on the wiki that may point you in the right direction [1]. It may be outdated, though, so please let us know if that is the case.
>>hth
>>
>>[0]
>>https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>>[1]
>>https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>>
>>On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis < kmavrommatis@celgene.com> wrote:
>>
>>> Hi,
>>> It seems to be happening for a number of the file types that I have in
>>> mimetypes.xml.
>>> A few things puzzle me. This file, which is a .gz file, is not
>>> processed via the regular Tika mimetypes, which include gzip files.
>>> Meanwhile, a file that has no extension, which defaults to txt, is
>>> passed to MetExtractorNGS.pl and processed.
>>>
>>> Any ideas how I can find out which preconditions fail? I tried
>>> changing the log level to DEBUG for all components, but I did not get
>>> much more information. This must be something that changed in the OODT
>>> releases after 0.6, but I could not find anything relevant in the
>>> release notes.
>>> I also noticed in the documentation of the AutoDetectProductCrawler
>>> that it uses the file met-extr-preconditions.xml, which I could not
>>> find anywhere in the deployed OODT or the src directories. Could that
>>> be the reason for the problem I observe?
>>>
>>> Thanks
>>> K
>>>
>>> -----Original Message-----
>>> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>>> Sent: Monday, April 04, 2016 3:24 PM
>>> To: dev@oodt.apache.org
>>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>>> specifications
>>>
>>> Hi Konstantinos,
>>> It appears to be happening with a tar.gz file as well, right?
>>>
>>> WARNING: No extractor specs specified for
>>> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>>
>>> I wonder if it is the file names... However, I would be extremely
>>> surprised, as I've seen much more verbose file naming.
>>> Lewis
>>>
>>> On Saturday, April 2, 2016, Konstantinos Mavrommatis <
>>> kmavrommatis@celgene.com> wrote:
>>>
>>> > Hi,
>>> > I am trying to replicate a fully functional service that I had setup
>>> > long time ago using OODT 0.6 but I am having the following problem
>>> > that does not allow me to ingest files. When I try to ingest files
>>> > with the extension fastq.gz I get the line:
>>> > WARNING: No extractor specs specified for
>>> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa
>>> > st q/E837642_R1.fastq.gz Apr 02, 2016 10:12:14 PM
>>> > org.apache.oodt.cas.crawl.ProductCrawler
>>> > handleFile
>>> > And of course the file is not ingested. This process works without
>>> > problem with OODT 0.6 on a different server.
>>> >
>>> > The crawler command I am running is:
>>> > ./crawler_launcher \
>>> > --operation \
>>> > --launchAutoCrawler \
>>> > --productPath $FILEPATH \
>>> > --filemgrUrl $OODT_FILEMGR_URL \
>>> > --clientTransferer
>>> > org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>>> > \ --mimeExtractorRepo ../policy/mime-extractor-map.xml \ --noRecur \
>>> > --crawlForDirs 2>&1
>>> >
>>> >
>>> >
>>> > I have setup OODT 0.12 on a server which runs FM listening to port 9000.
>>> > From a client machine I have verified that I can use FM to ingest
>>> products.
>>> > I am now trying to use crawler to crawl and ingest all files in a
>>> > directory. Since I have non standard MIME types in these directories
>>> > I have done the following:
>>> > 1. Added my own mime types in policy/mimetypes.xml eg
>>> >   <mime-type type="text/fastq">
>>> >                 <glob pattern="*.fastq"/>
>>> >                 <glob pattern="*.fastq.gz"/>
>>> >                 <glob pattern="*.fastq.bz"/>
>>> >                 <glob pattern="*.fastq.bz2"/>
>>> >                 <glob pattern="*.fastq.bzip"/>
>>> >                 <glob pattern="*.fq"/>
>>> >                 <glob pattern="*.fq.gz"/>
>>> >                 <glob pattern="*.fq.bz"/>
>>> >                 <glob pattern="*.fq.bz2"/>
>>> >                 <glob pattern="*.fq.bzip"/>
>>> >         </mime-type>
>>> > 2. created the file policy/mime-extractor-map.xml
>>> >
>>> >         <mime type="text/fastq">
>>> >                 <extractor
>>> > class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>>> >                         <config
>>> > file="/apache-oodt/crawler/bin/fastq.config"/>
>>> >                         <preCondComparators>
>>> >                                 <preCondComparator
>>> > id="CheckThatDataFileSizeIsGreaterThanZero"/>
>>> >                         </preCondComparators>
>>> >                 </extractor>
>>> >         </mime>
>>> >
>>> > 3. created the file fastq.config
>>> > <?xml version="1.0" encoding="UTF-8"?> <cas:externextractor
>>> > xmlns:cas="https://urldefense.proofpoint.com/v2/url?u=http-3A__oodt.jpl.nasa.gov_1.0_cas&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-T2gHY95y7ZA&m=AZOhzDmmNuBD_R9H2fm-CubVmid0OEJbXqk4G2cmzDs&s=FvkBYgoM8RnUm2ITaMjYb1s1sa9YtHvNL4c1M_KF06w&e= ">
>>> >   <exec workingDir="">
>>> >
>>> >
>>> <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extract
>>> orBinPath>
>>> >       <args>
>>> >          <arg isDataFile="true"></arg>
>>> >         <arg>fastq</arg>
>>> >       </args>
>>> >    </exec>
>>> > </cas:externextractor>
>>> >
>>> >
>>> >
>>> > The MetExtractorNGS.pl is a small perl script that opens the file to
>>> > be ingested, gets some information and stores it in the .met file
>>> > that corresponds to the file to be ingested and have manually
>>> > verified that works as expected producing the correct met file.
>>> >
>>> > What am I missing here? Any ideas comments suggestions will be
>>> > greatly appreciated.
>>> > Thanks in advance for any help
>>> > Kostas
>>> >
>>> >
>>> >
>>> > PS1 The full output from running the crawler command follows:
>>> >
>>> >
>>> > Setting property 'StdProductCrawler.filemgrUrl'
>>> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>> > Setting property 'StdProductCrawler.clientTransferer'
>>> > Setting property 'MetExtractorProductCrawler.clientTransferer'
>>> > Setting property 'AutoDetectProductCrawler.clientTransferer'
>>> > Setting property 'StdProductCrawler.noRecur'
>>> > Setting property 'MetExtractorProductCrawler.noRecur'
>>> > Setting property 'AutoDetectProductCrawler.noRecur'
>>> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>> > Setting property 'StdProductCrawler.productPath'
>>> > Setting property 'MetExtractorProductCrawler.productPath'
>>> > Setting property 'AutoDetectProductCrawler.productPath'
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>>> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
>>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
>>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>> >
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
>>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
>>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>> >
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>>> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>>> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>>> > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>>> > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>>> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>>> > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>>> > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>>> > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>>> > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
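A minimal Python sketch for tallying, per product path, the outcome messages seen in the crawler output above, so that files stopped by "No extractor specs specified" stand out from the ones that were ingested. It assumes each message and its path sit on one line, as in an unwrapped crawler log (the mail archive wraps them). It deliberately ignores the "Failed to get metadata" errors, whose path is logged on an earlier line. The script and log file names in the usage comment are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Message texts copied from the crawler output; each regex assumes the message
# and its product path are on a single line, as in an unwrapped crawler log.
OUTCOMES = {
    "no_extractor": re.compile(r"No extractor specs specified for (\S+)"),
    "failed_preconditions": re.compile(
        r"Failed to pass preconditions for ingest of product: \[([^\]]+)\]"),
    "ingested": re.compile(r"Successful ingest of product: \[([^\]]+)\]"),
}


def summarize(lines):
    """Group product paths by the crawler outcome message logged for them."""
    outcomes = defaultdict(list)
    for line in lines:
        for outcome, pattern in OUTCOMES.items():
            match = pattern.search(line)
            if match:
                outcomes[outcome].append(match.group(1))
    return dict(outcomes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python summarize_crawl.py crawler.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for outcome, paths in summarize(log).items():
            print(f"{outcome}: {len(paths)}")
            for path in paths:
                print(f"    {path}")
```

Run against the log above, the .fastq.gz, .log.gz and .tar.gz files would land under both `no_extractor` and `failed_preconditions`, and only the extension-less `test` file under `ingested`.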
>>> >
>>> >
>>> >
>>>
>>>
>>> --
>>> *Lewis*
>>>
>>>
>>
>>
>>
>>--
>>*Lewis*
>


RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
I haven't had a chance to study this yet, but after a first pass through this email trail I suspect that Kostas may be running into the same problem I ran into when Tika was either introduced or upgraded to a much newer version than had previously been in the system. I ended up having to modify my mimetypes.xml file to get around the problem after that happened. I will look at this in detail tomorrow and compare it to my debugging history from going from 0.6 to 0.7, 0.8, 0.9 and 0.10, to see whether it is the problem I have seen before. However, I am staying at 0.10, so I won't be able to speak for going up to 0.12.

Val
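Val's Tika hypothesis can be checked cheaply by listing which glob patterns claim a given file name. The Python sketch below uses the text/fastq globs from Kostas's policy/mimetypes.xml plus an assumed stand-in for a built-in gzip entry (`application/gzip` with `*.gz`; the real type name and globs depend on the Tika version bundled with each OODT release). `MAPPED_TYPES` is also an assumption about which types have a `<mime>` entry in mime-extractor-map.xml. The sketch only reports matches: it does not model Tika's precedence rules between overlapping globs, nor the content sniffing Tika can fall back to when no glob matches.

```python
from fnmatch import fnmatchcase

# Globs copied from the custom text/fastq entry in policy/mimetypes.xml.
CUSTOM_TYPES = {
    "text/fastq": ["*.fastq", "*.fastq.gz", "*.fastq.bz", "*.fastq.bz2",
                   "*.fastq.bzip", "*.fq", "*.fq.gz", "*.fq.bz", "*.fq.bz2",
                   "*.fq.bzip"],
}
# ASSUMPTION: stand-in for a built-in gzip entry shipped with a newer Tika.
BUILTIN_TYPES = {"application/gzip": ["*.gz"]}
# ASSUMPTION: types that have an extractor entry in mime-extractor-map.xml.
MAPPED_TYPES = {"text/fastq", "text/plain"}


def matching_types(filename, *tables):
    """Return every (type, glob) pair whose glob matches the bare file name."""
    hits = []
    for table in tables:
        for mime, globs in table.items():
            for pattern in globs:
                if fnmatchcase(filename, pattern):
                    hits.append((mime, pattern))
    return hits


def report(filename):
    """Return (sorted matching types, matching types with no extractor)."""
    hits = matching_types(filename, CUSTOM_TYPES, BUILTIN_TYPES)
    types = sorted({mime for mime, _ in hits})
    unmapped = [t for t in types if t not in MAPPED_TYPES]
    return types, unmapped


if __name__ == "__main__":
    for name in ["E837642_R1.fastq.gz", "cas-crawler-04-02-16.log.gz", "test"]:
        types, unmapped = report(name)
        print(f"{name}: glob matches={types}, without extractor mapping={unmapped}")
```

A .fastq.gz name matches both types here. If the bundled Tika settles on the gzip type, which has no extractor mapping, that would be consistent with the "No extractor specs specified" warnings, including those for the .log.gz and .tar.gz files; confirming it means checking what the bundled Tika actually returns for these names.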



________________________________
From: Chris Mattmann <ch...@gmail.com>
Sent: Wednesday, April 6, 2016 9:58:15 PM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Thanks Kostas. They are wire compatible, and this is a good
use case.

The crawler should not have changed much (perhaps not at
all) since 0.6, so I am not exactly sure why you were seeing
issues with it. There have definitely been upgrades to CAS-PGE
since 0.6, and maybe that’s what you were running into.


—
Chris Mattmann
chris.mattmann@gmail.com







On 4/6/16, 6:47 PM, "Konstantinos Mavrommatis" <km...@celgene.com> wrote:

>I am giving up on this...
>I had originally used [1] to set up OODT (v0.6 back then), and my setup on the new system is identical to the old one.
>I could not make much out of [0]. Among other things, I tried copying the files in the old crawler/policy to the new crawler/policy, which included legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried reinstalling the full OODT on the client side, but it still did not work.
>
>I ended up reverting the client to the older version (0.6). The server (which runs the FM) is still on 0.12, but the combination seems to be working fine.
>
>K
>
>-----Original Message-----
>From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>Sent: Tuesday, April 05, 2016 3:33 AM
>To: dev@oodt.apache.org
>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
>Hi K,
>OK, so I did a bit of searching and located a bunch of files which are defined as legacy; you can check out the search results here: https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>I would urge you to have a look at the AutoDetectProductCrawler Javadoc description included in the master branch [0] as well, to see if you've got everything required.
>Finally, I came across some documentation on the wiki which may guide you in the right direction [1]. It may be outdated, though, so please let us know if that is the case.
>hth
>
>[0]
>https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>[1]
>https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>
>On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis < kmavrommatis@celgene.com> wrote:
>
>> Hi,
>> It seems to be happening for a number of the file types that I have in
>> mimetypes.xml.
>> A few things are puzzling to me: this file, which is a .gz file, is not
>> processed via the regular Tika MIME types, which include gzip files,
>> while a file that has no extension, which defaults to text, is passed to
>> MetExtractor.pl and processed.
>>
>> Any ideas how I can find out which preconditions fail? I tried changing
>> the log level to DEBUG for all components, but I did not get much more
>> information. This must be something that changed in OODT releases after
>> 0.6, but I could not find anything relevant in the release notes.
>> I also noticed in the documentation of the AutoDetectProductCrawler
>> that it uses the file met-extr-preconditions.xml, which I could not
>> find anywhere in the deployed OODT or the src directories. Could that
>> be the reason for the problem I observe?
>>
>> Thanks
>> K
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>> Sent: Monday, April 04, 2016 3:24 PM
>> To: dev@oodt.apache.org
>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
>> specifications
>>
>> Hi Konstantinos,
>> It appears to be happening with a tar.gz file as well, right?
>>
>> WARNING: No extractor specs specified for
>> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>
>> I wonder if it is the file names... However, I would be extremely
>> surprised, as I've seen much more verbose file naming.
>> Lewis
>>
>> On Saturday, April 2, 2016, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>>
>> > Hi,
>> > I am trying to replicate a fully functional service that I set up
>> > a long time ago using OODT 0.6, but I am having the following problem
>> > that does not allow me to ingest files. When I try to ingest files
>> > with the extension fastq.gz I get the line:
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > And of course the file is not ingested. This process works without
>> > problems with OODT 0.6 on a different server.
>> >
>> > The crawler command I am running is:
>> > ./crawler_launcher \
>> > --operation \
>> > --launchAutoCrawler \
>> > --productPath $FILEPATH \
>> > --filemgrUrl $OODT_FILEMGR_URL \
>> > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>> > --noRecur \
>> > --crawlForDirs 2>&1
>> >
>> >
>> >
>> > I have set up OODT 0.12 on a server that runs the FM listening on port 9000.
>> > From a client machine I have verified that I can use the FM to ingest products.
>> > I am now trying to use the crawler to crawl and ingest all files in a
>> > directory. Since I have non-standard MIME types in these directories,
>> > I have done the following:
>> > 1. Added my own MIME types in policy/mimetypes.xml, e.g.:
>> >   <mime-type type="text/fastq">
>> >                 <glob pattern="*.fastq"/>
>> >                 <glob pattern="*.fastq.gz"/>
>> >                 <glob pattern="*.fastq.bz"/>
>> >                 <glob pattern="*.fastq.bz2"/>
>> >                 <glob pattern="*.fastq.bzip"/>
>> >                 <glob pattern="*.fq"/>
>> >                 <glob pattern="*.fq.gz"/>
>> >                 <glob pattern="*.fq.bz"/>
>> >                 <glob pattern="*.fq.bz2"/>
>> >                 <glob pattern="*.fq.bzip"/>
>> >         </mime-type>
>> > 2. Created the file policy/mime-extractor-map.xml:
>> >
>> >         <mime type="text/fastq">
>> >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>> >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
>> >                         <preCondComparators>
>> >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>> >                         </preCondComparators>
>> >                 </extractor>
>> >         </mime>
>> >
>> > 3. Created the file fastq.config:
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> >   <exec workingDir="">
>> >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>> >     <args>
>> >       <arg isDataFile="true"></arg>
>> >       <arg>fastq</arg>
>> >     </args>
>> >   </exec>
>> > </cas:externextractor>
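Step 3 can be sanity-checked by expanding fastq.config into the command line it describes. The Python sketch below models that expansion under an assumption consistent with the "Executing command line" entries in the crawler output: extractorBinPath comes first, followed by each `<arg>` in order, with the `isDataFile="true"` arg replaced by the product path. It is an illustration, not OODT's ExternMetExtractor code, and the data file path used is made up.

```python
import xml.etree.ElementTree as ET

# Copy of fastq.config from step 3, with the namespace URI as it reads once
# the mail gateway's URL rewriting is removed.
FASTQ_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>
"""


def build_command(config_xml, data_file):
    """Model of the argv an external extractor config describes.

    ASSUMPTION: binary path first, then each <arg> in order, with
    isDataFile="true" args replaced by the product file path. This mirrors
    the logged command lines; it is not OODT's own code.
    """
    root = ET.fromstring(config_xml)
    exec_el = root.find("exec")  # child elements carry no namespace prefix
    argv = [exec_el.findtext("extractorBinPath").strip()]
    for arg in exec_el.find("args").findall("arg"):
        if arg.get("isDataFile") == "true":
            argv.append(data_file)
        else:
            argv.append((arg.text or "").strip())
    return argv


if __name__ == "__main__":
    # Hypothetical data file path, for illustration only.
    cmd = build_command(FASTQ_CONFIG.encode("utf-8"), "/data/E837642_R1.fastq.gz")
    print(" ".join(cmd))
```

For a fastq file this yields `MetExtractorNGS.pl <file> fastq`. The commands actually logged end in `text` and were issued only for the .met and extension-less files. No fastq command line appears, because the .fastq.gz files never got past the precondition check.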
>> >
>> >
>> >
>> > The MetExtractorNGS.pl is a small perl script that opens the file to
>> > be ingested, gets some information and stores it in the .met file
>> > that corresponds to the file to be ingested and have manually
>> > verified that works as expected producing the correct met file.
>> >
>> > What am I missing here? Any ideas, comments, or suggestions will be
>> > greatly appreciated.
>> > Thanks in advance for any help
>> > Kostas
>> >
>> > *********************************************************
>> > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
>> > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
>> > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> > If the reader is not the intended recipient, or the employee or
>> > agent responsible to deliver it to the intended recipient, you are
>> > hereby notified that any dissemination, distribution or copying of
>> > this communication is strictly prohibited. If you have received this
>> > communication in error, please reply to the sender to notify us of
>> > the error and delete the original message. Thank You.
>> >
>>
>>
>> --
>> *Lewis*
>>
>> *********************************************************
>> THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
>> MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
>> OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> If the reader is not the intended recipient, or the employee or agent
>> responsible to deliver it to the intended recipient, you are hereby
>> notified that any dissemination, distribution or copying of this
>> communication is strictly prohibited. If you have received this
>> communication in error, please reply to the sender to notify us of the
>> error and delete the original message. Thank You.
>>
>
>
>
>--
>*Lewis*


Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Chris Mattmann <ch...@gmail.com>.
Thanks Kostas, they are wire compatible and this is a good
use case.

The crawler should not have undergone much update (perhaps none
at all) since 0.6, so I am not exactly sure why you were seeing
issues with it. There are definitely upgrades to CAS-PGE since 0.6,
and maybe that’s what you were running into.


—
Chris Mattmann
chris.mattmann@gmail.com







On 4/6/16, 6:47 PM, "Konstantinos Mavrommatis" <km...@celgene.com> wrote:

>I am giving up on this...
>I had used [1] in the first place to set up OODT (v0.6 back then), and my setup on the new system is identical to the old one.
>I could not make much out of [0]. Among other things, I tried copying the files from the old crawler/policy to the new crawler/policy, including legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml. I also tried reinstalling the full OODT on the client side, but it still did not work.
>
>I ended up reverting my client to the older version (0.6). The server (which runs the FM) is still on 0.12, but the combination seems to be working fine.
>
>K
>
>-----Original Message-----
>From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
>Sent: Tuesday, April 05, 2016 3:33 AM
>To: dev@oodt.apache.org
>Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
>Hi K,
>OK, so I did a bit of searching here and located a bunch of files which are defined as legacy... you can check the search results out below: https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>I would urge you to have a look at the AutoDetectProductCrawler Javadoc description in the master branch [0] as well, to see if you've got everything required.
>Finally, I came across some documentation on the wiki which may guide you in the right direction [1]. It may also be outdated, though, so please let us know if that is the case.
>hth
>
>[0]
>https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
>[1]
>https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
>
>On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>
>> Hi,
>> It seems to be happening for a number of the file types that I have in
>> mimetypes.xml.
>> A few things are puzzling to me. This file, which is a .gz file, is not
>> processed via the regular Tika mimetypes, which contain the gzip types,
>> while a file that has no extension, which defaults to text, is passed to
>> MetExtractor.pl and processed.
>>
>> Any ideas how I can find out which preconditions fail? I tried changing
>> the log level to DEBUG for all components, but I did not get much more
>> information. This must be something that changed in OODT releases later
>> than 0.6, but I could not find anything relevant in the release notes.
>> I also noticed in the documentation of the AutoDetectProductCrawler
>> that it uses the file met-extr-preconditions.xml, which I could not
>> find anywhere in the deployed OODT or in the src directories. Could that
>> be a reason for the problem I observe?
>>
>> Thanks
>> K
>>
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>> Sent: Monday, April 04, 2016 3:24 PM
>> To: dev@oodt.apache.org
>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor 
>> specifications
>>
>> Hi Konstantinos,
>> It appears to be happening with a tar.gz file as well, right?
>>
>> WARNING: No extractor specs specified for
>> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>>
>> I wonder if it is the file names... However, I would be extremely
>> surprised, as I've seen much more verbose file naming.
>> Lewis
>>
>> On Saturday, April 2, 2016, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>>
>> > Hi,
>> > I am trying to replicate a fully functional service that I set up
>> > a long time ago using OODT 0.6, but I am having the following problem,
>> > which prevents me from ingesting files. When I try to ingest files
>> > with the extension fastq.gz I get the line:
>> > WARNING: No extractor specs specified for
>> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > And of course the file is not ingested. The same process works without
>> > problems with OODT 0.6 on a different server.
>> >
>> > The crawler command I am running is:
>> > ./crawler_launcher \
>> > --operation \
>> > --launchAutoCrawler \
>> > --productPath $FILEPATH \
>> > --filemgrUrl $OODT_FILEMGR_URL \
>> > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>> > --noRecur \
>> > --crawlForDirs 2>&1
>> >
>> >
>> >
>> > I have set up OODT 0.12 on a server that runs the FM listening on port 9000.
>> > From a client machine I have verified that I can use the FM to ingest products.
>> > I am now trying to use the crawler to crawl and ingest all files in a
>> > directory. Since I have non-standard MIME types in these directories,
>> > I have done the following:
>> > 1. Added my own MIME types in policy/mimetypes.xml, e.g.:
>> >   <mime-type type="text/fastq">
>> >                 <glob pattern="*.fastq"/>
>> >                 <glob pattern="*.fastq.gz"/>
>> >                 <glob pattern="*.fastq.bz"/>
>> >                 <glob pattern="*.fastq.bz2"/>
>> >                 <glob pattern="*.fastq.bzip"/>
>> >                 <glob pattern="*.fq"/>
>> >                 <glob pattern="*.fq.gz"/>
>> >                 <glob pattern="*.fq.bz"/>
>> >                 <glob pattern="*.fq.bz2"/>
>> >                 <glob pattern="*.fq.bzip"/>
>> >         </mime-type>
>> > 2. created the file policy/mime-extractor-map.xml
>> >
>> >         <mime type="text/fastq">
>> >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>> >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
>> >                         <preCondComparators>
>> >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>> >                         </preCondComparators>
>> >                 </extractor>
>> >         </mime>
>> >
>> > 3. created the file fastq.config
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> >   <exec workingDir="">
>> >      <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>> >      <args>
>> >         <arg isDataFile="true"></arg>
>> >         <arg>fastq</arg>
>> >      </args>
>> >   </exec>
>> > </cas:externextractor>
>> >
>> >
>> >
>> > MetExtractorNGS.pl is a small Perl script that opens the file to be
>> > ingested, gathers some information, and stores it in the .met file
>> > that corresponds to that file. I have manually verified that it works
>> > as expected and produces the correct .met file.
>> >
>> > What am I missing here? Any ideas, comments, or suggestions will be
>> > greatly appreciated.
>> > Thanks in advance for any help
>> > Kostas
>> >
>> >
>> >
>> > PS1 The full output from running the crawler command follows:
>> >
>> >
>> > Setting property 'StdProductCrawler.filemgrUrl'
>> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
>> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
>> > Setting property 'StdProductCrawler.clientTransferer'
>> > Setting property 'MetExtractorProductCrawler.clientTransferer'
>> > Setting property 'AutoDetectProductCrawler.clientTransferer'
>> > Setting property 'StdProductCrawler.noRecur'
>> > Setting property 'MetExtractorProductCrawler.noRecur'
>> > Setting property 'AutoDetectProductCrawler.noRecur'
>> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>> > Setting property 'StdProductCrawler.productPath'
>> > Setting property 'MetExtractorProductCrawler.productPath'
>> > Setting property 'AutoDetectProductCrawler.productPath'
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
>> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
>> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> >
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
>> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
>> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> >
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>> > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>> > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>> > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>> > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> >
>> >
>> > *********************************************************
>> > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND 
>> > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE 
>> > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> > If the reader is not the intended recipient, or the employee or 
>> > agent responsible to deliver it to the intended recipient, you are 
>> > hereby notified that any dissemination, distribution or copying of 
>> > this communication is strictly prohibited. If you have received this 
>> > communication in error, please reply to the sender to notify us of 
>> > the error and delete the original message. Thank You.
>> >
>>
>>
>> --
>> *Lewis*
>>
>>
>
>
>
>--
>*Lewis*


Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Chris Mattmann <ch...@gmail.com>.
*Bravo, Val*: comprehensive and insightful.

Tom, or someone: we need to get this on the wiki
in an FAQ.

—
Chris Mattmann
chris.mattmann@gmail.com







On 4/7/16, 9:16 AM, "Mallder, Valerie" <Va...@jhuapl.edu> wrote:

>Hi Konstantinos,
>
>This may be a long shot, but I may have come up with a few things you could look at to try to solve your issue. I am working with version 0.10, so the files and file locations I reference in this email pertain only to 0.10. Keep that in mind if I reference a file that you can't find, because there could have been changes between 0.10 and 0.12 that I haven't looked at yet.
>
>The first thing I notice is the filename you are using for your mime types: 'mimetypes.xml'. I know that the filename you use shouldn't make a difference as long as all the references to the file are the same. But there are many references to the mime types file throughout the system, and, depending on which original *.xml files you based your system on, it can be very easy to have one of those references set to something different from the others.
>
>If you look in the filemgr/etc directory, the default name for the mime types file is 'mime-types.xml'.
>
>If you look in the filemgr/etc/filemgr.properties file, there is a property setting that implies the default filename is 'mime-types.xml' as in:
>
># location of Mime-Type repository
>org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml
>
>If you look in the example mime-extractor-map.xml file in the pge/etc/examples directory, the mime repository is set to 'mime-types.xml' as in:
>
><cas:mimetypemap xmlns:cas="http://oodt.jpl.nassa.gov/1.0/cas" magic="false" mimeRepo="mime-types.xml">
>
>If you look in the crawler/policy directory, there is a default mime types file named 'mimetypes.xml', but the default mime-extractor-map.xml file in that same directory sets the mime repository to 'path/to/tika-mimetypes/xml/file', as in:
>
><cas:mimetypemap xmlns:cas="http://oodt.jpl.nassa.gov/1.0/cas" magic="true or false" mimeRepo="path/to/tika-mimetypes/xml/file">
>
>In addition, if you download the source code for the 'metadata' component, and look in the metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java file, it sets the default name of the mime types file to 'tika-mimetypes.xml' as in this line of code: 
>
>public final static String MIME_FILE_RES_PATH = "tika-mimetypes.xml";
>
>
>So, the first thing you should do is make sure all of your references to your mime types file are the same. There are several places (in several classes) where the MimeTypeUtils class is used, and you need to make sure that each instantiation of the class is using the same mime types file.
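One way to act on this advice is a small check that pulls the mime-types file name out of both the crawler's mime-extractor-map.xml (its `mimeRepo` attribute) and filemgr.properties (the `org.apache.oodt.cas.filemgr.mime.type.repository` property quoted above) and compares them. This is a hypothetical helper, not an OODT tool: it compares file names only, since how each component resolves a relative path depends on its working directory, and it cannot see defaults baked into code such as MimeTypeUtils' built-in resource name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical helper (not part of OODT) that compares two mime-types file references. */
public class MimeRepoReferenceCheck {

    /** Reads the mimeRepo attribute from the root element of a mime-extractor-map.xml. */
    public static String mimeRepoFromExtractorMap(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);
        return doc.getDocumentElement().getAttribute("mimeRepo");  // "" if absent
    }

    /** Reads the File Manager's mime-type repository setting, or null if it is absent. */
    public static String mimeRepoFromFilemgrProperties(InputStream props) throws IOException {
        Properties p = new Properties();
        p.load(props);
        return p.getProperty("org.apache.oodt.cas.filemgr.mime.type.repository");
    }

    /** Compares only the final file-name component of the two references. */
    public static boolean sameFileName(String a, String b) {
        if (a == null || b == null || a.isEmpty() || b.isEmpty()) {
            return false;
        }
        Path na = Paths.get(a).getFileName();
        Path nb = Paths.get(b).getFileName();
        return na != null && na.equals(nb);
    }

    public static void main(String[] args) throws Exception {
        // Values copied from the example files quoted in this message.
        String map = "<cas:mimetypemap xmlns:cas=\"http://oodt.jpl.nassa.gov/1.0/cas\""
                + " magic=\"false\" mimeRepo=\"mime-types.xml\"/>";
        String props = "org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml\n";
        String fromMap = mimeRepoFromExtractorMap(
                new ByteArrayInputStream(map.getBytes(StandardCharsets.UTF_8)));
        String fromFm = mimeRepoFromFilemgrProperties(
                new ByteArrayInputStream(props.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(fromMap + " vs " + fromFm + ": "
                + (sameFileName(fromMap, fromFm) ? "consistent" : "MISMATCH"));
    }
}
```

Pointed at real files (for example via `Files.newInputStream`), a crawler map that names 'mimetypes.xml' next to a File Manager configured for 'mime-types.xml' would be reported as a mismatch.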
>
>A quick search of the source code revealed that MimeTypeUtils is referenced in the following places:
>./crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
>./pushpull/src/main/java/org/apache/oodt/cas/pushpull/retrievalsystem/FileRetrievalSystem.java
>./protocol/http/src/main/java/org/apache/oodt/cas/protocol/http/util/HttpUtils.java
>./metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java
>./metadata/src/main/java/org/apache/oodt/cas/metadata/preconditions/MimeTypeComparator.java
>./metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
>
>For example, the MimeTypeComparator.java class has a method called setMimeTypeRepo to set the mime repository name, but no code in the system actually calls MimeTypeComparator::setMimeTypeRepo. So, if you are using MimeTypeComparator as one of your preconditions, MimeTypeUtils is instantiated with its default constructor, which sets its internal mime repository to MIME_FILE_RES_PATH shown above. That is probably not what you want, because your custom mime type is not in that file, and then you can get the 'no extractor defined' error.
>
>The second thing I noticed is how you are defining your custom mime types.
>
><mime-type type="text/fastq">
>    <glob pattern="*.fastq"/>
>    <glob pattern="*.fastq.gz"/>
>    <glob pattern="*.fastq.bz"/>
>    <glob pattern="*.fastq.bz2"/>
>    <glob pattern="*.fastq.bzip"/>
>    <glob pattern="*.fq"/>
>    <glob pattern="*.fq.gz"/>
>    <glob pattern="*.fq.bz"/>
>    <glob pattern="*.fq.bz2"/>
>    <glob pattern="*.fq.bzip"/>
></mime-type>
>
>I had to make a change to how I was defining my mime types. I don't think Tika will like the way you have defined your mime types.  For example, I have a mime type called "product/fei-ecsv" which are just text files named *.ecsv.  I had defined it like this:
>
><mime-type type="product/fei-ecsv">
>	<glob pattern="*.ecsv"/>
></mime-type>
>
>If I remember correctly, Tika ended up not being able to determine the mime type and defaulted to 'application/octet-stream', for which I did not have an extractor defined, so I got the 'no extractor defined' errors. In order to get Tika to recognize my new mime type, I had to add the 'sub-class-of' tag and change my definition to:
>
><mime-type type="product/fei-ecsv">
>  <sub-class-of type="text/plain"/>
>    <glob pattern="*.ecsv"/>
></mime-type>
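Why would adding sub-class-of make the error go away? One plausible explanation, offered as an assumption rather than something confirmed against the OODT source for 0.10 or 0.12, is that the crawler's extractor lookup also tries each of a type's parent types, stopping at application/octet-stream, so a custom type that declares text/plain as its parent inherits whatever extractor is mapped to text/plain. The toy model below (class and method names are hypothetical, not OODT or Tika API) shows that behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not OODT or Tika code): extractor specs are looked up for a mime
 * type and then for each of its sub-class-of ancestors, stopping at
 * application/octet-stream.
 */
public class ExtractorLookupSketch {
    private final Map<String, String> parentOf = new HashMap<>();           // type -> sub-class-of type
    private final Map<String, List<String>> specsByType = new HashMap<>();  // type -> extractor classes

    public void subClassOf(String child, String parent) {
        parentOf.put(child, parent);
    }

    public void mapExtractor(String mimeType, String extractorClass) {
        specsByType.computeIfAbsent(mimeType, k -> new ArrayList<>()).add(extractorClass);
    }

    /** Collects specs for the type and its ancestors; an empty list means "no extractor specs". */
    public List<String> extractorsFor(String mimeType) {
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();  // guards against a cyclic sub-class-of chain
        String t = mimeType;
        while (t != null && !t.equals("application/octet-stream") && seen.add(t)) {
            List<String> specs = specsByType.get(t);
            if (specs != null) {
                found.addAll(specs);
            }
            t = parentOf.get(t);
        }
        return found;
    }

    public static void main(String[] args) {
        ExtractorLookupSketch repo = new ExtractorLookupSketch();
        repo.mapExtractor("text/plain", "org.apache.oodt.cas.metadata.extractors.ExternMetExtractor");
        System.out.println(repo.extractorsFor("product/fei-ecsv"));  // [] : no parent declared yet
        repo.subClassOf("product/fei-ecsv", "text/plain");
        // prints [org.apache.oodt.cas.metadata.extractors.ExternMetExtractor]
        System.out.println(repo.extractorsFor("product/fei-ecsv"));
    }
}
```

The same model also shows why a type that falls through to application/octet-stream gets nothing: the walk stops there before any mapping is consulted.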
>
>I also ran into a problem when I tried to define a mime type for files whose extension was already defined in the mime types file, even when it was a two-part extension that didn't actually exist in the file. For example, I am a little worried you might run into problems with your patterns that end in .gz, .bz, .bz2 and .bzip, even though they also have '.fq' and '.fastq' in the pattern. You might have to split your patterns up into a few different mime types. I hope that you won't have to, but if you do, I'm pretty sure these 4 types will work as far as Tika is concerned. Doing this might, however, screw up how you have set up your 'product types'.
>
><mime-type type="text/fastq">
>   <sub-class-of type="text/plain"/>
>     <glob pattern="*.fastq"/>
>     <glob pattern="*.fq"/>
></mime-type>
>
><mime-type type="text/fastq-gz">
>   <sub-class-of type="application/gzip"/>
>     <glob pattern="*.fastq.gz"/>
>     <glob pattern="*.fq.gz"/>
></mime-type>
>
><mime-type type="text/fastq-bz">
>   <sub-class-of type="application/x-bzip"/>
>     <glob pattern="*.fastq.bz"/>
>     <glob pattern="*.fastq.bzip"/>
>     <glob pattern="*.fq.bz"/>
>     <glob pattern="*.fq.bzip"/>
></mime-type>
>
><mime-type type="text/fastq-bz2">
>   <sub-class-of type="application/x-bzip2"/>
>     <glob pattern="*.fastq.bz2"/>
>     <glob pattern="*.fq.bz2"/>
></mime-type>
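Two details of these glob definitions can be illustrated with a toy model of extension-based detection. It assumes that the longest matching extension wins, so `*.fastq.gz` beats a stock `*.gz`. That matching rule is an assumption; if your Tika version behaves differently (Val's experience above suggests it may), splitting the types as shown is the safer route. The model also shows that glob text is taken literally: a pattern written as `"*.fq "`, with a space before the closing quote, never matches `reads.fq`. The class is hypothetical and is not Tika's API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not Tika's implementation) of glob-based detection: each "*.ext"
 * glob is indexed by its extension and the longest extension that ends the
 * file name wins; anything unmatched falls back to application/octet-stream.
 */
public class GlobDetectionSketch {
    private final Map<String, String> typeByExtension = new HashMap<>();
    private int maxExtLength = 0;

    /** Registers a glob of the form "*.ext"; other shapes are rejected. */
    public void addGlob(String pattern, String mimeType) {
        if (!pattern.startsWith("*.") || pattern.length() < 3) {
            throw new IllegalArgumentException("unsupported glob: '" + pattern + "'");
        }
        String ext = pattern.substring(1);  // keeps the dot, e.g. ".fastq.gz"; whitespace is NOT trimmed
        typeByExtension.put(ext, mimeType);
        maxExtLength = Math.max(maxExtLength, ext.length());
    }

    /** Returns the type registered for the longest extension that ends the file name. */
    public String detect(String fileName) {
        for (int len = Math.min(maxExtLength, fileName.length()); len > 0; len--) {
            String type = typeByExtension.get(fileName.substring(fileName.length() - len));
            if (type != null) {
                return type;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        GlobDetectionSketch d = new GlobDetectionSketch();
        d.addGlob("*.gz", "application/gzip");        // stands in for a stock definition
        d.addGlob("*.fastq.gz", "text/fastq-gz");
        d.addGlob("*.fq ", "text/fastq");              // stray trailing space inside the pattern
        System.out.println(d.detect("E837642_R1.fastq.gz"));          // text/fastq-gz
        System.out.println(d.detect("cas-crawler-04-02-16.tar.gz"));  // application/gzip
        System.out.println(d.detect("reads.fq"));                     // application/octet-stream
    }
}
```

The last line is the failure mode behind a "No extractor specs specified" warning: a file that no glob matches ends up as application/octet-stream.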
>
>
>I hope this helps! Please let me know if you have any questions. I spent a huge amount of time debugging the 'no extractor found' error while upgrading to each new version from 0.6 to 0.10, so I'm hoping my struggles can help someone else :)
>
>Val
>
>
>
>Valerie A. Mallder
>New Horizons Deputy Mission System Engineer
>Johns Hopkins University/Applied Physics Laboratory
>
>
>> -----Original Message-----
>> From: Konstantinos Mavrommatis [mailto:kmavrommatis@celgene.com]
>> Sent: Wednesday, April 06, 2016 9:48 PM
>> To: dev@oodt.apache.org
>> Subject: RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>> 
>> I am giving up on this....
>> I originally used [1] to set up OODT (v0.6 back then), and my setup on the new
>> system is identical to the old one.
>> I could not make much out of [0]. Among other things, I tried copying the files in the
>> old crawler/policy to the new crawler/policy, which included legacy-cmd-line-options.xml
>> and legacy-cmd-line-actions.xml. I also tried reinstalling the full OODT on
>> the client side, but it still did not work.
>> 
>> I ended up reverting to the older version (0.6) on my client. The server
>> (which runs FM) is still 0.12, but the combination seems to be working fine.
>> 
>> K
>> 
>> -----Original Message-----
>> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>> Sent: Tuesday, April 05, 2016 3:33 AM
>> To: dev@oodt.apache.org
>> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>> 
>> Hi K,
>> OK so I did a bit of searching here and located a bunch of files which are defined
>> as legacy... you can check the search results out below
>> https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
>> I would urge you to have a look at the AutoDetectProductCrawler Javadoc
>> description included in master branch [0] as well to see if you've got everything
>> required.
>> Finally, I came across some documentation on the wiki which may guide you in the
>> right direction [1]. It may also be outdated though so please let us know if that it
>> the case.
>> hth
>> 
>> [0]
>> https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__github.com_apache_oodt_blob_91d0bafe71124906bd94baad746189caf35fb3
>> 9c_crawler_src_main_java_org_apache_oodt_cas_crawl_AutoDetectProductCrawle
>> r.java-23L40-2DL64&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-
>> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-
>> T2gHY95y7ZA&m=AZOhzDmmNuBD_R9H2fm-
>> CubVmid0OEJbXqk4G2cmzDs&s=rJpNgTfZDhDyGV5KksACkvbSnkVvobGfBQcx
>> XiLWwT4&e=
>> [1]
>> https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__cwiki.apache.org_confluence_display_OODT_Mime-2Btype-2Bdetection-
>> 2Bwith-2Bthe-
>> 2BAutoDetectProductCrawler&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-
>> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1Cs-
>> T2gHY95y7ZA&m=AZOhzDmmNuBD_R9H2fm-
>> CubVmid0OEJbXqk4G2cmzDs&s=V5fEGERshX3JHBTQXryhwoEZqhgarILk8WutE
>> wICmGs&e=
>> 
>> On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <
>> kmavrommatis@celgene.com> wrote:
>> 
>> > Hi,
>> > It seems to be happening for a number of the file types that I have in
>> > mimetypes.xml.
>> > A few things are puzzling to me: this file, which is a .gz file, is not
>> > processed via the regular Tika mime types, which include gzip files,
>> > while a file that has no extension, which defaults to text, is passed to
>> > MetExtractor.pl and processed.
>> >
>> > Any ideas how I can find out which preconditions fail? I tried changing
>> > the log level to DEBUG for all components, but I did not get much more
>> > information. This must be something that changed in the OODT releases
>> > after 0.6, but I could not find anything relevant in the release notes.
>> > I also noticed in the documentation of the AutoDetectProductCrawler
>> > that it uses the file met-extr-preconditions.xml, which I could not
>> > find anywhere in the deployed OODT or the src directories. Could that
>> > be a reason for the problem I observe?
>> >
>> > Thanks
>> > K
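The log places the "No extractor specs specified" warning in AutoDetectProductCrawler.passesPreconditions, and no precondition comparator is logged for the files that hit it. One way to picture that step is a hypothetical Java model, not the OODT source: detect a MIME type, look it up in the mime-extractor map, and fail when nothing is mapped. The class name ExtractorSpecLookup and the detected type application/x-gzip are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the check behind
// "No extractor specs specified for <file>". NOT the OODT implementation.
public class ExtractorSpecLookup {

    // Extractor class names configured for a MIME type; empty if unmapped.
    static List<String> specsFor(String mimeType, Map<String, List<String>> mimeExtractorMap) {
        List<String> specs = mimeExtractorMap.get(mimeType);
        return specs == null ? Collections.<String>emptyList() : specs;
    }

    // A file only goes on to its precondition comparators if the type the
    // detector returned has at least one extractor spec in the map.
    static boolean passesPreconditions(String file, String detectedType,
                                       Map<String, List<String>> mimeExtractorMap) {
        if (specsFor(detectedType, mimeExtractorMap).isEmpty()) {
            System.out.println("WARNING: No extractor specs specified for " + file);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("text/fastq", Collections.singletonList(
                "org.apache.oodt.cas.metadata.extractors.ExternMetExtractor"));

        // Assumed detection results, for illustration only.
        System.out.println(passesPreconditions("E837642_R1.fastq.gz", "application/x-gzip", map));
        System.out.println(passesPreconditions("E837642_R1.fastq.gz", "text/fastq", map));
    }
}
```

If this model holds, the question to answer is which type the 0.12 detector actually returns for E837642_R1.fastq.gz: a generic gzip type would produce exactly this warning even though text/fastq is mapped.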
>> >
>> > -----Original Message-----
>> > From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
>> > Sent: Monday, April 04, 2016 3:24 PM
>> > To: dev@oodt.apache.org
>> > Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>> >
>> > Hi Konstantinos,
>> > It appears to be happening with a tar.gz file as well, right?
>> >
>> > WARNING: No extractor specs specified for
>> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> >
>> > I wonder if it is the file names... However, I would be extremely
>> > surprised, as I've seen much more verbose file naming.
>> > Lewis
>> >
>> > On Saturday, April 2, 2016, Konstantinos Mavrommatis <
>> > kmavrommatis@celgene.com> wrote:
>> >
>> > > Hi,
>> > > I am trying to replicate a fully functional service that I set up
>> > > a long time ago using OODT 0.6, but I am having the following problem,
>> > > which does not allow me to ingest files. When I try to ingest files
>> > > with the extension fastq.gz I get the line:
>> > > WARNING: No extractor specs specified for
>> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > And of course the file is not ingested. This process works without
>> > > problems with OODT 0.6 on a different server.
>> > >
>> > > The crawler command I am running is:
>> > > ./crawler_launcher \
>> > > --operation \
>> > > --launchAutoCrawler \
>> > > --productPath $FILEPATH \
>> > > --filemgrUrl $OODT_FILEMGR_URL \
>> > > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> > > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
>> > > --noRecur \
>> > > --crawlForDirs 2>&1
>> > >
>> > >
>> > >
>> > > I have set up OODT 0.12 on a server which runs FM listening on port 9000.
>> > > From a client machine I have verified that I can use FM to ingest
>> > > products.
>> > > I am now trying to use crawler to crawl and ingest all files in a
>> > > directory. Since I have non-standard MIME types in these directories,
>> > > I have done the following:
>> > > 1. Added my own mime types in policy/mimetypes.xml, e.g.:
>> > >   <mime-type type="text/fastq">
>> > >                 <glob pattern="*.fastq"/>
>> > >                 <glob pattern="*.fastq.gz"/>
>> > >                 <glob pattern="*.fastq.bz"/>
>> > >                 <glob pattern="*.fastq.bz2"/>
>> > >                 <glob pattern="*.fastq.bzip"/>
>> > >                 <glob pattern="*.fq"/>
>> > >                 <glob pattern="*.fq.gz"/>
>> > >                 <glob pattern="*.fq.bz"/>
>> > >                 <glob pattern="*.fq.bz2"/>
>> > >                 <glob pattern="*.fq.bzip"/>
>> > >         </mime-type>
>> > > 2. Created the file policy/mime-extractor-map.xml
>> > >
>> > >         <mime type="text/fastq">
>> > >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>> > >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
>> > >                         <preCondComparators>
>> > >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>> > >                         </preCondComparators>
>> > >                 </extractor>
>> > >         </mime>
>> > >
>> > > 3. Created the file fastq.config
>> > > <?xml version="1.0" encoding="UTF-8"?>
>> > > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> > >    <exec workingDir="">
>> > >       <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>> > >       <args>
>> > >          <arg isDataFile="true"></arg>
>> > >          <arg>fastq</arg>
>> > >       </args>
>> > >    </exec>
>> > > </cas:externextractor>
>> > >
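The glob patterns from step 1 can be sanity-checked outside OODT. This sketch uses the JDK's PathMatcher glob syntax, which is close to, but not guaranteed identical to, how Tika evaluates <glob> entries. Depending on configuration, Tika can also let a magic-byte match take precedence over a glob, so a hit here only shows that the patterns are right, not that detection will return text/fastq. FastqGlobCheck is a hypothetical name.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Checks which of the step-1 glob patterns a file name matches,
// using the JDK's glob syntax (not Tika's own detector).
public class FastqGlobCheck {

    static final List<String> PATTERNS = Arrays.asList(
            "*.fastq", "*.fastq.gz", "*.fastq.bz", "*.fastq.bz2", "*.fastq.bzip",
            "*.fq", "*.fq.gz", "*.fq.bz", "*.fq.bz2", "*.fq.bzip");

    // Returns the first pattern that matches the file's name (not its full path), or null.
    static String firstMatch(String file) {
        Path name = Paths.get(file).getFileName();
        for (String pattern : PATTERNS) {
            PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
            if (matcher.matches(name)) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Defaults are file names taken from the crawler log.
        String[] files = args.length > 0 ? args : new String[] {
                "E837642_R1.fastq.gz", "E837642_R1.fastq.gz.met", "cas-crawler-04-02-16.tar.gz"};
        for (String f : files) {
            System.out.println(f + " -> " + firstMatch(f));
        }
    }
}
```

Run it with file names as arguments, e.g. java FastqGlobCheck E837642_R1.fastq.gz; with no arguments it checks three names from the log.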
>> > >
>> > >
>> > > MetExtractorNGS.pl is a small Perl script that opens the file to
>> > > be ingested, gets some information, and stores it in the .met file
>> > > that corresponds to that file. I have manually verified that it works
>> > > as expected and produces the correct met file.
>> > >
>> > > What am I missing here? Any ideas, comments, or suggestions will be
>> > > greatly appreciated.
>> > > Thanks in advance for any help
>> > > Kostas
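The <exec> section in step 3, together with the "Executing command line" entries in the crawler output, suggests how ExternMetExtractor assembles its call: the extractorBinPath first, then each <arg> in order, with the isDataFile="true" arg replaced by the product's path. The sketch below reproduces that ordering so the exact command can be re-run by hand. It is inferred from the log, not taken from the OODT source, and ExternCommandSketch and Arg are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Inferred (not official) mapping from an <exec> config to a command line.
public class ExternCommandSketch {

    // One <arg> element: a literal value, or a placeholder for the data file.
    static final class Arg {
        final String value;
        final boolean isDataFile;

        Arg(String value, boolean isDataFile) {
            this.value = value;
            this.isDataFile = isDataFile;
        }
    }

    // extractorBinPath first, then each arg in document order, with
    // isDataFile args replaced by the product's path.
    static List<String> buildCommand(String binPath, List<Arg> args, String dataFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(binPath);
        for (Arg arg : args) {
            cmd.add(arg.isDataFile ? dataFile : arg.value);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Values from the step 3 config; the data file path is a placeholder.
        List<String> cmd = buildCommand(
                "/apache-oodt/crawler/bin/MetExtractorNGS.pl",
                Arrays.asList(new Arg("", true), new Arg("fastq", false)),
                "/data/E837642_R1.fastq.gz");
        // prints /apache-oodt/crawler/bin/MetExtractorNGS.pl /data/E837642_R1.fastq.gz fastq
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd) to run the script the same way. With the step 3 config the last argument is fastq; every command line in the crawler output ends in text instead, which suggests those files were dispatched through a different entry of the mime-extractor map than the text/fastq one.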
>> > >
>> > >
>> > >
>> > > PS1 The full output from running the crawler command follows:
>> > >
>> > >
>> > > Setting property 'StdProductCrawler.filemgrUrl'
>> > > Setting property 'MetExtractorProductCrawler.filemgrUrl'
>> > > Setting property 'AutoDetectProductCrawler.filemgrUrl'
>> > > Setting property 'StdProductCrawler.clientTransferer'
>> > > Setting property 'MetExtractorProductCrawler.clientTransferer'
>> > > Setting property 'AutoDetectProductCrawler.clientTransferer'
>> > > Setting property 'StdProductCrawler.noRecur'
>> > > Setting property 'MetExtractorProductCrawler.noRecur'
>> > > Setting property 'AutoDetectProductCrawler.noRecur'
>> > > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>> > > Setting property 'StdProductCrawler.productPath'
>> > > Setting property 'MetExtractorProductCrawler.productPath'
>> > > Setting property 'AutoDetectProductCrawler.productPath'
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
>> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>> > > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
>> > > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> > > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
>> > > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
>> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> > >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> > >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> > >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> > >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> > >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> > >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> > >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> > >
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
>> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
>> > > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
>> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>> > >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>> > >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>> > >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>> > >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>> > >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>> > >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>> > >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>> > >
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
>> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
>> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
>> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
>> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
>> > > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
>> > > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
>> > > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
>> > > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
>> > > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
>> > > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
>> > > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
>> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> > > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
>> > >
>> > >
>> > > *********************************************************
>> > > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND
>> > > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE
>> > > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
>> > > If the reader is not the intended recipient, or the employee or
>> > > agent responsible to deliver it to the intended recipient, you are
>> > > hereby notified that any dissemination, distribution or copying of
>> > > this communication is strictly prohibited. If you have received this
>> > > communication in error, please reply to the sender to notify us of
>> > > the error and delete the original message. Thank You.
>> > >
>> >
>> >
>> > --
>> > *Lewis*
>> >
>> >
>> 
>> 
>> 
>> --
>> *Lewis*
>> 



RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
Hi Konstantinos,

This may be a long shot, but I may have come up with a few things you could look at to try to solve your issue.  I am working with version 0.10, so the files and file locations I reference in this email pertain only to 0.10. Keep that in mind if I reference a file that you can't find, because there could have been a change between 0.10 and 0.12 that I haven't looked at yet.

The first thing I noticed is the filename you are using for your mime types: 'mimetypes.xml'. I know the filename shouldn't make a difference as long as all the references to the file are the same. But there are many references to the mime types file throughout the system and, depending on which original *.xml files you based your system on, it is very easy for one of those references to point to something different from the others.

If you look in the filemgr/etc directory, the default name for the mime types file is 'mime-types.xml'.

If you look in the filemgr/etc/filemgr.properties file, there is a property setting that implies the default filename is 'mime-types.xml' as in:

# location of Mime-Type repository
org.apache.oodt.cas.filemgr.mime.type.repository=/path/to/mime-types.xml

If you look in the example mime-extractor-map.xml file in the pge/etc/examples directory, the mime repository is set to 'mime-types.xml' as in:

<cas:mimetypemap xmlns:cas="http://oodt.jpl.nassa.gov/1.0/cas" magic="false" mimeRepo="mime-types.xml">

If you look in the crawler/policy directory, there is a default mime types file named 'mimetypes.xml', but the default mime-extractor-map.xml file in that same directory sets the mime repository to 'path/to/tika-mimetypes/xml/file', as in:

<cas:mimetypemap xmlns:cas="http://oodt.jpl.nassa.gov/1.0/cas" magic="true or false" mimeRepo="path/to/tika-mimetypes/xml/file">

In addition, if you download the source code for the 'metadata' component, and look in the metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java file, it sets the default name of the mime types file to 'tika-mimetypes.xml' as in this line of code: 

public final static String MIME_FILE_RES_PATH = "tika-mimetypes.xml";


So, the first thing you should do is make sure all of your references to your mime types file are the same. The MimeTypeUtils class is used in several places (several classes), and you need to make sure that each instantiation of the class is using the same mime types file.
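
A minimal sketch of that consistency check, assuming Java 8 or later. It reads the `org.apache.oodt.cas.filemgr.mime.type.repository` key from the text of filemgr.properties and the `mimeRepo` attribute from a crawler mime-extractor-map.xml, then compares the two file names. The class and method names are hypothetical (not part of OODT), and comparing only the final file name is a deliberate simplification: it catches 'mimetypes.xml' versus 'mime-types.xml', but it does not prove that two settings with the same name resolve to the same file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;

/** Hypothetical helper (not part of OODT): compares the mime repository named in two config files. */
public class MimeRepoCheck {

    // Key used in filemgr/etc/filemgr.properties, as quoted in this thread.
    static final String FM_KEY = "org.apache.oodt.cas.filemgr.mime.type.repository";

    /** Returns the File Manager's mime repository setting, or null if the key is missing. */
    static String repoFromProperties(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(FM_KEY);
    }

    /** Returns the mimeRepo attribute of the root element of a mime-extractor-map.xml ("" if absent). */
    static String repoFromExtractorMap(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the root element uses the cas: prefix
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getAttribute("mimeRepo");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse extractor map", e);
        }
    }

    /** True when both settings end in the same file name; directories are ignored on purpose. */
    static boolean sameFileName(String a, String b) {
        if (a == null || b == null || a.isEmpty() || b.isEmpty()) {
            return false;
        }
        return fileName(a).equals(fileName(b));
    }

    private static String fileName(String path) {
        Path name = Paths.get(path).getFileName();
        return name == null ? "" : name.toString();
    }

    // Usage: java MimeRepoCheck /path/to/filemgr.properties /path/to/mime-extractor-map.xml
    public static void main(String[] args) throws IOException {
        String fm = repoFromProperties(read(Paths.get(args[0])));
        String crawler = repoFromExtractorMap(read(Paths.get(args[1])));
        System.out.println("File Manager mime repository: " + fm);
        System.out.println("Crawler mime repository:      " + crawler);
        System.out.println(sameFileName(fm, crawler) ? "File names match." : "MISMATCH: check both settings.");
    }

    private static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }
}
```

The same idea extends to any other place a component is handed a mime repository path; the point is to print every setting side by side rather than trust that they agree.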

A quick search of the source code revealed that MimeTypeUtils is referenced in the following places:
./crawler/src/main/java/org/apache/oodt/cas/crawl/typedetection/MimeExtractorRepo.java
./pushpull/src/main/java/org/apache/oodt/cas/pushpull/retrievalsystem/FileRetrievalSystem.java
./protocol/http/src/main/java/org/apache/oodt/cas/protocol/http/util/HttpUtils.java
./metadata/src/main/java/org/apache/oodt/cas/metadata/util/MimeTypeUtils.java
./metadata/src/main/java/org/apache/oodt/cas/metadata/preconditions/MimeTypeComparator.java
./metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java

For example, the MimeTypeComparator class has a setMimeTypeRepo method to set the mime repository name, but nothing in the system actually calls MimeTypeComparator::setMimeTypeRepo. So, if you are using MimeTypeComparator as one of your preconditions, its MimeTypeUtils was instantiated with the default constructor, which sets the internal mime repository to MIME_FILE_RES_PATH shown above. That is probably not what you want, because your custom mime type is not in that file, and then you can get the 'no extractor defined' error.
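
One way to confirm that a candidate repository file actually contains your custom type is to list the types it declares. Below is a minimal sketch, assuming Java 8+ and a Tika-style file whose entries are `<mime-type type="...">` elements with `<glob pattern="..."/>` children, as in the examples in this thread. The class name is hypothetical; running it against each candidate repository file would show which of them define `text/fastq` at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of OODT or Tika): lists the types declared in a mime-types file. */
public class MimeRepoTypes {

    /** Maps each <mime-type type="..."> element to the glob patterns declared inside it. */
    static Map<String, List<String>> typesWithGlobs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                List<String> globs = new ArrayList<>();
                NodeList globNodes = type.getElementsByTagName("glob");
                for (int j = 0; j < globNodes.getLength(); j++) {
                    globs.add(((Element) globNodes.item(j)).getAttribute("pattern"));
                }
                result.put(type.getAttribute("type"), globs);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse mime-types XML", e);
        }
    }

    // Usage: java MimeRepoTypes /path/to/mime-types.xml text/fastq
    public static void main(String[] args) throws IOException {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Map<String, List<String>> types = typesWithGlobs(xml);
        if (types.containsKey(args[1])) {
            System.out.println(args[1] + " is defined with globs " + types.get(args[1]));
        } else {
            System.out.println(args[1] + " is NOT defined in " + args[0]);
        }
    }
}
```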

The second thing I noticed is how you are defining your custom mime types.

<mime-type type="text/fastq">
    <glob pattern="*.fastq"/>
    <glob pattern="*.fastq.gz"/>
    <glob pattern="*.fastq.bz"/>
    <glob pattern="*.fastq.bz2"/>
    <glob pattern="*.fastq.bzip"/>
    <glob pattern="*.fq"/>
    <glob pattern="*.fq.gz"/>
    <glob pattern="*.fq.bz"/>
    <glob pattern="*.fq.bz2"/>
    <glob pattern="*.fq.bzip"/>
</mime-type>

I had to make a change to how I was defining my mime types. I don't think Tika will like the way you have defined yours. For example, I have a mime type called "product/fei-ecsv" for files that are just text files named *.ecsv. I had defined it like this:

<mime-type type="product/fei-ecsv">
	<glob pattern="*.ecsv"/>
</mime-type>

If I remember correctly, Tika was not able to determine the mime type and defaulted to 'application/octet-stream', for which I did not have an extractor defined, so I got the 'no extractor defined' errors. To get Tika to recognize my new mime type, I had to add the 'sub-class-of' tag and change my definition to:

<mime-type type="product/fei-ecsv">
    <sub-class-of type="text/plain"/>
    <glob pattern="*.ecsv"/>
</mime-type>
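
If you follow the `sub-class-of` approach, a small check can flag custom entries that still lack a parent type. This is a hypothetical helper based only on the experience reported in this email, not a documented Tika rule. Run it on a file containing just your own additions, since many built-in Tika types legitimately have no `sub-class-of`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical lint (not part of OODT or Tika): lists mime types declared without a parent type. */
public class SubClassLint {

    /** Returns the type attribute of every <mime-type> that has no <sub-class-of> element inside it. */
    static List<String> typesWithoutParent(String xml) {
        try {
            NodeList types = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("mime-type");
            List<String> missing = new ArrayList<>();
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                if (type.getElementsByTagName("sub-class-of").getLength() == 0) {
                    missing.add(type.getAttribute("type"));
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse mime-types XML", e);
        }
    }

    // Usage: java SubClassLint my-custom-mime-types.xml
    public static void main(String[] args) throws IOException {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String type : typesWithoutParent(xml)) {
            System.out.println("No <sub-class-of>: " + type);
        }
    }
}
```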

I also ran into a problem when I tried to define a mime type for files whose extension was already defined in the mime types file, even when my pattern was a two-part extension that didn't actually exist in the file. For example, I am a little worried you might run into problems with your patterns that end in .gz, .bz, .bz2 and .bzip, even though they also have '.fq' and '.fastq' in the pattern. You might have to split your patterns up into a few different mime types. I hope you won't have to, but if you do, I'm pretty sure these 4 types will work as far as Tika is concerned. Doing this might, however, screw up how you have set up your product types.

<mime-type type="text/fastq">
    <sub-class-of type="text/plain"/>
    <glob pattern="*.fastq"/>
    <glob pattern="*.fq"/>
</mime-type>

<mime-type type="text/fastq-gz">
    <sub-class-of type="application/gzip"/>
    <glob pattern="*.fastq.gz"/>
    <glob pattern="*.fq.gz"/>
</mime-type>

<mime-type type="text/fastq-bz">
    <sub-class-of type="application/x-bzip"/>
    <glob pattern="*.fastq.bz"/>
    <glob pattern="*.fastq.bzip"/>
    <glob pattern="*.fq.bz"/>
    <glob pattern="*.fq.bzip"/>
</mime-type>

<mime-type type="text/fastq-bz2">
    <sub-class-of type="application/x-bzip2"/>
    <glob pattern="*.fastq.bz2"/>
    <glob pattern="*.fq.bz2"/>
</mime-type>
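
To see which patterns a file name hits, and to catch stray whitespace inside a pattern attribute (for example `"*.fq "` instead of `"*.fq"`), a quick check with `java.nio.file.PathMatcher` can help. This sketch assumes that java.nio glob syntax is a fair stand-in for simple `*.ext` patterns; it is not Tika's matcher, it does not show which type Tika picks when several globs match, and whether Tika trims whitespace in patterns is not verified here. The class name and pattern list are illustrative only.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which glob patterns match a bare file name (java.nio semantics, not Tika's). */
public class GlobCheck {

    static List<String> matchingGlobs(String fileName, List<String> patterns) {
        FileSystem fs = FileSystems.getDefault();
        Path name = Paths.get(fileName).getFileName(); // match on the last path element only
        List<String> hits = new ArrayList<>();
        for (String pattern : patterns) {
            // Characters other than glob wildcards, including spaces, are matched literally.
            PathMatcher matcher = fs.getPathMatcher("glob:" + pattern);
            if (matcher.matches(name)) {
                hits.add(pattern);
            }
        }
        return hits;
    }

    // Usage: java GlobCheck E837642_R1.fastq.gz
    public static void main(String[] args) {
        List<String> patterns = Arrays.asList(
                "*.fastq", "*.fq", "*.fastq.gz", "*.fq.gz", "*.gz", "*.fq ");
        System.out.println(args[0] + " matches " + matchingGlobs(args[0], patterns));
    }
}
```

For `E837642_R1.fastq.gz` this reports both `*.fastq.gz` and `*.gz`, which is the kind of overlap with an existing extension described above, while a pattern with a trailing space never matches an ordinary file name.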


I hope this helps! Please let me know if you have any questions. I spent a huge amount of time debugging the 'no extractor found' error while upgrading through each new version from 0.6 to 0.10, so I'm hoping my struggles can help someone else :)

Val



Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


> -----Original Message-----
> From: Konstantinos Mavrommatis [mailto:kmavrommatis@celgene.com]
> Sent: Wednesday, April 06, 2016 9:48 PM
> To: dev@oodt.apache.org
> Subject: RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
> 
> I am giving up on this....
> I had used [1] in the first place to set up OODT (v0.6 back then), and my setup on the new
> system is identical to the old one.
> I could not make much out of [0]. Among other things, I tried to copy the files in the
> old crawler/policy to the new crawler/policy, which included some legacy-cmd-line-options.xml
> and legacy-cmd-line-actions.xml files. I also tried to reinstall the full OODT on
> the client side, but it still did not work.
> 
> I ended up reverting to the older version (0.6), which I run on my client. The server
> (which runs FM) is still 0.12, but the combination seems to be working fine.
> 
> K
> 
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> Sent: Tuesday, April 05, 2016 3:33 AM
> To: dev@oodt.apache.org
> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
> 
> Hi K,
> OK so I did a bit of searching here and located a bunch of files which are defined
> as legacy... you can check the search results out below
> https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
> I would urge you to have a look at the AutoDetectProductCrawler Javadoc
> description included in master branch [0] as well to see if you've got everything
> required.
> Finally, I came across some documentation on the wiki which may guide you in the
> right direction [1]. It may also be outdated though, so please let us know if that is
> the case.
> hth
> 
> [0]
> https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
> [1]
> https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
> 
> On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <
> kmavrommatis@celgene.com> wrote:
> 
> > Hi,
> > It seems to be happening for a number of the file types that I have in
> > mimetypes.xml.
> > A few things are puzzling to me: this file, which is a .gz file, is not
> > processed even though the regular Tika mime types include gzip files.
> > A file that has no extension, which defaults to text, is passed to
> > MetExtractor.pl and processed.
> >
> > Any ideas how I can find which preconditions fail? I tried to
> > change the log level to DEBUG for all components, but I did not get
> > much more information. This must be something that changed in the OODT
> > releases >0.6, but I could not find anything relevant in the release notes.
> > I also noticed in the documentation of the AutoDetectProductCrawler
> > that it uses the file met-extr-preconditions.xml, which I could not
> > find anywhere in the deployed OODT or the src directories. Could that
> > be a reason for the problem I observe?
> >
> > Thanks
> > K
> >
> > -----Original Message-----
> > From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> > Sent: Monday, April 04, 2016 3:24 PM
> > To: dev@oodt.apache.org
> > Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
> > specifications
> >
> > Hi Konstantinos,
> > It appears to be happening with a tar.gz file as well right?
> >
> > WARNING: No extractor specs specified for
> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> >
> > I wonder if it is the file names... However I would be extremely
> > surprised as I've seen some much more verbose file naming.
> > Lewis
> >
> > On Saturday, April 2, 2016, Konstantinos Mavrommatis <
> > kmavrommatis@celgene.com> wrote:
> >
> > > Hi,
> > > I am trying to replicate a fully functional service that I had set up
> > > a long time ago using OODT 0.6, but I am having the following problem,
> > > which does not allow me to ingest files. When I try to ingest files
> > > with the extension fastq.gz I get the line:
> > > WARNING: No extractor specs specified for
> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > And of course the file is not ingested. This process works without
> > > problem with OODT 0.6 on a different server.
> > >
> > > The crawler command I am running is:
> > > ./crawler_launcher \
> > > --operation \
> > > --launchAutoCrawler \
> > > --productPath $FILEPATH \
> > > --filemgrUrl $OODT_FILEMGR_URL \
> > > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
> > > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
> > > --noRecur \
> > > --crawlForDirs 2>&1
> > >
> > >
> > >
> > > I have set up OODT 0.12 on a server which runs FM listening on port 9000.
> > > From a client machine I have verified that I can use FM to ingest products.
> > > I am now trying to use the crawler to crawl and ingest all files in a
> > > directory. Since I have non-standard MIME types in these directories,
> > > I have done the following:
> > > 1. Added my own mime types in policy/mimetypes.xml, e.g.
> > >   <mime-type type="text/fastq">
> > >                 <glob pattern="*.fastq"/>
> > >                 <glob pattern="*.fastq.gz"/>
> > >                 <glob pattern="*.fastq.bz"/>
> > >                 <glob pattern="*.fastq.bz2"/>
> > >                 <glob pattern="*.fastq.bzip"/>
> > >                 <glob pattern="*.fq"/>
> > >                 <glob pattern="*.fq.gz"/>
> > >                 <glob pattern="*.fq.bz"/>
> > >                 <glob pattern="*.fq.bz2"/>
> > >                 <glob pattern="*.fq.bzip"/>
> > >         </mime-type>
> > > 2. created the file policy/mime-extractor-map.xml
> > >
> > >   <mime type="text/fastq">
> > >     <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
> > >       <config file="/apache-oodt/crawler/bin/fastq.config"/>
> > >       <preCondComparators>
> > >         <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
> > >       </preCondComparators>
> > >     </extractor>
> > >   </mime>
> > >
> > > 3. created the file fastq.config
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> > >   <exec workingDir="">
> > >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
> > >     <args>
> > >       <arg isDataFile="true"></arg>
> > >       <arg>fastq</arg>
> > >     </args>
> > >   </exec>
> > > </cas:externextractor>
> > >
> > >
> > >
> > > MetExtractorNGS.pl is a small Perl script that opens the file to be
> > > ingested, gets some information, and stores it in the .met file that
> > > corresponds to the file to be ingested. I have manually verified that
> > > it works as expected, producing the correct met file.
> > >
> > > What am I missing here? Any ideas, comments, or suggestions will be
> > > greatly appreciated.
> > > Thanks in advance for any help
> > > Kostas
> > >
> > >
> > >
> > > PS1 The full output from running the crawler command follows:
> > >
> > >
> > > Setting property 'StdProductCrawler.filemgrUrl'
> > > Setting property 'MetExtractorProductCrawler.filemgrUrl'
> > > Setting property 'AutoDetectProductCrawler.filemgrUrl'
> > > Setting property 'StdProductCrawler.clientTransferer'
> > > Setting property 'MetExtractorProductCrawler.clientTransferer'
> > > Setting property 'AutoDetectProductCrawler.clientTransferer'
> > > Setting property 'StdProductCrawler.noRecur'
> > > Setting property 'MetExtractorProductCrawler.noRecur'
> > > Setting property 'AutoDetectProductCrawler.noRecur'
> > > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
> > > Setting property 'StdProductCrawler.productPath'
> > > Setting property 'MetExtractorProductCrawler.productPath'
> > > Setting property 'AutoDetectProductCrawler.productPath'
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> > > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
> > > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
> > > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> > >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> > >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> > >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> > >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> > >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> > >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> > >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> > >
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
> > > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
> > > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> > >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> > >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> > >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> > >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> > >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> > >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> > >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> > >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> > >
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
> > > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > > Apr 02, 2016 10:12:17 PM
> > > org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > > INFO: Passed precondition comparator id
> > > CheckThatDataFileSizeIsGreaterThanZero
> > > Apr 02, 2016 10:12:17 PM
> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
> > > extrMetadata
> > > INFO: Generating met file for product file:
> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f
> > > as
> > > tq/test]
> > > Apr 02, 2016 10:12:17 PM
> > > org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
> > > extrMetadata
> > > INFO: Executing command line:
> > > [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl
> > > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fa
> > > st
> > > q/test
> > > text ] with workingDir:
> > > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f
> > > as
> > > tq]
> > > to extract metadata
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing
> > > NGS server at
> > > https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A8
> > > 08
> > > 2_RPC2&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-
> Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6
> > > yv
> > > Z1Cs-
> T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=tS
> c
> > > i2 Q1bJj0cQnBHjjOwtZjjx9uNMoN5Bi-ABG0Q7Y4&e=
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
> > > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
> > > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> > > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> > > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
> > > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
> > > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
> > > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
> > > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > >
> >
> >
> > --
> > *Lewis*
> >
> >
> 
> 
> 
> --
> *Lewis*
> 

RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Konstantinos Mavrommatis <km...@celgene.com>.
I am giving up on this...
I had used [1] in the first place to set up OODT (v0.6 back then), and my setup on the new system is identical to the old one.
I could not make much out of [0]. Among other things, I tried copying the files from the old crawler/policy to the new crawler/policy (including legacy-cmd-line-options.xml and legacy-cmd-line-actions.xml). I also tried reinstalling the full OODT on the client side, but it still did not work.

I ended up reverting to the older version (0.6) on my client. The server (which runs the FM) is still on 0.12, but the combination seems to be working fine.

K

-----Original Message-----
From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Tuesday, April 05, 2016 3:33 AM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Hi K,
OK, so I did a bit of searching and located a bunch of files which are defined as legacy; you can check out the search results here: https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
I would urge you to have a look at the AutoDetectProductCrawler Javadoc description in the master branch [0] as well, to see if you've got everything required.
Finally, I came across some documentation on the wiki which may guide you in the right direction [1]. It may be outdated, though, so please let us know if that is the case.
hth

[0]
https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
[1]
https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler

On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:

> Hi,
> It seems to be happening for a number of the file types that I have in
> mimetypes.xml.
> A few things are puzzling to me. This file, which is a .gz file, is not
> processed via the regular Tika mimetypes, which contain the gzip types.
> A file that has no extension, which defaults to text, is passed to
> MetExtractor.pl and processed.
>
> Any ideas how I can find out which preconditions fail? I tried
> changing the log level to DEBUG for all components, but I did not get
> much more information. This must be something that changed in the OODT
> releases after 0.6, but I could not find anything relevant in the release notes.
> I also noticed in the documentation of the AutoDetectProductCrawler
> that it uses the file met-extr-preconditions.xml, which I could not
> find anywhere in the deployed OODT or the src directories. Could that
> be a reason for the problem I observe?
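One way to narrow down the puzzle above is to check, outside OODT, which of the custom `<glob>` patterns actually match the file names that appear in the crawler log. The Python sketch below does only that, applying `fnmatch` to the `text/fastq` entries quoted in this thread; the `<mime-info>` wrapper is added purely so the fragment parses on its own. It is a diagnostic aid under stated assumptions, not a model of Tika: real MIME detection can also inspect file contents (magic bytes), and this sketch does not reproduce how those signals are weighed against file names.

```python
# Sketch: which <glob> patterns from a mimetypes.xml fragment match a name?
# Only the file-name (glob) layer is modelled; content-based (magic-byte)
# detection and Tika's precedence rules are NOT reproduced here.
import fnmatch
import xml.etree.ElementTree as ET

# The custom type from policy/mimetypes.xml; the <mime-info> wrapper is an
# assumption added only so that this fragment parses on its own.
MIMETYPES_XML = """<mime-info>
  <mime-type type="text/fastq">
    <glob pattern="*.fastq"/>
    <glob pattern="*.fastq.gz"/>
    <glob pattern="*.fastq.bz"/>
    <glob pattern="*.fastq.bz2"/>
    <glob pattern="*.fastq.bzip"/>
    <glob pattern="*.fq"/>
    <glob pattern="*.fq.gz"/>
    <glob pattern="*.fq.bz"/>
    <glob pattern="*.fq.bz2"/>
    <glob pattern="*.fq.bzip"/>
  </mime-type>
</mime-info>
"""


def glob_matches(xml_text, file_name):
    """Return (mime type, pattern) pairs whose glob matches file_name."""
    root = ET.fromstring(xml_text)
    hits = []
    for mime in root.iter("mime-type"):
        for glob_el in mime.iter("glob"):
            pattern = glob_el.get("pattern")
            # fnmatchcase compares case-sensitively against the whole name
            if fnmatch.fnmatchcase(file_name, pattern):
                hits.append((mime.get("type"), pattern))
    return hits


if __name__ == "__main__":
    names = ["E837642_R1.fastq.gz", "E837642_R1.fastq.gz.met",
             "cas-crawler-04-02-16.tar.gz", "test"]
    for name in names:
        print(name, "->", glob_matches(MIMETYPES_XML, name) or "no glob match")
```

Run as a script, it reports a `*.fastq.gz` match for `E837642_R1.fastq.gz` and no glob match for the `.met`, `.tar.gz` and extension-less `test` files. If the name-based match holds but the crawler still logs "No extractor specs specified", the type the crawler assigned presumably came from something other than these globs; that is an inference, not something the log states.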
>
> Thanks
> K
>
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> Sent: Monday, April 04, 2016 3:24 PM
> To: dev@oodt.apache.org
> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications
>
> Hi Konstantinos,
> It appears to be happening with a tar.gz file as well, right?
>
> WARNING: No extractor specs specified for
> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>
> I wonder if it is the file names... However, I would be extremely
> surprised, as I've seen much more verbose file naming.
> Lewis
>
> On Saturday, April 2, 2016, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:
>
> > Hi,
> > I am trying to replicate a fully functional service that I set up
> > a long time ago using OODT 0.6, but I am having the following problem,
> > which does not allow me to ingest files. When I try to ingest files
> > with the extension fastq.gz I get the line:
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > And of course the file is not ingested. This process works without
> > problems with OODT 0.6 on a different server.
> >
> > The crawler command I am running is:
> > ./crawler_launcher \
> > --operation \
> > --launchAutoCrawler \
> > --productPath $FILEPATH \
> > --filemgrUrl $OODT_FILEMGR_URL \
> > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
> > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
> > --noRecur \
> > --crawlForDirs 2>&1
> >
> >
> >
> > I have set up OODT 0.12 on a server that runs the FM, listening on port 9000.
> > From a client machine I have verified that I can use the FM to ingest products.
> > I am now trying to use the crawler to crawl and ingest all files in a
> > directory. Since I have non-standard MIME types in these directories,
> > I have done the following:
> > 1. Added my own MIME types to policy/mimetypes.xml, e.g.:
> >   <mime-type type="text/fastq">
> >                 <glob pattern="*.fastq"/>
> >                 <glob pattern="*.fastq.gz"/>
> >                 <glob pattern="*.fastq.bz"/>
> >                 <glob pattern="*.fastq.bz2"/>
> >                 <glob pattern="*.fastq.bzip"/>
> >                 <glob pattern="*.fq"/>
> >                 <glob pattern="*.fq.gz"/>
> >                 <glob pattern="*.fq.bz"/>
> >                 <glob pattern="*.fq.bz2"/>
> >                 <glob pattern="*.fq.bzip"/>
> >         </mime-type>
> > 2. Created the file policy/mime-extractor-map.xml:
> >
> >         <mime type="text/fastq">
> >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
> >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
> >                         <preCondComparators>
> >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
> >                         </preCondComparators>
> >                 </extractor>
> >         </mime>
> >
> > 3. Created the file fastq.config:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >   <exec workingDir="">
> >     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
> >     <args>
> >       <arg isDataFile="true"></arg>
> >       <arg>fastq</arg>
> >     </args>
> >   </exec>
> > </cas:externextractor>
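The "No extractor specs specified" warning suggests that the crawler found no `<mime>` entry in mime-extractor-map.xml for whatever MIME type it assigned to the file. The Python sketch below illustrates that kind of lookup on the `text/fastq` entry from step 2. It is not OODT's implementation: the `<map>` root element is a hypothetical stand-in (the fragment in the message omits the file's real root element), and `application/x-gzip` is only an example of a type that has no entry.

```python
# Sketch of a per-MIME-type lookup like the one behind
# "No extractor specs specified". The <map> root element is a hypothetical
# stand-in for the real root of mime-extractor-map.xml.
import xml.etree.ElementTree as ET

MIME_EXTRACTOR_MAP = """<map>
  <mime type="text/fastq">
    <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
      <config file="/apache-oodt/crawler/bin/fastq.config"/>
      <preCondComparators>
        <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
      </preCondComparators>
    </extractor>
  </mime>
</map>
"""


def extractor_specs(map_xml, mime_type):
    """Return (extractor class, config file, precondition ids) for mime_type."""
    root = ET.fromstring(map_xml)
    specs = []
    for mime in root.iter("mime"):
        if mime.get("type") != mime_type:
            continue
        for ext in mime.findall("extractor"):
            config = ext.find("config")
            pre_ids = [p.get("id") for p in ext.iter("preCondComparator")]
            specs.append((ext.get("class"),
                          config.get("file") if config is not None else None,
                          pre_ids))
    return specs


if __name__ == "__main__":
    # "application/x-gzip" is only an example of a type without an entry.
    for detected in ["text/fastq", "application/x-gzip"]:
        specs = extractor_specs(MIME_EXTRACTOR_MAP, detected)
        print(detected, "->", specs or "no extractor specs")
```

With this map, `text/fastq` resolves to the ExternMetExtractor spec, while any other type comes back empty, which is the situation the warning describes.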
> >
> >
> >
> > MetExtractorNGS.pl is a small Perl script that opens the file to be
> > ingested, gets some information, and stores it in the corresponding
> > .met file. I have manually verified that it works as expected and
> > produces the correct .met file.
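For comparison with the "Executing command line" entries in the log, the Python sketch below builds the argument list that fastq.config appears to describe. It assumes that the `<arg isDataFile="true">` slot is filled with the product path and that other `<arg>` values are passed through literally; that reading is inferred from the config and the log, not taken from OODT's source.

```python
# Sketch: rebuild the command line described by fastq.config.
# Assumption: <arg isDataFile="true"> is replaced by the product path and
# every other <arg> is passed through as literal text.
import xml.etree.ElementTree as ET

FASTQ_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <exec workingDir="">
    <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
    <args>
      <arg isDataFile="true"></arg>
      <arg>fastq</arg>
    </args>
  </exec>
</cas:externextractor>
"""


def build_command(config_xml, data_file):
    """Return the argv list implied by an external extractor config."""
    # Parse bytes so the XML declaration's encoding attribute is honoured.
    root = ET.fromstring(config_xml.encode("utf-8"))
    exec_el = root.find("exec")  # child elements carry no namespace prefix
    cmd = [exec_el.findtext("extractorBinPath").strip()]
    for arg in exec_el.find("args").findall("arg"):
        if arg.get("isDataFile") == "true":
            cmd.append(data_file)
        else:
            cmd.append((arg.text or "").strip())
    return cmd


if __name__ == "__main__":
    product = ("/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/"
               "RNA-Seq/RawData/fastq/E837642_R1.fastq.gz")
    print(build_command(FASTQ_CONFIG, product))
```

Under that assumption the last argument would be `fastq`, yet every logged command line ends in `text`. That suggests, but does not prove, that the files which reached the extractor were matched through a different mapping than `text/fastq`.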
> >
> > What am I missing here? Any ideas, comments, or suggestions will be
> > greatly appreciated.
> > Thanks in advance for any help
> > Kostas
> >
> >
> >
> > PS1 The full output from running the crawler command follows:
> >
> >
> > Setting property 'StdProductCrawler.filemgrUrl'
> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
> > Setting property 'StdProductCrawler.clientTransferer'
> > Setting property 'MetExtractorProductCrawler.clientTransferer'
> > Setting property 'AutoDetectProductCrawler.clientTransferer'
> > Setting property 'StdProductCrawler.noRecur'
> > Setting property 'MetExtractorProductCrawler.noRecur'
> > Setting property 'AutoDetectProductCrawler.noRecur'
> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
> > Setting property 'StdProductCrawler.productPath'
> > Setting property 'MetExtractorProductCrawler.productPath'
> > Setting property 'AutoDetectProductCrawler.productPath'
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> >
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> >
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
> > Apr 02, 2016 10:12:19 PM
> > org.apache.oodt.cas.filemgr.ingest.StdIngester
> > setFileManager
> > INFO: StdIngester: connected to file manager:
> > [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A
> > 90 
> > 00&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1
> > Cs 
> > -T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR
> > 1X qgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] Apr 02, 2016 10:12:19 PM 
> > org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer
> > setFileManagerUrl
> > INFO: In Place Data Transfer to:
> > [https://urldefense.proofpoint.com/v2/url?u=http-3A__192.168.8.44-3A
> > 90 
> > 00&d=CwIBaQ&c=CZZujK3G2KuXGEKnzq-Hrg&r=wndYZ4MLMT9l3Zb2WZv2hq2O6yvZ1
> > Cs 
> > -T2gHY95y7ZA&m=Qaz0eKz7FHe35NMF43A17ey59ANhAqJD5ZfwZQC0VRo&s=OvpwZVR
> > 1X qgKclL83VXAWh__c7nz87xK_nS-O7hIXqc&e= ] enabled Apr 02, 2016 
> > 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester
> > ingest
> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType:
> > [GenericFile]: FileLocation:
> > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f
> > as
> > tq/]
> > Apr 02, 2016 10:12:19 PM
> > org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient
> > ingestProduct
> > FINEST: File Manager Client: clientTransfer enabled: transfering 
> > product [test] Apr 02, 2016 10:12:19 PM 
> > org.apache.oodt.cas.filemgr.versioning.VersioningUtils
> > createBasicDataStoreRefsFlat
> > FINE: VersioningUtils: Generated data store ref:
> > file:/opt/oodt/data/archive/test/test from origRef:
> > file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawDa
> > ta /fastq/test Apr 02, 2016 10:12:19 PM 
> > org.apache.oodt.cas.crawl.ProductCrawler ingest
> > INFO: Successfully ingested product:
> >
> [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]:
> > product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler
> > handleFile
> > INFO: Successful ingest of product:
> > [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/f
> > as
> > tq/test]
> >
> >
> > *********************************************************
> > THIS ELECTRONIC MAIL MESSAGE AND ANY ATTACHMENT IS CONFIDENTIAL AND 
> > MAY CONTAIN LEGALLY PRIVILEGED INFORMATION INTENDED ONLY FOR THE USE 
> > OF THE INDIVIDUAL OR INDIVIDUALS NAMED ABOVE.
> > If the reader is not the intended recipient, or the employee or 
> > agent responsible to deliver it to the intended recipient, you are 
> > hereby notified that any dissemination, distribution or copying of 
> > this communication is strictly prohibited. If you have received this 
> > communication in error, please reply to the sender to notify us of 
> > the error and delete the original message. Thank You.
> >
>
>
> --
> *Lewis*
>
>



--
*Lewis*


Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi K,
OK, so I did a bit of searching and located a number of files which are
defined as legacy; you can check out the search results below:
https://github.com/apache/oodt/search?utf8=%E2%9C%93&q=AutoDetectProductCrawler&type=Code
I would urge you to have a look at the AutoDetectProductCrawler Javadoc
description in the master branch [0] as well, to see whether you have
everything required.
Finally, I came across some documentation on the wiki which may guide you
in the right direction [1]. It may be outdated, though, so please let us
know if that is the case.
hth

[0]
https://github.com/apache/oodt/blob/91d0bafe71124906bd94baad746189caf35fb39c/crawler/src/main/java/org/apache/oodt/cas/crawl/AutoDetectProductCrawler.java#L40-L64
[1]
https://cwiki.apache.org/confluence/display/OODT/Mime+type+detection+with+the+AutoDetectProductCrawler
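
For readers following along, here is a rough Python model of the decision path that the crawler log shows: detect a MIME type, look it up in the mime-extractor map, then run the preCondComparators. It is a simplified sketch, not OODT's Java implementation. `ExtractorSpec`, `passes_preconditions` and the dictionaries are hypothetical stand-ins for AutoDetectProductCrawler's Tika-based detection, mime-extractor-map.xml and the comparator beans, and whether the real crawler also consults parent MIME types is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExtractorSpec:
    """Hypothetical stand-in for one <extractor> entry in mime-extractor-map.xml."""
    extractor_class: str
    config_file: str
    precond_ids: List[str] = field(default_factory=list)


def passes_preconditions(
    path: str,
    detect_type: Callable[[str], str],
    mime_extractor_map: Dict[str, List[ExtractorSpec]],
    comparators: Dict[str, Callable[[str], bool]],
    log: List[str],
) -> bool:
    """Simplified model of the checks visible in the crawler log (not OODT code)."""
    mime_type = detect_type(path)
    specs = mime_extractor_map.get(mime_type, [])
    if not specs:
        # Counterpart of "WARNING: No extractor specs specified for <path>".
        log.append(f"No extractor specs specified for {path}")
        return False
    for spec in specs:
        for comp_id in spec.precond_ids:
            if not comparators[comp_id](path):
                log.append(f"Failed precondition comparator id {comp_id}")
                return False
    return True


if __name__ == "__main__":
    fastq = ExtractorSpec(
        "org.apache.oodt.cas.metadata.extractors.ExternMetExtractor",
        "fastq.config",
        ["CheckThatDataFileSizeIsGreaterThanZero"],
    )
    mapping = {"text/fastq": [fastq]}
    comparators = {"CheckThatDataFileSizeIsGreaterThanZero": lambda p: True}
    messages: List[str] = []
    # If detection yields a gzip type rather than text/fastq, the lookup is empty.
    ok = passes_preconditions(
        "E837642_R1.fastq.gz", lambda p: "application/x-gzip", mapping, comparators, messages
    )
    print(ok, messages)  # False ['No extractor specs specified for E837642_R1.fastq.gz']
```

The point of the model is that the comparator (CheckThatDataFileSizeIsGreaterThanZero) is never reached for the fastq.gz files: the log shows the rejection happening at the lookup step, so the detected type is the thing to investigate.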

On Mon, Apr 4, 2016 at 10:54 PM, Konstantinos Mavrommatis <
kmavrommatis@celgene.com> wrote:

> Hi,
> It seems to be happening for a number of the file types that I have in
> mimetypes.xml.
> A few things puzzle me: this file, which is a .gz file, is not
> processed by the regular Tika mimetypes, which include a definition for
> gzip files. A file that has no extension, which defaults to text, is
> passed to MetExtractor.pl and processed.
>
> Any ideas how I can find out which preconditions fail? I tried to
> change the log level to DEBUG for all components but did not get much
> more information. This must be something that changed in OODT releases
> after 0.6, but I could not find anything relevant in the release notes.
> I also noticed in the documentation of the AutoDetectProductCrawler that
> it uses the file met-extr-preconditions.xml, which I could not find anywhere
> in the deployed OODT or the src directories. Could that be a reason for the
> problem I observe?
>
> Thanks
> K
>
> -----Original Message-----
> From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com]
> Sent: Monday, April 04, 2016 3:24 PM
> To: dev@oodt.apache.org
> Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor
> specifications
>
> Hi Konstantinos,
> It appears to be happening with a tar.gz file as well, right?
>
> WARNING: No extractor specs specified for
> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
>
> I wonder if it is the file names... However, I would be extremely surprised,
> as I've seen much more verbose file naming.
> Lewis
>
> On Saturday, April 2, 2016, Konstantinos Mavrommatis <
> kmavrommatis@celgene.com> wrote:
>
> > Hi,
> > I am trying to replicate a fully functional service that I set up
> > a long time ago using OODT 0.6, but I am having the following problem,
> > which prevents me from ingesting files. When I try to ingest files
> > with the extension fastq.gz I get the line:
> > WARNING: No extractor specs specified for
> > /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > And of course the file is not ingested. This process works without
> > problems with OODT 0.6 on a different server.
> >
> > The crawler command I am running is:
> > ./crawler_launcher \
> > --operation \
> > --launchAutoCrawler \
> > --productPath $FILEPATH \
> > --filemgrUrl $OODT_FILEMGR_URL \
> > --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
> > --mimeExtractorRepo ../policy/mime-extractor-map.xml \
> > --noRecur \
> > --crawlForDirs 2>&1
> >
> >
> >
> > I have set up OODT 0.12 on a server which runs the FM listening on port 9000.
> > From a client machine I have verified that I can use the FM to ingest products.
> > I am now trying to use the crawler to crawl and ingest all files in a
> > directory. Since I have non-standard MIME types in these directories, I
> > have done the following:
> > 1. Added my own mime types in policy/mimetypes.xml eg
> >   <mime-type type="text/fastq">
> >                 <glob pattern="*.fastq"/>
> >                 <glob pattern="*.fastq.gz"/>
> >                 <glob pattern="*.fastq.bz"/>
> >                 <glob pattern="*.fastq.bz2"/>
> >                 <glob pattern="*.fastq.bzip"/>
> >                 <glob pattern="*.fq"/>
> >                 <glob pattern="*.fq.gz"/>
> >                 <glob pattern="*.fq.bz"/>
> >                 <glob pattern="*.fq.bz2"/>
> >                 <glob pattern="*.fq.bzip"/>
> >         </mime-type>
> > 2. Created the file policy/mime-extractor-map.xml:
> >
> >         <mime type="text/fastq">
> >                 <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
> >                         <config file="/apache-oodt/crawler/bin/fastq.config"/>
> >                         <preCondComparators>
> >                                 <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
> >                         </preCondComparators>
> >                 </extractor>
> >         </mime>
> >
> > 3. Created the file fastq.config:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >    <exec workingDir="">
> >       <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
> >       <args>
> >          <arg isDataFile="true"></arg>
> >          <arg>fastq</arg>
> >       </args>
> >    </exec>
> > </cas:externextractor>
> >
> >
> >
> > MetExtractorNGS.pl is a small Perl script that opens the file to be
> > ingested, gathers some information and stores it in the .met file that
> > corresponds to that file. I have manually verified that it works as
> > expected and produces the correct .met file.
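
Judging from the crawler log quoted in this message, ExternMetExtractor runs the configured script with the data file path followed by the configured arguments (e.g. `MetExtractorNGS.pl <file> text`) and then appears to expect `<file>.met` to exist; when the Perl script skips a `.met` input, nothing is written and the crawler reports "Met extractor failed to create metadata file". A minimal Python stand-in for such a script might look like the sketch below. The `cas:metadata` / `keyval` / `key` / `val` layout is written from memory of OODT's serialized metadata format and should be compared against a .met file your installation actually produces; the `FileType` key and the whole script are hypothetical, while `ProductType` and `FilePath` are the keys seen in the log.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for an external met extractor such as MetExtractorNGS.pl.

Assumed contract (inferred from the crawler log, not from OODT documentation):
  argv[1] = data file path, argv[2] = type argument from the extractor config.
  On success the script leaves <data file>.met next to the data file.
"""
import os
import sys
import xml.etree.ElementTree as ET

CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"
ET.register_namespace("cas", CAS_NS)


def write_met_file(data_file, metadata):
    """Write <data_file>.met as cas:metadata/keyval/key/val XML; return its path."""
    root = ET.Element(f"{{{CAS_NS}}}metadata")
    for key, values in metadata.items():
        keyval = ET.SubElement(root, "keyval")
        ET.SubElement(keyval, "key").text = key
        for value in values:
            ET.SubElement(keyval, "val").text = str(value)
    met_path = data_file + ".met"
    ET.ElementTree(root).write(met_path, encoding="UTF-8", xml_declaration=True)
    return met_path


def main(argv):
    data_file, file_type = argv[1], argv[2]
    if data_file.endswith(".met"):
        # Same policy as MetExtractorNGS.pl: skip .met inputs. No <file>.met.met
        # is written, which the crawler then reports as an extraction failure.
        print(f"Input file {data_file} will be ignored", file=sys.stderr)
        return 1
    metadata = {
        "ProductType": ["GenericFile"],
        "FilePath": [os.path.abspath(data_file)],
        "FileType": [file_type],  # hypothetical key name
    }
    write_met_file(data_file, metadata)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run as `python3 met_stub.py /path/to/sample.fastq fastq` (script name hypothetical), it should leave `/path/to/sample.fastq.met`; run against a `.met` file it exits non-zero without writing anything, which mirrors the SEVERE errors below for the `.fastq.gz.met` files.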
> >
> > What am I missing here? Any ideas, comments or suggestions will be
> > greatly appreciated.
> > Thanks in advance for any help
> > Kostas
> >
> >
> >
> > PS1 The full output from running the crawler command follows:
> >
> >
> > Setting property 'StdProductCrawler.filemgrUrl'
> > Setting property 'MetExtractorProductCrawler.filemgrUrl'
> > Setting property 'AutoDetectProductCrawler.filemgrUrl'
> > Setting property 'StdProductCrawler.clientTransferer'
> > Setting property 'MetExtractorProductCrawler.clientTransferer'
> > Setting property 'AutoDetectProductCrawler.clientTransferer'
> > Setting property 'StdProductCrawler.noRecur'
> > Setting property 'MetExtractorProductCrawler.noRecur'
> > Setting property 'AutoDetectProductCrawler.noRecur'
> > Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
> > Setting property 'StdProductCrawler.productPath'
> > Setting property 'MetExtractorProductCrawler.productPath'
> > Setting property 'AutoDetectProductCrawler.productPath'
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.noRecur' set to value [true]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> > Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> > FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> > INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
> > Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
> > Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> >
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
> > Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
> > Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> > org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
> >         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
> >         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
> >         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
> >         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
> >         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
> >         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
> >         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
> >
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> > WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> > INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
> > OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
> > OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> > INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> > INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> > INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
> > INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
> > FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
> > FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> > INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
> > Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> > INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> >
> >
> >
>
>
> --
> *Lewis*
>



-- 
*Lewis*

RE: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Konstantinos Mavrommatis <km...@celgene.com>.
Hi,
It seems to be happening for a number of the file types that I have defined in mimetypes.xml.
A few things puzzle me. This file, which is a .gz file, is not processed even though the regular Tika MIME types include gzip files.
Meanwhile, a file that has no extension, which defaults to text, is passed to MetExtractorNGS.pl and processed.
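A toy Python model may make the two observations above more concrete. It is not OODT or Tika code: it assumes the extractor is chosen by the MIME type the crawler detects for a file, and that sniffing the gzip header bytes (1f 8b) can take precedence over the filename globs. The names detect_mime, extractor_specs and EXTRACTOR_MAP, and the "text.config" entry (standing in for whatever config produced the "text" argument seen in the log), are all hypothetical.

```python
from fnmatch import fnmatchcase

# Toy model only; this is not the OODT/Tika implementation.
# Globs copied from the text/fastq entry posted for policy/mimetypes.xml.
FASTQ_GLOBS = ["*.fastq", "*.fastq.gz", "*.fastq.bz", "*.fastq.bz2", "*.fastq.bzip",
               "*.fq", "*.fq.gz", "*.fq.bz", "*.fq.bz2", "*.fq.bzip"]

# Hypothetical map from detected MIME type to extractor configs.
# "text.config" stands in for the config that passed "text" in the log.
EXTRACTOR_MAP = {
    "text/fastq": ["fastq.config"],
    "text/plain": ["text.config"],
}

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def detect_mime(name: str, head: bytes, magic_first: bool) -> str:
    """Guess a MIME type from the leading bytes and/or the file name."""
    if magic_first and head.startswith(GZIP_MAGIC):
        return "application/gzip"  # assumed result of content sniffing
    if any(fnmatchcase(name, g) for g in FASTQ_GLOBS):
        return "text/fastq"
    if head.startswith(GZIP_MAGIC):
        return "application/gzip"
    try:
        head.decode("ascii")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def extractor_specs(name: str, head: bytes, magic_first: bool) -> list:
    """An empty list models the 'No extractor specs specified' warning."""
    return EXTRACTOR_MAP.get(detect_mime(name, head, magic_first), [])


if __name__ == "__main__":
    samples = {
        "E837642_R1.fastq.gz": GZIP_MAGIC + b"\x08\x00",
        "E837642_R1.fastq.gz.met": b'<?xml version="1.0"?>',
        "test": b"plain text",
    }
    for magic_first in (True, False):
        for name, head in samples.items():
            print(magic_first, name, extractor_specs(name, head, magic_first))
```

Under these assumptions, with content sniffing first, the gzip-compressed FASTQ file falls through to a type with no mapping (the empty-list case), while the .met file and the extension-less test file both land on the text mapping, which is the same pattern the log shows. Whether detection really behaves this way in this deployment is something to verify, not a confirmed diagnosis.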

Any ideas how I can find out which preconditions fail? I tried changing the log level to DEBUG for all components, but I did not get much more information. This must be something that changed in OODT releases after 0.6, but I could not find anything relevant in the release notes.
I also noticed in the documentation of the AutoDetectProductCrawler that it uses the file met-extr-preconditions.xml, which I could not find anywhere in the deployed OODT or in the src directories. Could that be the reason for the problem I am observing?
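On the question of which precondition fails: in the crawler log, the .fastq.gz files get the "No extractor specs specified" warning with no comparator line at all, whereas the .met file logs "Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero". That suggests the extractor-spec lookup rejects those files before any comparator runs, so the comparator configuration is probably not where they are rejected. A hedged Python sketch of that ordering follows; COMPARATORS and passes_preconditions are hypothetical names, and the ordering is inferred from the log rather than from the OODT source.

```python
import os
import tempfile
from typing import Callable, Dict, List, Optional

# Hypothetical sketch, not OODT code: each precondition comparator is
# modelled as a named predicate over the product file's path.
COMPARATORS: Dict[str, Callable[[str], bool]] = {
    # Behaviour inferred from the id only; the real bean may differ.
    "CheckThatDataFileSizeIsGreaterThanZero": lambda path: os.path.getsize(path) > 0,
}


def passes_preconditions(path: str, comparator_ids: Optional[List[str]]) -> bool:
    """comparator_ids is None when no extractor spec matched the file's MIME type."""
    if comparator_ids is None:
        # Rejected before any comparator runs, as for the .fastq.gz files.
        print(f"WARNING: No extractor specs specified for {path}")
        return False
    for cid in comparator_ids:
        if not COMPARATORS[cid](path):
            print(f"Failed precondition comparator id {cid}")
            return False
        print(f"Passed precondition comparator id {cid}")
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".fastq", delete=False) as f:
        f.write(b"@read1\nACGT\n+\nIIII\n")
    print(passes_preconditions(f.name, None))  # False
    print(passes_preconditions(f.name, ["CheckThatDataFileSizeIsGreaterThanZero"]))  # True
    os.remove(f.name)
```

In this model, adding or fixing comparator definitions cannot help a file whose detected type has no extractor spec; only the type-to-extractor mapping decides whether comparators are reached at all.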

Thanks
K

-----Original Message-----
From: Lewis John Mcgibbney [mailto:lewis.mcgibbney@gmail.com] 
Sent: Monday, April 04, 2016 3:24 PM
To: dev@oodt.apache.org
Subject: Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Hi Konstantinos,
It appears to be happening with a tar.gz file as well, right?

WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz

I wonder if it is the file names... However, I would be extremely surprised, as I've seen much more verbose file naming.
Lewis

--
*Lewis*


Re: Transition from OODT 0.6 to 0.12 cannot find extractor specifications

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Konstantinos,
It appears to be happening with a tar.gz file as well, right?

WARNING: No extractor specs specified for
/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz

I wonder if it is the file names... However, I would be extremely surprised,
as I've seen much more verbose file naming.
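To rule the file names in or out, the names from the log can be tested against the posted text/fastq globs directly. A minimal Python sketch, using fnmatch.fnmatchcase as a stand-in for the crawler's glob matching (an assumption; Tika's own matcher may differ in details such as case handling):

```python
from fnmatch import fnmatchcase

# Globs from the text/fastq entry posted for policy/mimetypes.xml.
FASTQ_GLOBS = ["*.fastq", "*.fastq.gz", "*.fastq.bz", "*.fastq.bz2", "*.fastq.bzip",
               "*.fq", "*.fq.gz", "*.fq.bz", "*.fq.bz2", "*.fq.bzip"]

# File names copied from the crawler log in this thread.
LOGGED_NAMES = [
    "E837642_R1.fastq.gz",
    "E837642_R2.fastq.gz",
    "E837642_R1.fastq.gz.met",
    "cas-crawler-04-02-16.log.gz",
    "cas-crawler-04-02-16.tar.gz",
    "test",
]


def matching_globs(name: str) -> list:
    """Return the text/fastq globs that match a bare file name (case-sensitive)."""
    return [g for g in FASTQ_GLOBS if fnmatchcase(name, g)]


if __name__ == "__main__":
    for name in LOGGED_NAMES:
        print(f"{name:30} {matching_globs(name)}")
```

The FASTQ files do match *.fastq.gz, so the names alone do not seem to explain their warning; the .log.gz and .tar.gz files match none of the text/fastq globs, so a missing extractor spec is expected for them unless their own types are mapped.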
Lewis

On Saturday, April 2, 2016, Konstantinos Mavrommatis <kmavrommatis@celgene.com> wrote:

> Hi,
> I am trying to replicate a fully functional service that I set up a long
> time ago using OODT 0.6, but I am running into the following problem, which
> prevents me from ingesting files. When I try to ingest files with the
> extension fastq.gz, I get the line:
> WARNING: No extractor specs specified for
> /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> And of course the file is not ingested. The same process works without
> problems with OODT 0.6 on a different server.
>
> The crawler command I am running is:
> ./crawler_launcher \
> --operation \
> --launchAutoCrawler \
> --productPath $FILEPATH \
> --filemgrUrl $OODT_FILEMGR_URL \
> --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
> --mimeExtractorRepo ../policy/mime-extractor-map.xml \
> --noRecur \
> --crawlForDirs 2>&1
>
>
>
> I have set up OODT 0.12 on a server that runs the File Manager (FM), listening on port 9000.
> From a client machine I have verified that I can use the FM to ingest products.
> I am now trying to use the crawler to crawl and ingest all files in a
> directory. Since these directories contain non-standard MIME types, I have
> done the following:
> 1. Added my own MIME types to policy/mimetypes.xml, e.g.:
>   <mime-type type="text/fastq">
>     <glob pattern="*.fastq"/>
>     <glob pattern="*.fastq.gz"/>
>     <glob pattern="*.fastq.bz"/>
>     <glob pattern="*.fastq.bz2"/>
>     <glob pattern="*.fastq.bzip"/>
>     <glob pattern="*.fq"/>
>     <glob pattern="*.fq.gz"/>
>     <glob pattern="*.fq.bz"/>
>     <glob pattern="*.fq.bz2"/>
>     <glob pattern="*.fq.bzip"/>
>   </mime-type>
> 2. Created the file policy/mime-extractor-map.xml:
>
>   <mime type="text/fastq">
>     <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
>       <config file="/apache-oodt/crawler/bin/fastq.config"/>
>       <preCondComparators>
>         <preCondComparator id="CheckThatDataFileSizeIsGreaterThanZero"/>
>       </preCondComparators>
>     </extractor>
>   </mime>
>
> 3. Created the file fastq.config:
> <?xml version="1.0" encoding="UTF-8"?>
> <cas:externextractor xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>   <exec workingDir="">
>     <extractorBinPath>/apache-oodt/crawler/bin/MetExtractorNGS.pl</extractorBinPath>
>     <args>
>       <arg isDataFile="true"></arg>
>       <arg>fastq</arg>
>     </args>
>   </exec>
> </cas:externextractor>
>
>
>
> MetExtractorNGS.pl is a small Perl script that opens the file to be
> ingested, gathers some information, and stores it in the corresponding .met
> file. I have manually verified that it works as expected and produces the
> correct .met file.
>
> What am I missing here? Any ideas, comments, or suggestions will be greatly
> appreciated.
> Thanks in advance for any help
> Kostas
>
>
>
> PS1 The full output from running the crawler command follows:
>
>
> Setting property 'StdProductCrawler.filemgrUrl'
> Setting property 'MetExtractorProductCrawler.filemgrUrl'
> Setting property 'AutoDetectProductCrawler.filemgrUrl'
> Setting property 'StdProductCrawler.clientTransferer'
> Setting property 'MetExtractorProductCrawler.clientTransferer'
> Setting property 'AutoDetectProductCrawler.clientTransferer'
> Setting property 'StdProductCrawler.noRecur'
> Setting property 'MetExtractorProductCrawler.noRecur'
> Setting property 'AutoDetectProductCrawler.noRecur'
> Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
> Setting property 'StdProductCrawler.productPath'
> Setting property 'MetExtractorProductCrawler.productPath'
> Setting property 'AutoDetectProductCrawler.productPath'
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'AutoDetectProductCrawler.noRecur' set to value [true]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'StdProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'MetExtractorProductCrawler.noRecur' set to value [true]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'StdProductCrawler.noRecur' set to value [true]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'StdProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'AutoDetectProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://192.168.8.44:9000]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory]
> Apr 02, 2016 10:12:13 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
> FINE: Property 'MetExtractorProductCrawler.productPath' set to value [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq]
> Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> INFO: Crawling /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq
> Apr 02, 2016 10:12:13 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz]
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met]
> Apr 02, 2016 10:12:14 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:15] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R1.fastq.gz.met will be ignored. .met files are not processed !
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz]
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met
> Apr 02, 2016 10:12:15 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met]
> Apr 02, 2016 10:12:16 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> OUTPUT: [WARN : MetExtractorNGS - 2016/04/02 22:12:16] - Input file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/E837642_R2.fastq.gz.met will be ignored. .met files are not processed !
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> SEVERE: Failed to get metadata for product : Met extractor failed to create metadata file
> org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Met extractor failed to create metadata file
>         at org.apache.oodt.cas.metadata.extractors.ExternMetExtractor.extrMetadata(ExternMetExtractor.java:120)
>         at org.apache.oodt.cas.metadata.AbstractMetExtractor.extractMetadata(AbstractMetExtractor.java:74)
>         at org.apache.oodt.cas.crawl.AutoDetectProductCrawler.getMetadataForProduct(AutoDetectProductCrawler.java:84)
>         at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:136)
>         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>         at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:74)
>         at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>         at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>         at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
>         at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.log.gz]
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-04-02-16.tar.gz]
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.AutoDetectProductCrawler passesPreconditions
> WARNING: No extractor specs specified for /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> WARNING: Failed to pass preconditions for ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/cas-crawler-mnt-celgene.rnd.combio.mmgp.external-TestSeqData-RNA-Seq-RawData-fastq-04-02-16.tar.gz]
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Handling file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.preconditions.PreCondEvalUtils eval
> INFO: Passed precondition comparator id CheckThatDataFileSizeIsGreaterThanZero
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Generating met file for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> Apr 02, 2016 10:12:17 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Executing command line: [/celgene/software/apache-oodt/crawler/bin/MetExtractorNGS.pl /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test text ] with workingDir: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq] to extract metadata
> OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Accessing NGS server at http://192.168.8.44:8082/RPC2
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for file_host are not in array format.Converting..
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [file_host]/[ip-192-168-8-66]
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ProductType are not in array format.Converting..
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ProductType]/[GenericFile]
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for ingest_user are not in array format.Converting..
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [ingest_user]/[kmavrommatis]
> OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - The file path is ARRAY(0x22d3f48). It will be added under the FilePath metadata field
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: metadata for FilePath are not in array format.Converting..
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - addMetadata: adding key/value [FilePath]/[/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - This file is of type text
> OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:18] - Storing metadata in file /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test.met
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test to
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - /mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing kmavrommatis to
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - kmavrommatis
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing GenericFile to
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - GenericFile
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - Changing ip-192-168-8-66 to
> OUTPUT: [DEBUG : metadataPrepare - 2016/04/02 22:12:18] - ip-192-168-8-66
> OUTPUT: [DEBUG : MetExtractorNGS - 2016/04/02 22:12:19] - Process finished SUCCESSFULLY
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.metadata.extractors.ExternMetExtractor extrMetadata
> INFO: Met extraction successful for product file: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: ProductCrawler: Ready to ingest product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: ProductType: [GenericFile]
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
> INFO: StdIngester: connected to file manager: [http://192.168.8.44:9000]
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
> INFO: In Place Data Transfer to: [http://192.168.8.44:9000] enabled
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.ingest.StdIngester ingest
> INFO: StdIngester: ingesting product: ProductName: [test]: ProductType: [GenericFile]: FileLocation: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/]
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient ingestProduct
> FINEST: File Manager Client: clientTransfer enabled: transfering product [test]
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.filemgr.versioning.VersioningUtils createBasicDataStoreRefsFlat
> FINE: VersioningUtils: Generated data store ref: file:/opt/oodt/data/archive/test/test from origRef: file:/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
> INFO: Successfully ingested product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]: product id: 4c8de2da-265a-48c4-8380-3f1103dfecfc
> Apr 02, 2016 10:12:19 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> INFO: Successful ingest of product: [/mnt/celgene.rnd.combio.mmgp.external/TestSeqData/RNA-Seq/RawData/fastq/test]
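To see at a glance what happened to each file in a run like this, a short script can reduce a crawler log to one outcome per file. This is a hypothetical helper, not part of OODT: it assumes the raw log as the crawler writes it (one message per line, not a re-wrapped copy in an email), keys only on the message texts that appear in the output above, and the outcome labels are its own.

```python
import re
from typing import Dict, Optional

# Message texts copied from the crawler output; the labels are this sketch's own.
HANDLING = re.compile(r"INFO: Handling file (\S+)")
OUTCOMES = [
    (re.compile(r"WARNING: No extractor specs specified for"), "no extractor specs"),
    (re.compile(r"SEVERE: Failed to get metadata for product"), "met extraction failed"),
    (re.compile(r"INFO: Successful ingest of product"), "ingested"),
]


def summarize(log_text: str) -> Dict[str, str]:
    """Map each file the crawler handled to the last recognised outcome."""
    status: Dict[str, str] = {}
    current: Optional[str] = None
    for line in log_text.splitlines():
        match = HANDLING.search(line)
        if match:
            # A new "Handling file" line starts the record for that file.
            current = match.group(1)
            status[current] = "handled, no outcome logged"
            continue
        if current is None:
            continue
        for pattern, label in OUTCOMES:
            if pattern.search(line):
                status[current] = label
                break
    return status
```

Run over the unwrapped log of the crawl above, this would report the two `.fastq.gz` files and the three crawler archives as "no extractor specs", the two `.met` files as "met extraction failed", and `test` as "ingested".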
>
>


-- 
*Lewis*