Posted to dev@oodt.apache.org by Jordan Padams <jo...@gmail.com> on 2013/11/09 09:31:47 UTC

PushPull HTTP Protocol

Hello all,

I am having some difficulty using the PushPull HTTP protocol and I can't
seem to find the issue. I keep getting a "Failed to get appropriate
protocol for RemoteSite" error. Example configs are below and the log is
attached. For this example I am simply trying to pull
http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT.
Any help would be much appreciated.

RemoteSpecs.xml
        <daemon alias="msl" rtvlMethod="gov.nasa.jpl.oodt.cas.crawl.retrievalmethod.RemoteCrawler" active="yes">
            <propInfo dir="[CAS_PP_RESOURCES]/conf/DirStructXmlParserFiles">
                <propFiles regExp="msl\.xml" parser="org.apache.oodt.cas.pushpull.filerestrictions.parsers.DirStructXmlParser"/>
            </propInfo>
            <dataInfo stagingArea="/usr/local/report/logs/msl" deleteFromServer="no" queryElement="RetrievedFromLoc"/>
        </daemon>
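The propFiles regExp attribute is a regular expression, which is why the dot in msl\.xml is escaped. A rough sketch of how such a filter picks files out of the propInfo directory, assuming a whole-name match (as Java's String.matches would do); the directory listing and the helper name select_prop_files are hypothetical:

```python
import re


def select_prop_files(filenames, reg_exp):
    """Return the names that match reg_exp in full (whole-name match assumed)."""
    pattern = re.compile(reg_exp)
    return [name for name in filenames if pattern.fullmatch(name)]


# Hypothetical contents of [CAS_PP_RESOURCES]/conf/DirStructXmlParserFiles
listing = ["msl.xml", "mslXxml", "msl.xml.bak", "modis.xml"]
print(select_prop_files(listing, r"msl\.xml"))  # ['msl.xml']
```

Without the backslash, msl.xml would also match a name like mslXxml, since an unescaped dot matches any character.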

ExternalSources.xml
        <source host="pds-imaging.jpl.nasa.gov">
          <login type="http" alias="msl">
            <username>none</username>
            <password>none</password>
          </login>
        </source>

msl.xml
    <dirstruct starting_path="/data/msl/MSLHAZ_0XXX/CATALOG">
      <file name="CATINFO\.TXT" />
    </dirstruct>
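Taken together, the three files should point the daemon at the same URL fetched by hand above. A minimal sketch, assuming the crawler simply joins the source host from ExternalSources.xml, the dirstruct starting_path, and a remote file name selected by the file name regex (expected_url is a hypothetical helper, not OODT API):

```python
import re
from urllib.parse import urlunsplit


def expected_url(host, starting_path, file_regex, remote_name, scheme="http"):
    """Build the URL a pull of remote_name should request, or return None
    if the dirstruct <file name> regex does not select that file."""
    if not re.fullmatch(file_regex, remote_name):
        return None
    path = starting_path.rstrip("/") + "/" + remote_name
    return urlunsplit((scheme, host, path, "", ""))


url = expected_url("pds-imaging.jpl.nasa.gov",
                   "/data/msl/MSLHAZ_0XXX/CATALOG",
                   r"CATINFO\.TXT", "CATINFO.TXT")
print(url)  # http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT
```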

Thanks,
Jordan


-- 
Jordan Padams
Software Engineer
NASA Jet Propulsion Laboratory

Re: PushPull HTTP Protocol

Posted by Jordan Padams <jo...@gmail.com>.
Hey Lewis,

Thanks for the reply!

I had the following mime type specified before:

        <mime-type type="product/txt">
                <glob pattern="*.TXT"/>
        </mime-type>

I updated the config per your notes, but still got the following exception
(same as before):

org.apache.oodt.cas.protocol.exceptions.ProtocolException: Failed to get
appropriate protocol for RemoteSite: alias = 'msl'  url =
'http://pds-imaging.jpl.nasa.gov'  username = 'none' cdTestDir = 'null'
maxConnections = '-1'
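The message alone does not say why no protocol was chosen. As a purely hypothetical toy model (not PushPull's actual code; every name here is invented), a handler that looks up protocols by URL scheme and then verifies each candidate would raise this same error in two different situations: nothing is registered for http, or something is registered but fails its check. Both are consistent with the log line above.

```python
from urllib.parse import urlsplit


class ProtocolError(Exception):
    """Stand-in for the ProtocolException in the log (toy model only)."""


def appropriate_protocol(site_url, factories, verify):
    """Try each factory registered for the URL scheme and return the first
    protocol that passes verify(); raise if none qualifies."""
    scheme = urlsplit(site_url).scheme
    for make_protocol in factories.get(scheme, []):
        protocol = make_protocol()
        if verify(protocol, site_url):
            return protocol
    raise ProtocolError(
        "Failed to get appropriate protocol for RemoteSite: "
        f"url = '{site_url}'")


site = "http://pds-imaging.jpl.nasa.gov"
scenarios = {
    "no factory for 'http'": ({}, lambda p, u: True),
    "factory found, verification fails": (
        {"http": [lambda: "http-protocol"]}, lambda p, u: False),
}
for label, (factories, verify) in scenarios.items():
    try:
        appropriate_protocol(site, factories, verify)
    except ProtocolError as err:
        print(f"{label}: {err}")
```

Both scenarios print an identical message, which is why the full log is more useful than the exception text alone.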


I didn't specify a runInfo element because, according to the docs, if no
runInfo is specified, the daemon runs only once and then quits. That is the
behavior I want.

The log of the most current run is attached (cas-pushpull0.log).

On another note, I successfully downloaded the file with the url-downloader
script that ships with PushPull (see my updated version below; the shipped
version used HttpClient instead of HttpProtocol). At least this shows that
HttpProtocol can get to the file.

./url-downloader http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT .

url-downloader

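#!/bin/csh
# Usage: ./url-downloader <url> <downloadToDir>
# Calls HttpProtocol directly (instead of HttpClient) to fetch <url>.

$JAVA_HOME/bin/java -Djava.ext.dirs=../lib \
        -Djava.util.logging.config.file=../etc/logging.properties \
        org.apache.oodt.cas.protocol.http.HttpProtocol \
        --url "$1" \
        --downloadToDir "$2"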
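For a check that bypasses OODT entirely, here is a rough Python equivalent of the url-downloader call above, using only the standard library. It is a sketch, not part of PushPull; local_path_for and download are hypothetical helper names:

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit


def local_path_for(url, download_dir):
    """Name the local copy after the last segment of the URL path."""
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    return os.path.join(download_dir, name)


def download(url, download_dir="."):
    """Fetch url into download_dir, like `url-downloader <url> <dir>`."""
    dest = local_path_for(url, download_dir)
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


# No network access needed for this part: just show where the file would land.
print(local_path_for(
    "http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT", "."))
```

Calling download("http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT", ".") performs the actual fetch. If that succeeds while the daemon fails, the problem is more likely in the PushPull configuration than in network access.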


If anyone has an example HTTP configuration, it might help me debug this
further; right now I'm not sure what the problem is.

Thanks,
Jordan



On Sat, Nov 9, 2013 at 2:08 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Jordan,
> A couple of things here.
> * Firstly assuming we use "
> http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/" as the
> root path
> * AFAIK you would need to include policy/mimetypes.xml with a mapping
> something like the following
>
> <mime-info>
>     <mime-type type="metadata/cas_pushpull">
>         <glob pattern="*.info.tmp"/>
>     </mime-type>
>     <mime-type type="metadata/cas_metadata">
>         <glob pattern="*.cas"/>
>         <glob pattern="*.met"/>
>     </mime-type>
>     <mime-type type="product/pds-imaging">
>         <_comment>Description of the CATALOG directory contents for the
> MSL HAZCAM EDR PDS Archive</_comment>
>         <glob pattern="CATINFO.TXT" isregex="false"/>
>     </mime-type>
> </mime-info>
>
> * The ExternalSources.xml file you posted looks A OK.
> * RemoteSpecs.xml also looks AOK however I would also consider possibly
> adding the <runInfo> elements e.g. <runInfo
> firstRunDateTime="2013-11-09T00:00:00Z" period="1m" runOnReboot="yes"/> as
> this lets you specify a sleep/wait time for the daemon. 3 mins is quite
> long.
> * msl.xml looks fine from what I can see.
> Do you get any logging for the failed jobs?
> There is quite a bit of config to do here and I find that it is easy to
> make mistakes.
> Thanks
> Lewis
>
> On Sat, Nov 9, 2013 at 8:31 AM, Jordan Padams <jo...@gmail.com>wrote:
>>
>>
>>
>> Thanks,
>> Jordan
>>
>>
>> --
>> Jordan Padams
>> Software Engineer
>> NASA Jet Propulsion Laboratory
>>
>
>
>
> --
> *Lewis*
>



-- 
Jordan Padams
Software Engineer
NASA Jet Propulsion Laboratory

Re: PushPull HTTP Protocol

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Jordan,
A couple of things here.
* First, I am assuming that
http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/ is the root
path.
* AFAIK you would need to include a policy/mimetypes.xml with a mapping
something like the following:

<mime-info>
    <mime-type type="metadata/cas_pushpull">
        <glob pattern="*.info.tmp"/>
    </mime-type>
    <mime-type type="metadata/cas_metadata">
        <glob pattern="*.cas"/>
        <glob pattern="*.met"/>
    </mime-type>
    <mime-type type="product/pds-imaging">
        <_comment>Description of the CATALOG directory contents for the MSL HAZCAM EDR PDS Archive</_comment>
        <glob pattern="CATINFO.TXT" isregex="false"/>
    </mime-type>
</mime-info>
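To see which of these types a given file name falls under, the sketch below parses a trimmed copy of the mapping above, plus Jordan's earlier product/txt entry, and applies each glob with Python's case-sensitive fnmatchcase. This only approximates the real mime-type detector (an assumption; its precedence rules for overlapping patterns may differ), and matching_types is a hypothetical helper:

```python
import re
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

MIME_XML = """
<mime-info>
  <mime-type type="metadata/cas_pushpull"><glob pattern="*.info.tmp"/></mime-type>
  <mime-type type="metadata/cas_metadata">
    <glob pattern="*.cas"/><glob pattern="*.met"/>
  </mime-type>
  <mime-type type="product/pds-imaging">
    <glob pattern="CATINFO.TXT" isregex="false"/>
  </mime-type>
  <mime-type type="product/txt"><glob pattern="*.TXT"/></mime-type>
</mime-info>
"""


def matching_types(filename, xml_text=MIME_XML):
    """Return every mime type with a glob (or regex, if isregex="true") matching filename."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for mime in root.iter("mime-type"):
        for glob in mime.iter("glob"):
            pattern = glob.get("pattern")
            if glob.get("isregex") == "true":
                matched = re.fullmatch(pattern, filename) is not None
            else:
                matched = fnmatchcase(filename, pattern)
            if matched:
                hits.append(mime.get("type"))
                break  # one hit per mime type is enough
    return hits


print(matching_types("CATINFO.TXT"))      # ['product/pds-imaging', 'product/txt']
print(matching_types("CATINFO.TXT.met"))  # ['metadata/cas_metadata']
```

Under this approximation CATINFO.TXT is claimed by both product/pds-imaging and product/txt, so which type wins depends on the detector's precedence rules.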

* The ExternalSources.xml file you posted looks A-OK.
* RemoteSpecs.xml also looks A-OK; however, I would consider adding a
<runInfo> element, e.g. <runInfo firstRunDateTime="2013-11-09T00:00:00Z"
period="1m" runOnReboot="yes"/>, as this lets you specify a sleep/wait time
for the daemon. 3 mins is quite long.
* msl.xml looks fine from what I can see.
Do you get any logging for the failed jobs?
There is quite a bit of config to do here and I find that it is easy to
make mistakes.
Thanks
Lewis

On Sat, Nov 9, 2013 at 8:31 AM, Jordan Padams <jo...@gmail.com> wrote:
>
>
>
> Thanks,
> Jordan
>
>
> --
> Jordan Padams
> Software Engineer
> NASA Jet Propulsion Laboratory
>



-- 
*Lewis*
