Posted to dev@oodt.apache.org by Jordan Padams <jo...@gmail.com> on 2013/11/09 09:31:47 UTC
PushPull HTTP Protocol
Hello all,
I am having some difficulty using the PushPull HTTP protocol and I can't
seem to find the issue. I continue to get a "Failed to get appropriate
protocol for RemoteSite" error. Example configs are below and the log is
attached. For this example I am simply trying to pull
http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT.
Any help would be much appreciated.
RemoteSpecs.xml
<daemon alias="msl"
        rtvlMethod="gov.nasa.jpl.oodt.cas.crawl.retrievalmethod.RemoteCrawler"
        active="yes">
  <propInfo dir="[CAS_PP_RESOURCES]/conf/DirStructXmlParserFiles">
    <propFiles regExp="msl\.xml"
               parser="org.apache.oodt.cas.pushpull.filerestrictions.parsers.DirStructXmlParser"/>
  </propInfo>
  <dataInfo stagingArea="/usr/local/report/logs/msl"
            deleteFromServer="no" queryElement="RetrievedFromLoc"/>
</daemon>
ExternalSources.xml
<source host="pds-imaging.jpl.nasa.gov">
  <login type="http" alias="msl">
    <username>none</username>
    <password>none</password>
  </login>
</source>
msl.xml
<dirstruct starting_path="/data/msl/MSLHAZ_0XXX/CATALOG">
  <file name="CATINFO\.TXT" />
</dirstruct>
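Because the three files have to agree with each other (alias, host, path, and file regex), a small consistency check can rule out simple typos before digging into the protocol lookup itself. The following is only a sketch in Python, not part of PushPull: the function name `check_configs` is hypothetical, the XML strings are trimmed copies of the configs above, and it assumes the `regExp` and `name` attributes are meant to match the whole file name.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# The file being pulled, as given in the message above.
TARGET_URL = "http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT"

# Trimmed copies of the posted configs (only the attributes checked here).
REMOTE_SPECS = r"""
<daemon alias="msl" active="yes">
  <propInfo dir="[CAS_PP_RESOURCES]/conf/DirStructXmlParserFiles">
    <propFiles regExp="msl\.xml"/>
  </propInfo>
</daemon>
"""

EXTERNAL_SOURCES = r"""
<source host="pds-imaging.jpl.nasa.gov">
  <login type="http" alias="msl">
    <username>none</username>
    <password>none</password>
  </login>
</source>
"""

MSL_XML = r"""
<dirstruct starting_path="/data/msl/MSLHAZ_0XXX/CATALOG">
  <file name="CATINFO\.TXT" />
</dirstruct>
"""


def check_configs(url, remote_specs, external_sources, dirstruct, prop_file_name):
    """Return named boolean checks that the configs agree with the target URL.

    Assumption: regexes are treated as whole-name matches. This only tests
    internal consistency; it says nothing about how PushPull picks a protocol.
    """
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")

    daemon = ET.fromstring(remote_specs.strip())
    source = ET.fromstring(external_sources.strip())
    struct = ET.fromstring(dirstruct.strip())
    login = source.find("login")
    prop_files = daemon.find("propInfo/propFiles")

    return {
        "source host matches URL host": source.get("host") == parts.hostname,
        "login type matches URL scheme": login.get("type") == parts.scheme,
        "login alias matches daemon alias": login.get("alias") == daemon.get("alias"),
        "propFiles regExp matches the dirstruct file name":
            re.fullmatch(prop_files.get("regExp"), prop_file_name) is not None,
        "starting_path matches URL directory": struct.get("starting_path") == directory,
        "a <file> regex matches the URL file name":
            any(re.fullmatch(f.get("name"), filename) for f in struct.findall("file")),
    }


for check, ok in check_configs(
    TARGET_URL, REMOTE_SPECS, EXTERNAL_SOURCES, MSL_XML, "msl.xml"
).items():
    print(("OK   " if ok else "FAIL ") + check)
```

All six checks pass for the configs as posted, which suggests the three files are at least consistent with one another and the problem lies elsewhere.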
Thanks,
Jordan
--
Jordan Padams
Software Engineer
NASA Jet Propulsion Laboratory
Re: PushPull HTTP Protocol
Posted by Jordan Padams <jo...@gmail.com>.
Hey Lewis,
Thanks for the reply!
I had the following mime type specified before:
<mime-type type="product/txt">
<glob pattern="*.TXT"/>
</mime-type>
I updated the config per your notes, but still got the same exception as
before:
org.apache.oodt.cas.protocol.exceptions.ProtocolException: Failed to get
appropriate protocol for RemoteSite: alias = 'msl' url =
'http://pds-imaging.jpl.nasa.gov' username = 'none' cdTestDir = 'null'
maxConnections = '-1'
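Since the exception did not change when the mime-type mapping changed, it may be worth confirming that both glob patterns match the file name at all. The sketch below is in Python and uses `fnmatchcase` as a stand-in for plain, case-sensitive glob matching; whether the mimetypes.xml globs are evaluated exactly this way is an assumption, and `matching_types` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

FILE_NAME = "CATINFO.TXT"

# mime type -> glob pattern, taken from the two mappings discussed in this thread.
GLOBS = {
    "product/txt": "*.TXT",
    "product/pds-imaging": "CATINFO.TXT",
}


def matching_types(file_name, globs):
    """Return the mime types whose glob matches file_name (case-sensitive)."""
    return [mime for mime, pattern in globs.items() if fnmatchcase(file_name, pattern)]


print(matching_types(FILE_NAME, GLOBS))
```

Under these assumptions both patterns match CATINFO.TXT, which is consistent with the mapping change making no difference to the error.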
I didn't specify a runInfo element because, according to the docs, a daemon
without a runInfo runs only once and then quits, which is the behavior I
want.
The log of the most recent run is attached (cas-pushpull0.log).
On another note, I successfully ran the url-downloader script that ships
with PushPull to download the file (see my updated version below; the
shipped script currently uses HttpClient rather than HttpProtocol). At
least this shows that HttpProtocol can get to the file.
./url-downloader http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT .
url-downloader
#!/bin/csh
$JAVA_HOME/bin/java -Djava.ext.dirs=../lib \
-Djava.util.logging.config.file=../etc/logging.properties \
org.apache.oodt.cas.protocol.http.HttpProtocol \
--url $1 \
--downloadToDir $2
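To rule out the server and the network independently of OODT, the same download can be reproduced with nothing but a standard library. This is a minimal sketch in Python and does not use or imitate any PushPull class; `download_to_dir` is a hypothetical helper that mirrors the script's `--url` and `--downloadToDir` arguments.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_to_dir(url, dest_dir):
    """Fetch url and save it under dest_dir, keeping the remote file name.

    Returns the path of the file that was written.
    """
    file_name = os.path.basename(urlsplit(url).path)
    if not file_name:
        # A URL ending in "/" names a directory listing, not a file.
        raise ValueError("URL does not end in a file name: " + url)

    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, file_name)

    # Stream the response body to disk instead of reading it all into memory.
    with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For the file in question, the call would be `download_to_dir("http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT", ".")`.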
If someone has an example HTTP configuration, that might help me debug the
issue further; right now I'm not sure what the problem is.
Thanks,
Jordan
On Sat, Nov 9, 2013 at 2:08 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:
> Hi Jordan,
> A couple of things here.
> * Firstly assuming we use "
> http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/" as the
> root path
> * AFAIK you would need to include policy/mimetypes.xml with a mapping
> something like the following
>
> <mime-info>
> <mime-type type="metadata/cas_pushpull">
> <glob pattern="*.info.tmp"/>
> </mime-type>
> <mime-type type="metadata/cas_metadata">
> <glob pattern="*.cas"/>
> <glob pattern="*.met"/>
> </mime-type>
> <mime-type type="product/pds-imaging">
> <_comment>Description of the CATALOG directory contents for the
> MSL HAZCAM EDR PDS Archive</_comment>
> <glob pattern="CATINFO.TXT" isregex="false"/>
> </mime-type>
> </mime-info>
>
> * The ExternalSources.xml file you posted looks A OK.
> * RemoteSpecs.xml also looks AOK however I would also consider possibly
> adding the <runInfo> elements e.g. <runInfo
> firstRunDateTime="2013-11-09T00:00:00Z" period="1m" runOnReboot="yes"/> as
> this lets you specify a sleep/wait time for the daemon. 3 mins is quite
> long.
> * msl.xml looks fine from what I can see.
> Do you get any logging for the failed jobs?
> There is quite a bit of config to do here and I find that it is easy to
> make mistakes.
> Thanks
> Lewis
>
> On Sat, Nov 9, 2013 at 8:31 AM, Jordan Padams <jo...@gmail.com>wrote:
>>
>>
>>
>> Thanks,
>> Jordan
>>
>>
>> --
>> Jordan Padams
>> Software Engineer
>> NASA Jet Propulsion Laboratory
>>
>
>
>
> --
> *Lewis*
>
--
Jordan Padams
Software Engineer
NASA Jet Propulsion Laboratory
Re: PushPull HTTP Protocol
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Jordan,
A couple of things here.
* Firstly, I'm assuming we use
"http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/" as the root
path.
* AFAIK you would need to include policy/mimetypes.xml with a mapping
something like the following:
<mime-info>
  <mime-type type="metadata/cas_pushpull">
    <glob pattern="*.info.tmp"/>
  </mime-type>
  <mime-type type="metadata/cas_metadata">
    <glob pattern="*.cas"/>
    <glob pattern="*.met"/>
  </mime-type>
  <mime-type type="product/pds-imaging">
    <_comment>Description of the CATALOG directory contents for the MSL HAZCAM EDR PDS Archive</_comment>
    <glob pattern="CATINFO.TXT" isregex="false"/>
  </mime-type>
</mime-info>
* The ExternalSources.xml file you posted looks A-OK.
* RemoteSpecs.xml also looks A-OK; however, I would consider adding a
<runInfo> element, e.g. <runInfo firstRunDateTime="2013-11-09T00:00:00Z"
period="1m" runOnReboot="yes"/>, as this lets you specify a sleep/wait time
for the daemon. 3 mins is quite long.
* msl.xml looks fine from what I can see.
Do you get any logging for the failed jobs?
There is quite a bit of config to do here and I find that it is easy to
make mistakes.
Thanks
Lewis
On Sat, Nov 9, 2013 at 8:31 AM, Jordan Padams <jo...@gmail.com>wrote:
>
>
>
> Thanks,
> Jordan
>
>
> --
> Jordan Padams
> Software Engineer
> NASA Jet Propulsion Laboratory
>
--
*Lewis*