Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2006/09/17 22:00:41 UTC
double curl calls in post.sh?
am i smoking crack or is post.sh mistakenly sending every doc twice in a
row? ...
for f in $FILES; do
  echo Posting file $f to $URL
  curl $URL --data-binary @$f
  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
  echo
done
...is there any reason not to delete that first execution of curl?
-Hoss
Re: double curl calls in post.sh?
Posted by Yonik Seeley <yo...@apache.org>.
On 9/17/06, Chris Hostetter <ho...@fucit.org> wrote:
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
Heh... must have been a cut-n-paste bug. I just removed it.
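For reference, with the first curl removed the loop would presumably look something like the sketch below. The default URL, the CURL override variable, and the post_files function name are assumptions added for illustration and are not taken from the actual script; quoting has also been added so that filenames containing spaces survive.

```shell
#!/bin/sh
# Sketch of the post.sh loop with the duplicate curl call removed.
# Assumptions (not from the original script): the default URL below,
# the CURL override variable, and the post_files function name.
URL="${URL:-http://localhost:8983/solr/update}"
CURL="${CURL:-curl}"

post_files() {
  for f in "$@"; do
    echo "Posting file $f to $URL"
    # A single POST per file, with the explicit content type and charset.
    "$CURL" "$URL" --data-binary "@$f" -H 'Content-type:text/xml; charset=utf-8'
    echo
  done
}

# Example invocation (left commented out so sourcing this sends nothing):
# post_files *.xml
```

Setting CURL=echo before calling post_files prints the command line for each file instead of sending anything, which is a cheap way to confirm each file is posted only once.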
-Yonik
Re: double curl calls in post.sh?
Posted by Bill Au <bi...@gmail.com>.
Looks like that was added for the UTF-8 example. But setting the
content-type/charset should work for all the other examples too,
right? So I don't see any reason for not deleting the first curl.
Bill
On 9/17/06, Chris Hostetter <ho...@fucit.org> wrote:
>
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
>
> for f in $FILES; do
> echo Posting file $f to $URL
> curl $URL --data-binary @$f
> curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
> echo
> done
>
>
> ...is there any reason not to delete that first execution of curl?
>
>
>
> -Hoss
>
>
Re: double curl calls in post.sh?
Posted by Yonik Seeley <yo...@apache.org>.
On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
> XML parsers already do this correctly.
Ah, I thought that maybe the servlet container itself could do that
when one requests a Reader. Using a byte-oriented InputStream and
passing that to the parser would work, but would require some small
changes to Solr.
-Yonik
Re: double curl calls in post.sh?
Posted by Walter Underwood <wu...@netflix.com>.
On 9/18/06 10:10 AM, "Yonik Seeley" <yo...@apache.org> wrote:
> On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
>> Instead, use a media type of application/xml, so that the server
>> is allowed to sniff the content to discover the character encoding.
>
> Cool! Do you know what servlet containers currently implement this
> "sniffing"?
XML parsers already do this correctly. They look at the XML declaration
for the encoding, and if that isn't there, they look for a BOM or
UTF-8 content, as described in the (non-normative) appendix to the
XML spec.
http://www.w3.org/TR/REC-xml/#sec-guessing
The servlet container needs to hand the raw bytes to the parser,
which should be normal behavior for application/*.
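As a rough illustration of the lookup order described above (XML declaration first, then a BOM, otherwise assume UTF-8), here is a small shell sketch. It is a simplification and not what a real XML parser does internally; the function name guess_xml_encoding is made up for this example, and it assumes head -c, od, tr, and sed are available.

```shell
#!/bin/sh
# Rough guess at an XML file's encoding, in the order described above.
# Assumption: guess_xml_encoding is a hypothetical helper for illustration.
guess_xml_encoding() {
  # 1. Look for encoding="..." in an XML declaration near the start.
  #    NUL bytes are stripped so a UTF-16 declaration is still readable.
  decl=$(head -c 200 "$1" | LC_ALL=C tr -d '\000' |
    LC_ALL=C sed -n "s/^.*<?xml[^>]*encoding=[\"']\([A-Za-z0-9._-]*\)[\"'].*\$/\1/p" |
    head -n 1)
  if [ -n "$decl" ]; then
    echo "$decl"
    return 0
  fi
  # 2. No declared encoding: inspect the first four bytes for a BOM.
  bom=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case $bom in
    0000feff) echo UTF-32BE ;;
    fffe0000) echo UTF-32LE ;;
    efbbbf*)  echo UTF-8 ;;
    feff*)    echo UTF-16BE ;;
    fffe*)    echo UTF-16LE ;;
    # 3. Nothing recognizable: fall back to UTF-8.
    *)        echo UTF-8 ;;
  esac
}
```

For example, guess_xml_encoding somefile.xml prints the declared encoding when the file has one, and otherwise a guess based on the BOM.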
wunder
--
Walter Underwood
Search Guru, Netflix
Re: double curl calls in post.sh?
Posted by Yonik Seeley <yo...@apache.org>.
On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
> Instead, use a media type of application/xml, so that the server
> is allowed to sniff the content to discover the character encoding.
Cool! Do you know what servlet containers currently implement this "sniffing"?
-Yonik
Re: double curl calls in post.sh?
Posted by Walter Underwood <wu...@netflix.com>.
Also, do not use text/xml, even with a charset parameter. In a correct
implementation, the charset parameter overrides the encoding given in the
XML declaration, so with text/xml the charset parameter must be correct.
When it is omitted, the content MUST be interpreted as US-ASCII (yuk).
Instead, use a media type of application/xml, so that the server
is allowed to sniff the content to discover the character encoding.
For the gory details, see RFC 3023:
http://www.ietf.org/rfc/rfc3023.txt
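On the client side, the suggestion above might look like the following sketch: post with application/xml and no charset parameter, leaving the encoding to be read from the document itself. The function name post_as_application_xml, the CURL override, and the default URL are assumptions for illustration, and nothing here confirms that the Solr update handler accepts or honors this media type.

```shell
#!/bin/sh
# Sketch: post one file as application/xml with no charset parameter.
# Assumptions: the function name, the CURL override variable, and the
# default URL are illustrative and not taken from post.sh.
post_as_application_xml() {
  "${CURL:-curl}" "${URL:-http://localhost:8983/solr/update}" \
    --data-binary "@$1" \
    -H 'Content-Type: application/xml'
}

# Example invocation (commented out so sourcing this sends nothing):
# post_as_application_xml utf8-example.xml
```

Running it with CURL=echo prints the curl arguments without contacting a server.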
wunder
==
Walter Underwood
Search Guru, Netflix
On 9/17/06 1:00 PM, "Chris Hostetter" <ho...@fucit.org> wrote:
>
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
>
> for f in $FILES; do
> echo Posting file $f to $URL
> curl $URL --data-binary @$f
> curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
> echo
> done
>
>
> ...is there any reason not to delete that first execution of curl?
>
>
>
> -Hoss
>