Posted to solr-dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2006/09/17 22:00:41 UTC

double curl calls in post.sh?

am i smoking crack or is post.sh mistakenly sending every doc twice in a
row? ...

for f in $FILES; do
  echo Posting file $f to $URL
  curl $URL --data-binary @$f
  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
  echo
done


...is there any reason not to delete that first execution of curl?



-Hoss


Re: double curl calls in post.sh?

Posted by Yonik Seeley <yo...@apache.org>.
On 9/17/06, Chris Hostetter <ho...@fucit.org> wrote:
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...

Heh... must have been a cut-n-paste bug.  I just removed it.

-Yonik
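
For reference, a minimal sketch of what the loop could look like with the
first curl removed. This is not the committed post.sh: the post_files
function name and the default URL are assumptions made for illustration,
and the variables are quoted so file names with spaces survive.

```shell
#!/bin/sh
# Assumed default; point URL at your own Solr update handler.
URL=${URL:-http://localhost:8983/solr/update}

# post_files is a hypothetical helper name, not part of the original script.
# It sends each file exactly once, keeping only the curl call that sets
# the Content-type header.
post_files() {
  for f in "$@"; do
    echo "Posting file $f to $URL"
    curl "$URL" --data-binary "@$f" -H 'Content-type:text/xml; charset=utf-8'
    echo
  done
}
```

Something like `post_files *.xml` would then issue one request per file
instead of two.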

Re: double curl calls in post.sh?

Posted by Bill Au <bi...@gmail.com>.
Looks like that was added for the UTF-8 example.  But setting the
content-type/charset should work for all the other examples too,
right?  So I don't see any reason for not deleting the first curl.

Bill

On 9/17/06, Chris Hostetter <ho...@fucit.org> wrote:
>
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
>
> for f in $FILES; do
>   echo Posting file $f to $URL
>   curl $URL --data-binary @$f
>   curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
>   echo
> done
>
>
> ...is there any reason not to delete that first execution of curl?
>
>
>
> -Hoss
>
>

Re: double curl calls in post.sh?

Posted by Yonik Seeley <yo...@apache.org>.
On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
> XML parsers already do this correctly.

Ah, I thought that maybe the servlet container itself could do that
when one requests a Reader.  Using a byte-oriented InputStream and
passing that to the parser would work, but would require a few small
changes to Solr.

-Yonik

Re: double curl calls in post.sh?

Posted by Walter Underwood <wu...@netflix.com>.
On 9/18/06 10:10 AM, "Yonik Seeley" <yo...@apache.org> wrote:
> On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
>> Instead, use a media type of application/xml, so that the server
>> is allowed to sniff the content to discover the character encoding.
> 
> Cool!  Do you know what servlet containers currently implement this
> "sniffing"?

XML parsers already do this correctly. They look at the XML declaration
for the encoding, and if that isn't there, they look for a BOM or
UTF-8 content, as described in the (non-normative) appendix to the
XML spec.

  http://www.w3.org/TR/REC-xml/#sec-guessing

The servlet container needs to hand the raw bytes to the parser,
which should be normal behavior for application/*.

wunder
--
Walter Underwood
Search Guru, Netflix
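
To make the sniffing idea concrete, here is a rough shell sketch of the
kind of check a parser performs on the raw bytes. It is a simplified
illustration only, not what any real XML parser does: it checks for a
byte order mark first, then for an encoding in the XML declaration on
the first line, and otherwise falls back to UTF-8. The function name
guess_xml_encoding is made up for this example, and `head -c` is assumed
to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# guess_xml_encoding FILE
# Hypothetical helper: prints a best guess at the encoding of an XML file.
guess_xml_encoding() {
  # First four bytes of the file as lowercase hex, e.g. "efbbbf3c".
  bom=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \t\n')
  case $bom in
    efbbbf*) echo UTF-8; return ;;
    feff*)   echo UTF-16BE; return ;;
    # Simplification: ff fe 00 00 is really a UTF-32LE mark.
    fffe*)   echo UTF-16LE; return ;;
  esac

  # No BOM: look for encoding="..." in an XML declaration on line one.
  # This only works for ASCII-compatible encodings.
  enc=$(head -n 1 "$1" |
    sed -n "s/^<?xml[^>]*encoding=[\"']\([^\"']*\)[\"'].*/\1/p")

  # No declared encoding either: XML defaults to UTF-8.
  echo "${enc:-UTF-8}"
}
```

The point is that everything needed to pick the encoding is in the first
few bytes of the document, which is why the container only has to pass
the bytes through untouched.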


Re: double curl calls in post.sh?

Posted by Yonik Seeley <yo...@apache.org>.
On 9/18/06, Walter Underwood <wu...@netflix.com> wrote:
> Instead, use a media type of application/xml, so that the server
> is allowed to sniff the content to discover the character encoding.

Cool!  Do you know what servlet containers currently implement this "sniffing"?

-Yonik

Re: double curl calls in post.sh?

Posted by Walter Underwood <wu...@netflix.com>.
Also, do not use text/xml. Even with a charset parameter. In a correct
implementation, that will override the XML declaration of charset.
With text/xml, the charset parameter must be correct. When it is
omitted, the content MUST be interpreted as US-ASCII (yuk).

Instead, use a media type of application/xml, so that the server
is allowed to sniff the content to discover the character encoding.

For the gory details, see RFC 3023:

  http://www.ietf.org/rfc/rfc3023.txt

wunder
==
Walter Underwood
Search Guru, Netflix
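
As a sketch of the suggestion above, the curl call could send
application/xml with no charset parameter, leaving the encoding to the
XML declaration or BOM in the file itself. The post_xml name is an
assumption for illustration, and whether this works end to end depends
on the server handing the raw bytes to the XML parser.

```shell
#!/bin/sh
# post_xml URL FILE
# Hypothetical helper: posts one XML file as application/xml so the
# receiving parser can work out the character encoding from the content.
post_xml() {
  curl "$1" --data-binary "@$2" -H 'Content-Type: application/xml'
}
```

For example, `post_xml http://localhost:8983/solr/update utf8-example.xml`,
where both the URL and the file name are placeholders.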

On 9/17/06 1:00 PM, "Chris Hostetter" <ho...@fucit.org> wrote:

> 
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
> 
> for f in $FILES; do
>   echo Posting file $f to $URL
>   curl $URL --data-binary @$f
>   curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
>   echo
> done
> 
> 
> ...is there any reason not to delete that first execution of curl?
> 
> 
> 
> -Hoss
>