Posted to solr-user@lucene.apache.org by Doug Steigerwald <ds...@mcclatchyinteractive.com> on 2008/02/20 18:31:47 UTC

YAML update request handler

A few months back I wrote a YAML update request handler to see if we could post documents faster 
than with XML.  We did see some small speed improvements (didn't write down the numbers), but the 
hacked together code was probably making it slower as well.  Not sure if there are faster YAML 
libraries out there either.

We're not actually using it, since it was just a small proof of concept type of project, but is this 
anything people might be interested in?

-- 
Doug Steigerwald

Re: YAML update request handler

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Without breaking the existing stuff, we can add another interface:

    import java.io.IOException;
    import java.io.OutputStream;

    public interface BinaryQueryResponse extends QueryResponseWriter {
      // Writes the response as raw bytes instead of characters
      public void write(OutputStream out, SolrQueryRequest request,
                        SolrQueryResponse response) throws IOException;
    }

and in SolrDispatchFilter do something like this:

    QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
    if (responseWriter instanceof BinaryQueryResponse) {
      // Binary writers get the raw servlet OutputStream
      BinaryQueryResponse binaryResp = (BinaryQueryResponse) responseWriter;
      binaryResp.write(response.getOutputStream(), solrReq, solrRsp);
    } else {
      // Existing text writers keep using the character Writer, as before
      responseWriter.write(response.getWriter(), solrReq, solrRsp);
    }
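The dispatch idea can be tried outside Solr with a small, self-contained sketch. Everything here is a hypothetical stand-in: TextResponseWriter plays the role of QueryResponseWriter, BinaryResponseWriter plays the proposed BinaryQueryResponse, and a plain String replaces the Solr request/response objects. It only shows that an instanceof check lets binary writers reach an OutputStream while text writers keep their Writer unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class BinaryDispatchSketch {

  // Hypothetical stand-in for QueryResponseWriter: text output only.
  interface TextResponseWriter {
    void write(Writer out, String response) throws IOException;
  }

  // Hypothetical stand-in for the proposed BinaryQueryResponse: adds an OutputStream overload.
  interface BinaryResponseWriter extends TextResponseWriter {
    void write(OutputStream out, String response) throws IOException;
  }

  // An existing text writer, untouched by the change.
  static class XmlishWriter implements TextResponseWriter {
    public void write(Writer out, String response) throws IOException {
      out.write("<response>" + response + "</response>");
    }
  }

  // A binary writer: one length byte, then the UTF-8 bytes (assumes < 256 bytes).
  static class LengthPrefixedWriter implements BinaryResponseWriter {
    public void write(OutputStream out, String response) throws IOException {
      byte[] body = response.getBytes(StandardCharsets.UTF_8);
      out.write(body.length);
      out.write(body);
    }

    public void write(Writer out, String response) {
      throw new UnsupportedOperationException("binary writer needs an OutputStream");
    }
  }

  // Same instanceof check as the SolrDispatchFilter snippet in the post.
  static void dispatch(TextResponseWriter writer, String response,
                       OutputStream stream, Writer text) throws IOException {
    if (writer instanceof BinaryResponseWriter) {
      ((BinaryResponseWriter) writer).write(stream, response);
    } else {
      writer.write(text, response);
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    StringWriter text = new StringWriter();
    dispatch(new LengthPrefixedWriter(), "ok", bytes, text);
    dispatch(new XmlishWriter(), "ok", bytes, text);
    System.out.println(bytes.size() + " bytes on the stream, text: " + text);
  }
}
```

The existing text writers need no changes, which is the "without breaking the existing stuff" part of the proposal.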

--Noble
On Fri, Feb 22, 2008 at 8:05 PM, Grant Ingersoll <gs...@apache.org> wrote:
> The DispatchFilter could probably be modified to have the option of
>  using the ServletOutputStream instead of the Writer.  It would take
>  some doing to maintain the proper compatibility, but it can be done, I
>  think.  Maybe we could have a /binary path or something along those
>  lines and SolrJ could use that.  QueryResponseWriter could be extended
>  to have a write method that takes an OutputStream.   Caveat:  I
>  haven't fully investigated this, but I do believe it makes sense for
>  SolrJ to use a binary format by default.  The other thing it should do
>  is make sure, when sending/receiving XML, that the XML is as "tight"
>  as possible, i.e. minimal whitespace, etc.
>
>  Just thinking out loud,
>  Grant



-- 
--Noble Paul

Re: YAML update request handler

Posted by Grant Ingersoll <gs...@apache.org>.
The DispatchFilter could probably be modified to have the option of  
using the ServletOutputStream instead of the Writer.  It would take  
some doing to maintain the proper compatibility, but it can be done, I  
think.  Maybe we could have a /binary path or something along those  
lines and SolrJ could use that.  QueryResponseWriter could be extended  
to have a write method that takes an OutputStream.   Caveat:  I  
haven't fully investigated this, but I do believe it makes sense for  
SolrJ to use a binary format by default.  The other thing it should do  
is make sure, when sending/receiving XML, that the XML is as "tight"
as possible, i.e. minimal whitespace, etc.
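To put the "tight" XML point in concrete terms, here is a rough sketch that serializes the same documents with and without indentation and compares UTF-8 byte counts. The <result>/<doc>/<str name="..."> shape is only illustrative and is not claimed to be Solr's exact response format. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TightXmlSketch {

  // Serializes docs into a simple <result><doc><str name=...> shape (illustrative only),
  // either indented or with no whitespace between tags.
  static String toXml(List<Map<String, String>> docs, boolean pretty) {
    String nl = pretty ? "\n" : "";
    String docIndent = pretty ? "  " : "";
    String fieldIndent = pretty ? "    " : "";
    StringBuilder sb = new StringBuilder("<result>").append(nl);
    for (Map<String, String> doc : docs) {
      sb.append(docIndent).append("<doc>").append(nl);
      for (Map.Entry<String, String> f : doc.entrySet()) {
        sb.append(fieldIndent)
          .append("<str name=\"").append(escape(f.getKey())).append("\">")
          .append(escape(f.getValue())).append("</str>").append(nl);
      }
      sb.append(docIndent).append("</doc>").append(nl);
    }
    return sb.append("</result>").toString();
  }

  // Minimal escaping for element text and double-quoted attribute values.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
  }

  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    List<Map<String, String>> docs = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      Map<String, String> doc = new LinkedHashMap<>();
      doc.put("id", "doc" + i);
      doc.put("title", "Title " + i);
      docs.add(doc);
    }
    System.out.println("indented: " + utf8Length(toXml(docs, true)) + " bytes");
    System.out.println("tight:    " + utf8Length(toXml(docs, false)) + " bytes");
  }
}
```

With this layout, each field costs 5 extra bytes of indentation and newline, and each doc adds 6 more, so the overhead grows linearly with result size.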

Just thinking out loud,
Grant

On Feb 22, 2008, at 8:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> The API forbids use of any non-text format.
>
> The QueryResponseWriter's write() method can take only a Writer, so we
> cannot write a binary stream to it.
>
> --Noble

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






Re: YAML update request handler

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The API forbids use of any non-text format.

The QueryResponseWriter's write() method can take only a Writer, so we
cannot write a binary stream to it.
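A small sketch shows why a Writer-only API rules out binary output. If raw bytes are forced through a character Writer (here one char per byte, wrapped in a UTF-8 OutputStreamWriter), every byte of 0x80 or above is re-encoded as a multi-byte sequence, so the payload is corrupted. Writing the same bytes to an OutputStream keeps them intact. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriterVsStreamSketch {

  // Forces raw bytes through a character Writer by treating each byte as a char
  // (0-255). The Writer's charset (UTF-8 here) then re-encodes them.
  static byte[] throughWriter(byte[] payload) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
    for (byte b : payload) {
      writer.write(b & 0xFF); // Writer.write(int) writes one char, not one byte
    }
    writer.flush();
    return sink.toByteArray();
  }

  // Writes the same bytes to an OutputStream: no character encoding involved.
  static byte[] throughStream(byte[] payload) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    OutputStream out = sink;
    out.write(payload);
    out.flush();
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = {0x01, (byte) 0xFF};
    // prints "writer: [1, -61, -65]" (0xFF became the UTF-8 pair C3 BF)
    System.out.println("writer: " + Arrays.toString(throughWriter(payload)));
    // prints "stream: [1, -1]" (byte-exact)
    System.out.println("stream: " + Arrays.toString(throughStream(payload)));
  }
}
```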

--Noble

On Fri, Feb 22, 2008 at 12:30 AM, Walter Underwood
<wu...@netflix.com> wrote:
> Python marshal format is worth a try. It is binary and can represent
>  the same data as JSON. It should be a good fit for Solr.
>
>  We benchmarked that against XML several years ago and it was 2X faster.
>  Of course, XML parsers are a lot faster now.
>
>  wunder



-- 
--Noble Paul

Re: YAML update request handler

Posted by Grant Ingersoll <gs...@apache.org>.
See https://issues.apache.org/jira/browse/SOLR-476

On Feb 22, 2008, at 5:17 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> The SolrJ client is designed with ResponseParser as an abstract
> class (which is good), but I have no means to plug in my own custom
> ResponseParser class. Add a setter method, setResponseParser(ResponseParser parser),
> and initialize the ResponseParser lazily:
>
>     if (_processor == null) _processor = new XMLResponseParser();
>
> at the beginning of the request method.
>
> While it is a good idea to use Commons HttpClient, it is a huge ball
> and chain to put those extra jars (commons-httpclient,
> commons-logging, commons-codec) in my simple client application. It
> is too much to ask of a client API which is just supposed to parse an
> XML response.
>
> If HttpClient is not available, we must be able to fall back to
> URL.openConnection().
>
> --Noble

Re: YAML update request handler

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The SolrJ client is designed with ResponseParser as an abstract
class (which is good), but I have no means to plug in my own custom
ResponseParser class. Add a setter method, setResponseParser(ResponseParser parser),
and initialize the ResponseParser lazily:

    if (_processor == null) _processor = new XMLResponseParser();

at the beginning of the request method.
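The setter-plus-lazy-default suggestion can be sketched in a few lines. This is not SolrJ's real class layout. ResponseParser, DefaultXmlParser and MiniClient are hypothetical stand-ins, and parse() is simplified to take and return a String. The point is only that the XML default is created on the first request, and only if nothing was plugged in.

```java
class PluggableParserSketch {

  // Hypothetical stand-in for SolrJ's abstract ResponseParser (not its real signature).
  abstract static class ResponseParser {
    abstract String parse(String rawBody);
  }

  // Hypothetical stand-in for XMLResponseParser, the default.
  static class DefaultXmlParser extends ResponseParser {
    String parse(String rawBody) {
      return "xml:" + rawBody.length();
    }
  }

  // Hypothetical client with the proposed setter and lazy default.
  static class MiniClient {
    private ResponseParser _processor; // stays null until set or first request

    void setResponseParser(ResponseParser parser) {
      this._processor = parser;
    }

    String request(String rawBody) {
      // Lazy initialization: only create the XML default if nothing was plugged in.
      if (_processor == null) _processor = new DefaultXmlParser();
      return _processor.parse(rawBody);
    }
  }

  public static void main(String[] args) {
    MiniClient withDefault = new MiniClient();
    System.out.println(withDefault.request("<response/>"));

    MiniClient custom = new MiniClient();
    custom.setResponseParser(new ResponseParser() {
      String parse(String rawBody) { return "custom:" + rawBody; }
    });
    System.out.println(custom.request("payload"));
  }
}
```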

While it is a good idea to use Commons HttpClient, it is a huge ball
and chain to put those extra jars (commons-httpclient,
commons-logging, commons-codec) in my simple client application. It
is too much to ask of a client API which is just supposed to parse an
XML response.

If HttpClient is not available, we must be able to fall back to
URL.openConnection().
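One way to implement the fallback is to probe the classpath for Commons HttpClient and, if it is missing, use the JDK's HttpURLConnection. This sketch assumes the 3.x class name org.apache.commons.httpclient.HttpClient. The transport labels and the helper names are hypothetical, and a real client would instantiate the matching implementation instead of returning a string.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

class TransportFallbackSketch {

  // Main class of Commons HttpClient 3.x (the org.apache.commons.httpclient package).
  static final String COMMONS_HTTPCLIENT = "org.apache.commons.httpclient.HttpClient";

  // True if the class can be loaded; passing 'false' skips static initialization.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, TransportFallbackSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Picks a transport label; a real client would create the matching implementation.
  static String chooseTransport() {
    return isOnClasspath(COMMONS_HTTPCLIENT) ? "commons-httpclient" : "urlconnection";
  }

  // JDK-only fallback: a plain GET over HttpURLConnection, no extra jars needed.
  static InputStream getWithUrlConnection(String url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
    conn.setRequestMethod("GET");
    return conn.getInputStream();
  }

  public static void main(String[] args) {
    System.out.println("transport: " + chooseTransport());
  }
}
```

Probing with Class.forName keeps HttpClient an optional dependency. The client compiles and runs against the JDK alone, and it picks up HttpClient when the jars happen to be present.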

--Noble

On Fri, Feb 22, 2008 at 9:46 AM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> For the case where we use SolrJ (we control both ends), it is best to resort
> to a custom binary format. It works fastest and uses the least bandwidth.
> We can use a lightweight custom object serialization/deserialization mechanism
> (Java's standard serialization is verbose).
>
> I can create a patch for this if you think it is useful.
>
> --Noble



-- 
--Noble Paul

Re: YAML update request handler

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
For the case where we use SolrJ (we control both ends), it is best to resort
to a custom binary format. It works fastest and uses the least bandwidth.
We can use a lightweight custom object serialization/deserialization mechanism
(Java's standard serialization is verbose).
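Here is a minimal sketch of what a lightweight custom serialization could look like. It is not the format any Solr patch used. It is a hypothetical tag-length-value layout built on DataOutputStream for a flat document of String and Integer fields, and it is compared with standard Java serialization of the same map. The type tags and class names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

class CompactCodecSketch {

  // Hypothetical one-byte type tags for this sketch's format.
  static final byte TAG_STRING = 1;
  static final byte TAG_INT = 2;

  // Layout: int field count, then per field: name (writeUTF), tag byte, value.
  // writeUTF limits each string to 65535 encoded bytes; fine for a sketch.
  static byte[] encode(Map<String, Object> doc) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(doc.size());
    for (Map.Entry<String, Object> field : doc.entrySet()) {
      out.writeUTF(field.getKey());
      Object value = field.getValue();
      if (value instanceof Integer) {
        out.writeByte(TAG_INT);
        out.writeInt((Integer) value);
      } else if (value instanceof String) {
        out.writeByte(TAG_STRING);
        out.writeUTF((String) value);
      } else {
        throw new IllegalArgumentException("unsupported type: " + value);
      }
    }
    out.flush();
    return bytes.toByteArray();
  }

  static Map<String, Object> decode(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int count = in.readInt();
    Map<String, Object> doc = new LinkedHashMap<>();
    for (int i = 0; i < count; i++) {
      String name = in.readUTF();
      byte tag = in.readByte();
      if (tag == TAG_INT) {
        doc.put(name, in.readInt());
      } else if (tag == TAG_STRING) {
        doc.put(name, in.readUTF());
      } else {
        throw new IOException("unknown tag " + tag);
      }
    }
    return doc;
  }

  // Standard Java serialization of the same map, for comparison.
  static byte[] javaSerialize(Map<String, Object> doc) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new LinkedHashMap<>(doc));
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("id", "doc1");
    doc.put("price", 42);
    System.out.println("compact: " + encode(doc).length + " bytes");
    System.out.println("java serialization: " + javaSerialize(doc).length + " bytes");
  }
}
```

For the sample document the compact form is 27 bytes: a 4-byte count, 11 bytes for the id field, and 12 for the price field. Java serialization also writes class descriptors for the map and Integer types, so it comes out much larger.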

I can create a patch for this if you think it is useful.

--Noble



On Fri, Feb 22, 2008 at 12:20 AM, Grant Ingersoll <gs...@apache.org>
wrote:

> XML can be a problem when it is really lengthy (lots of results, large
> results), such that a binary format could be useful in certain cases
> where we control both ends of the pipe (i.e. SolrJ). I've seen apps
> that deal with really large files wrapped in XML where the XML parsing
> takes a significant amount of time as compared to a more compact
> binary format.
>
> I think it at least warrants profiling/testing.
>
> -Grant


-- 
--Noble Paul

Re: YAML update request handler

Posted by Walter Underwood <wu...@netflix.com>.
Python marshal format is worth a try. It is binary and can represent
the same data as JSON. It should be a good fit for Solr.

We benchmarked that against XML several years ago and it was 2X faster.
Of course, XML parsers are a lot faster now.

wunder

On 2/21/08 10:50 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

> XML can be a problem when it is really lengthy (lots of results, large
> results), such that a binary format could be useful in certain cases
> where we control both ends of the pipe (i.e. SolrJ). I've seen apps
> that deal with really large files wrapped in XML where the XML parsing
> takes a significant amount of time as compared to a more compact
> binary format.
> 
> I think it at least warrants profiling/testing.
> 
> -Grant


Re: YAML update request handler

Posted by Grant Ingersoll <gs...@apache.org>.
XML can be a problem when it is really lengthy (lots of results, large  
results), such that a binary format could be useful in certain cases
where we control both ends of the pipe (i.e. SolrJ). I've seen apps
that deal with really large files wrapped in XML where the XML parsing  
takes a significant amount of time as compared to a more compact  
binary format.

I think it at least warrants profiling/testing.
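Profiling could start from something as small as the rough harness below. It is a hypothetical sketch. It builds a synthetic response with a made-up <doc>/<str> shape and times the JDK's DOM parser with System.nanoTime. DOM is used for brevity. A streaming (StAX) parser and a proper benchmark harness with warm-up control would give fairer numbers, and the same loop could time a binary decoder for comparison.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlParseTimingSketch {

  // Builds a synthetic response with n <doc> elements (illustrative shape only).
  static String buildXml(int n) {
    StringBuilder sb = new StringBuilder("<response><result>");
    for (int i = 0; i < n; i++) {
      sb.append("<doc><str name=\"id\">doc").append(i)
        .append("</str><str name=\"body\">some text ").append(i).append("</str></doc>");
    }
    return sb.append("</result></response>").toString();
  }

  // Parses with the JDK's DOM parser and returns the number of <doc> elements.
  static int parseAndCount(String xml) throws Exception {
    DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document d = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    return d.getElementsByTagName("doc").getLength();
  }

  public static void main(String[] args) throws Exception {
    String xml = buildXml(50_000);
    for (int run = 0; run < 5; run++) { // early runs include JIT warm-up
      long t0 = System.nanoTime();
      int count = parseAndCount(xml);
      long ms = (System.nanoTime() - t0) / 1_000_000;
      System.out.println("run " + run + ": " + count + " docs in " + ms + " ms");
    }
  }
}
```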

-Grant

On Feb 21, 2008, at 12:07 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> hi,
> The format over the wire is not of great significance because it gets
> unmarshalled into the corresponding language object as soon as it
> comes off the wire. I would say XML/JSON should meet 99% of the requirements
> because all the platforms come with an unmarshaller for both of these.
>
> But if it can offer a good performance improvement, it is worth trying.
> --Noble

Re: YAML update request handler

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
hi,
The format over the wire is not of great significance because it gets
unmarshalled into the corresponding language object as soon as it comes off
the wire. I would say XML/JSON should meet 99% of the requirements
because all the platforms come with an unmarshaller for both of these.

But if it can offer a good performance improvement, it is worth trying.
--Noble

On Thu, Feb 21, 2008 at 3:41 AM, alexander lind <ma...@webstay.org> wrote:

> Out of simple preference I would love to see a YAML request handler
> just because I like the YAML format. If it's also faster than XML, then
> all the better.
>
> Cheers
> Alec
>



-- 
--Noble Paul

Re: YAML update request handler

Posted by alexander lind <ma...@webstay.org>.
On Feb 20, 2008, at 9:31 AM, Doug Steigerwald wrote:

> A few months back I wrote a YAML update request handler to see if we  
> could post documents faster than with XML.  We did see some small
> speed improvements (didn't write down the numbers), but the hacked  
> together code was probably making it slower as well.  Not sure if  
> there are faster YAML libraries out there either.
>
> We're not actually using it, since it was just a small proof of  
> concept type of project, but is this anything people might be  
> interested in?
>

Out of simple preference I would love to see a YAML request handler  
just because I like the YAML format. If it's also faster than XML, then
all the better.

Cheers
Alec