Posted to solr-user@lucene.apache.org by Simon Collins <si...@shoe-shop.com> on 2008/10/27 22:19:12 UTC

replication handler - compression

Is there an option on the replication handler to compress the files?

 

I'm trying to replicate off site, and seem to have accumulated about
1.4 GB. When compressed with WinZip, of all things, I can get this down to
about 10% of the size.

 

Is compression in the pipeline / can it be if not!

 

simon



This message has been scanned for malware by SurfControl plc. www.surfcontrol.com

Re: replication handler - compression

Posted by christophe <ch...@lemoine-fr.com>.
Hi,

Is the new replication feature based on HTTP requests between sites?
If yes, then I guess it might be possible to configure an HTTP server 
with mod_deflate so the data is compressed on the fly.

C.
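[A minimal sketch of that idea: front the Solr master with Apache httpd acting as a reverse proxy, with mod_deflate compressing responses on the fly. The hostname, port, and path below are placeholders, not from any actual setup discussed in this thread.]

```apache
# Illustrative httpd.conf fragment (assumed layout): proxy the Solr
# master through Apache and let mod_deflate gzip responses on the fly
# for clients that send "Accept-Encoding: gzip".
LoadModule deflate_module modules/mod_deflate.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

<Location /solr>
    ProxyPass http://solr-master:8983/solr
    ProxyPassReverse http://solr-master:8983/solr
    SetOutputFilter DEFLATE
</Location>
</Location>
```

mod_deflate only compresses when the client advertises support, so slaves on a fast local link are unaffected if they omit the header.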

Simon Collins wrote:
> I have now optimized the index - down to 325mb, it compresses down to 20mb.
>
> I think the new replication thing in solr is great, but if it could compress the files it's sending, it would be an awful lot more useful when replicating, as we are, between sites.
>
>
>
> --------------------------------------------------------
>
> Simon Collins
> Systems Analyst
>
> Telephone: 01904 606 867
> Fax Number: 01904 528 791
>
> shoe-shop.com ltd
> Catherine House
> Northminster Business Park
> Upper Poppleton, YORK
> YO26 6QU
> www.shoe-shop.com
> --------------------------------------------------------
>
> This message (and any associated files) is intended only for the use of the individual or entity to which it is addressed and may contain information that is confidential, subject to copyright or constitutes a trade secret. If you are not the intended recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify us immediately by replying to the message and deleting it from your computer. Messages sent to and from us may be monitored. 
>
> Internet communications cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Therefore, we do not accept responsibility for any errors or omissions that are present in this message, or any attachment, that have arisen as a result of e-mail transmission. If verification is required, please request a hard-copy version. Any views or opinions presented are solely those of the author and do not necessarily represent those of the company. (PAVD001) 
> Shoe-shop.com Limited is a company registered in England and Wales with company number 03817232. Vat Registration GB 734 256 241. Registered Office Catherine House, Northminster Business Park, Upper Poppleton, YORK, YO26 6QU.
>
>
> -----Original Message-----
>
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com] 
> Sent: 29 October 2008 03:29
> To: solr-user@lucene.apache.org
> Subject: Re: replication handler - compression
>
> The new replication feature does not use any unix commands , it is
> pure java.  On the fly compression is hard but possible.
> I wish to repeat the question. Did you optimize the index? Because a
> 10:1 compression is not usually observed in an optimized index. Our
> own experiments showed compression of around 10:6 for optimized
> indexes.
>
> --Noble
>
> On Wed, Oct 29, 2008 at 3:41 AM, Lance Norskog <go...@gmail.com> wrote:
>   
>> Aha! The hint to the actual problem: "When compressed with winzip". You are running Solr on Windows.
>>
>> Snapshots don't work on Windows: they depend on a Unix file system feature. You may be copying the entire index. Not just that, it could be inconsistent.
>> This is a fine topic for a "best practices for Windows" wiki page.
>>
>> The 'scp' program is what you want. It has an option to compress on the fly without saving anything to disk. 'Rcopy' in particular has features to only copy what is not already at the target.  The Putty suite 'pscp' program also has the compression feature.
>>
>> Lance
>>
>> -----Original Message-----
>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com]
>> Sent: Monday, October 27, 2008 9:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: replication handler - compression
>>
>>     
>>> It is useful only if your bandwidth is very low.
>>> Otherwise the cost of copying/compressing/decompressing can take up
>>> more time than we save.
>>
>> I mean compressing and transferring. If the optimized index itself has a very high compression ratio then it is worth exploring the option of compressing and transferring. And do not assume that all the files in the index directory are transferred during replication. It only transfers the files which are used by the current commit point and the ones which are absent on the slave
>>
>> --
>> --Noble Paul

Re: replication handler - compression

Posted by Chris Hostetter <ho...@fucit.org>.
: Yeah.  I'm just not sure how much benefit in terms of data transfer this 
: will save.  Has anyone tested this to see if this is even worth it?

one man's trash is another man's treasure ... if you're replicating 
snapshots very frequently within a single datacenter, speed is critical
and bandwidth is free -- if you're replicating once a day from one data 
center to another over a very expensive, very small pipe, spending some 
time+cpu to compress may be worth it.

either way: it should be almost trivial to implement if people want to 
supply a patch, and with a simple new requestDispatcher config option, 
easy to disable completely on the server for people who might have 
clients sending "Accept-Encoding: gzip" willy nilly


-Hoss


Re: replication handler - compression

Posted by Walter Underwood <wu...@netflix.com>.
It could also be that the C version is a lot more efficient than
the Java version and it could take longer regardless. I could not
find a benchmark on that, but C is usually better for bit twiddling.

wunder

On 10/30/08 10:36 PM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:

> man gzip:
> 
>        -# --fast --best
>               Regulate the speed of compression using the specified digit #,
> where -1 or --fast indicates the  fastest  compres-
>               sion  method (less compression) and -9 or --best indicates the
> slowest compression method (best compression).  The
>               default compression level is -6 (that is, biased towards high
> compression at expense of speed).
> 
>  
> So it could be better than the factor of 2, but also take longer. :)
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: replication handler - compression

Posted by Otis Gospodnetic <ot...@yahoo.com>.
man gzip:

       -# --fast --best
              Regulate the speed of compression using the specified digit #, where -1 or --fast indicates the  fastest  compres-
              sion  method (less compression) and -9 or --best indicates the slowest compression method (best compression).  The
              default compression level is -6 (that is, biased towards high compression at expense of speed).

 
So it could be better than the factor of 2, but also take longer. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Walter Underwood <wu...@netflix.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 30, 2008 11:52:47 AM
> Subject: Re: replication handler - compression
> 
> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
> so it isn't free.
> 
> $ cd index-copy
> $ du -sk
> 134336  .
> $ gzip *
> $ du -sk
> 62084   .
> 
> wunder
> 


Re: replication handler - compression

Posted by Walter Underwood <wu...@netflix.com>.
CPU was at 100%; it was not IO bound. --wunder

On 10/30/08 8:58 AM, "christophe" <ch...@lemoine-fr.com> wrote:

> Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping
> should be faster.
> 
> C.
> 
> Walter Underwood wrote:
>> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
>> so it isn't free.
>> 
>> $ cd index-copy
>> $ du -sk
>> 134336  .
>> $ gzip *
>> $ du -sk
>> 62084   .
>> 
>> wunder
>> 


Re: replication handler - compression

Posted by christophe <ch...@lemoine-fr.com>.
Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping 
should be faster.

C.

Walter Underwood wrote:
> About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
> so it isn't free.
>
> $ cd index-copy
> $ du -sk
> 134336  .
> $ gzip *
> $ du -sk
> 62084   .
>
> wunder
>

Re: replication handler - compression

Posted by Walter Underwood <wu...@netflix.com>.
About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder

On 10/30/08 8:20 AM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:

> Yeah.  I'm just not sure how much benefit in terms of data transfer this will
> save.  Has anyone tested this to see if this is even worth it?
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: replication handler - compression

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Yeah.  I'm just not sure how much benefit in terms of data transfer this will save.  Has anyone tested this to see if this is even worth it?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Erik Hatcher <er...@ehatchersolutions.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, October 30, 2008 9:54:28 AM
> Subject: Re: replication handler - compression
> 
> +1 - the GzipServletFilter is the way to go.
> 
> Regarding request handlers reading HTTP headers, yeah,... this will improve, for 
> sure.
> 
>     Erik


Re: replication handler - compression

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
+1 - the GzipServletFilter is the way to go.

Regarding request handlers reading HTTP headers, yeah,... this will  
improve, for sure.

	Erik



Re: replication handler - compression

Posted by Chris Hostetter <ho...@fucit.org>.
: You are partially right. Instead of the HTTP header, we use a request
: parameter. (RequestHandlers cannot read HTTP headers). If the param is

hmmm, i'm with walter: we shouldn't invent new mechanisms for 
clients to request compression over HTTP from servers.

replication is both special enough and important enough that if we had to 
add special support to make that information available to the handler on 
the master we could.

but frankly i don't think that's necessary: the logic to turn on 
compression if the client requests it using "Accept-Encoding: gzip" is 
generic enough that there is no reason for it to be in a handler.  we 
could easily put it in the SolrDispatchFilter, or even in a new 
ServletFilter (i'm guessing i've seen about 74 different implementations of 
a GzipServletFilter in the wild that could be used as is).

then we'd have double wins: compression for replication, and compression 
of all responses generated by Solr if the client requests it.

-Hoss
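[The header check Hoss describes is indeed small. The sketch below shows the negotiation logic such a filter would perform, using only standard Java; the class and method names are invented for illustration and are not from any Solr patch.]

```java
import java.util.Locale;

// Hypothetical helper: decide from the Accept-Encoding request header
// whether the server may gzip the response, as a GzipServletFilter would.
public class GzipNegotiation {

    /** Returns true if the Accept-Encoding header value allows gzip. */
    public static boolean clientAcceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) return false;
        for (String token : acceptEncoding.split(",")) {
            // Each token may carry a quality value, e.g. "gzip;q=0.8".
            String[] parts = token.trim().toLowerCase(Locale.ROOT).split(";");
            String coding = parts[0].trim();
            if (coding.equals("gzip") || coding.equals("x-gzip") || coding.equals("*")) {
                // A q-value of 0 explicitly refuses the coding.
                for (int i = 1; i < parts.length; i++) {
                    String p = parts[i].trim();
                    if (p.startsWith("q=") && Double.parseDouble(p.substring(2)) == 0.0) {
                        return false;
                    }
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(clientAcceptsGzip("gzip, deflate")); // true
        System.out.println(clientAcceptsGzip("identity"));      // false
    }
}
```

A filter built on a check like this would sit in front of every handler, which is what makes the "double win" possible: replication and ordinary query responses get compression from the same place.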


Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Hoss,
You are partially right. Instead of the HTTP header, we use a request
parameter (RequestHandlers cannot read HTTP headers). If the param is
present, it wraps the response in a gzip output stream. It is configured
on the slave because not every slave may want compression; slaves
which are nearby can skip it.



On Thu, Oct 30, 2008 at 3:54 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> My understanding of Noble's comment (and i could be wrong, i'm reading
> between the lines) is that if you specify the new setting he's suggesting
> when initializing the replication handler on the slave, then the slave
> should start using an "Accept-Encoding: gzip" header when querying the
> master, and that when receiving this header, the master will start
> wrapping the response in a "Content-Encoding: gzip"
>
> (I'm making this assumption based on his note about this being a new slave
> config option, with no mention of any new options on the master)
>
> -Hoss
>
>



-- 
--Noble Paul

Re: replication handler - compression

Posted by Chris Hostetter <ho...@fucit.org>.
My understanding of Noble's comment (and i could be wrong, i'm reading 
between the lines) is that if you specify the new setting he's suggesting 
when initializing the replication handler on the slave, then the slave 
should start using an "Accept-Encoding: gzip" header when querying the 
master, and that when receiving this header, the master will start 
wrapping the response in a "Content-Encoding: gzip"

(I'm making this assumption based on his note about this being a new slave 
config option, with no mention of any new options on the master)

: You propose to do compressed transfers over HTTP ignoring the standard
: support for compressed transfers in HTTP. Programming that with a
: library doesn't make it "standard".

: >> open a JIRA issue. we will use a gzip on both ends of the pipe . On
: >> the slave side you can say <str name="zip">true</str> as an extra
: >> option to compress and send data from server
: > --Noble


-Hoss


Re: replication handler - compression

Posted by Walter Underwood <wu...@netflix.com>.
You propose to do compressed transfers over HTTP ignoring the standard
support for compressed transfers in HTTP. Programming that with a
library doesn't make it "standard".

In Ultraseek, we implemented index synchronization over HTTP with
compression. It wasn't that hard.

I doubt that compression will make a huge difference; Lucene already uses
reasonable compression in its indexes.

wunder

On 10/29/08 10:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <no...@gmail.com>
wrote:

> we are not doing anything non-standard.
> GZipInputStream/GZipOutputStream are standards. But asking users to
> set up an extra Apache is not fair if we can manage it with, say, 5 lines
> of code.
>
> On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood
> <wu...@netflix.com> wrote:
>> Why invent something when compression is standard in HTTP? --wunder
>>
>> On 10/29/08 4:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <no...@gmail.com>
>> wrote:
>>
>>> open a JIRA issue. we will use a gzip on both ends of the pipe . On
>>> the slave side you can say <str name="zip">true</str> as an extra
>>> option to compress and send data from server
>>> --Noble



Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
we are not doing anything non-standard.
GZipInputStream/GZipOutputStream are standards. But asking users to
set up an extra Apache is not fair if we can manage it with, say, 5 lines
of code.

On Wed, Oct 29, 2008 at 7:44 PM, Walter Underwood
<wu...@netflix.com> wrote:
> Why invent something when compression is standard in HTTP? --wunder
>
> On 10/29/08 4:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <no...@gmail.com>
> wrote:
>
>> open a JIRA issue. we will use a gzip on both ends of the pipe . On
>> the slave side you can say <str name="zip">true</str> as an extra
>> option to compress and send data from server
>> --Noble
>
>



-- 
--Noble Paul
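[Noble's point that the gzip streams are standard JDK classes really is a few lines of code. This sketch, with invented names, just round-trips a buffer the way the two ends of the replication pipe would: gzip on the master, gunzip on the slave.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch: compress and decompress with only the JDK's
// standard java.util.zip stream classes -- no external HTTP server needed.
public class GzipPipe {

    /** Sending side: wrap the payload in a gzip stream. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream out = new GZIPOutputStream(buf);
        out.write(data);
        out.close(); // flushes the gzip trailer
        return buf.toByteArray();
    }

    /** Receiving side: unwrap the gzip stream back into raw bytes. */
    public static byte[] gunzip(byte[] data) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        in.close();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "segment bytes to replicate".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(java.util.Arrays.equals(original, restored)); // prints "true"
    }
}
```

In a real handler these streams would wrap the servlet response and request streams rather than in-memory buffers, which is what makes on-the-fly compression possible without touching disk.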

Re: replication handler - compression

Posted by Bill Au <bi...@gmail.com>.
Do keep in mind that compression is a CPU-intensive process, so it is a trade-off
between CPU utilization and network bandwidth.  I have seen cases where
compressing the data before a network transfer ended up being slower than
without compression, because the cost of compression and decompression was
more than the gain in network transfer.

Bill

On Wed, Oct 29, 2008 at 7:35 AM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.paul@gmail.com> wrote:

> open a JIRA issue. we will use a gzip on both ends of the pipe . On
> the slave side you can say
> <str name="zip">true</str>
> as an extra option to compress and send data from server
> --Noble
>
>
>
>
> On Wed, Oct 29, 2008 at 3:06 PM, Simon Collins
> <si...@shoe-shop.com> wrote:
> > I have now optimized the index - down to 325mb, it compresses down to
> 20mb.
> >
> > I think the new replication thing in solr is great, but if it could
> compress the files it's sending, it would be an awful lot more useful when
> replicating, as we are, between sites.
> >
> >
> > --
> > --Noble Paul
> >
>
>
>
> --
> --Noble Paul
>

Re: replication handler - compression

Posted by Walter Underwood <wu...@netflix.com>.
Why invent something when compression is standard in HTTP? --wunder

On 10/29/08 4:35 AM, "Noble Paul നോബിള്‍ नोब्ळ्" <no...@gmail.com>
wrote:

> open a JIRA issue. we will use a gzip on both ends of the pipe. On the
> slave side you can say <str name="zip">true</str> as an extra option to
> compress and send data from server
> --Noble


Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
open a JIRA issue. we will use a gzip on both ends of the pipe. On
the slave side you can say
<str name="zip">true</str>
as an extra option to compress and send data from server
--Noble

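The gzip-on-both-ends idea is easy to sketch. The snippet below is illustrative only (Python rather than Solr's Java, with made-up data): the server gzips the index file bytes before sending, and the slave gunzips them on receipt.

```python
import gzip
import io

def compress_stream(data: bytes) -> bytes:
    # Server side: gzip the index file bytes before sending.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(data)
    return buf.getvalue()

def decompress_stream(payload: bytes) -> bytes:
    # Slave side: gunzip the received bytes before writing to disk.
    with gzip.GzipFile(fileobj=io.BytesIO(payload), mode="rb") as gz:
        return gz.read()

# Repetitive stored-field data compresses well; already-packed
# postings data would shrink far less.
original = b"field:value " * 10000
wire = compress_stream(original)
assert decompress_stream(wire) == original
assert len(wire) < len(original)
```

Whether this pays off depends on the link speed versus the compress/decompress cost, as discussed elsewhere in the thread.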






-- 
--Noble Paul

RE: replication handler - compression

Posted by Simon Collins <si...@shoe-shop.com>.
> I have now optimized the index - down to 325 MB; it compresses down to 20 MB.
>
> I think the new replication feature in Solr is great, but if it could compress the files it's sending, it would be an awful lot more useful when replicating, as we are, between sites.



--------------------------------------------------------

Simon Collins
Systems Analyst

Telephone: 01904 606 867
Fax Number: 01904 528 791

shoe-shop.com ltd
Catherine House
Northminster Business Park
Upper Poppleton, YORK
YO26 6QU
www.shoe-shop.com
--------------------------------------------------------

This message (and any associated files) is intended only for the use of the individual or entity to which it is addressed and may contain information that is confidential, subject to copyright or constitutes a trade secret. If you are not the intended recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify us immediately by replying to the message and deleting it from your computer. Messages sent to and from us may be monitored. 

Internet communications cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. Therefore, we do not accept responsibility for any errors or omissions that are present in this message, or any attachment, that have arisen as a result of e-mail transmission. If verification is required, please request a hard-copy version. Any views or opinions presented are solely those of the author and do not necessarily represent those of the company. (PAVD001) 
Shoe-shop.com Limited is a company registered in England and Wales with company number 03817232. Vat Registration GB 734 256 241. Registered Office Catherine House, Northminster Business Park, Upper Poppleton, YORK, YO26 6QU.



Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The new replication feature does not use any Unix commands; it is
pure Java. On-the-fly compression is hard but possible.
I wish to repeat the question: did you optimize the index? A 10:1
compression ratio is not usually observed on an optimized index. Our
own experiments showed around 10:6 (compressed to about 60% of the
original size) for optimized indexes.

--Noble




-- 
--Noble Paul

RE: Query integer type

Posted by "Nguyen, Joe" <jn...@automotive.com>.
Never mind.  I misused the syntax.  :-)
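If I'm reading the mistake right, the URLs in the original post use a colon after `fq` (`&fq:SITE:3002`) where the query string needs an equals sign (`&fq=SITE:3002`), so the filter query was never applied. Building the URL with a library sidesteps that; the host and core name below are taken from the post, the rest is illustrative:

```python
from urllib.parse import urlencode

base = "http://localhost:8080/solr/mysite/select"
params = {"indent": "on", "qt": "standard", "fl": "SITE", "fq": "SITE:3002"}
url = base + "?" + urlencode(params)

# The filter query is sent as fq=SITE%3A3002, i.e. fq=..., not fq:...
assert "fq=SITE%3A3002" in url
```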


Query integer type

Posted by "Nguyen, Joe" <jn...@automotive.com>.
SITE is defined as integer.  I wanted to select all documents whose SITE=3002, but the SITE values in the response were different.  

<field name="SITE" type="integer" indexed="true" stored="true" required="true"/>


http://localhost:8080/solr/mysite/select?indent=on&qt=standard&fl=SITE&fq:SITE:3002

http://localhost:8080/solr/mysite/select?indent=on&qt=dismax&fl=SITE&fq:SITE:3002

http://localhost:8080/solr/mysite/select?indent=on&qt=standard&fl=SITE&SITE:3002




<result name="response" numFound="470" start="0">
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">2</int>
....

Field Analysis

Index Analyzer
org.apache.solr.schema.FieldType$DefaultAnalyzer {}
term position 	1
term text 	3002
term type 	word
source start,end 	0,4
payload 	

Query Analyzer
org.apache.solr.schema.FieldType$DefaultAnalyzer {}
term position 	1
term text 	3002
term type 	word
source start,end 	0,4
payload 	  

Should term type be integer?

Any suggestion?

Cheers
  

RE: replication handler - compression

Posted by Lance Norskog <go...@gmail.com>.
Aha! The hint to the actual problem: "When compressed with winzip". You are running Solr on Windows.

Snapshots don't work on Windows: they depend on a Unix file system feature. You may be copying the entire index. Not just that, it could be inconsistent.
This is a fine topic for a "best practices for Windows" wiki page.

The 'scp' program is what you want. It has an option to compress on the fly without saving anything to disk. 'Rcopy' in particular has features to only copy what is not already at the target.  The Putty suite 'pscp' program also has the compression feature.

Lance
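[As a sketch of the approach above — hostnames and paths here are placeholders, not from the thread: `scp -C` compresses on the wire with no intermediate zip file, and `rsync -z` (possibly what 'Rcopy' refers to) adds delta transfer, so unchanged segment files are skipped.]

```shell
# Compressed copy of the index directory over SSH (no temp zip on disk).
scp -C -r /solr/data/index user@slave.example.com:/solr/data/

# rsync: -z compresses in transit, and files already present and
# unchanged on the target are skipped entirely.
rsync -avz /solr/data/index/ user@slave.example.com:/solr/data/index/
```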



Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
> It is useful only if your bandwidth is very low.
> Otherwise the cost of copying/compressing/decompressing can take up
> more time than we save.

I mean compressing and transferring. If the optimized index itself has
a very high compression ratio then it is worth exploring the option
of compressing and transferring. And do not assume that all the files
in the index directory are transferred during replication. It only
transfers the files which are used by the current commit point and the
ones which are absent on the slave





-- 
--Noble Paul

Re: replication handler - compression

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Are you sure you optimized the index?
It is useful only if your bandwidth is very low.
Otherwise the cost of copying/compressing/decompressing can take up
more time than we save.

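[That tradeoff — compression wins on a slow offsite link, loses on a fast one — can be checked with back-of-envelope arithmetic. Every number below is an illustrative assumption (link speeds, throughputs, compressed fraction), not a measurement; only the 1.4 GB figure comes from the original post.]

```python
# Back-of-envelope check of when compress-then-transfer wins.
size_mb = 1400.0        # index size from the original post
ratio = 0.6             # assumed compressed fraction
gzip_mb_s = 30.0        # assumed compression throughput
gunzip_mb_s = 60.0      # assumed decompression throughput

def transfer_seconds(link_mb_s):
    # Plain copy: just push the whole index over the link.
    plain = size_mb / link_mb_s
    # Compressed copy: compress, send the smaller payload, decompress.
    zipped = (size_mb / gzip_mb_s
              + (size_mb * ratio) / link_mb_s
              + (size_mb * ratio) / gunzip_mb_s)
    return plain, zipped

# Slow offsite link (~1 MB/s): compression saves wall-clock time.
# Fast LAN (~100 MB/s): the CPU cost outweighs the saved transfer.
slow_plain, slow_zipped = transfer_seconds(1.0)
fast_plain, fast_zipped = transfer_seconds(100.0)
assert slow_zipped < slow_plain
assert fast_zipped > fast_plain
```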


On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
<si...@shoe-shop.com> wrote:
> Is there an option on the replication handler to compress the files?
>
>
>
> I'm trying to replicate off site, and seem to have accumulated about
> 1.4gb. When compressed with winzip of all things i can get this down to
> about 10% of the size.
>
>
>
> Is compression in the pipeline / can it be if not!
>
>
>
> simon
>
>
>
> This message has been scanned for malware by SurfControl plc. www.surfcontrol.com
>



-- 
--Noble Paul