Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2008/12/02 16:50:44 UTC

[jira] Created: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Implement Externalizable in main top level searcher classes
-----------------------------------------------------------

                 Key: LUCENE-1473
                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 2.4
            Reporter: Jason Rutherglen
            Priority: Minor


To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

You are right, we can always transmit the string form and re-parse it on the
other end.
Our problem is that we took this (serialization nature) for granted, and
once something is deployed over a cluster, it would be difficult to do
partial roll-outs in this case. But I guess there is no immediate remedy for
this.

Since we all agree careful scrutiny is a good thing:

ScoreDocComparator.sortValue(), according to its javadoc: "The object
returned must implement the java.io.Serializable interface."

This has implications for how a distributed system should be designed
around Lucene, in my case for result merging. You cannot transmit Strings or any
other representations, because you don't know what the Comparable
instance is (when SortField.type is set to Custom). I am curious: how would
distributed Solr handle this without resorting to Java serialization?

A side note: do you think returning Comparable here is good API design?
Shouldn't it be some sub-interface that extends both Comparable and
Serializable, instead of relying on javadoc?
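
A minimal sketch of the sub-interface idea, so that the compiler rather than the
javadoc enforces the contract. The names SerializableComparable, PriceSortValue
and SortValueDemo are hypothetical and do not exist in Lucene; the round trip
only stands in for what a merging node would do with a received sort value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sub-interface; it is not part of Lucene's API.
interface SerializableComparable<T> extends Comparable<T>, Serializable {}

// Example custom sort value: the compiler now enforces both contracts.
final class PriceSortValue implements SerializableComparable<PriceSortValue> {
    private static final long serialVersionUID = 1L;
    final long cents;

    PriceSortValue(long cents) { this.cents = cents; }

    @Override
    public int compareTo(PriceSortValue other) {
        return Long.compare(cents, other.cents);
    }
}

final class SortValueDemo {
    // Copies a value through Java serialization, as a merging node would receive it.
    static <T extends SerializableComparable<T>> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        PriceSortValue remote = roundTrip(new PriceSortValue(1999));
        System.out.println(remote.compareTo(new PriceSortValue(2500)) < 0); // prints "true"
    }
}
```

A method returning SerializableComparable instead of Comparable would reject a
non-serializable sort value at compile time instead of failing at merge time.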

Thanks

-John

On Wed, Dec 3, 2008 at 10:19 AM, Doug Cutting (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652882#action_12652882]
>
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
>
> > But, what's now being asked for (expected) with this issue is "long-term
> persistence", which is really a very different beast and a much taller
> order.
>
> That's the crux, alright.  Does Lucene want to start adding cross-version
> guarantees about the durability of its objects when serialized by Java
> serialization?  This is a hard problem.  Systems like Thrift and
> ProtocolBuffers offer support for this, but Java Serialization itself doesn't
> really provide much assistance.  One can roll one's own serialization
> compatibility story manually, as proposed by this patch, but that adds a
> burden to the project.  We'd need, for example, test cases that keep
> serialized instances from past versions, so that we can be sure that patches
> do not break this.
>
> The use case provided may not use RMI, but it is similar: it involves
> transmitting Lucene objects over the wire between different versions of
> Lucene.  Since Java APIs, like Lucene, do not generally provide
> cross-version compatibility, it would be safer to architect such a system so
> that it controls the serialization of transmitted instances itself and can
> thus guarantee their compatibility as the system is updated.  Thus it would
> develop its own representations for queries independent of Lucene's Query,
> and map this to Lucene's Query.  Is that not workable in this case?
>
>

[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Attachment: LUCENE-1473.patch

LUCENE-1473.patch

serialVersionUID added to the relevant classes manually.  Defaulted to 10 because it does not matter, as long as it is different between versions.  Thought of writing some code to go through the Lucene JAR, do an instanceof on the classes for Serializable and then verify that the serialVersionUID is 10.  

Term implements Externalizable.  

SerializationUtils was adapted from Hadoop's WritableUtils for writing VLong.  

The TestSerialization use case serializes a term, serializes an arbitrary query to a file, and compares them.  

TODO: 
- Implement Externalizable
- More unit tests?  How to write a unit test for multiple versions?
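
For readers without the patch at hand, here is a rough sketch of what an Externalizable term with a variable-length long might look like. It is not the code from LUCENE-1473.patch: SimpleTerm is a hypothetical stand-in for Term, the explicit format marker is an assumption, and writeVLong/readVLong use a plain 7-bits-per-byte encoding rather than the Hadoop WritableUtils format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExternalizableTermDemo {
    /** Hypothetical stand-in for Term; the field layout is an assumption. */
    public static final class SimpleTerm implements Externalizable {
        private static final long serialVersionUID = 10L;
        private static final long FORMAT = 1L;
        String field;
        String text;

        public SimpleTerm() {} // Externalizable needs a public no-arg constructor

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            writeVLong(out, FORMAT); // explicit format marker for future readers
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            long format = readVLong(in);
            if (format != FORMAT) {
                throw new InvalidObjectException("unknown format " + format);
            }
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    // Writes 7 bits per byte, low bits first; the high bit marks "more bytes follow".
    static void writeVLong(DataOutput out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    static long readVLong(DataInput in) throws IOException {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.readByte();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    // Serializes and deserializes a term in memory.
    static SimpleTerm roundTrip(SimpleTerm term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SimpleTerm) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SimpleTerm copy = roundTrip(new SimpleTerm("body", "lucene"));
        System.out.println(copy.field + ":" + copy.text); // prints "body:lucene"
    }
}
```

Because the class controls its own byte layout, a later release could add a second format value and branch on it in readExternal, which is the backwards-compatibility argument made in this issue.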

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652939#action_12652939 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

The documentation should probably be fixed to state that Lucene's use of Serializable currently assumes that all parties are using the exact same version of Lucene.  That's the default for Serializable, but it probably bears stating explicitly.  Then we should decide, going forward, whether this should change, and, if so, for which classes and how.  Such a policy should be agreed on before code is written, no?

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "John Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653545#action_12653545 ] 

John Wang commented on LUCENE-1473:
-----------------------------------

The discussion here is whether it is better to fail 100% of the time or 10% of the time (these are just meaningless numbers to make a point).
I do buy Doug's comment about getting into a weird state due to data serialization, but this is something Externalizable would solve.
This discussion has digressed to general Java serialization design, where it originally scoped only to several lucene classes.

If it is documented that Lucene only supports serialization of classes from the same jar, is that really enough? Doesn't it also depend on the compiler, if someone were to build their own jar?

Furthermore, in a distributed environment with lots of machines, it is always ideal to upgrade bit by bit. Is taking this functionality away by imposing this restriction a good trade-off compared with just implementing Externalizable for a few classes, if Serializable is deemed to be dangerous? I am not so sure it is, given the Lucene classes we are talking about.

[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Wolf Siberski (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wolf Siberski updated LUCENE-1473:
----------------------------------

    Attachment: lucene-contrib-remote.patch

This patch removes all dependencies to Serializable and Remote from the core and adds contrib/remote as replacement

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652569#action_12652569 ] 

Mark Miller commented on LUCENE-1473:
-------------------------------------

I share Michael's concerns. What's the motivation for core Lucene classes supporting serialization, and is it strong enough to warrant these changes? It comes with a cost even without the mentioned annoyances, right? (which are only for the current class, there may be more?)

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655894#action_12655894 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> shift the problem around but do not really solve the underlying issues 

That's the idea, actually, to shift it out of the core into contrib.  We could use Externalizable there, with no XML.

> Deprecating serialization entirely needs to be taken to the java-user mailing list as there are quite a number of installations relying on it.

No, we make decisions on the java-dev mailing list.  Also, it won't go away; folks might just have to update their code to use different APIs if and when they upgrade to 3.0.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653057#action_12653057 ] 

Mark Harwood commented on LUCENE-1473:
--------------------------------------

The contrib section of Lucene contains an XML-based query parser which aims to provide full-coverage of Lucene queries/filters and provide extensibility to support 3rd party classes.
I use this regularly in distributed deployments and this allows both non-Java clients and long-term persistence of queries with good stability across Lucene versions.
Although I have not conducted formal benchmarks I have not been drawn to XML parsing as a bottleneck - search execution and/or document retrieves are normally the main bottlenecks.

Maintaining XML parsing code is an overhead but ultimately helps decouple requests from the logic that executes requests. In serializing Lucene Query/Filter objects we are dealing with classes that combine both the representation of the request criteria (what needs to be done) and the implementation (how things are done). We are forever finessing the "how" bit of this equation, e.g. moving from RangeQuery to RangeFilter to TrieRangeFilter. The criteria, however, remain relatively static ("I just want to search on a range"), and so it is dangerous to build clients that refer directly to query implementation classes.
The XML parser provides a language-independent abstraction for clients to define what they want to be done without being too tied to how this is implemented.
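
To illustrate the "what" versus "how" split, here is a small sketch that does not use the contrib XML query parser itself. The element name Range, its attributes, and the classes RangeCriteria and RangeExecutor are all hypothetical; the two executors only return a label naming the implementation they would pick, and the XML handling uses the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class RangeRequestDemo {
    /** What the client wants; no reference to any query implementation class. */
    static final class RangeCriteria {
        final String field;
        final String lower;
        final String upper;

        RangeCriteria(String field, String lower, String upper) {
            this.field = field;
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** How the server runs it; implementations can change between releases. */
    interface RangeExecutor {
        String plan(RangeCriteria criteria);
    }

    // Two interchangeable server-side strategies for the same criteria.
    static final RangeExecutor OLDER =
        c -> "RangeQuery(" + c.field + ":[" + c.lower + " TO " + c.upper + "])";
    static final RangeExecutor NEWER =
        c -> "RangeFilter(" + c.field + ":[" + c.lower + " TO " + c.upper + "])";

    // Parses a request such as <Range field="date" lower="20080101" upper="20081231"/>.
    static RangeCriteria parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed request", e);
        }
        if (!"Range".equals(root.getTagName())) {
            throw new IllegalArgumentException("unsupported element " + root.getTagName());
        }
        return new RangeCriteria(
            root.getAttribute("field"),
            root.getAttribute("lower"),
            root.getAttribute("upper"));
    }

    public static void main(String[] args) {
        String request = "<Range field=\"date\" lower=\"20080101\" upper=\"20081231\"/>";
        RangeCriteria criteria = parse(request);
        System.out.println(OLDER.plan(criteria)); // prints "RangeQuery(date:[20080101 TO 20081231])"
        System.out.println(NEWER.plan(criteria)); // prints "RangeFilter(date:[20080101 TO 20081231])"
    }
}
```

The client only ever sends the request string, so swapping OLDER for NEWER on the server requires no client change and no shared class versions.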

Cheers
Mark



[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653553#action_12653553 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> This discussion has digressed to general Java serialization design, where it originally scoped only to several lucene classes. 

Which classes?  The existing patch applies to one class.  Jason said, "If it looks ok, I will implement Externalizable in other classes." but never said which.  It would be good to know how wide the impact of the proposed change would be.

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "John Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653378#action_12653378 ] 

John Wang commented on LUCENE-1473:
-----------------------------------

Mike:

       If you have class A implements Serializable, with a defined suid, say 1.

       Let A2 be a newer version of class A, and suid is not changed, say 1.

       Let's say A2 has a new field.

       Imagine A is running in VM1 and A2 is running in VM2. Serialization between VM1 and VM2 of class A is ok, just that A will not get the new fields. Which is fine since VM1 does not make use of it. 

       You can argue that A2 will not get the needed field from serialized A, but isn't that better than crashing?

       In either case, I think the behavior is better than it is currently. (maybe that's why Eclipse and FindBugs both report the lack of a suid definition in Lucene code as a warning)

       I agree adding Externalizable implementation is more work, but it would make the serialization story correct.
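
       The two-VM scenario cannot be reproduced in a single program, but the mechanism behind it can be shown: deserialization compares the suid recorded in the stream with the suid of the local class, and a declared suid stays fixed when a field is added. A minimal sketch, with hypothetical classes Pinned and Unpinned:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SuidDemo {
    /** Declares its suid, so adding a field later does not change it. */
    static final class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        int addedLater; // imagine this field arrived in a newer release
    }

    /** No declared suid: the JVM computes one from the class's shape. */
    static final class Unpinned implements Serializable {
        String name;
    }

    // Returns the suid that Java serialization will write and check for this class.
    static long suidOf(Class<? extends Serializable> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Pinned.class));   // prints "1"
        System.out.println(suidOf(Unpinned.class)); // a computed hash of the class's structure
    }
}
```

       With matching suids, a field missing from the stream is left at its default value on read and an unknown field in the stream is skipped, which is the "better than crashing" behavior described above.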

-John


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652920#action_12652920 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

"In regards to Doug's comment about an alternate form... doesn't SOLR already have a XML based query format?  If so, just persist the queries using this. You will be immune to serialization changes (provided the SOLR parser remains backwards compatible)."

SOLR does not have an XML based query format.  XML is not ideal for distributed search because it is slow and verbose.  There are many ways to serialize things, the issue is not in choosing one, but in supporting what most Java libraries do today which is native to the Java platform.  

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653109#action_12653109 ] 

Yonik Seeley commented on LUCENE-1473:
--------------------------------------

bq. The contrib section of Lucene contains an XML-based query parser which aims to provide full-coverage of Lucene queries

Thanks for the reminder... Solr has pluggable query parsers now, and I've been meaning to check this out as a way to provide a more programmatic query specification.

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652872#action_12652872 ] 

Hoss Man commented on LUCENE-1473:
----------------------------------

For the record: i have limited understanding of java serialization issues...

bq. At the risk of pissing off the Lucene powerhouse, I feel I have to express some candor. I am growing more and more frustrated with the lack of the open source nature of this project and its unwillingness to work with the developer community.

The developer community consists of hundreds (possibly thousands) of people, who participate at various levels.  At the time the above quoted comment was posted, 4 members of the community had expressed an opinion on this issue: 1 clearly in favor, and 3 questioning the advantages and disadvantages as they affect the *whole* community, both in terms of the performance impacts for existing use cases, and the long term support issues that might come from a change like this.

How anyone could consider these comments and questions "unwillingness to work with the developer community" blows my mind ... i do not see an overwhelming push by the community at large for a feature, i do not see a "Lucene powerhouse" arguing against the will of the masses ... I see two people arguing in favor of a change, and three questioning whether this change is a good idea (i'm not sure if i understand robert's post fully, i believe he's suggesting we maintain the status quo such that serialization is supported but no claims are made about back-compatible serialization).  

I would define that as healthy discussion.


This, to me, seems to be the crux of the issue...

{quote}
Lucene, today, only guarantees "live serialization", and that's the
intention when "implements Serializable" is added to a class.

But, what's now being asked for (expected) with this issue is
"long-term persistence", which is really a very different beast and a
much taller order. With it comes a number of challenges, that warrant
scrutiny:
{quote}

...this jibes with even my limited experience with java serialization, and i share Michael's concerns.  The current behavior does not appear to be a bug, it appears to be the correct effects of a serializable class changing between two versions w/o any explicit policy on serialization compatibility.  The changes being discussed  seem to be a request for a new "feature": a back-compat commitment on the serialization of one (or more) classes.  However small the patch may be, it's still a significant change that should not be made lightly ... I certainly think all 4 of Michael's bullet points should be clearly addressed before committing anything, and I agree with Doug's earlier observation regarding performance: we need to test that a new serialization strategy won't adversely affect existing users who rely on (what Michael referred to as) "live serialization".

This is my opinion. I voice it not as a member of any sort of mythical "Lucene powerhouse" but as a member of the Lucene community who is concerned about making sure that decisions are made towards "everyone's interest to make sure Lucene grows in a healthy environment." -- not just the interests of two vocal people.


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Jason Rutherglen <ja...@gmail.com>.

The tests will be for backwards compatibility with previous versions of
Lucene using the described process of including previous versioned encoded
serialized objects into the test code base.  Similar to how CFS index files
are included in the test code tree.
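
A minimal sketch of what such a fixture-based test might look like. TermLikeQuery is a hypothetical stand-in, not a real Lucene class, and in a real test the fixture bytes would be read from a file written by an earlier release and checked into the test tree; here they are produced in-process only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedFixtureCheck {

    /** Hypothetical stand-in for a serializable Lucene class (assumption). */
    static class TermLikeQuery implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        TermLikeQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Serializes an object to a byte array using standard Java serialization. */
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Deserializes a byte array back into an object. */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The compatibility check: stored bytes must still decode to the expected values. */
    static boolean fixtureStillReadable(byte[] fixture) {
        TermLikeQuery q = (TermLikeQuery) deserialize(fixture);
        return "body".equals(q.field) && "lucene".equals(q.text);
    }

    public static void main(String[] args) {
        // Stand-in for bytes loaded from a checked-in file of an older release.
        byte[] fixture = serialize(new TermLikeQuery("body", "lucene"));
        System.out.println("fixture readable: " + fixtureStillReadable(fixture));
    }
}
```

One fixture per supported release and per class would be needed for the check to mean anything, which is part of the maintenance cost under discussion.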

There is an elegance to the RemoteSearcher type of code that allows one to
focus on their queries and algorithms and ignore the fact that they are
searching over N machines.

Protocol buffers seem okay.  However given the way that Lucene allows
customizations in things like SortComparatorSource I do not see how protocol
buffers can be used with custom Java classes in the same way Java
serialization works.  If in the future Lucene allows greater customization
such as with scorers, similarities and queries in Lucene 3.0 then marrying
the data with code in a grid environment using protocol buffers gets ugly.
Protocol buffers are nice and can be added to a distributed Lucene
environment, but the cost of implementing them vs. Serialization is much
higher.

Uber distributed search may not be the most common use case right now for
Lucene, but as it improves its capabilities people will try to use
Lucene in a distributed grid environment.  One could conceivably execute
arbitrarily complex coordinated operations over the standard Lucene 3.0 APIs
without tearing down processes and other worries. Oracle has PL/SQL and
Lucene effectively operates using Java for customized query operations like
PL/SQL.  It would seem natural to at least support Java as a way to execute
customized queries.  The customized queries would be dynamically loaded Java
objects.

In the marketplace Lucene seems to be a good place to do realtime search
based data processing.  At least compared to Sphinx and MG4J.

A little further into the future with SSDs, it should be possible to perform
in-place replacement of inverted index data using Lucene (at which point it is
similar to a database) and the ability to execute remote code may be very
useful.  Hopefully the APIs for 3.0 will have a goal of being open enough
for this.

On Fri, Dec 5, 2008 at 2:40 PM, Doug Cutting <cu...@apache.org> wrote:

> Jason Rutherglen wrote:
>
>> I think it's best to implement Externalizable as long as someone is
>> willing to maintain it.  I commit to maintaining the Externalizable code.
>>
>
> We need to agree to maintain things as a community, not as individuals.  We
> can't rely on any particular individual being around in the future.
>
>  This will ensure forward compatibility between serialized versions, make
>> the serialized objects smaller, and make serialization faster.
>>
>
> If we want to promise compatibility we need to scope it and test it.  We
> cannot in good faith promise that Query will be serially compatible forever,
> nor should we make any promises that we don't test.  So if you choose to
> continue promoting this route, please specify the scope of compatibility and
> your plans to add tests for it.
>
>  Apparently it matters enough for Hadoop to implement Writable in all
>> over-the-wire classes.
>>
>
> I'm not sure what you're saying here.  As I've said before, Hadoop is
> moving away from Writable because it is too fragile as classes change. As a
> part of the preparations for Hadoop 1.0 we are agreeing on serialization
> back-compatibility requirements and what technology we will use to support
> these.  Hadoop is at its core a distributed system, while Lucene is not.
>  Even then, Hadoop will continue to require that one update all nodes in a
> cluster in a coordinated manner, so only end-user protocols need be
> cross-version compatible, not internal protocols.  I do not yet see a strong
> analogy here.
>
>
> Doug
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

Jason Rutherglen wrote:
> I think it's best to implement Externalizable as long as someone is 
> willing to maintain it.  I commit to maintaining the Externalizable 
> code.

We need to agree to maintain things as a community, not as individuals. 
  We can't rely on any particular individual being around in the future.

> This will ensure forward compatibility between serialized versions, make 
> the serialized objects smaller, and make serialization faster. 

If we want to promise compatibility we need to scope it and test it.  We 
cannot in good faith promise that Query will be serially compatible 
forever, nor should we make any promises that we don't test.  So if you 
choose to continue promoting this route, please specify the scope of 
compatibility and your plans to add tests for it.

> Apparently it matters enough for Hadoop to implement Writable in all 
> over-the-wire classes.

I'm not sure what you're saying here.  As I've said before, Hadoop is 
moving away from Writable because it is too fragile as classes change. 
As a part of the preparations for Hadoop 1.0 we are agreeing on 
serialization back-compatibility requirements and what technology we 
will use to support these.  Hadoop is at its core a distributed system, 
while Lucene is not.  Even then, Hadoop will continue to require that 
one update all nodes in a cluster in a coordinated manner, so only 
end-user protocols need be cross-version compatible, not internal 
protocols.  I do not yet see a strong analogy here.

Doug


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Jason Rutherglen <ja...@gmail.com>.

I think it's best to implement Externalizable as long as someone is willing
to maintain it.  I commit to maintaining the Externalizable code.  The
programming overhead is no more than implementing the equals method in the
classes.  New classes outside the Lucene code base simply need to implement
Serializable to work.  External developers are not required to implement
Externalizable but may if they see fit.

This will ensure forward compatibility between serialized versions, make the
serialized objects smaller, and make serialization faster.
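
For illustration, a sketch of the kind of Externalizable implementation being proposed, under stated assumptions: RangeLike is a hypothetical class (not Lucene's RangeQuery), and the leading format byte is one common way to version the payload, not something taken from the attached patch. It shows how a reader can accept an older layout; it does not measure size or speed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableSketch {

    /** Hypothetical class used only for illustration (assumption). */
    public static class RangeLike implements Externalizable {
        private static final long serialVersionUID = 1L;
        static final byte FORMAT_V1 = 1; // field + single "inclusive" flag
        static final byte FORMAT_V2 = 2; // field + includeLower + includeUpper

        String field;
        boolean includeLower;
        boolean includeUpper;

        /** Externalizable requires a public no-arg constructor. */
        public RangeLike() {}

        RangeLike(String field, boolean includeLower, boolean includeUpper) {
            this.field = field;
            this.includeLower = includeLower;
            this.includeUpper = includeUpper;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(FORMAT_V2); // always write the newest layout
            out.writeUTF(field);
            out.writeBoolean(includeLower);
            out.writeBoolean(includeUpper);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte format = in.readByte();
            field = in.readUTF();
            if (format == FORMAT_V1) {
                // Old layout: one flag covers both ends of the range.
                includeLower = includeUpper = in.readBoolean();
            } else if (format == FORMAT_V2) {
                includeLower = in.readBoolean();
                includeUpper = in.readBoolean();
            } else {
                throw new InvalidObjectException("unknown format " + format);
            }
        }
    }

    /** Writes and re-reads an object through standard object streams. */
    static RangeLike roundTrip(RangeLike q) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(q);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (RangeLike) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a payload in the old V1 layout and decodes it with the current class. */
    static RangeLike readLegacy(String field, boolean inclusive) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeByte(RangeLike.FORMAT_V1);
                oos.writeUTF(field);
                oos.writeBoolean(inclusive);
            }
            RangeLike q = new RangeLike();
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                q.readExternal(ois);
            }
            return q;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every layout ever written has to stay readable in readExternal, so the per-class cost grows with each format change.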

Apparently it matters enough for Hadoop to implement Writable in all
over-the-wire classes.

On Fri, Dec 5, 2008 at 1:47 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> OK works for me too.
>
> John or Jason, can you update the patch on LUCENE-1473?  We no longer need
> to implement Externalizable (just add fixed SUIDs), but we do need to update
> the javadocs for all classes implementing Serializable to state that
> cross-version compatibility is not guaranteed.
>
> Mike
>
>
> John Wang wrote:
>
>  Works for me.
>>
>> Thanks
>>
>> -John
>>
>> On Fri, Dec 5, 2008 at 1:23 PM, Doug Cutting <cu...@apache.org> wrote:
>> John Wang wrote:
>>      This has gone back and forth on this thread already. Again, I
>> agree it is not the perfect solution. I am comparing that to the current
>> behavior, I don't think it is worse. (Only in my opinion).
>>
>> So, if it's good enough for you, a user of java serialization, then
>> perhaps those of us who don't use java serialization shouldn't complain.  I
>> think we'd want to add to the documentation something to the effect that
>> this is all that's been done, and that if the classes change substantially
>> then all bets are off.  We do not want to imply that we're making any
>> cross-version compatibility guarantees about serialization, rather just that
>> folks who're willing to take their chances will not be impeded.  Could
>> something like that work?
>>
>> Doug
>>
>>
>>
>>
>>
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK works for me too.

John or Jason, can you update the patch on LUCENE-1473?  We no longer  
need to implement Externalizable (just add fixed SUIDs), but we do  
need to update the javadocs for all classes implementing Serializable  
to state that cross-version compatibility is not guaranteed.

Mike

John Wang wrote:

> Works for me.
>
> Thanks
>
> -John
>
> On Fri, Dec 5, 2008 at 1:23 PM, Doug Cutting <cu...@apache.org>  
> wrote:
> John Wang wrote:
>       This has gone back and forth on this thread already.  
> Again, I agree it is not the perfect solution. I am comparing that  
> to the current behavior, I don't think it is worse. (Only in my  
> opinion).
>
> So, if it's good enough for you, a user of java serialization, then  
> perhaps those of us who don't use java serialization shouldn't  
> complain.  I think we'd want to add to the documentation something  
> to the effect that this is all that's been done, and that if the  
> classes change substantially then all bets are off.  We do not want  
> to imply that we're making any cross-version compatibility  
> guarantees about serialization, rather just that folks who're  
> willing to take their chances will not be impeded.  Could something  
> like that work?
>
> Doug
>
>
>
>



Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Works for me.

Thanks

-John

On Fri, Dec 5, 2008 at 1:23 PM, Doug Cutting <cu...@apache.org> wrote:

> John Wang wrote:
>
>>       This has gone back and forth on this thread already. Again, I
>> agree it is not the perfect solution. I am comparing that to the current
>> behavior, I don't think it is worse. (Only in my opinion).
>>
>
> So, if it's good enough for you, a user of java serialization, then perhaps
> those of us who don't use java serialization shouldn't complain.  I think
> we'd want to add to the documentation something to the effect that this is
> all that's been done, and that if the classes change substantially then all
> bets are off.  We do not want to imply that we're making any cross-version
> compatibility guarantees about serialization, rather just that folks who're
> willing to take their chances will not be impeded.  Could something like
> that work?
>
> Doug
>
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

John Wang wrote:
>        This has gone back and forth on this thread already. Again, 
> I agree it is not the perfect solution. I am comparing that to the 
> current behavior, I don't think it is worse. (Only in my opinion).

So, if it's good enough for you, a user of java serialization, then 
perhaps those of us who don't use java serialization shouldn't complain. 
  I think we'd want to add to the documentation something to the effect 
that this is all that's been done, and that if the classes change 
substantially then all bets are off.  We do not want to imply that we're 
making any cross-version compatibility guarantees about serialization, 
rather just that folks who're willing to take their chances will not be 
impeded.  Could something like that work?

Doug


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Mike:
       This has gone back and forth on this thread already. Again, I
agree it is not the perfect solution. I am comparing that to the current
behavior, I don't think it is worse. (Only in my opinion).

       "live serialization" is not familiar to me. To understand it more,
can you point me to somewhere the J2EE spec defines it? AFAIK, the J2EE spec
does not make a distinction, and from what I gather from this thread, Lucene
does not fall into the special category on how Serializable is used. Of
course, it could just be my lack of understanding in the spec.

       We are happy to accept whatever you guys think on this issue. As it
is currently, it is not consistent amongst different committers.

Thanks

-John

On Fri, Dec 5, 2008 at 12:07 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

>
> John Wang wrote:
>
>  My proposal is to add the suid to Serializable classes
>>
>
> That's too brittle.
>
> If we do that, then what happens when we need to add a field to the
> class (eg, in 2.9 we've replaced "inclusive" in RangeQuery with
> "includeLower" and "includeUpper")?  The standard answer is you bump
> the suid, but, then that breaks back compatibility.
>
> Since we would still sometimes, unpredictably, break back
> compatibility, no app could rely on it.  You can't have a "mostly
> back compatible" promise.
>
> So... we have to either 1) only support "live serialization" and
> update the javadocs saying so, or 2) support full back compat of
> serialized classes and spell out the actual policy, make thorough
> tests for it, etc.
>
> Mike
>
>
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Michael McCandless <lu...@mikemccandless.com>.

John Wang wrote:

> My proposal is to add the suid to Serializable classes

That's too brittle.

If we do that, then what happens when we need to add a field to the
class (eg, in 2.9 we've replaced "inclusive" in RangeQuery with
"includeLower" and "includeUpper")?  The standard answer is you bump
the suid, but, then that breaks back compatibility.

Since we would still sometimes, unpredictably, break back
compatibility, no app could rely on it.  You can't have a "mostly
back compatible" promise.

So... we have to either 1) only support "live serialization" and
update the javadocs saying so, or 2) support full back compat of
serialized classes and spell out the actual policy, make thorough
tests for it, etc.
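
To illustrate the point about bumping the suid, a small sketch under stated assumptions: RangeLike is a hypothetical class, and instead of compiling two versions of it the example flips one bit of the serialVersionUID recorded in the stream, which simulates bytes written by a version whose suid differs. The offsets assume the standard stream layout for a single top-level object (4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name length, class name, 8-byte suid).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SuidBumpSketch {

    /** Hypothetical class with a pinned suid (assumption, not Lucene code). */
    static class RangeLike implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean inclusive = true;
    }

    /** Serializes an object to a byte array using standard Java serialization. */
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a copy of the stream whose recorded suid differs from the local class. */
    static byte[] withDifferentSuid(byte[] stream) {
        byte[] copy = stream.clone();
        // Bytes 6-7 hold the class-name length; the 8-byte suid follows the name.
        int nameLength = ((copy[6] & 0xFF) << 8) | (copy[7] & 0xFF);
        int suidOffset = 8 + nameLength;
        copy[suidOffset + 7] ^= 0x01; // flip the lowest bit of the suid
        return copy;
    }

    /** True if the stream deserializes; false if the class is rejected as incompatible. */
    static boolean readable(byte[] stream) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(stream))) {
            ois.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // suid in the stream does not match the local class
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] current = serialize(new RangeLike());
        System.out.println("same suid readable:      " + readable(current));
        System.out.println("different suid readable: " + readable(withDifferentSuid(current)));
    }
}
```

A pinned suid keeps old streams readable only for as long as nobody has to change it.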

Mike



Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

John Wang wrote:
> Also, if you find us addressing this issue being a hassle, e.g. 
> addressing serialization in lucene is an incorrect thing to do, feel 
> free to let us know and we can close the bug and terminate the thread.

I don't know whether cross-version serialization belongs in Lucene.  We 
need to discuss it, to find out how many users might want it, how many 
developers might fear it, how reasonable their fears are, etc.

The discussion so far has not been an easy one.  There have been many 
claims made which have little to do with the technical issue.  As a 
project, we must reach consensus before we can do anything.  Polarized 
comments do not help build consensus.

Doug


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Doug:
1)  "Incrementally upgrading distributed systems has, at least in the past,
been outside the scope of Lucene" - That's good to know. Is it also out of
the scope for distributed lucene effort (if it is still happening)?

2) I used the word broken to describe what happened for our deployment. I
will try to use less harsh words when addressing lucene in the future.

3) " If you think that's trivial, then please pursue it and show us how
trivial it is." - My proposal is to add the suid to Serializable classes, if
you don't think that's trivial, many IDEs do that for you. I think your
main concern is that this is not the perfect solution to this problem, but
it does provide better behavior than what it is now IMO. I understand we
have discussed earlier in the thread there are cases where adding suid does
not work. Given many of these classes are rather static, I don't share your
concern.

4) "You developed based on some very optimistic guesses about some unstated
aspects" - this is developed based on our understanding of Serializable
without Lucene documentation discouraging us doing so. We also interpreted
the fact RemoteSearcher being part of the package is an example of a valid
use-case. The JOSS protocol is designed to handle versioning (although not
perfectly). We didn't think that was risky; obviously in hindsight it is. But
I do find it hard to believe it is something the author of these classes had
in mind when Serializable interface was implemented.

This is getting into a philosophical discussion on Java Serialization, and
how it pertains to lucene. I don't see any resolution in the near future.
Moving forward, we'd be happy to provide patches given the agreed solution.
There is no reason to provide code patches if it is decided only
documentation needs to change. (from what you have outlined, I interpret it
being only documentation changes)

Also, if you find us addressing this issue being a hassle, e.g. addressing
serialization in lucene is an incorrect thing to do, feel free to let us
know and we can close the bug and terminate the thread.

Thanks

-John

On Fri, Dec 5, 2008 at 9:18 AM, Doug Cutting <cu...@apache.org> wrote:

> John Wang wrote:
>
>> Thus we are enforcing users that care about Serialization to use the
>> release jar.
>>
>
> We already encourage folks to use a release jar if possible.  So this is
> not a big change.  Also, if folks choose to build their own jar, then they
> are expected to use that same jar everywhere, effectively making their own
> release.  That doesn't seem unreasonable to me.  Incrementally upgrading
> distributed systems has, at least in the past, been outside the scope of
> Lucene.
>
>  3) Clean up the serialization story, either add SUID or implement
>> Externalizable for some classes within Lucene that implements Serializable:
>>
>> From what I am told, this is too much work for the committers.
>>
>
> Not that it's too much work today, but that it adds an ongoing burden and
> we should take this on cautiously if at all.  If we want to go this way we'd
> need to:
>
> - document precisely which classes we'll evolve back-compatibly;
> - document the releases (major? minor?) that will be compatible; and
> - provide a test suite that validates this.
>
> As a side note, we should probably move the back-compatibility
> documentation from the wiki to the project website.  This would permit
> patches to it, among other things.
>
> http://wiki.apache.org/lucene-java/BackwardsCompatibility
>
>  I hope you guys at least agree with me with the way it is currently, the
>> serialization story is broken, whether in documentation or in code.
>>
>
> Documenting an unstated assumption is a good thing to do, especially when
> not everyone seems to share the assumption, but "broken" seems a bit strong
> here.
>
>  I see the disagreement being its severity, and whether it is a trivial
>> fix, which I have learned it is not really my place to say.
>>
>
> I've outlined above what I think would be required.  If you think that's
> trivial, then please pursue it and show us how trivial it is.  The patch
> provided thus far is incomplete.
>
>  Please do understand this is not a far-fetched, made-up use-case, we are
>> running into this in production, and we are developing in accordance to
>> lucene documentation.
>>
>
> You developed based on some very optimistic guesses about some unstated
> aspects.  In Java, implementing Serializable alone does not generally
> provide any cross-version guarantees.  Assuming that it did was risky.
>
> Doug
>
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

John Wang wrote:
> Thus we are enforcing users 
> that care about Serialization to use the release jar.

We already encourage folks to use a release jar if possible.  So this is 
not a big change.  Also, if folks choose to build their own jar, then 
they are expected to use that same jar everywhere, effectively making 
their own release.  That doesn't seem unreasonable to me.  Incrementally 
upgrading distributed systems has, at least in the past, been outside 
the scope of Lucene.

> 3) Clean up the serialization story, either add SUID or implement 
> Externalizable for some classes within Lucene that implements Serializable:
> 
> From what I am told, this is too much work for the committers.

Not that it's too much work today, but that it adds an ongoing burden 
and we should take this on cautiously if at all.  If we want to go this 
way we'd need to:

- document precisely which classes we'll evolve back-compatibly;
- document the releases (major? minor?) that will be compatible; and
- provide a test suite that validates this.

As a side note, we should probably move the back-compatibility 
documentation from the wiki to the project website.  This would permit 
patches to it, among other things.

http://wiki.apache.org/lucene-java/BackwardsCompatibility

> I hope you guys at least agree with me with the way it is currently, the 
> serialization story is broken, whether in documentation or in code.

Documenting an unstated assumption is a good thing to do, especially 
when not everyone seems to share the assumption, but "broken" seems a 
bit strong here.

> I see the disagreement being its severity, and whether it is a trivial 
> fix, which I have learned it is not really my place to say.

I've outlined above what I think would be required.  If you think that's 
trivial, then please pursue it and show us how trivial it is.  The patch 
provided thus far is incomplete.

> Please do understand this is not a far-fetched, made-up use-case, we are 
> running into this in production, and we are developing in accordance to 
> lucene documentation.

You developed based on some very optimistic guesses about some unstated 
aspects.  In Java, implementing Serializable alone does not generally 
provide any cross-version guarantees.  Assuming that it did was risky.

Doug


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Hi Grant:
     I agree and I apologize for hijacking this thread. If Luceners feel our
criticisms are invalid, then so be it.

     We should focus on this issue, being the serialization story in Lucene.
Not general java serialization, so I don't see how it would benefit to move
this to the java dev list.

      As far as lucene serialization, incorporating comments from various
people, this is what I gather are the choices (feel free to correct me)

1) Remove implementation and support of Serializable: We all agreed this is
bad and breaks backward compatibility.

2) Do nothing to the code base and fix documentation, and clarify Lucene
only supports Serialization between components with the release jar. This
seems to be the suggested approach where I have a coupla concerns:

a) Since given the exact code base, due to the nature of java serialization,
different builds of the jar via IBM VM vs. Sun VM vs. JRockit etc. cannot
guarantee compatibility. Thus we are enforcing users that care about
Serialization to use the release jar.

b) There is at least one place, as I have previously mentioned, e.g.
ScoreDocComparator, the contract returns a Comparable and via javadoc, must
be serializable. How should this be treated? This can be an application
object, should we pass on the same enforcement there when merge/sort is
happening across the wire since similar serialization problem would break
inside MultiSearcher?

3) Clean up the serialization story, either add SUID or implement
Externalizable for some classes within Lucene that implements Serializable:

From what I am told, this is too much work for the committers.
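
Regarding point 2a, a short sketch of how one might inspect the suid a given build actually uses. Pinned and Unpinned are hypothetical classes. ObjectStreamClass.lookup returns the declared value when a class pins one and a computed hash otherwise; the Serializable javadoc warns that the computed value is sensitive to class details that may vary between compiler implementations, so its value is deliberately not shown here.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class DefaultSuidSketch {

    /** Hypothetical class that pins its suid explicitly. */
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    /** Hypothetical class that relies on the computed default suid. */
    static class Unpinned implements Serializable {
        int value;
    }

    /** Returns the suid the serialization runtime will use for the given class. */
    static long suidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + suidOf(Pinned.class));
        // Computed from the class's structure; may differ across compilers.
        System.out.println("unpinned: " + suidOf(Unpinned.class));
    }
}
```

Running this against jars built with different toolchains would show whether the unpinned classes actually diverge in a given deployment.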

I hope you guys at least agree with me with the way it is currently, the
serialization story is broken, whether in documentation or in code. I see
the disagreement being its severity, and whether it is a trivial fix, which
I have learned it is not really my place to say.

Please do understand this is not a far-fetched, made-up use-case, we are
running into this in production, and we are developing in accordance to
lucene documentation.

Thanks

-John

On Thu, Dec 4, 2008 at 3:23 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Dec 4, 2008, at 2:21 PM, Jason Rutherglen wrote:
>
>  To put things in perspective, I believe Microsoft (who could potentially
>> place a lot of resources towards Lucene) now uses Lucene through Powerset?
>> and I don't think those folks are contributing back.  I know of several
>> other companies who do the same, and many potential contributions that are
>> not submitted because people and their companies do not see the benefit of
>> going through the hoops required to get patches committed.  A relatively
>> simple patch such as 1473 Serialization represents this well.
>>
>
> What do you suggest?  We didn't force anyone to use Lucene.  Heck, most of
> our users don't even ever participate on the mailing list.
>
> We do provide a very clear, transparent path for making contributions and
> becoming a committer.  I don't know what else we can do, but we're totally
> open to suggestions on how to improve it.
>
> FWIW, just b/c you think 1473 is trivial doesn't make it so.  You have a
> single use case and that's all you care about.  The community has dozens, if
> not hundreds of use cases, and your "trivial" patch may not be so trivial in
> that regards.  How would you feel if we "broke" something that you have
> relied on for years in the name of us moving faster?  I am willing to bet
> the large number of people here in Lucene appreciate our deliberations for
> the most part.  As for my opinion on 1473, I personally think there are
> better ways of achieving what you are trying to do, as Robert and others
> have suggested and I don't think it is worth it to maintain serialization
> across versions as it is too large a burden, IMO.  But, heh, make an
> argument (preferably w/o the accusations) and convince me otherwise.
>
>
>>
>> For example if a company is developing custom search algorithms, Lucene
>> supports TF/IDF but not much else.  Custom search algorithms require
>> rewriting lots of Lucene code.  Companies who write new search algorithms do
>> not necessarily want to rewrite Lucene as well to make it pluggable for new
>> scoring as it is out of scope, they will simply branch the code.  It does
>> not help that the core APIs underneath IndexReader are protected and package
>> protected which assumes a user that is not advanced.  It is repeated in the
>> mailing lists that new features will threaten the existing user base which
>> is based on opinion rather than fact.  More advanced users are currently
>> hindered by the conservatism of the project and so naturally have stopped
>> trying to submit changes that alter the core non-public code.
>>
>
> So, you're mad at us for others not contributing back their forks?  Even the
> ones we don't know about?  Simply put, I'm sorry we can't please you.  If
> you go read the archives, you will see plenty of times when even us
> committers have been frustrated from time to time by the process (just look
> at the JDK 1.5 debate, or the Interface/Abstract debate) but in the end, I
> feel Lucene is stronger for it.  Community over code, it's the Apache Way.
>  You are free to disagree.  In fact, you have several options available to
> you to show that disagreement:  1. You can work to become a committer and
> change it from within.  The bar really isn't that high, 3 to 4 non-trivial
> patches and a willingness to work with others in a mostly pleasant way.  2.
>  You can make us aware of the patches and be persistent about seeing it
> through and we'll try to get to it.  Just look at CHANGES.txt and JIRA and
> you will see that this happens all the time and from a wide variety of
> contributors (including both you and John).  3.  You can fork the code and
> go do your thing and build your own community, etc.
>
> Personally, I hope you choose 1 or 2, as we're all stronger together than
> we are apart.
>
>
>>
>> The rancor is from users who would benefit from a faster pace and the ability
>> to be more creative inside the core Lucene system.  As the internals change
>> frequently and unannounced, the process of developing core patches is
>> difficult and frustrating.
>>
>
> I'm sorry that we can't work at a faster pace.  Suggestions on how to deal
> with the number of patches we have and still maintain quality and how to
> move forward w/o breaking old patches are much appreciated.
>
> As for the internals changing, you have just hit the nail on the head as to
> why it is so important to maintain back-compat.
>
> I simply don't get the unannounced part.  What isn't announced?  Geez, I've
> been a committer for a few years now, and I have yet to see another open
> source project that is as public as Lucene, for better or worse.  Look at
> the archives, we regularly even put our warts out for public consumption in
> an effort to improve ourselves.
>
> Rather than continue hijacking this thread, why don't we either let it die
> and focus on serialization, or we go over to java-dev and you and John and
> the rest of us can create a concrete list of suggestions that we think could
> make Lucene better and we can all discuss them in a positive manner and see
> how we can go about addressing them.  I'd be more than happy to discuss
> there if you want.
>
> Cheers,
>
> Grant
>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 4, 2008, at 2:21 PM, Jason Rutherglen wrote:

> To put things in perspective, I believe Microsoft (who could  
> potentially place a lot of resources towards Lucene) now uses Lucene  
> through Powerset? and I don't think those folks are contributing  
> back.  I know of several other companies who do the same, and many  
> potential contributions that are not submitted because people and  
> their companies do not see the benefit of going through the hoops  
> required to get patches committed.  A relatively simple patch such  
> as 1473 Serialization represents this well.

What do you suggest?  We didn't force anyone to use Lucene.  Heck,  
most of our users don't even ever participate on the mailing list.

We do provide a very clear, transparent path for making contributions  
and becoming a committer.  I don't know what else we can do, but we're  
totally open to suggestions on how to improve it.

FWIW, just b/c you think 1473 is trivial doesn't make it so.  You have  
a single use case and that's all you care about.  The community has  
dozens, if not hundreds of use cases, and your "trivial" patch may not  
be so trivial in that regard.  How would you feel if we "broke"  
something that you have relied on for years in the name of us moving  
faster?  I am willing to bet the large number of people here in Lucene  
appreciate our deliberations for the most part.  As for my opinion on  
1473, I personally think there are better ways of achieving what you  
are trying to do, as Robert and others have suggested and I don't  
think it is worth it to maintain serialization across versions as it  
is too large a burden, IMO.  But, heh, make an argument  
(preferably w/o the accusations) and convince me otherwise.

>
>
> For example if a company is developing custom search algorithms,  
> Lucene supports TF/IDF but not much else.  Custom search algorithms  
> require rewriting lots of Lucene code.  Companies who write new  
> search algorithms do not necessarily want to rewrite Lucene as well  
> to make it pluggable for new scoring as it is out of scope, they  
> will simply branch the code.  It does not help that the core APIs  
> underneath IndexReader are protected and package protected which  
> assumes a user that is not advanced.  It is repeated in the mailing  
> lists that new features will threaten the existing user base which  
> is based on opinion rather than fact.  More advanced users are  
> currently hindered by the conservatism of the project and so  
> naturally have stopped trying to submit changes that alter the core  
> non-public code.

So, you're mad at us for others not contributing back their forks?  Even  
the ones we don't know about?  Simply put, I'm sorry we can't please  
you.  If you go read the archives, you will see plenty of times when  
even us committers have been frustrated from time to time by the  
process (just look at the JDK 1.5 debate, or the Interface/Abstract  
debate) but in the end, I feel Lucene is stronger for it.  Community  
over code, it's the Apache Way.  You are free to disagree.  In fact,  
you have several options available to you to show that disagreement:   
1. You can work to become a committer and change it from within.  The  
bar really isn't that high, 3 to 4 non-trivial patches and a  
willingness to work with others in a mostly pleasant way.  2.  You can  
make us aware of the patches and be persistent about seeing it through  
and we'll try to get to it.  Just look at CHANGES.txt and JIRA and you  
will see that this happens all the time and from a wide variety of  
contributors (including both you and John).  3.  You can fork the code  
and go do your thing and build your own community, etc.

Personally, I hope you choose 1 or 2, as we're all stronger together  
than we are apart.

>
>
> The rancor is from users who would benefit from a faster pace and the  
> ability to be more creative inside the core Lucene system.  As the  
> internals change frequently and unannounced, the process of  
> developing core patches is difficult and frustrating.

I'm sorry that we can't work at a faster pace.  Suggestions on how to  
deal with the number of patches we have and still maintain quality and  
how to move forward w/o breaking old patches are much appreciated.

As for the internals changing, you have just hit the nail on the head  
as to why it is so important to maintain back-compat.

I simply don't get the unannounced part.  What isn't announced?  Geez,  
I've been a committer for a few years now, and I have yet to see  
another open source project that is as public as Lucene, for better or  
worse.  Look at the archives, we regularly even put our warts out for  
public consumption in an effort to improve ourselves.

Rather than continue hijacking this thread, why don't we either let it  
die and focus on serialization, or we go over to java-dev and you and  
John and the rest of us can create a concrete list of suggestions that  
we think could make Lucene better and we can all discuss them in a  
positive manner and see how we can go about addressing them.  I'd be  
more than happy to discuss there if you want.

Cheers,
Grant

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Jason Rutherglen <ja...@gmail.com>.

Correction: Powerset apparently did not use Lucene.  And apparently there
are a few other companies that, while not open sourcing, use Lucene
serialization regularly.

> Did you pay Michael?  No one here is compelled to work with anyone else.
> We work with others when we feel it is in our mutual self interest.

Nice... I guess our government is the macrocosm.

On Thu, Dec 4, 2008 at 11:21 AM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> To put things in perspective, I believe Microsoft (who could potentially
> place a lot of resources towards Lucene) now uses Lucene through Powerset?
> and I don't think those folks are contributing back.  I know of several
> other companies who do the same, and many potential contributions that are
> not submitted because people and their companies do not see the benefit of
> going through the hoops required to get patches committed.  A relatively
> simple patch such as 1473 Serialization represents this well.
>
> For example if a company is developing custom search algorithms, Lucene
> supports TF/IDF but not much else.  Custom search algorithms require
> rewriting lots of Lucene code.  Companies who write new search algorithms do
> not necessarily want to rewrite Lucene as well to make it pluggable for new
> scoring as it is out of scope, they will simply branch the code.  It does
> not help that the core APIs underneath IndexReader are protected and package
> protected which assumes a user that is not advanced.  It is repeated in the
> mailing lists that new features will threaten the existing user base which
> is based on opinion rather than fact.  More advanced users are currently
> hindered by the conservatism of the project and so naturally have stopped
> trying to submit changes that alter the core non-public code.
>
> The rancor is from users who would benefit from a faster pace and the ability
> to be more creative inside the core Lucene system.  As the internals change
> frequently and unannounced, the process of developing core patches is
> difficult and frustrating.
>
> Now that Lucene is stable and flexible indexing is being implemented, it
> would benefit the community to focus on the future.  Who exactly is
> responsible for this?  Which of the committers are building for the future?
> Which are doing bug fixes?  What is the process of developing more advanced
> features in open source?  Right now it seems to be one person, Michael
> McCandless developing all of the new core code.  This is great forward
> progress, however it's unclear how others can get involved and not get
> stampeded by the constant changes that all happen via one brilliant person.
>
>
> I have requested of people such as Michael Busch to collaborate on the
> column stride fields and received no response.
>
> To me, a good example of volunteers are people who prepare food and donate
> their time at soup kitchens with no pay, and no hope for pay related to
> feeding the hungry.
>
> -J
>
>
> On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>>
>> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>>
>>
>>>
>>> Hoss wrote: "sort of mythical "Lucene powerhouse"
>>> Lucene seems to run itself quite differently than other open source Java
>>> projects.  Perhaps it would be good to spell out the reasons for the
>>> reluctance to move ahead with features that developers work on, that work,
>>> but do not go in.  The developer contributions seem to be quite low right
>>> now, especially compared to neighbor projects such as Hadoop.  Is this
>>> because fewer people are using Lucene?  Or is it due to the reluctance to
>>> work with the developer community?  Unfortunately the perception in the eyes
>>> of some people who work on search related projects is that it is the latter.
>>>
>>
>>
>> Or, could it be that Hadoop is relatively new and in vogue at the moment,
>> very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates
>> lots of resources to it on a full time basis, whilst Lucene has been around
>> in the ASF for 7+ years (and 12+ years total) and has a really large install
>> base and thus must move more deliberately and basically has 1 person who
>> gets to work on it full time while the rest of us pretty much volunteer?
>>  That's not an excuse, it's just the way it is.  I personally, would love to
>> work on Lucene all day every day as I have a lot of things I'd love to
>> engage the community on, but the fact is I'm not paid to do that, so I give
>> what I can when I can.  I know most of the other committers are that way
>> too.
>>
>> Thus, I don't think any one of us has a reluctance to move ahead with
>> features or bug fixes.   Looking at CHANGES.txt, I see a lot of
>> contributors.  Looking at java-dev and JIRA, I see lots of engagement with
>> the community.  Is it near the historical high for traffic, no it's not, but
>> that isn't necessarily a bad thing.  I think it's a sign that Lucene is
>> pretty stable.
>>
>> What we do have a reluctance for are patches that don't have tests (i.e.
>> this one), patches that massively change Lucene APIs in non-trivial ways or
>> break back compatibility or are not kept up to date.  Are we perfect?  Of
>> course not.  I, personally, would love for there to be a way that helps us
>> process a larger volume of patches (note, I didn't say commit a larger
>> volume).  Hadoop's automated patch tester would be a huge start in that, but
>> at the end of the day, Lucene still works the way all ASF projects do: via
>> meritocracy and volunteerism.     You want stuff committed, keep it up to
>> date, make it manageable to review, document it, respond to
>> questions/concerns with answers as best you can.  To that end, a real simple
>> question can go a long way toward getting something committed, and it simply
>> is:  "Hey Lucener's,  what else can I do to help you review and commit
>> LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on
>> the receiving end as a committer.
>>
>> -Grant
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

Jason Rutherglen wrote:
> A relatively simple patch such as 1473 Serialization 
> represents this well. 

LUCENE-1473 is an incomplete patch that proposes to commit the project 
to new back-compatibility requirements.  Compatibility requirements 
should not be added lightly, but only deliberately, as they have a 
long-term impact on the ability of the project to evolve.  Prior to this 
we've not heard from folks who require cross-version Java serialization 
compatibility.  Without more folks asserting this as a need it is hard 
to rationalize adding this.

> As the 
> internals change frequently and unannounced, the process of developing 
> core patches is difficult and frustrating.

The process is entirely in public.  You have as much announcement as 
anyone.  Patches are weighed on their merits as they are contributed.

> It would benefit the community to focus on the future.  Who exactly is 
> responsible for this?  Which of the committers are building for the 
> future?  Which are doing bug fixes?  What is the process of developing 
> more advanced features in open source?

I've already explained the process several times.

We cannot easily make a long-term plan when we do not have the power to 
assign folks.  We can state long-term goals, like flexible indexing, but 
in the end, it won't get done until someone volunteers to write the 
code.  So you're welcome to start a wish list on the wiki, and you're 
welcome to then start contributing patches that implement items on your 
wish list.  If you propose something that folks think is extremely 
useful, but requires an incompatible change, then it could perhaps be 
done in a branch.  But most of the existing community is interested in 
pushing forward incrementally, trying hard to keep most things 
back-compatible.  If that's too frustrating for you, you can fork Lucene 
and build a new community.

> Right now it seems to be one 
> person, Michael McCandless developing all of the new core code.

Mike does a lot of development, but he also commits a lot of patches 
written by others.

> This is 
> great forward progress, however it's unclear how others can get involved 
> and not get stampeded by the constant changes that all happen via one 
> brilliant person. 

You want Mike to do less?  Others can and do get involved all the time. 
  Look at http://tinyurl.com/5nl78n.  The majority of the things Mike 
works on are instigated by others.

> I have requested of people such as Michael Busch to collaborate on the 
> column stride fields and received no response. 

Did you pay Michael?  No one here is compelled to work with anyone else. 
   We work with others when we feel it is in our mutual self interest.

Doug

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Jason Rutherglen <ja...@gmail.com>.

To put things in perspective, I believe Microsoft (who could potentially
place a lot of resources towards Lucene) now uses Lucene through Powerset?
and I don't think those folks are contributing back.  I know of several
other companies who do the same, and many potential contributions that are
not submitted because people and their companies do not see the benefit of
going through the hoops required to get patches committed.  A relatively
simple patch such as 1473 Serialization represents this well.

For example if a company is developing custom search algorithms, Lucene
supports TF/IDF but not much else.  Custom search algorithms require
rewriting lots of Lucene code.  Companies who write new search algorithms do
not necessarily want to rewrite Lucene as well to make it pluggable for new
scoring as it is out of scope, they will simply branch the code.  It does
not help that the core APIs underneath IndexReader are protected and package
protected which assumes a user that is not advanced.  It is repeated in the
mailing lists that new features will threaten the existing user base which
is based on opinion rather than fact.  More advanced users are currently
hindered by the conservatism of the project and so naturally have stopped
trying to submit changes that alter the core non-public code.

The rancor is from users who would benefit from a faster pace and the ability to
be more creative inside the core Lucene system.  As the internals change
frequently and unannounced, the process of developing core patches is
difficult and frustrating.

Now that Lucene is stable and flexible indexing is being implemented, it
would benefit the community to focus on the future.  Who exactly is
responsible for this?  Which of the committers are building for the future?
Which are doing bug fixes?  What is the process of developing more advanced
features in open source?  Right now it seems to be one person, Michael
McCandless developing all of the new core code.  This is great forward
progress, however it's unclear how others can get involved and not get
stampeded by the constant changes that all happen via one brilliant person.

I have requested of people such as Michael Busch to collaborate on the
column stride fields and received no response.

To me, a good example of volunteers are people who prepare food and donate
their time at soup kitchens with no pay, and no hope for pay related to
feeding the hungry.

-J

On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>
>
>>
>> Hoss wrote: "sort of mythical "Lucene powerhouse"
>> Lucene seems to run itself quite differently than other open source Java
>> projects.  Perhaps it would be good to spell out the reasons for the
>> reluctance to move ahead with features that developers work on, that work,
>> but do not go in.  The developer contributions seem to be quite low right
>> now, especially compared to neighbor projects such as Hadoop.  Is this
>> because fewer people are using Lucene?  Or is it due to the reluctance to
>> work with the developer community?  Unfortunately the perception in the eyes
>> of some people who work on search related projects is that it is the latter.
>>
>
>
> Or, could it be that Hadoop is relatively new and in vogue at the moment,
> very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates
> lots of resources to it on a full time basis, whilst Lucene has been around
> in the ASF for 7+ years (and 12+ years total) and has a really large install
> base and thus must move more deliberately and basically has 1 person who
> gets to work on it full time while the rest of us pretty much volunteer?
>  That's not an excuse, it's just the way it is.  I personally, would love to
> work on Lucene all day every day as I have a lot of things I'd love to
> engage the community on, but the fact is I'm not paid to do that, so I give
> what I can when I can.  I know most of the other committers are that way
> too.
>
> Thus, I don't think any one of us has a reluctance to move ahead with
> features or bug fixes.   Looking at CHANGES.txt, I see a lot of
> contributors.  Looking at java-dev and JIRA, I see lots of engagement with
> the community.  Is it near the historical high for traffic, no it's not, but
> that isn't necessarily a bad thing.  I think it's a sign that Lucene is
> pretty stable.
>
> What we do have a reluctance for are patches that don't have tests (i.e.
> this one), patches that massively change Lucene APIs in non-trivial ways or
> break back compatibility or are not kept up to date.  Are we perfect?  Of
> course not.  I, personally, would love for there to be a way that helps us
> process a larger volume of patches (note, I didn't say commit a larger
> volume).  Hadoop's automated patch tester would be a huge start in that, but
> at the end of the day, Lucene still works the way all ASF projects do: via
> meritocracy and volunteerism.     You want stuff committed, keep it up to
> date, make it manageable to review, document it, respond to
> questions/concerns with answers as best you can.  To that end, a real simple
> question can go a long way toward getting something committed, and it simply
> is:  "Hey Lucener's,  what else can I do to help you review and commit
> LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on
> the receiving end as a committer.
>
> -Grant

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

John Wang wrote:
> I agree with the process itself, what would make it better is 
> some transparency on how patches/issues are evaluated to be committed. 

To be clear: there is no forum for communication about patches except 
this list, and, by extension, Jira.  The process of patch evaluation is 
completely transparent.

> At least seemed from the outside, it is purely being decided on by the 
> committers, and since my understanding is that an open source project 
> belongs to the public, the public user base should have some say.

It is not a democracy, it is a meritocracy.

http://www.apache.org/foundation/how-it-works.html#meritocracy

I'll repeat: committers are added when they've both contributed a series 
of high-quality, easy-to-commit patches, and when they've demonstrated 
that they are easy to work with.  That process has resulted in the 
current set of committers, and those committers determine which patches 
are committed and when.  Those are the rules.

However committers cannot ram just any patch through.  Committers are 
only added after they've demonstrated the ability to build consensus 
around their patches.  And they must continue to build consensus around 
their patches even after they are committers.  Patches that receive no 
endorsement from others are not committed, no matter who contributes 
them.  A contribution is not more rapidly committed simply because the 
contributor is a committer.  Rather, committers know how to elicit and 
respond to criticism and build consensus around their patches in order to 
get them committed rapidly.

Doug

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by robert engels <re...@ix.netcom.com>.

My two cents...

I think the committers do a great job of managing the product.  I  
feel the single biggest failure when it comes to producing quality  
software is lack of vision, and/or enforcement of this vision.

If every "wisher" or "submitter" had their code committed - even if  
it is "good code" - the product would quickly become unwieldy to  
maintain and/or learn (for new users), lessening its usefulness to  
everyone.

The only problem I have with Lucene's current focus is that I feel  
the Lucene folks should work on standardizing the API, focusing on  
interfaces and/or abstract classes with proper protected level access.

By doing this, people are much freer to develop their own  
enhancements, and can quickly apply them to later Lucene releases  
just by applying a patch (at worst), or just a link (at best!).   
Similar to how the JDK works. We have rarely if ever needed to change  
our code between JDK releases.

I realize this is a dream right now, because of the bad shape (sorry)  
of the structure of much of Lucene, but if the committers spent more  
time on issues like this, I think they would hear far fewer complaints  
from the community.

As an example of the above - being able to access the underlying  
readers in a multi-reader (I know there is a current bug for this).  
There is no harm to Lucene folks to expose this, and it is very  
helpful in many cases. If some developer uses this information in the  
wrong way, that is their fault, not Lucene's....  Making something  
protected is very different than making it public.

Robert Engels

On Dec 3, 2008, at 11:36 PM, John Wang wrote:

> Grant:
>
>         I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While  
> lucene is a great project, especially with 2.x releases, great  
> improvements are made, but do we really have a clear picture of how  
> lucene is being used and deployed? While lucene works great running  
> as a vanilla search library, when pushed to limits, one needs to  
> "hack" into lucene to make certain things work. If 90% of the user  
> base use it to build small indexes and using the vanilla api, and  
> the other 10% is really stressing both on the scalability and api  
> side and are running into issues, would you still say: "running  
> well for 90% of the users, therefore it is stable or extensible"? I  
> think it is unfair to the project itself to be measured by the  
> vanilla use-case. I have done a couple of large deployments, e.g. >30  
> million documents indexed and searched in realtime, and I really  
> had to do some tweaking.
>
> 2) "You want stuff committed, keep it up to date, make it  
> manageable to review, document it, respond to questions/concerns  
> with answers as best you can. " - To some degree I would hope it  
> depends on what the issue is, e.g. enforcing such process on a one- 
> line null check seems to be an overkill. I agree with the process  
> itself, what would make it better is some transparency on how  
> patches/issues are evaluated to be committed. At least seemed from  
> the outside, it is purely being decided on by the committers, and  
> since my understanding is that an open source project belongs to  
> the public, the public user base should have some say.
>
> 3) which brings me to this point: "I personally, would love to work  
> on Lucene all day every day as I have a lot of things I'd love to  
> engage the community on, but the fact is I'm not paid to do that,  
> so I give what I can when I can.  I know most of the other  
> committers are that way too." - Is this really true? Isn't a large  
> part of the committer base also a part of the for-profit,  
> consulting business, e.g. Lucid? Would groups/companies that pay  
> for consulting service get their patches/requirements committed  
> with higher priority? If so, seems to me to be a conflict of  
> interest there.
>
> 4) "Lather, rinse, repeat.   Next thing you know, you'll be on the  
> receiving end as a committer." - While I agree that being a  
> committer is a great honor and many committers are awesome, but  
> assuming everyone would want to be a committer is a little  
> presumptuous.
>
> In conclusion, I hope I didn't unleash any wrath from the  
> committers for expressing candor.
>
> -John
>
> On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>
> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>
>
>
> Hoss wrote: "sort of mythical "Lucene powerhouse"
> Lucene seems to run itself quite differently than other open source  
> Java projects.  Perhaps it would be good to spell out the reasons  
> for the reluctance to move ahead with features that developers work  
> on, that work, but do not go in.  The developer contributions seem  
> to be quite low right now, especially compared to neighbor projects  
> such as Hadoop.  Is this because fewer people are using Lucene?  Or  
> is it due to the reluctance to work with the developer community?   
> Unfortunately the perception in the eyes of some people who work on  
> search related projects is that it is the latter.
>
>
> Or, could it be that Hadoop is relatively new and in vogue at the  
> moment, very malleable and buggy(?) and has a HUGE corporate  
> sponsor who dedicates lots of resources to it on a full time basis,  
> whilst Lucene has been around in the ASF for 7+ years (and 12+  
> years total) and has a really large install base and thus must move  
> more deliberately and basically has 1 person who gets to work on it  
> full time while the rest of us pretty much volunteer?    That's not  
> an excuse, it's just the way it is.  I personally, would love to  
> work on Lucene all day every day as I have a lot of things I'd love  
> to engage the community on, but the fact is I'm not paid to do  
> that, so I give what I can when I can.  I know most of the other  
> committers are that way too.
>
> Thus, I don't think any one of us has a reluctance to move ahead  
> with features or bug fixes.   Looking at CHANGES.txt, I see a lot  
> of contributors.  Looking at java-dev and JIRA, I see lots of  
> engagement with the community.  Is it near the historical high for  
> traffic, no it's not, but that isn't necessarily a bad thing.  I  
> think it's a sign that Lucene is pretty stable.
>
> What we do have a reluctance for are patches that don't have tests  
> (i.e. this one), patches that massively change Lucene APIs in non- 
> trivial ways or break back compatibility or are not kept up to  
> date.  Are we perfect?  Of course not.  I, personally, would love  
> for there to be a way that helps us process a larger volume of  
> patches (note, I didn't say commit a larger volume).  Hadoop's  
> automated patch tester would be a huge start in that, but at the  
> end of the day, Lucene still works the way all ASF projects do: via  
> meritocracy and volunteerism.     You want stuff committed, keep it  
> up to date, make it manageable to review, document it, respond to  
> questions/concerns with answers as best you can.  To that end, a  
> real simple question can go a long way toward getting something  
> committed, and it simply is:  "Hey Lucener's,  what else can I do  
> to help you review and commit LUCENE-XXXX?"  Lather, rinse,  
> repeat.   Next thing you know, you'll be on the receiving end as a  
> committer.
>
> -Grant

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Mark and Grant:

    I do apologize if I came off seeming rude. I guess I let my frustration
with the serialization issue get the better of me (along with a build-up from
some of the other issues, which I thought were trivial but were made out not
to be). And I will improve my behavior in the future.

   There is a reason I have stopped submitting patches via Jira. (One which
I no longer dare to express.)

   There is absolutely nothing wrong with getting paid for Lucene expertise.
I was just commenting on your comment about "volunteering", but if you think
I am wrong, then I am. I did have a concern with the focus of the project
getting biased by companies paying the committers, but obviously it is
not my business.

    The issues/patches I am having are trivial stuff, and that was
precisely my point. I am not pushing for grandiose ideas; I am frustrated
with some very brain dead issues (I am not smart enough to provide any earth
shattering patches) that have blown out of proportion in my mind.

    I will try to keep my mouth shut in the future.

-John

On Thu, Dec 4, 2008 at 5:24 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Dec 4, 2008, at 12:36 AM, John Wang wrote:
>
>  Grant:
>>
>>        I am sorry that I disagree with some points:
>>
>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a
>> great project, especially with 2.x releases, great improvements are made,
>> but do we really have a clear picture on how lucene is being used and
>> deployed. While lucene works great running as a vanilla search library, when
>> pushed to limits, one needs to "hack" into lucene to make certain things
>> work. If 90% of the user base use it to build small indexes and using the
>> vanilla api, and the other 10% is really stressing both on the scalability
>> and api side and are running into issues, would you still say: "running well
>> for 90% of the users, therefore it is stable or extensible"? I think it is
>> unfair to the project itself to be measured by the vanilla use-case. I have
>> done couple of large deployments, e.g. >30 million documents indexed and
>> searched in realtime., and I really had to do some tweaking.
>>
>
> Sorry, we should have written a perfect engine the first time out.  I'll
> get on that.  Question for you:  how much of that tweaking have you
> contributed back?  If you have such obvious wins, put them up as patches so
> we can all benefit, just like you've benefitted from our volunteering.
>
> As for 90%, I'd say it is more like > 95% and, gee, if I can write a
> general purpose open source search library that keeps 95% of a very, very,
> very large install base happy all while still improving it and maintaining
> backward compatibility, than color me stable.
>
>
>> 2) "You want stuff committed, keep it up to date, make it manageable to
>> review, document it, respond to questions/concerns with answers as best you
>> can. " - To some degree I would hope it depends on what the issue is, e.g.
>> enforcing such process on a one-line null check seems to be an overkill. I
>> agree with the process itself, what would make it better is some
>> transparency on how patches/issues are evaluated to be committed. At least
>> seemed from the outside, it is purely being decided on by the committers,
>> and since my understanding is that an open source project belongs to the
>> public, the public user base should have some say.
>>
>
> Here's your list of opened issues:
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&reporterSelect=specificuser&reporter=john.wang@gmail.com  Only 1 of which has more than 2 votes and which is assigned to Hoss.
>  However, from what I can see, you've had all but 1, I repeat ONE, issue not
> resolved.
>
> And, yes, what gets committed is decided on by the COMMITTERS with input
> from the community; who else can be responsible for committing?  Hence the
> title.  We can't please everyone, but I'll be damned if you're going to
> disparage the work of so many because you have sour grapes over some people
> (not all) disagreeing with you over how serialization should work in Lucene
> just b/c you think the problem is trivial when clearly others do not.
>
> Committers are picked by the project over a long period of time (feel free
> to nominate someone who you feel has merit, we've elected committers based
> on community nominations in the past) because they stick around and stay
> involved and respond on the list, etc.  I'm starting to think your real
> issue here is that we haven't all agreed with you the minute you suggest
> something, but sorry, that is how open source works.
>
>
>
>> 3) which brings me to this point: "I personally, would love to work on
>> Lucene all day every day as I have a lot of things I'd love to engage the
>> community on, but the fact is I'm not paid to do that, so I give what I can
>> when I can.  I know most of the other committers are that way too." - Is
>> this really true? Isn't a large part of the committer base also a part of
>> the for-profit, consulting business, e.g. Lucid? Would groups/companies that
>> pay for consulting service get their patches/requirements committed with
>> higher priority? If so, seems to me to be a conflict of interest there.
>>
>
> Yes, John, it is true.  I would love to work on Lucene all day.  If I won
> the lottery tomorrow, I'd probably still volunteer on Lucene.  Let me ask
> you back, who pays you to work on Lucene?  Was this patch submitted because
> you just happened to spot it while pouring over the code at night on your
> own and out of the goodness of your heart?  Or did you discover it at
> LinkedIn where you were specifically hired because of your Lucene skills and
> knowledge of the Lucene community?  In other words, you're accusing me and
> others of getting paid for my expertise in Lucene, all the while you are
> getting paid for your expertise in Lucene.
>
>
>> 4) "Lather, rinse, repeat.   Next thing you know, you'll be on the
>> receiving end as a committer." - While I agree that being a committer is a
>> great honor and many committers are awesome, but assuming everyone would
>> want to be a committer is a little presumptuous.
>>
>
> Where did I imply that?  All I'm saying, is you can't just throw your code
> up here and say "Hey, fix this for me the way I want it fixed and then come
> back and tell me when it's done"  It doesn't work that way.  It never has.
>  No open source project works that way.
>
>
>
>> In conclusion, I hope I didn't unleash any wrath from the committers for
>> expressing candor.
>>
>
> Hey, we're all entitled to your opinions.  Personally, I think you've made
> a lot of nice contributions to Lucene over the years in terms of insights,
> ideas and patches.  So, I guess I am a bit surprised by the rancor in your
> message, which came from out of no where, not too mention the fact that it
> has completely hijacked an otherwise interesting conversation about the
> right way to do serialization.  If you want to call that candor, than feel
> free.
>
>
> -Grant

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 4, 2008, at 12:36 AM, John Wang wrote:

> Grant:
>
>         I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While  
> lucene is a great project, especially with 2.x releases, great  
> improvements are made, but do we really have a clear picture on how  
> lucene is being used and deployed. While lucene works great running  
> as a vanilla search library, when pushed to limits, one needs to  
> "hack" into lucene to make certain things work. If 90% of the user  
> base use it to build small indexes and using the vanilla api, and  
> the other 10% is really stressing both on the scalability and api  
> side and are running into issues, would you still say: "running well  
> for 90% of the users, therefore it is stable or extensible"? I think  
> it is unfair to the project itself to be measured by the vanilla use- 
> case. I have done couple of large deployments, e.g. >30 million  
> documents indexed and searched in realtime., and I really had to do  
> some tweaking.

Sorry, we should have written a perfect engine the first time out.   
I'll get on that.  Question for you:  how much of that tweaking have  
you contributed back?  If you have such obvious wins, put them up as  
patches so we can all benefit, just like you've benefitted from our  
volunteering.

As for 90%, I'd say it is more like > 95% and, gee, if I can write a  
general purpose open source search library that keeps 95% of a very,  
very, very large install base happy all while still improving it and  
maintaining backward compatibility, then color me stable.

>
> 2) "You want stuff committed, keep it up to date, make it manageable  
> to review, document it, respond to questions/concerns with answers  
> as best you can. " - To some degree I would hope it depends on what  
> the issue is, e.g. enforcing such process on a one-line null check  
> seems to be an overkill. I agree with the process itself, what would  
> make it better is some transparency on how patches/issues are  
> evaluated to be committed. At least seemed from the outside, it is  
> purely being decided on by the committers, and since my  
> understanding is that an open source project belongs to the public,  
> the public user base should have some say.

Here's your list of opened issues:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&reporterSelect=specificuser&reporter=john.wang@gmail.com
Only 1 of which has more than 2 votes and which is assigned to
Hoss.  However, from what I can see, you've had all but 1, I repeat
ONE, issue resolved.

And, yes, what gets committed is decided on by the COMMITTERS with  
input from the community; who else can be responsible for committing?   
Hence the title.  We can't please everyone, but I'll be damned if  
you're going to disparage the work of so many because you have sour  
grapes over some people (not all) disagreeing with you over how  
serialization should work in Lucene just b/c you think the problem is  
trivial when clearly others do not.

Committers are picked by the project over a long period of time (feel  
free to nominate someone who you feel has merit, we've elected  
committers based on community nominations in the past) because they  
stick around and stay involved and respond on the list, etc.  I'm  
starting to think your real issue here is that we haven't all agreed  
with you the minute you suggest something, but sorry, that is how open  
source works.

>
> 3) which brings me to this point: "I personally, would love to work  
> on Lucene all day every day as I have a lot of things I'd love to  
> engage the community on, but the fact is I'm not paid to do that, so  
> I give what I can when I can.  I know most of the other committers  
> are that way too." - Is this really true? Isn't a large part of the  
> committer base also a part of the for-profit, consulting business,  
> e.g. Lucid? Would groups/companies that pay for consulting service  
> get their patches/requirements committed with higher priority? If  
> so, seems to me to be a conflict of interest there.

Yes, John, it is true.  I would love to work on Lucene all day.  If I  
won the lottery tomorrow, I'd probably still volunteer on Lucene.  Let  
me ask you back, who pays you to work on Lucene?  Was this patch  
submitted because you just happened to spot it while poring over the
code at night on your own and out of the goodness of your heart?  Or  
did you discover it at LinkedIn where you were specifically hired  
because of your Lucene skills and knowledge of the Lucene community?   
In other words, you're accusing me and others of getting paid for my  
expertise in Lucene, all the while you are getting paid for your  
expertise in Lucene.

>
> 4) "Lather, rinse, repeat.   Next thing you know, you'll be on the  
> receiving end as a committer." - While I agree that being a  
> committer is a great honor and many committers are awesome, but  
> assuming everyone would want to be a committer is a little  
> presumptuous.

Where did I imply that?  All I'm saying, is you can't just throw your  
code up here and say "Hey, fix this for me the way I want it fixed and  
then come back and tell me when it's done"  It doesn't work that way.   
It never has.  No open source project works that way.

>
> In conclusion, I hope I didn't unleash any wrath from the committers  
> for expressing candor.

Hey, we're all entitled to your opinions.  Personally, I think you've  
made a lot of nice contributions to Lucene over the years in terms of  
insights, ideas and patches.  So, I guess I am a bit surprised by the  
rancor in your message, which came out of nowhere, not to
mention the fact that it has completely hijacked an otherwise
interesting conversation about the right way to do serialization.  If
you want to call that candor, then feel free.

-Grant

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Good open source projects should be better than their commercial
counterparts.

I really like 2.4. The DocIdSet/Filter APIs really allowed me to do some
interesting stuff.

I feel lucene has potential to be more than just a full text search library.

-John

On Wed, Dec 3, 2008 at 11:58 PM, Robert Muir <rc...@gmail.com> wrote:

> no, i'm not doing any caching but as mentioned it did require some work to
> become almost completely i/o bound due to the nature of my wacky queries,
> example removing O(n) behavior from fuzzy and regexp.
>
> probably the os cache is not helping much because indexes are very large.
> I'm very happy being i/o bound because now and especially in the future i
> think it will be cheaper to speed up with additional ram and faster storage.
>
> still even out of box without any tricks lucene performs *much* better than
> the commercial alternatives i have fought with. lucene was evaluated a while
> ago before 2.3 and this was not the case, but I re-evaluated around 2.3
> release and it is now.
>
>
> On Thu, Dec 4, 2008 at 2:45 AM, John Wang <jo...@gmail.com> wrote:
>
>> Thanks Robert, definitely interested!
>> We are too, looking into SSDs for performance.
>> 2.4 allows you to create extend QueryParser and create your own "leaf"
>> queries.
>> I am surprised you are mostly IO bound. Lucene does a good job caching. Do
>> you do some sort of caching yourself? If your index is not changing often,
>> there is a lot you can do without SSDs.
>>
>> -John
>>
>>
>> On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> yeah i am using read-only.
>>>
>>> i will admit to subclassing queryparser and having customized
>>> query/scorer for several. all queries contain fuzzy queries so this was
>>> necessary.
>>>
>>> "high" throughput i guess is a matter of opinion. in attempting to
>>> profile high-throughput, again customized query/scorer made it easy for me
>>> to simplify some things, such as some math in termquery that doesn't make
>>> sense (redundant) for my Similarity. everything is pretty much i/o bound now
>>> so if tehre is some throughput issue i will look into SSD for high volume
>>> indexes.
>>>
>>> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you
>>> are curious.
>>>
>>>
>>> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <jo...@gmail.com> wrote:
>>>
>>>> Thanks Robert for sharing.
>>>> Good to hear it is working for what you need it to do.
>>>>
>>>> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
>>>> indexing. Especially if you have multicore machines.
>>>> 4) do you stay with sub-second responses with high thru-put?
>>>>
>>>> -John
>>>>
>>>>
>>>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:
>>>>>
>>>>>> Nice!
>>>>>> Some questions:
>>>>>>
>>>>>> 1) one index?
>>>>>>
>>>>> no, but two individual ones today were around 100M docs
>>>>>
>>>>>> 2) how big is your document? e.g. how many terms etc.
>>>>>>
>>>>> last one built has over 4M terms
>>>>>
>>>>>> 3) are you serving(searching) the docs in realtime?
>>>>>>
>>>>> i dont understand this question, but searching is slower if i am
>>>>> indexing on a disk thats also being searched.
>>>>>
>>>>>>
>>>>>> 4) search speed?
>>>>>>
>>>>> usually subsecond (or close) after some warmup. while this might seem
>>>>> slow its fast compared to the competition, trust me.
>>>>>
>>>>>>
>>>>>> I'd love to learn more about your architecture.
>>>>>>
>>>>> i hate to say you would be disappointed, but theres nothign fancy.
>>>>> probably why it works...
>>>>>
>>>>>>
>>>>>> -John
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com>wrote:
>>>>>>
>>>>>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an
>>>>>>> out of box jar.
>>>>>>>
>>>>>>> yeah i have some special subclasses but if i thought any of this
>>>>>>> stuff was general enough to be useful to others i'd submit it. I'm just
>>>>>>> happy to have something scalable that i can customize to my peculiarities.
>>>>>>>
>>>>>>> so i think i fit in your 10% and im not stressing on either
>>>>>>> scalability or api.
>>>>>>>
>>>>>>> thanks,
>>>>>>> robert
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Grant:
>>>>>>>>         I am sorry that I disagree with some points:
>>>>>>>>
>>>>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While
>>>>>>>> lucene is a great project, especially with 2.x releases, great improvements
>>>>>>>> are made, but do we really have a clear picture on how lucene is being used
>>>>>>>> and deployed. While lucene works great running as a vanilla search library,
>>>>>>>> when pushed to limits, one needs to "hack" into lucene to make certain
>>>>>>>> things work. If 90% of the user base use it to build small indexes and using
>>>>>>>> the vanilla api, and the other 10% is really stressing both on the
>>>>>>>> scalability and api side and are running into issues, would you still say:
>>>>>>>> "running well for 90% of the users, therefore it is stable or extensible"? I
>>>>>>>> think it is unfair to the project itself to be measured by the vanilla
>>>>>>>> use-case. I have done couple of large deployments, e.g. >30 million
>>>>>>>> documents indexed and searched in realtime., and I really had to do some
>>>>>>>> tweaking.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Robert Muir
>>>>>>> rcmuir@gmail.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcmuir@gmail.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Robert Muir <rc...@gmail.com>.

no, i'm not doing any caching, but as mentioned it did require some work to
become almost completely i/o bound due to the nature of my wacky queries,
for example removing O(n) behavior from fuzzy and regexp queries.

probably the os cache is not helping much because indexes are very large.
I'm very happy being i/o bound because now and especially in the future i
think it will be cheaper to speed up with additional ram and faster storage.

still even out of box without any tricks lucene performs *much* better than
the commercial alternatives i have fought with. lucene was evaluated a while
ago before 2.3 and this was not the case, but I re-evaluated around 2.3
release and it is now.

On Thu, Dec 4, 2008 at 2:45 AM, John Wang <jo...@gmail.com> wrote:

> Thanks Robert, definitely interested!
> We are too, looking into SSDs for performance.
> 2.4 allows you to create extend QueryParser and create your own "leaf"
> queries.
> I am surprised you are mostly IO bound. Lucene does a good job caching. Do
> you do some sort of caching yourself? If your index is not changing often,
> there is a lot you can do without SSDs.
>
> -John
>
>
> On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> yeah i am using read-only.
>>
>> i will admit to subclassing queryparser and having customized query/scorer
>> for several. all queries contain fuzzy queries so this was necessary.
>>
>> "high" throughput i guess is a matter of opinion. in attempting to profile
>> high-throughput, again customized query/scorer made it easy for me to
>> simplify some things, such as some math in termquery that doesn't make sense
>> (redundant) for my Similarity. everything is pretty much i/o bound now so if
>> tehre is some throughput issue i will look into SSD for high volume indexes.
>>
>> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you
>> are curious.
>>
>>
>> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <jo...@gmail.com> wrote:
>>
>>> Thanks Robert for sharing.
>>> Good to hear it is working for what you need it to do.
>>>
>>> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
>>> indexing. Especially if you have multicore machines.
>>> 4) do you stay with sub-second responses with high thru-put?
>>>
>>> -John
>>>
>>>
>>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:
>>>>
>>>>> Nice!
>>>>> Some questions:
>>>>>
>>>>> 1) one index?
>>>>>
>>>> no, but two individual ones today were around 100M docs
>>>>
>>>>> 2) how big is your document? e.g. how many terms etc.
>>>>>
>>>> last one built has over 4M terms
>>>>
>>>>> 3) are you serving(searching) the docs in realtime?
>>>>>
>>>> i dont understand this question, but searching is slower if i am
>>>> indexing on a disk thats also being searched.
>>>>
>>>>>
>>>>> 4) search speed?
>>>>>
>>>> usually subsecond (or close) after some warmup. while this might seem
>>>> slow its fast compared to the competition, trust me.
>>>>
>>>>>
>>>>> I'd love to learn more about your architecture.
>>>>>
>>>> i hate to say you would be disappointed, but theres nothign fancy.
>>>> probably why it works...
>>>>
>>>>>
>>>>> -John
>>>>>
>>>>>
>>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>>
>>>>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an
>>>>>> out of box jar.
>>>>>>
>>>>>> yeah i have some special subclasses but if i thought any of this stuff
>>>>>> was general enough to be useful to others i'd submit it. I'm just happy to
>>>>>> have something scalable that i can customize to my peculiarities.
>>>>>>
>>>>>> so i think i fit in your 10% and im not stressing on either
>>>>>> scalability or api.
>>>>>>
>>>>>> thanks,
>>>>>> robert
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com>wrote:
>>>>>>
>>>>>>> Grant:
>>>>>>>         I am sorry that I disagree with some points:
>>>>>>>
>>>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene
>>>>>>> is a great project, especially with 2.x releases, great improvements are
>>>>>>> made, but do we really have a clear picture on how lucene is being used and
>>>>>>> deployed. While lucene works great running as a vanilla search library, when
>>>>>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>>>>>> work. If 90% of the user base use it to build small indexes and using the
>>>>>>> vanilla api, and the other 10% is really stressing both on the scalability
>>>>>>> and api side and are running into issues, would you still say: "running well
>>>>>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>>>>>> unfair to the project itself to be measured by the vanilla use-case. I have
>>>>>>> done couple of large deployments, e.g. >30 million documents indexed and
>>>>>>> searched in realtime., and I really had to do some tweaking.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Muir
>>>>>> rcmuir@gmail.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Thanks Robert, definitely interested!
We too are looking into SSDs for performance.
2.4 allows you to extend QueryParser and create your own "leaf"
queries.
I am surprised you are mostly IO bound. Lucene does a good job caching. Do
you do some sort of caching yourself? If your index is not changing often,
there is a lot you can do without SSDs.

-John

On Wed, Dec 3, 2008 at 11:27 PM, Robert Muir <rc...@gmail.com> wrote:

> yeah i am using read-only.
>
> i will admit to subclassing queryparser and having customized query/scorer
> for several. all queries contain fuzzy queries so this was necessary.
>
> "high" throughput i guess is a matter of opinion. in attempting to profile
> high-throughput, again customized query/scorer made it easy for me to
> simplify some things, such as some math in termquery that doesn't make sense
> (redundant) for my Similarity. everything is pretty much i/o bound now so if
> tehre is some throughput issue i will look into SSD for high volume indexes.
>
> i posted on Use Cases on the wiki how I made fuzzy and regex fast if you
> are curious.
>
>
> On Thu, Dec 4, 2008 at 2:10 AM, John Wang <jo...@gmail.com> wrote:
>
>> Thanks Robert for sharing.
>> Good to hear it is working for what you need it to do.
>>
>> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
>> indexing. Especially if you have multicore machines.
>> 4) do you stay with sub-second responses with high thru-put?
>>
>> -John
>>
>>
>> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:
>>>
>>>> Nice!
>>>> Some questions:
>>>>
>>>> 1) one index?
>>>>
>>> no, but two individual ones today were around 100M docs
>>>
>>>> 2) how big is your document? e.g. how many terms etc.
>>>>
>>> last one built has over 4M terms
>>>
>>>> 3) are you serving(searching) the docs in realtime?
>>>>
>>> i dont understand this question, but searching is slower if i am indexing
>>> on a disk thats also being searched.
>>>
>>>>
>>>> 4) search speed?
>>>>
>>> usually subsecond (or close) after some warmup. while this might seem
>>> slow its fast compared to the competition, trust me.
>>>
>>>>
>>>> I'd love to learn more about your architecture.
>>>>
>>> i hate to say you would be disappointed, but theres nothign fancy.
>>> probably why it works...
>>>
>>>>
>>>> -John
>>>>
>>>>
>>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an
>>>>> out of box jar.
>>>>>
>>>>> yeah i have some special subclasses but if i thought any of this stuff
>>>>> was general enough to be useful to others i'd submit it. I'm just happy to
>>>>> have something scalable that i can customize to my peculiarities.
>>>>>
>>>>> so i think i fit in your 10% and im not stressing on either scalability
>>>>> or api.
>>>>>
>>>>> thanks,
>>>>> robert
>>>>>
>>>>>
>>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com>wrote:
>>>>>
>>>>>> Grant:
>>>>>>         I am sorry that I disagree with some points:
>>>>>>
>>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene
>>>>>> is a great project, especially with 2.x releases, great improvements are
>>>>>> made, but do we really have a clear picture on how lucene is being used and
>>>>>> deployed. While lucene works great running as a vanilla search library, when
>>>>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>>>>> work. If 90% of the user base use it to build small indexes and using the
>>>>>> vanilla api, and the other 10% is really stressing both on the scalability
>>>>>> and api side and are running into issues, would you still say: "running well
>>>>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>>>>> unfair to the project itself to be measured by the vanilla use-case. I have
>>>>>> done couple of large deployments, e.g. >30 million documents indexed and
>>>>>> searched in realtime., and I really had to do some tweaking.
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Robert Muir
>>>>> rcmuir@gmail.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Michael McCandless <lu...@mikemccandless.com>.

Robert Muir wrote:

> i posted on Use Cases on the wiki how I made fuzzy and regex fast if  
> you are curious.

It looks like this is the wiki page:

     http://wiki.apache.org/lucene-java/FastSSFuzzy?highlight=(fuzzy)

The approach is similar to how contrib/spellchecker generates its  
candidates, in that you build a 2nd index from the primary index and  
use the 2nd index to more quickly (not O(N)) generate candidates.   
It'd be nice to get your approach into contrib as well ;)

Mike
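
For readers who want a feel for the "2nd index" idea described above, here is a
minimal sketch. It is not the code from the wiki page and not
contrib/spellchecker; it is a plain in-memory illustration of a
deletion-neighborhood lookup in the FastSS style, with no Lucene calls. The
class name DeletionIndex and its methods are hypothetical. Each term is stored
under every string obtained by deleting up to k characters, so a lookup only
touches the query's own handful of variants instead of scanning all N terms.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a secondary "candidate" index. */
class DeletionIndex {
    private final int maxDeletes;
    // Maps a variant (a term with up to maxDeletes characters removed)
    // back to the original terms that produce it.
    private final Map<String, Set<String>> variantToTerms = new HashMap<>();

    DeletionIndex(int maxDeletes) {
        this.maxDeletes = maxDeletes;
    }

    /** All strings reachable from s by deleting at most k characters, including s itself. */
    public static Set<String> variants(String s, int k) {
        Set<String> out = new HashSet<>();
        out.add(s);
        if (k == 0) {
            return out;
        }
        for (int i = 0; i < s.length(); i++) {
            String shorter = s.substring(0, i) + s.substring(i + 1);
            out.addAll(variants(shorter, k - 1));
        }
        return out;
    }

    /** Index-time step: register the term under each of its deletion variants. */
    public void add(String term) {
        for (String v : variants(term, maxDeletes)) {
            variantToTerms.computeIfAbsent(v, key -> new TreeSet<>()).add(term);
        }
    }

    /** Query-time step: collect every indexed term sharing a variant with the query. */
    public Set<String> candidates(String query) {
        Set<String> result = new TreeSet<>();
        for (String v : variants(query, maxDeletes)) {
            Set<String> hit = variantToTerms.get(v);
            if (hit != null) {
                result.addAll(hit);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DeletionIndex index = new DeletionIndex(1);
        index.add("lucene");
        index.add("lucent");
        index.add("search");
        System.out.println(index.candidates("lucine")); // prints [lucene]
    }
}
```

The result is a candidate set, not a final answer: two strings can share a
deletion variant while being up to 2k edits apart, so a real implementation
would presumably still check each candidate with an exact edit-distance
computation. The trade-off is a larger secondary index in exchange for lookups
that do not grow with the number of terms.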

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Robert Muir <rc...@gmail.com>.

yeah i am using read-only.

i will admit to subclassing queryparser and having customized query/scorer
for several. all queries contain fuzzy queries so this was necessary.

"high" throughput i guess is a matter of opinion. in attempting to profile
high-throughput, again customized query/scorer made it easy for me to
simplify some things, such as some math in termquery that doesn't make sense
(redundant) for my Similarity. everything is pretty much i/o bound now so if
there is some throughput issue i will look into SSD for high volume indexes.

i posted on Use Cases on the wiki how I made fuzzy and regex fast if you are
curious.

On Thu, Dec 4, 2008 at 2:10 AM, John Wang <jo...@gmail.com> wrote:

> Thanks Robert for sharing.
> Good to hear it is working for what you need it to do.
>
> 3) Especially with ReadOnlyIndexReaders, you should not be blocked while
> indexing. Especially if you have multicore machines.
> 4) do you stay with sub-second responses with high thru-put?
>
> -John
>
>
> On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <rc...@gmail.com> wrote:
>
>>
>>
>> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:
>>
>>> Nice!
>>> Some questions:
>>>
>>> 1) one index?
>>>
>> no, but two individual ones today were around 100M docs
>>
>>> 2) how big is your document? e.g. how many terms etc.
>>>
>> last one built has over 4M terms
>>
>>> 3) are you serving(searching) the docs in realtime?
>>>
>> i dont understand this question, but searching is slower if i am indexing
>> on a disk thats also being searched.
>>
>>>
>>> 4) search speed?
>>>
>> usually subsecond (or close) after some warmup. while this might seem slow
>> its fast compared to the competition, trust me.
>>
>>>
>>> I'd love to learn more about your architecture.
>>>
>> i hate to say you would be disappointed, but theres nothign fancy.
>> probably why it works...
>>
>>>
>>> -John
>>>
>>>
>>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an
>>>> out of box jar.
>>>>
>>>> yeah i have some special subclasses but if i thought any of this stuff
>>>> was general enough to be useful to others i'd submit it. I'm just happy to
>>>> have something scalable that i can customize to my peculiarities.
>>>>
>>>> so i think i fit in your 10% and im not stressing on either scalability
>>>> or api.
>>>>
>>>> thanks,
>>>> robert
>>>>
>>>>
>>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com> wrote:
>>>>
>>>>> Grant:
>>>>>         I am sorry that I disagree with some points:
>>>>>
>>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene
>>>>> is a great project, especially with 2.x releases, great improvements are
>>>>> made, but do we really have a clear picture on how lucene is being used and
>>>>> deployed. While lucene works great running as a vanilla search library, when
>>>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>>>> work. If 90% of the user base use it to build small indexes and using the
>>>>> vanilla api, and the other 10% is really stressing both on the scalability
>>>>> and api side and are running into issues, would you still say: "running well
>>>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>>>> unfair to the project itself to be measured by the vanilla use-case. I have
>>>>> done couple of large deployments, e.g. >30 million documents indexed and
>>>>> searched in realtime., and I really had to do some tweaking.
>>>>>
>>>>>
>>>>
>>>> --
>>>> Robert Muir
>>>> rcmuir@gmail.com
>>>>
>>>
>>>
>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Thanks Robert for sharing.
Good to hear it is working for what you need it to do.

3) With ReadOnlyIndexReaders especially, you should not be blocked while
indexing, particularly if you have multicore machines.
4) Do you stay with sub-second responses under high throughput?

-John

On Wed, Dec 3, 2008 at 11:03 PM, Robert Muir <rc...@gmail.com> wrote:

>
>
> On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:
>
>> Nice!
>> Some questions:
>>
>> 1) one index?
>>
> no, but two individual ones today were around 100M docs
>
>> 2) how big is your document? e.g. how many terms etc.
>>
> last one built has over 4M terms
>
>> 3) are you serving(searching) the docs in realtime?
>>
> i dont understand this question, but searching is slower if i am indexing
> on a disk thats also being searched.
>
>>
>> 4) search speed?
>>
> usually subsecond (or close) after some warmup. while this might seem slow
> its fast compared to the competition, trust me.
>
>>
>> I'd love to learn more about your architecture.
>>
> i hate to say you would be disappointed, but theres nothign fancy. probably
> why it works...
>
>>
>> -John
>>
>>
>> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an out
>>> of box jar.
>>>
>>> yeah i have some special subclasses but if i thought any of this stuff
>>> was general enough to be useful to others i'd submit it. I'm just happy to
>>> have something scalable that i can customize to my peculiarities.
>>>
>>> so i think i fit in your 10% and im not stressing on either scalability
>>> or api.
>>>
>>> thanks,
>>> robert
>>>
>>>
>>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com> wrote:
>>>
>>>> Grant:
>>>>         I am sorry that I disagree with some points:
>>>>
>>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is
>>>> a great project, especially with 2.x releases, great improvements are made,
>>>> but do we really have a clear picture on how lucene is being used and
>>>> deployed. While lucene works great running as a vanilla search library, when
>>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>>> work. If 90% of the user base use it to build small indexes and using the
>>>> vanilla api, and the other 10% is really stressing both on the scalability
>>>> and api side and are running into issues, would you still say: "running well
>>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>>> unfair to the project itself to be measured by the vanilla use-case. I have
>>>> done couple of large deployments, e.g. >30 million documents indexed and
>>>> searched in realtime., and I really had to do some tweaking.
>>>>
>>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>>
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Dec 4, 2008 at 1:24 AM, John Wang <jo...@gmail.com> wrote:

> Nice!
> Some questions:
>
> 1) one index?
>
no, but two individual ones today were around 100M docs

> 2) how big is your document? e.g. how many terms etc.
>
last one built has over 4M terms

> 3) are you serving(searching) the docs in realtime?
>
i don't understand this question, but searching is slower if i am indexing on
a disk that's also being searched.

>
> 4) search speed?
>
usually subsecond (or close) after some warmup. while this might seem slow,
it's fast compared to the competition, trust me.

>
> I'd love to learn more about your architecture.
>
i hate to say you would be disappointed, but there's nothing fancy. probably
why it works...

>
> -John
>
>
> On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:
>
>> sorry gotta speak up on this. i indexed 300m docs today. I'm using an out
>> of box jar.
>>
>> yeah i have some special subclasses but if i thought any of this stuff was
>> general enough to be useful to others i'd submit it. I'm just happy to have
>> something scalable that i can customize to my peculiarities.
>>
>> so i think i fit in your 10% and im not stressing on either scalability or
>> api.
>>
>> thanks,
>> robert
>>
>>
>> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com> wrote:
>>
>>> Grant:
>>>         I am sorry that I disagree with some points:
>>>
>>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is
>>> a great project, especially with 2.x releases, great improvements are made,
>>> but do we really have a clear picture on how lucene is being used and
>>> deployed. While lucene works great running as a vanilla search library, when
>>> pushed to limits, one needs to "hack" into lucene to make certain things
>>> work. If 90% of the user base use it to build small indexes and using the
>>> vanilla api, and the other 10% is really stressing both on the scalability
>>> and api side and are running into issues, would you still say: "running well
>>> for 90% of the users, therefore it is stable or extensible"? I think it is
>>> unfair to the project itself to be measured by the vanilla use-case. I have
>>> done couple of large deployments, e.g. >30 million documents indexed and
>>> searched in realtime., and I really had to do some tweaking.
>>>
>>>
>>
>> --
>> Robert Muir
>> rcmuir@gmail.com
>>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Nice!
Some questions:

1) one index?
2) how big is your document? e.g. how many terms etc.
3) are you serving(searching) the docs in realtime?
4) search speed?

I'd love to learn more about your architecture.

-John


On Wed, Dec 3, 2008 at 10:13 PM, Robert Muir <rc...@gmail.com> wrote:

> sorry gotta speak up on this. i indexed 300m docs today. I'm using an out
> of box jar.
>
> yeah i have some special subclasses but if i thought any of this stuff was
> general enough to be useful to others i'd submit it. I'm just happy to have
> something scalable that i can customize to my peculiarities.
>
> so i think i fit in your 10% and im not stressing on either scalability or
> api.
>
> thanks,
> robert
>
>
> On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com> wrote:
>
>> Grant:
>>         I am sorry that I disagree with some points:
>>
>> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a
>> great project, especially with 2.x releases, great improvements are made,
>> but do we really have a clear picture on how lucene is being used and
>> deployed. While lucene works great running as a vanilla search library, when
>> pushed to limits, one needs to "hack" into lucene to make certain things
>> work. If 90% of the user base use it to build small indexes and using the
>> vanilla api, and the other 10% is really stressing both on the scalability
>> and api side and are running into issues, would you still say: "running well
>> for 90% of the users, therefore it is stable or extensible"? I think it is
>> unfair to the project itself to be measured by the vanilla use-case. I have
>> done couple of large deployments, e.g. >30 million documents indexed and
>> searched in realtime., and I really had to do some tweaking.
>>
>>
>
> --
> Robert Muir
> rcmuir@gmail.com
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Robert Muir <rc...@gmail.com>.

sorry gotta speak up on this. i indexed 300m docs today. I'm using an out of
box jar.

yeah i have some special subclasses but if i thought any of this stuff was
general enough to be useful to others i'd submit it. I'm just happy to have
something scalable that i can customize to my peculiarities.

so i think i fit in your 10% and i'm not stressing on either scalability or
api.

thanks,
robert

On Thu, Dec 4, 2008 at 12:36 AM, John Wang <jo...@gmail.com> wrote:

> Grant:
>         I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a
> great project, especially with 2.x releases, great improvements are made,
> but do we really have a clear picture on how lucene is being used and
> deployed. While lucene works great running as a vanilla search library, when
> pushed to limits, one needs to "hack" into lucene to make certain things
> work. If 90% of the user base use it to build small indexes and using the
> vanilla api, and the other 10% is really stressing both on the scalability
> and api side and are running into issues, would you still say: "running well
> for 90% of the users, therefore it is stable or extensible"? I think it is
> unfair to the project itself to be measured by the vanilla use-case. I have
> done couple of large deployments, e.g. >30 million documents indexed and
> searched in realtime., and I really had to do some tweaking.
>
>

-- 
Robert Muir
rcmuir@gmail.com

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Mark Miller <ma...@gmail.com>.

John Wang wrote:
>
>
> Seems like being a committer can be rather lucrative.

I think being an Apache committer on any project can be somewhat
lucrative. Companies know that you probably work well with others if
you're a committer, which can probably lead to improved career
opportunities. Can't say too much about working well with others :) I may
not be extracting as much money as I can though - sounds like I could be
taking bribes to commit code if I wanted to make more ;)

> My comment was on the statements of being volunteers and don't get 
> paid, which is a little misleading.
It depends. Sometimes, something you're doing with a customer might make
its way into Lucene. That's not most of the work that goes on here
though. Most of the work is looking at submitted patches in our free
time, going over them, running the tests, and possibly committing them.
I do that for the project because I like to, not for any money I'm
getting (true enough, I haven't been a core committer long, but I did the
same as a contrib committer). When I'm sitting around at 11 at night or
7 in the morning, trying to get patches committed, I'd hate to be
classified as a non-volunteer. It's just as easy to get the committer
title and then fall off the face of the world. No one ensures you are
helping anyone get anything done.
>
> I guess I need to learn to be a good boy not to piss off the 
> committers anymore (or convince my company to pay to get some patches 
> in) And hopefully someday I get to grow up and get to become a 
> committer and make some $ too.
You might consider it. I think you have been a bit rude, but watch and
see...quality patches you submit will still get processed like any
other. The people around here are friendly and mainly interested in the
quality of Lucene. No one is trying to enforce some sort of "power elite"
here. There is no blacklist. At the same time, lashing out isn't going to
help get any issues passed (in fact, I've seen it sink more than one
issue).

I've certainly never been involved in Lucene for the money myself (and I 
don't have much of it, believe you me).

- Mark
>
> -John
>


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Thanks Eks for the "education".

1) If you think Lucene is good enough for you, then great. I think there is
room for improvement, and wanted to share some work we did with the rest of
the community through open source. If you are happy to take a snapshot of
Lucene and build on top of it, then good for you.

2) Yes, there is Jira. Yet it seems to me, at least, that severity and votes
do not determine how patches get committed. Good for you that your patches
get regularly committed; I guess there is a lot for me to learn from you on
how to do that. Obviously being outspoken does not help. Open source
politics, cool!

3) If that is how it works, then it is how it works. (Sounds a lot like the
Spring project.)

Seems like being a committer can be rather lucrative. My comment was on the
statements about being volunteers who don't get paid, which are a little
misleading.

I guess I need to learn to be a good boy and not piss off the committers
anymore (or convince my company to pay to get some patches in). And hopefully
someday I get to grow up, become a committer, and make some $ too.

-John

On Wed, Dec 3, 2008 at 10:36 PM, eks dev <ek...@yahoo.co.uk> wrote:

> John,
> sorry I have to comment, but I sense some substantial misconceptions
> about Open Source here.
>
> 1)
> "e.g. >30 million documents indexed and searched in realtime, and I really
> had to do some tweaking."
> So what? What do I or anyone else have to do with it? "Some tweaking" is
> definitely better than building everything from scratch or going to
> commercial vendors... no?
>
> 2)
> "what would make it better is some transparency on how patches/issues are
> evaluated to be committed. At least seemed from the outside, it is purely
> being decided on by the committers, and since my understanding is that an
> open source project belongs to the public, the public user base should have
> some say."
>
> Transparency: Jira + this mailing list. Everybody is allowed to express an
> opinion, *even committers*; whether you like it or not is another
> question. If you put up convincing arguments, be assured even committers can
> change their opinions.
> Imo, it does not get much more transparent than that.
> Sure it belongs to the public, you do not have to pay for it; read the ASF
> Licence. If you have a better proposal on how to organize Open Source
> projects, speak up. I do not know how we could ever avoid committers having
> the final say on things without provoking chaos.
>
> 3) "Would groups/companies that pay for consulting service get their
> patches/requirements committed with higher priority?"
> Sure, of course, *even commercial users are part of the community* and we
> should be grateful that they contribute and commit their resources so that
> others can benefit from it. Think again about it, there is absolutely nothing
> bad behind it, no conspiracy.
> Just one example on a micro scale. I had an itch and had to do some
> "tweaking"; my (commercial) customer had nothing against contributing back
> to Lucene, so I did it. I get my money and I give something back to the
> community. End result: I am happy, Lucene gets better and everybody profits
> a bit from it.
> Should I have problems with my conscience? I do not think so.
>
> Conflict of interest? No, that is rather evolution. Why do you think
> committers work on Lucene? Do you honestly believe they have no families to
> feed and just sit and wait for someone to feed them proposals for nice
> features? Committers, as well as everybody else here, have their own private
> agendas, goals, ideas, needs ... and all these things get somehow conflated
> into Lucene.
> Back to my example, I was lucky that a few committers shared my opinion
> about the usefulness and priority of this patch; it could have been
> different. If all committers had been busy with private agendas and had
> higher priorities at that moment, well, that would have been bad luck for
> me. No hard feelings even in that case; why should I expect someone to put
> my itch as their priority?
>
> Cheers, eks
>
> ------------------------------
> *From:* John Wang <jo...@gmail.com>
> *To:* java-dev@lucene.apache.org
> *Sent:* Thursday, 4 December, 2008 6:36:28
> *Subject:* Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in
> main top level searcher classes
>
> Grant:
>         I am sorry that I disagree with some points:
>
> 1) "I think it's a sign that Lucene is pretty stable." - While lucene is a
> great project, especially with 2.x releases, great improvements are made,
> but do we really have a clear picture on how lucene is being used and
> deployed. While lucene works great running as a vanilla search library, when
> pushed to limits, one needs to "hack" into lucene to make certain things
> work. If 90% of the user base use it to build small indexes and using the
> vanilla api, and the other 10% is really stressing both on the scalability
> and api side and are running into issues, would you still say: "running well
> for 90% of the users, therefore it is stable or extensible"? I think it is
> unfair to the project itself to be measured by the vanilla use-case. I have
> done couple of large deployments, e.g. >30 million documents indexed and
> searched in realtime., and I really had to do some tweaking.
>
> 2) "You want stuff committed, keep it up to date, make it manageable to
> review, document it, respond to questions/concerns with answers as best you
> can. " - To some degree I would hope it depends on what the issue is, e.g.
> enforcing such process on a one-line null check seems to be an overkill. I
> agree with the process itself, what would make it better is some
> transparency on how patches/issues are evaluated to be committed. At least
> seemed from the outside, it is purely being decided on by the committers,
> and since my understanding is that an open source project belongs to the
> public, the public user base should have some say.
>
> 3) which brings me to this point: "I personally, would love to work on
> Lucene all day every day as I have a lot of things I'd love to engage the
> community on, but the fact is I'm not paid to do that, so I give what I can
> when I can.  I know most of the other committers are that way too." - Is
> this really true? Isn't a large part of the committer base also a part of
> the for-profit, consulting business, e.g. Lucid? Would groups/companies that
> pay for consulting service get their patches/requirements committed with
> higher priority? If so, seems to me to be a conflict of interest there.
>
> 4) "Lather, rinse, repeat.   Next thing you know, you'll be on the
> receiving end as a committer." - While I agree that being a committer is a
> great honor and many committers are awesome, but assuming everyone would
> want to be a committer is a little presumptuous.
>
> In conclusion, I hope I didn't unleash any wrath from the committers for
> expressing candor.
>
> -John
>
> On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <gs...@apache.org>wrote:
>
>>
>> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>>
>>
>>>
>>> Hoss wrote: "sort of mythical "Lucene powerhouse"
>>> Lucene seems to run itself quite differently than other open source Java
>>> projects.  Perhaps it would be good to spell out the reasons for the
>>> reluctance to move ahead with features that developers work on, that work,
>>> but do not go in.  The developer contributions seem to be quite low right
>>> now, especially compared to neighbor projects such as Hadoop.  Is this
>>> because fewer people are using Lucene?  Or is it due to the reluctance to
>>> work with the developer community?  Unfortunately the perception in the eyes
>>> of some people who work on search related projects it is the latter.
>>>
>>
>>
>> Or, could it be that Hadoop is relatively new and in vogue at the moment,
>> very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates
>> lots of resources to it on a full time basis, whilst Lucene has been around
>> in the ASF for 7+ years (and 12+ years total) and has a really large install
>> base and thus must move more deliberately and basically has 1 person who
>> gets to work on it full time while the rest of us pretty much volunteer?
>>  That's not an excuse, it's just the way it is.  I personally, would love to
>> work on Lucene all day every day as I have a lot of things I'd love to
>> engage the community on, but the fact is I'm not paid to do that, so I give
>> what I can when I can.  I know most of the other committers are that way
>> too.
>>
>> Thus, I don't think any one of us has a reluctance to move ahead with
>> features or bug fixes.   Looking at CHANGES.txt, I see a lot of
>> contributors.  Looking at java-dev and JIRA, I see lots of engagement with
>> the community.  Is it near the historical high for traffic, no it's not, but
>> that isn't necessarily a bad thing.  I think it's a sign that Lucene is
>> pretty stable.
>>
>> What we do have a reluctance for are patches that don't have tests (i.e.
>> this one), patches that massively change Lucene APIs in non-trivial ways or
>> break back compatibility or are not kept up to date.  Are we perfect?  Of
>> course not.  I, personally, would love for there to be a way that helps us
>> process a larger volume of patches (note, I didn't say commit a larger
>> volume).  Hadoop's automated patch tester would be a huge start in that, but
>> at the end of the day, Lucene still works the way all ASF projects do: via
>> meritocracy and volunteerism.     You want stuff committed, keep it up to
>> date, make it manageable to review, document it, respond to
>> questions/concerns with answers as best you can.  To that end, a real simple
>> question can go a long way and getting something committed, and it simply
>> is:  "Hey Lucener's,  what else can I do to help you review and commit
>> LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on
>> the receiving end as a committer.
>>
>> -Grant
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by eks dev <ek...@yahoo.co.uk>.

John,
sorry I have to comment, but I sense some substantial misconceptions about Open Source here.

1)
"e.g. >30 million documents indexed and searched in realtime, and I really had to do some tweaking."
So what? What do I or anyone else have to do with it? "Some tweaking" is definitely better than building everything from scratch or going to commercial vendors... no?


2)
"what would make it better is some transparency on how patches/issues
are evaluated to be committed. At least seemed from the outside, it is
purely being decided on by the committers, and since my understanding
is that an open source project belongs to the public, the public user
base should have some say."

Transparency: Jira + this mailing list. Everybody is allowed to express an opinion, *even committers*; whether you like it or not is another question. If you put up convincing arguments, be assured even committers can change their opinions.
Imo, it does not get much more transparent than that.
Sure it belongs to the public, you do not have to pay for it; read the ASF Licence. If you have a better proposal on how to organize Open Source projects, speak up. I do not know how we could ever avoid committers having the final say on things without provoking chaos.

3) "Would groups/companies that pay for consulting service get their patches/requirements committed with higher priority?"
Sure, of course, *even commercial users are part of the community* and we should be grateful that they contribute and commit their resources so that others can benefit from it. Think again about it, there is absolutely nothing bad behind it, no conspiracy.
Just one example on a micro scale. I had an itch and had to do some "tweaking"; my (commercial) customer had nothing against contributing back to Lucene, so I did it. I get my money and I give something back to the community. End result: I am happy, Lucene gets better and everybody profits a bit from it.
Should I have problems with my conscience? I do not think so.

Conflict of interest? No, that is rather evolution. Why do you think committers work on Lucene? Do you honestly believe they have no families to feed and just sit and wait for someone to feed them proposals for nice features? Committers, as well as everybody else here, have their own private agendas, goals, ideas, needs ... and all these things get somehow conflated into Lucene.
Back to my example, I was lucky that a few committers shared my opinion about the usefulness and priority of this patch; it could have been different. If all committers had been busy with private agendas and had higher priorities at that moment, well, that would have been bad luck for me. No hard feelings even in that case; why should I expect someone to put my itch as their priority?

Cheers, eks

________________________________
From: John Wang <jo...@gmail.com>
To: java-dev@lucene.apache.org
Sent: Thursday, 4 December, 2008 6:36:28
Subject: Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Grant:

        I am sorry that I disagree with some points:

1) "I think it's a sign that Lucene is pretty stable." - While lucene is a great project, especially with 2.x releases, great improvements are made, but do we really have a clear picture on how lucene is being used and deployed. While lucene works great running as a vanilla search library, when pushed to limits, one needs to "hack" into lucene to make certain things work. If 90% of the user base use it to build small indexes and using the vanilla api, and the other 10% is really stressing both on the scalability and api side and are running into issues, would you still say: "running well for 90% of the users, therefore it is stable or extensible"? I think it is unfair to the project itself to be measured by the vanilla use-case. I have done couple of large deployments, e.g. >30 million documents indexed and searched in realtime., and I really had to do some tweaking.

2) "You want stuff committed, keep it up to date, make it manageable to review, document it, respond to questions/concerns with answers as best you can. " - To some degree I would hope it depends on what the issue is, e.g. enforcing such process on a one-line null check seems to be an overkill. I agree with the process itself, what would make it better is some transparency on how patches/issues are evaluated to be committed. At least seemed from the outside, it is purely being decided on by the committers, and since my understanding is that an open source project belongs to the public, the public user base should have some say.

3) which brings me to this point: "I personally, would love to work on Lucene all day every day as I have a lot of things I'd love to engage the community on, but the fact is I'm not paid to do that, so I give what I can when I can.  I know most of the other committers are that way too." - Is this really true? Isn't a large part of the committer base also a part of the for-profit, consulting business, e.g. Lucid? Would groups/companies that pay for consulting service get their patches/requirements committed with higher priority? If so, seems to me to be a conflict of interest there.

4) "Lather, rinse, repeat.   Next thing you know, you'll be on the receiving end as a committer." - While I agree that being a committer is a great honor and many committers are awesome, but assuming everyone would want to be a committer is a little presumptuous.

In conclusion, I hope I didn't unleash any wrath from the committers for expressing candor.

-John


On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <gs...@apache.org> wrote:


On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:




Hoss wrote: "sort of mythical "Lucene powerhouse"
Lucene seems to run itself quite differently than other open source Java projects.  Perhaps it would be good to spell out the reasons for the reluctance to move ahead with features that developers work on, that work, but do not go in.  The developer contributions seem to be quite low right now, especially compared to neighbor projects such as Hadoop.  Is this because fewer people are using Lucene?  Or is it due to the reluctance to work with the developer community?  Unfortunately the perception in the eyes of some people who work on search related projects it is the latter.



Or, could it be that Hadoop is relatively new and in vogue at the moment, very malleable and buggy(?) and has a HUGE corporate sponsor who dedicates lots of resources to it on a full time basis, whilst Lucene has been around in the ASF for 7+ years (and 12+ years total) and has a really large install base and thus must move more deliberately and basically has 1 person who gets to work on it full time while the rest of us pretty much volunteer?    That's not an excuse, it's just the way it is.  I personally, would love to work on Lucene all day every day as I have a lot of things I'd love to engage the community on, but the fact is I'm not paid to do that, so I give what I can when I can.  I know most of the other committers are that way too.

Thus, I don't think any one of us has a reluctance to move ahead with features or bug fixes.   Looking at CHANGES.txt, I see a lot of contributors.  Looking at java-dev and JIRA, I see lots of engagement with the community.  Is it near the historical high for traffic, no it's not, but that isn't necessarily a bad thing.  I think it's a sign that Lucene is pretty stable.

What we do have a reluctance for are patches that don't have tests (i.e. this one), patches that massively change Lucene APIs in non-trivial ways or break back compatibility or are not kept up to date.  Are we perfect?  Of course not.  I, personally, would love for there to be a way that helps us process a larger volume of patches (note, I didn't say commit a larger volume).  Hadoop's automated patch tester would be a huge start in that, but at the end of the day, Lucene still works the way all ASF projects do: via meritocracy and volunteerism.     You want stuff committed, keep it up to date, make it manageable to review, document it, respond to questions/concerns with answers as best you can.  To that end, a real simple question can go a long way and getting something committed, and it simply is:  "Hey Lucener's,  what else can I do to help you review and commit LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on the receiving end as a committer.

-Grant



Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Grant:
        I am sorry that I disagree with some points:

1) "I think it's a sign that Lucene is pretty stable." - Lucene is a
great project, and great improvements have been made, especially in the 2.x
releases, but do we really have a clear picture of how Lucene is being used
and deployed? While Lucene works great as a vanilla search library, when
pushed to its limits one needs to "hack" into Lucene to make certain things
work. If 90% of the user base uses it to build small indexes with the
vanilla API, and the other 10% is really stressing both scalability and the
API and is running into issues, would you still say: "running well
for 90% of the users, therefore it is stable or extensible"? I think it is
unfair to the project itself to measure it by the vanilla use case. I have
done a couple of large deployments, e.g. >30 million documents indexed and
searched in realtime, and I really had to do some tweaking.

2) "You want stuff committed, keep it up to date, make it manageable to
review, document it, respond to questions/concerns with answers as best you
can. " - To some degree I would hope it depends on what the issue is, e.g.
enforcing such a process on a one-line null check seems to be overkill. I
agree with the process itself; what would make it better is some
transparency on how patches/issues are evaluated for commit. At least
from the outside, it seems to be decided purely by the committers,
and since my understanding is that an open source project belongs to the
public, the public user base should have some say.

3) which brings me to this point: "I personally, would love to work on
Lucene all day every day as I have a lot of things I'd love to engage the
community on, but the fact is I'm not paid to do that, so I give what I can
when I can.  I know most of the other committers are that way too." - Is
this really true? Isn't a large part of the committer base also a part of
the for-profit consulting business, e.g. Lucid? Would groups/companies that
pay for consulting services get their patches/requirements committed with
higher priority? If so, that seems to me to be a conflict of interest.

4) "Lather, rinse, repeat.   Next thing you know, you'll be on the receiving
end as a committer." - While I agree that being a committer is a great honor
and many committers are awesome, assuming everyone would want to be a
committer is a little presumptuous.

In conclusion, I hope I didn't unleash any wrath from the committers for
expressing candor.

-John


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:

>
>
> Hoss wrote: "sort of mythical "Lucene powerhouse"
> Lucene seems to run itself quite differently than other open source  
> Java projects.  Perhaps it would be good to spell out the reasons  
> for the reluctance to move ahead with features that developers work  
> on, that work, but do not go in.  The developer contributions seem  
> to be quite low right now, especially compared to neighbor projects  
> such as Hadoop.  Is this because fewer people are using Lucene?  Or  
> is it due to the reluctance to work with the developer community?   
> Unfortunately the perception in the eyes of some people who work on  
> search related projects it is the latter.

Or, could it be that Hadoop is relatively new and in vogue at the  
moment, very malleable and buggy(?) and has a HUGE corporate sponsor  
who dedicates lots of resources to it on a full time basis, whilst  
Lucene has been around in the ASF for 7+ years (and 12+ years total)  
and has a really large install base and thus must move more  
deliberately and basically has 1 person who gets to work on it full  
time while the rest of us pretty much volunteer?    That's not an  
excuse, it's just the way it is.  I personally, would love to work on  
Lucene all day every day as I have a lot of things I'd love to engage  
the community on, but the fact is I'm not paid to do that, so I give  
what I can when I can.  I know most of the other committers are that  
way too.

Thus, I don't think any one of us has a reluctance to move ahead with  
features or bug fixes.   Looking at CHANGES.txt, I see a lot of  
contributors.  Looking at java-dev and JIRA, I see lots of engagement  
with the community.  Is it near the historical high for traffic?  No,
it's not, but that isn't necessarily a bad thing.  I think it's a sign
that Lucene is pretty stable.

What we do have a reluctance for are patches that don't have tests  
(i.e. this one), patches that massively change Lucene APIs in
non-trivial ways or break back compatibility or are not kept up to date.
Are we perfect?  Of course not.  I, personally, would love for there  
to be a way that helps us process a larger volume of patches (note, I  
didn't say commit a larger volume).  Hadoop's automated patch tester  
would be a huge start in that, but at the end of the day, Lucene still  
works the way all ASF projects do: via meritocracy and  
volunteerism.     You want stuff committed, keep it up to date, make  
it manageable to review, document it, respond to questions/concerns  
with answers as best you can.  To that end, a real simple question can
go a long way in getting something committed, and it simply is:  "Hey
Luceners, what else can I do to help you review and commit
LUCENE-XXXX?"  Lather, rinse, repeat.   Next thing you know, you'll be on the
receiving end as a committer.

-Grant


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652914#action_12652914 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

"This is a hard problem."

I disagree.  It's completely manageable.  Doesn't Hadoop handle versioning inside of Writable classes?

ScoreDocComparator javadoc "sortValue(ScoreDoc i) Returns the value used to sort the given document. The object returned *must implement the java.io.Serializable* interface. This is used by multisearchers to determine how to collate results from their searchers."

This kind of statement in the code leads one to believe that Lucene supports Serialization.  Maybe it should be removed from the Javadocs.  

"Thrift and ProtocolBuffers" don't support dynamic class loading.  If one were to create their own Query class with custom code, serializing is the only way to represent the Query object and have Java load the additional implementation code.  One easy to see use case is if Analyzer were made Serializable then indexing over the network and trying different analyzing techniques could be accomplished with ease in a grid computing environment.  

"representations for queries independent of Lucene's Query, and map this to Lucene's Query. Is that not workable in this case?"  

Mike wrote "if we add field X to a class implementing Serializable,
and must bump the SUID, that's a hard break on back compat. "

There needs to be "if statements" in readExternal to handle backwards compatibility.  Given the number of classes, and the number of fields this isn't very much work.  Neither are the test cases.  I worked on RMI and Jini at Sun and elsewhere.  I am happy to put forth the effort to maintain and develop this functionality.  It is advantageous to place this functionality directly into the classes because in my experience many of the Lucene classes do not make all of the field data public, and things like dedicated serialization such as the XML query code are voluminous.  Also the half support of serialization right now seems to indicate there really isn't support for it.  

Hoss wrote: "sort of mythical "Lucene powerhouse" 
Lucene seems to run itself quite differently than other open source Java projects.  Perhaps it would be good to spell out the reasons for the reluctance to move ahead with features that developers work on, that work, but do not go in.  The developer contributions seem to be quite low right now, especially compared to neighbor projects such as Hadoop.  Is this because fewer people are using Lucene?  Or is it due to the reluctance to work with the developer community?  Unfortunately, the perception in the eyes of some people who work on search-related projects is that it is the latter.

Many developers seem to be working outside of Lucene and choosing *not* to open source in order to avoid going through the current hassles of getting code committed to the project.  

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654108#action_12654108 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

{quote}
> In the past, ensuring backwards-compatibility was often the part of
> writing patches that took the longest and involved the most
> discussions.
{quote}

It very much still is, as I'm learning with LUCENE-1458!

Your first example is missing the read/writeExternal methods.

I think the proposed approach is rather heavy-weight -- we will have
implemented readExternal, writeExternal, this new
CustomExternalizableReader, package private init methods, make private
inner classes package private, the need to javadoc specifically the
current externalization format written for each of our classes, the
future need to help users to understand how they can achieve back
compatibility by subclassing CustomExternalizableReader, etc.

I guess my feeling is all of that is a good amount more work than just
deciding to directly implement back compatibility, ourselves.

EG, to do your example in a future world where we do support back
compat of serialized classes (NOTE -- none of the code below is
compiled/tested):

First a util class for managing versions:
{code}
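import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Versions {
  // Highest version registered so far; -1 until add() is first called
  private int current = -1;

  // Registers a new version and returns its number (0, 1, 2, ...)
  int add(String desc) {
    // TODO: do something more interesting with desc
    return ++current;
  }

  // Writes the newest registered version
  void write(ObjectOutput out) throws IOException {
    // TODO: writeVInt
    out.writeByte((byte) current);
  }

  // Reads and returns the stored version; rejects versions newer than this code
  int read(ObjectInput in) throws IOException {
    // TODO: readVInt
    final byte version = in.readByte();
    if (version > current)
      throw new IOException("this object was serialized by a newer version of Lucene (got " + version + " but expected <= " + current + ")");
    return version;
  }
}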
{code}

Then, someone creates SomeClass:

{code}
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SomeClass implements Externalizable {
  private int one;
  private int two;

  private static final Versions versions = new Versions();
  private static final int VERSION0 = versions.add("start");

  // Externalizable requires a public no-arg constructor
  public SomeClass() {}

  public void writeExternal(ObjectOutput out) throws IOException {
    versions.write(out);
    out.writeInt(one);
    out.writeInt(two);
  }

  public void readExternal(ObjectInput in) throws IOException {
    versions.read(in);
    one = in.readInt();
    two = in.readInt();
  }

  // ...
}
{code}

Then on adding field three:

{code}
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SomeClass implements Externalizable {
  private int one;
  private int two;
  private int three;

  private static final Versions versions = new Versions();
  private static final int VERSION0 = versions.add("start");
  private static final int VERSION1 = versions.add("the new field three");

  // Externalizable requires a public no-arg constructor
  public SomeClass() {}

  public void writeExternal(ObjectOutput out) throws IOException {
    versions.write(out);
    out.writeInt(one);
    out.writeInt(two);
    // new in VERSION1
    out.writeInt(three);
  }

  public void readExternal(ObjectInput in) throws IOException {
    int version = versions.read(in);
    one = in.readInt();
    two = in.readInt();
    if (version >= VERSION1) {
      three = in.readInt();
    } else {
      // default for streams written before field three existed
      three = -3;
    }
  }

  // ...
}
{code}
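
To make the read path concrete, here is a minimal self-contained sketch (JDK only) of newer code consuming a stream written in the older layout. The names VersionedPoint, VERSION_START, VERSION_THREE, oldFormatBytes and readFrom are hypothetical and belong to no Lucene class or attached patch; the version byte is written by hand here rather than through the Versions helper, purely to keep the example in one file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class; not part of Lucene.
class VersionedPoint implements Externalizable {
  static final int VERSION_START = 0;  // layout: one, two
  static final int VERSION_THREE = 1;  // layout: one, two, three
  static final int CURRENT = VERSION_THREE;

  int one;
  int two;
  int three;

  // Externalizable requires a public no-arg constructor
  public VersionedPoint() {}

  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeByte(CURRENT);
    out.writeInt(one);
    out.writeInt(two);
    out.writeInt(three);
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    int version = in.readByte();
    if (version > CURRENT)
      throw new IOException("serialized by a newer version: " + version);
    one = in.readInt();
    two = in.readInt();
    // Older streams do not carry field three, so fall back to a default
    three = (version >= VERSION_THREE) ? in.readInt() : -3;
  }

  // Builds the bytes an old (VERSION_START) writer would have produced
  static byte[] oldFormatBytes(int one, int two) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeByte(VERSION_START);
      out.writeInt(one);
      out.writeInt(two);
    }
    return bytes.toByteArray();
  }

  // Feeds raw bytes to readExternal of a fresh instance
  static VersionedPoint readFrom(byte[] data) throws IOException {
    VersionedPoint p = new VersionedPoint();
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      p.readExternal(in);
    }
    return p;
  }

  public static void main(String[] args) throws IOException {
    VersionedPoint p = readFrom(oldFormatBytes(1, 2));
    System.out.println(p.one + " " + p.two + " " + p.three);  // prints "1 2 -3"
  }
}
```

The point of the sketch is only that the version check lives in one place in readExternal, so each new field costs one extra branch on the read side.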

In fact I think we should switch to a Versions utility class for writing/reading our index files...



> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Wolf Siberski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655944#action_12655944 ] 

Wolf Siberski commented on LUCENE-1473:
---------------------------------------

Thanks to Doug and Jason for your constructive feedback. Let me first clarify the purpose and scope of the patch. IMHO, the discussion about Serialization in Lucene is not clear-cut at all. My opinion is that moving all distribution-related code out of the core leads to a cleaner separation of concerns and thus is better design. On the other hand, by removing Serializable we limit the Lucene application space at least a bit (e.g., no support for dynamic class loading), and abandon the advantages default Java serialization offers. Therefore the patch is to be taken as a contribution to explore the design space (as Michael's patch on custom readers explored the Serializable option), and not as a full-fledged solution proposal.

> [Doug] The removal of Serializable will break compatibility, so must be well-advertised.
Sure. I removed Serializable to catch all related errors; this was not meant as a proposal for a final patch.

>  [Doug] The Searchable API was designed for remote use and does not include HitCollector-based access.
Currently Searchable does include a HitCollector-based search method, although the comment says that 'HitCollector-based access to remote indexes is discouraged'. The only reason to provide an implementation is that I wanted to keep the Searchable contract. Is remote access the only purpose of Searchable/MultiSearcher? Is it ok to break compatibility with respect to these classes? IMHO a significant fraction of the current clumsiness in the remote package stems from my attempt to fully preserve the Searchable API.
 
>  [Doug] Weighting, and hence ranking, does not appear to be implemented correctly by this patch. 
True, I was a bit too fast here. We could either solve it along the lines you propose, or revert to passing the Weight again instead of the Query. The issue IMHO is orthogonal to the Serializable discussion and more related to the question of what a good remote search interface and protocol should look like.

> [Jason] Restricting people to XML will probably not be suitable though.
The patch does not limit serialization to XML. It just requires that encoding to and decoding from String is implemented, no matter how. I used XML/XStream as a proof-of-concept implementation, but don't propose to make XML mandatory. The main reason for introducing the Serializer interface was to emphasize that XML/XStream is just one implementation option. Actually, the current approach feels like at least one indirection more than required; for a final solution I would try to come up with a better design.
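
As a rough illustration of "encoding to and decoding from String, no matter how", the sketch below shows one possible shape for such a contract. Everything in it is an assumption made for the example: StringSerializer, FieldTerm and ColonSerializer are hypothetical names, and the Serializer interface in the attached patch may well have different methods and types.

```java
import java.util.Objects;

// Hypothetical contract: any reversible String encoding satisfies it.
interface StringSerializer<T> {
  String encode(T value);
  T decode(String encoded);
}

// Hypothetical stand-in for a query object; not a Lucene class.
final class FieldTerm {
  final String field;
  final String text;

  FieldTerm(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof FieldTerm)) return false;
    FieldTerm other = (FieldTerm) o;
    return field.equals(other.field) && text.equals(other.text);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, text);
  }
}

// One possible encoding, "field:text"; assumes the field name contains no ':'.
// An XML-based implementation could be plugged in behind the same interface.
class ColonSerializer implements StringSerializer<FieldTerm> {
  @Override
  public String encode(FieldTerm t) {
    return t.field + ":" + t.text;
  }

  @Override
  public FieldTerm decode(String s) {
    int i = s.indexOf(':');
    if (i < 0) throw new IllegalArgumentException("missing ':' in " + s);
    return new FieldTerm(s.substring(0, i), s.substring(i + 1));
  }
}
```

With a contract of this kind, the remote layer only ever sees Strings, and the choice of wire format stays outside the core.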

> [Jason] It seems the alternative solutions to serialization simply shift the problem around but do not really solve 
> the underlying issues (speed, versioning, writing custom serialization code, and perhaps dynamic classloading).
In a sense, the problem is indeed 'only' shifted around and not yet solved. The good thing about this shift is that Lucene core becomes decoupled from these issues. The only real limitation I see is that dynamic classloading can't be realized anymore. 

With respect to speed, I don't think that encoding/decoding is a significant performance factor in distributed search, but this would need to be benchmarked. With respect to versioning, my patch still keeps all options open. What is more important, Lucene users can now decide if they need compatibility between different versions, and roll their own encoding/decoding if they need it. Of course, if they are willing to contribute and maintain custom serializers which preserve back compatibility, they can do it in contrib as well as they could have done it in the core. Custom serialization is still possible although the standard Java serialization framework can't be used anymore for that purpose, and I admit that this is a disadvantage.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653869#action_12653869 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> How to write a unit test for multiple versions?

We can save, in files, serialized instances of each query type from the oldest release we intend to support.  Then read each of these queries and check that it is equal to a current query that's meant to be equivalent (assuming all queries implement equals well).  Something similar would need to be done for each class that is meant to be transmitted cross-version.

This tests that older queries may be processed by newer code.  It does not test that newer queries can be processed by older code.  Documentation is a big part of this effort, and should be completed first.  What guarantees do we intend to provide?  Once we've documented these, then we can begin writing tests.  For example, we may only guarantee that older queries work with newer code, and that newer hits work with older code.  To test that we'd need to have an old jar around that we could test against.  This will be a trickier test to configure.
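
A minimal sketch of the first kind of check (older bytes, newer code) might look like the following. It is only an outline under stated assumptions: SimpleTermQuery and CrossVersionCheck are hypothetical names rather than Lucene classes, and the stored bytes are produced in memory here, whereas a real test would read them from a file saved by the oldest supported release.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for a query class; not Lucene's TermQuery.
class SimpleTermQuery implements Serializable {
  // Pinned so that unrelated edits to the class do not change the stream identity
  private static final long serialVersionUID = 1L;
  private final String field;
  private final String text;

  SimpleTermQuery(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SimpleTermQuery)) return false;
    SimpleTermQuery other = (SimpleTermQuery) o;
    return field.equals(other.field) && text.equals(other.text);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, text);
  }
}

class CrossVersionCheck {
  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  // 'stored' stands for the contents of a file written by the old release
  static boolean matchesCurrent(byte[] stored, Object expected)
      throws IOException, ClassNotFoundException {
    return expected.equals(deserialize(stored));
  }

  public static void main(String[] args) throws Exception {
    byte[] stored = serialize(new SimpleTermQuery("body", "lucene"));
    System.out.println(matchesCurrent(stored, new SimpleTermQuery("body", "lucene")));
  }
}
```

The reverse direction (newer bytes, older code) cannot be covered this way, since it needs the old jar on the classpath, as noted above.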


> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652591#action_12652591 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

The attached patch optimizes java serialization.  Also, if we want java serialization to work cross-version, it gives us a leg to stand on.  It doesn't change anything for most Lucene users, since Lucene doesn't use java serialization except for RMI.  So, today, if you're using RemoteSearcher, and the local and remote versions have different versions of Term.java, things will probably fail, since, by default, the serialVersionUID is a hash of the method signatures and fields.  If we want RemoteSearcher to work cross-version, then we need to explicitly manage the serialVersionUID and readExternal/writeExternal implementations.  But is that really a priority?
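
As a small illustration of the serialVersionUID point, the sketch below compares a class that relies on the computed default with one that pins the value explicitly. ImplicitTerm, PinnedTerm and SuidDemo are hypothetical names used only for this example and are not Lucene classes; ObjectStreamClass.lookup is the standard JDK call.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// No explicit serialVersionUID: the JDK computes one from the class shape,
// so adding a field or method later changes it.
class ImplicitTerm implements Serializable {
  String field;
  String text;
}

// Explicit serialVersionUID: stays the same until someone edits it.
class PinnedTerm implements Serializable {
  private static final long serialVersionUID = 1L;
  String field;
  String text;
}

class SuidDemo {
  // Returns the serialVersionUID that Java serialization will use for c
  static long suidOf(Class<?> c) {
    return ObjectStreamClass.lookup(c).getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println("implicit: " + suidOf(ImplicitTerm.class));
    System.out.println("pinned:   " + suidOf(PinnedTerm.class));
  }
}
```

Pinning the value only keeps the stream identity stable; whether old and new field layouts are actually compatible still has to be handled separately.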

As with all optimizations, the performance improvement should be measured and demonstrated to be significant.  Also, an invasive change like this is hard to justify when so little of Lucene depends on java serialization.



> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652878#action_12652878 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

> the performance improvement should be measured and demonstrated to be significant

The initial concern was the incompatibility of serialized objects between Lucene versions.  The performance improvements created by using Externalizable are secondary and so providing tests would be a waste of time if the "community" believes it is too much effort to add *1 line* of code to a handful of classes.  Implementing Externalizable is a way to reduce the size of the serialized objects, manage the serialized object versions, and provide performance improvements.  Externalizable provides the most benefits and is very similar to the system Hadoop uses with Writable.  Externalizable works seamlessly with native object serialization and Serializable implemented classes, meaning it works with a number of existing Java classes in addition to Externalizable classes.  

Using distributed serialized objects for search in Lucene is a natural Java based way to run a Lucene system.  In many cases it is ideal because Java provides something C++ does not, dynamic in-process class loading.  In a large grid based search system that requires 100% uptime this feature can be particularly useful.  

Adding a serialVersionUID to the classes is one option, adding Externalizable is another option.  

If the decision is to not support Serialization in Lucene then I recommend removing Serializable from all classes in Lucene 3.0 so that users do not mistakenly expect the search library to behave the way other Java libraries such as ICU4J, JDK class libraries, Spring, etc do.

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652965#action_12652965 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> I'm not sure why you and Doug are focusing on performance when that is not really the main issue I brought up.

In the description of this issue you claim, "This will make Serialization faster".  I only added that, if this is a motivation, then it should be benchmarked and quantified.  If it's not a motivation, and cross-version compatibility is the only motivation, then that should be clarified.

> dynamic classloading is being ignored by you folks

Perhaps most Lucene developers are not using dynamic class loading, just as most Lucene developers seem not to be relying on java serialization.  Perhaps you can convince them to start using it, but if not, you may need to find a way to get Lucene to support it that minimally impacts other users of Lucene.  Adding methods to Term and Query that must be changed whenever these classes change adds a cost.  If folks don't see much benefit, then that cost outweighs.  Perhaps you can better enlighten us to the benefits rather than assert willful ignorance?  Jini promised great things with dynamic class loading, but the list of folks that use Jini is not long (http://www.jini.org/wiki/Who_uses_Jini%3F).

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655806#action_12655806 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

Thanks, Wolf, this looks like a promising approach.

Jason, John: would this sort of thing meet your needs?

I'm not sure we can remove everything from trunk immediately.  Rather we should deprecate things and remove them in 3.0.  The removal of Serializable will break compatibility, so it must be well-advertised.

HitCollector-based search should simply not be supported in distributed search.  The Searchable API was designed for remote use and does not include HitCollector-based access.

Weighting, and hence ranking, does not appear to be implemented correctly by this patch.  An approach that might work would be to:
  - extend MultiSearcher
  - pass its CachedDfSource to remote searchers along with queries
  - construct a Weight on the search node using the CachedDfSource
Does that make sense?
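
A minimal sketch of why the document-frequency source has to travel with the query: each search node only knows its local statistics, so ranking stays comparable across nodes only if weights are computed from merged, global numbers. The class and method names below (GlobalDfSketch, NodeStats, merge, idf) are hypothetical and not part of Lucene, and the idf formula is assumed to be the classic log(numDocs / (docFreq + 1)) + 1 form.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlobalDfSketch {
    /** Hypothetical per-node statistics: document count plus docFreq per term. */
    static final class NodeStats {
        final int maxDoc;
        final Map<String, Integer> docFreq;

        NodeStats(int maxDoc, Map<String, Integer> docFreq) {
            this.maxDoc = maxDoc;
            this.docFreq = docFreq;
        }
    }

    /** Sums document counts and per-term document frequencies over all nodes. */
    static NodeStats merge(List<NodeStats> nodes) {
        int maxDoc = 0;
        Map<String, Integer> merged = new HashMap<>();
        for (NodeStats node : nodes) {
            maxDoc += node.maxDoc;
            for (Map.Entry<String, Integer> entry : node.docFreq.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return new NodeStats(maxDoc, merged);
    }

    /** Assumed classic idf: log(numDocs / (docFreq + 1)) + 1. */
    static double idf(NodeStats stats, String term) {
        int df = stats.docFreq.getOrDefault(term, 0);
        return Math.log(stats.maxDoc / (double) (df + 1)) + 1.0;
    }
}
```

If every node computed idf from its own NodeStats instead of the merged one, the same term would receive a different weight on each node and the merged hit list would not be ranked consistently.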


> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Wolf Siberski (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655676#action_12655676 ] 

Wolf Siberski commented on LUCENE-1473:
---------------------------------------

This seems to be the right way to go. The attached patch removes all dependencies on Serializable and Remote from the core and moves them to contrib/remote. I introduced a new interface, RemoteSearcher (not RemoteSearchable, because I didn't want to pass Weights around), implemented by DefaultRemoteSearcher. An adapter realizing Searchable and delegating to RemoteSearcher is also included (RemoteSearcherAdapter). Encoding/decoding of Lucene objects is delegated to org.apache.lucene.remote.Serializer. For a sample serialization, I employed XStream, which offers XML serialization (nearly) out of the box.
Everything is rather undocumented and would need a lot of cleanup, but as a proof of concept it should be OK. Core and remote tests pass, with one exception: it is no longer possible to serialize a RAMDirectory.
What I don't like about the current patch is that a lot of different objects are passed around to keep the Searchable interface alive. Would it be possible to refactor such that Searchable represents a higher-level interface (or introduce a new alternative abstraction)?

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652882#action_12652882 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> But, what's now being asked for (expected) with this issue is "long-term persistence", which is really a very different beast and a much taller order.

That's the crux, alright.  Does Lucene want to start adding cross-version guarantees about the durability of its objects when serialized by Java serialization?  This is a hard problem.  Systems like Thrift and ProtocolBuffers offer support for this, but Java Serialization itself doesn't really provide much assistance.  One can roll one's own serialization compatibility story manually, as proposed by this patch, but that adds a burden to the project.  We'd need, for example, test cases that keep serialized instances from past versions, so that we can be sure that patches do not break this.

The use case provided may not use RMI, but it is similar: it involves transmitting Lucene objects over the wire between different versions of Lucene.  Since Java APIs, like Lucene, do not generally provide cross-version compatibility, it would be safer to architect such a system so that it controls the serialization of transmitted instances itself and can thus guarantee their compatibility as the system is updated.  Thus it would develop its own representations for queries independent of Lucene's Query, and map this to Lucene's Query.  Is that not workable in this case?
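
A minimal sketch of such an application-owned representation, assuming a single term query is all that needs to cross the wire. The names (QueryWireSketch, WireTermQuery, FORMAT_VERSION) are hypothetical; the receiving node would map the decoded object onto whatever Lucene Query its local Lucene version provides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class QueryWireSketch {
    /** Version of this application's wire format, independent of the Lucene version. */
    static final int FORMAT_VERSION = 1;

    /** Hypothetical application-level query, not a Lucene class. */
    static final class WireTermQuery {
        final String field;
        final String text;
        final float boost;

        WireTermQuery(String field, String text, float boost) {
            this.field = field;
            this.text = text;
            this.boost = boost;
        }
    }

    /** Writes the format version first so a receiver can reject or adapt. */
    static byte[] encode(WireTermQuery query) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(FORMAT_VERSION);
        out.writeUTF(query.field);
        out.writeUTF(query.text);
        out.writeFloat(query.boost);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the fields back in the order they were written. */
    static WireTermQuery decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int version = in.readInt();
        if (version != FORMAT_VERSION) {
            throw new IOException("Unsupported query format version: " + version);
        }
        String field = in.readUTF();
        String text = in.readUTF();
        float boost = in.readFloat();
        return new WireTermQuery(field, text, boost);
    }
}
```

Because the bytes depend only on this format and not on the layout of any Lucene class, nodes running different Lucene versions can exchange them during a partial roll-out.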


> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652546#action_12652546 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

Do we really need to write the serialVersionUID?  That's adding 8 bytes to the storage of each term.

The term storage is not particularly efficient when storing many terms in the same field because, e.g., the String field is not written as an intern'd string.

Also I see many tests failing with this, eg TestBoolean2 -- I think we'll have to add:

   public Term() {}

so deserialization can work?  Which is then sort of annoying because it means it's possible to create a Term with null field & text (though, you can do that anyway by passing in "null" for each).
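
To illustrate the constructor requirement, here is a minimal sketch using a hypothetical SimpleTerm class (a stand-in, not Lucene's Term). Java's Externalizable contract has the stream instantiate the object through its public no-arg constructor and then call readExternal, which is why the fields cannot be final and why an "empty" instance becomes constructible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableTermSketch {
    /** Hypothetical stand-in for Term; assumes non-null field and text. */
    public static class SimpleTerm implements Externalizable {
        private String field;
        private String text;

        /** Required: the stream creates the instance here before readExternal runs. */
        public SimpleTerm() {}

        public SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String field() { return field; }
        public String text() { return text; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    /** Serializes and deserializes a term through an in-memory buffer. */
    static SimpleTerm roundTrip(SimpleTerm term) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(term);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SimpleTerm) in.readObject();
        }
    }
}
```

Without the public no-arg constructor, readObject fails with an InvalidClassException, which would be consistent with the test failures described above.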

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653001#action_12653001 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

The discussion has evolved out of scope.  Cross-version compatibility is the main goal.  We have multiple versions of Lucene using the Spring RPC protocol.  The standard way to solve this is to add a serialVersionUID like HashMap, String, and the other Java classes do.  
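
As a minimal sketch of what that means in practice, the hypothetical classes below (not Lucene classes) differ only in whether they declare serialVersionUID. With an explicit value the stream identity stays fixed as long as the developer keeps it; without one, the JVM derives the value from the class structure, so a compatible-looking change such as adding a method can alter it and make older serialized instances unreadable.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionSketch {
    /** Hypothetical query-like class with a pinned stream version. */
    static class PinnedQuery implements Serializable {
        private static final long serialVersionUID = 1L;
        String field;
        String text;
    }

    /** Same shape without an explicit UID: the value is computed from the class structure. */
    static class UnpinnedQuery implements Serializable {
        String field;
        String text;
    }

    /** Returns the serialVersionUID the serialization runtime will use for the class. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

Calling uidOf(PinnedQuery.class) returns the declared 1L, while the value for UnpinnedQuery is a hash that depends on the compiled class and is not something a caller should rely on across versions.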

Performance is a topic of concern for Lucene users and existing RMI/Serialization users would transparently benefit by Externalizable being used.  

I would like to implement this as a separate project, however performing reflection on each of the objects is not an efficient approach.  Writing wrapper code for each and every variation of Query is a waste of time when it implements Serializable and the communication is between Java systems. 

It seems best to remove Serialization from Lucene, so that users are not confused, and to create a better solution.

> Perhaps most Lucene developers are not using dynamic class loading,

Dynamic classloading is popular and accepted in J2EE servers.  If done properly it is a very convenient way of deploying Java based systems.  Jini did not make this convenient enough.  Jini did not have a specific problem it was trying to solve and so was too complex and open ended.  Spring and J2EE make use of Java serialization for distributed objects.  People may not be using Lucene for this today but this is due largely to lack of support for things like standard Serialization.  With Lucene it is possible to make dynamic search classloading convenient in a search grid environment.  

When J2EE was designed, no one was using Java on the server side.  A framework was composed, a handful of companies implemented the specification, and it then found its way into projects.  If you are looking for users to ask for something like this that does not yet exist, it will not happen.  

The interface Lucene exposes is relatively static and known.  All of the top level classes, Query, Analyzer, Document, Similarity do not change very often.  In a search based grid computing environment, the ability to execute arbitrary code against the cloud saves time and effort in deploying new code to servers.  Restarting processes in a production environment is always expensive.  If one is implementing for example mutating genetic algorithms using Java over Lucene then it would be advantageous to dynamically load the classes that implement this.  They would be modifications of several classes such as Query, Similarity.  

It is peculiar that so much effort goes into backwards compatibility of the index, while for Serialization it is ignored.  This is and will be very confusing to users, especially ones who use J2EE.

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652940#action_12652940 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

Jason, you are only partially correct.

SOLR has XML definitions for updates/commits/deletes. It also supports string based queries. It also supports using XML for queries if you provide a handler, but the string syntax is simpler.

As for the serialization performance, you are mistaken.

For things like searches, the parsing time is extremely minor compared with the search time, so this additional overhead would be a fraction of the cost.

When returning result sets, using XML can make a huge difference, as the overhead is paid on every item.  Even so, with modern XML processors, the search time is still going to be the overriding performance factor by a huge margin.  Typically "paged" results are also used, so again the XML parsing, compared to the network overhead, is going to be minor.

Still, if it is only for temporary serialization, binary works best - as long as it is Java to Java.

We have a search server that uses Java serialization for the message passing, including the results. It can be done without any changes to Lucene - again the overriding performance factor is the search itself (unless the queries return 100k+ documents and all are being returned - then the Java serialization time can be more than the search itself).



> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653735#action_12653735 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

SerializeUtils is missing from the patch.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652892#action_12652892 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

With regard to Doug's comment about an alternate form... doesn't SOLR already have an XML-based query format?

If so, just persist the queries using this. You will be immune to serialization changes (provided the SOLR parser remains backwards compatible).

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by markharw00d <ma...@yahoo.co.uk>.

> The problem with that is that in most cases you still need a "string" 
> based syntax that "people" can enter...

The XML syntax includes a <UserQuery> tag for embedding user input of 
this type.
>
> I guess you can always have an "advanced search" page that builds and 
> submits the XML query behind the scenes.

Contrib now includes a worked demo web app showing how a very typical 
search form is converted into XML using XSL.
User input is a mixture of edit boxes for classic QueryParser syntax 
used on free-text fields but also includes drop-downs and checkboxes etc 
that map to other non-free-text fields.

Cheers
Mark





Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by robert engels <re...@ix.netcom.com>.

I only meant it from a persistence standpoint - if you need a full
"human enterable" query syntax anyway, why not just use that as the
persistence format?

On Dec 8, 2008, at 4:53 PM, Earwin Burrfoot wrote:

> Building your own parser with Antlr is really easy. Using Ragel is
> harder, but yields insane parsing performance.
> Is there any reason to worry about library-bundled parsers if you're
> making something more complex than a college project?
>
> On Tue, Dec 9, 2008 at 01:49, robert engels <re...@ix.netcom.com>  
> wrote:
>> The problem with that is that in most cases you still need a  
>> "string" based
>> syntax that "people" can enter...
>>
>> I guess you can always have an "advanced search" page that builds and
>> submits the XML query behind the scenes.
>>
>>
>>
>> On Dec 8, 2008, at 4:40 PM, Erik Hatcher wrote:
>>
>>> Well, there's the pretty sophisticated and extensible XML query  
>>> parser in
>>> contrib.  I've still only scratched the surface of it, but it  
>>> meets the
>>> specs you mentioned.
>>>
>>>        Erik
>>>
>>>
>>> On Dec 8, 2008, at 4:51 PM, robert engels wrote:
>>>
>>>> I think an important piece to make this work is the query parser/syntax.
>>>>
>>>> We already have a system similar to what is outlined below.  We made
>>>> changes to the query syntax to support our various query extensions.
>>>>
>>>> The nice thing is that a persisted query is just a simple string.  It also
>>>> makes it very easy for external systems to submit queries.
>>>>
>>>> We also have XML definitions for a "result set".
>>>>
>>>> I think the only way to make this work though, is probably a more
>>>> detailed query syntax (similar to SQL), so that it can be easily extended
>>>> with new clauses/functions without breaking existing code.
>>>>
>>>> I would also suggest that any core query classes have a representation
>>>> here.
>>>>
>>>> I would also like to see a way for "proprietary" clauses to be supported
>>>> (like calls in SQL).
>>>>
>>>> On Dec 8, 2008, at 3:37 PM, eks dev wrote:
>>>>
>>>>> That sounds much better. Trying to distribute Lucene (my reason why all
>>>>> this would be interesting) itself is just not going to work for far too
>>>>> many applications and will put a burden on API extensions.
>>>>>
>>>>> My point is, I do not want to distribute the Lucene index, I need to
>>>>> distribute my application that is using Lucene. Think of it like having
>>>>> a distributed Luke: useful by itself, but not really useful for slightly
>>>>> more complex use cases.
>>>>> My Hit class is a specialized Lucene Hit object, my Query has totally
>>>>> different features and aggregates a Lucene Query... this is what I can
>>>>> control, what I need to send over the wire, and that is the place where
>>>>> I define what my Version/API is. If Lucene API classes change and all
>>>>> existing features remain, I have no problems in keeping my serialized
>>>>> objects compatible.  So the versioning comes under my control; Lucene
>>>>> provides only features, as a library.
>>>>>
>>>>> Having a light, easily extensible layer on top of the core API would be
>>>>> just great. As far as I am concerned, Java serialization is not my world;
>>>>> having something light and extensible in the etch/thrift/hadoop
>>>>> IPC/ProtocolBuffers direction is much more thrilling. That is exactly the
>>>>> road hadoop, nutch, katta and probably many others are taking, so having
>>>>> a common base that supports such cases is maybe a good idea. Why not make
>>>>> RemoteSearchable use hadoop IPC, or etch/thrift ...
>>>>>
>>>>> Maybe there are other reasons to support Java serialization, I do not
>>>>> know. Just painting one view on this idea.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>>>
>>>>>> From: Doug Cutting (JIRA) <ji...@apache.org>
>>>>>> To: java-dev@lucene.apache.org
>>>>>> Sent: Monday, 8 December, 2008 19:52:46
>>>>>> Subject: [jira] Commented: (LUCENE-1473) Implement standard
>>>>>> Serialization across Lucene versions
>>>>>>
>>>>>>
>>>>>>   [
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513
>>>>>> ]
>>>>>>
>>>>>> Doug Cutting commented on LUCENE-1473:
>>>>>> --------------------------------------
>>>>>>
>>>>>> Would it take any more lines of code to remove Serializable from the
>>>>>> core classes and re-implement RemoteSearchable in a separate layer on
>>>>>> top of the core APIs?  That layer could be a contrib module and could
>>>>>> get all the Externalizable love it needs.  It could support a specific
>>>>>> popular subset of query and filter classes, rather than arbitrary Query
>>>>>> implementations.  It would be extensible, so that if folks wanted to
>>>>>> support new kinds of queries, they easily could.  This other approach
>>>>>> seems like a slippery slope, complicating already complex code with new
>>>>>> concerns.  It would be better to encapsulate these concerns in a layer
>>>>>> atop APIs whose back-compatibility we already make promises about, no?
>>>>>>
>>>>>>> Implement standard Serialization across Lucene versions
>>>>>>> -------------------------------------------------------
>>>>>>>
>>>>>>>               Key: LUCENE-1473
>>>>>>>               URL: https://issues.apache.org/jira/browse/LUCENE-1473
>>>>>>>           Project: Lucene - Java
>>>>>>>        Issue Type: Bug
>>>>>>>        Components: Search
>>>>>>>  Affects Versions: 2.4
>>>>>>>          Reporter: Jason Rutherglen
>>>>>>>          Priority: Minor
>>>>>>>       Attachments: custom-externalizable-reader.patch,
>>>>>>> LUCENE-1473.patch,
>>>>>>
>>>>>> LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>>>>>>>
>>>>>>>  Original Estimate: 8h
>>>>>>> Remaining Estimate: 8h
>>>>>>>
>>>>>>> To maintain serialization compatibility between Lucene versions,
>>>>>>
>>>>>> serialVersionUID needs to be added to classes that implement
>>>>>> java.io.Serializable.  java.io.Externalizable may be  
>>>>>> implemented in
>>>>>> classes for
>>>>>> faster performance.
>>>>>>
>
>
>
> -- 
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785



Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by Earwin Burrfoot <ea...@gmail.com>.

Building your own parser with Antlr is really easy. Using Ragel is
harder, but yields insane parsing performance.
Is there any reason to worry about library-bundled parsers if you're
making something more complex than a college project?

On Tue, Dec 9, 2008 at 01:49, robert engels <re...@ix.netcom.com> wrote:
> The problem with that is that in most cases you still need a "string" based
> syntax that "people" can enter...
>
> I guess you can always have an "advanced search" page that builds and
> submits the XML query behind the scenes.
>
>
>
> On Dec 8, 2008, at 4:40 PM, Erik Hatcher wrote:
>
>> Well, there's the pretty sophisticated and extensible XML query parser in
>> contrib.  I've still only scratched the surface of it, but it meets the
>> specs you mentioned.
>>
>>        Erik
>>
>>
>> On Dec 8, 2008, at 4:51 PM, robert engels wrote:
>>
>>> I think an important piece to make this work is the query parser/syntax.
>>>
>>> We already have a system similar to what is outlined below.  We made
>>> changes to the query syntax to support our various query extensions.
>>>
>>> The nice thing is that a persisted query is just a simple string.  It also
>>> makes it very easy for external systems to submit queries.
>>>
>>> We also have XML definitions for a "result set".
>>>
>>> I think the only way to make this work though, is probably a more
>>> detailed query syntax (similar to SQL), so that it can be easily extended
>>> with new clauses/functions without breaking existing code.
>>>
>>> I would also suggest that any core query classes have a representation
>>> here.
>>>
>>> I would also like to see a way for "proprietary" clauses to be supported
>>> (like calls in SQL).
>>>
>>> On Dec 8, 2008, at 3:37 PM, eks dev wrote:
>>>
>>>> That sounds much better. Trying to distribute Lucene (my reason why all
>>>> this would be interesting) itself is just not going to work for far too
>>>> many applications and will put a burden on API extensions.
>>>>
>>>> My point is, I do not want to distribute the Lucene index, I need to
>>>> distribute my application that is using Lucene. Think of it like having
>>>> a distributed Luke: useful by itself, but not really useful for slightly
>>>> more complex use cases.
>>>> My Hit class is a specialized Lucene Hit object, my Query has totally
>>>> different features and aggregates a Lucene Query... this is what I can
>>>> control, what I need to send over the wire, and that is the place where
>>>> I define what my Version/API is. If Lucene API classes change and all
>>>> existing features remain, I have no problems in keeping my serialized
>>>> objects compatible.  So the versioning comes under my control; Lucene
>>>> provides only features, as a library.
>>>>
>>>> Having a light, easily extensible layer on top of the core API would be
>>>> just great. As far as I am concerned, Java serialization is not my world;
>>>> having something light and extensible in the etch/thrift/hadoop
>>>> IPC/ProtocolBuffers direction is much more thrilling. That is exactly the
>>>> road hadoop, nutch, katta and probably many others are taking, so having
>>>> a common base that supports such cases is maybe a good idea. Why not make
>>>> RemoteSearchable use hadoop IPC, or etch/thrift ...
>>>>
>>>> Maybe there are other reasons to support Java serialization, I do not
>>>> know. Just painting one view on this idea.
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message ----
>>>>>
>>>>> From: Doug Cutting (JIRA) <ji...@apache.org>
>>>>> To: java-dev@lucene.apache.org
>>>>> Sent: Monday, 8 December, 2008 19:52:46
>>>>> Subject: [jira] Commented: (LUCENE-1473) Implement standard
>>>>> Serialization across Lucene versions
>>>>>
>>>>>
>>>>>   [
>>>>>
>>>>> https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513
>>>>> ]
>>>>>
>>>>> Doug Cutting commented on LUCENE-1473:
>>>>> --------------------------------------
>>>>>
>>>>> Would it take any more lines of code to remove Serializeable from the
>>>>> core
>>>>> classes and re-implement RemoteSearchable in a separate layer on top of
>>>>> the core
>>>>> APIs?  That layer could be a contrib module and could get all the
>>>>> externalizeable love it needs.  It could support a specific popular
>>>>> subset of
>>>>> query and filter classes, rather than arbitrary Query implementations.
>>>>>  It would
>>>>> be extensible, so that if folks wanted to support new kinds of queries,
>>>>> they
>>>>> easily could.  This other approach seems like a slippery slope,
>>>>> complicating
>>>>> already complex code with new concerns.  It would be better to
>>>>> encapsulate these
>>>>> concerns in a layer atop APIs whose back-compatibility we already make
>>>>> promises
>>>>> about, no?
>>>>>
>>>>>> Implement standard Serialization across Lucene versions
>>>>>> -------------------------------------------------------
>>>>>>
>>>>>>               Key: LUCENE-1473
>>>>>>               URL: https://issues.apache.org/jira/browse/LUCENE-1473
>>>>>>           Project: Lucene - Java
>>>>>>        Issue Type: Bug
>>>>>>        Components: Search
>>>>>>  Affects Versions: 2.4
>>>>>>          Reporter: Jason Rutherglen
>>>>>>          Priority: Minor
>>>>>>       Attachments: custom-externalizable-reader.patch,
>>>>>> LUCENE-1473.patch,
>>>>>
>>>>> LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>>>>>>
>>>>>>  Original Estimate: 8h
>>>>>> Remaining Estimate: 8h
>>>>>>
>>>>>> To maintain serialization compatibility between Lucene versions,
>>>>>
>>>>> serialVersionUID needs to be added to classes that implement
>>>>> java.io.Serializable.  java.io.Externalizable may be implemented in
>>>>> classes for
>>>>> faster performance.
>>>>>
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> -
>>>>> You can reply to this email to add a comment to the issue online.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by robert engels <re...@ix.netcom.com>.

The problem with that is that in most cases you still need a "string"  
based syntax that "people" can enter...

I guess you can always have an "advanced search" page that builds and  
submits the XML query behind the scenes.



On Dec 8, 2008, at 4:40 PM, Erik Hatcher wrote:

> Well, there's the pretty sophisticated and extensible XML query  
> parser in contrib.  I've still only scratched the surface of it,  
> but it meets the specs you mentioned.
>
> 	Erik



Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Well, there's the pretty sophisticated and extensible XML query parser  
in contrib.  I've still only scratched the surface of it, but it meets  
the specs you mentioned.

	Erik


On Dec 8, 2008, at 4:51 PM, robert engels wrote:

> I think an important piece to make this work is the query
> parser/syntax.
>
> We already have a system similar to what is outlined below.  We made
> changes to the query syntax to support our various query extensions.
>
> The nice thing is that a persisted query is a simple string.  It
> also makes it very easy for external systems to submit queries.
>
> We also have XML definitions for a "result set".
>
> I think the only way to make this work, though, is probably a more
> detailed query syntax (similar to SQL), so that it can be easily
> extended with new clauses/functions without breaking existing code.
>
> I would also suggest that any core query classes have a
> representation here.
>
> I would also like to see a way for "proprietary" clauses to be
> supported (like calls in SQL).



Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by robert engels <re...@ix.netcom.com>.

I think an important piece to make this work is the query parser/syntax.

We already have a system similar to what is outlined below.  We made  
changes to the query syntax to support our various query extensions.

The nice thing is that a persisted query is a simple string.  It
also makes it very easy for external systems to submit queries.

We also have XML definitions for a "result set".

I think the only way to make this work, though, is probably a more
detailed query syntax (similar to SQL), so that it can be easily
extended with new clauses/functions without breaking existing code.

I would also suggest that any core query classes have a
representation here.

I would also like to see a way for "proprietary" clauses to be  
supported (like calls in SQL).
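
A minimal sketch of the idea above, under stated assumptions: the persisted query is a plain string made of function-style clauses such as "term(title, lucene)", and new or "proprietary" clauses are added by registering a handler under a name. ClauseRegistry, ClauseSyntaxDemo and the "term"/"near" clause names are hypothetical and not part of Lucene; the handlers return a descriptive string standing in for a real Query object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry of clause handlers, keyed by clause name.
class ClauseRegistry {
    private final Map<String, Function<List<String>, String>> handlers = new HashMap<>();

    // Adding a clause never touches the parser, so existing strings keep working.
    void register(String name, Function<List<String>, String> handler) {
        handlers.put(name, handler);
    }

    // Parses "name(arg1, arg2)" and dispatches to the handler registered for "name".
    String parse(String clause) {
        int open = clause.indexOf('(');
        int close = clause.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Malformed clause: " + clause);
        }
        String name = clause.substring(0, open).trim();
        Function<List<String>, String> handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown clause: " + name);
        }
        List<String> arguments = new ArrayList<>();
        String inner = clause.substring(open + 1, close).trim();
        if (!inner.isEmpty()) {
            for (String argument : inner.split(",")) {
                arguments.add(argument.trim());
            }
        }
        return handler.apply(arguments);
    }
}

class ClauseSyntaxDemo {
    static ClauseRegistry newRegistry() {
        ClauseRegistry registry = new ClauseRegistry();
        // A "core" clause; the returned string stands in for a real query object.
        registry.register("term", a -> "TermQuery(" + a.get(0) + ":" + a.get(1) + ")");
        // A "proprietary" clause added by an application (hypothetical).
        registry.register("near", a -> "ProprietaryNear(" + String.join(",", a) + ")");
        return registry;
    }

    public static void main(String[] args) {
        String persisted = "term(title, lucene)"; // what would be stored or sent
        System.out.println(newRegistry().parse(persisted));
    }
}
```

The point of the design is that the wire and storage format is only the string; which classes end up representing the query is decided by whichever side parses it.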

On Dec 8, 2008, at 3:37 PM, eks dev wrote:

> That sounds much better. Trying to distribute Lucene itself (my
> reason why all this would be interesting) is just not going to work
> for far too many applications, and it will put a burden on API
> extensions.
>
> My point is, I do not want to distribute a Lucene index; I need to
> distribute my application that uses Lucene. Think of it like having
> a distributed Luke: useful by itself, but not really useful for
> slightly more complex use cases.
> My Hit class is a specialized Lucene Hit object, my Query has
> totally different features and aggregates a Lucene Query... This is
> what I can control and what I need to send over the wire, and that
> is the place where I define my Version/API. If the Lucene API
> classes change and all existing features remain, I have no problem
> keeping my serialized objects compatible.  So the versioning comes
> under my control; Lucene provides only the features, as a library.
>
> Having a light, easily extensible layer on top of the core API would
> be just great. As far as I am concerned, Java serialization is not
> my world; having something light and extensible in the
> Etch/Thrift/Hadoop IPC/Protocol Buffers direction is much more
> thrilling. That is exactly the road Hadoop, Nutch, Katta and
> probably many others are taking, so having a common base that
> supports such cases is maybe a good idea. Why not make
> RemoteSearchable use Hadoop IPC, or Etch/Thrift ...?
>
> Maybe there are other reasons to support Java serialization, I do
> not know. Just painting one view on this idea.



Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by eks dev <ek...@yahoo.co.uk>.

That sounds much better. Trying to distribute Lucene itself (my reason why all this would be interesting) is just not going to work for far too many applications, and it will put a burden on API extensions.

My point is, I do not want to distribute a Lucene index; I need to distribute my application that uses Lucene. Think of it like having a distributed Luke: useful by itself, but not really useful for slightly more complex use cases.
My Hit class is a specialized Lucene Hit object, my Query has totally different features and aggregates a Lucene Query... This is what I can control and what I need to send over the wire, and that is the place where I define my Version/API. If the Lucene API classes change and all existing features remain, I have no problem keeping my serialized objects compatible.  So the versioning comes under my control; Lucene provides only the features, as a library.

Having a light, easily extensible layer on top of the core API would be just great. As far as I am concerned, Java serialization is not my world; having something light and extensible in the Etch/Thrift/Hadoop IPC/Protocol Buffers direction is much more thrilling. That is exactly the road Hadoop, Nutch, Katta and probably many others are taking, so having a common base that supports such cases is maybe a good idea. Why not make RemoteSearchable use Hadoop IPC, or Etch/Thrift ...?

Maybe there are other reasons to support Java serialization, I do not know. Just painting one view on this idea.




----- Original Message ----
> From: Doug Cutting (JIRA) <ji...@apache.org>
> To: java-dev@lucene.apache.org
> Sent: Monday, 8 December, 2008 19:52:46
> Subject: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
> 
> 
>     [ 
> https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 
> ] 
> 
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
> 
> Would it take any more lines of code to remove Serializable from the core
> classes and re-implement RemoteSearchable in a separate layer on top of the core
> APIs?  That layer could be a contrib module and could get all the
> Externalizable love it needs.  It could support a specific popular subset of
> query and filter classes, rather than arbitrary Query implementations.  It would 
> be extensible, so that if folks wanted to support new kinds of queries, they 
> easily could.  This other approach seems like a slippery slope, complicating 
> already complex code with new concerns.  It would be better to encapsulate these 
> concerns in a layer atop APIs whose back-compatibility we already make promises 
> about, no?



      


Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by Grant Ingersoll <gs...@apache.org>.

See http://lucene.markmail.org/message/fu34tuomnqejchfj?q=RemoteSearchable 
  for just such a proposal

On Dec 8, 2008, at 1:52 PM, Doug Cutting (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ]
>
> Doug Cutting commented on LUCENE-1473:
> --------------------------------------
>
> Would it take any more lines of code to remove Serializable from
> the core classes and re-implement RemoteSearchable in a separate
> layer on top of the core APIs?  That layer could be a contrib module
> and could get all the Externalizable love it needs.  It could
> support a specific popular subset of query and filter classes,  
> rather than arbitrary Query implementations.  It would be  
> extensible, so that if folks wanted to support new kinds of queries,  
> they easily could.  This other approach seems like a slippery slope,  
> complicating already complex code with new concerns.  It would be  
> better to encapsulate these concerns in a layer atop APIs whose back- 
> compatibility we already make promises about, no?

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

Would it take any more lines of code to remove Serializable from the core classes and re-implement RemoteSearchable in a separate layer on top of the core APIs?  That layer could be a contrib module and could get all the Externalizable love it needs.  It could support a specific popular subset of query and filter classes, rather than arbitrary Query implementations.  It would be extensible, so that if folks wanted to support new kinds of queries, they easily could.  This other approach seems like a slippery slope, complicating already complex code with new concerns.  It would be better to encapsulate these concerns in a layer atop APIs whose back-compatibility we already make promises about, no?
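
One way such a layer could look, as a rough sketch only: the core query class stays free of any serialization interface, and a separate registry maps each supported class to a codec that knows how to write and read it. Everything named here (TermQuerySketch, QueryCodec, QueryWireLayer, the "term" tag) is hypothetical and stands in for real Lucene types; the wire format is an assumption chosen for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a core query class; deliberately not Serializable.
class TermQuerySketch {
    final String field;
    final String text;

    TermQuerySketch(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

// One codec per supported query class; applications can register their own.
interface QueryCodec<Q> {
    void write(Q query, DataOutputStream out) throws IOException;
    Q read(DataInputStream in) throws IOException;
}

// The separate layer: it only touches the public surface of the query classes.
class QueryWireLayer {
    private final Map<String, QueryCodec<?>> codecsByTag = new HashMap<>();
    private final Map<Class<?>, String> tagsByClass = new HashMap<>();

    <Q> void register(String tag, Class<Q> type, QueryCodec<Q> codec) {
        codecsByTag.put(tag, codec);
        tagsByClass.put(type, tag);
    }

    @SuppressWarnings("unchecked")
    byte[] encode(Object query) throws IOException {
        String tag = tagsByClass.get(query.getClass());
        if (tag == null) {
            throw new IOException("Unsupported query class: " + query.getClass().getName());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(tag); // a stable tag instead of a Java class name
        ((QueryCodec<Object>) codecsByTag.get(tag)).write(query, out);
        out.flush();
        return bytes.toByteArray();
    }

    Object decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        String tag = in.readUTF();
        QueryCodec<?> codec = codecsByTag.get(tag);
        if (codec == null) {
            throw new IOException("Unknown query tag: " + tag);
        }
        return codec.read(in);
    }
}

class QueryWireLayerDemo {
    static QueryWireLayer newLayer() {
        QueryWireLayer layer = new QueryWireLayer();
        layer.register("term", TermQuerySketch.class, new QueryCodec<TermQuerySketch>() {
            @Override
            public void write(TermQuerySketch q, DataOutputStream out) throws IOException {
                out.writeUTF(q.field);
                out.writeUTF(q.text);
            }

            @Override
            public TermQuerySketch read(DataInputStream in) throws IOException {
                return new TermQuerySketch(in.readUTF(), in.readUTF());
            }
        });
        return layer;
    }

    public static void main(String[] args) throws IOException {
        QueryWireLayer layer = newLayer();
        byte[] wire = layer.encode(new TermQuerySketch("body", "serialization"));
        TermQuerySketch back = (TermQuerySketch) layer.decode(wire);
        System.out.println(back.field + ":" + back.text);
    }
}
```

In this sketch an unsupported query class is rejected at encode time rather than silently serialized, which is the "specific popular subset" trade-off described above.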

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652555#action_12652555 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

The serialVersionUID needs to be written if the class is going to evolve.  It's written now, and currently in default serialization the field names are also written.  We'll need empty constructors.
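
For readers who have not used Externalizable, a minimal sketch of the shape being discussed follows.  SimpleTerm is a made-up stand-in, not Lucene's Term and not the code in the patch; it only illustrates the public no-arg ("empty") constructor and the fact that the class, rather than reflection, decides what goes on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableTermSketch {

    /** Hypothetical field/text pair; not Lucene's Term class. */
    static class SimpleTerm implements Externalizable {
        private static final long serialVersionUID = 1L;
        String field;
        String text;

        /** Externalizable requires a public no-arg constructor for deserialization. */
        public SimpleTerm() {}

        SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // Only the values are written here; no field names.
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must read in exactly the order writeExternal wrote.
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    /** Serializes and deserializes through standard object streams. */
    static SimpleTerm roundTrip(SimpleTerm term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SimpleTerm) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleTerm copy = roundTrip(new SimpleTerm("title", "lucene"));
        System.out.println(copy.field + ":" + copy.text);
    }
}
```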

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652945#action_12652945 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

Robert:
> using XML for queries if you provide a handler, 

That doesn't sound like query serialization.

SOLR has a binary protocol due to criticisms about XML being slow.  

I'm not sure why you and Doug are focusing on performance when that is not really the main issue I brought up.

Also I'm confused as to why dynamic classloading is being ignored by you folks as a Java feature that a Java search library could take advantage of to differentiate itself in the search (closed and open source) marketplace.  

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652843#action_12652843 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

I don't see why you can't just break compatibility between versions when dealing with Serialization. Just have it continue to mean live (or close to live) persistence.

Even the JDK does this (e.g. Swing serialization makes no guarantees). Just do the same - bound to change between releases...

Also, different compilers will generate different SUID... usually due to synthetic methods. It's kind of a problem...
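
A small sketch of the SUID point, using only java.io: a class that declares serialVersionUID reports that fixed value, while a class that omits it gets a value computed from its structure.  Pinned and Computed are hypothetical names, and the computed number is deliberately not shown since it depends on the compiled class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidSketch {

    /** Declares its own UID, so the value does not depend on the compiler. */
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    /** No declared UID: the runtime derives one from the class's members. */
    static class Computed implements Serializable {
        int value;
    }

    /** Returns the UID the serialization runtime will use for this class. */
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned   = " + uidOf(Pinned.class));
        System.out.println("computed = " + uidOf(Computed.class));
    }
}
```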




> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652576#action_12652576 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

bq. The serialVersionUID needs to be written if the class is going to evolve.

Can we use a byte, not long, for serialVersionUID?  And maybe change its name to SERIAL_VERSION?  (I think serialVersionUID is completely unused once you implement Externalizable?).

This brings up another question: what's our back compat policy here?  For how many releases after you've serialized a Term can you still read it back?  This is getting complicated... I'm wondering if we shouldn't even go here (ie, make any promise that something serialized in release X will be deserializable on release Y).

I also think serialization is better done "at the top" where you can do a better job encoding things.  EG this is the purpose of the TermsDict (to serialize many Terms, in sorted order).

Jason, what's the big picture use case here; what are you serializing?
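
To illustrate the byte-sized SERIAL_VERSION suggestion above, here is a sketch, under made-up assumptions, of a class that writes its own one-byte format version and branches on it when reading.  VersionedTerm and its "boost" field (pretended to have been added in version 2) are hypothetical and are not from the patch.  The helpers call writeExternal/readExternal directly on object streams so that an "old" version 1 payload can be produced by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SerialVersionByteSketch {

    static class VersionedTerm implements Externalizable {
        /** Current format version, written as a single byte. */
        static final byte SERIAL_VERSION = 2;

        String field;
        String text;
        float boost = 1.0f; // hypothetical field added in version 2

        public VersionedTerm() {}

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(SERIAL_VERSION);
            out.writeUTF(field);
            out.writeUTF(text);
            out.writeFloat(boost);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte version = in.readByte();
            if (version < 1 || version > SERIAL_VERSION) {
                throw new InvalidObjectException("unsupported format version: " + version);
            }
            field = in.readUTF();
            text = in.readUTF();
            // Version 1 payloads carry no boost, so fall back to the default.
            boost = version >= 2 ? in.readFloat() : 1.0f;
        }
    }

    static byte[] encode(VersionedTerm term) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            term.writeExternal(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Hand-builds what a version 1 writer would have produced. */
    static byte[] legacyV1Bytes(String field, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeByte(1);
            out.writeUTF(field);
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static VersionedTerm decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            VersionedTerm term = new VersionedTerm();
            term.readExternal(in);
            return term;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VersionedTerm old = decode(legacyV1Bytes("title", "lucene"));
        System.out.println(old.field + ":" + old.text + " boost=" + old.boost);
    }
}
```

Each version branch added this way is code that has to be kept and tested for as long as the old format is supported, which is the back-compat policy question raised above.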

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652572#action_12652572 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

Lucene supports serialization explicitly by implementing Serializable.  

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652899#action_12652899 ] 

Mark Miller commented on LUCENE-1473:
-------------------------------------

bq. The "implements Serializable" was added to support RemoteSearchable. If we believe this creates a bug, then perhaps we should remove this and implement RemoteSearchable in another way. As it stands, Lucene does not support Java Serialization across Lucene versions. That seems to me more like a limitation than a bug, no?

There will be complaints no matter what. GWT tried getting around people having to implement Serializable by providing an interface with fewer promises: IsSerializable. Many complained right away, as they had other classes that perhaps they were making Serializable simply for Apache XML-RPC or something. So now you can use either Serializable or IsSerializable.

Personally, I think it's fine to do as we are. I'm not against supporting more though. 

If we choose not to go further (and from what I can tell that decision has *not* been made yet, against or for), add to the javadocs what we support, as I don't think it's a bug myself. The Serializable interface indicates that the class and its subclasses will be Serializable; my reading of the javadoc does not indicate what cross-version compatibility must be supported. I believe that is up to the implementor.

- Mark

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653421#action_12653421 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

Even if you changed SUIDs based on version changes, there is the very real possibility that the new code CAN'T be instantiated in any meaningful way from the old data. Then what would you do?

Even if you had all of the old classes, and their dependencies available from dynamic classloading, it still won't work UNLESS every new feature is designed with backwards compatibility with previous versions  - a burden that is just too great when required of all Lucene code.

Given that, as has been discussed, there are other formats that can be used where isolated backwards persistence is desired (like XML based query descriptions).  Even these won't work if the XML description references explicit classes - which is why designing such a format for a near limitless query structure (given user defined query classes) is probably impossible.

So strive for a decent solution that covers most cases, and fails gracefully when it can't work.

Using standard serialization (with proper transient fields) seems to fit this bill, since in a stable API most core classes should remain fairly constant, and those that are bound to change may take explicit steps in their serialization (if deemed needed).
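
As a sketch of what "standard serialization with proper transient fields" can look like: derived state is marked transient, left out of the stream, and rebuilt on demand after reading.  CachedQuery is a hypothetical class used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldSketch {

    /** Hypothetical query-like object with one persistent and one derived field. */
    static class CachedQuery implements Serializable {
        private static final long serialVersionUID = 1L;

        final String text;
        // Derived value: skipped by default serialization, null after reading.
        private transient Integer cachedHash;

        CachedQuery(String text) {
            this.text = text;
        }

        int hash() {
            if (cachedHash == null) {
                cachedHash = text.hashCode(); // recomputed lazily
            }
            return cachedHash;
        }

        boolean isCached() {
            return cachedHash != null;
        }
    }

    static CachedQuery roundTrip(CachedQuery query) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(query);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CachedQuery) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CachedQuery query = new CachedQuery("lucene");
        query.hash(); // populate the cache before serializing
        CachedQuery copy = roundTrip(query);
        System.out.println("cache survived: " + copy.isCached());
        System.out.println("same hash: " + (copy.hash() == query.hash()));
    }
}
```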


> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652599#action_12652599 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

The Spring framework http://www.springframework.org/ is a good example of a widely used open source Java project that implements and uses Serialization in most of its classes.  If Serialization will not be fixed in Lucene then perhaps it's best to implement a serializable wrapper in the Spring project for Lucene.

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652932#action_12652932 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> Doesn't Hadoop handle versioning inside of Writable classes?

Currently, yes, but this is probably insufficient for a Hadoop 1.0 release.  Hadoop is a distributed system, and would like to provide RPC back-compatibility across minor versions after we go 1.0.  This is an explicit decision that's in the process of discussion within the Hadoop project.  RPCs within Hadoop daemons will probably require identical versions -- all daemons in a cluster must be upgraded in lockstep.  We'll thus probably limit back-compatibility to client RPCs, so that a given client can talk to multiple clusters that are not running identical versions of Hadoop.  Lucene has made no such explicit policy decision.  Lucene is not an inherently distributed system.

Hadoop has not yet decided what mechanism to use to support back-compatible RPC, but Writable versioning is not sufficient, since it does not handle RPC protocols, and it's lower-level than we'd prefer.  We'll probably go with something more like Thrift or ProtocolBuffers.  Hadoop does not use Java serialization and makes no promises about that.

> The developer contributions seem to be quite low right now, especially compared to neighbor projects such as Hadoop.

As one who monitors both projects, I don't see a marked difference.  In both there are sometimes patches that unfortunately languish, because, while they're important to the contributor, they fail to sufficiently engage a committer.  For example, HADOOP-3422 took over 6 months to get committed, probably because not many committers use Ganglia.

There is a difference in quantity: hadoop-dev has over 4x the traffic of lucene-dev.  But, normalized for that, the number of patches from non-committers feels comparable.  If anything I'd guess Lucene commits more patches from non-committers than does Hadoop.


> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Attachment: LUCENE-1473.patch

LUCENE-1473.patch

Added Externalizable to Document, Field, AbstractField (as compared to the previous patch).  SerializationUtils is included.

TODO:
- More Externalizable classes with test cases for each one





> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Updated: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Attachment: LUCENE-1473.patch

LUCENE-1473.patch

Term implements Externalizable.  Added serialVersionUID handling in the read/writeExternal methods.  The long should be written with a variable-length encoding to reduce the size of the resulting serialized bytes.

If it looks ok, I will implement Externalizable in other classes.  
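
For reference, a sketch of one common variable-length long encoding (7 data bits per byte, with the high bit set when more bytes follow).  This is an assumption about what such an encoding could look like, not the code in the patch, and the method names are invented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VLongSketch {

    /** Writes 7 bits per byte, low-order group first; high bit marks continuation. */
    static void writeVLong(DataOutput out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    /** Reads groups of 7 bits until a byte without the continuation bit. */
    static long readVLong(DataInput in) throws IOException {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = in.readByte();
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    static byte[] encode(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVLong(out, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static long decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readVLong(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] samples = {1L, 127L, 128L, 300L, Long.MAX_VALUE};
        for (long sample : samples) {
            byte[] encoded = encode(sample);
            System.out.println(sample + " -> " + encoded.length + " byte(s), decodes to " + decode(encoded));
        }
    }
}
```

A small version number then costs one byte instead of the eight that writeLong always uses; very large or negative values cost more than eight.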

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Attachment: LUCENE-1473.patch

LUCENE-1473.patch

Added some more Externalizables.  

o.a.l.util.Parameter is peculiar in that it implements readResolve to override the serialization and return a local object to emulate enums.  I haven't figured out the places this is used and what the best approach is to externalize them.
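
For context, the general readResolve pattern for emulating enums looks roughly like the sketch below.  Mode is a hypothetical class written for illustration; it is not o.a.l.util.Parameter and makes no claim about how Parameter is actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class ReadResolveSketch {

    /** Hypothetical typesafe-enum style class with a fixed set of instances. */
    static final class Mode implements Serializable {
        private static final long serialVersionUID = 1L;

        // Must be initialized before the constants below register themselves.
        private static final Map<String, Mode> BY_NAME = new HashMap<>();

        static final Mode MUST = new Mode("MUST");
        static final Mode SHOULD = new Mode("SHOULD");

        private final String name;

        private Mode(String name) {
            this.name = name;
            BY_NAME.put(name, this);
        }

        /** Swaps the freshly deserialized copy for the local canonical instance. */
        private Object readResolve() throws ObjectStreamException {
            Mode canonical = BY_NAME.get(name);
            if (canonical == null) {
                throw new InvalidObjectException("unknown constant: " + name);
            }
            return canonical;
        }
    }

    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same instance: " + (roundTrip(Mode.MUST) == Mode.MUST));
    }
}
```

Because identity comparisons (==) depend on readResolve returning the local instance, any externalized form of such a class would need to preserve that step.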

TODO:
- Same as before

Doug wrote: "within a major release cycle, serialized queries from older releases will work with newer releases, however serialized queries from newer releases will not generally work with older releases, since we might add new kinds of queries in the course of a major release cycle". Similarly detailed statements would need to be made for each Externalizable, no?

Serialized objects in minor releases will work.  Serialized objects of older versions starting with 2.9 will be compatible with newer versions.  New versions will be compatible with older versions on a class-by-class basis defined in the release notes.  It could look something like this:

Serialization notes:
BooleanQuery added a scoreMap variable that does not have a default value in 3.0 and is now not backwards compatible with 2.9.  
PhraseQuery added an ultraWeight variable that defaults to true in 3.0 and is backwards compatible with 2.9.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653321#action_12653321 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

bq. For classes that no one submits an Externalizable patch for, the serialVersionUID needs to be added.

The serialVersionUID approach would be too simplistic, because we can't simply bump it up whenever we make a change since that then breaks back compatibility.  We would have to override write/readObject or write/readExternal, and serialVersionUID would not be used.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652888#action_12652888 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

bq. If it is not meant to be serialized, why did it implement Serializable? Furthermore, what is the reason to avoid it being serialized? I find the reason being the cost of support kinda ridiculous; it seems this reason can be applied to any bug fix, because at the end of the day, it is a bug.

The "implements Serializable" was added to support RemoteSearchable.  If we believe this creates a bug, then perhaps we should remove this and implement RemoteSearchable in another way.  As it stands, Lucene does not support Java Serialization across Lucene versions.  That seems to me more like a limitation than a bug, no?

Every line of code added to Lucene is a support burden, so we must carefully weigh the costs and benefits of each line.  This issue proposes to add many lines, and to add a substantial new back-compatibility requirement.  Back-compatibility is something that Lucene takes seriously. We make promises about both API back-compatibility and file-format back-compatibility.  These already significantly constrain development.  Adding a new back-compatibility requirement should not be done lightly, but only after broad consensus is reached through patient discussion.


> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Description: To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.  (was: To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  )

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "John Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653563#action_12653563 ] 

John Wang commented on LUCENE-1473:
-----------------------------------

For our problem, it is Query and all its derived and encapsulated classes. I guess the title of the bug is too generic.

As for my comment about other Lucene classes, one can just go to the Lucene javadoc, click on "Tree", and look for Serializable. If you want me to, I can go and fetch the complete list, but here are some examples:

1) Document (Field etc.)
2) OpenBitSet, Filter ...
3) Sort, SortField
4) Term
5) TopDocs, Hits etc.

For the top level API.



> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652962#action_12652962 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

The reason the XML is not needed is that the string format is robust enough, and is simpler...

I am not focused on performance. It is just that Java serialization works well for temporary persistence. Other formats are better for long-term persistence. If you are only doing temporary persistence, you don't need backwards compatibility.

Also, if the API is exposed to the world (i.e. non-Java), a human-readable (or even binary) format that is not based on Java serialization is going to work much better.

If you use a non-binary protocol it is far easier to extend it in the future, and retain the ability to easily read older versions. This is why Swing uses XML for serialization, and not binary.

You could certainly use specialized class loaders to load old versions of classes in order to maintain the ability to read old, now incompatible classes... it is just a lot of work (maintenance too, need to keep the old code around... etc.) for not a lot of benefit.

As for SOLR's binary protocol, fine, but it is probably for a fringe use case, or the submitter didn't do real world tests...  The cost of XML parsing is just not that much greater than binary (at least in Java, since it is the object creation they both use that affects it). The search time is going to be far greater.

For large updates a binary loader can be more efficient than XML, but if you test it using real-world examples, I doubt you will see a huge difference - at least for the types of application TYPICALLY written using Lucene.



> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653297#action_12653297 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------

bq. It seems best to remove Serialization from Lucene so that users are not confused and create a better solution.

I don't think that's the case.  If we choose to only support "live serialization" then we should add "implements Serializable" but spell out clearly in the javadocs that there is no guarantee of cross-version compatibility ("long term persistence") and in fact that often there are incompatibilities.

I think "live serialization" is still a useful feature.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652921#action_12652921 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

Mark: "There will be complaints no matter what. GWT tried getting around people having to implement Serializable by providing an interface with fewer promises: IsSerializable. Many complained right away, as they had other classes that perhaps they were making Serializable simply for Apache XMLRpc or something. So now you can use either Serializable or IsSerializable.

Personally, I think it's fine to do as we are. I'm not against supporting more though."

Externalizable and Serializable work interchangeably, a nice feature of Java.  For classes that no one submits an Externalizable patch for, the serialVersionUID needs to be added.  For ones that implement Externalizable, there is slightly more work, but not something someone with a year of Java experience can't maintain.
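
To illustrate the point above, here is a rough sketch of an Externalizable class going through the same ObjectOutputStream/ObjectInputStream path as any Serializable class. SimpleTerm, its fields, and the leading format-version byte are all assumptions made for the example; they are not taken from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical stand-in for a Lucene class such as Term.
    public static class SimpleTerm implements Externalizable {
        private static final long serialVersionUID = 1L;
        // Assumed convention: a leading byte identifying the layout that follows.
        private static final byte FORMAT_VERSION = 1;

        private String field;
        private String text;

        // Externalizable requires a public no-arg constructor.
        public SimpleTerm() {}

        public SimpleTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String getField() { return field; }
        public String getText() { return text; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(FORMAT_VERSION);
            out.writeUTF(field);
            out.writeUTF(text);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte version = in.readByte();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported format version: " + version);
            }
            field = in.readUTF();
            text = in.readUTF();
        }
    }

    // Writes the term with standard Java serialization and reads it back.
    static SimpleTerm roundTrip(SimpleTerm term) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(term);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SimpleTerm) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller never needs to know whether the class is Serializable or Externalizable; the extra maintenance is confined to keeping writeExternal and readExternal in step when fields change.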

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653955#action_12653955 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

Doug wrote: We can save, in files, serialized instances of each query type from the oldest release we intend to support. Then read each of these queries and check that it is equal to a current query that's meant to be equivalent (assuming all queries implement equals well). Something similar would need to be done for each class that is meant to be transmitted cross-version.

This tests that older queries may be processed by newer code. It does not test that newer queries can be processed by older code. Documentation is a big part of this effort, that should be completed first. What guarantees do we intend to provide? Once we've documented these, then we can begin writing tests. For example, we may only guarantee that older queries work with newer code, and that newer hits work with older code. To test that we'd need to have an old jar around that we could test against. This will be a trickier test to configure.
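
The first half of that proposal (old bytes, new code) might be sketched roughly as follows. StubQuery is a hypothetical stand-in for a real query class, and the fixture bytes are produced in memory here purely for illustration; in the scheme described above they would be files written by the oldest supported release.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

class CompatCheckDemo {
    // Hypothetical query class with a pinned stream version and a usable equals().
    static final class StubQuery implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;

        StubQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof StubQuery)) return false;
            StubQuery other = (StubQuery) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text);
        }
    }

    // Stand-in for "serialized instance saved by an old release".
    static byte[] writeFixture(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object readFixture(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The actual check: does the old fixture still decode to an equal query?
    static boolean stillEquivalent(byte[] oldFixture, Object expectedCurrent) {
        return expectedCurrent.equals(readFixture(oldFixture));
    }
}
```

This covers only the older-data-with-newer-code direction; checking newer queries against older code would still need an old jar on the classpath, as noted above.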

--------------

Makes sense.  I guarantee 2.9 and above classes will be backward compatible with the previous classes.  I think that for 3.0 we'll start to create new replacement classes that will not conflict with the old classes.  I'd really like to redesign the query, similarity, and scoring code to work with flexible indexing and allow new algorithms.  This new code will not create changes in the existing query, similarity, and scoring code which will remain serialization compatible with 2.9.  The 2.9 query, similarity, and scoring should leverage the new query, similarity and scoring code to be backwards compatible.  

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655689#action_12655689 ] 

Mark Miller commented on LUCENE-1473:
-------------------------------------

Thanks Wolf, +1 on the change. This issue proposes to do the same thing: LUCENE-1407

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656071#action_12656071 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> Therefore the patch is to be taken as contribution to explore the design space [ ... ]

Yes, and it is much appreciated for that.  Thanks again!

> Currently Searchable does include a HitCollector-based search method [ ... ]

You're right.  I misremembered.  This dates back to the origin of Searchable.

http://svn.apache.org/viewvc?view=rev&revision=149813

Personally, I think it would be reasonable for a distributed implementation to throw an exception if one tries to use a HitCollector.

> We could either solve it along the line you propose, or revert to pass the Weight again instead of the Query.

Without using an introspection-based serialization like Java serialization it would be difficult to pass a Weight over the wire using public APIs, since most implementations are not public.  But, since Weights are constructed via a standard protocol, the method I outlined could work.


> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

Doug:
       My apologies if I came off seeming angry and/or trying to lobby to be
a committer. Neither is the case.

       I am expressing a concern with how patches are being handled with
this project, and providing my view point on how this can be better managed.
Of course my concern can be either accepted or rejected. I just hope the
committers would be "calm" enough to be able to see criticisms for what they
are.

       I am a strong advocate of Lucene, hence my passion for its success.

-John

On Wed, Dec 3, 2008 at 10:07 AM, Doug Cutting <cu...@apache.org> wrote:

> John Wang wrote:
>
>> If you guys need help, maybe you guys should expand your committer list?
>>
>
> Committers are added when they've contributed a series of high-quality
> patches that have been committed, and demonstrated their ability to be easy
> to work with.  Displaying anger is not a good way to become a committer.
>  Calm persistence is advised.
>
> Lucene does not currently use Java Serialization much.  Many committers may
> not be terribly familiar with it.
>
>> Use case: deploying lucene in a distributed environment, we have a
>> broker/server architecture (standard stuff), and we want to roll out search
>> servers with lucene 2.4 instance by instance. The problem is that the
>> broker is sending a Query object to the searcher via java
>> serialization at the server level, and the broker is running 2.3. And
>> because of specifically this problem, 2.3 brokers cannot talk to
>> 2.4 search servers even when the Query object was not changed.
>>
>
> Thanks for providing a use case.  One way to address this would be for
> Lucene to better support cross-version serialization.  Another way might be
> for your application, which adds this requirement, to use an alternate
> representation for queries that it can guarantee is compatible across
> versions, e.g., a string.  Might that be possible?
>
> Doug

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Doug Cutting <cu...@apache.org>.

John Wang wrote:
> If you guys need help, maybe you guys should expand your committer list?

Committers are added when they've contributed a series of high-quality 
patches that have been committed, and demonstrated their ability to be 
easy to work with.  Displaying anger is not a good way to become a 
committer.  Calm persistence is advised.

Lucene does not currently use Java Serialization much.  Many committers 
may not be terribly familiar with it.

> Use case: deploying lucene in a distributed environment, we have a
> broker/server architecture (standard stuff), and we want to roll out search
> servers with lucene 2.4 instance by instance. The problem is that the
> broker is sending a Query object to the searcher via java
> serialization at the server level, and the broker is running 2.3. And
> because of specifically this problem, 2.3 brokers cannot talk to
> 2.4 search servers even when the Query object was not changed.

Thanks for providing a use case.  One way to address this would be for 
Lucene to better support cross-version serialization.  Another way might 
be for your application, which adds this requirement, to use an 
alternate representation for queries that it can guarantee is compatible 
across versions, e.g., a string.  Might that be possible?

Doug
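
A toy sketch of the "alternate representation" idea: the broker sends a plain string and the search server rebuilds its own query object from it, so no Java-serialized class crosses the version boundary. The "field:text" wire format and the WireQuery class are assumptions invented for this illustration; a real deployment would more likely reuse Lucene's own query syntax and parser.

```java
import java.util.Objects;

class StringWireDemo {
    // Hypothetical application-level query; not a Lucene class.
    static final class WireQuery {
        final String field;
        final String text;

        WireQuery(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WireQuery)) return false;
            WireQuery other = (WireQuery) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text);
        }
    }

    // Broker side: turn the query into a version-neutral string.
    // Assumes the field name itself never contains ':'.
    static String encode(WireQuery query) {
        return query.field + ":" + query.text;
    }

    // Server side: rebuild the query using whatever classes this version has.
    static WireQuery parse(String wire) {
        int split = wire.indexOf(':');
        if (split < 0) {
            throw new IllegalArgumentException("Expected field:text but got: " + wire);
        }
        return new WireQuery(wire.substring(0, split), wire.substring(split + 1));
    }
}
```

Because only the string travels between nodes, broker and server can run different library versions as long as both still understand the string format.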


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

If you guys need help, maybe you guys should expand your committer list?
As for "product speaks well for itself so far": from what I have heard, lots of
people are just branching off the code-base, making changes, and doing merges
every release. I really don't want to do that here, but I am being forced
down that road. What I think should be avoided is what happened to Linux, where
there are different versions of the kernel, i.e. different versions
of the Lucene project.

Don't get me wrong, I think it is one of the best projects out there. But
sometimes I think you guys should listen to the community a bit more,
instead of presuming how the product is used.

Anyway, thanks for looking into those issues.

-John

On Tue, Dec 2, 2008 at 4:11 PM, Mark Miller <ma...@gmail.com> wrote:

> I worked on getting both of those issues resolved :) Sorry, can't please
> everyone. If it helps, I'll commit that second one soon now that I can. It's
> lazy consensus around here, man. Maybe it's not ideal, but I think the
> product speaks well for itself so far. I've never met a more accommodating
> group of guys myself. It is in large part a volunteer effort.
>
> - Mark
>
> On Dec 2, 2008, at 7:02 PM, "John Wang" <jo...@gmail.com> wrote:
>
> I have described our use-case in good detail. I think it is a common
> architecture. And we are not using RemoteSearcher. This problem is not tied
> to RemoteSearcher, and we are not using RMI. Serialized java objects can be
> used at places other than RMI.
> "sometimes you implement Serializable for RMI but you don't want to fully support
> it. Maybe it's not great java, but it's common enough, and makes sense to me
> in certain instances." - does not make sense to me. There are lots of bugs
> that are common, e.g. thread-safety, dead-lock, memory leak, and they are
> bad java; that doesn't mean they should not be addressed.
>
> Pardon me for being blunt, but this is really a bug: the expected behavior
> stated by the API is not honored. It would have been avoided if the same
> compiler was used for the release, with Java being WORA, this smells like a
> bug to me.
>
> My frustration is not unfounded, here are some examples I personally ran
> into:
>
> https://issues.apache.org/jira/browse/LUCENE-1246: simple 1 line null
> check, over 8 months, and still being "discussed"
>
> https://issues.apache.org/jira/browse/SOLR-243: with 4 votes, also few
> lines of change with the patch was originally done, over 18 months, and still
> being "discussed"
>
> -John
>
> On Tue, Dec 2, 2008 at 3:43 PM, Mark Miller <ma...@gmail.com> wrote:
>
>> Woah! I think you got the wrong impression. I think Doug said basically
>> what I was thinking (if not a bit more clearly than I was thinking it). I
>> think we are all open to any good patches. It's nice to understand and
>> discuss them first though.
>>
>> To reiterate what Doug mentioned, sometimes you implement Serializable for RMI
>> but you don't want to fully support it. Maybe it's not great java, but it's
>> common enough, and makes sense to me in certain instances.
>>
>> - Mark
>>
>>
>>
>> On Dec 2, 2008, at 6:30 PM, "John Wang (JIRA)" <ji...@apache.org> wrote:
>>
>>
>>>   [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652594#action_12652594 ]
>>>
>>> John Wang commented on LUCENE-1473:
>>> -----------------------------------
>>>
>>> the fact an object implements Serializable implies this object can be
>>> serialized. It is a known good java programming practice to include a suid
>>> to the class (as a static variable) when the object declares itself to be
>>> Serializable. If it is not meant to be serialized, why did it implement
>>> Serializable. Furthermore, what is the reason to avoid it being serialized?
>>> I find the reason being the cost of support kinda ridiculous, seems this
>>> reason can be applied to any bug fix, because this at the end of the day, it
>>> is a bug.
>>>
>>> I don't understand the issue of "extra bytes" to the term dictionary if
>>> the Term instance is not actually serialized to the index (at least I really
>>> hope that is not done)
>>>
>>> The serialVersionUID (suid) is a long because it is a java thing. Here is
>>> a link to some information on the subject:
>>> http://java.sun.com/developer/technicalArticles/Programming/serialization/
>>>
>>> Use case: deploying lucene in a distributed environment, we have a
>>> broker/server architecture (standard stuff), and we want to roll out search
>>> servers with lucene 2.4 instance by instance. The problem is that the broker
>>> is sending a Query object to the searcher via java serialization at the
>>> server level, and the broker is running 2.3. And because of specifically
>>> this problem, 2.3 brokers cannot talk to 2.4 search servers even when the
>>> Query object was not changed.
>>>
>>> To me, this is a very valid use-case. The problem was two different
>>> people did the release with different compilers.
>>>
>>> At the risk of pissing off the Lucene powerhouse, I feel I have to
>>> express some candor. I am growing more and more frustrated with the lack of
>>> the open source nature of this project and its unwillingness to work with
>>> the developer community. This is a rather trivial issue, and it is taking 7
>>> back-and-forth's to reiterate some standard Java behavior that has been
>>> around for years.
>>>
>>> Lucene is a great project and has enjoyed great success, and I think it
>>> is to everyone's interest to make sure Lucene grows in a healthy
>>> environment.
>>>
>>>
>>>
>>>  Implement Externalizable in main top level searcher classes
>>>> -----------------------------------------------------------
>>>>
>>>>               Key: LUCENE-1473
>>>>               URL: https://issues.apache.org/jira/browse/LUCENE-1473
>>>>           Project: Lucene - Java
>>>>        Issue Type: Bug
>>>>        Components: Search
>>>>  Affects Versions: 2.4
>>>>          Reporter: Jason Rutherglen
>>>>          Priority: Minor
>>>>       Attachments: LUCENE-1473.patch
>>>>
>>>>
>>>> To maintain serialization compatibility between Lucene versions, major
>>>> classes can implement Externalizable.  This will make Serialization faster
>>>> due to no reflection required and maintain backwards compatibility.
>>>>
>>>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Mark Miller <ma...@gmail.com>.

I worked on getting both of those issues resolved :) Sorry, can't
please everyone. If it helps, I'll commit that second one soon now
that I can. It's lazy consensus around here, man. Maybe it's not ideal,
but I think the product speaks well for itself so far. I've never met
a more accommodating group of guys myself. It is in large part a
volunteer effort.

- Mark


On Dec 2, 2008, at 7:02 PM, "John Wang" <jo...@gmail.com> wrote:

> I have described our use-case in good detail. I think it is a common  
> architecture. And we are not using RemoteSearcher. This problem is  
> not tied to RemoteSearcher, and we are not using RMI. Serialized  
> java objects can be used at places other than RMI.
>
> "sometimes you implement Serializable for RMI but you don't want to fully
> support it. Maybe it's not great java, but it's common enough, and
> makes sense to me in certain instances." - does not make sense to
> me. There are lots of bugs that are common, e.g. thread-safety,
> dead-lock, memory leak, and they are bad java; that doesn't mean they
> should not be addressed.
>
> Pardon me for being blunt, but this is really a bug: the expected  
> behavior stated by the API is not honored. It would have been  
> avoided if the same compiler was used for the release, with Java  
> being WORA, this smells like a bug to me.
>
> My frustration is not unfounded, here are some examples I personally  
> ran into:
>
> https://issues.apache.org/jira/browse/LUCENE-1246: simple 1 line  
> null check, over 8 months, and still being "discussed"
>
> https://issues.apache.org/jira/browse/SOLR-243: with 4 votes, also
> few lines of change with the patch was originally done, over
> 18 months, and still being "discussed"
>
> -John
>
> On Tue, Dec 2, 2008 at 3:43 PM, Mark Miller <ma...@gmail.com>  
> wrote:
> Woah! I think you got the wrong impression. I think Doug said  
> basically what I was thinking (if not a bit more clearly than I was  
> thinking it). I think we are all open to any good patches. It's nice  
> to understand and discuss them first though.
>
> To reiterate what Doug mentioned, sometimes you implement Serializable for
> RMI but you don't want to fully support it. Maybe it's not great
> java, but it's common enough, and makes sense to me in certain
> instances.
>
> - Mark
>
>
>
> On Dec 2, 2008, at 6:30 PM, "John Wang (JIRA)" <ji...@apache.org>  
> wrote:
>
>
>   [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652594#action_12652594 ]
>
> John Wang commented on LUCENE-1473:
> -----------------------------------
>
> the fact an object implements Serializable implies this object can  
> be serialized. It is a known good java programming practice to  
> include a suid to the class (as a static variable) when the object  
> declares itself to be Serializable. If it is not meant to be  
> serialized, why did it implement Serializable. Furthermore, what is  
> the reason to avoid it being serialized? I find the reason being the  
> cost of support kinda ridiculous, seems this reason can be applied  
> to any bug fix, because this at the end of the day, it is a bug.
>
> I don't understand the issue of "extra bytes" to the term dictionary  
> if the Term instance is not actually serialized to the index (at  
> least I really hope that is not done)
>
> The serialVersionUID (suid) is a long because it is a java thing.  
> Here is a link to some information on the subject:
> http://java.sun.com/developer/technicalArticles/Programming/serialization/
>
> Use case: deploying lucene in a distributed environment, we have a
> broker/server architecture (standard stuff), and we want to roll out
> search servers with lucene 2.4 instance by instance. The problem is
> that the broker is sending a Query object to the searcher via java
> serialization at the server level, and the broker is running 2.3.
> And because of specifically this problem, 2.3 brokers cannot talk
> to 2.4 search servers even when the Query object was not changed.
>
> To me, this is a very valid use-case. The problem was two different  
> people did the release with different compilers.
>
> At the risk of pissing off the Lucene powerhouse, I feel I have to  
> express some candor. I am growing more and more frustrated with the  
> lack of the open source nature of this project and its unwillingness  
> to work with the developer community. This is a rather trivial  
> issue, and it is taking 7 back-and-forth's to reiterate some  
> standard Java behavior that has been around for years.
>
> Lucene is a great project and has enjoyed great success, and I think  
> it is to everyone's interest to make sure Lucene grows in a healthy  
> environment.
>
>
>
> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>               Key: LUCENE-1473
>               URL: https://issues.apache.org/jira/browse/LUCENE-1473
>           Project: Lucene - Java
>        Issue Type: Bug
>        Components: Search
>  Affects Versions: 2.4
>          Reporter: Jason Rutherglen
>          Priority: Minor
>       Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions,  
> major classes can implement Externalizable.  This will make  
> Serialization faster due to no reflection required and maintain  
> backwards compatibility.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Michael McCandless <lu...@mikemccandless.com>.

John Wang wrote:

> It would have been avoided if the same compiler was used for the  
> release,

I took the same compiler (Sun JDK 1.6.0_06) and used the "serialver"
tool to compute the SUID for Term.java. It reports
"554776219862331599L" for 2.4.0 and "435090971444481257L" for 2.3.2.
In other words, the addition of "public Term(String field)" changed
the SUID.

Then I tried Sun JDK 1.4.2_15, and it reports the same results.
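
For anyone who wants to repeat this kind of check without the serialver
tool, here is a minimal sketch using java.io.ObjectStreamClass. The
classes ImplicitTerm and PinnedTerm are hypothetical stand-ins, not
Lucene's Term. The default SUID is a hash over the class name,
interfaces, fields, and non-private constructors and methods, which is
why adding a public constructor changes it; a declared serialVersionUID
is used as-is.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SuidDemo {
  /** Hypothetical stand-in for Term with no declared serialVersionUID. */
  static class ImplicitTerm implements Serializable {
    String field;
    String text;

    ImplicitTerm(String field, String text) {
      this.field = field;
      this.text = text;
    }
  }

  /** Same shape, but with the SUID pinned by an explicit declaration. */
  static class PinnedTerm implements Serializable {
    private static final long serialVersionUID = 1L;
    String field;
    String text;

    PinnedTerm(String field, String text) {
      this.field = field;
      this.text = text;
    }

    // Adding a constructor like this one does not affect a declared SUID.
    PinnedTerm(String field) {
      this(field, "");
    }
  }

  /** Returns the SUID the serialization runtime will use for the class. */
  static long suidOf(Class<?> c) {
    return ObjectStreamClass.lookup(c).getSerialVersionUID();
  }

  public static void main(String[] args) {
    // Computed from the class shape; the value changes when members change.
    System.out.println("ImplicitTerm: " + suidOf(ImplicitTerm.class) + "L");
    // Always the declared value.
    System.out.println("PinnedTerm:   " + suidOf(PinnedTerm.class) + "L");
  }
}
```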

Mike


Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by John Wang <jo...@gmail.com>.

I have described our use case in good detail, and I think it is a common
architecture. We are not using RemoteSearcher or RMI, and this problem is
not tied to either: serialized Java objects can be used in places other
than RMI.

"sometimes you implement Serializable for RMI but you don't want to fully
support it. Maybe it's not great Java, but it's common enough, and makes
sense to me in certain instances." - this does not make sense to me. There
are lots of bugs that are common, e.g. thread-safety problems, deadlocks,
memory leaks. They are bad Java, but that doesn't mean they should not be
addressed.

Pardon me for being blunt, but this is really a bug: the expected behavior
stated by the API is not honored. It would have been avoided if the same
compiler was used for the release. With Java being WORA (write once, run
anywhere), this smells like a bug to me.

My frustration is not unfounded, here are some examples I personally ran
into:

https://issues.apache.org/jira/browse/LUCENE-1246: a simple one-line null
check, open for over 8 months and still being "discussed"

https://issues.apache.org/jira/browse/SOLR-243: 4 votes, also only a few
lines of change when the patch was originally done, open for over 18 months
and still being "discussed"

-John

On Tue, Dec 2, 2008 at 3:43 PM, Mark Miller <ma...@gmail.com> wrote:

> Woah! I think you got the wrong impression. I think Doug said basically
> what I was thinking (if not a bit more clearly than I was thinking it). I
> think we are all open to any good patches. It's nice to understand and
> discuss them first though.
>
> To reiterate what Doug mentioned, sometimes you implement Serializable for
> RMI but you don't want to fully support it. Maybe it's not great Java, but
> it's common enough, and makes sense to me in certain instances.
>
> - Mark

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by Mark Miller <ma...@gmail.com>.

Woah! I think you got the wrong impression. I think Doug said  
basically what I was thinking (if not a bit more clearly than I was  
thinking it). I think we are all open to any good patches. It's nice  
to understand and discuss them first though.

To reiterate what Doug mentioned, sometimes you implement Serializable
for RMI but you don't want to fully support it. Maybe it's not great
Java, but it's common enough, and makes sense to me in certain instances.

- Mark


On Dec 2, 2008, at 6:30 PM, "John Wang (JIRA)" <ji...@apache.org> wrote:

>
>    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652594#action_12652594 
>  ]
>
> John Wang commented on LUCENE-1473:
> -----------------------------------
>
> The fact that an object implements Serializable implies that it can
> be serialized. It is a known good Java programming practice to
> include a serialVersionUID (suid) in the class, as a static variable,
> when the class declares itself Serializable. If it is not meant to be
> serialized, why did it implement Serializable? Furthermore, what is
> the reason to avoid it being serialized? I find the cost-of-support
> argument kind of ridiculous: that reason could be applied to any bug
> fix, and at the end of the day this is a bug.
>
> I don't understand the issue of "extra bytes" in the term dictionary
> if the Term instance is not actually serialized to the index (at
> least I really hope that is not done).
>
> The serialVersionUID (suid) is a long because that is how Java
> defines it. Here is a link to some information on the subject:
> http://java.sun.com/developer/technicalArticles/Programming/serialization/
>
> Use case: we deploy Lucene in a distributed environment with a
> broker/server architecture (standard stuff), and we want to roll out
> search servers with Lucene 2.4 instance by instance. The problem is
> that the broker sends a Query object to the searcher via Java
> serialization at the server level, and the broker is running 2.3.
> Because of specifically this problem, 2.3 brokers cannot talk to
> 2.4 search servers, even when the Query object was not changed.
>
> To me, this is a very valid use case. The problem was that two
> different people did the release with different compilers.
>
> At the risk of pissing off the Lucene powerhouse, I feel I have to
> express some candor. I am growing more and more frustrated with the
> lack of the open source nature of this project and its unwillingness
> to work with the developer community. This is a rather trivial
> issue, and it is taking seven back-and-forths to reiterate some
> standard Java behavior that has been around for years.
>
> Lucene is a great project and has enjoyed great success, and I think
> it is in everyone's interest to make sure Lucene grows in a healthy
> environment.
>
>
>
>> Implement Externalizable in main top level searcher classes
>> -----------------------------------------------------------
>>
>>                Key: LUCENE-1473
>>                URL: https://issues.apache.org/jira/browse/LUCENE-1473
>>            Project: Lucene - Java
>>         Issue Type: Bug
>>         Components: Search
>>   Affects Versions: 2.4
>>           Reporter: Jason Rutherglen
>>           Priority: Minor
>>        Attachments: LUCENE-1473.patch
>>
>>
>> To maintain serialization compatibility between Lucene versions,  
>> major classes can implement Externalizable.  This will make  
>> Serialization faster due to no reflection required and maintain  
>> backwards compatibility.
>


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "John Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652594#action_12652594 ] 

John Wang commented on LUCENE-1473:
-----------------------------------

The fact that an object implements Serializable implies that it can be serialized. It is a known good Java programming practice to include a serialVersionUID (suid) in the class, as a static variable, when the class declares itself Serializable. If it is not meant to be serialized, why did it implement Serializable? Furthermore, what is the reason to avoid it being serialized? I find the cost-of-support argument kind of ridiculous: that reason could be applied to any bug fix, and at the end of the day this is a bug.

I don't understand the issue of "extra bytes" in the term dictionary if the Term instance is not actually serialized to the index (at least I really hope that is not done).

The serialVersionUID (suid) is a long because that is how Java defines it. Here is a link to some information on the subject:
http://java.sun.com/developer/technicalArticles/Programming/serialization/

Use case: we deploy Lucene in a distributed environment with a broker/server architecture (standard stuff), and we want to roll out search servers with Lucene 2.4 instance by instance. The problem is that the broker sends a Query object to the searcher via Java serialization at the server level, and the broker is running 2.3. Because of specifically this problem, 2.3 brokers cannot talk to 2.4 search servers, even when the Query object was not changed.
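
The broker-to-searcher hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleQuery is a hypothetical stand-in for a query object (not a Lucene class), and a byte array stands in for the network. With the serialVersionUID declared explicitly, both sides agree on it regardless of which constructors or methods a later release adds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class BrokerRoundTrip {
  /** Hypothetical query object; a stand-in, not a Lucene class. */
  static class SimpleQuery implements Serializable {
    private static final long serialVersionUID = 1L; // pinned across releases
    final String field;
    final String text;

    SimpleQuery(String field, String text) {
      this.field = field;
      this.text = text;
    }
  }

  /** Broker side: serialize the query to bytes for the wire. */
  static byte[] send(SimpleQuery q) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(q);
    }
    return bytes.toByteArray();
  }

  /** Searcher side: rebuild the query from the received bytes. */
  static SimpleQuery receive(byte[] wire) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
      return (SimpleQuery) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    SimpleQuery q = receive(send(new SimpleQuery("body", "lucene")));
    System.out.println(q.field + ":" + q.text);
  }
}
```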

To me, this is a very valid use case. The problem was that two different people did the release with different compilers.

At the risk of pissing off the Lucene powerhouse, I feel I have to express some candor. I am growing more and more frustrated with the lack of the open source nature of this project and its unwillingness to work with the developer community. This is a rather trivial issue, and it is taking seven back-and-forths to reiterate some standard Java behavior that has been around for years.

Lucene is a great project and has enjoyed great success, and I think it is in everyone's interest to make sure Lucene grows in a healthy environment.



> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1473:
-------------------------------------

    Remaining Estimate: 8h
     Original Estimate: 8h
               Summary: Implement standard Serialization across Lucene versions  (was: Implement Externalizable in main top level searcher classes)

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655845#action_12655845 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

To Wolf: Your patch looked like it was quite a bit of work, nice job!  Restricting people to XML will probably not be suitable though.  Some may want JSON or something that more directly encodes the objects.  

General:
It seems the alternative solutions to serialization simply shift the problem around but do not really solve the underlying issues (speed, versioning, writing custom serialization code, and perhaps dynamic classloading).  The externalizable code will not be too lengthy and should be more convenient than alternatives to implement (with the code necessary being roughly equivalent to an equals method).  For example protocol buffers requires maintaining files that remind me of IDL files from CORBA to describe the objects.  

Deprecating serialization entirely needs to be taken to the java-user mailing list as there are quite a number of installations relying on it.  If this is something that overlaps with SOLR then it would be good for the SOLR folks to separate it out as a serialization library that could be used outside of the SOLR server.  This would be a good idea for most of the SOLR functionality otherwise there would seem to be redundant development occurring.  

I'll finish up the Externalizable patch once LUCENE-1314 (IndexReader.clone) is completed, since that issue needs feedback and testing to ensure it's workable for 2.9, whereas Externalizable is somewhat easier.  

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653413#action_12653413 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> Serialization between VM1 and VM2 of class A is ok, just that A will not get the new fields. Which is fine since VM1 does not make use of it.

But VM1 might require an older field that the new field replaced, and VM1 may then crash in an unpredictable way.  Not defining explicit suids is more conservative: you get a well-defined exception when things might not work.  Defining suids but doing nothing else about compatibility is playing fast-and-loose: it might work in many cases, but it also might cause strange, hard-to-diagnose problems in others.  If we want Lucene to work reliably across versions, then we need to commit to that goal as a project, define the limits of the compatibility, implement Externalizable, add tests, etc.  Just adding suids doesn't achieve that, so far as I can see.
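
To make the "well-defined exception" concrete, here is a small simulation, offered as a sketch rather than a test of Lucene itself. Payload is a hypothetical class. Since one JVM cannot easily hold two versions of the same class, the sketch instead flips one bit of the SUID recorded in the serialized bytes, which is what the receiver would see if the sender's class had a different SUID. The byte offsets assume the standard stream layout (4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name length, class name, 8-byte SUID).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SuidMismatchDemo {
  /** Hypothetical payload class, not part of Lucene. */
  static class Payload implements Serializable {
    private static final long serialVersionUID = 1L;
    int value = 42;
  }

  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    }
    return bytes.toByteArray();
  }

  /** Returns a copy of the stream whose recorded SUID differs by one bit. */
  static byte[] withForeignSuid(byte[] wire, Class<?> c) {
    byte[] copy = wire.clone();
    // header (4) + TC_OBJECT (1) + TC_CLASSDESC (1) + name length (2) + name
    int suidOffset = 4 + 1 + 1 + 2 + c.getName().length();
    copy[suidOffset + 7] ^= 1; // lowest byte of the big-endian long
    return copy;
  }

  /** True if deserialization is rejected with InvalidClassException. */
  static boolean isRejected(byte[] wire) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
      in.readObject();
      return false;
    } catch (InvalidClassException e) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] wire = serialize(new Payload());
    System.out.println("matching SUID rejected: " + isRejected(wire));
    System.out.println("foreign SUID rejected:  "
        + isRejected(withForeignSuid(wire, Payload.class)));
  }
}
```

The failure here is immediate and names the class, which is the conservative behavior described above; pinning the SUID removes this check without adding any other compatibility guarantee.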


> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653972#action_12653972 ] 

Doug Cutting commented on LUCENE-1473:
--------------------------------------

> I guarantee 2.9 and above classes will be backward compatible with the previous classes.

It sounds like you are personally guaranteeing that all serializable classes will be forever compatible.  That's not what we'd need.  We'd need a proposed policy for the project to consider in terms of major and minor releases, specifying forward and/or backward compatibility guarantees.  For example, we might say, "within a major release cycle, serialized queries from older releases will work with newer releases, however serialized queries from newer releases will not generally work with older releases, since we might add new kinds of queries in the course of a major release cycle".  Similarly detailed statements would need to be made for each Externalizable, no?
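
As an illustration of what the example policy above would mean in code, here is a minimal sketch. Everything in it is hypothetical (the format, the version numbers, the "boost" field added in version 2), and it uses plain DataOutputStream/DataInputStream rather than Externalizable to keep it short. The reader accepts its own version and older ones, and rejects anything newer with a clear error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class VersionPolicyDemo {
  /** Format version written by this (hypothetical) release. */
  static final int CURRENT_VERSION = 2;

  /** Writes a query in the given format version; version 2 adds a boost. */
  static byte[] write(int version, String text, float boost) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(version);
      out.writeUTF(text);
      if (version >= 2) {
        out.writeFloat(boost);
      }
    }
    return bytes.toByteArray();
  }

  /** Reads versions 1..CURRENT_VERSION and rejects newer ones. */
  static String read(byte[] wire) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
      int version = in.readInt();
      if (version < 1 || version > CURRENT_VERSION) {
        throw new IOException("Unsupported format version: " + version);
      }
      String text = in.readUTF();
      // Older data has no boost; fall back to a documented default.
      float boost = (version >= 2) ? in.readFloat() : 1.0f;
      return text + "^" + boost;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(read(write(1, "lucene", 0f)));   // old data, default boost
    System.out.println(read(write(2, "lucene", 2.5f))); // current data
    try {
      read(write(3, "lucene", 2.5f));                   // data from a newer release
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```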

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Updated: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-1473:
----------------------------------

    Attachment: custom-externalizable-reader.patch

I really wouldn't want to add another backwards-compatibility
requirement to Lucene, as the others already stated. In the past,
ensuring backwards compatibility was often the part of writing patches
that took the longest and involved the most discussion.

But maybe we can come up with a fair compromise here. What if we
change the classes that currently implement Serializable so that they
implement Externalizable and add a suid, as Jason suggests? But we make
clear in the javadocs that we don't support backwards compatibility
here, so e.g. a Term externalized with Lucene 2.9 can't be read with
3.0, only with 2.9.
Then we add a new class CustomExternalizableReader to util:
{code:java}
public abstract class CustomExternalizableReader {
  public abstract void readExternal(Object obj, ObjectInput in)
      throws IOException, ClassNotFoundException;
} 
{code}

We add a package-private static variable of this type to each class
that implements Externalizable, and implement the deserialization code
in a default instance of such a reader. This could look like this:
{code:java}
public class SomeClass implements Externalizable {
  private int one;
  private int two;

...

  static CustomExternalizableReader extReader = new CustomExternalizableReader() {
    public void readExternal(Object obj, ObjectInput in) throws IOException,
        ClassNotFoundException {
      SomeClass s = (SomeClass) obj;
      long uid = in.readLong();
      if (uid != serialVersionUID) {
        throw new IOException("Wrong serialVersionUID: " + uid);
      }
      int one = in.readInt();
      int two = in.readInt();
      s.init(one, two);
    }
  };

  // initialization method for readExternal
  void init(int one, int two) {
    this.one = one;
    this.two = two;
  }
}
{code}

Note that I also specified an init() method. Since init() and
extReader are both package-private, they are not protected by our
backwards-compatibility policy and we can change them in any release.

Now if in the next version of this class we add a new variable 'three'
we have to change init() and the reader:

{code:java}
public class SomeClassNewVersion implements Externalizable {
  private int one;
  private int two;
  private int three;

  static final long serialVersionUID = 2L;

  public void readExternal(ObjectInput in) throws IOException,
      ClassNotFoundException {
    extReader.readExternal(this, in);
  }

  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeLong(serialVersionUID);
    out.writeInt(one);
    out.writeInt(two);
    out.writeInt(three);
  }

  /**
   * This reader can only read the externalized format created with the same
   * version of this class. If backwards-compatibility is desired, a custom
   * reader has to be implemented.
   */
  static CustomExternalizableReader extReader = new CustomExternalizableReader() {
    public void readExternal(Object obj, ObjectInput in) throws IOException,
        ClassNotFoundException {
      SomeClassNewVersion s = (SomeClassNewVersion) obj;
      long uid = in.readLong();
      if (uid != serialVersionUID) {
        throw new IOException("Wrong serialVersionUID: " + uid);
      }
      int one = in.readInt();
      int two = in.readInt();
      int three = in.readInt();
      s.init(one, two, three);
    }
  };

  void init(int one, int two, int three) {
    this.one = one;
    this.two = two;
    this.three = three;
  }
}
{code}

Now if someone tries to deserialize an object that was written with
an old Lucene version, an exception will be thrown.

But the user can simply implement their own backwards-compatible reader:

{code:java}
    // Now the user implements their own backwards compatible reader
    SomeClassNewVersion.extReader = new CustomExternalizableReader() {
      public void readExternal(Object obj, ObjectInput in) throws IOException,
          ClassNotFoundException {
        SomeClassNewVersion c_new = (SomeClassNewVersion) obj;
        long uid = in.readLong();
        int one = in.readInt();
        int two = in.readInt();
        int three;
        if (uid == 1) {
          // old version - initialize with default value
          three = -3;
        } else {
          // new version
          three = in.readInt();
        }
        c_new.init(one, two, three);
      }
    };
{code}

With this approach we have to clearly document in the javadocs the
externalization format. Also if externalizable classes contain private
inner classes that need to be serialized, then those inner classes
have to be made package-private.

The nice thing here is that we allow backwards-compatibility, but push
the burden of maintaining it to the user.

I coded this all up as an example that I'm attaching here. Let me know
what you think, please. The patch file contains a Demo.java with a main
method that demonstrates what I'm proposing here.
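
For readers without the patch at hand, the idea can be condensed into one runnable file. This is a sketch written for this thread, not the attached Demo.java: the class names mirror the snippets above, while oldFormat(), read() and installCompatibleReader() are hypothetical helpers. It feeds hand-written "version 1" bytes directly to readExternal(). That is a deliberate simplification: as far as I know, ObjectInputStream.readObject() also compares the class-level serialVersionUID recorded in the stream with the local one, so a real implementation would probably need a format version that is separate from the declared serialVersionUID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExtReaderDemo {
  /** Pluggable deserializer, as in the proposal above. */
  abstract static class CustomExternalizableReader {
    abstract void readExternal(Object obj, ObjectInput in)
        throws IOException, ClassNotFoundException;
  }

  /** Plays the role of SomeClassNewVersion. */
  static class SomeClass implements Externalizable {
    static final long serialVersionUID = 2L;
    int one, two, three;

    /** Default reader: accepts only the format written by this version. */
    static CustomExternalizableReader extReader = new CustomExternalizableReader() {
      void readExternal(Object obj, ObjectInput in) throws IOException {
        long uid = in.readLong();
        if (uid != serialVersionUID) {
          throw new IOException("Wrong serialVersionUID: " + uid);
        }
        int one = in.readInt();
        int two = in.readInt();
        int three = in.readInt();
        ((SomeClass) obj).init(one, two, three);
      }
    };

    public SomeClass() {} // Externalizable needs a public no-arg constructor

    void init(int one, int two, int three) {
      this.one = one;
      this.two = two;
      this.three = three;
    }

    public void writeExternal(ObjectOutput out) throws IOException {
      out.writeLong(serialVersionUID);
      out.writeInt(one);
      out.writeInt(two);
      out.writeInt(three);
    }

    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
      extReader.readExternal(this, in);
    }
  }

  /** Hand-writes the version-1 layout (uid, one, two) of an older release. */
  static byte[] oldFormat(int one, int two) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeLong(1L);
      out.writeInt(one);
      out.writeInt(two);
    }
    return bytes.toByteArray();
  }

  /** Feeds raw bytes straight to readExternal(), bypassing readObject(). */
  static SomeClass read(byte[] wire) throws IOException, ClassNotFoundException {
    SomeClass s = new SomeClass();
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
      s.readExternal(in);
    }
    return s;
  }

  /** User-supplied reader that also understands the version-1 layout. */
  static void installCompatibleReader() {
    SomeClass.extReader = new CustomExternalizableReader() {
      void readExternal(Object obj, ObjectInput in) throws IOException {
        long uid = in.readLong();
        int one = in.readInt();
        int two = in.readInt();
        int three = (uid == 1L) ? -3 : in.readInt(); // default for old data
        ((SomeClass) obj).init(one, two, three);
      }
    };
  }

  public static void main(String[] args) throws Exception {
    byte[] old = oldFormat(1, 2);
    try {
      read(old);
    } catch (IOException e) {
      System.out.println("default reader: " + e.getMessage());
    }
    installCompatibleReader();
    SomeClass s = read(old);
    System.out.println("custom reader: " + s.one + ", " + s.two + ", " + s.three);
  }
}
```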

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652580#action_12652580 ] 

Mark Miller commented on LUCENE-1473:
-------------------------------------

bq. Lucene supports serialization explicitly by implementing Serializable. 

Right, but we don't *really* support it (like many/most projects, I would guess). Support comes with a pain-in-the-butt cost. Since this patch seems to push that pain around, I'm just wondering whether the motivation for it is worth the cost (I don't know the motivation).

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "robert engels (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653058#action_12653058 ] 

robert engels commented on LUCENE-1473:
---------------------------------------

Even better. Thanks Mark.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652585#action_12652585 ] 

Jason Rutherglen commented on LUCENE-1473:
------------------------------------------

Currently, serialization between 2.3 and 2.4 is broken; backwards compatibility is broken.

Saying "we don't really support it" means Serializable will not be implemented in any classes.  

This is really simple Java stuff that I'm surprised is raising concern here.

> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  


[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654147#action_12654147 ] 

Michael Busch commented on LUCENE-1473:
---------------------------------------

{quote}
Your first example is missing the read/writeExternal methods.
{quote}

Oops, I forgot to copy & paste it. It's in the attached patch file, though.

{quote}
I think the proposed approach is rather heavy-weight
{quote}

Really? If we go the Externalizable way anyway, then I think this
approach doesn't add too much overhead. You only need to add init()
and move the deserialization code from the class's readExternal() to
the reader's readExternal(). It's really not too much more code.

And, the code changes are straightforward when the class changes. No
need to worry about how to initialize newly added variables if an old 
version is read for example.
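
To make the shape of this concrete, here is a minimal sketch of the
init()-plus-reader idea. It is not the code from the attached patch:
the names PayloadSketch, PayloadReaderSketch and ReaderDemoSketch are
hypothetical, the one-byte version header is an assumption, and plain
DataInput is used instead of ObjectInput to keep the example short.
The -3 default for the missing third field mirrors the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class under serialization; all state is set through init().
class PayloadSketch {
  int one, two, three;

  void init(int one, int two, int three) {
    this.one = one;
    this.two = two;
    this.three = three;
  }
}

// Hypothetical pluggable reader: the user decides how each stream
// version is mapped onto init().
interface PayloadReaderSketch {
  void read(DataInput in, PayloadSketch target) throws IOException;
}

class ReaderDemoSketch {
  // A user-supplied reader that still understands version 1,
  // which was written before the third field existed.
  static final PayloadReaderSketch BACK_COMPAT_READER = (in, target) -> {
    int version = in.readUnsignedByte();           // assumed 1-byte version header
    int one = in.readInt();
    int two = in.readInt();
    int three = (version < 2) ? -3 : in.readInt(); // default chosen by the user
    target.init(one, two, three);
  };

  static PayloadSketch decode(byte[] data) {
    PayloadSketch payload = new PayloadSketch();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      BACK_COMPAT_READER.read(in, payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return payload;
  }
}
```

When the class gains a field, only init() and the newest branch of the
reader change; how older streams are filled in stays the user's
decision.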

What I think will be the most work is documenting and explaining
this. But this would be an expert API, so the people who really need
to use it are most likely looking into the sources anyway.

But for the record: I'm totally fine with using Serializable and just
adding the serialVersionUID. Just if we use Externalizable, we might
want to consider something like this to avoid new backwards-
compatibility requirements.

> Implement standard Serialization across Lucene versions
> -------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable.  java.io.Externalizable may be implemented in classes for faster performance.


[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652740#action_12652740 ] 

Michael McCandless commented on LUCENE-1473:
--------------------------------------------


{quote}
> At the risk of pissing off the Lucene powerhouse, I feel I have to express some candor. I am growing more and more frustrated with the lack of the open source nature of this project and its unwillingness to work with the developer community. This is a rather trivial issue, and it is taking 7 back-and-forth's to reiterate some standard Java behavior that has been around for years.
{quote}
Whoa!  I'm sorry if my questions are giving this impression.  I don't
intend to.

But I do have real questions, still, because I don't think
Serialization is actually so simple.  I too was surprised on looking
at what started as a simple patch yet on digging into it uncovered
some real challenges.

{quote}
>Use case: deploying Lucene in a distributed environment, we have a broker/server architecture (standard stuff), and we want to roll out search servers with Lucene 2.4 instance by instance. The problem is that the broker sends a Query object to the searcher via Java serialization at the server level, and the broker is running 2.3. Because of exactly this problem, 2.3 brokers cannot talk to 2.4 search servers even when the Query object has not changed.
{quote}
OK that is a great use case -- thanks.  That helps focus the many
questions here.

{quote}
> It is a known good Java programming practice to include a suid in the class (as a static variable) when the object declares itself to be Serializable.
{quote}

But that alone gives a too-fragile back-compat solution because it's
too coarse.  If we add field X to a class implementing Serializable,
and must bump the SUID, that's a hard break on back compat.  So really
we need to override read/writeObject() or read/writeExternal() to do
our own versioning.

Consider this actual example: RangeQuery, in 2.9, now separately
stores "boolean includeLower" and "boolean includeUpper".  In versions
<= 2.4, it only stores "boolean inclusive".  This means we can't rely
on the JVM's default versioning for serialization.
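
As a rough sketch of what such hand-rolled versioning might look like
for this case: RangeSketch below is a hypothetical stand-in, not
Lucene's RangeQuery, and the one-byte format constants are
assumptions. To stay self-contained it calls readExternal() directly
on hand-written bytes instead of going through writeObject() /
readObject().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a query whose fields changed between releases.
class RangeSketch implements Externalizable {
  static final byte FORMAT_INCLUSIVE = 1;    // old layout: single "inclusive" flag
  static final byte FORMAT_SPLIT_BOUNDS = 2; // new layout: separate lower/upper flags

  boolean includeLower;
  boolean includeUpper;

  public RangeSketch() {} // Externalizable needs a public no-arg constructor

  RangeSketch(boolean includeLower, boolean includeUpper) {
    this.includeLower = includeLower;
    this.includeUpper = includeUpper;
  }

  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeByte(FORMAT_SPLIT_BOUNDS); // always write the newest layout
    out.writeBoolean(includeLower);
    out.writeBoolean(includeUpper);
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    byte format = in.readByte();
    if (format == FORMAT_INCLUSIVE) {
      // Old layout: the single flag applies to both bounds.
      boolean inclusive = in.readBoolean();
      includeLower = inclusive;
      includeUpper = inclusive;
    } else if (format == FORMAT_SPLIT_BOUNDS) {
      includeLower = in.readBoolean();
      includeUpper = in.readBoolean();
    } else {
      throw new IOException("unknown format version: " + format);
    }
  }

  // Bytes as the old layout would have written them (simulated by hand).
  static byte[] oldLayoutBytes(boolean inclusive) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeByte(FORMAT_INCLUSIVE);
      out.writeBoolean(inclusive);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Bytes in the current layout, produced by writeExternal().
  static byte[] currentLayoutBytes(RangeSketch range) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      range.writeExternal(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Decodes either layout by switching on the leading format byte.
  static RangeSketch decode(byte[] data) {
    RangeSketch range = new RangeSketch();
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      range.readExternal(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return range;
  }
}
```

The point of the sketch is only that the reader, not the JVM, decides
how an old "inclusive" flag maps onto the two new fields.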

{quote}
> The serialVersionUID (suid) is a long because it is a java thing.
{quote}

But, that's only if you rely on the JVM's default serialization.  If
we implement our own (overriding read/writeObject or
read/writeExternal) we don't have to use "long SUID".

{quote}
> The problem was two different people did the release with different compilers.
{quote}

I think it's more likely the addition of a new ctor to Term (that
takes only String field), that changed the SUID.
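
For reference, the default SUID is computed from the class's shape,
which includes its non-private constructors, so declaring the value
explicitly keeps it stable when a constructor is added. A minimal
sketch, with TermSketch as a hypothetical stand-in (not Lucene's Term)
and 1L as an arbitrary value:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical stand-in for a small serializable value class.
class TermSketch implements Serializable {
  // Declared explicitly, so adding constructors or methods later
  // does not change the value seen by the serialization runtime.
  private static final long serialVersionUID = 1L;

  final String field;
  final String text;

  TermSketch(String field, String text) {
    this.field = field;
    this.text = text;
  }

  // A convenience constructor like this one would change a *computed*
  // SUID, but has no effect on the declared one above.
  TermSketch(String field) {
    this(field, "");
  }

  // Returns the SUID the serialization runtime will use for this class.
  static long effectiveSuid() {
    return ObjectStreamClass.lookup(TermSketch.class).getSerialVersionUID();
  }
}
```

This only addresses accidental SUID changes; it does nothing for real
changes to the serialized fields.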

{quote}
> If it is not meant to be serialized, why did it implement Serializable?
{quote}

Because there are two different things it can "mean" when a class
implements Serializable, and I think that's the core
disconnect/challenge to this issue.

The first meaning (let's call it "live serialization") is: "within the
same version of Lucene you can serialize/deserialize this object".

The second meaning (let's call it "long-term persistence") is: "you
can serialize this object in version X of Lucene and later deserialize
it using a newer version Y of Lucene".

Lucene, today, only guarantees "live serialization", and that's the
intention when "implements Serializable" is added to a class.

But what's now being asked for (expected) with this issue is
"long-term persistence", which is really a very different beast and a
much taller order.  With it come a number of challenges that warrant
scrutiny:

  * What's our back-compat policy for "long-term persistence"?

  * The storage protocol must have a version header, so future changes
    can switch on that and decode older formats.

  * We need strong test cases that deserialize older versions of these
    serialized classes so we don't accidentally break it.

  * We should look carefully at the protocol and not waste bytes if we
    can (1 byte vs 8 byte version header).

These issues are the same issues we face with the index file format,
because that is also long-term persistence.
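
To illustrate the version-header and test-case points, here is a
sketch of a back-compat check built on frozen byte fixtures. The class
PersistedFlagsSketch, the two layouts and the fixture bytes are all
hypothetical; a real test would check in bytes actually produced by
the older releases.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical persisted record with a one-byte version header.
class PersistedFlagsSketch {
  // Frozen fixtures: bytes as each (hypothetical) release would have written them.
  static final byte[] V1_FIXTURE = {1, 1};    // version 1, inclusive=true
  static final byte[] V2_FIXTURE = {2, 1, 0}; // version 2, lower=true, upper=false

  // Returns {includeLower, includeUpper}, switching on the version header.
  static boolean[] decode(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int version = in.readUnsignedByte();
      switch (version) {
        case 1: {
          // Old layout: one flag covers both bounds.
          boolean inclusive = in.readBoolean();
          return new boolean[] {inclusive, inclusive};
        }
        case 2: {
          boolean lower = in.readBoolean();
          boolean upper = in.readBoolean();
          return new boolean[] {lower, upper};
        }
        default:
          throw new IOException("unknown version header: " + version);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test suite would decode every fixture on each build; if a change to
the current layout breaks decoding of V1_FIXTURE, the test fails
before a release goes out.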


> Implement Externalizable in main top level searcher classes
> -----------------------------------------------------------
>
>                 Key: LUCENE-1473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1473
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1473.patch
>
>
> To maintain serialization compatibility between Lucene versions, major classes can implement Externalizable.  This will make Serialization faster due to no reflection required and maintain backwards compatibility.  
