Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2003/12/02 20:36:53 UTC

Re: Language neutral index format representation

On Tuesday, December 2, 2003, at 01:56  PM, Simon Cozens wrote:
> Yep, thanks to Kasei, who are also cleaning up and documenting the code I
> write. For the interested, what I'm doing is at
> http://cvs.simon-cozens.org/viewcvs.cgi/plucene/ and I hope to sync back over
> the docs/tests once they're completed.

Speaking of tests: are you testing Java/Perl interoperability?  For 
example, are you testing that an index created in Java is read correctly 
by your Perl API, and vice versa?  I'm interested in eventually 
developing some sort of test suite to do this with the Ruby port.

> My version's almost there, thanks to a month basically full-time work on it.

I'm jealous!  Or I guess you might say I should forget Ruby and switch 
to Perl :)

> I believe so. You'd generate, conceptually, an ObjectSerializer class of
> some sort which has read and write methods, which is overloaded to do
> the right thing with the right object type.

I'm thinking more in terms of generating classes like FieldInfos and 
SegmentInfos from an XML descriptor that represented the info here:

	http://jakarta.apache.org/lucene/docs/fileformats.html

> However, I can imagine some snags, such as the one which prompted this
> thread: how would you represent sequences of objects with their properties
> delta-encoded, for instance, or the cunning buffer-substring trick used to
> store the terms in the .tis file?

I view this as what the language-specific code generator would build 
from a general file format descriptor.  For example, in Java I'd 
probably write some Velocity templates that keyed off the XML 
descriptor.  In Ruby, I'd use REXML and ERb templates.
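
A minimal sketch of the descriptor-driven approach, written in Python
rather than Velocity or ERb purely to keep it short. The XML element and
attribute names, the generate_reader_source helper, and the simplified
string encoding (a byte-length prefix followed by plain UTF-8) are all
assumptions made for illustration; they are not taken from
fileformats.html or from any existing port.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical descriptor: the element and attribute names are invented here.
DESCRIPTOR = """
<format>
  <record name="FieldInfo">
    <field name="name" type="string"/>
    <field name="flags" type="byte"/>
  </record>
</format>
"""


def read_byte(stream):
    """Read a single unsigned byte (raises IndexError at end of stream)."""
    return stream.read(1)[0]


def read_vint(stream):
    """Read a variable-length int: 7 bits per byte, low-order bits first,
    with the high bit of each byte meaning 'more bytes follow'."""
    result, shift = 0, 0
    while True:
        b = read_byte(stream)
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7


def read_string(stream):
    """Assumed simplification: VInt byte length, then plain UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


# Maps descriptor type names to the primitive reader each one needs.
READERS = {"byte": "read_byte", "vint": "read_vint", "string": "read_string"}


def generate_reader_source(descriptor_xml):
    """Emit Python source for one read_<Record> function per <record>."""
    root = ET.fromstring(descriptor_xml)
    lines = []
    for record in root.findall("record"):
        lines.append(f"def read_{record.get('name')}(stream):")
        lines.append("    rec = {}")
        for field in record.findall("field"):
            reader = READERS[field.get("type")]
            lines.append(f"    rec[{field.get('name')!r}] = {reader}(stream)")
        lines.append("    return rec")
        lines.append("")
    return "\n".join(lines)


# Generate the reader, load it, and parse one hand-built record.
source = generate_reader_source(DESCRIPTOR)
namespace = {"read_byte": read_byte, "read_vint": read_vint,
             "read_string": read_string}
exec(source, namespace)
record = namespace["read_FieldInfo"](io.BytesIO(b"\x05title\x01"))
```

A port in another language would swap the string-building step for its
own template engine while reading the same descriptor.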

I haven't thought through the detailed issues that could come up, or 
whether the Java "reference implementation" would need design changes 
to accommodate generated code.
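
For reference, the two encodings raised as snags earlier in the thread
can be sketched as follows. This is a Python illustration of the general
techniques (storing only the suffix that differs from the previous
sorted term, and storing differences between successive numbers), not
the exact on-disk layout of the .tis file; the function names are
hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading characters that a and b have in common."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def encode_terms(sorted_terms):
    """Turn sorted terms into (prefix_length, suffix) pairs, each relative
    to the term immediately before it."""
    prev = ""
    pairs = []
    for term in sorted_terms:
        p = shared_prefix_len(prev, term)
        pairs.append((p, term[p:]))
        prev = term
    return pairs


def decode_terms(pairs):
    """Rebuild the terms by reusing the first prefix_length characters of
    the previously decoded term."""
    prev = ""
    terms = []
    for p, suffix in pairs:
        prev = prev[:p] + suffix
        terms.append(prev)
    return terms


def delta_encode(numbers):
    """Store each number as its difference from the previous one."""
    prev = 0
    deltas = []
    for n in numbers:
        deltas.append(n - prev)
        prev = n
    return deltas


def delta_decode(deltas):
    """Recover the original numbers with a running sum."""
    total = 0
    numbers = []
    for d in deltas:
        total += d
        numbers.append(total)
    return numbers
```

Both encodings make each record depend on the one before it, which is
what a purely per-record descriptor has trouble expressing: a generated
reader needs to carry state from one record to the next.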

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Language neutral index format representation

Posted by Simon Cozens <si...@simon-cozens.org>.
Erik Hatcher:
> Speaking of tests: are you testing Java/Perl interoperability?  For 
> example, are you testing that an index created in Java is read correctly 
> by your Perl API, and vice versa?

Only ad-hoc tests. :( I started testing the index reader by copying across a
Java-created index and making sure it could read that, but now that I can
generate indexes in Perl, I'm using those instead. Ideally, I will have some
tests that make sure that the two representations are identical.
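
One way such an identity check might look, sketched in Python. It
assumes both implementations have indexed the same documents in the same
order with the same settings, so that byte-for-byte equality is a
reasonable expectation; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def digest_index(index_dir):
    """Map each file name in an index directory to a SHA-256 of its bytes."""
    digests = {}
    for path in sorted(Path(index_dir).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_indexes(dir_a, dir_b):
    """Return the sorted names of files that are missing from one side or
    whose contents differ between the two directories."""
    a, b = digest_index(dir_a), digest_index(dir_b)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

An empty result from diff_indexes would mean the two directories hold
identical files; any names returned point at the files worth inspecting
first.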

> I'm jealous!  Or I guess you might say I should forget Ruby and switch 
> to Perl :)

Oh, believe me, I'd love it if I could work full-time in Ruby.

> I'm thinking more in terms of generating classes like FieldInfos and 
> SegmentInfos from an XML descriptor that represented the info here:
> 
> 	http://jakarta.apache.org/lucene/docs/fileformats.html

Sounds interesting, and not too difficult. 

-- 
The Blit is a nice terminal, but it runs emacs.
