Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2009/06/19 21:20:38 UTC

Shouldn't IndexWriter.commit(Map) accept Properties instead?

It really assumes a String, String map ... Is it just because Properties is
synced?

If so, then when moving to 1.5 we should declare it as Map<String, String>,
because currently if anyone passes anything other than Strings, the code
will fail with a ClassCastException in
ChecksumIndexOutput.writeStringStringMap.
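The failure mode described above can be reproduced with a short sketch. It assumes a hypothetical `writeStringStringMap`-style helper that, like the method named in the message, casts every key and value to String. This is an illustration of the raw-Map problem, not Lucene's actual code.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a writer that assumes a String -> String map.
// Not Lucene's ChecksumIndexOutput; it only mimics the casting behavior.
class RawMapDemo {

  // Raw (pre-1.5 style) Map parameter: the compiler cannot check entry types.
  @SuppressWarnings("rawtypes")
  static String writeStringStringMap(Map map) {
    StringBuilder out = new StringBuilder();
    for (Iterator it = map.entrySet().iterator(); it.hasNext();) {
      Map.Entry e = (Map.Entry) it.next();
      // These casts are where a non-String entry fails at runtime.
      String key = (String) e.getKey();
      String value = (String) e.getValue();
      out.append(key).append('=').append(value).append('\n');
    }
    return out.toString();
  }

  @SuppressWarnings({"rawtypes", "unchecked"})
  public static void main(String[] args) {
    Map data = new HashMap();
    data.put("generation", Integer.valueOf(42)); // compiles without complaint
    try {
      writeStringStringMap(data);
    } catch (ClassCastException e) {
      System.out.println("failed only at write time: " + e.getClass().getSimpleName());
    }
  }
}
```

The caller gets no compile-time warning; the mistake only surfaces when the map is written.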

Shai

Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Chris Hostetter <ho...@fucit.org>.
: But then when you retrieve your metadata it's converted to String -> String.

Correct ... the documentation should make it clear that what gets 
persisted is a String, but the method of giving the String to the API is 
by passing an Object that will be toString()ed.

(Aside: it would be really nice if Java had a Stringable interface)

It's not the prettiest API in the world; in a pure Java 1.5 code base i 
wouldn't even suggest it, but in 1.4 code bases it tends to be a lot 
more friendly than documenting that people must pass a collection of 
Strings and cast them all.
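A minimal sketch of the toString()-based API style described above, assuming a hypothetical `toStringMap` helper. It is written with 1.5 generics for readability; in a 1.4 code base the parameter would simply be a raw Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: accepts any Map and keeps only String.valueOf(...) of
// each key and value, which is what would actually be persisted.
class ToStringMapDemo {

  static Map<String, String> toStringMap(Map<?, ?> userData) {
    Map<String, String> result = new LinkedHashMap<String, String>();
    for (Map.Entry<?, ?> e : userData.entrySet()) {
      // String.valueOf(Object) calls toString(), and yields "null" for null.
      result.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<Object, Object> userData = new LinkedHashMap<Object, Object>();
    userData.put("source", "crawler-7");
    userData.put("docCount", Integer.valueOf(1000));
    userData.put("optimized", Boolean.TRUE);
    // No casts or instanceof checks needed on the caller's side.
    System.out.println(toStringMap(userData)); // {source=crawler-7, docCount=1000, optimized=true}
  }
}
```

For plain Strings the result is identical to the input, which is the point of the proposal: the common case needs no extra work from the caller.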




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Jun 22, 2009 at 3:43 PM, Chris
Hostetter<ho...@fucit.org> wrote:
> : The javadocs state clearly it must be Map<String,String>.  Plus, the
> : type checking is in fact enforced (you hit an exception if you violate
> : it), dynamically (like Python).
> :
> : And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly
> : 2X) we'd statically type it (change Map to Map<String,String>).
>
> the other option i've seen in similar situations is to document that
> Map<Object,Object> is allowed, but that the Object will be toString()ed
> and the resulting value is what will be used.
>
> In the common case of Strings, the functionality is the same without
> requiring any explicit casting or instanceof error checking.
>
> the added bonuses are:
>  1) people can pass other "simple" objects (Integers, Floats, Booleans)
> and 99% of the time get what they want.
>  2) people can pass wrapper objects that implement toString() in a
> non-trivial way and have the string produced for them lazily when the time
> comes to use the String.  (i.e., if my string value is expensive to produce,
> i can defer that cost until needed in case the commit fails for some other
> reason before my string is even used)

But then when you retrieve your metadata it's converted to String -> String.
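To make the round-trip point concrete, here is a sketch that writes a map's entries as Strings and reads them back. The byte layout (an int count followed by `writeUTF` key/value pairs) is an assumption for illustration only, not Lucene's actual index format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the on-disk layout here is hypothetical.
class CommitDataRoundTrip {

  static byte[] write(Map<?, ?> userData) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(userData.size());
    for (Map.Entry<?, ?> e : userData.entrySet()) {
      // Whatever the caller passed in, only its String form is stored.
      out.writeUTF(String.valueOf(e.getKey()));
      out.writeUTF(String.valueOf(e.getValue()));
    }
    out.flush();
    return bytes.toByteArray();
  }

  static Map<String, String> read(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int count = in.readInt();
    Map<String, String> result = new HashMap<String, String>();
    for (int i = 0; i < count; i++) {
      String key = in.readUTF();
      result.put(key, in.readUTF());
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    Map<Object, Object> userData = new HashMap<Object, Object>();
    userData.put("generation", Integer.valueOf(42));
    Map<String, String> restored = read(write(userData));
    // The caller gave an Integer, but only a String comes back.
    System.out.println(restored.get("generation").getClass().getSimpleName()); // String
  }
}
```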

Mike



Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Earwin Burrfoot <ea...@gmail.com>.
> What other issues would we be taking on by using Java's serialization here...?
It's insanely slow, though that doesn't matter for a once-per-commit call.

The other point is that if you store an Object, you can no longer mix
Lucene and user data.
With a Map<String, whatever> approach, you could reserve part of the key
space for Lucene and let users add their own entries on top.
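One way the reserved key space idea could look, as a sketch. The `lucene.` prefix and the `merge` method are hypothetical illustrations, not an existing Lucene API, and the sample keys and values are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reserving part of the key space for the library.
class ReservedKeySpace {
  static final String RESERVED_PREFIX = "lucene.";

  static Map<String, String> merge(Map<String, String> internal, Map<String, String> user) {
    Map<String, String> merged = new HashMap<String, String>();
    for (Map.Entry<String, String> e : user.entrySet()) {
      // User keys may not intrude on the reserved namespace.
      if (e.getKey().startsWith(RESERVED_PREFIX)) {
        throw new IllegalArgumentException("reserved key: " + e.getKey());
      }
      merged.put(e.getKey(), e.getValue());
    }
    for (Map.Entry<String, String> e : internal.entrySet()) {
      // Library-owned entries always live under the reserved prefix.
      merged.put(RESERVED_PREFIX + e.getKey(), e.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> internal = new HashMap<String, String>();
    internal.put("formatVersion", "2.9");
    Map<String, String> user = new HashMap<String, String>();
    user.put("app.lastSync", "2009-06-22");
    System.out.println(merge(internal, user).get("lucene.formatVersion")); // 2.9
  }
}
```

With an opaque Object there is no such shared namespace, which is the limitation being pointed out.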

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785



RE: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Chris Hostetter <ho...@fucit.org>.
: The problem arises if the user serializes objects and then opens the index
: on another machine where different versions of these classes are installed,
: without having used serialVersionUID to record version info in the index. As
: long as you only serialize standard Java classes like String, HashMap, ... you
: will have no problem, but with your own classes a lot of care must be taken
: that they can be deserialized across versions. In my case, with the stored
: document Field, it was just a LinkedHashSet of Strings or something like that
: (very easy to serialize).
: 
: And the second problem: what if you want to open such an index with, e.g.,
: PyLucene? Should PyLucene just ignore the binary serialization data?

Right ... i wouldn't advocate using Java serialization here for all of 
those reasons (especially since so many people have worked so hard to move 
towards dealing with pure byte[]s on disk instead of Java-serialized 
Strings)

So to be clear: I wasn't in any way advocating that we do arbitrary 
serialization, or do anything different with the "String" values once we 
get them from the caller -- i was just suggesting an alternate API for 
getting String values from the caller in a way that didn't involve 
casting.


-Hoss




RE: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Uwe Schindler <uw...@thetaphi.de>.
> > You could also serialize arbitrary objects into the index
> (Map<String,?>).
> 
> Or just commit(Object) (not commit(Map<?,?>)).
> 
> The back-compat problems in LUCENE-1473 don't apply since the Object
> is opaque to Lucene.
> 
> What other issues would we be taking on by using Java's serialization
> here...?

The problem arises if the user serializes objects and then opens the index
on another machine where different versions of these classes are installed,
without having used serialVersionUID to record version info in the index. As
long as you only serialize standard Java classes like String, HashMap, ... you
will have no problem, but with your own classes a lot of care must be taken
that they can be deserialized across versions. In my case, with the stored
document Field, it was just a LinkedHashSet of Strings or something like that
(very easy to serialize).
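The versioning concern above is usually addressed by declaring `serialVersionUID` explicitly in any user class that gets serialized. A minimal sketch, assuming a hypothetical `UserMetadata` class that holds only standard JDK types:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.LinkedHashSet;

// Hypothetical user class stored via Java serialization.
class UserMetadata implements Serializable {
  // Explicit stream version: compatible edits to this class (e.g. adding a
  // field) keep the same ID, so older serialized bytes still deserialize.
  // Without it, the JVM derives an ID from the class structure, and a
  // different build of the class elsewhere can fail with InvalidClassException.
  private static final long serialVersionUID = 1L;

  // Only standard JDK types inside, which serialize predictably.
  private final LinkedHashSet<String> tags = new LinkedHashSet<String>();

  void addTag(String tag) {
    tags.add(tag);
  }

  LinkedHashSet<String> tags() {
    return tags;
  }

  public static void main(String[] args) {
    long uid = ObjectStreamClass.lookup(UserMetadata.class).getSerialVersionUID();
    System.out.println("serialVersionUID = " + uid); // serialVersionUID = 1
  }
}
```

This only addresses Java-to-Java compatibility; it does nothing for non-Java readers such as PyLucene.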

And the second problem: what if you want to open such an index with, e.g.,
PyLucene? Should PyLucene just ignore the binary serialization data?






Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Jun 22, 2009 at 3:49 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> You could also serialize arbitrary objects into the index (Map<String,?>).

Or just commit(Object) (not commit(Map<?,?>)).

The back-compat problems in LUCENE-1473 don't apply since the Object
is opaque to Lucene.

What other issues would we be taking on by using Java's serialization here...?

Mike



RE: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Uwe Schindler <uw...@thetaphi.de>.
You could also serialize arbitrary objects into the index (Map<String,?>).
Note that this may not be a good idea because of differing class versions and
the known Java serialization problems, but it should work in principle. And
Strings can also be serialized the same way and are always
backwards-compatible (as long as you open the index with Lucene... I think
this is the interesting point)

I have an index where I store serialized objects in a stored binary field
(Object -> ObjectOutputStream(ByteArrayOutputStream) -> binary field).
It works well.
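The pipeline described above can be sketched with the standard JDK streams. The helper names `toBytes` and `fromBytes` are hypothetical, and the step that actually adds the bytes to a Lucene stored binary field is omitted rather than guessed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashSet;
import java.util.Set;

class ObjectBytes {

  // Object -> ObjectOutputStream(ByteArrayOutputStream) -> byte[]
  static byte[] toBytes(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } // close() flushes the object stream into the byte buffer
    return bytes.toByteArray();
  }

  // byte[] -> ObjectInputStream -> Object (the caller must know the type)
  static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    LinkedHashSet<String> tags = new LinkedHashSet<String>();
    tags.add("draft");
    tags.add("reviewed");
    byte[] stored = toBytes(tags); // these bytes would go into a stored binary field
    @SuppressWarnings("unchecked")
    Set<String> restored = (Set<String>) fromBytes(stored);
    System.out.println(restored); // [draft, reviewed]
  }
}
```

The resulting bytes are only readable by a JVM that has compatible versions of the serialized classes, which is exactly the portability concern raised elsewhere in the thread.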

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
> Sent: Monday, June 22, 2009 9:43 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?
> 
> : The javadocs state clearly it must be Map<String,String>.  Plus, the
> : type checking is in fact enforced (you hit an exception if you violate
> : it), dynamically (like Python).
> :
> : And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly
> : 2X) we'd statically type it (change Map to Map<String,String>).
> 
> the other option i've seen in similar situations is to document that
> Map<Object,Object> is allowed, but that the Object will be toString()ed
> and the resulting value is what will be used.
> 
> In the common case of Strings, the functionality is the same without
> requiring any explicit casting or instanceof error checking.
> 
> the added bonuses are:
>   1) people can pass other "simple" objects (Integers, Floats, Booleans)
> and 99% of the time get what they want.
>   2) people can pass wrapper objects that implement toString() in a
> non-trivial way and have the string produced for them lazily when the time
> comes to use the String.  (i.e., if my string value is expensive to produce,
> i can defer that cost until needed in case the commit fails for some other
> reason before my string is even used)
> 
> 
> 
> -Hoss
> 
> 





Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Chris Hostetter <ho...@fucit.org>.
: The javadocs state clearly it must be Map<String,String>.  Plus, the
: type checking is in fact enforced (you hit an exception if you violate
: it), dynamically (like Python).
: 
: And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly
: 2X) we'd statically type it (change Map to Map<String,String>).

the other option i've seen in similar situations is to document that 
Map<Object,Object> is allowed, but that the Object will be toString()ed 
and the resulting value is what will be used.

In the common case of Strings, the functionality is the same without 
requiring any explicit casting or instanceof error checking.

the added bonuses are:
  1) people can pass other "simple" objects (Integers, Floats, Booleans) 
and 99% of the time get what they want.
  2) people can pass wrapper objects that implement toString() in a 
non-trivial way and have the string produced for them lazily when the time 
comes to use the String.  (i.e., if my string value is expensive to produce, 
i can defer that cost until needed in case the commit fails for some other 
reason before my string is even used)
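The second bonus, deferring an expensive String until it is needed, can be sketched with a small wrapper whose toString() does the work on first call and caches it. `LazyValue` and `expensiveCount` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical wrapper: the String is produced only when toString() is
// first called, then cached.
class LazyValue {
  private final Supplier<String> producer;
  private String cached;
  private int computeCount;

  LazyValue(Supplier<String> producer) {
    this.producer = producer;
  }

  @Override
  public synchronized String toString() {
    if (cached == null) {
      cached = producer.get(); // the expensive work happens here, once
      computeCount++;
    }
    return cached;
  }

  synchronized int computeCount() {
    return computeCount;
  }

  // Stand-in for an expensive computation (hypothetical).
  static int expensiveCount() {
    return 1000;
  }

  public static void main(String[] args) {
    LazyValue summary = new LazyValue(() -> "docs=" + expensiveCount());
    Map<String, Object> userData = new LinkedHashMap<String, Object>();
    userData.put("summary", summary);
    System.out.println(summary.computeCount()); // 0: nothing computed yet
    // A toString()-based API would do this only when it actually writes the value.
    System.out.println(String.valueOf(userData.get("summary"))); // docs=1000
    System.out.println(summary.computeCount()); // 1
  }
}
```

If the commit failed before the value was written, toString() would never be called and the cost would never be paid.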



-Hoss




Re: Shouldn't IndexWriter.commit(Map) accept Properties instead?

Posted by Michael McCandless <lu...@mikemccandless.com>.
The javadocs state clearly it must be Map<String,String>.  Plus, the
type checking is in fact enforced (you hit an exception if you violate
it), dynamically (like Python).

And then I was thinking with 1.5 (3.0 -- huh, neat how it's exactly
2X) we'd statically type it (change Map to Map<String,String>).
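A sketch of the two kinds of checking mentioned here. The generic signature makes a wrong map type a compile-time error, and `Collections.checkedMap` (a real JDK 1.5 API) gives a dynamically type-safe view that rejects a bad entry at put() time rather than at write time. `acceptCommitData` is a hypothetical stand-in, not IndexWriter's method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class TypedCommitData {

  // Hypothetical stand-in for a commit method typed with 1.5 generics:
  // passing a Map<String, Integer> here does not compile.
  static Map<String, String> acceptCommitData(Map<String, String> commitUserData) {
    return new HashMap<String, String>(commitUserData); // defensive copy
  }

  @SuppressWarnings({"rawtypes", "unchecked"})
  public static void main(String[] args) {
    // A checked view rejects bad entries at put() time, even through a raw reference.
    Map<String, String> data = Collections.checkedMap(
        new HashMap<String, String>(), String.class, String.class);
    Map raw = data;
    try {
      raw.put("generation", Integer.valueOf(42));
    } catch (ClassCastException e) {
      System.out.println("rejected at put()");
    }
    data.put("source", "crawler");
    System.out.println(acceptCommitData(data).size()); // 1
  }
}
```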

Mike

On Fri, Jun 19, 2009 at 3:20 PM, Shai Erera<se...@gmail.com> wrote:
> It really assumes a String, String map ... Is it just because Properties is
> synced?
>
> If so, then when moving to 1.5 we should declare it as Map<String, String>,
> because currently if anyone passes anything other than Strings, the code
> will fail with a ClassCastException in
> ChecksumIndexOutput.writeStringStringMap.
>
> Shai
>
