Posted to pylucene-dev@lucene.apache.org by Eric Hall <lu...@darkart.com> on 2011/01/19 19:23:16 UTC

Using IndexWriter.commit(Map commitUserData) in pylucene

Hello-
	I'd like to store some index metadata using

	IndexWriter.commit(Map<String,String> commitUserData)

	I've set up a python dict with string to string mappings,
but if I use that I get an InvalidArgsError.  Is there a different
python type to use in this call?

	I'm using pylucene-3.0.3-1.

Trimmed/pseudo sample (real code is on a different system):

	import lucene

	indexMetaDataDict = {"one":"two", "three":"four"}
	writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.UNLIMITED)

	## do document indexing

	writer.optimize()

	writer.commit(indexMetaDataDict)

	writer.close()


	If the "writer.commit(indexMetaDataDict)" line is commented out, it works fine.
If it's not, I get the InvalidArgsError back for that line.



		Thanks much,


		-eric


Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Andi Vajda <va...@apache.org>.
On Mon, 24 Jan 2011, Eric Hall wrote:

> 	Sadly this appears to have been a case of PEBKAC or
> PICNIC, looks like I had a typo in the names of the index
> with the metadata in it and an older one w/o the metadata.
> Sorry for the annoyance.
> 	Using reader.getCommitUserData() is working just fine, I'm able
> to retrieve the metadata I stored in the index.
>
>
> 	Thanks for all the help,

You're very welcome !
Thanks for letting us know.

Andi..

>
>
> 		-eric
>
>
> p.s. the luke source uses getCommitUserData() as well, to round
> out the info for those who may come searching later.
>

Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Eric Hall <er...@darkart.com>.
	Sadly this appears to have been a case of PEBKAC or
PICNIC, looks like I had a typo in the names of the index
with the metadata in it and an older one w/o the metadata. 
Sorry for the annoyance.  
	Using reader.getCommitUserData() is working just fine, I'm able
to retrieve the metadata I stored in the index.
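
Since the cause here was two similarly named index directories, a small sanity check before opening the reader can make that kind of mix-up easier to spot. The following is only a sketch: describe_index_dir is a hypothetical helper name, and it relies on the assumption that a Lucene 3.x index directory contains files whose names start with "segments". It cannot tell a wrong-but-valid index from the right one; it only reports the resolved path and what is in it.

```python
import os


def describe_index_dir(path):
    """Return (absolute path, sorted segments file names) for an index directory.

    Hypothetical helper: raises ValueError if the path is not a directory
    or holds no file starting with "segments" (assumed Lucene 3.x layout).
    """
    full_path = os.path.abspath(path)
    if not os.path.isdir(full_path):
        raise ValueError("not a directory: %s" % full_path)
    segments = sorted(
        name for name in os.listdir(full_path) if name.startswith("segments")
    )
    if not segments:
        raise ValueError("no segments files in %s" % full_path)
    return full_path, segments
```

Printing the returned path in both the indexing script and the reading script shows at a glance whether they point at the same directory.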


	Thanks for all the help,


		-eric


p.s. the luke source uses getCommitUserData() as well, to round
out the info for those who may come searching later.


Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Eric Hall <lu...@darkart.com>.
On Fri, Jan 21, 2011 at 05:54:52PM -0800, Andi Vajda wrote:
> 
> On Sat, 22 Jan 2011, Eric Hall wrote:
> 
> >On Wed, Jan 19, 2011 at 07:15:11PM -0800, Andi Vajda wrote:
> >>
> >>On Wed, 19 Jan 2011, Eric Hall wrote:
> >>
> >>>On Wed, Jan 19, 2011 at 11:05:28AM -0800, Andi Vajda wrote:
> >>>>
> >>>>On Wed, 19 Jan 2011, Eric Hall wrote:
> >>>>
> >>>
> >>>	The HashMap()... method works fine for storing the metadata, and luke
> >>>shows that it's there.  Of course getting the metadata back for later
> >>>comparison/display is also useful, and seemed like it would be
> >>>straightforward....  Naturally it's not working using:
> >>>
> >>>	indexMetaDataHashMap = reader.getCommitUserData()
> >>>
> >>>	I get an empty hashmap back from the above.  I also tried using:
> >
> >	Whups, I was wrong, I don't get a HashMap type back, I get
> >a 'Map' type back (still empty).  I don't know if that makes a difference
> >or not....
> 
> That shouldn't make a difference, that interface is wrapped too.
> To see what the actual Java class is for that map, you can call getClass() 
> on it. Maybe I'm missing something obvious here, how do you see that this 
> Map instance is empty ?

	I currently use:

	if indexMetaDataHashMap.isEmpty():
		pass  ## the map is empty...
	else:
		pass  ## the map is not empty...

	I also tried getting the size() (is zero), and getting the
keySet() (the set/list I get back says it's empty (isEmpty()), and
I get nothing when I iterate over it).
	Hopefully I'm doing all of those correctly of course.


> 
> >>>
> >>>	indexCommit = reader.getIndexCommit()
> >>>	indexMetaDataHashMap = indexCommit.getUserData()
> >>>
> >>>with the same result (empty hashmap).  Is there a different way to do this?
> >>>
> >>
> >>If you're getting a HashMap back then the PyLucene side of things is
> >>working. If it's empty it could mean that you're doing something wrong
> >>Lucene-wise or that you found a bug there. Could it be that you opened the
> >>IndexReader before the IndexWriter got committed ? If so, reopen() it after
> >>commit or move the opening code. If not, you may want to ask about this on
> >>the Lucene user list at java-user@lucene.apache.org.
> >>
> >
> >	I definitely open the reader after the writer is committed and closed,
> >they're separate scripts.
> 
> Maybe you're not using this API correctly ?
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#getIndexCommit()
> Why aren't you calling reader.getCommitUserData() directly ? (the former is 
> labeled Expert and liable to change but I don't know either API)
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#getCommitUserData()

	I am using reader.getCommitUserData(), I just tried using
the getIndexCommit() to see if it would work or not.

> 
> Did you ask about this on java-user@lucene.apache.org ?

	I have not, that'll be the next thing, I wanted to check that
it should work in pylucene and I'm doing it correctly.

> 
> Of course, to help with narrowing down the bug, you could write the same 
> Java code directly and see if you encounter the same problem...

	Yup, and/or check the luke source since it's able to show me the
information.


			Thanks again,


			-eric


Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Andi Vajda <va...@apache.org>.
On Sat, 22 Jan 2011, Eric Hall wrote:

> On Wed, Jan 19, 2011 at 07:15:11PM -0800, Andi Vajda wrote:
>>
>> On Wed, 19 Jan 2011, Eric Hall wrote:
>>
>>> On Wed, Jan 19, 2011 at 11:05:28AM -0800, Andi Vajda wrote:
>>>>
>>>> On Wed, 19 Jan 2011, Eric Hall wrote:
>>>>
>>>
>>> 	The HashMap()... method works fine for storing the metadata, and luke
>>> shows that it's there.  Of course getting the metadata back for later
>>> comparison/display is also useful, and seemed like it would be
>>> straightforward....  Naturally it's not working using:
>>>
>>> 	indexMetaDataHashMap = reader.getCommitUserData()
>>>
>>> 	I get an empty hashmap back from the above.  I also tried using:
>
> 	Whups, I was wrong, I don't get a HashMap type back, I get
> a 'Map' type back (still empty).  I don't know if that makes a difference
> or not....

That shouldn't make a difference, that interface is wrapped too.
To see what the actual Java class is for that map, you can call getClass() 
on it. Maybe I'm missing something obvious here, how do you see that this 
Map instance is empty ?
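
To make that kind of inspection concrete, here is a rough sketch that reports the Java class name, the size and the entries of a wrapped map in one go. The names java_map_to_dict and describe_java_map are made up for illustration. The sketch assumes the wrapped key set can be iterated from Python (as Eric's own attempt suggests) and that getClass().getName() is available on the wrapper; str() is used so it works whether keys come back as Python strings or as wrapped Java objects.

```python
def java_map_to_dict(java_map):
    """Copy a wrapped java.util.Map into a plain Python dict of strings."""
    result = {}
    # Assumes the Java key set is iterable from Python.
    for key in java_map.keySet():
        result[str(key)] = str(java_map.get(key))
    return result


def describe_java_map(java_map):
    """Summarize a wrapped Java map: concrete class, size and entries."""
    return {
        "class": str(java_map.getClass().getName()),
        "size": java_map.size(),
        "entries": java_map_to_dict(java_map),
    }

# Usage (not run here; needs a real PyLucene IndexReader):
#   print(describe_java_map(reader.getCommitUserData()))
```

If the size is zero and the entries are empty, the map really is empty on the Java side, which points away from the PyLucene wrapping and towards the index being read.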

>>>
>>> 	indexCommit = reader.getIndexCommit()
>>> 	indexMetaDataHashMap = indexCommit.getUserData()
>>>
>>> with the same result (empty hashmap).  Is there a different way to do this?
>>>
>>
>> If you're getting a HashMap back then the PyLucene side of things is
>> working. If it's empty it could mean that you're doing something wrong
>> Lucene-wise or that you found a bug there. Could it be that you opened the
>> IndexReader before the IndexWriter got committed ? If so, reopen() it after
>> commit or move the opening code. If not, you may want to ask about this on
>> the Lucene user list at java-user@lucene.apache.org.
>>
>
> 	I definitely open the reader after the writer is committed and closed,
> they're separate scripts.

Maybe you're not using this API correctly ?
http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#getIndexCommit()
Why aren't you calling reader.getCommitUserData() directly ? (the former is 
labeled Expert and liable to change but I don't know either API)
http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#getCommitUserData()

Did you ask about this on java-user@lucene.apache.org ?

Of course, to help with narrowing down the bug, you could write the same 
Java code directly and see if you encounter the same problem...

Andi..

Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Eric Hall <lu...@darkart.com>.
On Wed, Jan 19, 2011 at 07:15:11PM -0800, Andi Vajda wrote:
> 
> On Wed, 19 Jan 2011, Eric Hall wrote:
> 
> >On Wed, Jan 19, 2011 at 11:05:28AM -0800, Andi Vajda wrote:
> >>
> >>On Wed, 19 Jan 2011, Eric Hall wrote:
> >>
> >
> >	The HashMap()... method works fine for storing the metadata, and luke
> >shows that it's there.  Of course getting the metadata back for later
> >comparison/display is also useful, and seemed like it would be
> >straightforward....  Naturally it's not working using:
> >
> >	indexMetaDataHashMap = reader.getCommitUserData()
> >
> >	I get an empty hashmap back from the above.  I also tried using:

	Whups, I was wrong, I don't get a HashMap type back, I get
a 'Map' type back (still empty).  I don't know if that makes a difference
or not....


> >
> >	indexCommit = reader.getIndexCommit()
> >	indexMetaDataHashMap = indexCommit.getUserData()
> >
> >with the same result (empty hashmap).  Is there a different way to do this?
> >
> 
> If you're getting a HashMap back then the PyLucene side of things is 
> working. If it's empty it could mean that you're doing something wrong 
> Lucene-wise or that you found a bug there. Could it be that you opened the 
> IndexReader before the IndexWriter got committed ? If so, reopen() it after 
> commit or move the opening code. If not, you may want to ask about this on 
> the Lucene user list at java-user@lucene.apache.org.
> 

	I definitely open the reader after the writer is committed and closed,
they're separate scripts.


		-eric


Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Andi Vajda <va...@apache.org>.
On Wed, 19 Jan 2011, Eric Hall wrote:

> On Wed, Jan 19, 2011 at 11:05:28AM -0800, Andi Vajda wrote:
>>
>> On Wed, 19 Jan 2011, Eric Hall wrote:
>>
>>> 	I'd like to store some index metadata using
>>>
>>> 	IndexWriter.commit(Map<String,String> commitUserData)
>>>
>>> 	I've set up a python dict with string to string mappings,
>>> but if I use that I get an InvalidArgsError.  Is there a different
>>> python type to use in this call?
>>
>> Yes, use a Java HashMap for which there is a Python wrapper in PyLucene:
>>
>>  >>> import lucene
>>  >>> lucene.initVM()
>>   <jcc.JCCEnv object at 0x10040a0d8>
>>  >>> a = lucene.HashMap().of_(lucene.String, lucene.String)
>>  >>> a.put('foo', 'bar')
>>  >>> a
>>   <HashMap: {foo=bar}>
>>
>> The use of the of_() method is optional but it conveys the <String, String>
>> part and helps with enforcing the generic parameter and return type for the
>> map's methods.
>>
>> If you'd rather use a Python dict directly, see the example in PyLucene's
>> python/collections.py module where a Python set is wrapped by a class
>> called JavaSet. The same could be implemented for the java.util.Map
>> interface.
>
> 	The HashMap()... method works fine for storing the metadata, and luke
> shows that it's there.  Of course getting the metadata back for later
> comparison/display is also useful, and seemed like it would be
> straightforward....  Naturally it's not working using:
>
> 	indexMetaDataHashMap = reader.getCommitUserData()
>
> 	I get an empty hashmap back from the above.  I also tried using:
>
> 	indexCommit = reader.getIndexCommit()
> 	indexMetaDataHashMap = indexCommit.getUserData()
>
> with the same result (empty hashmap).  Is there a different way to do this?
>

If you're getting a HashMap back then the PyLucene side of things is 
working. If it's empty it could mean that you're doing something wrong 
Lucene-wise or that you found a bug there. Could it be that you opened the 
IndexReader before the IndexWriter got committed ? If so, reopen() it after 
commit or move the opening code. If not, you may want to ask about this on 
the Lucene user list at java-user@lucene.apache.org.
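
As an illustration of the reopen() suggestion, a minimal sketch might look like the following. refreshed_reader is a hypothetical helper name; it assumes IndexReader.isCurrent() and IndexReader.reopen() behave as described in the Lucene 3.0.x javadoc, and that nothing else still uses the old reader when it is closed.

```python
def refreshed_reader(reader):
    """Return a reader that sees the latest commit, closing a stale one."""
    if reader.isCurrent():
        # No commit has happened since this reader was opened.
        return reader
    # The index changed, so reopen() is expected to hand back a new reader.
    new_reader = reader.reopen()
    reader.close()
    return new_reader

# Usage (not run here; needs a real PyLucene IndexReader):
#   reader = refreshed_reader(reader)
#   user_data = reader.getCommitUserData()
```

Checking isCurrent() first avoids comparing the old and new wrapper objects from Python, which may not reflect whether the underlying Java objects are the same.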

Andi..

Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Eric Hall <lu...@darkart.com>.
On Wed, Jan 19, 2011 at 11:05:28AM -0800, Andi Vajda wrote:
> 
> On Wed, 19 Jan 2011, Eric Hall wrote:
> 
> >	I'd like to store some index metadata using
> >
> >	IndexWriter.commit(Map<String,String> commitUserData)
> >
> >	I've set up a python dict with string to string mappings,
> >but if I use that I get an InvalidArgsError.  Is there a different
> >python type to use in this call?
> 
> Yes, use a Java HashMap for which there is a Python wrapper in PyLucene:
> 
>   >>> import lucene
>   >>> lucene.initVM()
>   <jcc.JCCEnv object at 0x10040a0d8>
>   >>> a = lucene.HashMap().of_(lucene.String, lucene.String)
>   >>> a.put('foo', 'bar')
>   >>> a
>   <HashMap: {foo=bar}>
> 
> The use of the of_() method is optional but it conveys the <String, String>
> part and helps with enforcing the generic parameter and return type for the 
> map's methods.
> 
> If you'd rather use a Python dict directly, see the example in PyLucene's 
> python/collections.py module where a Python set is wrapped by a class 
> called JavaSet. The same could be implemented for the java.util.Map 
> interface.

	The HashMap()... method works fine for storing the metadata, and luke
shows that it's there.  Of course getting the metadata back for later
comparison/display is also useful, and seemed like it would be
straightforward....  Naturally it's not working using:

	indexMetaDataHashMap = reader.getCommitUserData()

	I get an empty hashmap back from the above.  I also tried using:

	indexCommit = reader.getIndexCommit()
	indexMetaDataHashMap = indexCommit.getUserData()

with the same result (empty hashmap).  Is there a different way to do this?


		Thanks again,


		-eric




Re: Using IndexWriter.commit(Map commitUserData) in pylucene

Posted by Andi Vajda <va...@apache.org>.
On Wed, 19 Jan 2011, Eric Hall wrote:

> 	I'd like to store some index metadata using
>
> 	IndexWriter.commit(Map<String,String> commitUserData)
>
> 	I've set up a python dict with string to string mappings,
> but if I use that I get an InvalidArgsError.  Is there a different
> python type to use in this call?

Yes, use a Java HashMap for which there is a Python wrapper in PyLucene:

   >>> import lucene
   >>> lucene.initVM()
   <jcc.JCCEnv object at 0x10040a0d8>
   >>> a = lucene.HashMap().of_(lucene.String, lucene.String)
   >>> a.put('foo', 'bar')
   >>> a
   <HashMap: {foo=bar}>

The use of the of_() method is optional but it conveys the <String, String>
part and helps with enforcing the generic parameter and return type for the 
map's methods.

If you'd rather use a Python dict directly, see the example in PyLucene's 
python/collections.py module where a Python set is wrapped by a class called 
JavaSet. The same could be implemented for the java.util.Map interface.
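
Short of writing a full Map wrapper, one simple option is to copy the dict into a HashMap just before the commit. The sketch below does that; dict_to_hashmap and its hashmap_factory parameter are hypothetical names (the factory exists only so the function can be exercised without a JVM), and the default path assumes lucene.initVM() has already been called.

```python
def dict_to_hashmap(mapping, hashmap_factory=None):
    """Copy a Python dict of str -> str into a wrapped java.util.HashMap.

    hashmap_factory is a hypothetical hook: any callable returning an
    object with a put(key, value) method. By default the PyLucene
    HashMap wrapper from the session above is used.
    """
    if hashmap_factory is None:
        import lucene  # assumes lucene.initVM() was called earlier

        def hashmap_factory():
            return lucene.HashMap().of_(lucene.String, lucene.String)

    java_map = hashmap_factory()
    for key, value in mapping.items():
        java_map.put(key, value)
    return java_map

# Usage with the writer from the original post (not run here):
#   writer.commit(dict_to_hashmap({"one": "two", "three": "four"}))
```

This copies the entries once, so later changes to the Python dict are not reflected in the Java map, unlike a wrapper in the style of JavaSet.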

Andi..


>
> 	I'm using pylucene-3.0.3-1.
>
> Trimmed/pseudo sample (real code is on a different system):
>
> 	import lucene
>
> 	indexMetaDataDict = {"one":"two", "three":"four"}
> 	writer = lucene.IndexWriter(store, analyzer, True, lucene.IndexWriter.MaxFieldLength.UNLIMITED)
>
> 	## do document indexing
>
> 	writer.optimize()
>
> 	writer.commit(indexMetaDataDict)
>
> 	writer.close()
>
>
> 	If the "writer.commit(indexMetaDataDict)" line is commented out, it works fine.
> If it's not, I get the InvalidArgsError back for that line.
>
>
>
> 		Thanks much,
>
>
> 		-eric
>