Posted to dev@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2008/04/18 23:37:18 UTC

local object storage

I need to persist my tree in some way. Was thinking ad hoc:

a file with branch node pks
a file with branch node records
a file with leaf node pks
a file with leaf node records
an optional file with node mean instances

Will probably start with BDB JE though. Any comments on adding that to 
the libs?


       karl

Re: local object storage

Posted by Andrzej Bialecki <ab...@getopt.org>.
Karl Wettin wrote:
> It should not be too hard. I was looking at ByteBuffer and FileChannels 
> today but didn't figure out how to write it so it will automatically 
> grow with more file segments as they are required.
> 
> Anyone that can fix something like that in a few minutes?

This page contains some useful pointers:

http://aurora.regenstrief.org/~schadow/dbm-java/

License-compatible implementations include SoLinger and W3C dbm; there 
is also jdbm.sourceforge.net. I looked through the code of SoLinger - it 
seems very simple and easy to follow, so it could be a good-enough 
candidate for further hacking (though not having used it I can't vote 
for its quality).

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
The problem is that the tree built by the driver must be persistent so 
that it can be opened again to add more instances and so that other 
applications can navigate the tree when extracting the cluster for a 
given instance using some strategy.

It takes less than a millisecond to extract a cluster once the distance 
between the nodes in the tree is calculated.

In my case the instances represent the documents in a Lucene index, and 
I can use the tree to instantly cluster the results, offer a "more like 
this" with a threshold knob on each search result for the user to play 
with, and whatnot.


This tree becomes rather large, and you do not want to keep the whole 
thing in memory, so it needs to be persistent. I need some sort of local 
object storage and I like BDB, but their license isn't really 
compatible with the foundation. BDB is just a persistent hashtable, and 
I think I can make my own ASLed variant rather easily using Harmony code.

This is what I just sent to their list:

So I'm thinking I should clone your HashMap, make all access to element 
data abstract and run it on ByteBuffers.

It would be used as a Map<K, DataFileEntry> pointing at where in an 
object data file the current value is located.

Updating values would mean marking the old instance deleted, adding a new 
one to the end of the object data file and updating the position in the index.

It would use Hadoop's Writable to serialize keys and values.

It would initially be transactionless.
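
As a rough illustration of the layout described above (an index pointing at positions in an append-only object data file, with updates marking the old record deleted), here is a minimal Java sketch. It is not the Harmony-based code under discussion: the class name AppendOnlyStore, the one-byte status flag and the record layout are assumptions made for the example, the index is a plain in-memory java.util.HashMap rather than one running on ByteBuffers, and keys and values are Strings rather than Hadoop Writables.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: an in-memory index over an append-only data file. */
class AppendOnlyStore implements AutoCloseable {

    /** Where a value lives in the data file (name borrowed from the message). */
    static final class DataFileEntry {
        final long position;
        DataFileEntry(long position) { this.position = position; }
    }

    // Assumed record layout: [status byte][int length][UTF-8 bytes]
    private static final byte LIVE = 0;
    private static final byte DELETED = 1;

    private final RandomAccessFile data;
    private final Map<String, DataFileEntry> index = new HashMap<>();

    AppendOnlyStore(String path) throws IOException {
        this.data = new RandomAccessFile(path, "rw");
    }

    /** Appends the value; if the key already had one, marks the old record deleted. */
    void put(String key, String value) throws IOException {
        DataFileEntry old = index.get(key);
        if (old != null) {
            data.seek(old.position);
            data.writeByte(DELETED);
        }
        long position = data.length(); // new records always go to the end
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        data.seek(position);
        data.writeByte(LIVE);
        data.writeInt(bytes.length);
        data.write(bytes);
        index.put(key, new DataFileEntry(position));
    }

    /** Returns the current value for the key, or null if the key is unknown. */
    String get(String key) throws IOException {
        DataFileEntry entry = index.get(key);
        if (entry == null) return null;
        data.seek(entry.position + 1); // skip the status byte
        byte[] bytes = new byte[data.readInt()];
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    long fileLength() throws IOException { return data.length(); }

    @Override public void close() throws IOException { data.close(); }
}
```

A real variant would also need to persist the index and eventually compact away deleted records; both are left out here.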


Any comments on this? Perhaps something similar already exists ASLed?



Ted Dunning wrote:
> Can you say a bit more.
> 
> It looks to me like the hash map in the Apache Harmony project is a
> completely ordinary hashmap implementation.
> 
> My confusion makes me think that I don't think I completely understood what
> you were referring to.
> 
> On Sun, Apr 20, 2008 at 9:37 AM, Karl Wettin <ka...@gmail.com> wrote:
> 
>> Karl Wettin wrote:
>>
>>> We could implement our own transactionless variant that uses Writable for
>>> serialization. Is it possible to seek on DFS?
>>>
>> I think it could be a trivial thing to implement such a thing based on the
>> Harmony HashMap.
>>
>>
>>    karl
> 
> 


Re: local object storage

Posted by Ted Dunning <te...@gmail.com>.
Can you say a bit more?

It looks to me like the hash map in the Apache Harmony project is a
completely ordinary hashmap implementation.

My confusion makes me think that I didn't completely understand what
you were referring to.

On Sun, Apr 20, 2008 at 9:37 AM, Karl Wettin <ka...@gmail.com> wrote:

> Karl Wettin wrote:
>
> > We could implement our own transactionless variant that uses Writable for
> > serialization. Is it possible to seek on DFS?
> >
> I think it could be a trivial thing to implement such a thing based on the
> Harmony HashMap.
>
>
>    karl
>


-- 
ted

Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
Karl Wettin wrote:
> We could implement our own transactionless variant that uses Writable for 
> serialization. Is it possible to seek on DFS?
I think it would be trivial to implement such a thing based on 
the Harmony HashMap.


     karl


Re: local object storage

Posted by Ted Dunning <td...@veoh.com>.
Yes.

Up to you whether it fits.  I think the API is a bit specialized, even odd,
for what you want to do.


On 4/19/08 4:56 PM, "Karl Wettin" <ka...@gmail.com> wrote:

> You mean MapFile?
> 
> http://hadoop.apache.org/core/docs/r0.16.3/api/org/apache/hadoop/io/MapFile.html
> 
> It says: The index file is read entirely into memory. Thus key
> implementations should try to keep themselves small.
> 
> I'll support it though.
> 
> 
> Ted Dunning skrev:
>> 
>> There is a MapTable available in Hadoop.  It is a bit slow because random
>> reads from HDFS are kind of slow.
>> 
>> It might be just what you need.
>> 
>> On 4/19/08 3:47 PM, "Karl Wettin" <ka...@gmail.com> wrote:
>> 
>>> We could implement our own transactionless variant that uses Writable for
>>> serialization. Is it possible to seek on DFS?
>>> 
>>> 
>>>        karl
>>> 
>>> Grant Ingersoll skrev:
>>>> Yeah, I think it does the good ol' download process, meaning it isn't
>>>> compatible :-(
>>>> 
>>>> How much work to roll your own?  Or, I suppose, find something that is
>>>> compatible.
>>>> 
>>>> On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:
>>>> 
>>>>> trunk/contrib/db/bdb-je to be precise
>>>>> 
>>>>> but I notice it is not in the libs there.
>>>>> 
>>>>> 
>>>>> Grant Ingersoll skrev:
>>>>>> Is that what Lucene Java contrib/db/bdb uses?  Or at least a
>>>>>> different version?
>>>>>> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
>>>>>>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Grant Ingersoll skrev:
>>>>>>>> What's the license?
>>>>>>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>>>>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>>>>>> 
>>>>>>>>> a file with branch node pks
>>>>>>>>> a file with branch node records
>>>>>>>>> a file with leaf node pks
>>>>>>>>> a file with leaf node records
>>>>>>>>> an optional file with node mean instances
>>>>>>>>> 
>>>>>>>>> Will probably start with BDB JE though. Any comments to adding
>>>>>>>>> that to the libs?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>    karl
>> 
> 


Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
You mean MapFile?

http://hadoop.apache.org/core/docs/r0.16.3/api/org/apache/hadoop/io/MapFile.html

It says: The index file is read entirely into memory. Thus key 
implementations should try to keep themselves small.

I'll support it though.


Ted Dunning wrote:
> 
> There is a MapTable available in Hadoop.  It is a bit slow because random
> reads from HDFS are kind of slow.
> 
> It might be just what you need.
> 
> On 4/19/08 3:47 PM, "Karl Wettin" <ka...@gmail.com> wrote:
> 
>> We could implement our own transactionless variant that uses Writable for
>> serialization. Is it possible to seek on DFS?
>>
>>
>>        karl
>>
>> Grant Ingersoll skrev:
>>> Yeah, I think it does the good ol' download process, meaning it isn't
>>> compatible :-(
>>>
>>> How much work to roll your own?  Or, I suppose, find something that is
>>> compatible.
>>>
>>> On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:
>>>
>>>> trunk/contrib/db/bdb-je to be precise
>>>>
>>>> but I notice it is not in the libs there.
>>>>
>>>>
>>>> Grant Ingersoll skrev:
>>>>> Is that what Lucene Java contrib/db/bdb uses?  Or at least a
>>>>> different version?
>>>>> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
>>>>>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
>>>>>>
>>>>>>
>>>>>>
>>>>>> Grant Ingersoll skrev:
>>>>>>> What's the license?
>>>>>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>>>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>>>>>
>>>>>>>> a file with branch node pks
>>>>>>>> a file with branch node records
>>>>>>>> a file with leaf node pks
>>>>>>>> a file with leaf node records
>>>>>>>> an optional file with node mean instances
>>>>>>>>
>>>>>>>> Will probably start with BDB JE though. Any comments to adding
>>>>>>>> that to the libs?
>>>>>>>>
>>>>>>>>
>>>>>>>>    karl
> 


Re: local object storage

Posted by Ted Dunning <td...@veoh.com>.

There is a MapTable available in Hadoop.  It is a bit slow because random
reads from HDFS are kind of slow.

It might be just what you need.

On 4/19/08 3:47 PM, "Karl Wettin" <ka...@gmail.com> wrote:

> 
> We could implement our own transactionless variant that uses Writable for
> serialization. Is it possible to seek on DFS?
> 
> 
>        karl
> 
> Grant Ingersoll skrev:
>> Yeah, I think it does the good ol' download process, meaning it isn't
>> compatible :-(
>> 
>> How much work to roll your own?  Or, I suppose, find something that is
>> compatible.
>> 
>> On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:
>> 
>>> trunk/contrib/db/bdb-je to be precise
>>> 
>>> but I notice it is not in the libs there.
>>> 
>>> 
>>> Grant Ingersoll skrev:
>>>> Is that what Lucene Java contrib/db/bdb uses?  Or at least a
>>>> different version?
>>>> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
>>>>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
>>>>> 
>>>>> 
>>>>> 
>>>>> Grant Ingersoll skrev:
>>>>>> What's the license?
>>>>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>>>> 
>>>>>>> a file with branch node pks
>>>>>>> a file with branch node records
>>>>>>> a file with leaf node pks
>>>>>>> a file with leaf node records
>>>>>>> an optional file with node mean instances
>>>>>>> 
>>>>>>> Will probably start with BDB JE though. Any comments to adding
>>>>>>> that to the libs?
>>>>>>> 
>>>>>>> 
>>>>>>>    karl
>>>>> 
>>> 
>> 
> 


Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
It should not be too hard. I was looking at ByteBuffer and FileChannels 
today but didn't figure out how to write it so it will automatically 
grow with more file segments as they are required.

Can anyone fix something like that in a few minutes?
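
One possible way to get a file that grows by whole segments, sketched under assumptions: a single backing file is mapped as fixed-size MappedByteBuffer segments, and a new segment is mapped the first time a position beyond the current end is touched. The class name SegmentedFile and the byte-at-a-time API are made up for the example; multi-byte values that straddle a segment boundary would need to be split by the caller.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one file mapped as fixed-size segments, added on demand. */
class SegmentedFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileChannel channel;
    private final int segmentSize;
    private final List<MappedByteBuffer> segments = new ArrayList<>();

    SegmentedFile(String path, int segmentSize) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.channel = file.getChannel();
        this.segmentSize = segmentSize;
    }

    /** Returns the segment holding the position, mapping new segments if needed. */
    private MappedByteBuffer segmentFor(long position) throws IOException {
        int index = (int) (position / segmentSize);
        while (segments.size() <= index) {
            long start = (long) segments.size() * segmentSize;
            // Grow the file explicitly before mapping the new region.
            file.setLength(Math.max(file.length(), start + segmentSize));
            segments.add(channel.map(FileChannel.MapMode.READ_WRITE, start, segmentSize));
        }
        return segments.get(index);
    }

    /** Writes one byte at an absolute file position, growing the file if required. */
    void put(long position, byte value) throws IOException {
        segmentFor(position).put((int) (position % segmentSize), value);
    }

    /** Reads one byte; note that reading past the end also maps (and grows) segments. */
    byte get(long position) throws IOException {
        return segmentFor(position).get((int) (position % segmentSize));
    }

    int segmentCount() { return segments.size(); }

    @Override public void close() throws IOException {
        channel.close();
        file.close();
    }
}
```

The segment size is a trade-off: small segments waste little space at the tail of the file, while large ones mean fewer mappings to keep track of.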

The tree is abstract with regard to persistence. Implementations use a 
combination of visitors and factories, and it is quite simple to add 
support for anything else. Derby?

I often say that BDB is the perfect balance between OODBMS and RDBMS. 
All entities are serialized with all aggregated data and associated with 
a primary key in a hashtable on disk. That's it.

We could implement our own transactionless variant that uses Writable for 
serialization. Is it possible to seek on DFS?


       karl

Grant Ingersoll wrote:
> Yeah, I think it does the good ol' download process, meaning it isn't 
> compatible :-(
> 
> How much work to roll your own?  Or, I suppose, find something that is 
> compatible.
> 
> On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:
> 
>> trunk/contrib/db/bdb-je to be precise
>>
>> but I notice it is not in the libs there.
>>
>>
>> Grant Ingersoll skrev:
>>> Is that what Lucene Java contrib/db/bdb uses?  Or at least a 
>>> different version?
>>> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
>>>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html 
>>>>
>>>>
>>>>
>>>> Grant Ingersoll skrev:
>>>>> What's the license?
>>>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>>>
>>>>>> a file with branch node pks
>>>>>> a file with branch node records
>>>>>> a file with leaf node pks
>>>>>> a file with leaf node records
>>>>>> an optional file with node mean instances
>>>>>>
>>>>>> Will probably start with BDB JE though. Any comments to adding 
>>>>>> that to the libs?
>>>>>>
>>>>>>
>>>>>>    karl
>>>>
>>
> 


Re: local object storage

Posted by Grant Ingersoll <gs...@apache.org>.
Yeah, I think it does the good ol' download process, meaning it isn't  
compatible :-(

How much work to roll your own?  Or, I suppose, find something that is  
compatible.

On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:

> trunk/contrib/db/bdb-je to be precise
>
> but I notice it is not in the libs there.
>
>
> Grant Ingersoll skrev:
>> Is that what Lucene Java contrib/db/bdb uses?  Or at least a  
>> different version?
>> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
>>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
>>>
>>>
>>> Grant Ingersoll skrev:
>>>> What's the license?
>>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>>
>>>>> a file with branch node pks
>>>>> a file with branch node records
>>>>> a file with leaf node pks
>>>>> a file with leaf node records
>>>>> an optional file with node mean instances
>>>>>
>>>>> Will probably start with BDB JE though. Any comments to adding  
>>>>> that to the libs?
>>>>>
>>>>>
>>>>>    karl
>>>
>


Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
trunk/contrib/db/bdb-je to be precise

but I notice it is not in the libs there.


Grant Ingersoll wrote:
> Is that what Lucene Java contrib/db/bdb uses?  Or at least a different 
> version?
> 
> 
> On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
> 
>> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html 
>>
>>
>>
>> Grant Ingersoll skrev:
>>> What's the license?
>>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>>
>>>> a file with branch node pks
>>>> a file with branch node records
>>>> a file with leaf node pks
>>>> a file with leaf node records
>>>> an optional file with node mean instances
>>>>
>>>> Will probably start with BDB JE though. Any comments to adding that 
>>>> to the libs?
>>>>
>>>>
>>>>     karl
>>
> 
> 


Re: local object storage

Posted by Grant Ingersoll <gs...@apache.org>.
Is that what Lucene Java contrib/db/bdb uses?  Or at least a different  
version?


On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:

> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
>
>
> Grant Ingersoll skrev:
>> What's the license?
>> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
>>> I need to persist my tree in some way. Was thinking ad hoc:
>>>
>>> a file with branch node pks
>>> a file with branch node records
>>> a file with leaf node pks
>>> a file with leaf node records
>>> an optional file with node mean instances
>>>
>>> Will probably start with BDB JE though. Any comments to adding  
>>> that to the libs?
>>>
>>>
>>>     karl
>



Re: local object storage

Posted by Karl Wettin <ka...@gmail.com>.
http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html


Grant Ingersoll wrote:
> What's the license?
> 
> On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
> 
>> I need to persist my tree in some way. Was thinking ad hoc:
>>
>> a file with branch node pks
>> a file with branch node records
>> a file with leaf node pks
>> a file with leaf node records
>> an optional file with node mean instances
>>
>> Will probably start with BDB JE though. Any comments to adding that to 
>> the libs?
>>
>>
>>      karl
> 


Re: local object storage

Posted by Grant Ingersoll <gs...@apache.org>.
What's the license?

On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:

> I need to persist my tree in some way. Was thinking ad hoc:
>
> a file with branch node pks
> a file with branch node records
> a file with leaf node pks
> a file with leaf node records
> an optional file with node mean instances
>
> Will probably start with BDB JE though. Any comments to adding that  
> to the libs?
>
>
>      karl