Posted to dev@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2008/04/18 23:37:18 UTC
local object storage
I need to persist my tree in some way. Was thinking ad hoc:
a file with branch node pks
a file with branch node records
a file with leaf node pks
a file with leaf node records
an optional file with node mean instances
Will probably start with BDB JE though. Any comments on adding that to
the libs?
karl
Re: local object storage
Posted by Andrzej Bialecki <ab...@getopt.org>.
Karl Wettin wrote:
> It should not be too hard. I was looking at ByteBuffer and FileChannels
> today but didn't figure out how to write it so it will automatically
> grow with more file segments as they are required.
>
> Anyone that can fix something like that in a few minutes?
This page contains some useful pointers:
http://aurora.regenstrief.org/~schadow/dbm-java/
License-compatible implementations include SoLinger and W3C dbm; there
is also jdbm.sourceforge.net. I looked through the code of SoLinger - it
seems very simple and easy to follow, so it could be a good-enough
candidate for further hacking (though not having used it I can't vote
for its quality).
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
The problem is that the tree built by the driver must be persistent so
that it can be opened again to add more instances and so that other
applications can navigate the tree when extracting the cluster for a
given instance using some strategy.
It takes less than a millisecond to extract a cluster once the distance
between the nodes in the tree is calculated.
In my case the instances represent the documents in a Lucene index, and
I can use the tree to instantly cluster the results, have a "more like this" with
a threshold knob for each search result for the user to play with, and
whatnot.
This tree becomes rather large and you do not want to keep the whole
thing in memory, so it needs to be persistent. I need some sort of local
object storage and I like BDB, but its license isn't really
compatible with the foundation. BDB is just a persistent hashtable and
I think I can make my own ASLed variant rather easily using Harmony code.
This is what I just sent to their list:
So I'm thinking I should clone your HashMap, make all access to element
data abstract and run it on ByteBuffers.
It would be used as a Map<K, DataFileEntry> pointing at where in an
object data file the current value is located.
Updating a value would mean marking the old instance as deleted, appending the new
one to the end of the object data file and updating the position in the index.
It would use Hadoop's Writable to serialize keys and values.
It would initially be transactionless.
Any comments on this? Perhaps something similar already exists ASLed?
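A minimal sketch of the scheme described above, under stated assumptions: the index here is a plain in-memory java.util.HashMap rather than a Harmony HashMap clone running on ByteBuffers, and WritableLike is a local stand-in with the same write/readFields shape as Hadoop's Writable so the example needs no Hadoop jar. AppendOnlyStore, StringRecord and the record layout (deleted flag, length, payload) are hypothetical names and choices, not anything from the original proposal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Stand-in with the same shape as Hadoop's Writable (assumption: not the real interface). */
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Example value type used only for illustration. */
class StringRecord implements WritableLike {
    String value;
    StringRecord(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
}

/** Index entry: offset of the live record for a key in the object data file. */
final class DataFileEntry {
    final long offset;
    DataFileEntry(long offset) { this.offset = offset; }
}

/** Transactionless store: in-memory index over an append-only object data file. */
class AppendOnlyStore<K> implements AutoCloseable {
    private final Map<K, DataFileEntry> index = new HashMap<>();
    private final RandomAccessFile data;

    AppendOnlyStore(String path) throws IOException {
        this.data = new RandomAccessFile(path, "rw");
    }

    /** Marks any old record as deleted, appends the new one, repoints the index. */
    void put(K key, WritableLike value) throws IOException {
        DataFileEntry old = index.get(key);
        if (old != null) {
            data.seek(old.offset);
            data.writeBoolean(true);       // first byte of a record is its deleted flag
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        value.write(new DataOutputStream(bytes));
        long offset = data.length();
        data.seek(offset);
        data.writeBoolean(false);          // live record
        data.writeInt(bytes.size());       // payload length, so a later scan can skip records
        data.write(bytes.toByteArray());
        index.put(key, new DataFileEntry(offset));
    }

    /** Fills 'into' from the live record; returns false if the key is unknown. */
    boolean get(K key, WritableLike into) throws IOException {
        DataFileEntry entry = index.get(key);
        if (entry == null) return false;
        data.seek(entry.offset + 1 + 4);   // skip the flag byte and the length int
        into.readFields(data);             // RandomAccessFile implements DataInput
        return true;
    }

    @Override
    public void close() throws IOException { data.close(); }
}
```

Because the index lives only in memory in this sketch, it is lost on close. Rebuilding it would mean scanning the data file and skipping records whose deleted flag is set, which is what the flag and length prefix are for.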
Ted Dunning wrote:
> Can you say a bit more?
>
> It looks to me like the hash map in the Apache Harmony project is a
> completely ordinary hashmap implementation.
>
> My confusion makes me think that I didn't completely understand what
> you were referring to.
Re: local object storage
Posted by Ted Dunning <te...@gmail.com>.
Can you say a bit more?
It looks to me like the hash map in the Apache Harmony project is a
completely ordinary hashmap implementation.
My confusion makes me think that I didn't completely understand what
you were referring to.
On Sun, Apr 20, 2008 at 9:37 AM, Karl Wettin <ka...@gmail.com> wrote:
> Karl Wettin wrote:
>
> > We could implement our own transactionless variant that uses Writable for
> > serialization. Is it possible to seek on DFS?
> >
> I think it would be trivial to implement something like that based on the
> Harmony HashMap.
>
> karl
--
ted
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
Karl Wettin wrote:
> We could implement our own transactionless variant that uses Writable for
> serialization. Is it possible to seek on DFS?
I think it would be trivial to implement something like that based on
the Harmony HashMap.
karl
Re: local object storage
Posted by Ted Dunning <td...@veoh.com>.
Yes.
Up to you whether it fits. I think the API is a bit specialized, even odd,
for what you want to do.
On 4/19/08 4:56 PM, "Karl Wettin" <ka...@gmail.com> wrote:
> You mean MapFile?
>
> http://hadoop.apache.org/core/docs/r0.16.3/api/org/apache/hadoop/io/MapFile.html
>
> It says: The index file is read entirely into memory. Thus key
> implementations should try to keep themselves small.
>
> I'll support it though.
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
You mean MapFile?
http://hadoop.apache.org/core/docs/r0.16.3/api/org/apache/hadoop/io/MapFile.html
It says: The index file is read entirely into memory. Thus key
implementations should try to keep themselves small.
I'll support it though.
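To illustrate why that note about key size matters, here is a rough stdlib-only sketch of the general idea behind a MapFile: a sorted data file plus a smaller index, held in memory, that records only a fraction of the keys. This is not Hadoop's implementation or API. SparseIndexedFile, its record layout and the interval parameter are hypothetical, and keys are assumed to be Strings in natural order.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical MapFile-like structure: sorted data file plus sparse in-memory index. */
class SparseIndexedFile implements AutoCloseable {
    private final RandomAccessFile data;
    private final TreeMap<String, Long> index = new TreeMap<>(); // every Nth key -> file offset
    private final int interval;

    /** Writes the entries in key order and indexes every 'interval'-th key. */
    SparseIndexedFile(String path, SortedMap<String, String> sorted, int interval) throws IOException {
        this.data = new RandomAccessFile(path, "rw");
        this.interval = interval;
        data.setLength(0);
        int n = 0;
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (n++ % interval == 0) {
                index.put(e.getKey(), data.getFilePointer());
            }
            data.writeUTF(e.getKey());
            data.writeUTF(e.getValue());
        }
    }

    /** Number of keys held in memory; this is what grows with key size and count. */
    int indexSize() { return index.size(); }

    /** Seeks to the nearest indexed key at or before 'key', then scans forward. */
    String get(String key) throws IOException {
        Map.Entry<String, Long> start = index.floorEntry(key);
        if (start == null) return null;           // key sorts before the first entry
        data.seek(start.getValue());
        for (int i = 0; i < interval && data.getFilePointer() < data.length(); i++) {
            String k = data.readUTF();
            String v = data.readUTF();
            int cmp = k.compareTo(key);
            if (cmp == 0) return v;
            if (cmp > 0) return null;             // passed the spot where the key would be
        }
        return null;
    }

    @Override
    public void close() throws IOException { data.close(); }
}
```

Each lookup costs one seek plus a short sequential scan, and the memory footprint is roughly the number of indexed keys times the key size, which is why small keys are recommended.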
Ted Dunning wrote:
>
> There is a MapTable available in Hadoop. It is a bit slow because random
> reads from HDFS are kind of slow.
>
> It might be just what you need.
Re: local object storage
Posted by Ted Dunning <td...@veoh.com>.
There is a MapTable available in Hadoop. It is a bit slow because random
reads from HDFS are kind of slow.
It might be just what you need.
On 4/19/08 3:47 PM, "Karl Wettin" <ka...@gmail.com> wrote:
>
> We could implement our own transactionless variant that uses Writable for
> serialization. Is it possible to seek on DFS?
>
>
> karl
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
It should not be too hard. I was looking at ByteBuffer and FileChannels
today but didn't figure out how to write it so it will automatically
grow with more file segments as they are required.
Anyone that can fix something like that in a few minutes?
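One possible shape for this, as a sketch only: treat the storage as one logical byte range split over fixed-size segment files, and open a new FileChannel whenever a position falls into a segment that does not exist yet. SegmentedFile, the segment file naming and the segmentSize parameter are hypothetical choices for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a logical byte range spread over fixed-size segment files. */
class SegmentedFile implements AutoCloseable {
    private final Path dir;
    private final int segmentSize;
    private final List<FileChannel> segments = new ArrayList<>();

    SegmentedFile(Path dir, int segmentSize) {
        this.dir = dir;
        this.segmentSize = segmentSize;
    }

    /** Returns the channel for a segment, creating segment files on demand. */
    private FileChannel segment(int index) throws IOException {
        while (segments.size() <= index) {
            Path file = dir.resolve("segment-" + segments.size() + ".dat");
            segments.add(FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE));
        }
        return segments.get(index);
    }

    /** Writes all remaining bytes of src at the logical position, crossing segments as needed. */
    void write(long position, ByteBuffer src) throws IOException {
        while (src.hasRemaining()) {
            int index = (int) (position / segmentSize);
            int offset = (int) (position % segmentSize);
            int chunk = Math.min(src.remaining(), segmentSize - offset);
            ByteBuffer slice = src.duplicate();       // shares content, own position and limit
            slice.limit(slice.position() + chunk);
            FileChannel ch = segment(index);
            int at = offset;
            while (slice.hasRemaining()) {
                at += ch.write(slice, at);
            }
            src.position(src.position() + chunk);
            position += chunk;
        }
    }

    /** Fills dst from the logical position; fails if it runs past the written data. */
    void read(long position, ByteBuffer dst) throws IOException {
        while (dst.hasRemaining()) {
            int index = (int) (position / segmentSize);
            int offset = (int) (position % segmentSize);
            int chunk = Math.min(dst.remaining(), segmentSize - offset);
            ByteBuffer slice = dst.duplicate();
            slice.limit(slice.position() + chunk);
            FileChannel ch = segment(index);
            int at = offset;
            while (slice.hasRemaining()) {
                int n = ch.read(slice, at);
                if (n < 0) throw new EOFException("read past written data");
                at += n;
            }
            dst.position(dst.position() + chunk);
            position += chunk;
        }
    }

    @Override
    public void close() throws IOException {
        for (FileChannel ch : segments) ch.close();
    }
}
```

The same position arithmetic would work with one MappedByteBuffer per segment instead of positional channel reads and writes. That variant is not shown here.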
The tree is abstracted from persistence. Implementations use a combination
of visitors and factories, and it is quite simple to add support for
anything else. Derby?
I often say that BDB is the perfect balance between OODBMS and RDBMS.
All entities are serialized with all aggregated data and associated with
a primary key in a hashtable on disk. That's it.
We could implement our own transactionless variant that uses Writable for
serialization. Is it possible to seek on DFS?
karl
Grant Ingersoll wrote:
> Yeah, I think it does the good ol' download process, meaning it isn't
> compatible :-(
>
> How much work to roll your own? Or, I suppose, find something that is
> compatible.
Re: local object storage
Posted by Grant Ingersoll <gs...@apache.org>.
Yeah, I think it does the good ol' download process, meaning it isn't
compatible :-(
How much work to roll your own? Or, I suppose, find something that is
compatible.
On Apr 19, 2008, at 12:45 PM, Karl Wettin wrote:
> trunk/contrib/db/bdb-je to be precise
>
> but I notice it is not in the libs there.
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
trunk/contrib/db/bdb-je to be precise
but I notice it is not in the libs there.
Grant Ingersoll wrote:
> Is that what Lucene Java contrib/db/bdb uses? Or at least a different
> version?
Re: local object storage
Posted by Grant Ingersoll <gs...@apache.org>.
Is that what Lucene Java contrib/db/bdb uses? Or at least a different
version?
On Apr 18, 2008, at 5:58 PM, Karl Wettin wrote:
> http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
Re: local object storage
Posted by Karl Wettin <ka...@gmail.com>.
http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
Grant Ingersoll wrote:
> What's the license?
Re: local object storage
Posted by Grant Ingersoll <gs...@apache.org>.
What's the license?
On Apr 18, 2008, at 5:37 PM, Karl Wettin wrote:
> I need to persist my tree in some way. Was thinking ad hoc:
>
> a file with branch node pks
> a file with branch node records
> a file with leaf node pks
> a file with leaf node records
> an optional file with node mean instances
>
> Will probably start with BDB JE though. Any comments on adding that
> to the libs?
>
>
> karl