Posted to solr-dev@lucene.apache.org by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com> on 2008/12/01 10:43:01 UTC

Can we use Berkley DB java in Solr

I guess it is not possible to distribute Berkeley DB Java Edition (BDB-JE)
with Apache products because of its license.

But can we have a compile-time dependency?

I may be interested in that for SOLR-828.

-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

Cool. So we can have a compile-time dependency if we wish to do so.
Anyway, in the first cut I'll only be coding against JDBC.

Users can be given the option to download and drop in the jar if
they wish to use BDB-JE as the backup option.

My considerations for the data store are:
* Embedded (if possible), so I can store the data in the dataDir itself.
* Support for a huge data store (HSQLDB has an 8GB limit).
* Performance.
* Small footprint.

On Tue, Dec 2, 2008 at 10:55 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : I just wanted to know if BDB-JE can be given as an option or not.
>
> Lucene-Java has a contrib with a BDB dependency (contrib/db), it just
> can't be included in the release.
>
>
> -Hoss
>
>

-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Chris Hostetter <ho...@fucit.org>.

: I just wanted to know if BDB-JE can be given as an option or not.

Lucene-Java has a contrib with a BDB dependency (contrib/db), it just 
can't be included in the release.


-Hoss

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

On Mon, Dec 1, 2008 at 4:36 PM, Ian Holsman <li...@holsman.net> wrote:
> could you use something like derby http://db.apache.org/derby/  or hsqldb ?
> sure... it has a SQL interface, but it might let people use other DB
> backends.
JDBC is definitely there. So HSQLDB is something I have in mind. Derby
is too big to be included in a distro (I guess HSQLDB is faster too).

I just wanted to know if BDB-JE can be given as an option or not.

>
> regards
> Ian
>
> On Mon, Dec 1, 2008 at 8:43 PM, Noble Paul നോബിള്‍ नोब्ळ् <
> noble.paul@gmail.com> wrote:
>
>> I guess it is not possible to distribute it w/ apache products because
>> of the license.
>>
>> But can we have a compile time dependency ?
>>
>> I may be interested in that for SOLR-828
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Ian Holsman <li...@holsman.net>.

Could you use something like Derby (http://db.apache.org/derby/) or HSQLDB?
Sure, it has a SQL interface, but it might let people use other DB
backends.

regards
Ian

On Mon, Dec 1, 2008 at 8:43 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.paul@gmail.com> wrote:

> I guess it is not possible to distribute it w/ apache products because
> of the license.
>
> But can we have a compile time dependency ?
>
> I may be interested in that for SOLR-828
>
> --
> --Noble Paul
>

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Tue, Dec 2, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> OK . Could you guys give some quick feedback on SOLR-828 and SOLR-810
>
> If I get early feedback I may be able to avoid rewrites.

Took a very quick look.  Seems like documents should have a version or
revision number instead of a "COMMITTED" column.
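
One possible reading of the version-number suggestion, as a minimal sketch:
every write gets a monotonically increasing version, and a commit only
records the highest version written so far. The class and method names
below are hypothetical and are not taken from SOLR-828.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tracks commit state by version number instead of a COMMITTED flag. */
class VersionedDocTracker {
    private final Map<String, Long> versionById = new HashMap<>();
    private long nextVersion = 1;
    private long lastCommittedVersion = 0;

    /** Records a write (or overwrite) of the document and returns its new version. */
    long write(String id) {
        long version = nextVersion++;
        versionById.put(id, version);
        return version;
    }

    /** A commit covers every version handed out so far. */
    void commit() {
        lastCommittedVersion = nextVersion - 1;
    }

    /** A document is committed if its latest version is not newer than the last commit. */
    boolean isCommitted(String id) {
        Long version = versionById.get(id);
        return version != null && version <= lastCommittedVersion;
    }
}
```

With this layout a commit does not have to touch any rows; it only moves
one counter forward.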

I'm not sure how the whole scheme is supposed to work though.  It
might be helpful to describe how things work w/o reference to specific
UpdateProcessor methods - that gets too much into implementation.

Also, in SOLR-828 please move the big description into a comment (the
description field is echoed on every update, so it should be kept
small).

-Yonik

> On Tue, Dec 2, 2008 at 10:14 PM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>> OK . So , I'll stick to JDBC. Derby looks like the best bet
>>
>> If we must ship it along w/ Solr it is another 2.6MB jar (embdded
>> version) with the distro.
>>
>>
>>
>> On Tue, Dec 2, 2008 at 9:35 PM, Jason Rutherglen
>> <ja...@gmail.com> wrote:
>>> It's mostly dead and synchronizes on reads and writes.
>>>
>>> On Tue, Dec 2, 2008 at 7:46 AM, Yonik Seeley <yo...@apache.org> wrote:
>>>
>>>> On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>>> > Please consider using JDBM, now in the Apache incubator,
>>>>
>>>> It doesn't look like it's in the incubator yet... there was interest,
>>>> but a proposal was never put on the wiki.
>>>>
>>>> > but with a long
>>>> > history at SF.net and wide usage. Its API is essentially the same as BDB.
>>>>
>>>> You wouldn't be able to tell by the SourceForge page - it has the look
>>>> of a dead project.
>>>>
>>>> -Yonik
>>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>
>
>
> --
> --Noble Paul
>

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

OK. Could you guys give some quick feedback on SOLR-828 and SOLR-810?

If I get early feedback I may be able to avoid rewrites.

On Tue, Dec 2, 2008 at 10:14 PM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> OK . So , I'll stick to JDBC. Derby looks like the best bet
>
> If we must ship it along w/ Solr it is another 2.6MB jar (embdded
> version) with the distro.
>
>
>
> On Tue, Dec 2, 2008 at 9:35 PM, Jason Rutherglen
> <ja...@gmail.com> wrote:
>> It's mostly dead and synchronizes on reads and writes.
>>
>> On Tue, Dec 2, 2008 at 7:46 AM, Yonik Seeley <yo...@apache.org> wrote:
>>
>>> On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>> > Please consider using JDBM, now in the Apache incubator,
>>>
>>> It doesn't look like it's in the incubator yet... there was interest,
>>> but a proposal was never put on the wiki.
>>>
>>> > but with a long
>>> > history at SF.net and wide usage. Its API is essentially the same as BDB.
>>>
>>> You wouldn't be able to tell by the SourceForge page - it has the look
>>> of a dead project.
>>>
>>> -Yonik
>>>
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

OK. So, I'll stick to JDBC. Derby looks like the best bet.

If we must ship it along with Solr, it is another 2.6MB jar (embedded
version) in the distro.



On Tue, Dec 2, 2008 at 9:35 PM, Jason Rutherglen
<ja...@gmail.com> wrote:
> It's mostly dead and synchronizes on reads and writes.
>
> On Tue, Dec 2, 2008 at 7:46 AM, Yonik Seeley <yo...@apache.org> wrote:
>
>> On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>> > Please consider using JDBM, now in the Apache incubator,
>>
>> It doesn't look like it's in the incubator yet... there was interest,
>> but a proposal was never put on the wiki.
>>
>> > but with a long
>> > history at SF.net and wide usage. Its API is essentially the same as BDB.
>>
>> You wouldn't be able to tell by the SourceForge page - it has the look
>> of a dead project.
>>
>> -Yonik
>>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

I shall try to explain the implementation.

Ideally the backup store should keep all fields of all uncommitted docs.

For committed docs, it must store all the non-stored fields and any
field which is a destination of copyField.

Fetching all the stored fields from the DB is not the right idea, I
guess. We cannot do it because the DB is available only on the master
(it is not replicated).

The schema in the description is a bit dated. Instead of the field
'COMMITTED', it is now called STATUS. It is an enum with these values:

COMMITTED = 0
UNCOMMITTED = 1
UNCOMMITTED_MARKED_FOR_DELETE = 2
COMMITTED_MARKED_FOR_DELETE = 3
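
As an illustration only, the four STATUS values listed above could be
mirrored in Java roughly like this. The enum name and the helper methods
are hypothetical; only the value names and their integer codes come from
the list.

```java
/** Hypothetical Java mirror of the STATUS column; the codes match the list above. */
enum DocStatus {
    COMMITTED(0),
    UNCOMMITTED(1),
    UNCOMMITTED_MARKED_FOR_DELETE(2),
    COMMITTED_MARKED_FOR_DELETE(3);

    private final int code;

    DocStatus(int code) {
        this.code = code;
    }

    /** The integer stored in the STATUS column. */
    int code() {
        return code;
    }

    boolean isCommitted() {
        return this == COMMITTED || this == COMMITTED_MARKED_FOR_DELETE;
    }

    boolean isMarkedForDelete() {
        return this == UNCOMMITTED_MARKED_FOR_DELETE || this == COMMITTED_MARKED_FOR_DELETE;
    }

    /** Maps a stored integer back to its status, rejecting unknown codes. */
    static DocStatus fromCode(int code) {
        for (DocStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown STATUS code: " + code);
    }
}
```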

If a document is overwritten multiple times before a commit, we just
need to keep one version of the UNCOMMITTED doc, right?

The other important change is SOLR-810. It is an attempt to reduce the
datastore size.

All the field names are stored in another table. This is the
KNOWN_STRING collection.
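
To show why a separate table of field names can shrink the store, here is
a rough in-memory sketch: each distinct name is assigned a small integer
id once, and stored documents can then refer to the id instead of
repeating the name. The class below is a hypothetical stand-in and does
not reflect the actual SOLR-810 schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a KNOWN_STRING table: one small id per field name. */
class KnownStrings {
    private final Map<String, Integer> idByName = new HashMap<>();
    private final List<String> nameById = new ArrayList<>();

    /** Returns the existing id for a name, or assigns the next free one. */
    int idFor(String name) {
        Integer id = idByName.get(name);
        if (id == null) {
            id = nameById.size();
            idByName.put(name, id);
            nameById.add(name);
        }
        return id;
    }

    /** Resolves an id back to the field name it was assigned to. */
    String nameFor(int id) {
        return nameById.get(id);
    }

    int size() {
        return nameById.size();
    }
}
```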

Please ask if any more details are required.

--Noble

On Tue, Dec 2, 2008 at 10:55 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Tue, Dec 2, 2008 at 12:08 PM, Shalin Shekhar Mangar
> <sh...@gmail.com> wrote:
>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go beyond
>> that without a commit.
>
> Ah, so you only use the DB to store the uncommitted docs so you can
> quickly reference their fields?
>
> If that's all, we might be able to get that functionality from Lucene
> (IndexWriter would need to support getting the latest stored fields by
> term at least).
>
> The other option is to actually store the stored fields permanently in
> the DB... and get them from the DB instead of Lucene when requested.
> That's the road I thought you were going down.
>
> -Yonik
>
>
>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>> <da...@cs.put.poznan.pl>wrote:
>>
>>>
>>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>>> volume of data and queries, but otherwise the license looks BSDish.
>>>
>>> http://hsqldb.org/web/hsqlLicense.html
>>>
>>> Dawid
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>

-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Tue, Dec 2, 2008 at 12:08 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go beyond
> that without a commit.

Ah, so you only use the DB to store the uncommitted docs so you can
quickly reference their fields?

If that's all, we might be able to get that functionality from Lucene
(IndexWriter would need to support getting the latest stored fields by
term at least).

The other option is to actually store the stored fields permanently in
the DB... and get them from the DB instead of Lucene when requested.
That's the road I thought you were going down.

-Yonik

> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
> <da...@cs.put.poznan.pl>wrote:
>
>>
>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>> volume of data and queries, but otherwise the license looks BSDish.
>>
>> http://hsqldb.org/web/hsqlLicense.html
>>
>> Dawid
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

On Fri, Dec 5, 2008 at 7:46 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Thu, Dec 4, 2008 at 11:49 PM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>> Considering the fact that the extra Lucene write is over and above the
>> normal indexing I guess we must compare the cost of indexing of 1
>> document in luven vs cost of writing one row in a DB.

>
> I'ts one write vs two.  I'd propose storing the fields that would
> normally not be stored on the document being indexed.
Until the user commits, I cannot read the fields.
So if the user issues an update before a commit, I cannot get the field
values (unless the Lucene API lets me do that).
>
>> Does lucene allow me to write byte[]. ?
>
> Yes... it's not really exposed in Solr yet though.
If the Lucene API allows that, I can use it straight away. Solr does
not have to expose it. Can you give me pointers to that?
>
> -Yonik
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Thu, Dec 4, 2008 at 11:49 PM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> Considering the fact that the extra Lucene write is over and above the
> normal indexing I guess we must compare the cost of indexing of 1
> document in luven vs cost of writing one row in a DB.

It's one write vs. two.  I'd propose storing the fields that would
normally not be stored on the document being indexed.

> Does lucene allow me to write byte[]. ?

Yes... it's not really exposed in Solr yet though.

-Yonik

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

On Fri, Dec 5, 2008 at 12:57 AM, Yonik Seeley <yo...@apache.org> wrote:
> On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>> I tried that and the solution looked so clumsy .
>> I need to commit the to read anything was making things difficult
>
> In a high update environment, most documents would be exposed to an
> open reader with no need to commit or reopen the index to retrieve the
> stored fields.
> In a way, solving the more realtime update issue removes the necessity
> for this altogether.
>
>> Is Lucene write much faster than DB (embedded) writes?
>
> More to the point, we're already doing the Lucene write (for the most
> part) anyway, and the DB write is overhead to the indexing process.
Considering that the extra Lucene write is over and above the
normal indexing, I guess we must compare the cost of indexing one
document in Lucene vs. the cost of writing one row in a DB.
A DB gives me the option of writing to a remote machine, thus freeing up my
local disk. Lucene has to write to the local disk.

In the DB I am writing a byte[] (which is quite compressed). Lucene may
end up writing more data, so more disk I/O (this is just a theory).
Does Lucene allow me to write a byte[]?
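
For readers wondering what "writing a compressed byte[]" might look like,
here is a minimal sketch that packs a document's fields into one
GZIP-compressed blob and reads them back. The class name and the wire
layout are assumptions made for illustration; the actual patch may
serialize documents quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical codec: field count, then name/value pairs, all GZIP-compressed. */
class FieldBlobCodec {

    /** Serializes the fields into a single compressed byte[]. */
    static byte[] encode(Map<String, String> fields) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(fields.size());
            for (Map.Entry<String, String> field : fields.entrySet()) {
                // writeUTF is limited to 65535 encoded bytes per string.
                out.writeUTF(field.getKey());
                out.writeUTF(field.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the fields, in their original order, from a blob made by encode. */
    static Map<String, String> decode(byte[] blob) {
        Map<String, String> fields = new LinkedHashMap<>();
        try (DataInputStream in =
                 new DataInputStream(new GZIPInputStream(new ByteArrayInputStream(blob)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                fields.put(name, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

The resulting byte[] could go into a single binary column through JDBC.
Note that for very small documents the GZIP header can make the blob
larger than the raw text, so any size benefit depends on the data.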

The Lucene API itself is more complex for this kind of operation
(disclaimer: I do not know a whole lot about it).

Moreover this is just an UpdateRequestProcessor (No changes to the
core). We can have a Lucene based one also.
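
The pluggability argument can be sketched as follows: the processor would
depend only on a small store interface, so a JDBC-backed and a
Lucene-backed implementation become interchangeable. Everything here is
hypothetical (the interface, its methods, and the in-memory
implementation used as a placeholder); none of it is Solr's actual API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage abstraction; a JDBC or Lucene variant would implement the same methods. */
interface DocBackupStore {
    void save(String id, Map<String, String> fields);

    /** Returns a copy of the latest saved fields, or null if the id is unknown. */
    Map<String, String> load(String id);

    void delete(String id);
}

/** Placeholder implementation that keeps only the latest version of each document in memory. */
class InMemoryDocBackupStore implements DocBackupStore {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    @Override
    public void save(String id, Map<String, String> fields) {
        // Overwrites any earlier version, so only one copy per id is kept.
        docs.put(id, new HashMap<>(fields));
    }

    @Override
    public Map<String, String> load(String id) {
        Map<String, String> fields = docs.get(id);
        return fields == null ? null : new HashMap<>(fields);
    }

    @Override
    public void delete(String id) {
        docs.remove(id);
    }
}
```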

Most users (the performance-sensitive ones) would not use this feature.
The ones who do random updates will not notice it.
The only problem is for users who index heavily and still want to enable this.


>
> -Yonik
>
>> On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley <yo...@apache.org> wrote:
>>> A database, just to store uncommitted documents in case they might be
>>> updated, seems like it will have a pretty major impact on indexing
>>> performance.  A lucene-only implementation would seem to be much
>>> lighter on resources.
>>>
>>> -Yonik
>>>
>>> On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
>>> <no...@gmail.com> wrote:
>>>> The solution will be an UpdateRequestProcessor (which itself is
>>>> pluggable).I am implementing a JDBC based one. I'll test with H2 and
>>>> MySql (and may be Derby)
>>>>
>>>> We will ship the H2 (embedded) jar
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ry...@gmail.com> wrote:
>>>>> Again, I would hope that solr builds a storage agnostic solution.
>>>>>
>>>>> As long as we have a simple interface to load/store documents, it should be
>>>>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>>>>
>>>>> ryan
>>>>>
>>>>>
>>>>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>
>>>>>> Cassandra does not meet our requirements.
>>>>>> we do not need that kind of scalability
>>>>>>
>>>>>> Moreover its future is uncertain and they are trying to incubate it into
>>>>>> Solr
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>>>>>>>
>>>>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>>>>
>>>>>>> It at least claims to be scalable, no personal experience.
>>>>>>>
>>>>>>> --
>>>>>>> Sami Siren
>>>>>>>
>>>>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>
>>>>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>>>>> replication
>>>>>>>>
>>>>>>>> I have never used  ehcache . So I cannot comment on it
>>>>>>>>
>>>>>>>> any comments?
>>>>>>>>
>>>>>>>> --Noble
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>>>>> <no...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>>>>> data types on al the supported DBs
>>>>>>>>>>>
>>>>>>>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Why do we need a default option?  Is this something that is intended
>>>>>>>>>> to
>>>>>>>>>> be
>>>>>>>>>> on by default?  Or, do you mean just to have one for unit tests to
>>>>>>>>>> work?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Default does not mean that it is enabled bby default. But if it is
>>>>>>>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>>>>>>>> the user may not need to provide an extra jar
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>>>>>>>> be
>>>>>>>>>> quite annoying since you often can't connect to them from other
>>>>>>>>>> clients
>>>>>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>>>>>>>> just
>>>>>>>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>>>>>>>> connect
>>>>>>>>>> to even when it is embedded.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>>>>> zero management.
>>>>>>>>> The users can still read the data through Solr itself .
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>>>>>> I
>>>>>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>>>>>> believing
>>>>>>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>>>>>>> machine,
>>>>>>>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>>>>>>>> needs
>>>>>>>>>> to be able to be replicated, right?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> millions of docs.?
>>>>>>>>> then you must configure a remote DB for storage reasons
>>>>>>>>> and must manage the replication separately
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>>>>>>>> footprint is small too
>>>>>>>>>>> --Noble
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> check http://www.h2database.com/  in my view the best embedded DB
>>>>>>>>>>>> out
>>>>>>>>>>>> there.
>>>>>>>>>>>>
>>>>>>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>>>>>>
>>>>>>>>>>>> However, from anything solr, I would hope it would just rely on
>>>>>>>>>>>> JDBC.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to
>>>>>>>>>>>>> go
>>>>>>>>>>>>> beyond
>>>>>>>>>>>>> that without a commit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> volume of data and queries, but otherwise the license looks
>>>>>>>>>>>>>> BSDish.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --Noble Paul
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --------------------------
>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>
>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Noble Paul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Thu, Dec 4, 2008 at 11:47 AM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> I tried that and the solution looked so clumsy .
> I need to commit the to read anything was making things difficult

In a high update environment, most documents would be exposed to an
open reader with no need to commit or reopen the index to retrieve the
stored fields.
In a way, solving the more realtime update issue removes the necessity
for this altogether.

> Is Lucene write much faster than DB (embedded) writes?

More to the point, we're already doing the Lucene write (for the most
part) anyway, and the DB write is overhead to the indexing process.

-Yonik

> On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley <yo...@apache.org> wrote:
>> A database, just to store uncommitted documents in case they might be
>> updated, seems like it will have a pretty major impact on indexing
>> performance.  A lucene-only implementation would seem to be much
>> lighter on resources.
>>
>> -Yonik
>>
>> On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
>> <no...@gmail.com> wrote:
>>> The solution will be an UpdateRequestProcessor (which itself is
>>> pluggable).I am implementing a JDBC based one. I'll test with H2 and
>>> MySql (and may be Derby)
>>>
>>> We will ship the H2 (embedded) jar
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ry...@gmail.com> wrote:
>>>> Again, I would hope that solr builds a storage agnostic solution.
>>>>
>>>> As long as we have a simple interface to load/store documents, it should be
>>>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>>>
>>>> ryan
>>>>
>>>>
>>>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>
>>>>> Cassandra does not meet our requirements.
>>>>> we do not need that kind of scalability
>>>>>
>>>>> Moreover its future is uncertain and they are trying to incubate it into
>>>>> Solr
>>>>>
>>>>>
>>>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>>>>>>
>>>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>>>
>>>>>> It at least claims to be scalable, no personal experience.
>>>>>>
>>>>>> --
>>>>>> Sami Siren
>>>>>>
>>>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>
>>>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>>>> replication
>>>>>>>
>>>>>>> I have never used  ehcache . So I cannot comment on it
>>>>>>>
>>>>>>> any comments?
>>>>>>>
>>>>>>> --Noble
>>>>>>>
>>>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>>>> <no...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>>>> data types on al the supported DBs
>>>>>>>>>>
>>>>>>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Why do we need a default option?  Is this something that is intended
>>>>>>>>> to
>>>>>>>>> be
>>>>>>>>> on by default?  Or, do you mean just to have one for unit tests to
>>>>>>>>> work?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Default does not mean that it is enabled bby default. But if it is
>>>>>>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>>>>>>> the user may not need to provide an extra jar
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>>>>>>> be
>>>>>>>>> quite annoying since you often can't connect to them from other
>>>>>>>>> clients
>>>>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>>>>>>> just
>>>>>>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>>>>>>> connect
>>>>>>>>> to even when it is embedded.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>>>> zero management.
>>>>>>>> The users can still read the data through Solr itself .
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>>>>> I
>>>>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>>>>> believing
>>>>>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>>>>>> machine,
>>>>>>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>>>>>>> needs
>>>>>>>>> to be able to be replicated, right?
>>>>>>>>>
>>>>>>>>
>>>>>>>> millions of docs.?
>>>>>>>> then you must configure a remote DB for storage reasons
>>>>>>>> and must manage the replication separately
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>>>>>>> footprint is small too
>>>>>>>>>> --Noble
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> check http://www.h2database.com/  in my view the best embedded DB
>>>>>>>>>>> out
>>>>>>>>>>> there.
>>>>>>>>>>>
>>>>>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>>>>>
>>>>>>>>>>> However, from anything solr, I would hope it would just rely on
>>>>>>>>>>> JDBC.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to
>>>>>>>>>>>> go
>>>>>>>>>>>> beyond
>>>>>>>>>>>> that without a commit.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>>>>> the
>>>>>>>>>>>>> volume of data and queries, but otherwise the license looks
>>>>>>>>>>>>> BSDish.
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dawid
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --Noble Paul
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------
>>>>>>>>> Grant Ingersoll
>>>>>>>>>
>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Noble Paul
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>
>
>
>
> --
> --Noble Paul
>

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

I tried that and the solution looked clumsy.
Needing to commit before I could read anything was making things difficult.
A DB gives me 'immediate' reads.
I am sure performance will take a hit anyway.
Are Lucene writes much faster than (embedded) DB writes?
http://www.h2database.com/html/performance.html


On Thu, Dec 4, 2008 at 10:07 PM, Yonik Seeley <yo...@apache.org> wrote:
> A database, just to store uncommitted documents in case they might be
> updated, seems like it will have a pretty major impact on indexing
> performance.  A lucene-only implementation would seem to be much
> lighter on resources.
>
> -Yonik
>
> On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>> The solution will be an UpdateRequestProcessor (which itself is
>> pluggable).I am implementing a JDBC based one. I'll test with H2 and
>> MySql (and may be Derby)
>>
>> We will ship the H2 (embedded) jar
>>
>>
>>
>>
>>
>>
>> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ry...@gmail.com> wrote:
>>> Again, I would hope that solr builds a storage agnostic solution.
>>>
>>> As long as we have a simple interface to load/store documents, it should be
>>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>>
>>> ryan
>>>
>>>
>>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>> Cassandra does not meet our requirements.
>>>> we do not need that kind of scalability
>>>>
>>>> Moreover its future is uncertain and they are trying to incubate it into
>>>> Solr
>>>>
>>>>
>>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>>>>>
>>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>>
>>>>> It at least claims to be scalable, no personal experience.
>>>>>
>>>>> --
>>>>> Sami Siren
>>>>>
>>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>
>>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>>> replication
>>>>>>
>>>>>> I have never used  ehcache . So I cannot comment on it
>>>>>>
>>>>>> any comments?
>>>>>>
>>>>>> --Noble
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>>> <no...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>>> data types on al the supported DBs
>>>>>>>>>
>>>>>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Why do we need a default option?  Is this something that is intended
>>>>>>>> to
>>>>>>>> be
>>>>>>>> on by default?  Or, do you mean just to have one for unit tests to
>>>>>>>> work?
>>>>>>>>
>>>>>>>
>>>>>>> Default does not mean that it is enabled bby default. But if it is
>>>>>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>>>>>> the user may not need to provide an extra jar
>>>>>>>
>>>>>>>>
>>>>>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>>>>>> be
>>>>>>>> quite annoying since you often can't connect to them from other
>>>>>>>> clients
>>>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>>>>>> just
>>>>>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>>>>>> connect
>>>>>>>> to even when it is embedded.
>>>>>>>>
>>>>>>>
>>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>>> zero management.
>>>>>>> The users can still read the data through Solr itself .
>>>>>>>
>>>>>>>>
>>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>>>> I
>>>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>>>> believing
>>>>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>>>>> machine,
>>>>>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>>>>>> needs
>>>>>>>> to be able to be replicated, right?
>>>>>>>>
>>>>>>>
>>>>>>> millions of docs.?
>>>>>>> then you must configure a remote DB for storage reasons
>>>>>>> and must manage the replication separately
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>>>>>> footprint is small too
>>>>>>>>> --Noble
>>>>>>>>>
>>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> check http://www.h2database.com/  in my view the best embedded DB
>>>>>>>>>> out
>>>>>>>>>> there.
>>>>>>>>>>
>>>>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>>>>
>>>>>>>>>> However, from anything solr, I would hope it would just rely on
>>>>>>>>>> JDBC.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to
>>>>>>>>>>> go
>>>>>>>>>>> beyond
>>>>>>>>>>> that without a commit.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>>>> the
>>>>>>>>>>>> volume of data and queries, but otherwise the license looks
>>>>>>>>>>>> BSDish.
>>>>>>>>>>>>
>>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>>
>>>>>>>>>>>> Dawid
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Noble Paul
>>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------
>>>>>>>> Grant Ingersoll
>>>>>>>>
>>>>>>>> Lucene Helpful Hints:
>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --Noble Paul
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

A database, just to store uncommitted documents in case they might be
updated, seems like it will have a pretty major impact on indexing
performance.  A lucene-only implementation would seem to be much
lighter on resources.

-Yonik

On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> The solution will be an UpdateRequestProcessor (which itself is
> pluggable).I am implementing a JDBC based one. I'll test with H2 and
> MySql (and may be Derby)
>
> We will ship the H2 (embedded) jar
>
>
>
>
>
>
> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ry...@gmail.com> wrote:
>> Again, I would hope that solr builds a storage agnostic solution.
>>
>> As long as we have a simple interface to load/store documents, it should be
>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>
>> ryan
>>
>>
>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>>> Cassandra does not meet our requirements.
>>> we do not need that kind of scalability
>>>
>>> Moreover its future is uncertain and they are trying to incubate it into
>>> Solr
>>>
>>>
>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>>>>
>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>
>>>> It at least claims to be scalable, no personal experience.
>>>>
>>>> --
>>>> Sami Siren
>>>>
>>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>
>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>> replication
>>>>>
>>>>> I have never used  ehcache . So I cannot comment on it
>>>>>
>>>>> any comments?
>>>>>
>>>>> --Noble
>>>>>
>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>>> <no...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>> data types on al the supported DBs
>>>>>>>>
>>>>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>>>>
>>>>>>>
>>>>>>> Why do we need a default option?  Is this something that is intended
>>>>>>> to
>>>>>>> be
>>>>>>> on by default?  Or, do you mean just to have one for unit tests to
>>>>>>> work?
>>>>>>>
>>>>>>
>>>>>> Default does not mean that it is enabled bby default. But if it is
>>>>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>>>>> the user may not need to provide an extra jar
>>>>>>
>>>>>>>
>>>>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>>>>> be
>>>>>>> quite annoying since you often can't connect to them from other
>>>>>>> clients
>>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>>>>> just
>>>>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>>>>> connect
>>>>>>> to even when it is embedded.
>>>>>>>
>>>>>>
>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>> zero management.
>>>>>> The users can still read the data through Solr itself .
>>>>>>
>>>>>>>
>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>>> I
>>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>>> believing
>>>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>>>> machine,
>>>>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>>>>> needs
>>>>>>> to be able to be replicated, right?
>>>>>>>
>>>>>>
>>>>>> millions of docs.?
>>>>>> then you must configure a remote DB for storage reasons
>>>>>> and must manage the replication separately
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>>>>> footprint is small too
>>>>>>>> --Noble
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> check http://www.h2database.com/  in my view the best embedded DB
>>>>>>>>> out
>>>>>>>>> there.
>>>>>>>>>
>>>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>>>
>>>>>>>>> However, from anything solr, I would hope it would just rely on
>>>>>>>>> JDBC.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to
>>>>>>>>>> go
>>>>>>>>>> beyond
>>>>>>>>>> that without a commit.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>>> the
>>>>>>>>>>> volume of data and queries, but otherwise the license looks
>>>>>>>>>>> BSDish.
>>>>>>>>>>>
>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>
>>>>>>>>>>> Dawid
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Noble Paul
>>>>>>>>
>>>>>>>
>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>>
>>>>>>> Lucene Helpful Hints:
>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>
>>
>
>
>
> --
> --Noble Paul
>

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

The solution will be an UpdateRequestProcessor (which is itself
pluggable). I am implementing a JDBC-based one. I'll test with H2 and
MySQL (and maybe Derby).

We will ship the H2 (embedded) jar.






On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <ry...@gmail.com> wrote:
> Again, I would hope that solr builds a storage agnostic solution.
>
> As long as we have a simple interface to load/store documents, it should be
> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>
> ryan
>
>
> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> Cassandra does not meet our requirements.
>> we do not need that kind of scalability
>>
>> Moreover its future is uncertain and they are trying to incubate it into
>> Solr
>>
>>
>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>>>
>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>
>>> It at least claims to be scalable, no personal experience.
>>>
>>> --
>>> Sami Siren
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>
>>>> Another persistence solution is ehcache with diskstore. It even has
>>>> replication
>>>>
>>>> I have never used  ehcache . So I cannot comment on it
>>>>
>>>> any comments?
>>>>
>>>> --Noble
>>>>
>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>>> <no...@gmail.com> wrote:
>>>>
>>>>>
>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>> data types on al the supported DBs
>>>>>>>
>>>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>>>
>>>>>>
>>>>>> Why do we need a default option?  Is this something that is intended
>>>>>> to
>>>>>> be
>>>>>> on by default?  Or, do you mean just to have one for unit tests to
>>>>>> work?
>>>>>>
>>>>>
>>>>> Default does not mean that it is enabled bby default. But if it is
>>>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>>>> the user may not need to provide an extra jar
>>>>>
>>>>>>
>>>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>>>> be
>>>>>> quite annoying since you often can't connect to them from other
>>>>>> clients
>>>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>>>> just
>>>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>>>> connect
>>>>>> to even when it is embedded.
>>>>>>
>>>>>
>>>>> Embedded is the best bet for us because of performance reasons and
>>>>> zero management.
>>>>> The users can still read the data through Solr itself .
>>>>>
>>>>>>
>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>> I
>>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>>> believing
>>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>>> machine,
>>>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>>>> needs
>>>>>> to be able to be replicated, right?
>>>>>>
>>>>>
>>>>> millions of docs.?
>>>>> then you must configure a remote DB for storage reasons
>>>>> and must manage the replication separately
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>>>> footprint is small too
>>>>>>> --Noble
>>>>>>>
>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> check http://www.h2database.com/  in my view the best embedded DB
>>>>>>>> out
>>>>>>>> there.
>>>>>>>>
>>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>>
>>>>>>>> However, from anything solr, I would hope it would just rely on
>>>>>>>> JDBC.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to
>>>>>>>>> go
>>>>>>>>> beyond
>>>>>>>>> that without a commit.
>>>>>>>>>
>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>> the
>>>>>>>>>> volume of data and queries, but otherwise the license looks
>>>>>>>>>> BSDish.
>>>>>>>>>>
>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>
>>>>>>>>>> Dawid
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --Noble Paul
>>>>>>>
>>>>>>
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>>
>>>>>> Lucene Helpful Hints:
>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Ryan McKinley <ry...@gmail.com>.

Again, I would hope that Solr builds a storage-agnostic solution.

As long as we have a simple interface to load/store documents, it
should be easy to write a JDBC/ehcache/disk/Cassandra/whatever
implementation.
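
A minimal sketch of the kind of load/store interface described above,
assuming a document is just a map of field names to values keyed by a
unique id. The names DocumentStore and InMemoryDocumentStore are
hypothetical and are not part of Solr; a JDBC, ehcache, or disk-backed
class would implement the same two methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage-agnostic contract; not an actual Solr interface. */
interface DocumentStore {
    /** Saves (or overwrites) the fields of the document with the given id. */
    void store(String id, Map<String, Object> fields);

    /** Returns the stored fields, or an empty Optional if the id is unknown. */
    Optional<Map<String, Object>> load(String id);
}

/** Simplest possible backend, used here only to show the contract in action. */
final class InMemoryDocumentStore implements DocumentStore {
    private final Map<String, Map<String, Object>> docs = new ConcurrentHashMap<>();

    @Override
    public void store(String id, Map<String, Object> fields) {
        // Defensive copy so later changes by the caller do not leak into the store.
        docs.put(id, new HashMap<>(fields));
    }

    @Override
    public Optional<Map<String, Object>> load(String id) {
        Map<String, Object> found = docs.get(id);
        return found == null ? Optional.empty() : Optional.of(new HashMap<>(found));
    }
}
```

With this shape, the update processor would depend only on DocumentStore,
and the backend could be chosen by configuration.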

ryan


On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:

> Cassandra does not meet our requirements.
> we do not need that kind of scalability
>
> Moreover its future is uncertain and they are trying to incubate it  
> into Solr
>
>
> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>
>> It at least claims to be scalable, no personal experience.
>>
>> --
>> Sami Siren
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>> Another persistence solution is ehcache with diskstore. It even has
>>> replication
>>>
>>> I have never used  ehcache . So I cannot comment on it
>>>
>>> any comments?
>>>
>>> --Noble
>>>
>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>>> <no...@gmail.com> wrote:
>>>
>>>>
>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>>> wrote:
>>>>
>>>>>
>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> The code can be written against JDBC. But we need to test the  
>>>>>> DDL and
>>>>>> data types on al the supported DBs
>>>>>>
>>>>>> But , which one would we like to ship with Solr as a default  
>>>>>> option?
>>>>>>
>>>>>
>>>>> Why do we need a default option?  Is this something that is  
>>>>> intended to
>>>>> be
>>>>> on by default?  Or, do you mean just to have one for unit tests  
>>>>> to work?
>>>>>
>>>>
>>>> Default does not mean that it is enabled bby default. But if it is
>>>> enabled I can have defaults for stuff like driver, url , DDL etc.  
>>>> And
>>>> the user may not need to provide an extra jar
>>>>
>>>>>
>>>>> I don't know if it is still the case, but I often find embedded  
>>>>> dbs to
>>>>> be
>>>>> quite annoying since you often can't connect to them from other  
>>>>> clients
>>>>> outside of the JVM which makes debugging harder.  Of course,  
>>>>> maybe I
>>>>> just
>>>>> don't know the tricks to do it.  Derby is one DB that you can  
>>>>> still
>>>>> connect
>>>>> to even when it is embedded.
>>>>>
>>>>
>>>> Embedded is the best bet for us because of performance reasons and
>>>> zero management.
>>>> The users can still read the data through Solr itself .
>>>>
>>>>>
>>>>> Also, whatever is chosen needs to scale to millions of  
>>>>> documents, and I
>>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>>> believing
>>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>>> machine,
>>>>> which is presumably what an embedded DB must do.  Presumably, it  
>>>>> also
>>>>> needs
>>>>> to be able to be replicated, right?
>>>>>
>>>>
>>>> millions of docs.?
>>>> then you must configure a remote DB for storage reasons
>>>> and must manage the replication separately
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> H2 looks impressive. the jar (small)  is just 667KB and the  
>>>>>> memory
>>>>>> footprint is small too
>>>>>> --Noble
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley  
>>>>>> <ry...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> check http://www.h2database.com/  in my view the best embedded  
>>>>>>> DB out
>>>>>>> there.
>>>>>>>
>>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>>
>>>>>>> However, from anything solr, I would hope it would just rely  
>>>>>>> on JDBC.
>>>>>>>
>>>>>>>
>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might  
>>>>>>>> want to go
>>>>>>>> beyond
>>>>>>>> that without a commit.
>>>>>>>>
>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot  
>>>>>>>>> depending on
>>>>>>>>> the
>>>>>>>>> volume of data and queries, but otherwise the license looks  
>>>>>>>>> BSDish.
>>>>>>>>>
>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>
>>>>>>>>> Dawid
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>>>>
>>>>>
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>>
>>>>> Lucene Helpful Hints:
>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
> -- 
> --Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

Cassandra does not meet our requirements.
We do not need that kind of scalability.

Moreover, its future is uncertain, and they are still trying to get it
into incubation at Apache.


On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <ss...@gmail.com> wrote:
> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>
> It at least claims to be scalable, no personal experience.
>
> --
> Sami Siren
>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>> Another persistence solution is ehcache with diskstore. It even has
>> replication
>>
>> I have never used  ehcache . So I cannot comment on it
>>
>> any comments?
>>
>> --Noble
>>
>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
>> <no...@gmail.com> wrote:
>>
>>>
>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org>
>>> wrote:
>>>
>>>>
>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>
>>>>
>>>>>
>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>> data types on al the supported DBs
>>>>>
>>>>> But , which one would we like to ship with Solr as a default option?
>>>>>
>>>>
>>>> Why do we need a default option?  Is this something that is intended to
>>>> be
>>>> on by default?  Or, do you mean just to have one for unit tests to work?
>>>>
>>>
>>> Default does not mean that it is enabled bby default. But if it is
>>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>>> the user may not need to provide an extra jar
>>>
>>>>
>>>> I don't know if it is still the case, but I often find embedded dbs to
>>>> be
>>>> quite annoying since you often can't connect to them from other clients
>>>> outside of the JVM which makes debugging harder.  Of course, maybe I
>>>> just
>>>> don't know the tricks to do it.  Derby is one DB that you can still
>>>> connect
>>>> to even when it is embedded.
>>>>
>>>
>>> Embedded is the best bet for us because of performance reasons and
>>> zero management.
>>> The users can still read the data through Solr itself .
>>>
>>>>
>>>> Also, whatever is chosen needs to scale to millions of documents, and I
>>>> wonder about an embedded DB doing that.  I also have a hard time
>>>> believing
>>>> that both a DB w/ millions of docs and Solr can live on the same
>>>> machine,
>>>> which is presumably what an embedded DB must do.  Presumably, it also
>>>> needs
>>>> to be able to be replicated, right?
>>>>
>>>
>>> millions of docs.?
>>> then you must configure a remote DB for storage reasons
>>> and must manage the replication separately
>>>
>>>>
>>>>
>>>>>
>>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>>> footprint is small too
>>>>> --Noble
>>>>>
>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> check http://www.h2database.com/  in my view the best embedded DB out
>>>>>> there.
>>>>>>
>>>>>> from the maker of HSQLDB...  is second round.
>>>>>>
>>>>>> However, from anything solr, I would hope it would just rely on JDBC.
>>>>>>
>>>>>>
>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go
>>>>>>> beyond
>>>>>>> that without a commit.
>>>>>>>
>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>> the
>>>>>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>>>>>
>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>
>>>>>>>> Dawid
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Shalin Shekhar Mangar.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>>
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>>
>>>> Lucene Helpful Hints:
>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>>
>>
>>
>>
>>
>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Sami Siren <ss...@gmail.com>.

Yet another possibility: http://wiki.apache.org/incubator/Cassandra

It at least claims to be scalable, no personal experience.

--
 Sami Siren

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> Another persistence solution is ehcache with diskstore. It even has replication
>
> I have never used  ehcache . So I cannot comment on it
>
> any comments?
>
> --Noble
>
> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>   
>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org> wrote:
>>     
>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>
>>>       
>>>> The code can be written against JDBC. But we need to test the DDL and
>>>> data types on al the supported DBs
>>>>
>>>> But , which one would we like to ship with Solr as a default option?
>>>>         
>>> Why do we need a default option?  Is this something that is intended to be
>>> on by default?  Or, do you mean just to have one for unit tests to work?
>>>       
>> Default does not mean that it is enabled bby default. But if it is
>> enabled I can have defaults for stuff like driver, url , DDL etc. And
>> the user may not need to provide an extra jar
>>     
>>> I don't know if it is still the case, but I often find embedded dbs to be
>>> quite annoying since you often can't connect to them from other clients
>>> outside of the JVM which makes debugging harder.  Of course, maybe I just
>>> don't know the tricks to do it.  Derby is one DB that you can still connect
>>> to even when it is embedded.
>>>       
>> Embedded is the best bet for us because of performance reasons and
>> zero management.
>> The users can still read the data through Solr itself .
>>     
>>> Also, whatever is chosen needs to scale to millions of documents, and I
>>> wonder about an embedded DB doing that.  I also have a hard time believing
>>> that both a DB w/ millions of docs and Solr can live on the same machine,
>>> which is presumably what an embedded DB must do.  Presumably, it also needs
>>> to be able to be replicated, right?
>>>       
>> millions of docs.?
>> then you must configure a remote DB for storage reasons
>> and must manage the replication separately
>>     
>>>       
>>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>>> footprint is small too
>>>> --Noble
>>>>
>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com> wrote:
>>>>         
>>>>> check http://www.h2database.com/  in my view the best embedded DB out
>>>>> there.
>>>>>
>>>>> from the maker of HSQLDB...  is second round.
>>>>>
>>>>> However, from anything solr, I would hope it would just rely on JDBC.
>>>>>
>>>>>
>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>
>>>>>           
>>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go
>>>>>> beyond
>>>>>> that without a commit.
>>>>>>
>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>>
>>>>>>             
>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>>>>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>>>>
>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>
>>>>>>> Dawid
>>>>>>>
>>>>>>>               
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>             
>>>>>           
>>>>
>>>> --
>>>> --Noble Paul
>>>>         
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>       
>>
>> --
>> --Noble Paul
>>
>>     
>
>
>
>

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

Another persistence solution is ehcache with its disk store. It even
supports replication.

I have never used ehcache, so I cannot comment on it.

Any comments?

--Noble

On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org> wrote:
>>
>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>
>>> The code can be written against JDBC. But we need to test the DDL and
>>> data types on al the supported DBs
>>>
>>> But , which one would we like to ship with Solr as a default option?
>>
>> Why do we need a default option?  Is this something that is intended to be
>> on by default?  Or, do you mean just to have one for unit tests to work?
> Default does not mean that it is enabled bby default. But if it is
> enabled I can have defaults for stuff like driver, url , DDL etc. And
> the user may not need to provide an extra jar
>>
>> I don't know if it is still the case, but I often find embedded dbs to be
>> quite annoying since you often can't connect to them from other clients
>> outside of the JVM which makes debugging harder.  Of course, maybe I just
>> don't know the tricks to do it.  Derby is one DB that you can still connect
>> to even when it is embedded.
> Embedded is the best bet for us because of performance reasons and
> zero management.
> The users can still read the data through Solr itself .
>>
>> Also, whatever is chosen needs to scale to millions of documents, and I
>> wonder about an embedded DB doing that.  I also have a hard time believing
>> that both a DB w/ millions of docs and Solr can live on the same machine,
>> which is presumably what an embedded DB must do.  Presumably, it also needs
>> to be able to be replicated, right?
> millions of docs.?
> then you must configure a remote DB for storage reasons
> and must manage the replication separately
>>
>>
>>>
>>>
>>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>>> footprint is small too
>>> --Noble
>>>
>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com> wrote:
>>>>
>>>> check http://www.h2database.com/  in my view the best embedded DB out
>>>> there.
>>>>
>>>> from the maker of HSQLDB...  is second round.
>>>>
>>>> However, from anything solr, I would hope it would just rely on JDBC.
>>>>
>>>>
>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>
>>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go
>>>>> beyond
>>>>> that without a commit.
>>>>>
>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>> <da...@cs.put.poznan.pl>wrote:
>>>>>
>>>>>>
>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>>>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>>>
>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>
>>>>>> Dawid
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>
>> --------------------------
>> Grant Ingersoll
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>
>
>
> --
> --Noble Paul
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> The code can be written against JDBC. But we need to test the DDL and
>> data types on al the supported DBs
>>
>> But , which one would we like to ship with Solr as a default option?
>
> Why do we need a default option?  Is this something that is intended to be
> on by default?  Or, do you mean just to have one for unit tests to work?
Default does not mean that it is enabled by default. But if it is
enabled, I can have defaults for things like the driver, URL, DDL, etc.,
and the user may not need to provide an extra jar.
>
> I don't know if it is still the case, but I often find embedded dbs to be
> quite annoying since you often can't connect to them from other clients
> outside of the JVM which makes debugging harder.  Of course, maybe I just
> don't know the tricks to do it.  Derby is one DB that you can still connect
> to even when it is embedded.
Embedded is the best bet for us because of performance and zero
management.
Users can still read the data through Solr itself.
>
> Also, whatever is chosen needs to scale to millions of documents, and I
> wonder about an embedded DB doing that.  I also have a hard time believing
> that both a DB w/ millions of docs and Solr can live on the same machine,
> which is presumably what an embedded DB must do.  Presumably, it also needs
> to be able to be replicated, right?
Millions of docs?
Then you must configure a remote DB for storage reasons
and manage the replication separately.
>
>
>>
>>
>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>> footprint is small too
>> --Noble
>>
>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com> wrote:
>>>
>>> check http://www.h2database.com/  in my view the best embedded DB out
>>> there.
>>>
>>> from the maker of HSQLDB...  is second round.
>>>
>>> However, from anything solr, I would hope it would just rely on JDBC.
>>>
>>>
>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>
>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go
>>>> beyond
>>>> that without a commit.
>>>>
>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>> <da...@cs.put.poznan.pl>wrote:
>>>>
>>>>>
>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>>
>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>
>>>>> Dawid
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Mark Miller <ma...@gmail.com>.

>> Also, whatever is chosen needs to scale to millions of documents, and 
>> I wonder about an embedded DB doing that.  I also have a hard time 
>> believing that both a DB w/ millions of docs and Solr can live on the 
>> same machine, which is presumably what an embedded DB must do.  
>> Presumably, it also needs to be able to be replicated, right?
>>
>
> h2 can take millions of docs.. but as long as we just rely on JDBC, 
> the SQL scale/replication becomes a standard/known/solved problem
>
>
I've had over 10 million 'docs' in embedded Derby... it didn't break a
sweat. I don't think the embedded part is much of a hindrance... you're in
the same JVM, so you have those limitations, but otherwise it's mostly
the same as non-embedded...

If done right, it's also easy to abstract out so that switching from
embedded to non-embedded is very easy.
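
One way to read the point above: if the rest of the code only ever sees a
JDBC URL, moving from embedded to a standalone server is a configuration
change. A minimal sketch, assuming Derby's usual URL forms; the class name
StoreConfig and the property keys store.mode, store.db and store.host are
made up for illustration.

```java
import java.util.Properties;

/** Hypothetical helper: picks a Derby JDBC URL from configuration. */
final class StoreConfig {
    private StoreConfig() {}

    static String jdbcUrl(Properties props) {
        String db = props.getProperty("store.db", "solrstore");
        String mode = props.getProperty("store.mode", "embedded");
        if ("server".equals(mode)) {
            String host = props.getProperty("store.host", "localhost");
            // Derby network client URL; 1527 is Derby's default server port.
            return "jdbc:derby://" + host + ":1527/" + db;
        }
        // Embedded Derby URL; create=true creates the database if it is absent.
        return "jdbc:derby:" + db + ";create=true";
    }
}
```

The resulting string would be handed to
java.sql.DriverManager.getConnection(url), with the matching Derby jar on
the classpath; nothing else in the calling code needs to know which mode
is in use.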

- Mark

Re: Can we use Berkley DB java in Solr

Posted by Ryan McKinley <ry...@gmail.com>.

On Dec 3, 2008, at 7:22 AM, Grant Ingersoll wrote:

>
> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍  
> नोब्ळ् wrote:
>
>> The code can be written against JDBC. But we need to test the DDL and
>> data types on al the supported DBs
>>
>> But , which one would we like to ship with Solr as a default option?
>
> Why do we need a default option?  Is this something that is intended  
> to be on by default?  Or, do you mean just to have one for unit  
> tests to work?
>
> I don't know if it is still the case, but I often find embedded dbs  
> to be quite annoying since you often can't connect to them from  
> other clients outside of the JVM which makes debugging harder.  Of  
> course, maybe I just don't know the tricks to do it.  Derby is one  
> DB that you can still connect to even when it is embedded.
>

The same is true of H2 (unless you are using the in-memory version). Check the  
possible connection modes:
http://www.h2database.com/html/features.html#connection_modes
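
For reference, a small sketch of the URL shapes behind those modes. The URL 
prefixes and the AUTO_SERVER setting are taken from the H2 documentation; 
the class and method names are made up for this example, and whether 
AUTO_SERVER is available depends on the H2 version in use.

```java
// URL shapes follow the H2 documentation; the class name and the
// method names are hypothetical and exist only for this sketch.
final class H2Urls {
    private H2Urls() {}

    // Embedded: database files on local disk, opened inside this JVM only.
    static String embedded(String path) {
        return "jdbc:h2:" + path;
    }

    // In-memory: nothing is persisted; by default the contents are lost
    // when the last connection closes.
    static String inMemory(String name) {
        return "jdbc:h2:mem:" + name;
    }

    // Server mode: connect over TCP to a separately started H2 server.
    static String server(String host, String path) {
        return "jdbc:h2:tcp://" + host + "/" + path;
    }

    // Automatic mixed mode: the first JVM to open the database also starts
    // a server, so other processes can connect using the same URL.
    static String mixed(String path) {
        return "jdbc:h2:" + path + ";AUTO_SERVER=TRUE";
    }
}
```

The mixed form is the one relevant to the debugging concern: the database 
stays embedded in the Solr JVM, yet an external SQL client can still attach.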

> Also, whatever is chosen needs to scale to millions of documents,  
> and I wonder about an embedded DB doing that.  I also have a hard  
> time believing that both a DB w/ millions of docs and Solr can live  
> on the same machine, which is presumably what an embedded DB must  
> do.  Presumably, it also needs to be able to be replicated, right?
>

H2 can take millions of docs, but as long as we rely only on JDBC,  
SQL scaling and replication become a standard, known, solved problem.

ryan





>
>>
>>
>> H2 looks impressive. the jar (small)  is just 667KB and the memory
>> footprint is small too
>> --Noble
>>
>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>  
>> wrote:
>>> check http://www.h2database.com/  in my view the best embedded DB  
>>> out there.
>>>
>>> from the maker of HSQLDB...  is second round.
>>>
>>> However, from anything solr, I would hope it would just rely on  
>>> JDBC.
>>>
>>>
>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>
>>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want  
>>>> to go
>>>> beyond
>>>> that without a commit.
>>>>
>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>> <da...@cs.put.poznan.pl>wrote:
>>>>
>>>>>
>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending  
>>>>> on the
>>>>> volume of data and queries, but otherwise the license looks  
>>>>> BSDish.
>>>>>
>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>
>>>>> Dawid
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>>
>>
>> -- 
>> --Noble Paul
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>

Re: Can we use Berkley DB java in Solr

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:

> The code can be written against JDBC. But we need to test the DDL and
> data types on al the supported DBs
>
> But , which one would we like to ship with Solr as a default option?

Why do we need a default option?  Is this something that is intended  
to be on by default?  Or, do you mean just to have one for unit tests  
to work?

I don't know if it is still the case, but I often find embedded dbs to  
be quite annoying since you often can't connect to them from other  
clients outside of the JVM which makes debugging harder.  Of course,  
maybe I just don't know the tricks to do it.  Derby is one DB that you  
can still connect to even when it is embedded.

Also, whatever is chosen needs to scale to millions of documents, and  
I wonder about an embedded DB doing that.  I also have a hard time  
believing that both a DB w/ millions of docs and Solr can live on the  
same machine, which is presumably what an embedded DB must do.   
Presumably, it also needs to be able to be replicated, right?

>
>
> H2 looks impressive. the jar (small)  is just 667KB and the memory
> footprint is small too
> --Noble
>
> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com>  
> wrote:
>> check http://www.h2database.com/  in my view the best embedded DB  
>> out there.
>>
>> from the maker of HSQLDB...  is second round.
>>
>> However, from anything solr, I would hope it would just rely on JDBC.
>>
>>
>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>
>>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to  
>>> go
>>> beyond
>>> that without a commit.
>>>
>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>> <da...@cs.put.poznan.pl>wrote:
>>>
>>>>
>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on  
>>>> the
>>>> volume of data and queries, but otherwise the license looks BSDish.
>>>>
>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>
>>>> Dawid
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>
>>
>
>
>
> -- 
> --Noble Paul

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Can we use Berkley DB java in Solr

Posted by Andrzej Bialecki <ab...@getopt.org>.

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> The code can be written against JDBC. But we need to test the DDL and
> data types on al the supported DBs
> 
> But , which one would we like to ship with Solr as a default option?
> 
> H2 looks impressive. the jar (small)  is just 667KB and the memory
> footprint is small too

I agree, this looks solid - well-maintained, impressive performance, 
friendly license. It supports in-memory DBs, too, which is good for testing.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

The code can be written against JDBC. But we need to test the DDL and
data types on all the supported DBs.
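
To illustrate why the DDL needs per-database testing, here is a rough 
sketch, not the SOLR-828 schema. The table, its columns and the dialect 
keys are invented, and the binary type chosen for each database is an 
assumption that would have to be verified against the versions actually 
supported.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical DDL helper; table and column names are invented.
final class StoreDdl {
    // Column type for the raw document bytes, keyed by a made-up dialect
    // name. These mappings are assumptions to be verified per database.
    private static final Map<String, String> BINARY_TYPE = new LinkedHashMap<>();
    static {
        BINARY_TYPE.put("h2", "BLOB");
        BINARY_TYPE.put("derby", "BLOB");
        BINARY_TYPE.put("hsqldb", "LONGVARBINARY");
    }

    private StoreDdl() {}

    // Builds the CREATE TABLE statement for one dialect and fails loudly
    // for any database that has not been tested yet.
    static String createTable(String dialect) {
        String type = BINARY_TYPE.get(dialect);
        if (type == null) {
            throw new IllegalArgumentException("untested dialect: " + dialect);
        }
        return "CREATE TABLE docs (id VARCHAR(255) NOT NULL PRIMARY KEY, body "
                + type + ")";
    }
}
```

Only the column type differs here, but each such difference is one more 
thing to cover in tests for every supported DB.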

But which one would we like to ship with Solr as a default option?

H2 looks impressive. The jar is small, just 667KB, and the memory
footprint is small too.
--Noble

On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <ry...@gmail.com> wrote:
> check http://www.h2database.com/  in my view the best embedded DB out there.
>
> from the maker of HSQLDB...  is second round.
>
> However, from anything solr, I would hope it would just rely on JDBC.
>
>
> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>
>> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to go
>> beyond
>> that without a commit.
>>
>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>> <da...@cs.put.poznan.pl>wrote:
>>
>>>
>>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>>> volume of data and queries, but otherwise the license looks BSDish.
>>>
>>> http://hsqldb.org/web/hsqlLicense.html
>>>
>>> Dawid
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Ryan McKinley <ry...@gmail.com>.

Check http://www.h2database.com/, in my view the best embedded DB out  
there.

It is from the maker of HSQLDB... his second round.

However, for anything in Solr, I would hope it would just rely on JDBC.

On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:

> HSQLDB has a limit of upto 8GB of data. In Solr, you might want to  
> go beyond
> that without a commit.
>
> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
> <da...@cs.put.poznan.pl>wrote:
>
>>
>> Isn't HSQLDB an option? Its performance ranges a lot depending on the
>> volume of data and queries, but otherwise the license looks BSDish.
>>
>> http://hsqldb.org/web/hsqlLicense.html
>>
>> Dawid
>>
>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.

Re: Can we use Berkley DB java in Solr

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

HSQLDB has a limit of up to 8GB of data. In Solr, you might want to go beyond
that without a commit.

On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
<da...@cs.put.poznan.pl>wrote:

>
> Isn't HSQLDB an option? Its performance ranges a lot depending on the
> volume of data and queries, but otherwise the license looks BSDish.
>
> http://hsqldb.org/web/hsqlLicense.html
>
> Dawid
>

-- 
Regards,
Shalin Shekhar Mangar.

Re: Can we use Berkley DB java in Solr

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

Isn't HSQLDB an option? Its performance varies a lot depending on the volume of
data and queries, but otherwise the license looks BSDish.

http://hsqldb.org/web/hsqlLicense.html

Dawid

Re: Can we use Berkley DB java in Solr

Posted by Andrzej Bialecki <ab...@getopt.org>.

Jason Rutherglen wrote:
> It's mostly dead and synchronizes on reads and writes.
> 
> On Tue, Dec 2, 2008 at 7:46 AM, Yonik Seeley <yo...@apache.org> wrote:
> 
>> On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>> Please consider using JDBM, now in the Apache incubator,
>> It doesn't look like it's in the incubator yet... there was interest,
>> but a proposal was never put on the wiki.

Oh, indeed. I remembered reading about it and trying to get it from the 
incubator, but in the end I got it from SF. Sorry for the confusion.

>>> but with a long
>>> history at SF.net and wide usage. Its API is essentially the same as BDB.
>> You wouldn't be able to tell by the SourceForge page - it has the look
>> of a dead project.

The project is semi-dead (a zombie? ;) ), but there has been some activity 
recently if you know where to look - which is not the SF CVS repo, but, 
confusingly, the SVN repo:

	https://jdbm.svn.sourceforge.net/svnroot/jdbm

It would be a shame if this project died; it's about the only option 
out there for a Java dbm-like data store under a liberal license.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: Can we use Berkley DB java in Solr

Posted by Jason Rutherglen <ja...@gmail.com>.

It's mostly dead and synchronizes on reads and writes.

On Tue, Dec 2, 2008 at 7:46 AM, Yonik Seeley <yo...@apache.org> wrote:

> On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> > Please consider using JDBM, now in the Apache incubator,
>
> It doesn't look like it's in the incubator yet... there was interest,
> but a proposal was never put on the wiki.
>
> > but with a long
> > history at SF.net and wide usage. Its API is essentially the same as BDB.
>
> You wouldn't be able to tell by the SourceForge page - it has the look
> of a dead project.
>
> -Yonik
>

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Tue, Dec 2, 2008 at 10:12 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> Please consider using JDBM, now in the Apache incubator,

It doesn't look like it's in the incubator yet... there was interest,
but a proposal was never put on the wiki.

> but with a long
> history at SF.net and wide usage. Its API is essentially the same as BDB.

You wouldn't be able to tell by the SourceForge page - it has the look
of a dead project.

-Yonik

Re: Can we use Berkley DB java in Solr

Posted by Andrzej Bialecki <ab...@getopt.org>.

Yonik Seeley wrote:
> On Mon, Dec 1, 2008 at 11:20 PM, Noble Paul നോബിള്‍ नोब्ळ्
> <no...@gmail.com> wrote:
>> BDB-JE does not have a a JDBC driver. It has a java API to read/write.
> 
> I just looked at this license - unfortunately it has a GPL like clause:
> 
>  * 3. Redistributions in any form must be accompanied by information on
>  *    how to obtain complete source code for the DB software and any
>  *    accompanying software that uses the DB software.
> 
> So a company that wanted to base their product on Solr would then have
> to go and buy a license from Oracle.  Not nice, and we should avoid
> IMO.
> 
> Other licenses like LGPL would be OK I think - we can't
> store/distribute in Apache, but it doesn't force users to give away
> the store either.

Please consider using JDBM, now in the Apache incubator, but with a long 
history at SF.net and wide usage. Its API is essentially the same as BDB.



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: Can we use Berkley DB java in Solr

Posted by Yonik Seeley <yo...@apache.org>.

On Mon, Dec 1, 2008 at 11:20 PM, Noble Paul നോബിള്‍ नोब्ळ्
<no...@gmail.com> wrote:
> BDB-JE does not have a a JDBC driver. It has a java API to read/write.

I just looked at this license - unfortunately it has a GPL-like clause:

 * 3. Redistributions in any form must be accompanied by information on
 *    how to obtain complete source code for the DB software and any
 *    accompanying software that uses the DB software.

So a company that wanted to base their product on Solr would then have
to go and buy a license from Oracle.  Not nice, and we should avoid
it IMO.

Other licenses like LGPL would be OK I think - we can't
store/distribute in Apache, but it doesn't force users to give away
the store either.

-Yonik

Re: Can we use Berkley DB java in Solr

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

BDB-JE does not have a JDBC driver. It has a Java API to read/write.

http://www.oracle.com/technology/products/berkeley-db/pdf/performing%20queries%20in%20oracle%20berkeley%20db%20java%20edition.pdf
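
Because a key/value API like that cannot sit behind JDBC, supporting it as 
an option would need a thin storage interface above both. A minimal sketch 
under that assumption follows; DocStore and MapDocStore are hypothetical 
names, and the in-memory map only stands in for a real JDBC-backed or 
BDB-JE-backed implementation so that the example runs without any database 
jar.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; not part of Solr, JDBC, or BDB-JE.
interface DocStore {
    void put(String id, byte[] body);

    // Returns the stored bytes, or null when the id is unknown.
    byte[] get(String id);
}

// In-memory stand-in used only to make the sketch runnable.
final class MapDocStore implements DocStore {
    private final Map<String, byte[]> data = new HashMap<>();

    @Override
    public void put(String id, byte[] body) {
        // Copy so later changes to the caller's array do not leak in.
        data.put(id, body.clone());
    }

    @Override
    public byte[] get(String id) {
        byte[] body = data.get(id);
        return body == null ? null : body.clone();
    }
}
```

A JDBC implementation would ship by default, and a BDB-JE implementation 
could be compiled separately and dropped in by users who download the jar.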





On Tue, Dec 2, 2008 at 3:52 AM, Grant Ingersoll <gs...@apache.org> wrote:
> I'm not following why you would need a compile time dep?  Wouldn't your
> dependency be on JDBC?
>
>
> On Dec 1, 2008, at 4:43 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> I guess it is not possible to distribute it w/ apache products because
>> of the license.
>>
>> But can we have a compile time dependency ?
>>
>> I may be interested in that for SOLR-828
>>
>> --
>> --Noble Paul
>
>
>



-- 
--Noble Paul

Re: Can we use Berkley DB java in Solr

Posted by Grant Ingersoll <gs...@apache.org>.

I'm not following why you would need a compile time dep?  Wouldn't  
your dependency be on JDBC?

On Dec 1, 2008, at 4:43 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:

> I guess it is not possible to distribute it w/ apache products because
> of the license.
>
> But can we have a compile time dependency ?
>
> I may be interested in that for SOLR-828
>
> -- 
> --Noble Paul