You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Guilherme Barile <gu...@prosoma.com.br> on 2007/09/04 13:58:07 UTC

Data in the Index [was: JdbcDirectory]

So,
	Anyone ever stored the data in the index also ? What are your  
experiences ?

Thanks a lot

Gui

On Sep 3, 2007, at 3:47 PM, Guilherme Barile wrote:

> Storing the data in the index, mainly for non-structured data.
> We plan to implement something like this ThingDB from http:// 
> demo.openlibrary.org/about/tech, and though that maybe lucene +  
> JdbcDirectory could act as a backend.
>
> gui
>
> On Sep 3, 2007, at 2:34 PM, Askar Zaidi wrote:
>
>> Yes. Every time a user updates a piece of information, you do the  
>> update in
>> the DB as well as the Index. If you are using Hibernate, they have  
>> an API
>> that does this mapping. I am not sure why you plan to store data  
>> in the
>> Index ?? Storing data is the DBs job, searching is the Index job.  
>> I would
>> suggest you have both (a schema for your data and an index). Thats  
>> what I
>> did. I could have stored everything on the Lucene Index, but I am  
>> scared as
>> the application grows I will need a DB system eventually. I don't  
>> think
>> people use Lucene to "store" data just as they do it in a DBMS.
>>
>> -
>> Askar
>>
>> On 9/3/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
>>>
>>>> 1) I don't understand why the index would get corrupted. We store
>>>> huge data
>>>> and meta-data using Lucene.
>>>
>>> I got that information when lucene 1.4 was the lastest version, may
>>> have changed. I'll trust you.
>>>
>>>> 2) For this, I synced Lucene with the DB operations. If you use
>>>> Hibernate,
>>>> theres an API for that. Or, you could just write your own factory
>>>> methods to
>>>> add/delete/edit index documents when a DB operation takes place
>>>> (e.g edit).
>>>
>>> You mean every time you update the db, you update the index also ?
>>> I'm actually planning not to use any external entity, and rely
>>> everything on Lucene. Wondered if some simple query (get the lastest
>>> document for example) would solve the versioning issue
>>>
>>> Thanks a lot
>>>
>>> Gui
>>>
>>>>
>>>> On 9/3/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
>>>>>
>>>>> Hello,
>>>>>         We're starting a new project, which basically catalogs
>>>>> everything
>>>>> we
>>>>> have in the department (different objects with different  
>>>>> metadata),
>>>>> and as I used Lucene before, I'm preparing a presentation to the
>>>>> team, as I think it would really simplify the storage of  
>>>>> metadata and
>>>>> documents.
>>>>>         The system will be pretty straightforward, all items  
>>>>> will be
>>>>> cataloged,  and most of them won't be changed too much ( I'll  
>>>>> raise
>>>>> this question later ).
>>>>>         So, here are my main concerns, hope you can help
>>>>>
>>>>> 1) Storing all data (index and content) wasn't recommended in the
>>>>> past, as the index could become corrupted. Do I have this  
>>>>> problem if
>>>>> I use a JdbcDirectory (PostgreSQL backend) ? I already read  
>>>>> about the
>>>>> performance degradation when using a database as main storage, but
>>>>> this won't be a problem.
>>>>>
>>>>> 2) Lucene doesn't support incremental editing (a new Document  
>>>>> will be
>>>>> created when someone edits an item), so is it possible to  
>>>>> manage some
>>>>> kind of versioning ? Anyone ever implemented something this way ?
>>>>>
>>>>> Thanks a lot for the attention
>>>>>
>>>>> Guilherme Barile
>>>>> Prosoma Informática
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Data in the Index [was: JdbcDirectory]

Posted by Patrick Turcotte <pa...@gmail.com>.
Hi,

At first, we thought we would use a "dual" approach, an Lucene index
and a RDBMS for storage.

While prototyping, for simplicity sake, we used the Lucene index as
storage, thinking we could easily replace it later. So far, speed is
satisfying enough that we are going to keep data there util retrieval
performance becomes an issue.

But our data is mainly "static", with changes only a few times a week.

Our main data is an XML document.

Hope this helps.

Patrick

On 9/4/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
> So,
>         Anyone ever stored the data in the index also ? What are your
> experiences ?
>
> Thanks a lot
>
> Gui
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Data in the Index [was: JdbcDirectory]

Posted by Chris Lu <ch...@gmail.com>.
I store Lucene index outside database, and run indexing periodically to get
the latest updates, not depending on ORM APIs.
In general, search data can be slower to update unless some realtime
requirements.

Storing data in index saves trips to databases. This usually is a huge
difference on rendering speed comparing to retrieve the rest data from
database.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 9/4/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
>
> So,
>         Anyone ever stored the data in the index also ? What are your
> experiences ?
>
> Thanks a lot
>
> Gui
>
> On Sep 3, 2007, at 3:47 PM, Guilherme Barile wrote:
>
> > Storing the data in the index, mainly for non-structured data.
> > We plan to implement something like this ThingDB from http://
> > demo.openlibrary.org/about/tech, and though that maybe lucene +
> > JdbcDirectory could act as a backend.
> >
> > gui
> >
> > On Sep 3, 2007, at 2:34 PM, Askar Zaidi wrote:
> >
> >> Yes. Every time a user updates a piece of information, you do the
> >> update in
> >> the DB as well as the Index. If you are using Hibernate, they have
> >> an API
> >> that does this mapping. I am not sure why you plan to store data
> >> in the
> >> Index ?? Storing data is the DBs job, searching is the Index job.
> >> I would
> >> suggest you have both (a schema for your data and an index). Thats
> >> what I
> >> did. I could have stored everything on the Lucene Index, but I am
> >> scared as
> >> the application grows I will need a DB system eventually. I don't
> >> think
> >> people use Lucene to "store" data just as they do it in a DBMS.
> >>
> >> -
> >> Askar
> >>
> >> On 9/3/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
> >>>
> >>>> 1) I don't understand why the index would get corrupted. We store
> >>>> huge data
> >>>> and meta-data using Lucene.
> >>>
> >>> I got that information when lucene 1.4 was the lastest version, may
> >>> have changed. I'll trust you.
> >>>
> >>>> 2) For this, I synced Lucene with the DB operations. If you use
> >>>> Hibernate,
> >>>> theres an API for that. Or, you could just write your own factory
> >>>> methods to
> >>>> add/delete/edit index documents when a DB operation takes place
> >>>> (e.g edit).
> >>>
> >>> You mean every time you update the db, you update the index also ?
> >>> I'm actually planning not to use any external entity, and rely
> >>> everything on Lucene. Wondered if some simple query (get the lastest
> >>> document for example) would solve the versioning issue
> >>>
> >>> Thanks a lot
> >>>
> >>> Gui
> >>>
> >>>>
> >>>> On 9/3/07, Guilherme Barile <gu...@prosoma.com.br> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>         We're starting a new project, which basically catalogs
> >>>>> everything
> >>>>> we
> >>>>> have in the department (different objects with different
> >>>>> metadata),
> >>>>> and as I used Lucene before, I'm preparing a presentation to the
> >>>>> team, as I think it would really simplify the storage of
> >>>>> metadata and
> >>>>> documents.
> >>>>>         The system will be pretty straightforward, all items
> >>>>> will be
> >>>>> cataloged,  and most of them won't be changed too much ( I'll
> >>>>> raise
> >>>>> this question later ).
> >>>>>         So, here are my main concerns, hope you can help
> >>>>>
> >>>>> 1) Storing all data (index and content) wasn't recommended in the
> >>>>> past, as the index could become corrupted. Do I have this
> >>>>> problem if
> >>>>> I use a JdbcDirectory (PostgreSQL backend) ? I already read
> >>>>> about the
> >>>>> performance degradation when using a database as main storage, but
> >>>>> this won't be a problem.
> >>>>>
> >>>>> 2) Lucene doesn't support incremental editing (a new Document
> >>>>> will be
> >>>>> created when someone edits an item), so is it possible to
> >>>>> manage some
> >>>>> kind of versioning ? Anyone ever implemented something this way ?
> >>>>>
> >>>>> Thanks a lot for the attention
> >>>>>
> >>>>> Guilherme Barile
> >>>>> Prosoma Informática
> >>>>> ------------------------------------------------------------------
> >>>>> ---
> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>
> >>>>>
> >>>
> >>>
> >>> --------------------------------------------------------------------
> >>> -
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>