Posted to java-user@lucene.apache.org by Xiaolong Zheng <zh...@gmail.com> on 2016/11/22 04:38:50 UTC

Non-index files under the search directory

Hello,

I am trying to add some metadata to the search database. Instead of
adding a new search field or a phony document, I am looking at the
method org.apache.lucene.store.Directory#createOutput, which creates a
new file in the search directory.

Can IndexWriter also merge such a non-index file when it merges
multiple search indexes?

Stepping back a little, what is the best way to add metadata to the
search database?

For example, I would like to add an indicator that shows which kind of
stemmer was used when the index was created.





Thanks,

--Xiaolong

Re: Non-index files under the search directory

Posted by András Péteri <ap...@b2international.com>.
Correct, this data is associated with individual IndexCommits (you
should be able to see the key-value pairs in the raw contents of the
segments_N files in an index directory). To consolidate the entries,
you'll have to retrieve the user data from each sub-index, put all of
it into a new map, then set that map on the aggregate writer.
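
A minimal sketch of that consolidation step, using only java.util so it does not depend on a particular Lucene version. The class CommitDataMerger and its conflict rule are hypothetical and not part of Lucene; the input maps would be the commit user data read from each sub-index, and the result would be handed to the IndexWriter#setCommitData call discussed in this thread before committing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: merges the commit user data of several sub-indexes. */
final class CommitDataMerger {

    private CommitDataMerger() {}

    /**
     * Combines the per-index maps into one. A key may appear in several
     * sub-indexes only if every occurrence carries the same value; otherwise
     * the indexes disagree (for example, different stemmers) and the merge
     * is refused.
     */
    static Map<String, String> merge(List<Map<String, String>> perIndexData) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> data : perIndexData) {
            for (Map.Entry<String, String> e : data.entrySet()) {
                // putIfAbsent returns the value already stored, or null if the key was new
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalStateException(
                        "Conflicting values for '" + e.getKey() + "': "
                            + previous + " vs " + e.getValue());
                }
            }
        }
        // Assumption: the caller passes this map to the aggregate writer and commits.
        return merged;
    }
}
```

With this rule, two sub-indexes built with different stemmers fail fast instead of silently keeping one value; another policy (for example, last value wins) may suit other keys better.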

On Tue, Nov 22, 2016 at 9:02 PM, Xiaolong Zheng <zh...@gmail.com> wrote:
> Hi András,
>
> Thanks, this is what I need!
>
> I also noticed that this user commit data does not carry over when I
> consolidate several search databases into a new one. I guess the solution
> is to explicitly call getCommitData for each sub-index, then set the
> result on the new consolidated search database, right?
>
> Best,
>
> --Xiaolong

-- 
András Péteri

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Non-index files under the search directory

Posted by Xiaolong Zheng <zh...@gmail.com>.
Hi András,

Thanks, this is what I need!

I also noticed that this user commit data does not carry over when I
consolidate several search databases into a new one. I guess the solution
is to explicitly call getCommitData for each sub-index, then set the
result on the new consolidated search database, right?

Best,

--Xiaolong


On Tue, Nov 22, 2016 at 12:10 PM, András Péteri <apeteri@b2international.com> wrote:

> Hi Xiaolong,
>
> A Map of key-value pairs can be supplied to
> IndexWriter#setCommitData(Map<String,String>) and will be persisted
> when committing changes (setting the commit data counts as a change).
> It can be retrieved with IndexWriter#getCommitData() later.
>
> This may serve as good storage for metadata; as an example,
> Elasticsearch stores attributes related to its transaction log there
> (UUID and generation identifier).
>
> Regards,
> András

Re: Non-index files under the search directory

Posted by amarnath cse <am...@nitmz.ac.in>.
Can anyone tell me the procedure for indexing text documents using Lucene?

Thanks.

On Nov 22, 2016 10:40 PM, "András Péteri" <ap...@b2international.com> wrote:

> Hi Xiaolong,
>
> A Map of key-value pairs can be supplied to
> IndexWriter#setCommitData(Map<String,String>) and will be persisted
> when committing changes (setting the commit data counts as a change).
> It can be retrieved with IndexWriter#getCommitData() later.
>
> This may serve as good storage for metadata; as an example,
> Elasticsearch stores attributes related to its transaction log there
> (UUID and generation identifier).
>
> Regards,
> András

Re: Non-index files under the search directory

Posted by András Péteri <ap...@b2international.com>.
Hi Xiaolong,

A Map of key-value pairs can be supplied to
IndexWriter#setCommitData(Map<String,String>) and will be persisted
when committing changes (setting the commit data counts as a change).
It can be retrieved with IndexWriter#getCommitData() later.

This may serve as good storage for metadata; as an example,
Elasticsearch stores attributes related to its transaction log there
(UUID and generation identifier).
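
For the stemmer indicator from the original question, a minimal sketch might look like the following. The class StemmerMetadata and the key name "stemmer" are assumptions made for illustration; only java.util is used, and the comments mark where the map would be passed to or read back from the IndexWriter methods named above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for a single index-wide "which stemmer" marker. */
final class StemmerMetadata {

    /** Key name is an assumption made for this sketch. */
    static final String STEMMER_KEY = "stemmer";

    private StemmerMetadata() {}

    /**
     * Returns a copy of the existing commit data with the stemmer recorded.
     * The input map is left untouched so other entries are preserved as-is.
     */
    static Map<String, String> withStemmer(Map<String, String> existing, String stemmerName) {
        Map<String, String> updated = new HashMap<>(existing);
        updated.put(STEMMER_KEY, stemmerName);
        // Assumption: the caller sets this map as commit data on the writer, then commits.
        return updated;
    }

    /** True if the commit data names the stemmer the caller is about to use. */
    static boolean matches(Map<String, String> commitData, String expectedStemmer) {
        // A missing key yields null, which never equals the expected name.
        return expectedStemmer.equals(commitData.get(STEMMER_KEY));
    }
}
```

Copying the existing map before adding the key matters because setting commit data replaces the whole map rather than adding to it, so any entries already stored would otherwise be lost.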

Regards,
András

On Tue, Nov 22, 2016 at 5:40 PM, Xiaolong Zheng <zh...@gmail.com> wrote:
> Thanks. StoredField still seems to work at the per-document level, which
> means every document would contain this field.
>
> What I really would like is global-level storage to hold this single
> value. Maybe this is impossible.
>
> Sincerely,
>
> --Xiaolong

-- 
András Péteri



Re: Non-index files under the search directory

Posted by Xiaolong Zheng <zh...@gmail.com>.
Thanks. StoredField still seems to work at the per-document level, which
means every document would contain this field.

What I really would like is global-level storage to hold this single
value. Maybe this is impossible.

Sincerely,

--Xiaolong


On Tue, Nov 22, 2016 at 5:13 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Lucene won't merge foreign files for you, and in general it's
> dangerous to put such files into Lucene's index directory because if
> they look like codec files Lucene may delete them.
>
> Can you just add a StoredField to each document to hold your information?
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Non-index files under the search directory

Posted by Michael McCandless <lu...@mikemccandless.com>.
Lucene won't merge foreign files for you, and in general it's
dangerous to put such files into Lucene's index directory because if
they look like codec files Lucene may delete them.

Can you just add a StoredField to each document to hold your information?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 21, 2016 at 11:38 PM, Xiaolong Zheng
<zh...@gmail.com> wrote:
> Hello,
>
> I am trying to add some metadata to the search database. Instead of
> adding a new search field or a phony document, I am looking at the
> method org.apache.lucene.store.Directory#createOutput, which creates a
> new file in the search directory.
>
> Can IndexWriter also merge such a non-index file when it merges
> multiple search indexes?
>
> Stepping back a little, what is the best way to add metadata to the
> search database?
>
> For example, I would like to add an indicator that shows which kind of
> stemmer was used when the index was created.
>
>
>
>
>
> Thanks,
>
> --Xiaolong
