Posted to java-user@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/10/01 13:53:22 UTC

Re: How does the term infos file (.tis) work?

It's better to use the TermEnum API (IndexReader.terms()) to step
through the terms than to access the raw file directly (unless you
have some reason to do so...).

Mike
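
For readers following along, the TermEnum approach suggested above might
look roughly like the sketch below. It assumes the Lucene 2.9 API and reuses
the index path from the original question; the class name ListTerms is
made up for illustration. Note that IndexReader.terms() walks the terms of
the whole index, not of a single segment's .tis file.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

// Hypothetical example class; assumes Lucene 2.9 is on the classpath.
public class ListTerms {
    public static void main(String[] args) throws IOException {
        // Same index location as in the original question.
        File path = new File("../../wiki/index").getCanonicalFile();
        FSDirectory directory = FSDirectory.open(path);
        IndexReader reader = IndexReader.open(directory, true); // read-only
        TermEnum terms = reader.terms();
        try {
            int shown = 0;
            // The enum starts before the first term, so call next() first.
            while (shown < 10 && terms.next()) {
                Term term = terms.term();
                System.out.println(term.field() + ":" + term.text()
                        + " docFreq=" + terms.docFreq());
                shown++;
            }
        } finally {
            terms.close();
            reader.close();
            directory.close();
        }
    }
}
```

Terms come back ordered by field and then by term text, so the first ten
printed are simply the first ten in that order.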

On Wed, Sep 30, 2009 at 6:29 AM, iron light <ir...@gmail.com> wrote:
> I am trying to traverse all the term text in one .tis file, and it failed.
> The code is below.
> Did I misunderstand something?
> The source code (especially the index namespace) is very complicated for me.
> Is there any more documentation about the design, or anything that can help
> me understand the source?
>
> Thanks.
>
>    public static void main(String[] args) throws IOException {
>        File path = new File("../../wiki/index").getCanonicalFile();
>        FSDirectory directory = FSDirectory.open(path);
>        IndexInput input = directory.openInput("_1ck.tis");
>        System.out.println(input.readInt());
>        System.out.println(input.readLong());
>        System.out.println(input.readInt());
>        System.out.println(input.readInt());
>        System.out.println(input.readInt());
>
>        String text = "";
>        for(int i = 0; i < 10; ++i){
>            TermInfo termInfo = new TermInfo(input);
>            text = text.substring(0, termInfo.term.prefixLength)
>                    + termInfo.term.suffix;
>            System.out.println(text);
>        }
>    }
>
>    private static class Term {
>        public int prefixLength;
>        public String suffix;
>        public int fieldNum;
>        Term(IndexInput input) throws IOException {
>            this.prefixLength = input.readVInt();
>            this.suffix = input.readString();
>            this.fieldNum = input.readVInt();
>        }
>    }
>
>    private static class TermInfo {
>        public Term term;
>        public int docFreq, freqDelta, proxDelta, skipDelta;
>        TermInfo(IndexInput input) throws IOException {
>            this.term = new Term(input);
>            this.docFreq = input.readVInt();
>            this.freqDelta = input.readVInt();
>            this.proxDelta = input.readVInt();
>            this.skipDelta = input.readVInt();
>        }
>    }
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How does the term infos file (.tis) work?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Thu, Oct 1, 2009 at 8:21 AM, iron light <ir...@gmail.com> wrote:

> The reason is that I want to dig deeper.

OK :)  That's fun!

> I just read the code and found that the index namespace (IndexWriter!) is
> so tough for me.
> Is there any document, resource or blog about the code?

In general there's no separate document detailing how Lucene's source
code works; the code itself (and its comments) is all there is.  If
you are going through the code, please add comments & post a patch and
we can improve its comments over time.  Source code is a living thing
and constantly evolving :)

But, for the index file format in particular, there is this:

    http://lucene.apache.org/java/2_9_0/fileformats.html

If you want specifically to parse the terms dict/index, you should
look at SegmentTermEnum and TermInfosReader for inspiration...

Mike
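
For anyone who still wants to decode the raw layout, here is a minimal,
self-contained sketch of the reading loop as described in the file-format
document linked above. It does not use Lucene at all: it builds a tiny
synthetic buffer in memory and decodes it. One detail from that document
is that SkipDelta is only stored when DocFreq is not smaller than
SkipInterval; the snippet quoted earlier in this thread reads skipDelta
unconditionally, which would likely desynchronize the stream. Assumptions
in this sketch: the class and helper names (TisSketch, writeEntry,
sampleData) are hypothetical, the header values are placeholders, and
prefix lengths are treated as byte counts over UTF-8 data (the same as
character counts for ASCII terms). SegmentTermEnum remains the
authoritative reference for a given Lucene version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; not part of Lucene.
class TisSketch {

    // VInt: 7 data bits per byte, low-order bits first; high bit set means more bytes follow.
    static int readVInt(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Writes one synthetic entry: Term (prefix, suffix, fieldNum), docFreq, deltas.
    static void writeEntry(DataOutputStream out, int prefixLen, String suffix,
                           int fieldNum, int docFreq, int skipInterval) throws IOException {
        byte[] s = suffix.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, prefixLen);
        writeVInt(out, s.length);
        out.write(s);
        writeVInt(out, fieldNum);
        writeVInt(out, docFreq);
        writeVInt(out, 0);                              // freqDelta (dummy)
        writeVInt(out, 0);                              // proxDelta (dummy)
        if (docFreq >= skipInterval) writeVInt(out, 0); // skipDelta only when present
    }

    // Builds a three-term buffer with a header shaped like the one the question reads.
    static byte[] sampleData() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int skipInterval = 16;
        out.writeInt(-4);           // format version (placeholder value)
        out.writeLong(3);           // term count
        out.writeInt(128);          // index interval
        out.writeInt(skipInterval); // skip interval
        out.writeInt(10);           // max skip levels
        writeEntry(out, 0, "apple", 0, 3, skipInterval);
        writeEntry(out, 4, "y", 0, 300, skipInterval);  // shares "appl"; has skipDelta
        writeEntry(out, 0, "banana", 0, 1, skipInterval);
        return bytes.toByteArray();
    }

    // Decodes the header, then rebuilds each term from the previous term's prefix.
    static List<String> readTerms(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        in.readInt();                      // format version
        long termCount = in.readLong();
        in.readInt();                      // index interval
        int skipInterval = in.readInt();
        in.readInt();                      // max skip levels
        List<String> terms = new ArrayList<>();
        byte[] previous = new byte[0];
        for (long i = 0; i < termCount; i++) {
            int prefixLen = readVInt(in);
            int suffixLen = readVInt(in);
            byte[] current = Arrays.copyOf(previous, prefixLen + suffixLen);
            in.readFully(current, prefixLen, suffixLen);
            readVInt(in);                  // fieldNum
            int docFreq = readVInt(in);
            readVInt(in);                  // freqDelta
            readVInt(in);                  // proxDelta
            if (docFreq >= skipInterval) readVInt(in); // skipDelta is conditional
            terms.add(new String(current, StandardCharsets.UTF_8));
            previous = current;
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTerms(sampleData())); // prints [apple, apply, banana]
    }
}
```

Looping up to the term count from the header, rather than a fixed 10,
also avoids reading past the end of a small file.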



Re: How does the term infos file (.tis) work?

Posted by iron light <ir...@gmail.com>.
Thanks, Mike.

The reason is that I want to dig deeper.
I just read the code and found that the index namespace (IndexWriter!) is
so tough for me.
Is there any document, resource or blog about the code?

IL
