Posted to java-user@lucene.apache.org by Ranganath B N <ra...@huawei.com> on 2017/07/31 13:02:19 UTC

lucene Input and Output format



Hi All,

     Can you point me to some implementations of Lucene input and output formats? I would like to study them to understand the distributed implementation approach.


Thanks,
Ranganath B. N.

RE: lucene Input and Output format

Posted by wz00000 <18...@163.com>.
Hi,
  Do you mean using Lucene as a Hadoop input and output source? It seems that
there is no such implementation, so you would have to write it yourself. You
may refer to elasticsearch-hadoop.
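
For anyone writing this themselves, the split-planning step might look roughly like the sketch below. It assumes the work is divided into contiguous ranges of document ids, one range per worker. The names DocRangeSplits, Split and getSplits are hypothetical and are not part of Lucene, Hadoop or elasticsearch-hadoop. A record reader would then iterate over the ids in its own range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: planning splits over an index's document id range. */
public class DocRangeSplits {

    /** A half-open range [start, end) of document ids for one worker. */
    public static final class Split {
        public final int start;
        public final int end;

        Split(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Divides [0, maxDoc) into at most numSplits contiguous, near-equal ranges. */
    public static List<Split> getSplits(int maxDoc, int numSplits) {
        if (maxDoc < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numSplits > 0");
        }
        List<Split> splits = new ArrayList<>();
        int base = maxDoc / numSplits;      // minimum size of every split
        int remainder = maxDoc % numSplits; // the first 'remainder' splits get one extra id
        int start = 0;
        for (int i = 0; i < numSplits && start < maxDoc; i++) {
            int size = base + (i < remainder ? 1 : 0);
            splits.add(new Split(start, start + size));
            start += size;
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(getSplits(10, 3)); // prints [[0, 4), [4, 7), [7, 10)]
    }
}
```

Splitting by id range is only one possible design. Splitting per index segment or per shard would be equally plausible, depending on how the index is stored.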




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: lucene Input and Output format

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

There is no such class in Lucene; it is part of your code or of some other software that merely uses Lucene. Just to give you a hint: Lucene is just a library that takes plain text from a Java Reader and indexes it by tokenizing and normalizing the tokens. It cannot read any special file formats. You just give it text and it indexes it. If you want to index files from your hard disk (e.g., Word documents), you have to write your own code that extracts the text from the files and sends it to the indexer. There are tools out there (e.g., Solr or Elasticsearch) that can help with that. The class names you sent us do not ring any bells on my side; they might come from some small project or home-grown software in your company.
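
To make the "text from a Reader, tokenize, normalize, index" idea concrete, here is a toy sketch using only the Java standard library. It is not Lucene code and does not use Lucene's API. The class TinyIndexer and its methods are hypothetical, and the tokenizer (split on non-alphanumerics, then lowercase) is a simplifying assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of document ids. Not Lucene code. */
public class TinyIndexer {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Reads plain text, tokenizes it, lowercases each token and records docId. */
    public void index(int docId, Reader text) {
        try (BufferedReader reader = new BufferedReader(text)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Tokenize on any run of characters that are not letters or digits.
                for (String token : line.split("[^\\p{L}\\p{N}]+")) {
                    if (token.isEmpty()) {
                        continue;
                    }
                    // Normalize so that "TEXT" and "text" map to the same term.
                    String term = token.toLowerCase(Locale.ROOT);
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the ids of documents containing the term (case-insensitive). */
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyIndexer indexer = new TinyIndexer();
        // The caller extracts the text from files; the indexer only sees text.
        indexer.index(1, new StringReader("Lucene indexes plain text."));
        indexer.index(2, new StringReader("Extract TEXT from Word documents first."));
        System.out.println(indexer.search("text")); // prints [1, 2]
    }
}
```

Getting text out of a Word document or any other file format stays outside the indexer in this sketch.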

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Wednesday, August 2, 2017 10:42 AM
> To: java-user@lucene.apache.org
> Subject: Re: lucene Input and Output format
> 
> What are the full package names for these interfaces?  I don't think they
> are org.apache.lucene.
> 
> 
> --
> Ian.




Re: lucene Input and Output format

Posted by Ian Lea <ia...@gmail.com>.
What are the full package names for these interfaces?  I don't think they
are org.apache.lucene.


--
Ian.


On Wed, Aug 2, 2017 at 9:00 AM, Ranganath B N <ra...@huawei.com>
wrote:

> Hi,
>
>   It's not about the file formats. Rather, it is about the LuceneInputFormat
> and LuceneOutputFormat interfaces, which deal with the getsplit(),
> getRecordReader() and getRecordWriter() methods. Are there any
> implementations of these interfaces?
>
>
> Thanks,
> Ranganath B. N.

RE: lucene Input and Output format

Posted by Ranganath B N <ra...@huawei.com>.
Hi,

  It's not about the file formats. Rather, it is about the LuceneInputFormat and LuceneOutputFormat interfaces, which deal with the getsplit(), getRecordReader() and getRecordWriter() methods. Are there any implementations of these interfaces?


Thanks,
Ranganath B. N. 

-----Original Message-----
From: Adrien Grand [mailto:jpountz@gmail.com] 
Sent: Tuesday, August 01, 2017 7:23 PM
To: java-user@lucene.apache.org
Cc: Vadiraj Muradi
Subject: Re: lucene Input and Output format

Which part of the index do you want to learn about? Here are some descriptions of the file formats:
 - terms dict:
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html
 - postings:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - doc values:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - stored fields:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html


Re: lucene Input and Output format

Posted by Adrien Grand <jp...@gmail.com>.
Which part of the index do you want to learn about? Here are some
descriptions of the file formats:
 - terms dict:
http://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html
 - postings:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - doc values:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html
 - stored fields:
http://lucene.apache.org/core/6_6_0/core/index.html?org/apache/lucene/index/IndexableField.html

On Mon, Jul 31, 2017 at 15:02, Ranganath B N <ra...@huawei.com> wrote:

>
>
>
> Hi All,
>
>      Can you point me to some implementations of Lucene input and output
> formats? I would like to study them to understand the distributed
> implementation approach.
>
>
> Thanks,
> Ranganath B. N.
>