
Posted to java-user@lucene.apache.org by Brittany Jacobs <bj...@jbmanagement.com> on 2008/08/01 15:28:58 UTC

getting started

Just trying to grasp the concept.

I want to search a text file where each line is a separate item to be
searched.  When text is entered by the user, I want to return all the lines
in which that text appears.

For example, if the text file has:

I like apples.
I went to the store.
I bought an apple.

If the user searches "apple", I want it to return the first and third
sentences.

Is each sentence a Token?  Is the user input going to be a QueryParser?  How
should I read in the file so that each line of text is a token to search?

Thanks in advance.

Brittany Jacobs
Java Developer
JBManagement, Inc.
12 Christopher Way, Suite 103
Eatontown, NJ 07724
ph: 732-542-9200 ext. 229
fax: 732-380-0678
email: bjacobs@jbmanagement.com

RE: getting started

Posted by Brittany Jacobs <bj...@jbmanagement.com>.

Ok, say each line is an address.  So the text file would look like:
123 Water St. Somerville, GA 12345
456 Easy St. Hope, CA 45676
34 Ocean Blvd. Staten Island, NY 93843

The file would have hundreds of thousands of addresses.

So the user would type "34, St" in the search box and press a "Search" button.
In the table below the search box, the first and third records from the addresses above would be displayed, because they both have a "34" somewhere in them and they both have a "St" somewhere in them.

So the table would show:
123 Water St. Somerville, GA 12345
34 Ocean Blvd. Staten Island, NY 93843

because they match both criteria as pointed out here:
123 Water "St". Somerville, GA 12"34"5
"34" Ocean Blvd. "St"aten Island, NY 93843

Thanks.
Brittany 
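Before mapping this onto Lucene, it may help to pin the matching rule down in code. The following is a plain-JDK reference sketch, not Lucene code, built on assumptions read off the example: the input is split on commas, every piece must appear somewhere in the line, and the test is a case-sensitive substring check. Note that "34" inside "12345" and "St" inside "Staten" are matches inside a single token, which an ordinary term query will not find. With Lucene's classic QueryParser that would take wildcard terms such as `*34*` (leading wildcards are rejected unless `setAllowLeadingWildcard(true)` is set), combined with AND rather than the parser's default OR.

```java
import java.util.ArrayList;
import java.util.List;

// Reference matcher for the rule described above (plain Java, no Lucene).
class AddressFilter {
    // Splits the user's input on commas, e.g. "34, St" -> ["34", "St"].
    static List<String> criteria(String input) {
        List<String> out = new ArrayList<>();
        for (String part : input.split(",")) {
            String t = part.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // A line is a hit only if every criterion occurs somewhere in it
    // (case-sensitive substring test). With no criteria, every line matches.
    static List<String> search(List<String> lines, String input) {
        List<String> terms = criteria(input);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            boolean all = true;
            for (String t : terms) {
                if (!line.contains(t)) { all = false; break; }
            }
            if (all) hits.add(line);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "123 Water St. Somerville, GA 12345",
                "456 Easy St. Hope, CA 45676",
                "34 Ocean Blvd. Staten Island, NY 93843");
        for (String hit : search(lines, "34, St")) System.out.println(hit);
    }
}
```

A linear scan like this is also a handy baseline for checking whatever an index returns.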

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: getting started

Posted by Erick Erickson <er...@gmail.com>.

Well, this could get to be a really ugly query. Let's say you have 10 lines.
Then the doc would have 10 different fields ("line1", "line2", etc.)? Then to
search it you have to have an OR clause across all fields. And a file with
100,000 lines would be a 100,000-term query... Or I misunderstand you
completely.
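To see why one field per line gets ugly, here is a plain-Java sketch that only builds the query string described above; the field names `line1`, `line2`, ... are the hypothetical layout from the message. Lucene's BooleanQuery has long defaulted to a maximum of 1024 clauses, so a 100,000-line document would need that limit raised before such a query could even run.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (plain Java, no Lucene): the query a per-line-field layout forces.
class PerLineFieldQuery {
    // Lucene's long-standing default limit on BooleanQuery clauses.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Builds the OR query, in QueryParser syntax, needed to find `word`
    // in a document with `lineCount` fields named line1..lineN.
    static String orAcrossLines(String word, int lineCount) {
        List<String> clauses = new ArrayList<>();
        for (int i = 1; i <= lineCount; i++) {
            clauses.add("line" + i + ":" + word);
        }
        return String.join(" OR ", clauses);
    }

    // One clause per line, so the clause count equals the line count.
    static boolean exceedsDefaultLimit(int lineCount) {
        return lineCount > DEFAULT_MAX_CLAUSES;
    }

    public static void main(String[] args) {
        System.out.println(orAcrossLines("apple", 3));
        System.out.println(exceedsDefaultLimit(100_000));
    }
}
```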

Calling doc.add with the *same* field (say "text") for each line is a
possibility, especially if you provide your own analyzer that returns a large
position increment gap from getPositionIncrementGap, say 1000. That gap gets
added between successive values that doc.add puts into the same field. So say
you have 10 lines, each with 5 tokens. With a gap of 1000, the first token of
each line would be at positions 0, 1005, 2010, 3015... (with a gap of 10 they
would be at 0, 15, 30, 45...).
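The position arithmetic can be checked with a small plain-Java model (not Lucene code). It assumes every token advances the position by 1 and that the gap is added once between successive values of the same field. With five tokens per line, a gap of 10 gives 0, 15, 30, 45 and a gap of 1000 gives 0, 1005, 2010, 3015.

```java
import java.util.Arrays;

// Model of token positions when one field receives several values
// separated by a position increment gap (assumes increment 1 per token).
class GapPositions {
    // Returns the position of the first token of each value.
    static int[] firstPositions(int[] tokensPerValue, int gap) {
        int[] firsts = new int[tokensPerValue.length];
        int next = 0; // position the next token would receive
        for (int i = 0; i < tokensPerValue.length; i++) {
            if (i > 0) next += gap;    // gap between successive values
            firsts[i] = next;
            next += tokensPerValue[i]; // consume this value's tokens
        }
        return firsts;
    }

    public static void main(String[] args) {
        int[] tenLinesOfFive = new int[10];
        Arrays.fill(tenLinesOfFive, 5);
        System.out.println(Arrays.toString(firstPositions(tenLinesOfFive, 10)));
        System.out.println(Arrays.toString(firstPositions(tenLinesOfFive, 1000)));
    }
}
```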

You have a couple of choices here. Say you can guarantee that no line will be
longer than 100 terms. Each line could begin at a position that is a multiple
of 100 (assuming you're not indexing something with many millions of lines).
Now, to find the line, you just divide the position by 100.
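A plain-Java sketch of that fixed-slot arithmetic, assuming the 100-token ceiling: line L occupies positions L*100 through L*100+99, so line = position / 100. Positions are Java ints, which is why the scheme runs out at roughly 21 million lines; `Math.multiplyExact` turns that overflow into an error instead of a silent wrap. Getting Lucene to actually start every line on a multiple of 100 is a separate problem, since a fixed gap only lines up that way when all lines have the same token count.

```java
// Fixed-slot scheme (plain Java, not Lucene code): every line gets a block
// of SLOT positions, so the line number is position / SLOT.
class LineSlots {
    static final int SLOT = 100; // assumed maximum number of tokens per line

    // Position of token k (0-based) on line `line` (0-based).
    static int position(int line, int k) {
        if (k < 0 || k >= SLOT) {
            throw new IllegalArgumentException("token index outside a " + SLOT + "-token slot: " + k);
        }
        // multiplyExact/addExact throw ArithmeticException instead of silently
        // wrapping once line * SLOT no longer fits in an int.
        return Math.addExact(Math.multiplyExact(line, SLOT), k);
    }

    // Recover the 0-based line number from a matched token position.
    static int lineOf(int position) {
        return position / SLOT;
    }

    public static void main(String[] args) {
        int pos = position(7, 42);
        System.out.println(pos + " -> line " + lineOf(pos)); // 742 -> line 7
    }
}
```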

Another possibility is to keep a field in the document that maps positions to
lines, and read that in when you need to.
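One way to read "a field that correlates positions to lines": store the starting position of each line and binary-search it once a match position is known. The comma-separated encoding and the class name below are hypothetical, chosen only for illustration; this is plain Java, not a Lucene API. Loading such a field for every hit is the part that could get expensive.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Position-to-line lookup table (plain Java sketch).
class PositionToLine {
    // Hypothetical storage format: sorted line start positions joined with commas.
    static String encode(int[] lineStarts) {
        return Arrays.stream(lineStarts)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    static int[] decode(String stored) {
        if (stored.isEmpty()) return new int[0];
        return Arrays.stream(stored.split(",")).mapToInt(Integer::parseInt).toArray();
    }

    // 0-based line containing `position`: the last line start <= position.
    // Returns -1 if the position precedes the first line.
    static int lineOf(int[] lineStarts, int position) {
        int i = Arrays.binarySearch(lineStarts, position);
        return i >= 0 ? i : -i - 2; // -i - 1 is the insertion point
    }

    public static void main(String[] args) {
        int[] starts = decode(encode(new int[] {0, 15, 30, 45}));
        System.out.println(lineOf(starts, 17)); // 1
        System.out.println(lineOf(starts, 45)); // 3
    }
}
```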

It all depends on why you need to keep track of lines. If it's for a single
document, this kind of thing can work. But if you want line information for
all the hits, it could be too expensive.

The increment gap will play interesting games with span queries (or slop in
phrase queries). If you need proximity matching across lines, this scheme
needs some modification. Say I want hits when "firstname" is within 10 terms
of "lastname", even when they fall on adjacent lines. With a large increment
gap, that won't work.
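A simplified model of that problem in plain Java: "within N" here is just the absolute distance between two token positions, which only approximates Lucene's actual slop rules for phrase and span queries. "firstname" is taken as the last of five tokens on one line and "lastname" as the first token of the next line.

```java
// Simplified proximity check across a line boundary (not Lucene code).
class GapVsProximity {
    // Positions at most `window` apart, in either order.
    static boolean within(int p1, int p2, int window) {
        return Math.abs(p1 - p2) <= window;
    }

    // First token of the next line when the current line has `tokensOnLine`
    // tokens starting at position 0 and `gap` is added between lines.
    static int firstTokenOfNextLine(int tokensOnLine, int gap) {
        return tokensOnLine + gap;
    }

    public static void main(String[] args) {
        int firstname = 4; // last of 5 tokens on line 0 (positions 0..4)
        for (int gap : new int[] {0, 1000}) {
            int lastname = firstTokenOfNextLine(5, gap);
            System.out.println("gap " + gap + ": within 10 = " + within(firstname, lastname, 10));
        }
    }
}
```

With no gap the two words are one position apart; with a gap of 1000 they are 1001 apart, so any window small enough to be useful misses them.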

So it would be a good thing to tell us a bit more about why you want to
distinguish lines to get better advice <G>.

Best
Erick

On Fri, Aug 1, 2008 at 9:59 AM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <
nageshblore@gmail.com> wrote:

> Why should each line be a Document ? If there is a single document having
> each line as a Field, then the search would result in a single Document as a
> 'hit' not the individual lines matching it. Is this right ?
>
> Nagesh

Re: getting started

Posted by ro...@xemaps.com.

That certainly works if the intent is to grab the entire file.  If all you
want is for that particular line to be returned by the search, then that's
not going to work.

Let's say the file was made up of a million lines and the text was stored
in the index (I know, absurd).

When grabbing the Document from a search, you don't necessarily want to grab
all the lines.

Also, when you get the document, how do you know which Field contained the
line you wanted?

Roy

On Fri, Aug 1, 2008 at 9:59 AM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <
nageshblore@gmail.com> wrote:

> Why should each line be a Document ? If there is a single document having
> each line as a Field, then the search would result in a single Document as a
> 'hit' not the individual lines matching it. Is this right ?
>
> Nagesh

Re: getting started

Posted by "ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <na...@gmail.com>.

Hi Brittany,
"What is the web address you are seeing this message on?"
Me? I am not sure I follow the question.

Nagesh

On Fri, Aug 1, 2008 at 7:45 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) <
nageshblore@gmail.com> wrote:

> Thanks, Ian !

Re: getting started

Posted by "ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <na...@gmail.com>.

Thanks, Ian !

On Fri, Aug 1, 2008 at 7:43 PM, Ian Lea <ia...@gmail.com> wrote:

> Yes, that is correct.
>
>
> --
> Ian.

Re: getting started

Posted by Ian Lea <ia...@gmail.com>.

Yes, that is correct.


--
Ian.


On Fri, Aug 1, 2008 at 2:59 PM, ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)
<na...@gmail.com> wrote:
> Why should each line be a Document ? If there is a single document having
> each line as a Field, then the search would result in a single Document as a
> 'hit' not the individual lines matching it. Is this right ?
>
> Nagesh

RE: getting started

Posted by Brittany Jacobs <bj...@jbmanagement.com>.

What is the web address you are seeing this message on?

Brittany Jacobs

-----Original Message-----
From: ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S) [mailto:nageshblore@gmail.com] 
Sent: Friday, August 01, 2008 10:00 AM
To: java-user@lucene.apache.org
Subject: Re: getting started

Why should each line be a Document ? If there is a single document having
each line as a Field, then the search would result in a single Document as a
'hit' not the individual lines matching it. Is this right ?

Nagesh


Re: getting started

Posted by "ನಾಗೇಶ್ ಸುಬ್ರಹ್ಮಣ್ಯ (Nagesh S)" <na...@gmail.com>.

Why should each line be a Document ? If there is a single document having
each line as a Field, then the search would result in a single Document as a
'hit' not the individual lines matching it. Is this right ?

Nagesh

On Fri, Aug 1, 2008 at 7:21 PM, <ro...@xemaps.com> wrote:

> Hello Brittany,
>
> I think the easiest thing for you to do is make each line a Document.  You
> might want a FileName and LineNumber field on top of a "Text" field, this
> way if you need to gather all the lines of your File back together again
> you
> can do a search on the FileName.
>
> So in your case:
>
> Document 1
>  FileName: [the file]
>  LineNumber: 1
>  Text: I like apples
> Document 2
>  ...etc
>
> Regards,
> Roy

Re: getting started

Posted by ro...@xemaps.com.

Hello Brittany,

I think the easiest thing for you to do is make each line a Document.  You
might want FileName and LineNumber fields on top of a "Text" field; that way,
if you need to gather all the lines of your file back together again, you
can do a search on the FileName.

So in your case:

Document 1
  FileName: [the file]
  LineNumber: 1
  Text: I like apples
Document 2
  ...etc

Regards,
Roy
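Roy's layout, sketched with a plain Java record standing in for a Lucene Document; the record mirrors his FileName / LineNumber / Text fields, and the in-memory list stands in for the index. This is an illustration of the idea, not Lucene's API. In a real index the reassembly step would be a query on the FileName field followed by ordering the hits by LineNumber.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// One record per line, carrying the file it came from and its line number.
class LineRecords {
    record LineDoc(String fileName, int lineNumber, String text) {}

    // Splits a file's content into one record per line, numbering from 1.
    static List<LineDoc> toDocs(String fileName, String content) {
        List<LineDoc> docs = new ArrayList<>();
        List<String> lines = content.lines().collect(Collectors.toList());
        for (int i = 0; i < lines.size(); i++) {
            docs.add(new LineDoc(fileName, i + 1, lines.get(i)));
        }
        return docs;
    }

    // "Search on the FileName": keep that file's records, order by LineNumber, rejoin.
    static String reassemble(List<LineDoc> docs, String fileName) {
        return docs.stream()
                .filter(d -> d.fileName().equals(fileName))
                .sorted(Comparator.comparingInt(LineDoc::lineNumber))
                .map(LineDoc::text)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<LineDoc> docs = new ArrayList<>(
                toDocs("fruit.txt", "I like apples.\nI went to the store.\nI bought an apple."));
        docs.addAll(toDocs("other.txt", "unrelated line"));
        Collections.reverse(docs); // the order records come back in should not matter
        System.out.println(reassemble(docs, "fruit.txt"));
    }
}
```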


RE: getting started

Posted by Brittany Jacobs <bj...@jbmanagement.com>.

Thank you so much!


-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com] 
Sent: Friday, August 01, 2008 9:51 AM
To: java-user@lucene.apache.org
Subject: Re: getting started

Each sentence will be a document.  Read the file a line at a time and
make each line a separate document.

The user input will be a word, or words, which you can pass through a
QueryParser to get a Query which can be used to search the index, and
which will return matching documents i.e. sentences.


Lucene in Action is strongly recommended.  Somewhat out of date but
all the core concepts are still valid.


--
Ian.



Re: getting started

Posted by Ian Lea <ia...@gmail.com>.

Each sentence will be a document.  Read the file a line at a time and
make each line a separate document.

The user input will be a word, or words, which you can pass through a
QueryParser to get a Query which can be used to search the index, and
which will return matching documents i.e. sentences.
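A toy, JDK-only inverted index can make the one-document-per-line idea concrete. It is not Lucene: the lowercase-and-split "analyzer" and the class and method names are assumptions for illustration. It reads the input a line at a time, gives each line its own document id, and maps each token to the ids of the lines containing it. It also exposes a detail of the original example: "apples" and "apple" are different tokens, so an exact search for apple finds only the third line. To get the first line as well you need stemming (Lucene includes a PorterStemFilter) or a prefix query such as apple*, which searchPrefix imitates. For multi-word input, Lucene's QueryParser combines the words with OR by default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.*;

// Toy model of "one document per line" (not Lucene).
class LineIndex {
    record LineDoc(int lineNumber, String text) {}

    private final List<LineDoc> docs = new ArrayList<>();
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    // Crude stand-in for an analyzer: lowercase, split on non-letters/digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Read a line at a time; each non-blank line becomes its own document.
    void addLines(Reader in) {
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.isBlank()) continue;
                int id = docs.size();
                docs.add(new LineDoc(lineNumber, line));
                for (String token : tokenize(line)) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Exact-term lookup, like a single-term query.
    List<LineDoc> search(String word) {
        return collect(postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of()));
    }

    // Prefix lookup, similar in spirit to a query such as apple*.
    List<LineDoc> searchPrefix(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<Integer> ids = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break; // keys are sorted: no later key matches
            ids.addAll(e.getValue());
        }
        return collect(ids);
    }

    private List<LineDoc> collect(Set<Integer> ids) {
        List<LineDoc> hits = new ArrayList<>();
        for (int id : ids) hits.add(docs.get(id));
        return hits;
    }

    public static void main(String[] args) {
        LineIndex index = new LineIndex();
        index.addLines(new StringReader("I like apples.\nI went to the store.\nI bought an apple.\n"));
        System.out.println(index.search("apple"));       // only line 3
        System.out.println(index.searchPrefix("apple")); // lines 1 and 3
    }
}
```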


Lucene in Action is strongly recommended.  Somewhat out of date but
all the core concepts are still valid.


--
Ian.

