Posted to java-user@lucene.apache.org by David Pratt <fa...@eastlink.ca> on 2006/02/22 04:20:14 UTC
Searching/sorting strategy for many properties for semantic web app
Hi there. I am new to Lucene and I have been developing a semantic
application for a while and it appears to me Lucene could help me to get
a much-needed search with reasonable speed. I have some general questions
to start:
1) Since my app is virtually all metadata, what should I store in the
indexes if anything?
2) Should I only index the most common properties that people will
search and combine the rest (and index this combined text as a field)?
3) I would like to sort and filter results but am concerned this could
be very memory intensive.
4) Some general guidance on organizing indexes in an app would be
appreciated.
My schema is fairly large but I generally expect people to search on
about 6 to 8 properties for the most part. I have the data stored in an
sql database but not in a conventional way. I am willing to accept a
slower advanced search on less common properties (accommodating this with
sql search) but I really want some speed for the main properties with
full text search.
Pretty much everything in the app is metadata so I am most interested in
focussing on the 6-8 properties that people will use to search on for
the most part. I am thinking of combining the text of the remaining
properties (quite a number) into a single description type field so that
essentially all information gets indexed and ranked. Is this a
reasonable approach?
I see that there are advanced possibilities with the indexes to sort and
filter. How advisable is sorting for large record sets? For example,
say you have 20000 records returned from your search. Because this
will have a web interface, I will likely show only the first 20, so I
will be batching results. Are sorting and filtering highly memory intensive?
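For illustration only, the "20000 hits, show the first 20" case can be sketched in plain Java with a bounded heap, so that no more than one page of hits is held while selecting the sorted page. This is not Lucene's API and says nothing about how Lucene sorts internally; the class name TopHitsSketch, the method topN, and the integer sort keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java illustration only; it does not use Lucene's API.
class TopHitsSketch {

    /**
     * Returns the n first items under the given order, sorted,
     * while keeping at most n candidates in the heap at any time.
     */
    static <T> List<T> topN(Iterable<T> hits, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Reversed order puts the worst kept candidate at the head,
        // so it is the one evicted when a better hit arrives.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order.reversed());
        for (T hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (order.compare(hit, heap.peek()) < 0) {
                heap.poll();
                heap.add(hit);
            }
        }
        List<T> page = new ArrayList<>(heap);
        page.sort(order);
        return page;
    }

    public static void main(String[] args) {
        // Made-up data: 20000 distinct sort keys in scrambled order.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            hits.add((i * 7919) % 20000);
        }
        // Only the first page of 20 is selected and sorted.
        List<Integer> firstPage = topN(hits, 20, Comparator.naturalOrder());
        System.out.println(firstPage);
    }
}
```

The heap never grows beyond the page size, so the cost of selecting a page depends on the page size rather than on the total number of hits; the sort keys themselves still have to come from somewhere, which is where the memory question for a real index would arise.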
Hopefully, someone can provide some initial advice. Many thanks.
Regards,
David
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Searching/sorting strategy for many properties for semantic web app
Posted by David Pratt <fa...@eastlink.ca>.
Thanks Erik. I am continuing to experiment and making good progress. I
have got my basic functionality established and am now looking at
sorting and ranking. I guess the good thing is I can adjust and modify
things as I learn more. I am reading some archived material from the
list as well to get a general sense of issues that have come forward. I
managed to get a PowerPoint from ApacheCon on advanced techniques. This
kind of material helps to understand what is possible and to discover
how dynamic Lucene really is.
Regards,
David
Re: Searching/sorting strategy for many properties for semantic web app
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 22, 2006, at 9:01 PM, David Pratt wrote:
> Hi Erik. Many thanks for your reply. I'll likely see if I can find
> a list to pose a couple of questions their way. I am having fun
> with Lucene since it is new to me and I am impressed with the speed
> I am getting. I am reading anything I can get hold of and trying
> different code experiments. So far, the code is fairly
> straightforward, so I am not so concerned about this at the moment.
>
> I am really hoping to hear from experienced people like yourself
> more on strategically what to index, what sort of things it would
> be a good idea to store and what to do about a fairly large schema
> that has much metadata to offer. Also perhaps when sorting and
> filtering gets too expensive. I realize that just because the
> metadata is available doesn't necessarily mean you want to even put
> it all in an index. I think these issues are pretty general,
> however I know there are folks on this list who would likely advise
> some particular path or direction because of their own experiences
> with Lucene. I would really like to hear from anyone that has been
> working with metadata particularly or anyone generally about these
> topics.
In my University job, I'm dealing with a fair bit of metadata in the
form of RDF about 19th century literature objects. I'm indexing
basic Dublin Core data such as title and author as individual fields,
and also dropping all indexed metadata into a single searchable
field. I've been using Kowari as the metadata store, but it also has
Lucene integration (that I've not tried myself yet).
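As a rough illustration of the layout described above (a few properties as individual fields, plus every value dropped into one combined searchable field), here is a minimal plain-Java sketch. It does not use Lucene's API; the class name CatchAllFieldSketch, the combined field name "all", and the sample metadata values are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Plain-Java illustration only; it does not use Lucene's API.
class CatchAllFieldSketch {

    // Hypothetical name for the combined searchable field.
    static final String CATCH_ALL = "all";

    /**
     * Copies the frequently searched properties into fields of their own
     * and joins every property value into one combined field.
     */
    static Map<String, String> toIndexFields(Map<String, String> metadata,
                                             Set<String> primary) {
        Map<String, String> fields = new LinkedHashMap<>();
        StringJoiner combined = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (primary.contains(entry.getKey())) {
                // Searchable on its own, e.g. by title or author.
                fields.put(entry.getKey(), entry.getValue());
            }
            // Every value also lands in the combined field.
            combined.add(entry.getValue());
        }
        fields.put(CATCH_ALL, combined.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Made-up record for demonstration.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "An Example Title");
        metadata.put("author", "A. Writer");
        metadata.put("publisher", "Example Press");

        Map<String, String> fields =
                toIndexFields(metadata, Set.of("title", "author"));
        System.out.println(fields);
    }
}
```

With this shape, a search on a common property targets its own field, while less common properties remain reachable through the combined field without each needing a field of its own.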
I'm not sure what else to add as your query is a bit general. I
think you'll find if you post more specific questions you're more
likely to get detailed responses. General queries tend to be too
general to respond to, I find.
There really are no "best practices" with Lucene in terms of what to
index or what to store. These are all highly application dependent and
are often something I tune as the application itself evolves.
Erik
Re: Searching/sorting strategy for many properties for semantic web app
Posted by David Pratt <fa...@eastlink.ca>.
Hi Erik. Many thanks for your reply. I'll likely see if I can find a
list to pose a couple of questions their way. I am having fun with
Lucene since it is new to me and I am impressed with the speed I am
getting. I am reading anything I can get hold of and trying different
code experiments. So far, the code is fairly straightforward, so I am
not so concerned about this at the moment.
I am really hoping to hear from experienced people like yourself more on
strategically what to index, what sort of things it would be a good idea
to store and what to do about a fairly large schema that has much
metadata to offer. Also perhaps when sorting and filtering gets too
expensive. I realize that just because the metadata is available doesn't
necessarily mean you want to even put it all in an index. I think these
issues are pretty general, however I know there are folks on this list who
would likely advise some particular path or direction because of their
own experiences with Lucene. I would really like to hear from anyone
that has been working with metadata particularly or anyone generally
about these topics.
Regards,
David
Re: Searching/sorting strategy for many properties for semantic web app
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
One very nice implementation to take a look at is the Simile project
at MIT. The Piggy Bank and Longwell projects use Lucene to index
RDF and integrate full-text and structural queries nicely together.
http://simile.mit.edu
Erik