Posted to java-user@lucene.apache.org by Dmitry <dm...@hotmail.com> on 2007/07/26 05:38:04 UTC
Displaying results in the order
Is there a way to update a document in the Index without causing any change
to the order in which it comes up in searches?
thanks,
DT,
www.ejinz.com
Search everything
news, tech, movies, music
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Linear Hashing in Lucene?
Posted by Dmitry <dm...@hotmail.com>.
Karl,
Thanks for the info; it's very difficult to find anything about the Orion
algorithm and linear hashing.
I will check the thread.
DT,
www.ejinz.com
Search Engine Advertisement
Re: Linear Hashing in Lucene?
Posted by karl wettin <ka...@gmail.com>.
26 jul 2007 kl. 05.56 skrev Dmitry:
> 1. Does an ontology wrapper exist in the Lucene implementation?
Not publicly available as far as I know. There has been some
discussion on the forums, though; you could try searching for OWL, RDF
or similar using Nabble and get in touch with the authors of those
threads to see if they came up with something you can use. Feel free
to report back on your success or failure in locating something.
--
karl
Linear Hashing in Lucene?
Posted by Dmitry <dm...@hotmail.com>.
Hey,
Some common questions about Lucene.
1. Does an ontology wrapper exist in the Lucene implementation?
2. Does Lucene use linear hashing?
thanks,
DT,
www.ejinz.com
Search news
Re: Detection of index duplicates in Lucene
Posted by Michael Stoppelman <st...@gmail.com>.
A couple of thoughts here...
You could hash (e.g. with MD5) all the documents in your index and eliminate
duplicates that way. Just pick one of the docs in each hash bucket as
the non-dup document and then delete the other dups. This could be run as a
batch job to eliminate the duplicates in an off-line process.
Alternatively, if you don't want to run over the entire index all the time,
you could add a field containing some kind of hash (e.g. MD5) to each
Lucene Document. At query time you could then make sure all the Documents
returned are unique.
-M
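The two ideas above can be sketched in plain Java without touching the
Lucene API. This is only an illustration under stated assumptions: the class
name HashDedup, the method names, and the id-to-text maps are hypothetical,
and in a real index the hash would be stored as a field on each document.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class HashDedup {

    // Hex-encoded MD5 digest of the text; identical text yields the same key.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; treat as a configuration error.
            throw new IllegalStateException(e);
        }
    }

    // Batch variant: keep the first document seen per hash bucket and
    // return the ids of all later documents with the same hash.
    static List<String> findDuplicateIds(Map<String, String> contentById) {
        Map<String, String> keeperByHash = new LinkedHashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, String> e : contentById.entrySet()) {
            String hash = md5Hex(e.getValue());
            if (keeperByHash.putIfAbsent(hash, e.getKey()) != null) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    // Query-time variant: walk the hits in rank order and drop any hit
    // whose stored hash has already been seen.
    static List<String> uniqueHits(List<String> rankedIds, Map<String, String> hashById) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String id : rankedIds) {
            if (seen.add(hashById.get(id))) {
                unique.add(id);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "lucene in action");
        docs.put("doc2", "something else");
        docs.put("doc3", "lucene in action");
        System.out.println("to delete: " + findDuplicateIds(docs));
    }
}
```

Note that an exact hash only catches byte-identical text; any difference in
whitespace or casing produces a different digest unless the text is
normalized first.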
Re: Detection of index duplicates in Lucene
Posted by karl wettin <ka...@gmail.com>.
30 jul 2007 kl. 14.43 skrev Grant Ingersoll:
> I believe Nutch has a duplicate detection algorithm. I don't know
> how easy it would be to run independently on a Lucene index.
There have also been a bunch of near-duplicate ideas that have been
presented on the forums before.
This is one of the threads:
<http://www.nabble.com/Checking-for-duplicates-inside-index-tf1665494.html>
--
karl
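The thread does not say which near-duplicate technique was discussed, so the
following is only one common approach, shown as an assumption: compare
documents by the overlap (Jaccard similarity) of their word shingles. The
class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Nutch.
final class ShingleSimilarity {

    // All runs of `size` consecutive words, lower-cased.
    static Set<String> shingles(String text, int size) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> result = new HashSet<>();
        for (int i = 0; i + size <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return result;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range 0..1.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> first = shingles("the quick brown fox jumps", 2);
        Set<String> second = shingles("the quick brown fox leaps", 2);
        System.out.println("similarity: " + jaccard(first, second));
    }
}
```

Two documents would be treated as near-duplicates when the similarity exceeds
a threshold chosen for the collection; the threshold trades missed duplicates
against false duplicates.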
Re: Detection of index duplicates in Lucene
Posted by Grant Ingersoll <gs...@apache.org>.
I believe Nutch has a duplicate detection algorithm. I don't know
how easy it would be to run independently on a Lucene index.
-Grant
Detection of index duplicates in Lucene
Posted by Dmitry <dm...@hotmail.com>.
We are trying to find out whether there is any implementation of index
duplicate detection for Lucene.
Assume we have a set of documents, where a document is a bunch of words.
After we have created indexes for the same document, we need to know that all
indexes will be unique for a specific document (lexical equivalency).
Is there an implementation of an algorithm that covers both the situation
where a duplicate has not been detected and the situation where a false
duplicate has been reported? With that we could find all duplicate indexes.
We could also use the same algorithm to detect duplicate documents; in that
case we would save time and get better performance by not running the
indexing services against such a document.
Any suggestions would be welcome.
Thanks,
DT,
www.ejinz.com
Search Engine News
Re: lucene integration with PDM Windchill (Product Data Management System)
Posted by Dmitry <dm...@hotmail.com>.
Karl,
By the way, we have created one solution, but we need more embedded
interfaces and implementation functions for PDM Windchill.
(You can find an article about one of the solutions on
www.profilesmagazine.com.)
Thanks,
DT,
www.ejinz.com
Search Engine Platform News
Re: lucene integration with PDM Windchill (Product Data Management System)
Posted by karl wettin <ka...@gmail.com>.
I'm not sure I understand how all the objects you talk about relate to
each other, nor do I see a concrete question in your post.
Is your problem that you do not know how to denormalize the object
graphs in order to represent them as Lucene documents that make sense
to search in? Or are you asking how to implement an ad hoc service
for your PDM that communicates with Lucene?
The latter is probably a question we will not be able to help out
with more than explaining how readers, writers and searchers work.
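As a rough illustration of what denormalizing an object graph means here, the
sketch below flattens a master object and one of its iterations into a single
set of name/value pairs, each of which would become one field of a Lucene
document. The classes Master and Iteration are hypothetical stand-ins loosely
modeled on the WTDocumentMaster and WTDocument roles described in this thread;
they are not the Windchill API, and the actual Lucene calls are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these types come from Windchill or Lucene.
final class DenormalizeSketch {

    // Stand-in for a "master": attributes shared by every version.
    static final class Master {
        final String number;
        final String name;

        Master(String number, String name) {
            this.number = number;
            this.name = name;
        }
    }

    // Stand-in for one iteration (version) that points back at its master.
    static final class Iteration {
        final Master master;
        final String version;
        final String contentText;

        Iteration(Master master, String version, String contentText) {
            this.master = master;
            this.version = version;
            this.contentText = contentText;
        }
    }

    // Flatten the two-object graph into one flat record, so that a single
    // search document carries both master and iteration attributes.
    static Map<String, String> toFlatFields(Iteration iteration) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("number", iteration.master.number);
        fields.put("name", iteration.master.name);
        fields.put("version", iteration.version);
        fields.put("content", iteration.contentText);
        return fields;
    }

    public static void main(String[] args) {
        Master master = new Master("DOC-001", "Pump assembly manual");
        Iteration iteration = new Iteration(master, "A.2", "Torque settings for the pump housing");
        toFlatFields(iteration).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The cost of this design is that master attributes are copied into every
iteration's record, so a change to the master means re-indexing all of its
iterations.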
Re: lucene integration with PDM Windchill (Product Data Management System)
Posted by Dmitry <dm...@hotmail.com>.
Karl,
thanks for the help.
I will try to explain the requirements. There is a PDM (Product Data
Management) system, which manages the data related to products and supports
procedures during the product lifecycle; it deals with the development and
production infrastructure.
The following design patterns can be used for implementing and customizing
this system, and they are the points at which to integrate Lucene with
Windchill (PDM):
- The Object Reference design pattern (encapsulates details concerning
persistable objects and their unique database keys, e.g. for WTParts,
WTAssemblies, WTDocuments and their relations)
- The Business Service design pattern (building Windchill services, e.g.
using remote interfaces and service events)
- The Master-Iteration design pattern (two objects, Mastered and Iterated,
to which all versioned data in the Windchill PDM system <PDMLink> and
<ProjectLink> adheres)
We need to use Lucene with WTObjects such as WTDocuments and WTParts to
create indexes and provide a search service.
The properties of a document, for instance MS documents or PDF files, are
specified on the WTDocument class, but the sum of the properties is stored
on WTDocumentMaster. WTDocument implements ContentHolder (primary content
and secondary content). WTDocument can also create two types of
relationships to other documents using WTDocumentUsageLink (which builds
relationships between documents and the document structure). The attributes
that need to be used for searching are on either the WTDocumentMaster or the
WTDocument class.
This was just a short description of the architecture of PDMLink (Windchill).
So we need to create some Lucene services (processors) embedded in the
system, using extended interfaces, for creating indexes and searching all
documents by attributes.
thanks,
Dmitry
www.ejinz.com
Search Engine News
--------------------------------------------------------------------
> Trying to integrate a PDM system (WTPart object) with the Lucene indexing
> and search framework. Part of the work is integration with the persistence
> layer: <hibernate> + index storage + mysql
You have product data management software of some sort that uses
MySQL via Hibernate for persistence, and now you want to use Lucene
to search the data?
> Could not find a good solution ...
What possible solutions did you find, and what was the problem with
these?
> please advise
My guess is that very few people in this forum know how Windchill
works, what a WTPart is or what the complexity of the system is in a
persistent state. So you probably need to explain your requirements a
bit more in order to get a helpful answer or discussion going.
--
karl
Re: Displaying results in the order
Posted by karl wettin <ka...@gmail.com>.
26 jul 2007 kl. 05.38 skrev Dmitry:
> Is there a way to update a document in the Index without causing any
> change to the order in which it comes up in searches?
I would say no; the score is calculated based on the matching terms,
content length, etc. For details see
<http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html>
You might want to explain why you want to do this.
--
karl
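To see why an update tends to move a document in the ranking, here is a
deliberately simplified scoring sketch built from classic TF-IDF style
factors (term frequency, inverse document frequency, length normalization).
It is an assumption-laden illustration, not Lucene's exact formula, which
combines more factors; the class and method names are hypothetical.

```java
// Simplified TF-IDF style scoring; NOT Lucene's complete scoring formula.
final class ScoreSketch {

    // More occurrences of the term raise the score, with diminishing returns.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Terms that appear in fewer documents count for more.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Longer fields are penalized, so a match in a short field scores higher.
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    static double score(int freq, int docFreq, int numDocs, int fieldLength) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Same term statistics, but the updated document grew from 50 to 200 terms.
        double before = score(2, 9, 100, 50);
        double after = score(2, 9, 100, 200);
        System.out.println("before update: " + before);
        System.out.println("after update:  " + after);
    }
}
```

Because the field length and term frequencies feed directly into the score,
changing a document's content changes its score, and with it the document's
position relative to other hits.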