Posted to dev@lucene.apache.org by Sanne Grinovero <sa...@gmail.com> on 2013/07/04 17:17:40 UTC

Re: Indexing file with security problem

To be honest I am not familiar with ManifoldCF, so I can't say whether
Hibernate Search is better or not, but this would definitely not be too
hard to do with Hibernate Search:

1) You annotate the entity mapped to your PostgreSQL metadata table
with @Indexed; with @TikaBridge you point it to the external resource
containing the document.

Returning database ids is the default behaviour.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244
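
For illustration only, here is a minimal plain-Java sketch (no Hibernate Search or Lucene dependency, Java 7 compatible) of the data flow the mapping in point 1 automates. The metadata row and the text extracted from the WebDAV file are indexed together under the database id, and a query returns nothing but ids. FileMetadata, MetadataContentIndex and the naive tokenizer are hypothetical names. Text extraction, which @TikaBridge hands to Apache Tika, is assumed to have happened already. This is not the Hibernate Search API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical shape of a row in the PostgreSQL metadata table.
class FileMetadata {
    final long id;
    final String fileName;
    final String fileType;

    FileMetadata(long id, String fileName, String fileType) {
        this.id = id;
        this.fileName = fileName;
        this.fileType = fileType;
    }
}

// Toy inverted index: each term points to the database ids of the files containing it.
class MetadataContentIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Indexes the metadata together with the text already extracted from the WebDAV file.
    void add(FileMetadata meta, String extractedText) {
        String combined = meta.fileName + " " + meta.fileType + " " + extractedText;
        for (String term : tokenize(combined)) {
            Set<Long> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<>();
                postings.put(term, ids);
            }
            ids.add(meta.id);
        }
    }

    // Returns only the database ids (ascending) of files containing every query term.
    List<Long> search(String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = postings.get(term);
            if (ids == null) {
                return new ArrayList<>();
            }
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        List<Long> out = new ArrayList<>();
        if (result != null) {
            out.addAll(result);
        }
        return out;
    }

    // Naive analyzer: lower-case, then split on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

For example, after indexing test.pdf (id 1) with the text "Lucene in action" and mybook.odt (id 2) with "Solr and Lucene", searching for "lucene" gives [1, 2] and "solr lucene" gives [2]. In the real setup Lucene does the analysis and storage, and Hibernate Search keeps the index in sync with the entity.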

2) This is a bit more complex, but I don't think it is any more complex
than it would be with other technologies: you encode some security
information in the index, then define a parametric filter on it and set
the current user's parameters at query time.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter
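
A minimal sketch of why such a filter avoids checking 100 hits one by one, under an assumed rule: a file is readable if it shares at least one access token with the user. Static properties (such as the folder) become tokens like "dept:hr" at indexing time; the dynamic part (the logged-in user's roles and departments) becomes the filter parameter at query time. A Lucene filter is conceptually a set of allowed documents intersected with the query hits, which the sketch models with java.util.BitSet. SecuredIndex and the token format are hypothetical; in Hibernate Search 4.x the real counterparts are a @FullTextFilterDef with a filter factory, enabled per query through FullTextQuery.enableFullTextFilter(...).setParameter(...).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a parametric security filter; NOT the Hibernate Search or Lucene API.
// Documents get consecutive internal numbers (like Lucene doc ids), and every term or
// access token maps to a BitSet of the document numbers that carry it.
class SecuredIndex {
    private final List<Long> dbIds = new ArrayList<>();                 // doc number -> database id
    private final Map<String, BitSet> termPostings = new HashMap<>();   // content terms
    private final Map<String, BitSet> tokenPostings = new HashMap<>();  // static access tokens

    // Indexes content terms plus access tokens derived from static properties (e.g. the folder).
    void add(long dbId, String[] terms, String[] accessTokens) {
        int doc = dbIds.size();
        dbIds.add(dbId);
        for (String term : terms) {
            bitsFor(termPostings, term).set(doc);
        }
        for (String token : accessTokens) {
            bitsFor(tokenPostings, token).set(doc);
        }
    }

    // The filter: every document sharing at least one token with the current user.
    BitSet filterFor(String[] userTokens) {
        BitSet allowed = new BitSet();
        for (String token : userTokens) {
            BitSet docs = tokenPostings.get(token);
            if (docs != null) {
                allowed.or(docs);
            }
        }
        return allowed;
    }

    // Intersects the term's postings with the filter in one step instead of checking hits one by one.
    List<Long> search(String term, String[] userTokens) {
        List<Long> result = new ArrayList<>();
        BitSet postings = termPostings.get(term);
        if (postings == null) {
            return result;
        }
        BitSet hits = (BitSet) postings.clone();  // never modify the index itself
        hits.and(filterFor(userTokens));
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            result.add(dbIds.get(doc));
        }
        return result;
    }

    private static BitSet bitsFor(Map<String, BitSet> postings, String key) {
        BitSet bits = postings.get(key);
        if (bits == null) {
            bits = new BitSet();
            postings.put(key, bits);
        }
        return bits;
    }
}
```

Because the allowed set depends only on the user's tokens, it can be cached per token set; Hibernate Search can cache filter instances for the same reason. The trade-off is that when a file's static properties change (for instance it is moved to another folder), its document has to be reindexed.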

3) Not sure, sorry. But automatic indexing is triggered as soon as you
store the metadata, so maybe that is good enough?

Looks interesting!

Sanne - Hibernate Search team


On 27 June 2013 03:14, Otis Gospodnetic <ot...@gmail.com> wrote:
> Hi,
>
> I would start from ManifoldCF - it may save you some work.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Jun 26, 2013 5:01 PM, "lukasw" <lu...@gmail.com> wrote:
>>
>> Hello
>>
>> I'll try to briefly describe my problem and task.
>> My name is Lukas and I am a Java developer. My task is to create a search
>> engine for different types of files (text documents only): pdf, word, odf
>> and xml, but not html.
>> I have a little experience with Lucene: about a year ago I wrote a simple
>> full-text search using Lucene and Hibernate Search. That was a simple
>> project, but now I have a very difficult search task.
>> We are using Java 1.7 and GlassFish 3, and I have to concentrate only on
>> the server side, not the client UI. These are my three major problems:
>>
>> 1) All files are stored on a WebDAV server, but information about each
>> file (name, id, file type etc.) is stored in a database (PostgreSQL), so
>> when I create the index I need to use both sources. As a query result I
>> only need to return the file id from the database. In summary, the content
>> of a file is stored on the server but the information about the file is
>> stored in the database, so we must retrieve both.
>>
>> 2) The second problem is that each file has a level of secrecy, and the
>> major problem is that this level is calculated dynamically. When
>> calculating the security level of a file we consider several properties.
>> The static properties are the file's location, i.e. the folder the file is
>> in, but there is also dynamic information: user profiles, user roles and
>> departments. So when user "Maggie" is logged in she can search only the
>> files "test.pdf", "test2.doc" etc., but if user "Stev" is logged in he has
>> a different profile from Maggie, so he can only search for a phrase in the
>> files "broken.pdf", "mybook.odt", test2.doc etc. My thought is that when,
>> for example, a user searches for the phrase "lucene +solr", we search all
>> indexed documents and filter the results afterwards. But I think that
>> solution is not very efficient. What if the result count is 100 files; do
>> we then filter each file step by step? But I do not see any other
>> solution. Maybe you can help me, or Lucene or Solr has a mechanism that
>> helps.
>>
>> 3) The last problem is that some files are encrypted, so those files must
>> be indexed only once, before encryption! But I think that if we index
>> secured files we get a security issue, because every word from those
>> files is tokenized. I have no idea how to secure Lucene documents and the
>> index datastore. Is it possible?
>>
>>
>> I also have a question: do I need to use Solr for my search engine, or
>> only Lucene and write my own search engine? As you can see, my problem is
>> not with indexing or searching but with securing files and file security
>> levels.
>>
>> Thanks for any hints and for the time you spend on this.
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
