Posted to dev@tuscany.apache.org by Wojtek Janiszewski <wo...@gmail.com> on 2009/03/30 23:19:34 UTC

[GSoC 2009] Search in SCA domain manager web app

Hi,
I'm interested in taking part in Google Summer of Code, and the project 
"tuscany-scadomain-search" [1] sounds interesting to me.

I've taken a quick look at the domain manager web app and Apache Lucene 
and made a few assumptions to start with. I've defined three main areas 
the project should cover: indexing, searching and presentation. Keeping 
those areas separate allows us to write modular code and test it.

1. Indexing

- Indexing should include all available contributions. File names as 
well as their contents (except non-readable files like Java classes) 
should be indexed. Every indexed item should have a link to its parent 
contribution.

- After a contribution is added, updated or deleted in the domain manager 
web application, the affected items should be reindexed.

- We may also consider having connections between indexed items, i.e. we 
could scan composite files to acquire the names of their children and 
build reverse links, so every indexed item (script, Java class etc.) would 
be connected to its parent composites.
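The indexing rules above could be prototyped before any Lucene code is written. The following is a minimal, stdlib-only Java sketch; the class names `IndexedItem` and `ContributionIndex` and the field names `filename`, `content` and `contribution` are hypothetical, not Tuscany or Lucene APIs. It keeps an inverted index per field, stores only a link back to the parent contribution, and reindexes a contribution by dropping everything it contributed before adding the new items.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical indexed item: one file inside a contribution. */
final class IndexedItem {
    final String contributionUri; // link back to the parent contribution
    final String fileName;
    final Map<String, String> fields = new HashMap<>();

    IndexedItem(String contributionUri, String fileName, String content) {
        this.contributionUri = contributionUri;
        this.fileName = fileName;
        fields.put("contribution", contributionUri);
        fields.put("filename", fileName);
        if (content != null) { // unreadable (binary) files: index the name only
            fields.put("content", content);
        }
    }
}

/** Hypothetical in-memory stand-in for a Lucene index: field -> term -> items. */
final class ContributionIndex {
    private final Map<String, Map<String, Set<IndexedItem>>> postings = new HashMap<>();
    private final Map<String, List<IndexedItem>> byContribution = new HashMap<>();

    /** Called after a contribution is added or updated: replace its items wholesale. */
    void reindex(String contributionUri, List<IndexedItem> items) {
        remove(contributionUri);
        byContribution.put(contributionUri, new ArrayList<>(items));
        for (IndexedItem item : items) {
            for (Map.Entry<String, String> field : item.fields.entrySet()) {
                for (String term : tokenize(field.getValue())) {
                    postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                            .computeIfAbsent(term, k -> new HashSet<>())
                            .add(item);
                }
            }
        }
    }

    /** Called after a contribution is deleted from the domain. */
    void remove(String contributionUri) {
        List<IndexedItem> old = byContribution.remove(contributionUri);
        if (old == null) {
            return;
        }
        for (Map<String, Set<IndexedItem>> terms : postings.values()) {
            for (Set<IndexedItem> items : terms.values()) {
                items.removeAll(old);
            }
        }
    }

    /** Items whose given field contains the (lower-cased) term. */
    Set<IndexedItem> search(String field, String term) {
        Map<String, Set<IndexedItem>> terms = postings.get(field);
        Set<IndexedItem> hits = terms == null ? null : terms.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<IndexedItem>emptySet() : Collections.unmodifiableSet(hits);
    }

    /** Naive analyzer: lower-case and split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

In a Lucene-based version the same shape would probably become one document per item, with the contribution URI stored as an untokenized field so that all documents of a contribution can be deleted by that term before the new ones are added.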

2. Searching

- The search feature would be accessible via the SCA domain manager web 
application. It should allow users to:
-- search for files by name
-- search file contents
-- filter, i.e. search inside a specified contribution or composite

- Maybe we should consider extras like Ajax hints while the search 
phrase is being typed?

- More research on Apache Lucene could provide more search ideas.

3. Presentation

- Each search result should be presented with its name and a link to the 
contribution it belongs to. If the item is viewable (it's not a Java class 
etc.), a simple preview feature should be enabled for it. Obviously, 
matched text should be highlighted (as Google does).

- If information about an item's parent composites is available, those 
composites should also be listed.
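Highlighting matched text in a preview can be prototyped independently of the index. A minimal sketch, assuming a hypothetical `Highlighter.highlight` helper: it wraps every case-insensitive occurrence of a query term in `<b>` tags and HTML-escapes everything else, because previews of composite or WSDL files are XML and would otherwise be rendered as markup. Lucene also ships a highlighter contrib module, which would likely be the better choice once a real index exists.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for search-result previews. */
final class Highlighter {

    /** Wraps each case-insensitive occurrence of any term in <b>...</b>; escapes all other text. */
    static String highlight(String text, List<String> terms) {
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty alternative would match at every position
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term)); // treat terms literally, not as regex
        }
        if (alternation.length() == 0) {
            return escape(text);
        }
        Matcher m = Pattern.compile(alternation.toString(), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start())))
               .append("<b>").append(escape(m.group())).append("</b>");
            last = m.end();
        }
        out.append(escape(text.substring(last)));
        return out.toString();
    }

    /** Minimal HTML escaping so previewed XML is shown as text, not markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, highlighting `catalog` in `<component name="Catalog"/>` yields `&lt;component name="<b>Catalog</b>"/&gt;`, keeping the original casing of the match.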


This quick draft is the direction I'll take while writing the proposal. 
It appears to be an interesting project, especially since it lets me 
explore new areas (everything beyond bindings in Tuscany, and Lucene). 
There is still plenty of room for improvement (e.g. other features), so 
any comments are welcome.

Thanks,
Wojtek

[1] - 
http://wiki.apache.org/general/SummerOfCode2009#tuscany-scadomain-search

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Wojtek Janiszewski <wo...@gmail.com>.
Hi Adriano, Simon.

If I understand correctly, such a capability would require redesigning 
the index model. For now each index document is physically one file in a 
contribution; the new approach would require an index document to be a 
logical part of a file (e.g. a policy set can be part of definitions.xml).

It's a good remark, and it makes me more aware of the need to keep the 
whole thing generic. Since that's a lot of work, I'd like to implement the 
initial idea (but in an extensible way) to start with.
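Treating an index document as a logical part of a file could look roughly like this. A minimal sketch, assuming a hypothetical `LogicalDocumentSplitter` that uses only the JDK's DOM parser: every element with a given local name (for example `policySet` inside `definitions.xml`) becomes its own searchable unit that remembers the physical file it came from. It matches the element in any namespace, so it does not depend on the exact SCA namespace URI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical splitter: one physical file -> several logical index documents. */
final class LogicalDocumentSplitter {

    /** A searchable unit that is only part of a physical file. */
    static final class LogicalDoc {
        final String file;        // physical file, e.g. definitions.xml
        final String elementType; // local element name, e.g. policySet
        final String name;        // value of the element's name attribute

        LogicalDoc(String file, String elementType, String name) {
            this.file = file;
            this.elementType = elementType;
            this.name = name;
        }
    }

    /** Returns one LogicalDoc per element with the given local name, in document order. */
    static List<LogicalDoc> split(String file, String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS("*", localName); // any namespace
            List<LogicalDoc> docs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                docs.add(new LogicalDoc(file, localName, element.getAttribute("name")));
            }
            return docs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse " + file, e);
        }
    }
}
```

Each `LogicalDoc` would then be indexed like a file, with its `file` value serving as the link for previews.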

Thanks,
Wojtek



Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Simon Laws <si...@googlemail.com>.
>
> That is the idea: to index everything contained in an artifact, so it can be
> searched in future ; )
>

Ok, nice.

Simon

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Adriano Crestani <ad...@apache.org>.
Hi Simon,

> Not sure, I was wondering if the search capability can be extended
> beyond components, services, references to the other artifacts in play
> in a running application, e.g. policy sets. Maybe it takes account of
> this already and I'm just not understanding properly. Also I'm not
> suggesting this extension as a first stage just trying to understand
> how it would be added in the future.

That is the idea: to index everything contained in an artifact, so it can be
searched in future ; )

Best Regards,
Adriano Crestani Campos

On Thu, Apr 16, 2009 at 1:55 AM, Simon Laws <si...@googlemail.com>wrote:

> snip...
>
> >
> > By "Service/Reference/Component/Binding/Implementation/... " field I meant
> > multiple fields. In fact we would have a separate field for each of service,
> > reference, component etc.
>
> OK
>
> >
> >> I'm asking as I'm attracted by the presentation example you give where
> >> you show an initial phrase search giving way to more targeted item
> >> based searches. A complexity of Tuscany is that it's based on a number of
> >> hierarchies and relationships, e.g.
> >>
> >> contribution import/export
> >> component type
> >> component promotion
> >> component wiring
> >> domain/node configuration
> >> intent and policy configuration
> >>
> >> Finding things can often mean searching through various, seemingly
> >> unrelated, files. This is particularly the case where policy is
> >> concerned.  It seems that you are solving this problem and I'm
> >> wondering what general provision can be made to extend the index
> >> beyond the original contribution object.
> >>
> >> Regards
> >>
> >> Simon
> >
> >
> > I'm not sure I'm understanding you correctly. Do you mean having some
> > general index field instead of multiple fields for Service, Reference,
> > Component etc.?
> >
> >
>
> Not sure, I was wondering if the search capability can be extended
> beyond components, services, references to the other artifacts in play
> in a running application, e.g. policy sets. Maybe it takes account of
> this already and I'm just not understanding properly. Also I'm not
> suggesting this extension as a first stage just trying to understand
> how it would be added in the future.
>
> Simon
>

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Simon Laws <si...@googlemail.com>.
snip...

>
> By "Service/Reference/Component/Binding/Implementation/... " field I meant
> multiple fields. In fact we would have a separate field for each of service,
> reference, component etc.

OK

>
>> I'm asking as I'm attracted by the presentation example you give where
>> you show an initial phrase search giving way to more targeted item
>> based searches. A complexity of Tuscany is that it's based on number
>> hierarchies and relationships, e.g.
>>
>> contribution import/export
>> component type
>> component promotion
>> component wiring
>> domain/node configuration
>> intent and policy configuration
>>
>> Finding things can often mean searching through various, seemingly
>> unrelated, files. This is particularly the case where policy is
>> concerned.  It seems that you are solving this problem and I'm
>> wondering what general provision can be made to extend the index
>> beyond the original contribution object.
>>
>> Regards
>>
>> Simon
>
>
> I'm not sure I'm understanding you correctly. Do you mean having some general
> index field instead of multiple fields for Service, Reference, Component
> etc.?
>
>

Not sure, I was wondering if the search capability can be extended
beyond components, services, references to the other artifacts in play
in a running application, e.g. policy sets. Maybe it takes account of
this already and I'm just not understanding properly. Also I'm not
suggesting this extension as a first stage just trying to understand
how it would be added in the future.

Simon

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Wojtek Janiszewski <wo...@gmail.com>.
Hi, Simon,
comments inline.

Thanks,
Wojtek

Simon Laws:
>
> Hi Wojtek
>
> Nice proposal (and something that would be really useful for Tuscany).
>
> A question. It feels like absolutely the right thing to do to start at
> the contribution and contribution content level but I'm interested in
> the index field you have described as
> "Service/Reference/Component/Binding/Implementation/... " which to me
> suggests some indexing of the contents of the files in the
> contribution, in this case the composite file.
>

By "Service/Reference/Component/Binding/Implementation/... " field I 
meant multiple fields. In fact we would have a separate field for each of 
service, reference, component etc.
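Having a separate field per element type could be fed by a small composite scanner. The sketch below is a hypothetical stdlib-only helper (`CompositeFields` is not a Tuscany class) that collects the `name` attribute of every `component`, `service` and `reference` element into a field of the same name. It picks up both composite-level and component-level `service`/`reference` elements; whether to keep those apart is a design decision left open here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical scanner: composite XML -> one index field per SCA element type. */
final class CompositeFields {
    static final String[] FIELD_TYPES = {"component", "service", "reference"};

    static Map<String, List<String>> extract(String compositeXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(compositeXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parsable composite", e);
        }
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String type : FIELD_TYPES) {
            NodeList nodes = doc.getElementsByTagNameNS("*", type); // any namespace
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            fields.put(type, names); // e.g. "component" -> [Catalog, CurrencyConverter]
        }
        return fields;
    }
}
```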

> I'm asking as I'm attracted by the presentation example you give where
> you show an initial phrase search giving way to more targeted item
> based searches. A complexity of Tuscany is that it's based on number
> hierarchies and relationships, e.g.
>
> contribution import/export
> component type
> component promotion
> component wiring
> domain/node configuration
> intent and policy configuration
>
> Finding things can often mean searching through various, seemingly
> unrelated, files. This is particularly the case where policy is
> concerned.  It seems that you are solving this problem and I'm
> wondering what general provision can be made to extend the index
> beyond the original contribution object.
>
> Regards
>
> Simon


I'm not sure I'm understanding you correctly. Do you mean having some 
general index field instead of multiple fields for Service, Reference, 
Component etc.?


Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Simon Laws <si...@googlemail.com>.
On Sat, Apr 4, 2009 at 3:43 PM, Wojtek Janiszewski
<wo...@gmail.com> wrote:
> Hi,
> thanks for your input. Please see my comments inline.
>
> Wojtek
>
> Adriano Crestani:
>>
>> Hi Wojtek,
>>
>> Some comments inline:
>>
>>  > Found elements would be stored with current document data.
>>
>> I don't see any reason for storing the entire document content in the
>> index, why would you do that? Couldn't you just store some URI that would
>> point to the original file?
>
> I was trying to say here that found elements should be remembered. We
> wouldn't store anything but references to them.
>
>>
>> *> All* |  All above fields to provide non-filter queries.
>>
>> You don't need to do it; if you do, you will basically double the
>> posting list size in the index. To get what you want, the field "all"
>> should be available only for querying purposes. For example, the user could
>> type 'all:store' in the query, and before the query is processed it could be
>> expanded to all the "searchable" fields: component:store
>> contribution:store...etc. It's a common practice in the unstructured data world,
>> so common that Lucene has a query parser for it called
>> MultiFieldQueryParser : )...I think you already said it here:
>>
>>  > none (all document fields would be used to search)
>>
>>  > regular expressions
>
> Yes, this is probably what I wanted to achieve :)
>
>> Can you give me some examples of regular expressions?
>
> I mean queries enhanced with regex syntax, e.g. "Component[A-C]\.component" to
> find all components from A to C.
> I plan to use RegexQuery here instead of one of the QueryParsers, and as far as I
> can see, regular expressions cannot be used in standard Lucene syntax, so those
> searches wouldn't be as powerful as standard ones. I mean you couldn't use
> logical operators, multiple fields etc.; a search would be limited to one
> field (or all fields) and a regex query string.
>
>>
>> I liked your presentation idea : )
>>
>> The architectural outline section is also very complete : )
>>
>> Good luck ;  )
>>
>> Adriano Crestani
>
>

Hi Wojtek

Nice proposal (and something that would be really useful for Tuscany).

A question. It feels like absolutely the right thing to do to start at
the contribution and contribution content level but I'm interested in
the index field you have described as
"Service/Reference/Component/Binding/Implementation/... " which to me
suggests some indexing of the contents of the files in the
contribution, in this case the composite file.

I'm asking as I'm attracted by the presentation example you give where
you show an initial phrase search giving way to more targeted item
based searches. A complexity of Tuscany is that it's based on number
hierarchies and relationships, e.g.

contribution import/export
component type
component promotion
component wiring
domain/node confguration
intent and policy configuration

Finding things can often mean searching through various, seemingly
unrelated, files. This is particularly the case where policy is
concerned.  It seems that you are solving this problem and I'm
wondering what general provision can be made to extend the index
beyond the original contribution object.

Regards

Simon

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Wojtek Janiszewski <wo...@gmail.com>.
Hi,
thanks for your input. Please see my comments inline.

Wojtek

Adriano Crestani:
> Hi Wojtek,
> 
> Some comments inline:
> 
>  > Found elements would be stored with current document data.
> 
> I don't see any reason for storing the entire document content in the 
> index, why would you do that? Couldn't you just store some URI that 
> would point to the original file?

I was trying to say here that found elements should be remembered. We 
wouldn't store anything but references to them.

> 
> *> All* |  All above fields to provide non-filter queries.
> 
> You don't need to do it; if you do, you will basically double the 
> posting list size in the index. To get what you want, the field 
> "all" should be available only for querying purposes. For example, the 
> user could type 'all:store' in the query, and before the query is 
> processed it could be expanded to all the "searchable" fields: 
> component:store contribution:store...etc. It's a common practice in the 
> unstructured data world, so common that Lucene has a query parser 
> for it called MultiFieldQueryParser : )...I think you already said it 
> here:
> 
>  > none (all document fields would be used to search)
> 
>  > regular expressions

Yes, this is probably what I wanted to achieve :)
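The query-time "all" alias can be illustrated without Lucene. This is a hypothetical rewriting helper, not Lucene's `MultiFieldQueryParser` (which would be the natural choice in practice): an `all:term` clause or an unqualified term is expanded into one clause per searchable field, while clauses that already name a field, such as `filename:(store.composite)`, pass through unchanged. Nothing named "all" is ever stored in the index. It splits on whitespace only, so quoted phrases are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query rewriter: "all" exists only at query time, never in the index. */
final class AllFieldExpander {

    static String expand(String query, List<String> searchableFields) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank query
            }
            String term;
            if (token.startsWith("all:")) {
                term = token.substring("all:".length());
            } else if (token.indexOf(':') < 0) {
                term = token; // unqualified term: search every field
            } else {
                clauses.add(token); // already field-qualified, keep as is
                continue;
            }
            List<String> perField = new ArrayList<>();
            for (String field : searchableFields) {
                perField.add(field + ":" + term);
            }
            // Space-separated clauses are OR-ed under Lucene's default query syntax.
            clauses.add("(" + String.join(" ", perField) + ")");
        }
        return String.join(" ", clauses);
    }
}
```

With the fields `component` and `contribution`, `all:store` becomes `(component:store contribution:store)`.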

> Can you give me some examples of regular expressions?

I mean queries enhanced with regex syntax, e.g. "Component[A-C]\.component" 
to find all components from A to C.
I plan to use RegexQuery here instead of one of the QueryParsers, and as far 
as I can see, regular expressions cannot be used in standard Lucene syntax, 
so those searches wouldn't be as powerful as standard ones. I mean you 
couldn't use logical operators, multiple fields etc.; a search would be 
limited to one field (or all fields) and a regex query string.
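Restricting a regex search to the terms of one field can be sketched with `java.util.regex` alone. In this hypothetical helper the term list stands in for a `filename` field's term dictionary. The example pattern only works if file names are indexed as single, untokenized, case-preserved terms; an analyzer that splits `ComponentA.component` into pieces would leave nothing for the pattern to match. The sketch requires the whole term to match.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical regex search over the terms of a single field. */
final class RegexTermSearch {

    /** Returns the terms (sorted) that the regular expression matches in full. */
    static List<String> matchingTerms(Collection<String> fieldTerms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : fieldTerms) {
            if (pattern.matcher(term).matches()) { // whole-term match, not substring
                hits.add(term);
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

Against the terms `ComponentA.component`, `ComponentC.component`, `ComponentD.component` and `store.composite`, the pattern `Component[A-C]\.component` selects only the first two.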

> 
> I liked your presentation idea : )
> 
> The architectural outline section is also very complete : )
> 
> Good luck ;  )
> 
> Adriano Crestani


Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Adriano Crestani <ad...@gmail.com>.
Hi Wojtek,

Some comments inline:

> Found elements would be stored with current document data.

I don't see any reason for storing the entire document content in the index,
why would you do that? Couldn't you just store some URI that would point to
the original file?

*> All* |  All above fields to provide non-filter queries.

You don't need to do it; if you do, you will basically double the posting
list size in the index. To get what you want, the field "all" should
be available only for querying purposes. For example, the user could type
'all:store' in the query, and before the query is processed it could be
expanded to all the "searchable" fields: component:store
contribution:store...etc. It's a common practice in the unstructured data world,
so common that Lucene has a query parser for it called
MultiFieldQueryParser : )...I think you already said it here:

> none (all document fields would be used to search)

> regular expressions

Can you give me some examples of regular expressions?

I liked your presentation idea : )

The architectural outline section is also very complete : )

Good luck ;  )

Adriano Crestani

On Thu, Apr 2, 2009 at 3:24 PM, Wojtek Janiszewski <
wojtek.janiszewski@gmail.com> wrote:

> Hi, Adriano.
>
> Thanks for the input. I've included your comments in the updated proposal [1].
> (The previous timeline was only a template; I was going to update it later :))
>
> Thanks,
> Wojtek
>
> [1] -
> http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain
>
> Adriano Crestani wrote:
>
>  Hi Wojtek,
>>
>> nice proposal : )
>>
>> Indexing should include all available contributions. File names as well as
>> their contents (except non readable files like Java classes) should be
>> indexed. Every indexed item should have link to its contribution parent.
>>
>> I agree about a link to contributions...actually, if you make the
>> contributions the main search target, I mean, if the contribution will be
>> what the user would want as the results, every indexed term would point to a
>> contribution, so it already has a link to the contribution : ) . I only
>> disagree when you say that Java classes are non-readable, they are readable,
>> they have class/method/variables/annotation names, even a .zip is readable,
>> you could open it and index the name of the files contained in it, as well
>> as the contents of this files, if readable.
>>
>> - Maybe we should consider candies like Ajax hints while typing search
>> phrase?
>>
>> It would be reeeeally cool : ), but not a priority. It could easily be added
>> later after everything else is working : )
>>
>> -- simply search for files by name
>>
>> I would recommend indexing file names in a specific Lucene field for
>> that, like "filename", so the query could be
>> filename:(contributionname.composite)...otherwise, if the user types only
>> contributionname.composite, it could look for this text in every field
>> contained in the index. Lucene has a special feature for that, so it's easy
>> to implement. Associating terms with a field is always good for
>> filtering :)
>>
>> Proposal:
>>
>>  > preview link (if item is readable)
>>
>> If the item is not readable, a link could also be provided for downloading
>> : )
>>
>> Could you please provide us with a more detailed timeline?
>>
>> I think you should add more detail about how the text will be parsed
>> and indexed. The way you do this is very important because it determines how
>> the documents/contributions/artifacts can be searched and what kind of
>> results can be provided to the user.
>>
>> Best Regards,
>> Adriano Crestani
>>
>

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Wojtek Janiszewski <wo...@gmail.com>.
Hi, Adriano.

Thanks for the input. I've included your comments in the updated proposal [1]. 
(The previous timeline was only a template; I was going to update it later :))

Thanks,
Wojtek

[1] - 
http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain


Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Adriano Crestani <ad...@gmail.com>.
Hi Wojtek,

nice proposal : )

> Indexing should include all available contributions. File names as well as
> their contents (except non readable files like Java classes) should be
> indexed. Every indexed item should have link to its contribution parent.

I agree about a link to contributions... actually, if you make contributions
the main search target, I mean, if a contribution is what the user wants
as a result, every indexed term would point to a contribution, so it would
already have a link to it : ) . I only disagree when you say that Java
classes are non-readable. They are readable: they have class, method,
variable and annotation names. Even a .zip is readable: you could open it
and index the names of the files it contains, as well as the contents of
those files, if readable.
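The point that archives and class files still carry searchable names can be checked with the JDK alone. The sketch below is a hypothetical `ArchiveNames` helper, not a Tuscany API: it lists the file entries of a zipped contribution with `java.util.zip.ZipInputStream` and derives a class name from each `.class` entry path, so both could be indexed even though the bytecode itself is never read. Extracting method or annotation names would need a bytecode reader and is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: names inside a contribution archive that are worth indexing. */
final class ArchiveNames {

    /** File entry names (directories skipped), in archive order. Closes the stream. */
    static List<String> entryNames(InputStream archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** "services/CatalogImpl.class" -> "services.CatalogImpl"; null for non-class entries. */
    static String className(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        return entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
    }
}
```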

> - Maybe we should consider candies like Ajax hints while typing search
> phrase?

It would be reeeeally cool : ), but not a priority. It could easily be added
later, after everything else is working : )

> -- simply search for files by name

I would recommend indexing file names in a specific Lucene field for
that, like "filename", so the query could be
filename:(contributionname.composite)...otherwise, if the user types only
contributionname.composite, it could look for this text in every field
contained in the index. Lucene has a special feature for that, so it's easy
to implement. Associating terms with a field is always good for
filtering :)

Proposal:

> preview link (if item is readable)

If the item is not readable, a link could also be provided for downloading it : )

Could you please provide us with a more detailed timeline?

I think you should add more detail about how the text will be parsed and
indexed. The way you do this is very important because it determines how the
documents/contributions/artifacts can be searched and what kind of results
can be provided to the user.

Best Regards,
Adriano Crestani

On Tue, Mar 31, 2009 at 9:29 AM, Luciano Resende <lu...@gmail.com>wrote:

> On Tue, Mar 31, 2009 at 9:21 AM, Wojtek Janiszewski
> <wo...@gmail.com> wrote:
> > Hi, Luciano, Raymond.
> >
> > Thanks for your input; I've just updated my proposal [1]. I've included
> > most of your ideas, and I'll also work on the possibility of integrating
> > with JMX management.
> >
> > Luciano, I have one comment:
> >
> > Luciano Resende:
> >>>
> >>> 1. Indexing
> >>>
> >>> - Indexing should include all available contributions. File names as
> well
> >>> as
> >>> their contents (except non readable files like Java classes) should be
> >>> indexed. Every indexed item should have link to its contribution
> parent.
> >>>
> >>> - After adding, updating or deleting contribution from domain manager
> web
> >>> application appropriate items should be reindexed.
> >>>
> >>> - We may also consider having connections between indexed items, ie. we
> >>> could scan composite files to acquire children names and build reversed
> >>> links, so every indexed item (script, Java class etc.) could have
> >>> connection
> >>> to its composite parents.
> >>>
> >>
> >> Looks good, I'll probably give first priority for Composites and other
> >> SCA related files, WSDL and XSD.
> >>
> >
> > I don't understand. Could you give more description?
> >
>
> I was just trying to say that, if it makes any difference, we should
> provide search for these types of artifacts first, then move on to the
> others.
>
> >
> >
> > [1] -
> >
> http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain
> >
>
>
>
> --
> Luciano Resende
> Apache Tuscany, Apache PhotArk
> http://people.apache.org/~lresende
> http://lresende.blogspot.com/
>

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Luciano Resende <lu...@gmail.com>.
On Tue, Mar 31, 2009 at 9:21 AM, Wojtek Janiszewski
<wo...@gmail.com> wrote:
> Hi, Luciano, Raymond.
>
> Thanks for your input; I've just updated my proposal [1]. I've included most
> of your ideas, and I'll also work on the possibility of integrating with JMX
> management.
>
> Luciano, I have one comment:
>
> Luciano Resende:
>>>
>>> 1. Indexing
>>>
>>> - Indexing should include all available contributions. File names as well
>>> as
>>> their contents (except non readable files like Java classes) should be
>>> indexed. Every indexed item should have link to its contribution parent.
>>>
>>> - After adding, updating or deleting contribution from domain manager web
>>> application appropriate items should be reindexed.
>>>
>>> - We may also consider having connections between indexed items, ie. we
>>> could scan composite files to acquire children names and build reversed
>>> links, so every indexed item (script, Java class etc.) could have
>>> connection
>>> to its composite parents.
>>>
>>
>> Looks good, I'll probably give first priority for Composites and other
>> SCA related files, WSDL and XSD.
>>
>
> I don't understand. Could you give more description?
>

I was just trying to say that, if it makes any difference, we should
provide search for these types of artifacts first, then move on to the
others.

>
>
> [1] -
> http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain
>



-- 
Luciano Resende
Apache Tuscany, Apache PhotArk
http://people.apache.org/~lresende
http://lresende.blogspot.com/

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Wojtek Janiszewski <wo...@gmail.com>.
Hi, Luciano, Raymond.

Thanks for your input; I've just updated my proposal [1]. I've included 
most of your ideas, and I'll also work on the possibility of integrating 
with JMX management.

Luciano, I have one comment:

Luciano Resende:
>> 1. Indexing
>>
>> - Indexing should include all available contributions. File names as well as
>> their contents (except non readable files like Java classes) should be
>> indexed. Every indexed item should have link to its contribution parent.
>>
>> - After adding, updating or deleting contribution from domain manager web
>> application appropriate items should be reindexed.
>>
>> - We may also consider having connections between indexed items, ie. we
>> could scan composite files to acquire children names and build reversed
>> links, so every indexed item (script, Java class etc.) could have connection
>> to its composite parents.
>>
> 
> Looks good, I'll probably give first priority for Composites and other
> SCA related files, WSDL and XSD.
> 

I don't understand. Could you give more description?



[1] - 
http://cwiki.apache.org/confluence/display/TUSCANYWIKI/Searching+artifacts+across+SCA+domain

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Luciano Resende <lu...@gmail.com>.
2009/3/30 Wojtek Janiszewski <wo...@gmail.com>:
> Hi,
> I'm interested in taking part in Google Summer of Code and project
> "tuscany-scadomain-search" [1] sounds interesting to me.
>
> I've made a quick look inside domain manager web app and Apache Lucene and
> made few assumptions for a start. I defined three main areas which project
> should cover and they are indexing, searching and presentation. Having those
> areas separated allows us to write modular code and test it.
>
> 1. Indexing
>
> - Indexing should include all available contributions. File names as well as
> their contents (except non readable files like Java classes) should be
> indexed. Every indexed item should have link to its contribution parent.
>
> - After adding, updating or deleting contribution from domain manager web
> application appropriate items should be reindexed.
>
> - We may also consider having connections between indexed items, ie. we
> could scan composite files to acquire children names and build reversed
> links, so every indexed item (script, Java class etc.) could have connection
> to its composite parents.
>

Looks good, I'll probably give first priority for Composites and other
SCA related files, WSDL and XSD.

> 2. Searching
>
> - Search feature would be accessible via SCA domain manager web application.

Tuscany and its various bindings make it easy to define a search
component and expose it over various protocols. While I agree that we
should concentrate on producing a search UI integrated with the existing
SCA domain manager web application UI (e.g. using a json-rpc or other
web 2.0 binding), we should not prevent other scenarios from consuming
this search component.

> It should allow to:
> -- simply search for files by name
> -- search files content
> -- filter - search inside specified contribution or composite

+1, with SCA-related files, WSDL and XSD having high priority

>
> - Maybe we should consider candies like Ajax hints while typing search
> phrase?
>

+1

Another thing I had in mind was to allow the user to search for component
foo and, when displaying the result, have all the references linked
together (e.g. if component foo has an <implementation.java
class="fooImpl.java">, clicking fooImpl.java would just redirect you
to the actual file content)...
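This linking idea and the reverse links from the original proposal could share one scan. A minimal sketch, assuming hypothetical names (`ReverseLinks`) and an assumed convention that a class `a.b.FooImpl` lives at `a/b/FooImpl.java` in the contribution: it reads the `class` attribute of every `implementation.java` element and records, for each implementation file, the composites that use it. In SCA the `class` attribute holds a fully qualified class name rather than a file name, so the mapping to a path is an assumption to check against real contributions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical builder of implementation-file -> parent-composite links. */
final class ReverseLinks {

    /** @param composites composite name -> composite XML */
    static Map<String, Set<String>> build(Map<String, String> composites) {
        Map<String, Set<String>> parents = new TreeMap<>();
        for (Map.Entry<String, String> composite : composites.entrySet()) {
            NodeList impls = parse(composite.getValue())
                    .getElementsByTagNameNS("*", "implementation.java");
            for (int i = 0; i < impls.getLength(); i++) {
                String className = ((Element) impls.item(i)).getAttribute("class");
                // Assumed convention: package path + simple name + ".java"
                String path = className.replace('.', '/') + ".java";
                parents.computeIfAbsent(path, k -> new TreeSet<>()).add(composite.getKey());
            }
        }
        return parents;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parsable composite", e);
        }
    }
}
```

The same map serves both directions of navigation: a search hit on a component can link to its implementation file, and a hit on that file can list its parent composites.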

> - More research on Apache Lucene could provide more searching ideas.
>

Adriano Crestani is the Lucene expert ... he should be able to help
here as well...

> 3. Presentation
>
> - Each search result should be presented with its name and a link to the
> contribution it belongs to. If it's viewable (i.e. not a Java class etc.),
> then a simple preview feature for the item should be enabled. Obviously,
> matched text should be highlighted (as Google does).
>
> - If information about the composite parents of these items is accessible,
> then those composites should also be listed.
>
>
> This quick draft is the direction I'll take while writing the proposal. It
> appears to be an interesting project, especially as it allows me to explore new areas
> (everything beyond bindings in Tuscany, Lucene). There is still plenty of room
> for improvement (like other features), so any comments are welcome.
>
> Thanks,
> Wojtek
>
> [1] -
> http://wiki.apache.org/general/SummerOfCode2009#tuscany-scadomain-search
>
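For the highlighting in the presentation layer, a minimal Java sketch: HTML-escape the previewed text and wrap each case-insensitive occurrence of the search term in <b> tags. Highlighter is a hypothetical name; Lucene also ships a highlighter module of its own, which would likely be preferable once real analyzers and phrase queries are involved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that HTML-escapes a preview and bolds every match of a term. */
class Highlighter {
    static String highlight(String text, String term) {
        if (term.isEmpty()) {
            return escape(text); // an empty pattern would match between every character
        }
        Matcher m = Pattern.compile(Pattern.quote(term),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start())))
               .append("<b>").append(escape(m.group())).append("</b>");
            last = m.end();
        }
        return out.append(escape(text.substring(last))).toString();
    }

    /** Escapes the characters that matter inside HTML text and attribute values. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Escaping before emitting the <b> tags matters here because previews of composites and WSDL files are themselves XML.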



-- 
Luciano Resende
http://people.apache.org/~lresende
http://lresende.blogspot.com/

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Raymond Feng <en...@gmail.com>.
Good question. I just dumped whatever came to mind. It should be viewed as a
stretch goal.

Thanks,
Raymond
--------------------------------------------------
From: "Luciano Resende" <lu...@gmail.com>
Sent: Monday, March 30, 2009 4:07 PM
To: <de...@tuscany.apache.org>
Subject: Re: [GSoC 2009] Search in SCA domain manager web app

> On Mon, Mar 30, 2009 at 3:59 PM, Raymond Feng <en...@gmail.com> wrote:
>>>> 3) The search capability could be potentially integrated with the
>>>> management of the SCA domain.
>>>>
>>>
>>> What do you mean here? Integrated in the Domain Management Web UI?
>>> Integrated with lifecycle management of the domain? Or something else?
>>
>> I meant to say the JMX-based management of the entities within an SCA
>> domain. Imagine there is a view of the SCA domain and clicking on the
>> search button will highlight the matching entities. Then we can zoom
>> into more details or send commands to start/stop the live component
>> services.
>>
>
> Got it. Do you think this is doable as part of the GSoC deliverable, or
> should it be a stretch goal?
>
>
> -- 
> Luciano Resende
> Apache Tuscany, Apache PhotArk
> http://people.apache.org/~lresende
> http://lresende.blogspot.com/ 


Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Luciano Resende <lu...@gmail.com>.
On Mon, Mar 30, 2009 at 3:59 PM, Raymond Feng <en...@gmail.com> wrote:
>>> 3) The search capability could be potentially integrated with the
>>> management of the SCA domain.
>>>
>>
>> What do you mean here? Integrated in the Domain Management Web UI?
>> Integrated with lifecycle management of the domain? Or something else?
>
> I meant to say the JMX-based management of the entities within an SCA
> domain. Imagine there is a view of the SCA domain and clicking on the
> search button will highlight the matching entities. Then we can zoom
> into more details or send commands to start/stop the live component
> services.
>

Got it. Do you think this is doable as part of the GSoC deliverable, or
should it be a stretch goal?


-- 
Luciano Resende
Apache Tuscany, Apache PhotArk
http://people.apache.org/~lresende
http://lresende.blogspot.com/

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Raymond Feng <en...@gmail.com>.
Comments inline.

--------------------------------------------------
From: "Luciano Resende" <lu...@gmail.com>
Sent: Monday, March 30, 2009 3:22 PM
To: <de...@tuscany.apache.org>
Subject: Re: [GSoC 2009] Search in SCA domain manager web app

> 2009/3/30 Raymond Feng <en...@gmail.com>:
>> Hi, Wojtek.
>>
>> It's great to hear your interest in this GSoC project. Your success in
>> Tuscany CORBA binding project from GSoC 2008 is really encouraging.
>>
>> Your understanding pretty much matches what I have in mind. A few more
>> comments.
>>
>> 1) Indexing: I think indexing is probably not just keyword-based. It
>> will involve "QName" indexing of the artifacts (such as the QNames of Java
>> classes, composites, WSDLs, XSDs and BPEL files). The runtime
>> processing of SCA contributions can also benefit from this work. For
>> example, Tuscany already lazily loads WSDL/XSD files when it needs to
>> resolve references by QName. We should apply the same strategy to
>> composite files too.
>>
>> 2) The search can be based on keywords, structural URIs, QName of various
>> artifacts, Policy settings, etc.
>>
>
> Would it make sense to define some kind of structured SCA search syntax,
> such as composite:foo or component:foo, to help filter the
> search results?
>

Sure. The search runs against the SCA composition from different angles,
including the contributions/artifacts, assembly, components, interfaces,
implementations and policies.

>> 3) The search capability could be potentially integrated with the
>> management of the SCA domain.
>>
>
> What do you mean here? Integrated in the Domain Management Web UI?
> Integrated with lifecycle management of the domain? Or something else?

I meant to say the JMX-based management of the entities within an SCA
domain. Imagine there is a view of the SCA domain and clicking on the
search button will highlight the matching entities. Then we can zoom
into more details or send commands to start/stop the live component
services.

>
>> Thanks,
>> Raymond
>
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://lresende.blogspot.com/ 


Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Luciano Resende <lu...@gmail.com>.
2009/3/30 Raymond Feng <en...@gmail.com>:
> Hi, Wojtek.
>
> It's great to hear your interest in this GSoC project. Your success in
> Tuscany CORBA binding project from GSoC 2008 is really encouraging.
>
> Your understanding pretty much matches what I have in mind. A few more
> comments.
>
> 1) Indexing: I think indexing is probably not just keyword-based. It
> will involve "QName" indexing of the artifacts (such as the QNames of Java
> classes, composites, WSDLs, XSDs and BPEL files). The runtime
> processing of SCA contributions can also benefit from this work. For
> example, Tuscany already lazily loads WSDL/XSD files when it needs to
> resolve references by QName. We should apply the same strategy to composite
> files too.
>
> 2) The search can be based on keywords, structural URIs, QName of various
> artifacts, Policy settings, etc.
>

Would it make sense to define some kind of structured SCA search syntax,
such as composite:foo or component:foo, to help filter the
search results?
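One possible shape for the composite:foo / component:foo syntax, sketched in Java: split the query on whitespace, treat known field:value tokens as filters and everything else as free-text keywords. StructuredQuery is a hypothetical class, and the fields beyond composite and component (contribution, qname, policy) are guesses based on the search angles discussed in this thread. Lucene's query syntax already understands field:term, so with a Lucene index these filters could probably be handed over as field queries.

```java
import java.util.*;

/**
 * Hypothetical parser for queries such as "component:foo composite:store catalog".
 * Known field:value tokens become filters; every other token is a free-text keyword.
 */
class StructuredQuery {
    // Assumed field names: composite and component come from the thread, the rest are guesses.
    static final Set<String> FIELDS = new HashSet<>(Arrays.asList(
            "composite", "component", "contribution", "qname", "policy"));

    final Map<String, List<String>> filters = new LinkedHashMap<>(); // field -> requested values
    final List<String> keywords = new ArrayList<>();

    static StructuredQuery parse(String input) {
        StructuredQuery q = new StructuredQuery();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // "".split(...) yields one empty token
            }
            int colon = token.indexOf(':');
            String field = colon > 0 ? token.substring(0, colon).toLowerCase(Locale.ROOT) : null;
            if (field != null && FIELDS.contains(field) && colon < token.length() - 1) {
                q.filters.computeIfAbsent(field, f -> new ArrayList<>()).add(token.substring(colon + 1));
            } else {
                q.keywords.add(token); // e.g. "catalog" or a URL such as "http://x"
            }
        }
        return q;
    }
}
```

Only whitelisted field names become filters, so tokens like http://example.org keep their colon and stay ordinary keywords, and a QName written as qname:{http://ns}Foo keeps everything after the first colon.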

> 3) The search capability could be potentially integrated with the management
> of the SCA domain.
>

What do you mean here? Integrated in the Domain Management Web UI?
Integrated with lifecycle management of the domain? Or something else?

> Thanks,
> Raymond

-- 
Luciano Resende
http://people.apache.org/~lresende
http://lresende.blogspot.com/

Re: [GSoC 2009] Search in SCA domain manager web app

Posted by Raymond Feng <en...@gmail.com>.
Hi, Wojtek.

It's great to hear your interest in this GSoC project. Your success in 
Tuscany CORBA binding project from GSoC 2008 is really encouraging.

Your understanding pretty much matches what I have in mind. A few more 
comments.

1) Indexing: I think indexing is probably not just keyword-based. It
will involve "QName" indexing of the artifacts (such as the QNames of Java
classes, composites, WSDLs, XSDs and BPEL files). The runtime
processing of SCA contributions can also benefit from this work. For
example, Tuscany already lazily loads WSDL/XSD files when it needs to
resolve references by QName. We should apply the same strategy to composite
files too.

2) The search can be based on keywords, structural URIs, QName of various 
artifacts, Policy settings, etc.

3) The search capability could be potentially integrated with the management 
of the SCA domain.
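The QName indexing and lazy loading in point 1 could look roughly like this Java sketch: a cheap scan reads only the root element of each composite (with StAX) to record its QName from the targetNamespace and name attributes, and the full DOM parse is deferred until that QName is actually resolved, then cached. LazyCompositeIndex and its fields are hypothetical, file contents are passed in as strings only to keep the sketch self-contained, and it is not meant to mirror Tuscany's actual resolver code.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical index: QName -> composite location, with full loading deferred and cached. */
class LazyCompositeIndex {
    private final Map<String, String> files;                        // location -> XML text
    private final Map<QName, String> locationOf = new HashMap<>();  // composite QName -> location
    private final Map<QName, Document> loaded = new HashMap<>();    // cache of fully parsed composites
    int fullParses = 0;                                             // shows how often the expensive path ran

    LazyCompositeIndex(Map<String, String> files) {
        this.files = files;
    }

    /** Cheap pass: stop after the root start tag and record the composite's QName. */
    void scan() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        for (Map.Entry<String, String> file : files.entrySet()) {
            try {
                XMLStreamReader r = factory.createXMLStreamReader(new StringReader(file.getValue()));
                try {
                    r.nextTag(); // first start element = document root
                    String name = r.getAttributeValue(null, "name");
                    if ("composite".equals(r.getLocalName()) && name != null) {
                        String ns = r.getAttributeValue(null, "targetNamespace");
                        locationOf.put(new QName(ns == null ? "" : ns, name), file.getKey());
                    }
                } finally {
                    r.close();
                }
            } catch (XMLStreamException e) {
                throw new IllegalStateException("Cannot scan " + file.getKey(), e);
            }
        }
    }

    /** Expensive pass, only on demand: full parse of the composite, cached per QName. */
    Document resolve(QName name) {
        Document doc = loaded.get(name);
        if (doc != null) {
            return doc;
        }
        String location = locationOf.get(name);
        if (location == null) {
            return null; // unknown QName
        }
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(files.get(location))));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot load " + location, e);
        }
        fullParses++;
        loaded.put(name, doc);
        return doc;
    }

    Set<QName> knownComposites() {
        return locationOf.keySet();
    }
}
```

The same QName map could double as a search index for "find composite by QName" queries, since it is built without loading any composite fully.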

Thanks,
Raymond
--------------------------------------------------
From: "Wojtek Janiszewski" <wo...@gmail.com>
Sent: Monday, March 30, 2009 2:19 PM
To: <de...@tuscany.apache.org>
Subject: [GSoC 2009] Search in SCA domain manager web app

> Hi,
> I'm interested in taking part in Google Summer of Code, and the project
> "tuscany-scadomain-search" [1] sounds interesting to me.
>
> I've taken a quick look inside the domain manager web app and Apache Lucene and
> made a few assumptions for a start. I defined three main areas which the project
> should cover: indexing, searching and presentation. Having
> those areas separated allows us to write modular code and test it.
>
> 1. Indexing
>
> - Indexing should include all available contributions. File names as well
> as their contents (except non-readable files like Java classes) should be
> indexed. Every indexed item should have a link to its parent contribution.
>
> - After a contribution is added, updated or deleted in the domain manager web
> application, the affected items should be reindexed.
>
> - We may also consider having connections between indexed items, i.e. we
> could scan composite files to acquire children names and build reverse
> links, so every indexed item (script, Java class etc.) could have a
> connection to its composite parents.
>
> 2. Searching
>
> - The search feature would be accessible via the SCA domain manager web
> application. It should allow users to:
> -- simply search for files by name
> -- search file contents
> -- filter - search inside specified contribution or composite
>
> - Maybe we should consider extras like Ajax hints while typing a search
> phrase?
>
> - More research on Apache Lucene could provide more search ideas.
>
> 3. Presentation
>
> - Each search result should be presented with its name and a link to the
> contribution it belongs to. If it's viewable (i.e. not a Java class etc.),
> then a simple preview feature for the item should be enabled.
> Obviously, matched text should be highlighted (as Google does).
>
> - If information about the composite parents of these items is
> accessible, then those composites should also be listed.
>
>
> This quick draft is the direction I'll take while writing the proposal. It
> appears to be an interesting project, especially as it allows me to explore new
> areas (everything beyond bindings in Tuscany, Lucene). There is still plenty of
> room for improvement (like other features), so any comments are welcome.
>
> Thanks,
> Wojtek
>
> [1] - 
> http://wiki.apache.org/general/SummerOfCode2009#tuscany-scadomain-search