
Posted to solr-user@lucene.apache.org by swamynathan <me...@gmail.com> on 2009/08/16 06:15:46 UTC

structural preserving

Hi,
I'm Swamynathan, a computer science engineering student at Jaya
Engineering College, which is under Anna University, Chennai, India.
As part of my final-year curriculum I need to do a project.
I spoke with some Solr users and programmers and found out that all
content indexed into Solr is stored as plain text and its structure is
not preserved (headings, bold, and underlined text all get the same weight).


1) ABC:
 <para>.........
..........
.........

2) XYZ:
<para>........
ABC.........
......


Now ABC gets the same weight in both, even though the first one should
rank higher because it is a heading.
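
To make the ranking point concrete, here is a toy sketch in Python. The field names `heading` and `body` and the weights are hypothetical and purely illustrative. This is not how Lucene actually scores documents (its real scoring is based on term statistics, tf-idf in the versions of this era). It only shows that once each section is split into fields, a match in the heading field can be made to count more than a match in the body. In Solr the comparable idea is querying several fields with per-field boosts, e.g. a DisMax `qf=heading^3 body`.

```python
# Hypothetical per-field weights: a term in a heading counts three times
# as much as the same term in body text. Values are illustrative only.
FIELD_WEIGHTS = {"heading": 3.0, "body": 1.0}


def field_weighted_score(doc: dict[str, str], term: str) -> float:
    """Sum, over all fields, the field weight times the number of
    whitespace-separated tokens in that field equal to `term`
    (case-insensitive)."""
    term = term.lower()
    score = 0.0
    for field, text in doc.items():
        hits = text.lower().split().count(term)
        score += FIELD_WEIGHTS.get(field, 1.0) * hits
    return score


# Section 1 has "ABC" as its heading; section 2 mentions it only in the body.
section1 = {"heading": "ABC", "body": "some paragraph text"}
section2 = {"heading": "XYZ", "body": "paragraph text that mentions ABC once"}

print(field_weighted_score(section1, "ABC"))  # 3.0
print(field_weighted_score(section2, "ABC"))  # 1.0
```

The same word now ranks section 1 above section 2 purely because of where it appears.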


I was thinking I could modify the parser, or write code that hooks into
it, to give extra weight to headings, bold, italics, etc., so that the
indexed text is stored in a data structure that holds headings, bold,
italics, etc. separately.


I would like to know whether this is feasible to finish in 5-6 months,
how I should proceed, and where I can find the core documentation.

-- 
Your caring/loving/sincere/oyoyoy [select it yourself]
Swamynathan.

Re: structural preserving

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Sun, Aug 16, 2009 at 9:45 AM, swamynathan <me...@gmail.com> wrote:

> Hi,
> I'm Swamynathan, a computer science engineering student at Jaya
> Engineering College, which is under Anna University, Chennai, India.
> As part of my final-year curriculum I need to do a project.
> I spoke with some Solr users and programmers and found out that all
> content indexed into Solr is stored as plain text and its structure is
> not preserved (headings, bold, and underlined text all get the same weight).
>

Welcome to Solr!

First, I think you do not have the right information about Solr. Solr stores
documents made up of key/value pairs (fields), some of which may be
multi-valued. If you store HTML in a field, it will be stored as text. If you
parse your HTML and store headlines, bold and italic text as separate fields,
Solr will store them separately.
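
As a rough sketch of "parse your HTML and store headlines, bold, italics as separate fields", the Python below uses the standard-library `html.parser` to collect text into one list per field. The tag-to-field mapping and the field names (`heading`, `bold`, `italic`, `underline`, `body`) are assumptions. They would have to match fields declared (as multi-valued) in your Solr schema. Sending the resulting document to Solr is not shown.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to Solr field names; the names must
# match whatever fields your own Solr schema defines.
TAG_TO_FIELD = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "b": "bold", "strong": "bold",
    "i": "italic", "em": "italic",
    "u": "underline",
}


class StructureSplitter(HTMLParser):
    """Collects text into per-field lists based on the enclosing tags."""

    def __init__(self):
        super().__init__()
        self.fields = {"body": []}  # every text fragment lands in "body"
        self.stack = []             # currently open structural tags

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_FIELD:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag (tolerates sloppy HTML).
        while tag in self.stack:
            if self.stack.pop() == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.fields["body"].append(text)
        # Also copy the fragment into each structural field it is nested in.
        for field in {TAG_TO_FIELD[t] for t in self.stack}:
            self.fields.setdefault(field, []).append(text)


def split_html(html: str) -> dict[str, list[str]]:
    """Return a multi-valued, Solr-style document built from `html`."""
    parser = StructureSplitter()
    parser.feed(html)
    parser.close()
    return parser.fields


doc = split_html("<h1>ABC</h1><p>Some <b>important</b> text about XYZ.</p>")
print(doc)
# {'body': ['ABC', 'Some', 'important', 'text about XYZ.'],
#  'heading': ['ABC'], 'bold': ['important']}
```

Copying every fragment into `body` as well keeps ordinary full-text search working, while the structural fields can be weighted separately at query time.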

>
>
> 1) ABC:
>  <para>.........
> ..........
> .........
>
> 2) XYZ:
> <para>........
> ABC.........
> ......
>
>
> Now ABC gets the same weight in both, even though the first one should
> rank higher because it is a heading.
>

You may be interested in Apache Nutch, which is a crawler/indexer. AFAIK, it
already does this kind of thing.

Apart from this, there is plenty of work to be done in Solr. We're close to
releasing 1.4, and there are lots of interesting things planned for 1.5:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313566

http://wiki.apache.org/solr/HowToContribute

-- 
Regards,
Shalin Shekhar Mangar.