Posted to java-user@lucene.apache.org by hui liu <iv...@gmail.com> on 2004/09/07 22:35:35 UTC

lucene index parser problem

Hi,

I have the following problem when creating a Lucene index for many HTML files:

It shows "aborted, expected<tagname>....<tagend>" for those HTML files
which contain JavaScript. It seems it cannot parse the tags < \>.
Does anyone have a solution?

Thank you very much!

Ivy.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
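The post doesn't show the offending files, but a likely trigger is JavaScript that contains `<` or `>` characters, such as a comparison like `i < n` or a string holding markup like `'</div>'`. A strict, tag-expecting parser can read those as the start of a tag. The following is a minimal Java sketch, assuming the HTML has already been read into a String. The class `ScriptCheck` and method `suspiciousScripts` are hypothetical names, and the regex is a rough heuristic, not a real HTML parser. It reports which `<script>` blocks contain such characters, which may help narrow down the failing files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptCheck {
    // Matches <script ...> ... </script>, case-insensitively and across line breaks.
    private static final Pattern SCRIPT =
        Pattern.compile("<script\\b[^>]*>(.*?)</script\\s*>",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the bodies of script blocks that contain '<' or '>'. */
    public static List<String> suspiciousScripts(String html) {
        List<String> hits = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String body = m.group(1);
            if (body.indexOf('<') >= 0 || body.indexOf('>') >= 0) {
                hits.add(body);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hi</p>"
            + "<script>for (var i = 0; i < 3; i++) {}</script>"
            + "<script>var x = 1;</script></body></html>";
        // Only the first script block contains '<'.
        System.out.println(suspiciousScripts(page).size()); // prints 1
    }
}
```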


Re: lucene index parser problem

Posted by sergiu gordea <gs...@ifit.uni-klu.ac.at>.
Maybe you should encode the HTML code...
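It isn't spelled out what "encode" means here. One possible reading, and it is only an assumption, is to entity-encode `&`, `<` and `>` inside `<script>` bodies so the parser no longer mistakes them for markup. A minimal Java sketch of that idea; `ScriptEscaper` and `escapeScriptBodies` are hypothetical names, and the regex is a heuristic rather than a real HTML parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptEscaper {
    // Groups: 1 = opening <script ...> tag, 2 = script body, 3 = closing </script> tag.
    private static final Pattern SCRIPT =
        Pattern.compile("(<script\\b[^>]*>)(.*?)(</script\\s*>)",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Escapes &, < and > so the text can't be mistaken for markup. */
    private static String escape(String s) {
        // '&' is replaced first so the entities added afterwards aren't double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns a copy of html with every script body entity-encoded. */
    public static String escapeScriptBodies(String html) {
        Matcher m = SCRIPT.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + escape(m.group(2)) + m.group(3);
            // quoteReplacement keeps '$' and '\' in the script from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<script>if (a < b && c > d) {}</script>";
        System.out.println(escapeScriptBodies(page));
        // prints <script>if (a &lt; b &amp;&amp; c &gt; d) {}</script>
    }
}
```

This changes the script text, so it should be applied only to the copy handed to the indexer, never to the files that are actually served. Whether a given parser then accepts the file depends on that parser; the sketch does not guarantee it.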

Patrick Burleson wrote:

>Why oh why did you send this to the Tomcat lists?
>
>Don't cross-post, especially when the question doesn't even apply to
>one of the lists.
>
>Patrick
>
>On Tue, 7 Sep 2004 16:35:35 -0400, hui liu <iv...@gmail.com> wrote:
>  
>
>>Hi,
>>
>>I have the following problem when creating a Lucene index for many HTML files:
>>
>>It shows "aborted, expected<tagname>....<tagend>" for those HTML files
>>which contain JavaScript. It seems it cannot parse the tags < \>.
>>    
>>
Is < \> a valid tag? I think it should be < />.
Do you want to index the whole HTML file, or just the information in these
files?
Maybe you should use an HTML2TXT converter and then index the resulting
text.
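The HTML-to-text approach can be sketched in Java as below. This is a regex-based approximation, not any specific HTML2TXT tool: `HtmlToText` is a hypothetical name, and the sketch drops `<script>` and `<style>` blocks and comments entirely, replaces the remaining tags with spaces, and decodes only a handful of entities. Regexes can be fooled by `>` inside attribute values or by malformed markup, so a real HTML parser is the more robust choice.

```java
import java.util.regex.Pattern;

public class HtmlToText {
    // A whole <script>...</script> or <style>...</style> block; \1 requires the matching close tag.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
        "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACE = Pattern.compile("\\s+");

    /** Converts an HTML string to plain text suitable for indexing. */
    public static String convert(String html) {
        // Remove script/style blocks first so their '<' characters never reach the tag regex.
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; '&amp;' goes last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed markup.
        return SPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title>"
            + "<script>if (a < b) { document.write('<b>x</b>'); }</script></head>"
            + "<body><p>Tom &amp; Jerry</p></body></html>";
        System.out.println(convert(page)); // prints Demo Tom & Jerry
    }
}
```

The returned string, rather than the raw HTML, is what would then be added to the Lucene document's text field.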

 
All the best,

Sergiu

>>Does anyone have a solution?
>>
>>Thank you very much!
>>
>>Ivy.

Re: lucene index parser problem

Posted by Patrick Burleson <pb...@gmail.com>.
Why oh why did you send this to the Tomcat lists?

Don't cross-post, especially when the question doesn't even apply to
one of the lists.

Patrick

On Tue, 7 Sep 2004 16:35:35 -0400, hui liu <iv...@gmail.com> wrote:
> Hi,
> 
> I have the following problem when creating a Lucene index for many HTML files:
> 
> It shows "aborted, expected<tagname>....<tagend>" for those HTML files
> which contain JavaScript. It seems it cannot parse the tags < \>.
> Does anyone have a solution?
> 
> Thank you very much!
> 
> Ivy.

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org

