Posted to java-user@lucene.apache.org by Hareesh <ha...@hotmail.com> on 2008/08/14 18:28:01 UTC

Issue while creating Regular Expressions

I have a small problem, which I will describe first. I am working on a
search engine in which the crawling is done using Heritrix, and the crawled
data is the input to my logic. When I try to index the ARC files from
Heritrix, the indexes are not created in the desired format. The expression
I am using to create the segments is   ^(.*)$\\n+   and I apply it to the
filtered contents, i.e. the text left after removing the HTML tags. I created
a pattern from this regex, but when I pass the filtered contents through the
method that matches the pattern, it returns 'false'. So I am a little
confused about what the exact regular expression should be. Please help me
with your suggestions.
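
The code that produced the 'false' result is not shown in this thread, so
the following is only a sketch of one likely cause. It assumes the pattern is
built with java.util.regex and tested with Matcher.matches(). In that case
matches() must consume the whole input, '^' and '$' only anchor at the start
and end of the input unless Pattern.MULTILINE is set, and '.' does not cross
line terminators, so any text with more than one line fails. The class name,
method names, and sample text are hypothetical, and the expression is assumed
to be written as the Java string literal "^(.*)$\\n+".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineRegexSketch {
    // Assumption: the expression from the post as a Java string literal,
    // i.e. the regex ^(.*)$ followed by one or more newline characters.
    static final String EXPRESSION = "^(.*)$\\n+";

    // Mirrors the reported symptom: matches() must match the entire input,
    // and without MULTILINE the anchors apply only to the input as a whole.
    static boolean matchesWholeInput(String text) {
        return Pattern.compile(EXPRESSION).matcher(text).matches();
    }

    // One possible alternative: enable MULTILINE so '^' and '$' anchor at
    // each line, then scan with find() and collect capture group 1.
    static List<String> extractLines(String text) {
        Matcher m = Pattern.compile(EXPRESSION, Pattern.MULTILINE).matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the filtered (tag-stripped) contents.
        String text = "first line\nsecond line\n";
        System.out.println(matchesWholeInput(text));
        System.out.println(extractLines(text));
    }
}
```

Note that because the expression requires at least one trailing newline, a
final line that does not end with a newline would not be captured by this
sketch.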

Thanks
-- 
View this message in context: http://www.nabble.com/Issue-while-creating-Regular-Expressions-tp18985052p18985052.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Issue while creating Regular Expressions

Posted by Erick Erickson <er...@gmail.com>.
What are you trying to do with the regex? And why is it
appropriate to the Lucene list? What is a segment and how
does it relate to Lucene?

It would really help if you showed us some example input
and what transformation you are trying to implement with
your regex. If it's a pure regex question you might
get more informed responses if you looked at one
of the Java forums.

Best
Erick
