Posted to dev@lucene.apache.org by Uwe Schindler <uw...@thetaphi.de> on 2009/06/15 17:55:19 UTC

Core JDK 1.4 compatible.

By the way:
I compiled core and the corresponding tests with an old JDK 1.4 version that
I found locally on my machine. Works fine!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler (JIRA) [mailto:jira@apache.org]
> Sent: Monday, June 15, 2009 5:48 PM
> To: java-dev@lucene.apache.org
> Subject: [jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable
> regex)
> 
> 
>     [ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719606#action_12719606 ]
> 
> Uwe Schindler commented on LUCENE-1606:
> ---------------------------------------
> 
> Doesn't seem to work; I will check the sources:
> 
> {code}
> compile-core:
>     [javac] Compiling 12 source files to C:\Projects\lucene\trunk\build\contrib\regex\classes\java
>     [javac] C:\Projects\lucene\trunk\contrib\regex\src\java\org\apache\lucene\search\regex\AutomatonFuzzyQuery.java:11: cannot access dk.brics.automaton.Automaton
>     [javac] bad class file: C:\Projects\lucene\trunk\contrib\regex\lib\automaton.jar(dk/brics/automaton/Automaton.class)
>     [javac] class file has wrong version 49.0, should be 48.0
>     [javac] Please remove or make sure it appears in the correct subdirectory of the classpath.
>     [javac] import dk.brics.automaton.Automaton;
>     [javac]                           ^
>     [javac] 1 error
> {code}
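
The javac message above says that the class files in automaton.jar are
stamped with major version 49 (Java 5), while a JDK 1.4 compiler accepts at
most 48. A minimal sketch for checking the stamp of an extracted .class file
follows. The class name ClassFileVersion and its helper methods are
hypothetical and are not part of Lucene or BRICS; only the class file header
layout (magic number, minor version, major version) is taken as given.

```java
import java.nio.ByteBuffer;

/** Reads the version stamp from the first eight bytes of a class file. */
final class ClassFileVersion {

    /** Returns the major version, e.g. 48 or 49 as in the javac message. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (classBytes.length < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version as unsigned 16-bit
    }

    /** Maps the major versions relevant to this thread to a JDK release. */
    static String jdkFor(int major) {
        switch (major) {
            case 48: return "1.4";
            case 49: return "5.0";
            case 50: return "6";
            default: return "other (major " + major + ")";
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a .class file extracted from the jar (assumed usage)
        byte[] bytes = java.nio.file.Files.readAllBytes(
                java.nio.file.Paths.get(args[0]));
        int major = majorVersion(bytes);
        System.out.println("major " + major + " -> JDK " + jdkFor(major));
    }
}
```

Running main against Automaton.class from the jar named in the log would be
one way to confirm what the compiler reports.
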
> 
> > Automaton Query/Filter (scalable regex)
> > ---------------------------------------
> >
> >                 Key: LUCENE-1606
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1606
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: contrib/*
> >            Reporter: Robert Muir
> >            Assignee: Uwe Schindler
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: automaton.patch, automatonMultiQuery.patch,
> >         automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
> >         automatonWithWildCard.patch, automatonWithWildCard2.patch,
> >         LUCENE-1606.patch
> >
> >
> > Attached is a patch for an AutomatonQuery/Filter (the name can change if
> > it's not suitable).
> > Whereas the out-of-box contrib RegexQuery is nice, I have some very large
> > indexes (100M+ unique tokens) where queries are quite slow (2 minutes,
> > etc.). Additionally, all of the existing RegexQuery implementations in
> > Lucene are really slow if there is no constant prefix. This implementation
> > does not depend upon a constant prefix, and runs the same query in 640ms.
> > Some use cases I envision:
> >  1. lexicography/etc. on large text corpora
> >  2. looking for things such as URLs where the prefix is not constant
> >     (http:// or ftp://)
> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
> > convert regular expressions into a DFA. Then, the filter "enumerates"
> > terms in a special way, by using the underlying state machine. Here is my
> > short description from the comments:
> >      The algorithm here is pretty basic. Enumerate terms, but instead of
> >      a binary accept/reject do:
> >
> >      1. Look at the portion that is OK (did not enter a reject state in
> >         the DFA)
> >      2. Generate the next possible String and seek to that.
> > The Query simply wraps the filter with ConstantScoreQuery.
> > I did not include the automaton.jar inside the patch, but it can be
> > downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
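
The two numbered steps in the description above can be sketched roughly as
follows. This is not the code from the patch: it assumes a hand-built toy DFA
and uses a TreeSet as a stand-in for the sorted term dictionary, so neither
the BRICS nor the Lucene API appears. The class DfaTermSeek, its transition
layout, and the sample pattern (ab|cd)e are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy DFA-guided term enumeration over a sorted set of terms. */
final class DfaTermSeek {
    // transitions.get(state) maps an input char to the next state; 0 is the start state.
    final List<TreeMap<Character, Integer>> transitions =
            new ArrayList<TreeMap<Character, Integer>>();
    final boolean[] accept;
    int examined = 0; // number of terms actually run through the DFA

    DfaTermSeek(int states, int... acceptStates) {
        for (int i = 0; i < states; i++) {
            transitions.add(new TreeMap<Character, Integer>());
        }
        accept = new boolean[states];
        for (int s : acceptStates) accept[s] = true;
    }

    DfaTermSeek add(int from, char c, int to) {
        transitions.get(from).put(c, to);
        return this;
    }

    List<String> matches(TreeSet<String> terms) {
        List<String> hits = new ArrayList<String>();
        String term = terms.isEmpty() ? null : terms.first();
        while (term != null) {
            examined++;
            // Step 1: find the portion that is OK, recording the state after each char.
            int[] path = new int[term.length() + 1];
            int ok = 0;
            while (ok < term.length()) {
                Integer next = transitions.get(path[ok]).get(term.charAt(ok));
                if (next == null) break; // no transition: the rest of the term is rejected
                ok++;
                path[ok] = next;
            }
            if (ok == term.length() && accept[path[ok]]) {
                hits.add(term);
                term = terms.higher(term);
                continue;
            }
            // Step 2: build the smallest string after `term` that the DFA can
            // still follow, backing up through the OK portion if necessary.
            String target = null;
            for (int j = ok; j >= 0 && target == null; j--) {
                TreeMap<Character, Integer> out = transitions.get(path[j]);
                Character c = j < term.length()
                        ? out.higherKey(term.charAt(j))
                        : (out.isEmpty() ? null : out.firstKey());
                if (c != null) target = term.substring(0, j) + c;
            }
            // Seek: jump to the first term at or after the target, or stop.
            term = target == null ? null : terms.ceiling(target);
        }
        return hits;
    }

    public static void main(String[] args) {
        // DFA for (ab|cd)e: state 4 is the only accepting state.
        DfaTermSeek dfa = new DfaTermSeek(5, 4)
                .add(0, 'a', 1).add(0, 'c', 2)
                .add(1, 'b', 3).add(2, 'd', 3).add(3, 'e', 4);
        TreeSet<String> terms = new TreeSet<String>(
                Arrays.asList("abe", "abx", "aby", "cab", "cde", "cdf", "zzz"));
        System.out.println(dfa.matches(terms) + ", examined " + dfa.examined);
    }
}
```

With the sample data in main, the sketch returns abe and cde after examining
five of the seven terms: aby and zzz are never visited because the seek in
step 2 jumps past them. On a real index the saving would depend on the
pattern and on how the terms are distributed.
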
> 
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org





Re: Core JDK 1.4 compatible.

Posted by Shai Erera <se...@gmail.com>.
It would help if we had a target date; then I'd know how many more X's I
need to mark on the calendar :)

On Mon, Jun 15, 2009 at 6:56 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> :)
>
> But those days are numbered!
>
> Mike
>
> On Mon, Jun 15, 2009 at 11:55 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> > By the way:
> > I compiled core and corresponding tests with an old JDK 1.4 version, I found
> > locally on my machine. Works fine!
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >

Re: Core JDK 1.4 compatible.

Posted by Michael McCandless <lu...@mikemccandless.com>.
:)

But those days are numbered!

Mike

On Mon, Jun 15, 2009 at 11:55 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> By the way:
> I compiled core and corresponding tests with an old JDK 1.4 version, I found
> locally on my machine. Works fine!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>