Posted to dev@couchdb.apache.org by Benoit Chesneau <bc...@gmail.com> on 2011/01/10 01:32:47 UTC

rewriter needed changes

There are 2 tickets open for the rewriter:

https://issues.apache.org/jira/browse/COUCHDB-1017
https://issues.apache.org/jira/browse/COUCHDB-1005

The first one is about testing the types of values so we can eventually
encode (or decode) them from the path or query string. 1017 speaks about
strings, but it could be integers as well. This isn't possible currently.

The second is about having a more powerful rewriter. The first intention
of _rewriter was to offer a simple way to dispatch urls to a resource
(_show, _update, _list, _view, doc, attachment) based on path terms
(string, ":var", "*"). Path specifications are obtained by breaking the
url into tokens via the "/" separator, then we match them against path
terms. That's how we find urls. There is also the possibility to use
query arguments as a path term. A rewriter like this is the easiest
implementation we found, and as is the only one that obtained a consensus.
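
A minimal sketch of the token matching described above, in Python rather
than Erlang. This is not the code CouchDB ships; the function names
split_path and match_rule are hypothetical, and binding the rest of the
path under the key "*" is an assumption made for illustration.

```python
def split_path(path):
    """Break a URL path into tokens on the "/" separator."""
    return [token for token in path.split("/") if token]


def match_rule(pattern, path):
    """Return a dict of bindings if path matches pattern, else None.

    Path terms: a literal string, ":var" (binds one token),
    "*" (binds whatever is left of the path).
    """
    terms = split_path(pattern)
    tokens = split_path(path)
    bindings = {}
    for index, term in enumerate(terms):
        if term == "*":
            # "*" swallows every remaining token, including none at all.
            bindings["*"] = "/".join(tokens[index:])
            return bindings
        if index >= len(tokens):
            return None
        if term.startswith(":"):
            bindings[term[1:]] = tokens[index]
        elif term != tokens[index]:
            return None
    # Without a trailing "*", the path must not have leftover tokens.
    return bindings if len(terms) == len(tokens) else None
```

For example, match_rule("/page/:id", "/page/42") binds id to "42", while
the same rule applied to "/post/42" does not match.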

The feature asked for in 1005 needs more power than simple pattern matching.

The more people use CouchApps with CouchDB facing the web directly
(without any proxy), the more people will ask for such features.

I see 2 alternative, easy pattern matching schemes we can use to solve such problems:


1.

Put the variable between "<>" like this: <key>.
Then optionally say what the type of the variable is: <int:key> for integer.

Ex:

{
     "from": "/a/b/<key>/<int:id>",
     "to":"/c/<key>",
     "query": {
          "key": "<int:id>"
      }
}

/a/b/c/13 -> /c/c?key=13


This solves 1017 and potentially 1005.
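
A rough sketch of how such typed variables could be handled, written in
Python for brevity. Everything here is an assumption for illustration:
the CONVERTERS table, the names compile_from and rewrite, and the reading
that the query value refers to the id variable, which is what the stated
result key=13 implies.

```python
import re
from urllib.parse import urlencode

# Converters for the proposed "<type:name>" syntax: (regex fragment, cast).
CONVERTERS = {
    "str": (r"[^/]+", str),
    "int": (r"[0-9]+", int),
}

# Matches "<name>" or "<type:name>".
VAR = re.compile(r"<(?:(\w+):)?(\w+)>")


def compile_from(template):
    """Turn "/a/b/<key>/<int:id>" into a compiled regex plus a cast table."""
    casts = {}
    parts = []
    position = 0
    for found in VAR.finditer(template):
        type_name = found.group(1) or "str"
        name = found.group(2)
        fragment, cast = CONVERTERS[type_name]
        casts[name] = cast
        parts.append(re.escape(template[position:found.start()]))
        parts.append("(?P<%s>%s)" % (name, fragment))
        position = found.end()
    parts.append(re.escape(template[position:]))
    return re.compile("".join(parts)), casts


def rewrite(rule, path):
    """Apply one rule to path; return the rewritten URL, or None if no match."""
    regex, casts = compile_from(rule["from"])
    found = regex.fullmatch(path)
    if found is None:
        return None
    # Cast each captured string to the type declared in "from".
    values = {name: casts[name](text) for name, text in found.groupdict().items()}

    def substitute(template):
        return VAR.sub(lambda m: str(values[m.group(2)]), template)

    target = substitute(rule["to"])
    query = {key: substitute(value) for key, value in rule.get("query", {}).items()}
    return target + ("?" + urlencode(query) if query else "")
```

With the rule above and the query value written as "<int:id>",
rewrite(rule, "/a/b/c/13") gives "/c/c?key=13", and a non-numeric last
segment simply does not match.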

2. Use mongrel2 pattern matching:

<snip>
URL patterns always match from the start, routes are broken into
prefix and pattern part. We use the routes to find the longest
matching prefix and then tests the pattern. If the pattern matches,
then the route works. If the route doesn't have a pattern, then it's
assumed to match, and you're done.

The only caveat is you have to wrap your pattern parts in parenthesis,
but these don't mean anything other than to delimit where a pattern
starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
work.

Here's the list of characters you can use in your patterns:

. (period) All characters.
\a Letters.
\c Control characters.
\d Digits.
\l Lowercase letters.
\p Punctuation characters.
\s Space characters.
\u Uppercase letters.
\w Alphanumeric characters.
\x Hexadecimal digits.
\z The 0 character (null terminator).
[set] Just like a regex [] where set is a set of chars, like [0-9] for all digits.
[^set] Inverse character set, so [^0-9] is anything but digits.
* Longest match of 0 or more of the preceding character.
+ Longest match of 1 or more of the preceding character.
- Shortest match of 0 or more of the preceding character.
? 0 or 1 match of the preceding character.
\bxy Balanced match a substring starting with x and ending in y. So
\b() will match balanced parentheses.
$ End of the string.
Using the uppercase version of an escaped character makes it work the
opposite way (i.e., \A matches any character that isn't a letter). The
backslash can be used to escape the following character, disabling its
special abilities (i.e., \\ will match a backslash).

Anything that's not listed here is matched literally.

</snip>

This solution is really simple: it removes the useless things you have in
regexps and gives complete power to the users. Also, this kind of parsing
is relatively easy to do in Erlang.
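
To illustrate the prefix/pattern split from the snippet, here is a small
Python sketch. It is not mongrel2's code: split_route and find_route are
hypothetical names, Python's re module stands in for the mongrel2 pattern
language (the two differ, for example "-" and "\bxy"), and returning no
match when the longest prefix's pattern fails is an assumption, since the
snippet does not say whether shorter prefixes are tried next.

```python
import re


def split_route(route):
    """Split "/images/(.*.jpg)" into the prefix "/images/" and pattern ".*.jpg"."""
    start = route.find("(")
    if start == -1:
        return route, None
    # The parentheses only delimit the pattern part, so strip them.
    return route[:start], route[start + 1:route.rindex(")")]


def find_route(routes, path):
    """Pick the route with the longest matching prefix, then test its pattern."""
    matching = [route for route in routes if path.startswith(split_route(route)[0])]
    if not matching:
        return None
    best = max(matching, key=lambda route: len(split_route(route)[0]))
    prefix, pattern = split_route(best)
    if pattern is None:
        # A route without a pattern is assumed to match.
        return best
    # Patterns match from the start of whatever follows the prefix.
    rest = path[len(prefix):]
    return best if re.match(pattern, rest) else None
```

Given the routes "/", "/images/(.*.jpg)" and "/images/thumbs/", the path
"/images/cat.jpg" selects the second route and "/about" falls through to "/".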


There may be a third solution. If we use something like emonk, erlv8,
... we could have the rewriter in a JS function. But it won't happen
in the next 6 months. I'm pretty much a supporter of the second solution
though, and quite ready to start a new parser.

Any thoughts?


- benoît

Re: rewriter needed changes

Posted by Benjamin Young <be...@couchone.com>.

On 1/20/11 12:37 PM, Benoit Chesneau wrote:
> On Thu, Jan 20, 2011 at 5:27 PM, Benjamin Young<be...@couchone.com>  wrote:
>> On 1/20/11 10:45 AM, Benoit Chesneau wrote:
>>> On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische<vo...@gmail.com>
>>>   wrote:
>>>
>>>> {
>>>>   "from": "/page/:x/:y/:z",
>>>>   "to": "/_show/post/:x-:y-:z/something",
>>>>   "params": {
>>>>     "x": {
>>>>     "match": "\\d",
>>>>   },
>>>>     "y": {
>>>>     "match": "\\d",
>>>>   },
>>>>     "z": {
>>>>     "match": "\\d",
>>>>   }
>>>> }
>>>>
>>> This one is already possible in current couchapp_legacy rewriter. I'm
>>> not a fan to have something other than :
>>>
>>> patterns: {
>>>      "name1": "regexp",
>>>       ...
>>> }
>>>
>>> It will make the system really complex. Maybe as an option though. I
>>> can detect if I have an object or not. I think it would be better to
>>> say ".*"
>>>
>>>
>>>   (and i need to find a new name for couchapp_legacy)
>>>
>>>
>>> - benoīt
>> I couldn't speak to the complexity of the code as I'm fairly new to Erlang.
>> In any case, I think we need to keep the URL's in "from" and "to" easy to
>> read and provide flexible parsing options with a "standard" (or standard
>> set) available by default.
>>
>> If we support regular expressions, I'd suggest using PCRE over any other as
>> its widely known and used (Django, nginx, PHP, mod_rewrite).
>>
> Well did you read the README ? It's still using From, To. Reverse
> dispatching is still here but improved. It also offer regexp, which
> use re module in erlang based on PCRE. Readme is here:
>
> https://github.com/benoitc/couchapp_legacy
>
> It's using regexp ( or reverse url dispatching.  About regexp it"s
> using re module in erlang based on PCRE.

Yeah, sorry about that. I should have read that more fully. I had 
thought it used Mongrel's stuff--which would be fine as an additional 
option, maybe.

> Url template is a good idea. Are you speaking about:
>
> http://tools.ietf.org/html/draft-gregorio-uritemplate-04 ?

Yep. It still needs work, obviously, but the syntax has some traction 
with groups like OpenSearch:
http://bitworking.org/news/URI_Templates

> It could be implemented using new engine. Also as a side node
> couchapp_legacy is legacy as inheritance or tribute. It doesn't mean I
> want it fully compatible with current one, users can still use old
> rewrite handler. I will change its name during the night.
>
> - benoit

Re: rewriter needed changes

Posted by Benjamin Young <be...@couchone.com>.

On 1/20/11 1:05 PM, Benoit Chesneau wrote:
> On Thu, Jan 20, 2011 at 6:37 PM, Benoit Chesneau<bc...@gmail.com>  wrote:
>> On Thu, Jan 20, 2011 at 5:27 PM, Benjamin Young<be...@couchone.com>  wrote:
>>> On 1/20/11 10:45 AM, Benoit Chesneau wrote:
>>>> On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische<vo...@gmail.com>
>>>>   wrote:
>>>>
>>>>> {
>>>>>   "from": "/page/:x/:y/:z",
>>>>>   "to": "/_show/post/:x-:y-:z/something",
>>>>>   "params": {
>>>>>     "x": {
>>>>>     "match": "\\d",
>>>>>   },
>>>>>     "y": {
>>>>>     "match": "\\d",
>>>>>   },
>>>>>     "z": {
>>>>>     "match": "\\d",
>>>>>   }
>>>>> }
>>>>>
>>>> This one is already possible in current couchapp_legacy rewriter. I'm
>>>> not a fan to have something other than :
>>>>
>>>> patterns: {
>>>>      "name1": "regexp",
>>>>       ...
>>>> }
>>>>
>>>> It will make the system really complex. Maybe as an option though. I
>>>> can detect if I have an object or not. I think it would be better to
>>>> say ".*"
>>>>
>>>>
>>>>   (and i need to find a new name for couchapp_legacy)
>>>>
>>>>
>>>> - benoīt
>>> I couldn't speak to the complexity of the code as I'm fairly new to Erlang.
>>> In any case, I think we need to keep the URL's in "from" and "to" easy to
>>> read and provide flexible parsing options with a "standard" (or standard
>>> set) available by default.
>>>
>>> If we support regular expressions, I'd suggest using PCRE over any other as
>>> its widely known and used (Django, nginx, PHP, mod_rewrite).
>>>
>> Well did you read the README ? It's still using From, To. Reverse
>> dispatching is still here but improved. It also offer regexp, which
>> use re module in erlang based on PCRE. Readme is here:
>>
>> https://github.com/benoitc/couchapp_legacy
>>
>> It's using regexp ( or reverse url dispatching.  About regexp it"s
>> using re module in erlang based on PCRE.
>>
>> Url template is a good idea. Are you speaking about:
>>
>> http://tools.ietf.org/html/draft-gregorio-uritemplate-04 ?
>>
>> It could be implemented using new engine. Also as a side node
>> couchapp_legacy is legacy as inheritance or tribute. It doesn't mean I
>> want it fully compatible with current one, users can still use old
>> rewrite handler. I will change its name during the night.
>>
>> - benoit
>>
> Added query param handling as well :
>
> https://github.com/benoitc/couchapp_legacy/commit/1c9047375c394f9af6663462caa62588b254d1a6
>
> I'm trying to include support for url templating right now.
Looks like it's now here:
https://github.com/benoitc/couchapp-ng

Re: rewriter needed changes

Posted by Benoit Chesneau <bc...@gmail.com>.

On Thu, Jan 20, 2011 at 6:37 PM, Benoit Chesneau <bc...@gmail.com> wrote:
> On Thu, Jan 20, 2011 at 5:27 PM, Benjamin Young <be...@couchone.com> wrote:
>> On 1/20/11 10:45 AM, Benoit Chesneau wrote:
>>>
>>> On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische<vo...@gmail.com>
>>>  wrote:
>>>
>>>> {
>>>>  "from": "/page/:x/:y/:z",
>>>>  "to": "/_show/post/:x-:y-:z/something",
>>>>  "params": {
>>>>    "x": {
>>>>    "match": "\\d",
>>>>  },
>>>>    "y": {
>>>>    "match": "\\d",
>>>>  },
>>>>    "z": {
>>>>    "match": "\\d",
>>>>  }
>>>> }
>>>>
>>> This one is already possible in current couchapp_legacy rewriter. I'm
>>> not a fan to have something other than :
>>>
>>> patterns: {
>>>     "name1": "regexp",
>>>      ...
>>> }
>>>
>>> It will make the system really complex. Maybe as an option though. I
>>> can detect if I have an object or not. I think it would be better to
>>> say ".*"
>>>
>>>
>>>  (and i need to find a new name for couchapp_legacy)
>>>
>>>
>>> - benoīt
>>
>> I couldn't speak to the complexity of the code as I'm fairly new to Erlang.
>> In any case, I think we need to keep the URL's in "from" and "to" easy to
>> read and provide flexible parsing options with a "standard" (or standard
>> set) available by default.
>>
>> If we support regular expressions, I'd suggest using PCRE over any other as
>> its widely known and used (Django, nginx, PHP, mod_rewrite).
>>
>
> Well did you read the README ? It's still using From, To. Reverse
> dispatching is still here but improved. It also offer regexp, which
> use re module in erlang based on PCRE. Readme is here:
>
> https://github.com/benoitc/couchapp_legacy
>
> It's using regexp ( or reverse url dispatching.  About regexp it"s
> using re module in erlang based on PCRE.
>
> Url template is a good idea. Are you speaking about:
>
> http://tools.ietf.org/html/draft-gregorio-uritemplate-04 ?
>
> It could be implemented using new engine. Also as a side node
> couchapp_legacy is legacy as inheritance or tribute. It doesn't mean I
> want it fully compatible with current one, users can still use old
> rewrite handler. I will change its name during the night.
>
> - benoit
>

Added query param handling as well:

https://github.com/benoitc/couchapp_legacy/commit/1c9047375c394f9af6663462caa62588b254d1a6

I'm trying to include support for url templating right now.

Re: rewriter needed changes

Posted by Benoit Chesneau <bc...@gmail.com>.

On Thu, Jan 20, 2011 at 5:27 PM, Benjamin Young <be...@couchone.com> wrote:
> On 1/20/11 10:45 AM, Benoit Chesneau wrote:
>>
>> On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische<vo...@gmail.com>
>>  wrote:
>>
>>> {
>>>  "from": "/page/:x/:y/:z",
>>>  "to": "/_show/post/:x-:y-:z/something",
>>>  "params": {
>>>    "x": {
>>>    "match": "\\d",
>>>  },
>>>    "y": {
>>>    "match": "\\d",
>>>  },
>>>    "z": {
>>>    "match": "\\d",
>>>  }
>>> }
>>>
>> This one is already possible in current couchapp_legacy rewriter. I'm
>> not a fan to have something other than :
>>
>> patterns: {
>>     "name1": "regexp",
>>      ...
>> }
>>
>> It will make the system really complex. Maybe as an option though. I
>> can detect if I have an object or not. I think it would be better to
>> say ".*"
>>
>>
>>  (and i need to find a new name for couchapp_legacy)
>>
>>
>> - benoīt
>
> I couldn't speak to the complexity of the code as I'm fairly new to Erlang.
> In any case, I think we need to keep the URL's in "from" and "to" easy to
> read and provide flexible parsing options with a "standard" (or standard
> set) available by default.
>
> If we support regular expressions, I'd suggest using PCRE over any other as
> its widely known and used (Django, nginx, PHP, mod_rewrite).
>

Well, did you read the README? It's still using From, To. Reverse
dispatching is still here but improved. It also offers regexps, which
use the re module in Erlang, based on PCRE. The README is here:

https://github.com/benoitc/couchapp_legacy

It's using regexp (or reverse url dispatching). About regexps, it's
using the re module in Erlang, based on PCRE.

Url template is a good idea. Are you speaking about:

http://tools.ietf.org/html/draft-gregorio-uritemplate-04 ?

It could be implemented using the new engine. Also, as a side note,
couchapp_legacy is "legacy" as in inheritance or tribute. It doesn't mean
I want it fully compatible with the current one; users can still use the
old rewrite handler. I will change its name during the night.

- benoit

Re: rewriter needed changes

Posted by Benjamin Young <be...@couchone.com>.

On 1/20/11 10:45 AM, Benoit Chesneau wrote:
> On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische<vo...@gmail.com>  wrote:
>
>> {
>>   "from": "/page/:x/:y/:z",
>>   "to": "/_show/post/:x-:y-:z/something",
>>   "params": {
>>     "x": {
>>     "match": "\\d",
>>   },
>>     "y": {
>>     "match": "\\d",
>>   },
>>     "z": {
>>     "match": "\\d",
>>   }
>> }
>>
> This one is already possible in current couchapp_legacy rewriter. I'm
> not a fan to have something other than :
>
> patterns: {
>      "name1": "regexp",
>       ...
> }
>
> It will make the system really complex. Maybe as an option though. I
> can detect if I have an object or not. I think it would be better to
> say ".*"
>
>
>   (and i need to find a new name for couchapp_legacy)
>
>
> - benoît
I couldn't speak to the complexity of the code as I'm fairly new to
Erlang. In any case, I think we need to keep the URLs in "from" and
"to" easy to read and provide flexible parsing options with a "standard"
(or standard set) available by default.

If we support regular expressions, I'd suggest using PCRE over any other
as it's widely known and used (Django, nginx, PHP, mod_rewrite).

Re: rewriter needed changes

Posted by Benoit Chesneau <bc...@gmail.com>.

On Thu, Jan 20, 2011 at 4:40 PM, Volker Mische <vo...@gmail.com> wrote:

>
> {
>  "from": "/page/:x/:y/:z",
>  "to": "/_show/post/:x-:y-:z/something",
>  "params": {
>    "x": {
>    "match": "\\d",
>  },
>    "y": {
>    "match": "\\d",
>  },
>    "z": {
>    "match": "\\d",
>  }
> }
>

This one is already possible in the current couchapp_legacy rewriter. I'm
not a fan of having something other than:

patterns: {
    "name1": "regexp",
     ...
}

It will make the system really complex. Maybe as an option though. I
can detect if I have an object or not. I think it would be better to
say ".*"
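
Detecting "an object or not" could look roughly like the Python sketch
below. It is only an illustration: normalize_patterns is a hypothetical
name, and falling back to ".*" when an object has no "match" key is an
assumption, not something couchapp_legacy is known to do.

```python
def normalize_patterns(patterns, default=".*"):
    """Map each variable name to a regexp string, whichever form was used."""
    normalized = {}
    for name, spec in patterns.items():
        if isinstance(spec, dict):
            # Object form: {"match": "..."}; a missing "match" uses the default.
            normalized[name] = spec.get("match", default)
        else:
            # Flat form: the value is already the regexp.
            normalized[name] = spec
    return normalized
```

Both {"x": "\\d"} and {"x": {"match": "\\d"}} then reduce to the same
flat mapping before any rule is compiled.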

 (and I need to find a new name for couchapp_legacy)

- benoît

Re: rewriter needed changes

Posted by Benjamin Young <be...@couchone.com>.

On 1/20/11 10:40 AM, Volker Mische wrote:
> On 20.01.2011 16:29, Benjamin Young wrote:
>> On 1/18/11 5:47 PM, Benoit Chesneau wrote:
>>> On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bc...@gmail.com>
>>> wrote:
>>>> There are 2 tickets open for the rewriter :
>>>>
>>>> https://issues.apache.org/jira/browse/COUCHDB-1017
>>>> https://issues.apache.org/jira/browse/COUCHDB-1005
>>>>
>>>> First one is about testing types of value to eventually encode them
>>>> (or decode) from the path or query string. 1017 speak about strings
>>>> but it could be integer as well. This isn't possible actually.
>>>>
>>>> Second is to have a more enhanced rewriter. First intention of
>>>> _rewriter was to offer a simple way to dispatch urls to a resource
>>>> (_show, _update, _list, _view, doc, attachment) based on path terms
>>>> (string, ':var", "*"). Path specifications are obtained by breaking
>>>> url into tokens via the "/" separator, Then we match them against path
>>>> terms. That's how we find urls. There is also the possibility to use
>>>> query arguments as a path term. A rewriter like this is the easier
>>>> implementation we found, and as is the only that obtained a consensus.
>>>>
>>>> The feature asked in 1005 need more power than simple pattern 
>>>> matching.
>>>>
>>>> The more people will use CouchApps with CouchDB facing directly to the
>>>> web (without any proxy), the more people will ask for such features.
>>>>
>>>> I see 2 alternatives and easy pattern matching we can use to solve
>>>> such problem:
>>>>
>>>>
>>>> 1.
>>>>
>>>> Put var between "<>" like this<key>,
>>>> Then eventually say what is the type of the variable :<int:key> for
>>>> integer.
>>>>
>>>> Ex:
>>>>
>>>> {
>>>> "from": "/a/b/<key>/<int:id>",
>>>> "to":"/c/<key>",
>>>> "query": {
>>>> "key": "<int:key>"
>>>> }
>>>> }
>>>>
>>>> /a/b/c/13 -> /c/c?key=13
>>>>
>>>>
>>>> This solve 1017 and potentially 1005 .
>>>>
>>>> 2. Use mongrel2 pattern matching:
>>>>
>>>> <snip>
>>>> URL patterns always match from the start, routes are broken into
>>>> prefix and pattern part. We uses the routes to find the longest
>>>> matching prefix and then tests the pattern. If the pattern matches,
>>>> then the route works. If the route doesn't have a pattern, then it's
>>>> assumed to match, and you're done.
>>>>
>>>> The only caveat is you have to wrap your pattern parts in parenthesis,
>>>> but these don't mean anything other than to delimit where a pattern
>>>> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
>>>> work.
>>>>
>>>> Here's the list of characters you can use in your patterns:
>>>>
>>>> . (period) All characters.
>>>> \a Letters.
>>>> \c Control characters.
>>>> \d Digits.
>>>> \l Lowercase letters.
>>>> \p Punctuation characters.
>>>> \s Space characters.
>>>> \u Uppercase letters.
>>>> \w Alphanumeric characters.
>>>> \x Hexadecimal digits.
>>>> \z The 0 character (null terminator).
>>>> [set] Just like a regex [] where is a set of chars, like [0-9] for
>>>> all digits.
>>>> [^set] Inverse character set, so [^0-9] is anything but digits.
>>>> * Longest match of 0 or more of the preceding character.
>>>> + Longest match of 1 or more of the preceding character.
>>>> - Shortest match of 0 or more of the preceding character.
>>>> ? 0 or 1 match of of the preceding character
>>>> \bxy Balanced match a substring starting with x and ending in y. So
>>>> \b() will match balanced parentheses.
>>>> $ End of the string.
>>>> Using the uppercase version of an escaped character makes it work the
>>>> opposite way (i.e., \A matches any character that isn't a letter). The
>>>> backslash can be used to escape the following character, disabling its
>>>> special abilities (i.e., \\ will match a backslash).
>>>>
>>>> Anything that's not listed here is matched literally.
>>>>
>>>> </snip>
>>>>
>>>> This solution is really simple, remove the useless things you have in
>>>> regexp and give complete power to the users. Also this kind of parsing
>>>> is relatively easy to do in erlang.
>>>>
>>>>
>>>> There may be a third solution. If we use something like emonk, erlv8,
>>>> ... we could have the rewriter in a js function. But it won't happend
>>>> in next 6 months . I'm pretty supporter of the second solution though,
>>>> and quite ready to start a new parser.
>>>>
>>>> Any thoughts ?
>>>>
>>>>
>>>> - benoît
>>>>
>>> Since then I started couchapp_legacy :
>>>
>>> https://github.com/benoitc/couchapp_legacy
>>>
>>> It embed a new rewriter doing both reversed and regexp based
>>> dispatching with some other features like :
>>>
>>> - Resource handlers plugin system, actually a rewriter and a proxy
>>> handler.
>>> - Route caching: rules are build only on first access or when the
>>> design doc is changed.
>>>
>>> TODO:
>>> - variable transformations : string -> int for ex
>>>
>>>
>>> There will be other features in couchapp_legacy plugin (current name)
>>> soon. Hope it helps to push the conversation further.
>>>
>>> - benoit
>> Benoit,
>>
>> Thanks for starting this conversation! :) I'd played with building a
>> RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's no
>> where near production ready. It's great to see someone else has an
>> interest in this piece of the puzzle as well.
>>
>> In the legacy couchapp there's a route that uses an options section to
>> define patterns. It seems like a promising direction for extending the
>> rewriter. I'd like to propose we build something like this:
>>
>> {
>> "method":"GET",
>> "from": "/page/:page",
>> "to": "/_show/post/:page",
>> "params": {
>> "page": {
>> "match": "\\w*",
>> "type": "string"
>> }
>> }
>> }
>>
>> If the parameter appears in the params section, we should use it's
>> "match" rather than that standard (.*) pattern. "type" in that section
>> would refer to the output type. Variables would continue to be
>> represented with the colon notation to keep the URL space clean (vs.
>> using RegEx in the URL as I'd planned to do).
>>
>> One other helpful addition might be an "engine" option to set the
>> matching system to use. I'd prefer using PCRE, you've mentioned Mongrel,
>> someone else might want grep. :)
>>
>> Thanks for starting this discussion, Benoit. I look forward to your
>> thoughts.
>>
>> Later,
>> Benjamin
>
> Benjamin,
>
> this is a quite simple example. Should the rewriter still be based on 
> path, i.e. on slashes as separator (as it currently is), or would also 
> things like this be possible:
>
> {
>   "from": "/page/:x/:y/:z",
>   "to": "/_show/post/:x-:y-:z/something",
>   "params": {
>     "x": {
>     "match": "\\d",
>   },
>     "y": {
>     "match": "\\d",
>   },
>     "z": {
>     "match": "\\d",
>   }
> }
>
> Cheers,
>   Volker
We definitely need to open up URL construction beyond just slashes. We
may want to consider using a non-reserved character for our variable
names as well. URI Templates use {var} and past URI-related RFCs have
used <var> around non-path/query related pieces to denote them as samples.

Re: rewriter needed changes

Posted by Volker Mische <vo...@gmail.com>.

On 20.01.2011 16:29, Benjamin Young wrote:
> On 1/18/11 5:47 PM, Benoit Chesneau wrote:
>> On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bc...@gmail.com>
>> wrote:
>>> There are 2 tickets open for the rewriter :
>>>
>>> https://issues.apache.org/jira/browse/COUCHDB-1017
>>> https://issues.apache.org/jira/browse/COUCHDB-1005
>>>
>>> First one is about testing types of value to eventually encode them
>>> (or decode) from the path or query string. 1017 speak about strings
>>> but it could be integer as well. This isn't possible actually.
>>>
>>> Second is to have a more enhanced rewriter. First intention of
>>> _rewriter was to offer a simple way to dispatch urls to a resource
>>> (_show, _update, _list, _view, doc, attachment) based on path terms
>>> (string, ':var", "*"). Path specifications are obtained by breaking
>>> url into tokens via the "/" separator, Then we match them against path
>>> terms. That's how we find urls. There is also the possibility to use
>>> query arguments as a path term. A rewriter like this is the easier
>>> implementation we found, and as is the only that obtained a consensus.
>>>
>>> The feature asked in 1005 need more power than simple pattern matching.
>>>
>>> The more people will use CouchApps with CouchDB facing directly to the
>>> web (without any proxy), the more people will ask for such features.
>>>
>>> I see 2 alternatives and easy pattern matching we can use to solve
>>> such problem:
>>>
>>>
>>> 1.
>>>
>>> Put var between "<>" like this<key>,
>>> Then eventually say what is the type of the variable :<int:key> for
>>> integer.
>>>
>>> Ex:
>>>
>>> {
>>> "from": "/a/b/<key>/<int:id>",
>>> "to":"/c/<key>",
>>> "query": {
>>> "key": "<int:key>"
>>> }
>>> }
>>>
>>> /a/b/c/13 -> /c/c?key=13
>>>
>>>
>>> This solve 1017 and potentially 1005 .
>>>
>>> 2. Use mongrel2 pattern matching:
>>>
>>> <snip>
>>> URL patterns always match from the start, routes are broken into
>>> prefix and pattern part. We uses the routes to find the longest
>>> matching prefix and then tests the pattern. If the pattern matches,
>>> then the route works. If the route doesn't have a pattern, then it's
>>> assumed to match, and you're done.
>>>
>>> The only caveat is you have to wrap your pattern parts in parenthesis,
>>> but these don't mean anything other than to delimit where a pattern
>>> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
>>> work.
>>>
>>> Here's the list of characters you can use in your patterns:
>>>
>>> . (period) All characters.
>>> \a Letters.
>>> \c Control characters.
>>> \d Digits.
>>> \l Lowercase letters.
>>> \p Punctuation characters.
>>> \s Space characters.
>>> \u Uppercase letters.
>>> \w Alphanumeric characters.
>>> \x Hexadecimal digits.
>>> \z The 0 character (null terminator).
>>> [set] Just like a regex [] where is a set of chars, like [0-9] for
>>> all digits.
>>> [^set] Inverse character set, so [^0-9] is anything but digits.
>>> * Longest match of 0 or more of the preceding character.
>>> + Longest match of 1 or more of the preceding character.
>>> - Shortest match of 0 or more of the preceding character.
>>> ? 0 or 1 match of of the preceding character
>>> \bxy Balanced match a substring starting with x and ending in y. So
>>> \b() will match balanced parentheses.
>>> $ End of the string.
>>> Using the uppercase version of an escaped character makes it work the
>>> opposite way (i.e., \A matches any character that isn't a letter). The
>>> backslash can be used to escape the following character, disabling its
>>> special abilities (i.e., \\ will match a backslash).
>>>
>>> Anything that's not listed here is matched literally.
>>>
>>> </snip>
>>>
>>> This solution is really simple, remove the useless things you have in
>>> regexp and give complete power to the users. Also this kind of parsing
>>> is relatively easy to do in erlang.
>>>
>>>
>>> There may be a third solution. If we use something like emonk, erlv8,
>>> ... we could have the rewriter in a js function. But it won't happend
>>> in next 6 months . I'm pretty supporter of the second solution though,
>>> and quite ready to start a new parser.
>>>
>>> Any thoughts ?
>>>
>>>
>>> - benoît
>>>
>> Since then I started couchapp_legacy :
>>
>> https://github.com/benoitc/couchapp_legacy
>>
>> It embed a new rewriter doing both reversed and regexp based
>> dispatching with some other features like :
>>
>> - Resource handlers plugin system, actually a rewriter and a proxy
>> handler.
>> - Route caching: rules are build only on first access or when the
>> design doc is changed.
>>
>> TODO:
>> - variable transformations : string -> int for ex
>>
>>
>> There will be other features in couchapp_legacy plugin (current name)
>> soon. Hope it helps to push the conversation further.
>>
>> - benoit
> Benoit,
>
> Thanks for starting this conversation! :) I'd played with building a
> RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's no
> where near production ready. It's great to see someone else has an
> interest in this piece of the puzzle as well.
>
> In the legacy couchapp there's a route that uses an options section to
> define patterns. It seems like a promising direction for extending the
> rewriter. I'd like to propose we build something like this:
>
> {
> "method":"GET",
> "from": "/page/:page",
> "to": "/_show/post/:page",
> "params": {
> "page": {
> "match": "\\w*",
> "type": "string"
> }
> }
> }
>
> If the parameter appears in the params section, we should use it's
> "match" rather than that standard (.*) pattern. "type" in that section
> would refer to the output type. Variables would continue to be
> represented with the colon notation to keep the URL space clean (vs.
> using RegEx in the URL as I'd planned to do).
>
> One other helpful addition might be an "engine" option to set the
> matching system to use. I'd prefer using PCRE, you've mentioned Mongrel,
> someone else might want grep. :)
>
> Thanks for starting this discussion, Benoit. I look forward to your
> thoughts.
>
> Later,
> Benjamin

Benjamin,

This is quite a simple example. Should the rewriter still be based on
the path, i.e. on slashes as separators (as it currently is), or would
things like this also be possible:

{
   "from": "/page/:x/:y/:z",
   "to": "/_show/post/:x-:y-:z/something",
   "params": {
     "x": {
       "match": "\\d"
     },
     "y": {
       "match": "\\d"
     },
     "z": {
       "match": "\\d"
     }
   }
}
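
One way to read such a rule, sketched in Python under stated assumptions:
apply_rule is a hypothetical name, the ".*" default is taken from the
quoted proposal, and variables are recognised as a colon followed by word
characters, so ":x-:y-:z" in "to" needs no slash between them.

```python
import re

# A variable is a colon followed by word characters, e.g. ":x".
VARIABLE = re.compile(r":(\w+)")


def apply_rule(rule, path):
    """Match path against rule["from"] and fill rule["to"]; None if no match."""
    params = rule.get("params", {})
    template = rule["from"]
    parts = []
    position = 0
    for found in VARIABLE.finditer(template):
        name = found.group(1)
        # Use the variable's "match" if given, else the standard ".*".
        pattern = params.get(name, {}).get("match", ".*")
        parts.append(re.escape(template[position:found.start()]))
        parts.append("(?P<%s>%s)" % (name, pattern))
        position = found.end()
    parts.append(re.escape(template[position:]))
    matched = re.fullmatch("".join(parts), path)
    if matched is None:
        return None
    # Substitute each ":name" in "to" with the captured text.
    return VARIABLE.sub(lambda m: matched.group(m.group(1)), rule["to"])
```

With the rule above, "/page/1/2/3" becomes "/_show/post/1-2-3/something".
Since "\\d" matches exactly one digit, "/page/10/2/3" is rejected.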

Cheers,
   Volker

Re: rewriter needed changes

Posted by Benjamin Young <be...@couchone.com>.

On 1/18/11 5:47 PM, Benoit Chesneau wrote:
> On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bc...@gmail.com>  wrote:
>> There are 2 tickets open for the rewriter :
>>
>> https://issues.apache.org/jira/browse/COUCHDB-1017
>> https://issues.apache.org/jira/browse/COUCHDB-1005
>>
>> First one is about testing types of value to eventually encode them
>> (or decode) from the path or query string. 1017 speak about strings
>> but it could be integer as well. This isn't possible actually.
>>
>> Second is to have a more enhanced rewriter.  First intention of
>> _rewriter was to offer a simple way to dispatch urls to a resource
>> (_show, _update, _list, _view, doc, attachment) based on path terms
>> (string, ':var", "*"). Path specifications are obtained by breaking
>> url into tokens via the "/" separator, Then we match them against path
>> terms. That's how we find urls. There is also the possibility to use
>> query arguments as a path term.  A rewriter like this is the easier
>> implementation we found, and as is the only that obtained a consensus.
>>
>> The feature asked in 1005 need more power than simple pattern matching.
>>
>> The more people will use CouchApps with CouchDB facing directly to the
>> web (without any proxy), the more people will ask for such features.
>>
>> I see 2 alternatives and easy pattern matching we can use to solve such problem:
>>
>>
>> 1.
>>
>> Put var between "<>" like this<key>,
>> Then eventually say what is the type of the variable :<int:key>  for integer.
>>
>> Ex:
>>
>> {
>>      "from": "/a/b/<key>/<int:id>",
>>      "to":"/c/<key>",
>>      "query": {
>>           "key": "<int:key>"
>>       }
>> }
>>
>> /a/b/c/13 ->  /c/c?key=13
>>
>>
>> This solve 1017 and potentially 1005 .
>>
>> 2. Use mongrel2 pattern matching:
>>
>> <snip>
>> URL patterns always match from the start, routes are broken into
>> prefix and pattern part. We uses the routes to find the longest
>> matching prefix and then tests the pattern. If the pattern matches,
>> then the route works. If the route doesn't have a pattern, then it's
>> assumed to match, and you're done.
>>
>> The only caveat is you have to wrap your pattern parts in parenthesis,
>> but these don't mean anything other than to delimit where a pattern
>> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
>> work.
>>
>> Here's the list of characters you can use in your patterns:
>>
>> . (period) All characters.
>> \a Letters.
>> \c Control characters.
>> \d Digits.
>> \l Lowercase letters.
>> \p Punctuation characters.
>> \s Space characters.
>> \u Uppercase letters.
>> \w Alphanumeric characters.
>> \x Hexadecimal digits.
>> \z The 0 character (null terminator).
>> [set] Just like a regex [] where is a set of chars, like [0-9] for all digits.
>> [^set] Inverse character set, so [^0-9] is anything but digits.
>> * Longest match of 0 or more of the preceding character.
>> + Longest match of 1 or more of the preceding character.
>> - Shortest match of 0 or more of the preceding character.
>> ? 0 or 1 match of of the preceding character
>> \bxy Balanced match a substring starting with x and ending in y. So
>> \b() will match balanced parentheses.
>> $ End of the string.
>> Using the uppercase version of an escaped character makes it work the
>> opposite way (i.e., \A matches any character that isn't a letter). The
>> backslash can be used to escape the following character, disabling its
>> special abilities (i.e., \\ will match a backslash).
>>
>> Anything that's not listed here is matched literally.
>>
>> </snip>
>>
>> This solution is really simple, remove the useless things you have in
>> regexp and give complete power to the users. Also this kind of parsing
>> is relatively easy to do in erlang.
>>
>>
>> There may be a third solution. If we use something like emonk, erlv8,
>> ... we could have the rewriter in a js function. But it won't happend
>> in next 6 months . I'm pretty supporter of the second solution though,
>> and quite ready to start a new parser.
>>
>> Any thoughts ?
>>
>>
>> - benoît
>>
> Since then I started couchapp_legacy :
>
> https://github.com/benoitc/couchapp_legacy
>
> It embed a new rewriter doing both reversed  and regexp based
> dispatching with some other features like :
>
> - Resource handlers plugin system, actually a rewriter and a proxy handler.
> - Route caching: rules are build only on first access or when the
> design doc is changed.
>
> TODO:
> - variable transformations : string ->  int for ex
>
>
> There will be other features in couchapp_legacy plugin (current name)
> soon. Hope it helps to push the conversation further.
>
> - benoit
Benoit,

Thanks for starting this conversation! :) I'd played with building a 
RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's
nowhere near production-ready. It's great to see someone else has an
interest in this piece of the puzzle as well.

In the legacy couchapp there's a route that uses an options section to 
define patterns. It seems like a promising direction for extending the 
rewriter. I'd like to propose we build something like this:

{
     "method":"GET",
     "from": "/page/:page",
     "to": "/_show/post/:page",
     "params": {
         "page": {
             "match": "\\w*",
             "type": "string"
         }
     }
}

If the parameter appears in the params section, we should use its
"match" rather than the standard (.*) pattern. "type" in that section
would refer to the output type. Variables would continue to be 
represented with the colon notation to keep the URL space clean (vs. 
using RegEx in the URL as I'd planned to do).
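
To make the proposal concrete, here is a minimal Python sketch of how such
a rule could be compiled and applied. It is only an illustration, not the
CouchDB implementation: the function names, the CASTS table and the
DEFAULT_MATCH pattern are assumptions, and the "method" field is ignored.

```python
import re

# Assumption: stands in for the "standard" pattern, limited to one path segment.
DEFAULT_MATCH = r"[^/]+"
# Assumption: the set of supported output types is hypothetical.
CASTS = {"string": str, "int": int}

def compile_rule(rule):
    """Build a regex from rule["from"], one named group per :var token."""
    params = rule.get("params", {})
    parts = []
    for token in rule["from"].strip("/").split("/"):
        if token.startswith(":"):
            name = token[1:]
            # Use the parameter's own "match" when the params section has one.
            match = params.get(name, {}).get("match", DEFAULT_MATCH)
            parts.append("(?P<%s>%s)" % (name, match))
        else:
            parts.append(re.escape(token))
    return re.compile("^/" + "/".join(parts) + "$")

def rewrite(rule, path):
    """Return (target, typed values) for a matching path, else None."""
    m = compile_rule(rule).match(path)
    if m is None:
        return None
    params = rule.get("params", {})
    values = {}
    for name, raw in m.groupdict().items():
        # "type" is the output type of the variable; default to string.
        cast = CASTS[params.get(name, {}).get("type", "string")]
        values[name] = cast(raw)
    # Substitute :var tokens in "to" with the matched text.
    target = "/".join(
        m.group(tok[1:]) if tok.startswith(":") else tok
        for tok in rule["to"].split("/")
    )
    return target, values

rule = {
    "method": "GET",
    "from": "/page/:page",
    "to": "/_show/post/:page",
    "params": {"page": {"match": r"\w*", "type": "string"}},
}
print(rewrite(rule, "/page/hello_world"))
# -> ('/_show/post/hello_world', {'page': 'hello_world'})
```

A path such as /page/hello-world would not match here, because "-" is
outside \w and the compiled pattern is anchored at both ends.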

One other helpful addition might be an "engine" option to set the 
matching system to use. I'd prefer using PCRE, you've mentioned Mongrel, 
someone else might want grep. :)

Thanks for starting this discussion, Benoit. I look forward to your 
thoughts.

Later,
Benjamin

Re: rewriter needed changes

Posted by Benoit Chesneau <bc...@gmail.com>.

On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau <bc...@gmail.com> wrote:
> There are 2 tickets open for the rewriter :
>
> https://issues.apache.org/jira/browse/COUCHDB-1017
> https://issues.apache.org/jira/browse/COUCHDB-1005
>
> First one is about testing types of value to eventually encode them
> (or decode) from the path or query string. 1017 speak about strings
> but it could be integer as well. This isn't possible actually.
>
> Second is to have a more enhanced rewriter.  First intention of
> _rewriter was to offer a simple way to dispatch urls to a resource
> (_show, _update, _list, _view, doc, attachment) based on path terms
> (string, ':var", "*"). Path specifications are obtained by breaking
> url into tokens via the "/" separator, Then we match them against path
> terms. That's how we find urls. There is also the possibility to use
> query arguments as a path term.  A rewriter like this is the easier
> implementation we found, and as is the only that obtained a consensus.
>
> The feature asked in 1005 need more power than simple pattern matching.
>
> The more people will use CouchApps with CouchDB facing directly to the
> web (without any proxy), the more people will ask for such features.
>
> I see 2 alternatives and easy pattern matching we can use to solve such problem:
>
>
> 1.
>
> Put var between "<>" like this <key>,
> Then eventually say what is the type of the variable : <int:key> for integer.
>
> Ex:
>
> {
>     "from": "/a/b/<key>/<int:id>",
>     "to":"/c/<key>",
>     "query": {
>          "key": "<int:key>"
>      }
> }
>
> /a/b/c/13 -> /c/c?key=13
>
>
> This solve 1017 and potentially 1005 .
>
> 2. Use mongrel2 pattern matching:
>
> <snip>
> URL patterns always match from the start, routes are broken into
> prefix and pattern part. We uses the routes to find the longest
> matching prefix and then tests the pattern. If the pattern matches,
> then the route works. If the route doesn't have a pattern, then it's
> assumed to match, and you're done.
>
> The only caveat is you have to wrap your pattern parts in parenthesis,
> but these don't mean anything other than to delimit where a pattern
> starts. So instead of /images/.⋆.jpg, write /images/(.⋆.jpg) for it to
> work.
>
> Here's the list of characters you can use in your patterns:
>
> . (period) All characters.
> \a Letters.
> \c Control characters.
> \d Digits.
> \l Lowercase letters.
> \p Punctuation characters.
> \s Space characters.
> \u Uppercase letters.
> \w Alphanumeric characters.
> \x Hexadecimal digits.
> \z The 0 character (null terminator).
> [set] Just like a regex [] where is a set of chars, like [0-9] for all digits.
> [^set] Inverse character set, so [^0-9] is anything but digits.
> ⋆ Longest match of 0 or more of the preceding character.
> + Longest match of 1 or more of the preceding character.
> - Shortest match of 0 or more of the preceding character.
> ? 0 or 1 match of of the preceding character
> \bxy Balanced match a substring starting with x and ending in y. So
> \b() will match balanced parentheses.
> $ End of the string.
> Using the uppercase version of an escaped character makes it work the
> opposite way (i.e., \A matches any character that isn't a letter). The
> backslash can be used to escape the following character, disabling its
> special abilities (i.e., \\ will match a backslash).
>
> Anything that's not listed here is matched literally.
>
> </snip>
>
> This solution is really simple, remove the useless things you have in
> regexp and give complete power to the users. Also this kind of parsing
> is relatively easy to do in erlang.
>
>
> There may be a third solution. If we use something like emonk, erlv8,
> ... we could have the rewriter in a js function. But it won't happend
> in next 6 months . I'm pretty supporter of the second solution though,
> and quite ready to start a new parser.
>
> Any thoughts ?
>
>
> - benoît
>

Since then I started couchapp_legacy :

https://github.com/benoitc/couchapp_legacy

It embeds a new rewriter that does both reverse and regexp-based
dispatching, along with some other features:

- Resource handlers plugin system, currently a rewriter and a proxy handler.
- Route caching: rules are built only on first access or when the
design doc is changed.
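
The route caching idea can be sketched roughly as follows, in Python for
brevity. This is not the couchapp_legacy code: the RouteCache class and
the compile_rules callback are hypothetical, and using the design doc's
_rev to detect a change is an assumption.

```python
class RouteCache:
    """Compile a design doc's rules once, and again only after it changes."""

    def __init__(self, compile_rules):
        self._compile = compile_rules  # hypothetical: raw rules -> compiled rules
        self._cache = {}               # design doc id -> (rev, compiled rules)

    def get(self, ddoc):
        ddoc_id, rev = ddoc["_id"], ddoc["_rev"]
        cached = self._cache.get(ddoc_id)
        if cached is not None and cached[0] == rev:
            return cached[1]           # same revision: reuse the compiled rules
        # First access, or the design doc has a new revision: rebuild.
        compiled = self._compile(ddoc.get("rewrites", []))
        self._cache[ddoc_id] = (rev, compiled)
        return compiled
```

With this shape, the cost of building the rules is paid on the first
request after a design doc update rather than on every request.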

TODO:
- variable transformations, e.g. string -> int


There will be other features in the couchapp_legacy plugin (its current
name) soon. I hope it helps push the conversation further.

- benoit

Re: rewriter needed changes

Posted by Benoit Chesneau <bc...@gmail.com>.

On Mon, Jan 10, 2011 at 12:03 PM, Volker Mische <vo...@gmail.com> wrote:

>
> I'm +1 for a more powerful rewriter. Though I haven't quite understood how
> those mongrel2 style rewrites will actually look like. I understand how to
> match a pattern, but how is it rewritten after that?
>

Each pattern becomes a variable you can reuse in the rewritten URL.
Following a discussion on IRC, I started an implementation as a CouchDB
module, so it will be easier to understand. Hopefully I will have
something to show on Wednesday.
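
As a rough illustration of the idea, in Python rather than Erlang: each
parenthesised pattern is captured and can be referenced from the target.
Python's re syntax stands in for the mongrel2 patterns here, and the $1,
$2 placeholder syntax, the function name and the target paths are all
assumptions, not the final design.

```python
import re

def rewrite_with_captures(pattern, template, path):
    """Match path from its start; fill $N in template with the Nth capture."""
    m = re.match(pattern, path)  # re.match only matches from the start
    if m is None:
        return None
    # Replace every $N placeholder with the text captured by group N.
    return re.sub(r"\$(\d+)", lambda ref: m.group(int(ref.group(1))), template)

print(rewrite_with_captures("/images/(.*.jpg)", "/_show/image/$1", "/images/cat.jpg"))
# -> /_show/image/cat.jpg
```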

- benoît

Re: rewriter needed changes

Posted by Volker Mische <vo...@gmail.com>.

Hi,

On 10.01.2011 01:32, Benoit Chesneau wrote:
> 2. Use mongrel2 pattern matching:
>
> <snip>
> URL patterns always match from the start, routes are broken into
> prefix and pattern part. We uses the routes to find the longest
> matching prefix and then tests the pattern. If the pattern matches,
> then the route works. If the route doesn't have a pattern, then it's
> assumed to match, and you're done.
>
> The only caveat is you have to wrap your pattern parts in parenthesis,
> but these don't mean anything other than to delimit where a pattern
> starts. So instead of /images/.⋆.jpg, write /images/(.⋆.jpg) for it to
> work.
>
> [...]
> </snip>
>
> This solution is really simple, remove the useless things you have in
> regexp and give complete power to the users. Also this kind of parsing
> is relatively easy to do in erlang.
>
> [...]
>
> Any thoughts ?
>
>
> - benoît

I'm +1 for a more powerful rewriter, though I haven't quite understood
what those mongrel2-style rewrites will actually look like. I understand
how to match a pattern, but how is it rewritten after that?

Cheers,
   Volker