You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafficserver.apache.org by "Manjesh Nilange (JIRA)" <ji...@apache.org> on 2009/12/15 19:41:18 UTC

[jira] Created: (TS-87) Performance improvement: Avoid linear search in remap code

Performance improvement: Avoid linear search in remap code
----------------------------------------------------------

                 Key: TS-87
                 URL: https://issues.apache.org/jira/browse/TS-87
             Project: Traffic Server
          Issue Type: Improvement
          Components: Core
            Reporter: Manjesh Nilange
            Assignee: Manjesh Nilange
            Priority: Minor


Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Manjesh Nilange (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799983#action_12799983 ] 

Manjesh Nilange commented on TS-87:
-----------------------------------

I should've added this in the patch description: The patch adds 3 new files (Trie.h, UrlMappingPathIndex.h, UrlMappingPathIndex.cc) and deletes two files (UmsHelper.h, UmsHelper.cc).


> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Manjesh Nilange
>            Priority: Minor
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Manjesh Nilange (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834102#action_12834102 ] 

Manjesh Nilange commented on TS-87:
-----------------------------------

Leif, I tested the patch with a similar remap.config for around 3 hrs with 10M requests with 100 concurrent clients and didn't see a crash. The aspect in which my remap.config differed is the "to field" of the lines. I run an Apache on my box and use that as the OS and I used Apache's URL with all caching turned off. I don't know if turning caching off made the difference. Can you either try with no caching or give me a core dump or something? 

Btw, the crash that we followed up in Y! was triggered by regex mappings. So that shouldn't have anything to do with this.


> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on TS-87 started by Leif Hedstrom.

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom reassigned TS-87:
-------------------------------

    Assignee: Leif Hedstrom  (was: Manjesh Nilange)

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom updated TS-87:
----------------------------

    Fix Version/s: 2.0.0a

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Manjesh Nilange
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leif Hedstrom resolved TS-87.
-----------------------------

    Resolution: Fixed

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Manjesh Nilange (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manjesh Nilange updated TS-87:
------------------------------

    Attachment: trie.patch

Implementation of proposed change

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Manjesh Nilange
>            Priority: Minor
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834256#action_12834256 ] 

Leif Hedstrom commented on TS-87:
---------------------------------

I ran more tests without this patch, given enough time, ATS built from trunk will crash anyways ... So, this is a red herring right now.

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833229#action_12833229 ] 

Leif Hedstrom commented on TS-87:
---------------------------------

This is the remap.config I used, the records.config is pretty much default ATS records.config, with port changed to 8080, and 16MB RAM cache (and a disk cache in storage.config, of course).

ap http://loki.ogre.com/ycs1  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs2  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs3  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs4  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs5  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs6  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs7  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs8  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs9  http://l.yimg.com/a/lib/ycs
map http://loki.ogre.com/ycs  http://l.yimg.com/a/lib/ycs

The test just hits these URLs, running for about an hour crashes on my box. Note that this is 100% cache hits.


> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834122#action_12834122 ] 

Leif Hedstrom commented on TS-87:
---------------------------------

I ran it with 1000 connections. But, I can try to reproduce it again, I'll work on it tomorrow. It reproduced every time for me when I tried, after about 1h running at 20,000 QPS.

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Manjesh Nilange (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799980#action_12799980 ] 

Manjesh Nilange commented on TS-87:
-----------------------------------

Proposed change:

Design: 

We create a trie for all the mappings for a given host, specific to a <url_type, port> combination. The <url_type, port> combination is required so that mappings for different purposes exist separately. Each node of this trie will have a pointer to the mapping and it's rank (if one exists) and a array of child node pointers.

Insert: We first hash the host field of the incoming request. We use the trie collection for that host, if it exists or else create a new one. We then find/create a trie for the <url_type, port> combination of the incoming request and then store the mapping in the trie. 

Search: We first find hash the host field and find the trie collection specific to that host. We then use the <url_type, port> of the incoming request to find a trie within that collection. We then traverse that trie using the incoming request's path. As we traverse, we retain the best-ranked mapping we encounter. This ensures both these properties:

1) incoming path /abcd should match configured path /a - this works because we encounter the mapping for /a in the earlier the traversal itself

2) if mappings are configured for both /a and /abcd and /abcd has a better rank (using line number for now), the mapping for /abcd will be returned as it'll win over the first mapping of /a owing to rank.

In the hash table, we store a collection of tries for every host string. 



New classes:

Trie - A template class that implements a trie; It's a template because it can be instantiated to store anything in the nodes. Right now it's only instantiation is to store url_mapping pointers.

UrlMappingPathIndex - Serves as an abstraction for the trie collection. This class hides away the <url_type, port> differentiating and the actual Trie. UrlRewrite.cc just stores UrlMappingPathIndex objects in the hash table and is not aware of any other details.


> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Manjesh Nilange
>            Priority: Minor
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TS-87) Performance improvement: Avoid linear search in remap code

Posted by "Leif Hedstrom (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TS-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832768#action_12832768 ] 

Leif Hedstrom commented on TS-87:
---------------------------------

While testing, I discovered TS with this patch segfaults. Y! development team is working on a fix I think, so holding off on the merging of thise for now.

> Performance improvement: Avoid linear search in remap code
> ----------------------------------------------------------
>
>                 Key: TS-87
>                 URL: https://issues.apache.org/jira/browse/TS-87
>             Project: Traffic Server
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Manjesh Nilange
>            Assignee: Leif Hedstrom
>            Priority: Minor
>             Fix For: 2.0.0a
>
>         Attachments: trie.patch
>
>
> Currently, the remap code stores all the remap rules in a hash table keyed by the host field of the "from URL" of the rule. Entries with the same host field are chained in a linked list. When looking for a mapping, the code has to do a linear traversal of this list, based on the path of the incoming request to find a matching rule. This performance should be improved as overhead can be substantial in cases where there are hundreds of the entries for the same host.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.