Posted to solr-dev@lucene.apache.org by "Michael Henson (JIRA)" <ji...@apache.org> on 2010/02/23 00:08:27 UTC

[jira] Created: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

CachedSqlEntityProcessor pre-warmed cache use case
--------------------------------------------------

Key: SOLR-1787
URL: https://issues.apache.org/jira/browse/SOLR-1787
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 1.5
Environment: jdk 1.6.x, windows xp, tomcat 6.x
Reporter: Michael Henson
Priority: Minor
Fix For: 1.5

The CachedSqlEntityProcessor currently builds its cache from the rows it sees as it goes, so later requests for the same key can be served from data that has already been fetched. The primary query could be written to fetch all possible rows, which would then be placed in the cache on the first request for a row. In that case the database would receive another query only when there is a cache miss. However, the query executed on a miss is the same one that pulls all rows, which negates any performance gain.

This patch adds the ability to configure behavior on cache miss with the "onCacheMiss" attribute on an "entity" tag in the data-config.xml file. The current behavior remains the default and corresponds to the setting onCacheMiss="fill". Any other value explicitly given for onCacheMiss will cause cache misses to be ignored: no query will be made to the database to fulfill them.

I've encountered two cases where this capability is useful:

1. Relatively small datasets, such as category id -> category name mappings, which will not change during the course of indexing.
2. Queries which are heavy on db resources per-query, particularly if the query for an individual record is slow, and can't be fixed easily on the db side for whatever reason.
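
The miss handling described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in solr-1787.patch: the class names, the category lookup, and the value "ignore" are hypothetical (the description only says that any value other than "fill" causes misses to be ignored).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of the onCacheMiss idea; not taken from the patch.
class OnCacheMissSketch {

    /** A pre-warmed cache of category id -> category name. */
    static class CategoryCache {
        private final Map<Integer, String> cache = new HashMap<>();
        private final Function<Integer, String> dbLookup; // stand-in for a database query
        private final boolean fillOnMiss;
        private int dbQueries = 0;

        CategoryCache(Map<Integer, String> preloaded,
                      Function<Integer, String> dbLookup,
                      String onCacheMiss) {
            this.cache.putAll(preloaded);
            this.dbLookup = dbLookup;
            // Attribute absent or "fill": keep the current behavior.
            // Any other explicit value: ignore cache misses.
            this.fillOnMiss = onCacheMiss == null || "fill".equals(onCacheMiss);
        }

        String get(int id) {
            if (cache.containsKey(id)) {
                return cache.get(id);       // cache hit, no query
            }
            if (!fillOnMiss) {
                return null;                // miss ignored, no query
            }
            dbQueries++;                    // miss filled from the database
            String name = dbLookup.apply(id);
            cache.put(id, name);
            return name;
        }

        int dbQueries() {
            return dbQueries;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> preloaded = new HashMap<>();
        preloaded.put(1, "books");
        Function<Integer, String> db = id -> "category-" + id;

        CategoryCache fill = new CategoryCache(preloaded, db, "fill");
        CategoryCache ignore = new CategoryCache(preloaded, db, "ignore");

        System.out.println("fill:   " + fill.get(2) + ", queries=" + fill.dbQueries());
        System.out.println("ignore: " + ignore.get(2) + ", queries=" + ignore.dbQueries());
    }
}
```

With a fully pre-loaded cache, the second mode never goes back to the database, which is the point of both use cases listed above.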

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837251#action_12837251 ] 

Noble Paul commented on SOLR-1787:
----------------------------------

Is pre-warming done in this patch?

[jira] Updated: (SOLR-1787) Add ability to configure behavior of cache miss to CachedSqlEntityProcessor

Posted by "Michael Henson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Henson updated SOLR-1787:
---------------------------------

    Summary: Add ability to configure behavior of cache miss to CachedSqlEntityProcessor  (was: CachedSqlEntityProcessor pre-warmed cache use case)

[jira] Updated: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Michael Henson (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Henson updated SOLR-1787:
---------------------------------

    Attachment: solr-1787.patch

First draft patch. Fairly straightforward.

[jira] Commented: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Michael Henson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837424#action_12837424 ] 

Michael Henson commented on SOLR-1787:
--------------------------------------

There isn't any new code to make a pre-warming call. The design relies on filling the cache on the first call to getAllNonCachedRows().

I'm also relying on the synchronized block of ThreadedEntityProcessorWrapper.nextRow() to block other threads while the cache-filling query runs. It looks like I'm missing some aspect of multi-threading, because I get SQLExceptions with "could not execute query" when I have threads="3" on the root entity.
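
As a rough sketch of the locking assumption above: if the first call fills the cache inside a synchronized method, concurrent callers wait for the fill instead of issuing their own query. The class below is a hypothetical stand-in and does not reproduce ThreadedEntityProcessorWrapper or CachedSqlEntityProcessor; fetchAllRows() simulates the single query that pulls every row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in; not the DataImportHandler classes.
class SynchronizedFillSketch {
    private final Map<Integer, String> cache = new HashMap<>();
    private final AtomicInteger fillQueries = new AtomicInteger();
    private boolean filled = false;

    // Simulates the one query that fetches all rows.
    private Map<Integer, String> fetchAllRows() {
        fillQueries.incrementAndGet();
        Map<Integer, String> rows = new HashMap<>();
        rows.put(1, "books");
        rows.put(2, "music");
        return rows;
    }

    // Only one thread at a time can be in here, so the fill runs once
    // and every other caller sees the already-filled cache.
    synchronized String nextRow(int key) {
        if (!filled) {
            cache.putAll(fetchAllRows());
            filled = true;
        }
        return cache.get(key);
    }

    // Starts the given number of threads against one shared instance and
    // returns how many fill queries were issued in total.
    static int runWithThreads(int threadCount) {
        SynchronizedFillSketch sketch = new SynchronizedFillSketch();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> sketch.nextRow(1));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.fillQueries.get();
    }

    public static void main(String[] args) {
        System.out.println("fill queries with 3 threads: " + runWithThreads(3));
    }
}
```

The guarantee holds only if every thread goes through the same lock on the same instance; whether that is true of the actual wrapper is exactly the open question in this comment.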

[jira] Assigned: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul reassigned SOLR-1787:
--------------------------------

    Assignee: Noble Paul

[jira] Commented: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838208#action_12838208 ] 

Noble Paul commented on SOLR-1787:
----------------------------------

You can change the title of the issue so that it reflects the actual requirement.

[jira] Commented: (SOLR-1787) CachedSqlEntityProcessor pre-warmed cache use case

Posted by "Michael Henson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838052#action_12838052 ] 

Michael Henson commented on SOLR-1787:
--------------------------------------

The SQLException problem looks like it comes from my desktop machine's inability to maintain a connection to our database rather than from the code itself.
