Posted to dev@lucene.apache.org by "Eungsop Yoo (JIRA)" <ji...@apache.org> on 2016/10/04 02:44:20 UTC

[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias

    [ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544140#comment-15544140 ] 

Eungsop Yoo commented on SOLR-9562:
-----------------------------------

I backported this patch to my own cluster, which runs Solr 4.10.3-cdh5.4.9.
Without this patch, a query over the last 30 minutes against an alias covering 14 days of collections took over 20 seconds; with the patch it takes only 3 seconds.

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can create a new collection regularly (hourly, daily, etc.) behind a write alias, and create a read alias that covers all of those collections. But every collection in the read alias is queried even when we search over a very narrow time window. In that case the matching docs may be stored in only a small portion of the collections, so we don't need to query all of them.
> I suggest this patch so that a read alias minimizes the collections it queries. Three parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The name of the time field for the time series data. It should be of a date type. |
> | dateTimeFormat | string | No | | The format of the timestamp used at collection creation. Every collection name should have a suffix (starting with "_") in this format. 
> Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927
> See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
> | timeZone | string | No | | The time zone for the dateTimeFormat parameter.
> Ex. GMT+9. 
> See [SimpleDateFormat|https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html]. |
> Then, when we query with a filter query such as "timeField:\[fromTime TO toTime\]", only the collections that hold docs for the given time range are queried.
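
To make the idea in the description concrete, here is a rough sketch of how collections could be picked from their name suffixes. It is not the code from the attached patches. The class name AliasCollectionFilter and the method select are hypothetical, and it assumes that the collection list is sorted by time, that each collection holds docs from its own suffix timestamp up to the next collection's timestamp, and that the newest collection is open-ended.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper, not part of Solr or of the SOLR-9562 patches.
final class AliasCollectionFilter {

    /**
     * Returns the collections whose assumed time window overlaps [from, to].
     * Assumes "collections" is sorted by suffix timestamp, oldest first.
     */
    static List<String> select(List<String> collections, String dateTimeFormat,
                               String timeZone, Date from, Date to) {
        SimpleDateFormat fmt = new SimpleDateFormat(dateTimeFormat);
        fmt.setTimeZone(TimeZone.getTimeZone(timeZone));
        fmt.setLenient(false);

        // The start of each collection's window is parsed from the text
        // after the last "_" in its name, e.g. col_20160927 -> 20160927.
        List<Date> starts = new ArrayList<>();
        for (String name : collections) {
            String suffix = name.substring(name.lastIndexOf('_') + 1);
            try {
                starts.add(fmt.parse(suffix));
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad suffix in " + name, e);
            }
        }

        List<String> selected = new ArrayList<>();
        for (int i = 0; i < collections.size(); i++) {
            Date start = starts.get(i);
            // Assumption: a window ends where the next collection begins;
            // the last collection has no end.
            Date end = (i + 1 < starts.size()) ? starts.get(i + 1) : null;
            boolean startsBeforeRangeEnds = !start.after(to);
            boolean endsAfterRangeStarts = (end == null) || end.after(from);
            if (startsBeforeRangeEnds && endsAfterRangeStarts) {
                selected.add(collections.get(i));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        ZoneOffset kst = ZoneOffset.ofHours(9);
        Date from = Date.from(OffsetDateTime.of(2016, 9, 26, 10, 0, 0, 0, kst).toInstant());
        Date to = Date.from(OffsetDateTime.of(2016, 9, 26, 10, 30, 0, 0, kst).toInstant());
        List<String> all = Arrays.asList("col_20160925", "col_20160926", "col_20160927");
        // A 30-minute range on 2016-09-26 only needs the daily collection for that day.
        System.out.println(select(all, "yyyyMMdd", "GMT+9", from, to));
    }
}
```

With daily collections and a 30-minute filter range, only one of the three collections is selected here, which is the kind of reduction the description is after.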



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
