Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/05/10 21:09:15 UTC

[jira] [Updated] (NUTCH-1570) Add filtering capability to Datastore Queries

     [ https://issues.apache.org/jira/browse/NUTCH-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1570:
----------------------------------------

    Description: 
For some time this issue has been discussed on various lists.
When doing the upgrade of the Gora dependencies in NUTCH-1569, I stumbled across a comment within o.a.n.api.DbReader#iterator:

{code}
  public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
      String batchId) throws Exception {
    Query<String,WebPage> q = store.newQuery();
    String[] qFields = fields;
    if (fields != null) {
      HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
      // remove "url"
      flds.remove("url");
      if (flds.size() > 0) {
        qFields = flds.toArray(new String[flds.size()]);
      } else {
        qFields = null;
      }
    }
    q.setFields(qFields);
    if (startKey != null) {
      q.setStartKey(startKey);
      if (endKey != null) {
        q.setEndKey(endKey);
      }
    }
    Result<String,WebPage> res = store.execute(q);
    // XXX we should add the filtering capability to Query
    return new DbIterator(res, fields, batchId);
  }
{code} 

I will link this issue to something over on Gora once we get around to the implementation.
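
To illustrate what the XXX comment is asking for, here is a minimal sketch of the client-side alternative. It assumes (the snippet above does not confirm this) that the batchId check currently happens while iterating over the results returned by store.execute(q), since batchId is handed to DbIterator and not to the Query. FilteringIterator, rowsForBatch and the "batchId" row key are hypothetical names used for illustration only; they are not Nutch or Gora APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical helper, not part of Nutch or Gora: wraps any iterator and
// yields only the elements accepted by the given predicate.
class FilteringIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private final Predicate<T> filter;
  private T pending;          // next matching element, if one has been found
  private boolean hasPending; // true while 'pending' holds an unconsumed element

  FilteringIterator(Iterator<T> source, Predicate<T> filter) {
    this.source = source;
    this.filter = filter;
  }

  @Override
  public boolean hasNext() {
    // Scan forward lazily until a match is found or the source is exhausted.
    while (!hasPending && source.hasNext()) {
      T candidate = source.next();
      if (filter.test(candidate)) {
        pending = candidate;
        hasPending = true;
      }
    }
    return hasPending;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = pending;
    pending = null;
    hasPending = false;
    return result;
  }

  // Assumption: each row is a map carrying a "batchId" entry.
  // A null batchId argument means "accept every row".
  static List<Map<String, Object>> rowsForBatch(
      Iterator<Map<String, Object>> rows, String batchId) {
    Predicate<Map<String, Object>> matches =
        row -> batchId == null || batchId.equals(row.get("batchId"));
    Iterator<Map<String, Object>> it =
        new FilteringIterator<Map<String, Object>>(rows, matches);
    List<Map<String, Object>> out = new ArrayList<Map<String, Object>>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

With this approach every row in the key range is still read from the datastore and discarded on the client if it does not match. Moving the predicate into the Query itself would let the backend skip non-matching rows before they are returned, which is the motivation for this issue.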

  was:
For some time this issue has been discussed on various lists.
When doing the upgrade of the Gora dependencies in NUTCH-1569, I  stumbled across a comment within o.a.n.api.DbReader#Iterator

{code}
  public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
      String batchId) throws Exception {
    Query<String,WebPage> q = store.newQuery();
    String[] qFields = fields;
    if (fields != null) {
      HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
      // remove "url"
      flds.remove("url");
      if (flds.size() > 0) {
        qFields = flds.toArray(new String[flds.size()]);
      } else {
        qFields = null;
      }
    }
    q.setFields(qFields);
    if (startKey != null) {
      q.setStartKey(startKey);
      if (endKey != null) {
        q.setEndKey(endKey);
      }
    }
    Result<String,WebPage> res = store.execute(q);
    * // XXX we should add the filtering capability to Query *
    return new DbIterator(res, fields, batchId);
  }
{code} 

I will link this issue to something over on Gora once we get around to the implementation.

    
> Add filtering capability to Datastore Queries
> ---------------------------------------------
>
>                 Key: NUTCH-1570
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1570
>             Project: Nutch
>          Issue Type: Bug
>          Components: storage
>    Affects Versions: 2.2
>            Reporter: Lewis John McGibbney
>             Fix For: 2.3
>
>
> For some time this issue has been discussed on various lists.
> When doing the upgrade of the Gora dependencies in NUTCH-1569, I stumbled across a comment within o.a.n.api.DbReader#iterator:
> {code}
>   public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
>       String batchId) throws Exception {
>     Query<String,WebPage> q = store.newQuery();
>     String[] qFields = fields;
>     if (fields != null) {
>       HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
>       // remove "url"
>       flds.remove("url");
>       if (flds.size() > 0) {
>         qFields = flds.toArray(new String[flds.size()]);
>       } else {
>         qFields = null;
>       }
>     }
>     q.setFields(qFields);
>     if (startKey != null) {
>       q.setStartKey(startKey);
>       if (endKey != null) {
>         q.setEndKey(endKey);
>       }
>     }
>     Result<String,WebPage> res = store.execute(q);
>     // XXX we should add the filtering capability to Query
>     return new DbIterator(res, fields, batchId);
>   }
> {code} 
> I will link this issue to something over on Gora once we get around to the implementation.
