Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/07/26 16:39:50 UTC

[jira] [Comment Edited] (MAHOUT-1292) lucene2seq creates single document from index

    [ https://issues.apache.org/jira/browse/MAHOUT-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720838#comment-13720838 ] 

Suneel Marthi edited comment on MAHOUT-1292 at 7/26/13 2:39 PM:
----------------------------------------------------------------

Reopening this issue: there should be a way to error out (or report back to the user) if an invalid field (one that is not present in the Solr/Lucene index) has been specified. The user should not have had to go through the entire chain of lucene2seq -> seq2sparse -> rowid -> cvb0 to realize that lucene2seq had failed.
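One way to fail fast, sketched under assumptions: before writing any output, compare the requested field names against the fields the index actually contains, and abort with a message listing the missing ones. In this minimal Java sketch, the class `FieldValidator` and the method `validateFields` are hypothetical. The set of available field names is also passed in directly. A real implementation would have to read that set from the Lucene index, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical fail-fast check for lucene2seq-style field arguments.
 * ASSUMPTION: availableFields is the set of field names present in the index;
 * a real tool would have to read it from the index itself.
 */
public class FieldValidator {

    /** Throws IllegalArgumentException naming every requested field that is missing. */
    static void validateFields(Set<String> availableFields, List<String> requestedFields) {
        List<String> missing = new ArrayList<>();
        for (String field : requestedFields) {
            if (!availableFields.contains(field)) {
                missing.add(field);
            }
        }
        if (!missing.isEmpty()) {
            // Report all missing fields at once, plus what is actually available.
            throw new IllegalArgumentException(
                "Field(s) not found in index: " + missing
                + "; available fields: " + availableFields);
        }
    }

    public static void main(String[] args) {
        Set<String> available = new LinkedHashSet<>(Arrays.asList("id", "title", "body"));
        validateFields(available, Arrays.asList("id", "body")); // passes silently
        try {
            validateFields(available, Arrays.asList("id", "content"));
        } catch (IllegalArgumentException e) {
            // Prints: Field(s) not found in index: [content]; available fields: [id, title, body]
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch collects every missing field before throwing, rather than stopping at the first one. This lets the user fix all misspelled field names in a single run.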

> lucene2seq creates single document from index
> ---------------------------------------------
>
>                 Key: MAHOUT-1292
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1292
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.8
>            Reporter: Liz Merkhofer
>            Assignee: Suneel Marthi
>              Labels: cvb, lucene, solr
>             Fix For: 0.9
>
>
> lucene2seq creates only one sequence file, rather than a file for each document in the index.
> Running lucene2seq on my Solr (4.3) index produces a file with a header and, it seems, the field I specified from the index, concatenated across all the documents. After running this through seq2sparse and rowid (to prepare for cvb), the resulting matrix has only one row, though it should have one row per document.
> This issue prevents, at least, data from a Lucene index from being easily used as input for cvb. The lucene.vector command is also currently inadequate: the keys of its sequence files are LongWritable, and rowid will only convert Text keys to IntWritable, which is what cvb requires for its keys.
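The key-type constraint can be made concrete. As we understand it, rowid reads Text-keyed vectors and renumbers them as consecutive integer row ids. It also keeps a side mapping (docIndex) back to the original keys, so one input record yields exactly one matrix row.

This Hadoop-free Java sketch is illustrative only. `String`, `Integer` and `double[]` stand in for Hadoop's `Text` and `IntWritable` and Mahout's `VectorWritable`. The class and method names are hypothetical. `longKeysToText` is an assumed workaround for LongWritable keys, not an existing Mahout option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, Hadoop-free illustration of a rowid-style renumbering step.
 * ASSUMPTION: String stands in for Text, Integer for IntWritable,
 * and double[] for VectorWritable.
 */
public class RowIdSketch {

    /** Result of renumbering: row id -> vector, and row id -> original key. */
    static final class Result {
        final Map<Integer, double[]> matrix = new LinkedHashMap<>();
        final Map<Integer, String> docIndex = new LinkedHashMap<>();
    }

    /** Assigns consecutive int row ids (starting at 0 here) to string-keyed records, in input order. */
    static Result renumber(Map<String, double[]> textKeyed) {
        Result result = new Result();
        int row = 0;
        for (Map.Entry<String, double[]> e : textKeyed.entrySet()) {
            result.matrix.put(row, e.getValue());
            result.docIndex.put(row, e.getKey());
            row++;
        }
        return result;
    }

    /** Hypothetical workaround: turn long keys into string keys so they can be renumbered. */
    static Map<String, double[]> longKeysToText(Map<Long, double[]> longKeyed) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Map.Entry<Long, double[]> e : longKeyed.entrySet()) {
            out.put(Long.toString(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two documents with long keys, as lucene.vector would key them.
        Map<Long, double[]> longKeyedVectors = new LinkedHashMap<>();
        longKeyedVectors.put(17L, new double[] {1.0, 0.0});
        longKeyedVectors.put(42L, new double[] {0.0, 2.0});
        Result r = renumber(longKeysToText(longKeyedVectors));
        System.out.println(r.matrix.size()); // 2: one row per input document
        System.out.println(r.docIndex);      // {0=17, 1=42}
    }
}
```

The report describes lucene2seq producing a single concatenated record. Given that input, the same renumbering produces a one-row matrix, which matches the symptom seen after rowid.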
