Posted to dev@lucene.apache.org by "Erik Hatcher (JIRA)" <ji...@apache.org> on 2015/05/19 18:29:01 UTC

[jira] [Assigned] (SOLR-7057) SimplePostTool curbside appeal

     [ https://issues.apache.org/jira/browse/SOLR-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher reassigned SOLR-7057:
----------------------------------

    Assignee: Erik Hatcher

> SimplePostTool curbside appeal
> ------------------------------
>
>                 Key: SOLR-7057
>                 URL: https://issues.apache.org/jira/browse/SOLR-7057
>             Project: Solr
>          Issue Type: Improvement
>          Components: SimplePostTool
>            Reporter: Timothy Potter
>            Assignee: Erik Hatcher
>            Priority: Minor
>
> When trying to index some Freebase articles, such as:
> http://maven.tamingtext.com/freebase-wex-2011-01-18-articles-first10k.tsv
> using the SimplePostTool (bin/post), I ran into a few minor things along the way that would help new users trying to get their content indexed.
> First, I tried the naive approach:
> {code}
> $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.tsv 
> {code}
> Didn't work ... here's the output:
> {code}
> SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
> 1 files indexed.
> {code}
> Ummm ... no, that 1 file was not indexed ;-) Instead, the output should be something like:
> {code}
> SimplePostTool: WARNING: Skipping freebase-wex-2011-01-18-articles-first10k.tsv. Unsupported file type for auto mode.
> 0 of 1 files indexed.
> {code}
> Besides the misleading output, shouldn't tsv be a supported file type for auto-mode? It's a common enough format ...
> So I renamed the file to .csv instead and re-ran ... this time I get:
> {code}
> $ mv freebase-wex-2011-01-18-articles-first10k.tsv freebase-wex-2011-01-18-articles-first10k.csv
> $ bin/post -c freebase ./freebase-wex-2011-01-18-articles-first10k.csv
> ERROR - 2015-01-28 16:24:16.074; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: CSVLoader: input=null, line=1,expected 108 values but got 4
> {code}
> Hmmm ... OK ... did a little Googling and discovered I needed to specify the separator as %09, the URL-encoded tab character (again, the tool should just recognize TSV as a supported format):
> {code}
> bin/post -c freebase -params "separator=%09&escape=\\" ./freebase-wex-2011-01-18-articles-first10k.csv
> {code}
> Success! (of course I had to add a header line to the file too, but there's little we can do about that)
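
The two requests in the description (count skipped files honestly, and treat .tsv as a supported type with the tab separator applied automatically) can be illustrated with a minimal sketch. This is not SimplePostTool's code: the names EXTRA_PARAMS, extra_params_for and summarize are hypothetical, and the extension-to-parameter mapping is an assumption built from the values the reporter passed by hand with -params.

```python
import os

# Hypothetical mapping from file extension to extra CSV-loader parameters.
# "separator=%09" (URL-encoded tab) and "escape=\" are the values the reporter
# passed manually; wiring them to ".tsv" automatically is an assumption.
EXTRA_PARAMS = {
    ".csv": "",
    ".tsv": "separator=%09&escape=\\",
}


def extra_params_for(path):
    """Return the extra parameters for a supported file, or None if unsupported."""
    ext = os.path.splitext(path)[1].lower()
    return EXTRA_PARAMS.get(ext)


def summarize(paths):
    """Build an 'N of M files indexed.' line that does not count skipped files.

    A file counts as handled here only if its extension is supported; a real
    tool would also need to check that the post itself succeeded.
    """
    posted = sum(1 for p in paths if extra_params_for(p) is not None)
    return f"{posted} of {len(paths)} files indexed."
```

With this sketch, a single unsupported file yields "0 of 1 files indexed." rather than "1 files indexed.", and a .tsv file picks up the tab separator without the user having to rename it to .csv.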


