Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2016/07/28 09:46:20 UTC

[jira] [Commented] (SOLR-9347) Solr post tool - ignore file name extension when -type is provided

    [ https://issues.apache.org/jira/browse/SOLR-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397328#comment-15397328 ] 

Jan Høydahl commented on SOLR-9347:
-----------------------------------

You are giving the tool a directory as its argument, so by default it scans that directory and all subdirectories for files matching the filetypes pattern.

I assume you want to force the tool to treat every file it finds as tsv, even if the file has no extension. The problem is that there will always be users who run such a command on a folder containing lots of other files, causing unexpected behavior. The tool also does not try to guess file types from file content, so the file extension is the only thing it can go by.

For now I think your best bet is to call the tool once for every file, and use some bash scripting to select the files you need.
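
A rough sketch of that per-file approach follows. It assumes the tool honors {{-type}} when it is handed a single file explicitly, which I have not verified here. {{POST_CMD}}, {{COLLECTION}} and {{post_suffixless_files}} are illustrative names, not part of Solr; the {{-c}}, {{-params}} and {{-type}} arguments are taken from the command in the issue description.

```shell
#!/bin/sh
# Sketch only: post every file that has no suffix, one tool invocation per file.
# POST_CMD and COLLECTION are hypothetical variables; adjust them to your setup.
POST_CMD="${POST_CMD:-./bin/post}"
COLLECTION="${COLLECTION:-mycol1}"

post_suffixless_files() {
  # $1 is the directory to scan (recursively).
  # -type f          : regular files only
  # ! -name '*.*'    : skip any file whose name contains a dot, i.e. has a suffix
  # -exec ... {} \;  : run the post command once per matching file
  find "$1" -type f ! -name '*.*' \
    -exec "$POST_CMD" -c "$COLLECTION" -params "separator=%09" -type text/tsv {} \;
}
```

It would be called as {{post_suffixless_files /dev/datascience/pod1/population/baseline/}}. Note that the {{find}} filter also skips hidden files such as {{.DS_Store}}, since their names contain a dot.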

I guess what could be done is to add a new option telling the tool which type to assume for files without a suffix, e.g. {{-nosuffix=tsv}}. The tool would then include files without a suffix in the file filter, and map them to that default type. Would that cover your use case?

> Solr post tool - ignore file name extension when -type is provided
> ------------------------------------------------------------------
>
>                 Key: SOLR-9347
>                 URL: https://issues.apache.org/jira/browse/SOLR-9347
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.1
>            Reporter: nirav patel
>
> I found that the post tool does not load files from a directory if the files have no extension, even if you specify "-params "separator=%09" -type text/tsv -filetypes tsv" in the arguments. I think that if any of the above parameters is used, there is no need to enter auto mode.
> Also, there is no -verbose or -debug option that would indicate a potential problem.
> ./bin/post -c mycol1  -params "separator=%09" -type text/tsv -filetypes tsv  /dev/datascience/pod1/population/baseline/
> /usr/java/jdk1.8.0_102//bin/java -classpath /home/xactly/solr-6.1.0/dist/solr-core-6.1.0.jar -Dauto=yes -Dc=bonusOrder -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool /mapr/insights/datascience/rulec/prdx/bonusOrderType/baseline/
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/mycol1/update...
> Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory /dev/datascience/pod1/population/baseline/ (0 files, depth=0)
> 0 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/mycol1/update...
> Time spent: 0:00:00.056



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
