Posted to user@spark.apache.org by "darion.yaphet" <fl...@163.com> on 2017/06/12 03:46:35 UTC
LibSVM should have just one input file
Hi team :
Currently, when we use SVM to train a dataset, we found that the input is limited to a single file.
The source code is as follows:
val path = if (dataFiles.length == 1) {
  dataFiles.head.getPath.toUri.toString
} else if (dataFiles.isEmpty) {
  throw new IOException("No input path specified for libsvm data")
} else {
  throw new IOException("Multiple input paths are not supported for libsvm data.")
}
A file stored on a distributed file system such as HDFS is split into multiple pieces, so I think this limit is unnecessary. I'm not sure whether this is a bug or whether I'm using it incorrectly.
Thanks a lot!
Re: LibSVM should have just one input file
Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
Hi, yaphet.
It seems that the code you pasted is from LibSVM, rather than SVM.
Or do I misunderstand?
For LibSVMDataSource,
1. if numFeatures is unspecified, only one file is valid input.
val df = spark.read.format("libsvm")
.load("data/mllib/sample_libsvm_data.txt")
2. otherwise, multiple files are OK.
val df = spark.read.format("libsvm")
.option("numFeatures", "780")
.load("data/mllib/sample_libsvm_data.txt")
For more, see: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource
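To give some intuition for why numFeatures matters here: when it is unspecified, the data source has to scan the data once to infer the feature dimension (the largest feature index seen), which is straightforward over a single file. Below is a rough illustration in plain Python, not Spark's actual implementation; the helper names `parse_libsvm_line` and `infer_num_features` are made up for this sketch:

```python
def parse_libsvm_line(line):
    """Parse one libsvm record: '<label> <index>:<value> ...' (1-based indices)."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, value = item.split(":")
        features[int(idx)] = float(value)
    return label, features

def infer_num_features(lines):
    """Return the largest feature index seen across all records."""
    max_idx = 0
    for line in lines:
        _, features = parse_libsvm_line(line)
        if features:
            max_idx = max(max_idx, max(features))
    return max_idx

records = [
    "1 1:0.5 3:1.2",
    "0 2:0.7 5:0.1",
]
print(infer_num_features(records))  # -> 5
```

Once you pass numFeatures explicitly via .option("numFeatures", ...), no inference pass is needed, so multiple input files can be read.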
On Mon, Jun 12, 2017 at 11:46 AM, darion.yaphet <fl...@163.com> wrote:
> Hi team :
>
> Currently when we using SVM to train dataset we found the input
> files limit only one .
>
> the source code as following :
>
> val path = if (dataFiles.length == 1) {
> dataFiles.head.getPath.toUri.toString
> } else if (dataFiles.isEmpty) {
> throw new IOException("No input path specified for libsvm data")
> } else {
> throw new IOException("Multiple input paths are not supported for libsvm
> data.")
> }
>
> The file store on the Distributed File System such as HDFS is split into
> mutil piece and I think this limit is not necessary . I'm not sure is it a
> bug ? or something I'm using not correctly .
>
> thanks a lot ~~~
>