Posted to dev@flink.apache.org by santosh_rajaguru <sa...@gmail.com> on 2015/07/15 12:15:36 UTC

Read XML from HDFS

Hi,

Is there any way to read a complete XML string or file from HDFS using
Flink?

Thanks and Regards,
Santosh 



--
View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Read-XML-from-HDFS-tp7023.html
Sent from the Apache Flink Mailing List archive at Nabble.com.

Re: Read XML from HDFS

Posted by santosh_rajaguru <sa...@gmail.com>.
Thanks, Fabian and Kostas, for the info. Using XMLInputFormat, I am able to
read an XML file from HDFS.

Cheers,
Santosh 




Re: Read XML from HDFS

Posted by Kostas Tzoumas <kt...@apache.org>.
There may also be an existing Hadoop InputFormat for XML that you can reuse
for your purposes (Flink supports Hadoop input formats).

For example, there is an XmlInputFormat in the Apache Mahout codebase that
you could take a look at:
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
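The general idea behind a tag-based XML input format can be sketched without any Hadoop or Flink dependency: scan the text for a start tag, then for the matching end tag, and emit everything in between as one record. The snippet below is only an illustration under that assumption. The class name TagScanSketch, the method extract, and the <page> tags are hypothetical and are not taken from the linked Mahout class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Mahout implementation: it cuts a string into
// records that start with startTag and end with endTag.
class TagScanSketch {

    static List<String> extract(String content, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            // Find the next record start at or after the current position.
            int begin = content.indexOf(startTag, pos);
            if (begin == -1) {
                break;
            }
            // Find the first end tag after that start tag.
            int end = content.indexOf(endTag, begin + startTag.length());
            if (end == -1) {
                break; // Unterminated record: stop rather than emit a partial one.
            }
            end += endTag.length();
            records.add(content.substring(begin, end));
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<wiki><page>a</page>\n<page>b</page></wiki>";
        for (String record : extract(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

This simple scan does not handle nested elements with the same name or start tags that carry attributes, so it only fits data where the record tags are known and regular.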




On Wed, Jul 15, 2015 at 1:37 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Santosh,
>
> Yes, that is possible if you want to read a complete file without splitting
> it into records. However, you need to implement a custom InputFormat that
> extends Flink's FileInputFormat to do so.
>
> If you want to split the file into records, you need a character sequence that
> delimits two records. Depending on the schema and format of your data, this
> might not be possible. If you have such a delimiting character sequence,
> you can use Flink's DelimitedInputFormat.
>
> Cheers, Fabian

Re: Read XML from HDFS

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Santosh,

Yes, that is possible if you want to read a complete file without splitting
it into records. However, you need to implement a custom InputFormat that
extends Flink's FileInputFormat to do so.
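The reading logic of such a format can be sketched in plain Java. The class below is a hypothetical stand-in, not a Flink class: it only mirrors the open / reachedEnd / nextRecord life cycle of an input format and returns the entire stream as a single record. A real implementation would extend FileInputFormat and would also have to make sure the file is not divided into several splits. UTF-8 encoding is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of Flink: emits the whole stream as one record.
class WholeFileReaderSketch {
    private InputStream in;
    private boolean consumed;

    void open(InputStream stream) {
        this.in = stream;
        this.consumed = false;
    }

    // True once the single record has been handed out.
    boolean reachedEnd() {
        return consumed;
    }

    String nextRecord() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            consumed = true;
            // Decode only after all bytes are collected, so multi-byte
            // characters are never cut at a chunk boundary.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><item>1</item><item>2</item></root>";
        WholeFileReaderSketch reader = new WholeFileReaderSketch();
        reader.open(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        while (!reader.reachedEnd()) {
            System.out.println(reader.nextRecord());
        }
    }
}
```

Because the whole file becomes one record, this approach only suits files that fit comfortably in memory.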

If you want to split the file into records, you need a character sequence that
delimits two records. Depending on the schema and format of your data, this
might not be possible. If you have such a delimiting character sequence,
you can use Flink's DelimitedInputFormat.
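To make the delimiter idea concrete, here is a minimal plain-Java sketch of cutting text into records at a delimiting character sequence. It does not use Flink's DelimitedInputFormat. The class name DelimitedRecordsSketch and the method split are hypothetical, and the sample data assumes one XML element per line with "\n" as the delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Flink code: splits text into records at a delimiter.
class DelimitedRecordsSketch {

    static List<String> split(String content, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = content.indexOf(delimiter, start)) != -1) {
            // The delimiter itself is consumed and is not part of the record.
            records.add(content.substring(start, idx));
            start = idx + delimiter.length();
        }
        // Keep a trailing record that is not terminated by a delimiter.
        if (start < content.length()) {
            records.add(content.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "<r>1</r>\n<r>2</r>\n";
        for (String record : split(data, "\n")) {
            System.out.println(record);
        }
    }
}
```

This also shows why a delimiter may not exist: in pretty-printed XML a single element spans several lines, so a newline no longer marks a record boundary.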

Cheers, Fabian

