Posted to dev@hawq.apache.org by Roman Shaposhnik <rv...@apache.org> on 2016/12/16 00:50:07 UTC

Using PXF without HAWQ

Hi!

Recently I got pretty excited about the possibility of using
PXF outside of its original HAWQ use case. My ultimate
wish here is to make PXF available to other Postgres-derived
databases, thus connecting them to the Hadoop ecosystem of
data sources (think FDW-over-PXF).

With that ambitious goal in mind, I started on a much smaller
MVP today and wanted to share my experience with you all.

Basically my goal was to make PXF available to Apache Calcite
as a backend (since Calcite itself doesn't deal with storage of data,
algorithms to process data, or a repository for storing metadata).
Calcite comes with a demo that allows you to treat a directory
full of CSV files as a DB (with individual files being tables) and
I wanted to extend that demo to use PXF reading CSV files from HDFS
instead:
  http://calcite.apache.org/docs/tutorial.html
  https://github.com/apache/calcite/tree/master/example/csv/src/main/java/org/apache/calcite/adapter/csv

Being new to using PXF outside of HAWQ, I started looking
for any kind of a "Standalone PXF" Quickstart guide but couldn't find
any (please let me know if I missed it). What follows are my notes on
what I've been able to do so far. Let me know if they are reasonable
and I'll start collecting them on a wiki to help others get going with PXF.

1. My first challenge was to get a local PXF service running. I couldn't find
any task that would help me do that so I did this:
    https://issues.apache.org/jira/browse/HAWQ-1224

2. My next challenge was to try and figure out the sequence of API calls
that would be required to use PXF to read data from a CSV file stored
in a local HDFS (HDFS that happens to be backed by my local filesystem).
The problem is that I couldn't really find any API quick start guide that would
clearly describe the objects that PXF manipulates (nouns), what it can do
with them (verbs) and, potentially, a state transition diagram to guide
client-side writers like myself. Did I miss a doc like that or should
I file a JIRA for it to be created?

3. Even when I figured out some of the calls to make, there's still no
client-side library available to translate those into the REST calls
(or maybe even short-circuit them when running as part of the same JVM
as PXF). Does this sound like something that needs to be addressed by
the PXF community? Shall I file a JIRA?
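
To make the idea in item 3 a bit more concrete, here is a minimal sketch
of what such a client-side layer could look like. Everything in it is an
assumption made for illustration: the names (PxfRequest, RestTransport,
InProcessTransport, read_csv_path), the URL layout and the parameter
names are hypothetical and are not taken from PXF's actual REST interface.
It only shows the shape: one request object (noun + verb + parameters)
and two interchangeable transports, one that builds a REST URL and one
that short-circuits to in-process handlers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol, Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class PxfRequest:
    """One logical call: a noun (resource), a verb (action), and parameters."""
    resource: str  # hypothetical noun, e.g. "data"
    action: str    # hypothetical verb, e.g. "read"
    params: Dict[str, str] = field(default_factory=dict)


class Transport(Protocol):
    """Anything that can carry a PxfRequest and return the response body."""
    def send(self, request: PxfRequest) -> str: ...


class RestTransport:
    """Translates a request into a URL. The path layout is made up for this sketch."""

    def __init__(self, base_url: str, http_get: Callable[[str], str]):
        self.base_url = base_url.rstrip("/")
        # The HTTP function is injected so the sketch runs without a network.
        self.http_get = http_get

    def build_url(self, request: PxfRequest) -> str:
        query = urlencode(sorted(request.params.items()))
        url = f"{self.base_url}/{request.resource}/{request.action}"
        return f"{url}?{query}" if query else url

    def send(self, request: PxfRequest) -> str:
        return self.http_get(self.build_url(request))


class InProcessTransport:
    """Skips the REST layer and calls handler functions directly."""

    def __init__(self, handlers: Dict[Tuple[str, str], Callable[[Dict[str, str]], str]]):
        self.handlers = handlers

    def send(self, request: PxfRequest) -> str:
        handler = self.handlers[(request.resource, request.action)]
        return handler(request.params)


def read_csv_path(transport: Transport, path: str) -> str:
    """Client code stays the same whichever transport is plugged in."""
    return transport.send(PxfRequest("data", "read", {"path": path}))
```

A caller would construct either transport and hand it to read_csv_path;
the point of the split is that code written against Transport does not
need to know whether the call goes over HTTP or stays in the same process.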

Thanks,
Roman.

Re: Using PXF without HAWQ

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Tue, Feb 14, 2017 at 9:51 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> I've added some technical details around PXF to the "Document HAWQ to PXF
> APIs" task: https://issues.apache.org/jira/browse/HAWQ-1234
>
> This was helpful in troubleshooting an issue so I figured the community
> would find value in it as well in support of Roman's idea about a
> standalone client.

Great notes, Kyle! Thanks for sharing. I'm sure you'll uncover more
as you go -- please add them as comments to the JIRA!

Thanks,
Roman.

Re: Using PXF without HAWQ

Posted by Kyle Dunn <kd...@pivotal.io>.
I've added some technical details around PXF to the "Document HAWQ to PXF
APIs" task: https://issues.apache.org/jira/browse/HAWQ-1234

This was helpful in troubleshooting an issue so I figured the community
would find value in it as well in support of Roman's idea about a
standalone client.


-Kyle

On Thu, Dec 22, 2016 at 6:29 PM Roman Shaposhnik <ro...@shaposhnik.org> wrote:

> On Fri, Dec 16, 2016 at 9:27 AM, Shivram Mani <sh...@gmail.com> wrote:
> > Currently the only documented interface for using PXF is through HAWQ
> > queries. The current API reference doc
> > <http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
> > is written from the standpoint of a PXF plugin developer for a new data
> > format, and is not quite intended for a client-side user.
> > The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
> > aren't going to be your API quick start guide either.
> >
> > So yes, please do file a HAWQ JIRA against the PXF component requesting this.
> > This will help external clients/DB engines leverage the PXF APIs directly.
>
> Thanks Shivram! I've filed an umbrella JIRA. It would be awesome if those
> who are interested in this could contribute:
>      https://issues.apache.org/jira/browse/HAWQ-1233
>
> Thanks,
> Roman.
>
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io

Re: Using PXF without HAWQ

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Fri, Dec 16, 2016 at 9:27 AM, Shivram Mani <sh...@gmail.com> wrote:
> Currently the only documented interface for using PXF is through HAWQ
> queries. The current API reference doc
> <http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
> is written from the standpoint of a PXF plugin developer for a new data
> format, and is not quite intended for a client-side user.
> The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
> aren't going to be your API quick start guide either.
>
> So yes, please do file a HAWQ JIRA against the PXF component requesting this.
> This will help external clients/DB engines leverage the PXF APIs directly.

Thanks Shivram! I've filed an umbrella JIRA. It would be awesome if those
who are interested in this could contribute:
     https://issues.apache.org/jira/browse/HAWQ-1233

Thanks,
Roman.

Re: Using PXF without HAWQ

Posted by Shivram Mani <sh...@gmail.com>.
Currently the only documented interface for using PXF is through HAWQ
queries. The current API reference doc
<http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
is written from the standpoint of a PXF plugin developer for a new data
format, and is not quite intended for a client-side user.
The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
aren't going to be your API quick start guide either.

So yes, please do file a HAWQ JIRA against the PXF component requesting this.
This will help external clients/DB engines leverage the PXF APIs directly.

-- 
shivram mani