Posted to dev@hawq.apache.org by Roman Shaposhnik <rv...@apache.org> on 2016/12/16 00:50:07 UTC
Using PXF without HAWQ
Hi!
Recently I got pretty excited about the possibility of using
PXF outside of its original HAWQ use case. My ultimate
wish here is to make PXF available to other Postgres-derived
databases, thus connecting them to the Hadoop ecosystem of
data sources (think FDW-over-PXF).
With that ambitious goal in mind, I started on a much smaller
MVP today and wanted to share my experience with you all.
Basically my goal was to make PXF available to Apache Calcite
as a backend (since Calcite itself doesn't deal with storage of data,
algorithms to process data, or a repository for storing metadata).
Calcite comes with a demo that allows you to treat a directory
full of CSV files as a DB (with individual files being tables) and
I wanted to extend that demo to use PXF to read CSV files from HDFS
instead:
http://calcite.apache.org/docs/tutorial.html
https://github.com/apache/calcite/tree/master/example/csv/src/main/java/org/apache/calcite/adapter/csv
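For context, the CSV demo is wired up through a small JSON model file that names a schema factory and a directory. Here is a rough Python sketch that generates such a model, going by the Calcite tutorial linked above. The helper name csv_model is hypothetical, and a PXF-backed variant would presumably swap in a different schema factory, which does not exist yet.

```python
import json


def csv_model(schema_name: str, directory: str) -> str:
    """Return a Calcite model (as JSON text) for the CSV example adapter.

    The factory class and the 'directory' operand follow the Calcite
    tutorial; csv_model itself is only an illustrative helper.
    """
    model = {
        "version": "1.0",
        "defaultSchema": schema_name,
        "schemas": [
            {
                "name": schema_name,
                "type": "custom",
                "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
                # Each CSV file found in this directory shows up as one table.
                "operand": {"directory": directory},
            }
        ],
    }
    return json.dumps(model, indent=2)


if __name__ == "__main__":
    print(csv_model("SALES", "sales"))
```

The interesting part for PXF is the operand: instead of a local directory, it would have to carry whatever PXF needs to locate the data in HDFS.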
Being new to using PXF outside of HAWQ, I started looking
for any kind of a "Standalone PXF" Quickstart guide but couldn't find
any (please let me know if I missed it). What follows are my notes on
what I've been able to do so far. Let me know if they are reasonable
and I'll start collecting them on a wiki to help others get going with PXF.
1. My first challenge was to get a local PXF service running. I couldn't find
any task that would help me do that so I did this:
https://issues.apache.org/jira/browse/HAWQ-1224
2. My next challenge was to try and figure out the sequence of API calls
that would be required to use PXF to read data from a CSV file stored
in a local HDFS (HDFS that happens to be backed by my local filesystem).
The problem is that I couldn't really find any API quick start guide that would
clearly describe the objects that PXF manipulates (nouns), what it can do
with them (verbs) and, potentially, a state transition diagram to guide
client-side writers like myself. Did I miss a doc like that or should
I file a JIRA for it to be created?
3. Even when I figured out some of the calls to make, there's still no
client-side library available to translate those into the REST calls (or
maybe even short-circuit them when running as part of the same JVM as PXF).
Does this sound like something that needs to be addressed by the PXF
community? Shall I file a JIRA?
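To make points 2 and 3 a bit more concrete, here is a minimal Python sketch of what a thin client-side helper might look like. It only prepares requests and never sends them. Everything PXF-specific in it is an assumption on my part rather than a documented contract: the resource names (Fragmenter/getFragments, Bridge), the v14 version segment, port 51200, the X-GP-* header names and the HdfsTextSimple profile. The names PxfRequest, build_pxf_request and read_sequence are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional
from urllib.parse import urlencode


class PxfRequest(NamedTuple):
    """A prepared HTTP GET request; nothing is sent over the wire here."""
    url: str
    headers: Dict[str, str]


def build_pxf_request(resource: str, data_path: str, profile: str,
                      host: str = "localhost", port: int = 51200,
                      api_version: str = "v14",
                      extra_headers: Optional[Dict[str, str]] = None) -> PxfRequest:
    """Translate one logical call into a URL plus headers.

    ASSUMPTION: the URL layout and the X-GP-* header names are guesses
    at what HAWQ sends to PXF and must be checked against the PXF source.
    """
    query = urlencode({"path": data_path})
    url = f"http://{host}:{port}/pxf/{api_version}/{resource}?{query}"
    headers = {"X-GP-DATA-DIR": data_path, "X-GP-PROFILE": profile}
    headers.update(extra_headers or {})
    return PxfRequest(url, headers)


def read_sequence(data_path: str, profile: str) -> List[PxfRequest]:
    """Assumed two-step flow: list the fragments, then read rows via the bridge."""
    return [
        build_pxf_request("Fragmenter/getFragments", data_path, profile),
        build_pxf_request("Bridge", data_path, profile),
    ]


if __name__ == "__main__":
    for req in read_sequence("/tmp/sales.csv", "HdfsTextSimple"):
        print(req.url, req.headers)
```

Keeping request construction separate from transport is deliberate: the same prepared call could later be sent over HTTP or dispatched in-process when the client shares a JVM with PXF.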
Thanks,
Roman.
Re: Using PXF without HAWQ
Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Tue, Feb 14, 2017 at 9:51 AM, Kyle Dunn <kd...@pivotal.io> wrote:
> I've added some technical details around PXF to the "Document HAWQ to PXF
> APIs" task: https://issues.apache.org/jira/browse/HAWQ-1234
>
> This was helpful in troubleshooting an issue so I figured the community
> would find value in it as well in support of Roman's idea about a
> standalone client.
Great notes, Kyle! Thanks for sharing. I'm sure you'll uncover more
as you go -- please add them as comments to the JIRA!
Thanks,
Roman.
Re: Using PXF without HAWQ
Posted by Kyle Dunn <kd...@pivotal.io>.
I've added some technical details around PXF to the "Document HAWQ to PXF
APIs" task: https://issues.apache.org/jira/browse/HAWQ-1234
This was helpful in troubleshooting an issue so I figured the community
would find value in it as well in support of Roman's idea about a
standalone client.
-Kyle
On Thu, Dec 22, 2016 at 6:29 PM Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> On Fri, Dec 16, 2016 at 9:27 AM, Shivram Mani <sh...@gmail.com> wrote:
> > Currently the only documented interface for using PXF is through HAWQ
> > queries. The current API reference doc
> > <http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
> > is written from the standpoint of a developer adding a PXF plugin for a new
> > data format, and is not really intended for a client-side user.
> > The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
> > aren't going to be your API quick start guide either.
> >
> > So yes, please do file a HAWQ JIRA against the PXF component requesting this.
> > This will help external clients/DB engines leverage the PXF APIs directly.
>
> Thanks Shivram! I've filed an umbrella JIRA. It would be awesome if those
> who are interested in this could contribute:
> https://issues.apache.org/jira/browse/HAWQ-1233
>
> Thanks,
> Roman.
>
--
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io
Re: Using PXF without HAWQ
Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Fri, Dec 16, 2016 at 9:27 AM, Shivram Mani <sh...@gmail.com> wrote:
> Currently the only documented interface for using PXF is through HAWQ
> queries. The current API reference doc
> <http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
> is written from the standpoint of a developer adding a PXF plugin for a new
> data format, and is not really intended for a client-side user.
> The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
> aren't going to be your API quick start guide either.
>
> So yes, please do file a HAWQ JIRA against the PXF component requesting this.
> This will help external clients/DB engines leverage the PXF APIs directly.
Thanks Shivram! I've filed an umbrella JIRA. It would be awesome if those
who are interested in this could contribute:
https://issues.apache.org/jira/browse/HAWQ-1233
Thanks,
Roman.
Re: Using PXF without HAWQ
Posted by Shivram Mani <sh...@gmail.com>.
Currently the only documented interface for using PXF is through HAWQ
queries. The current API reference doc
<http://hdb.docs.pivotal.io/210/hawq/pxf/PXFExternalTableandAPIReference.html>
is written from the standpoint of a developer adding a PXF plugin for a new
data format, and is not really intended for a client-side user.
The published javadocs <http://hawq.incubator.apache.org/docs/pxf/javadoc/>
aren't going to be your API quick start guide either.
So yes, please do file a HAWQ JIRA against the PXF component requesting this.
This will help external clients/DB engines leverage the PXF APIs directly.
On Thu, Dec 15, 2016 at 4:50 PM, Roman Shaposhnik <rv...@apache.org> wrote:
> Hi!
>
> Recently I got pretty excited about the possibility of using
> PXF outside of its original HAWQ use case. My ultimate
> wish here is to make PXF available to other Postgres-derived
> databases, thus connecting them to the Hadoop ecosystem of
> data sources (think FDW-over-PXF).
>
> With that ambitious goal in mind, I started on a much smaller
> MVP today and wanted to share my experience with you all.
>
> Basically my goal was to make PXF available to Apache Calcite
> as a backend (since Calcite itself doesn't deal with storage of data,
> algorithms to process data, or a repository for storing metadata).
> Calcite comes with a demo that allows you to treat a directory
> full of CSV files as a DB (with individual files being tables) and
> I wanted to extend that demo to use PXF to read CSV files from HDFS
> instead:
> http://calcite.apache.org/docs/tutorial.html
> https://github.com/apache/calcite/tree/master/example/csv/src/main/java/org/apache/calcite/adapter/csv
>
> Being new to using PXF outside of HAWQ, I started looking
> for any kind of a "Standalone PXF" Quickstart guide but couldn't find
> any (please let me know if I missed it). What follows are my notes on
> what I've been able to do so far. Let me know if they are reasonable
> and I'll start collecting them on a wiki to help others get going with PXF.
>
> 1. My first challenge was to get a local PXF service running. I couldn't find
> any task that would help me do that so I did this:
> https://issues.apache.org/jira/browse/HAWQ-1224
>
> 2. My next challenge was to try and figure out the sequence of API calls
> that would be required to use PXF to read data from a CSV file stored
> in a local HDFS (HDFS that happens to be backed by my local filesystem).
> The problem is that I couldn't really find any API quick start guide that would
> clearly describe the objects that PXF manipulates (nouns), what it can do
> with them (verbs) and, potentially, a state transition diagram to guide
> client-side writers like myself. Did I miss a doc like that or should
> I file a JIRA for it to be created?
>
> 3. Even when I figured out some of the calls to make, there's still no
> client-side library available to translate those into the REST calls (or
> maybe even short-circuit them when running as part of the same JVM as PXF).
> Does this sound like something that needs to be addressed by the PXF
> community? Shall I file a JIRA?
>
> Thanks,
> Roman.
>
--
shivram mani