Posted to user@phoenix.apache.org by Dhruv Gohil <yo...@gmail.com> on 2015/10/29 21:26:43 UTC
Re: replace CsvToKeyValueMapper with my implementation
+1
FYI: We did it (Phoenix 4.2.2) by copy-pasting the whole "CsvBulkLoadTool"
and changing the pieces we wanted: a "custom parser", getting back "job
counters" to take downstream decisions, etc.
+1 for pluggability, but we don't know how stable the interface would be
(should we even publish it?).
A wild idea: instead of "inventing a proper interface", we could
refactor the logic out of org.apache.phoenix.mapreduce.Csv* (3
classes) to make the current implementation independent of "CSV" and
"MapReduce".
That way CsvBulkLoadTool stays a lightweight default reference, and
people can just extend/copy it to customize MOST of the behaviour.
P.S.: We are going to take a shot at picking up records directly from Kafka
instead of a CSV file soon.
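The "job counters to take downstream decisions" idea above can be sketched as plain logic applied to counter values read after the job finishes (for example via job.getCounters() in a copied CsvBulkLoadTool); the threshold and method names here are invented for illustration:

```java
// Sketch only: the decision rule and names are hypothetical, not Phoenix code.
public class CounterDecisionSketch {

    /** Decide whether downstream steps may proceed, given counter values
     *  read after MapReduce job completion. */
    static boolean shouldProceed(long inputRecords, long failedRecords,
                                 double maxFailureRatio) {
        if (inputRecords == 0) {
            return false;                       // nothing was loaded at all
        }
        double ratio = (double) failedRecords / inputRecords;
        return ratio <= maxFailureRatio;        // tolerate a small failure rate
    }

    public static void main(String[] args) {
        System.out.println(shouldProceed(1000, 5, 0.01));   // true
        System.out.println(shouldProceed(1000, 50, 0.01));  // false
    }
}
```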
On Thursday 29 October 2015 03:38 PM, Bulvik, Noam wrote:
>
> This is exactly what I need, i.e. to be able to change the content of
> the row rather than use a different input format.
>
> The use case is when you need to load a large amount of data from files
> and each row needs to be handled before it is processed by the
> CSV parser. Examples: changing the date format, fixing encoding, escaping
> delimiters, and more. Of course this could be done in a separate
> map-reduce job, but since we are already processing each row it
> would be nice if we could do it there.
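A per-row pre-processing hook of the kind described above might look like this minimal sketch; the interface and all names are hypothetical, not part of Phoenix:

```java
// Hypothetical row-transform hook applied before CSV parsing.
public class RowPreProcessorSketch {

    /** Pluggable hook applied to each raw line before it reaches the CSV parser. */
    interface RowPreProcessor {
        String process(String rawLine);
    }

    /** Example transform: rewrite dd/MM/yyyy dates to ISO yyyy-MM-dd. */
    static String normalizeDate(String ddSlashMmSlashYyyy) {
        String[] p = ddSlashMmSlashYyyy.split("/");
        return p[2] + "-" + p[1] + "-" + p[0];
    }

    public static void main(String[] args) {
        // A processor that fixes the date format in the second column.
        RowPreProcessor example = line -> {
            String[] fields = line.split(",");
            fields[1] = normalizeDate(fields[1]);
            return String.join(",", fields);
        };
        System.out.println(example.process("42,29/10/2015,foo"));
        // 42,2015-10-29,foo
    }
}
```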
>
> *From:* James Taylor [mailto:jamestaylor@apache.org]
> *Sent:* Thursday, October 29, 2015 7:33 PM
> *To:* user <us...@phoenix.apache.org>
> *Subject:* Re: replace CsvToKeyValueMapper with my implementation
>
> I seem to remember you starting down that path, Gabriel - a kind of
> pluggable transformation for each row. It wasn't pluggable on the
> input format, but that's a nice idea too, Ravi. I'm not sure if this
> is what Noam needs or if it's something else.
>
> Probably good to discuss a bit more at the use case level to
> understand the specifics a bit more.
>
> On Thu, Oct 29, 2015 at 9:17 AM, Ravi Kiran
> <maghamravikiran@gmail.com> wrote:
>
> It would be great if we could provide an API and have end users
> provide an implementation of how to parse each record. That way, we
> could move beyond only bulk loading CSV and have JSON and other
> input formats bulk loaded onto Phoenix tables.
>
> I can take that one up. Would the community like this
> as a feature?
>
> On Thu, Oct 29, 2015 at 8:10 AM, Gabriel Reid
> <gabriel.reid@gmail.com> wrote:
>
> Hi Noam,
>
> That specific piece of code in CsvBulkLoadTool that you referred to
> allows packaging the CsvBulkLoadTool within a different job jar file,
> but won't allow setting a different mapper class. The actual setting
> of the mapper class is done further down in the submitJob method,
> specifically the following piece:
>
> job.setMapperClass(CsvToKeyValueMapper.class);
>
> There isn't currently a way to load a custom mapper in the
> CsvBulkLoadTool, so the only (current) option is to create a fully new
> custom implementation of the bulk load tool (probably copying or
> reusing most of the existing tool). However, I can certainly imagine
> this being a useful feature to have in some situations.
>
> Could you log this request in jira? It would also be really good to
> have some more detail on your specific use case. And even better would
> be a patch that implements it :-)
>
> - Gabriel
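One common way a hard-coded job.setMapperClass(...) call becomes pluggable is the configured-class-with-default lookup pattern (the same idea behind Hadoop's Configuration.getClass()). A dependency-free sketch of that lookup; the config key and classes are stand-ins, not a real Phoenix feature:

```java
// Sketch of resolving a mapper class from configuration with a default
// fallback; java.lang.String / java.util.ArrayList stand in for mappers.
public class PluggableMapperSketch {

    /** Resolve a class name from configuration, falling back to a default. */
    static Class<?> resolveMapper(String configuredName, Class<?> defaultClass)
            throws ClassNotFoundException {
        return (configuredName == null || configuredName.isEmpty())
                ? defaultClass
                : Class.forName(configuredName);
    }

    public static void main(String[] args) throws Exception {
        // Nothing configured: the stock (default) mapper class is used.
        System.out.println(resolveMapper(null, String.class).getName());
        // Configured: the custom class wins.
        System.out.println(resolveMapper("java.util.ArrayList", String.class).getName());
    }
}
```

With Hadoop on the classpath, the equivalent inside submitJob would be a single conf.getClass(someKey, CsvToKeyValueMapper.class, Mapper.class) call feeding job.setMapperClass(...).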
>
>
>
> On Thu, Oct 29, 2015 at 3:22 PM, Bulvik, Noam
> <Noam.Bulvik@teoco.com> wrote:
> > Hi,
> >
> >
> >
> > We have private logic to be executed when parsing each line before it
> > is uploaded to Phoenix. I saw the following in the code of the
> > CsvBulkLoadTool:
> >
> > // Allow overriding the job jar setting by using a -D system property
> > // at startup
> > if (job.getJar() == null) {
> >     job.setJarByClass(CsvToKeyValueMapper.class);
> > }
> >
> >
> >
> > Assuming I have the implementation for MyKeyValueMapper, how can I
> > make sure it will be loaded instead of the standard one?
> >
> >
> >
> > Also, in the CsvToKeyValueMapper class there are some private members
> > like:
> >
> > private PhoenixConnection conn;
> > private byte[] tableName;
> >
> > Can you add an option to access these members, or make them protected,
> > so that we can use them in a class we create that extends
> > CsvToKeyValueMapper, instead of duplicating them and the code that
> > initializes them?
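The protected-visibility request above amounts to letting a subclass reuse inherited state instead of duplicating the setup code. A stubbed illustration of that pattern; these are stand-ins, not the real Phoenix classes:

```java
// Illustration only: stand-ins for CsvToKeyValueMapper and a custom subclass,
// showing what protected (rather than private) members would enable.
public class ProtectedMemberSketch {

    /** Stand-in for CsvToKeyValueMapper with the requested visibility. */
    static class BaseMapper {
        protected byte[] tableName;            // was: private in the real class

        protected void setup(String table) {   // init logic lives once, here
            this.tableName = table.getBytes();
        }
    }

    /** Stand-in for MyKeyValueMapper reusing the inherited state. */
    static class MyKeyValueMapper extends BaseMapper {
        String describe() {
            return "loading into " + new String(tableName);
        }
    }

    public static void main(String[] args) {
        MyKeyValueMapper m = new MyKeyValueMapper();
        m.setup("MY_TABLE");                   // no duplicated init code needed
        System.out.println(m.describe());      // loading into MY_TABLE
    }
}
```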
> >
> >
> >
> > We are using Phoenix 4.5.2 over CDH.
> >
> >
> >
> > thanks
> >
> > Noam
> >
> >
> >
> > Noam Bulvik
> >
> > R&D Manager
> >
> >
> >
> > TEOCO CORPORATION
> >
> > c: +972 54 5507984
> >
> > p: +972 3 9269145
> >
> > Noam.Bulvik@teoco.com
> >
> > www.teoco.com
> >
> >
> >
> >
> > ________________________________
> >
> > PRIVILEGED AND CONFIDENTIAL
> > PLEASE NOTE: The information contained in this message is
> privileged and
> > confidential, and is intended only for the use of the
> individual to whom it
> > is addressed and others who have been specifically
> authorized to receive it.
> > If you are not the intended recipient, you are hereby
> notified that any
> > dissemination, distribution or copying of this communication
> is strictly
> > prohibited. If you have received this communication in
> error, or if any
> > problems occur with transmission, please contact sender.
> Thank you.
>
>