Posted to dev@ignite.apache.org by Valentin Kulichenko <va...@gmail.com> on 2016/11/18 17:44:43 UTC

ignite-spark module in Hadoop Accelerator

Folks,

Is there anyone who understands the purpose of including ignite-spark
module in the Hadoop Accelerator build? I can't figure out a use case for
which it's needed.

If we actually do need it there, then there is an issue. We have two
ignite-spark modules, for Scala 2.10 and 2.11. In the Fabric build
everything is fine: we put both in the 'optional' folder and the user can
enable either one. But the Hadoop Accelerator includes only the 2.11
module, which means the build doesn't work with 2.10 out of the box.
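
[To make the 'optional' mechanism concrete, here is a minimal sketch of how
a user enables one of the two Scala builds. The install path and the exact
module folder names are illustrative assumptions, not taken from a real
distribution:]

```shell
#!/bin/sh
# Sketch only: assumes a Fabric-style layout where optional modules sit
# under libs/optional/ and are enabled by copying them into libs/.
set -e
IGNITE_HOME=$(mktemp -d)   # stand-in for a real installation directory
mkdir -p "$IGNITE_HOME/libs/optional/ignite-spark_2.10" \
         "$IGNITE_HOME/libs/optional/ignite-spark_2.11"

# Enable exactly one Scala line; copying both would put two conflicting
# Scala versions of the module on the classpath.
cp -r "$IGNITE_HOME/libs/optional/ignite-spark_2.11" "$IGNITE_HOME/libs/"

ls "$IGNITE_HOME/libs"
```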

We should either remove the module from the build, or fix the issue.

-Val

Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Fri, Dec 9, 2016 at 2:48 AM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Dima,
>
> The idea is not to add libs to Hadoop Accelerator, but to remove
> Accelerator edition altogether. Instead, we will add accelerator JARs to
> fabric.
>

I get the idea, but I don't see it. If I just need Hadoop plug-n-play
acceleration, then our Hadoop Accelerator edition is the easiest download
to use. I'm not sure why we would take it away from users. For everything
else, we should use the data fabric edition.



>
> On Fri, Dec 9, 2016 at 4:45 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
>
> > Guys, again, Hadoop edition is for *plug-n-play* Hadoop acceleration
> only.
> > Why are we trying to add more libs to it, if the only libs required are
> to
> > enable IGFS and Ignite MapReduce? It seems like we are trying to make the
> > Hadoop accelerator into something it was never meant to be.
> >
> > D.
> >
> > On Thu, Dec 8, 2016 at 1:55 AM, Sergey Kozlov <sk...@gridgain.com>
> > wrote:
> >
> > > Another point is that hadoop edition has no optional modules. It forces
> > > user to download the fabric edition and copy module from there.
> > >
> > > On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <vozerov@gridgain.com
> >
> > > wrote:
> > >
> > > > The "work for ourselves" is maintaining two separate editions, while
> > > > everything can easily be merged into a single distribution.
> > > >
> > > > On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <
> > dsetrakyan@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Why are we creating work for ourselves? What is wrong with having 2
> > > > > downloads?
> > > > >
> > > > > Hadoop accelerator edition exists for the following 2 purposes
> only:
> > > > >
> > > > >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> > > > >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
> > > > >
> > > > > I agree with the original email from Valentin that Spark libs
> should
> > > not
> > > > be
> > > > > included into hadoop-accelerator download. Spark integration is not
> > > part
> > > > of
> > > > > Ignite Hadoop Accelerator and should be included only into the
> Ignite
> > > > > fabric download.
> > > > >
> > > > > D.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <
> skozlov@gridgain.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > In general I agree with Vladimir but would suggest more technical
> > > > > details:
> > > > > >
> > > > > > Due to the need to collect particular CLASS_PATHs for the fabric
> > > > > > and hadoop editions, we can change the logic of processing the
> > > > > > libs directory:
> > > > > >
> > > > > > 1. Introduce libs/hadoop and libs/fabric directories. These
> > > directories
> > > > > are
> > > > > > root directories for specific modules for hadoop and fabric
> > > > > > editions respectively
> > > > > > 2. Change how the CLASS_PATH directories are collected for ignite.sh:
> > > > > >  - collect everything from libs except libs/hadoop
> > > > > >  - collect everything from libs/fabric
> > > > > > 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also
> > > > > > perform the initial setup instead of setup-hadoop.sh) that
> > > > > > constructs the CLASS_PATH in the following way:
> > > > > >  - collect everything from libs except libs/fabric
> > > > > >  - collect everything from libs/hadoop
> > > > > >
> > > > > > This approach allows us the following:
> > > > > >  - share common modules across both editions (just put them in libs)
> > > > > >  - keep edition-specific modules separate (put them in either
> > > > > >    libs/hadoop or libs/fabric)
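
[Sergey's three steps could be sketched roughly as follows. This is purely
illustrative: the libs/fabric and libs/hadoop names come from the proposal
above, while the script itself is an assumption, not actual Ignite build
code:]

```shell
#!/bin/sh
# Rough sketch of the proposed CLASS_PATH rules (assumed layout; not the
# real ignite.sh). Shared modules live directly under libs/, while
# edition-specific modules live under libs/fabric or libs/hadoop.
set -e
IGNITE_HOME=$(mktemp -d)   # stand-in for a real installation
mkdir -p "$IGNITE_HOME/libs/ignite-core" \
         "$IGNITE_HOME/libs/fabric/ignite-spark" \
         "$IGNITE_HOME/libs/hadoop/ignite-hadoop"

build_classpath() {
    # $1 = edition folder to exclude, $2 = edition folder to include
    exclude=$1 include=$2 classpath=""
    for d in "$IGNITE_HOME/libs"/*; do
        case "$d" in
            # Skip both edition folders here; the included one is
            # expanded module-by-module below.
            */libs/"$exclude"|*/libs/"$include") continue ;;
        esac
        classpath="$classpath:$d/*"
    done
    for d in "$IGNITE_HOME/libs/$include"/*; do
        classpath="$classpath:$d/*"
    done
    printf '%s\n' "${classpath#:}"
}

# ignite.sh: everything in libs/ except libs/hadoop, plus libs/fabric
fabric_cp=$(build_classpath hadoop fabric)
# ignite-hadoop-accelerator.sh: everything except libs/fabric, plus libs/hadoop
hadoop_cp=$(build_classpath fabric hadoop)
printf 'fabric: %s\nhadoop: %s\n' "$fabric_cp" "$hadoop_cp"
```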
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <
> > > vozerov@gridgain.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Agree. I do not see any reason to have two different products.
> > > > > > > Instead, just add ignite-hadoop.jar to the distribution, and add
> > > > > > > a separate script to start the Accelerator. We can go the same
> > > > > > > way as we did for "platforms": create a separate top-level folder
> > > > > > > "hadoop" in the Fabric distribution and put all related Hadoop
> > > > > > > Accelerator stuff there.
> > > > > > >
> > > > > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > > > > valentin.kulichenko@gmail.com> wrote:
> > > > > > >
> > > > > > > > In general, I don't quite understand why we should move any
> > > > > > > > component outside of Fabric. The concept of Fabric is to have
> > > > > > > > everything, no? :) In other words, if a cluster was once set up
> > > > > > > > for Hadoop Acceleration, why not allow creating a cache and/or
> > > > > > > > running a task using native Ignite APIs sometime later? We
> > > > > > > > follow this approach with all our components and modules, but
> > > > > > > > not with ignite-hadoop for some reason.
> > > > > > > >
> > > > > > > > If we get rid of the Hadoop Accelerator build, the initial
> > > > > > > > setup of the Hadoop integration can potentially become a bit
> > > > > > > > more complicated, but with proper documentation I don't think
> > > > > > > > this is going to be a problem, because it requires multiple
> > > > > > > > steps now anyway. And frankly, the same can be said about any
> > > > > > > > optional module we have: enabling it requires some additional
> > > > > > > > steps, as it doesn't work out of the box.
> > > > > > > >
> > > > > > > > -Val
> > > > > > > >
> > > > > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <
> > dmagda@apache.org>
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Dmitriy,
> > > > > > > >>
> > > > > > > >> >   - the "lib/" folder has far fewer libraries than in
> > > > > > > >> >   fabric, simply because many dependencies don't make sense
> > > > > > > >> >   for a hadoop environment
> > > > > > > >>
> > > > > > > >> That is exactly why the discussion has moved in this direction.
> > > > > > > >>
> > > > > > > >> How do we decide what should be a part of Hadoop Accelerator
> > > > > > > >> and what should be excluded? If you read through Val's and
> > > > > > > >> Cos's comments below, you’ll get more insights.
> > > > > > > >>
> > > > > > > >> In general, we need a clear understanding of the Hadoop
> > > > > > > >> Accelerator distribution's use case. This will help us come to
> > > > > > > >> a final decision.
> > > > > > > >>
> > > > > > > >> If the accelerator is supposed to be plugged into an existing
> > > > > > > >> Hadoop environment by enabling MapReduce and/or IGFS at the
> > > > > > > >> configuration level, then we should simply remove the
> > > > > > > >> ignite-indexing and ignite-spark modules and add additional
> > > > > > > >> logging libs as well as the AWS and GCE integration packages.
> > > > > > > >>
> > > > > > > >> But wait: what if a user wants to leverage the Ignite Spark
> > > > > > > >> integration, Ignite SQL or geospatial queries, or Ignite
> > > > > > > >> streaming capabilities after he has already plugged in the
> > > > > > > >> accelerator? What if he is ready to modify his existing code?
> > > > > > > >> He can’t simply switch to the fabric on the application side,
> > > > > > > >> because the fabric doesn’t include the accelerator’s libs that
> > > > > > > >> are still needed. Nor can he rely solely on the accelerator
> > > > > > > >> distribution, which misses some libs. And so, obviously, the
> > > > > > > >> user starts shuffling libs between the fabric and the
> > > > > > > >> accelerator to get what is required.
> > > > > > > >>
> > > > > > > >> Vladimir, can you share your thoughts on this?
> > > > > > > >>
> > > > > > > >> —
> > > > > > > >> Denis
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > > > > > dsetrakyan@apache.org>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > Guys,
> > > > > > > >> >
> > > > > > > >> > I just downloaded the hadoop accelerator and here are the
> > > > > > differences
> > > > > > > >> from
> > > > > > > >> > the fabric edition that jump at me right away:
> > > > > > > >> >
> > > > > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > > > > >> >   - the "config/" folder has "hadoop" subfolder with
> > necessary
> > > > > > > >> >   hadoop-related configuration
> > > > > > > >> >   - the "lib/" folder has far fewer libraries than in
> > > > > > > >> >   fabric, simply because many dependencies don't make sense
> > > > > > > >> >   for a hadoop environment
> > > > > > > >> >
> > > > > > > >> > I currently don't see how we can merge the hadoop
> > accelerator
> > > > with
> > > > > > > >> standard
> > > > > > > >> > fabric edition.
> > > > > > > >> >
> > > > > > > >> > D.
> > > > > > > >> >
> > > > > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <
> > > dmagda@apache.org>
> > > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> >> Vovan,
> > > > > > > >> >>
> > > > > > > >> >> As one of hadoop maintainers, please share your point of
> > view
> > > > on
> > > > > > > this.
> > > > > > > >> >>
> > > > > > > >> >> —
> > > > > > > >> >> Denis
> > > > > > > >> >>
> > > > > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> > > > > skozlov@gridgain.com
> > > > > > >
> > > > > > > >> >> wrote:
> > > > > > > >> >>>
> > > > > > > >> >>> Denis
> > > > > > > >> >>>
> > > > > > > >> >>> I agree that at the moment there's no reason to split
> into
> > > > > fabric
> > > > > > > and
> > > > > > > >> >>> hadoop editions.
> > > > > > > >> >>>
> > > > > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <
> > > > dmagda@apache.org>
> > > > > > > >> wrote:
> > > > > > > >> >>>
> > > > > > > >> >>>> Hadoop Accelerator doesn’t require any additional
> > > > > > > >> >>>> libraries compared to those we have in the fabric build.
> > > > > > > >> >>>> It only lacks some of them, as Val mentioned below.
> > > > > > > >> >>>>
> > > > > > > >> >>>> Wouldn’t it be better to discontinue the Hadoop
> > > > > > > >> >>>> Accelerator edition and simply deliver the hadoop jar and
> > > > > > > >> >>>> its configs as part of the fabric?
> > > > > > > >> >>>>
> > > > > > > >> >>>> —
> > > > > > > >> >>>> Denis
> > > > > > > >> >>>>
> > > > > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > > > > > >> dsetrakyan@apache.org>
> > > > > > > >> >>>> wrote:
> > > > > > > >> >>>>>
> > > > > > > >> >>>>> The separate edition for the Hadoop Accelerator was
> > > > > > > >> >>>>> primarily driven by the default libraries. Hadoop
> > > > > > > >> >>>>> Accelerator requires many more libraries, as well as
> > > > > > > >> >>>>> configuration settings, compared to the standard fabric
> > > > > > > >> >>>>> download.
> > > > > > > >> >>>>>
> > > > > > > >> >>>>> Now, as far as spark integration is concerned, I am
> not
> > > sure
> > > > > > which
> > > > > > > >> >>>> edition
> > > > > > > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > > > > > > >> >>>>>
> > > > > > > >> >>>>> D.
> > > > > > > >> >>>>>
> > > > > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> > > > > dmagda@apache.org
> > > > > > >
> > > > > > > >> >> wrote:
> > > > > > > >> >>>>>
> > > > > > > >> >>>>>> *Dmitriy*,
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> I do believe that you should know why the community
> > > > > > > >> >>>>>> decided to create a separate edition for the Hadoop
> > > > > > > >> >>>>>> Accelerator. What was the reason for that? Presently, as
> > > > > > > >> >>>>>> I see it, it brings more confusion and difficulties
> > > > > > > >> >>>>>> rather than benefits.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> —
> > > > > > > >> >>>>>> Denis
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > > > > > cos@apache.org>
> > > > > > > >> >> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> In fact I very much agree with you. Right now, running
> > > > > > > >> >>>>>> the "accelerator" component in the Bigtop distro gives
> > > > > > > >> >>>>>> one a pretty much complete fabric anyway. But in order
> > > > > > > >> >>>>>> to make just an accelerator component we perform quite a
> > > > > > > >> >>>>>> bit of voodoo magic during the packaging stage of the
> > > > > > > >> >>>>>> Bigtop build, shuffling jars here and there. And that's
> > > > > > > >> >>>>>> quite crazy, honestly ;)
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Cos
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko
> > > wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> I tend to agree with Denis. I see only these
> > differences
> > > > > > between
> > > > > > > >> >> Hadoop
> > > > > > > >> >>>>>> Accelerator and Fabric builds (correct me if I miss
> > > > > something):
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> - Limited set of available modules and no optional
> > > modules
> > > > in
> > > > > > > >> Hadoop
> > > > > > > >> >>>>>> Accelerator.
> > > > > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > > > > >> >>>>>> - Additional scripts, configs and instructions
> included
> > > in
> > > > > > Hadoop
> > > > > > > >> >>>>>> Accelerator.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> And the list of included modules frankly looks very
> > > weird.
> > > > > Here
> > > > > > > are
> > > > > > > >> >> only
> > > > > > > >> >>>>>> some of the issues I noticed:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even
> > > > > > > >> >>>>>> if we need them for Hadoop Acceleration (which I doubt),
> > > > > > > >> >>>>>> are they really required, or can they be optional?
> > > > > > > >> >>>>>> - We force the use of the ignite-log4j module without
> > > > > > > >> >>>>>> providing other logger options (e.g., SLF4J).
> > > > > > > >> >>>>>> - We don't include ignite-aws module. How to use
> Hadoop
> > > > > > > Accelerator
> > > > > > > >> >>>> with
> > > > > > > >> >>>>>> S3 discovery?
> > > > > > > >> >>>>>> - Etc.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> It seems to me that if we try to fix all these issues,
> > > > > > > >> >>>>>> there will be virtually no difference between the Fabric
> > > > > > > >> >>>>>> and Hadoop Accelerator builds except a couple of scripts
> > > > > > > >> >>>>>> and config files. If so, there is no reason to have two
> > > > > > > >> >>>>>> builds.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> -Val
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > > > > > dmagda@apache.org>
> > > > > > > >> >> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> > > into
> > > > > > > changing
> > > > > > > >> >> the
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> way we
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> deliver Ignite and we'll likely start offering the
> > > whole
> > > > > > 'data
> > > > > > > >> >>>> fabric'
> > > > > > > >> >>>>>> experience instead of the mere "hadoop-acceleration”.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> And you still will be using hadoop-accelerator libs
> of
> > > > > Ignite,
> > > > > > > >> right?
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> I’m thinking of if there is a need to keep releasing
> > > Hadoop
> > > > > > > >> >> Accelerator
> > > > > > > >> >>>> as
> > > > > > > >> >>>>>> a separate delivery.
> > > > > > > >> >>>>>> What if we start releasing the accelerator as a part
> of
> > > the
> > > > > > > >> standard
> > > > > > > >> >>>>>> fabric binary putting hadoop-accelerator libs under
> > > > > ‘optional’
> > > > > > > >> folder?
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> —
> > > > > > > >> >>>>>> Denis
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> > > > > > cos@apache.org
> > > > > > > >
> > > > > > > >> >>>> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> What Denis said: spark has been added to the Hadoop
> > > > > accelerator
> > > > > > > as
> > > > > > > >> a
> > > > > > > >> >> way
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> to
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> boost the performance of more than just MR compute of
> > the
> > > > > > Hadoop
> > > > > > > >> >> stack,
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> IIRC.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> For what it's worth, Spark is considered a part of
> Hadoop
> > > at
> > > > > > large.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> > > into
> > > > > > > changing
> > > > > > > >> >> the
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> way we
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> deliver Ignite and we'll likely start offering the
> > > whole
> > > > > > 'data
> > > > > > > >> >>>> fabric'
> > > > > > > >> >>>>>> experience instead of the mere "hadoop-acceleration".
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Cos
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Val,
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> > > > > > accelerator
> > > > > > > >> but
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Ignite
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Hadoop File System component as well. The latter can
> be
> > > > used
> > > > > in
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> deployments
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Considering this I’m for the second solution proposed
> > by
> > > > you:
> > > > > > put
> > > > > > > >> both
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> 2.10
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder
> > of
> > > > > Ignite
> > > > > > > >> Hadoop
> > > > > > > >> >>>>>> Accelerator distribution.
> > > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> BTW, this task may be affected or related to the
> > > following
> > > > > > ones:
> > > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> —
> > > > > > > >> >>>>>> Denis
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this
> > plugin
> > > is
> > > > > > used
> > > > > > > by
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Hadoop
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> when running its jobs. ignite-spark module only
> > provides
> > > > > > > IgniteRDD
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> which
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Hadoop obviously will never use.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Is there another use case for Hadoop Accelerator
> which
> > > I'm
> > > > > > > missing?
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> -Val
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> dsetrakyan@apache.org>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Why do you think that spark module is not needed in
> our
> > > > > hadoop
> > > > > > > >> build?
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko
> <
> > > > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Folks,
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> Is there anyone who understands the purpose of including
> > > > > > > >> >>>>>> the ignite-spark module in the Hadoop Accelerator build?
> > > > > > > >> >>>>>> I can't figure out a use case for which it's needed.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> In case we actually need it there, there is an issue
> > > > > > > >> >>>>>> then. We actually have two ignite-spark modules, for
> > > > > > > >> >>>>>> 2.10 and 2.11. In Fabric build everything is good, we
> > > > > > > >> >>>>>> put both in 'optional' folder and user can enable either
> > > > > > > >> >>>>>> one. But in Hadoop Accelerator there is only 2.11 which
> > > > > > > >> >>>>>> means that the build doesn't work with 2.10 out of the
> > > > > > > >> >>>>>> box.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> We should either remove the module from the build, or
> > > > > > > >> >>>>>> fix the issue.
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>> -Val
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>>>
> > > > > > > >> >>>>
> > > > > > > >> >>>>
> > > > > > > >> >>>
> > > > > > > >> >>>
> > > > > > > >> >>> --
> > > > > > > >> >>> Sergey Kozlov
> > > > > > > >> >>> GridGain Systems
> > > > > > > >> >>> www.gridgain.com
> > > > > > > >> >>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Vladimir Ozerov
> > > > > > > Senior Software Architect
> > > > > > > GridGain Systems
> > > > > > > www.gridgain.com
> > > > > > > *+7 (960) 283 98 40*
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sergey Kozlov
> > > > > > GridGain Systems
> > > > > > www.gridgain.com
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Vladimir Ozerov
> > > > Senior Software Architect
> > > > GridGain Systems
> > > > www.gridgain.com
> > > > *+7 (960) 283 98 40*
> > > >
> > >
> > >
> > >
> > > --
> > > Sergey Kozlov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>
>
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Dima,

The idea is not to add libs to Hadoop Accelerator, but to remove
Accelerator edition altogether. Instead, we will add accelerator JARs to
fabric.

On Fri, Dec 9, 2016 at 4:45 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Guys, again, Hadoop edition is for *plug-n-play* Hadoop acceleration only.
> Why are we trying to add more libs to it, if the only libs required are to
> enable IGFS and Ignite MapReduce? It seems like we are trying to make the
> Hadoop accelerator into something it was never meant to be.
>
> D.
>
> On Thu, Dec 8, 2016 at 1:55 AM, Sergey Kozlov <sk...@gridgain.com>
> wrote:
>
> > Another point is that hadoop edition has no optional modules. It forces
> > user to download the fabric edition and copy module from there.
> >
> > On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> > > Work for ourselves - is to maintain two separate editions, while
> > everything
> > > can be easily merged into a single distribution.
> > >
> > > On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >
> > > wrote:
> > >
> > > > Why are we creating work for ourselves? What is wrong with having 2
> > > > downloads?
> > > >
> > > > Hadoop accelerator edition exists for the following 2 purposes only:
> > > >
> > > >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> > > >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
> > > >
> > > > I agree with the original email from Valentin that Spark libs should
> > not
> > > be
> > > > included into hadoop-accelerator download. Spark integration is not
> > part
> > > of
> > > > Ignite Hadoop Accelerator and should be included only into the Ignite
> > > > fabric download.
> > > >
> > > > D.
> > > >
> > > >
> > > >
> > > > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <skozlov@gridgain.com
> >
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > In general I agree with Vladimir but would suggest more technical
> > > > details:
> > > > >
> > > > > Due the need to collect particular CLASS_PATHs for fabric and
> hadoop
> > > > > editions we can change the logic of processing of libs directory
> > > > >
> > > > > 1. Introduce libs/hadoop and libs/fabric directories. These
> > directories
> > > > are
> > > > > root directories for specific modules for hadoop and fabric
> > > > > editions respectively
> > > > > 2. Change collecting of directories for CLASS_PATH for ignite.sh:
> > > > >  - collect everything for libs except libs/hadoop
> > > > >  - collect everything from libs/fabric
> > > > > 3. Add ignite-hadoop-accelerator.{sh|bat} script (also it may make
> > > > initial
> > > > > setup instead of setup-hadoop.sh) that constructs CLASS_PATH by
> > > following
> > > > > way:
> > > > >  - collect everything for libs except libs/fabirc
> > > > >  - collect everything from libs/hadoop
> > > > >
> > > > > This approach allows us following:
> > > > >  - share common modules across both editions (just put in libs)
> > > > >  - do not share edition-specific modules (either put in libs/hadoop
> > or
> > > in
> > > > > libs/fabric)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <
> > vozerov@gridgain.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Agree. I do not see any reasons to have two different products.
> > > > Instead,
> > > > > > just add ignite-hadoop.jar to distribution, and add separate
> script
> > > to
> > > > > > start Accelerator. We can go the same way as we did for
> > "platforms":
> > > > > create
> > > > > > separate top-level folder "hadoop" in Fabric distribution and put
> > all
> > > > > > realted Hadoop Acceleratro stuff there.
> > > > > >
> > > > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > > > valentin.kulichenko@gmail.com> wrote:
> > > > > >
> > > > > > > In general, I don't quite understand why we should move any
> > > component
> > > > > > > outside of Fabric. The concept of Fabric is to have everything,
> > no?
> > > > :)
> > > > > In
> > > > > > > other words, if a cluster was once setup for Hadoop
> Acceleration,
> > > why
> > > > > not
> > > > > > > allow to create a cache and/or run a task using native Ignite
> > APIs
> > > > > > sometime
> > > > > > > later. We follow this approach with all our components and
> > modules,
> > > > but
> > > > > > not
> > > > > > > with ignite-hadoop for some reason.
> > > > > > >
> > > > > > > If we get rid of Hadoop Accelerator build, initial setup of
> > Hadoop
> > > > > > > integration can potentially become a bit more complicated, but
> > with
> > > > > > proper
> > > > > > > documentation I don't think this is going to be a problem,
> > because
> > > it
> > > > > > > requires multiple steps now anyway. And frankly the same can be
> > > said
> > > > > > about
> > > > > > > any optional module we have - enabling it requires some
> > additional
> > > > > steps
> > > > > > as
> > > > > > > it doesn't work out of the box.
> > > > > > >
> > > > > > > -Val
> > > > > > >
> > > > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <
> dmagda@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > >> Dmitriy,
> > > > > > >>
> > > > > > >> >   - the "lib/" folder has much fewer libraries that in
> fabric,
> > > > > simply
> > > > > > >> >   becomes many dependencies don't make sense for hadoop
> > > > environment
> > > > > > >>
> > > > > > >> This reason why the discussion moved to this direction is
> > exactly
> > > in
> > > > > > that.
> > > > > > >>
> > > > > > >> How do we decide what should be a part of Hadoop Accelerator
> and
> > > > what
> > > > > > >> should be excluded? If you read through Val and Cos comments
> > below
> > > > > > you’ll
> > > > > > >> get more insights.
> > > > > > >>
> > > > > > >> In general, we need to have a clear understanding on what's
> > Hadoop
> > > > > > >> Accelerator distribution use case. This will help us to come
> up
> > > > with a
> > > > > > >> final decision.
> > > > > > >>
> > > > > > >> If the accelerator is supposed to be plugged-in into an
> existed
> > > > Hadoop
> > > > > > >> environment by enabling MapReduce and/IGFS at the
> configuration
> > > > level
> > > > > > then
> > > > > > >> we should simply remove ignite-indexing, ignite-spark modules
> > and
> > > > add
> > > > > > >> additional logging libs as well as AWS, GCE integrations’
> > > packages.
> > > > > > >>
> > > > > > >> But wait: what if a user wants to leverage Ignite Spark
> > > > > > >> integration, Ignite SQL or geospatial queries, or Ignite
> > > > > > >> streaming capabilities after he has already plugged in the
> > > > > > >> accelerator? What if he is ready to modify his existing code?
> > > > > > >> He can't simply switch to the fabric on the application side,
> > > > > > >> because the fabric doesn't include the accelerator's libs that
> > > > > > >> are still needed. He can't rely solely on the accelerator
> > > > > > >> distribution either, as it misses some libs. So, obviously, the
> > > > > > >> user starts shuffling libs between the fabric and the
> > > > > > >> accelerator to get what is required.
> > > > > > >>
> > > > > > >> Vladimir, can you share your thoughts on this?
> > > > > > >>
> > > > > > >> —
> > > > > > >> Denis
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > > > > dsetrakyan@apache.org>
> > > > > > >> wrote:
> > > > > > >> >
> > > > > > >> > Guys,
> > > > > > >> >
> > > > > > >> > I just downloaded the hadoop accelerator and here are the
> > > > > differences
> > > > > > >> from
> > > > > > >> > the fabric edition that jump at me right away:
> > > > > > >> >
> > > > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > > > >> >   - the "config/" folder has "hadoop" subfolder with
> necessary
> > > > > > >> >   hadoop-related configuration
> > > > > > >> >   - the "lib/" folder has much fewer libraries than in
> > > > > > >> >   fabric, simply because many dependencies don't make sense
> > > > > > >> >   for a hadoop environment
> > > > > > >> >
> > > > > > >> > I currently don't see how we can merge the hadoop
> accelerator
> > > with
> > > > > > >> standard
> > > > > > >> > fabric edition.
> > > > > > >> >
> > > > > > >> > D.
> > > > > > >> >
> > > > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <
> > dmagda@apache.org>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> >> Vovan,
> > > > > > >> >>
> > > > > > >> >> As one of hadoop maintainers, please share your point of
> view
> > > on
> > > > > > this.
> > > > > > >> >>
> > > > > > >> >> —
> > > > > > >> >> Denis
> > > > > > >> >>
> > > > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> > > > skozlov@gridgain.com
> > > > > >
> > > > > > >> >> wrote:
> > > > > > >> >>>
> > > > > > >> >>> Denis
> > > > > > >> >>>
> > > > > > >> >>> I agree that at the moment there's no reason to split into
> > > > fabric
> > > > > > and
> > > > > > >> >>> hadoop editions.
> > > > > > >> >>>
> > > > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <
> > > dmagda@apache.org>
> > > > > > >> wrote:
> > > > > > >> >>>
> > > > > > >> >>>> Hadoop Accelerator doesn't require any additional
> > > > > > >> >>>> libraries compared to those we have in the fabric build.
> > > > > > >> >>>> It only lacks some of them, as Val mentioned below.
> > > > > > >> >>>>
> > > > > > >> >>>> Wouldn't it be better to discontinue the Hadoop
> > > > > > >> >>>> Accelerator edition and simply deliver the hadoop jar and
> > > > > > >> >>>> its configs as a part of the fabric?
> > > > > > >> >>>>
> > > > > > >> >>>> —
> > > > > > >> >>>> Denis
> > > > > > >> >>>>
> > > > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > > > > >> dsetrakyan@apache.org>
> > > > > > >> >>>> wrote:
> > > > > > >> >>>>>
> > > > > > >> >>>>> Separate edition for the Hadoop Accelerator was
> primarily
> > > > driven
> > > > > > by
> > > > > > >> the
> > > > > > >> >>>>> default libraries. Hadoop Accelerator requires many more
> > > > > libraries
> > > > > > >> as
> > > > > > >> >>>> well
> > > > > > >> >>>>> as configuration settings compared to the standard
> fabric
> > > > > > download.
> > > > > > >> >>>>>
> > > > > > >> >>>>> Now, as far as spark integration is concerned, I am not
> > sure
> > > > > which
> > > > > > >> >>>> edition
> > > > > > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > > > > > >> >>>>>
> > > > > > >> >>>>> D.
> > > > > > >> >>>>>
> > > > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> > > > dmagda@apache.org
> > > > > >
> > > > > > >> >> wrote:
> > > > > > >> >>>>>
> > > > > > >> >>>>>> *Dmitriy*,
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> I do believe that you should know why the community
> > > > > > >> >>>>>> decided to create a separate edition for the Hadoop
> > > > > > >> >>>>>> Accelerator. What was the reason for that? Presently, as
> > > > > > >> >>>>>> I see it, it brings more confusion and difficulties than
> > > > > > >> >>>>>> benefit.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> —
> > > > > > >> >>>>>> Denis
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > > > > cos@apache.org>
> > > > > > >> >> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> In fact I very much agree with you. Right now, running
> > > > > > >> >>>>>> the "accelerator" component in the Bigtop distro gives
> > > > > > >> >>>>>> one a pretty much complete fabric anyway. But in order to
> > > > > > >> >>>>>> make just an accelerator component we perform quite a bit
> > > > > > >> >>>>>> of voodoo magic during the packaging stage of the Bigtop
> > > > > > >> >>>>>> build, shuffling jars from here and there. And that's
> > > > > > >> >>>>>> quite crazy, honestly ;)
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Cos
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko
> > wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> I tend to agree with Denis. I see only these
> differences
> > > > > between
> > > > > > >> >> Hadoop
> > > > > > >> >>>>>> Accelerator and Fabric builds (correct me if I miss
> > > > something):
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> - Limited set of available modules and no optional
> > modules
> > > in
> > > > > > >> Hadoop
> > > > > > >> >>>>>> Accelerator.
> > > > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > > > >> >>>>>> - Additional scripts, configs and instructions included
> > in
> > > > > Hadoop
> > > > > > >> >>>>>> Accelerator.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> And the list of included modules frankly looks very
> > weird.
> > > > Here
> > > > > > are
> > > > > > >> >> only
> > > > > > >> >>>>>> some of the issues I noticed:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even
> > > > > > >> >>>>>> if we need them for Hadoop Acceleration (which I doubt),
> > > > > > >> >>>>>> are they really required, or can they be optional?
> > > > > > >> >>>>>> - We force use of the ignite-log4j module without
> > > > > > >> >>>>>> providing other logger options (e.g., SLF4J).
> > > > > > >> >>>>>> - We don't include the ignite-aws module. How does one
> > > > > > >> >>>>>> use the Hadoop Accelerator with S3 discovery?
> > > > > > >> >>>>>> - Etc.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> It seems to me that if we try to fix all these issues,
> > > > > > >> >>>>>> there will be virtually no difference between the Fabric
> > > > > > >> >>>>>> and Hadoop Accelerator builds except a couple of scripts
> > > > > > >> >>>>>> and config files. If so, there is no reason to have two
> > > > > > >> >>>>>> builds.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> -Val
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > > > > dmagda@apache.org>
> > > > > > >> >> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> > into
> > > > > > changing
> > > > > > >> >> the
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> way we
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> deliver Ignite, and we'll likely start offering the
> > > > > > >> >>>>>> whole 'data fabric' experience instead of the mere
> > > > > > >> >>>>>> "hadoop-acceleration".
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> And you still will be using hadoop-accelerator libs of
> > > > Ignite,
> > > > > > >> right?
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> I'm wondering whether there is a need to keep releasing
> > > > > > >> >>>>>> the Hadoop Accelerator as a separate delivery.
> > > > > > >> >>>>>> What if we start releasing the accelerator as a part of
> > the
> > > > > > >> standard
> > > > > > >> >>>>>> fabric binary putting hadoop-accelerator libs under
> > > > ‘optional’
> > > > > > >> folder?
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> —
> > > > > > >> >>>>>> Denis
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> > > > > cos@apache.org
> > > > > > >
> > > > > > >> >>>> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> What Denis said: spark has been added to the Hadoop
> > > > accelerator
> > > > > > as
> > > > > > >> a
> > > > > > >> >> way
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> to
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> boost the performance of more than just MR compute of
> the
> > > > > Hadoop
> > > > > > >> >> stack,
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> IIRC.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> For what it's worth, Spark is considered a part of
> > > > > > >> >>>>>> Hadoop at large.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> > into
> > > > > > changing
> > > > > > >> >> the
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> way we
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> deliver Ignite, and we'll likely start offering the
> > > > > > >> >>>>>> whole 'data fabric' experience instead of the mere
> > > > > > >> >>>>>> "hadoop-acceleration".
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Cos
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Val,
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> > > > > accelerator
> > > > > > >> but
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Ignite
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Hadoop File System component as well. The latter can be
> > > used
> > > > in
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> deployments
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Considering this I’m for the second solution proposed
> by
> > > you:
> > > > > put
> > > > > > >> both
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> 2.10
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder
> of
> > > > Ignite
> > > > > > >> Hadoop
> > > > > > >> >>>>>> Accelerator distribution.
> > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> BTW, this task may be affected or related to the
> > following
> > > > > ones:
> > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> —
> > > > > > >> >>>>>> Denis
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this
> plugin
> > is
> > > > > used
> > > > > > by
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Hadoop
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> when running its jobs. ignite-spark module only
> provides
> > > > > > IgniteRDD
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> which
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Hadoop obviously will never use.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Is there another use case for Hadoop Accelerator which
> > I'm
> > > > > > missing?
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> -Val
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> dsetrakyan@apache.org>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Why do you think that spark module is not needed in our
> > > > hadoop
> > > > > > >> build?
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Folks,
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> Is there anyone who understands the purpose of
> including
> > > > > > >> ignite-spark
> > > > > > >> >>>>>> module in the Hadoop Accelerator build? I can't figure
> > out
> > > a
> > > > > use
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> case for
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> which it's needed.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> In case we actually need it there, there is an issue
> > then.
> > > We
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> actually
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> have
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric
> > > build
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> everything
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> is
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> good, we put both in 'optional' folder and user can
> > enable
> > > > > either
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> one.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> But
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> in Hadoop Accelerator there is only 2.11 which means
> that
> > > the
> > > > > > build
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> doesn't
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> work with 2.10 out of the box.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> We should either remove the module from the build, or
> fix
> > > the
> > > > > > >> issue.
> > > > > > >> >>>>>>
> > > > > > >> >>>>>> -Val
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>>>
> > > > > > >> >>>>
> > > > > > >> >>>>
> > > > > > >> >>>
> > > > > > >> >>>
> > > > > > >> >>> --
> > > > > > >> >>> Sergey Kozlov
> > > > > > >> >>> GridGain Systems
> > > > > > >> >>> www.gridgain.com
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Vladimir Ozerov
> > > > > > Senior Software Architect
> > > > > > GridGain Systems
> > > > > > www.gridgain.com
> > > > > > *+7 (960) 283 98 40*
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sergey Kozlov
> > > > > GridGain Systems
> > > > > www.gridgain.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Vladimir Ozerov
> > > Senior Software Architect
> > > GridGain Systems
> > > www.gridgain.com
> > > *+7 (960) 283 98 40*
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>



-- 
Vladimir Ozerov
Senior Software Architect
GridGain Systems
www.gridgain.com
*+7 (960) 283 98 40*

Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Guys, again, Hadoop edition is for *plug-n-play* Hadoop acceleration only.
Why are we trying to add more libs to it, if the only libs required are to
enable IGFS and Ignite MapReduce? It seems like we are trying to make the
Hadoop accelerator into something it was never meant to be.

D.
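For readers outside the thread: "plug-n-play" here means pointing an existing
Hadoop installation at Ignite through configuration alone. A sketch of the
relevant client-side settings (property values are illustrative and should be
verified against the Ignite documentation for the release in use):

```xml
<!-- core-site.xml: register IGFS as a Hadoop file system -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

<!-- mapred-site.xml: route MapReduce jobs to Ignite's in-memory engine -->
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>localhost:11211</value>
</property>
```

With only these settings (plus the ignite-hadoop jar on the classpath), jobs
and file-system calls are redirected to Ignite without code changes, which is
the use case the accelerator edition targets.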

On Thu, Dec 8, 2016 at 1:55 AM, Sergey Kozlov <sk...@gridgain.com> wrote:

> Another point is that the hadoop edition has no optional modules. It forces
> the user to download the fabric edition and copy modules from there.
>
> On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > The work we create for ourselves is maintaining two separate editions,
> > while everything can easily be merged into a single distribution.
> >
> > On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <dsetrakyan@apache.org
> >
> > wrote:
> >
> > > Why are we creating work for ourselves? What is wrong with having 2
> > > downloads?
> > >
> > > Hadoop accelerator edition exists for the following 2 purposes only:
> > >
> > >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> > >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
> > >
> > > I agree with the original email from Valentin that Spark libs should
> not
> > be
> > > included into hadoop-accelerator download. Spark integration is not
> part
> > of
> > > Ignite Hadoop Accelerator and should be included only into the Ignite
> > > fabric download.
> > >
> > > D.
> > >
> > >
> > >
> > > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <sk...@gridgain.com>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > In general I agree with Vladimir but would suggest more technical
> > > details:
> > > >
> > > > Due to the need to collect distinct CLASS_PATHs for the fabric and
> > > > hadoop editions, we can change how the libs directory is processed:
> > > >
> > > > 1. Introduce libs/hadoop and libs/fabric directories. These
> > > > directories are the root directories for modules specific to the
> > > > hadoop and fabric editions respectively.
> > > > 2. Change how ignite.sh collects directories for CLASS_PATH:
> > > >  - collect everything from libs except libs/hadoop
> > > >  - collect everything from libs/fabric
> > > > 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also do
> > > > the initial setup instead of setup-hadoop.sh) that constructs
> > > > CLASS_PATH the following way:
> > > >  - collect everything from libs except libs/fabric
> > > >  - collect everything from libs/hadoop
> > > >
> > > > This approach allows us the following:
> > > >  - share common modules across both editions (just put them in libs)
> > > >  - keep edition-specific modules separate (put them in either
> > > >    libs/hadoop or libs/fabric)
> > > >
> > > >
> > > >
> > > >
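The collection logic proposed above can be sketched in shell. The directory
names (libs/fabric, libs/hadoop) and the build_classpath helper are
illustrative assumptions drawn from this proposal, not actual ignite.sh code:

```shell
#!/bin/sh
# Sketch of the proposed CLASS_PATH assembly: shared modules live in libs/,
# edition-specific ones in libs/fabric and libs/hadoop.

# Build an example layout to demonstrate the logic.
LIBS_DIR=$(mktemp -d)
mkdir -p "$LIBS_DIR/ignite-core" "$LIBS_DIR/fabric" "$LIBS_DIR/hadoop"

# build_classpath INCLUDE EXCLUDE
# Collects libs/ itself plus every subdirectory except libs/EXCLUDE.
build_classpath() {
    cp="$LIBS_DIR/*"
    for dir in "$LIBS_DIR"/*/; do
        dir=${dir%/}
        [ "$dir" = "$LIBS_DIR/$2" ] && continue  # skip the other edition
        cp="$cp:$dir/*"
    done
    echo "$cp"
}

# ignite.sh would use:                     build_classpath fabric hadoop
# ignite-hadoop-accelerator.sh would use:  build_classpath hadoop fabric
build_classpath fabric hadoop
```

The same helper could back both launch scripts, differing only in which
edition folder is skipped, which keeps the two editions in one distribution.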
> > > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <
> vozerov@gridgain.com
> > >
> > > > wrote:
> > > >
> > > > > Agree. I do not see any reasons to have two different products.
> > > Instead,
> > > > > just add ignite-hadoop.jar to distribution, and add separate script
> > to
> > > > > start Accelerator. We can go the same way as we did for
> "platforms":
> > > > create
> > > > > a separate top-level folder "hadoop" in the Fabric distribution and
> > > > > put all related Hadoop Accelerator stuff there.
> > > > >
> > > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > > valentin.kulichenko@gmail.com> wrote:
> > > > >
> > > > > > In general, I don't quite understand why we should move any
> > component
> > > > > > outside of Fabric. The concept of Fabric is to have everything,
> no?
> > > :)
> > > > In
> > > > > > other words, if a cluster was once set up for Hadoop Acceleration,
> > why
> > > > not
> > > > > > allow to create a cache and/or run a task using native Ignite
> APIs
> > > > > sometime
> > > > > > later. We follow this approach with all our components and
> modules,
> > > but
> > > > > not
> > > > > > with ignite-hadoop for some reason.
> > > > > >
> > > > > > If we get rid of Hadoop Accelerator build, initial setup of
> Hadoop
> > > > > > integration can potentially become a bit more complicated, but
> with
> > > > > proper
> > > > > > documentation I don't think this is going to be a problem,
> because
> > it
> > > > > > requires multiple steps now anyway. And frankly the same can be
> > said
> > > > > about
> > > > > > any optional module we have - enabling it requires some
> additional
> > > > steps
> > > > > as
> > > > > > it doesn't work out of the box.
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org>
> > > > wrote:
> > > > > >
> > > > > >> Dmitriy,
> > > > > >>
> > > > > >> >   - the "lib/" folder has much fewer libraries than in
> > > > > >> >   fabric, simply because many dependencies don't make sense
> > > > > >> >   for a hadoop environment
> > > > > >>
> > > > > >> This is exactly the reason the discussion moved in this
> > > > > >> direction.
> > > > > >>
> > > > > >> How do we decide what should be a part of Hadoop Accelerator and
> > > what
> > > > > >> should be excluded? If you read through Val and Cos comments
> below
> > > > > you’ll
> > > > > >> get more insights.
> > > > > >>
> > > > > >> In general, we need a clear understanding of the Hadoop
> > > > > >> Accelerator distribution's use case. This will help us come up
> > > > > >> with a final decision.
> > > > > >>
> > > > > >> If the accelerator is supposed to be plugged into an existing
> > > > > >> Hadoop environment by enabling MapReduce and/or IGFS at the
> > > > > >> configuration level, then we should simply remove the
> > > > > >> ignite-indexing and ignite-spark modules and add additional
> > > > > >> logging libs as well as the AWS and GCE integration packages.
> > > > > >>
> > > > > >> But wait: what if a user wants to leverage Ignite Spark
> > > > > >> integration, Ignite SQL or geospatial queries, or Ignite
> > > > > >> streaming capabilities after he has already plugged in the
> > > > > >> accelerator? What if he is ready to modify his existing code?
> > > > > >> He can't simply switch to the fabric on the application side,
> > > > > >> because the fabric doesn't include the accelerator's libs that
> > > > > >> are still needed. He can't rely solely on the accelerator
> > > > > >> distribution either, as it misses some libs. So, obviously, the
> > > > > >> user starts shuffling libs between the fabric and the
> > > > > >> accelerator to get what is required.
> > > > > >>
> > > > > >> Vladimir, can you share your thoughts on this?
> > > > > >>
> > > > > >> —
> > > > > >> Denis
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > > > dsetrakyan@apache.org>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > Guys,
> > > > > >> >
> > > > > >> > I just downloaded the hadoop accelerator and here are the
> > > > differences
> > > > > >> from
> > > > > >> > the fabric edition that jump at me right away:
> > > > > >> >
> > > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > > > > >> >   hadoop-related configuration
> > > > > >> >   - the "lib/" folder has much fewer libraries that in fabric,
> > > > simply
> > > > > >> >   becomes many dependencies don't make sense for hadoop
> > > environment
> > > > > >> >
> > > > > >> > I currently don't see how we can merge the hadoop accelerator
> > with
> > > > > >> standard
> > > > > >> > fabric edition.
> > > > > >> >
> > > > > >> > D.
> > > > > >> >
> > > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <
> dmagda@apache.org>
> > > > > wrote:
> > > > > >> >
> > > > > >> >> Vovan,
> > > > > >> >>
> > > > > >> >> As one of hadoop maintainers, please share your point of view
> > on
> > > > > this.
> > > > > >> >>
> > > > > >> >> —
> > > > > >> >> Denis
> > > > > >> >>
> > > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> > > skozlov@gridgain.com
> > > > >
> > > > > >> >> wrote:
> > > > > >> >>>
> > > > > >> >>> Denis
> > > > > >> >>>
> > > > > >> >>> I agree that at the moment there's no reason to split into
> > > fabric
> > > > > and
> > > > > >> >>> hadoop editions.
> > > > > >> >>>
> > > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <
> > dmagda@apache.org>
> > > > > >> wrote:
> > > > > >> >>>
> > > > > >> >>>> Hadoop Accelerator doesn't require any additional
> > > > > >> >>>> libraries compared to those we have in the fabric build.
> > > > > >> >>>> It only lacks some of them, as Val mentioned below.
> > > > > >> >>>>
> > > > > >> >>>> Wouldn't it be better to discontinue the Hadoop
> > > > > >> >>>> Accelerator edition and simply deliver the hadoop jar and
> > > > > >> >>>> its configs as a part of the fabric?
> > > > > >> >>>>
> > > > > >> >>>> —
> > > > > >> >>>> Denis
> > > > > >> >>>>
> > > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > > > >> dsetrakyan@apache.org>
> > > > > >> >>>> wrote:
> > > > > >> >>>>>
> > > > > >> >>>>> Separate edition for the Hadoop Accelerator was primarily
> > > driven
> > > > > by
> > > > > >> the
> > > > > >> >>>>> default libraries. Hadoop Accelerator requires many more
> > > > libraries
> > > > > >> as
> > > > > >> >>>> well
> > > > > >> >>>>> as configuration settings compared to the standard fabric
> > > > > download.
> > > > > >> >>>>>
> > > > > >> >>>>> Now, as far as spark integration is concerned, I am not
> sure
> > > > which
> > > > > >> >>>> edition
> > > > > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > > > > >> >>>>>
> > > > > >> >>>>> D.
> > > > > >> >>>>>
> > > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> > > dmagda@apache.org
> > > > >
> > > > > >> >> wrote:
> > > > > >> >>>>>
> > > > > >> >>>>>> *Dmitriy*,
> > > > > >> >>>>>>
> > > > > >> >>>>>> I do believe that you should know why the community
> > > > > >> >>>>>> decided to create a separate edition for the Hadoop
> > > > > >> >>>>>> Accelerator. What was the reason for that? Presently, as
> > > > > >> >>>>>> I see it, it brings more confusion and difficulties than
> > > > > >> >>>>>> benefit.
> > > > > >> >>>>>>
> > > > > >> >>>>>> —
> > > > > >> >>>>>> Denis
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > > > cos@apache.org>
> > > > > >> >> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> In fact I very much agree with you. Right now, running
> > > > > >> >>>>>> the "accelerator" component in the Bigtop distro gives one
> > > > > >> >>>>>> a pretty much complete fabric anyway. But in order to make
> > > > > >> >>>>>> just an accelerator component we perform quite a bit of
> > > > > >> >>>>>> voodoo magic during the packaging stage of the Bigtop
> > > > > >> >>>>>> build, shuffling jars from here and there. And that's
> > > > > >> >>>>>> quite crazy, honestly ;)
> > > > > >> >>>>>>
> > > > > >> >>>>>> Cos
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko
> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> I tend to agree with Denis. I see only these differences
> > > > between
> > > > > >> >> Hadoop
> > > > > >> >>>>>> Accelerator and Fabric builds (correct me if I miss
> > > something):
> > > > > >> >>>>>>
> > > > > >> >>>>>> - Limited set of available modules and no optional
> modules
> > in
> > > > > >> Hadoop
> > > > > >> >>>>>> Accelerator.
> > > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > > >> >>>>>> - Additional scripts, configs and instructions included
> in
> > > > Hadoop
> > > > > >> >>>>>> Accelerator.
> > > > > >> >>>>>>
> > > > > >> >>>>>> And the list of included modules frankly looks very
> weird.
> > > Here
> > > > > are
> > > > > >> >> only
> > > > > >> >>>>>> some of the issues I noticed:
> > > > > >> >>>>>>
> > > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if
> > > > > >> >>>>>> we need them for Hadoop Acceleration (which I doubt), are
> > > > > >> >>>>>> they really required, or can they be optional?
> > > > > >> >>>>>> - We force use of the ignite-log4j module without
> > > > > >> >>>>>> providing other logger options (e.g., SLF4J).
> > > > > >> >>>>>> - We don't include the ignite-aws module. How does one use
> > > > > >> >>>>>> the Hadoop Accelerator with S3 discovery?
> > > > > >> >>>>>> - Etc.
> > > > > >> >>>>>>
> > > > > >> >>>>>> It seems to me that if we try to fix all these issues,
> > > > > >> >>>>>> there will be virtually no difference between the Fabric
> > > > > >> >>>>>> and Hadoop Accelerator builds except a couple of scripts
> > > > > >> >>>>>> and config files. If so, there is no reason to have two
> > > > > >> >>>>>> builds.
> > > > > >> >>>>>>
> > > > > >> >>>>>> -Val
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > > > dmagda@apache.org>
> > > > > >> >> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> into
> > > > > changing
> > > > > >> >> the
> > > > > >> >>>>>>
> > > > > >> >>>>>> way we
> > > > > >> >>>>>>
> > > > > >> >>>>>> deliver Ignite, and we'll likely start offering the whole
> > > > > >> >>>>>> 'data fabric' experience instead of the mere
> > > > > >> >>>>>> "hadoop-acceleration".
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>> And you still will be using hadoop-accelerator libs of
> > > Ignite,
> > > > > >> right?
> > > > > >> >>>>>>
> > > > > >> >>>>>> I'm wondering whether there is a need to keep releasing
> > > > > >> >>>>>> the Hadoop Accelerator as a separate delivery.
> > > > > >> >>>>>> What if we start releasing the accelerator as a part of
> the
> > > > > >> standard
> > > > > >> >>>>>> fabric binary putting hadoop-accelerator libs under
> > > ‘optional’
> > > > > >> folder?
> > > > > >> >>>>>>
> > > > > >> >>>>>> —
> > > > > >> >>>>>> Denis
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> > > > cos@apache.org
> > > > > >
> > > > > >> >>>> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> What Denis said: spark has been added to the Hadoop
> > > accelerator
> > > > > as
> > > > > >> a
> > > > > >> >> way
> > > > > >> >>>>>>
> > > > > >> >>>>>> to
> > > > > >> >>>>>>
> > > > > >> >>>>>> boost the performance of more than just MR compute of the
> > > > Hadoop
> > > > > >> >> stack,
> > > > > >> >>>>>>
> > > > > >> >>>>>> IIRC.
> > > > > >> >>>>>>
> > > > > >> >>>>>> For what it's worth, Spark is considered a part of Hadoop
> > > > > >> >>>>>> at large.
> > > > > >> >>>>>>
> > > > > >> >>>>>> On the separate note, in the Bigtop, we start looking
> into
> > > > > changing
> > > > > >> >> the
> > > > > >> >>>>>>
> > > > > >> >>>>>> way we
> > > > > >> >>>>>>
> > > > > >> >>>>>> deliver Ignite, and we'll likely start offering the whole
> > > > > >> >>>>>> 'data fabric' experience instead of the mere
> > > > > >> >>>>>> "hadoop-acceleration".
> > > > > >> >>>>>>
> > > > > >> >>>>>> Cos
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> Val,
> > > > > >> >>>>>>
> > > > > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> > > > accelerator
> > > > > >> but
> > > > > >> >>>>>>
> > > > > >> >>>>>> Ignite
> > > > > >> >>>>>>
> > > > > >> >>>>>> Hadoop File System component as well. The latter can be
> > used
> > > in
> > > > > >> >>>>>>
> > > > > >> >>>>>> deployments
> > > > > >> >>>>>>
> > > > > >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Considering this I’m for the second solution proposed by
> > you:
> > > > put
> > > > > >> both
> > > > > >> >>>>>>
> > > > > >> >>>>>> 2.10
> > > > > >> >>>>>>
> > > > > >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of
> > > Ignite
> > > > > >> Hadoop
> > > > > >> >>>>>> Accelerator distribution.
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> > > > > >> >>>>>>
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>> BTW, this task may be affected or related to the
> following
> > > > ones:
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> > > > > >> >>>>>>
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> > > > > >> >>>>>>
> > > > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > > > >> >>>>>>
> > > > > >> >>>>>> —
> > > > > >> >>>>>> Denis
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > > > >> >>>>>>
> > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin
> is
> > > > used
> > > > > by
> > > > > >> >>>>>>
> > > > > >> >>>>>> Hadoop
> > > > > >> >>>>>>
> > > > > >> >>>>>> when running its jobs. ignite-spark module only provides
> > > > > IgniteRDD
> > > > > >> >>>>>>
> > > > > >> >>>>>> which
> > > > > >> >>>>>>
> > > > > >> >>>>>> Hadoop obviously will never use.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Is there another use case for Hadoop Accelerator which
> I'm
> > > > > missing?
> > > > > >> >>>>>>
> > > > > >> >>>>>> -Val
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > > > >> >>>>>>
> > > > > >> >>>>>> dsetrakyan@apache.org>
> > > > > >> >>>>>>
> > > > > >> >>>>>> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> Why do you think that spark module is not needed in our
> > > hadoop
> > > > > >> build?
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > > > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> Folks,
> > > > > >> >>>>>>
> > > > > >> >>>>>> Is there anyone who understands the purpose of including
> > > > > >> ignite-spark
> > > > > >> >>>>>> module in the Hadoop Accelerator build? I can't figure
> out
> > a
> > > > use
> > > > > >> >>>>>>
> > > > > >> >>>>>> case for
> > > > > >> >>>>>>
> > > > > >> >>>>>> which it's needed.
> > > > > >> >>>>>>
> > > > > >> >>>>>> In case we actually need it there, there is an issue
> then.
> > We
> > > > > >> >>>>>>
> > > > > >> >>>>>> actually
> > > > > >> >>>>>>
> > > > > >> >>>>>> have
> > > > > >> >>>>>>
> > > > > >> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric
> > build
> > > > > >> >>>>>>
> > > > > >> >>>>>> everything
> > > > > >> >>>>>>
> > > > > >> >>>>>> is
> > > > > >> >>>>>>
> > > > > >> >>>>>> good, we put both in 'optional' folder and user can
> enable
> > > > either
> > > > > >> >>>>>>
> > > > > >> >>>>>> one.
> > > > > >> >>>>>>
> > > > > >> >>>>>> But
> > > > > >> >>>>>>
> > > > > >> >>>>>> in Hadoop Accelerator there is only 2.11 which means that
> > the
> > > > > build
> > > > > >> >>>>>>
> > > > > >> >>>>>> doesn't
> > > > > >> >>>>>>
> > > > > >> >>>>>> work with 2.10 out of the box.
> > > > > >> >>>>>>
> > > > > >> >>>>>> We should either remove the module from the build, or fix
> > the
> > > > > >> issue.
> > > > > >> >>>>>>
> > > > > >> >>>>>> -Val
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>>>
> > > > > >> >>>>
> > > > > >> >>>>
> > > > > >> >>>
> > > > > >> >>>
> > > > > >> >>> --
> > > > > >> >>> Sergey Kozlov
> > > > > >> >>> GridGain Systems
> > > > > >> >>> www.gridgain.com
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Vladimir Ozerov
> > > > > Senior Software Architect
> > > > > GridGain Systems
> > > > > www.gridgain.com
> > > > > *+7 (960) 283 98 40*
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sergey Kozlov
> > > > GridGain Systems
> > > > www.gridgain.com
> > > >
> > >
> >
> >
> >
> > --
> > Vladimir Ozerov
> > Senior Software Architect
> > GridGain Systems
> > www.gridgain.com
> > *+7 (960) 283 98 40*
> >
>
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Sergey Kozlov <sk...@gridgain.com>.
Another point is that the Hadoop edition has no optional modules. It forces
the user to download the fabric edition and copy modules over from there.
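The manual step described above, copying an optional module out of the fabric distribution into the active libs folder, amounts to something like the sketch below. The paths follow the fabric layout discussed in this thread; the concrete module name is just an illustrative assumption:

```shell
#!/bin/sh
# Sketch: enabling an optional Ignite module by hand (paths assumed from
# the fabric layout discussed in this thread).
IGNITE_HOME="${IGNITE_HOME:-/opt/ignite}"
MODULE="ignite-spark_2.10"   # example module directory name

# Optional modules ship under libs/optional/ and are picked up by the
# start scripts only after being copied into libs/.
if [ -d "$IGNITE_HOME/libs/optional/$MODULE" ]; then
    cp -r "$IGNITE_HOME/libs/optional/$MODULE" "$IGNITE_HOME/libs/"
fi
```

This is exactly the shuffling between downloads that the thread argues users should not have to do.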

On Thu, Dec 8, 2016 at 12:19 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> The extra work for ourselves is maintaining two separate editions, while
> everything can easily be merged into a single distribution.
>
> On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
>
> > Why are we creating work for ourselves? What is wrong with having 2
> > downloads?
> >
> > Hadoop accelerator edition exists for the following 2 purposes only:
> >
> >    - accelerate HDFS with Ignite In-Memory File System (IGFS)
> >    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
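Both purposes above are enabled purely through Hadoop configuration. A rough sketch of what that looks like in the Hadoop config files follows; the property names and values are recalled from the accelerator documentation of that era and should be treated as assumptions, not verified settings:

```xml
<!-- core-site.xml: serve file system calls through IGFS (assumed values) -->
<property>
    <name>fs.default.name</name>
    <value>igfs://igfs@localhost</value>
</property>
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>

<!-- mapred-site.xml: run MapReduce jobs on Ignite's in-memory engine -->
<property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>localhost:11211</value>
</property>
```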
> >
> > I agree with the original email from Valentin that Spark libs should not
> > be included in the hadoop-accelerator download. Spark integration is not
> > part of the Ignite Hadoop Accelerator and should be included only in the
> > Ignite fabric download.
> >
> > D.
> >
> >
> >
> > On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <sk...@gridgain.com>
> > wrote:
> >
> > > Hi
> > >
> > > In general I agree with Vladimir but would suggest more technical
> > details:
> > >
> > > Due to the need to collect distinct CLASS_PATHs for the fabric and hadoop
> > > editions, we can change how the libs directory is processed:
> > >
> > > 1. Introduce libs/hadoop and libs/fabric directories. These directories
> > > are the root directories for modules specific to the hadoop and fabric
> > > editions respectively.
> > > 2. Change how ignite.sh collects directories for CLASS_PATH:
> > >  - collect everything from libs except libs/hadoop
> > >  - collect everything from libs/fabric
> > > 3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform
> > > the initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH
> > > in the following way:
> > >  - collect everything from libs except libs/fabric
> > >  - collect everything from libs/hadoop
> > >
> > > This approach allows us the following:
> > >  - share common modules across both editions (just put them in libs)
> > >  - keep edition-specific modules separate (put them in either libs/hadoop
> > >    or libs/fabric)
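A minimal shell sketch of the edition-aware CLASS_PATH collection proposed above; the libs/hadoop and libs/fabric directory names come from the proposal, while the function name and everything else are illustrative assumptions:

```shell
#!/bin/sh
# Sketch of the proposed edition-aware CLASS_PATH collection.
# libs/hadoop and libs/fabric come from the proposal; the rest is assumed.
IGNITE_HOME="${IGNITE_HOME:-/opt/ignite}"

# Collect jars from libs/ and from every libs/<subdir>/ except the one
# belonging to the *other* edition.
build_classpath() {
    exclude_dir="$1"
    cp=""
    for jar in "$IGNITE_HOME"/libs/*.jar; do
        [ -e "$jar" ] && cp="$cp:$jar"
    done
    for dir in "$IGNITE_HOME"/libs/*/; do
        [ -d "$dir" ] || continue
        [ "$(basename "$dir")" = "$exclude_dir" ] && continue
        for jar in "$dir"*.jar; do
            [ -e "$jar" ] && cp="$cp:$jar"
        done
    done
    printf '%s\n' "${cp#:}"
}

# ignite.sh (fabric) would use:           build_classpath hadoop
# ignite-hadoop-accelerator.sh would use: build_classpath fabric
```

With this, both editions could live in one distribution and differ only in which start script is invoked.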
> > >
> > >
> > >
> > >
> > > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <vozerov@gridgain.com
> >
> > > wrote:
> > >
> > > > Agree. I do not see any reasons to have two different products.
> > Instead,
> > > > just add ignite-hadoop.jar to distribution, and add separate script
> to
> > > > start Accelerator. We can go the same way as we did for "platforms":
> > > create
> > > > separate top-level folder "hadoop" in Fabric distribution and put all
> > > > realted Hadoop Acceleratro stuff there.
> > > >
> > > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > > valentin.kulichenko@gmail.com> wrote:
> > > >
> > > > > In general, I don't quite understand why we should move any
> component
> > > > > outside of Fabric. The concept of Fabric is to have everything, no?
> > :)
> > > In
> > > > > other words, if a cluster was once set up for Hadoop Acceleration, why
> > > > > not allow creating a cache and/or running a task using native Ignite
> > > > > APIs sometime later. We follow this approach with all our components
> > > > > and modules, but not with ignite-hadoop for some reason.
> > > > >
> > > > > If we get rid of Hadoop Accelerator build, initial setup of Hadoop
> > > > > integration can potentially become a bit more complicated, but with
> > > > proper
> > > > > documentation I don't think this is going to be a problem, because
> it
> > > > > requires multiple steps now anyway. And frankly the same can be
> said
> > > > about
> > > > > any optional module we have - enabling it requires some additional
> > > steps
> > > > as
> > > > > it doesn't work out of the box.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org>
> > > wrote:
> > > > >
> > > > >> Dmitriy,
> > > > >>
> > > > >> >   - the "lib/" folder has many fewer libraries than in fabric,
> > > > >> >   simply because many dependencies don't make sense for a hadoop
> > > > >> >   environment
> > > > >>
> > > > >> This is exactly why the discussion moved in this direction.
> > > > >>
> > > > >> How do we decide what should be a part of Hadoop Accelerator and
> > what
> > > > >> should be excluded? If you read through Val and Cos comments below
> > > > you’ll
> > > > >> get more insights.
> > > > >>
> > > > >> In general, we need to have a clear understanding on what's Hadoop
> > > > >> Accelerator distribution use case. This will help us to come up
> > with a
> > > > >> final decision.
> > > > >>
> > > > >> If the accelerator is supposed to be plugged into an existing Hadoop
> > > > >> environment by enabling MapReduce and/or IGFS at the configuration
> > > > >> level, then
> > > > >> we should simply remove ignite-indexing, ignite-spark modules and
> > add
> > > > >> additional logging libs as well as AWS, GCE integrations’
> packages.
> > > > >>
> > > > >> But wait, what if a user wants to leverage Ignite's Spark
> > > > >> integration, Ignite SQL or geospatial queries, or Ignite streaming
> > > > >> capabilities after he has already plugged in the accelerator? What if
> > > > >> he is ready to modify his existing code? He can’t simply switch to
> > > > >> the fabric on the application side, because the fabric doesn’t
> > > > >> include the accelerator’s libs that are still needed. He can’t solely
> > > > >> rely on the accelerator distribution either, as it misses some libs.
> > > > >> And, obviously, the user starts shuffling libs between the fabric and
> > > > >> the accelerator to get what is required.
> > > > >>
> > > > >> Vladimir, can you share your thoughts on this?
> > > > >>
> > > > >> —
> > > > >> Denis
> > > > >>
> > > > >>
> > > > >>
> > > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > > dsetrakyan@apache.org>
> > > > >> wrote:
> > > > >> >
> > > > >> > Guys,
> > > > >> >
> > > > >> > I just downloaded the hadoop accelerator and here are the
> > > differences
> > > > >> from
> > > > >> > the fabric edition that jump at me right away:
> > > > >> >
> > > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > > > >> >   hadoop-related configuration
> > > > >> >   - the "lib/" folder has many fewer libraries than in fabric,
> > > > >> >   simply because many dependencies don't make sense for a hadoop
> > > > >> >   environment
> > > > >> >
> > > > >> > I currently don't see how we can merge the hadoop accelerator
> with
> > > > >> standard
> > > > >> > fabric edition.
> > > > >> >
> > > > >> > D.
> > > > >> >
> > > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org>
> > > > wrote:
> > > > >> >
> > > > >> >> Vovan,
> > > > >> >>
> > > > >> >> As one of hadoop maintainers, please share your point of view
> on
> > > > this.
> > > > >> >>
> > > > >> >> —
> > > > >> >> Denis
> > > > >> >>
> > > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> > skozlov@gridgain.com
> > > >
> > > > >> >> wrote:
> > > > >> >>>
> > > > >> >>> Denis
> > > > >> >>>
> > > > >> >>> I agree that at the moment there's no reason to split into
> > fabric
> > > > and
> > > > >> >>> hadoop editions.
> > > > >> >>>
> > > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <
> dmagda@apache.org>
> > > > >> wrote:
> > > > >> >>>
> > > > >> >>>> Hadoop Accelerator doesn’t require any additional libraries
> > > > >> >>>> compared to those we have in the fabric build. It only lacks
> > > > >> >>>> some of them, as Val mentioned below.
> > > > >> >>>>
> > > > >> >>>> Wouldn’t it be better to discontinue the Hadoop Accelerator
> > > > >> >>>> edition and simply deliver the hadoop jar and its configs as
> > > > >> >>>> part of the fabric?
> > > > >> >>>>
> > > > >> >>>> —
> > > > >> >>>> Denis
> > > > >> >>>>
> > > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > > >> dsetrakyan@apache.org>
> > > > >> >>>> wrote:
> > > > >> >>>>>
> > > > >> >>>>> Separate edition for the Hadoop Accelerator was primarily
> > driven
> > > > by
> > > > >> the
> > > > >> >>>>> default libraries. Hadoop Accelerator requires many more
> > > libraries
> > > > >> as
> > > > >> >>>> well
> > > > >> >>>>> as configuration settings compared to the standard fabric
> > > > download.
> > > > >> >>>>>
> > > > >> >>>>> Now, as far as spark integration is concerned, I am not sure
> > > which
> > > > >> >>>> edition
> > > > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > > > >> >>>>>
> > > > >> >>>>> D.
> > > > >> >>>>>
> > > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> > dmagda@apache.org
> > > >
> > > > >> >> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> *Dmitriy*,
> > > > >> >>>>>>
> > > > >> >>>>>> I do believe that you should know why the community decided
> > > > >> >>>>>> to create a separate edition for the Hadoop Accelerator. What
> > > > >> >>>>>> was the reason for that? Presently, as I see it, it brings
> > > > >> >>>>>> more confusion and difficulty than benefit.
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> > > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > > cos@apache.org>
> > > > >> >> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> In fact I very much agree with you. Right now, running the
> > > > >> >>>>>> "accelerator" component in the Bigtop distro gives one a
> > > > >> >>>>>> pretty much complete fabric anyway. But in order to make just
> > > > >> >>>>>> an accelerator component we perform quite a bit of voodoo
> > > > >> >>>>>> magic during the packaging stage of the Bigtop build,
> > > > >> >>>>>> shuffling jars from here and there. And that's quite crazy,
> > > > >> >>>>>> honestly ;)
> > > > >> >>>>>>
> > > > >> >>>>>> Cos
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> I tend to agree with Denis. I see only these differences
> > > between
> > > > >> >> Hadoop
> > > > >> >>>>>> Accelerator and Fabric builds (correct me if I miss
> > something):
> > > > >> >>>>>>
> > > > >> >>>>>> - Limited set of available modules and no optional modules
> in
> > > > >> Hadoop
> > > > >> >>>>>> Accelerator.
> > > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > > >> >>>>>> - Additional scripts, configs and instructions included in
> > > Hadoop
> > > > >> >>>>>> Accelerator.
> > > > >> >>>>>>
> > > > >> >>>>>> And the list of included modules frankly looks very weird.
> > Here
> > > > are
> > > > >> >> only
> > > > >> >>>>>> some of the issues I noticed:
> > > > >> >>>>>>
> > > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if
> we
> > > need
> > > > >> them
> > > > >> >>>>>> for Hadoop Acceleration (which I doubt), are they really
> > > required
> > > > >> or
> > > > >> >>>> can
> > > > >> >>>>>> be
> > > > >> >>>>>> optional?
> > > > >> >>>>>> - We force the use of the ignite-log4j module without
> > > > >> >>>>>> providing other logger options (e.g., SLF4J).
> > > > >> >>>>>> - We don't include ignite-aws module. How to use Hadoop
> > > > Accelerator
> > > > >> >>>> with
> > > > >> >>>>>> S3 discovery?
> > > > >> >>>>>> - Etc.
> > > > >> >>>>>>
> > > > >> >>>>>> It seems to me that if we try to fix all these issues, there
> > > > >> >>>>>> will be virtually no difference between the Fabric and Hadoop
> > > > >> >>>>>> Accelerator builds except a couple of scripts and config
> > > > >> >>>>>> files. If so, there is no reason to have two builds.
> > > > >> >>>>>>
> > > > >> >>>>>> -Val
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > > dmagda@apache.org>
> > > > >> >> wrote:
> > > > >> >>>>>>
> > > > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > > > changing
> > > > >> >> the
> > > > >> >>>>>>
> > > > >> >>>>>> way we
> > > > >> >>>>>>
> > > > >> >>>>>> deliver Ignite and we'll likely to start offering the whole
> > > 'data
> > > > >> >>>> fabric'
> > > > >> >>>>>> experience instead of the mere "hadoop-acceleration”.
> > > > >> >>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>> And you still will be using hadoop-accelerator libs of
> > Ignite,
> > > > >> right?
> > > > >> >>>>>>
> > > > >> >>>>>> I’m wondering whether there is a need to keep releasing
> > > > >> >>>>>> Hadoop Accelerator as a separate delivery.
> > > > >> >>>>>> What if we start releasing the accelerator as a part of the
> > > > >> >>>>>> standard fabric binary, putting hadoop-accelerator libs under
> > > > >> >>>>>> the ‘optional’ folder?
> > > > >> >>>>>>
> > > > >> >>>>>> —
> > > > >> >>>>>> Denis
> > > > >> >>>>>>
> >
>
>
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: ignite-spark module in Hadoop Accelerator

Posted by Vladimir Ozerov <vo...@gridgain.com>.
The extra work for ourselves is maintaining two separate editions, while
everything can easily be merged into a single distribution.

On Wed, Dec 7, 2016 at 3:29 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Why are we creating work for ourselves? What is wrong with having 2
> downloads?
>
> Hadoop accelerator edition exists for the following 2 purposes only:
>
>    - accelerate HDFS with Ignite In-Memory File System (IGFS)
>    - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce
>
> I agree with the original email from Valentin that Spark libs should not be
> included in the hadoop-accelerator download. Spark integration is not part
> of the Ignite Hadoop Accelerator and should be included only in the Ignite
> fabric download.
>
> D.
>
>
>
> On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <sk...@gridgain.com>
> wrote:
>
> > Hi
> >
> > In general I agree with Vladimir but would suggest more technical
> details:
> >
> > Due the need to collect particular CLASS_PATHs for fabric and hadoop
> > editions we can change the logic of processing of libs directory
> >
> > 1. Introduce libs/hadoop and libs/fabric directories. These directories
> are
> > root directories for specific modules for hadoop and fabric
> > editions respectively
> > 2. Change collecting of directories for CLASS_PATH for ignite.sh:
> >  - collect everything for libs except libs/hadoop
> >  - collect everything from libs/fabric
> > 3. Add ignite-hadoop-accelerator.{sh|bat} script (also it may make
> initial
> > setup instead of setup-hadoop.sh) that constructs CLASS_PATH by following
> > way:
> >  - collect everything for libs except libs/fabirc
> >  - collect everything from libs/hadoop
> >
> > This approach allows us following:
> >  - share common modules across both editions (just put in libs)
> >  - do not share edition-specific modules (either put in libs/hadoop or in
> > libs/fabric)
> >
> >
> >
> >
> > On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> > > Agree. I do not see any reasons to have two different products.
> Instead,
> > > just add ignite-hadoop.jar to distribution, and add separate script to
> > > start Accelerator. We can go the same way as we did for "platforms":
> > create
> > > separate top-level folder "hadoop" in Fabric distribution and put all
> > > realted Hadoop Acceleratro stuff there.
> > >
> > > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > > valentin.kulichenko@gmail.com> wrote:
> > >
> > > > In general, I don't quite understand why we should move any component
> > > > outside of Fabric. The concept of Fabric is to have everything, no?
> :)
> > In
> > > > other words, if a cluster was once setup for Hadoop Acceleration, why
> > not
> > > > allow to create a cache and/or run a task using native Ignite APIs
> > > sometime
> > > > later. We follow this approach with all our components and modules,
> but
> > > not
> > > > with ignite-hadoop for some reason.
> > > >
> > > > If we get rid of Hadoop Accelerator build, initial setup of Hadoop
> > > > integration can potentially become a bit more complicated, but with
> > > proper
> > > > documentation I don't think this is going to be a problem, because it
> > > > requires multiple steps now anyway. And frankly the same can be said
> > > about
> > > > any optional module we have - enabling it requires some additional
> > steps
> > > as
> > > > it doesn't work out of the box.
> > > >
> > > > -Val
> > > >
> > > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org>
> > wrote:
> > > >
> > > >> Dmitriy,
> > > >>
> > > >> >   - the "lib/" folder has much fewer libraries that in fabric,
> > simply
> > > >> >   becomes many dependencies don't make sense for hadoop
> environment
> > > >>
> > > >> This reason why the discussion moved to this direction is exactly in
> > > that.
> > > >>
> > > >> How do we decide what should be a part of Hadoop Accelerator and
> what
> > > >> should be excluded? If you read through Val and Cos comments below
> > > you’ll
> > > >> get more insights.
> > > >>
> > > >> In general, we need to have a clear understanding on what's Hadoop
> > > >> Accelerator distribution use case. This will help us to come up
> with a
> > > >> final decision.
> > > >>
> > > >> If the accelerator is supposed to be plugged-in into an existed
> Hadoop
> > > >> environment by enabling MapReduce and/IGFS at the configuration
> level
> > > then
> > > >> we should simply remove ignite-indexing, ignite-spark modules and
> add
> > > >> additional logging libs as well as AWS, GCE integrations’ packages.
> > > >>
> > > >> But, wait, what if a user wants to leverage from Ignite Spark
> > > >> Integration, Ignite SQL or Geospatial queries, Ignite streaming
> > > >> capabilities after he has already plugged-in the accelerator. What
> if
> > > he is
> > > >> ready to modify his existed code. He can’t simply switch to the
> fabric
> > > on
> > > >> an application side because the fabric doesn’t include accelerator’s
> > > libs
> > > >> that are still needed. He can’t solely rely on the accelerator
> > > distribution
> > > >> as well which misses some libs. And, obviously, the user starts
> > > shuffling
> > > >> libs in between the fabric and accelerator to get what is required.
> > > >>
> > > >> Vladimir, can you share your thoughts on this?
> > > >>
> > > >> —
> > > >> Denis
> > > >>
> > > >>
> > > >>
> > > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > > dsetrakyan@apache.org>
> > > >> wrote:
> > > >> >
> > > >> > Guys,
> > > >> >
> > > >> > I just downloaded the hadoop accelerator and here are the
> > differences
> > > >> from
> > > >> > the fabric edition that jump at me right away:
> > > >> >
> > > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > > >> >   hadoop-related configuration
> > > >> >   - the "lib/" folder has much fewer libraries that in fabric,
> > simply
> > > >> >   becomes many dependencies don't make sense for hadoop
> environment
> > > >> >
> > > >> > I currently don't see how we can merge the hadoop accelerator with
> > > >> standard
> > > >> > fabric edition.
> > > >> >
> > > >> > D.
> > > >> >
> > > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org>
> > > wrote:
> > > >> >
> > > >> >> Vovan,
> > > >> >>
> > > >> >> As one of hadoop maintainers, please share your point of view on
> > > this.
> > > >> >>
> > > >> >> —
> > > >> >> Denis
> > > >> >>
> > > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <
> skozlov@gridgain.com
> > >
> > > >> >> wrote:
> > > >> >>>
> > > >> >>> Denis
> > > >> >>>
> > > >> >>> I agree that at the moment there's no reason to split into
> fabric
> > > and
> > > >> >>> hadoop editions.
> > > >> >>>
> > > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org>
> > > >> wrote:
> > > >> >>>
> > > >> >>>> Hadoop Accelerator doesn’t require any additional libraries in
> > > >> compare
> > > >> >> to
> > > >> >>>> those we have in the fabric build. It only lacks some of them
> as
> > > Val
> > > >> >>>> mentioned below.
> > > >> >>>>
> > > >> >>>> Wouldn’t it better to discontinue Hadoop Accelerator edition
> and
> > > >> simply
> > > >> >>>> deliver hadoop jar and its configs as a part of the fabric?
> > > >> >>>>
> > > >> >>>> —
> > > >> >>>> Denis
> > > >> >>>>
> > > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > > >> dsetrakyan@apache.org>
> > > >> >>>> wrote:
> > > >> >>>>>
> > > >> >>>>> Separate edition for the Hadoop Accelerator was primarily
> driven
> > > by
> > > >> the
> > > >> >>>>> default libraries. Hadoop Accelerator requires many more
> > libraries
> > > >> as
> > > >> >>>> well
> > > >> >>>>> as configuration settings compared to the standard fabric
> > > download.
> > > >> >>>>>
> > > >> >>>>> Now, as far as spark integration is concerned, I am not sure
> > which
> > > >> >>>> edition
> > > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > > >> >>>>>
> > > >> >>>>> D.
> > > >> >>>>>
> > > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <
> dmagda@apache.org
> > >
> > > >> >> wrote:
> > > >> >>>>>
> > > >> >>>>>> *Dmitriy*,
> > > >> >>>>>>
> > > >> >>>>>> I do believe that you should know why the community decided
> to
> > a
> > > >> >>>> separate
> > > >> >>>>>> edition for the Hadoop Accelerator. What was the reason for
> > that?
> > > >> >>>>>> Presently, as I see, it brings more confusion and
> difficulties
> > > >> rather
> > > >> >>>> then
> > > >> >>>>>> benefit.
> > > >> >>>>>>
> > > >> >>>>>> —
> > > >> >>>>>> Denis
> > > >> >>>>>>
> > > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> > cos@apache.org>
> > > >> >> wrote:
> > > >> >>>>>>
> > > >> >>>>>> In fact I am very much agree with you. Right now, running the
> > > >> >>>> "accelerator"
> > > >> >>>>>> component in Bigtop disto gives one a pretty much complete
> > fabric
> > > >> >>>> anyway.
> > > >> >>>>>> But
> > > >> >>>>>> in order to make just an accelerator component we perform
> > quite a
> > > >> bit
> > > >> >> of
> > > >> >>>>>> woodoo magic during the packaging stage of the Bigtop build,
> > > >> shuffling
> > > >> >>>> jars
> > > >> >>>>>> from here and there. And that's quite crazy, honestly ;)
> > > >> >>>>>>
> > > >> >>>>>> Cos
> > > >> >>>>>>
> > > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> > > >> >>>>>>
> > > >> >>>>>> I tend to agree with Denis. I see only these differences
> > between
> > > >> >> Hadoop
> > > >> >>>>>> Accelerator and Fabric builds (correct me if I miss
> something):
> > > >> >>>>>>
> > > >> >>>>>> - Limited set of available modules and no optional modules in
> > > >> Hadoop
> > > >> >>>>>> Accelerator.
> > > >> >>>>>> - No ignite-hadoop module in Fabric.
> > > >> >>>>>> - Additional scripts, configs and instructions included in
> > Hadoop
> > > >> >>>>>> Accelerator.
> > > >> >>>>>>
> > > >> >>>>>> And the list of included modules frankly looks very weird.
> Here
> > > are
> > > >> >> only
> > > >> >>>>>> some of the issues I noticed:
> > > >> >>>>>>
> > > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we
> > need
> > > >> them
> > > >> >>>>>> for Hadoop Acceleration (which I doubt), are they really
> > required
> > > >> or
> > > >> >>>> can
> > > >> >>>>>> be
> > > >> >>>>>> optional?
> > > >> >>>>>> - We force users to use the ignite-log4j module without providing other
> > > >> logger
> > > >> >>>>>> options (e.g., SLF).
> > > >> >>>>>> - We don't include ignite-aws module. How to use Hadoop
> > > Accelerator
> > > >> >>>> with
> > > >> >>>>>> S3 discovery?
> > > >> >>>>>> - Etc.
> > > >> >>>>>>
> > > >> >>>>>> It seems to me that if we try to fix all these issues, there
> will
> > > be
> > > >> >>>>>> virtually no difference between Fabric and Hadoop Accelerator
> > > >> builds
> > > >> >>>> except
> > > >> >>>>>> couple of scripts and config files. If so, there is no reason
> > to
> > > >> have
> > > >> >>>> two
> > > >> >>>>>> builds.
> > > >> >>>>>>
> > > >> >>>>>> -Val
> > > >> >>>>>>
> > > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> > dmagda@apache.org>
> > > >> >> wrote:
> > > >> >>>>>>
> > > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > > changing
> > > >> >> the
> > > >> >>>>>>
> > > >> >>>>>> way we
> > > >> >>>>>>
> > > >> >>>>>> deliver Ignite and we'll likely to start offering the whole
> > 'data
> > > >> >>>> fabric'
> > > >> >>>>>> experience instead of the mere "hadoop-acceleration”.
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> And you still will be using hadoop-accelerator libs of
> Ignite,
> > > >> right?
> > > >> >>>>>>
> > > >> >>>>>> I’m wondering if there is a need to keep releasing Hadoop
> > > >> >> Accelerator
> > > >> >>>> as
> > > >> >>>>>> a separate delivery.
> > > >> >>>>>> What if we start releasing the accelerator as a part of the
> > > >> standard
> > > >> >>>>>> fabric binary putting hadoop-accelerator libs under
> ‘optional’
> > > >> folder?
> > > >> >>>>>>
> > > >> >>>>>> —
> > > >> >>>>>> Denis
> > > >> >>>>>>
> > > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> > cos@apache.org
> > > >
> > > >> >>>> wrote:
> > > >> >>>>>>
> > > >> >>>>>> What Denis said: spark has been added to the Hadoop
> accelerator
> > > as
> > > >> a
> > > >> >> way
> > > >> >>>>>>
> > > >> >>>>>> to
> > > >> >>>>>>
> > > >> >>>>>> boost the performance of more than just MR compute of the
> > Hadoop
> > > >> >> stack,
> > > >> >>>>>>
> > > >> >>>>>> IIRC.
> > > >> >>>>>>
> > > >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at
> > large.
> > > >> >>>>>>
> > > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > > changing
> > > >> >> the
> > > >> >>>>>>
> > > >> >>>>>> way we
> > > >> >>>>>>
> > > >> >>>>>> deliver Ignite and we'll likely to start offering the whole
> > 'data
> > > >> >>>> fabric'
> > > >> >>>>>> experience instead of the mere "hadoop-acceleration".
> > > >> >>>>>>
> > > >> >>>>>> Cos
> > > >> >>>>>>
> > > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > > >> >>>>>>
> > > >> >>>>>> Val,
> > > >> >>>>>>
> > > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> > accelerator
> > > >> but
> > > >> >>>>>>
> > > >> >>>>>> Ignite
> > > >> >>>>>>
> > > >> >>>>>> Hadoop File System component as well. The latter can be used
> in
> > > >> >>>>>>
> > > >> >>>>>> deployments
> > > >> >>>>>>
> > > >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> > > >> >>>>>>
> > > >> >>>>>> Considering this I’m for the second solution proposed by you:
> > put
> > > >> both
> > > >> >>>>>>
> > > >> >>>>>> 2.10
> > > >> >>>>>>
> > > >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of
> Ignite
> > > >> Hadoop
> > > >> >>>>>> Accelerator distribution.
> > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> > > >> >>>>>>
> > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> BTW, this task may be affected or related to the following
> > ones:
> > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> > > >> >>>>>>
> > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> > > >> >>>>>>
> > > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > > >> >>>>>>
> > > >> >>>>>> —
> > > >> >>>>>> Denis
> > > >> >>>>>>
> > > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > > >> >>>>>>
> > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is
> > used
> > > by
> > > >> >>>>>>
> > > >> >>>>>> Hadoop
> > > >> >>>>>>
> > > >> >>>>>> when running its jobs. ignite-spark module only provides
> > > IgniteRDD
> > > >> >>>>>>
> > > >> >>>>>> which
> > > >> >>>>>>
> > > >> >>>>>> Hadoop obviously will never use.
> > > >> >>>>>>
> > > >> >>>>>> Is there another use case for Hadoop Accelerator which I'm
> > > missing?
> > > >> >>>>>>
> > > >> >>>>>> -Val
> > > >> >>>>>>
> > > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > > >> >>>>>>
> > > >> >>>>>> dsetrakyan@apache.org>
> > > >> >>>>>>
> > > >> >>>>>> wrote:
> > > >> >>>>>>
> > > >> >>>>>> Why do you think that spark module is not needed in our
> hadoop
> > > >> build?
> > > >> >>>>>>
> > > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > > >> >>>>>>
> > > >> >>>>>> Folks,
> > > >> >>>>>>
> > > >> >>>>>> Is there anyone who understands the purpose of including
> > > >> ignite-spark
> > > >> >>>>>> module in the Hadoop Accelerator build? I can't figure out a
> > use
> > > >> >>>>>>
> > > >> >>>>>> case for
> > > >> >>>>>>
> > > >> >>>>>> which it's needed.
> > > >> >>>>>>
> > > >> >>>>>> In case we actually need it there, there is an issue then. We
> > > >> >>>>>>
> > > >> >>>>>> actually
> > > >> >>>>>>
> > > >> >>>>>> have
> > > >> >>>>>>
> > > >> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> > > >> >>>>>>
> > > >> >>>>>> everything
> > > >> >>>>>>
> > > >> >>>>>> is
> > > >> >>>>>>
> > > >> >>>>>> good, we put both in 'optional' folder and user can enable
> > either
> > > >> >>>>>>
> > > >> >>>>>> one.
> > > >> >>>>>>
> > > >> >>>>>> But
> > > >> >>>>>>
> > > >> >>>>>> in Hadoop Accelerator there is only 2.11 which means that the
> > > build
> > > >> >>>>>>
> > > >> >>>>>> doesn't
> > > >> >>>>>>
> > > >> >>>>>> work with 2.10 out of the box.
> > > >> >>>>>>
> > > >> >>>>>> We should either remove the module from the build, or fix the
> > > >> issue.
> > > >> >>>>>>
> > > >> >>>>>> -Val
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>>
> > > >> >>> --
> > > >> >>> Sergey Kozlov
> > > >> >>> GridGain Systems
> > > >> >>> www.gridgain.com
> > > >> >>
> > > >> >>
> > > >>
> > > >>
> > > >
> > >
> > >
> > > --
> > > Vladimir Ozerov
> > > Senior Software Architect
> > > GridGain Systems
> > > www.gridgain.com
> > > *+7 (960) 283 98 40*
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>



-- 
Vladimir Ozerov
Senior Software Architect
GridGain Systems
www.gridgain.com
*+7 (960) 283 98 40*

Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Why are we creating work for ourselves? What is wrong with having 2
downloads?

Hadoop accelerator edition exists for the following 2 purposes only:

   - accelerate HDFS with Ignite In-Memory File System (IGFS)
   - accelerate Hadoop MapReduce with Ignite In-Memory MapReduce

I agree with the original email from Valentin that Spark libs should not be
included in the hadoop-accelerator download. Spark integration is not part of
the Ignite Hadoop Accelerator and should be included only in the Ignite
fabric download.

D.



On Tue, Dec 6, 2016 at 12:30 AM, Sergey Kozlov <sk...@gridgain.com> wrote:

> Hi
>
> In general I agree with Vladimir but would suggest more technical details:
>
> Due to the need to collect particular CLASS_PATHs for fabric and hadoop
> editions we can change the logic of processing of libs directory
>
> 1. Introduce libs/hadoop and libs/fabric directories. These directories are
> root directories for specific modules for hadoop and fabric
> editions respectively
> 2. Change collecting of directories for CLASS_PATH for ignite.sh:
>  - collect everything from libs except libs/hadoop
>  - collect everything from libs/fabric
> 3. Add ignite-hadoop-accelerator.{sh|bat} script (also it may make initial
> setup instead of setup-hadoop.sh) that constructs CLASS_PATH by following
> way:
>  - collect everything from libs except libs/fabric
>  - collect everything from libs/hadoop
>
> This approach allows us the following:
>  - share common modules across both editions (just put in libs)
>  - do not share edition-specific modules (either put in libs/hadoop or in
> libs/fabric)
>
>
>
>
> On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Agree. I do not see any reasons to have two different products. Instead,
> > just add ignite-hadoop.jar to distribution, and add separate script to
> > start Accelerator. We can go the same way as we did for "platforms":
> create
> > separate top-level folder "hadoop" in Fabric distribution and put all
> > related Hadoop Accelerator stuff there.
> >
> > On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> > valentin.kulichenko@gmail.com> wrote:
> >
> > > In general, I don't quite understand why we should move any component
> > > outside of Fabric. The concept of Fabric is to have everything, no? :)
> In
> > > other words, if a cluster was once setup for Hadoop Acceleration, why
> not
> > > allow to create a cache and/or run a task using native Ignite APIs
> > sometime
> > > later. We follow this approach with all our components and modules, but
> > not
> > > with ignite-hadoop for some reason.
> > >
> > > If we get rid of Hadoop Accelerator build, initial setup of Hadoop
> > > integration can potentially become a bit more complicated, but with
> > proper
> > > documentation I don't think this is going to be a problem, because it
> > > requires multiple steps now anyway. And frankly the same can be said
> > about
> > > any optional module we have - enabling it requires some additional
> steps
> > as
> > > it doesn't work out of the box.
> > >
> > > -Val
> > >
> > > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org>
> wrote:
> > >
> > >> Dmitriy,
> > >>
> > >> >   - the "lib/" folder has much fewer libraries than in fabric,
> simply
> > >> >   because many dependencies don't make sense for hadoop environment
> > >>
> > >> This is exactly why the discussion moved in this
> > direction.
> > >>
> > >> How do we decide what should be a part of Hadoop Accelerator and what
> > >> should be excluded? If you read through Val and Cos comments below
> > you’ll
> > >> get more insights.
> > >>
> > >> In general, we need to have a clear understanding on what's Hadoop
> > >> Accelerator distribution use case. This will help us to come up with a
> > >> final decision.
> > >>
> > >> If the accelerator is supposed to be plugged into an existing Hadoop
> > >> environment by enabling MapReduce and/or IGFS at the configuration level
> > then
> > >> we should simply remove ignite-indexing, ignite-spark modules and add
> > >> additional logging libs as well as AWS, GCE integrations’ packages.
> > >>
> > >> But, wait, what if a user wants to leverage Ignite Spark
> > >> Integration, Ignite SQL or Geospatial queries, Ignite streaming
> > >> capabilities after he has already plugged-in the accelerator. What if
> > he is
> > >> ready to modify his existed code. He can’t simply switch to the fabric
> > on
> > >> an application side because the fabric doesn’t include accelerator’s
> > libs
> > >> that are still needed. He can’t solely rely on the accelerator
> > distribution
> > >> as well which misses some libs. And, obviously, the user starts
> > shuffling
> > >> libs in between the fabric and accelerator to get what is required.
> > >>
> > >> Vladimir, can you share your thoughts on this?
> > >>
> > >> —
> > >> Denis
> > >>
> > >>
> > >>
> > >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> > dsetrakyan@apache.org>
> > >> wrote:
> > >> >
> > >> > Guys,
> > >> >
> > >> > I just downloaded the hadoop accelerator and here are the
> differences
> > >> from
> > >> > the fabric edition that jump at me right away:
> > >> >
> > >> >   - the "bin/" folder has "setup-hadoop" scripts
> > >> >   - the "config/" folder has "hadoop" subfolder with necessary
> > >> >   hadoop-related configuration
> > >> >   - the "lib/" folder has much fewer libraries than in fabric,
> simply
> > >> >   because many dependencies don't make sense for hadoop environment
> > >> >
> > >> > I currently don't see how we can merge the hadoop accelerator with
> > >> standard
> > >> > fabric edition.
> > >> >
> > >> > D.
> > >> >
> > >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org>
> > wrote:
> > >> >
> > >> >> Vovan,
> > >> >>
> > >> >> As one of hadoop maintainers, please share your point of view on
> > this.
> > >> >>
> > >> >> —
> > >> >> Denis
> > >> >>
> > >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <skozlov@gridgain.com
> >
> > >> >> wrote:
> > >> >>>
> > >> >>> Denis
> > >> >>>
> > >> >>> I agree that at the moment there's no reason to split into fabric
> > and
> > >> >>> hadoop editions.
> > >> >>>
> > >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org>
> > >> wrote:
> > >> >>>
> > >> >>>> Hadoop Accelerator doesn’t require any additional libraries in
> > >> compared
> > >> >> to
> > >> >>>> those we have in the fabric build. It only lacks some of them as
> > Val
> > >> >>>> mentioned below.
> > >> >>>>
> > >> >>>> Wouldn’t it be better to discontinue Hadoop Accelerator edition and
> > >> simply
> > >> >>>> deliver hadoop jar and its configs as a part of the fabric?
> > >> >>>>
> > >> >>>> —
> > >> >>>> Denis
> > >> >>>>
> > >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> > >> dsetrakyan@apache.org>
> > >> >>>> wrote:
> > >> >>>>>
> > >> >>>>> Separate edition for the Hadoop Accelerator was primarily driven
> > by
> > >> the
> > >> >>>>> default libraries. Hadoop Accelerator requires many more
> libraries
> > >> as
> > >> >>>> well
> > >> >>>>> as configuration settings compared to the standard fabric
> > download.
> > >> >>>>>
> > >> >>>>> Now, as far as spark integration is concerned, I am not sure
> which
> > >> >>>> edition
> > >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> > >> >>>>>
> > >> >>>>> D.
> > >> >>>>>
> > >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dmagda@apache.org
> >
> > >> >> wrote:
> > >> >>>>>
> > >> >>>>>> *Dmitriy*,
> > >> >>>>>>
> > >> >>>>>> I do believe that you should know why the community decided on
> a
> > >> >>>> separate
> > >> >>>>>> edition for the Hadoop Accelerator. What was the reason for
> that?
> > >> >>>>>> Presently, as I see, it brings more confusion and difficulties
> > >> rather
> > >> >>>> than
> > >> >>>>>> benefit.
> > >> >>>>>>
> > >> >>>>>> —
> > >> >>>>>> Denis
> > >> >>>>>>
> > >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <
> cos@apache.org>
> > >> >> wrote:
> > >> >>>>>>
> > >> >>>>>> In fact I very much agree with you. Right now, running the
> > >> >>>> "accelerator"
> > >> >>>>>> component in Bigtop disto gives one a pretty much complete
> fabric
> > >> >>>> anyway.
> > >> >>>>>> But
> > >> >>>>>> in order to make just an accelerator component we perform
> quite a
> > >> bit
> > >> >> of
> > >> >>>>>> voodoo magic during the packaging stage of the Bigtop build,
> > >> shuffling
> > >> >>>> jars
> > >> >>>>>> from here and there. And that's quite crazy, honestly ;)
> > >> >>>>>>
> > >> >>>>>> Cos
> > >> >>>>>>
> > >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> > >> >>>>>>
> > >> >>>>>> I tend to agree with Denis. I see only these differences
> between
> > >> >> Hadoop
> > >> >>>>>> Accelerator and Fabric builds (correct me if I miss something):
> > >> >>>>>>
> > >> >>>>>> - Limited set of available modules and no optional modules in
> > >> Hadoop
> > >> >>>>>> Accelerator.
> > >> >>>>>> - No ignite-hadoop module in Fabric.
> > >> >>>>>> - Additional scripts, configs and instructions included in
> Hadoop
> > >> >>>>>> Accelerator.
> > >> >>>>>>
> > >> >>>>>> And the list of included modules frankly looks very weird. Here
> > are
> > >> >> only
> > >> >>>>>> some of the issues I noticed:
> > >> >>>>>>
> > >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we
> need
> > >> them
> > >> >>>>>> for Hadoop Acceleration (which I doubt), are they really
> required
> > >> or
> > >> >>>> can
> > >> >>>>>> be
> > >> >>>>>> optional?
> > >> >>>>>> - We force users to use the ignite-log4j module without providing other
> > >> logger
> > >> >>>>>> options (e.g., SLF).
> > >> >>>>>> - We don't include ignite-aws module. How to use Hadoop
> > Accelerator
> > >> >>>> with
> > >> >>>>>> S3 discovery?
> > >> >>>>>> - Etc.
> > >> >>>>>>
> > >> >>>>>> It seems to me that if we try to fix all these issues, there will
> > be
> > >> >>>>>> virtually no difference between Fabric and Hadoop Accelerator
> > >> builds
> > >> >>>> except
> > >> >>>>>> couple of scripts and config files. If so, there is no reason
> to
> > >> have
> > >> >>>> two
> > >> >>>>>> builds.
> > >> >>>>>>
> > >> >>>>>> -Val
> > >> >>>>>>
> > >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <
> dmagda@apache.org>
> > >> >> wrote:
> > >> >>>>>>
> > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > changing
> > >> >> the
> > >> >>>>>>
> > >> >>>>>> way we
> > >> >>>>>>
> > >> >>>>>> deliver Ignite and we'll likely to start offering the whole
> 'data
> > >> >>>> fabric'
> > >> >>>>>> experience instead of the mere "hadoop-acceleration”.
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> And you still will be using hadoop-accelerator libs of Ignite,
> > >> right?
> > >> >>>>>>
> > >> >>>>>> I’m wondering if there is a need to keep releasing Hadoop
> > >> >> Accelerator
> > >> >>>> as
> > >> >>>>>> a separate delivery.
> > >> >>>>>> What if we start releasing the accelerator as a part of the
> > >> standard
> > >> >>>>>> fabric binary putting hadoop-accelerator libs under ‘optional’
> > >> folder?
> > >> >>>>>>
> > >> >>>>>> —
> > >> >>>>>> Denis
> > >> >>>>>>
> > >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <
> cos@apache.org
> > >
> > >> >>>> wrote:
> > >> >>>>>>
> > >> >>>>>> What Denis said: spark has been added to the Hadoop accelerator
> > as
> > >> a
> > >> >> way
> > >> >>>>>>
> > >> >>>>>> to
> > >> >>>>>>
> > >> >>>>>> boost the performance of more than just MR compute of the
> Hadoop
> > >> >> stack,
> > >> >>>>>>
> > >> >>>>>> IIRC.
> > >> >>>>>>
> > >> >>>>>> For what it's worth, Spark is considered a part of Hadoop at
> large.
> > >> >>>>>>
> > >> >>>>>> On the separate note, in the Bigtop, we start looking into
> > changing
> > >> >> the
> > >> >>>>>>
> > >> >>>>>> way we
> > >> >>>>>>
> > >> >>>>>> deliver Ignite and we'll likely to start offering the whole
> 'data
> > >> >>>> fabric'
> > >> >>>>>> experience instead of the mere "hadoop-acceleration".
> > >> >>>>>>
> > >> >>>>>> Cos
> > >> >>>>>>
> > >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > >> >>>>>>
> > >> >>>>>> Val,
> > >> >>>>>>
> > >> >>>>>> Ignite Hadoop module includes not only the map-reduce
> accelerator
> > >> but
> > >> >>>>>>
> > >> >>>>>> Ignite
> > >> >>>>>>
> > >> >>>>>> Hadoop File System component as well. The latter can be used in
> > >> >>>>>>
> > >> >>>>>> deployments
> > >> >>>>>>
> > >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> > >> >>>>>>
> > >> >>>>>> Considering this I’m for the second solution proposed by you:
> put
> > >> both
> > >> >>>>>>
> > >> >>>>>> 2.10
> > >> >>>>>>
> > >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite
> > >> Hadoop
> > >> >>>>>> Accelerator distribution.
> > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> > >> >>>>>>
> > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> BTW, this task may be affected or related to the following
> ones:
> > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> > >> >>>>>>
> > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> > >> >>>>>>
> > >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> > >> >>>>>>
> > >> >>>>>> —
> > >> >>>>>> Denis
> > >> >>>>>>
> > >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > >> >>>>>>
> > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is
> used
> > by
> > >> >>>>>>
> > >> >>>>>> Hadoop
> > >> >>>>>>
> > >> >>>>>> when running its jobs. ignite-spark module only provides
> > IgniteRDD
> > >> >>>>>>
> > >> >>>>>> which
> > >> >>>>>>
> > >> >>>>>> Hadoop obviously will never use.
> > >> >>>>>>
> > >> >>>>>> Is there another use case for Hadoop Accelerator which I'm
> > missing?
> > >> >>>>>>
> > >> >>>>>> -Val
> > >> >>>>>>
> > >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > >> >>>>>>
> > >> >>>>>> dsetrakyan@apache.org>
> > >> >>>>>>
> > >> >>>>>> wrote:
> > >> >>>>>>
> > >> >>>>>> Why do you think that spark module is not needed in our hadoop
> > >> build?
> > >> >>>>>>
> > >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> > >> >>>>>>
> > >> >>>>>> Folks,
> > >> >>>>>>
> > >> >>>>>> Is there anyone who understands the purpose of including
> > >> ignite-spark
> > >> >>>>>> module in the Hadoop Accelerator build? I can't figure out a
> use
> > >> >>>>>>
> > >> >>>>>> case for
> > >> >>>>>>
> > >> >>>>>> which it's needed.
> > >> >>>>>>
> > >> >>>>>> In case we actually need it there, there is an issue then. We
> > >> >>>>>>
> > >> >>>>>> actually
> > >> >>>>>>
> > >> >>>>>> have
> > >> >>>>>>
> > >> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> > >> >>>>>>
> > >> >>>>>> everything
> > >> >>>>>>
> > >> >>>>>> is
> > >> >>>>>>
> > >> >>>>>> good, we put both in 'optional' folder and user can enable
> either
> > >> >>>>>>
> > >> >>>>>> one.
> > >> >>>>>>
> > >> >>>>>> But
> > >> >>>>>>
> > >> >>>>>> in Hadoop Accelerator there is only 2.11 which means that the
> > build
> > >> >>>>>>
> > >> >>>>>> doesn't
> > >> >>>>>>
> > >> >>>>>> work with 2.10 out of the box.
> > >> >>>>>>
> > >> >>>>>> We should either remove the module from the build, or fix the
> > >> issue.
> > >> >>>>>>
> > >> >>>>>> -Val
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>> --
> > >> >>> Sergey Kozlov
> > >> >>> GridGain Systems
> > >> >>> www.gridgain.com
> > >> >>
> > >> >>
> > >>
> > >>
> > >
> >
> >
> > --
> > Vladimir Ozerov
> > Senior Software Architect
> > GridGain Systems
> > www.gridgain.com
> > *+7 (960) 283 98 40*
> >
>
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Sergey Kozlov <sk...@gridgain.com>.
Hi

In general I agree with Vladimir but would suggest more technical details:

Due to the need to collect distinct CLASS_PATHs for the fabric and hadoop
editions, we can change the logic for processing the libs directory:

1. Introduce libs/hadoop and libs/fabric directories. These directories are
the root directories for edition-specific modules of the hadoop and fabric
editions respectively.
2. Change how ignite.sh collects directories for CLASS_PATH:
 - collect everything from libs except libs/hadoop
 - collect everything from libs/fabric
3. Add an ignite-hadoop-accelerator.{sh|bat} script (it may also perform the
initial setup instead of setup-hadoop.sh) that constructs CLASS_PATH the
following way:
 - collect everything from libs except libs/fabric
 - collect everything from libs/hadoop

This approach allows us the following:
 - share common modules across both editions (just put them in libs)
 - do not share edition-specific modules (put them in either libs/hadoop or
libs/fabric)
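A rough sketch of what this classpath assembly could look like. This is a
hypothetical illustration: the build_classpath function name and the
libs/hadoop and libs/fabric directory names come from this proposal, not from
any released Ignite layout.

```shell
#!/bin/sh
# build_classpath HOME EXCLUDE INCLUDE
# Collects every module directory under $HOME/libs except the two
# edition-specific ones, then appends the requested edition directory.
build_classpath() {
  home="$1"; exclude="$2"; include="$3"
  cp=""
  for dir in "$home"/libs/*/; do
    name=$(basename "$dir")
    # Skip both edition dirs in the shared pass; the included one
    # is appended explicitly below.
    [ "$name" = "$exclude" ] && continue
    [ "$name" = "$include" ] && continue
    cp="$cp:${dir%/}/*"
  done
  # Edition-specific modules go last.
  cp="$cp:$home/libs/$include/*"
  echo "${cp#:}"
}

# ignite.sh (fabric) would then do:
#   CLASSPATH=$(build_classpath "$IGNITE_HOME" hadoop fabric)
# while ignite-hadoop-accelerator.sh would do the converse:
#   CLASSPATH=$(build_classpath "$IGNITE_HOME" fabric hadoop)
```

Either script ends up with all shared modules plus exactly one edition
directory, which is the sharing behavior described above.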




On Mon, Dec 5, 2016 at 11:56 PM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Agree. I do not see any reasons to have two different products. Instead,
> just add ignite-hadoop.jar to distribution, and add separate script to
> start Accelerator. We can go the same way as we did for "platforms": create
> separate top-level folder "hadoop" in Fabric distribution and put all
> related Hadoop Accelerator stuff there.
>
> On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > In general, I don't quite understand why we should move any component
> > outside of Fabric. The concept of Fabric is to have everything, no? :) In
> > other words, if a cluster was once setup for Hadoop Acceleration, why not
> > allow to create a cache and/or run a task using native Ignite APIs
> sometime
> > later. We follow this approach with all our components and modules, but
> not
> > with ignite-hadoop for some reason.
> >
> > If we get rid of Hadoop Accelerator build, initial setup of Hadoop
> > integration can potentially become a bit more complicated, but with
> proper
> > documentation I don't think this is going to be a problem, because it
> > requires multiple steps now anyway. And frankly the same can be said
> about
> > any optional module we have - enabling it requires some additional steps
> as
> > it doesn't work out of the box.
> >
> > -Val
> >
> > On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org> wrote:
> >
> >> Dmitriy,
> >>
> >> >   - the "lib/" folder has much fewer libraries than in fabric, simply
> >> >   because many dependencies don't make sense for hadoop environment
> >>
> >> This is exactly why the discussion moved in this
> direction.
> >>
> >> How do we decide what should be a part of Hadoop Accelerator and what
> >> should be excluded? If you read through Val and Cos comments below
> you’ll
> >> get more insights.
> >>
> >> In general, we need to have a clear understanding on what's Hadoop
> >> Accelerator distribution use case. This will help us to come up with a
> >> final decision.
> >>
> >> If the accelerator is supposed to be plugged into an existing Hadoop
> >> environment by enabling MapReduce and/or IGFS at the configuration level
> then
> >> we should simply remove ignite-indexing, ignite-spark modules and add
> >> additional logging libs as well as AWS, GCE integrations’ packages.
> >>
> >> But, wait, what if a user wants to leverage Ignite Spark
> >> Integration, Ignite SQL or Geospatial queries, Ignite streaming
> >> capabilities after he has already plugged in the accelerator. What if
> he is
> >> ready to modify his existing code. He can’t simply switch to the fabric
> on
> >> an application side because the fabric doesn’t include accelerator’s
> libs
> >> that are still needed. He can’t solely rely on the accelerator
> distribution
> >> as well which misses some libs. And, obviously, the user starts
> shuffling
> >> libs in between the fabric and accelerator to get what is required.
> >>
> >> Vladimir, can you share your thoughts on this?
> >>
> >> —
> >> Denis
> >>
> >>
> >>
> >> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <
> dsetrakyan@apache.org>
> >> wrote:
> >> >
> >> > Guys,
> >> >
> >> > I just downloaded the hadoop accelerator and here are the differences
> >> from
> >> > the fabric edition that jump at me right away:
> >> >
> >> >   - the "bin/" folder has "setup-hadoop" scripts
> >> >   - the "config/" folder has "hadoop" subfolder with necessary
> >> >   hadoop-related configuration
> >> >   - the "lib/" folder has much fewer libraries than in fabric, simply
> >> >   because many dependencies don't make sense for hadoop environment
> >> >
> >> > I currently don't see how we can merge the hadoop accelerator with
> >> standard
> >> > fabric edition.
> >> >
> >> > D.
> >> >
> >> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org>
> wrote:
> >> >
> >> >> Vovan,
> >> >>
> >> >> As one of hadoop maintainers, please share your point of view on
> this.
> >> >>
> >> >> —
> >> >> Denis
> >> >>
> >> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com>
> >> >> wrote:
> >> >>>
> >> >>> Denis
> >> >>>
> >> >>> I agree that at the moment there's no reason to split into fabric
> and
> >> >>> hadoop editions.
> >> >>>
> >> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org>
> >> wrote:
> >> >>>
> >> >>>> Hadoop Accelerator doesn’t require any additional libraries in
> >> compared
> >> >> to
> >> >>>> those we have in the fabric build. It only lacks some of them as
> Val
> >> >>>> mentioned below.
> >> >>>>
> >> >>>> Wouldn’t it be better to discontinue Hadoop Accelerator edition and
> >> simply
> >> >>>> deliver hadoop jar and its configs as a part of the fabric?
> >> >>>>
> >> >>>> —
> >> >>>> Denis
> >> >>>>
> >> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> >> dsetrakyan@apache.org>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Separate edition for the Hadoop Accelerator was primarily driven
> by
> >> the
> >> >>>>> default libraries. Hadoop Accelerator requires many more libraries
> >> as
> >> >>>> well
> >> >>>>> as configuration settings compared to the standard fabric
> download.
> >> >>>>>
> >> >>>>> Now, as far as spark integration is concerned, I am not sure which
> >> >>>> edition
> >> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> >> >>>>>
> >> >>>>> D.
> >> >>>>>
> >> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org>
> >> >> wrote:
> >> >>>>>
> >> >>>>>> *Dmitriy*,
> >> >>>>>>
> >> >>>>>> I do believe that you should know why the community decided on a
> >> >>>> separate
> >> >>>>>> edition for the Hadoop Accelerator. What was the reason for that?
> >> >>>>>> Presently, as I see, it brings more confusion and difficulties
> >> rather
> >> >>>> than
> >> >>>>>> benefit.
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> In fact I very much agree with you. Right now, running the
> >> >>>> "accelerator"
> >> >>>>>> component in Bigtop disto gives one a pretty much complete fabric
> >> >>>> anyway.
> >> >>>>>> But
> >> >>>>>> in order to make just an accelerator component we perform quite a
> >> bit
> >> >> of
> >> >>>>>> voodoo magic during the packaging stage of the Bigtop build,
> >> shuffling
> >> >>>> jars
> >> >>>>>> from here and there. And that's quite crazy, honestly ;)
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >> >>>>>>
> >> >>>>>> I tend to agree with Denis. I see only these differences between
> >> >> Hadoop
> >> >>>>>> Accelerator and Fabric builds (correct me if I miss something):
> >> >>>>>>
> >> >>>>>> - Limited set of available modules and no optional modules in
> >> Hadoop
> >> >>>>>> Accelerator.
> >> >>>>>> - No ignite-hadoop module in Fabric.
> >> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> >> >>>>>> Accelerator.
> >> >>>>>>
> >> >>>>>> And the list of included modules frankly looks very weird. Here
> are
> >> >> only
> >> >>>>>> some of the issues I noticed:
> >> >>>>>>
> >> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
> >> them
> >> >>>>>> for Hadoop Acceleration (which I doubt), are they really required
> >> or
> >> >>>> can
> >> >>>>>> be
> >> >>>>>> optional?
> >> >>>>>> - We force to use ignite-log4j module without providing other
> >> logger
> >> >>>>>> options (e.g., SLF).
> >> >>>>>> - We don't include ignite-aws module. How to use Hadoop
> Accelerator
> >> >>>> with
> >> >>>>>> S3 discovery?
> >> >>>>>> - Etc.
> >> >>>>>>
> >> >>>>>> It seems to me that if we try to fix all this issue, there will
> be
> >> >>>>>> virtually no difference between Fabric and Hadoop Accelerator
> >> builds
> >> >>>> except
> >> >>>>>> couple of scripts and config files. If so, there is no reason to
> >> have
> >> >>>> two
> >> >>>>>> builds.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org>
> >> >> wrote:
> >> >>>>>>
> >> >>>>>> On the separate note, in the Bigtop, we start looking into
> changing
> >> >> the
> >> >>>>>>
> >> >>>>>> way we
> >> >>>>>>
> >> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
> >> >>>> fabric'
> >> >>>>>> experience instead of the mere "hadoop-acceleration”.
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> And you still will be using hadoop-accelerator libs of Ignite,
> >> right?
> >> >>>>>>
> >> >>>>>> I’m thinking of if there is a need to keep releasing Hadoop
> >> >> Accelerator
> >> >>>> as
> >> >>>>>> a separate delivery.
> >> >>>>>> What if we start releasing the accelerator as a part of the
> >> standard
> >> >>>>>> fabric binary putting hadoop-accelerator libs under ‘optional’
> >> folder?
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <cos@apache.org
> >
> >> >>>> wrote:
> >> >>>>>>
> >> >>>>>> What Denis said: spark has been added to the Hadoop accelerator
> as
> >> a
> >> >> way
> >> >>>>>>
> >> >>>>>> to
> >> >>>>>>
> >> >>>>>> boost the performance of more than just MR compute of the Hadoop
> >> >> stack,
> >> >>>>>>
> >> >>>>>> IIRC.
> >> >>>>>>
> >> >>>>>> For what it worth, Spark is considered a part of Hadoop at large.
> >> >>>>>>
> >> >>>>>> On the separate note, in the Bigtop, we start looking into
> changing
> >> >> the
> >> >>>>>>
> >> >>>>>> way we
> >> >>>>>>
> >> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
> >> >>>> fabric'
> >> >>>>>> experience instead of the mere "hadoop-acceleration".
> >> >>>>>>
> >> >>>>>> Cos
> >> >>>>>>
> >> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >> >>>>>>
> >> >>>>>> Val,
> >> >>>>>>
> >> >>>>>> Ignite Hadoop module includes not only the map-reduce accelerator
> >> but
> >> >>>>>>
> >> >>>>>> Ignite
> >> >>>>>>
> >> >>>>>> Hadoop File System component as well. The latter can be used in
> >> >>>>>>
> >> >>>>>> deployments
> >> >>>>>>
> >> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> >> >>>>>>
> >> >>>>>> Considering this I’m for the second solution proposed by you: put
> >> both
> >> >>>>>>
> >> >>>>>> 2.10
> >> >>>>>>
> >> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite
> >> Hadoop
> >> >>>>>> Accelerator distribution.
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> >> >>>>>>
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> BTW, this task may be affected or related to the following ones:
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> >> >>>>>>
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> >> >>>>>>
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >> >>>>>>
> >> >>>>>> —
> >> >>>>>> Denis
> >> >>>>>>
> >> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >> >>>>>>
> >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used
> by
> >> >>>>>>
> >> >>>>>> Hadoop
> >> >>>>>>
> >> >>>>>> when running its jobs. ignite-spark module only provides
> IgniteRDD
> >> >>>>>>
> >> >>>>>> which
> >> >>>>>>
> >> >>>>>> Hadoop obviously will never use.
> >> >>>>>>
> >> >>>>>> Is there another use case for Hadoop Accelerator which I'm
> missing?
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >> >>>>>>
> >> >>>>>> dsetrakyan@apache.org>
> >> >>>>>>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>> Why do you think that spark module is not needed in our hadoop
> >> build?
> >> >>>>>>
> >> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >> >>>>>> valentin.kulichenko@gmail.com> wrote:
> >> >>>>>>
> >> >>>>>> Folks,
> >> >>>>>>
> >> >>>>>> Is there anyone who understands the purpose of including
> >> ignite-spark
> >> >>>>>> module in the Hadoop Accelerator build? I can't figure out a use
> >> >>>>>>
> >> >>>>>> case for
> >> >>>>>>
> >> >>>>>> which it's needed.
> >> >>>>>>
> >> >>>>>> In case we actually need it there, there is an issue then. We
> >> >>>>>>
> >> >>>>>> actually
> >> >>>>>>
> >> >>>>>> have
> >> >>>>>>
> >> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> >> >>>>>>
> >> >>>>>> everything
> >> >>>>>>
> >> >>>>>> is
> >> >>>>>>
> >> >>>>>> good, we put both in 'optional' folder and user can enable either
> >> >>>>>>
> >> >>>>>> one.
> >> >>>>>>
> >> >>>>>> But
> >> >>>>>>
> >> >>>>>> in Hadoop Accelerator there is only 2.11 which means that the
> build
> >> >>>>>>
> >> >>>>>> doesn't
> >> >>>>>>
> >> >>>>>> work with 2.10 out of the box.
> >> >>>>>>
> >> >>>>>> We should either remove the module from the build, or fix the
> >> issue.
> >> >>>>>>
> >> >>>>>> -Val
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Sergey Kozlov
> >> >>> GridGain Systems
> >> >>> www.gridgain.com
> >> >>
> >> >>
> >>
> >>
> >
>
>
> --
> Vladimir Ozerov
> Senior Software Architect
> GridGain Systems
> www.gridgain.com
> *+7 (960) 283 98 40*
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: ignite-spark module in Hadoop Accelerator

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Agree. I do not see any reason to have two different products. Instead,
just add ignite-hadoop.jar to the distribution, and add a separate script
to start the Accelerator. We can go the same way as we did for "platforms":
create a separate top-level folder "hadoop" in the Fabric distribution and
put all related Hadoop Accelerator stuff there.
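
To make the proposal concrete, here is a rough sketch of what such a merged layout could look like. The folder and file names below are assumptions for illustration only, not the actual distribution contents:

```shell
# Mock up the proposed merged distribution layout (all names hypothetical):
mkdir -p apache-ignite-fabric/bin \
         apache-ignite-fabric/hadoop/config \
         apache-ignite-fabric/libs/optional/ignite-hadoop
touch apache-ignite-fabric/bin/ignite.sh                      # regular fabric start script
touch apache-ignite-fabric/hadoop/setup-hadoop.sh             # accelerator setup script
touch apache-ignite-fabric/hadoop/config/default-config.xml   # accelerator config
touch apache-ignite-fabric/libs/optional/ignite-hadoop/ignite-hadoop.jar
# Show the resulting tree:
find apache-ignite-fabric -type f | sort
```

Enabling the accelerator would then amount to copying libs/optional/ignite-hadoop into libs/ and running the script from the hadoop/ folder, the same way other optional modules are enabled today.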

On Fri, Dec 2, 2016 at 10:46 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> In general, I don't quite understand why we should move any component
> outside of Fabric. The concept of Fabric is to have everything, no? :) In
> other words, if a cluster was once setup for Hadoop Acceleration, why not
> allow to create a cache and/or run a task using native Ignite APIs sometime
> later. We follow this approach with all our components and modules, but not
> with ignite-hadoop for some reason.
>
> If we get rid of Hadoop Accelerator build, initial setup of Hadoop
> integration can potentially become a bit more complicated, but with proper
> documentation I don't think this is going to be a problem, because it
> requires multiple steps now anyway. And frankly the same can be said about
> any optional module we have - enabling it requires some additional steps as
> it doesn't work out of the box.
>
> -Val
>
> On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org> wrote:
>
>> Dmitriy,
>>
>> >   - the "lib/" folder has much fewer libraries that in fabric, simply
>> >   becomes many dependencies don't make sense for hadoop environment
>>
>> This reason why the discussion moved to this direction is exactly in that.
>>
>> How do we decide what should be a part of Hadoop Accelerator and what
>> should be excluded? If you read through Val and Cos comments below you’ll
>> get more insights.
>>
>> In general, we need to have a clear understanding on what's Hadoop
>> Accelerator distribution use case. This will help us to come up with a
>> final decision.
>>
>> If the accelerator is supposed to be plugged-in into an existed Hadoop
>> environment by enabling MapReduce and/IGFS at the configuration level then
>> we should simply remove ignite-indexing, ignite-spark modules and add
>> additional logging libs as well as AWS, GCE integrations’ packages.
>>
>> But, wait, what if a user wants to leverage from Ignite Spark
>> Integration, Ignite SQL or Geospatial queries, Ignite streaming
>> capabilities after he has already plugged-in the accelerator. What if he is
>> ready to modify his existed code. He can’t simply switch to the fabric on
>> an application side because the fabric doesn’t include accelerator’s libs
>> that are still needed. He can’t solely rely on the accelerator distribution
>> as well which misses some libs. And, obviously, the user starts shuffling
>> libs in between the fabric and accelerator to get what is required.
>>
>> Vladimir, can you share your thoughts on this?
>>
>> —
>> Denis
>>
>>
>>
>> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <ds...@apache.org>
>> wrote:
>> >
>> > Guys,
>> >
>> > I just downloaded the hadoop accelerator and here are the differences
>> from
>> > the fabric edition that jump at me right away:
>> >
>> >   - the "bin/" folder has "setup-hadoop" scripts
>> >   - the "config/" folder has "hadoop" subfolder with necessary
>> >   hadoop-related configuration
>> >   - the "lib/" folder has much fewer libraries that in fabric, simply
>> >   becomes many dependencies don't make sense for hadoop environment
>> >
>> > I currently don't see how we can merge the hadoop accelerator with
>> standard
>> > fabric edition.
>> >
>> > D.
>> >
>> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org> wrote:
>> >
>> >> Vovan,
>> >>
>> >> As one of hadoop maintainers, please share your point of view on this.
>> >>
>> >> —
>> >> Denis
>> >>
>> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com>
>> >> wrote:
>> >>>
>> >>> Denis
>> >>>
>> >>> I agree that at the moment there's no reason to split into fabric and
>> >>> hadoop editions.
>> >>>
>> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org>
>> wrote:
>> >>>
>> >>>> Hadoop Accelerator doesn’t require any additional libraries in
>> compare
>> >> to
>> >>>> those we have in the fabric build. It only lacks some of them as Val
>> >>>> mentioned below.
>> >>>>
>> >>>> Wouldn’t it better to discontinue Hadoop Accelerator edition and
>> simply
>> >>>> deliver hadoop jar and its configs as a part of the fabric?
>> >>>>
>> >>>> —
>> >>>> Denis
>> >>>>
>> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
>> dsetrakyan@apache.org>
>> >>>> wrote:
>> >>>>>
>> >>>>> Separate edition for the Hadoop Accelerator was primarily driven by
>> the
>> >>>>> default libraries. Hadoop Accelerator requires many more libraries
>> as
>> >>>> well
>> >>>>> as configuration settings compared to the standard fabric download.
>> >>>>>
>> >>>>> Now, as far as spark integration is concerned, I am not sure which
>> >>>> edition
>> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
>> >>>>>
>> >>>>> D.
>> >>>>>
>> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org>
>> >> wrote:
>> >>>>>
>> >>>>>> *Dmitriy*,
>> >>>>>>
>> >>>>>> I do believe that you should know why the community decided to a
>> >>>> separate
>> >>>>>> edition for the Hadoop Accelerator. What was the reason for that?
>> >>>>>> Presently, as I see, it brings more confusion and difficulties
>> rather
>> >>>> then
>> >>>>>> benefit.
>> >>>>>>
>> >>>>>> —
>> >>>>>> Denis
>> >>>>>>
>> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org>
>> >> wrote:
>> >>>>>>
>> >>>>>> In fact I am very much agree with you. Right now, running the
>> >>>> "accelerator"
>> >>>>>> component in Bigtop disto gives one a pretty much complete fabric
>> >>>> anyway.
>> >>>>>> But
>> >>>>>> in order to make just an accelerator component we perform quite a
>> bit
>> >> of
>> >>>>>> woodoo magic during the packaging stage of the Bigtop build,
>> shuffling
>> >>>> jars
>> >>>>>> from here and there. And that's quite crazy, honestly ;)
>> >>>>>>
>> >>>>>> Cos
>> >>>>>>
>> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>> >>>>>>
>> >>>>>> I tend to agree with Denis. I see only these differences between
>> >> Hadoop
>> >>>>>> Accelerator and Fabric builds (correct me if I miss something):
>> >>>>>>
>> >>>>>> - Limited set of available modules and no optional modules in
>> Hadoop
>> >>>>>> Accelerator.
>> >>>>>> - No ignite-hadoop module in Fabric.
>> >>>>>> - Additional scripts, configs and instructions included in Hadoop
>> >>>>>> Accelerator.
>> >>>>>>
>> >>>>>> And the list of included modules frankly looks very weird. Here are
>> >> only
>> >>>>>> some of the issues I noticed:
>> >>>>>>
>> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
>> them
>> >>>>>> for Hadoop Acceleration (which I doubt), are they really required
>> or
>> >>>> can
>> >>>>>> be
>> >>>>>> optional?
>> >>>>>> - We force to use ignite-log4j module without providing other
>> logger
>> >>>>>> options (e.g., SLF).
>> >>>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator
>> >>>> with
>> >>>>>> S3 discovery?
>> >>>>>> - Etc.
>> >>>>>>
>> >>>>>> It seems to me that if we try to fix all this issue, there will be
>> >>>>>> virtually no difference between Fabric and Hadoop Accelerator
>> builds
>> >>>> except
>> >>>>>> couple of scripts and config files. If so, there is no reason to
>> have
>> >>>> two
>> >>>>>> builds.
>> >>>>>>
>> >>>>>> -Val
>> >>>>>>
>> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org>
>> >> wrote:
>> >>>>>>
>> >>>>>> On the separate note, in the Bigtop, we start looking into changing
>> >> the
>> >>>>>>
>> >>>>>> way we
>> >>>>>>
>> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
>> >>>> fabric'
>> >>>>>> experience instead of the mere "hadoop-acceleration”.
>> >>>>>>
>> >>>>>>
>> >>>>>> And you still will be using hadoop-accelerator libs of Ignite,
>> right?
>> >>>>>>
>> >>>>>> I’m thinking of if there is a need to keep releasing Hadoop
>> >> Accelerator
>> >>>> as
>> >>>>>> a separate delivery.
>> >>>>>> What if we start releasing the accelerator as a part of the
>> standard
>> >>>>>> fabric binary putting hadoop-accelerator libs under ‘optional’
>> folder?
>> >>>>>>
>> >>>>>> —
>> >>>>>> Denis
>> >>>>>>
>> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
>> >>>> wrote:
>> >>>>>>
>> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as
>> a
>> >> way
>> >>>>>>
>> >>>>>> to
>> >>>>>>
>> >>>>>> boost the performance of more than just MR compute of the Hadoop
>> >> stack,
>> >>>>>>
>> >>>>>> IIRC.
>> >>>>>>
>> >>>>>> For what it worth, Spark is considered a part of Hadoop at large.
>> >>>>>>
>> >>>>>> On the separate note, in the Bigtop, we start looking into changing
>> >> the
>> >>>>>>
>> >>>>>> way we
>> >>>>>>
>> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
>> >>>> fabric'
>> >>>>>> experience instead of the mere "hadoop-acceleration".
>> >>>>>>
>> >>>>>> Cos
>> >>>>>>
>> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>> >>>>>>
>> >>>>>> Val,
>> >>>>>>
>> >>>>>> Ignite Hadoop module includes not only the map-reduce accelerator
>> but
>> >>>>>>
>> >>>>>> Ignite
>> >>>>>>
>> >>>>>> Hadoop File System component as well. The latter can be used in
>> >>>>>>
>> >>>>>> deployments
>> >>>>>>
>> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
>> >>>>>>
>> >>>>>> Considering this I’m for the second solution proposed by you: put
>> both
>> >>>>>>
>> >>>>>> 2.10
>> >>>>>>
>> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite
>> Hadoop
>> >>>>>> Accelerator distribution.
>> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
>> >>>>>>
>> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
>> >>>>>>
>> >>>>>>
>> >>>>>> BTW, this task may be affected or related to the following ones:
>> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
>> >>>>>>
>> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
>> >>>>>>
>> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>> >>>>>>
>> >>>>>> —
>> >>>>>> Denis
>> >>>>>>
>> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>> >>>>>>
>> >>>>>> valentin.kulichenko@gmail.com> wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>> >>>>>>
>> >>>>>> Hadoop
>> >>>>>>
>> >>>>>> when running its jobs. ignite-spark module only provides IgniteRDD
>> >>>>>>
>> >>>>>> which
>> >>>>>>
>> >>>>>> Hadoop obviously will never use.
>> >>>>>>
>> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>> >>>>>>
>> >>>>>> -Val
>> >>>>>>
>> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>> >>>>>>
>> >>>>>> dsetrakyan@apache.org>
>> >>>>>>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> Why do you think that spark module is not needed in our hadoop
>> build?
>> >>>>>>
>> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>> >>>>>> valentin.kulichenko@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Folks,
>> >>>>>>
>> >>>>>> Is there anyone who understands the purpose of including
>> ignite-spark
>> >>>>>> module in the Hadoop Accelerator build? I can't figure out a use
>> >>>>>>
>> >>>>>> case for
>> >>>>>>
>> >>>>>> which it's needed.
>> >>>>>>
>> >>>>>> In case we actually need it there, there is an issue then. We
>> >>>>>>
>> >>>>>> actually
>> >>>>>>
>> >>>>>> have
>> >>>>>>
>> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>> >>>>>>
>> >>>>>> everything
>> >>>>>>
>> >>>>>> is
>> >>>>>>
>> >>>>>> good, we put both in 'optional' folder and user can enable either
>> >>>>>>
>> >>>>>> one.
>> >>>>>>
>> >>>>>> But
>> >>>>>>
>> >>>>>> in Hadoop Accelerator there is only 2.11 which means that the build
>> >>>>>>
>> >>>>>> doesn't
>> >>>>>>
>> >>>>>> work with 2.10 out of the box.
>> >>>>>>
>> >>>>>> We should either remove the module from the build, or fix the
>> issue.
>> >>>>>>
>> >>>>>> -Val
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> Sergey Kozlov
>> >>> GridGain Systems
>> >>> www.gridgain.com
>> >>
>> >>
>>
>>
>


-- 
Vladimir Ozerov
Senior Software Architect
GridGain Systems
www.gridgain.com
*+7 (960) 283 98 40*

Re: ignite-spark module in Hadoop Accelerator

Posted by Valentin Kulichenko <va...@gmail.com>.
In general, I don't quite understand why we should move any component
outside of Fabric. The concept of Fabric is to have everything, no? :) In
other words, if a cluster was once set up for Hadoop Acceleration, why not
allow creating a cache and/or running a task using native Ignite APIs
sometime later? We follow this approach with all our components and
modules, but not with ignite-hadoop for some reason.

If we get rid of the Hadoop Accelerator build, the initial setup of the
Hadoop integration can potentially become a bit more complicated, but with
proper documentation I don't think this is going to be a problem, because
it requires multiple steps now anyway. And frankly, the same can be said
about any optional module we have - enabling it requires some additional
steps, as it doesn't work out of the box.

-Val

On Fri, Dec 2, 2016 at 11:38 AM, Denis Magda <dm...@apache.org> wrote:

> Dmitriy,
>
> >   - the "lib/" folder has much fewer libraries that in fabric, simply
> >   becomes many dependencies don't make sense for hadoop environment
>
> This reason why the discussion moved to this direction is exactly in that.
>
> How do we decide what should be a part of Hadoop Accelerator and what
> should be excluded? If you read through Val and Cos comments below you’ll
> get more insights.
>
> In general, we need to have a clear understanding on what's Hadoop
> Accelerator distribution use case. This will help us to come up with a
> final decision.
>
> If the accelerator is supposed to be plugged-in into an existed Hadoop
> environment by enabling MapReduce and/IGFS at the configuration level then
> we should simply remove ignite-indexing, ignite-spark modules and add
> additional logging libs as well as AWS, GCE integrations’ packages.
>
> But, wait, what if a user wants to leverage from Ignite Spark Integration,
> Ignite SQL or Geospatial queries, Ignite streaming capabilities after he
> has already plugged-in the accelerator. What if he is ready to modify his
> existed code. He can’t simply switch to the fabric on an application side
> because the fabric doesn’t include accelerator’s libs that are still
> needed. He can’t solely rely on the accelerator distribution as well which
> misses some libs. And, obviously, the user starts shuffling libs in between
> the fabric and accelerator to get what is required.
>
> Vladimir, can you share your thoughts on this?
>
> —
> Denis
>
>
>
> > On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> >
> > Guys,
> >
> > I just downloaded the hadoop accelerator and here are the differences
> from
> > the fabric edition that jump at me right away:
> >
> >   - the "bin/" folder has "setup-hadoop" scripts
> >   - the "config/" folder has "hadoop" subfolder with necessary
> >   hadoop-related configuration
> >   - the "lib/" folder has much fewer libraries that in fabric, simply
> >   becomes many dependencies don't make sense for hadoop environment
> >
> > I currently don't see how we can merge the hadoop accelerator with
> standard
> > fabric edition.
> >
> > D.
> >
> > On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org> wrote:
> >
> >> Vovan,
> >>
> >> As one of hadoop maintainers, please share your point of view on this.
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com>
> >> wrote:
> >>>
> >>> Denis
> >>>
> >>> I agree that at the moment there's no reason to split into fabric and
> >>> hadoop editions.
> >>>
> >>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org> wrote:
> >>>
> >>>> Hadoop Accelerator doesn’t require any additional libraries in compare
> >> to
> >>>> those we have in the fabric build. It only lacks some of them as Val
> >>>> mentioned below.
> >>>>
> >>>> Wouldn’t it better to discontinue Hadoop Accelerator edition and
> simply
> >>>> deliver hadoop jar and its configs as a part of the fabric?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <
> dsetrakyan@apache.org>
> >>>> wrote:
> >>>>>
> >>>>> Separate edition for the Hadoop Accelerator was primarily driven by
> the
> >>>>> default libraries. Hadoop Accelerator requires many more libraries as
> >>>> well
> >>>>> as configuration settings compared to the standard fabric download.
> >>>>>
> >>>>> Now, as far as spark integration is concerned, I am not sure which
> >>>> edition
> >>>>> it belongs in, Hadoop Accelerator or standard fabric.
> >>>>>
> >>>>> D.
> >>>>>
> >>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> *Dmitriy*,
> >>>>>>
> >>>>>> I do believe that you should know why the community decided to a
> >>>> separate
> >>>>>> edition for the Hadoop Accelerator. What was the reason for that?
> >>>>>> Presently, as I see, it brings more confusion and difficulties
> rather
> >>>> then
> >>>>>> benefit.
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org>
> >> wrote:
> >>>>>>
> >>>>>> In fact I am very much agree with you. Right now, running the
> >>>> "accelerator"
> >>>>>> component in Bigtop disto gives one a pretty much complete fabric
> >>>> anyway.
> >>>>>> But
> >>>>>> in order to make just an accelerator component we perform quite a
> bit
> >> of
> >>>>>> woodoo magic during the packaging stage of the Bigtop build,
> shuffling
> >>>> jars
> >>>>>> from here and there. And that's quite crazy, honestly ;)
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>>>
> >>>>>> I tend to agree with Denis. I see only these differences between
> >> Hadoop
> >>>>>> Accelerator and Fabric builds (correct me if I miss something):
> >>>>>>
> >>>>>> - Limited set of available modules and no optional modules in Hadoop
> >>>>>> Accelerator.
> >>>>>> - No ignite-hadoop module in Fabric.
> >>>>>> - Additional scripts, configs and instructions included in Hadoop
> >>>>>> Accelerator.
> >>>>>>
> >>>>>> And the list of included modules frankly looks very weird. Here are
> >> only
> >>>>>> some of the issues I noticed:
> >>>>>>
> >>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need
> them
> >>>>>> for Hadoop Acceleration (which I doubt), are they really required or
> >>>> can
> >>>>>> be
> >>>>>> optional?
> >>>>>> - We force to use ignite-log4j module without providing other logger
> >>>>>> options (e.g., SLF).
> >>>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator
> >>>> with
> >>>>>> S3 discovery?
> >>>>>> - Etc.
> >>>>>>
> >>>>>> It seems to me that if we try to fix all this issue, there will be
> >>>>>> virtually no difference between Fabric and Hadoop Accelerator builds
> >>>> except
> >>>>>> couple of scripts and config files. If so, there is no reason to
> have
> >>>> two
> >>>>>> builds.
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org>
> >> wrote:
> >>>>>>
> >>>>>> On the separate note, in the Bigtop, we start looking into changing
> >> the
> >>>>>>
> >>>>>> way we
> >>>>>>
> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
> >>>> fabric'
> >>>>>> experience instead of the mere "hadoop-acceleration”.
> >>>>>>
> >>>>>>
> >>>>>> And you still will be using hadoop-accelerator libs of Ignite,
> right?
> >>>>>>
> >>>>>> I’m thinking of if there is a need to keep releasing Hadoop
> >> Accelerator
> >>>> as
> >>>>>> a separate delivery.
> >>>>>> What if we start releasing the accelerator as a part of the standard
> >>>>>> fabric binary putting hadoop-accelerator libs under ‘optional’
> folder?
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
> >>>> wrote:
> >>>>>>
> >>>>>> What Denis said: spark has been added to the Hadoop accelerator as a
> >> way
> >>>>>>
> >>>>>> to
> >>>>>>
> >>>>>> boost the performance of more than just MR compute of the Hadoop
> >> stack,
> >>>>>>
> >>>>>> IIRC.
> >>>>>>
> >>>>>> For what it worth, Spark is considered a part of Hadoop at large.
> >>>>>>
> >>>>>> On the separate note, in the Bigtop, we start looking into changing
> >> the
> >>>>>>
> >>>>>> way we
> >>>>>>
> >>>>>> deliver Ignite and we'll likely to start offering the whole 'data
> >>>> fabric'
> >>>>>> experience instead of the mere "hadoop-acceleration".
> >>>>>>
> >>>>>> Cos
> >>>>>>
> >>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>>>
> >>>>>> Val,
> >>>>>>
> >>>>>> Ignite Hadoop module includes not only the map-reduce accelerator
> but
> >>>>>>
> >>>>>> Ignite
> >>>>>>
> >>>>>> Hadoop File System component as well. The latter can be used in
> >>>>>>
> >>>>>> deployments
> >>>>>>
> >>>>>> like HDFS+IGFS+Ignite Spark + Spark.
> >>>>>>
> >>>>>> Considering this I’m for the second solution proposed by you: put
> both
> >>>>>>
> >>>>>> 2.10
> >>>>>>
> >>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite
> Hadoop
> >>>>>> Accelerator distribution.
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> >>>>>>
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> >>>>>>
> >>>>>>
> >>>>>> BTW, this task may be affected or related to the following ones:
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> >>>>>>
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> >>>>>>
> >>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>>>
> >>>>>> —
> >>>>>> Denis
> >>>>>>
> >>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >>>>>>
> >>>>>> valentin.kulichenko@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> >>>>>>
> >>>>>> Hadoop
> >>>>>>
> >>>>>> when running its jobs. ignite-spark module only provides IgniteRDD
> >>>>>>
> >>>>>> which
> >>>>>>
> >>>>>> Hadoop obviously will never use.
> >>>>>>
> >>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >>>>>>
> >>>>>> dsetrakyan@apache.org>
> >>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Why do you think that spark module is not needed in our hadoop
> build?
> >>>>>>
> >>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >>>>>> valentin.kulichenko@gmail.com> wrote:
> >>>>>>
> >>>>>> Folks,
> >>>>>>
> >>>>>> Is there anyone who understands the purpose of including
> ignite-spark
> >>>>>> module in the Hadoop Accelerator build? I can't figure out a use
> >>>>>>
> >>>>>> case for
> >>>>>>
> >>>>>> which it's needed.
> >>>>>>
> >>>>>> In case we actually need it there, there is an issue then. We
> >>>>>>
> >>>>>> actually
> >>>>>>
> >>>>>> have
> >>>>>>
> >>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> >>>>>>
> >>>>>> everything
> >>>>>>
> >>>>>> is
> >>>>>>
> >>>>>> good, we put both in 'optional' folder and user can enable either
> >>>>>>
> >>>>>> one.
> >>>>>>
> >>>>>> But
> >>>>>>
> >>>>>> in Hadoop Accelerator there is only 2.11 which means that the build
> >>>>>>
> >>>>>> doesn't
> >>>>>>
> >>>>>> work with 2.10 out of the box.
> >>>>>>
> >>>>>> We should either remove the module from the build, or fix the issue.
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Sergey Kozlov
> >>> GridGain Systems
> >>> www.gridgain.com
> >>
> >>
>
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
Dmitriy,

>   - the "lib/" folder has much fewer libraries that in fabric, simply
>   becomes many dependencies don't make sense for hadoop environment

That is exactly the reason why the discussion has moved in this direction.

How do we decide what should be a part of the Hadoop Accelerator and what should be excluded? If you read through Val's and Cos's comments below, you'll get more insight.

In general, we need a clear understanding of the Hadoop Accelerator distribution's use case. This will help us come up with a final decision.

If the accelerator is supposed to be plugged into an existing Hadoop environment by enabling MapReduce and/or IGFS at the configuration level, then we should simply remove the ignite-indexing and ignite-spark modules and add additional logging libs as well as the AWS and GCE integration packages.

But wait, what if a user wants to leverage Ignite Spark integration, Ignite SQL or geospatial queries, or Ignite streaming capabilities after he has already plugged in the accelerator? What if he is ready to modify his existing code? He can't simply switch to the fabric on the application side, because the fabric doesn't include the accelerator's libs that are still needed. Nor can he rely solely on the accelerator distribution, which misses some libs. And, obviously, the user ends up shuffling libs between the fabric and the accelerator to get what is required.

Vladimir, can you share your thoughts on this?

—
Denis  



> On Nov 30, 2016, at 11:18 PM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> 
> Guys,
> 
> I just downloaded the hadoop accelerator and here are the differences from
> the fabric edition that jump at me right away:
> 
>   - the "bin/" folder has "setup-hadoop" scripts
>   - the "config/" folder has "hadoop" subfolder with necessary
>   hadoop-related configuration
>   - the "lib/" folder has much fewer libraries that in fabric, simply
>   becomes many dependencies don't make sense for hadoop environment
> 
> I currently don't see how we can merge the hadoop accelerator with standard
> fabric edition.
> 
> D.
> 
> On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org> wrote:
> 
>> Vovan,
>> 
>> As one of hadoop maintainers, please share your point of view on this.
>> 
>> —
>> Denis
>> 
>>> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com>
>> wrote:
>>> 
>>> Denis
>>> 
>>> I agree that at the moment there's no reason to split into fabric and
>>> hadoop editions.
>>> 
>>> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org> wrote:
>>> 
>>>> Hadoop Accelerator doesn’t require any additional libraries in compare
>> to
>>>> those we have in the fabric build. It only lacks some of them as Val
>>>> mentioned below.
>>>> 
>>>> Wouldn’t it better to discontinue Hadoop Accelerator edition and simply
>>>> deliver hadoop jar and its configs as a part of the fabric?
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <ds...@apache.org>
>>>> wrote:
>>>>> 
>>>>> Separate edition for the Hadoop Accelerator was primarily driven by the
>>>>> default libraries. Hadoop Accelerator requires many more libraries as
>>>> well
>>>>> as configuration settings compared to the standard fabric download.
>>>>> 
>>>>> Now, as far as spark integration is concerned, I am not sure which
>>>> edition
>>>>> it belongs in, Hadoop Accelerator or standard fabric.
>>>>> 
>>>>> D.
>>>>> 
>>>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org>
>> wrote:
>>>>> 
>>>>>> *Dmitriy*,
>>>>>> 
>>>>>> I do believe that you should know why the community decided to a
>>>> separate
>>>>>> edition for the Hadoop Accelerator. What was the reason for that?
>>>>>> Presently, as I see, it brings more confusion and difficulties rather
>>>> then
>>>>>> benefit.
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org>
>> wrote:
>>>>>> 
>>>>>> In fact I am very much agree with you. Right now, running the
>>>> "accelerator"
>>>>>> component in Bigtop disto gives one a pretty much complete fabric
>>>> anyway.
>>>>>> But
>>>>>> in order to make just an accelerator component we perform quite a bit
>> of
>>>>>> woodoo magic during the packaging stage of the Bigtop build, shuffling
>>>> jars
>>>>>> from here and there. And that's quite crazy, honestly ;)
>>>>>> 
>>>>>> Cos
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>>>>>> 
>>>>>> I tend to agree with Denis. I see only these differences between
>> Hadoop
>>>>>> Accelerator and Fabric builds (correct me if I miss something):
>>>>>> 
>>>>>> - Limited set of available modules and no optional modules in Hadoop
>>>>>> Accelerator.
>>>>>> - No ignite-hadoop module in Fabric.
>>>>>> - Additional scripts, configs and instructions included in Hadoop
>>>>>> Accelerator.
>>>>>> 
>>>>>> And the list of included modules frankly looks very weird. Here are
>> only
>>>>>> some of the issues I noticed:
>>>>>> 
>>>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>>>>> for Hadoop Acceleration (which I doubt), are they really required or
>>>> can
>>>>>> be
>>>>>> optional?
>>>>>> - We force to use ignite-log4j module without providing other logger
>>>>>> options (e.g., SLF).
>>>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator
>>>> with
>>>>>> S3 discovery?
>>>>>> - Etc.
>>>>>> 
>>>>>> It seems to me that if we try to fix all this issue, there will be
>>>>>> virtually no difference between Fabric and Hadoop Accelerator builds
>>>> except
>>>>>> couple of scripts and config files. If so, there is no reason to have
>>>> two
>>>>>> builds.
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org>
>> wrote:
>>>>>> 
>>>>>> On the separate note, in the Bigtop, we start looking into changing
>> the
>>>>>> 
>>>>>> way we
>>>>>> 
>>>>>> deliver Ignite and we'll likely to start offering the whole 'data
>>>> fabric'
>>>>>> experience instead of the mere "hadoop-acceleration”.
>>>>>> 
>>>>>> 
>>>>>> And you still will be using hadoop-accelerator libs of Ignite, right?
>>>>>> 
>>>>>> I’m thinking of if there is a need to keep releasing Hadoop
>> Accelerator
>>>> as
>>>>>> a separate delivery.
>>>>>> What if we start releasing the accelerator as a part of the standard
>>>>>> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
>>>> wrote:
>>>>>> 
>>>>>> What Denis said: spark has been added to the Hadoop accelerator as a
>> way
>>>>>> 
>>>>>> to
>>>>>> 
>>>>>> boost the performance of more than just MR compute of the Hadoop
>> stack,
>>>>>> 
>>>>>> IIRC.
>>>>>> 
>>>>>> For what it worth, Spark is considered a part of Hadoop at large.
>>>>>> 
>>>>>> On the separate note, in the Bigtop, we start looking into changing
>> the
>>>>>> 
>>>>>> way we
>>>>>> 
>>>>>> deliver Ignite and we'll likely to start offering the whole 'data
>>>> fabric'
>>>>>> experience instead of the mere "hadoop-acceleration".
>>>>>> 
>>>>>> Cos
>>>>>> 
>>>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>>> 
>>>>>> Val,
>>>>>> 
>>>>>> Ignite Hadoop module includes not only the map-reduce accelerator but
>>>>>> 
>>>>>> Ignite
>>>>>> 
>>>>>> Hadoop File System component as well. The latter can be used in
>>>>>> 
>>>>>> deployments
>>>>>> 
>>>>>> like HDFS+IGFS+Ignite Spark + Spark.
>>>>>> 
>>>>>> Considering this I’m for the second solution proposed by you: put both
>>>>>> 
>>>>>> 2.10
>>>>>> 
>>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
>>>>>> Accelerator distribution.
>>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
>>>>>> 
>>>>>> 
>>>>>> BTW, this task may be affected or related to the following ones:
>>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
>>>>>> 
>>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>>> 
>>>>>> —
>>>>>> Denis
>>>>>> 
>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>>>>>> 
>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>> 
>>>>>> 
>>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>>>>>> 
>>>>>> Hadoop
>>>>>> 
>>>>>> when running its jobs. ignite-spark module only provides IgniteRDD
>>>>>> 
>>>>>> which
>>>>>> 
>>>>>> Hadoop obviously will never use.
>>>>>> 
>>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>>>>>> 
>>>>>> dsetrakyan@apache.org>
>>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>> Why do you think that spark module is not needed in our hadoop build?
>>>>>> 
>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>> 
>>>>>> Folks,
>>>>>> 
>>>>>> Is there anyone who understands the purpose of including ignite-spark
>>>>>> module in the Hadoop Accelerator build? I can't figure out a use
>>>>>> 
>>>>>> case for
>>>>>> 
>>>>>> which it's needed.
>>>>>> 
>>>>>> In case we actually need it there, there is an issue then. We
>>>>>> 
>>>>>> actually
>>>>>> 
>>>>>> have
>>>>>> 
>>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>>>>>> 
>>>>>> everything
>>>>>> 
>>>>>> is
>>>>>> 
>>>>>> good, we put both in 'optional' folder and user can enable either
>>>>>> 
>>>>>> one.
>>>>>> 
>>>>>> But
>>>>>> 
>>>>>> in Hadoop Accelerator there is only 2.11 which means that the build
>>>>>> 
>>>>>> doesn't
>>>>>> 
>>>>>> work with 2.10 out of the box.
>>>>>> 
>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Sergey Kozlov
>>> GridGain Systems
>>> www.gridgain.com
>> 
>> 


Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Guys,

I just downloaded the hadoop accelerator and here are the differences from
the fabric edition that jump at me right away:

   - the "bin/" folder has "setup-hadoop" scripts
   - the "config/" folder has "hadoop" subfolder with necessary
   hadoop-related configuration
   - the "lib/" folder has far fewer libraries than in fabric, simply
   because many dependencies don't make sense in a hadoop environment

I currently don't see how we can merge the hadoop accelerator into the
standard fabric edition.

D.

On Thu, Dec 1, 2016 at 9:54 AM, Denis Magda <dm...@apache.org> wrote:

> Vovan,
>
> As one of hadoop maintainers, please share your point of view on this.
>
> —
> Denis
>
> > On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com>
> wrote:
> >
> > Denis
> >
> > I agree that at the moment there's no reason to split into fabric and
> > hadoop editions.
> >
> > On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org> wrote:
> >
> >> Hadoop Accelerator doesn’t require any additional libraries in compare
> to
> >> those we have in the fabric build. It only lacks some of them as Val
> >> mentioned below.
> >>
> >> Wouldn’t it better to discontinue Hadoop Accelerator edition and simply
> >> deliver hadoop jar and its configs as a part of the fabric?
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <ds...@apache.org>
> >> wrote:
> >>>
> >>> Separate edition for the Hadoop Accelerator was primarily driven by the
> >>> default libraries. Hadoop Accelerator requires many more libraries as
> >> well
> >>> as configuration settings compared to the standard fabric download.
> >>>
> >>> Now, as far as spark integration is concerned, I am not sure which
> >> edition
> >>> it belongs in, Hadoop Accelerator or standard fabric.
> >>>
> >>> D.
> >>>
> >>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org>
> wrote:
> >>>
> >>>> *Dmitriy*,
> >>>>
> >>>> I do believe that you should know why the community decided to a
> >> separate
> >>>> edition for the Hadoop Accelerator. What was the reason for that?
> >>>> Presently, as I see, it brings more confusion and difficulties rather
> >> then
> >>>> benefit.
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
> >>>>
> >>>> In fact I am very much agree with you. Right now, running the
> >> "accelerator"
> >>>> component in Bigtop disto gives one a pretty much complete fabric
> >> anyway.
> >>>> But
> >>>> in order to make just an accelerator component we perform quite a bit
> of
> >>>> woodoo magic during the packaging stage of the Bigtop build, shuffling
> >> jars
> >>>> from here and there. And that's quite crazy, honestly ;)
> >>>>
> >>>> Cos
> >>>>
> >>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>>>
> >>>> I tend to agree with Denis. I see only these differences between
> Hadoop
> >>>> Accelerator and Fabric builds (correct me if I miss something):
> >>>>
> >>>> - Limited set of available modules and no optional modules in Hadoop
> >>>> Accelerator.
> >>>> - No ignite-hadoop module in Fabric.
> >>>> - Additional scripts, configs and instructions included in Hadoop
> >>>> Accelerator.
> >>>>
> >>>> And the list of included modules frankly looks very weird. Here are
> only
> >>>> some of the issues I noticed:
> >>>>
> >>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
> >>>> for Hadoop Acceleration (which I doubt), are they really required or
> >> can
> >>>> be
> >>>> optional?
> >>>> - We force to use ignite-log4j module without providing other logger
> >>>> options (e.g., SLF).
> >>>> - We don't include ignite-aws module. How to use Hadoop Accelerator
> >> with
> >>>> S3 discovery?
> >>>> - Etc.
> >>>>
> >>>> It seems to me that if we try to fix all this issue, there will be
> >>>> virtually no difference between Fabric and Hadoop Accelerator builds
> >> except
> >>>> couple of scripts and config files. If so, there is no reason to have
> >> two
> >>>> builds.
> >>>>
> >>>> -Val
> >>>>
> >>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org>
> wrote:
> >>>>
> >>>> On the separate note, in the Bigtop, we start looking into changing
> the
> >>>>
> >>>> way we
> >>>>
> >>>> deliver Ignite and we'll likely to start offering the whole 'data
> >> fabric'
> >>>> experience instead of the mere "hadoop-acceleration”.
> >>>>
> >>>>
> >>>> And you still will be using hadoop-accelerator libs of Ignite, right?
> >>>>
> >>>> I’m thinking of if there is a need to keep releasing Hadoop
> Accelerator
> >> as
> >>>> a separate delivery.
> >>>> What if we start releasing the accelerator as a part of the standard
> >>>> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
> >> wrote:
> >>>>
> >>>> What Denis said: spark has been added to the Hadoop accelerator as a
> way
> >>>>
> >>>> to
> >>>>
> >>>> boost the performance of more than just MR compute of the Hadoop
> stack,
> >>>>
> >>>> IIRC.
> >>>>
> >>>> For what it worth, Spark is considered a part of Hadoop at large.
> >>>>
> >>>> On the separate note, in the Bigtop, we start looking into changing
> the
> >>>>
> >>>> way we
> >>>>
> >>>> deliver Ignite and we'll likely to start offering the whole 'data
> >> fabric'
> >>>> experience instead of the mere "hadoop-acceleration".
> >>>>
> >>>> Cos
> >>>>
> >>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>>>
> >>>> Val,
> >>>>
> >>>> Ignite Hadoop module includes not only the map-reduce accelerator but
> >>>>
> >>>> Ignite
> >>>>
> >>>> Hadoop File System component as well. The latter can be used in
> >>>>
> >>>> deployments
> >>>>
> >>>> like HDFS+IGFS+Ignite Spark + Spark.
> >>>>
> >>>> Considering this I’m for the second solution proposed by you: put both
> >>>>
> >>>> 2.10
> >>>>
> >>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> >>>> Accelerator distribution.
> >>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
> >>>>
> >>>> https://issues.apache.org/jira/browse/IGNITE-4254>
> >>>>
> >>>>
> >>>> BTW, this task may be affected or related to the following ones:
> >>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
> >>>>
> >>>> https://issues.apache.org/jira/browse/IGNITE-3596>
> >>>>
> >>>> https://issues.apache.org/jira/browse/IGNITE-3822
> >>>>
> >>>> —
> >>>> Denis
> >>>>
> >>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >>>>
> >>>> valentin.kulichenko@gmail.com> wrote:
> >>>>
> >>>>
> >>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> >>>>
> >>>> Hadoop
> >>>>
> >>>> when running its jobs. ignite-spark module only provides IgniteRDD
> >>>>
> >>>> which
> >>>>
> >>>> Hadoop obviously will never use.
> >>>>
> >>>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>>
> >>>> -Val
> >>>>
> >>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >>>>
> >>>> dsetrakyan@apache.org>
> >>>>
> >>>> wrote:
> >>>>
> >>>> Why do you think that spark module is not needed in our hadoop build?
> >>>>
> >>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >>>> valentin.kulichenko@gmail.com> wrote:
> >>>>
> >>>> Folks,
> >>>>
> >>>> Is there anyone who understands the purpose of including ignite-spark
> >>>> module in the Hadoop Accelerator build? I can't figure out a use
> >>>>
> >>>> case for
> >>>>
> >>>> which it's needed.
> >>>>
> >>>> In case we actually need it there, there is an issue then. We
> >>>>
> >>>> actually
> >>>>
> >>>> have
> >>>>
> >>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> >>>>
> >>>> everything
> >>>>
> >>>> is
> >>>>
> >>>> good, we put both in 'optional' folder and user can enable either
> >>>>
> >>>> one.
> >>>>
> >>>> But
> >>>>
> >>>> in Hadoop Accelerator there is only 2.11 which means that the build
> >>>>
> >>>> doesn't
> >>>>
> >>>> work with 2.10 out of the box.
> >>>>
> >>>> We should either remove the module from the build, or fix the issue.
> >>>>
> >>>> -Val
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
>
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
Vovan,

As one of the hadoop maintainers, please share your point of view on this.

—
Denis

> On Nov 30, 2016, at 10:49 PM, Sergey Kozlov <sk...@gridgain.com> wrote:
> 
> Denis
> 
> I agree that at the moment there's no reason to split into fabric and
> hadoop editions.
> 
> On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org> wrote:
> 
>> Hadoop Accelerator doesn’t require any additional libraries in compare to
>> those we have in the fabric build. It only lacks some of them as Val
>> mentioned below.
>> 
>> Wouldn’t it better to discontinue Hadoop Accelerator edition and simply
>> deliver hadoop jar and its configs as a part of the fabric?
>> 
>> —
>> Denis
>> 
>>> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <ds...@apache.org>
>> wrote:
>>> 
>>> Separate edition for the Hadoop Accelerator was primarily driven by the
>>> default libraries. Hadoop Accelerator requires many more libraries as
>> well
>>> as configuration settings compared to the standard fabric download.
>>> 
>>> Now, as far as spark integration is concerned, I am not sure which
>> edition
>>> it belongs in, Hadoop Accelerator or standard fabric.
>>> 
>>> D.
>>> 
>>> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org> wrote:
>>> 
>>>> *Dmitriy*,
>>>> 
>>>> I do believe that you should know why the community decided to a
>> separate
>>>> edition for the Hadoop Accelerator. What was the reason for that?
>>>> Presently, as I see, it brings more confusion and difficulties rather
>> then
>>>> benefit.
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org> wrote:
>>>> 
>>>> In fact I am very much agree with you. Right now, running the
>> "accelerator"
>>>> component in Bigtop disto gives one a pretty much complete fabric
>> anyway.
>>>> But
>>>> in order to make just an accelerator component we perform quite a bit of
>>>> woodoo magic during the packaging stage of the Bigtop build, shuffling
>> jars
>>>> from here and there. And that's quite crazy, honestly ;)
>>>> 
>>>> Cos
>>>> 
>>>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>>>> 
>>>> I tend to agree with Denis. I see only these differences between Hadoop
>>>> Accelerator and Fabric builds (correct me if I miss something):
>>>> 
>>>> - Limited set of available modules and no optional modules in Hadoop
>>>> Accelerator.
>>>> - No ignite-hadoop module in Fabric.
>>>> - Additional scripts, configs and instructions included in Hadoop
>>>> Accelerator.
>>>> 
>>>> And the list of included modules frankly looks very weird. Here are only
>>>> some of the issues I noticed:
>>>> 
>>>> - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>>> for Hadoop Acceleration (which I doubt), are they really required or
>> can
>>>> be
>>>> optional?
>>>> - We force to use ignite-log4j module without providing other logger
>>>> options (e.g., SLF).
>>>> - We don't include ignite-aws module. How to use Hadoop Accelerator
>> with
>>>> S3 discovery?
>>>> - Etc.
>>>> 
>>>> It seems to me that if we try to fix all this issue, there will be
>>>> virtually no difference between Fabric and Hadoop Accelerator builds
>> except
>>>> couple of scripts and config files. If so, there is no reason to have
>> two
>>>> builds.
>>>> 
>>>> -Val
>>>> 
>>>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
>>>> 
>>>> On the separate note, in the Bigtop, we start looking into changing the
>>>> 
>>>> way we
>>>> 
>>>> deliver Ignite and we'll likely to start offering the whole 'data
>> fabric'
>>>> experience instead of the mere "hadoop-acceleration”.
>>>> 
>>>> 
>>>> And you still will be using hadoop-accelerator libs of Ignite, right?
>>>> 
>>>> I’m thinking of if there is a need to keep releasing Hadoop Accelerator
>> as
>>>> a separate delivery.
>>>> What if we start releasing the accelerator as a part of the standard
>>>> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
>> wrote:
>>>> 
>>>> What Denis said: spark has been added to the Hadoop accelerator as a way
>>>> 
>>>> to
>>>> 
>>>> boost the performance of more than just MR compute of the Hadoop stack,
>>>> 
>>>> IIRC.
>>>> 
>>>> For what it worth, Spark is considered a part of Hadoop at large.
>>>> 
>>>> On the separate note, in the Bigtop, we start looking into changing the
>>>> 
>>>> way we
>>>> 
>>>> deliver Ignite and we'll likely to start offering the whole 'data
>> fabric'
>>>> experience instead of the mere "hadoop-acceleration".
>>>> 
>>>> Cos
>>>> 
>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>> 
>>>> Val,
>>>> 
>>>> Ignite Hadoop module includes not only the map-reduce accelerator but
>>>> 
>>>> Ignite
>>>> 
>>>> Hadoop File System component as well. The latter can be used in
>>>> 
>>>> deployments
>>>> 
>>>> like HDFS+IGFS+Ignite Spark + Spark.
>>>> 
>>>> Considering this I’m for the second solution proposed by you: put both
>>>> 
>>>> 2.10
>>>> 
>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
>>>> Accelerator distribution.
>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
>>>> 
>>>> https://issues.apache.org/jira/browse/IGNITE-4254>
>>>> 
>>>> 
>>>> BTW, this task may be affected or related to the following ones:
>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
>>>> 
>>>> https://issues.apache.org/jira/browse/IGNITE-3596>
>>>> 
>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>>>> 
>>>> valentin.kulichenko@gmail.com> wrote:
>>>> 
>>>> 
>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>>>> 
>>>> Hadoop
>>>> 
>>>> when running its jobs. ignite-spark module only provides IgniteRDD
>>>> 
>>>> which
>>>> 
>>>> Hadoop obviously will never use.
>>>> 
>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>> 
>>>> -Val
>>>> 
>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>>>> 
>>>> dsetrakyan@apache.org>
>>>> 
>>>> wrote:
>>>> 
>>>> Why do you think that spark module is not needed in our hadoop build?
>>>> 
>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>> valentin.kulichenko@gmail.com> wrote:
>>>> 
>>>> Folks,
>>>> 
>>>> Is there anyone who understands the purpose of including ignite-spark
>>>> module in the Hadoop Accelerator build? I can't figure out a use
>>>> 
>>>> case for
>>>> 
>>>> which it's needed.
>>>> 
>>>> In case we actually need it there, there is an issue then. We
>>>> 
>>>> actually
>>>> 
>>>> have
>>>> 
>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>>>> 
>>>> everything
>>>> 
>>>> is
>>>> 
>>>> good, we put both in 'optional' folder and user can enable either
>>>> 
>>>> one.
>>>> 
>>>> But
>>>> 
>>>> in Hadoop Accelerator there is only 2.11 which means that the build
>>>> 
>>>> doesn't
>>>> 
>>>> work with 2.10 out of the box.
>>>> 
>>>> We should either remove the module from the build, or fix the issue.
>>>> 
>>>> -Val
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
> 
> 
> -- 
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com


Re: ignite-spark module in Hadoop Accelerator

Posted by Sergey Kozlov <sk...@gridgain.com>.
Denis

I agree that at the moment there's no reason to split into fabric and
hadoop editions.

On Thu, Dec 1, 2016 at 4:45 AM, Denis Magda <dm...@apache.org> wrote:

> Hadoop Accelerator doesn’t require any additional libraries in compare to
> those we have in the fabric build. It only lacks some of them as Val
> mentioned below.
>
> Wouldn’t it better to discontinue Hadoop Accelerator edition and simply
> deliver hadoop jar and its configs as a part of the fabric?
>
> —
> Denis
>
> > On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> >
> > Separate edition for the Hadoop Accelerator was primarily driven by the
> > default libraries. Hadoop Accelerator requires many more libraries as
> well
> > as configuration settings compared to the standard fabric download.
> >
> > Now, as far as spark integration is concerned, I am not sure which
> edition
> > it belongs in, Hadoop Accelerator or standard fabric.
> >
> > D.
> >
> > On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org> wrote:
> >
> >> *Dmitriy*,
> >>
> >> I do believe that you should know why the community decided to a
> separate
> >> edition for the Hadoop Accelerator. What was the reason for that?
> >> Presently, as I see, it brings more confusion and difficulties rather
> then
> >> benefit.
> >>
> >> —
> >> Denis
> >>
> >> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org> wrote:
> >>
> >> In fact I am very much agree with you. Right now, running the
> "accelerator"
> >> component in Bigtop disto gives one a pretty much complete fabric
> anyway.
> >> But
> >> in order to make just an accelerator component we perform quite a bit of
> >> woodoo magic during the packaging stage of the Bigtop build, shuffling
> jars
> >> from here and there. And that's quite crazy, honestly ;)
> >>
> >> Cos
> >>
> >> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> >>
> >> I tend to agree with Denis. I see only these differences between Hadoop
> >> Accelerator and Fabric builds (correct me if I miss something):
> >>
> >>  - Limited set of available modules and no optional modules in Hadoop
> >>  Accelerator.
> >>  - No ignite-hadoop module in Fabric.
> >>  - Additional scripts, configs and instructions included in Hadoop
> >>  Accelerator.
> >>
> >> And the list of included modules frankly looks very weird. Here are only
> >> some of the issues I noticed:
> >>
> >>  - ignite-indexing and ignite-spark are mandatory. Even if we need them
> >>  for Hadoop Acceleration (which I doubt), are they really required or
> can
> >> be
> >>  optional?
> >>  - We force to use ignite-log4j module without providing other logger
> >>  options (e.g., SLF).
> >>  - We don't include ignite-aws module. How to use Hadoop Accelerator
> with
> >>  S3 discovery?
> >>  - Etc.
> >>
> >> It seems to me that if we try to fix all this issue, there will be
> >> virtually no difference between Fabric and Hadoop Accelerator builds
> except
> >> couple of scripts and config files. If so, there is no reason to have
> two
> >> builds.
> >>
> >> -Val
> >>
> >> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
> >>
> >> On the separate note, in the Bigtop, we start looking into changing the
> >>
> >> way we
> >>
> >> deliver Ignite and we'll likely to start offering the whole 'data
> fabric'
> >> experience instead of the mere "hadoop-acceleration”.
> >>
> >>
> >> And you still will be using hadoop-accelerator libs of Ignite, right?
> >>
> >> I’m thinking of if there is a need to keep releasing Hadoop Accelerator
> as
> >> a separate delivery.
> >> What if we start releasing the accelerator as a part of the standard
> >> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
> >>
> >> —
> >> Denis
> >>
> >> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org>
> wrote:
> >>
> >> What Denis said: spark has been added to the Hadoop accelerator as a way
> >>
> >> to
> >>
> >> boost the performance of more than just MR compute of the Hadoop stack,
> >>
> >> IIRC.
> >>
> >> For what it worth, Spark is considered a part of Hadoop at large.
> >>
> >> On the separate note, in the Bigtop, we start looking into changing the
> >>
> >> way we
> >>
> >> deliver Ignite and we'll likely to start offering the whole 'data
> fabric'
> >> experience instead of the mere "hadoop-acceleration".
> >>
> >> Cos
> >>
> >> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >>
> >> Val,
> >>
> >> Ignite Hadoop module includes not only the map-reduce accelerator but
> >>
> >> Ignite
> >>
> >> Hadoop File System component as well. The latter can be used in
> >>
> >> deployments
> >>
> >> like HDFS+IGFS+Ignite Spark + Spark.
> >>
> >> Considering this I’m for the second solution proposed by you: put both
> >>
> >> 2.10
> >>
> >> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> >> Accelerator distribution.
> >> https://issues.apache.org/jira/browse/IGNITE-4254 <
> >>
> >> https://issues.apache.org/jira/browse/IGNITE-4254>
> >>
> >>
> >> BTW, this task may be affected or related to the following ones:
> >> https://issues.apache.org/jira/browse/IGNITE-3596 <
> >>
> >> https://issues.apache.org/jira/browse/IGNITE-3596>
> >>
> >> https://issues.apache.org/jira/browse/IGNITE-3822
> >>
> >> —
> >> Denis
> >>
> >> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> >>
> >> valentin.kulichenko@gmail.com> wrote:
> >>
> >>
> >> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> >>
> >> Hadoop
> >>
> >> when running its jobs. ignite-spark module only provides IgniteRDD
> >>
> >> which
> >>
> >> Hadoop obviously will never use.
> >>
> >> Is there another use case for Hadoop Accelerator which I'm missing?
> >>
> >> -Val
> >>
> >> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> >>
> >> dsetrakyan@apache.org>
> >>
> >> wrote:
> >>
> >> Why do you think that spark module is not needed in our hadoop build?
> >>
> >> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >> valentin.kulichenko@gmail.com> wrote:
> >>
> >> Folks,
> >>
> >> Is there anyone who understands the purpose of including ignite-spark
> >> module in the Hadoop Accelerator build? I can't figure out a use
> >>
> >> case for
> >>
> >> which it's needed.
> >>
> >> In case we actually need it there, there is an issue then. We
> >>
> >> actually
> >>
> >> have
> >>
> >> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> >>
> >> everything
> >>
> >> is
> >>
> >> good, we put both in 'optional' folder and user can enable either
> >>
> >> one.
> >>
> >> But
> >>
> >> in Hadoop Accelerator there is only 2.11 which means that the build
> >>
> >> doesn't
> >>
> >> work with 2.10 out of the box.
> >>
> >> We should either remove the module from the build, or fix the issue.
> >>
> >> -Val
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
Hadoop Accelerator doesn’t require any additional libraries compared to those we have in the fabric build. It only lacks some of them, as Val mentioned below.

Wouldn’t it be better to discontinue the Hadoop Accelerator edition and simply deliver the hadoop jar and its configs as part of the fabric?

—
Denis

> On Nov 27, 2016, at 3:12 PM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> 
> Separate edition for the Hadoop Accelerator was primarily driven by the
> default libraries. Hadoop Accelerator requires many more libraries as well
> as configuration settings compared to the standard fabric download.
> 
> Now, as far as spark integration is concerned, I am not sure which edition
> it belongs in, Hadoop Accelerator or standard fabric.
> 
> D.
> 
> On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org> wrote:
> 
>> *Dmitriy*,
>> 
>> I do believe that you should know why the community decided on a separate
>> edition for the Hadoop Accelerator. What was the reason for that?
>> Presently, as I see it, it brings more confusion and difficulties rather
>> than benefit.
>> 
>> —
>> Denis
>> 
>> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org> wrote:
>> 
>> In fact I very much agree with you. Right now, running the "accelerator"
>> component in the Bigtop distro gives one a pretty much complete fabric
>> anyway. But in order to make just an accelerator component we perform
>> quite a bit of voodoo magic during the packaging stage of the Bigtop
>> build, shuffling jars from here and there. And that's quite crazy,
>> honestly ;)
>> 
>> Cos
>> 
>> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>> 
>> I tend to agree with Denis. I see only these differences between Hadoop
>> Accelerator and Fabric builds (correct me if I miss something):
>> 
>>  - Limited set of available modules and no optional modules in Hadoop
>>  Accelerator.
>>  - No ignite-hadoop module in Fabric.
>>  - Additional scripts, configs and instructions included in Hadoop
>>  Accelerator.
>> 
>> And the list of included modules frankly looks very weird. Here are only
>> some of the issues I noticed:
>> 
>>  - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>  for Hadoop Acceleration (which I doubt), are they really required or can
>> be
>>  optional?
>>  - We force the use of the ignite-log4j module without providing other
>>  logger options (e.g., SLF4J).
>>  - We don't include ignite-aws module. How to use Hadoop Accelerator with
>>  S3 discovery?
>>  - Etc.
>> 
>> It seems to me that if we try to fix all these issues, there will be
>> virtually no difference between the Fabric and Hadoop Accelerator builds
>> except a couple of scripts and config files. If so, there is no reason to
>> have two builds.
>> 
>> -Val
>> 
>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
>> 
>> On the separate note, in the Bigtop, we start looking into changing the
>> 
>> way we
>> 
>> deliver Ignite and we'll likely to start offering the whole 'data fabric'
>> experience instead of the mere "hadoop-acceleration”.
>> 
>> 
>> And you still will be using hadoop-accelerator libs of Ignite, right?
>> 
>> I’m wondering if there is a need to keep releasing Hadoop Accelerator as
>> a separate delivery.
>> What if we start releasing the accelerator as a part of the standard
>> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>> 
>> —
>> Denis
>> 
>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
>> 
>> What Denis said: spark has been added to the Hadoop accelerator as a way
>> 
>> to
>> 
>> boost the performance of more than just MR compute of the Hadoop stack,
>> 
>> IIRC.
>> 
>> For what it's worth, Spark is considered a part of Hadoop at large.
>> 
>> On the separate note, in the Bigtop, we start looking into changing the
>> 
>> way we
>> 
>> deliver Ignite and we'll likely to start offering the whole 'data fabric'
>> experience instead of the mere "hadoop-acceleration".
>> 
>> Cos
>> 
>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>> 
>> Val,
>> 
>> Ignite Hadoop module includes not only the map-reduce accelerator but
>> 
>> Ignite
>> 
>> Hadoop File System component as well. The latter can be used in
>> 
>> deployments
>> 
>> like HDFS+IGFS+Ignite Spark + Spark.
>> 
>> Considering this I’m for the second solution proposed by you: put both
>> 
>> 2.10
>> 
>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
>> Accelerator distribution.
>> https://issues.apache.org/jira/browse/IGNITE-4254 <
>> 
>> https://issues.apache.org/jira/browse/IGNITE-4254>
>> 
>> 
>> BTW, this task may be affected or related to the following ones:
>> https://issues.apache.org/jira/browse/IGNITE-3596 <
>> 
>> https://issues.apache.org/jira/browse/IGNITE-3596>
>> 
>> https://issues.apache.org/jira/browse/IGNITE-3822
>> 
>> —
>> Denis
>> 
>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>> 
>> valentin.kulichenko@gmail.com> wrote:
>> 
>> 
>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>> 
>> Hadoop
>> 
>> when running its jobs. ignite-spark module only provides IgniteRDD
>> 
>> which
>> 
>> Hadoop obviously will never use.
>> 
>> Is there another use case for Hadoop Accelerator which I'm missing?
>> 
>> -Val
>> 
>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>> 
>> dsetrakyan@apache.org>
>> 
>> wrote:
>> 
>> Why do you think that spark module is not needed in our hadoop build?
>> 
>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>> 
>> Folks,
>> 
>> Is there anyone who understands the purpose of including ignite-spark
>> module in the Hadoop Accelerator build? I can't figure out a use
>> 
>> case for
>> 
>> which it's needed.
>> 
>> In case we actually need it there, there is an issue then. We
>> 
>> actually
>> 
>> have
>> 
>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>> 
>> everything
>> 
>> is
>> 
>> good, we put both in 'optional' folder and user can enable either
>> 
>> one.
>> 
>> But
>> 
>> in Hadoop Accelerator there is only 2.11 which means that the build
>> 
>> doesn't
>> 
>> work with 2.10 out of the box.
>> 
>> We should either remove the module from the build, or fix the issue.
>> 
>> -Val
>> 
>> 
>> 
>> 
>> 
>> 
>> 


Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Separate edition for the Hadoop Accelerator was primarily driven by the
default libraries. Hadoop Accelerator requires many more libraries as well
as configuration settings compared to the standard fabric download.

Now, as far as spark integration is concerned, I am not sure which edition
it belongs in, Hadoop Accelerator or standard fabric.

D.

On Sat, Nov 26, 2016 at 7:39 PM, Denis Magda <dm...@apache.org> wrote:

> *Dmitriy*,
>
> I do believe that you should know why the community decided on a separate
> edition for the Hadoop Accelerator. What was the reason for that?
> Presently, as I see it, it brings more confusion and difficulties rather
> than benefit.
>
> —
> Denis
>
> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org> wrote:
>
> In fact I very much agree with you. Right now, running the "accelerator"
> component in the Bigtop distro gives one a pretty much complete fabric
> anyway. But in order to make just an accelerator component we perform quite
> a bit of voodoo magic during the packaging stage of the Bigtop build,
> shuffling jars from here and there. And that's quite crazy, honestly ;)
>
> Cos
>
> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>
> I tend to agree with Denis. I see only these differences between Hadoop
> Accelerator and Fabric builds (correct me if I miss something):
>
>   - Limited set of available modules and no optional modules in Hadoop
>   Accelerator.
>   - No ignite-hadoop module in Fabric.
>   - Additional scripts, configs and instructions included in Hadoop
>   Accelerator.
>
> And the list of included modules frankly looks very weird. Here are only
> some of the issues I noticed:
>
>   - ignite-indexing and ignite-spark are mandatory. Even if we need them
>   for Hadoop Acceleration (which I doubt), are they really required or can
> be
>   optional?
>   - We force the use of the ignite-log4j module without providing other
>   logger options (e.g., SLF4J).
>   - We don't include ignite-aws module. How to use Hadoop Accelerator with
>   S3 discovery?
>   - Etc.
>
> It seems to me that if we try to fix all these issues, there will be
> virtually no difference between the Fabric and Hadoop Accelerator builds
> except a couple of scripts and config files. If so, there is no reason to
> have two builds.
>
> -Val
>
> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
>
> On the separate note, in the Bigtop, we start looking into changing the
>
> way we
>
> deliver Ignite and we'll likely to start offering the whole 'data fabric'
> experience instead of the mere "hadoop-acceleration”.
>
>
> And you still will be using hadoop-accelerator libs of Ignite, right?
>
> I’m wondering if there is a need to keep releasing Hadoop Accelerator as
> a separate delivery.
> What if we start releasing the accelerator as a part of the standard
> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>
> —
> Denis
>
> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
>
> What Denis said: spark has been added to the Hadoop accelerator as a way
>
> to
>
> boost the performance of more than just MR compute of the Hadoop stack,
>
> IIRC.
>
> For what it's worth, Spark is considered a part of Hadoop at large.
>
> On the separate note, in the Bigtop, we start looking into changing the
>
> way we
>
> deliver Ignite and we'll likely to start offering the whole 'data fabric'
> experience instead of the mere "hadoop-acceleration".
>
> Cos
>
> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>
> Val,
>
> Ignite Hadoop module includes not only the map-reduce accelerator but
>
> Ignite
>
> Hadoop File System component as well. The latter can be used in
>
> deployments
>
> like HDFS+IGFS+Ignite Spark + Spark.
>
> Considering this I’m for the second solution proposed by you: put both
>
> 2.10
>
> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> Accelerator distribution.
> https://issues.apache.org/jira/browse/IGNITE-4254 <
>
> https://issues.apache.org/jira/browse/IGNITE-4254>
>
>
> BTW, this task may be affected or related to the following ones:
> https://issues.apache.org/jira/browse/IGNITE-3596 <
>
> https://issues.apache.org/jira/browse/IGNITE-3596>
>
> https://issues.apache.org/jira/browse/IGNITE-3822
>
> —
> Denis
>
> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>
> valentin.kulichenko@gmail.com> wrote:
>
>
> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>
> Hadoop
>
> when running its jobs. ignite-spark module only provides IgniteRDD
>
> which
>
> Hadoop obviously will never use.
>
> Is there another use case for Hadoop Accelerator which I'm missing?
>
> -Val
>
> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>
> dsetrakyan@apache.org>
>
> wrote:
>
> Why do you think that spark module is not needed in our hadoop build?
>
> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> Folks,
>
> Is there anyone who understands the purpose of including ignite-spark
> module in the Hadoop Accelerator build? I can't figure out a use
>
> case for
>
> which it's needed.
>
> In case we actually need it there, there is an issue then. We
>
> actually
>
> have
>
> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>
> everything
>
> is
>
> good, we put both in 'optional' folder and user can enable either
>
> one.
>
> But
>
> in Hadoop Accelerator there is only 2.11 which means that the build
>
> doesn't
>
> work with 2.10 out of the box.
>
> We should either remove the module from the build, or fix the issue.
>
> -Val
>
>
>
>
>
>
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
Dmitriy,

I do believe that you should know why the community decided on a separate edition for the Hadoop Accelerator. What was the reason for that? Presently, as I see it, it brings more confusion and difficulties rather than benefit.

—
Denis

> On Nov 26, 2016, at 2:14 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> In fact I very much agree with you. Right now, running the "accelerator"
> component in the Bigtop distro gives one a pretty much complete fabric
> anyway. But in order to make just an accelerator component we perform quite
> a bit of voodoo magic during the packaging stage of the Bigtop build,
> shuffling jars from here and there. And that's quite crazy, honestly ;)
> 
> Cos
> 
> On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
>> I tend to agree with Denis. I see only these differences between Hadoop
>> Accelerator and Fabric builds (correct me if I miss something):
>> 
>>   - Limited set of available modules and no optional modules in Hadoop
>>   Accelerator.
>>   - No ignite-hadoop module in Fabric.
>>   - Additional scripts, configs and instructions included in Hadoop
>>   Accelerator.
>> 
>> And the list of included modules frankly looks very weird. Here are only
>> some of the issues I noticed:
>> 
>>   - ignite-indexing and ignite-spark are mandatory. Even if we need them
>>   for Hadoop Acceleration (which I doubt), are they really required or can be
>>   optional?
>>   - We force the use of the ignite-log4j module without providing other
>>   logger options (e.g., SLF4J).
>>   - We don't include ignite-aws module. How to use Hadoop Accelerator with
>>   S3 discovery?
>>   - Etc.
>> 
>> It seems to me that if we try to fix all these issues, there will be
>> virtually no difference between the Fabric and Hadoop Accelerator builds
>> except a couple of scripts and config files. If so, there is no reason to
>> have two builds.
>> 
>> -Val
>> 
>> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
>> 
>>>> On the separate note, in the Bigtop, we start looking into changing the
>>> way we
>>>> deliver Ignite and we'll likely to start offering the whole 'data fabric'
>>>> experience instead of the mere "hadoop-acceleration”.
>>> 
>>> And you still will be using hadoop-accelerator libs of Ignite, right?
>>> 
>>> I’m wondering if there is a need to keep releasing Hadoop Accelerator as
>>> a separate delivery.
>>> What if we start releasing the accelerator as a part of the standard
>>> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>>> 
>>> —
>>> Denis
>>> 
>>>> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
>>>> 
>>>> What Denis said: spark has been added to the Hadoop accelerator as a way
>>> to
>>>> boost the performance of more than just MR compute of the Hadoop stack,
>>> IIRC.
>>>> For what it's worth, Spark is considered a part of Hadoop at large.
>>>> 
>>>> On the separate note, in the Bigtop, we start looking into changing the
>>> way we
>>>> deliver Ignite and we'll likely to start offering the whole 'data fabric'
>>>> experience instead of the mere "hadoop-acceleration".
>>>> 
>>>> Cos
>>>> 
>>>> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>>>>> Val,
>>>>> 
>>>>> Ignite Hadoop module includes not only the map-reduce accelerator but
>>> Ignite
>>>>> Hadoop File System component as well. The latter can be used in
>>> deployments
>>>>> like HDFS+IGFS+Ignite Spark + Spark.
>>>>> 
>>>>> Considering this I’m for the second solution proposed by you: put both
>>> 2.10
>>>>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
>>>>> Accelerator distribution.
>>>>> https://issues.apache.org/jira/browse/IGNITE-4254 <
>>> https://issues.apache.org/jira/browse/IGNITE-4254>
>>>>> 
>>>>> BTW, this task may be affected or related to the following ones:
>>>>> https://issues.apache.org/jira/browse/IGNITE-3596 <
>>> https://issues.apache.org/jira/browse/IGNITE-3596>
>>>>> https://issues.apache.org/jira/browse/IGNITE-3822
>>>>> 
>>>>> —
>>>>> Denis
>>>>> 
>>>>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
>>> valentin.kulichenko@gmail.com> wrote:
>>>>>> 
>>>>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
>>> Hadoop
>>>>>> when running its jobs. ignite-spark module only provides IgniteRDD
>>> which
>>>>>> Hadoop obviously will never use.
>>>>>> 
>>>>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>>>>> 
>>>>>> -Val
>>>>>> 
>>>>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
>>> dsetrakyan@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> Why do you think that spark module is not needed in our hadoop build?
>>>>>>> 
>>>>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Folks,
>>>>>>>> 
>>>>>>>> Is there anyone who understands the purpose of including ignite-spark
>>>>>>>> module in the Hadoop Accelerator build? I can't figure out a use
>>> case for
>>>>>>>> which it's needed.
>>>>>>>> 
>>>>>>>> In case we actually need it there, there is an issue then. We
>>> actually
>>>>>>> have
>>>>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
>>> everything
>>>>>>> is
>>>>>>>> good, we put both in 'optional' folder and user can enable either
>>> one.
>>>>>>> But
>>>>>>>> in Hadoop Accelerator there is only 2.11 which means that the build
>>>>>>> doesn't
>>>>>>>> work with 2.10 out of the box.
>>>>>>>> 
>>>>>>>> We should either remove the module from the build, or fix the issue.
>>>>>>>> 
>>>>>>>> -Val
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>>> 


Re: ignite-spark module in Hadoop Accelerator

Posted by Konstantin Boudnik <co...@apache.org>.
In fact I very much agree with you. Right now, running the "accelerator"
component in the Bigtop distro gives one a pretty much complete fabric anyway.
But in order to make just an accelerator component we perform quite a bit of
voodoo magic during the packaging stage of the Bigtop build, shuffling jars
from here and there. And that's quite crazy, honestly ;)

Cos
 
On Mon, Nov 21, 2016 at 03:33PM, Valentin Kulichenko wrote:
> I tend to agree with Denis. I see only these differences between Hadoop
> Accelerator and Fabric builds (correct me if I miss something):
> 
>    - Limited set of available modules and no optional modules in Hadoop
>    Accelerator.
>    - No ignite-hadoop module in Fabric.
>    - Additional scripts, configs and instructions included in Hadoop
>    Accelerator.
> 
> And the list of included modules frankly looks very weird. Here are only
> some of the issues I noticed:
> 
>    - ignite-indexing and ignite-spark are mandatory. Even if we need them
>    for Hadoop Acceleration (which I doubt), are they really required or can be
>    optional?
>    - We force the use of the ignite-log4j module without providing other
>    logger options (e.g., SLF4J).
>    - We don't include ignite-aws module. How to use Hadoop Accelerator with
>    S3 discovery?
>    - Etc.
> 
> It seems to me that if we try to fix all these issues, there will be
> virtually no difference between the Fabric and Hadoop Accelerator builds
> except a couple of scripts and config files. If so, there is no reason to
> have two builds.
> 
> -Val
> 
> On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:
> 
> > > On the separate note, in the Bigtop, we start looking into changing the
> > way we
> > > deliver Ignite and we'll likely to start offering the whole 'data fabric'
> > > experience instead of the mere "hadoop-acceleration”.
> >
> > And you still will be using hadoop-accelerator libs of Ignite, right?
> >
> > I’m wondering if there is a need to keep releasing Hadoop Accelerator as
> > a separate delivery.
> > What if we start releasing the accelerator as a part of the standard
> > fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
> >
> > —
> > Denis
> >
> > > On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
> > >
> > > What Denis said: spark has been added to the Hadoop accelerator as a way
> > to
> > > boost the performance of more than just MR compute of the Hadoop stack,
> > IIRC.
> > > For what it's worth, Spark is considered a part of Hadoop at large.
> > >
> > > On the separate note, in the Bigtop, we start looking into changing the
> > way we
> > > deliver Ignite and we'll likely to start offering the whole 'data fabric'
> > > experience instead of the mere "hadoop-acceleration".
> > >
> > > Cos
> > >
> > > On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> > >> Val,
> > >>
> > >> Ignite Hadoop module includes not only the map-reduce accelerator but
> > Ignite
> > >> Hadoop File System component as well. The latter can be used in
> > deployments
> > >> like HDFS+IGFS+Ignite Spark + Spark.
> > >>
> > >> Considering this I’m for the second solution proposed by you: put both
> > 2.10
> > >> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> > >> Accelerator distribution.
> > >> https://issues.apache.org/jira/browse/IGNITE-4254 <
> > https://issues.apache.org/jira/browse/IGNITE-4254>
> > >>
> > >> BTW, this task may be affected or related to the following ones:
> > >> https://issues.apache.org/jira/browse/IGNITE-3596 <
> > https://issues.apache.org/jira/browse/IGNITE-3596>
> > >> https://issues.apache.org/jira/browse/IGNITE-3822
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> > valentin.kulichenko@gmail.com> wrote:
> > >>>
> > >>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> > Hadoop
> > >>> when running its jobs. ignite-spark module only provides IgniteRDD
> > which
> > >>> Hadoop obviously will never use.
> > >>>
> > >>> Is there another use case for Hadoop Accelerator which I'm missing?
> > >>>
> > >>> -Val
> > >>>
> > >>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> > dsetrakyan@apache.org>
> > >>> wrote:
> > >>>
> > >>>> Why do you think that spark module is not needed in our hadoop build?
> > >>>>
> > >>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> > >>>> valentin.kulichenko@gmail.com> wrote:
> > >>>>
> > >>>>> Folks,
> > >>>>>
> > >>>>> Is there anyone who understands the purpose of including ignite-spark
> > >>>>> module in the Hadoop Accelerator build? I can't figure out a use
> > case for
> > >>>>> which it's needed.
> > >>>>>
> > >>>>> In case we actually need it there, there is an issue then. We
> > actually
> > >>>> have
> > >>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> > everything
> > >>>> is
> > >>>>> good, we put both in 'optional' folder and user can enable either
> > one.
> > >>>> But
> > >>>>> in Hadoop Accelerator there is only 2.11 which means that the build
> > >>>> doesn't
> > >>>>> work with 2.10 out of the box.
> > >>>>>
> > >>>>> We should either remove the module from the build, or fix the issue.
> > >>>>>
> > >>>>> -Val
> > >>>>>
> > >>>>
> > >>
> >
> >

Re: ignite-spark module in Hadoop Accelerator

Posted by Valentin Kulichenko <va...@gmail.com>.
I tend to agree with Denis. I see only these differences between Hadoop
Accelerator and Fabric builds (correct me if I miss something):

   - Limited set of available modules and no optional modules in Hadoop
   Accelerator.
   - No ignite-hadoop module in Fabric.
   - Additional scripts, configs and instructions included in Hadoop
   Accelerator.

And the list of included modules frankly looks very weird. Here are only
some of the issues I noticed:

   - ignite-indexing and ignite-spark are mandatory. Even if we need them
   for Hadoop Acceleration (which I doubt), are they really required or can be
   optional?
   - We force the use of the ignite-log4j module without providing other
   logger options (e.g., SLF4J).
   - We don't include ignite-aws module. How to use Hadoop Accelerator with
   S3 discovery?
   - Etc.

It seems to me that if we try to fix all these issues, there will be
virtually no difference between the Fabric and Hadoop Accelerator builds
except a couple of scripts and config files. If so, there is no reason to
have two builds.

-Val

On Mon, Nov 21, 2016 at 3:13 PM, Denis Magda <dm...@apache.org> wrote:

> > On the separate note, in the Bigtop, we start looking into changing the
> way we
> > deliver Ignite and we'll likely to start offering the whole 'data fabric'
> > experience instead of the mere "hadoop-acceleration”.
>
> And you still will be using hadoop-accelerator libs of Ignite, right?
>
> I’m wondering if there is a need to keep releasing Hadoop Accelerator as
> a separate delivery.
> What if we start releasing the accelerator as a part of the standard
> fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
>
> —
> Denis
>
> > On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
> >
> > What Denis said: spark has been added to the Hadoop accelerator as a way
> to
> > boost the performance of more than just MR compute of the Hadoop stack,
> IIRC.
> > For what it's worth, Spark is considered a part of Hadoop at large.
> >
> > On the separate note, in the Bigtop, we start looking into changing the
> way we
> > deliver Ignite and we'll likely to start offering the whole 'data fabric'
> > experience instead of the mere "hadoop-acceleration".
> >
> > Cos
> >
> > On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> >> Val,
> >>
> >> Ignite Hadoop module includes not only the map-reduce accelerator but
> Ignite
> >> Hadoop File System component as well. The latter can be used in
> deployments
> >> like HDFS+IGFS+Ignite Spark + Spark.
> >>
> >> Considering this I’m for the second solution proposed by you: put both
> 2.10
> >> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> >> Accelerator distribution.
> >> https://issues.apache.org/jira/browse/IGNITE-4254 <
> https://issues.apache.org/jira/browse/IGNITE-4254>
> >>
> >> BTW, this task may be affected or related to the following ones:
> >> https://issues.apache.org/jira/browse/IGNITE-3596 <
> https://issues.apache.org/jira/browse/IGNITE-3596>
> >> https://issues.apache.org/jira/browse/IGNITE-3822
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
> >>>
> >>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by
> Hadoop
> >>> when running its jobs. ignite-spark module only provides IgniteRDD
> which
> >>> Hadoop obviously will never use.
> >>>
> >>> Is there another use case for Hadoop Accelerator which I'm missing?
> >>>
> >>> -Val
> >>>
> >>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org>
> >>> wrote:
> >>>
> >>>> Why do you think that spark module is not needed in our hadoop build?
> >>>>
> >>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >>>> valentin.kulichenko@gmail.com> wrote:
> >>>>
> >>>>> Folks,
> >>>>>
> >>>>> Is there anyone who understands the purpose of including ignite-spark
> >>>>> module in the Hadoop Accelerator build? I can't figure out a use
> case for
> >>>>> which it's needed.
> >>>>>
> >>>>> In case we actually need it there, there is an issue then. We
> actually
> >>>> have
> >>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build
> everything
> >>>> is
> >>>>> good, we put both in 'optional' folder and user can enable either
> one.
> >>>> But
> >>>>> in Hadoop Accelerator there is only 2.11 which means that the build
> >>>> doesn't
> >>>>> work with 2.10 out of the box.
> >>>>>
> >>>>> We should either remove the module from the build, or fix the issue.
> >>>>>
> >>>>> -Val
> >>>>>
> >>>>
> >>
>
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
> On the separate note, in the Bigtop, we start looking into changing the way we
> deliver Ignite and we'll likely to start offering the whole 'data fabric'
> experience instead of the mere "hadoop-acceleration”.

And you still will be using hadoop-accelerator libs of Ignite, right?

I’m wondering if there is a need to keep releasing Hadoop Accelerator as a separate delivery.
What if we start releasing the accelerator as a part of the standard fabric binary putting hadoop-accelerator libs under ‘optional’ folder?
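To make that concrete — as a sketch only, with the directory layout (libs/ and libs/optional/) assumed from the standard Ignite binary distribution rather than quoted from this thread, and the module folder name ignite-spark_2.10 hypothetical — enabling an optional module would amount to copying it out of the optional folder:

```python
import shutil
from pathlib import Path

def enable_optional_module(ignite_home, module):
    """Enable an Ignite optional module by copying its folder from
    libs/optional/ into libs/, where the startup scripts pick it up.
    The libs/ and libs/optional/ layout is assumed, not taken from
    this thread."""
    src = Path(ignite_home) / "libs" / "optional" / module
    dst = Path(ignite_home) / "libs" / module
    if not src.is_dir():
        raise FileNotFoundError(f"optional module not found: {src}")
    shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
    return dst
```

With both ignite-spark_2.10 and ignite-spark_2.11 shipped under optional/, a user would enable exactly one of them this way, which would address Val's concern that 2.10 doesn't work out of the box.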

—
Denis

> On Nov 21, 2016, at 12:19 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> What Denis said: spark has been added to the Hadoop accelerator as a way to
> boost the performance of more than just MR compute of the Hadoop stack, IIRC.
> For what it's worth, Spark is considered a part of Hadoop at large.
> 
> On the separate note, in the Bigtop, we start looking into changing the way we
> deliver Ignite and we'll likely to start offering the whole 'data fabric'
> experience instead of the mere "hadoop-acceleration".
> 
> Cos
> 
> On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
>> Val,
>> 
>> Ignite Hadoop module includes not only the map-reduce accelerator but Ignite
>> Hadoop File System component as well. The latter can be used in deployments
>> like HDFS+IGFS+Ignite Spark + Spark. 
>> 
>> Considering this I’m for the second solution proposed by you: put both 2.10
>> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
>> Accelerator distribution.
>> https://issues.apache.org/jira/browse/IGNITE-4254 <https://issues.apache.org/jira/browse/IGNITE-4254>
>> 
>> BTW, this task may be affected or related to the following ones:
>> https://issues.apache.org/jira/browse/IGNITE-3596 <https://issues.apache.org/jira/browse/IGNITE-3596>
>> https://issues.apache.org/jira/browse/IGNITE-3822
>> 
>> —
>> Denis
>> 
>>> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <va...@gmail.com> wrote:
>>> 
>>> Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
>>> when running its jobs. ignite-spark module only provides IgniteRDD which
>>> Hadoop obviously will never use.
>>> 
>>> Is there another use case for Hadoop Accelerator which I'm missing?
>>> 
>>> -Val
>>> 
>>> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
>>> wrote:
>>> 
>>>> Why do you think that spark module is not needed in our hadoop build?
>>>> 
>>>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>>>> valentin.kulichenko@gmail.com> wrote:
>>>> 
>>>>> Folks,
>>>>> 
>>>>> Is there anyone who understands the purpose of including ignite-spark
>>>>> module in the Hadoop Accelerator build? I can't figure out a use case for
>>>>> which it's needed.
>>>>> 
>>>>> In case we actually need it there, there is an issue then. We actually
>>>> have
>>>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
>>>> is
>>>>> good, we put both in 'optional' folder and user can enable either one.
>>>> But
>>>>> in Hadoop Accelerator there is only 2.11 which means that the build
>>>> doesn't
>>>>> work with 2.10 out of the box.
>>>>> 
>>>>> We should either remove the module from the build, or fix the issue.
>>>>> 
>>>>> -Val
>>>>> 
>>>> 
>> 


Re: ignite-spark module in Hadoop Accelerator

Posted by Konstantin Boudnik <co...@apache.org>.
What Denis said: spark has been added to the Hadoop accelerator as a way to
boost the performance of more than just MR compute of the Hadoop stack, IIRC.
For what it's worth, Spark is considered a part of Hadoop at large.

On the separate note, in the Bigtop, we start looking into changing the way we
deliver Ignite and we'll likely to start offering the whole 'data fabric'
experience instead of the mere "hadoop-acceleration".

Cos

On Mon, Nov 21, 2016 at 09:54AM, Denis Magda wrote:
> Val,
> 
> Ignite Hadoop module includes not only the map-reduce accelerator but Ignite
> Hadoop File System component as well. The latter can be used in deployments
> like HDFS+IGFS+Ignite Spark + Spark. 
> 
> Considering this I’m for the second solution proposed by you: put both 2.10
> and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop
> Accelerator distribution.
> https://issues.apache.org/jira/browse/IGNITE-4254 <https://issues.apache.org/jira/browse/IGNITE-4254>
> 
> BTW, this task may be affected or related to the following ones:
> https://issues.apache.org/jira/browse/IGNITE-3596 <https://issues.apache.org/jira/browse/IGNITE-3596>
> https://issues.apache.org/jira/browse/IGNITE-3822
> 
> —
> Denis
> 
> > On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <va...@gmail.com> wrote:
> > 
> > Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
> > when running its jobs. ignite-spark module only provides IgniteRDD which
> > Hadoop obviously will never use.
> > 
> > Is there another use case for Hadoop Accelerator which I'm missing?
> > 
> > -Val
> > 
> > On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
> > wrote:
> > 
> >> Why do you think that spark module is not needed in our hadoop build?
> >> 
> >> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> >> valentin.kulichenko@gmail.com> wrote:
> >> 
> >>> Folks,
> >>> 
> >>> Is there anyone who understands the purpose of including ignite-spark
> >>> module in the Hadoop Accelerator build? I can't figure out a use case for
> >>> which it's needed.
> >>> 
> >>> In case we actually need it there, there is an issue then. We actually
> >> have
> >>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
> >> is
> >>> good, we put both in 'optional' folder and user can enable either one.
> >> But
> >>> in Hadoop Accelerator there is only 2.11 which means that the build
> >> doesn't
> >>> work with 2.10 out of the box.
> >>> 
> >>> We should either remove the module from the build, or fix the issue.
> >>> 
> >>> -Val
> >>> 
> >> 
> 

Re: ignite-spark module in Hadoop Accelerator

Posted by Denis Magda <dm...@apache.org>.
Val,

Ignite Hadoop module includes not only the map-reduce accelerator but Ignite Hadoop File System component as well. The latter can be used in deployments like HDFS+IGFS+Ignite Spark + Spark. 

Considering this I’m for the second solution proposed by you: put both 2.10 and 2.11 ignite-spark modules under ‘optional’ folder of Ignite Hadoop Accelerator distribution.
https://issues.apache.org/jira/browse/IGNITE-4254 <https://issues.apache.org/jira/browse/IGNITE-4254>

BTW, this task may be affected or related to the following ones:
https://issues.apache.org/jira/browse/IGNITE-3596 <https://issues.apache.org/jira/browse/IGNITE-3596>
https://issues.apache.org/jira/browse/IGNITE-3822

—
Denis

> On Nov 19, 2016, at 1:26 PM, Valentin Kulichenko <va...@gmail.com> wrote:
> 
> Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
> when running its jobs. ignite-spark module only provides IgniteRDD which
> Hadoop obviously will never use.
> 
> Is there another use case for Hadoop Accelerator which I'm missing?
> 
> -Val
> 
> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> 
>> Why do you think that spark module is not needed in our hadoop build?
>> 
>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>> 
>>> Folks,
>>> 
>>> Is there anyone who understands the purpose of including ignite-spark
>>> module in the Hadoop Accelerator build? I can't figure out a use case for
>>> which it's needed.
>>> 
>>> In case we actually need it there, there is an issue then. We actually
>> have
>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
>> is
>>> good, we put both in 'optional' folder and user can enable either one.
>> But
>>> in Hadoop Accelerator there is only 2.11 which means that the build
>> doesn't
>>> work with 2.10 out of the box.
>>> 
>>> We should either remove the module from the build, or fix the issue.
>>> 
>>> -Val
>>> 
>> 


Re: ignite-spark module in Hadoop Accelerator

Posted by Valentin Kulichenko <va...@gmail.com>.
I meant "plugin to Hadoop" of course.

On Sat, Nov 19, 2016 at 1:26 PM Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
> when running its jobs. ignite-spark module only provides IgniteRDD which
> Hadoop obviously will never use.
>
> Is there another use case for Hadoop Accelerator which I'm missing?
>
> -Val
>
> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
>
> Why do you think that spark module is not needed in our hadoop build?
>
> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > Folks,
> >
> > Is there anyone who understands the purpose of including ignite-spark
> > module in the Hadoop Accelerator build? I can't figure out a use case for
> > which it's needed.
> >
> > In case we actually need it there, there is an issue then. We actually
> have
> > two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
> is
> > good, we put both in 'optional' folder and user can enable either one.
> But
> > in Hadoop Accelerator there is only 2.11 which means that the build
> doesn't
> > work with 2.10 out of the box.
> >
> > We should either remove the module from the build, or fix the issue.
> >
> > -Val
> >
>
>
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Jörn Franke <jo...@gmail.com>.
You can cache often-used Hive tables/partitions (ideally using the Tez engine and ORC format) since they are just files. 
However, this could be optimized so that it automatically detects which tables/partitions are often used. This use case might become irrelevant with Hive LLAP.

> On 19 Nov 2016, at 22:26, Valentin Kulichenko <va...@gmail.com> wrote:
> 
> Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
> when running its jobs. ignite-spark module only provides IgniteRDD which
> Hadoop obviously will never use.
> 
> Is there another use case for Hadoop Accelerator which I'm missing?
> 
> -Val
> 
> On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> 
>> Why do you think that spark module is not needed in our hadoop build?
>> 
>> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
>> valentin.kulichenko@gmail.com> wrote:
>> 
>>> Folks,
>>> 
>>> Is there anyone who understands the purpose of including ignite-spark
>>> module in the Hadoop Accelerator build? I can't figure out a use case for
>>> which it's needed.
>>> 
>>> In case we actually need it there, there is an issue then. We actually
>> have
>>> two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
>> is
>>> good, we put both in 'optional' folder and user can enable either one.
>> But
>>> in Hadoop Accelerator there is only 2.11 which means that the build
>> doesn't
>>> work with 2.10 out of the box.
>>> 
>>> We should either remove the module from the build, or fix the issue.
>>> 
>>> -Val
>>> 
>> 

Re: ignite-spark module in Hadoop Accelerator

Posted by Valentin Kulichenko <va...@gmail.com>.
Hadoop Accelerator is a plugin to Ignite and this plugin is used by Hadoop
when running its jobs. ignite-spark module only provides IgniteRDD which
Hadoop obviously will never use.

Is there another use case for Hadoop Accelerator which I'm missing?

-Val

On Sat, Nov 19, 2016 at 3:12 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Why do you think that spark module is not needed in our hadoop build?
>
> On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
> valentin.kulichenko@gmail.com> wrote:
>
> > Folks,
> >
> > Is there anyone who understands the purpose of including ignite-spark
> > module in the Hadoop Accelerator build? I can't figure out a use case for
> > which it's needed.
> >
> > In case we actually need it there, there is an issue then. We actually
> have
> > two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything
> is
> > good, we put both in 'optional' folder and user can enable either one.
> But
> > in Hadoop Accelerator there is only 2.11 which means that the build
> doesn't
> > work with 2.10 out of the box.
> >
> > We should either remove the module from the build, or fix the issue.
> >
> > -Val
> >
>

Re: ignite-spark module in Hadoop Accelerator

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Why do you think that spark module is not needed in our hadoop build?

On Fri, Nov 18, 2016 at 5:44 PM, Valentin Kulichenko <
valentin.kulichenko@gmail.com> wrote:

> Folks,
>
> Is there anyone who understands the purpose of including ignite-spark
> module in the Hadoop Accelerator build? I can't figure out a use case for
> which it's needed.
>
> In case we actually need it there, there is an issue then. We actually have
> two ignite-spark modules, for 2.10 and 2.11. In Fabric build everything is
> good, we put both in 'optional' folder and user can enable either one. But
> in Hadoop Accelerator there is only 2.11 which means that the build doesn't
> work with 2.10 out of the box.
>
> We should either remove the module from the build, or fix the issue.
>
> -Val
>
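For context on the 'optional' folder mechanism discussed above: in the Ignite binary distribution, modules under libs/optional/ are not on the node's classpath until the user copies them into libs/. A minimal sketch of that step, with the layout simulated in a temp directory (the jar name here is illustrative, not the exact artifact name):

```shell
# Simulate the distribution layout in a temp directory.
IGNITE_HOME=$(mktemp -d)
mkdir -p "$IGNITE_HOME/libs/optional/ignite-spark_2.10"
touch "$IGNITE_HOME/libs/optional/ignite-spark_2.10/ignite-spark_2.10.jar"

# A user enables an optional module by copying it from libs/optional/
# into libs/, where it is picked up on the classpath at node startup.
cp -r "$IGNITE_HOME/libs/optional/ignite-spark_2.10" "$IGNITE_HOME/libs/"
ls "$IGNITE_HOME/libs/"
```

The complaint in this thread is that the Hadoop Accelerator build ships only the 2.11 variant directly, so there is nothing under optional/ for a 2.10 user to enable this way.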