Posted to user@spark.apache.org by Jacek Laskowski <ja...@japila.pl> on 2020/05/21 09:11:43 UTC

Re: BOOK review of Spark: WARNING to spark users

Hi Emma,

I'm curious about the purpose of the email. Mind elaborating?

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski



On Wed, May 20, 2020 at 10:43 PM emma davis <em...@aol.com.invalid> wrote:

>
> *Book:* Machine Learning with Apache Spark Quick Start Guide
> *Publisher:* Packt
>
>
> *Following* the "Getting Started with Python in VS Code" tutorial
> (https://code.visualstudio.com/docs/python/python-tutorial), I realised
> that Jillur Qudus has written and published a book without any knowledge
> of the subject matter, Python among other things.
>
>
>
> *Highlighted proof, with further details, is further down the email.*
>
> import findspark  # these lines are unnecessary; see the link above for setup
> findspark.init()
>
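For context on the findspark lines above: findspark's role is to put a Spark installation's Python libraries (pyspark and py4j) on sys.path. If pyspark was installed as a package with pip or conda, it is normally importable already. The sketch below is a minimal, standard-library-only check for whether that is the case in the current interpreter; the helper name `module_importable` is hypothetical and is not part of findspark or PySpark.

```python
import importlib.util


def module_importable(name: str) -> bool:
    """Hypothetical helper: True if a top-level module can be imported.

    importlib.util.find_spec returns None when the module is not found
    on the current sys.path (intended here for top-level names only).
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_importable("pyspark"):
        print("pyspark is importable; no extra path setup needed")
    else:
        print("pyspark is not on sys.path; point Python at a Spark install")
```

If the check prints the first message, the `findspark.init()` call adds nothing for that interpreter.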
> Setting SPARK_HOME or any other Spark variable is unnecessary because
> Spark, like any framework, is self-contained and has its own conf
> directory for persistent startup configuration settings.
> The software finds its own directory when it starts, e.g. via
> sbin/start-master.sh.
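As a rough illustration of the point above: Spark's launch scripts (such as sbin/start-master.sh) check whether SPARK_HOME is set and, if it is not, derive it from the script's own location (the directory containing the script, one level up). The Python sketch below mimics that lookup order; the function name `resolve_spark_home` is hypothetical, and this is not Spark's or findspark's actual code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_spark_home(script_path: str,
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper mirroring the launch-script fallback.

    1. Prefer an explicit SPARK_HOME from the environment.
    2. Otherwise take the parent of the script's directory, i.e.
       <spark>/sbin/start-master.sh -> <spark>.
    """
    env = os.environ if env is None else env
    explicit = env.get("SPARK_HOME")
    if explicit:
        return Path(explicit)
    # Equivalent of `cd "$(dirname "$0")"/..; pwd` in the shell scripts.
    return Path(os.path.abspath(script_path)).parent.parent


if __name__ == "__main__":
    # Install path taken from the page 63 quote; env={} simulates an unset SPARK_HOME.
    print(resolve_spark_home("/opt/spark-2.3.2-bin-hadoop2.7/sbin/start-master.sh", env={}))
```

Called with an empty environment and the install path quoted from page 63, it returns `/opt/spark-2.3.2-bin-hadoop2.7`; when SPARK_HOME is set, the explicit value wins.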
>
> Spark is a BIG DATA tool (heavily distributed, parallel processing), so
> you would expect its hello-world demo programs to demonstrate that.
>
> What is the point of setting num_samples = 100? Something like 10**10
> would make more sense for testing performance.
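Assuming num_samples plays the same role as NUM_SAMPLES in the Pi estimation example on the Spark website, it is the number of random points drawn in the unit square; the fraction landing inside the quarter circle approximates π/4. The plain-Python sketch below (no Spark involved; the function name `estimate_pi` and the fixed seed are illustrative assumptions) shows how the sample count affects accuracy. The standard error is about 1.64/√n, so 100 samples gives an error on the order of 0.16.

```python
import math
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi from points in the unit square.

    The fraction of points with x^2 + y^2 < 1 approximates pi/4.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        est = estimate_pi(n)
        print(f"n={n:>9,}  estimate={est:.5f}  error={abs(est - math.pi):.5f}")
```

A single Python loop like this becomes impractically slow long before 10**10 samples, which is the scale at which splitting the sampling across a cluster would start to matter.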
>
>
>
> *This is my warning: do not end up wasting your valuable time as I did. I
> feel your time is valuable.*
> *I realised it was a scam once I got a better understanding of the product
> by simply doing the correct hello-world program from the correct source.*
>
> “Research by CISQ found that, in 2018, poor quality software cost
> organizations $2.8 trillion in the US alone.”
>
> I attribute this to the Indian IT industry claiming they can do job better
> than the natives [US , Europeans.] Implying Indian Education or IT people
> is superior. For example People like me born, live and educated  in the
> western Europe
>
> https://www.it-cisq.org/the-cost-of-poor-quality-software-in-the-us-a-2018-report/The-Cost-of-Poor-Quality-Software-in-the-US-2018-Report.pdf
>
>
> *Contributors: About the Author*
> “*Jillur Qudus* is a lead technical architect, polyglot software engineer
> and data scientist with over 10 years of hands-on experience in
> architecting and engineering distributed, scalable, high performance ..
> to combat serious organised crime. Jillur has extensive experience working
> with government, intelligence, law enforcement and banking, and has worked
> across the world including Japan, Singapore, Malaysia, Hong Kong and New
> Zealand .. founder of keisan, a UK-based company specializing in open
> source distributed technologies and machine learning…”
> This obviously means a lot to many, but look at his work and judge for
> yourself based on the evidence.
>
> *Page 54*
> <quote>
> Additional Python Packages
> > conda install -c conda-forge findspark
> > conda install -c conda-forge pykafka
> ...
> </quote>
>
> The remainder of the program was copied from the Spark website, so that
> part wasn't wrong.
> *Page 63*
>
> <quote>
> > cd *etc*/profile.d
> vi spark.sh
> $ export SPARK_HOME=/opt/spark-2.3.2-bin-hadoop2.7
> > source spark.sh
>
> .. in order for the SPARK_HOME environment variable to be successfully
> recognized and registered by findspark ...
> ….
>
> We are now ready to write our first spark application in Python! …..
>
> # (1) import required Python dependencies
> import findspark
> findspark.init()
>
> (3)
> ….
> num_samples = 100
> </quote>
>
>
> emma davis
> emma.davis76@aol.com
>
>