Posted to dev@carbondata.apache.org by Swapnil Shinde <sw...@gmail.com> on 2017/07/17 05:55:55 UTC

FileNotFoundExceptions while running CarbonData

Hello
    I am new to CarbonData and we are trying to use it in production. I
built and installed it on Spark edge nodes as per the given instructions -

*Build -* No major issues.
*Installation -* Followed the YARN installation instructions
(http://carbondata.apache.org/installation-guide.html).
*Infrastructure -* Spark 2.1.0 on MapR cluster.
*carbon.properties changes -*
          carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
          carbon.badRecords.location=/opt/Carbon/Spark/badrecords
          carbon.lock.type=HDFSLOCK
*spark-defaults.conf changes -*
         spark.yarn.dist.files            /opt/mapr/spark/spark-2.1.0/conf/carbon.properties
         spark.yarn.dist.archives         /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata.tar.gz
         spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=carbon.properties
         spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/opt/mapr/spark/spark-2.1.0/conf/carbon.properties
*Command line -*
/opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master yarn \
--jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar \
--driver-memory 1g \
--executor-cores 2 \
--executor-memory 2G
*Code snippet -*
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("/mapr/ri0.abc.com/tmp", "/mapr/ri0.abc.com/tmp")
carbon.sql("""CREATE TABLE
                        IF NOT EXISTS test_table(
                                  id string,
                                  name string,
                                  city string,
                                  age Int)
                       STORED BY 'carbondata'""")

carbon.sql("""LOAD DATA INPATH '/mapr/ri0.comscore.com/tmp/sample.csv'
                  INTO TABLE test_table""")

*First error -*
      The initial error was "*Dictionary file is locked for updation*".
Further debugging showed that it was due to the missing maprFS filesystem
support (HDFSFileLock.java, line 52):
String hdfsPath = conf.get(CarbonCommonConstants.FS_DEFAULT_FS);

I added some code to work around paths like maprfs:///* and that seemed to
work fine (e.g. adding a MAPRFS FileType).
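
For what it's worth, a quick spark-shell check (just an illustration, not
the actual fix; the path below is made up) confirms whether maprfs:// paths
resolve through the generic Hadoop filesystem API at all:

// Illustrative check only -- assumes the MapR Hadoop client jars are on the classpath.
import org.apache.hadoop.fs.Path
val storeDir = new Path("maprfs:///tmp/hacluster/Opt/CarbonStore")  // hypothetical store path
val fs = storeDir.getFileSystem(sc.hadoopConfiguration)             // FS picked from the path's scheme
println(fs.getUri)           // prints maprfs://... on a MapR cluster
println(fs.exists(storeDir)) // confirms the store location is reachable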

*Second error -*
   The first error was gone after the maprFS refactoring, but then it fails
with the error below. *It seems *.dict & *.dictmeta files are not getting
created.* Could you please help me resolve this error?
[image: Inline image 1]


Thanks
Swapnil

Re: FileNotFoundExceptions while running CarbonData

Posted by Liang Chen <ch...@apache.org>.
Hi Swapnil

Very much looking forward to seeing your PR.
Please let me know your Apache JIRA email id and I will add contributor
rights for you.

Regards
Liang

2017-07-18 6:49 GMT+08:00 Swapnil Shinde <sw...@gmail.com>:

> Thanks. I think I fixed it support maprFS. I will do some more testing and
> then add a jira ticket and PR.
>
> On Mon, Jul 17, 2017 at 11:51 AM, Ravindra Pesala <ra...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Right now we don't have support to maprfs filesystem , so it would be
> > unpredictable even though you have fixed at some places. We need to check
> > in all places and add the maprfs support. So it would be great if you can
> > add the support to maprfs in carbon.
> >
> > And one more observation is please provide absolute path along with
> > maprfs:// in all the places instead of giving relative path. And also
> make
> > sure that storelocation inside carbon properties and store location while
> > creating carbon session must be same.
> >
> > Regards,
> > Ravindra.
> >
> > On 17 July 2017 at 11:25, Swapnil Shinde <sw...@gmail.com>
> wrote:
> >
> > > Hello
> > >     I am new to carbon data and we are trying to use carbon data in
> > > production. I built and installed it on Spark edge nodes as per given
> > > instruction -
> > >
> > > *Build -* No major issues.
> > > *Installation -* Followed yarn installation (
> > http://carbondata.apache.org/
> > > installation-guide.html)  instructions.
> > > *Infrastructure -* Spark 2.1.0 on MapR cluster.
> > > *carbon.properties changes -*
> > >           carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
> > >           carbon.badRecords.location=/opt/Carbon/Spark/badrecords
> > >           carbon.lock.type=HDFSLOCK
> > > *spark-default.conf changes -*
> > >          spark.yarn.dist.files     /opt/mapr/spark/spark-2.1.0/
> > > conf/carbon.properties
> > >          spark.yarn.dist.archives    /opt/mapr/spark/spark-2.1.0/
> > > carbonlib/carbondata.tar.gz
> > >          spark.executor.extraJavaOptions
> > -Dcarbon.properties.filepath=
> > > carbon.properties
> > >          spark.driver.extraJavaOptions
>  -Dcarbon.properties.filepath=/
> > > opt/mapr/spark/spark-2.1.0/conf/carbon.properties
> > > *Command line -*
> > > /opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master
> > yarn
> > > --jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.
> > 0-shade-hadoop2.2.0.jar
> > > \
> > > --driver-memory 1g \
> > > --executor-cores 2 \
> > > --executor-memory 2G
> > > *Code snippet -*
> > > import org.apache.spark.sql.SparkSession
> > > import org.apache.spark.sql.CarbonSession._
> > > val carbon = SparkSession.builder().config(sc.getConf).
> > > getOrCreateCarbonSession("/mapr/ri0.abc.com/tmp", "/mapr/
> ri0.abc.com/tmp
> > ")
> > > carbon.sql("""CREATE TABLE
> > >                         IF NOT EXISTS test_table(
> > >                                   id string,
> > >                                   name string,
> > >                                   city string,
> > >                                   age Int)
> > >                        STORED BY 'carbondata'""")
> > >
> > > carbon.sql("""LOAD DATA INPATH '/mapr/ri0.comscore.com/tmp/sample.csv'
> > >                   INTO TABLE test_table""")
> > >
> > > First error -
> > >       Inital error was "*Dictionay file is locked for updation*".
> Further
> > > debugging showed that it was due to missing maprFS filesyste.
> > > (HDFSFileLock.java line # 52) -
> > > String hdfsPath =  "conf.get(CarbonCommonConstants.FS_DEFAULT_FS)"
> > >
> > > I added some code to workaround with path like maprfs:///* and that
> > seemed
> > > to be working fine. (like adding MAPRFS FileType)
> > >
> > > *Second error -*
> > >    First error was gone after mapFS refactoring but then it fails with
> > > below error. *It seems *.dict & *.dictmeta are not getting created.*
> > > Could you please help me resolving this error?
> > > [image: Inline image 1]
> > >
> > >
> > > Thanks
> > > Swapnil
> > >
> > >
> >
> >
> > --
> > Thanks & Regards,
> > Ravi
> >
>

Re: FileNotFoundExceptions while running CarbonData

Posted by Swapnil Shinde <sw...@gmail.com>.
Thanks. I think I fixed it to support maprFS. I will do some more testing
and then file a JIRA ticket and a PR.

On Mon, Jul 17, 2017 at 11:51 AM, Ravindra Pesala <ra...@gmail.com>
wrote:

> Hi,
>
> Right now we don't have support to maprfs filesystem , so it would be
> unpredictable even though you have fixed at some places. We need to check
> in all places and add the maprfs support. So it would be great if you can
> add the support to maprfs in carbon.
>
> And one more observation is please provide absolute path along with
> maprfs:// in all the places instead of giving relative path. And also make
> sure that storelocation inside carbon properties and store location while
> creating carbon session must be same.
>
> Regards,
> Ravindra.
>
> On 17 July 2017 at 11:25, Swapnil Shinde <sw...@gmail.com> wrote:
>
> > Hello
> >     I am new to carbon data and we are trying to use carbon data in
> > production. I built and installed it on Spark edge nodes as per given
> > instruction -
> >
> > *Build -* No major issues.
> > *Installation -* Followed yarn installation (
> http://carbondata.apache.org/
> > installation-guide.html)  instructions.
> > *Infrastructure -* Spark 2.1.0 on MapR cluster.
> > *carbon.properties changes -*
> >           carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
> >           carbon.badRecords.location=/opt/Carbon/Spark/badrecords
> >           carbon.lock.type=HDFSLOCK
> > *spark-default.conf changes -*
> >          spark.yarn.dist.files     /opt/mapr/spark/spark-2.1.0/
> > conf/carbon.properties
> >          spark.yarn.dist.archives    /opt/mapr/spark/spark-2.1.0/
> > carbonlib/carbondata.tar.gz
> >          spark.executor.extraJavaOptions
> -Dcarbon.properties.filepath=
> > carbon.properties
> >          spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/
> > opt/mapr/spark/spark-2.1.0/conf/carbon.properties
> > *Command line -*
> > /opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master
> yarn
> > --jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.
> 0-shade-hadoop2.2.0.jar
> > \
> > --driver-memory 1g \
> > --executor-cores 2 \
> > --executor-memory 2G
> > *Code snippet -*
> > import org.apache.spark.sql.SparkSession
> > import org.apache.spark.sql.CarbonSession._
> > val carbon = SparkSession.builder().config(sc.getConf).
> > getOrCreateCarbonSession("/mapr/ri0.abc.com/tmp", "/mapr/ri0.abc.com/tmp
> ")
> > carbon.sql("""CREATE TABLE
> >                         IF NOT EXISTS test_table(
> >                                   id string,
> >                                   name string,
> >                                   city string,
> >                                   age Int)
> >                        STORED BY 'carbondata'""")
> >
> > carbon.sql("""LOAD DATA INPATH '/mapr/ri0.comscore.com/tmp/sample.csv'
> >                   INTO TABLE test_table""")
> >
> > First error -
> >       Inital error was "*Dictionay file is locked for updation*". Further
> > debugging showed that it was due to missing maprFS filesyste.
> > (HDFSFileLock.java line # 52) -
> > String hdfsPath =  "conf.get(CarbonCommonConstants.FS_DEFAULT_FS)"
> >
> > I added some code to workaround with path like maprfs:///* and that
> seemed
> > to be working fine. (like adding MAPRFS FileType)
> >
> > *Second error -*
> >    First error was gone after mapFS refactoring but then it fails with
> > below error. *It seems *.dict & *.dictmeta are not getting created.*
> > Could you please help me resolving this error?
> > [image: Inline image 1]
> >
> >
> > Thanks
> > Swapnil
> >
> >
>
>
> --
> Thanks & Regards,
> Ravi
>

Re: FileNotFoundExceptions while running CarbonData

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi,

Right now we don't have support for the maprfs filesystem, so behaviour
would be unpredictable even though you have fixed it in some places. We
need to check all the places and add maprfs support, so it would be great
if you can add maprfs support to carbon.

One more observation: please provide absolute paths with the maprfs://
scheme everywhere instead of relative paths. Also make sure that the
storelocation in carbon.properties and the store location used while
creating the carbon session are the same.
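
For example (the path below is only a placeholder), keeping the two
locations identical and absolute would look something like this:

// carbon.properties should carry the same value, e.g.
//   carbon.storelocation=maprfs:///tmp/hacluster/Opt/CarbonStore
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val storeLocation = "maprfs:///tmp/hacluster/Opt/CarbonStore"   // absolute path with scheme
val carbon = SparkSession.builder()
  .config(sc.getConf)
  .getOrCreateCarbonSession(storeLocation, storeLocation)       // same location as in carbon.properties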

Regards,
Ravindra.

On 17 July 2017 at 11:25, Swapnil Shinde <sw...@gmail.com> wrote:

> Hello
>     I am new to carbon data and we are trying to use carbon data in
> production. I built and installed it on Spark edge nodes as per given
> instruction -
>
> *Build -* No major issues.
> *Installation -* Followed yarn installation (http://carbondata.apache.org/
> installation-guide.html)  instructions.
> *Infrastructure -* Spark 2.1.0 on MapR cluster.
> *carbon.properties changes -*
>           carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
>           carbon.badRecords.location=/opt/Carbon/Spark/badrecords
>           carbon.lock.type=HDFSLOCK
> *spark-default.conf changes -*
>          spark.yarn.dist.files     /opt/mapr/spark/spark-2.1.0/
> conf/carbon.properties
>          spark.yarn.dist.archives    /opt/mapr/spark/spark-2.1.0/
> carbonlib/carbondata.tar.gz
>          spark.executor.extraJavaOptions    -Dcarbon.properties.filepath=
> carbon.properties
>          spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/
> opt/mapr/spark/spark-2.1.0/conf/carbon.properties
> *Command line -*
> /opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master yarn
> --jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar
> \
> --driver-memory 1g \
> --executor-cores 2 \
> --executor-memory 2G
> *Code snippet -*
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val carbon = SparkSession.builder().config(sc.getConf).
> getOrCreateCarbonSession("/mapr/ri0.abc.com/tmp", "/mapr/ri0.abc.com/tmp")
> carbon.sql("""CREATE TABLE
>                         IF NOT EXISTS test_table(
>                                   id string,
>                                   name string,
>                                   city string,
>                                   age Int)
>                        STORED BY 'carbondata'""")
>
> carbon.sql("""LOAD DATA INPATH '/mapr/ri0.comscore.com/tmp/sample.csv'
>                   INTO TABLE test_table""")
>
> First error -
>       Inital error was "*Dictionay file is locked for updation*". Further
> debugging showed that it was due to missing maprFS filesyste.
> (HDFSFileLock.java line # 52) -
> String hdfsPath =  "conf.get(CarbonCommonConstants.FS_DEFAULT_FS)"
>
> I added some code to workaround with path like maprfs:///* and that seemed
> to be working fine. (like adding MAPRFS FileType)
>
> *Second error -*
>    First error was gone after mapFS refactoring but then it fails with
> below error. *It seems *.dict & *.dictmeta are not getting created.*
> Could you please help me resolving this error?
> [image: Inline image 1]
>
>
> Thanks
> Swapnil
>
>


-- 
Thanks & Regards,
Ravi