Posted to dev@spark.apache.org by Senthil Kumar <se...@gmail.com> on 2016/05/12 10:53:49 UTC

Spark Exposing RDD as WebService ?

Hi all, I have a requirement to process a huge file (75 GB).

          Here is some sample data:
          <InodeSection>
           <inode>
             <id>100</id>
             <name>spark.conf</name>
              .
              .
              .
           </inode>
          </InodeSection>

          <INodeDirectorySection>
           <directory><parent>99</parent><inode>98</inode><inode>97</inode><inode>96</inode></directory>
          </INodeDirectorySection>


          Steps:
            1)    Load the complete <InodeSection>
            2)    Load the <INodeDirectorySection>
            3)    Iterate over each inode and query both the InodeSection and
                  the INodeDirectorySection to find its parents (all the way
                  up to the ROOT directory)
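
To make the three steps concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the lookup logic, not a solution for 200 million inodes. The wrapping <fsimage> root element, the inode ids 1 and 96-99, and all names except the tag names shown above are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the data above. The <fsimage> wrapper,
# the ids and the names are invented for illustration only.
SAMPLE = """
<fsimage>
  <InodeSection>
    <inode><id>1</id><name></name></inode>
    <inode><id>99</id><name>conf</name></inode>
    <inode><id>98</id><name>spark.conf</name></inode>
    <inode><id>97</id><name>log4j.properties</name></inode>
    <inode><id>96</id><name>slaves</name></inode>
  </InodeSection>
  <INodeDirectorySection>
    <directory><parent>1</parent><inode>99</inode></directory>
    <directory><parent>99</parent><inode>98</inode><inode>97</inode><inode>96</inode></directory>
  </INodeDirectorySection>
</fsimage>
"""


def load_names(root):
    """Step 1: map each inode id to its name."""
    section = root.find("InodeSection")
    return {
        int(node.findtext("id")): node.findtext("name") or ""
        for node in section.findall("inode")
    }


def load_parents(root):
    """Step 2: invert the directory listing into a child -> parent map."""
    parents = {}
    for directory in root.find("INodeDirectorySection").findall("directory"):
        parent_id = int(directory.findtext("parent"))
        for child in directory.findall("inode"):
            parents[int(child.text)] = parent_id
    return parents


def ancestors(inode_id, parents):
    """Step 3: walk the parent pointers until an inode with no parent (ROOT).

    Assumes the parent map contains no cycles.
    """
    chain = []
    current = inode_id
    while current in parents:
        current = parents[current]
        chain.append(current)
    return chain


def full_path(inode_id, names, parents):
    """Join the names from ROOT down to the inode itself."""
    ids = [inode_id] + ancestors(inode_id, parents)
    return "/".join(names[i] for i in reversed(ids))


root = ET.fromstring(SAMPLE)
names = load_names(root)
parents = load_parents(root)
print(ancestors(98, parents))
print(full_path(98, names, parents))
```

The child -> parent map is the key structure: once it exists, each inode needs only one lookup per directory level, which is the same access pattern as the Redis queries described below.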


          Currently I do this as follows:
          1) Load the inodes into Redis
          2) Load the InodeDirectorySection into Redis
          3) For each inode, query Redis and compute its parents

           There are close to 200 million inodes, so the job does not
complete within the SLA. My maximum SLA for this operation is 2-2.5 hours.
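
To put the SLA in numbers, here is a rough back-of-the-envelope calculation. It assumes exactly 200 million inodes and a steady processing rate over the whole window; the function name is just for illustration.

```python
INODES = 200_000_000  # "close to 200 million", rounded for the estimate


def required_rate(inodes, hours):
    """Average inodes per second needed to finish within the given hours."""
    return inodes / (hours * 3600)


for hours in (2.0, 2.5):
    print(f"{hours} h -> {required_rate(INODES, hours):,.0f} inodes/s")
```

This is the rate for whole inodes. Each inode needs one parent lookup per directory level, so the number of lookups per second is higher by roughly the average directory depth.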

           How can I use Spark here and expose an RDD as a service for my
requirement? Can this be done with other approaches?

--Senthil