Posted to dev@archiva.apache.org by Joakim Erdfelt <jo...@erdfelt.com> on 2007/04/10 22:25:07 UTC

State of the Archiva (April 2007)

:: PRESENT ::

  Work is continuing on the archiva-jpox-database branch.
  Many improvements already exist in that branch.
  The core / base / database changes have settled down; work now continues
  in the webapp to take advantage of the changes made in archiva-base.

:: FUTURE ::

  BRANCH to TRUNK merge.

  In roughly two weeks' time, the branch will come up for a vote to be merged
  with trunk. Once the merge is approved, the new trunk will undergo a complete
  review of the existing JIRA issues to determine whether each one still
  applies or can be closed as fixed.

  This is a good time to update the documentation to reflect the current
  Archiva UI and configuration process.

  RELEASES.

  Once the critical JIRA issues have been closed, the initial release,
  Archiva 1.0-alpha-1, should be cut.

  After this has occurred, progress will continue on 1.0 following the
  outline in http://docs.codehaus.org/display/MAVENUSER/Archiva+Roadmap

  When we have a feature-complete 1.0 (as per the roadmap), we'll start
  the 1.0-M1 release.

  When a milestone release has gone two solid weeks without major bug
  reports, we'll start the vote for 1.0 final.

:: CHANGES IN BRANCH ::

  First, here is a brief tour of the directories.

  archiva/branches/archiva-jpox-database-refactor/
  |-- archiva-base/
  |   |-- archiva-common/
  |   |-- archiva-configuration/
  |   |-- archiva-consumers/                   (NEW)
  |   |   |-- archiva-consumer-api/            (NEW)
  |   |   |-- archiva-core-consumers/          (NEW)
  |   |   |-- archiva-database-consumers/      (NEW)
  |   |   |-- archiva-lucene-consumers/        (NEW)
  |   |   `-- archiva-signature-consumers/     (NEW)
  |   |-- archiva-converter/
  |   |-- archiva-indexer/
  |   |-- archiva-model/                       (NEW)
  |   |-- archiva-proxy/
  |   |-- archiva-repository-layer/
  |   |-- archiva-scheduled/                   (NEW)
  |   `-- archiva-xml-tools/                   (NEW)
  |-- archiva-cli/
  |-- archiva-database/                        (NEW)
  |-- archiva-reporting/
  |   `-- archiva-report-manager/              (was archiva-reports-standard)
  |-- archiva-site/
  |-- archiva-web/
  |   |-- archiva-applet/
  |   |-- archiva-security/
  |   |-- archiva-standalone/
  |   |   |-- archiva-plexus-application/
  |   |   `-- archiva-plexus-runtime/
  |   |-- archiva-webapp/
  |   `-- archiva-webapp-test/
  |-- design/
  |   |-- logos/
  |   `-- white-site/
  `-- maven-meeper/

  MAJOR CHANGES:

  Modules refactored out of existence:
    * archiva-discoverer
      The classes here have been simplified and merged into
      archiva-repository-layer.
    * archiva-core
      The classes here have been moved to archiva-repository-layer,
      archiva-common, archiva-model, and archiva-consumer-api.

  The archiva-repository-layer module is now the nexus for all things
  that work against the repository.

  The role of the archiva.xml configuration file has changed. It is no longer
  the canonical source for 'configured' repositories. Instead, it provides a
  bootstrap for the configured repositories, which are now stored and
  maintained in the database, and it holds the list of active consumers to
  use in the various stages of content consumption (more on that later).

  The use of maven-artifact and maven-project has been removed, because the
  assumptions present in each (that everything is for the purposes of a
  build) are inappropriate for Archiva and JPOX. The new built-in
  replacements are more resilient to missing referenced data.

  Terminology:
    I had to establish a new set of terminology to describe the entities
    stored in the database.

     Name        | Group ID | Artifact ID | Version | Classifier | Type |
     ------------+----------+-------------+---------+------------+------+
     Project     |  yes     |   yes       |         |            |      |
     Versioned   |  yes     |   yes       |   yes   |            |      |
     Artifact    |  yes     |   yes       |   yes   |    yes     | yes  |

   These terms (Project, Versioned, Artifact) describe the hierarchy that
   is present in the repository: one project can contain multiple versions,
   and each version can contain multiple artifacts.
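
  The hierarchy can be pictured with a small Java sketch. The record names
  mirror the terms above, but they are illustrative assumptions, not the
  actual classes in archiva-model; using an empty string for "no classifier"
  is also an assumption of the sketch.

```java
import java.util.List;

// Hypothetical model of the Project / Versioned / Artifact terminology.
// Type names are illustrative only, not Archiva's real model classes.
public class RepositoryTerms {

    // Project: identified by groupId and artifactId only.
    record Project(String groupId, String artifactId) {}

    // Versioned: a project plus a version.
    record Versioned(Project project, String version) {}

    // Artifact: a versioned project plus classifier and type.
    record Artifact(Versioned versioned, String classifier, String type) {}

    // Walks up the hierarchy from an artifact to its project.
    static Project projectOf(Artifact artifact) {
        return artifact.versioned().project();
    }

    public static void main(String[] args) {
        Project commons = new Project("commons-lang", "commons-lang");
        Versioned v21 = new Versioned(commons, "2.1");
        // One versioned project holds several artifacts (main jar, sources jar, pom).
        List<Artifact> artifacts = List.of(
                new Artifact(v21, "", "jar"),
                new Artifact(v21, "sources", "jar"),
                new Artifact(v21, "", "pom"));
        for (Artifact a : artifacts) {
            System.out.println(projectOf(a).artifactId() + ":" + a.versioned().version()
                    + ":" + a.type() + (a.classifier().isEmpty() ? "" : ":" + a.classifier()));
        }
    }
}
```

  Nesting each level inside the next keeps the "yes" columns of the table
  honest: an Artifact always carries the group ID, artifact ID and version
  of its parents.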

  CONTENT SCANNING:

  The scanning of content from the repository occurs in 2 major stages.

  Major Stage 1:  Scan of repository filesystem.
    Artifacts Stage:
      a) Find the new artifacts and put them into the database as unprocessed.
      b) Find the maven-metadata.xml files and put them into the database.
      c) Validate checksums (and report issues).
      d) Create missing checksums.
    Content Stage:
      a) Index content (lucene)
    Bad Content Stage:
      a) Auto remove known bad content.
      b) Auto rename known common filename issues.
      c) Flag remaining unknown content as bad (in report).

  Major Stage 2:  Scan of artifacts from database.
    Unprocessed Artifacts Stage:
      a) Find pom artifacts and load project model into database.
      b) Index artifact details (lucene)
      c) Validate repository metadata.
      d) Index archiva table of contents (lucene)
      e) Update bytecode information in artifact-java-details.
      f) Index public methods (lucene)
    Processed Artifacts Stage:
      a) Artifact not present in filesystem, remove artifact from db.
      b) Artifact of type 'pom' not present in filesystem, remove project
         model from db
      c) Artifact not present in filesystem, remove from lucene index.
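
  A minimal in-memory sketch of the two major stages follows. It assumes a
  Map standing in for the database and a Set of repository-relative paths
  standing in for the filesystem; the class and method names are
  hypothetical, and the ".jar"/".pom" test is a simplification of the real
  artifact pattern list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory sketch of the two major scanning stages.
public class TwoStageScan {

    enum Status { UNPROCESSED, PROCESSED }

    // Stand-in for the artifact table: path -> processing status.
    final Map<String, Status> database = new LinkedHashMap<>();

    // Stand-in for the work done by the Unprocessed Artifacts Stage.
    final List<String> processedLog = new ArrayList<>();

    // Major Stage 1: record newly found artifacts as unprocessed (cheap, so browsing works early).
    void scanFilesystem(Set<String> filesystem) {
        for (String path : filesystem) {
            if (path.endsWith(".jar") || path.endsWith(".pom")) { // simplistic artifact test
                database.putIfAbsent(path, Status.UNPROCESSED);
            }
        }
    }

    // Major Stage 2: process unprocessed artifacts, drop processed ones that vanished from disk.
    void scanDatabase(Set<String> filesystem) {
        // Iterate over a copy of the keys so entries can be removed safely.
        for (String path : new ArrayList<>(database.keySet())) {
            if (database.get(path) == Status.UNPROCESSED) {
                processedLog.add(path);              // e.g. load POM model, index details
                database.put(path, Status.PROCESSED);
            } else if (!filesystem.contains(path)) {
                database.remove(path);               // artifact no longer on the filesystem
            }
        }
    }
}
```

  The expensive per-artifact work only happens in the second pass, which
  is why newly discovered content can show up in the browse interface
  before it has been fully processed.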

  The benefit of these stages is that content found on the filesystem
  becomes available to users via the browse interface relatively quickly.
  (It takes about 6 minutes to scan all of ibiblio this way.)

  If a user requests a browse of a versioned project that has not yet
  undergone Major Stage 2, a 'Just in Time' scan of that specific project
  is done.
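
  The 'Just in Time' behaviour can be sketched as a check made before a
  browse request is served. The class name, the "groupId:artifactId:version"
  key format, and the markProcessed() hook for the background Stage 2 scan
  are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of the 'Just in Time' scan triggered by a browse request.
public class JustInTimeBrowse {

    // Versioned projects already handled by Major Stage 2 or by an earlier JIT scan.
    private final Set<String> processedVersions = ConcurrentHashMap.newKeySet();

    // Runs the Stage 2 work for a single versioned project (supplied by the caller).
    private final Consumer<String> projectScanner;

    JustInTimeBrowse(Consumer<String> projectScanner) {
        this.projectScanner = projectScanner;
    }

    // Called by the background database scan when it finishes a versioned project.
    void markProcessed(String key) {
        processedVersions.add(key);
    }

    String browse(String groupId, String artifactId, String version) {
        String key = groupId + ":" + artifactId + ":" + version;
        // add() returns true only for the first caller, so a project is JIT-scanned at most once.
        if (processedVersions.add(key)) {
            projectScanner.accept(key);
        }
        return "details for " + key;
    }
}
```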

  The repository scan has been changed to include all content ("**/*") and
  to specifically exclude known ignorable content. For each discovered file,
  a determination is made as to whether it falls into the Artifact list or
  the Content list; if it falls into neither, it is handled by the Bad
  Content Stage described above.

  archiva.xml contains the pattern lists for:
    a) Artifacts
    b) Indexable Content
    c) Auto-Remove
    d) Ignored

  For the latest lists, as defined in code, see: http://tinyurl.com/2hbzoc
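
  The classification can be sketched with java.nio glob matchers. The
  patterns below are example values only (the real lists live in
  archiva.xml and at the link above), and checking the lists in a fixed
  order with the first match winning is an assumption of this sketch.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: classify each file found by the "**/*" repository scan.
// The glob patterns are example values, not Archiva's actual defaults.
public class ContentClassifier {

    enum Kind { IGNORED, ARTIFACT, INDEXABLE_CONTENT, AUTO_REMOVE, BAD_CONTENT }

    private static final FileSystem FS = FileSystems.getDefault();

    private final List<PathMatcher> ignored = globs("**/.svn/**", "**/*.tmp");
    private final List<PathMatcher> artifacts =
            globs("**/*.jar", "**/*.pom", "**/*.war", "**/maven-metadata.xml");
    private final List<PathMatcher> indexable = globs("**/*.txt", "**/*.xml", "**/*.sha1", "**/*.md5");
    private final List<PathMatcher> autoRemove = globs("**/*.bak", "**/*~");

    private static List<PathMatcher> globs(String... patterns) {
        return Arrays.stream(patterns)
                .map(p -> FS.getPathMatcher("glob:" + p))
                .toList();
    }

    private static boolean matchesAny(List<PathMatcher> matchers, Path path) {
        return matchers.stream().anyMatch(m -> m.matches(path));
    }

    // Lists are checked in a fixed order; the first match wins.
    Kind classify(String repositoryRelativePath) {
        Path path = Path.of(repositoryRelativePath);
        if (matchesAny(ignored, path)) return Kind.IGNORED;
        if (matchesAny(artifacts, path)) return Kind.ARTIFACT;
        if (matchesAny(indexable, path)) return Kind.INDEXABLE_CONTENT;
        if (matchesAny(autoRemove, path)) return Kind.AUTO_REMOVE;
        // Anything left over is flagged as bad content in the report.
        return Kind.BAD_CONTENT;
    }
}
```

  Because the artifact list is checked before the indexable-content list,
  maven-metadata.xml lands in the Artifacts Stage even though it would also
  match the generic "**/*.xml" content pattern.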

  CONSUMER API:

  This is a fundamental part of how Archiva knows what to do with the
  content it is tracking.

  We have two major consumer API interfaces.

  RepositoryContentConsumer - http://tinyurl.com/28roxn
    This consumer interface is used for those consumers that want to operate
    on the raw files in the repository filesystem.

  ArchivaArtifactConsumer   - http://tinyurl.com/2s2blk
    This consumer interface is used for those consumers that want to operate
    on artifacts. Consumers operating in the second major stage (outlined
    above as the scan of artifacts from the database) should use this
    interface.

  This allows for very simple content scanning and manipulation in Archiva.
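
  The two consumer roles can be sketched as below. The interface and method
  names here are hypothetical stand-ins, not the actual signatures of
  RepositoryContentConsumer or ArchivaArtifactConsumer (see the links above
  for those); the point is only that the scanner hands each item to every
  active consumer, and each consumer decides what it cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two consumer roles. Method names and signatures
// are assumptions for illustration, not Archiva's real consumer API.
public class ConsumerSketch {

    // Plays the role of RepositoryContentConsumer: sees raw repository files.
    interface FileConsumer {
        void processFile(String repositoryRelativePath);
    }

    // Plays the role of ArchivaArtifactConsumer: sees artifacts from the database scan.
    interface ArtifactConsumer {
        void processArtifact(String groupId, String artifactId, String version, String type);
    }

    // Example file consumer: collects checksum files it would validate (Major Stage 1).
    static class ChecksumFileConsumer implements FileConsumer {
        final List<String> seen = new ArrayList<>();

        @Override
        public void processFile(String path) {
            if (path.endsWith(".sha1") || path.endsWith(".md5")) {
                seen.add(path);
            }
        }
    }

    // Example artifact consumer: collects POMs whose project model it would load (Major Stage 2).
    static class PomModelConsumer implements ArtifactConsumer {
        final List<String> poms = new ArrayList<>();

        @Override
        public void processArtifact(String groupId, String artifactId, String version, String type) {
            if ("pom".equals(type)) {
                poms.add(groupId + ":" + artifactId + ":" + version);
            }
        }
    }

    // The scanner simply hands every discovered file to every active consumer.
    static void runFileScan(List<String> paths, List<FileConsumer> consumers) {
        for (String path : paths) {
            for (FileConsumer consumer : consumers) {
                consumer.processFile(path);
            }
        }
    }
}
```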

- Joakim Erdfelt

Re: State of the Archiva (April 2007)

Posted by Brett Porter <br...@apache.org>.
Mostly sounds good.

Firstly, all this stuff needs to become some sort of code  
documentation. I'm regularly hearing feedback that it's hard to find  
a way in to this stuff.

Comments inline...

On 11/04/2007, at 6:25 AM, Joakim Erdfelt wrote:

>   The role of archiva.xml configuration file has been changed from being
>   the canonical source for 'configured' repositories, to being a bootstrap
>   for configured repositories stored and maintained in the database, and
>   the list of active consumers to use in the various stages of content
>   consumption. (more on that later)

I'd like to hear more on that, and the reason why. It sounds very  
confusing.

>
>   The use of maven-artifact and maven-project has been removed as the
>   assumptions present in each (everything is for the purposes of a build)
>   are inappropriate for archiva and jpox.  The new inbuilt replacements
>   are more resilient to missing referenced data.

That seems more of a flaw in those libraries than anything. maven-artifact
should not be build-specific.

Maybe we need to look at using the reasons as impetus for change in  
maven 2.1? Duplicating that code seems like a long term maintenance  
risk.

Was it also removed from the proxying? I haven't had a chance to look  
yet, but that sounds like a lot of duplicated functionality.

>
>   The benefit of these stages is that it allows the content to be found on
>   the filesystem and be made available to the users via the browse interface
>   relatively quickly. (Takes about 6 minutes to scan all of ibiblio this way)

From scratch? Presumably when it's unchanged it's much faster?

>     c) Auto-Remove

what are these?

Thanks for putting this together, Joakim. Looking forward to kicking
the tires. Also, I'll have a closer look at the code - I'll hold off
on that until it's close to being ready for trunk though.

Cheers,
Brett