Scaladoc 2 is part of the Scala compiler (NSC), but is relatively self-contained. Scaladoc's source is in package scala.tools.nsc.doc. There are two entry points into Scaladoc: one for the command line, one for Ant.

Definitions

Documentation
A user-friendly (currently HTML) representation of the API of a library or system; this is the output of Scaladoc. An example of a documentation can be found here.

Entity
A symbol, of any kind, that is part of a documentation. An entity may be an entry in the documentation or it may not, for example in the case of a class that is referenced by the source but is not declared in it.

Source
The source code of a library or system, including Scaladoc comments, that can be compiled to a documentation; this is the input of Scaladoc.

Template
An entity representing a class, trait, object or package. A documentable template is a template that is declared in the source; it will be described in the documentation (in HTML, a page is generated for every documentable template). An external template is a template that is referenced by the source but is not declared in it; it will not be described in the documentation.

Process description

The process of transforming source into documentation is separated into three tasks, which are controlled by the DocFactory class.

  1. The compiler is run on the source to create what is called a symbol table, namely, the compiler's internal representation of the program's structure. This process is the same as beginning a normal compilation run, but is stopped before the actual bytecode generation starts. The following phases are executed: syntax analyzer, namer, package objects phase, typer, super-accessors phase, pickler, and reference checker.
  2. The model extractor transforms the symbol table left by the compiler into a new representation of the program's structure, called the “model”. The model is similar to the symbol table in that it represents all elements (classes, objects, methods, values, types, etc.) of a program, which it encodes as entities. It differs in that the model corresponds to the user-friendly (and to some degree incomplete) representation used in the documentation, whilst the representation used in the symbol table is tuned for fast and easy compilation. In other words, the entities of the model correspond directly to what will be printed to the documentation, contrary to the symbols of the symbol table, which follow their own logic. In still other words, the model is an in-memory representation of the documentation. As such, everything that will be displayed to the user (such as companion classes) is already part of the model.
  3. The documentation generator takes the model and outputs it as a display format. Currently, HTML is the only output format, but Scaladoc's design plans for other generators to be added easily (for example: a PDF generator or a documentation query system for IDEs). Note that the generator exclusively defines a layout; the relation between the model and the generator is similar to that between HTML and CSS in the W3C model of separation of concerns.
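The three tasks above can be pictured as a small pipeline. The following is only a toy sketch: every type and method name in it (PipelineSketch, SymbolTable, Model, Generator, compile, extractModel, document) is invented for illustration and is not Scaladoc's real API. It merely shows the shape of "compile, then extract a model, then hand the model to a pluggable generator".

```scala
// Toy sketch of the three tasks; none of these types are Scaladoc's real classes.
object PipelineSketch {
  // Task 1 output: a stand-in for the compiler's symbol table.
  final case class Symbol(name: String, isPrivate: Boolean)
  final case class SymbolTable(symbols: List[Symbol])

  // Task 2 output: a stand-in for the model, holding only what will be shown.
  final case class Entity(name: String)
  final case class Model(entities: List[Entity])

  // Task 1: "compile" the source into a symbol table (here: one symbol per line).
  def compile(source: String): SymbolTable =
    SymbolTable(source.split('\n').toList.filter(_.nonEmpty).map { line =>
      Symbol(line.stripPrefix("private ").trim, line.startsWith("private "))
    })

  // Task 2: extract the user-facing model from the symbol table.
  def extractModel(table: SymbolTable): Model =
    Model(table.symbols.filterNot(_.isPrivate).map(s => Entity(s.name)))

  // Task 3: a generator only decides the layout; it adds no information.
  trait Generator { def generate(model: Model): String }

  object HtmlGenerator extends Generator {
    def generate(model: Model): String =
      model.entities.map(e => s"<li>${e.name}</li>").mkString("<ul>", "", "</ul>")
  }

  // The coordinating role the text assigns to DocFactory: run the tasks in order.
  def document(source: String, generator: Generator): String =
    generator.generate(extractModel(compile(source)))
}
```

Because the generator only receives a finished model, another output format could in principle be added in this sketch by writing a second Generator, without touching the first two tasks.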

The Model

The model is the central data structure of Scaladoc; it defines the structure that is used in documentation for representing a program. The model extractor encodes the relation between the compilation-friendly representation of a program (the symbol table) and the user-friendly representation used in documentation (the model).

Entities

The public interface of the model is represented by a group of entities, which are available as instances of the completely abstract sub-traits of class Entity. Users of the model (i.e. the generator) must only use the abstract sub-traits, which hide the actual implementation classes defined as part of the model extractor (see below).

The most important entity types are the following:

TemplateEntity extends Entity
A class, trait, object or package.

MemberEntity extends Entity
A member (method, value, type alias, abstract type, constructor, or inner template) of a template.

DocTemplateEntity extends TemplateEntity with MemberEntity
A class, trait or object that is defined in the source (as opposed to one that is just referenced), or a package that contains at least one DocTemplateEntity.

NonTemplateMemberEntity extends MemberEntity
A member that isn't a template (method, value, type alias, abstract type, or constructor).

ParameterEntity extends Entity
A type or value parameter of a method or of a constructor.
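As a rough illustration, the inheritance relations listed above can be written down as Scala traits. The trait names and their "extends" clauses come from the list; the members name and members, the describe function and the sample values are assumptions made for this sketch and are not claimed to match the real trait definitions.

```scala
// Sketch of the entity hierarchy; the trait members are hypothetical.
object EntitySketch {
  trait Entity { def name: String }
  trait TemplateEntity extends Entity
  trait MemberEntity extends Entity
  trait DocTemplateEntity extends TemplateEntity with MemberEntity {
    def members: List[MemberEntity]
  }
  trait NonTemplateMemberEntity extends MemberEntity
  trait ParameterEntity extends Entity

  // A consumer such as a generator sees only the abstract traits.
  // DocTemplateEntity is matched first because it is also a TemplateEntity.
  def describe(e: Entity): String = e match {
    case d: DocTemplateEntity       => s"documented template ${d.name} with ${d.members.size} member(s)"
    case t: TemplateEntity          => s"external template ${t.name}"
    case m: NonTemplateMemberEntity => s"member ${m.name}"
    case p: ParameterEntity         => s"parameter ${p.name}"
    case other                      => s"entity ${other.name}"
  }

  // Anonymous implementations, in the spirit of hidden implementation classes.
  val sizeMethod: NonTemplateMemberEntity =
    new NonTemplateMemberEntity { val name = "size" }
  val listClass: DocTemplateEntity =
    new DocTemplateEntity { val name = "List"; val members = List(sizeMethod) }
  val externalClass: TemplateEntity =
    new TemplateEntity { val name = "Object" }
}
```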

The Model Extractor

The model extractor is implemented as class ModelFactory, which builds models based on a given compiler instance. A user of the model extractor only uses the makeModel method. The model is returned as the root package (an object with the abstract type Package, a subtype of Entity); the rest of the classes and members can be discovered by browsing down the hierarchy from there.

Internally, the model factory defines concrete implementations for the Entity abstract datatype. The implementation classes are called XxxImpl, or are simply anonymous classes. The concrete implementations should only be visible inside the model factory (the implementation classes are not private because there was a problem with that, but really, they should be). The various bits of the model are built by the model factory by querying the symbol table (the one calculated by the compiler in the first step of the process). This is why you will see plenty of calls like sym.something in the implementation of the model. To limit the memory consumption of the model, which is already quite large, most parts of the model are implemented as forwarders to the symbol table. Only some crucial elements are stored as values in the model. In any case, from the point of view of the public doc.model.Entity datatype, this makes no difference.
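The difference between a forwarder and a stored value can be sketched as follows. Sym, Entity and makeEntity are hypothetical stand-ins, not the real compiler or model classes, and the symbol is mutable here only so that the forwarding becomes observable; nothing in this sketch says that real symbols change after compilation.

```scala
// Sketch: a stored value (val) versus a forwarder (def) to the symbol.
object ForwarderSketch {
  // Stand-in for a compiler symbol; mutable purely for demonstration purposes.
  final class Sym(var name: String, var isDeprecated: Boolean)

  // The public face of the entity does not reveal which strategy is used.
  trait Entity {
    def name: String
    def isDeprecated: Boolean
  }

  def makeEntity(sym: Sym): Entity = new Entity {
    // Stored: read from the symbol once and kept in the entity (costs memory).
    val name: String = sym.name
    // Forwarder: nothing is kept; every call asks the symbol again.
    def isDeprecated: Boolean = sym.isDeprecated
  }
}
```

A caller holding an Entity cannot tell the two members apart by their signatures, which is the point the paragraph above makes about the public datatype.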

Besides the implementation classes, the model factory contains a series of maker methods. Each of these methods takes a symbol and returns a part of the model. Some are pretty straightforward, like the makeValueParam method, which simply maps a symbol to a ValueParam entity. Others are more complex, like the makeMember method. This complexity comes from the difference between the representation of the program in the symbol table (where all members are symbols of pretty much the same type) and the representation of the model (which treats various member kinds like methods, constructors or inner classes differently). Notice for example how the makeMember method will not return any model member for private or synthetic symbols, or how additional documentation-only members are returned when asking for a member of a symbol that has use cases. In other words, the maker methods implement the difference between the symbol table and the model.
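A much-simplified maker method might look like the sketch below. It is not the real makeMember: the Sym and Member types are invented, and the treatment of use cases (one extra documentation-only member per use case) is an assumption made only to show the idea of one uniform symbol type fanning out into several member kinds, or into nothing at all.

```scala
// Toy maker method; Sym, Kind and the Member classes are hypothetical.
object MakerSketch {
  sealed trait Kind
  case object Method extends Kind
  case object Constructor extends Kind
  case object InnerClass extends Kind

  // In the symbol table, all members look alike: one type with flags.
  final case class Sym(
    name: String,
    kind: Kind,
    isPrivate: Boolean = false,
    isSynthetic: Boolean = false,
    useCases: List[String] = Nil
  )

  // In the model, each member kind gets its own representation.
  sealed trait Member { def name: String }
  final case class Def(name: String) extends Member
  final case class Ctor(name: String) extends Member
  final case class InnerTemplate(name: String) extends Member
  final case class UseCase(name: String, signature: String) extends Member

  def makeMember(sym: Sym): List[Member] =
    if (sym.isPrivate || sym.isSynthetic) Nil // never shown in the documentation
    else {
      val main: Member = sym.kind match {
        case Method      => Def(sym.name)
        case Constructor => Ctor(sym.name)
        case InnerClass  => InnerTemplate(sym.name)
      }
      // Assumed behaviour: use cases add documentation-only members.
      main :: sym.useCases.map(sig => UseCase(sym.name, sig))
    }
}
```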

The maker methods take two parameters: a symbol for which a model entity must be generated, and an inTpl ("in template") class or object entity. The in-template parameter is the class, object or package that contains the entity. For example, class scala.List is in package scala, which itself is in the root package, which is a special case that isn't in any package. The in-template parameter becomes interesting for inherited members. For example, a method f defined in class A also exists in class B that extends A, because it is inherited there. Furthermore, the type of f may be different in B than it is in A, for example if it depends on a type parameter, even when there is no redefinition of f in B. This is particularly noticeable in the new collection library. In the symbol table, both versions of f are represented by the same symbol, and one must be careful to always read their type through the asSeenFrom method of the symbol, to take into account the difference between the two. In the model, a different entity is generated for each f, based on which template (class) it is seen from. This is what the inTpl parameter defines.
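The role of inTpl can be illustrated with a toy model in which types are plain strings. Everything here (ClassSym, MethodSym, resultSeenFrom, makeMethod) is hypothetical, and resultSeenFrom is only a crude imitation of what asSeenFrom does for a single type parameter named T; it is not how the compiler computes types.

```scala
// Toy illustration of "one symbol, one entity per containing template".
object InTplSketch {
  // A class symbol: its name, plus an optional parent with the type argument
  // that the subclass passes for the parent's type parameter T.
  final case class ClassSym(name: String, parent: Option[(ClassSym, String)] = None)

  // A method symbol; its result type may mention the owner's type parameter T.
  final case class MethodSym(name: String, owner: ClassSym, resultType: String)

  // Crude stand-in for asSeenFrom: substitute T when seen from a direct subclass.
  def resultSeenFrom(m: MethodSym, inTpl: ClassSym): String =
    inTpl.parent match {
      case Some((p, arg)) if p == m.owner => m.resultType.replace("T", arg)
      case _                              => m.resultType
    }

  // The model entity records the template it is seen from.
  final case class MethodEntity(name: String, inTpl: ClassSym, resultType: String)

  def makeMethod(sym: MethodSym, inTpl: ClassSym): MethodEntity =
    MethodEntity(sym.name, inTpl, resultSeenFrom(sym, inTpl))

  // class A[T] { def f: Option[T] }; class B extends A[Int]
  val a: ClassSym = ClassSym("A")
  val b: ClassSym = ClassSym("B", Some((a, "Int")))
  val f: MethodSym = MethodSym("f", a, "Option[T]")
}
```

In this sketch the single symbol f yields two distinct entities, one for A with result type Option[T] and one for B with result type Option[Int].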

Order of Model Initialisation

The model is a highly mutually-referencing data structure. For example, class A may contain a method b that returns a B, and class B a method a that returns an A. Therefore, the initialisation of the templates in the model is a delicate process whose order must be carefully controlled.

Some of the elements of a template are either calculated on demand (def) or memoized (lazy val), effectively taking them out of the initialisation problem, as long as they are not called before the model is completely built (that is: before makeModel returns). Other elements of a template, such as its members, its super-templates, or its comment, are calculated eagerly (val) as part of the template's constructor in DocTemplateImpl or MemberImpl.

Eventually, one of the templates referenced eagerly will not yet have been instantiated. To solve this problem, the pattern used to obtain a template is to always assume that the template doesn't exist and to call the template constructor in any case (makeTemplate or makeDocTemplate). To guarantee that only a single instance is created for every template, a crucial property of the model, the template constructor returns the existing template instance if there is one, and instantiates a fresh one only otherwise. In order for the template constructor to return an existing instance, the latter has to be registered in the templatesCache. This is done by the instance itself at the beginning of its construction process. It is crucial that the template is registered in the cache before its constructor may construct another template, since this other template may itself depend on the original template. Note that only DocTemplate instances need to be registered in the template cache. Other entity types are either not mutually-referencing (NonTemplateMemberEntity, NoDocTemplate) or are not guaranteed to be represented by a unique instance (TypeEntity). Note to reader: it may be that NoDocTemplate should be guaranteed to be unique, for performance reasons.

To take the A-B example above again, we may have a situation where:

  1. The constructor for template A (makeDocTemplate) is called (probably in makeMember, as a member of the package containing A).
  2. An uninitialised instance representing A is created. Its Object constructor has just been executed; all subsequent constructors still need to be executed.
  3. At the top of the constructor chain is the constructor of EntityImpl, which defines the name of the template; it is followed by the constructor of MemberImpl, which parses the template's comment.
  4. Then comes the big DocTemplateImpl constructor:
    a. First, it registers itself in the templatesCache; from that point on, the constructor can safely call other template constructors.
    b. It calculates its linearization by calling makeTemplate on all parents.
    c. It calculates all members before method b.
    d. It calls makeMember for method b (whose result is, as you remember, of type B), which will eventually call makeDocTemplate for template B.
  5. The constructor for template B is called and runs until:
    a. It calls makeMember for method a (whose result is, as you remember, of type A), which will eventually call makeDocTemplate for template A.
  6. The constructor for template A is called; it looks up and immediately returns the original, partially-initialised instance from the templatesCache.
  7. The constructor for template B finishes.
  8. The remaining members of template A are calculated.
  9. The constructor for template A finishes.
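The walk-through above can be reproduced in miniature. The sketch below is a simplified assumption of the pattern, not the actual ModelFactory code: the string-keyed symbols map, the DocTemplate and Method classes and the created counter are all invented for the example. Only the names templatesCache and makeDocTemplate are borrowed from the text.

```scala
import scala.collection.mutable

// Miniature version of the A-B scenario; all types here are hypothetical.
object CacheSketch {
  // Toy "symbol table": each class maps to its methods as (name, result class).
  val symbols: Map[String, List[(String, String)]] = Map(
    "A" -> List("b" -> "B"),
    "B" -> List("a" -> "A")
  )

  val templatesCache = mutable.LinkedHashMap.empty[String, DocTemplate]
  var created = 0 // counts constructor runs: each template must be built once

  final class Method(val name: String, val result: DocTemplate)

  final class DocTemplate(val name: String) {
    // Register first, so templates constructed below can already find this one.
    templatesCache(name) = this
    created += 1
    // Eager member: computing it may recursively construct other templates.
    val members: List[Method] =
      symbols(name).map { case (m, res) => new Method(m, makeDocTemplate(res)) }
  }

  // Always "construct"; the cache turns a repeated request into a lookup.
  // getOrElse takes its default by name, so the constructor runs only on a miss.
  def makeDocTemplate(name: String): DocTemplate =
    templatesCache.getOrElse(name, new DocTemplate(name))
}
```

Calling CacheSketch.makeDocTemplate("A") builds A, then B from inside A's constructor; B's request for A is answered from the cache with the partially-initialised instance, so the A reachable through B's method a is the very same object as the one returned to the caller.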

In practice, the following two rules have to be remembered when programming the model factory:
- Eager (val) members defined in classes that are ancestors of DocTemplateImpl must not use makeDocTemplate or makeTemplate (or try to create a template in any other way).
- The logic used to calculate an eager member can only assume members declared in Entity and TemplateEntity to be initialised. Because of the initialisation procedure, all other members may not yet have been initialised.

If a definition requires access to other members, it should be declared as a def or a lazy val, which will be executed after the model has been initialised. This guarantees that all constructors have been completed, and that makeDocTemplate is merely a lookup in the template cache.
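Why the second rule matters can be seen in a small experiment. This is a sketch under invented names (Tpl, make, cache, comment, peerComment); it does not mirror the real implementation classes, but it relies only on standard Scala initialisation semantics: an eager val that reads a member of a partially-initialised instance sees the field's default value (null for a reference), whereas a lazy val is computed on first access.

```scala
import scala.collection.mutable

// Eager versus lazy access to a mutually-referencing, partially-built peer.
object InitOrderSketch {
  val cache = mutable.Map.empty[String, Tpl]

  final class Tpl(val name: String, peerName: String) {
    cache(name) = this // register before constructing anything else

    // Eager: runs inside the constructor; the peer may be only partially built.
    val peer: Tpl = make(peerName, name)
    // Unsafe: reads a member of the peer that may not be initialised yet.
    val peerCommentSeenEagerly: String = peer.comment
    val comment: String = s"Docs for $name"

    // Safe: evaluated on first access, after construction has finished.
    lazy val peerComment: String = peer.comment
  }

  def make(name: String, peerName: String): Tpl =
    cache.getOrElse(name, new Tpl(name, peerName))
}
```

With make("A", "B"), template B is constructed while A is still half-built, so B's eager peerCommentSeenEagerly is null, while B's lazy peerComment, read after make has returned, yields "Docs for A".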

The Generator

Will be described later…