There's a counter-constructive saying: a project is dead as soon as you add documentation (Aeschylus, I believe).
This could not be more incorrect. While it is true that documentation written for an evolving project quickly becomes invalid, it is a planning truth that writing documentation once a project is finishing is impossible because there are a hundred and one more pressing issues. Therefore, adding docstrings to each function, method and class in Python as one goes along is by far the more advantageous approach. Once this is done, however, this information needs to be transmuted into documentation. Here is how one can set up ReadTheDocs without falling into a few traps, as the documentation generator Sphinx is, ironically, weirdly documented. Ideally this should be done early on, so one knows what mistakes one is making.
Note
I run an internal workshop on adding extras to a GitHub repository and one question in the feedback was if I could write out the steps to do ReadTheDocs properly. And this is it.
Motivation
Motivation is hard when it’s a big task. When the project is complete, the paper becomes the only focus and documentation falls on the sidelines. It does not help that reviewers rarely check code: in my experience, half of reviewers do not even check a web app. However, in the grand scheme of things it does matter. Therefore, one should not leave it to last. Every time I have left it as the last thing I have sorely regretted it. Furthermore, the sentence “I wrote this, but don’t remember what it does” does often arise when documentation is done late.
Invalid top-down documentation
The saying in the lead applies primarily to writing an overview. In an ideal world, the overview is written first and acts as a roadmap of how the module ought to work and, by virtue of the excellent planning resulting from the thoroughly thought-out overview, will be true even at the end of the project. However, projects evolve and more often than not they were not started with the idea of being evolvable (the comparison of an American city vs. a European medieval city is a classic teaching example of the concept of planning and evolvability in CS). Nevertheless, I would still advocate thinking about what the end goals of a project are and sketching them out before starting. But by far the most time-consuming part of writing documentation is the description of the parts, hence my insistence on writing them as one goes along.
Docstrings
In Python, docstrings work really nicely, much nicer than Doxygen documentation in C++. Docstrings are generally written in reStructuredText (rst) within triple quotes at the top of the body of a function, method or class.
from typing import Any

def foo(bar: Any) -> int:
    """
    This is a docstring.

    :param bar: This is a parameter.
    :type bar: Any
    :return: This is a return value.
    :rtype: int
    """
    ...
    return bar
All these docstrings can be converted by Sphinx into a nice documentation page. Previously I wrote a blog post about converting docstrings to markdown documentation for GitHub, which is helpful when the project is not intended to be released on PyPI, but for a proper project this is a bad idea and the correct course of action is to create ReadTheDocs documentation. The preferred format for GitHub is markdown as it is easier, and the Sphinx autodoc extension is not applicable there. The preferred format for ReadTheDocs is reStructuredText (rst).
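To get a feel for the raw material that a documentation generator works from, here is a minimal sketch: the docstring is simply stored on the object and can be read back with the standard library. The function is the toy one from above with made-up wording; how Sphinx processes the text internally is not shown.

```python
import inspect
from typing import Any


def foo(bar: Any) -> Any:
    """
    Return the argument unchanged.

    :param bar: This is a parameter.
    :type bar: Any
    :return: The same object that was passed in.
    :rtype: Any
    """
    return bar


# inspect.getdoc returns the docstring with the common indentation removed
doc = inspect.getdoc(foo)
print(doc)
```

If the printout looks wrong here (missing fields, odd indentation), it will not look any better once rendered.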
The textbook way to generate the conf.py file is
with the Sphinx sphinx-quickstart command.
Out of the box this does not convert docstrings: they have to be added.
The docstrings and module content are the “API” documentation, and the command line tools sphinx-apidoc or sphinx-autogen
generate it. But it often requires some tweaks to get the API documentation one wants.
At the base of the repo, we will create a .readthedocs.yml file for ReadTheDocs,
but first let's make a .readthedocs folder (or any other name you want) that will hold the documentation.
sphinx-apidoc -o .readthedocs . .readthedocs --full -A 'Your name here' -l 'en';
cd .readthedocs;
Running make html in that folder will generate the documentation in the html folder of the build directory (_build/html with the default layout),
for you to check out. Do this often as stuff breaks easily with Sphinx.
Some tweaks are a must.
In the folder there are two main files of interest, the conf.py file and the index.rst file.
The former controls how the project is parsed, the latter is the main menu.
Automodule, autoclass, autobahn, automethod, autofunction
The index.rst file is the main menu. It will refer to a file, without the .rst extension,
with the name of your module.
This will be a file in the folder along with those of all the submodules, in the format module.submodule.rst, and will contain the following workhorse:
.. automodule:: module_name.submodule_name
   :members:
   :undoc-members:
   :show-inheritance:
There are a few directives like this that can be used to generate the documentation
and are discussed in autodoc documentation,
such as autoclass.
When you add a new Python file (submodule) to your project,
Sphinx will not know about it. So be vigilant and add a new definition for it to the index.rst file.
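As vigilance tends to lapse, a small check can help. The following is a minimal sketch, assuming the module.submodule.rst naming described above; the function names, the my_module package and the choice to skip underscore-prefixed files are all my own assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def module_names(package_dir: Path) -> List[str]:
    """
    List the dotted names of the public modules in a package folder.

    :param package_dir: Folder of the package (the one holding ``__init__.py``).
    :return: Sorted dotted names, e.g. ``module.submodule``.
    """
    names = []
    for py_file in package_dir.rglob('*.py'):
        parts = list(py_file.relative_to(package_dir.parent).with_suffix('').parts)
        if parts[-1] == '__init__':
            parts.pop()  # a package is named after its folder
        if any(part.startswith('_') for part in parts):
            continue  # assumption: underscore-prefixed files are deliberately undocumented
        names.append('.'.join(parts))
    return sorted(names)


def modules_without_rst(package_dir: Path, docs_dir: Path) -> List[str]:
    """
    List the modules that lack a ``module.submodule.rst`` file in the docs folder.
    """
    return [name for name in module_names(package_dir)
            if not (docs_dir / f'{name}.rst').exists()]


# demonstration on a throwaway package
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / 'my_module'
    docs = Path(tmp) / '.readthedocs'
    package.mkdir()
    docs.mkdir()
    for filename in ('__init__.py', 'old.py', 'new.py', '_private.py'):
        (package / filename).touch()
    for filename in ('my_module.rst', 'my_module.old.rst'):
        (docs / filename).touch()
    missing = modules_without_rst(package, docs)

print(missing)  # ['my_module.new']
```

Run against the real package and documentation folder, an empty list means every public module has its rst file.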
The following parameters are worth noting:
- :members: will include all the members of the module, and the order can be changed with :member-order:.
- This will not include private (_foo) or magic (or dunder) methods. :private-members: will include the private ones, while :special-members: will include magic methods (called special by nobody except Sphinx).
- :undoc-members: will include all members that are not documented.
- :inherited-members: will include all members that are inherited from a parent class, which is rather key.
When a class gets too big, it should be split into multiple files, each with a single class in it
that has a functional theme. These classes will form a chain of inheritance, leading up to the main class.
Naming the split files with a leading underscore will get them ignored. Consequently,
it is an option to document only the main class, which thanks to :inherited-members: will have everything.
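To make the layout concrete, here is a toy sketch of such a chain. It is squeezed into one block, with comments marking where each hypothetical file would start; the class and method names are invented for illustration.

```python
# _loading.py (leading underscore, so the file itself is left out of the docs)
class _FooLoading:
    """Loading-themed methods."""

    def load(self, filename: str) -> None:
        """
        Remember which file the data came from.

        :param filename: Path of the file to read.
        """
        self.filename = filename


# _analysis.py
class _FooAnalysis(_FooLoading):
    """Analysis-themed methods, built on top of the loading ones."""

    def summarise(self) -> str:
        """
        Summarise what was loaded.

        :return: A one-line description.
        """
        return f'Data from {self.filename}'


# foo.py: the only class that needs to be documented
class Foo(_FooAnalysis):
    """Main class, which inherits everything from the chain above."""


foo = Foo()
foo.load('data.csv')
print(foo.summarise())  # Data from data.csv
```

Foo defines no methods of its own, so without :inherited-members: its documentation page would be nearly empty.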
But :inherited-members: is not always welcome.
For example, when using typehinting (which is optional but actually a must),
one does resort to typing.TypedDict (which allows you to specify the expected names of the dictionary keys and the
type of its values) or typing.TypeVar (which is a placeholder for a type in generic code). The :inherited-members: on these will
make a mess of pointlessness.
Therefore it often gets easier to manually define how one wants things annotated via multiple autoclass directives
rather than the autogenerated blanket automodule.
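Writing many autoclass directives by hand gets tedious, so one option is to generate the rst text. This is only a sketch under my own assumptions: the helper name, the my_module name and the idea of a per-class switch for :inherited-members: are illustrative, while the directive options are the ones discussed above.

```python
from typing import Dict, List


def autoclass_rst(title: str, module: str, classes: Dict[str, bool]) -> str:
    """
    Build the rst text documenting a set of classes one by one.

    :param title: Heading of the page.
    :param module: Dotted name of the module holding the classes.
    :param classes: Class name mapped to whether inherited members are wanted.
    :return: The rst text, ready to be written to a file.
    """
    lines: List[str] = [title, '=' * len(title), '']
    for class_name, inherited in classes.items():
        lines.append(f'.. autoclass:: {module}.{class_name}')
        lines.append('   :members:')
        lines.append('   :undoc-members:')
        lines.append('   :show-inheritance:')
        if inherited:
            lines.append('   :inherited-members:')
        lines.append('')  # blank line closes the directive
    return '\n'.join(lines)


rst = autoclass_rst('my_module', 'my_module',
                    {'Foo': True,           # main class: show the whole chain
                     'FooOptions': False})  # TypedDict: spare us the dict methods
print(rst)
```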
conf.py file
Ignore sys.path.insert
In the conf.py file, there's a commented out line with sys.path.insert. Leave it like so.
In the .readthedocs.yml file, there will be
python:
   install:
     - method: pip
       path: .
     - requirements: .readthedocs/requirements.txt
     - requirements: requirements.txt
So the module to be documented will be installed anyway (path: .).
extensions
The conf.py file does not call a function like setup in a setup.py file,
but just sets global variables for Sphinx.
One is the list extensions which tells Sphinx which extensions to use. E.g.
extensions = [
    'readthedocs_ext.readthedocs',
    'sphinx.ext.viewcode',
    'sphinx.ext.todo',
    #'sphinx_toolbox.more_autodoc',
    'sphinx.ext.autodoc',
]
readthedocs_ext.readthedocs will be added by RTD, but it's nice for testing locally (it needs to be installed).
sphinx.ext.viewcode adds links to the highlighted source code in the documentation.
sphinx_toolbox.more_autodoc is a nice extension that adds more autodoc directives,
but it is hard to set up as it will crash on a million and one corner cases, more so than mypy.
But it is a good idea to check if it works in the first place; if something fails, use the subsets that work.
sphinx_toolbox.more_autodoc.typehints is the key one in my opinion as vanilla Sphinx does not do typehints.
In the sphinx-quickstart command documentation 
there's a list of vanilla extensions that one can use.
It should be noted that the classic way to specify typehint-only imports:
import typing
if typing.TYPE_CHECKING:
    from foo import Foo
needs to be altered to:
import sys
import typing
if typing.TYPE_CHECKING or 'sphinx' in sys.modules:
    from foo import Foo
Other variables
There is a variable html_static_path, which defaults to the value below and can be set to an empty list (html_static_path = []) if there are no static files:
html_static_path = ['_static']
This is because you cannot git commit an empty folder, so without a _static folder the build will fail.
There is also a line html_theme = 'alabaster' which is the default theme for Sphinx.
ReadTheDocs uses 'sphinx_rtd_theme'. Therefore to use the sphinx_rtd_theme locally you need to install it.
So our installation list is looking like:
pip install sphinx-toolbox readthedocs-sphinx-ext sphinx-rtd-theme
Other variables worth adding for more_autodoc are:
always_document_param_types = True
typehints_defaults = 'braces'   # other styles are available
The root_doc variable is a good way to store the rst files in a folder to declutter. By default it is index
as index.rst is the main page, so one can move it to source/index.rst and set root_doc='source/index'.
Alternatively, one could have the conf.py in that folder, but not the Makefile.
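Putting the variables of this section together, a conf.py could look roughly like the following sketch. The project name is hypothetical, the extension list is trimmed to the ones discussed, and the values are the ones mentioned above rather than recommendations.

```python
# conf.py (sketch): Sphinx reads these module-level variables, no function call is needed
project = 'my_module'        # hypothetical project name
author = 'Your name here'

extensions = [
    'sphinx.ext.autodoc',                     # pulls in the docstrings
    'sphinx.ext.viewcode',                    # links to the source code
    'sphinx.ext.todo',
    'sphinx_toolbox.more_autodoc.typehints',  # the subset of more_autodoc for typehints
]

html_theme = 'sphinx_rtd_theme'   # the ReadTheDocs theme, to be pip installed locally
html_static_path = []             # empty as this sketch has no _static folder

# options for the typehints extension
always_document_param_types = True
typehints_defaults = 'braces'

root_doc = 'index'                # the default: index.rst sits next to conf.py
```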
__init__
Counterintuitively, __init__ method docstrings are skipped,
even if at first one would expect the documentation of how to initialise a class to be in its __init__ method.
There are three solutions:
One can add it manually on an autoclass directive
via :special-members: __init__ in the rst definition.
One can globally override its skippage in the conf.py file by adding:
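def skip(app, what, name, obj, would_skip, options):
    # never skip __init__, so that its docstring gets documented
    if name in ('__init__',):
        return False
    # otherwise keep the decision autodoc had already made
    return would_skip

def setup(app):
    # register the handler for the autodoc-skip-member event
    app.connect('autodoc-skip-member', skip)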
One can document class initialisation in the class docstring, which is often done, but one loses the typehints.
However, as codeclimate painfully reminds us, there should ideally be four or fewer arguments in a method,
and class initialisation often has many arguments, so you may end up using packed keyword arguments
annotated as a TypedDict. And to add insult to injury, the init may be overloaded:
from functools import singledispatchmethod  # in the standard library since Python 3.8
from typing import Dict, List
from typing_extensions import Unpack, TypedDict  # Unpack on **kwargs is PEP 692 (Python 3.12), hence the backport

class FooOptions(TypedDict, total=False):  # total=False: every key is optional
    a: int
    b: str
    c: float
    d: bool
    e: Dict[str, int]

class Foo:
    """
    This class accepts a main argument, either as a dictionary or as a list,
    followed by various options as keyword arguments as specified in the `FooOptions` class.
    """
    @singledispatchmethod
    def __init__(self, data: list, **options: Unpack[FooOptions]):
        """
        This docstring will be skipped. And also, are we talking of this dispatch or all?
        """
        self.data: List[int] = data
        self.a: int = options.get('a', -1)
        self.b: str = options.get('b', 'unknown')
        self.c: float = options.get('c', float('nan'))
        self.d: bool = options.get('d', False)
        self.e: Dict[str, int] = options.get('e', {})

    @__init__.register
    def _(self, data: dict, **kwargs: Unpack[FooOptions]):
        # dictionary input: keep the values and defer to the list version
        self.__init__(list(data.values()), **kwargs)
In this rather extreme case, annotating the class makes much more sense. If this example seemed very alien, don't worry, but do make sure to read up on typehints as they make coding easier and less error-prone and, as a bonus, PyCharm will give better suggestions.
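Returning to the second solution (the autodoc-skip-member handler), its logic is easy to check by hand before trusting a full build. In this sketch the handler is called directly with placeholder arguments; during a real build Sphinx supplies them, and the member names other than __init__ are made up.

```python
def skip(app, what, name, obj, would_skip, options):
    """Handler with the signature of the autodoc-skip-member event."""
    if name in ('__init__',):
        return False   # False means: do not skip
    return would_skip  # otherwise keep the decision already made


# app, obj and options are not used by this handler, hence the placeholders
print(skip(None, 'class', '__init__', None, True, {}))  # False
print(skip(None, 'class', '__repr__', None, True, {}))  # True
print(skip(None, 'class', 'load', None, False, {}))     # False
```

Adding further names to the tuple would extend the override to other members.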
Mock
Often some module is required for the imports to work, but getting it running requires a dark magic ritual.
In that case the Mock class from unittest is of great use.
It is used to make a mock of a module, which pretends to be there, but does nothing.
So in conf.py one can add:
import sys
from unittest.mock import Mock, MagicMock
sys.modules['foo'] = MagicMock()
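What this buys you can be seen in a small sketch. The module name hard_to_install and everything imported from it are made up, standing in for the troublesome dependency.

```python
import sys
from unittest.mock import MagicMock

# 'hard_to_install' is a made-up name standing in for the troublesome dependency
sys.modules['hard_to_install'] = MagicMock()

# from here on the import finds the mock instead of searching for a real module
from hard_to_install import Pose

pose = Pose('some_file.pdb')        # any call is accepted...
n_residues = pose.total_residue()   # ...and returns yet another mock
print(type(n_residues).__name__)    # MagicMock
```

The documented module can therefore be imported, and its docstrings read, even though nothing behind the mocked import actually works.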
Mixed Markdown
GitHub runs off a README.md, while PyPI runs off the setuptools.setup call in setup.py, specifically whatever text is passed to the description and long_description arguments, flavoured via the long_description_content_type argument. However, most projects simply pass the text of the former to the latter. The same applies to the intro in RTD.
Therefore, it's beneficial to mix some markdown within the RST files. To make Sphinx accept both, the module sphinx-mdinclude can be used. In the requirements.txt it is hyphenated (sphinx-mdinclude), while in the extensions list in the conf.py it is underscored (sphinx_mdinclude).
The conf.py for Sphinx is messy and will populate its folder with RST files, which is why it was kept separate above. This however means that the markdown files at the root of the project will be missed. As a result they need to be copied over to the documentation folder, the contained links fixed and the filenames changed to be more graceful (README.md to Description.md).
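The copying, renaming and link fixing can be scripted. The following is a rough sketch and not a complete solution: the function name, the regular expression and the base URL are my own assumptions, and it naively makes every relative link absolute.

```python
import re
import tempfile
from pathlib import Path

# relative markdown links: ](target) where target is not a URL, an anchor or a mailto
RELATIVE_LINK = re.compile(r'\]\((?!https?://|#|mailto:)([^)\s]+)\)')


def copy_markdown(source: Path, destination: Path, base_url: str) -> None:
    """
    Copy a markdown file into the documentation folder, making relative links absolute.

    :param source: Markdown file at the root of the repository.
    :param destination: Where to write it in the documentation folder.
    :param base_url: URL prefix for relative links (an assumption of this sketch).
    """
    text = source.read_text(encoding='utf-8')
    fixed = RELATIVE_LINK.sub(lambda match: f']({base_url}/{match.group(1)})', text)
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(fixed, encoding='utf-8')


# demonstration in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'README.md').write_text(
        'See [notes](docs/notes.md) and [site](https://example.com).',
        encoding='utf-8')
    copy_markdown(root / 'README.md',
                  root / '.readthedocs' / 'Description.md',
                  base_url='https://github.com/user/repo/blob/main')
    copied = (root / '.readthedocs' / 'Description.md').read_text(encoding='utf-8')

print(copied)
```

Only the relative link is rewritten; the absolute one is left alone.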
Additional caveats
Stick to ReStructuredText
One can write docstrings directly in markdown, but this is not a great idea as RST is specifically designed for code annotation as we will see in a later section.
Catch formatting errors early
PyCharm autofills docstrings for you if set to do so (search preferences for “Automatic documentation”), but a common mistake is to not add a blank line between the description and the parameters. Without this the first parameter will be interpreted as part of the description and not as a bullet point. Everyone makes this mistake, but if one starts checking early that the documentation is getting generated fine, one avoids it subsequently.
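This particular mistake can also be caught without building anything. The checker below is a minimal sketch of my own, not a feature of Sphinx or PyCharm: it only looks at the first field of a docstring and only knows the handful of field names listed in the pattern.

```python
import inspect
import re

# start of a field line, e.g. ':param bar:' or ':return:'
FIELD = re.compile(r'^:(param|type|return|returns|rtype|raises)\b')


def lacks_blank_line(obj) -> bool:
    """
    Tell whether the first field of a docstring directly follows a line of text.

    :param obj: Any documented object (function, method or class).
    :return: True if a blank line is missing before the first field.
    """
    lines = (inspect.getdoc(obj) or '').splitlines()
    for i, line in enumerate(lines):
        if FIELD.match(line):
            # only the first field matters: fields may follow each other directly
            return i > 0 and lines[i - 1].strip() != ''
    return False


def bad(bar):
    """
    This description runs into the fields.
    :param bar: This is a parameter.
    """


def good(bar):
    """
    This description is followed by a blank line.

    :param bar: This is a parameter.
    """


print(lacks_blank_line(bad), lacks_blank_line(good))  # True False
```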
Browser hard refresh
In a browser it is critical to do a hard refresh of the pages (Shift+refresh button). Silly but I'd say 90% of issues come from this.
Tests are documentation
Tests are documentation. You should always write tests. I test new features generally in a Jupyter notebook, to see the outputs in full, but the key conclusions can be converted into a test. Future you or a user will likely check out the code in the tests, so do add docstrings to them too.
Check if possible
Sphinx has many extra formatting features over markdown, so
if you have a need for something that may be a common requirement, check the documentation and
pick up the extra extensions or Sphinx formatting tricks as the need arises. For example, the :ivar: and :cvar: fields
are worth adding to the documentation.
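As an illustration of those two fields, here is a toy class whose docstring describes a class variable and an instance variable. The class itself is invented for the example.

```python
class Counter:
    """
    Count how many times something happened.

    :cvar step: Amount added per call, shared by all instances.
    :vartype step: int
    :ivar total: Running total of this instance.
    :vartype total: int
    """
    step = 1

    def __init__(self):
        self.total = 0

    def increment(self) -> int:
        """
        Add ``step`` to the running total.

        :return: The updated total.
        """
        self.total += self.step
        return self.total


counter = Counter()
counter.increment()
print(counter.total)  # 1
```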
ReadTheDocs
So far I have gone through Sphinx, which is only half of it. The next step, once we have a working Sphinx build, is to use ReadTheDocs.
Yaml
Add a .readthedocs.yml file to the root of your project. For example I like to have:
version: 2
build:
  os: ubuntu-20.04
  tools:
    python: "3.8"
sphinx:
   configuration: .readthedocs/source/conf.py
   builder: html
   fail_on_warning: true
python:
   install:
     - method: pip
       path: .
     - requirements: requirements.txt
     - requirements: .readthedocs/requirements.txt
Namely, we install the module defined in the setup.py in the root with the method pip and 
the requirements with requirements.txt.
But as mentioned there are a few requirements specific to Sphinx, which have nothing to do with the module's
operations, hence the additional .readthedocs/requirements.txt file.
Setting fail_on_warning to true is rather wishful thinking, but at the debug stage it is helpful.
In the case of PyRosetta, we have a problem as it does not install like normal.
Luckily one can have private environment variables in ReadTheDocs (set within the settings for the project on the
ReadTheDocs website). My package, pyrosetta-help, adds a command line tool, install_pyrosetta,
which requires the presence of the PYROSETTA_USERNAME and PYROSETTA_PASSWORD env variables.
This can be run by setting the following in the yaml file:
build:
  ...
  jobs:
    post_install:
      - install_pyrosetta
Likewise for other options the jobs directives can be used to better set up the environment.
Runtime
Once the yaml file is complete, head over to the ReadTheDocs website, link your GitHub account and create a new project from the repository of interest.
Once the project build is kicked off, you can see what happens in the 'Builds' tab.
Clicking on the top build will give a badge (green hopefully), a printout and two links on the right hand side
saying view docs and view raw. The latter is crucial as it gives you the raw output.
Check for errors and warnings. If fail_on_warning is set to false,
then if the documentation was partially generated, it would claim to be a success
and only view raw would say otherwise.
And to reiterate, do make sure to do a hard refresh of the docs page.
Slack
In the settings on the site one can set up a webhook to a Slack channel to notify of build status.
 