documentation_notes [the libarynth]

notes on documentation, processes, tools + ephemera

formats + software

File Formats / Open Codecs / AntiWord

documents that are in PDF format

using Skim to annotate pdfs on MacOS > http://skim-app.sourceforge.net/

merging two pdfs: skimpdf merge IN_PDF_FILE_1 IN_PDF_FILE_2 [OUT_PDF_FILE]
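
A minimal sketch of scripting that merge from Python. It assumes skimpdf is on the PATH (it normally lives inside the Skim application bundle, so it may need to be linked or added first). The function names are made up for this note.

```python
import shutil
import subprocess
from typing import List, Optional


def skimpdf_merge_command(first: str, second: str, out: Optional[str] = None) -> List[str]:
    """Build the argument list: skimpdf merge IN_PDF_FILE_1 IN_PDF_FILE_2 [OUT_PDF_FILE]."""
    cmd = ["skimpdf", "merge", first, second]
    if out is not None:
        # The output file is optional in the usage line, so only add it when given.
        cmd.append(out)
    return cmd


def merge_pdfs(first: str, second: str, out: Optional[str] = None) -> None:
    """Run the merge, failing early if skimpdf cannot be found (PATH is an assumption)."""
    if shutil.which("skimpdf") is None:
        raise FileNotFoundError("skimpdf not found on PATH")
    subprocess.run(skimpdf_merge_command(first, second, out), check=True)


if __name__ == "__main__":
    # Only print the command here; call merge_pdfs(...) to actually run it.
    print(" ".join(skimpdf_merge_command("doc-1.pdf", "doc-2.pdf", "merged.pdf")))
```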

to merge annotations/notes from 2 pdfs:

open doc-1.pdf in Skim
export the notes via “File > Export” in “Skim Notes” format
open doc-2.pdf in Skim
import the notes via “File > Read Notes”; make sure to unselect “replace existing notes”
save doc-2.pdf

documents that use LaTeX

“The original documentation source is LaTeX. Simply running $\LaTeX$ gives you DVI, which you can convert into publication quality Postscript. Using pdflatex (NOT ps2pdf), you can also create very high quality PDF, which includes a real PDF table of contents, cross-references, and URL links. Finally, using latex2html, you can create almost native-quality HTML documentation. And, if you really need ASCII, you can get a reasonable rendering by running lynx (in its ASCII-dump mode) over the HTML.”

“latex2html comes with LaTeX macros that let you specify hyperlinks inside your document. They are rendered as real hyperlinks in HTML, and footnotes in the printed versions.”

– http://www.circlemud.org/~jelson
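
The toolchain described in the quote can be sketched as a small table of commands per output format. This is only an illustration under some assumptions: the file name manual.tex is hypothetical, the source file sits in the current directory, latex2html writes its output to a directory named after the document with an index.html inside, and lynx prints its text dump to standard output.

```python
from pathlib import Path
from typing import Dict, List


def build_commands(tex_file: str) -> Dict[str, List[List[str]]]:
    """Map each output format to the commands that would produce it, in order."""
    stem = Path(tex_file).stem  # "manual.tex" -> "manual"
    return {
        # Plain latex produces a DVI file.
        "dvi": [["latex", tex_file]],
        # DVI is converted to PostScript with dvips.
        "ps": [["latex", tex_file], ["dvips", "-o", f"{stem}.ps", f"{stem}.dvi"]],
        # pdflatex (not ps2pdf) produces the PDF directly.
        "pdf": [["pdflatex", tex_file]],
        # latex2html produces the HTML version.
        "html": [["latex2html", tex_file]],
        # lynx in dump mode renders the HTML as plain text on stdout.
        "txt": [["latex2html", tex_file], ["lynx", "-dump", f"{stem}/index.html"]],
    }


if __name__ == "__main__":
    for target, commands in build_commands("manual.tex").items():
        for cmd in commands:
            print(f"{target}: {' '.join(cmd)}")
```

Each command list can be handed to subprocess.run as-is. Cross-references and the table of contents usually need a second latex or pdflatex pass before they resolve.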

In its favour (in this particular context)…

It takes input in a text format. Use CVS, your favourite editor, etc.
It also produces excellent quality output, and generating HTML, PostScript and PDF output are all straightforward with standard tools.
It can generate things like cross-references, tables of contents and citations very easily.
There are good packages available for free that can typeset code right out of the source file.
There are freely available and very powerful diagramming tools that plug right in. (Anyone know of a speciality UML drawing package, BTW? The usual tools are OK, but I've never found, say, some Metapost macros to make it completely trivial the way it could be. Surely someone must have written some!)
The maths typesetting is all there if you need it, of course. That's very useful if you're working on a scientific app and need to document the algorithms as part of the design, and it doesn't get in the way if you don't need it.
It's available for minimal money and effort.
It's highly extensible. If you need to do something that doesn't come as standard, you almost certainly can (and someone else almost certainly already has, and put it on CTAN for you).
LaTeX versions are backward compatible: the file format has never changed. I have $\LaTeX$ files from 1989 that work without problems in the latest version of $\LaTeX$.

The only downside I can think of is the learning curve. Basic $\LaTeX$ use is fine, but for really good output, you're going to want your own class file and/or packages. That's fantastic once you've got it – all your docs follow a consistent style, and you can make it easy for newbies to learn the tool. Someone has to be pretty sharp to write the class/packages in the first place, though, or you have to be prepared to buy in expert help for a couple of days.

XML, XSLT and Docbook

I've been using XML and Docbook for a while now, and I really, really like it, particularly if you use Docbook as an intermediate format rather than what you actually write your documentation in.

For example, I've got some really nice stuff for building use cases in XML. I created my own DTD for a use case language (which I call UCML) that allows me to define actors, use cases, goals, glossary terms, etc. Use cases consist primarily of a sequence of steps (nestable) with links to actors, terms, other use cases or steps, goals, etc. along with some other tags that define the name, extends relationships with other use cases, termination states and conditions, preconditions, business rules, etc.

I also have some XSLT stylesheets that do a number of nifty things with these UCML documents. One stylesheet transforms UCML to HTML, complete with every imaginable hyperlink, tables of contents etc., and even deduces a bunch of relationships which it documents (and hyperlinks). For example, if a use case mentions an actor or another use case in its steps, the stylesheet adds a section to the HTML description of that use case that documents the fact that this use case uses (in the OO sense) that one, or that this actor participates, and adds similar information to the descriptions of the use case and actor that are referenced. This is just a sample, the stylesheet does a lot more, and produces very usable and consistent documentation.
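
The relationship deduction described above can be sketched roughly as follows. This is not the author's stylesheet: it uses Python's ElementTree instead of XSLT, and since the UCML DTD is not shown here, every element and attribute name (ucml, actor, usecase, step, actorref, usecaseref, id, ref) is a hypothetical stand-in.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical UCML sample; the real DTD is not reproduced in these notes.
UCML = """
<ucml>
  <actor id="customer"/>
  <actor id="clerk"/>
  <usecase id="place-order">
    <step>The <actorref ref="customer"/> submits an order.</step>
    <step>Run <usecaseref ref="check-stock"/>.
      <step>The <actorref ref="clerk"/> confirms.</step>
    </step>
  </usecase>
  <usecase id="check-stock">
    <step>The system looks up the item.</step>
  </usecase>
</ucml>
"""


def deduce_relationships(root):
    """Collect 'uses', 'used by' and 'participating actors' from references in steps."""
    uses = defaultdict(set)          # use case -> use cases it mentions
    used_by = defaultdict(set)       # use case -> use cases that mention it
    participants = defaultdict(set)  # use case -> actors mentioned in its steps
    for usecase in root.iter("usecase"):
        uc_id = usecase.get("id")
        # iter() walks nested steps too, so references at any depth are found.
        for ref in usecase.iter("usecaseref"):
            uses[uc_id].add(ref.get("ref"))
            used_by[ref.get("ref")].add(uc_id)
        for ref in usecase.iter("actorref"):
            participants[uc_id].add(ref.get("ref"))
    return uses, used_by, participants


if __name__ == "__main__":
    uses, used_by, participants = deduce_relationships(ET.fromstring(UCML.strip()))
    for uc_id in sorted(uses):
        print(uc_id, "uses", sorted(uses[uc_id]))
    for uc_id in sorted(used_by):
        print(uc_id, "is used by", sorted(used_by[uc_id]))
    for uc_id in sorted(participants):
        print(uc_id, "involves", sorted(participants[uc_id]))
```

A generator for the HTML pages could then emit a "uses" section on one page and the matching "used by" section on the other from the same three tables, which is what keeps the cross-links consistent.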

Another stylesheet acts as a sort of garbage collector. Phases (groups of Use Cases intended to be implemented together) and Use Cases can be marked as “dead”, in which case the UCML→HTML stylesheet will “hide” them (they won't show up in the output). The garbage collector stylesheet takes this a step further and analyzes all actors, glossary terms, entities, goals, etc. and produces a new UCML document that does not include elements unreferenced by a “live” use case. So, I can mark some currently uninteresting use cases as dead, run the garbage collector, run the UCML→HTML stylesheet on the result and get a nicely formatted document that contains only the supporting information required for the included use cases.
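
A rough sketch of the garbage-collection idea, again in Python rather than XSLT and again with invented markup: the dead="true" attribute, the element names and the termref/actorref reference tags are assumptions, and phases are left out to keep it short.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical UCML sample; element and attribute names are illustrative only.
UCML = """
<ucml>
  <actor id="customer"/>
  <actor id="auditor"/>
  <term id="invoice"/>
  <usecase id="place-order">
    <step>The <actorref ref="customer"/> receives an <termref ref="invoice"/>.</step>
  </usecase>
  <usecase id="audit" dead="true">
    <step>The <actorref ref="auditor"/> reviews the books.</step>
  </usecase>
</ucml>
"""

# Reference tag -> tag of the supporting element it points at (assumed mapping).
REF_TAGS = {"actorref": "actor", "termref": "term"}


def collect_garbage(root):
    """Return a new document without dead use cases or elements only they reference."""
    referenced = set()
    for usecase in root.findall("usecase"):
        if usecase.get("dead") == "true":
            continue  # references from dead use cases do not keep anything alive
        for element in usecase.iter():
            if element.tag in REF_TAGS:
                referenced.add((REF_TAGS[element.tag], element.get("ref")))

    cleaned = ET.Element(root.tag, dict(root.attrib))
    for child in root:
        if child.tag == "usecase":
            keep = child.get("dead") != "true"
        else:
            keep = (child.tag, child.get("id")) in referenced
        if keep:
            cleaned.append(copy.deepcopy(child))  # leave the input document untouched
    return cleaned


if __name__ == "__main__":
    cleaned = collect_garbage(ET.fromstring(UCML.strip()))
    print(ET.tostring(cleaned, encoding="unicode"))
```

This only follows references one level deep. If supporting elements can reference each other (a glossary term that mentions another term, say), the referenced set would have to be grown repeatedly until it stops changing.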

HTML is not ideal for printing, though, and this is where Docbook comes in. I have a UCML→Docbook stylesheet that does essentially the same things as the UCML→HTML stylesheet. Then I can convert the Docbook to PDF, Postscript, TeX, etc.
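
As an illustration of the UCML→Docbook step, here is a small Python sketch (standing in for the XSLT stylesheet) that turns one use case into a Docbook section with an ordered list of steps. The UCML names (usecase, step, id, name) are hypothetical, and the output assumes Docbook 4-style id attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical UCML use case; the real UCML vocabulary is not shown in these notes.
USECASE = """
<usecase id="place-order" name="Place order">
  <step>The customer submits an order.</step>
  <step>The system confirms the order.</step>
</usecase>
"""


def usecase_to_docbook(usecase):
    """Build a Docbook <section> with a <title> and one <listitem> per top-level step."""
    section = ET.Element("section", {"id": usecase.get("id")})
    ET.SubElement(section, "title").text = usecase.get("name")
    steps = ET.SubElement(section, "orderedlist")
    for step in usecase.findall("step"):
        item = ET.SubElement(steps, "listitem")
        # Flatten the step to plain text and normalise whitespace.
        text = " ".join("".join(step.itertext()).split())
        ET.SubElement(item, "para").text = text
    return section


if __name__ == "__main__":
    section = usecase_to_docbook(ET.fromstring(USECASE.strip()))
    print(ET.tostring(section, encoding="unicode"))
```

Nested steps and links to actors or terms are ignored here; a fuller version would map them to nested lists and Docbook link elements before handing the result to the usual Docbook tools for PDF or Postscript.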

I've also created my own XML languages for component models, architectural decisions documents and others – it's pretty easy to gin one up whenever I come across another sort of highly structured document I need to write. Plus I have one that I use for simple, less-structured documentation that just provides sections, paragraphs, etc. Mostly I just have whatever→HTML stylesheets for most of these, since they're all intended to be read by developers who are less insistent than end-users on having printable formats.

So, I have nice, text documents that I can use EMACS on, manage in CVS, etc., stylesheets I can fiddle with (when I get sick of writing documentation I can always spend a little time improving the stylesheet code and justify it as “documentation” effort) and everyone else gets really nice docs from me. The biggest downside is that other people (non-programmer types who are uncomfortable with tagged text) are often uncomfortable with trying to edit my documents (sometimes it's a good thing, as it allows me to retain the “power of the pen”, sometimes it's bad, as even trivial updates must pass through me).

The next steps with UCML are (1) figure out how to establish and maintain XMI-documented use cases and UCML-documented use cases and (2) write a WYSIWYG-like tool for editing them, for the tag-averse.

– shawn_at_willden.org

free/open publishing licences

linux documentation project http://www.linuxdoc.org
o'reilly (eg. samba) http://www.oreilly.com/catalog/samba/chapter/licenseinfo.html
GFDL
wikipedia + GFDL → http://www.wikipedia.com/wiki/GNU_Free_Documentation_License
social life of information http://www.slofi.com/
open content http://www.opencontent.org/

a cautionary tale

Eric Weisstein's tale of getting screwed by a publisher who printed a hard-copy 'snapshot' of his MathWorld website, and of his struggle against greedy facelessness → http://mathworld.wolfram.com/authors_note.html

Linux Gazette articles

part 1, POD http://www.linuxgazette.com/issue73/spiel.html
part 2, LaTeX http://www.linuxgazette.com/issue74/spiel.html
part 3, DocBook/XML http://www.linuxgazette.com/issue75/spiel.html
