11 Generating R documentation

Producing documentation alongside code is good practice. It is easy to say and particularly tedious to achieve.

Package wyz.code.rdoc aims to ease R documentation generation. It works whatever level of offensive programming instrumentation your code has reached. In particular, you can still generate R documentation for code with no instrumentation at all.

Using wyz.code.rdoc, manual pages (the .Rd files stored in the man folder of a package) can be created automatically and filled in nearly to completion, depending on the level of offensive programming instrumentation of your code.

Generated manual pages use the English language. Feel free to modify the generated English to match your own flavor, if needed.

The level of instrumentation of your R code affects the quality of the generated content, and therefore the depth and duration of your review work. When you use both function return type instrumentation and test case instrumentation, expect the manual page content to be fully generated and ready for review.

When your code is not instrumented for offensive programming, documentation generation is still possible. Thanks to customization as code, you can provide your own customizations to patch the generated content.

11.1 Conceptual approach

Package wyz.code.rdoc provides an API you can use to generate your manual pages. The approach differs from the well-known documentation generation approaches of utils and roxygen2. Refer to the vignette named tutorial of wyz.code.rdoc for a detailed explanation of these approaches and a comparison with wyz.code.rdoc.

In short, utils generates a documentation template that you fill in with your content, while roxygen2 relies on specially formatted comments in your code to generate the documentation on demand through an automated process. In contrast, wyz.code.rdoc uses code to generate documentation. The main benefit is that it allows you to customize the result through code, which offers a higher degree of freedom and higher reproducibility while saving time.
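
To make the idea of documentation generated by code more concrete, here is a minimal sketch in base R. It is not how wyz.code.rdoc is implemented: the helper sketch_rd and the function demo are hypothetical names introduced for illustration only. The sketch derives a skeleton of a manual page from the formal arguments of a function.

```r
# Hypothetical helper, not part of wyz.code.rdoc: build a minimal Rd skeleton
# from the formal arguments of a function.
sketch_rd <- function(function_name, fun) {
  arg_names <- names(formals(fun))
  c(
    sprintf("\\name{%s}", function_name),
    sprintf("\\alias{%s}", function_name),
    sprintf("\\title{Function %s}", function_name),
    "\\usage{",
    # Simplification: default values are not shown in the usage line.
    sprintf("%s(%s)", function_name, paste(arg_names, collapse = ", ")),
    "}",
    "\\arguments{",
    # One item per argument, with a marker to be replaced by real text.
    sprintf("\\item{%s}{TODO: describe %s}", arg_names, arg_names),
    "}"
  )
}

# Hypothetical function used only for this illustration.
demo <- function(x, y = NULL, z = FALSE) {}

rd_lines <- sketch_rd("demo", demo)
cat(rd_lines, sep = "\n")
```

Because the skeleton comes from a function call, it can be regenerated at will whenever the signature changes, which is the reproducibility benefit mentioned above.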

11.2 Vignettes

Package wyz.code.rdoc provides a vignette named documentation that lists all the vignettes of this package for easy navigation. Please refer to this vignette and browse the tutorial, use cases, and tips and tricks to learn more about wyz.code.rdoc functionalities.

11.3 API

Package wyz.code.rdoc lets you generate all kinds of manual pages, in particular manual pages for data, functions, and R objects of various types: environment, S3, S4, R6, RC.

The API helps you comply with the R documentation specification described in the document Writing R Extensions, by providing R functions that ease the handling of the manual page markup language and of its required special constructs.

You may add missing sections, format presentations, cross-reference other manual pages, add URLs, and change phraseology or content as needed. wyz.code.rdoc provides a convenient and easy way to do so. Let’s see how through some examples.
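
Before those examples, here is a rough illustration of what markup helpers of this kind do. It is a sketch in base R under stated assumptions: rd_escape, rd_markup, rd_link, and rd_url are hypothetical names and are not the wyz.code.rdoc API. The Rd constructs they emit (escaping of the percent sign, \link with an optional package, \url) follow the Rd format.

```r
# Hypothetical helpers, not the wyz.code.rdoc API.

# In Rd files, '%' starts a comment, so it must be escaped in ordinary text.
rd_escape <- function(text) gsub("%", "\\%", text, fixed = TRUE)

# Wrap content into a simple Rd macro, for example \keyword{utils}.
rd_markup <- function(keyword, content) sprintf("\\%s{%s}", keyword, content)

# Cross-reference another manual page, optionally in another package.
rd_link <- function(topic, package = NULL) {
  link <- if (is.null(package)) {
    sprintf("\\link{%s}", topic)
  } else {
    sprintf("\\link[%s]{%s}", package, topic)
  }
  rd_markup("code", link)
}

# Add a URL.
rd_url <- function(url) rd_markup("url", url)

see_also <- rd_markup("seealso", rd_link("checkRd", "tools"))
cat(see_also, "\n")
cat(rd_markup("details", rd_escape("Covers 100% of arguments.")), "\n")
cat(rd_url("https://cran.r-project.org"), "\n")
```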

11.4 Best generation strategy

The manual page generation strategies are presented below, from best to worst.

strategy name	benefits	comments
fully automated generation	no hand editing, reproducibility, robustness, reusability	Not always applicable. See 11.7.
partially automated generation	hand editing is allowed	Sometimes the shortest path. Hand modifications may be lost when regenerating content. Constrained reproducibility.

Stick to fully automated generation wherever possible. Also, keep it simple. It is possible to reach great complexity with wyz.code.rdoc, but good documentation mostly requires good presentation and good wording. The latter is often forgotten and does not depend on any piece of software.

11.5 Pure R function

Let’s consider the following R function named pure_r.

library(wyz.code.rdoc)

nop <- function() {}

pure_r <- nop
formals(pure_r) <- alist(x = , y = NULL, z = FALSE)
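
As a quick check of what the assignment to formals produced, the short base R sketch below rebuilds the same function and inspects its signature. Nothing here depends on wyz.code.rdoc.

```r
# Rebuild the function exactly as above.
nop <- function() {}
pure_r <- nop
formals(pure_r) <- alist(x = , y = NULL, z = FALSE)

# The argument names, in declaration order.
arg_names <- names(formals(pure_r))
print(arg_names)

# The full signature, including default values.
print(args(pure_r))

# The original function is left untouched: it still has no arguments.
nop_has_no_arguments <- length(formals(nop)) == 0
```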

Now, let’s create its documentation as if it belonged to package my_package, using wyz.code.rdoc.

ic <- InputContext(object = NULL, method = 'pure_r', package = 'my_package')
pc <- ProcessingContext(extraneous = list(
  concept = 'my concept',
  keyword = 'utils'
  )
)
gc <- GenerationContext(verbosity = TRUE, overwrite = TRUE)
pmp <- produceManualPage(ic, pc, gc)
#> 
#> ------------------------------------------------------------------------------
#> Creating manual page for function pure_r 
#> standard section multi concept 
#> standard section multi keyword 
#> wrote file /tmp/RtmpgMcFnE/pure_r.Rd 
#> filename is /tmp/RtmpgMcFnE/pure_r.Rd [OVERWRITTEN] 
#> generated 9 sections: name, alias, title, usage, arguments, author, keyword, concept, encoding 
#> missing 3 sections: description, value, examples 
#> probably missing 1 section: details 
#> replacements to manage: 3 
#> WARNING: File /tmp/RtmpgMcFnE/pure_r.Rd 
#> checkRd: (5) /tmp/RtmpgMcFnE/pure_r.Rd:0-20: Must have a \description

First, notice that you pass the function name, not the function itself. Second, notice that overwrite is set so that the file can be replaced if it already exists. Without this option, processing still takes place but the result is not saved to the target file.

The result tells you what happened. By default, generation writes to a temporary folder under /tmp, as the output above shows. You may change this by setting the target folder argument of object GenerationContext.

The reported results are oriented towards producing good documentation. As we are documenting a function, the description, details, value, and examples sections should be present.

The warning shown comes from the standard R documentation verification tool, namely tools::checkRd. This tool is used when checking a package. You must get rid of errors, warnings, and notes if you plan to publish your package.
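
If you want to run the same verification by hand, the sketch below calls tools::checkRd directly on a small file written to a temporary location. The file content is a hand-written illustration that deliberately omits the description section; it is not the file produced by wyz.code.rdoc.

```r
# Write a deliberately incomplete manual page: no \description section.
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
  "\\name{pure_r}",
  "\\alias{pure_r}",
  "\\title{Function pure_r}",
  "\\usage{",
  "pure_r(x, y, z = FALSE)",
  "}"
), rd_file)

# checkRd collects its messages in a character vector of class 'checkRd'.
# Should the check fail with an R error instead, keep the error text.
check_messages <- tryCatch(
  tools::checkRd(rd_file),
  error = function(e) conditionMessage(e)
)
print(check_messages)

# TRUE when a message mentions the missing description section.
mentions_description <- any(grepl("description", check_messages, fixed = TRUE))
```

Any message reported here has to be resolved by supplying the missing content.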

Let’s do it.


examples <- list(
  function() { pure_r(sum, 1:5) },
  function() { pure_r(setenv) }
)

pc <- ProcessingContext(extraneous = list(
  description = 'tells if an R function is pure or not',
  details = 'A function is told to be pure if bla bla bla', 
  value = 'A single boolean value',
  examples = convertExamples(examples, captureOutput = FALSE),
  concept = 'my concept',
  keyword = 'utils'
  )
)
pmp2 <- produceManualPage(ic, pc, gc)
#> 
#> ------------------------------------------------------------------------------
#> Creating manual page for function pure_r 
#> standard section mono description 
#> standard section mono details 
#> standard section mono value 
#> standard section mono examples 
#> standard section multi concept 
#> standard section multi keyword 
#> wrote file /tmp/RtmpgMcFnE/pure_r.Rd 
#> filename is /tmp/RtmpgMcFnE/pure_r.Rd [OVERWRITTEN] 
#> generated 13 sections: name, alias, title, description, usage, arguments, details, value, author, examples, keyword, concept, encoding 
#> replacements to manage: 6 
#> File /tmp/RtmpgMcFnE/pure_r.Rd passes standard documentation checks

The generated manual page now passes the standard R documentation verification. Let’s have a look at the generated file now.

cat(paste(readLines(pmp2$context$filename, warn = FALSE), collapse = '\n'))
#> \name{pure_r}
#> \alias{pure_r}
#> \title{Function pure_r}
#> \description{
#> tells if an R function is pure or not
#> }
#> \usage{
#> pure_r(x, y, z = FALSE)
#> }
#> \arguments{
#> \item{x}{XXX_004}
#> \item{y}{XXX_005}
#> \item{z}{XXX_006}
#> }
#> \details{
#> A function is told to be pure if bla bla bla
#> }
#> \value{
#> A single boolean value
#> }
#> \author{
#> \packageAuthor{my_package}
#> 
#> Maintainer: \packageMaintainer{my_package}
#> }
#> \examples{
#> # ------- example 1 -------
#> pure_r(sum, 1:5) 
#> 
#> # ------- example 2 -------
#> pure_r(setenv) 
#> 
#> }
#> \keyword{utils}
#> \concept{my concept}
#> \encoding{UTF-8}

As you see, some content has been generated with placeholders prefixed by XXX. Although valid from a format point of view, this is meaningless and should be corrected. You could fix it manually, but then regenerating the manual page would discard those manual changes. Changing it by code keeps the page maintainable and regeneration clean. To fix it, just use post processing.

pc <- ProcessingContext(extraneous = list(
  description = 'tells if an R function is pure or not',
  details = 'A function is told to be pure if bla bla bla', 
  value = 'A single boolean value',
  examples = convertExamples(examples, captureOutput = FALSE),
  concept = 'my concept',
  keyword = 'utils'
  ),
  postProcessing = list(
    arguments = function(content_s) {
      s <- sub('XXX_007', sentensize('a typical description for variable x'), 
               content_s, fixed = TRUE)
      s <- sub('XXX_008', sentensize('a typical description for variable y'), s, fixed = TRUE)
      s <- sub('XXX_009', 
               sentensize('a typical description for variable z'),
               s, fixed = TRUE)
      s
    }
  )
)
pmp3 <- produceManualPage(ic, pc, gc)
#> 
#> ------------------------------------------------------------------------------
#> Creating manual page for function pure_r 
#> standard section mono description 
#> standard section mono details 
#> standard section mono value 
#> standard section mono examples 
#> standard section multi concept 
#> standard section multi keyword 
#> patch arguments 
#> wrote file /tmp/RtmpgMcFnE/pure_r.Rd 
#> filename is /tmp/RtmpgMcFnE/pure_r.Rd [OVERWRITTEN] 
#> generated 13 sections: name, alias, title, description, usage, arguments, details, value, author, examples, keyword, concept, encoding 
#> patched 1 section: arguments 
#> replacements to manage: 9 
#> File /tmp/RtmpgMcFnE/pure_r.Rd passes standard documentation checks
cat(paste(readLines(pmp3$context$filename, warn = FALSE), collapse = '\n'))
#> \name{pure_r}
#> \alias{pure_r}
#> \title{Function pure_r}
#> \description{
#> tells if an R function is pure or not
#> }
#> \usage{
#> pure_r(x, y, z = FALSE)
#> }
#> \arguments{
#> \item{x}{A typical description for variable x.}
#> \item{y}{A typical description for variable y.}
#> \item{z}{A typical description for variable z.}
#> }
#> \details{
#> A function is told to be pure if bla bla bla
#> }
#> \value{
#> A single boolean value
#> }
#> \author{
#> \packageAuthor{my_package}
#> 
#> Maintainer: \packageMaintainer{my_package}
#> }
#> \examples{
#> # ------- example 1 -------
#> pure_r(sum, 1:5) 
#> 
#> # ------- example 2 -------
#> pure_r(setenv) 
#> 
#> }
#> \keyword{utils}
#> \concept{my concept}
#> \encoding{UTF-8}

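Note that I had to anticipate the placeholder names of the next generation (XXX_007 to XXX_009) rather than reuse the ones shown in the previous output (XXX_004 to XXX_006). As the growing count of replacements to manage suggests, placeholder numbering continues from one generation to the next. Now, the generated manual page looks acceptable from a content and format point of view.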
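
One way to avoid guessing placeholder numbers is to detect them in the content itself. The sketch below is a base R illustration under stated assumptions: fill_placeholders is a hypothetical helper, and the shape of the content passed to a post processing function (here a character vector of argument items) is assumed rather than taken from the package documentation.

```r
# Hypothetical helper, not part of wyz.code.rdoc: replace each XXX_nnn
# placeholder, in order of appearance, by the matching description.
fill_placeholders <- function(content_s, descriptions) {
  found <- unique(unlist(regmatches(content_s,
                                    gregexpr("XXX_[0-9]+", content_s))))
  # Fail early when placeholders and descriptions do not pair up.
  stopifnot(length(found) == length(descriptions))
  for (k in seq_along(found)) {
    content_s <- sub(found[k], descriptions[k], content_s, fixed = TRUE)
  }
  content_s
}

# Assumed content shape, for illustration only.
content <- c("\\item{x}{XXX_007}", "\\item{y}{XXX_008}", "\\item{z}{XXX_009}")
filled <- fill_placeholders(content, c(
  "A typical description for variable x.",
  "A typical description for variable y.",
  "A typical description for variable z."
))
cat(filled, sep = "\n")
```

A function built this way keeps working whatever the numbers are, so it does not need to change between generations.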

From a documentation quality point of view, I find the result insufficiently understandable, because the phraseology is generic. Use specific terminology to make things crystal clear and easy to understand for most of your readers.

Now that you understand the overall pattern, you can directly use the one from the final case above to get a manual page from an R function in a single shot.

11.6 Offensive programming R function

Using a function instrumented for offensive programming eases processing in comparison to the previous case. Let’s see how.

op_r <- nop
formals(op_r) <- alist(functionName_f_1 = , functionArguments_l = NULL, verbosityFlag_b_1 = FALSE)

Now, let’s create its documentation as if it belonged to package my_package, using wyz.code.rdoc.

examples <- list(
  function() { op_r(sum, 1:5) },
  function() { op_r(setenv) }
)
ic <- InputContext(object = NULL, method = 'op_r', package = 'my_package')
pc <- ProcessingContext(extraneous = list(
  description = 'tells if an R function is pure or not',
  details = 'A function is told to be pure if bla bla bla', 
  value = 'A single boolean value',
  examples = convertExamples(examples, captureOutput = FALSE),
  concept = 'my concept',
  keyword = 'utils'
  )
)
gc <- GenerationContext(verbosity = TRUE, overwrite = TRUE)
omp <- produceManualPage(ic, pc, gc)
#> 
#> ------------------------------------------------------------------------------
#> Creating manual page for function op_r 
#> standard section mono description 
#> standard section mono details 
#> standard section mono value 
#> standard section mono examples 
#> standard section multi concept 
#> standard section multi keyword 
#> wrote file /tmp/RtmpgMcFnE/op_r.Rd 
#> filename is /tmp/RtmpgMcFnE/op_r.Rd [OVERWRITTEN] 
#> generated 13 sections: name, alias, title, description, usage, arguments, details, value, author, examples, keyword, concept, encoding 
#> File /tmp/RtmpgMcFnE/op_r.Rd passes standard documentation checks
cat(paste(readLines(omp$context$filename, warn = FALSE), collapse = '\n'))
#> \name{op_r}
#> \alias{op_r}
#> \title{Function op_r}
#> \description{
#> tells if an R function is pure or not
#> }
#> \usage{
#> op_r(functionName_f_1, functionArguments_l, verbosityFlag_b_1 = FALSE)
#> }
#> \arguments{
#> \item{functionName_f_1}{A single function value}
#> \item{functionArguments_l}{An unconstrained list}
#> \item{verbosityFlag_b_1}{A single boolean value}
#> }
#> \details{
#> A function is told to be pure if bla bla bla
#> }
#> \value{
#> A single boolean value
#> }
#> \author{
#> \packageAuthor{my_package}
#> 
#> Maintainer: \packageMaintainer{my_package}
#> }
#> \examples{
#> # ------- example 1 -------
#> op_r(sum, 1:5) 
#> 
#> # ------- example 2 -------
#> op_r(setenv) 
#> 
#> }
#> \keyword{utils}
#> \concept{my concept}
#> \encoding{UTF-8}

As you can see, there is no longer any need to describe arguments by hand. Semantic naming is used to generate the content. If the generated content does not perfectly match your needs, you can still apply post processing in much the same way as before.
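
To give an intuition of how semantic names can drive the wording, here is a small base R sketch. It only covers the three suffix patterns visible in the output above (f, l, and b for the type, and a trailing 1 for a single value), and describe_argument is a hypothetical helper. The actual naming rules are defined by the offensive programming packages and are richer than this.

```r
# Hypothetical helper: turn a semantic argument name into a description.
# Only the patterns visible in the generated page above are handled.
describe_argument <- function(argument_name) {
  type_names <- c(f = "function", l = "list", b = "boolean")
  parts <- strsplit(argument_name, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2 || !(parts[2] %in% names(type_names))) {
    return(NA_character_)  # no recognizable type suffix
  }
  type_name <- type_names[[parts[2]]]
  if (length(parts) == 2) {
    sprintf("An unconstrained %s", type_name)
  } else if (parts[3] == "1") {
    sprintf("A single %s value", type_name)
  } else {
    NA_character_  # other length constraints are out of scope here
  }
}

op_r <- function(functionName_f_1, functionArguments_l = NULL,
                 verbosityFlag_b_1 = FALSE) {}

descriptions <- vapply(names(formals(op_r)), describe_argument, character(1))
print(descriptions)
```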

For convenience, note that you can easily generate the content of the arguments section using the following approach.

library(data.table)  # provides data.table(), used below

dt <- data.table(fields = letters[24:26], 
                 description = paste('a typical description for variable', 
                                     letters[24:26]))
sapply(seq_len(nrow(dt)), function(k) {
  generateMarkup(dt[k]$fields, 'item', dt[k]$description)
})
#> [1] "\\item{x}{a typical description for variable x}"
#> [2] "\\item{y}{a typical description for variable y}"
#> [3] "\\item{z}{a typical description for variable z}"

This approach lets you replace the content wholesale using a post processing scheme.

11.7 Known limits

Generation of manual pages can be quite tricky. Although package wyz.code.rdoc greatly alleviates the burden, some pitfalls remain. Here they are:

Generated manual pages might not respect the maximum line length required by R CMD check, which reports noncompliant lines explicitly. To solve the issue, just split the content by adding line breaks wherever required.
Generated documentation is quite stereotyped. Inject your own instructions to customize the result.
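
For the line length pitfall, the sketch below shows one possible way to locate and wrap overlong lines in base R. The helpers find_long_lines and wrap_long_lines are hypothetical, and the default width of 90 is an assumption; use the limit that R CMD check reports for your case. Wrapping on whitespace suits prose only; code in usage or examples sections must be split by hand so that it remains valid R.

```r
# Hypothetical helpers, not part of wyz.code.rdoc.

# Indices of the lines longer than max_width characters.
find_long_lines <- function(lines, max_width = 90L) {
  which(nchar(lines) > max_width)
}

# Wrap overlong lines on whitespace; other lines are kept unchanged.
wrap_long_lines <- function(lines, max_width = 90L) {
  unlist(lapply(lines, function(line) {
    if (nchar(line) > max_width) strwrap(line, width = max_width) else line
  }), use.names = FALSE)
}

long_line <- paste(rep("word", 40), collapse = " ")
lines <- c("A short line.", long_line)

print(find_long_lines(lines))
wrapped <- wrap_long_lines(lines)
cat(wrapped, sep = "\n")
```

To apply it to a generated page, read the file with readLines, wrap the relevant prose lines, and write the result back with writeLines, ideally from a post processing step so that regeneration stays reproducible.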

11.8 Opportunities

Reuse is possible at several levels, depending on your needs. Roughly speaking, you may aim for one of these 3 levels of customization:

Just customize some textual information. Generating the pages using package wyz.code.rdoc and then modifying page contents manually is generally the best way to achieve this goal.
Customize some manual page sections. Generate the pages using package wyz.code.rdoc while providing some dedicated context information. Refer to the previous examples, and look at the extraneous argument. It allows you to inject your customized content into targeted sections.
Fully customize manual page generation. In that case, you may use package wyz.code.rdoc to create your own R generation scheme. That way you get a head start by using high-level R documentation generation functions, and you can also reuse and customize the provided generation scheme. This package uses only R code, so you can get insight into any part of it and reuse it.