2 Application in the R language
2.1 A simple case
Consider the following R class implementation, which provides some basic mathematical operations.
MathOperation <- function() {
  self <- environment()
  class(self) <- append('Addition', class(self))
  add <- function(x, y) x + y
  multiply <- function(x, y) x * y
  divide <- function(x, y) x / y
  self
}
Let's not argue about the design and relevance of the approach. Instead, can you tell the scope of each function, and identify and inventory the implementation flaws?
2.2 Defensive programming
In standard R, the provided implementation might return a correct result, return a wrong one, or even raise an error, depending on the inputs you provide.
Is it bad code? In my opinion, not at all. The class name clearly states the intent, which is to encapsulate some math operations. There are three operations. They accept any argument that the operators '+', '*' and '/' accept. So integers, doubles, and complex numbers should all work. Values from an external package such as gmp are also acceptable inputs for any of the parameters. Any combination of these types will produce a correct result, with scalars or vectors.
From my point of view, the main issues are the following:
issue number | issue description | issue severity |
---|---|---|
1 | a few seconds to write, several quarters of an hour to test, and hours to document | UNACCEPTABLE |
2 | does it comply with mathematical sets? Not at all: this is a software engineering implementation, not a mathematically compliant one | SEVERE |
3 | high sensitivity to input values: did you consider that NaN, NA, Inf, -Inf, and 0 could be valid input values here? R handles this part well natively | LOW |
4 | natural polymorphism of returned types, which again brings in software engineering where reliable math operations are needed. From a mathematical point of view, inputs belong to a predefined mathematical set, and outputs also belong to a predefined mathematical set. This is not the case with the provided implementation | HIGH |
5 | unreliable implementation, as a given input might yield a numeric output, a warning, or an error | HIGH |
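Issues 3 to 5 can be observed directly in base R. The sketch below redefines add and divide as standalone functions with the same bodies as in the class; the helper outcome is hypothetical and only exists to classify what a call produces.

```r
# Same bodies as the operations in the text, defined standalone
add <- function(x, y) x + y
divide <- function(x, y) x / y

# Issue 4: the returned type depends on the inputs
typeof(add(1L, 2L))     # "integer"
typeof(divide(1L, 2L))  # "double"
typeof(add(1i, 2))      # "complex"

# Issue 3: special values flow through without any signal
divide(1, 0)   # Inf
divide(0, 0)   # NaN
add(NA, 1)     # NA

# Issue 5: the same function may yield a value, a warning, or an error.
# Hypothetical helper that reports which of the three happened.
outcome <- function(expr) {
  tryCatch({ expr; 'value' },
           warning = function(w) 'warning',
           error = function(e) 'error')
}
outcome(add(1, 2))                      # "value"
outcome(add(.Machine$integer.max, 1L))  # "warning" (integer overflow)
outcome(add('1', 1))                    # "error" (non-numeric argument)
```

None of these calls is rejected up front; the caller only discovers the behavior after the fact.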
2.3 Offensive programming
Consider the same R class implementation with a little bit of instrumentation.
suppressMessages(require(data.table))
MathOperation <- function() {
  self <- environment()
  class(self) <- append('Addition', class(self))
  add <- function(x_r, y_r) x_r + y_r
  multiply <- function(x_r, y_r) x_r * y_r
  divide <- function(x_r, y_r) x_r / y_r
  function_return_types <- data.table(
    function_name = c('add', 'multiply', 'divide'),
    return_value = c('x_r', 'x_r', 'x_d')
  )
  self
}
2.3.1 What is different?
Compared to the previous implementation, there are two main differences:
- arguments are renamed according to a pattern
- a variable named function_return_types has been added. It holds a data.table that defines expected function return types.
That's it. The function implementations are exactly the same, and nothing else has changed. Everything needed is there and should be sufficient to solve many of the issues listed above.
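To give an idea of how such a declaration could be exploited, here is a minimal sketch in base R. It is not the mechanism of any particular package: the data.frame is a stand-in for the data.table above, and both type_checkers and matchesDeclaredReturnType are hypothetical names. It also assumes that the suffix r means "real" (checked with is.numeric) and that d means "double" (checked with is.double).

```r
# Stand-in for the data.table of the text, as a base R data.frame
function_return_types <- data.frame(
  function_name = c('add', 'multiply', 'divide'),
  return_value = c('x_r', 'x_r', 'x_d'),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from type suffix to a base R predicate (assumption)
type_checkers <- list(r = is.numeric, d = is.double)

# Hypothetical helper: does `value` match the type declared for the function?
matchesDeclaredReturnType <- function(function_name, value) {
  declared <- function_return_types$return_value[
    function_return_types$function_name == function_name
  ]
  # Second underscore-separated part of the declared name is the type
  type_suffix <- strsplit(declared, '_', fixed = TRUE)[[1]][2]
  type_checkers[[type_suffix]](value)
}

matchesDeclaredReturnType('add', 1.5 + 2)     # TRUE, a double is numeric
matchesDeclaredReturnType('add', 1i + 2)      # FALSE, complex is not numeric
matchesDeclaredReturnType('divide', 1L / 2L)  # TRUE, '/' returns a double
```

The point is only that the expected type is now data that a tool can read, instead of knowledge living in the developer's head.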
2.3.2 Semantic argument naming
Arguments have been renamed from x to x_r. What does that mean? Syntactically, it changes nothing for R. For us humans, it changes a lot, because the name follows a pattern that expresses several intents in a short, concise, and reliable way.
The pattern is simple to understand. It has up to three parts, separated by underscores; the second and third parts are optional. The first part is the variable name. The second part is the type of the variable. The third part is the length constraint. Refer to section 5 for more details about the syntax and for illustrative examples.
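The three-part structure can be made concrete with a small sketch. The function parseSemanticName is a hypothetical helper written for illustration; it only splits on underscores and assumes the variable name itself contains none. It does not interpret the type or length parts.

```r
# Hypothetical helper: split a semantic name into its (up to three) parts
parseSemanticName <- function(parameter_name) {
  parts <- strsplit(parameter_name, '_', fixed = TRUE)[[1]]
  list(
    name = parts[1],
    # Optional parts are reported as NA when absent
    type = if (length(parts) >= 2) parts[2] else NA_character_,
    length_constraint = if (length(parts) >= 3) parts[3] else NA_character_
  )
}

x_r_parts <- parseSemanticName('x_r')
x_r_parts$name               # "x"
x_r_parts$type               # "r"
x_r_parts$length_constraint  # NA, no length constraint given
```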
2.4 Back to definition
So now you should be able to translate the variable name x_r by yourself: a vector of real values, unconstrained in length. Using this parameter name implies that the developer is responsible for testing inputs of various lengths and for preventing weirdness from propagating.
For example, the following R code shows results that require decisions:
mo <- MathOperation()
print(mo$add(1.0 * 1:3, 1.0 * 1:7))
#> Warning in x_r + y_r: longer object length is not a multiple of shorter
#> object length
#> [1] 2 4 6 5 7 9 8
This code produces both an output and a warning, because R recycles the shorter vector when the lengths differ. What decision should be taken? Allow or deny this behavior? It depends on your usage. If you are creating a real math library, I would recommend duplicating the code and creating two functions named addRCompliant and addMathCompliant. The latter should enforce argument length control in its body, while the former should keep the body as is or wrap it in a suppressWarnings call. That way, you should easily meet your end users' expectations, whether they are mathematicians or software engineers.
Note that in the latter case, the added controls are not defensive programming but functional scope verification.
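A minimal sketch of the two variants could look as follows, written as standalone functions rather than class members. The strict equal-length rule in addMathCompliant is one possible policy chosen for illustration, and the error message is an assumption.

```r
# Keeps R recycling semantics; the recycling warning is silenced
addRCompliant <- function(x_r, y_r) suppressWarnings(x_r + y_r)

# Rejects operands of different lengths (one possible policy, an assumption)
addMathCompliant <- function(x_r, y_r) {
  if (length(x_r) != length(y_r)) {
    stop('x_r and y_r must have the same length')
  }
  x_r + y_r
}

addRCompliant(1.0 * 1:3, 1.0 * 1:7)     # 2 4 6 5 7 9 8, without warning
addMathCompliant(1.0 * 1:3, 1.0 * 1:3)  # 2 4 6
# addMathCompliant(1.0 * 1:3, 1.0 * 1:7) would stop with an error
```

Whether addMathCompliant should still accept a length-one operand as a scalar is a design decision that depends on the functional scope you want to guarantee.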