Custom Functions

Introduction

So far, just about everything you have done in this book has involved the use of functions. From basic c() to advanced functions like ggplot(), somebody had to sit down and write the code to make these functions do something useful. Now it is your turn.

Custom functions tend to be useful whenever you find yourself repeating parts of your code. For example, maybe you are repeating a calculation across different columns of data, or across different input data sets, or you might want to compare different models by tweaking slightly different parameters. Condensing repeated code into functions can help make your code more concise, organized, and understandable. Functions can also make repeated code run faster, as we very briefly introduced in the Flow Control chapter. Functions can also help to avoid errors, because you only change parameters in one place, when you call the function, instead of replacing parameters in every copy of the repeated code.
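As a preview of the idea, here is a minimal sketch of repeated code condensed into a function. The syntax is explained in the sections that follow. The function name std.err and the choice of the built-in iris data set are assumptions made only for this illustration.

```r
# Repeated code: the same calculation typed out for two columns of the
# built-in iris data set
se.length <- sd(iris$Sepal.Length) / sqrt(length(iris$Sepal.Length))
se.width <- sd(iris$Sepal.Width) / sqrt(length(iris$Sepal.Width))

# The same calculation written once as a function
# (std.err is a hypothetical name chosen for this sketch)
std.err <- function(x){
  sd(x) / sqrt(length(x))
}

# One short call per column
std.err(iris$Sepal.Length)
std.err(iris$Sepal.Width)
```

If the calculation ever needs to change, it only has to be edited inside std.err() rather than in every copy.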

Giving your custom function a clear name and specifying the arguments that it takes will help make your code easier to understand. This is important for collaboration, even if you are collaborating with yourself at some time in the future.

Custom functions can be a little tricky to master at first, but you’ve already taken the biggest step by learning how to run functions that others have written. By digging into the help files of the functions you know, you should get a good feel for what functions can do and how they are organized with parameters. In this chapter, you will learn how to build your own functions, and some of the ways they may be particularly useful to you.

General Form

You should already have a good sense of how functions work from all of the other tutorials/chapters. Now let’s look at the general structure of a function. Don’t type this out, but read through it:

functionName<-function(var1=Default1,var2=Default2){
  ## Meat and potatoes script
  return(output)
}

This is called pseudo-code, which means that it is not working R code, but rather a sort of fake code to act as a framework for understanding the basics. You’ll come across this if you decide you want to dive deeper into computer science. The purpose of this pseudo-code is to give a general sense of how functions work, regardless of the specific programming language you use (e.g., R, Python, Unix shell). We have variables (var1 and var2) and we can assign default values (Default1 and Default2). The comment ## Meat and potatoes script represents the main steps of the function. The final return(output) defines which objects from the function are output or returned to the user.

Take some time to think about this pseudo-code, and compare it to the help for some functions you know, like ?mean(), ?ggplot() or ?read.csv(). By now you are quite familiar with the idea that each function has a name and a set of parameters (or arguments in R lingo). Default values are shown in the Usage section of the function’s help file, written as name = value. For example, the sep = "," argument in the read.csv() function tells us that the function will assume you want the sep argument to be a comma unless you explicitly define it. Some arguments do not have a default. For example, the n argument in the rnorm() function does not have a default value. That’s why you get an error if you try to run rnorm() but not if you run rnorm(1).
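The same information can be checked from the console. The sketch below uses formals(), a base R function that returns a function's arguments and their defaults. The object names no.n and one.value are made up for this illustration.

```r
# Argument names of rnorm(), in order
names(formals(rnorm))

# mean has a default value, so it can be left out of a call
formals(rnorm)$mean

# The default for sep in read.csv()
formals(read.csv)$sep

# n has no default, so calling rnorm() with no arguments is an error.
# tryCatch() is used here only so the sketch keeps running after the error.
no.n <- tryCatch(rnorm(), error = function(e) conditionMessage(e))
no.n

# Supplying n is enough, because mean and sd fall back on their defaults
one.value <- rnorm(1)
length(one.value)
```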

That’s everything you need to know to start writing your own functions!

Example Function

We’ll start by converting the above pseudo-code into functional R code. We’ll set default arguments, where the user will input two numeric objects, just to see how it works. The idea is that we want to calculate some basic math for a pair of input vectors, and output the results as a list.

my.function<-function(var1=0,var2=0){
  # We can make new variables within a function
  add<-var1+var2
  subt<-var1-var2
  mult<-var1*var2
  div<-var1/var2 
  # And put them together into a list for output
  outlist<-list(input1=var1, input2=var2, 
                addition=add, subtraction=subt, 
                multiplication=mult, division=div)
  # So far, everything is contained within the function. 
  # Use return() to generate output
  return(outlist)
}

On the first line, we define the function by giving it a name (my.function) and setting parameters for the function. In this function we have just two parameters, one for each input variable. If one or both are left blank when we call the function, then R will replace these values with the default (zero). These variables may be individual numbers, or we may input vectors and R will automatically apply each calculation element by element.

Inside of the function, we generate five objects, the first four representing simple mathematical equations applied to the two input variables. A fifth object (outlist) is simply a list object containing the two input variables and the output of each of the four equation objects.

The final line, return(outlist), defines the object that is output from running the custom function. In this case, it is the outlist list object.

As usual, be sure to type out this function in a .R script or a code chunk in a .Rmd file. Try running the first line of the function. You’ll see a + sign in your R console. This is R telling you that it is expecting more lines of code. This happens when you have open brackets or an unfinished pipe (%>%) or ggplot (+) command. Run each of the remaining lines of the function, and you should see the R Console return to > after you run the last line of the function.

Question: Why is there no output to the R Console?

Answer: You have just loaded the function into memory. Think of this like when you use library() to load a package into memory. The functions from that package are now available for use.

Look at the Environment tab in RStudio. This is a tab in one of the RStudio windows, usually in the top-right window by default. You should see a new item here called my.function followed by function (var1 = 0, var2 = 0). This tells us that the my.function() function is available for use, and it has two input parameters with default values.

Local vs Global

Before running your function, first note what is missing from the Environment tab: none of the objects that are created inside the function, such as add, subt and outlist, are listed here. Even when we run the function, we won’t see those objects in the Environment tab. These objects are local objects because they only exist within the function that contains them.

By contrast, a global object is created when we make an object in the main code.

Global<-"Object"

Global objects are saved in memory and can be accessed by any function that you run. Local objects can only be used inside of the function that contains them.
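A minimal sketch of the difference, assuming a made-up function called scope.demo and a made-up local object called local.value:

```r
Global <- "Object"

scope.demo <- function(){
  # local.value exists only while the function is running
  local.value <- "made inside the function"
  # Global is not defined in here, so R looks outside the function to find it
  paste(Global, "+", local.value)
}

# The function can use both the global and the local object
scope.demo()

# Back in the main code, the local object is not available
exists("local.value")
```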

Running Functions

Running custom functions is no different from running any of the other functions you are familiar with. Try running the function on its own, with default values:

my.function()
$input1
[1] 0

$input2
[1] 0

$addition
[1] 0

$subtraction
[1] 0

$multiplication
[1] 0

$division
[1] NaN

Now try specifying the input parameters and compare the output.

my.function(var1=10,var2=0.1)
$input1
[1] 10

$input2
[1] 0.1

$addition
[1] 10.1

$subtraction
[1] 9.9

$multiplication
[1] 1

$division
[1] 100

The inputs can also be vectors, in which case each calculation is applied element by element:

my.function(var1=c(1:10),var2=c(10:1))
$input1
 [1]  1  2  3  4  5  6  7  8  9 10

$input2
 [1] 10  9  8  7  6  5  4  3  2  1

$addition
 [1] 11 11 11 11 11 11 11 11 11 11

$subtraction
 [1] -9 -7 -5 -3 -1  1  3  5  7  9

$multiplication
 [1] 10 18 24 28 30 30 28 24 18 10

$division
 [1]  0.1000000  0.2222222  0.3750000  0.5714286  0.8333333  1.2000000
 [7]  1.7500000  2.6666667  4.5000000 10.0000000
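Because the output is a list, individual results can be saved and pulled out by name. The sketch below uses a condensed copy of my.function() so that it runs on its own. The object names one.arg and result are assumptions for this illustration.

```r
# A condensed copy of my.function() so that this sketch is self-contained
my.function <- function(var1=0, var2=0){
  list(input1=var1, input2=var2,
       addition=var1+var2, subtraction=var1-var2,
       multiplication=var1*var2, division=var1/var2)
}

# Only var1 is supplied, so var2 falls back on its default of 0
one.arg <- my.function(var1=5)
one.arg$addition
one.arg$division   # 5/0 is Inf in R

# Without names, arguments are matched by position: var1=8, var2=2
result <- my.function(8, 2)
result$subtraction
result[["division"]]
```

Saving the output to an object also stops the whole list from printing to the console.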

Annotation

For more complicated functions that take a long time to run, consider using print() or cat() to indicate the steps that are being run. This can help a lot with troubleshooting custom functions. The cat() function is similar to print(), but it writes text directly to the screen instead of displaying a formatted R object, so the output has no [1] index or quotation marks. Recall from the Regular Expression Chapter that \n is the new line character. If we include \n in the cat() output, then the text that follows starts on a new line. Here is an example:

my.function<-function(var1=0,var2=0){
  cat("\nInput variables:\nvar1 =", var1,"\nvar2 =", var2,"\n")
  cat("\nCalculating functions...\n")
  cat("\nAdding...\n")    

  add<-var1+var2
  
  cat("\nSubtracting...\n")

  subt<-var1-var2
  
  cat("\nMultiplying...\n")
  
  mult<-var1*var2
  
  cat("\nDividing...\n")
  
  div<-var1/var2  
  
  cat("\nGenerating output...\n\n")
  
  outlist<-list(input1=var1, input2=var2, 
                addition=add, subtraction=subt, 
                multiplication=mult, division=div)

  return(outlist)
}

## Run
my.function(var1=10,var2=0.1)

Input variables:
var1 = 10 
var2 = 0.1 

Calculating functions...

Adding...

Subtracting...

Multiplying...

Dividing...

Generating output...

$input1
[1] 10

$input2
[1] 0.1

$addition
[1] 10.1

$subtraction
[1] 9.9

$multiplication
[1] 1

$division
[1] 100
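To see how cat() and print() differ, here is a short side-by-side sketch. The object names msg and var1 are chosen only for this illustration.

```r
msg <- "Adding..."

# print() displays the object, with an index and quotation marks
print(msg)

# cat() writes only the text; there is no line break unless \n is included
cat(msg, "\n")

# cat() joins several pieces together, separated by spaces
var1 <- c(1, 2, 3)
cat("var1 =", var1, "\n")
```

This is why the vector inputs in the annotated function print on a single line after var1 = and var2 =.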

Verbose Parameter

Printing text to the screen can slow down your function considerably, as we saw in the Flow Control Chapter. A good practice is to provide output as a user-defined option by adding a ‘verbose’ parameter and an if() statement.

my.function<-function(var1=0,var2=0,verbose=FALSE){
  # verbose is already TRUE or FALSE, so it can be tested directly.
  # This avoids relying on T, which can be overwritten by accident.
  if(verbose){
    cat("\nInput variables:\nvar1 =", var1,"\nvar2 =", var2,"\n")
    cat("\nCalculating functions...\n")
    cat("\nAdding...\n")    
  }
  
  add<-var1+var2
  
  if(verbose){
    cat("\nSubtracting...\n")
  }
  
  subt<-var1-var2
  
  if(verbose){
    cat("\nMultiplying...\n")
  }
  
  mult<-var1*var2
  
  if(verbose){
    cat("\nDividing...\n")
  }
  
  div<-var1/var2  
  
  if(verbose){
    cat("\nGenerating output...\n")
  }
  
  outlist<-list(input1=var1, input2=var2,
                addition=add, subtraction=subt,
                multiplication=mult, division=div)

  return(outlist)
}

Now the outlist object is returned as before, but the cat() functions are only run if verbose=TRUE is specified when running the function.
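The pattern can be tried out with a much smaller function. In the sketch below, add.pair is a hypothetical, shortened stand-in for my.function(), and quiet and loud are made-up object names.

```r
# add.pair is a hypothetical, shortened stand-in for my.function()
add.pair <- function(var1=0, var2=0, verbose=FALSE){
  # For a single TRUE/FALSE value, if(verbose) is equivalent to
  # if(verbose==TRUE)
  if(verbose){
    cat("\nAdding...\n")
  }
  return(var1 + var2)
}

# Default: no messages, only the returned value
quiet <- add.pair(1, 2)

# verbose=TRUE: the same value, plus the progress message
loud <- add.pair(1, 2, verbose=TRUE)

# The verbose option changes what is printed, not what is returned
identical(quiet, loud)
```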

External Files

In the Basic Customization Chapter, we saw how to create a custom plotting theme and save it as a file that we could load to apply the theme. The same is true for custom functions.

If you have a custom function that you would like to use frequently, or if it is too big to include in your main R Script or R Markdown file, then a dedicated .R file may be a good option.

  1. Save the function in a separate file, typically with a .R extension. For example, we might make a new R script called myfunction.R containing just the lines of the my.function() function that we created earlier.

  2. Load the file using source("PathName/FileName.R"). For example, we may have a directory called scripts inside of our working directory, in which case we could load the custom function with source("./scripts/myfunction.R").
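The two steps can be sketched in a way that runs on any computer by writing to a temporary file. The temporary file stands in for a real path such as ./scripts/myfunction.R, and sourced.function is a hypothetical function used only for this illustration.

```r
# Step 1: save a small function in its own .R file.
# tempfile() stands in for a real path such as "./scripts/myfunction.R".
script.path <- tempfile(fileext=".R")
writeLines(c(
  "sourced.function<-function(var1=0,var2=0){",
  "  return(var1+var2)",
  "}"
), script.path)

# Before source(), the function has not been loaded into memory
exists("sourced.function")

# Step 2: source() runs the file, which loads the function
source(script.path)
sourced.function(2, 3)
```

In practice you would write the function in the RStudio editor and save it, rather than using writeLines().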