1. Overview

Regular Expressions

  • aka ‘regex’ and ‘regexp’

  • a sort of find-and-replace for nerds

  • one of the most powerful data tools I have ever learned

  • requires patience and lots of practice

2. Basic Regex Functions in R

Four Basic Regex Functions in R:

grep() and grepl() are equivalent to ‘find’ in your favorite word processor

  • General form: grep("find this", in.this.object)

sub() and gsub() are equivalent to ‘find and replace’

  • General form: sub("find this", "replace with this", in.this.object)

Two Advanced Regex Functions in R:

regexpr() provides more detailed info about the first match

gregexpr() provides more detailed results about all matches
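As a rough sketch of what that extra detail looks like, the block below runs both functions on a small made-up vector. The name ex.names and its contents are assumptions for illustration, not part of the original example.

```r
# Hypothetical example vector (an assumption, used only for illustration)
ex.names <- c("verticillatus", "minor")

# regexpr(): position of the FIRST match in each element (-1 means no match)
first.match <- regexpr("l", ex.names)
as.integer(first.match)              # 8 -1
attr(first.match, "match.length")    # 1 -1

# gregexpr(): a list holding the positions of ALL matches in each element
all.matches <- gregexpr("l", ex.names)
as.integer(all.matches[[1]])         # 8 9
as.integer(all.matches[[2]])         # -1
```

Both functions report where a match starts and how long it is, whereas grep() and grepl() only report whether an element matches at all.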


Start with a simple character vector of species names:

Species<-c("petiolata", "verticillatus", "salicaria", "minor")
print(Species)
## [1] "petiolata"     "verticillatus" "salicaria"     "minor"

grep() – returns the positions (indices) of the elements that match the query

grep("a",Species)
## [1] 1 2 3

grepl() – returns TRUE/FALSE for each element, indicating whether it matches the query

grepl("a",Species)
## [1]  TRUE  TRUE  TRUE FALSE
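The two outputs carry the same information in different forms. A minimal sketch of how they relate, and how either one might be used to subset the vector (the names hits and is.hit are introduced here only for illustration):

```r
# Same species vector as above, repeated so the block runs on its own
Species <- c("petiolata", "verticillatus", "salicaria", "minor")

# grep() gives positions; grepl() gives one TRUE/FALSE per element
hits <- grep("a", Species)
is.hit <- grepl("a", Species)

# which() converts the logical vector into the same positions
identical(which(is.hit), hits)

# Either result can be used to subset the original vector
Species[hits]      # elements that contain "a"
Species[!is.hit]   # elements WITHOUT a match
```

The logical form from grepl() tends to be handier when the result needs to be negated or combined with other conditions.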

sub() – replaces the first match (in each element)

sub("l","L",Species)
## [1] "petioLata"     "verticiLlatus" "saLicaria"     "minor"

gsub() – replaces all matches

gsub("l","L",Species)