Overview

A phylogeny is a tree that represents a hypothesis about the evolutionary relationships among organisms. A group of branches descending from a common ancestor forms a 'clade', and shorter connections represent more recent evolutionary divergence compared to long (i.e. deep) branches. There are many different methods for building a phylogenetic tree, but in general, trees are built using clustering algorithms that group objects by some measure of their similarity. Most modern phylogenies are based on DNA or protein sequence similarity, but in principle we can cluster objects based on any trait we can measure. To look at the process in more detail, let's build a phylogeny of dragons.
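Here is a minimal R sketch of clustering by similarity. It uses made-up 0/1 traits for four hypothetical organisms (A to D), base R's dist() with a binary distance, and hclust() with average linkage (UPGMA-style). This is one simple clustering approach chosen for illustration. It is not necessarily the tree-building method used later in this tutorial.

```r
# Made-up presence/absence traits for four hypothetical organisms (rows)
traits <- rbind(
  A = c(1, 1, 0, 0, 0),
  B = c(1, 1, 0, 0, 1),
  C = c(0, 0, 1, 1, 0),
  D = c(0, 0, 1, 1, 1)
)

# Pairwise dissimilarity between organisms, based only on their traits
d <- dist(traits, method = "binary")

# Merge the most similar organisms first (average linkage, i.e. UPGMA-style)
tree <- hclust(d, method = "average")

# Cut the tree into two clusters: A groups with B, and C groups with D
groups <- cutree(tree, k = 2)
groups  # A B C D -> 1 1 2 2
# plot(tree) would draw the corresponding dendrogram
```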

The Dragon Phylogeny is a project developed when Rob Colautti was a postdoctoral researcher at the University of British Columbia. It was originally released as a t-shirt on Threadless.com; that shirt is now out of print.

However, some recent students have resurrected the project and released a slightly different version, which is available at https://dragonphylogeny.threadless.com/

100% of proceeds from t-shirt sales support grants and projects that diversify the representation of STEM researchers, particularly in Ecology and Evolutionary Biology.

#diversifyEEB

The original design generated some media coverage:

  • io9/Gizmodo Article – A nice write-up about the project on the blog io9.com (now part of Gizmodo).
  • CBC.ca Interview – from an interview with CBC Radio, later posted on their website.

GitHub

The project now lives on at GitHub: https://github.com/ColauttiLab/DragonPhylogeny

Scoring Traits

Images

The images are available as a PDF file on the Dragon Phylogeny GitHub site: Dragon_Pics.pdf (NOTE: this is a large file, 9.4 MB). The PDF contains images of all dragons used in the original Dragon Phylogeny.

Unfortunately, we don't have any dragon blood to do a DNA-based phylogeny. However, we can try to cluster dragons based on their physical appearance. The first step is to locate some pictures of dragons and choose a common set of traits.

Here are the traits for a variety of dragons:

Origin RefNum Teeth.Type Tongue Eyes Eye.position Ears Nose Snout Horns Skin.face.neck Skin.body Skin.Belly Dorsal.ridge Toes Opposed.toes Legs Claws Wings Wing.type Wing.structure Body.Type Tail.end Size wiskers
German 1 Blunt forked; short small lateral spearlike upturned; forward nostrils long absent smooth fish scales wide platemail absent 3 0 4 short catlike 2 bat-like 4 ridges elongate point absent
French 2 Fangs forked; long average lateral absent upturned; forward nostrils long plates hairy soft scales hair absent 3 1 4 short talon 2 bird-like elongate point 1-2x human absent
French 3 Blunt&Fangs wormy; short forward spearlike pointy; forward nostrils long absent smooth smooth smooth absent 3 0 4 short catlike 2 bat-like 11 ridges rotund point 1-2x human absent
Dutch 4 Pointy narrow lateral spearlike upturned; forward nostrils moderate absent scaly skin round platemail smooth single row; spikes 3 1 4 short talon 2 bird-like 5major; 6minor ridges absent
English 5 small lateral absent upturned; lateral nostrils long absent smooth smooth smooth single row; continuous 2 0 4 2 batlike 5 ridges elongate point 1-2x human absent
American 6 Pointy single; long large lateral absent slits; forward absent absent scaly skin hexagonal scales wide platemail absent 0 0 0 0 0 0 0 snake 3x human absent
French 7 Blunt&Fangs forked; long small forward spearlike upturned; forward nostrils blunt absent hairy rough wide platemail 4 1 4 large talon 2 mix 3major; 1 minor elongate point 1-2x human absent
English 8 Pointy long average lateral small slits; lateral long; duckbill short fish scales fish scales fish scales single row; jagged 3 0 2 short talon 2 batlike 7major; 3minor elongate split (Y shape) 1-2x human absent
French 9 Pointy average forward round upturned; forward nostrils blunt absent hairy smooth smooth absent 3 1 4 short talon 2 wing-leg hybrid batlike 6 ridges rotund point 1-2x human absent
French 10 Pointy average forward large dishlike upturned; forward nostrils blunt absent hairy extra wide plates see body absent 3 0 2 short talon 2 wing-leg hybrid batlike 7 ridges elongate point 1-2x human absent
Spanish 11 Blunt&Fangs small lateral round hooked long absent smooth smooth absent 3 1 4 short talon 2 batlike 4 ridges elongate point 1-2x human absent
Japanese 12 average forward round small; forward long long hairy fish scales wide platemail single; overlapping plates 3 1 4 large talongs absent snake point long
Japanese 13 narrow forward absent upturned; forward nostrils long absent spiny spiny absent 3 NA NA snake short
Japanese 14 Fangs&Pointy large forward absent upturned; forward nostrils long short hairy fish scales wide platemail single NA 0 0 snake blunt 3x human short
Japanese 15 Fangs&Pointy large lateral absent upturned; forward nostrils blunt jagged spiny fish scales wide platemail absent NA 0 0 snake 1-2x human long
Japanese 16 Pointy short average lateral absent upturned; forward nostrils long 2lobed antler hairy rough wide platemail single; jagged 3 0 2 long catlike 0 snake long
Japanese 17 Pointy average forward spearlike upturned; forward nostrils long 2lobed antler spiny fish scales wide platemail single; spiked 3 0 4 large talon absent snake blunt long
Japanese 18 Fangs&Pointy short large forward absent upturned; forward nostrils blunt absent spiny large scales wide platemail absent NA 4 0 snake 1-2x human
Japanese 19 Fangs&Pointy large forward absent upturned; forward nostrils blunt short rough fish scales wide platemail absent NA 0 0 snake 1-2x human long
Japanese 20 average forward round upturned; forward nostrils blunt short hairy fish scales wide platemail single; jagged 3 0 4 short catlike 0 snake long
Japanese 21 Fangs&Pointy short small forward absent upturned; forward nostrils long jagged antlers hairy fish scales wide platemail single; spiked 3 0 4 large talon 0 snake long
Japanese 22 large forward spearlike upturned; forward nostrils long short antler hairy fish scales wide platemail single; spiked 3 0 4 large talon 0 snake fan medium
Japanese 23 average forward round upturned; forward nostrils long 2lobed antler hairy fish scales wide platemail single; jagged 3 0 4 large talon 0 snake fan 1-2x human medium
Japanese 24 large lateral spearlike forward long 2lobed antler hairy fish scales wide platemail single; spiked 3 0 4 large talon 0 snake fan medium
Japanese 25 Fangs&Pointy small forward absent upturned; forward nostrils long long hairy bumpy double plates single jagged 3 0 4 large talon 0 snake fan long
Japanese 26 Fangs&Pointy short average forward spearlike upturned; forward nostrils long 2lobed antler hairy bumpy double plates absent 3 0 NA large talon 0 snake medium
Japanese 27 short small forward upturned; forward nostrils long long hairy fish scales double plates single; overlapping plates 3 0 4 large talon 0 snake point 5-6x human long
Japanese 28 Fangs&Pointy short small forward spearlike upturned; forward nostrils long 2lobed antler hairy fish scales wide platemail single; jagged 3 0 4 large talon 0 snake fan 3-4x human long
Japanese 29 Fangs&Pointy short average forward spearlike upturned; forward nostrils long very long; multilobed hairy fish scales double plates single; jagged 4 0 4 large talon 0 snake point 1-2x human absent
Italian 30 Pointy small forward round pointy long absent smooth small scales smooth single; ridge 3 0 4 short talon 2 birdlike 6 ridges elongate point 1/2 human absent
Italian 31 Pointy short average forward small upturned; forward nostrils moderate absent smooth light hair smooth absent 3; webbed 0 4 short catlike 2 batlike 5 ridges elongate point 1/2 human absent
Italian 32 Fangs short small forward spearlike upturned; forward nostrils moderate absent hairy light hair smooth absent 3 0 2 short catlike 2 batlike 9major; 2 minor elongate point 1-2x human absent
33 Fangs&Pointy average forward spearlike small; forward long stubs scaly skin; hair fish scales hair single; spiked 4 0 4 short talon 2 batlike 5 ridges elongate 1-2x human absent
German 34 Pointy double small lateral absent small; forward long absent smooth hairy double muscular single; hairy 3 1 2 short talon 2 batlike 2 ridges elongate point 1-2x human absent
English 35 Pointy forked; long small forward spearlike lateral long medium skaly skin fish scales wide platemail single; jagged 4 1 4 large talons 2 mix 5 ridges elongate point 1-2x human nose horn (modified wisker?)
German 36 Fangs&Pointy short large forward spearlike upturned; forward nostrils long stubs hairy light hair light hair sigle; ridge 4 1 4 short catlike 2 batlike 4 ridges elongate 1-2x human absent
Dutch 37 Fangs&Pointy short average forward spearlike forward moderate absent smooth smooth muscular single bumps 3 0 2 large talons 2 batlike 3 ridges elongate point 1-2x human absent
Spanish 38 Pointy forked; long small forward small small; forward long absent smooth smooth smooth wide overlapping plates 3 1 2 short talon 2 batlike 5 ridges elongate point 1-2x human absent
Italian 39 Pointy long small forward absent pointy forward jagged antlers rough lizard scales lizard scales single bumps 3 1 4 short catlike 2 birdlike 5 ridges elongate point 1/2 human absent
Italian 40 Pointy long small forward spearlike forward long absent rough hairy muscular absent 4 1 4 short catlike 2 batlike 17+ ridges elongate point 1-2x human absent
English 41 Pointy narrow; short average lateral small forward long absent hairy lizard scales 5 0 4 short talon 2 batlike 5 ridges elongate 1-2x human absent
Italian 42 narrow; short small lateral spearlike forward long absent rough smooth bump ridge absent 4 1 4 short talon 2 batlike 5 ridges elongate point 1-2x human absent
Spanish 43 Pointy narrow; short small lateral round pointy long absent hairy complex leather wide platemail single; complex bumps 3 1 4 short talon 2 batlike 5 ridges elongate point 1-2x human absent
Italian 44 Pointy long small lateral small upturned; forward nostrils long absent rough bumpy NA 4 2 batlike 9 ridges elongate point 1-2x human absent
Italian 45 Fangs&Pointy long small lateral absent pointy long absent smooth smooth smooth single; ridge 3 0 2 short catlike 2 batlike 9 ridges elongate point 1-2x human absent
English 46 Pointy narrow lateral absent forward long stubs smooth platemail smooth absent 4 0 4 small talons 0 blunt 1-2x human absent
Italian 47 Pointy narrow; short small forward spearlike forward long fanlike hairy hairy smooth absent 3 1 4 small talons 2 batlike 7 ridges elongate point 1-2x human absent
Dutch 48 Pointy long average forward spearlike forward moderate stubs smooth light hair smooth single; bumped 3 1 4 small talons 2 birdlike 5 ridges elongate point 1/2 human absent
Indian 49 Pointy long small lateral absent forward long absent rough bumpy serrated single; ridge 3 1 4 absent 0 point absent
Japanese 50 narrow forward small upturned; forward nostrils long long hairy fish scales muscular single; overlapping plates 3 0 4 large talons 0 snake point 1-2x human absent
Japanese 51 Fangs narrow forward spearlike upturned; forward nostrils long long hairy rough muscular single; overlapping plates 3 0 4 small talons 2 birdlike dozens ridges snake blunt 1-2x human absent
Japanese 52 narrow lateral round upturned; forward nostrils long absent hairy fish scales muscular absent 3 0 4 large talons 0 snake point 1/4 human medium
Japanese 53 large forward absent forward long absent hairy lizard scales lizard scales wide overlapping plates 3 0 NA small talons 0 point 1-2x human small
Iranian 54 forward forward long fanlike smooth smooth smooth single; ridge 3 0 4 short catlike 2 birdlike 7 ridges snake 1-2x human absent
Iranian 55 Fangs small lateral small forward long small smooth smooth smooth single; ridge 3 0 4 small talons 0 snake point 1-2x human absent
Iranian 56 Fangs many points narrow lateral spearlike small; forward long long rough rough rough absent 4 1 2 small talons 2 batlike 4 ridges snake point 1-2x human absent
Iranian 57 Fangs narrow; short small forward small upturned; forward nostrils long small rough rough muscular absent 4 1 4 small talons 0 snake point 1-2x human absent
Turkish 58 Pointy long average lateral round small; forward long small rough rough muscular single; ridge NA 0 0 blunt 1-2x human absent
Iranian 59 Pointy long small forward spearlike small; forward long 2lobed antler rough fish scales fish scales single; jagged 4 0 4 small talons 0 snake point 1-2x human absent
Iranian 60 Fangs narrow; short narrow forward absent forward long absent hairy smooth muscular single; ridge 4 0 4 long catlike 0 snake point 1-2x human absent
Turkish 61 Pointy forked; long narrow lateral absent forward long medium rough smooth muscular single; ridge 4 0 4 small talons 0 snake absent
Turkish 62 Fangs&Pointy forked; long small lateral spearlike forward long medium smooth smooth muscular single; ridge 4 0 4 small talons 0 snake blunt absent
Ukraine 63 Blunt&Pointy short small lateral round upturned; forward nostrils forward ramhorn rough lizard scales muscular hair 3 0 4 short catlike 0 elongate point 1-2x human absent
Ukraine 64 Pointy average lateral absent tiny long short smooth smooth rough absent 4 0 2 long catlike 0 snake threetail 1-2x human absent
Russia 65 Pointy long average lateral absent forward moderate absent rough rough spiny single; bumped 2 1 2 short catlike 2 wing-leg hybrid birdlike 6 ridges snake blunt 1-2x human absent
Ukraine 66 Pointy small lateral round lateral moderate absent smooth smooth smooth single; bumped 3 0 4 small talons 2 batlike 5 ridges elongate point 1-2x human absent
Russia 67 short narrow lateral absent upturned; forward nostrils moderate small hairy hairy hairy absent NA 2 2 wing-leg hybrid birdlike 5 ridges snake point 1/2 human absent
Greece 68 Fangs short average forward spearlike forward moderate absent hairy hairy hairy single; bumped 3 1 4 large talons 2 birdlike 9 ridges elongate point 1-2x human absent
Italian 69 Pointy long average lateral spearlike upturned; lateral nostrils long absent rough smooth smooth single; spiked 4 NA 4 small talons 2 birdlike 6 ridges elongate point 1-2x human absent
American 70 small forward large speaklike upturned; forward nostrils long absent rough rough wide platemail absent 4 NA 4 small talons absent point 1-2x human absent
British (Wales) 71 long; spearhead narrow lateral spearlike forward long absent hairy fish scales lateral plates single; spiked 3 1 4 large talons 2 batlike 9 ridges elongate spear absent
British 72 Fangs long; spearhead narrow lateral large spearlike tiny long absent smooth smooth lateral plates single; jagged 2 1 2 small talons 2 batlike 10 ridges elongate spear absent
British 73 Fangs short narrow lateral spearlike tiny long small spiny spiny lateral plates single; jagged 4 0 2 small talons 2 batlike 6 ridges elongate spear absent
British 74 Fangs long; spearhead narrow lateral spearlike forward long absent rough rough lateral plates single; jagged 4 0 4 small talons 2 batlike 5 ridges elongate spear absent
British 75 Fangs long; spearhead average lateral spearlike forward long absent hairy lizard scales muscular single; jagged 4 0 2 large talons 2 batlike 5 ridges elongate spear absent

Encoding

Now that we've scored the traits, we have to encode them. In this case we'll use 1s and 0s, with ? indicating unknown values for traits that couldn't be observed in some of the images. The coding is easy for binary traits (present/absent); however, most traits are not binary, and we might want a coding that accounts for inferred evolutionary transitions. For example, if we look at skin type we have several categories:

  • fish scales
  • spiny
  • hairy
  • plates
  • scaly skin
  • bumps/ridged skin
  • smooth

What if we want to encode an evolutionary model? For example, one that looks like this:

spiny <-- fish scales --> scaly skin  
  |           |              /   \
  v           v             v     v
hairy       plates      smooth   bumpy

We need a coding system in which each derived state is coded more similarly to its ancestral state than to the other states.

How would you code these using a binary vector?

Here’s one way:

100000 <-- 000000 --> 000100  
  |           |        /   \
  v           v       v     v
110000     001000  000110  000101
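One way to check that this coding behaves as intended is to count the positions at which two codes differ (the Hamming distance). The R sketch below does this for the seven codes from the diagram above. The state names are just labels chosen for the example. Because every branch in the model switches on one new bit, the Hamming distance between two states equals the number of steps separating them in the model.

```r
# Codes from the diagram above; names are labels for each skin state
skin <- c(fish_scales = "000000", spiny = "100000", hairy = "110000",
          plates = "001000", scaly_skin = "000100",
          smooth = "000110", bumpy = "000101")

# Turn each 6-character string into a row of 0/1 numbers
codes <- t(sapply(strsplit(skin, ""), as.numeric))
rownames(codes) <- names(skin)

# On 0/1 data, the Manhattan distance counts differing positions (Hamming)
hamming <- as.matrix(dist(codes, method = "manhattan"))

hamming["fish_scales", "spiny"]  # 1: one step in the model
hamming["hairy", "fish_scales"]  # 2: fish scales -> spiny -> hairy
hamming["hairy", "smooth"]       # 4: hairy and smooth sit on different branches
```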

Trait coding

Here is the full list of traits and how they were encoded.

Order  Trait              Phenotype       Binary
1      Appendages         Zero            1111
1      Appendages         Two             1101
1      Appendages         Four            1001
1      Appendages         Six             0001
2      Mass               1/4 human       0000
2      Mass               1/2 human       0001
2      Mass               1-2x human      0011
2      Mass               3-4x human      0111
2      Mass               >4x human       1111
3      Body type          Rotund          00
3      Body type          Elongate        01
3      Body type          Snakelike       11
4      Claw type          Long Catlike    1100
4      Claw type          Short Catlike   1000
4      Claw type          Absent          0000
4      Claw type          Short Talons    0010
4      Claw type          Long Talons     0011
5      Dorsal ridges      Plates          100000
5      Dorsal ridges      Absent          000000
5      Dorsal ridges      Bumps           010000
5      Dorsal ridges      Spike           011000
5      Dorsal ridges      Ridge           010100
5      Dorsal ridges      Jagged          010010
5      Dorsal ridges      Hair            010011
6      Ear morphology     Absent          000
6      Ear morphology     Round or Small  100
6      Ear morphology     Spearlike       010
6      Ear morphology     Other           001
7      Eye morphology     Avg             000
7      Eye morphology     Large           001
7      Eye morphology     Narrow          010
7      Eye morphology     Small           100
8      Eye position       Lateral         0
8      Eye position       Forward         1
9      Horn type          Absent          000
9      Horn type          Stubs/Small     100
9      Horn type          Med/Long        110
9      Horn type          Jagged/Antlers  111
10     Nose Position      Lateral         0
10     Nose Position      Forward         1
11     Nasal morphology   Upturned        1
11     Nasal morphology   Other           0
12     Skin-dorsal        Fish Scales     000000
12     Skin-dorsal        Spiny           000100
12     Skin-dorsal        Hairy           000110
12     Skin-dorsal        Plates          000001
12     Skin-dorsal        Scaly Skin      100000
12     Skin-dorsal        Rough Skin      101000
12     Skin-dorsal        Smooth Skin     110000
13     Skin-head          Fish Scales     000000
13     Skin-head          Spiny           000100
13     Skin-head          Hairy           000110
13     Skin-head          Plates          000001
13     Skin-head          Scaly Skin      100000
13     Skin-head          Rough Skin      101000
13     Skin-head          Smooth Skin     110000
14     Skin-ventral       Fish Scales     000000
14     Skin-ventral       Spiny           000100
14     Skin-ventral       Hairy           000110
14     Skin-ventral       Plates          000001
14     Skin-ventral       Scaly Skin      100000
14     Skin-ventral       Rough Skin      101000
14     Skin-ventral       Smooth Skin     110000
15     Snout type         Absent          0000
15     Snout type         Beak            0001
15     Snout type         Blunt           1000
15     Snout type         Moderate        1100
15     Snout type         Long            1110
16     Tail type          Blunt/Point     10
16     Tail type          Fan/Split Y     00
16     Tail type          Spear           01
17     Teeth              Pointy Only     0000
17     Teeth              Blunt + Pointy  1000
17     Teeth              Blunt Only      1100
17     Teeth              Fangs + Other   0001
17     Teeth              Fangs Only      0011
18     Toes-opposing      Yes             0
18     Toes-opposing      No              1
19     Toe Number         > Five          000000
19     Toe Number         Five            100000
19     Toe Number         Four            110000
19     Toe Number         Three           111000
19     Toe Number         Two             111100
19     Toe Number         One             111110
19     Toe Number         Zero            111111
20     Tongue length      Short           0
20     Tongue length      Long            1
21     Tongue morphology  Regular         00
21     Tongue morphology  Forked          01
21     Tongue morphology  Spear           10
22     Ventral plates     Yes             1
22     Ventral plates     No              0
23     Whiskers           Absent          00
23     Whiskers           Short           10
23     Whiskers           Long            11
24     Wing structure     Absent          00
24     Wing structure     Hybrid          10
24     Wing structure     Full            11
25     Wing type          Bat             100
25     Wing type          Bird            010
25     Wing type          Hybrid          001
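As a quick consistency check, we can sum the widths of the Binary codes in the table. In the minimal R sketch below, the widths are simply read off the Binary column for each trait. The total matches the nchar=78 declared in the dimensions line of the nexus file.

```r
# Width of the Binary code for each trait, in Order (read off the table above)
bits_per_trait <- c(
  "Appendages" = 4, "Mass" = 4, "Body type" = 2, "Claw type" = 4,
  "Dorsal ridges" = 6, "Ear morphology" = 3, "Eye morphology" = 3,
  "Eye position" = 1, "Horn type" = 3, "Nose Position" = 1,
  "Nasal morphology" = 1, "Skin-dorsal" = 6, "Skin-head" = 6,
  "Skin-ventral" = 6, "Snout type" = 4, "Tail type" = 2, "Teeth" = 4,
  "Toes-opposing" = 1, "Toe Number" = 6, "Tongue length" = 1,
  "Tongue morphology" = 2, "Ventral plates" = 1, "Whiskers" = 2,
  "Wing structure" = 2, "Wing type" = 3
)

length(bits_per_trait)  # 25 traits
sum(bits_per_trait)     # 78 binary characters per taxon
```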

To encode a dragon, traits are first arranged by the Order column. The observed Phenotype for each Trait is then recoded as 1s and 0s (or ? for missing values) using the corresponding Binary code. Finally, all of the 1s and 0s are combined into a single vector.
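The same procedure can be sketched in R for a hypothetical dragon, using a small excerpt of the coding table (four traits only). The helper name encode_dragon() and the scored phenotypes are made up for illustration. Writing one ? for every bit of a missing trait is also an assumption of this sketch.

```r
# Small excerpt of the coding table above (four traits only)
coding <- data.frame(
  Order     = c(3, 3, 3, 8, 8, 17, 17, 17, 17, 17, 23, 23, 23),
  Trait     = c(rep("Body type", 3), rep("Eye position", 2),
                rep("Teeth", 5), rep("Whiskers", 3)),
  Phenotype = c("Rotund", "Elongate", "Snakelike", "Lateral", "Forward",
                "Pointy Only", "Blunt + Pointy", "Blunt Only",
                "Fangs + Other", "Fangs Only", "Absent", "Short", "Long"),
  Binary    = c("00", "01", "11", "0", "1", "0000", "1000", "1100",
                "0001", "0011", "00", "10", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical scored phenotypes for one dragon (NA = could not be observed)
dragon <- c("Body type" = "Elongate", "Eye position" = "Forward",
            "Teeth" = NA, "Whiskers" = "Absent")

# Hypothetical helper: arrange traits by Order, recode, then concatenate
encode_dragon <- function(obs, coding) {
  traits <- unique(coding$Trait[order(coding$Order)])  # traits in Order
  bits <- lapply(traits, function(tr) {
    rows  <- coding[coding$Trait == tr, ]
    width <- nchar(rows$Binary[1])             # number of bits for this trait
    ph    <- obs[[tr]]
    if (is.na(ph)) return(rep("?", width))     # assumed: one "?" per bit
    strsplit(rows$Binary[rows$Phenotype == ph], "")[[1]]
  })
  unlist(bits)  # one combined character vector
}

encode_dragon(dragon, coding)
# "0" "1" "1" "?" "?" "?" "?" "0" "0"
```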

Nexus Data File

The file DragonMatrix.nex contains the encoded traits, along with a few extra lines of information that specify the data in the nexus data file format. Nexus files are plain-text files that follow a few formatting rules, typically with a .nxs or .nex file name extension. For more information, see the Wikipedia entry or the Christoph Champ wiki.

Open the file in a text editor.

The penultimate line is a semicolon (;), which marks the end of the coded characters. It is followed by End; on the last line, which closes the data block (here, also the end of the file).

The first few lines set up the data format:

#NEXUS
begin data;
dimensions ntax=77 nchar=78;
format datatype=binary interleave=no gap=?;
matrix

Here’s a breakdown of what these first few lines do:

  • The first line shows us that it's a #NEXUS data file.
  • begin data; marks the start of the data block.
  • The dimensions line specifies the number of taxa (ntax=77: the 74 dragons plus three non-dragon taxa labelled Fish, Snake and Mammal) and the number of characters (nchar=78 binary trait scores).
  • The format line notes that the traits are encoded as binary. The gap option specifies the symbol used for data gaps, which here stands in for missing values. (Strictly, nexus distinguishes gaps, set with gap=, from missing data, set with missing=, whose default symbol is ?.) interleave=no specifies that each line contains all of the traits for one taxon.
  • The fifth line, matrix, marks the start of the data matrix itself.

If this were DNA, we might have 1000 or more base pairs in each sequence. In that case, we probably wouldn't want each sequence on a single line. Instead, we might break it into smaller chunks in an interleaved format, like this:
Dragon1 TTGTCGAGTGTGCGGCAGCTTAGGTGAATTAAGTCCGGGCAACCTTTAGT
Dragon2 CAATAGCATACTACCGTGCGAGCCAGCTTATAGGTCGTTGCAGGTTATTA
Dragon3 ATGTCATTCGCCACGAGACTTTACTAGGGTATCATGCCGAAAGGGGATGG

Dragon1 TGTCCTGTGTGGGAAGTCGTGCCAGGACGGTTACAGCCTTAGCTTGTGCG
Dragon2 AAGCGAACTGAAGCGGTTGGGAGGATAAGCTTTACACGTGCCCCACAAAG
Dragon3 AAGCGAACTGAAGCGGTTGGGAGGATAAGCTTTACACGTGCCCCACAAAG
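To see how the pieces of a non-interleaved file fit together, here is a minimal base-R sketch that builds the lines of a small nexus data block. It follows the same layout as the header above, including its gap=? setting. The taxon names, the toy matrix and the helper name write_nexus_binary() are hypothetical, made up for this example.

```r
# Toy binary matrix: one row per (hypothetical) taxon, "?" = missing value
toy <- rbind(DragonA = c("0", "1", "1", "?"),
             DragonB = c("1", "1", "0", "0"))

# Hypothetical helper: build (and optionally write) a non-interleaved block
write_nexus_binary <- function(m, file = NULL) {
  rows <- paste(rownames(m), apply(m, 1, paste, collapse = ""))
  lines <- c("#NEXUS",
             "begin data;",
             sprintf("dimensions ntax=%d nchar=%d;", nrow(m), ncol(m)),
             "format datatype=binary interleave=no gap=?;",
             "matrix",
             rows,    # each taxon's full set of traits on one line
             ";",     # end of the coded characters
             "end;")  # end of the data block
  if (!is.null(file)) writeLines(lines, file)
  lines
}

cat(write_nexus_binary(toy), sep = "\n")
```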

Import Nexus Data

Now that we’re familiar with the file, let’s import it using the read.nexus.data() function from the ape package:

library(ape)
DragonNexus<-read.nexus.data("Data/DragonMatrix.nex")
head(DragonNexus)
## $`0.1FishXXX`
##  [1] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [20] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [39] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [58] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [77] "0" "0"
## 
## $`0.2SnakeXX`
##  [1] "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "0" "1" "1"
## [20] "0" "0" "0" "0" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0"
## [39] "1" "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "1" "1" "1" "1"
## [58] "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0"
## [77] "0" "0"
## 
## $`0.3MammalX`
##  [1] "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0" "0" "0" "0" "0" "0"
## [20] "0" "0" "0" "1" "0" "0" "0" "1" "1" "1" "0" "0" "0" "0" "0" "1" "1" "0" "0"
## [39] "1" "1" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0"
## [58] "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "1" "0" "0"
## [77] "0" "0"
## 
## $`1GermanXXX`
##  [1] "0" "1" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1"
## [20] "0" "0" "0" "0" "1" "0" "1" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "1"
## [39] "0" "0" "0" "0" "1" "1" "0" "1" "1" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0"
## [58] "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "1" "0" "?" "?" "?" "?" "0" "0" "1"
## [77] "1" "1"
## 
## $`2FrenchXXX`
##  [1] "0" "1" "0" "0" "1" "1" "0" "1" "0" "1" "1" "0" "0" "1" "1" "1" "0" "1" "0"
## [20] "0" "0" "0" "0" "0" "0" "1" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "0"
## [39] "1" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "1" "0" "0"
## [58] "0" "0" "1" "0" "0" "1" "0" "0" "0" "1" "1" "0" "0" "0" "1" "1" "0" "0" "1"
## [77] "1" "0"
## 
## $`3FrenchXXX`
##  [1] "0" "1" "0" "0" "1" "1" "1" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "?"
## [20] "?" "?" "1" "0" "1" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "1"
## [39] "1" "1" "0" "1" "1" "1" "0" "1" "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0"
## [58] "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "1" "0" "0" "0" "1" "1" "0" "0" "1"
## [77] "1" "1"
names(DragonNexus)
##  [1] "0.1FishXXX" "0.2SnakeXX" "0.3MammalX" "1GermanXXX" "2FrenchXXX"
##  [6] "3FrenchXXX" "4DutchXXXX" "5EnglishXX" "6AmericanX" "7FrenchXXX"
## [11] "8EnglishXX" "9FrenchXXX" "10FrenchXX" "11SpanishX" "12Japanese"
## [16] "13Japanese" "14Japanese" "15Japanese" "16Japanese" "17Japanese"
## [21] "18Japanese" "19Japanese" "20Japanese" "21Japanese" "22Japanese"
## [26] "23Japanese" "24Japanese" "25Japanese" "26Japanese" "27Japanese"
## [31] "28Japanese" "29Japanese" "30ItalianX" "31ItalianX" "32ItalianX"
## [36] "33XXXXXXXX" "34GermanXX" "35EnglishX" "36GermanXX" "37DutchXXX"
## [41] "38SpanishX" "39ItalianX" "40ItalianX" "41EnglishX" "42ItalianX"
## [46] "43SpanishX" "44ItalianX" "45ItalianX" "46EnglishX" "47ItalianX"
## [51] "48DutchXXX" "49IndianXX" "50Japanese" "51Japanese" "52Japanese"
## [56] "53Japanese" "54IranianX" "55IranianX" "56IranianX" "57IranianX"
## [61] "58TurkishX" "59IranianX" "60IranianX" "61TurkishX" "62TurkishX"
## [66] "63UkraineX" "64UkraineX" "65RussiaXX" "66UkraineX" "67RussiaXX"
## [71] "68GreeceXX" "69ItalianX" "70American" "71BritishX" "72BritishX"
## [76] "73BritishX" "74BritishX"

Compare the header of the nexus R object to the layout of the text-based nexus file. What is different? How does R treat the data?

Distance Matrix

Since we aren't using DNA, we can't use the dist.dna() function from ape. Instead, we use the more basic dist() function, which calculates a distance (dissimilarity) matrix from our binary traits:

DragonDistMat<-dist(DragonNexus,method='binary')
## Error in dist(DragonNexus, method = "binary"): 'list' object cannot be coerced to type 'double'

Why do we get an error?

We get an error because the dist() function doesn't accept our DragonNexus object, which is a list. The ?dist help file tells us what kind of input the function expects (look at the description of the x argument).

How can we fix this problem?

We can convert a list object to a data.frame fairly easily, but there is a trick: we first need to unlist the list to make it a single vector, and then convert that vector to a matrix.
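Here is the trick on a made-up two-taxon list with the same shape as DragonNexus (one character vector per taxon). The taxon names are hypothetical. The key detail is byrow = TRUE, which refills the flattened vector one taxon at a time.

```r
# Made-up list in the same shape as DragonNexus: one character vector per taxon
toy <- list(TaxonA = c("0", "1", "?"),
            TaxonB = c("1", "1", "0"))

flat <- unlist(toy)  # one long vector: TaxonA's traits, then TaxonB's
flat

# byrow = TRUE fills the matrix row by row, so each taxon becomes one row
m <- matrix(flat, ncol = 3, byrow = TRUE)
rownames(m) <- names(toy)
m

# Without byrow = TRUE the values are filled column by column,
# which scrambles traits across taxa
matrix(flat, ncol = 3)
```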

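# unlist() flattens the list into one long vector; matrix(..., byrow=TRUE)
# refills it so that each taxon becomes one row of 78 trait scores
DragonNexusDF<-data.frame(matrix(unlist(DragonNexus), ncol=78, byrow=TRUE))
# keep the taxon names as row names
row.names(DragonNexusDF)<-names(DragonNexus)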
head(DragonNexusDF)
##            X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
## 0.1FishXXX  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
## 0.2SnakeXX  1  1  1  1  0  0  0  0  1   1   1   0   0   1   1   1   0   1   1
## 0.3MammalX  1  0  0  0  0  0  0  0  0   0   1   1   1   0   0   0   0   0   0
## 1GermanXXX  0  1  0  0  1  1  1  0  0   1   1   1   1   0   0   0   0   1   1
## 2FrenchXXX  0  1  0  0  1  1  0  1  0   1   1   0   0   1   1   1   0   1   0
## 3FrenchXXX  0  1  0  0  1  1  1  0  0   0   1   1   0   0   0   0   0   0   ?
##            X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36
## 0.1FishXXX   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## 0.2SnakeXX   0   0   0   0   0   0   0   1   1   0   0   0   0   0   0   1   0
## 0.3MammalX   0   0   0   1   0   0   0   1   1   1   0   0   0   0   0   1   1
## 1GermanXXX   0   0   0   0   1   0   1   1   1   1   1   0   0   0   0   1   1
## 2FrenchXXX   0   0   0   0   0   0   1   1   1   1   1   0   0   0   0   1   1
## 3FrenchXXX   ?   ?   1   0   1   0   0   1   1   1   1   0   0   0   0   1   1
##            X37 X38 X39 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53
## 0.1FishXXX   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## 0.2SnakeXX   0   0   1   0   0   0   1   0   0   0   1   0   0   0   0   0   0
## 0.3MammalX   0   0   1   1   0   0   1   1   0   0   0   0   0   0   0   0   0
## 1GermanXXX   0   1   0   0   0   0   1   1   0   1   1   0   0   0   0   0   0
## 2FrenchXXX   0   0   1   0   0   0   1   1   0   0   0   0   0   0   0   0   0
## 3FrenchXXX   0   1   1   1   0   1   1   1   0   1   0   0   0   0   0   0   0
##            X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70
## 0.1FishXXX   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## 0.2SnakeXX   1   1   1   1   1   1   0   0   0   0   0   0   0   0   1   0   0
## 0.3MammalX   1   0   0   0   0   0   0   1   1   0   0   0   0   0   1   0   0
## 1GermanXXX   1   1   1   0   0   0   0   1   0   0   0   1   0   0   1   0   ?
## 2FrenchXXX   1   1   0   0   0   0   1   0   0   1   0   0   0   1   1   0   0
## 3FrenchXXX   1   1   1   0   0   0   0   1   0   0   0   1   0   0   1   0   0
##            X71 X72 X73 X74 X75 X76 X77 X78
## 0.1FishXXX   0   0   0   0   0   0   0   0
## 0.2SnakeXX   0   0   0   0   0   0   0   0
## 0.3MammalX   0   0   0   1   0   0   0   0
## 1GermanXXX   ?   ?   ?   0   0   1   1   1
## 2FrenchXXX   0   1   1   0   0   1   1   0
## 3FrenchXXX   0   1   1   0   0   1   1   1
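# "?" cannot be converted to a number, so it becomes NA (hence the warning);
# dist() then skips any trait that is NA in either member of a pair
DragonDist<-dist(DragonNexusDF,method='binary')
## Warning in dist(DragonNexusDF, method = "binary"): NAs introduced by coercion
# convert the 'dist' object into a full 77 x 77 matrix
DragonDistMat<-as.matrix(DragonDist)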
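The ?dist help page describes the binary distance as the proportion of bits in which only one is on, among those in which at least one is on. Missing values are dropped pair by pair. The R sketch below reimplements this by hand for two made-up taxa and compares the result with dist(). The helper name binary_dist() is hypothetical.

This definition also explains why the all-zero 0.1FishXXX row is at distance 1 from every other taxon: every trait that is on is on in only one member of the pair.

```r
# Two made-up taxa scored for five 0/1 traits; NA plays the role of "?"
a <- c(1, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, NA)

# Hypothetical hand-rolled version of dist(method = "binary")
binary_dist <- function(x, y) {
  keep <- !is.na(x) & !is.na(y)   # skip traits missing in either taxon
  if (!any(keep)) return(NA_real_)
  x <- x[keep] != 0               # non-zero counts as "on"
  y <- y[keep] != 0
  either_on <- x | y
  if (!any(either_on)) return(0)  # nothing is "on" in either taxon
  sum(xor(x, y)) / sum(either_on) # only one on / at least one on
}

binary_dist(a, b)                     # 2/3
dist(rbind(a, b), method = "binary")  # same value from base R
```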

Visualize

To visualize the matrix in ggplot, we need to rearrange the data from an \(n \times n\) matrix to an \(n^2 \times 3\) matrix (i.e. 'long' format, with one row per cell of the original matrix). This is easily done with the melt() function from the reshape2 package.
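The reshaping itself is simple. Here is a minimal base-R sketch on a made-up 3 × 3 distance matrix, using as.table() and as.data.frame() rather than reshape2. The column names are set by hand to the Var1/Var2/value names that melt() typically produces for an unnamed matrix, which is an assumption worth checking against your installed version.

```r
# Made-up 3 x 3 distance matrix with row and column names
m <- matrix(c(0,   0.5,  1,
              0.5, 0,    0.25,
              1,   0.25, 0),
            nrow = 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# One row per cell: n x n -> n^2 x 3 (row label, column label, value)
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(long) <- c("Var1", "Var2", "value")

dim(m)     # 3 3
dim(long)  # 9 3
head(long, 4)
```

Each pair of taxa appears twice (A-B and B-A), plus the diagonal, which is why the long version has \(n^2\) rows.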

library(reshape2)
PDat<-melt(DragonDistMat)

Let’s look at the difference in dimension (structural layout) of the two data objects.

dim(DragonDistMat)
## [1] 77 77
head(DragonDistMat)
##            0.1FishXXX 0.2SnakeXX 0.3MammalX 1GermanXXX 2FrenchXXX 3FrenchXXX
## 0.1FishXXX          0  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX          1  0.0000000  0.7428571  0.6744186  0.6250000  0.7555556
## 0.3MammalX          1  0.7428571  0.0000000  0.6578947  0.7105263  0.6216216
## 1GermanXXX          1  0.6744186  0.6578947  0.0000000  0.5000000  0.2571429
## 2FrenchXXX          1  0.6250000  0.7105263  0.5000000  0.0000000  0.5238095
## 3FrenchXXX          1  0.7555556  0.6216216  0.2571429  0.5238095  0.0000000
##            4DutchXXXX 5EnglishXX 6AmericanX 7FrenchXXX 8EnglishXX 9FrenchXXX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.7435897  0.6829268  0.3870968  0.6888889  0.7948718  0.7368421
## 0.3MammalX  0.7142857  0.7027027  0.7777778  0.7073171  0.9210526  0.6250000
## 1GermanXXX  0.5750000  0.3611111  0.6829268  0.4761905  0.7000000  0.5789474
## 2FrenchXXX  0.3750000  0.4736842  0.6904762  0.4390244  0.6666667  0.4857143
## 3FrenchXXX  0.5135135  0.3055556  0.6666667  0.4883721  0.6829268  0.4166667
##            10FrenchXX 11SpanishX 12Japanese 13Japanese 14Japanese 15Japanese
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6052632  0.7272727  0.6756757  0.6538462  0.4871795  0.4571429
## 0.3MammalX  0.6756757  0.6315789  0.5937500  0.5238095  0.7073171  0.7368421
## 1GermanXXX  0.4871795  0.3243243  0.6097561  0.5000000  0.5681818  0.6279070
## 2FrenchXXX  0.5128205  0.3947368  0.5945946  0.3809524  0.5909091  0.6428571
## 3FrenchXXX  0.4102564  0.2222222  0.6428571  0.4166667  0.5957447  0.6739130
##            16Japanese 17Japanese 18Japanese 19Japanese 20Japanese 21Japanese
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6590909  0.6904762  0.5925926  0.4722222  0.6470588  0.6585366
## 0.3MammalX  0.6052632  0.6923077  0.6400000  0.7435897  0.6000000  0.7105263
## 1GermanXXX  0.5333333  0.5348837  0.6363636  0.6046512  0.5405405  0.5909091
## 2FrenchXXX  0.6222222  0.6190476  0.5483871  0.6190476  0.6388889  0.6511628
## 3FrenchXXX  0.5454545  0.5909091  0.6060606  0.6222222  0.6153846  0.6590909
##            22Japanese 23Japanese 24Japanese 25Japanese 26Japanese 27Japanese
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.7000000  0.7073171  0.6842105  0.6428571  0.6315789  0.6666667
## 0.3MammalX  0.7027027  0.6756757  0.6857143  0.6750000  0.6756757  0.6842105
## 1GermanXXX  0.5365854  0.5609756  0.5500000  0.5869565  0.5581395  0.5250000
## 2FrenchXXX  0.6250000  0.5853659  0.6410256  0.5714286  0.5500000  0.6136364
## 3FrenchXXX  0.5952381  0.6000000  0.6097561  0.6000000  0.5365854  0.6000000
##            28Japanese 29Japanese 30ItalianX 31ItalianX 32ItalianX 33XXXXXXXX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.7021277  0.6829268  0.6666667  0.7073171  0.6000000  0.7500000
## 0.3MammalX  0.7500000  0.7027027  0.6666667  0.5757576  0.6486486  0.7894737
## 1GermanXXX  0.5555556  0.5952381  0.4871795  0.3611111  0.3947368  0.4444444
## 2FrenchXXX  0.5957447  0.5609756  0.4166667  0.5128205  0.4750000  0.4285714
## 3FrenchXXX  0.6041667  0.6136364  0.4615385  0.2857143  0.3243243  0.4054054
##            34GermanXX 35EnglishX 36GermanXX 37DutchXXX 38SpanishX 39ItalianX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6666667  0.7446809  0.7826087  0.6739130  0.6666667  0.6829268
## 0.3MammalX  0.7142857  0.8222222  0.6842105  0.7209302  0.6829268  0.7027027
## 1GermanXXX  0.4146341  0.5116279  0.4750000  0.3902439  0.4523810  0.6046512
## 2FrenchXXX  0.4523810  0.4761905  0.4358974  0.5000000  0.4146341  0.4473684
## 3FrenchXXX  0.3902439  0.5555556  0.3684211  0.2564103  0.3500000  0.5714286
##            40ItalianX 41EnglishX 42ItalianX 43SpanishX 44ItalianX 45ItalianX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6976744  0.7567568  0.7209302  0.7045455  0.6857143  0.6363636
## 0.3MammalX  0.6486486  0.6666667  0.6756757  0.6578947  0.6129032  0.6829268
## 1GermanXXX  0.3947368  0.5000000  0.3333333  0.4102564  0.4411765  0.3750000
## 2FrenchXXX  0.4750000  0.4545455  0.4210526  0.4500000  0.4411765  0.4883721
## 3FrenchXXX  0.3243243  0.4411765  0.2777778  0.3846154  0.4117647  0.2631579
##            46EnglishX 47ItalianX 48DutchXXX 49IndianXX 50Japanese 51Japanese
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.7179487  0.7333333  0.7272727  0.5945946  0.6578947  0.6595745
## 0.3MammalX  0.6060606  0.6923077  0.6842105  0.6060606  0.6571429  0.7446809
## 1GermanXXX  0.5263158  0.4500000  0.5238095  0.5952381  0.5384615  0.4666667
## 2FrenchXXX  0.4722222  0.4500000  0.3947368  0.5526316  0.5263158  0.3414634
## 3FrenchXXX  0.3888889  0.3421053  0.4250000  0.6000000  0.5714286  0.4222222
##            52Japanese 53Japanese 54IranianX 55IranianX 56IranianX 57IranianX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6176471  0.5862069  0.7073171  0.5789474  0.6875000  0.6511628
## 0.3MammalX  0.5666667  0.6206897  0.6756757  0.6111111  0.7333333  0.6315789
## 1GermanXXX  0.4722222  0.5714286  0.4736842  0.6046512  0.5531915  0.6000000
## 2FrenchXXX  0.5555556  0.4193548  0.4473684  0.5250000  0.4186047  0.5000000
## 3FrenchXXX  0.6000000  0.4857143  0.3421053  0.5238095  0.4666667  0.5777778
##            58TurkishX 59IranianX 60IranianX 61TurkishX 62TurkishX 63UkraineX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.4750000  0.6923077  0.6585366  0.6410256  0.5750000  0.6744186
## 0.3MammalX  0.6410256  0.7500000  0.6388889  0.6571429  0.6578947  0.5833333
## 1GermanXXX  0.6170213  0.6976744  0.6666667  0.6363636  0.5454545  0.5238095
## 2FrenchXXX  0.6086957  0.6341463  0.6363636  0.6000000  0.5365854  0.6222222
## 3FrenchXXX  0.5744681  0.6744186  0.5476190  0.6097561  0.5476190  0.5681818
##            64UkraineX 65RussiaXX 66UkraineX 67RussiaXX 68GreeceXX 69ItalianX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6388889  0.6000000  0.7000000  0.5862069  0.6976744  0.7777778
## 0.3MammalX  0.5483871  0.6111111  0.6666667  0.5384615  0.6842105  0.7750000
## 1GermanXXX  0.6250000  0.5813953  0.4324324  0.5588235  0.5813953  0.6046512
## 2FrenchXXX  0.5789474  0.4358974  0.5384615  0.4516129  0.2571429  0.4615385
## 3FrenchXXX  0.5128205  0.5000000  0.3333333  0.5294118  0.4634146  0.5238095
##            70American 71BritishX 72BritishX 73BritishX 74BritishX
## 0.1FishXXX  1.0000000  1.0000000  1.0000000  1.0000000  1.0000000
## 0.2SnakeXX  0.6388889  0.7441860  0.6666667  0.6744186  0.7021277
## 0.3MammalX  0.5937500  0.7692308  0.8000000  0.7250000  0.7500000
## 1GermanXXX  0.4722222  0.4358974  0.5744681  0.5116279  0.4888889
## 2FrenchXXX  0.4571429  0.5000000  0.6000000  0.5000000  0.4761905
## 3FrenchXXX  0.4736842  0.4871795  0.5116279  0.4750000  0.4523810
dim(PDat)
## [1] 5929    3
head(PDat)
##         Var1       Var2 value
## 1 0.1FishXXX 0.1FishXXX     0
## 2 0.2SnakeXX 0.1FishXXX     1
## 3 0.3MammalX 0.1FishXXX     1
## 4 1GermanXXX 0.1FishXXX     1
## 5 2FrenchXXX 0.1FishXXX     1
## 6 3FrenchXXX 0.1FishXXX     1
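Each row of PDat is one pair of dragons, so there are 77 × 77 = 5929 rows. If reshape2 is not available, the same long format can be produced from any square matrix with base R. This is a minimal sketch on a made-up 2 × 2 matrix. With the tutorial's objects it would presumably be `as.data.frame(as.table(DragonDistMat), responseName = "value")`, assuming DragonDistMat is the square distance matrix printed above.

```r
# Toy 2 x 2 distance matrix (made-up values) with matching row/column names
m <- matrix(c(0, 2, 2, 0), nrow = 2,
            dimnames = list(c("a", "b"), c("a", "b")))

# as.table() + as.data.frame() gives one row per cell:
# Var1 (row name), Var2 (column name), value (cell contents)
long <- as.data.frame(as.table(m), responseName = "value")
long
```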

Now let’s plot the distance matrix as a heat map. Note: the developers of ggtree maintain a good resource on GitHub with details on how to make a visually appealing tree, which will be useful when we plot trees below.

library(ggplot2)
ggplot(data = PDat, aes(x=Var1, y=Var2, fill=value)) + 
  geom_tile()+scale_fill_gradientn(colours=c("white","blue","green","red")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Looks like we have a nice range of values to try to cluster by distance.

Tree Building

Now that we have a distance matrix, let’s try building a phylogeny using the neighbour-joining (NJ) method.

DragonTree<-nj(DragonDist)
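nj() from the ape package runs the whole neighbour-joining algorithm. To see what happens inside, here is a minimal base-R sketch of just the first step. It computes the NJ criterion \(Q(i,j) = (n-2)d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)\) for every pair, joins the pair with the smallest Q, and estimates the two branch lengths to the new node. The function name nj_first_join and the 5-taxon distance matrix are made up for illustration. The full algorithm repeats this step on a shrinking matrix until the tree is resolved.

```r
# Hypothetical helper: the first joining step of neighbour-joining (NJ)
nj_first_join <- function(d) {
  n <- nrow(d)
  r <- rowSums(d)                        # total distance from each taxon to all others
  Q <- (n - 2) * d - outer(r, r, "+")    # NJ criterion for every pair (i, j)
  diag(Q) <- Inf                         # a taxon cannot be joined with itself
  # Pair with the smallest Q (upper triangle only, so i < j)
  ij <- which(Q == min(Q) & upper.tri(Q), arr.ind = TRUE)[1, ]
  i <- ij[1]
  j <- ij[2]
  # Branch lengths from i and j to the new internal node
  li <- d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
  list(pair = rownames(d)[c(i, j)],
       lengths = unname(c(li, d[i, j] - li)),
       Q = Q)
}

# Made-up symmetric distance matrix for 5 taxa
d <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

res <- nj_first_join(d)
res$pair     # "a" "b"
res$lengths  # 2 3
```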

Now let’s draw the tree using the ggtree library, comparing a circular and a rectangular layout:

# ggtree is distributed through Bioconductor (install once, if needed)
install.packages("BiocManager")
BiocManager::install("ggtree")
library(ggtree)
ggtree(DragonTree,layout="circular")

ggtree(DragonTree,layout="rectangular")

Whoa, what’s going on here?

This tree has some problems. The terminal branches are very long relative to the short internal branches that separate groups. It is almost as if the characters are all mixed up. This is what we might expect if dragons were created from active imaginations and didn’t really evolve from each other.

Another reason is that we are treating all traits the same. For example, we treat snout length the same as limb number. However, we might argue that limb number evolves more slowly than snout length, so that dragons with the same number of limbs are more likely to be related than dragons with similar snout lengths. We can account for this by weighting our traits. Weights are just numbers that we multiply by (or sometimes add to) our data, so that traits with higher weights have a stronger influence on our clustering algorithm.
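Here is a minimal sketch of how a weight changes distances. It uses three made-up dragons scored on two binary traits and hypothetical weights (limbs = 5, snout = 1). Multiplying each 0/1 column by its weight gives the same result as replacing each 1 with its weight value, which is the approach this tutorial takes.

```r
# Three made-up dragons scored on two binary traits
m <- rbind(X = c(limbs = 1, snout = 1),
           Y = c(limbs = 1, snout = 0),   # same limbs as X, different snout
           Z = c(limbs = 0, snout = 1))   # different limbs, same snout
w <- c(limbs = 5, snout = 1)              # hypothetical weights

unweighted <- as.matrix(dist(m))
weighted   <- as.matrix(dist(sweep(m, 2, w, "*")))  # multiply each column by its weight

unweighted["X", c("Y", "Z")]  # 1 1 : Y and Z are equally close to X
weighted["X", c("Y", "Z")]    # 1 5 : Y (same limbs) is now much closer
```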

Weights

The Weights.csv data table has a set of weights that were used for the Dragon Phylogeny t-shirt design. Let’s take a look:

WeightsDat<-read.csv("Data/Weights.csv")
Code Origin Weight Rationale
1 Appendages ZZZZ Highly Highly conserved (all tertrapods came from the same fish; loss of limbs more common (e.g. lizards to snakes)
2 Mass 1111 Weak: e.g. squirrel vs. Kangaroo
3 Body Type CC Body type somewhat conserved (e.g. snakes vs. lizards)
4 ClawType 5555 Somewhat Strong: e.g. mammals vs. eagle
5 Dorsal Ridges 111111 Weak: e.g. dinosaurs
6 Ear Morphology 111 Weak: e.g. deer mouse vs. vole
7 Eye Morphology 111 Weak: e.g. nocturnal vs. diurnal and eye size
8 Eye Position 1 Weak: e.g. predators vs. prey
9 HornType 111 Weak: e.g. gazelle vs. deer
10 NosePos 7 Moderately conserved e.g. amphibians vs. fish
11 Nasal Morphology 1 Weak: e.g. dog breeds
12 Skin-dorsal 999999 Somewhat conserved: e.g. reptiles vs. mammals
13 Skin-head 333333 Somewhat conserved: e.g. reptiles vs. mammals
14 Skin-ventral 333333 Somewhat conserved: e.g. reptiles vs. mammals
15 SnoutType 3333 Moderate: e.g. mouse vs. rhinocerous
16 TailType 11 Weak: e.g. dinosaurs
17 Teeth 3333 Moderately conserved e.g. herbivore vs. carnivore mammals
18 OppToes 1 Weak: e.g. chimps vs. humans
19 Toe Number 333333 Moderate: e.g. horse vs. lion
20 Tongue Length 1 Weak: e.g. anteater vs. primate
21 Tongue Morphology Type 44 Moderate: e.g. snake vs. lizard
22 Ventral Plates 7 Somewhat conserved: e.g. snakes vs. mammals
23 Wiskers 11 Weak: e.g. catfish vs. bass
24 Wing Structure JJ Somewhat highly conserved; single origin in bats and birds
25 WingType AAA Strong: e.g. bat vs. bird

Weights range from 1 through 9 and then from A (A=10) through Z (Z=35). The number of digits in each weight corresponds to the number of binary values for that trait, and the traits are ordered in the same way they were encoded. There are programs we could use to calculate distances using these weights (e.g. BEAST2). However, we’ll do it manually to see how it works. All we need to do is multiply each binary value by its weight. We have already imported the weights table, but turning it into something we can multiply takes a few steps:

  1. Create a single vector of weights
  2. Convert each letter to its corresponding weight value (e.g. A=10, B=11, etc.)
  3. Multiply the weight value by the trait vector for each dragon
  4. Re-calculate our distance matrix
  5. Plot the tree

1. Create a single vector of weights.

Easy:

Weights<-paste0(WeightsDat$Weight,collapse="")
Weights<-strsplit(Weights,split="")[[1]]
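The next example runs the same pair of calls on just the first three weights from the table (Appendages, Mass, Body Type). strsplit() returns a list, which is why [[1]] is needed to get a plain vector.

```r
# Toy version of the Weight column (first three traits from the table above)
weight_col <- c("ZZZZ", "1111", "CC")
one_string <- paste0(weight_col, collapse = "")      # "ZZZZ1111CC"
chars      <- strsplit(one_string, split = "")[[1]]  # one element per character
chars
length(chars)  # 10
```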

2. Convert each letter to a value.

We could encode every single letter individually, or we can use a loop with the built-in LETTERS object:

LETTERS # See what LETTERS is (see also letters)
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
which(LETTERS=="G")
## [1] 7
WeightsNum<-rep(NA,length(Weights))
for(i in 1:length(WeightsNum)){
  if(Weights[i] %in% LETTERS){
    WeightsNum[i]<-which(LETTERS==Weights[i])+9
  } else {
    WeightsNum[i]<-Weights[i]
  }
}
WeightsNum<-as.numeric(WeightsNum)
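The 1 through 9, A=10 through Z=35 scale is exactly how digits are written in base 36. As an alternative to the loop, base R's strtoi() can therefore convert every symbol in one vectorized call. This is a minimal sketch on a made-up selection of symbols. With the tutorial's objects it would be `WeightsNum <- strtoi(Weights, base = 36L)`.

```r
# Symbols from the weight scale (made-up selection)
symbols <- c("1", "7", "9", "A", "C", "J", "Z")

# Base 36 uses 0-9 and then A-Z for 10-35, matching the weight scale
via_strtoi <- strtoi(symbols, base = 36L)
via_strtoi  # 1 7 9 10 12 19 35

# Same result with the LETTERS approach used in the loop
via_letters <- ifelse(symbols %in% LETTERS,
                      match(symbols, LETTERS) + 9,
                      suppressWarnings(as.numeric(symbols)))
via_letters
```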

Now we have a vector of weights, which should have the same length as the number of 1s and 0s (i.e. 78 ‘characters’):

length(WeightsNum)
## [1] 78

3. Multiply the weight value by the trait vector for each dragon.

This is complicated by the fact that our data include missing values coded as ?, so all of our characters are stored as strings. Since \(0 \times x = 0\) and \(? \times x\) is undefined, we only need to multiply the 1s, which is equivalent to replacing each 1 with its corresponding weight value. To do this, we also have to slice our list object using double brackets [[]].

WtDragonNexus<-DragonNexus # Copy the list; each element is one dragon's character vector
for (i in seq_along(DragonNexus)){
  RepWeight<-DragonNexus[[i]]==1 # Which characters are scored as 1?
  WtDragonNexus[[i]][RepWeight]<-WeightsNum[RepWeight] # Replace those 1s with their weights
}
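The next example applies the same replacement to a single made-up dragon with four characters and hypothetical weights. Because the vector is character, the inserted weights are stored as strings, and the 0s and ?s are left untouched.

```r
# One made-up dragon: "1" = trait present, "0" = absent, "?" = missing
dragon  <- c("1", "0", "?", "1")
weights <- c(5, 3, 2, 7)            # hypothetical weights, one per character

is_one <- dragon == "1"             # logical index of the 1s
dragon[is_one] <- weights[is_one]   # replace each 1 with its weight
dragon                              # "5" "0" "?" "7"
```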

4. Re-calculate our distance matrix

We just need to convert the weighted list back into a data frame and recalculate the distances:

# One row per dragon, one column per character (78 in total)
WtDragonNexusDF<-data.frame(matrix(unlist(WtDragonNexus),ncol=78,byrow=TRUE))
row.names(WtDragonNexusDF)<-names(WtDragonNexus)
WtDragonDist<-dist(WtDragonNexusDF,method='euclidean')
## Warning in dist(WtDragonNexusDF, method = "euclidean"): NAs introduced by
## coercion
WtDragonDistMat<-as.matrix(WtDragonDist)
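The warning appears because ? cannot be converted to a number and becomes NA. According to the dist() help page, NAs are dropped pair by pair, and for Euclidean distance the sum of squares is scaled up in proportion to the number of columns used. Here is a minimal sketch with two made-up dragons and three characters:

```r
# Two made-up dragons; "?" marks a missing character
chars <- rbind(a = c("1", "?", "3"),
               b = c("1", "2", "5"))

# Convert to numbers: "?" becomes NA (warning suppressed here)
num <- suppressWarnings(matrix(as.numeric(chars), nrow = nrow(chars),
                               dimnames = dimnames(chars)))

d <- dist(num, method = "euclidean")
# Column 2 is skipped: squares over columns 1 and 3 sum to 0 + 4 = 4,
# then scaled by 3 columns / 2 used = 6, so the distance is sqrt(6)
d
```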

Note the change in method from binary to euclidean… why?
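One way to explore this question is a small experiment with made-up data. Multiply the columns of a 0/1 matrix by hypothetical weights and compare the two methods. The binary method in dist() only asks whether each value is zero or non-zero, so weights cannot change it. Euclidean distance uses the actual values.

```r
# Two made-up dragons scored on four binary characters
m <- rbind(a = c(1, 0, 1, 0),
           b = c(1, 1, 0, 0))
w <- c(1, 10, 1, 5)                  # hypothetical weights
mw <- sweep(m, 2, w, "*")            # weighted copy: each column times its weight

dist(m,  method = "binary")          # 0.6666667
dist(mw, method = "binary")          # 0.6666667 (weights ignored)
dist(m,  method = "euclidean")       # sqrt(2)   = 1.414214
dist(mw, method = "euclidean")       # sqrt(101) = 10.04988
```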

Compare the new distance matrix to the older one above. Note the much stronger structure:

WtPDat<-melt(WtDragonDistMat)
ggplot(data = WtPDat, aes(x=Var1, y=Var2, fill=value)) + 
  geom_tile()+scale_fill_gradientn(colours=c("white","blue","green","red")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

5. Plot the tree

Let’s compare the balanced minimum evolution (ME) method (fastme.bal) with neighbour-joining (NJ):

WtDragonTree<-fastme.bal(WtDragonDist)
WtDragonTreeNJ<-nj(WtDragonDist)
ggtree(WtDragonTree,layout="circular")

ggtree(WtDragonTreeNJ,layout="circular")

Tree Formatting

Let’s try to make it look a bit better. To do this, we need to understand the phylo data format:

str(WtDragonTree)
## List of 4
##  $ edge       : int [1:151, 1:2] 78 78 79 80 81 81 82 83 83 82 ...
##  $ edge.length: num [1:151] 25.6 5.88 9.8 1.63 12.13 ...
##  $ tip.label  : chr [1:77] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" ...
##  $ Nnode      : int 75
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

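Note that this is a list object with class phylo and cladewise order. We can also see four ‘slices’ (denoted with $). The tip.label slice contains the specimen labels. The edge slice is a two-column matrix with one row per line segment (branch), giving the parent and child node of each segment. The edge.length slice gives the length of each segment, and Nnode is the number of internal nodes.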

Q: Why are there 151 edges if we only have 77 dragons?

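Each of the 77 dragons has its own terminal edge, and the remaining edges connect internal nodes (i.e. clades) to each other. A fully bifurcating unrooted tree with \(n\) tips has \(2n-3\) edges, so here we get \(2 \times 77 - 3 = 151\).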
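Here is a minimal base-R sketch for checking this count. In ape’s phylo format, tips are numbered 1 to n and internal nodes get larger numbers, so an edge whose child (second column) is at most n is a terminal branch. The helper count_edges and the 4-tip toy tree are hypothetical, made up for illustration.

```r
# Hypothetical helper: split the edges of an ape-style edge matrix into
# terminal edges (child is a tip, numbered 1..n_tips) and internal edges
count_edges <- function(edge, n_tips) {
  terminal <- sum(edge[, 2] <= n_tips)
  c(terminal = terminal, internal = nrow(edge) - terminal, total = nrow(edge))
}

# Toy unrooted tree with tips 1-4 and internal nodes 5 and 6:
# tips 1 and 2 hang from node 5, tips 3 and 4 from node 6,
# and a single internal edge joins node 5 to node 6
toy_edge <- rbind(c(5, 1), c(5, 2), c(5, 6), c(6, 3), c(6, 4))
count_edges(toy_edge, n_tips = 4)   # terminal 4, internal 1, total 5 (= 2*4 - 3)
```

On the dragon tree, `count_edges(WtDragonTree$edge, length(WtDragonTree$tip.label))` should give 77 terminal and 74 internal edges.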

Colour by OTU

The tip labels contain information about the origin of each dragon image (‘Fish’, ‘Snake’ and ‘Mammal’ were added later as outgroups).

head(WtDragonTree$tip.label)
## [1] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" "37DutchXXX"
## [6] "45ItalianX"

We can use this to colour-code our tree and see if dragons from the same regions cluster together. We can use regular expressions to parse out a vector:

Country<-gsub("[0-9\\.]+([^X]+)X*","\\1",WtDragonTree$tip.label) # Remove leading numbers
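The next example shows what the regular expression keeps on four of the tip labels. `[0-9\\.]+` matches the leading number, `([^X]+)` captures the name, and `X*` drops the padding. Note the last label, which has no origin. Because `([^X]+)` needs at least one non-X character, the regex falls back to capturing the second digit. That is where the group level ‘3’ comes from.

```r
labels  <- c("0.1FishXXX", "8EnglishXX", "70American", "33XXXXXXXX")
country <- gsub("[0-9\\.]+([^X]+)X*", "\\1", labels)
country  # "Fish" "English" "American" "3"
```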

Next we have to group tip.labels by their corresponding country. There is a nice function in R called split that makes this easy to do:

CountryGroups<-split(WtDragonTree$tip.label, Country)
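The next example runs split() on four made-up labels. It returns a named list with one element per country, sorted alphabetically, and each element holds the labels belonging to that country. groupOTU takes this list format as input.

```r
labels  <- c("1GermanXXX", "2FrenchXXX", "3FrenchXXX", "34GermanXX")
country <- c("German", "French", "French", "German")

groups <- split(labels, country)   # one list element per country
groups
# $French: "2FrenchXXX" "3FrenchXXX"
# $German: "1GermanXXX" "34GermanXX"
```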

Now we use the groupOTU function to apply the grouping information for plotting:

WtDTcol<-groupOTU(WtDragonTree,CountryGroups)
str(WtDTcol)
## List of 4
##  $ edge       : int [1:151, 1:2] 78 78 79 80 81 81 82 83 83 82 ...
##  $ edge.length: num [1:151] 25.6 5.88 9.8 1.63 12.13 ...
##  $ tip.label  : chr [1:77] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" ...
##  $ Nnode      : int 75
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"
##  - attr(*, "group")= Factor w/ 19 levels "3","American",..: 6 5 12 3 4 12 8 17 3 11 ...

Notice how there is a new group attribute, which is a factor containing our country groups.

ggtree(WtDTcol,layout="circular",aes(colour=group))+geom_tiplab(size=2,aes(angle=angle))

What might we infer from this figure?

Colour by clade

As an alternative to colouring by region, we might want to point out a few clades (i.e. groups of dragons that cluster together). For example, we can colour and label five clades by giving groupClade the node at the base of each one:

WtDTclade<-groupClade(WtDragonTree,.node=c(142,128,103,90,80))
ggtree(WtDTclade,layout="circular",aes(colour=group)) + 
  geom_cladelabel(node=142,label="Serpentidae",hjust=0.5,offset.text=4,fontsize=3,angle=-45) +
  geom_cladelabel(node=128,label="Wyvernidae",hjust=0.5,offset.text=4,fontsize=3,angle=15) +
  geom_cladelabel(node=103,label="Orientalia",hjust=0.5,offset.text=4,fontsize=3,angle=40) +
  geom_cladelabel(node=90,label="Dracopteronidae",hjust=0.5,offset.text=4,fontsize=3,angle=-55) +
  geom_cladelabel(node=80,label="Dracoverisidae",hjust=0.5,offset.text=6,fontsize=3,angle=55) +
  xlim(NA,60)

NOTE: To find these nodes, we can add + geom_text(aes(label=node)) to plot the node number on top of each node.

Advanced Techniques:

(OPTIONAL) You can try overlaying your phylogeny on a geographical map: https://www.molecularecologist.com/2014/11/geophylogeny-plots-in-r-for-dummies/

To do this, you would need to find the latitude and longitude coordinates of each origin. An easy way is to find the location in Google Maps: when you right-click on it, you will see its latitude and longitude coordinates.