A phylogeny is a tree that represents a hypothesis about the evolutionary relationships among organisms. Connected branches form a ‘clade’, and shorter connections represent more recent evolutionary divergence than long (i.e. deep) branches. There are many different methods for building a phylogenetic tree, but in general, trees are built using clustering algorithms that group objects by some measure of their similarity. Modern phylogenies are based on DNA or protein similarity, but in principle we can cluster objects based on any trait we can measure. To look at the process in more detail, let’s build a phylogeny of dragons.
The Dragon Phylogeny is a project developed when Rob Colautti was a postdoctoral researcher at the University of British Columbia. It was originally developed as a t-shirt for Threadless.com, which is now out of print.
However, some recent students have resurrected the project and re-released a slightly different version, which is available at https://dragonphylogeny.threadless.com/
100% of proceeds from t-shirt sales support grants & projects that diversify representation of STEM researchers, particularly in Ecology and Evolutionary Biology.
The original design generated some media coverage:
The project now lives on at GitHub: https://github.com/ColauttiLab/DragonPhylogeny
The images are available as a PDF file on the Dragon Phylogeny GitHub site: Dragon_Pics.pdf. NOTE: this is a large file (9.4MB). The PDF contains images of all dragons used in the original Dragon Phylogeny.
Unfortunately, we don’t have any dragon blood to do a DNA-based phylogeny. However, we can try to cluster dragons based on their physical appearance. The first step is to locate some pictures of dragons and choose a common set of traits.
Here are the traits for a variety of dragons:
Origin | RefNum | Teeth.Type | Tongue | Eyes | Eye.position | Ears | Nose | Snout | Horns | Skin.face.neck | Skin.body | Skin.Belly | Dorsal.ridge | Toes | Opposed.toes | Legs | Claws | Wings | Wing.type | Wing.structure | Body.Type | Tail.end | Size | wiskers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
German | 1 | Blunt | forked; short | small | lateral | spearlike | upturned; forward nostrils | long | absent | smooth | fish scales | wide platemail | absent | 3 | 0 | 4 | short catlike | 2 | bat-like | 4 ridges | elongate | point | absent | |
French | 2 | Fangs | forked; long | average | lateral | absent | upturned; forward nostrils | long | plates | hairy | soft scales | hair | absent | 3 | 1 | 4 | short talon | 2 | bird-like | elongate | point | 1-2x human | absent | |
French | 3 | Blunt&Fangs | wormy; short | forward | spearlike | pointy; forward nostrils | long | absent | smooth | smooth | smooth | absent | 3 | 0 | 4 | short catlike | 2 | bat-like | 11 ridges | rotund | point | 1-2x human | absent | |
Dutch | 4 | Pointy | narrow | lateral | spearlike | upturned; forward nostrils | moderate | absent | scaly skin | round platemail | smooth | single row; spikes | 3 | 1 | 4 | short talon | 2 | bird-like | 5major; 6minor ridges | absent | ||||
English | 5 | small | lateral | absent | upturned; lateral nostrils | long | absent | smooth | smooth | smooth | single row; continuous | 2 | 0 | 4 | 2 | batlike | 5 ridges | elongate | point | 1-2x human | absent | |||
American | 6 | Pointy | single; long | large | lateral | absent | slits; forward | absent | absent | scaly skin | hexagonal scales | wide platemail | absent | 0 | 0 | 0 | 0 | 0 | 0 | 0 | snake | 3x human | absent | |
French | 7 | Blunt&Fangs | forked; long | small | forward | spearlike | upturned; forward nostrils | blunt | absent | hairy | rough | wide platemail | 4 | 1 | 4 | large talon | 2 | mix | 3major; 1 minor | elongate | point | 1-2x human | absent | |
English | 8 | Pointy | long | average | lateral | small | slits; lateral | long; duckbill | short | fish scales | fish scales | fish scales | single row; jagged | 3 | 0 | 2 | short talon | 2 | batlike | 7major; 3minor | elongate | split (Y shape) | 1-2x human | absent |
French | 9 | Pointy | average | forward | round | upturned; forward nostrils | blunt | absent | hairy | smooth | smooth | absent | 3 | 1 | 4 | short talon | 2 wing-leg hybrid | batlike | 6 ridges | rotund | point | 1-2x human | absent | |
French | 10 | Pointy | average | forward | large dishlike | upturned; forward nostrils | blunt | absent | hairy | extra wide plates | see body | absent | 3 | 0 | 2 | short talon | 2 wing-leg hybrid | batlike | 7 ridges | elongate | point | 1-2x human | absent | |
Spanish | 11 | Blunt&Fangs | small | lateral | round | hooked | long | absent | smooth | smooth | absent | 3 | 1 | 4 | short talon | 2 | batlike | 4 ridges | elongate | point | 1-2x human | absent | ||
Japanese | 12 | average | forward | round | small; forward | long | long | hairy | fish scales | wide platemail | single; overlapping plates | 3 | 1 | 4 | large talongs | absent | snake | point | long | |||||
Japanese | 13 | narrow | forward | absent | upturned; forward nostrils | long | absent | spiny | spiny | absent | 3 | NA | NA | snake | short | |||||||||
Japanese | 14 | Fangs&Pointy | large | forward | absent | upturned; forward nostrils | long | short | hairy | fish scales | wide platemail | single | NA | 0 | 0 | snake | blunt | 3x human | short | |||||
Japanese | 15 | Fangs&Pointy | large | lateral | absent | upturned; forward nostrils | blunt | jagged | spiny | fish scales | wide platemail | absent | NA | 0 | 0 | snake | 1-2x human | long | ||||||
Japanese | 16 | Pointy | short | average | lateral | absent | upturned; forward nostrils | long | 2lobed antler | hairy | rough | wide platemail | single; jagged | 3 | 0 | 2 | long catlike | 0 | snake | long | ||||
Japanese | 17 | Pointy | average | forward | spearlike | upturned; forward nostrils | long | 2lobed antler | spiny | fish scales | wide platemail | single; spiked | 3 | 0 | 4 | large talon | absent | snake | blunt | long | ||||
Japanese | 18 | Fangs&Pointy | short | large | forward | absent | upturned; forward nostrils | blunt | absent | spiny | large scales | wide platemail | absent | NA | 4 | 0 | snake | 1-2x human | ||||||
Japanese | 19 | Fangs&Pointy | large | forward | absent | upturned; forward nostrils | blunt | short | rough | fish scales | wide platemail | absent | NA | 0 | 0 | snake | 1-2x human | long | ||||||
Japanese | 20 | average | forward | round | upturned; forward nostrils | blunt | short | hairy | fish scales | wide platemail | single; jagged | 3 | 0 | 4 | short catlike | 0 | snake | long | ||||||
Japanese | 21 | Fangs&Pointy | short | small | forward | absent | upturned; forward nostrils | long | jagged antlers | hairy | fish scales | wide platemail | single; spiked | 3 | 0 | 4 | large talon | 0 | snake | long | ||||
Japanese | 22 | large | forward | spearlike | upturned; forward nostrils | long | short antler | hairy | fish scales | wide platemail | single; spiked | 3 | 0 | 4 | large talon | 0 | snake | fan | medium | |||||
Japanese | 23 | average | forward | round | upturned; forward nostrils | long | 2lobed antler | hairy | fish scales | wide platemail | single; jagged | 3 | 0 | 4 | large talon | 0 | snake | fan | 1-2x human | medium | ||||
Japanese | 24 | large | lateral | spearlike | forward | long | 2lobed antler | hairy | fish scales | wide platemail | single; spiked | 3 | 0 | 4 | large talon | 0 | snake | fan | medium | |||||
Japanese | 25 | Fangs&Pointy | small | forward | absent | upturned; forward nostrils | long | long | hairy | bumpy | double plates | single jagged | 3 | 0 | 4 | large talon | 0 | snake | fan | long | ||||
Japanese | 26 | Fangs&Pointy | short | average | forward | spearlike | upturned; forward nostrils | long | 2lobed antler | hairy | bumpy | double plates | absent | 3 | 0 | NA | large talon | 0 | snake | medium | ||||
Japanese | 27 | short | small | forward | upturned; forward nostrils | long | long | hairy | fish scales | double plates | single; overlapping plates | 3 | 0 | 4 | large talon | 0 | snake | point | 5-6x human | long | ||||
Japanese | 28 | Fangs&Pointy | short | small | forward | spearlike | upturned; forward nostrils | long | 2lobed antler | hairy | fish scales | wide platemail | single; jagged | 3 | 0 | 4 | large talon | 0 | snake | fan | 3-4x human | long | ||
Japanese | 29 | Fangs&Pointy | short | average | forward | spearlike | upturned; forward nostrils | long | very long; multilobed | hairy | fish scales | double plates | single; jagged | 4 | 0 | 4 | large talon | 0 | snake | point | 1-2x human | absent | ||
Italian | 30 | Pointy | small | forward | round | pointy | long | absent | smooth | small scales | smooth | single; ridge | 3 | 0 | 4 | short talon | 2 | birdlike | 6 ridges | elongate | point | 1/2 human | absent | |
Italian | 31 | Pointy | short | average | forward | small | upturned; forward nostrils | moderate | absent | smooth | light hair | smooth | absent | 3; webbed | 0 | 4 | short catlike | 2 | batlike | 5 ridges | elongate | point | 1/2 human | absent |
Italian | 32 | Fangs | short | small | forward | spearlike | upturned; forward nostrils | moderate | absent | hairy | light hair | smooth | absent | 3 | 0 | 2 | short catlike | 2 | batlike | 9major; 2 minor | elongate | point | 1-2x human | absent |
33 | Fangs&Pointy | average | forward | spearlike | small; forward | long | stubs | scaly skin; hair | fish scales | hair | single; spiked | 4 | 0 | 4 | short talon | 2 | batlike | 5 ridges | elongate | 1-2x human | absent | |||
German | 34 | Pointy | double | small | lateral | absent | small; forward | long | absent | smooth | hairy | double muscular | single; hairy | 3 | 1 | 2 | short talon | 2 | batlike | 2 ridges | elongate | point | 1-2x human | absent |
English | 35 | Pointy | forked; long | small | forward | spearlike | lateral | long | medium | skaly skin | fish scales | wide platemail | single; jagged | 4 | 1 | 4 | large talons | 2 | mix | 5 ridges | elongate | point | 1-2x human | nose horn (modified wisker?) |
German | 36 | Fangs&Pointy | short | large | forward | spearlike | upturned; forward nostrils | long | stubs | hairy | light hair | light hair | sigle; ridge | 4 | 1 | 4 | short catlike | 2 | batlike | 4 ridges | elongate | 1-2x human | absent | |
Dutch | 37 | Fangs&Pointy | short | average | forward | spearlike | forward | moderate | absent | smooth | smooth | muscular | single bumps | 3 | 0 | 2 | large talons | 2 | batlike | 3 ridges | elongate | point | 1-2x human | absent |
Spanish | 38 | Pointy | forked; long | small | forward | small | small; forward | long | absent | smooth | smooth | smooth | wide overlapping plates | 3 | 1 | 2 | short talon | 2 | batlike | 5 ridges | elongate | point | 1-2x human | absent |
Italian | 39 | Pointy | long | small | forward | absent | pointy | forward | jagged antlers | rough | lizard scales | lizard scales | single bumps | 3 | 1 | 4 | short catlike | 2 | birdlike | 5 ridges | elongate | point | 1/2 human | absent |
Italian | 40 | Pointy | long | small | forward | spearlike | forward | long | absent | rough | hairy | muscular | absent | 4 | 1 | 4 | short catlike | 2 | batlike | 17+ ridges | elongate | point | 1-2x human | absent |
English | 41 | Pointy | narrow; short | average | lateral | small | forward | long | absent | hairy | lizard scales | 5 | 0 | 4 | short talon | 2 | batlike | 5 ridges | elongate | 1-2x human | absent | |||
Italian | 42 | narrow; short | small | lateral | spearlike | forward | long | absent | rough | smooth | bump ridge | absent | 4 | 1 | 4 | short talon | 2 | batlike | 5 ridges | elongate | point | 1-2x human | absent | |
Spanish | 43 | Pointy | narrow; short | small | lateral | round | pointy | long | absent | hairy | complex leather | wide platemail | single; complex bumps | 3 | 1 | 4 | short talon | 2 | batlike | 5 ridges | elongate | point | 1-2x human | absent |
Italian | 44 | Pointy | long | small | lateral | small | upturned; forward nostrils | long | absent | rough | bumpy | NA | 4 | 2 | batlike | 9 ridges | elongate | point | 1-2x human | absent | ||||
Italian | 45 | Fangs&Pointy | long | small | lateral | absent | pointy | long | absent | smooth | smooth | smooth | single; ridge | 3 | 0 | 2 | short catlike | 2 | batlike | 9 ridges | elongate | point | 1-2x human | absent |
English | 46 | Pointy | narrow | lateral | absent | forward | long | stubs | smooth | platemail | smooth | absent | 4 | 0 | 4 | small talons | 0 | blunt | 1-2x human | absent | ||||
Italian | 47 | Pointy | narrow; short | small | forward | spearlike | forward | long | fanlike | hairy | hairy | smooth | absent | 3 | 1 | 4 | small talons | 2 | batlike | 7 ridges | elongate | point | 1-2x human | absent |
Dutch | 48 | Pointy | long | average | forward | spearlike | forward | moderate | stubs | smooth | light hair | smooth | single; bumped | 3 | 1 | 4 | small talons | 2 | birdlike | 5 ridges | elongate | point | 1/2 human | absent |
Indian | 49 | Pointy | long | small | lateral | absent | forward | long | absent | rough | bumpy | serrated | single; ridge | 3 | 1 | 4 | absent | 0 | point | absent | ||||
Japanese | 50 | narrow | forward | small | upturned; forward nostrils | long | long | hairy | fish scales | muscular | single; overlapping plates | 3 | 0 | 4 | large talons | 0 | snake | point | 1-2x human | absent | ||||
Japanese | 51 | Fangs | narrow | forward | spearlike | upturned; forward nostrils | long | long | hairy | rough | muscular | single; overlapping plates | 3 | 0 | 4 | small talons | 2 | birdlike | dozens ridges | snake | blunt | 1-2x human | absent | |
Japanese | 52 | narrow | lateral | round | upturned; forward nostrils | long | absent | hairy | fish scales | muscular | absent | 3 | 0 | 4 | large talons | 0 | snake | point | 1/4 human | medium | ||||
Japanese | 53 | large | forward | absent | forward | long | absent | hairy | lizard scales | lizard scales | wide overlapping plates | 3 | 0 | NA | small talons | 0 | point | 1-2x human | small | |||||
Iranian | 54 | forward | forward | long | fanlike | smooth | smooth | smooth | single; ridge | 3 | 0 | 4 | short catlike | 2 | birdlike | 7 ridges | snake | 1-2x human | absent | |||||
Iranian | 55 | Fangs | small | lateral | small | forward | long | small | smooth | smooth | smooth | single; ridge | 3 | 0 | 4 | small talons | 0 | snake | point | 1-2x human | absent | |||
Iranian | 56 | Fangs | many points | narrow | lateral | spearlike | small; forward | long | long | rough | rough | rough | absent | 4 | 1 | 2 | small talons | 2 | batlike | 4 ridges | snake | point | 1-2x human | absent |
Iranian | 57 | Fangs | narrow; short | small | forward | small | upturned; forward nostrils | long | small | rough | rough | muscular | absent | 4 | 1 | 4 | small talons | 0 | snake | point | 1-2x human | absent | ||
Turkish | 58 | Pointy | long | average | lateral | round | small; forward | long | small | rough | rough | muscular | single; ridge | NA | 0 | 0 | blunt | 1-2x human | absent | |||||
Iranian | 59 | Pointy | long | small | forward | spearlike | small; forward | long | 2lobed antler | rough | fish scales | fish scales | single; jagged | 4 | 0 | 4 | small talons | 0 | snake | point | 1-2x human | absent | ||
Iranian | 60 | Fangs | narrow; short | narrow | forward | absent | forward | long | absent | hairy | smooth | muscular | single; ridge | 4 | 0 | 4 | long catlike | 0 | snake | point | 1-2x human | absent | ||
Turkish | 61 | Pointy | forked; long | narrow | lateral | absent | forward | long | medium | rough | smooth | muscular | single; ridge | 4 | 0 | 4 | small talons | 0 | snake | absent | ||||
Turkish | 62 | Fangs&Pointy | forked; long | small | lateral | spearlike | forward | long | medium | smooth | smooth | muscular | single; ridge | 4 | 0 | 4 | small talons | 0 | snake | blunt | absent | |||
Ukraine | 63 | Blunt&Pointy | short | small | lateral | round | upturned; forward nostrils | forward | ramhorn | rough | lizard scales | muscular | hair | 3 | 0 | 4 | short catlike | 0 | elongate | point | 1-2x human | absent | ||
Ukraine | 64 | Pointy | average | lateral | absent | tiny | long | short | smooth | smooth | rough | absent | 4 | 0 | 2 | long catlike | 0 | snake | threetail | 1-2x human | absent | |||
Russia | 65 | Pointy | long | average | lateral | absent | forward | moderate | absent | rough | rough | spiny | single; bumped | 2 | 1 | 2 | short catlike | 2 wing-leg hybrid | birdlike | 6 ridges | snake | blunt | 1-2x human | absent |
Ukraine | 66 | Pointy | small | lateral | round | lateral | moderate | absent | smooth | smooth | smooth | single; bumped | 3 | 0 | 4 | small talons | 2 | batlike | 5 ridges | elongate | point | 1-2x human | absent | |
Russia | 67 | short | narrow | lateral | absent | upturned; forward nostrils | moderate | small | hairy | hairy | hairy | absent | NA | 2 | 2 wing-leg hybrid | birdlike | 5 ridges | snake | point | 1/2 human | absent | |||
Greece | 68 | Fangs | short | average | forward | spearlike | forward | moderate | absent | hairy | hairy | hairy | single; bumped | 3 | 1 | 4 | large talons | 2 | birdlike | 9 ridges | elongate | point | 1-2x human | absent |
Italian | 69 | Pointy | long | average | lateral | spearlike | upturned; lateral nostrils | long | absent | rough | smooth | smooth | single; spiked | 4 | NA | 4 | small talons | 2 | birdlike | 6 ridges | elongate | point | 1-2x human | absent |
American | 70 | small | forward | large speaklike | upturned; forward nostrils | long | absent | rough | rough | wide platemail | absent | 4 | NA | 4 | small talons | absent | point | 1-2x human | absent | |||||
British (Wales) | 71 | long; spearhead | narrow | lateral | spearlike | forward | long | absent | hairy | fish scales | lateral plates | single; spiked | 3 | 1 | 4 | large talons | 2 | batlike | 9 ridges | elongate | spear | absent | ||
British | 72 | Fangs | long; spearhead | narrow | lateral | large spearlike | tiny | long | absent | smooth | smooth | lateral plates | single; jagged | 2 | 1 | 2 | small talons | 2 | batlike | 10 ridges | elongate | spear | absent | |
British | 73 | Fangs | short | narrow | lateral | spearlike | tiny | long | small | spiny | spiny | lateral plates | single; jagged | 4 | 0 | 2 | small talons | 2 | batlike | 6 ridges | elongate | spear | absent | |
British | 74 | Fangs | long; spearhead | narrow | lateral | spearlike | forward | long | absent | rough | rough | lateral plates | single; jagged | 4 | 0 | 4 | small talons | 2 | batlike | 5 ridges | elongate | spear | absent | |
British | 75 | Fangs | long; spearhead | average | lateral | spearlike | forward | long | absent | hairy | lizard scales | muscular | single; jagged | 4 | 0 | 2 | large talons | 2 | batlike | 5 ridges | elongate | spear | absent |
Now that we’ve scored the traits, we have to encode them. In this case we’ll use 1s and 0s, with `?` indicating unknown values for the traits that couldn’t be observed in some of the photos. The coding is easy for binary traits (present/absent). However, most traits are not binary, and we might want a coding that accounts for inferred evolutionary transitions. For example, if we look at skin type we have several categories:
What if we want to encode an evolutionary model? For example, one that looks like this:
spiny <-- fish scales --> scaly skin
  |            |            /     \
  v            v           v       v
hairy       plates      smooth   bumpy
We need a coding system in which the code for each derived state is more similar to its ancestral form than to the other states.
How would you code these using a binary vector?
Here’s one way:
100000 <-- 000000 --> 000100
  |           |        /   \
  v           v       v     v
110000     001000  000110 000101
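As a rough check of this coding, the sketch below counts the number of positions that differ between each pair of codes from the diagram (a Hamming distance). The object names are illustrative and are not part of the original analysis.

```r
# Binary codes copied from the diagram above
skin_codes <- c(
  "fish scales" = "000000",
  "spiny"       = "100000",
  "hairy"       = "110000",
  "plates"      = "001000",
  "scaly skin"  = "000100",
  "smooth"      = "000110",
  "bumpy"       = "000101"
)

# Split each code into a row of 0/1 integers (one row per skin type)
code_mat <- t(sapply(strsplit(skin_codes, ""), as.integer))
rownames(code_mat) <- names(skin_codes)

# On 0/1 data, Manhattan distance is the number of positions that differ
skin_dist <- as.matrix(dist(code_mat, method = "manhattan"))

# hairy is 1 step from spiny, 2 from fish scales and 4 from smooth
skin_dist["hairy", c("spiny", "fish scales", "smooth")]
```

Each derived state is one step from its immediate ancestor and further from states on other branches, which is the property the coding was designed to have.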
Here is the full list of traits and how they were encoded.
Order | Trait | Phenotype | Binary |
---|---|---|---|
1 | Appendages | Zero | 1111 |
1 | Appendages | Two | 1101 |
1 | Appendages | Four | 1001 |
1 | Appendages | Six | 0001 |
2 | Mass | 1/4 human | 0000 |
2 | Mass | 1/2 human | 0001 |
2 | Mass | 1-2x human | 0011 |
2 | Mass | 3-4x human | 0111 |
2 | Mass | >4x human | 1111 |
3 | Body type | Rotund | 00 |
3 | Body type | Elongate | 01 |
3 | Body type | Snakelike | 11 |
4 | Claw type | Long Catlike | 1100 |
4 | Claw type | Short Catlike | 1000 |
4 | Claw type | Absent | 0000 |
4 | Claw type | Short Talons | 0010 |
4 | Claw type | Long Talons | 0011 |
5 | Dorsal ridges | Plates | 100000 |
5 | Dorsal ridges | Absent | 000000 |
5 | Dorsal ridges | Bumps | 010000 |
5 | Dorsal ridges | Spike | 011000 |
5 | Dorsal ridges | Ridge | 010100 |
5 | Dorsal ridges | Jagged | 010010 |
5 | Dorsal ridges | Hair | 010011 |
6 | Ear morphology | Absent | 000 |
6 | Ear morphology | Round or Small | 100 |
6 | Ear morphology | Spearlike | 010 |
6 | Ear morphology | Other | 001 |
7 | Eye morphology | Avg | 000 |
7 | Eye morphology | Large | 001 |
7 | Eye morphology | Narrow | 010 |
7 | Eye morphology | Small | 100 |
8 | Eye position | Lateral | 0 |
8 | Eye position | Forward | 1 |
9 | Horn type | Absent | 000 |
9 | Horn type | Stubs/Small | 100 |
9 | Horn type | Med/Long | 110 |
9 | Horn type | Jagged/Antlers | 111 |
10 | Nose Position | Lateral | 0 |
10 | Nose Position | Forward | 1 |
11 | Nasal morphology | Upturned | 1 |
11 | Nasal morphology | other | 0 |
12 | Skin-dorsal | Fish Scales | 000000 |
12 | Skin-dorsal | Spiny | 000100 |
12 | Skin-dorsal | Hairy | 000110 |
12 | Skin-dorsal | Plates | 000001 |
12 | Skin-dorsal | Scaly Skin | 100000 |
12 | Skin-dorsal | Rough Skin | 101000 |
12 | Skin-dorsal | Smooth Skin | 110000 |
13 | Skin-head | Fish Scales | 000000 |
13 | Skin-head | Spiny | 000100 |
13 | Skin-head | Hairy | 000110 |
13 | Skin-head | Plates | 000001 |
13 | Skin-head | Scaly Skin | 100000 |
13 | Skin-head | Rough Skin | 101000 |
13 | Skin-head | Smooth Skin | 110000 |
14 | Skin-ventral | Fish Scales | 000000 |
14 | Skin-ventral | Spiny | 000100 |
14 | Skin-ventral | Hairy | 000110 |
14 | Skin-ventral | Plates | 000001 |
14 | Skin-ventral | Scaly Skin | 100000 |
14 | Skin-ventral | Rough Skin | 101000 |
14 | Skin-ventral | Smooth Skin | 110000 |
15 | Snout type | Absent | 0000 |
15 | Snout type | Beak | 0001 |
15 | Snout type | Blunt | 1000 |
15 | Snout type | Moderate | 1100 |
15 | Snout type | Long | 1110 |
16 | Tail type | Blunt/Point | 10 |
16 | Tail type | Fan/Split Y | 00 |
16 | Tail type | Spear | 01 |
17 | Teeth | Pointy Only | 0000 |
17 | Teeth | Blunt + Pointy | 1000 |
17 | Teeth | Blunt Only | 1100 |
17 | Teeth | Fangs + Other | 0001 |
17 | Teeth | Fangs Only | 0011 |
18 | Toes-opposing | Yes | 0 |
18 | Toes-opposing | No | 1 |
19 | Toe Number | > Five | 000000 |
19 | Toe Number | Five | 100000 |
19 | Toe Number | Four | 110000 |
19 | Toe Number | Three | 111000 |
19 | Toe Number | Two | 111100 |
19 | Toe Number | One | 111110 |
19 | Toe Number | Zero | 111111 |
20 | Tongue length | Short | 0 |
20 | Tongue length | Long | 1 |
21 | Tongue morphology | Regular | 00 |
21 | Tongue morphology | Forked | 01 |
21 | Tongue morphology | Spear | 10 |
22 | Ventral plates | Yes | 1 |
22 | Ventral plates | No | 0 |
23 | Whiskers | Absent | 00 |
23 | Whiskers | Short | 10 |
23 | Whiskers | Long | 11 |
24 | Wing structure | Absent | 00 |
24 | Wing structure | Hybrid | 10 |
24 | Wing structure | Full | 11 |
25 | Wing type | Bat | 100 |
25 | Wing type | Bird | 010 |
25 | Wing type | Hybrid | 001 |
To encode a dragon, traits are first arranged by the Order column, and then the observed Phenotype for each Trait is recoded as 1s and 0s (or ? for missing values) using the corresponding Binary code. Finally, all of the 1s and 0s are combined into a single vector.
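The encoding steps can be sketched in R using a three-trait subset of the table above. The helper `encode_dragon()` and the example dragon are hypothetical; they only illustrate the lookup-and-concatenate idea and are not the code used to build DragonMatrix.nex.

```r
# Three traits copied from the coding table above
coding <- data.frame(
  Order     = c(8, 8, 20, 20, 23, 23, 23),
  Trait     = c("Eye position", "Eye position",
                "Tongue length", "Tongue length",
                "Whiskers", "Whiskers", "Whiskers"),
  Phenotype = c("Lateral", "Forward", "Short", "Long",
                "Absent", "Short", "Long"),
  Binary    = c("0", "1", "0", "1", "00", "10", "11"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the code for each trait in Order,
# then combine all codes into one vector of single characters
encode_dragon <- function(phenotypes, coding) {
  traits <- unique(coding$Trait[order(coding$Order)])
  codes <- vapply(traits, function(tr) {
    rows <- coding[coding$Trait == tr, ]
    ph <- phenotypes[[tr]]
    if (is.null(ph) || is.na(ph)) {
      # Missing score: one "?" per binary position of this trait
      strrep("?", nchar(rows$Binary[1]))
    } else {
      rows$Binary[match(ph, rows$Phenotype)]
    }
  }, character(1))
  strsplit(paste(codes, collapse = ""), "")[[1]]
}

# A made-up dragon with an unobserved tongue
dragon <- list("Eye position" = "Forward",
               "Tongue length" = NA,
               "Whiskers" = "Long")
encode_dragon(dragon, coding)
```

For this dragon the result is the four characters 1, ?, 1, 1: one position for eye position, one missing position for tongue length, and two positions for whiskers.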
The file DragonMatrix.nex contains the encoded traits, along with a few extra lines of information that specify the data in the nexus file format. Nexus files are just readable text files that follow a few formatting rules, typically with a `.nxs` or `.nex` file name extension. For more information, see the Wikipedia Entry or Christoph Champ wiki.
Open up the file in a text editor:
The penultimate line is a semicolon `;`, which specifies the end of the coded characters, followed by `End;` indicating the end of the file.
The first few lines set up the data format:
#NEXUS
begin data;
dimensions ntax=77 nchar=78;
format datatype=binary interleave=no gap=?;
matrix
Here’s a breakdown of what these first few lines do:
- `#NEXUS` identifies the file as a nexus data file
- `begin data` specifies the start of the data
- The `dimensions` line specifies the number of taxa (`ntax=77`: the dragons plus a few non-dragon taxa such as fish, snake and mammal) and characters (`nchar=78` binary trait scores)
- The `format` line notes that the traits are encoded as binary. The `gap` option specifies the symbol used for data gaps, i.e. missing values. The `interleave=no` option specifies that each line contains all of the traits for one dragon.

If this were DNA we might have 1000 or more base pairs in our sequence. In that case, we probably wouldn’t want a single line of base pairs. Instead we might break it up into smaller chunks in an `interleave` format, like this:
Dragon1 TTGTCGAGTGTGCGGCAGCTTAGGTGAATTAAGTCCGGGCAACCTTTAGT
Dragon2 CAATAGCATACTACCGTGCGAGCCAGCTTATAGGTCGTTGCAGGTTATTA
Dragon3 ATGTCATTCGCCACGAGACTTTACTAGGGTATCATGCCGAAAGGGGATGG
Dragon1 TGTCCTGTGTGGGAAGTCGTGCCAGGACGGTTACAGCCTTAGCTTGTGCG
Dragon2 AAGCGAACTGAAGCGGTTGGGAGGATAAGCTTTACACGTGCCCCACAAAG
Dragon3 AAGCGAACTGAAGCGGTTGGGAGGATAAGCTTTACACGTGCCCCACAAAG
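A program that reads an interleaved file has to stitch the chunks for each taxon back together. Here is a minimal base-R sketch of that step, using shortened versions of the first two sequences above. The object names are illustrative, and a real nexus parser handles many more cases than this.

```r
# Shortened interleaved example: two blocks, two taxa
interleaved <- c(
  "Dragon1 TTGTCGAGTG",
  "Dragon2 CAATAGCATA",
  "",
  "Dragon1 TGTCCTGTGT",
  "Dragon2 AAGCGAACTG"
)

# Drop blank lines, then split each line into taxon name and sequence chunk
kept  <- interleaved[nzchar(trimws(interleaved))]
parts <- strsplit(trimws(kept), "\\s+")
taxon <- vapply(parts, `[`, character(1), 1)
chunk <- vapply(parts, `[`, character(1), 2)

# Paste the chunks of each taxon together in the order they appear
sequences <- vapply(unique(taxon), function(tx) {
  paste(chunk[taxon == tx], collapse = "")
}, character(1))

sequences
```

The result is a named character vector with one full-length sequence per taxon, which is the layout a non-interleaved file stores directly.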
Now that we’re familiar with the file, let’s import it using the `read.nexus.data()` function from the `ape` package:
library(ape)
DragonNexus <- read.nexus.data("Data/DragonMatrix.nex")
head(DragonNexus)
## $`0.1FishXXX`
## [1] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [20] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [39] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [58] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
## [77] "0" "0"
##
## $`0.2SnakeXX`
## [1] "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "0" "1" "1"
## [20] "0" "0" "0" "0" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0"
## [39] "1" "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "1" "1" "1" "1"
## [58] "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0"
## [77] "0" "0"
##
## $`0.3MammalX`
## [1] "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0" "0" "0" "0" "0" "0"
## [20] "0" "0" "0" "1" "0" "0" "0" "1" "1" "1" "0" "0" "0" "0" "0" "1" "1" "0" "0"
## [39] "1" "1" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0"
## [58] "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "1" "0" "0"
## [77] "0" "0"
##
## $`1GermanXXX`
## [1] "0" "1" "0" "0" "1" "1" "1" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1"
## [20] "0" "0" "0" "0" "1" "0" "1" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "1"
## [39] "0" "0" "0" "0" "1" "1" "0" "1" "1" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0"
## [58] "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "1" "0" "?" "?" "?" "?" "0" "0" "1"
## [77] "1" "1"
##
## $`2FrenchXXX`
## [1] "0" "1" "0" "0" "1" "1" "0" "1" "0" "1" "1" "0" "0" "1" "1" "1" "0" "1" "0"
## [20] "0" "0" "0" "0" "0" "0" "1" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "0"
## [39] "1" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "1" "0" "0"
## [58] "0" "0" "1" "0" "0" "1" "0" "0" "0" "1" "1" "0" "0" "0" "1" "1" "0" "0" "1"
## [77] "1" "0"
##
## $`3FrenchXXX`
## [1] "0" "1" "0" "0" "1" "1" "1" "0" "0" "0" "1" "1" "0" "0" "0" "0" "0" "0" "?"
## [20] "?" "?" "1" "0" "1" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "0" "1"
## [39] "1" "1" "0" "1" "1" "1" "0" "1" "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "0"
## [58] "0" "0" "0" "1" "0" "0" "0" "1" "0" "0" "1" "0" "0" "0" "1" "1" "0" "0" "1"
## [77] "1" "1"
names(DragonNexus)
## [1] "0.1FishXXX" "0.2SnakeXX" "0.3MammalX" "1GermanXXX" "2FrenchXXX"
## [6] "3FrenchXXX" "4DutchXXXX" "5EnglishXX" "6AmericanX" "7FrenchXXX"
## [11] "8EnglishXX" "9FrenchXXX" "10FrenchXX" "11SpanishX" "12Japanese"
## [16] "13Japanese" "14Japanese" "15Japanese" "16Japanese" "17Japanese"
## [21] "18Japanese" "19Japanese" "20Japanese" "21Japanese" "22Japanese"
## [26] "23Japanese" "24Japanese" "25Japanese" "26Japanese" "27Japanese"
## [31] "28Japanese" "29Japanese" "30ItalianX" "31ItalianX" "32ItalianX"
## [36] "33XXXXXXXX" "34GermanXX" "35EnglishX" "36GermanXX" "37DutchXXX"
## [41] "38SpanishX" "39ItalianX" "40ItalianX" "41EnglishX" "42ItalianX"
## [46] "43SpanishX" "44ItalianX" "45ItalianX" "46EnglishX" "47ItalianX"
## [51] "48DutchXXX" "49IndianXX" "50Japanese" "51Japanese" "52Japanese"
## [56] "53Japanese" "54IranianX" "55IranianX" "56IranianX" "57IranianX"
## [61] "58TurkishX" "59IranianX" "60IranianX" "61TurkishX" "62TurkishX"
## [66] "63UkraineX" "64UkraineX" "65RussiaXX" "66UkraineX" "67RussiaXX"
## [71] "68GreeceXX" "69ItalianX" "70American" "71BritishX" "72BritishX"
## [76] "73BritishX" "74BritishX"
Compare the header of the nexus R object to the layout of the text-based nexus file. What is different? How does R treat the data?
Since we aren’t using DNA, we can’t use the `dist.dna()` function from `ape`. Instead, we use the more basic `dist()` function, which calculates a distance (dissimilarity) matrix from our binary traits:
DragonDistMat <- dist(DragonNexus, method='binary')
## Error in dist(DragonNexus, method = "binary"): 'list' object cannot be coerced to type 'double'
Why do we get an error?
We get an error because the `dist()` function doesn’t like the fact that our `DragonNexus` object is a `list`. Looking at the `?dist` help file tells us what kind of input the function is looking for (look at the description of the `x` argument).
How can we fix this problem?
We can convert a list object to a data.frame object fairly easily, but there is a trick: we need to `unlist` the list object to make it a vector before we can convert it to a matrix.
DragonNexusDF <- data.frame(matrix(unlist(DragonNexus), ncol=78, byrow=TRUE))
row.names(DragonNexusDF) <- names(DragonNexus)
head(DragonNexusDF)
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19
## 0.1FishXXX 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0.2SnakeXX 1 1 1 1 0 0 0 0 1 1 1 0 0 1 1 1 0 1 1
## 0.3MammalX 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
## 1GermanXXX 0 1 0 0 1 1 1 0 0 1 1 1 1 0 0 0 0 1 1
## 2FrenchXXX 0 1 0 0 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0
## 3FrenchXXX 0 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 ?
## X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36
## 0.1FishXXX 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0.2SnakeXX 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0
## 0.3MammalX 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 1
## 1GermanXXX 0 0 0 0 1 0 1 1 1 1 1 0 0 0 0 1 1
## 2FrenchXXX 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1
## 3FrenchXXX ? ? 1 0 1 0 0 1 1 1 1 0 0 0 0 1 1
## X37 X38 X39 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53
## 0.1FishXXX 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0.2SnakeXX 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
## 0.3MammalX 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0
## 1GermanXXX 0 1 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0
## 2FrenchXXX 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0
## 3FrenchXXX 0 1 1 1 0 1 1 1 0 1 0 0 0 0 0 0 0
## X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70
## 0.1FishXXX 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 0.2SnakeXX 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0
## 0.3MammalX 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0
## 1GermanXXX 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 ?
## 2FrenchXXX 1 1 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0
## 3FrenchXXX 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0
## X71 X72 X73 X74 X75 X76 X77 X78
## 0.1FishXXX 0 0 0 0 0 0 0 0
## 0.2SnakeXX 0 0 0 0 0 0 0 0
## 0.3MammalX 0 0 0 1 0 0 0 0
## 1GermanXXX ? ? ? 0 0 1 1 1
## 2FrenchXXX 0 1 1 0 0 1 1 0
## 3FrenchXXX 0 1 1 0 0 1 1 1
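The `byrow` argument matters in this conversion because `unlist()` flattens the list one taxon at a time. The toy example below, with made-up taxa `A` and `B`, shows what goes wrong without it. It also shows `do.call(rbind, ...)` as an alternative that gives the same matrix; this is a suggestion, not the approach used above.

```r
# Toy stand-in for DragonNexus: a named list of character vectors
toy <- list(
  A = c("0", "0", "1"),
  B = c("1", "1", "?")
)

# unlist() returns A's values followed by B's values
flat <- unlist(toy)

# Filling by row puts each taxon on its own row
toy_df <- data.frame(matrix(flat, ncol = 3, byrow = TRUE),
                     stringsAsFactors = FALSE)
row.names(toy_df) <- names(toy)
toy_df

# Filling by column (the default) mixes values from different taxa
wrong <- matrix(flat, ncol = 3)
wrong

# Alternative: bind the list elements together as rows
alt <- do.call(rbind, toy)
alt
```

In `wrong`, the first row reads 0, 1, 1 even though taxon `A` was scored 0, 0, 1, so it is worth checking a row or two against the list after any such conversion.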
DragonDist <- dist(DragonNexusDF, method='binary')
## Warning in dist(DragonNexusDF, method = "binary"): NAs introduced by coercion
DragonDistMat <- as.matrix(DragonDist)
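The warning appears because the `?` characters cannot be converted to numbers, so they become `NA`. The sketch below uses two made-up taxa to show how the `binary` method of `dist()` then treats the data: positions with an `NA` in either taxon are skipped, and the distance is the proportion of mismatches among positions where at least one taxon has a 1. The vectors `a` and `b` are illustrative only.

```r
# "?" cannot be read as a number, so it becomes NA (with a warning)
suppressWarnings(as.numeric(c("0", "1", "?")))

# Two toy taxa scored for six binary characters; NA marks a missing score
a <- c(1, 1, 0, 0, 1, NA)
b <- c(1, 0, 0, 1, 1, 1)

# Positions scored in both taxa where at least one of them has a 1
scored <- !is.na(a) & !is.na(b)
either <- scored & (a == 1 | b == 1)

# Of those, the positions where only one of the two has a 1
mismatch <- either & (a != b)
by_hand <- sum(mismatch) / sum(either)

# The same pair run through dist()
from_dist <- as.numeric(dist(rbind(a, b), method = "binary"))

c(by_hand = by_hand, from_dist = from_dist)
```

Both values are 0.5 here: 2 mismatches out of 4 informative positions. Shared 0s carry no weight under this method, so a taxon coded entirely as 0 ends up at distance 1 from any taxon that has at least one 1.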
To visualize the matrix in `ggplot`, we need to rearrange the data from an \(n \times n\) matrix to an \(n^2 \times 3\) matrix (i.e. a linear or 'long' layout with one row per pair of taxa). This is easily done with the `melt` function from the `reshape2` library.
library(reshape2)
PDat<-melt(DragonDistMat)
Let’s look at the difference in dimension (structural layout) of the two data objects.
dim(DragonDistMat)
## [1] 77 77
head(DragonDistMat)
## 0.1FishXXX 0.2SnakeXX 0.3MammalX 1GermanXXX 2FrenchXXX 3FrenchXXX
## 0.1FishXXX 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 1 0.0000000 0.7428571 0.6744186 0.6250000 0.7555556
## 0.3MammalX 1 0.7428571 0.0000000 0.6578947 0.7105263 0.6216216
## 1GermanXXX 1 0.6744186 0.6578947 0.0000000 0.5000000 0.2571429
## 2FrenchXXX 1 0.6250000 0.7105263 0.5000000 0.0000000 0.5238095
## 3FrenchXXX 1 0.7555556 0.6216216 0.2571429 0.5238095 0.0000000
## 4DutchXXXX 5EnglishXX 6AmericanX 7FrenchXXX 8EnglishXX 9FrenchXXX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.7435897 0.6829268 0.3870968 0.6888889 0.7948718 0.7368421
## 0.3MammalX 0.7142857 0.7027027 0.7777778 0.7073171 0.9210526 0.6250000
## 1GermanXXX 0.5750000 0.3611111 0.6829268 0.4761905 0.7000000 0.5789474
## 2FrenchXXX 0.3750000 0.4736842 0.6904762 0.4390244 0.6666667 0.4857143
## 3FrenchXXX 0.5135135 0.3055556 0.6666667 0.4883721 0.6829268 0.4166667
## 10FrenchXX 11SpanishX 12Japanese 13Japanese 14Japanese 15Japanese
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6052632 0.7272727 0.6756757 0.6538462 0.4871795 0.4571429
## 0.3MammalX 0.6756757 0.6315789 0.5937500 0.5238095 0.7073171 0.7368421
## 1GermanXXX 0.4871795 0.3243243 0.6097561 0.5000000 0.5681818 0.6279070
## 2FrenchXXX 0.5128205 0.3947368 0.5945946 0.3809524 0.5909091 0.6428571
## 3FrenchXXX 0.4102564 0.2222222 0.6428571 0.4166667 0.5957447 0.6739130
## 16Japanese 17Japanese 18Japanese 19Japanese 20Japanese 21Japanese
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6590909 0.6904762 0.5925926 0.4722222 0.6470588 0.6585366
## 0.3MammalX 0.6052632 0.6923077 0.6400000 0.7435897 0.6000000 0.7105263
## 1GermanXXX 0.5333333 0.5348837 0.6363636 0.6046512 0.5405405 0.5909091
## 2FrenchXXX 0.6222222 0.6190476 0.5483871 0.6190476 0.6388889 0.6511628
## 3FrenchXXX 0.5454545 0.5909091 0.6060606 0.6222222 0.6153846 0.6590909
## 22Japanese 23Japanese 24Japanese 25Japanese 26Japanese 27Japanese
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.7000000 0.7073171 0.6842105 0.6428571 0.6315789 0.6666667
## 0.3MammalX 0.7027027 0.6756757 0.6857143 0.6750000 0.6756757 0.6842105
## 1GermanXXX 0.5365854 0.5609756 0.5500000 0.5869565 0.5581395 0.5250000
## 2FrenchXXX 0.6250000 0.5853659 0.6410256 0.5714286 0.5500000 0.6136364
## 3FrenchXXX 0.5952381 0.6000000 0.6097561 0.6000000 0.5365854 0.6000000
## 28Japanese 29Japanese 30ItalianX 31ItalianX 32ItalianX 33XXXXXXXX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.7021277 0.6829268 0.6666667 0.7073171 0.6000000 0.7500000
## 0.3MammalX 0.7500000 0.7027027 0.6666667 0.5757576 0.6486486 0.7894737
## 1GermanXXX 0.5555556 0.5952381 0.4871795 0.3611111 0.3947368 0.4444444
## 2FrenchXXX 0.5957447 0.5609756 0.4166667 0.5128205 0.4750000 0.4285714
## 3FrenchXXX 0.6041667 0.6136364 0.4615385 0.2857143 0.3243243 0.4054054
## 34GermanXX 35EnglishX 36GermanXX 37DutchXXX 38SpanishX 39ItalianX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6666667 0.7446809 0.7826087 0.6739130 0.6666667 0.6829268
## 0.3MammalX 0.7142857 0.8222222 0.6842105 0.7209302 0.6829268 0.7027027
## 1GermanXXX 0.4146341 0.5116279 0.4750000 0.3902439 0.4523810 0.6046512
## 2FrenchXXX 0.4523810 0.4761905 0.4358974 0.5000000 0.4146341 0.4473684
## 3FrenchXXX 0.3902439 0.5555556 0.3684211 0.2564103 0.3500000 0.5714286
## 40ItalianX 41EnglishX 42ItalianX 43SpanishX 44ItalianX 45ItalianX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6976744 0.7567568 0.7209302 0.7045455 0.6857143 0.6363636
## 0.3MammalX 0.6486486 0.6666667 0.6756757 0.6578947 0.6129032 0.6829268
## 1GermanXXX 0.3947368 0.5000000 0.3333333 0.4102564 0.4411765 0.3750000
## 2FrenchXXX 0.4750000 0.4545455 0.4210526 0.4500000 0.4411765 0.4883721
## 3FrenchXXX 0.3243243 0.4411765 0.2777778 0.3846154 0.4117647 0.2631579
## 46EnglishX 47ItalianX 48DutchXXX 49IndianXX 50Japanese 51Japanese
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.7179487 0.7333333 0.7272727 0.5945946 0.6578947 0.6595745
## 0.3MammalX 0.6060606 0.6923077 0.6842105 0.6060606 0.6571429 0.7446809
## 1GermanXXX 0.5263158 0.4500000 0.5238095 0.5952381 0.5384615 0.4666667
## 2FrenchXXX 0.4722222 0.4500000 0.3947368 0.5526316 0.5263158 0.3414634
## 3FrenchXXX 0.3888889 0.3421053 0.4250000 0.6000000 0.5714286 0.4222222
## 52Japanese 53Japanese 54IranianX 55IranianX 56IranianX 57IranianX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6176471 0.5862069 0.7073171 0.5789474 0.6875000 0.6511628
## 0.3MammalX 0.5666667 0.6206897 0.6756757 0.6111111 0.7333333 0.6315789
## 1GermanXXX 0.4722222 0.5714286 0.4736842 0.6046512 0.5531915 0.6000000
## 2FrenchXXX 0.5555556 0.4193548 0.4473684 0.5250000 0.4186047 0.5000000
## 3FrenchXXX 0.6000000 0.4857143 0.3421053 0.5238095 0.4666667 0.5777778
## 58TurkishX 59IranianX 60IranianX 61TurkishX 62TurkishX 63UkraineX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.4750000 0.6923077 0.6585366 0.6410256 0.5750000 0.6744186
## 0.3MammalX 0.6410256 0.7500000 0.6388889 0.6571429 0.6578947 0.5833333
## 1GermanXXX 0.6170213 0.6976744 0.6666667 0.6363636 0.5454545 0.5238095
## 2FrenchXXX 0.6086957 0.6341463 0.6363636 0.6000000 0.5365854 0.6222222
## 3FrenchXXX 0.5744681 0.6744186 0.5476190 0.6097561 0.5476190 0.5681818
## 64UkraineX 65RussiaXX 66UkraineX 67RussiaXX 68GreeceXX 69ItalianX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6388889 0.6000000 0.7000000 0.5862069 0.6976744 0.7777778
## 0.3MammalX 0.5483871 0.6111111 0.6666667 0.5384615 0.6842105 0.7750000
## 1GermanXXX 0.6250000 0.5813953 0.4324324 0.5588235 0.5813953 0.6046512
## 2FrenchXXX 0.5789474 0.4358974 0.5384615 0.4516129 0.2571429 0.4615385
## 3FrenchXXX 0.5128205 0.5000000 0.3333333 0.5294118 0.4634146 0.5238095
## 70American 71BritishX 72BritishX 73BritishX 74BritishX
## 0.1FishXXX 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 0.2SnakeXX 0.6388889 0.7441860 0.6666667 0.6744186 0.7021277
## 0.3MammalX 0.5937500 0.7692308 0.8000000 0.7250000 0.7500000
## 1GermanXXX 0.4722222 0.4358974 0.5744681 0.5116279 0.4888889
## 2FrenchXXX 0.4571429 0.5000000 0.6000000 0.5000000 0.4761905
## 3FrenchXXX 0.4736842 0.4871795 0.5116279 0.4750000 0.4523810
dim(PDat)
## [1] 5929 3
head(PDat)
## Var1 Var2 value
## 1 0.1FishXXX 0.1FishXXX 0
## 2 0.2SnakeXX 0.1FishXXX 1
## 3 0.3MammalX 0.1FishXXX 1
## 4 1GermanXXX 0.1FishXXX 1
## 5 2FrenchXXX 0.1FishXXX 1
## 6 3FrenchXXX 0.1FishXXX 1
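The reshaping step can be sketched in base R to show what is happening under the hood. This is only an illustration on a hypothetical 3 × 3 matrix (m and long are made-up names), not the reshape2 implementation, but it produces the same three-column layout with Var1 cycling fastest:

```r
# A small, made-up symmetric distance matrix
m <- matrix(c(0,   0.5,  1,
              0.5, 0,    0.25,
              1,   0.25, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# One row per cell: row label, column label, and the cell value
long <- data.frame(Var1  = rownames(m)[row(m)],
                   Var2  = colnames(m)[col(m)],
                   value = as.vector(m))

dim(m)     # n x n
dim(long)  # n^2 x 3
head(long, 3)
```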
Now let’s plot. Note: Here is a good resource from the developers of ggtree. This GitHub link includes details on how to make a visually appealing tree.
library(ggplot2)
ggplot(data = PDat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()+scale_fill_gradientn(colours=c("white","blue","green","red")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Looks like we have a nice range of values to try to cluster by distance.
Now that we have a distance matrix, let’s try building a phylogeny using the neighbour-joining (NJ) method.
DragonTree<-nj(DragonDist)
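To get a feel for what neighbour-joining does, here is a sketch of its first step only, on a hypothetical five-taxon distance matrix (the values and the names d, Q and joined are assumptions for illustration). NJ picks the pair with the smallest value of \(Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)\), joins it into a new node, and repeats; the nj function handles all of the iterations for us:

```r
# Hypothetical distances among five taxa
d <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

n <- nrow(d)
r <- rowSums(d)                       # total distance from each taxon to all others
Q <- (n - 2) * d - outer(r, r, "+")   # NJ selection criterion for every pair
diag(Q) <- NA                         # a taxon cannot be joined with itself

# The pair with the smallest Q is joined first
pair <- which(Q == min(Q, na.rm = TRUE), arr.ind = TRUE)[1, ]
joined <- rownames(d)[pair]
joined
```

Note that d and e have the smallest raw distance (3), yet a and b are joined first because Q also accounts for how far each taxon is from everything else.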
Now let’s draw the tree using the ggtree library, comparing a circular and a rectangular layout:
devtools::install_github("hadley/devtools")
library(ggtree)
ggtree(DragonTree,layout="circular")
ggtree(DragonTree,layout="rectangular")
Woah, what’s going on here?
This tree has some problems. The branches are very long relative to the bifurcations among groups. It is almost as if the characters are all mixed up. This is what we might expect if dragons were created from active imaginations and didn’t really evolve from each other.
Another reason is that we are treating all traits the same. For example, we treat snout length the same as limb number. However, we might argue that limb number evolves more slowly than snout length, so that dragons with the same number of limbs are more likely to be related than dragons with similar snout lengths. We can account for this by weighting our traits. Weights are just numbers that we multiply (or sometimes add) to our data so that traits with higher weights have a stronger influence on our clustering algorithm.
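A minimal sketch of the idea, using three made-up specimens and two traits (the trait names, values and weights here are assumptions chosen for illustration):

```r
# Hypothetical specimens: x and y share limbs, x and z share snout type
toy <- rbind(x = c(limbs = 1, snout = 0),
             y = c(limbs = 1, snout = 1),
             z = c(limbs = 0, snout = 0))

# Hypothetical weights: limbs count ten times as much as snout
w <- c(limbs = 10, snout = 1)

unweighted <- as.matrix(dist(toy))
weighted   <- as.matrix(dist(sweep(toy, 2, w, "*")))  # multiply each column by its weight

unweighted["x", c("y", "z")]  # x is equally distant from y and z
weighted["x", c("y", "z")]    # after weighting, x is much closer to y
```

Without weights, x is equally distant from y and z. With weights, the specimens that share the slowly evolving trait end up closest together.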
The Weights.csv data table has a set of weights that were used for the Dragon Phylogeny t-shirt design. Let’s take a look:
WeightsDat<-read.csv("Data/Weights.csv")
Code | Origin | Weight | Rationale |
---|---|---|---|
1 | Appendages | ZZZZ | Highly conserved (all tetrapods came from the same fish); loss of limbs more common (e.g. lizards to snakes) |
2 | Mass | 1111 | Weak: e.g. squirrel vs. Kangaroo |
3 | Body Type | CC | Body type somewhat conserved (e.g. snakes vs. lizards) |
4 | ClawType | 5555 | Somewhat Strong: e.g. mammals vs. eagle |
5 | Dorsal Ridges | 111111 | Weak: e.g. dinosaurs |
6 | Ear Morphology | 111 | Weak: e.g. deer mouse vs. vole |
7 | Eye Morphology | 111 | Weak: e.g. nocturnal vs. diurnal and eye size |
8 | Eye Position | 1 | Weak: e.g. predators vs. prey |
9 | HornType | 111 | Weak: e.g. gazelle vs. deer |
10 | NosePos | 7 | Moderately conserved e.g. amphibians vs. fish |
11 | Nasal Morphology | 1 | Weak: e.g. dog breeds |
12 | Skin-dorsal | 999999 | Somewhat conserved: e.g. reptiles vs. mammals |
13 | Skin-head | 333333 | Somewhat conserved: e.g. reptiles vs. mammals |
14 | Skin-ventral | 333333 | Somewhat conserved: e.g. reptiles vs. mammals |
15 | SnoutType | 3333 | Moderate: e.g. mouse vs. rhinoceros |
16 | TailType | 11 | Weak: e.g. dinosaurs |
17 | Teeth | 3333 | Moderately conserved e.g. herbivore vs. carnivore mammals |
18 | OppToes | 1 | Weak: e.g. chimps vs. humans |
19 | Toe Number | 333333 | Moderate: e.g. horse vs. lion |
20 | Tongue Length | 1 | Weak: e.g. anteater vs. primate |
21 | Tongue Morphology Type | 44 | Moderate: e.g. snake vs. lizard |
22 | Ventral Plates | 7 | Somewhat conserved: e.g. snakes vs. mammals |
23 | Wiskers | 11 | Weak: e.g. catfish vs. bass |
24 | Wing Structure | JJ | Somewhat highly conserved; single origin in bats and birds |
25 | WingType | AAA | Strong: e.g. bat vs. bird |
Weights are on a scale from 1 through 9 and then A (A=10) through Z (Z=35). The number of characters in each weight corresponds to the number of binary values for that trait, and the traits are ordered in the same way they were encoded. There are programs we could use to calculate distance using these as weights (e.g. BEAST2). However, we’ll do it manually to see how it works. All we need to do is multiply each binary value by its weight. So the first step is to import the weights and extract the weights column. However, that’s going to take a few steps:
First, collapse the weights column into a single vector of characters. Easy:
Weights<-paste0(WeightsDat$Weight,collapse="")
Weights<-strsplit(Weights,split="")[[1]]
We could encode every single letter individually, or we can use a custom function with the built-in LETTERS object:
LETTERS # See what LETTERS is (see also letters)
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
which(LETTERS=="G")
## [1] 7
WeightsNum<-rep(NA,length(Weights))
for(i in 1:length(WeightsNum)){
  if(Weights[i] %in% LETTERS){
    WeightsNum[i]<-which(LETTERS==Weights[i])+9 # A=10 ... Z=35
  } else {
    WeightsNum[i]<-Weights[i] # digits are kept as they are
  }
}
WeightsNum<-as.numeric(WeightsNum)
Now we have a vector of weights, which should have the same length as the number of 1s and 0s (i.e. 78 ‘characters’):
length(WeightsNum)
## [1] 78
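As an alternative sketch, the same letter-to-number conversion can be written without a loop using match. The short weight string below is made up for illustration, and w, idx and num are hypothetical names:

```r
# A made-up weight string mixing digits and letters
w <- strsplit("ZZ19CA", split = "")[[1]]

# Position of each character in LETTERS (NA for digits)
idx <- match(w, LETTERS)

# Letters become position + 9 (A=10 ... Z=35); digits are converted directly.
# suppressWarnings() hides the coercion warning that as.numeric() gives for letters.
num <- ifelse(is.na(idx), suppressWarnings(as.numeric(w)), idx + 9)
num
```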
This is complicated by the fact that our data include missing data coded as ?, so all of our characters are stored as strings. Since \(0 \times x = 0\) and \(? \times x\) is undefined, we really just need to multiply the 1s, which is equivalent to replacing the 1s with their corresponding weight values. To do this, we also have to slice our list object using the double brackets [[]].
WtDragonNexus<-DragonNexus # Make a new weighted copy of the list
for (i in 1:length(DragonNexus)){
  RepWeight<-DragonNexus[[i]]==1 # TRUE where the character is scored 1
  WtDragonNexus[[i]][RepWeight]<-WeightsNum[RepWeight]
  RepWeight<-NA
}
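Here is the same replacement on a tiny made-up example so that the result is easy to inspect (toyNexus and toyWeights are hypothetical stand-ins for DragonNexus and WeightsNum). This sketch uses lapply instead of a for loop, but the logic is the same:

```r
# Two hypothetical specimens with four characters each, stored as strings
toyNexus <- list(d1 = c("1", "0", "?", "1"),
                 d2 = c("0", "1", "1", "?"))
toyWeights <- c(35, 1, 5, 10)

wt <- lapply(toyNexus, function(chars) {
  on <- chars == "1"              # TRUE where the character is scored 1
  chars[on] <- toyWeights[on]     # replace each 1 with its weight
  chars                           # 0s and ?s are left untouched
})

wt$d1
wt$d2
```

The weights are still stored as strings at this point, which is why the ? values survive until dist coerces them to NA.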
We just need to convert the weighted list to a data frame and recalculate the distance matrix:
WtDragonNexusDF<-data.frame(matrix(unlist(WtDragonNexus),ncol=78,byrow=TRUE))
row.names(WtDragonNexusDF)<-names(WtDragonNexus)
WtDragonDist<-dist(WtDragonNexusDF,method='euclidean')
## Warning in dist(WtDragonNexusDF, method = "euclidean"): NAs introduced by
## coercion
WtDragonDistMat<-as.matrix(WtDragonDist)
Note the change in method from binary to euclidean… why?
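One way to explore this question is to run both methods on a small weighted example (the toy matrix below is made up for illustration):

```r
# Three hypothetical specimens with weighted characters (weights 35, 1 and 1)
toy <- rbind(a = c(35, 0, 1),
             b = c( 0, 0, 1),
             c = c(35, 0, 0))

bin <- as.matrix(dist(toy, method = "binary"))
euc <- as.matrix(dist(toy, method = "euclidean"))

bin["a", c("b", "c")]  # binary treats any non-zero value as 'on'
euc["a", c("b", "c")]  # euclidean uses the actual magnitudes
```

With the binary method, a is equally distant from b and c because the size of each weight is ignored. With the euclidean method, the heavily weighted first character dominates.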
Compare the new distance matrix to the older one above. Note the much stronger structure:
WtPDat<-melt(WtDragonDistMat)
ggplot(data = WtPDat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()+scale_fill_gradientn(colours=c("white","blue","green","red")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Let’s compare the minimum evolution (ME) and neighbour-joining (NJ) methods:
WtDragonTree<-fastme.bal(WtDragonDist)
WtDragonTreeNJ<-nj(WtDragonDist)
ggtree(WtDragonTree,layout="circular")
ggtree(WtDragonTreeNJ,layout="circular")
Let’s try to make it look a bit better. To do this, we need to understand the data format
str(WtDragonTree)
## List of 4
## $ edge : int [1:151, 1:2] 78 78 79 80 81 81 82 83 83 82 ...
## $ edge.length: num [1:151] 25.6 5.88 9.8 1.63 12.13 ...
## $ tip.label : chr [1:77] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" ...
## $ Nnode : int 75
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
Note that this is a list object with phylo class and cladewise order. We can also see four ‘slices’ (denoted with $). The tip.label slice contains the specimen labels. The edge slice contains all of the line segments.
Q: Why are there 151 edges if we only have 77 dragons?
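The edges include lines that connect clades, not just the lines leading to each tip. An unrooted, fully bifurcating tree with \(n\) tips has \(n-2\) internal nodes and \(2n-3\) edges, so 77 tips give 75 internal nodes (Nnode) and 151 edges.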
The tip labels contain information about the origin of each dragon image (‘Fish’, ‘Snake’ and ‘Mammal’ were added later as outgroups).
head(WtDragonTree$tip.label)
## [1] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" "37DutchXXX"
## [6] "45ItalianX"
We can use this to colour-code our tree and see if dragons from the same regions cluster together. We can use regular expressions to parse out a vector:
Country<-gsub("[0-9\\.]+([^X]+)X*","\\1",WtDragonTree$tip.label) # Remove leading numbers and trailing X padding
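To check what this regular expression does, we can run it on a few labels taken from the tip labels shown earlier (the object names labs and out are just for this sketch):

```r
# A few tip labels from the dragon data
labs <- c("0.1FishXXX", "12Japanese", "70American", "33XXXXXXXX")

# [0-9\\.]+ matches the leading reference number,
# ([^X]+)   captures the origin (everything up to the first X),
# X*        matches the padding at the end
out <- gsub("[0-9\\.]+([^X]+)X*", "\\1", labs)
out
```

The label with no origin (33XXXXXXXX) is reduced to "3", because the capture group needs at least one non-X character and takes the last digit. This is where the "3" level in the grouping factor comes from.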
Next we have to group tip.labels by their corresponding country. There is a nice function in R called split that makes this easy to do:
CountryGroups<-split(WtDragonTree$tip.label, Country)
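A small sketch of what split returns, using four labels and a hand-typed grouping vector (tips, country and groups are hypothetical names for this example):

```r
tips    <- c("1GermanXXX", "2FrenchXXX", "3FrenchXXX", "34GermanXX")
country <- c("German", "French", "French", "German")

# split() returns a named list with one element per group
groups <- split(tips, country)
names(groups)
groups$French
```

This named list, with one vector of tip labels per group, is the kind of object passed to groupOTU in the next step.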
Now we use the groupOTU function to apply the grouping information for plotting:
WtDTcol<-groupOTU(WtDragonTree,CountryGroups)
str(WtDTcol)
## List of 4
## $ edge : int [1:151, 1:2] 78 78 79 80 81 81 82 83 83 82 ...
## $ edge.length: num [1:151] 25.6 5.88 9.8 1.63 12.13 ...
## $ tip.label : chr [1:77] "0.1FishXXX" "8EnglishXX" "32ItalianX" "73BritishX" ...
## $ Nnode : int 75
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
## - attr(*, "group")= Factor w/ 19 levels "3","American",..: 6 5 12 3 4 12 8 17 3 11 ...
Notice how there is a new group attribute, which is a factor containing our country groups.
ggtree(WtDTcol,layout="circular",aes(colour=group))+geom_tiplab(size=2,aes(angle=angle))
What might we infer from this figure?
As an alternative to colouring by region, we might want to point out a few clades (i.e. groups of dragons that cluster together). For example, it looks like the outer node
WtDTclade<-groupClade(WtDragonTree,.node=c(142,128,103,90,80))
ggtree(WtDTclade,layout="circular",aes(colour=group)) +
geom_cladelabel(node=142,label="Serpentidae",hjust=0.5,offset.text=4,fontsize=3,angle=-45) +
geom_cladelabel(node=128,label="Wyvernidae",hjust=0.5,offset.text=4,fontsize=3,angle=15) +
geom_cladelabel(node=103,label="Orientalia",hjust=0.5,offset.text=4,fontsize=3,angle=40) +
geom_cladelabel(node=90,label="Dracopteronidae",hjust=0.5,offset.text=4,fontsize=3,angle=-55) +
geom_cladelabel(node=80,label="Dracoverisidae",hjust=0.5,offset.text=6,fontsize=3,angle=55) +
xlim(NA,60)
NOTE: To find these nodes, we can use + geom_nodelab to plot the node number on top of each node.
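Another possible approach, sketched here on a tiny made-up tree, is to ask for the node number directly with the getMRCA function from the ape package (the same package that provides nj and fastme.bal). It returns the number of the most recent common ancestor of a set of tips. The tree and tip names below are assumptions for illustration:

```r
library(ape)

# A hypothetical four-tip tree written in Newick format
toyTree <- read.tree(text = "((a,b),(c,d));")

# Node number of the clade containing tips a and b
cladeNode <- getMRCA(toyTree, c("a", "b"))
cladeNode
```

With the dragon tree, the second argument would be a vector of tip labels from the clade of interest, and the result could be passed to groupClade or geom_cladelabel.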
(OPTIONAL) You can try overlaying your phylogeny on a geographical map: https://www.molecularecologist.com/2014/11/geophylogeny-plots-in-r-for-dummies/
To do this, you would need to find latitude/longitude coordinates. An easy way to do this is to find a location in Google Maps. When you right-click you will see the longitude and latitude coordinates.