Schedule

Time Activity
07:30 Breakfast
08:30 Introduction & R Tutorials
11:00 Field Activity
12:00 Lunch
13:00 Field Activity
14:00 Group presentations (4x10 min)
14:40 Methods & Applications
15:30 Transcriptomics Rapid Tutorial
16:00 Metagenomics Rapid Tutorial
16:25 Nanopore Sequencing Demo & Wrap-up

Preparation

  1. Install R
  2. Install RStudio (the free ‘Desktop’ version)
  3. Open RStudio and locate the ‘Console’ panel
  4. In the console, type install.packages("___"), replacing ___ with the name of the package to install. Repeat for each of these packages:
    • tidyverse
    • knitr
    • vcfR
    • pinfsc50
    • rmarkdown
    • ggplot2
    • vegan
    • RAM
    • lmtest
    • car
    • coefplot
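
As an alternative to repeating step 4 once per package, the whole list can be installed in one call. The following is a minimal sketch, not an official setup script: the variable names pkgs and missing_pkgs are illustrative, and it assumes all of the packages are available from your default CRAN mirror.

```r
# Packages listed in step 4
pkgs <- c("tidyverse", "knitr", "vcfR", "pinfsc50", "rmarkdown",
          "ggplot2", "vegan", "RAM", "lmtest", "car", "coefplot")

# Keep only the packages that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install the missing packages, but only from an interactive session
# such as the RStudio console (nothing happens if none are missing)
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Paste the block into the RStudio console. Running it a second time should skip anything that installed successfully the first time.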

Photo sharing

Upload your field photos.

Download photos from other participants.

The password is: LSETAC-QUBS2018

QUBS

Introduction

Code of Conduct

In this course we follow a code of conduct based on that of The Carpentries.

It can be summarized by two main rules:

  1. Be professional
  2. Be inclusive

Zero-tolerance policy: Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent. Prohibited harassing behaviour includes but is not limited to:

  • written or verbal comments which have the effect of excluding people on the basis of membership of a specific group (ethnicity, religious belief, gender, sexual orientation, etc.)
  • causing someone to fear for their safety, such as through stalking, following, or intimidation
  • the display of sexual or violent images
  • unwelcome sexual attention
  • nonconsensual or unwelcome physical contact
  • sustained disruption of talks, events or communications
  • incitement to violence, suicide, or self-harm
  • continuing to initiate interaction (including photography or recording) with someone after being asked to stop
  • publication of private communication without consent

The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate.

Programming Crash-Course

Introduction to R

  1. R Fundamentals
  2. Basic visualizations with qplot()
  3. R Markdown
  4. Advanced visualizations with ggplot()
  5. Regular expressions

Introduction to Python

  1. Python Fundamentals
  2. Analysis pipelines with Python & Snakemake

Linux/Command Line

  1. Introduction to high-performance computing
  2. Version Control with Git & GitHub

Applications Tutorials

  1. Transcriptome analysis tutorial
  2. Community ordination – classic plant sampling tutorial
  3. Community ordination – Operational taxonomic units (OTUs) from metabarcoding tutorial

Guides to Reproducible Science

From the British Ecological Society

  1. Data Management
  2. Reproducible Code
  3. Getting Published
  4. Peer Review

Key points

  • R can be a powerful interface for
    1. Managing data
    2. Analyzing data
    3. Visualizing data
    4. Generating dynamic reports
  • R can be slow or unusable for larger datasets (>> 1 Mb)
  • Unix/Linux/Command Line programming is useful for working with large datasets

FIRST Generation Sequencing

Classic sequencing uses the Sanger method (Wikipedia).

Step 1. Extract and Purify DNA

Step 2. Select a single target (e.g. by PCR)

Step 3. Dye-terminator PCR

Step 4. Visualize on a gel

Protocol

Read through our Quick Extraction & PCR Protocol

Draw a flowchart outlining the major steps.

Why is PCR necessary for Sanger Sequencing?

We don’t have time to run through the protocol and sequence samples. However, an undergraduate field course did this at QUBS a couple of weeks ago, so we can analyze their data. DNA was extracted from locally collected plants and used for DNA barcoding.

Sanger sequencing analysis tutorial

Try running through the tutorial with some of the other sequences. What species were found?

Key points

  • Sanger sequencing is the ‘classic’ sequencing method
  • A gel can read only one target sequence at a time
  • DNA barcodes can be used to identify species
  • R can be a powerful interface for visualizing & analyzing sequencing data

SECOND Generation

Typical Workflow

Illumina Sequencing Overview

2nd Gen Presentation

Key points

  • There are several flavours of ‘next generation’ sequencing (NGS)
  • NGS platforms sequence many fragments simultaneously, unlike the Sanger method
  • Sequencing technology is evolving faster than Moore’s Law of computation
  • Dealing with billions to trillions of base pairs of data is not trivial
  • Bioinformatics is usually the bottleneck and main cost of a project involving NGS

THIRD Generation

Nanopore MinION metagenomics

What’s in My Pot?

  • Inspect the data on the One Codex website
  • Can be run in ‘real time’ (i.e. BLAST results appear as reads are sequenced)

Sequencing Comparison Table

Platform         Instrument               Mreads  Length (bp)  Gbp      Type
Illumina         NovaSeq 6000 S4          10,000  300          3000.00  SR or PE
Illumina         NovaSeq 6000 S3          6,600   300          1980.00  SR or PE
Illumina         NovaSeq 5000/6000 S2     3,300   300          990.00   SR or PE
Illumina         NovaSeq 5000/6000 S1     1,600   300          480.00   SR or PE
Illumina         NextSeq 500 High-Output  400     300          120.00   SR or PE
Illumina         HiSeq X                  375     300          113.00   PE
Illumina         HiSeq 3000/4000          313     300          93.80    SR or PE
Illumina         NextSeq 500 Mid-Output   130     300          39.00    PE
Illumina         HiSeq High-Output v4     250     250          62.50    SR or PE
Illumina         HiSeq High-Output v3     186     250          46.50    SR or PE
Illumina         HiSeq Rapid Run v4       150     500          75.00    SR or PE
Illumina         HiSeq Rapid Run          151     300          45.20    SR or PE
Illumina         HiScanSQ                 93      200          18.60    SR or PE
Illumina         GAIIx                    42      300          12.60    SR or PE
Illumina         MiSeq v3                 25      600          15.00    SR or PE
Illumina         MiniSeq High-Output      25      300          7.50     SR or PE
Illumina         MiSeq v2                 16      250          4.00     SR or PE
Illumina         MiniSeq Mid-Output       8       300          2.40     SR or PE
Illumina         MiSeq v2 Micro           4       300          1.20     SR or PE
Illumina         MiSeq v2 Nano            1       500          0.50     SR or PE
Ion              Proton I                 60      200          12.00    SR
Ion              PGM 318                  4       400          1.60     SR
Ion              PGM 316                  2       400          0.80     SR
Ion              PGM 314                  0.4     400          0.16     SR
Roche 454        GS FLX+ / FLX            1       700          0.49     SR
Roche 454        GS FLX+ / FLX            0.35    700          0.24     SR
Roche 454        GS FLX+ / FLX            0.13    700          0.09     SR
Roche 454        GS FLX+ / FLX            0.05    700          0.04     SR
Roche 454        GS FLX+ / FLX            0.02    700          0.01     SR
Roche 454        GS FLX+ / FLX            0.07    400          0.03     SR
SOLiD            5500xl W                 267     100          26.70    SR or PE
SOLiD            5500 W                   267     100          26.70    SR or PE
SOLiD            5500                     82      100          8.15     SR or PE
SOLiD            5500xl                   82      100          8.15     SR or PE
PacBio           PacBio Sequel            0.37    20,000       7.40     SR
PacBio           PacBio RS II (P6)        0.06    15,000       0.80     SR
Oxford Nanopore  MinION                   ??      2,000,000    20.00    SR
Oxford Nanopore  PromethION               ??      2,000,000    1000.00  SR
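
The Gbp column appears to be the number of reads (in millions) multiplied by the read length (in bp) and divided by 1000; most rows agree with this after rounding. The sketch below checks that relationship in R for three rows copied from the table. The function name output_gbp and the data frame runs are illustrative only, not part of any package.

```r
# Estimated output in Gbp from reads (millions) and read length (bp).
# Millions of reads x bp gives Mbp; dividing by 1000 converts to Gbp.
output_gbp <- function(mreads, length_bp) {
  mreads * length_bp / 1000
}

# Three rows copied from the comparison table
runs <- data.frame(
  instrument = c("NovaSeq 6000 S4", "MiSeq v3", "PacBio Sequel"),
  mreads     = c(10000, 25, 0.37),
  length_bp  = c(300, 600, 20000)
)

# Add the computed output and compare it with the Gbp column by eye
runs$gbp <- output_gbp(runs$mreads, runs$length_bp)
print(runs)
```

The same function can be applied to any other row to see how read count and read length trade off against total output.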