To run Python commands, you need an interpreter. Like R, this is typically already installed on new Mac and Linux/Unix systems. If you are using Windows, you need to install the interpreter. In addition to the Python interpreter, you may also want to install an ‘Integrated Development Environment’ or IDE. RStudio is an example of a nice IDE for R. Spyder is a decent IDE for Python.
The Anaconda package will install Python (choose Python 3) and some nice tools including Spyder (like RStudio for Python) and your own local version of Jupyter Notebooks.
Jupyter Notebooks is a simple and handy way to learn and practice Python syntax and share scripts. If you have a CAC account you can access Jupyter Notebooks ‘in the cloud’ through the same portal as RStudio Server:
Make comments inside your code. Very important (unless you are using Jupyter notebooks)!
# Use hashtags to make comments - not read by the Python interpreter
# Use other characters and blank lines to improve readability:
# -------------------------
# My first Python script
# Today's Date
# -------------------------
# Add a summary description of what the script does
# This script will...
# And annotate individual parts of the script
print()
Use the print() function to output objects to the screen
print("string")
## string
print('string')
## string
print('''
use 3 quotation marks
to make a string
that can span
multiple
lines
''')
##
## use 3 quotation marks
## to make a string
## that can span
## multiple
## lines
print(10+2) # add
## 12
print(10-2) # subtract
## 8
print(10*2) # multiply
## 20
print(10/2) # divide
## 5.0
# What about exponent?
print(10^2)
## 8
The ^ operator is exclusive-or (bitwise XOR), not exponentiation. The exponent operator in Python is **:
print(10**2) # exponent
## 100
What is bitwise XOR?
1. Convert each number to binary notation
2. For each pair of bits in the same position:
a. 0 if bits are the same (i.e. both 1 or both 0)
b. 1 if bits are different
For example, in binary notation: 0 = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, etc.
print(0^0)
## 0
print(0^1)
## 1
print(1^0)
## 1
print(1^1)
## 0
print(2^2)
## 0
print(3^5)
## 6
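To make the bit-by-bit comparison concrete, here is a small sketch that prints the binary form of each number next to the result. The variable names are only illustrative, and format(n, "03b") is used simply to pad each number to three binary digits.

```python
# Worked example: 3 ^ 5, compared bit by bit
a, b = 3, 5
xor_result = a ^ b

print(format(a, "03b"))           # 011
print(format(b, "03b"))           # 101
print(format(xor_result, "03b"))  # 110 (1 wherever the two bits differ)
print(xor_result)                 # 6
```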
print(11//3) # floor (round down)
## 3
print(10%3) # Modulus (remainder)
## 1
print(abs(-10)) # absolute value
## 10
Python uses ‘modules’ that contain functions, just like R uses ‘packages’. First we import the package before we can use its functions. However, unlike R, we have to specify the package when calling the function:
import math
math.sqrt(9) # Square Root
math.log(10) # Natural log
math.log(10,10) # Log base 10
math.log(10,2) # Log base 2
math.exp(2) # Power of e
math.pow(2,10) # 2 to the power of 10
math.pi # Good old pi -- notice no brackets
math.e # e -- also no brackets
math.sin(math.pi/2) # Sine function
math.asin(1) # Inverse sine
math.cos(math.pi) # Cosine
math.acos(-1) # Inverse cosine
math.tan(0) # Tangent
math.atan(0) # Inverse tangent
import math
round(math.pi,3) # standard rounding to 3 digits
math.floor(math.pi) # round down to closest whole number
math.ceil(math.pi) # round up to closest whole number
Note: != means 'not equal to'. Unlike R, Python has no standalone ! operator; use the keyword not to negate a logical value.
1 > 2 # greater than
1 < 2 # less than
1 <= 2 # less than or equal to
1 == 1 # equal to
1 != 1 # not equal to
(1 == 2) | (1 == 1) # | means 'OR'
(1 == 2) & (1 == 1) # & means 'AND'
(1 == 1) & (1 == 1)
PROTIP: | vs in
Instead of |, you can use in to check a large number of values:
my_list=[1,2,3,4,5,6,7,8,9,10]
1 in my_list
Generate some random numbers. Useful for modelling, testing scripts, etc. Again, we need to import a package to do this
import random
random.randint(1,100) # random integer between 1 and 100
random.random() # Random floating point (decimal) number between 0 and 1
random.uniform(1,10) # Random floating point number between 1 and 10
random.gauss(0,1) # Random number from Gaussian distribution with mean 0 and standard deviation of 1
random.randrange(1,1000,10) # randomly select from 1, 11, 21, ..., 991 (steps of 10; 1000 is excluded)
print(dir(random))
## ['BPF', 'LOG4', 'NV_MAGICCONST', 'RECIP_BPF', 'Random', 'SG_MAGICCONST', 'SystemRandom', 'TWOPI', '_BuiltinMethodType', '_MethodType', '_Sequence', '_Set', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_acos', '_bisect', '_ceil', '_cos', '_e', '_exp', '_inst', '_itertools', '_log', '_pi', '_random', '_sha512', '_sin', '_sqrt', '_test', '_test_generator', '_urandom', '_warn', 'betavariate', 'choice', 'choices', 'expovariate', 'gammavariate', 'gauss', 'getrandbits', 'getstate', 'lognormvariate', 'normalvariate', 'paretovariate', 'randint', 'random', 'randrange', 'sample', 'seed', 'setstate', 'shuffle', 'triangular', 'uniform', 'vonmisesvariate', 'weibullvariate']
Fun fact, random numbers generated by a computer are generated by a calculation from a ‘seed’ number, so they are never truly random. They act random because the seed number is typically something like the millionth of a second of the time on your computer’s internal clock.
It’s not just philosophical, it is also useful for testing and debugging since you can set the seed to generate the same ‘random’ numbers.
Compare these outputs:
import random
random.seed(10)
random.randint(1,100)
random.seed(10)
random.randint(1,100)
random.randint(1,100)
random.randint(1,100)
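A compact way to check the effect of the seed is to store the draws and compare them. This is only a sketch; the variable names are made up for illustration, and the actual numbers drawn are not shown because they can depend on the Python version.

```python
import random

random.seed(10)
draw_a = random.randint(1, 100)
draw_b = random.randint(1, 100)

random.seed(10)                  # reset to the same seed
draw_c = random.randint(1, 100)
draw_d = random.randint(1, 100)

# The same seed replays the same sequence of 'random' numbers
print(draw_a == draw_c)  # True
print(draw_b == draw_d)  # True
```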
Lists are groups of objects defined by square brackets []
my_list=[1,2,3,4,5,6,7,8,9,10]
print(my_list)
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list2=[1,2,3,4,5,6,7,8,9,10,"string"]
print(my_list2)
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'string']
Q: Notice what happens when we add a string. How is this different from the c() function in R?
# The numeric values would switch to strings in the c() function in R:
c(1,2,3,4,5,6,7,8,9,10,"string")
## [1] "1" "2" "3" "4" "5" "6" "7"
## [8] "8" "9" "10" "string"
# This means you can't do simple vector math. For example:
print(my_list*10)
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
The most basic object is a single number or string
X="string"
When we wrote: X="string"
Python created the object called X, so no output is produced.
To see the contents of X:
print(X)
## string
You can also concatenate multiple items using print – similar to the paste() function in R
print("the string inside X is '",X,"'")
## the string inside X is ' string '
* A one to many dimensional list of items, defined using square brackets
* Ordered, with 1 to ∞ elements
* Can be all the same type or a mix of types (e.g. numeric & string)
* Lists can be nested within other lists
Xvec=[X,1,2,3,"E", "Computational Biology", my_list]
print(Xvec)
## ['string', 1, 2, 3, 'E', 'Computational Biology', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
Notice how the list my_list is incorporated into the list Xvec. How is this different from the c() function in R?
The list is NESTED (2-dimension) in Python but FLATTENED (1-dimension) in R.
print(Xvec)
## ['string', 1, 2, 3, 'E', 'Computational Biology', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
print(Xvec[1])
## 1
print(Xvec[5])
## Computational Biology
print(Xvec[0])
## string
print(Xvec[-1:-3])
## []
print(Xvec[-4])
## 3
print(Xvec[0:3])
## ['string', 1, 2]
What do you notice about these indices? How is this different from indices in R?
* Positive values count from left to right
* First element on left is ZERO, not ONE!
* Negative values count from right to left
* First element on right is -1, not 0
* Range of values using colon a:b covers the interval [a,b)
* i.e. includes elements from a up to, but not including, b
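The indexing rules above can be sketched with a short list. The list letters is a made-up example, not part of the original script. Note that a slice always moves left to right, which is why [-1:-3] comes back empty while [-3:-1] does not.

```python
letters = ["a", "b", "c", "d", "e"]  # hypothetical example list

print(letters[0])      # a  (first element is index 0)
print(letters[-1])     # e  (last element is index -1)
print(letters[1:3])    # ['b', 'c']  (index 1 up to, but not including, 3)
print(letters[-3:-1])  # ['c', 'd']  (start must be to the left of stop)
print(letters[-1:-3])  # []  (start is to the right of stop, so nothing is selected)
print(letters[-3:])    # ['c', 'd', 'e']  (omit the stop to run to the end)
```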
Use multiple [] to drill down into individual nested terms
new_list=[my_list,my_list] # nests 2 lists within 3rd list
new_list2=my_list*2 # repeats elements 2 times
print(new_list)
## [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
print(new_list2)
## [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Nest [] to pull out nested lists
new_list2[0]
new_list[0]
new_list[0][0]
The package NumPy is popular for array math. NumPy arrays are more similar to the objects created by the c() and matrix() functions in R. They allow math on groups of numbers
import numpy
my_array=numpy.linspace(-1,1,5)
print(my_array)
## [-1. -0.5 0. 0.5 1. ]
print(my_array*10)
## [-10. -5. 0. 5. 10.]
PROTIP: Python package functions
It can get tedious always typing out the package before each function (e.g. `numpy.linspace`). There are two ways to work around this:
1. Use as to import the package with a custom name
import numpy as np
my_array=np.linspace(-1,1,5)
2. Use from to import the specific function
from numpy import linspace
my_array=linspace(-1,1,5)
* A 2-D array of cells
* With 1 to ∞ rows and columns
from numpy import matrix
A = matrix([[11,12,13],[21,22,23],[31,32,33]])
x = matrix([[1],[2],[3]])
print(A)
## [[11 12 13]
## [21 22 23]
## [31 32 33]]
print(x)
## [[1]
## [2]
## [3]]
Some basic matrix algebra
print(A*x) # Multiply matrix A by vector x
## [[ 74]
## [134]
## [194]]
print(A.T) # Transpose A
## [[11 21 31]
## [12 22 32]
## [13 23 33]]
print(A.I) # Inverse of A (this A is singular, so the huge values below are numerical noise, not a true inverse)
## [[ 2.95161921e+14 -5.90323841e+14 2.95161921e+14]
## [ -5.90323841e+14 1.18064768e+15 -5.90323841e+14]
## [ 2.95161921e+14 -5.90323841e+14 2.95161921e+14]]
Did you notice the extra square brackets in the output?
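As a side note, the NumPy documentation no longer recommends the matrix class and suggests regular arrays instead. The sketch below repeats the multiplication above with numpy.array, assuming a NumPy version that supports the @ operator. With arrays, * is element-wise and @ is matrix multiplication. It also checks the rank of A, which shows why the inverse above cannot be trusted.

```python
import numpy as np

A = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])
x = np.array([[1], [2], [3]])

product = A @ x        # matrix multiplication (what A*x does for numpy.matrix)
print(product)

elementwise = A * A    # with arrays, * multiplies element by element
print(elementwise[0, 0])  # 121

rank = np.linalg.matrix_rank(A)
print(rank)  # 2, which is less than 3, so A has no true inverse
```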
from numpy import tensordot
print(tensordot(A,A,axes=0)) # tensor (outer) product, like a %o% b in R
## [[[[ 121 132 143]
## [ 231 242 253]
## [ 341 352 363]]
##
## [[ 132 144 156]
## [ 252 264 276]
## [ 372 384 396]]
##
## [[ 143 156 169]
## [ 273 286 299]
## [ 403 416 429]]]
##
##
## [[[ 231 252 273]
## [ 441 462 483]
## [ 651 672 693]]
##
## [[ 242 264 286]
## [ 462 484 506]
## [ 682 704 726]]
##
## [[ 253 276 299]
## [ 483 506 529]
## [ 713 736 759]]]
##
##
## [[[ 341 372 403]
## [ 651 682 713]
## [ 961 992 1023]]
##
## [[ 352 384 416]
## [ 672 704 736]
## [ 992 1024 1056]]
##
## [[ 363 396 429]
## [ 693 726 759]
## [1023 1056 1089]]]]
[row,col]
from numpy import matrix
A = matrix([[11,12,13],[21,22,23],[31,32,33]])
print(A)
## [[11 12 13]
## [21 22 23]
## [31 32 33]]
print(A[0,1])
## 12
from numpy import tensordot
T = tensordot(A,A,axes=0)
print(T[0,0,1,1])
## 242
print(A*A)
## [[ 776 812 848]
## [1406 1472 1538]
## [2036 2132 2228]]
from numpy import matrix
X = matrix([[11,12,13],[21,22,23],[31,32,33]])
Y = matrix([[1,0,0],[0,1,0],[0,0,1]])
print(X*2)
## [[22 24 26]
## [42 44 46]
## [62 64 66]]
print(X*Y)
## [[11 12 13]
## [21 22 23]
## [31 32 33]]
print(X+Y)
## [[12 12 13]
## [21 23 23]
## [31 32 34]]
print(Y/X)
## [[ 0.09090909 0. 0. ]
## [ 0. 0.04545455 0. ]
## [ 0. 0. 0.03030303]]
print(X^Y)
## [[10 12 13]
## [21 23 23]
## [31 32 32]]
Use numpy tensordot with axes=0 for the [tensor product](https://en.wikipedia.org/wiki/Tensor_product):
# Tensor (outer) product, like a %o% b in R
print(tensordot(A,A,axes=0))
## [[[[ 121 132 143]
## [ 231 242 253]
## [ 341 352 363]]
##
## [[ 132 144 156]
## [ 252 264 276]
## [ 372 384 396]]
##
## [[ 143 156 169]
## [ 273 286 299]
## [ 403 416 429]]]
##
##
## [[[ 231 252 273]
## [ 441 462 483]
## [ 651 672 693]]
##
## [[ 242 264 286]
## [ 462 484 506]
## [ 682 704 726]]
##
## [[ 253 276 299]
## [ 483 506 529]
## [ 713 736 759]]]
##
##
## [[[ 341 372 403]
## [ 651 682 713]
## [ 961 992 1023]]
##
## [[ 352 384 416]
## [ 672 704 736]
## [ 992 1024 1056]]
##
## [[ 363 396 429]
## [ 693 726 759]
## [1023 1056 1089]]]]
Use axes=1 for the [dot/scalar product](https://en.wikipedia.org/wiki/Dot_product), which for two matrices is ordinary matrix multiplication (a %*% b in R):
# Tensor dot-product
print(tensordot(A,A,axes=1))
## [[ 776 812 848]
## [1406 1472 1538]
## [2036 2132 2228]]
We have already seen lists, which are assigned using square brackets []
my_numlist=[1,2,3,4,5]
my_strlist=["a","b","c"]
print(my_numlist)
## [1, 2, 3, 4, 5]
print(my_strlist)
## ['a', 'b', 'c']
You can also subset and edit lists using square brackets
REMEMBER index number starts at 0, not 1
my_numlist=[1,2,3,4,5]
print(my_numlist [0:3])
## [1, 2, 3]
my_numlist[3]=1000
print(my_numlist)
## [1, 2, 3, 1000, 5]
Tuples are sequences of Python objects similar to lists, but unlike lists they can’t be changed. They are defined using regular brackets ()
my_tup=(1,2,3,4,5,"string")
print(my_tup[2])
## 3
my_tup[2]=1000
## TypeError: 'tuple' object does not support item assignment
##
## Detailed traceback:
## File "<string>", line 1, in <module>
Check for a value inside a list or tuple using in and not in:
nucleotides=["A","T","G","C"]
print("A" in nucleotides)
## True
print("A" not in nucleotides)
## False
Use and and or for multiple comparisons
print("A" in nucleotides and "R" in nucleotides)
## False
print("A" in nucleotides or "R" in nucleotides)
## True
Dictionaries in Python are assigned with ‘curly brackets’ {}. Dictionaries are like lists and tuples. They are mutable like lists but contain customizable indices (usually called keys). Entries take the general form index : value and are separated by commas.
adict={'seq' : 'GATTACA', 'species' : 'Human'}
print(adict)
## {'seq': 'GATTACA', 'species': 'Human'}
You can update dictionaries with square brackets, just like lists. But unlike lists it doesn’t have to be a number; you use the index value, which can be a string.
adict['species']="Homo sapiens"
You can also add new entries this way:
adict['ID']=90211
print(adict)
## {'seq': 'GATTACA', 'species': 'Homo sapiens', 'ID': 90211}
print()
Dictionaries are handy for tracking outputs using %(index), followed by a modifier for the type of variable:
print("The sequence of %(species)s reference #%(ID)d is %(seq)s" %adict)
## The sequence of Homo sapiens reference #90211 is GATTACA
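The same message can also be built with an f-string or with str.format(), both of which are common in newer Python code. This is a sketch of the alternatives rather than a replacement for the % syntax above; it assumes Python 3.6 or later for the f-string.

```python
adict = {'seq': 'GATTACA', 'species': 'Homo sapiens', 'ID': 90211}

# f-string: look up each dictionary entry inside curly brackets
message_f = f"The sequence of {adict['species']} reference #{adict['ID']} is {adict['seq']}"

# str.format: unpack the dictionary so the keys can be used as names
message_fmt = "The sequence of {species} reference #{ID} is {seq}".format(**adict)

print(message_f)
print(message_f == message_fmt)  # True
```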
PROTIP: Formatting variables
You can use %d and %f to format numbers with many decimal places
from math import pi
print("Round pi to whole number %d" %pi)
## Round pi to whole number 3
print("Round pi to 2 decimal places %.2f" %pi)
## Round pi to 2 decimal places 3.14
print("Round pi to default number of decimal places (6) %f" %pi)
## Round pi to default number of decimal places (6) 3.141593
print("Round pi to 48 decimal places %.48f" %pi)
## Round pi to 48 decimal places 3.141592653589793115997963468544185161590576171875
Now try pi to 100 decimal places.
print("Round pi to 100 decimal places %.100f" %pi)
## Round pi to 100 decimal places 3.1415926535897931159979634685441851615905761718750000000000000000000000000000000000000000000000000000
Why is pi broken (all 0s after the 48th decimal place)?
Hint: What is %f?
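One way to explore the hint is to look at what is actually stored in the variable pi. The sketch below uses the standard decimal module to print the exact value of the floating point number, and sys.float_info.dig to show how many decimal digits a float can be relied on to hold. The variable name exact_value is only illustrative.

```python
from decimal import Decimal
from math import pi
import sys

# Decimal(pi) converts the stored binary floating point number exactly
exact_value = Decimal(pi)
print(exact_value)
# 3.141592653589793115997963468544185161590576171875

# Number of decimal digits a float can faithfully represent
print(sys.float_info.dig)  # 15
```

The stored value stops after 48 decimal places, and only about the first 15 significant digits agree with the true value of pi. The trailing zeros come from the formatting, not from pi itself.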
if/then statements in Python do not use brackets, making them much ‘cleaner’ and easier to read than if/then statements in R.
Instead of using brackets, the first line ends with a colon : and indentation (tabs or spaces, used consistently) delimits subsequent lines within the if/then statement. This can be a bit confusing at first. To demonstrate, compare these two blocks of code:
CODE A
x=6
if x > 5:
    print("X is > 5. In fact, x is %s" %x)
    if x > 1:
        print("X is > 1")
## X is > 5. In fact, x is 6
## X is > 1
CODE B
x=6
if x > 5:
    print("X is > 5. In fact, x is %s" %x)
## X is > 5. In fact, x is 6
if x > 1:
    print("X is > 1")
## X is > 1
Now set x=4 and re-run both sets of code. Why is the output different?
Use else and elif to clarify if two lines are part of the same if statement:
x=4
if x > 5:
    print("X is > 5. In fact, x is %s" %x)
else:
    print("X is > 1")
## X is > 1
x=4
if x > 5:
    print("X is > 5. In fact, x is %s" %x)
elif x > 1:
    print("X is > 1")
else:
    print("X is not > 1 or > 5")
## X is > 1
For loops use a similar structure to if/then statements:
for i in range(0,100,10):
    print(i)
## 0
## 10
## 20
## 30
## 40
## 50
## 60
## 70
## 80
## 90
You can also use for loops to cycle through lists, tuples or dictionaries
basepairs=["G","A","T","T","A","C","A"]
for i in basepairs:
    print(i)
    if i in ["A","G"]:
        print("Purine")
    elif i in ["T","C"]:
        print("Pyrimidine")
## G
## Purine
## A
## Purine
## T
## Pyrimidine
## T
## Pyrimidine
## A
## Purine
## C
## Pyrimidine
## A
## Purine
You can also use break and else to break out of loops
inputs=["G","A","T","T","y","A","C","A"]
basepairs=["A","T","G","C"]
for i in inputs:
    for j in basepairs:
        if i == j:
            print(j)
            break
    else:
        print("The character ",i," is not a base pair")
## G
## A
## T
## T
## The character y is not a base pair
## A
## C
## A
This is a bit tricky – the else statement belongs to the second for statement, not the if statement nested inside it. Take a few minutes to walk through the loop for each value of i and j to understand how this works.
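A stripped-down sketch may help show when the else of a for loop runs. It uses a single loop and a made-up list called messages to record what happened. The else block runs only when the loop finishes without reaching break.

```python
messages = []

# Case 1: a match is found, break ends the loop, and the else block is skipped
for base in ["A", "T", "G", "C"]:
    if base == "T":
        messages.append("found T")
        break
else:
    messages.append("T not found")

# Case 2: no match, so the loop runs to the end and the else block runs
for base in ["A", "T", "G", "C"]:
    if base == "y":
        messages.append("found y")
        break
else:
    messages.append("y not found")

print(messages)  # ['found T', 'y not found']
```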
One handy way to use for loops in Python is inside of other objects, using square brackets []. This construct is called a list comprehension.
single_digit_squares=[x**2 for x in range(10) ]
# Recall: range(10) starts at 0 and includes whole numbers up to but not including 10.
print(single_digit_squares)
## [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
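For comparison, here is a sketch of the same result written as an ordinary for loop, followed by a variation that adds an if condition inside the square brackets. The names squares_loop, squares_comp and even_squares are made up for this example.

```python
# The long way: build the list one element at a time
squares_loop = []
for x in range(10):
    squares_loop.append(x**2)

# The short way: the for loop goes inside the square brackets
squares_comp = [x**2 for x in range(10)]
print(squares_loop == squares_comp)  # True

# An optional if at the end keeps only some of the values
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
```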
While loops continue until some expression or criterion is fulfilled.
counter=0
while counter < 10:
    counter +=1
    print(counter)
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
PROTIP: Counter shortcut
Notice the line counter +=1, which is a nice Python shortcut for counter = counter + 1
Functions in Python are defined with the def keyword. As with flow control (above), the syntax is much simpler and cleaner than R
def seqtype(seq):
    for i in seq:
        print(i)
        if i in ["A","G"]:
            print("Purine")
        elif i in ["T","C"]:
            print("Pyrimidine")
myseq=["G","T","C","A"]
seqtype(myseq)
## G
## Purine
## T
## Pyrimidine
## C
## Pyrimidine
## A
## Purine
We can also use return to return an object (e.g. value, list, etc.) back to the user
def seqtype(seq):
    seqlen=0
    for i in seq:
        seqlen+=1
    return(seqlen)
myseq=["G","T","C","A","C","T","T","A"]
myseqlen=seqtype(myseq)
print("Sequence length is ",myseqlen)
## Sequence length is 8
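A function can return any kind of object, not just a number. As an illustration, the sketch below combines the two examples above into a hypothetical function, count_base_types, that returns a dictionary of counts instead of printing each base. The function name and the dictionary keys are assumptions made for this example.

```python
def count_base_types(seq):
    """Count purines, pyrimidines and unrecognized characters in a sequence."""
    counts = {"purine": 0, "pyrimidine": 0, "other": 0}
    for base in seq:
        if base in ["A", "G"]:
            counts["purine"] += 1
        elif base in ["T", "C"]:
            counts["pyrimidine"] += 1
        else:
            counts["other"] += 1
    return counts

myseq = ["G", "T", "C", "A", "C", "T", "T", "A"]
result = count_base_types(myseq)
print(result)  # {'purine': 3, 'pyrimidine': 5, 'other': 0}
```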
Reading & writing text files in Python can take a bit of exploration and practice. BUT one important thing to remember is that you have to ‘close’ a file after it is ‘open’ or else you probably won’t be able to access it.
Probably the most straightforward way to read files in Python is to open the file and then copy the lines into a new object.
myfile='./FallopiaData.csv'
openfile=open(myfile,'r')
for line in openfile:
    line=line.strip('\n') # .strip('\n') removes newline characters from both ends of the line
    print(line)
openfile.close() ## This closes the file -- very important!
NOTE: OUTPUT NOT SHOWN
Output is not shown since it just prints out the entire text of the file.
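Python also provides the with statement, which closes the file automatically when the indented block ends, even if an error occurs part way through. The sketch below is self-contained, so it first writes a small temporary file; the file name example.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file to read (hypothetical name and contents)
tmp_path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(tmp_path, "w") as outfile:
    outfile.write("first line\nsecond line\n")

# Read it back one line at a time; no explicit close() is needed
lines = []
with open(tmp_path, "r") as infile:
    for line in infile:
        lines.append(line.strip("\n"))

print(lines)          # ['first line', 'second line']
print(infile.closed)  # True: the file was closed when the with block ended
```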
PROTIP: Memory use
When you load a file into a data.frame object in R using MyData<-read.csv(), the entire file is loaded into memory. In contrast, the above code loads only a single line into memory at a time. This is one huge advantage of Python, and why it is so common for processing very large datasets (e.g. high-throughput sequencing).
On the other hand, you can also load a file into a Python object:
myfile='./FallopiaData.csv'
openfile=open(myfile,'r')
mydata=[]
for line in openfile:
    mydata=mydata+[line.strip('\n')]
openfile.close()
print(mydata)
PROTIP: Loops & methods inside lists
The above can be simplified as follows:
1. Combine file name with `open()` function
2. Include `.strip()` and `.readlines()` inside a list object using `[]`
3. Include the __for loop__ inside the same list object
myfile=open('./FallopiaData.csv','r')
mydata=[line.strip('\n') for line in myfile.readlines()]
myfile.close() ## This closes the file -- very important!
print(mydata[0:2]) # Print first few lines
## ['PotNum,Scenario,Nutrients,Taxon,Symphytum,Silene,Urtica,Geranium,Geum,All_Natives,Fallopia,Total,Pct_Fallopia', '1,low,low,japon,9.81,36.36,16.08,4.68,0.12,67.05,0.01,67.06,0.01']
Notice that the file is read in as raw text as a single list object (only one set of square brackets []).
Notice the use of strip() and readlines(), each applied to their respective objects line and myfile with the application of a period: .
Notice the printed output above. This is much different than a data.frame() object in R. If we want to make sense of the csv file, we have to parse the file into rows and columns, which can be a bit tricky using the methods we’ve learned so far.
Start by parsing by row, using the csv library
import csv
myfile=open("./FallopiaData.csv")
mydata=csv.reader(myfile)
print(mydata)
## <_csv.reader object at 0x000000001B622DB0>
Notice the mydata object is not a standard type of object that we usually work with. However, it does contain a set of objects, one for each row. We can access these with a for loop:
row=0
for line in mydata:
    row += 1
    print(line)
    if row > 2:
        break
## ['PotNum', 'Scenario', 'Nutrients', 'Taxon', 'Symphytum', 'Silene', 'Urtica', 'Geranium', 'Geum', 'All_Natives', 'Fallopia', 'Total', 'Pct_Fallopia']
## ['1', 'low', 'low', 'japon', '9.81', '36.36', '16.08', '4.68', '0.12', '67.05', '0.01', '67.06', '0.01']
## ['2', 'low', 'low', 'japon', '8.64', '29.65', '5.59', '5.75', '0.55', '50.18', '0.04', '50.22', '0.08']
We can also look at a particular column, using square brackets
row=0
for line in mydata:
    row += 1
    print(line[2])
    if row > 2:
        break
## low
## low
## low
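One detail worth knowing is that a csv.reader object can only be read through once: each loop picks up where the previous one stopped. The sketch below shows this with a tiny in-memory table, so it runs without the data file. The text in sample_text is a shortened, made-up stand-in for the first rows and columns of the CSV file.

```python
import csv
import io

# A small stand-in for the CSV file (only three columns, two data rows)
sample_text = "PotNum,Scenario,Nutrients\n1,low,low\n2,low,low\n"
reader = csv.reader(io.StringIO(sample_text))

header = next(reader)    # take the first row (column names)
rows = list(reader)      # read all remaining rows
leftover = list(reader)  # nothing is left: the reader is used up

print(header)    # ['PotNum', 'Scenario', 'Nutrients']
print(rows)      # [['1', 'low', 'low'], ['2', 'low', 'low']]
print(leftover)  # []
```

Note that every value comes back as a string, so numbers must be converted (e.g. with int() or float()) before doing any math.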
Writing is just like reading in terms of opening and closing the file, with file.write() instead of file.read() in between.
outfile=open('./OutFile.txt','w')
for line in range(10):
    outstring="This is line number %d" %line
    print(outstring)
    outfile.write(outstring+"\n")
## This is line number 0
## This is line number 1
## This is line number 2
## This is line number 3
## This is line number 4
## This is line number 5
## This is line number 6
## This is line number 7
## This is line number 8
## This is line number 9
outfile.close()
Python is a great tool for editing text files. By text files we mean anything that is human-readable, including a lot of input/output files from common bioinformatics programs (e.g. FASTA, FASTQ, SAM). Python is perfect for analysis pipelines where you want to take the output of one analysis (e.g. SAM) and modify it for input into another program (e.g. FASTQ of aligned reads only).
infile=open('./OutFile.txt','r')
outfile=open('./ModOutFile.txt','w')
for line in infile:
    outstring="%s with new info added" %line
    print(outstring)
    outfile.write(outstring+"\n")
## This is line number 0
## with new info added
## This is line number 1
## with new info added
## This is line number 2
## with new info added
## This is line number 3
## with new info added
## This is line number 4
## with new info added
## This is line number 5
## with new info added
## This is line number 6
## with new info added
## This is line number 7
## with new info added
## This is line number 8
## with new info added
## This is line number 9
## with new info added
infile.close()
outfile.close()
Notice that our added input goes on a new line. That’s because the line that we read in already has \n at the end. We can remove this with the .strip() method:
infile=open('./OutFile.txt','r')
outfile=open('./ModOutFile.txt','w')
for line in infile:
    instring=line.strip('\n')
    outstring="%s with new info added" %instring
    print(outstring)
    outfile.write(outstring+'\n')
## This is line number 0 with new info added
## This is line number 1 with new info added
## This is line number 2 with new info added
## This is line number 3 with new info added
## This is line number 4 with new info added
## This is line number 5 with new info added
## This is line number 6 with new info added
## This is line number 7 with new info added
## This is line number 8 with new info added
## This is line number 9 with new info added
infile.close()
outfile.close()
This can be even more powerful when combined with regular expressions (see Regex tutorial)
As you can see above, Python is great for text files but not an elegant program for dealing with the kinds of data that are found in tables (e.g. csv files). A more recent library called pandas adds a lot of the more intuitive functionality of R
import pandas as pd
FileLoc="./FallopiaData.csv"
mydata=pd.read_csv(FileLoc)
print(mydata[:3])
## PotNum Scenario Nutrients Taxon Symphytum Silene Urtica Geranium \
## 0 1 low low japon 9.81 36.36 16.08 4.68
## 1 2 low low japon 8.64 29.65 5.59 5.75
## 2 3 low low japon 2.65 36.03 17.09 5.13
##
## Geum All_Natives Fallopia Total Pct_Fallopia
## 0 0.12 67.05 0.01 67.06 0.01
## 1 0.55 50.18 0.04 50.22 0.08
## 2 0.09 60.99 0.09 61.08 0.15
Now you can see something much more analogous to R. There are also functions for inspecting and plotting the data. It’s still not quite as good as R, but much better than base Python. For more information on pandas, see the documentation and tutorials on the pandas website.