Data in Python
Numeric Types
Python supports integers, floating point numbers, and complex numbers. As of 2011, “plain integers” have at least 32 bits of precision (they are implemented using long in C), and “long integers” have unlimited precision. Floating point numbers are stored in 64 bits on most platforms (they are implemented using double in C). A complex number z has real part z.real and imaginary part z.imag, and each part is a floating point number. When you assign a value to a variable, the variable takes its type from that value. When you assign an arithmetic result to a variable, the variable takes the type of the “widest” operand in the expression, where complex numbers are wider than floating point numbers, floating point numbers are wider than long integers, and long integers are wider than plain integers.
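As a small illustration of how types follow values, the sketch below assigns one value of each numeric kind and then mixes them in arithmetic. The variable names are made up for the example.

```python
# Each variable takes its type from the value assigned to it.
i = 7            # integer
f = 7.0          # floating point number
z = 3.0 + 4.0j   # complex number

# Both parts of a complex number are floating point numbers.
real_part = z.real   # 3.0
imag_part = z.imag   # 4.0

# Mixed arithmetic is carried out in the widest type involved.
int_plus_float = i + f       # int + float gives a float
float_plus_complex = f + z   # float + complex gives a complex

print(type(int_plus_float).__name__)
print(type(float_plus_complex).__name__)
```

Checking type() on a result like this is a quick way to confirm which type an expression produced.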
Converting between types is straightforward: use int() for a plain integer, long() for a long integer, and float() for a floating point number. Note that when a floating point number is converted to an integer (plain or long), it is truncated toward zero.
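The truncation rule is easiest to see with a negative number. The following sketch contrasts int() with math.floor(), which rounds down instead. It leaves out long(), since that function exists only in Python 2.

```python
import math

positive = int(3.7)    # 3: the fractional part is dropped
negative = int(-3.7)   # -3, not -4: truncation is toward zero

# math.floor() always rounds down, so it differs for negative numbers.
floored = math.floor(-3.7)   # equal to -4

as_float = float(3)    # 3.0

print(positive, negative, floored, as_float)
```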
Lists and Tuples
Lists and tuples are both built-in Python data structures. They are zero-indexed: the element at index i of a list or tuple x is x[i], so the first element is x[0].
There are two main differences between lists and tuples:
- Lists are declared with square brackets, and tuples are declared with parentheses.
- Tuples are immutable, and cannot change once created.
Other than those two differences, reading from lists and tuples (indexing, slicing, looping over the elements) works the same way. The Python Tutorial documentation of methods for lists is comprehensive.
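The two differences listed above can be sketched in a few lines. The names planets_list and planets_tuple are invented for this example.

```python
# Square brackets make a list; parentheses make a tuple.
planets_list = ["Mercury", "Venus", "Earth"]
planets_tuple = ("Mercury", "Venus", "Earth")

# Indexing is zero-based and works the same way on both.
first = planets_list[0]    # "Mercury"
third = planets_tuple[2]   # "Earth"

# A list can be changed after it is created.
planets_list[1] = "Mars"
planets_list.append("Jupiter")

# A tuple cannot: assigning to an element raises TypeError.
try:
    planets_tuple[1] = "Mars"
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(planets_list)
print(planets_tuple, tuple_changed)
```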
Lists and tuples do not support array operations: arithmetic operators do not act on them element by element. For array operations, like those in IDL, Supermongo, or Fortran 90, you need numpy arrays.
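To make the distinction concrete, this sketch applies the same operators to a list and to a numpy array. On a list, * repeats and + concatenates; on a numpy array, both act element by element.

```python
import numpy

y = [1, 2, 3]
repeated_list = y * 2   # [1, 2, 3, 1, 2, 3]: the list is repeated
joined_list = y + y     # [1, 2, 3, 1, 2, 3]: the lists are concatenated

x = numpy.array([1, 2, 3])
doubled_array = x * 2   # each element is multiplied by 2
summed_array = x + x    # elements are added pairwise

print(repeated_list, joined_list)
print(doubled_array, summed_array)
```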
Numpy Arrays
Numpy arrays are defined by the NumPy package, and are used in SciPy and matplotlib. A zero-length numpy array is created with x=numpy.array([]). A numpy array of length N, filled with zeros, is created with x=numpy.zeros(N).
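A brief sketch of both ways of creating an array follows. The length N = 5 is an arbitrary choice for the example.

```python
import numpy

# A zero-length array.
empty = numpy.array([])

# An array of N elements, all initialized to 0.0.
N = 5
zeros = numpy.zeros(N)

# Elements are then filled in by index, as with a list.
zeros[0] = 2.5

print(empty.size, zeros.shape, zeros.dtype)
print(zeros)
```

By default numpy.zeros() produces floating point elements, so the array is ready to hold non-integer data.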
A quick note on object-oriented programming, if you are new to it: an object x carries with it both the data and the functions (called methods) that operate on the data or report something about it. So numpy arrays carry with them the relevant numpy functions. Thus some functions, like min() or max(), can be called in two ways on a numpy array x:
- numpy.min(x)
- x.min(), as the function min() belongs to the object x.
Meanwhile, the similar built-in Python function min() can operate on a list y only via min(y).
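The three call styles described above can be compared side by side. The sample values are invented for the example.

```python
import numpy

x = numpy.array([4.0, 1.5, 9.0])   # a numpy array
y = [4.0, 1.5, 9.0]                # a plain list

via_function = numpy.min(x)   # numpy function applied to the array
via_method = x.min()          # method belonging to the array object
via_builtin = min(y)          # built-in function applied to the list

# A list has no min() method, so y.min() would raise AttributeError.
list_has_min = hasattr(y, "min")

print(via_function, via_method, via_builtin, list_has_min)
```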
The full numpy tutorial includes detailed coverage of numpy arrays.
Reading in Data
I’ll focus on reading in columns from text files. There are many ways to do this, but numpy.loadtxt() is one of the easiest and most versatile. (See the official NumPy documentation for loadtxt().) It loads the data into a 1- or 2-dimensional numpy array. The default is a two-dimensional numpy array, and the options to numpy.loadtxt() let us deviate from there.
Starting with the short data file demoread.dat, we can read it into a two-dimensional numpy array with data=numpy.loadtxt("demoread.dat",skiprows=1). The skiprows option gives the number of lines to skip at the top of the file (here, the first line). Now the 2D array data holds:
[[ 7. 99. 84.]
 [ 8. 64. 39.]
 [ 9. 57. 77.]
 [ 3. 49. 77.]]
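For readers without demoread.dat at hand, here is a minimal sketch that reproduces the call on an in-memory stand-in. The numbers are taken from the array printed above; the header line "x y z" is an assumption, since the actual first line of the file is not shown here.

```python
import io
import numpy

# Hypothetical stand-in for demoread.dat: an assumed header line,
# followed by the four rows shown in the printed array.
demo_text = """x y z
7 99 84
8 64 39
9 57 77
3 49 77
"""

# numpy.loadtxt() accepts a file-like object as well as a filename.
data = numpy.loadtxt(io.StringIO(demo_text), skiprows=1)

print(data.shape)    # (4, 3): four rows, three columns
print(data[0])       # the first row
print(data[:, 1])    # the second column
```

Rows are indexed first and columns second, so data[:, 1] picks out the second column across all rows.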
By default, numpy.loadtxt() reads all the columns. With the usecols=() option, you can list the columns you want to read; note that column numbers are zero-indexed. If you want to store each column in its own 1D array, use the unpack=True option. Therefore, to read the first and third columns from the data file and store them as x and z, respectively, use x,z = numpy.loadtxt("demoread.dat",skiprows=1,usecols=(0,2),unpack=True). Now the 1D array x holds:
[ 7. 8. 9. 3.]
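The column selection can be tried out the same way. This sketch again uses an in-memory stand-in for demoread.dat, with the values from the output shown earlier and an assumed header line.

```python
import io
import numpy

# Hypothetical stand-in for demoread.dat (the header line is an assumption).
demo_text = """x y z
7 99 84
8 64 39
9 57 77
3 49 77
"""

# usecols picks columns 0 and 2; unpack=True returns one 1D array per column.
x, z = numpy.loadtxt(io.StringIO(demo_text), skiprows=1,
                     usecols=(0, 2), unpack=True)

print(x)
print(z)
```

Without unpack=True, the same call would return a single 2D array with four rows and two columns.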
The short script demoread.py demonstrates the use of numpy.loadtxt.