Computing complexity, information content and information rate (entropy)
from a string of symbols:

There are two tools available here which may be used to compute the complexity of a given string, or its information content or its information rate (referred to here as the entropy).

The complexity of a given finite string is taken here to be a measure or a count of the number of steps necessary to construct the string from its alphabet. Any number of algorithms might be considered for the purposes of constructing a string. The choice of production algorithm ultimately determines the practicality of the measure, and the usefulness of the result. For example Algorithmic Information Theory defines string complexity in terms of the size of the smallest computer program that can be run on a Universal Turing machine (A hypothetical computer) resulting in an uncomputable result. Lempel and Ziv (1976) defined string complexity in terms of the minimum number of linear-pattern copying steps that may be used to construct the string and they prove that a certain exhaustive parsing process achieves the optimum. The LZ76 algorithm, by virtue of its exhaustive search  of the patterns, a O(n^2) algorithm, that make this less than attractive in many applications. 

The tools here use a particular recursive hierarchical-patten copying algorithm to construct strings from their alphabet. The construction is referred to as T-augmentation. The inverse operation of discovering the production steps from a given string is referred to as T-decomposition. The process of reducing a given string to a corresponding sequence of constituent pattern,  with each new pattern describable in terms of previously identified patterns in accordance with the T-augmentation construction, constrains the process to such an extent that there a corresponding unique parsing for every possible string. In this respect it is not meaningful to talk of the smallest number of T-augmentation steps... but simply the number of production steps required to construct the string. To date there has been no attempt to prove optimality of the solution. It remains an open problem. It may turn out to be an unprovable result. Nevertheless there exist strong empirical results that demonstrate that this measure is meaningful in terms of what has been established in relation to existing theories of string complexity and information.

We refer to the complexity of a string measured in context of the T-augmentation algorithm as the T-complexity. In fact the T-complexity definition is a logarithmically weighted measure of the effective number of T-augmentation steps. Though the LZ76 and T-complexity measures are found in practice to often behave quite similarly, these differences between the T-complexity and the LZ76 definitions result in some rather interesting differences. One important practical difference is that the T-complexity may be computed in O(n) time (implementation due to Speidel, Yang) where n is the length of the string being processed. 

A series of experiments (see Ebeling, Steuer, Titchener 2001) demonstrate that a linearised version of the T-complexity measure directly corresponds to Shannon information for a simple dynamical system such as the logistic map. The linearised measure is referred to as the T-information content of a string. The T-information rate, i.e. the average T-information per symbol, in a given string is obtained by dividing the T-information by the length of the string and is referred to as the T-entropy.

The two tools are provided here for computing the T-complexity, T-information and T-entropy of a string include

1) tcalc  This implementation is due to Titchener, Wackrow & Speidel. The c-sources are available here also. tcalc provides access to a number of additional features which are helpful in certain applications. However the implementation of the T-decomposition algorithm here runs as O(n^2). 

2) qtc   Pronounced cute-c, and short for QTCIE that is: Quick T- Complexity, Information & Entropy.  As the name suggests it will compute from a given input file the corresponding T-complexity, T-information, T-entropy values. This code uses an implementation of the T-decomposition that runs as O(n)! For strings of <500 symbols there is little to distinguish the performance of the two implementations, but for anything larger qtc offers astonishing performance. 

You may down load an executable from:
linux version  http://tcode.auckland.ac.nz/~mark/SPOD_linux/qtc.gz
OSX version 

Analysis of Time series data using Det-IT
Time series data denoted [x_i] is a series of or sampled time generally real-valued data.
for example:    
0.131538
0.755605
0.45865
0.532767
0.218959
0.0470446
0.678865
0.679296
etc

To compute corresponding entropy values from a series of data points like this requires a first step of symbolising or coarse  grained encoding  of the values.  This is most easily achieved by partitioning the data in to subsets and labeling the values according to the subset.

4 methods for encoding time-series

In the first method a single bi-partition is placed on the time series. This divides the series in those points above and those below the partition.  These are encoded into ‘1’s and ‘0’s respectively and expressed as a string. Often the position of the partition is derived from the running mean. 

A fixed width sliding window is placed over the time series to derive a series of strings from which the entropy may be computed. A compromise is required  where a wide window tends to reduce the sensitivity to time variation, but also gives rise to a longer string that is capable of yielding a more dependable value of the entropy.

A coarse grained bi-partition such as in method 1 is not sensitive to the spread of the time series values. to overcome this, one may instead choose a hger order alphabet and a striped partition.

Method 2 might be used  alleviate the need to determine a running average, allowing the  striped partition to be fixed. 
However compared to the bi-partition the width of the sliding window to need to be relatively broader so that the alphabet symbols are being suitably exploited. If the symbols are not representative the resultant entropy will tend to be over estimated. As a consequence method 2 compromises the time sensitivity. 

An alternative approach is to use a swept bi-partition. This means a separate string will result from each of the individual partition levels. With each string returning a corresponding entropy value, and each window position extending this array in time, a two dimensional array of entropy values will results form this approach.
This array of values may be represented as a surface which in itself is very helpful in revealing changing structures both within the signal dynamics and over time. 


Finally, method 4 produces an entropy time-series from the signal dynamics. It seems to reliably derive the information rate across the whole  of the signal dynamics, as a function of time.

Both methods 3 and 4 are easily computed using the UNIX tool qtp (quick T-partition). A guide for using qtp is given below. In fact you may well need to experiment with combinations of parameters to optimise the output accordingly. The window size, the degree of window overlap, the partition range and increments are all presettable. qtp takes the whole of the time series as input and processes it in 

It will suffice at the outset to provide an example of how this tool is to be used. qtp generates its output in a format that is suitable for use directly with SpOd. But you may also choose to extract the data arrays from the output of qtp,and render this in what ever is your favorite visualisation environment.  The surfaces may for example be rendered in MATLAB using the mesh() function.


The above surface represents an array of entropy values computed from the scanned partition and sliding window. Time runs left to right. The height of the surface represents the magnitude of the entropy, and the position into the page corresponds to the position of the bi-partition with respect to the signal dynamics.


Example 4.1:  using qtp

qtp is available as an executable from 
http://tcode.auckland.ac.nz/~mark/SPOD_linux/qtp.gz


For this example we will need a data file ... I have put a spod data sample file at:
http://tcode.auckland.ac.nz/~mark/ntavail.spd.bz2


You can view this data file using SpOd by typing (assuming you have put the tool spod in /usr/bin or some other directory which is in the path.

spod is available in compressed executable form for linux at:
http://tcode.auckland.ac.nz/~mark/SPOD_linux/spod.gz

You need to decompress this by typing:
gzip -d spod.gz 

Make sure it is executable by typing:
chmod 755 spod

You may like to move this to for example the /usr/bin directory 

Once this is done you can view the example data file by typing
bzcat ntavail.spd.bz2 | spod 

All going well and you should get an animated  display looking something like this:


You may rotate the display using the left mouse button. And you can scroll the data using the right mouse button.

The 'j' (for jump/cycle) allows you to flick between preset views of the data. (now there are subtle differences between the linux and mac versions of spod... so I am not sure how you will get on...its possible that the linux spod version is not he very latest....I need to look in to updating that)


By rotating the image slightly you may obtain a full view which includes in this case sleep state information at the top/back




To extracta sample time series you may decompress the sample file by typing: 
bzip2 -d ntavail.spd.bz2

Alternatively if you don't have bzip2, download the gzipped version of the data file from:
http://tcode.auckland.ac.nz/~mark/ntavail.spd.gz

And decompress this by typing: 
gzip -d ntavail.spd.gz

(bzip2 is a better compressor than gzip, but may not be loaded onto your machine!?)

Of course you can display the decompressed file with spod by typing 
cat ntavail.spd | spod
or simply 
spod ntavail.spd

As you can see the file, ntavail.spd, contains five traces (actually it contains 7, with two copies each of the C3 and LOC channels). We shall extract one of these, the EEG/C3 channel to use with qtp. (You can always use an editor to view the text content of the spd file and see how it is put together.)

To extract the C3 data into a file called data in the tmp directory, type:
head -5 ntavail.spd | tr " " "\n" | awk '{printf "%.1f\n",$0}' >  /tmp/data

We now compute the entropy surface from the data stored in /tmp/data
Using qtp (which is also presumably put in your /usr/bin directory, type: 
qtp -f /tmp/data -p 500 50 -w 5000 2500 -i -min -250 -max 250 -ud -xyz 9 4 1 0 0 0 | spod

I have rotated the view around slightly to highlight the surface relief.




The above  corresponds to about six hours and 20 minutes of eeg data (sampled at 100Hz  and resulting in a total of 2,283,112 data points). The surface comprises an array of 912  x  21. 

You can rotate this about using the left mouse button, and use the arrow keys to alter the colour gradient and brightness.
The middle mouse button should change the light source.... have a look at the spod commands at 

Now you get a few more bits of data from qtp, takes somewhat longer to calculate:

try typing: qtp -f /tmp/data -p 500 50 -w 5000 2500 -i -min -250 -max 250 -ud -xyz 9 4 1 0 0 0  -RMS | spod

Then when the display comes up hit the numbers '0' followed by '2'  and this will display the max entropy graph
you can bring up a grid striking the shft1 (i.e. the '!' key)... again have a look at the spod commands:

http://tcode.auckland.ac.nz/~mark/Spod%20commands.html

(If you type '?'  while displaying with spod it will output the abbreviated help list to the terminal window.)

Good luck:
If you have any questions email me at mark@tcode.auckland.ac.nz  I am more than happy to help.




http://tcode.auckland.ac.nz/~mark/SPOD_linux/tcalc.chttp://tcode.auckland.ac.nz/~mark/SPOD_linux/qtc.gzhttp://tcode.auckland.ac.nz/~mark/SPOD_linux/qtp.gzhttp://tcode.auckland.ac.nz/~mark/ntavail.spd.bz2http://tcode.auckland.ac.nz/~mark/SPOD_linux/spod.gzhttp://tcode.auckland.ac.nz/~mark/ntavail.spd.gzhttp://tcode.auckland.ac.nz/~mark/Spod%20commands.htmlmailto:mark@tcode.auckland.ac.nzshapeimage_1_link_0shapeimage_1_link_1shapeimage_1_link_2shapeimage_1_link_3shapeimage_1_link_4shapeimage_1_link_5shapeimage_1_link_6shapeimage_1_link_7
Det-entropy - some tools:-