These pages are an unashamed promotion of Det-IT. This page was completed in haste... to be checked more carefully in due course.
 
Some variants of Information Theory:-
 
Classical IT, from Shannon, 1948
Algorithmic IT, from Solomonoff, Kolmogorov and Chaitin, 1960s
Deterministic IT, from Titchener et al., 1997-
 
A potted history:- explaining how Det-IT fits in.
Shannon is widely credited with the creation of Information Theory. In fact, others before him, such as Hartley, had already attempted to define what information is in the context of the communications process. Hartley in 1928 proposed a logarithmic measure of information, but overlooked the statistical weighting needed to make the measure appropriate for quantifying information in strings.
 
Shannon’s definitions were ultimately of a statistical nature... which is perhaps not unexpected, since Boltzmann had much earlier invoked statistical arguments in the development of thermodynamics, the field which lies at the roots of IT. Indeed, the equation that is famously known as Shannon’s entropy apparently appears in Boltzmann’s own lecture notes some 50 years before Shannon’s 1948 publication.
 
Shannon was on a quest to understand information in the context of communications, whereas Clausius, Boltzmann, Gibbs and Maxwell, to name but a few, were concerned with describing the macroscopic behaviour of complex collections of particles and with understanding the irreversibility observed in certain physical processes but not anticipated by Newtonian physics. Clausius introduced the notion of entropy (1865) to help explain the latter. Boltzmann later gave a mathematical formalism to this with S = k log W, the equation which now adorns his headstone. However Boltzmann (and independently Gibbs) also led the way in introducing statistical treatments, giving rise to the subject of statistical thermodynamics. So it is not altogether surprising that the statistical version of the equation for entropy, H = -∑ p log p, had already appeared before Shannon. Indeed the latter equation is easily derived from Boltzmann's entropy equation by substituting for W the binomial (more generally, multinomial) coefficient counting the arrangements of the particles and applying Stirling's approximation in the limit of large n.
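
As a quick sketch of that derivation for the binary case (n symbols, of which n1 are ones and n0 = n - n1 are zeros, with p = n1/n):

    W = n! / (n1! n0!)
    log W ≈ n log n - n1 log n1 - n0 log n0        (Stirling: log m! ≈ m log m - m)
          = -n [ p log p + (1-p) log(1-p) ]

so S = k log W ≈ -n k ∑ p log p, and per symbol (dropping Boltzmann's constant k) one recovers H = -∑ p log p.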
 
Perhaps Shannon’s really important contribution lay in the realisation that information is entropy. He was of course encouraged by von Neumann to use the existing term from physics, if only to give weight to his theory, although perhaps Shannon did not immediately see the two theories as relating to one another as closely as they in fact do. Many have taken great pains to keep the two fields separate (e.g. Feynman), and this has perhaps added to the levels of confusion and misunderstanding that surround areas such as entropy and information.
 
Nevertheless developments taking place in physics, especially in the latter half of the last century, in relation to non-linear dynamics, have been proceeding apace. Poincaré is credited by many as being the first to seriously study the problems posed by non-linear systems. Early on it was understood that non-linearities created difficulties in modelling physical systems. Much effort was given to finding elegant mathematical descriptions of the behaviour of physical systems, for example the orbits of planets about the sun... but in fact it was well understood even in Poincaré’s time that a simple system comprising just three planetary bodies in orbit about one another defied solution! The problem is that the non-linearities and the mixing behaviour of the physics lead to unpredictable outcomes. So even systems of simple coupled differential equations may not have simple solutions.
 
In the 1930s Morse and Hedlund had already introduced the notion of symbolic dynamics: discrete alphabetic encodings of the solutions of differential equations. Indeed the notion of encoding the phase-space trajectories of dynamical systems had been around since the ideas of Poincaré... Without being too careful historically speaking, the important point is that the symbolic representation of dynamical processes, and the statistical treatment of these encodings, was already well established before Shannon came along. So when he put forward his theory of communications, it was couched in such familiar terms as to be immediately held to be right, even though Shannon’s paper does little more than present arguments for the reasonableness of his approach.
 
Shannon’s definitions revolve around the idea of an information source issuing symbolic messages which together characterise the source. One may view a dynamical system as such an information source, and the encodings of its trajectories as messages issuing from the source. Thus the entropy of the system may be computed using Shannon’s formula. Of course the entropy will be sensitive to the method of encoding, so one may seek the partitioning of the state space, and the corresponding alphabet size, which maximises the entropy. The supremum of the entropies computed over all possible partitions is referred to as the Kolmogorov-Sinai entropy, and is in fact a fundamental characteristic of the system.
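
As a rough illustration of the idea (not of the KS supremum itself, which ranges over all partitions), the sketch below encodes a trajectory of the fully chaotic logistic map with a two-cell partition and estimates a Shannon entropy rate from block frequencies; the choice of map, partition and block length is arbitrary here:

    import math
    from collections import Counter

    def trajectory(x0=0.3, r=4.0, n=100_000):
        """Iterate the logistic map x -> r*x*(1-x) and return the orbit."""
        xs, x = [], x0
        for _ in range(n):
            x = r * x * (1.0 - x)
            xs.append(x)
        return xs

    def encode(xs, threshold=0.5):
        """Two-cell partition of the state space: 0 below the threshold, 1 above."""
        return ''.join('1' if x >= threshold else '0' for x in xs)

    def block_entropy_rate(s, k=8):
        """Empirical Shannon entropy of k-blocks, in bits per symbol."""
        blocks = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        total = sum(blocks.values())
        h_k = -sum((c / total) * math.log2(c / total) for c in blocks.values())
        return h_k / k

    s = encode(trajectory())
    print(block_entropy_rate(s))   # close to 1 bit/symbol for the fully chaotic map (r = 4)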
 
The fact that a dynamical system is governed by fixed rules, i.e. is deterministic in its behaviour, yet may be characterised in statistical terms, led to the important realisation in thermodynamics that a duality may exist between stochastic and deterministic descriptions. This understanding drove Kolmogorov to suppose that there ought to be a dual to Shannon’s probabilistic definition of information. He thus proposed an algorithmic view of information for individual finite string objects and labelled this the algorithmic information or algorithmic entropy, later renamed by others algorithmic complexity.
 
Unfortunately Kolmogorov resorted to a definition that incorporated a notion of compressibility, or more specifically its inverse. Implicitly these ideas rely on the possibility of making a reversible mapping between strings, and measuring algorithmic complexity in terms of the length of the shortest string in each mapped pair. This creates a problem. Ultimately there are far too few short strings to allow a complete and reversible mapping between strings that properly and consistently reflects their true information content... Two important results follow: firstly, most strings are not compressible, and secondly, the Kolmogorov complexity is uncomputable (and in any case only defined to within an additive constant). This presumably was not the outcome hoped for by Kolmogorov.
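
The counting argument behind the first of these results is short: there are 2^n binary strings of length n, but only 2^(n-k) - 1 binary descriptions shorter than n - k bits, so fewer than one string in 2^k can be compressed by k or more bits; for example, less than 0.1% of all n-bit strings can be shortened by even 10 bits.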
 
Nevertheless Kolmogorov’s ideas were picked up by Lempel and Ziv, who in 1976 proposed a more practical measure of complexity for finite strings. They defined the production complexity of a string to be the smallest number of steps needed to construct the string from its alphabet, and showed that an exhaustive parsing of any finite string achieves this optimum. Of course their interest was in achieving compression, and they further explored the issues of coding cost in encoding the vocabulary of patterns that make up the string production steps. But let’s just stop at this point. We are only interested in measuring information. We have a finite string whose information content we want to know, and the number of steps required to build the string is such a measure. We still don’t have a clear way to translate this number into the Shannon information, since i) the Shannon information is an expectation for a source, and is not defined for an individual finite string, and ii) Lempel and Ziv only manage to link the production complexity to the Shannon entropy in the limit of long strings assumed to be statistically stationary. So we are still not there.
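
As a minimal sketch of the exhaustive parsing idea (a naive, quadratic-or-worse implementation for illustration, not Lempel and Ziv's own formulation): each new phrase is grown until it is no longer a substring of everything preceding its final symbol, and the phrase count is taken as the production complexity.

    def lz76_complexity(s):
        """Count the phrases of an LZ76-style exhaustive parsing of string s."""
        n = len(s)
        c = 0                    # number of production steps (phrases)
        i = 0                    # start of the current phrase
        while i < n:
            k = 1                # current candidate phrase length
            # extend the phrase while s[i:i+k] already occurs in the text up to,
            # but not including, the phrase's final symbol (overlap allowed)
            while i + k <= n and s[i:i + k] in s[:i + k - 1]:
                k += 1
            c += 1               # the phrase s[i:i+k] is new: one production step
            i += k
        return c

    print(lz76_complexity("0001101001000101"))   # e.g. a short binary string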
 
Now there are some other practical problems too... namely that the LZ exhaustive algorithm proceeds rather slowly, at O(n^2). So one can’t look at too many strings, or at strings that are too long, without paying a significant time penalty. This brings us to the area of Deterministic IT. Here we use an approach similar to the Lempel-Ziv production complexity. In fact the approach followed was again stimulated by the Chaitin and Kolmogorov ideas, and the connection between our approach and Lempel-Ziv’s was only recognised later. Still, we use the idea of a production process. We use a specific algorithm called T-augmentation, originally used to construct variable-length self-synchronising codes. T-augmentation is invoked in a recursive fashion and results in the catenation of a recursive hierarchy of pattern copies. By virtue of the constraints of the process, T-augmentation allows for the reversible decomposition of a string into its constituent patterns. The T-decomposition proceeds at rates of O(n log n), which on a fixed word-size computer ostensibly corresponds to O(n).
 
The T-complexity of a string measures the effective number of T-augmentation steps required to build a given string. I say effective because it is a log-weighted measure. While there are many similarities between the LZ76 measure and the T-complexity measure, there are also many subtle differences when it comes to the values obtained for specific example strings. Broadly speaking, and apart from the computational cost differences, the ensemble properties of the LZ76 and T-complexity measures are asymptotically similar.
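
For reference, the log weighting usually quoted in the T-code literature (stated here as read, rather than derived) is

    C_T(s) = ∑ log2(k_i + 1),   summed over the T-augmentation steps i = 1..m,

where k_i is the T-expansion (copy) parameter of the i-th step; with every k_i = 1 this reduces to a plain count of the steps.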
 
Both exhibit similar rates of growth in the maximal complexity as a function of string length. The LZ76 bound approaches the n/log(n) curve, whereas the maximal T-complexity is found to be roughly (but indeed quite precisely) given by li(n·log #A), where li() is the logarithmic integral function and #A is the alphabet size. In fact the n/log(n) and li(n) functions are also compared in relation to the density of primes, so it is already well established that these two functions are asymptotically similar.
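
A quick numerical check of that asymptotic similarity, using mpmath's logarithmic integral (the sample lengths are arbitrary):

    import math
    from mpmath import li          # logarithmic integral li(x)

    for n in (10**3, 10**6, 10**9, 10**12):
        ratio = float(li(n)) / (n / math.log(n))
        print(f"n = {n:>13}:  li(n) / (n/ln n) = {ratio:.4f}")   # drifts slowly towards 1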
 
The fact that the logarithmic integral provides a rather good approximation of the maximal T-complexity bound for all n, and not just in the limit of large n, allows us to systematically linearise the T-complexity measure as a function of string length. Indeed we simply define the T-information to be the inverse logarithmic integral of the T-complexity. We infer from the form of this equation that the T-information has units of nats (natural logarithm, base e). Shannon and those in classical IT mostly work with bits, but nats may be converted quite simply to bits by dividing by ln(2). So it is natural to want to compare the T-information with the Shannon information, right!? Before doing this we may go one step further: we define the T-entropy to be T-information / n, having units of nats/symbol.
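
A sketch of that chain of definitions, with li() inverted by a simple bisection (the function names are mine, and any constant factors the formal definition may carry are ignored here):

    import math
    from mpmath import li   # logarithmic integral, as used for the maximal T-complexity bound

    def t_information(t_complexity):
        """Invert li(): find x with li(x) = C_T by bisection; the result is in nats."""
        lo, hi = 2.0, 10.0
        while li(hi) < t_complexity:          # grow the bracket until it contains the root
            hi *= 2.0
        for _ in range(200):                  # plain bisection; li() is increasing here
            mid = 0.5 * (lo + hi)
            if li(mid) < t_complexity:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def t_entropy(t_complexity, n):
        """Average T-information per symbol, in nats/symbol."""
        return t_information(t_complexity) / n

    # Example: a hypothetical string of n = 1000 symbols with T-complexity 120
    h = t_entropy(120.0, 1000)
    print(h, "nats/symbol =", h / math.log(2), "bits/symbol")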
 
We may also normalise the T-entropy so that the maximum expected T-entropy is 1.0.
 
Using the techniques/tools for time series analysis that are described on other pages of this site, I have computed the T-entropy surface for a uniformly random time series. 200,000 points are shown at the base of the graph to the left. A window 100,000 samples wide is then shifted in time by 1000 points at a step, resulting in a total of 100 time positions. At each position a partition p is placed across the time series, with values x_i ≥ p encoded to s_i = 1 and values x_i < p encoded to s_i = 0. Thus, for each partition p, the windowed points [x_i]_w are encoded to a binary string
s_p = [s_i]_p .
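
A sketch of the windowing and thresholding step (scaled down from the 200,000-point series quoted above so it runs quickly; the per-string measure is a simple stand-in where the Det-IT tools would apply T-decomposition):

    import math
    import random

    def entropy_stand_in(s):
        """Stand-in for the per-string T-entropy: the empirical per-symbol Shannon
        entropy of the string, in nats."""
        p1 = s.count('1') / len(s)
        if p1 in (0.0, 1.0):
            return 0.0
        return -(p1 * math.log(p1) + (1 - p1) * math.log(1 - p1))

    random.seed(1)
    x = [random.random() for _ in range(20_000)]        # uniformly random series on [0, 1]

    WINDOW, STEP = 10_000, 1_000                        # the text uses 100,000 and 1,000
    partitions = [i / 100 for i in range(1, 100)]       # partition levels p across (0, 1)

    surface = []                                        # rows: window positions w, columns: partitions p
    for start in range(0, len(x) - WINDOW + 1, STEP):
        w = x[start:start + WINDOW]
        row = []
        for p in partitions:
            s_p = ''.join('1' if xi >= p else '0' for xi in w)   # x_i >= p -> 1, else 0
            row.append(entropy_stand_in(s_p))
        surface.append(row)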
 
 
Note that because the time series is uniformly distributed on the interval [0,1], prob(1) = 1 - p (nominally) and likewise prob(0) = p; since the entropy is symmetric in these two probabilities, the partition axis may be read directly as a probability axis, with prob(1) = 1 - prob(0). The entropy surface H_T(w,p) is generated by computing the T-entropy of the string corresponding to each window position and partition.
 
Normalising the T-entropy so that the average maximum value is nominally 1.0, and viewing the surface from above, we get the illustration above... note the surface undulations: sampling noise arising from the finite sample strings. If we look at the surface end-on (removing all perspective) we get the figure below, which is immediately recognisable as being similar to the graph from the Shannon and Weaver publication (1949), plotting the normalised Shannon entropy as a function of the probability prob(1) for an i.i.d. binary source.
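
The reference curve in that Shannon figure is just the binary entropy function; a few lines reproduce it for comparison with the measured surface (already normalised so its peak is 1):

    import math

    def binary_entropy(p):
        """Normalised Shannon entropy of an i.i.d. binary source, in bits/symbol."""
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    for p in [i / 10 for i in range(11)]:
        print(f"prob(1) = {p:.1f}   H = {binary_entropy(p):.3f}")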
 
So, at least up to some scaling constant, the T-entropy and the Shannon entropy appear to be connected... although we must again remind ourselves that the T-entropy is defined for individual finite strings, whereas the Shannon information is not! In principle, given a measure defined on individual objects, one may systematically proceed to measure all objects and arrive at the ensemble average (and variance).
 
We have done this exhaustively for all strings of up to 36 bits in length, and have further sampled populations of longer strings, up to 2000 bits for example, and find that the weighted ensemble average of the normalised T-entropy exhibits a relatively rapid convergence to the expected Shannon entropy function for an i.i.d. binary source. This is shown in the third illustration below. The blue curves have been generated from the exhaustive evaluation of the corresponding ensembles, whereas the red curves have been derived from a sampled range of strings. Clearly it is not physically possible to generate all 2^2000 strings, but we at least have the sense that by 2000 bits the expected normalised T-entropy corresponds well with the Shannon theory.
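
The exhaustive ensemble average itself is easy to express; a sketch for short strings, weighting each string by its i.i.d. probability (with the per-string measure left as a parameter, so the earlier lz76_complexity sketch, or a T-entropy routine, could be passed in):

    from itertools import product

    def ensemble_average(n, p1, per_string_measure):
        """Weighted average of per_string_measure over all 2^n binary strings of
        length n, under i.i.d. symbol probabilities prob(1) = p1."""
        total = 0.0
        for bits in product('01', repeat=n):
            s = ''.join(bits)
            k = s.count('1')
            weight = (p1 ** k) * ((1.0 - p1) ** (n - k))
            total += weight * per_string_measure(s)
        return total

    # e.g. ensemble_average(12, 0.3, lz76_complexity), feasible only for modest n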
 
Other comparisons with the Shannon measure, or more specifically with the Kolmogorov-Sinai entropy, are possible in the context of known results about simple discrete-time non-linear dynamical systems like the logistic map. For example it is known (Pesin’s identity, 1977) that the positive Lyapunov exponent of the logistic map gives the Kolmogorov-Sinai entropy of the system. It is also known that certain discrete partitions of the state space are generating partitions, that is, the symbolic encodings resulting from these coarse-grained partitions have a Shannon entropy exactly equal to the KS-entropy and therefore to the positive Lyapunov exponent. The following graph compares the normalised T-entropy, computed from sample strings 100,000 bits long derived from the generating bi-partition of the logistic map, with the positive Lyapunov exponent.
 
Here the T-entropy very nicely tracks the positive Lyapunov exponent across the whole range of dynamical behaviours.
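
A sketch of how the two quantities being compared might be obtained: the Lyapunov exponent from the usual derivative sum along the orbit, and the symbolic string from the bi-partition at x = 1/2, to which a per-string measure such as the T-entropy would then be applied:

    import math

    def logistic_orbit(r, x0=0.3, n=100_000, discard=1_000):
        """Iterates of x -> r*x*(1-x), after discarding an initial transient."""
        x = x0
        for _ in range(discard):
            x = r * x * (1.0 - x)
        xs = []
        for _ in range(n):
            x = r * x * (1.0 - x)
            xs.append(x)
        return xs

    def lyapunov_exponent(r, xs):
        """Average of log|f'(x)| = log|r*(1 - 2x)| along the orbit (nats per iteration)."""
        return sum(math.log(abs(r * (1.0 - 2.0 * x))) for x in xs) / len(xs)

    def bipartition_string(xs):
        """Symbolic encoding from the generating bi-partition at x = 1/2."""
        return ''.join('1' if x >= 0.5 else '0' for x in xs)

    for r in (3.6, 3.8, 4.0):
        xs = logistic_orbit(r)
        print(r, lyapunov_exponent(r, xs), bipartition_string(xs)[:32])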
It remains an open question whether Deterministic Entropy (T-entropy) is in fact formally a deterministic dual to Shannon’s probabilistic theory. For the answer to this we must wait for a bright, mathematically minded genius.
 
The following illustrations show the power of using a practical measure like the T-entropy to analyse time series, in this case derived from the logistic map, allowing the biotic potential r to be a function of time.
 
 
 
 
Showing the iterates of the logistic map above the surface to highlight the correspondence between the supertrack functions and the T-entropy surface features.
 
A portion of the surface, computed at much higher resolution and with glancing lighting angles, further highlights the supertrack functions and the low-entropy gullies, apparently artifacts implied by the supertrack functions during periodic orbits. (All illustrations of the T-entropy surfaces were produced using SpOd.)
 
 
 
 
 
 
 
 
 
 