INTRODUCTION
This set of web pages is intended as an entropy resource with a difference. Be sure to check out the entropy tools on this page. Links are also provided to a series of pages documenting the entropy computed for a variety of sources, for example from non-linear dynamics, natural language texts, and DNA sequences, as well as to papers and articles describing the measure.
 
These results are computed using a novel grammar-based measure of information. Here the definitions of string complexity, information and entropy are all based on the notion that a finite string of characters may be systematically and mechanically constructed from an alphabet, and that the information is simply a measure of the number of steps required to create the string from that alphabet.
 
Our measure is based on counting the number of steps required to construct a string from its alphabet. Clearly such a notion is related to Kolmogorov's algorithmic information content (1965), and also to Lempel and Ziv's measure of complexity for finite strings proposed in 1976, which ultimately gave rise to the LZ family of compressors. It is less evident that the measure we use here relates to Shannon's measure, which is defined in probabilistic terms, but in fact our measure is demonstrably equivalent, and we use results from non-linear dynamics to illustrate the unequivocal correspondence. At the same time our measure is easily computable for any finite string without reference to its statistical characteristics: it uses no probabilities, relies on no knowledge of the source or source statistics, and delivers results for any finite string.
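To make the idea of counting construction steps concrete, the short C sketch below counts the phrases of an LZ78-style incremental parse of a string. This is not the T-decomposition implemented in tcalc.c, merely an illustration of the related Lempel-Ziv notion of building a string step by step from previously seen patterns; the function name parse_steps and the example string are our own choices.

/* Sketch: counting "construction steps" for a finite string, in the spirit of
 * the grammar-based measures discussed above.  For brevity this counts the
 * phrases of an LZ78-style incremental parse; it is NOT the T-decomposition
 * algorithm used by tcalc.c, only a simpler relative of it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the number of phrases (construction steps) in an LZ78-style parse. */
static int parse_steps(const char *s)
{
    size_t len = strlen(s);
    char **dict = malloc(len * sizeof *dict);   /* at most len phrases */
    int ndict = 0, steps = 0;
    size_t start = 0;

    while (start < len) {
        size_t plen = 1;
        /* extend the current phrase while it is already in the dictionary */
        for (;;) {
            int found = 0;
            for (int i = 0; i < ndict; i++)
                if (strlen(dict[i]) == plen &&
                    strncmp(dict[i], s + start, plen) == 0) { found = 1; break; }
            if (!found || start + plen >= len)
                break;
            plen++;
        }
        dict[ndict] = malloc(plen + 1);         /* record the new phrase */
        memcpy(dict[ndict], s + start, plen);
        dict[ndict][plen] = '\0';
        ndict++;
        steps++;                                /* one construction step per phrase */
        start += plen;
    }
    for (int i = 0; i < ndict; i++) free(dict[i]);
    free(dict);
    return steps;
}

int main(void)
{
    const char *s = "abababababababab";
    printf("%s -> %d construction steps\n", s, parse_steps(s));
    return 0;
}

A highly repetitive string is parsed in few steps relative to its length, whereas a patternless string requires many; this is the intuition behind equating construction effort with information content.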
 
Our software has been tested on strings of up to 100 million symbols and accepts alphabet sizes of up to 256 characters. Diagnostic outputs are available.
 
 
DETERMINISTIC INFORMATION THEORY
 
Det-IT defines a new approach to information theory, one which contrasts with Shannon's probabilistic formulation and with Kolmogorov's and Chaitin's algorithmic formulations. To distinguish this variant of information theory from the established areas we have chosen to call it Det-IT.
 
Relatively recent developments in non-linear dynamics have demonstrated that deterministic systems may exhibit behaviours more usually associated with stochastic indeterminism. Though Shannon developed his ideas about entropy (1948) in the context of communications, he was almost certainly aware of the ideas involving the symbolic encoding of phase space trajectories of dynamical systems; Morse and Hedlund had already published on the topic of symbolic dynamics in the 1930s. The idea of computing the symbolic entropy of a dynamical system is formalised in the definition of the Kolmogorov-Sinai entropy, essentially the maximal Shannon entropy computed over all possible partitions (finite and infinite) of the phase space encodings. In 1977 Pesin proved that the KS-entropy of certain classes of non-linear dynamical systems is given precisely by the sum of the positive Lyapunov exponents of the system. A number of practical techniques exist for computing the Lyapunov exponents of simple systems quite precisely from the observed dynamics. Thus the KS-entropy may be derived indirectly, without reference to the source symbol statistics or coding processes. The logistic map provides an excellent example of a coded information source whose entropy may be computed by way of the positive Lyapunov exponent from the real-valued time series.
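As a concrete illustration of this indirect route to the KS-entropy, the following C sketch estimates the Lyapunov exponent of the logistic map x_{n+1} = r x_n (1 - x_n) by averaging \ln|r(1 - 2x_n)| along an orbit, and converts it to bits per symbol via Pesin's identity. The parameter value and iteration counts are illustrative choices only.

/* Sketch: estimate the Lyapunov exponent of the logistic map and hence,
 * via Pesin's identity, the KS-entropy in bits per symbol.
 * A stand-alone illustration; not part of tcalc.c. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double r = 4.0;           /* fully chaotic parameter value */
    double x = 0.3;           /* arbitrary initial condition */
    long transient = 1000;    /* iterations discarded to settle onto the attractor */
    long n = 1000000;         /* iterations averaged over */
    double sum = 0.0;

    for (long i = 0; i < transient; i++)
        x = r * x * (1.0 - x);

    for (long i = 0; i < n; i++) {
        /* local stretching rate is |f'(x)| = |r (1 - 2x)| */
        sum += log(fabs(r * (1.0 - 2.0 * x)));
        x = r * x * (1.0 - x);
    }

    double lyapunov = sum / n;             /* nats per iteration */
    double ks_bits = lyapunov / log(2.0);  /* bits per symbol (when positive) */

    printf("Lyapunov exponent  : %f nats/iteration\n", lyapunov);
    printf("KS-entropy estimate: %f bits/symbol\n", ks_bits);
    return 0;
}

For r = 4 the exponent is \ln 2, i.e. one bit per symbol, which provides a known benchmark against which the entropy of a symbolically encoded orbit can be checked.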
 
What we may take from non-linear dynamics is the idea that a duality exists between deterministic and probabilistic formulations of system behaviour. Thus it ought not to be surprising that Shannon's probabilistic theory would have a deterministic and computable counterpart. Whereas probabilistic treatments abstract away the underlying mechanisms of the information processes, making them difficult, indeed well-nigh impossible, to apply meaningfully in the case of individual finite strings, our deterministic measure gives us a way to evaluate the information and entropy of individual finite strings without reference to the source or ensemble statistics.
 
The logistic map is used here as a known information source to demonstrate the unequivocal correspondence between our measure and Shannon's entropy definitions. It remains a challenge, however, to prove the equivalence formally.
 
The software "tcalc.c" used to compute these results is provided subject to the conditions of the GNU General Public License.
 
 
Normalisation of Data:
When applying T-entropy to measure the information content or entropy of individual strings, it is important, when interpreting the results, to understand the relationship between T-entropy and the classical Shannon entropy.
 
It is important to understand that Shannon's entropy definition was not intended to apply to individual finite strings, but rather to describe the behaviour of an information source from which message strings issue. In this context a single finite message string is simply a sample of the output, and may or may not be typical of the source.
 
Let us be careful with our terminology, or we will simply add to the confusion that already exists in large measure in the application of Shannon's measure.
 
In classical information theory the terms entropy and entropy rate are used. These are distinct from one another, but careless use means they are often conflated: the entropy rate may simply be called entropy. Ordinarily the units of entropy and entropy rate would clarify the situation, but unfortunate choices of units simply add to the confusion.
 
For example, entropy may be measured in (bits); these are information-bits.
 
 
 
Shannon's entropy rate is defined as the 'expected' information rate for the source, i.e. the information per symbol, averaged over all time. One may well expect the actual information rate of finite sample messages to vary about this average: some messages will have a higher than average content, others lower. This raises an interesting conundrum which Shannon appears to have conveniently overlooked in developing his theory, but which arises quite obviously when one applies a measure such as T-entropy, which is in fact defined for individual finite strings.
 
To illustrate the problem, consider the example in which Shannon computes the entropy for a binary source with i.i.d. symbol probabilities P(1) = 1 - P(0):

H = -P(1) \log_2 P(1) - P(0) \log_2 P(0)
Here the entropy is a maximum when P(1) = P(0) = 0.5, precisely the situation we might expect from a sequence of fair coin tosses, where we record heads as ‘1’ and tails as ‘0’.
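The following minimal C sketch simply evaluates this binary entropy function at a few probabilities, confirming the maximum of one information-bit per binary symbol at P(1) = 0.5 (the function name binary_entropy is our own).

/* Sketch: the binary entropy function H(p) = -p log2 p - (1-p) log2 (1-p). */
#include <stdio.h>
#include <math.h>

static double binary_entropy(double p)
{
    if (p <= 0.0 || p >= 1.0)
        return 0.0;                 /* the limit of p log2 p as p -> 0 is 0 */
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p);
}

int main(void)
{
    double ps[] = { 0.1, 0.25, 0.5, 0.75, 0.9 };
    for (int i = 0; i < 5; i++)
        printf("P(1) = %.2f   H = %.4f information-bits/symbol\n",
               ps[i], binary_entropy(ps[i]));
    return 0;
}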
 
Now Shannon assigns the maximum entropy the value 1. What does this mean? To make some sense of this, let's consider the units of entropy.
 
 
A note, then, on the units of entropy, which is the information per symbol. Rather confusingly, the information is measured in units of (bits), but so also is the length of the string, counted in binary symbols. So a string may have a length of 100 bits but an information content of, say, 65 (bits). The entropy of the string would then seem to be 65 (bits) / 100 (bits) = 0.65 bits/bit. It is tempting to cancel the units top and bottom as if they were the same, but in fact information-bits are not the same as string-length bits, even though the two units are almost universally shortened to (bits).
 
By analogy, imagine the confusion if we had a 'standard' car which travels 10 miles per gallon of fuel and then defined all distances between points in gallons, with the implied standard conversion 10 miles = 1 gallon. (While this is not usual, it is done; for example, it is common to describe very large distances in terms of the time light takes to traverse them.) Where it is normal to quantify the efficiency of a car in miles per gallon, with our new distance units we would get gallons/gallon, which at first glance looks to be unitless. But to conclude this would be misleading. So it is with entropy. Contrary to popular assumption, entropy has units: the units are information-bits per string-length bit. Even though probability is unitless, the units in Shannon's definition result from the base of the log function and the constant.
 
 
 
In the limit as the length of the string sample approaches infinity the message will encapsulate all of the source statistics, and presuming the source statistics are stationary, one may in principle compute the source entropy from the infinite sample string, estimating the source symbol probabilities from the frequency of occurrence of the patterns.
 
The way this might be done is to place a window of width n over the string, assumed to be of length N, and to slide the window along the string one symbol position at a time. Letting N \rightarrow \infty, the total number of window positions, and hence the total number of n-symbol patterns observed by sliding the window symbol by symbol, will be (N - n + 1).
Letting F_i denote the frequency of occurrence of the i-th distinct n-symbol pattern, the pattern probabilities may be estimated as p_i = F_i / (N - n + 1), and the entropy per symbol estimated as H_n = -(1/n) \sum_i p_i \log_2 p_i.
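A minimal C sketch of this sliding-window estimate is given below, assuming a binary string of '0'/'1' characters; the function name block_entropy and the test string are our own. It counts the frequency of every n-symbol window pattern and returns the estimated information per string symbol.

/* Sketch: block-entropy estimate for a binary string.
 * Slide a window of width n along the string, count the frequency F of each
 * n-symbol pattern, and return H_n = -(1/n) * sum p log2 p  with
 * p = F / (number of window positions). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

static double block_entropy(const char *s, int n)
{
    long N = (long)strlen(s);
    long positions = N - n + 1;            /* number of window positions */
    long npat = 1L << n;                   /* 2^n possible binary patterns */
    long *F = calloc((size_t)npat, sizeof *F);
    double H = 0.0;

    for (long i = 0; i < positions; i++) {
        long code = 0;
        for (int j = 0; j < n; j++)        /* encode the window as an integer */
            code = (code << 1) | (s[i + j] - '0');
        F[code]++;
    }
    for (long k = 0; k < npat; k++)
        if (F[k] > 0) {
            double p = (double)F[k] / positions;
            H -= p * log2(p);
        }
    free(F);
    return H / n;                          /* information-bits per string symbol */
}

int main(void)
{
    /* first 32 symbols of the Thue-Morse sequence, purely as a test string */
    const char *s = "01101001100101101001011001101001";
    printf("H_3 estimate: %f bits/symbol\n", block_entropy(s, 3));
    return 0;
}

In practice N must be very much larger than 2^n for the pattern frequencies, and hence the estimate, to be meaningful.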
 
 
 
 
 
Research: Deterministic IT
(incomplete: pages in flux)