Deterministic Information Theory represents something of a departure from classical IT, and AIT. Det-IT explicitly attributes measurable information content to individual finite string objects and provides a method to compute the information content for any given string. long chain polymers, DNA, RNA, proteins, language texts, acoustic languages are all naturally occurring "strings". Any theory about information is evidently a theory of physics, and the processes of information exchanges, communication and processing are necessarily constrained by the laws of physics. 
In science, observation and measurement are central to test any theory ... 

These pages thus present a novel approach to observing and measuring information, and the outcomes of information processing.  Feel free to experiment, to understand information in a way  not previously possible. Be prepared to shrug off pre-conceived ideas about information and entropy. These pages are not for the faint minded, and challenge established thinking. But that is after all what a good theory will do.

What is information? ... From the classical standpoint we may associate information with the "packing" of patterns in "containers", the patterns are linear symbolic arrangements and the containers are strings of size/length N. Imagine placing a window of size n over a string and then sliding the window along one symbol place at a time, and then count the variety of n-symbol combinations that appear. As a rule, the greater the variety, the greater the information content. Shannon's n-block entropy provides an elegant formulation of this with asymptotic conclusions few would question today. The problem is how to apply his probabilistic formulation to individual finite strings? In 1964 Kolmogorov also saw this as a challenge... AIT was one result of this but perhaps of more relevance here was his work in relation to non-linear dynamical systems.

Undoubtedly inspired by Kolgmogorov's algorithmic approach, Lempel Ziv (1976) proposed their production complexity for finite strings. By considering that a string may be produced from its alphabet, they compute a complexity for a given string as the minimum size vocabulary of patterns needed to produce the string. They prove asymptotic equivalence with Shannon's result, giving further clear evidence of the relationship between pattern packing and information content.

Det-IT roots are in coding, indeed in self-synchronizing codes, but we take a leaf out of the LZ paper (1976). Our emphasis is specifically on quantifying information content in finite strings and the following tutorial provides examples of how this is achieved and how the results may be usefully interpreted. It is notable that the developments in and flavor of Det-IT is very much that of a physics theory, one being based on empirical observations and experiment.

Observation and measurement implies a set of scales and units. Det-IT is about computing string complexity, information and entropy and the units in which these are measured are (taugs), (nats) and (nats/character) respectively.

T-complexity (taugs): (short for T-augmentation steps) is a weighted count on the number of production steps required to costruct the string from its alphabet. 
Its value is simply a function of the size of the string ants its information content.

T-information (nats): T-information, I_T, is a linearised measure of the complexity, computed as the inverse logarithmic-integral (See Abramowitz and Stegun, Handbook of Mathematical functions) of the complexity. To understand why the logarithmic integral turns up here requires one to follow a fairly lengthy derivation of the upper bound (see papers , Deterministic Complexity Information entropy, published in Fundameta Informaticae 2005)

If the complexity C_T is measured in (taugs), I_T turns out quite naturally to have the units of (nats), which assumes base-e. As may be expected, (nats) may be converted to the more familiar units of information (bits), by dividing (nats) by ln(2), i.e., (bits) = (nats)/0.69 . The maximum value for the T-information of a string of length N characters, is N x ln(#A) (nats), where #A is the size of the alphabet.

T-entropy (nats/character): The average information rate or entropy, H_T, is computed as ratio of the total information to the length/size of the string I_T/N, and thus H_T has the units 
(nats/chars) but may be equivalently expressed in (bits/char) = (nats/char)/0.69 .
The maximum value for the T-entropy is ln(#A) (nats/character), where #A is the size of the alphabet.
In the special case where a string's 'characters' are in fact 'bits', the entropy has units of (bits)/(bit), and would appears 'unitless' or even 'normalised'.
This is unfortunate consequence of choosing (bits) to serve as units of "information" as well as string size/length. 
Many have suggested ways to resolve the confusion that brings about, for example, to replace the information units with "binits" (base 2)  or "decits" (base 10) or even "nats" (base e) ... but 50 years of entrenched confusion is a tradition hard to break ..... 

To avoid confusion, tcalc defaults to using (nats) which turn up anyway rather naturally from the derivation of the complexity bound.  But I do convert to (bits) in the examples below where it is insightful. It is sometimes easier to see what is going on when the units look familiar (if not confusing). 


Later we show that the T-information and T-entropy are closely related to the Shannon probablistic definitions. 

However, we have to remind ourselves that Shannon's entropy is an average (an expectation) for an enssemble of messages, or the source. An individual sample message issuing from the source may have less or even more than this average, such fluctuations about the expectation are normal in any such sampling. What Classical IT doesn't tell us, is what the spread of the distribution about the mean looks like. By sampling large numbers of strings using the T-information  we may conclude that for a pure random source having the maximum average entropy, these are normally distributed with a variance that goes essentially as sqrt(N), where N is the length of the string, that is the information content of the strings is for a given length approximately binomially distributed. In measuring information content of indivual sample strings we sensitively observe these fluctuations directly, thus to make sense of these measurements we have to understand the nature of the distribution for the corresponding string lengths.

Physically a symbol can't actually carry more information than a symbols worth, i.e. a binary digit (abbreviated to bit (Tukey)) can only carry a bit of information maximum. The paradox implicit in Shannon's choice of scaling constant and units, in setting the maximum average entropy to 1, means that any apparent fluctuations about the mean  imply individual bits must carry more than they is physically possible. More over it means that Det-IT must be scaled to achieve the Shannon normalised results.

We shall explain this more carefully in a section below.

The T-complexity and T-information distributions are normally distributed, with the vast majority of strings of any given length having close to the mean value. The mean value corresponds to the  Shannon maximum average entropy for the source

Experiments you may try:

Information is communicated serially. The first symbols may easily carry more information than the last. If we switch to considering Shannon's view, that information is a measure of uncertainty, then when the first symbol arrives we have nothing on which to base an expectation, so it necessarily carries maximum information. Irrespective of the symbol the first will carry exactly one bit of information (for the smallest alphabet is binary... and as we don't have information at this stage to know more about the alphabet until more symbols arrive we can only assume it is binary.) 

The question is what information is conveyed by a repeating sequence, for example, 0000000....... 

The classical view is that the repeating symbols conveys no information, and while this may certainly be the situation in the limit as n -> infinity, is this reasonable for a finite string? We have already understood the first digit carries one (bit). But what of the next? Until we have received the next symbol we are uncertain about its outcome ... so the next symbol must also carry some information, but perhaps less than a (bit) ... and so on. We might reasonably argue that with each new symbol received the total information must rise, but that the average information per symbol ought to approach 0 in the limit as n -> infinity.

In fact Det-IT attributes a total complexity value to a string of n repeating digits, of log_2(n), and the information is thus given by li^{-1}(log_2(n)). 

We are now in a position to see this. Notice that the total information increases monotonically with length....But that the  entropy (the average information per symbol) starts close to 1bit/bit  and then falls asymptotically to zero as n -> infinity in keeping with the Shannon result. 
Look at the T-information values and the T-entropy respectively for each of the following:
'0' '00' '000' '0000' '00000000' or '00000000000000000000000000000000' and etc

The following two graphs illustrate the growing total information content and falling average information rate for a string of repeating symbols. 

The term Entropy was first coined by Clausius around 1865, in an attempt to explain why certain observed processes were irreversible in time. Its clear that many of the laws of (classical) physics exhibit no preference for a given time direction; time may be run forwards or backwards with no discernible change to an observer, yet many natural phenomena exhibit a clearly preferred time sense.

For example, consider a box of atomic particles initially regularly positioned one to each node on a regular imaginary 3D lattice. The over time an observer would see that the  arrangement of  particles would quickly degenerate into chaos. The thermodynamic entropy of the system appears to increase, never decrease (We shall more to say about the second law of thermodynamics in due course.

Whereas much has been made of the similarity of IT entropy and its thermodynamic counterpart, in this respect both appear quite different from one another. But is this not simply a an illusion. Being couched in terms of source probability distributions, one may be inclined to consider Shannon's entropy and sources only in respect of stationary distributions where one may make reasoned assignments of probabilities to symbol or pattern occurrences based on frequency of appearance of the these. By taking a string of digits and treating this without regard to time one must necessarily  lose the significance this has. If on the other hand we recognise that a string is a serial prescription for information then the direction in which the symbols patterns are encountered has significance, and if one assumes some sort of adaptive model of the distributions based upon the sequence seen so far, then we immediately recognise that reversibility and irreversibility play a role. 

The production complexity implicitly invokes a time direction. We start at one end of the string and to use the language of LZ we deduce a production history which necessarily reflects the unfolding patterns of the string. Its clear that some strings have a very definite asymmetry, for example 00000000000011100011001 and its time reciprocal 10011000111000000000000 exhibit order and disorder in opposite time sense. We may expect the production history of such a string to be sensitive to the direction of parsing. for this reason we may systematically process a string in both directions to detect asymmetry.   

An extreme example would be the string of repeating zeros, followed by a solitary 1, and its reciprocal the 1 followed by a series of repeating zeros. In the case of the repeating zeros its clear the with the receipt of the first we obtain one whole  bit of new information, the uncertainty is maximal for the first observed digit. As more symbols are received, if the first symbol is simply repeating we may be inclined to feel more certain that the sequence will continue, so in our statistical model, probability for a '0',  P(0), might be seen as increasing over time, so we may write it as function also of time P(0,t),  Clearly then P(1,t)  (or correspondingly for any other symbol that might turn up) must be decreasing in time. Thus if an when 1 appears out of the blue, it will appear to convey a relatively high jump in the information. It would be  tempting to conclude that this solitary bit would carry an amount of information  proportional to -log_2(P(1)) potentially greater than the bit itself. For a single bit to carry more than one bit's worth of information is of course a physically impossibility, an apparent paradox which arises from modeling information in terms of changing probabilities. Statistical techniques are not terrible easy to apply in the presence of transients. In reality the information is conveyed not by individual symbols  but symbols operating in concert and in context. Thus the "correct" way to understand this example is to recognise that with the appearance of the 1 as late as it is in the sequence, he probabilities P(0,t) must be revised downwards, hence the '0's are correspondingly conveying more information than was previously accounted for.

From this example it is clear that if the solitary 1 appears early it "carries" in some sense less information than if it occurs late, thus the two sequences: 1000000...., and its time reciprocal 000000....00001 necessarily have different total information content. Thus irreversibility is again a feature of the production complexity.

Compare the forward and reverse parsings of the following: 
and the parsing of:

Often when we look at strings we are inclined to identify the structure in these, we automatically seek out, look for patterns. But order and disorder are at opposite ends of the same scale, a continuum or range of states. Its a bit like being given a tumbler of water. An optimist will describe the tumbler as as half empty when a pessimist will describe it as half full, so each is measuring the state of the tumbler from the opposite ends of the scale. A finite string is a container and into this container we 'pack' patterns. The more patterns and the greater the variety we can pack, the more information the string will contain. 

So the terms structure, regularity, order, repetition, periodicity are all description that the optimist might use to describe the information in a given string, where as un-structure, variety, irregularity, disorder, chaos, aperiodicity would be terms a pessimist would chose from to describe the same string. There are many established techniques for systematically quantifying the periodic components in such objects, the Fourier transform, and autocorrelation functions are two obvious popular methods. But these are not especially sensitive to detecting the state of 'perfect disorder'. indeed randomness is attributed a good deal of mystery, and rather like the Copenghagen interpretation of  quantum theory, many involved in AIT today imbue randomness with a spiritual state. So elusive is this quality that various definitions of randomness have been put forward most couched in terms as fuzzy as the quality they are trying to identify. Ironically, randomness is all about us. It is ubiquitous, the most abundant state.  But being as we are, creatures bound up in  finite world, and focused on pattern and on structure, we purposefully seek out the order from the noise.

Entropy is in fact a measure of disorder, of variety, and ultimately of randomness. And T-entropy is a computable measure sensitive to relative degrees of disorder.

Lets start with the repeating series of 0's, then replacing at irregular positions the 0's for 1's, and observe how the T-information of the string, and T-entropy increase as a result. (Note also the asymmetry in the computed results.)








and so on...


tcalc is not concerned with the actual symbols. I have used 1's and 0's above but we are not restricted to the binary alphabet, nor to using 1's and zeros. tcalc assumes each character to be a byte encoded, so  we may compute the information content of english text directly.

'how much wood could a woodchuck chuck, if a wood chuck could chuck wood?'

'peter piper picked a peck of pickled pepper. if peter piper picked a peck of pickled pepper, where is the peck of pickled pepper peter piper picked?'

A perfect palendrome will exhibit no asymmetry
'able was I ere I saw elba'

Understanding or interpreting the outcome requires some understanding of how alphabet size contributes to the entropy of a message. Hartley (1928) identified that choice was essential to the communication of information. He asserted that for a letter 'a' to carry information there must be a choice of other alternative characters, 'b', 'c', etc, and that it was this choice which which was critical to the conveyance of information. He concluded that the amount of information conveyed  by each symbol was the logarithm of the set size. If the alphabet is A={a,b,c....} then #A is the cardinality of the set and the information conveyed is log_2(#A) (bits) or ln(#A) (nats).

Hartley didn't quite get it right. and it was another 20 years before Shannon amended the Hartley result to give us the foundations of classical IT. However we have already demonstrated that a single digit carries information, and that a that a finite run of a repeating symbol carries an amount of information that is a function of the length of the run. In a sense one run is distinct from another by its length and so presents a level of choice which Hartley had not contemplated. In other words, not only may exercise a choice in relation to the symbol, but also choice in terms of the repeating of the symbol. Its clear that the choices extend ultimately to the arrangement and combinations of symbols.

Shannon reflected on the fact that certain symbols and combinations of symbols occurred with differing frequency and that the relative frequency of occurrence of patterns mitigated the actual quantity of information being conveyed. The n-block entropy is in effect a measure of the likelihood of each of the n-symbol combinations that appear in a message, the more likely the pattern, the higher its associated probability, the less uncertain it is, the less information it carries.
Taking a string as an object to be analysed in this way, firstly requires one to know what the full extent of the alphabet is at the outset. It also overlooks the fact that the communication of a message and ths the interpretation of the received symbols occurs serially and thus the early symbols take a natural precedence over later received symbols. Its clear that a production history reflects the serial nature of the communications process.

Whereas a probabilistic model will attribute an amount of information to a given pattern irrespective of its 'timing' or placement in the message, Det-IT on the other hand attributes implicitly varying amounts of information according
to the timing and placement of patterns and symbols. We have already seen evidence of this in the above examples.

Det-IT has an advantage that the full extent of the alphabet need not be known from the outset, and may be discovered in the course of the message. 

Consider a message which comprises a listing of the alphabet symbols: 
and if we compare the total information conveyed by adding a tenth symbol  
we see that the last character added a total of around 4 bits (note that log_2(10) = 3.3 bits), but also note that the T-entropy for the whole message is 2.4 bits/character on average.

One may envisage then a process by which only new characters ever appear 
'abcdefghijklmnopqrstuvwxyz and so on'
 and then look at the incremental change in the total information to deduce the effective information conveyed as a function of string length N = #A (the size of the alphabet). The next graph plots in BLUE, the information per character for each new character added up to a total of 256 possible characters supported by tcalc.
The RED is the function log_2(N), where N =#A, which is the amount of information per character that Hartley had proposed, and which Shannon's entropy definition attributes as the maximum for symbols issuing for example from a random source. 

It is clear from the graph that  Det-IT and classical-IT are very similar in attributing information in relation to the alphabet size, but that classical IT 'over-estimates' the amount compared with DET-IT. 

We shall return to consider the reasons why this is the case, in the section below on RANDOMNESS, but it won't be altogether a surprise to find that Det-IT for sample strings from a source and Shannon's entropy for the source are related by but a modest scale factor. The value of this scaling 'constant' will be of especial interest.

The advantage to using a production complexity such as in Det-IT, is that it can discover new alphabet symbols as it goes, and make sensible adjustments for the appearance of each new symbol in the computed outcome. In probability models, much has been made in the literature of the difficulty of accommodating the appearance of new and unexpected symbols and as a rule any kind of transient is awkward to handle. 

It should be clear then that tcalc will see strings like 
as being equivalent, they are both binary strings because each is drawn from an alphabet comprising 2 distinct characters. The appearance of new symbols and new and transient patterns is handled transparently by tcalc, for example, introducing the letter c as the twentieth (last) position is treated as just another t-augmentation step  (I'm assuming forward parsing of the string), and adds exactly 1 (taug) to the complexity. 
EXAMPLE 2: Rationals and irrationals
Strings may be interpreted as expansions of numbers in the unit interval [0,1] if we assume that the period is always shifted to the left most position in relation to the digits. The decimal expansions of pi, sqrt(2), sqrt(5) and many other irrationals are known to be statistically indistinguishable from random data. Taking the first 100 digits we 
[UNIX bourne shell command: echo "scale=99; 4*a(1)" | bc -l | tr -d "\n" | tr -d "." | tr -d "\\" ]




Its easy to construct irrationals that do not have high entropy, for example the string whose digits repeat in a growing series ensures no periodicity and though all digits appear with equal frequency such a string necessary will have a low entropy compared with pi, sqrt(2), and sqrt(5)

expansions of rationals have on the other hand very low entropy by comparison. As in example 1 the length of repeating series together with the string size determine the
value of the entropy so the two strings 1/5 and 1/7, for example, for given length 1/7 has a higher entropy than 1/5 but both will asyptotically approach 0 (bits/char) as n -> infinty 




EXAMPLE 3:  insensitivity to encoding alphabet used

One might reasonably expect that the information contained in a number will be essentially independent of the radix (base) of the expansion, especially in the case of 'long' strings.  (We shall in due course come to understand what is implied by the term 'long' string, but for the moment we may start by taking the first 100 digits of pi and re-expressing these in binary, octal and hexadecimal [UNIX bourne shell command: BASE=8; echo "scale=99; obase=$BASE; 4*a(1)" | bc -l | tr -d "\n" | tr -d "." | tr -d "\\" ]  The complexity and information results are very similar (remember the longer the sample the more similar the result) where as the entropy is reduced because there are  now 330 chars (bits) whereas before there were 100 characters.

radix  2:
'1100100100001111110110101010001000100001011010001100001000110100110001001100011001100.... etc for 331 bits'

radix  8:
'311037552421026430215142306305056006701632112201116021051476307200202737246166116331045051202...  etc'

radix 10:

radix 16:

'1100100100001111110110101010001000100001011010001100001000110100110001001100011001109.... etc for 331 bits'

'recoding the binary string with 0 -> 't ' and 1 -> 'hd ', thus : hd hd t t hd t t hd t t t t hd hd hd hd hd hd t hd hd.... etc '

(For these short examples the variation is understandably much higher than for long strings, for example if one takes the first 1000 digits of pi we have the following results)


The first 64 bits of the omega number

compared with pi

For 64 bit strings we have an mean of 52.84 bits T-information and a deviation of (calculated from 35000 sample strings) of 4.2306 bits/char what this means is that the distribution is really quite broad relative to the 

T-complexity/T-information may be used to identify the degree of randomness of finite strings. There are several important observations that need to be appreciated in order to do this 1) The complexities of all strings of a given a length n, are approximately normally distributed. 2) The information content of the strings is thus also approximately normally distributed.  3) The width of the information distribution is approximately sqrt(n), i.e., the distribution is roughly that of the binomial  distribution, in the limit as n -> infinity. And the distribution becomes relatively speaking  narrow as n increases 
that is sqrt(n)/n -> 0  in the limit n -> infinity. 4) The mean of the binomial distribution is half the maximum; And 75% of all strings have complexity/information values that  lie within the "half-power" points in the normal distribution.   5) the mean of the T-complexity/information distributions are a function of n, and tend to the binomial distribution as 
n -> infinty.

Thus to determine if a string of length n is random or not, we may compare its t-complexity/tinformation against the  
distribution for the corrsponding strings of length n. 

Since, for any significant length n, it is not possible to exhaustively compute all t-complexities we are limited to sampling the distribution.  I have done this for strings ranging in length up to 10,000,000 bits


Shannon assumed the maximum information per symbol in a random string to be 1 (bit)/(bit). In fact Shannon's entropy is an expectation for the enssemble. If you pick a string at random, its information content is almost certainly to lie "close" to the mean, i.e.,
within sqrt(n) of the mean, which itself tends towards n/2. 


Normalising the T-entropy to give a mean value of 1, then the T-entropy of strings with entropy less than this, will be highly 
indicative of the corresponding Shannon source entropy, H_S. This has been verified using the logistic map as an variable rate information source.
Have a look here. It has been shown that the Shannon entropy of the binary encoded symbolic dynamic, created from the coarse 
grained  generating bi-partition, is a function of the control parameter r, and H_S ranges through 0 through to 1 on the normalised scale.

Applying the T-entropy measure to the same range of sample strings we find  the Shannon-normalised T-entropies are 
experimentally corroborated in the range.

sections to be completed:




LOGISTIC ZIPPER TRANSFORM,%20if%20a%20wood%20chuck%20could%20chuck%20woodhttp://,%20where%20is%20the%20peck%20of%20pickled%20pepper%20peter%20piper%20pickedhttp://
Tutorial: Det-IT
pages in flux