Rodrick Wallace

Correspondence: Rodrick Wallace Wallace@nyspi.columbia.edu

**Author Affiliations:**

Division of Epidemiology, the New York State Psychiatric Institute, USA.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Background:** Recent studies indicate that intrinsically disordered proteins (IDPs) do not preferentially bind to chaperones in vivo, suggesting that the role of chaperones, where they do bind IDPs, is either to prevent pathological conformations or to aid in the assembly of large complexes. This, in turn, suggests that large IDP complexes must form under the control of a sophisticated regulatory system of chemical cognition (in effect, an information catalysis) that, while it may have evolutionarily exapted existing chaperones, is likely to have evolved other, specialized, mechanisms or modalities of process modulation.

**Methods:** We model this using recently developed 'statistical' approaches from information theory that exploit the fact that
information is itself a form of free energy.

**Results:** Information catalysis is found to arise directly from the 'chain rule' of information theory, via a statistical mechanical
argument in which metabolic free energy both powers the transmission of information and applies available entropy as a tool to
correct malformed complexes by local heating.

**Conclusions:** The regulatory mechanisms or modalities may well be IDPs within the complexes themselves, an extension of the chaperone concept explaining the observed prevalence of disorder in large complexes.

**Keywords:** catalysis, information theory, rate distortion, regulation

The observation by Hegyi and Tompa [1] that intrinsically disordered proteins (IDPs) display no preference for chaperone binding in vivo is striking: IDPs are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Inferring the general from the particular, they suggest that the primary reason for chaperone binding, where it does occur, is not assistance with folding but promotion of assembly with partners, since IDPs that bind to chaperones tend to bind to other proteins as well. These results support extending and generalizing the chaperone concept. It seems appropriate, then, to suggest that one prime reason for IDP interaction with chaperones is to prevent amyloid formation. Others may be transport through physiological membranes and assistance with partner binding, i.e., assembly of complexes.

Hegyi and Tompa note that IDPs have been observed in vitro to bind very effectively, primarily by binding to their partners at increased speed. Their general avoidance of chaperones may be related to this. When they do bind to chaperones, however, the reason might be that in vivo assembly of large complexes can be slowed by non-specific interactions, in which case chaperone assistance may help.

For small – rapidly binding – complexes, Wallace [2] has described IDP reaction dynamics via a statistical mechanics approach to a 'symmetry spectrum' derived from a groupoid generalization of the wreath product of groups [3] that characterizes 'conventional' nonrigid molecule theory [4,5]. For large complexes, however, it seems likely that an even more general approach will be needed, one that reflects the operation of an elaborate regulatory system of chemical cognition analogous to what has been used to describe the immune system [6,7] or higher order neural and social function [8,9]. We will suggest that, while 'ordinary' chaperones may have been evolutionarily exapted (in the sense of Gould [10]) into regulatory function for large IDP complexes, other, less familiar, molecular regulators might well remain to be found. Tompa and Csermely [11], for example, have even gone so far as to suggest that IDPs within complexes may serve as self-chaperones, a significant generalization of the chaperone concept.

We begin with some formal development, leading to the idea of cognitive control in large complexes. From the perspective of Atlan and Cohen [6], who introduce a cognitive paradigm for the immune system, cognition involves comparison of a perceived signal with an internal, learned or inherited, picture of the world, and, upon that comparison, choice of a single response from a larger repertoire of possible responses. This inherently involves the transmission of information, since choice always necessitates a reduction in uncertainty ([12], p.21). Such cognition is, in a sense, routine, since even a thermostat would be cognitive from this perspective. The essential point is that large enough biological structures can follow a large multiplicity of possible 'reaction paths', and focus must thereupon shift from the details of the chemical machinery itself to the details of its behavior in the context of impinging external signals.

**Symbolic dynamics of IDP complex formation**

Symbolic dynamics is a 'coarse-grained' perspective on physical systems that discretizes their time trajectories in terms of dynamically accessible regions so that it is possible to do statistical mechanics on symbol sequences ([13], Ch. 8) that can be said to constitute an 'alphabet'. Within that 'alphabet', certain 'statements' are highly probable, and others far less so. The simple (ideal) oscillating reaction described by the equations

$$\frac{dX}{dt} = \omega Y, \qquad \frac{dY}{dt} = -\omega X$$

has the solution $X(t) = \sin(\omega t)$, $Y(t) = \cos(\omega t)$. Divide the *X-Y* plane into two components, the simplest possible coarse graining, calling the halfplane to the left of the vertical *Y* axis *A* and that to the right *B*. This system, over units of the period, traces out a stream of *A*'s and *B*'s having a very precise grammar and syntax: ABABABAB...

Many other such statements might be conceivable, e.g., AAAAAA..., BBBBB..., AAABAAAB..., ABAABAAAB..., and so on, but, of the infinite number of possibilities, only one is actually observed, is 'grammatical'.
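
The coarse-graining of an ideal oscillation can be illustrated with a minimal Python sketch. It assumes a trajectory of the form X(t) = sin(ωt + φ), labels the half-plane X < 0 as 'A' and the half-plane X ≥ 0 as 'B', and samples once per half-period. The function name `symbol_stream`, the sampling scheme, and the starting phase are illustrative choices, not taken from the paper.

```python
import math

def symbol_stream(n_symbols=8, omega=1.0, phase=math.pi):
    """Coarse-grain X(t) = sin(omega*t + phase) into the symbols 'A' and 'B'.

    'A' marks the half-plane X < 0 (left of the Y axis), 'B' the half-plane
    X >= 0. The trajectory is sampled at the midpoint of each half-period,
    so every sample sits well away from the boundary X = 0.
    """
    half_period = math.pi / omega
    symbols = []
    for k in range(n_symbols):
        t = (k + 0.5) * half_period
        x = math.sin(omega * t + phase)
        symbols.append("A" if x < 0 else "B")
    return "".join(symbols)

print(symbol_stream())  # ABABABAB
```

Changing `phase` only shifts which symbol comes first; the strict alternation, the 'grammar', is unaffected.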

More complex dynamical reaction models, incorporating diffusional drift around deterministic solutions, or elaborate structures of complicated stochastic differential equations having various domains of attraction – different sets of 'grammars' – can be described by analogous means ([14], Ch.3).

Rather than taking symbolic dynamics as a simplification of more exact analytic or stochastic approaches, it is possible to generalize the technique to more comprehensive structures. Complicated cellular processes may not have identifiable sets of stochastic differential equations like noisy, nonlinear mechanical clocks, but, under appropriate coarse-graining, they may still have recognizable sets of grammar and syntax over the long-term. Proper coarse-graining may, however, often be the hard scientific kernel of the problem.

The fundamental assumption for complicated biological reactions like the formation of large IDP complexes is that reaction trajectories can be classified into two groups: a very large set that has essentially zero probability, and a much smaller 'grammatical' set. For the grammatical/syntactical set, the argument is that, given a set of elaborate trajectories of length *n*, the number of grammatical ones, *N(n)*, follows a limit law of the form

$$H = \lim_{n \to \infty} \frac{\log[N(n)]}{n}$$

such that *H* both exists and is independent of path. If
convergence occurs for some finite

The basic argument is shown in figure 1, where an initial IDP/partner configuration, *S_{0}*, can either converge on a normal large IDP complex, *S_{f}*, or on a thermodynamically competitive pathological state, *S_{path}*.

The astute observer will have noted that we are, via coarse-graining and symbolic dynamics, assigning classic information sources to the two sets of thermodynamically competitive 'grammatical' pathways. The essential question is how a regulatory catalysis, the generalized chaperones of Hegyi and Tompa [1], can act in such a circumstance to raise the probability of convergence on *S_{f}*.

**The dual information source of a cognitive regulatory process**

The first step in answering that question lies in describing a large class of regulatory chaperone activity in terms of another information source. To reiterate, Atlan and Cohen [6], in the context of a study of the immune system, argue that the essence of cognition is the comparison of a perceived signal with an internal, learned picture of the world, and then choice of a single response from a large repertoire of possible responses. Such choice inherently involves information and information transmission since it always generates a reduction in uncertainty. Thus structures that process information are constrained by the asymptotic limit theorems of information theory, in the same sense that sums of stochastic variables are constrained by the Central Limit Theorem, allowing the construction of powerful statistical tools useful for data analysis.

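More formally, a pattern of incoming input *S_{i}* describing the status of the IDP/partner configuration, starting with the initial state *S_{0}*, is combined with a pattern of internal activity *W_{i}* to create a path of composite signals *x* = (*a_{0}*, *a_{1}*, ..., *a_{n}*, ...), where

$$a_i = f(S_i, W_i)$$

for some unspecified function *f*. The *a_{i}* are seen to be very complicated composite objects that, in this treatment, we may choose to coarse-grain so as to obtain an appropriate 'alphabet'.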

In a simple spinglass-like model, **S** would be a vector, **W** a matrix, and *f* would be a function of their product at 'time' *i*. The path *x* is fed into a highly nonlinear decision oscillator, *h*, in a sense a 'sudden threshold machine' pattern recognition structure, that generates an output *h(x)* that is an element of one of two disjoint sets, *B_{0}* and *B_{1}*.

Assume a graded response, supposing that if *h(x)* ∈ *B_{0}* the pattern is not recognized, and if *h(x)* ∈ *B_{1}* the pattern is recognized and some action takes place. Here *B_{1}* would represent the final state of the large IDP complex, either normal or in some pathological conformation, that is sent on in the biological process or else subjected to some attempted corrective action. Corrections may, for example, range from activation of some heat shock-like protein repair to more drastic clean-up attack.
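
A toy version of this pattern recognition-and-response scheme can be sketched in Python. Everything specific here is an assumption made for illustration: the composition *f* is taken to be a hyperbolic tangent of the matrix-vector product, the matrix is held fixed over time, the decision rule is a simple peak threshold, and the labels `"B0"` and `"B1"` stand for the two disjoint output sets. None of these choices is specified in the paper.

```python
import math

def compose(signal, weights):
    """a_i = f(S_i, W): here f is assumed to be tanh of the product W S_i."""
    return [math.tanh(sum(w * s for w, s in zip(row, signal)))
            for row in weights]

def h(path, threshold=0.9):
    """Sudden-threshold decision on a path x = (a_0, a_1, ...).

    Returns 'B1' (pattern recognized) if any component of any a_i reaches
    the threshold, otherwise 'B0' (pattern not recognized).
    """
    peak = max(max(a) for a in path)
    return "B1" if peak >= threshold else "B0"

def recognize(signals, weights, threshold=0.9):
    """Build the path from the incoming signals and apply the decision h."""
    path = [compose(s, weights) for s in signals]
    return h(path, threshold)

# Hypothetical 2x2 weight matrix and two short input histories.
weights = [[1.0, 0.5], [-0.5, 1.0]]
quiet = [[0.1, 0.0], [0.0, 0.1]]
strong = [[0.1, 0.0], [2.0, 1.0]]
print(recognize(quiet, weights), recognize(strong, weights))
```

The point of the sketch is only the structure: composite signals are accumulated along a path, and a sharply nonlinear decision maps each path into one of two disjoint response sets.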

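The principal objects of formal interest are paths *x* triggering pattern recognition-and-response. That is, given a fixed initial state *a_{0}*, we examine the subsequent paths *x* that begin with *a_{0}* and lead to the event *h(x)* ∈ *B_{1}*.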
Figure 1: **An initial IDP/partner configuration *S_{0}* can either converge on a normal large IDP complex *S_{f}* via the set of high probability reaction paths to the left of the filled triangle, or it can converge to a thermodynamically competitive pathological state *S_{path}* to the right.**

Again, for each positive integer *n*, let *N(n)* be the number of high probability grammatical and syntactical paths of length *n* which begin with some particular *a_{0}* and lead to the condition *h(x)* ∈ *B_{1}*.

While the combining algorithm, the form of the nonlinear oscillator, and the details of grammar and syntax can all be unspecified in this model, the critical assumption that permits inference of the necessary conditions constrained by the asymptotic limit theorems of information theory is that, again, the finite limit

$$H = \lim_{n \to \infty} \frac{\log[N(n)]}{n}$$

both exists and is independent of the path *x*.
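
To make a limit of the form log[N(n)]/n concrete, the following Python sketch counts 'grammatical' strings for a deliberately simple toy grammar: strings over the alphabet {A, B} in which 'BB' never occurs. This grammar is an assumption chosen for illustration and is not the grammar of any real reaction system. For it, N(n) grows like a power of the golden mean, so the estimate converges to the logarithm of the golden mean.

```python
import math

def count_grammatical(n):
    """Number of length-n strings over {A, B} that never contain 'BB'."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    ends_in_a, ends_in_b = 1, 1  # the two strings of length 1
    for _ in range(n - 1):
        # 'A' may follow either symbol; 'B' may only follow 'A'.
        ends_in_a, ends_in_b = ends_in_a + ends_in_b, ends_in_a
    return ends_in_a + ends_in_b

def entropy_estimate(n):
    """Finite-n estimate log[N(n)]/n of the source uncertainty, in nats."""
    return math.log(count_grammatical(n)) / n

golden_mean = (1 + math.sqrt(5)) / 2
for n in (10, 100, 1000):
    print(n, round(entropy_estimate(n), 4))
print("limit", round(math.log(golden_mean), 4))
```

By contrast, a system with a single grammatical statement of each length, such as the strictly alternating ABAB... stream, has N(n) = 1 and hence a limit of zero.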

Call such a pattern recognition-and-response cognitive process *ergodic*. Not all cognitive processes are likely to be ergodic in this sense, implying that *H*, if it indeed exists at all, is path dependent, although extension to nearly ergodic processes seems possible [9].

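Invoking the spirit of the Shannon-McMillan Theorem, as choice involves an inherent reduction in uncertainty, it is then possible to define an adiabatically, piecewise stationary, ergodic (APSE) information source **X** associated with stochastic variates *X_{j}* having joint and conditional probabilities $P(a_0, \ldots, a_n)$ and $P(a_n \mid a_0, \ldots, a_{n-1})$ such that the appropriate joint and conditional Shannon uncertainties satisfy the classic relations

$$H[\mathbf{X}] = \lim_{n \to \infty} \frac{\log[N(n)]}{n} = \lim_{n \to \infty} H(X_n \mid X_0, \ldots, X_{n-1}) = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n+1}$$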

This information source is defined as dual to the underlying ergodic cognitive process.

Adiabatic means that the source has been parameterized according to some scheme, and that, over a certain range, along a particular piece, as the parameters vary, the source remains as close to stationary and ergodic as needed for information theory's central theorems to apply. Stationary means that the system's probabilities do not change in time, and ergodic, roughly, that the cross sectional means approximate long-time averages. Between pieces it is necessary to invoke various kinds of phase transition formalisms, as described more fully in e.g., [8].

**Information catalysis**

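In the limit of large *n*, $H[\mathbf{X}]$ becomes homologous to the free energy density of a physical system at the thermodynamic limit of infinite volume. The free energy density of a physical system having volume *V* and partition function $Z(\beta)$ derived from the system's Hamiltonian (the energy function) at inverse temperature $\beta$ is [6]

$$F = \lim_{V \to \infty} -\frac{1}{\beta} \frac{\log[Z(\beta, V)]}{V} = \lim_{V \to \infty} \frac{\log[\hat{Z}(\beta, V)]}{V}$$

with $\hat{Z} = Z^{-1/\beta}$.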

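Information catalysis, in the circumstance of figure 1, arises most simply via the 'information theory chain rule' [15]. Given *X* as the information source representing the reaction paths of figure 1, and *Y*, an information source dual to the sophisticated chemical cognition of the generalized chaperone mechanisms of Hegyi and Tompa [1], one can define jointly typical paths $z_i = (x_i, y_i)$ having the joint information source uncertainty $H(X, Y)$, which satisfies

$$H(X, Y) = H(X) + H(Y \mid X)$$

Of necessity, then, since conditioning cannot increase uncertainty,

$$H(X, Y) \leq H(X) + H(Y)$$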

These relations imply that, by means of the identification of information as a form of free energy, at the expense of adding the considerable energy burden of the regulatory apparatus, represented by its dual information source *Y*, it becomes possible to canalize the reaction paths of figure 1, so as to make the pathways *S_{0}* → *S_{f}* far more probable than those to the right of the filled triangle.
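
The chain rule and the resulting subadditivity can be checked numerically for a pair of discrete random variables. The Python sketch below is a single-symbol illustration only (the paper's argument concerns source uncertainties of whole paths), and the joint distribution, with its labels for path class and regulator state, is invented for the example.

```python
import math

def entropy(probs):
    """Shannon uncertainty -sum p log p (in nats), skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def chain_rule_terms(joint):
    """Return H(X), H(Y), H(X,Y), H(Y|X) for a joint table {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    h_x, h_y = entropy(px.values()), entropy(py.values())
    h_xy = entropy(joint.values())
    # H(Y|X) computed directly from the conditional probabilities p(y|x).
    h_y_given_x = -sum(p * math.log(p / px[x])
                       for (x, y), p in joint.items() if p > 0)
    return h_x, h_y, h_xy, h_y_given_x

# Hypothetical joint distribution: X = reaction path class, Y = regulator state.
joint = {("to_Sf", "on"): 0.45, ("to_Sf", "off"): 0.05,
         ("to_Spath", "on"): 0.10, ("to_Spath", "off"): 0.40}
h_x, h_y, h_xy, h_y_given_x = chain_rule_terms(joint)
print("H(X,Y)        =", round(h_xy, 4))
print("H(X) + H(Y|X) =", round(h_x + h_y_given_x, 4))
print("H(X) + H(Y)   =", round(h_x + h_y, 4))
```

Because the two variables in this table are strongly correlated, the joint uncertainty falls well below the sum of the marginal uncertainties; for independent variables the two would coincide.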

That is, by raising the entire reaction free energy landscape corresponding to *H(X)* by the amount *H(Y)*, it becomes possible to deepen the energy channel leading to *S_{f}* at the expense of the one leading to *S_{path}*.

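Within a cell, however, there will be an ensemble of possible reactions, driven by available metabolic free energy, so that, taking $\langle \cdot \rangle$ as an ensemble average,

$$\langle H(X, Y) \rangle \leq \langle H(X) \rangle + \langle H(Y) \rangle$$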

Typically, letting *M* represent the intensity of available metabolic free energy, one would expect, in the standard manner of statistical mechanics, that

$$P[H] = \frac{\exp[-H/\kappa M]}{\int_0^\infty \exp[-H/\kappa M]\, dH}$$

leading to an estimate for the mean value of *H* as

$$\langle H \rangle = \frac{\int_0^\infty H \exp[-H/\kappa M]\, dH}{\int_0^\infty \exp[-H/\kappa M]\, dH} = \kappa M$$

where **κ**, an inverse energy intensity scaling constant, is expected to be quite small indeed, a consequence of entropic translation losses between metabolic free energy and the expression of information. Thus *H/κM* can become very large, and the integral converges, in this approximation.
The resulting expression,

suggests an explicit free energy mechanism for reaction canalization, at the considerable expense of maintaining an embedding regulatory environment.
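
As a numerical check on this kind of statistical mechanical estimate, the Python sketch below assumes a Boltzmann-like density proportional to exp(−H/κM) on H ≥ 0 and integrates by the midpoint rule. Under that assumption the mean uncertainty should equal κM. The function name, the truncation cutoff, and the parameter values are illustrative choices, not values from the paper.

```python
import math

def mean_uncertainty(kappa, M, cutoff=60.0, steps=20000):
    """Midpoint-rule estimate of <H> under the density exp(-H/(kappa*M)).

    The integrals over [0, infinity) are truncated at cutoff*kappa*M, where
    the exponential weight has decayed by a factor of exp(-cutoff).
    """
    scale = kappa * M
    dh = cutoff * scale / steps
    numerator = denominator = 0.0
    for i in range(steps):
        h = (i + 0.5) * dh
        weight = math.exp(-h / scale)
        numerator += h * weight * dh
        denominator += weight * dh
    return numerator / denominator

kappa, M = 1e-3, 250.0  # illustrative values only
print(mean_uncertainty(kappa, M), kappa * M)
```

The rapid decay of the exponential weight is what makes the truncated integral a good stand-in for the full one.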

That is, quite counterintuitively, entropic loss – small **κ** – can be a powerful tool for regulating complex biological
phenomena, in much the same sense that Tompa and Csermely
[11] propose that entropy transfer can be used by generalized
chaperones to trigger proper conformation in pathologically
folded protein complexes.

**The entropy transfer model**

As noted, Tompa and Csermely [11] have suggested a localized heating model for IDP chaperone activity likely to be of considerable importance in large complex formation. The basic idea is that a large complex can become trapped in a local free energy minimum far from the physiologically required conformation. An IDP chaperone, having many-to-one binding flexibility, can 'cool' itself by binding to the misformed complex, and transferring entropy to kick the complex out of its misformed state, allowing it to continue a global search for the proper shape. The basic relation is the classic

where ** S** is the entropy change,

In general, transmission of information in real systems is not without error, and it is useful to ask about the minimum average error desired under a specific level of transmission channel noise, and the minimum channel capacity needed to achieve that average distortion, say *D*. The Rate Distortion Theorem shows that, for all possible distortion measures (surprisingly), there is a minimum channel capacity, *R(D)*, needed to achieve average distortion *D*.

All possible such expressions can be shown to be convex in *R* and

Given that the cognitive chaperone system wishes to achieve a minimum distortion *D* in the complex it chooses to heat up, where does the metabolic free energy to direct cognition come from, or, more relevant, how much is needed, and, once used for cognitive purposes, where does it go? The Second Law of thermodynamics requires that free energy transfer (almost) always involves losses due to heating, that is, entropy increase.

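Again, the simplest model, given an available metabolic free energy intensity *M*, is that the probability density for a particular value of the cognitive channel capacity *R* is

$$P[R] = \frac{\exp[-R/\kappa M]}{\int_0^\infty \exp[-R/\kappa M]\, dR}$$

for an appropriate, likely very small, scaling constant **κ**, so that, as before,

$$\langle R \rangle = \kappa M$$

Thus

$$M = \frac{\langle R \rangle}{\kappa}$$
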

and demands for metabolic free energy can rise rapidly with required channel capacity, or its equivalent, a lessening of average distortion between what is wanted and what is observed by the cognitive regulatory system – the generalized chaperones.
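
How quickly the metabolic demand can rise as the tolerated distortion shrinks may be illustrated with a small Python sketch. It rests on two assumptions that are not taken from the paper: the rate distortion function is that of a Gaussian source under squared-error distortion, R(D) = ½ log(σ²/D) for D < σ² and zero otherwise, and the mean channel capacity scales with metabolic free energy intensity as ⟨R⟩ = κM, so that M = ⟨R⟩/κ. The value of κ is hypothetical.

```python
import math

def gaussian_rate(distortion, variance=1.0):
    """R(D) in nats for a Gaussian source with squared-error distortion."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= variance:
        return 0.0  # no capacity is needed once D reaches the source variance
    return 0.5 * math.log(variance / distortion)

def metabolic_demand(distortion, kappa, variance=1.0):
    """M = R(D)/kappa: free energy intensity needed to sustain capacity R(D)."""
    return gaussian_rate(distortion, variance) / kappa

kappa = 1e-3  # hypothetical, chosen small to mimic large translation losses
for d in (0.5, 0.1, 0.01):
    print(d, round(metabolic_demand(d, kappa), 1))
```

Because κ enters as a divisor, a small scaling constant multiplies every increment in required channel capacity into a much larger increment in metabolic free energy.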

The essential inference is that a most elegant and parsimonious use of the waste energy necessarily generated by such a cognitive process would be to heat large molecular complexes trapped at some malconformed intermediate – the proposed entropy transfer mechanism of Tompa and Csermely [11].

Wallace [2], using a groupoid extension of conventional nonrigid molecule theory, introduced a literally astronomically large spectrum of possible symmetry classifications for small IDP/partner complexes. The size of the appropriate symmetry group (or groupoid) must grow exponentially in the number of amino acid residues within the flexible IDP frond. For 30 to 100 residues, the nonrigid symmetry set is indeed astronomical, and can only be addressed by a statistical mechanics argument. The understanding of large IDP/partner complexes, as inferred from the work of Hegyi and Tompa [1], faces even greater difficulties, since it appears compounded by a generalized chaperone regulatory structure that is likely to be another example of sophisticated chemical cognition, akin to the immune system. Given the Wallace results, cognitive biochemical processes regulating large IDP/partner complexes are not likely to yield to exact 'chemical' description, not only from considerations of symmetry group magnitude, but because their dynamics are particularly contingent on signals that may themselves arise from higher level, embedding, cognitive regulatory processes. However, such behaviors, in terms of the dual information source, are nonetheless constrained by the asymptotic limit theorems of information theory, and this may allow construction of regression model-like statistical tools useful for scientific inference, focusing on the behaviors of the chaperone system rather than on a detailed description of its mechanical function under all circumstances. The analogy is to describing the behavior of a computer in terms of its program, rather than attempting to provide a full cross-sectional description of the state of each logic gate after each clock cycle.

Known chaperones may have undergone evolutionary exaptation – cooption of one adaptation/correlation for another purpose [10] – so as to contribute to regulating large IDP/partner complexes, as Hegyi and Tompa [1] speculate. This does not preclude the evolution or exaptation of other processes, mechanisms, or chemical species, into a similar role. That is, there may well be many other generalized chaperones for the control of large IDP/partner complexes.

Indeed, following Tompa's lead, a direct argument can be made as follows:

Hegyi *et al*. [21] found that larger complexes need to have more disorder for successful assembly. Tompa and Csermely [11] directly stated that IDPs can themselves be chaperones. It is not at all far-fetched that they might also be involved in the assembly of large complexes. This suggests that some measure of self-chaperoning is provided by IDPs within the complexes themselves, a significant extension of the chaperone concept in the spirit of the information catalysis we have suggested here. It would also naturally explain the prevalence of disorder in large complexes, where it serves as an internal chaperoning element.

There are no competing interests.

The author thanks P. Tompa for useful discussions.

Received: 15-Feb-2012 Revised: 02-Aug-2012

Accepted: 09-Aug-2012 Published: 14-Sep-2012

1. Hegyi H, Tompa P: **Intrinsically disordered proteins display no preference for chaperone binding in vivo**. *PLoS Comput Biol* 2008; **4**(3):e1000017.
2. Wallace R: **Spontaneous symmetry breaking in a non-rigid molecule approach to intrinsically disordered proteins**. *Mol Biosyst* 2012; **8**(1):374-7.
3. Houghton CH: **Wreath Products of Groupoids**. *Journal of the London Mathematical Society* 1975; **s2-10**(2):179-88.
4. Longuet-Higgins HC: **The symmetry groups of non-rigid molecules**. *Molecular Physics* 1963; **6**(5):445-60.
5. Balasubramanian K: **The symmetry groups of nonrigid molecules as generalized wreath products and their representations**. *J. Chem. Phys* 1980; **7**:665-77.
6. Atlan H, Cohen IR: **Immune information, self-organization and meaning**. *Int Immunol* 1998; **10**(6):711-7.
7. Cohen I: **Tending Adam's Garden: Evolving the Cognitive Immune Self**, Academic Press, NY, 2000.
8. Wallace R: **Consciousness: A Mathematical Treatment of the Global Neuronal Workspace Model**, Springer, NY, 2005.
9. Wallace R, Fullilove M: **Collective Consciousness and Its Discontents**, Springer, NY, 2008.
10. Gould S: **The Structure of Evolutionary Theory**, Harvard University Press, Cambridge, MA, 2002.
11. Tompa P, Csermely P: **The role of structural disorder in the function of RNA and protein chaperones**. *FASEB J* 2004; **18**(11):1169-75.
12. Ash R: **Information Theory**, Dover Publications, NY, 1990.
13. McCauley J: **Chaos, Dynamics and Fractals: An algorithmic approach to deterministic chaos**, Cambridge University Press, NY, 1993.
14. Beck C, Schlogl F: **Thermodynamics of Chaotic Systems**, Cambridge University Press, NY, 1995.
15. Cover T, Thomas J: **Elements of Information Theory, 2nd Edition**, Wiley, New York, 2006.
16. Landau L, Lifshitz E: **Statistical Physics, Part I**, Elsevier, NY, 2007.
17. Feynman R: **Lectures on Computation**, Westview Press, NY, 2000.
18. Bennett C: Logical depth and physical complexity. In: Herkin R (ed.): **The Universal Turing Machine: A Half-Century Survey**, Oxford University Press, 1988, 227-257.
19. Rockafellar R: **Convex Analysis**, Princeton University Press, Princeton, NJ, 1970.
20. Ellis R: **Entropy, Large Deviations, and Statistical Mechanics**, Springer, NY, 1985.
21. Hegyi H, Schad E, Tompa P: **Structural disorder promotes assembly of protein complexes**. *BMC Struct Biol* 2007; **7**:65.

Volume 1

Wallace R: **Information catalysis and intrinsically disordered proteins in large complexes**. *Journal of Proteome Science and Computational Biology* 2012, **1**:4 http://dx.doi.org/10.7243/2050-2273-1-4


