Analytical Flory Random Coil

The Analytical Flory Random Coil (AFRC) is the central model of the package, exposed through the AnalyticalFRC object. It reproduces the dimensions of a polypeptide behaving as an ideal (Gaussian) chain - one in which the apparent Flory scaling exponent is \(\nu = 0.5\), analogous to a chain in a \(\theta\)-solvent - and, unlike a finite self-avoiding chain, it carries no finite-size (“dangling-end”) effects. It is a parameter-free reference model: it is fully determined by the amino acid sequence and has nothing to fit. The model and its parameterisation are described in Alston et al. (2023).

Origin: numerical Flory Random Coil simulations

The AFRC is a closed-form fit to numerical Flory Random Coil (FRC) ensembles. Those ensembles are generated with all-atom Monte Carlo using Flory’s rotational isomeric state (RIS) approximation: at each step a residue is chosen at random and its backbone dihedrals (\(\phi, \psi\)) are reassigned to one of a precomputed set of residue-specific allowed states, drawn from all-atom Ramachandran maps. The moves are rejection-free - only sterically allowed local dihedrals are proposed and the resulting global conformation is accepted unconditionally, with no through-space (chain-chain, chain-solvent, or chain-self) interactions of any kind.
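The move set can be illustrated with a deliberately simplified sketch. It is not the all-atom protein model. It assumes a chain of identical units with a fixed bond length and bond angle and three toy torsional states (trans, gauche+, gauche-). These stand in for the residue-specific (\(\phi, \psi\)) states drawn from Ramachandran maps. There are no interactions, so each move (pick a random unit, reassign its torsion to an allowed state) is accepted unconditionally. Coordinates are rebuilt with the standard NeRF construction only when a snapshot is needed. Every name and parameter value here is hypothetical.

```python
import numpy as np

# Toy RIS parameters (hypothetical, not the package's calibrated states).
BOND = 1.0                                   # bond length, arbitrary units
ANGLE = np.deg2rad(112.0)                    # fixed bond angle
STATES = np.deg2rad([180.0, 60.0, -60.0])    # trans, gauche+, gauche-


def rejection_free_mc(n_chains, n_tors, n_moves, rng):
    """Run independent rejection-free MC trajectories; return final torsions."""
    state = np.zeros((n_chains, n_tors), dtype=int)   # start every chain all-trans
    rows = np.arange(n_chains)
    for _ in range(n_moves):
        cols = rng.integers(n_tors, size=n_chains)     # one random unit per chain
        # Reassign it to a random allowed state. There is no energy term,
        # so no Metropolis test is needed and every proposal is accepted.
        state[rows, cols] = rng.integers(len(STATES), size=n_chains)
    return STATES[state]


def build_chains(torsions):
    """NeRF: (n_chains, n_tors) torsions -> (n_chains, n_tors + 3, 3) coordinates."""
    n_chains, n_tors = torsions.shape
    xyz = np.zeros((n_chains, n_tors + 3, 3))
    # The first three atoms fix the reference frame and the first bond angle.
    xyz[:, 1] = [BOND, 0.0, 0.0]
    xyz[:, 2] = xyz[:, 1] + BOND * np.array([-np.cos(ANGLE), np.sin(ANGLE), 0.0])
    for k in range(n_tors):
        a, b, c = xyz[:, k], xyz[:, k + 1], xyz[:, k + 2]
        bc = (c - b) / np.linalg.norm(c - b, axis=1, keepdims=True)
        n_vec = np.cross(b - a, bc)
        n_vec /= np.linalg.norm(n_vec, axis=1, keepdims=True)
        m_vec = np.cross(n_vec, bc)
        phi = torsions[:, k:k + 1]
        # Place the next atom from bond length, bond angle and torsion phi.
        xyz[:, k + 3] = c + BOND * (-np.cos(ANGLE) * bc
                                    + np.sin(ANGLE) * np.cos(phi) * m_vec
                                    + np.sin(ANGLE) * np.sin(phi) * n_vec)
    return xyz


def mean_sq_end_to_end(n_tors, n_chains=2000, seed=0):
    """<R^2> over many chains, each after 10 * n_tors rejection-free moves."""
    rng = np.random.default_rng(seed)
    torsions = rejection_free_mc(n_chains, n_tors, 10 * n_tors, rng)
    xyz = build_chains(torsions)
    r = xyz[:, -1] - xyz[:, 0]
    return np.mean(np.sum(r ** 2, axis=1))
```

For example, `mean_sq_end_to_end(400) / mean_sq_end_to_end(100)` should come out close to 4. This is the ideal-chain behaviour (\(\langle R^2 \rangle \propto N\)) that the next paragraph describes.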

Two consequences follow, and together they are what make an analytical description possible:

  • The chain is ideal. Because every monomer is “agnostic” to its surroundings, both global and internal dimensions scale with an apparent exponent of \(\nu = 0.5\), exactly as expected for a Gaussian chain in a \(\theta\)-solvent.

  • There are no end effects. Terminal residues sample the same conformational space as internal ones, so the “dangling-end” finite-size deviations seen in finite self-avoiding chains are absent. The internal scaling profiles for chains of every length superimpose, which means a single set of closed-form expressions can describe the chain over all length scales.
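Both properties can be checked on the simplest ideal chain, a freely jointed chain of unit bonds. It is used here as a stand-in because generating real FRC ensembles is out of scope. The sketch fits the slope of the internal scaling profile on log-log axes, which should be close to 0.5. It also compares a segment that starts at the chain terminus with one of the same length in the middle, and the two should agree. Function names and chain sizes are illustrative.

```python
import numpy as np


def fjc_coordinates(n_bonds, n_chains=2000, seed=1):
    """Freely jointed chain: independent random unit bonds summed into positions."""
    rng = np.random.default_rng(seed)
    bonds = rng.normal(size=(n_chains, n_bonds, 3))
    bonds /= np.linalg.norm(bonds, axis=2, keepdims=True)
    start = np.zeros((n_chains, 1, 3))
    return np.concatenate([start, np.cumsum(bonds, axis=1)], axis=1)


def internal_rms(xyz, sep, first=None):
    """sqrt(<r_ij^2>) at |i - j| = sep, averaged over chains and all i
    (or only over the single pair starting at i = first, if given)."""
    d = xyz[:, sep:] - xyz[:, :-sep]
    if first is not None:
        d = d[:, first:first + 1]
    return np.sqrt(np.mean(np.sum(d ** 2, axis=-1)))


xyz = fjc_coordinates(200)
seps = np.array([1, 2, 4, 8, 16, 32, 64, 128])
rms = np.array([internal_rms(xyz, s) for s in seps])
nu_fit = np.polyfit(np.log(seps), np.log(rms), 1)[0]     # slope of log-log profile
# Pair (0, 50) touches the chain end; pair (75, 125) is interior.
end_vs_middle = internal_rms(xyz, 50, first=0) / internal_rms(xyz, 50, first=75)
```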

From simulations to a closed-form model

Residue-specific prefactors are fit once, against homopolymer FRC simulations, and then used analytically:

  • The root-mean-square inter-residue distance follows \(\sqrt{\langle r_{ij}^2 \rangle} = A_0\,|i-j|^{\nu}\); fitting the FRC internal scaling profiles yields a per-residue prefactor \(A_0\) (the \(R_0\) / \(R_0^{\mathrm{rms}}\) constants in this package).

  • The radius-of-gyration prefactor \(X_0\) is fit so that the analytical Lhuillier distribution matches the numerically generated \(P(R_g)\).
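The first of these fits can be sketched as follows. The sketch assumes the internal scaling profile is already available as arrays of separations and RMS distances; in practice these come from homopolymer FRC ensembles, which are not reproduced here. The source does not say whether \(\nu\) is held fixed during fitting. This sketch assumes it is fixed at 0.5, which makes the least-squares problem linear in \(A_0\). The function name and the 5.9 Å value are hypothetical.

```python
import numpy as np


def fit_prefactor(separations, rms_distances, nu=0.5):
    """Least-squares A0 for sqrt(<r_ij^2>) = A0 * |i - j|**nu with nu held fixed.

    Minimising sum((y - A0 * x**nu)**2) over A0 gives the closed form
    A0 = sum(y * x**nu) / sum(x**(2 * nu)).
    """
    x = np.asarray(separations, dtype=float) ** nu
    y = np.asarray(rms_distances, dtype=float)
    return np.sum(x * y) / np.sum(x * x)


# Synthetic profile generated from an assumed prefactor of 5.9 Angstrom.
seps = np.arange(1, 200)
profile = 5.9 * seps ** 0.5
a0 = fit_prefactor(seps, profile)
```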

Because the RIS construction treats each residue independently, heteropolymers are handled by taking a composition-weighted average of these homopolymer prefactors. This generalisation was validated against FRC simulations of hundreds of heteropolymeric sequences (10-500 residues), reproducing both end-to-end and \(R_g\) distributions with sub-angstrom accuracy.
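The composition-weighted average amounts to the mean of the per-residue values over the sequence. In the minimal sketch below, the table entries are placeholders rather than the package's calibrated constants, and `composition_weighted` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical per-residue prefactors in Angstrom: placeholders, NOT the
# package's calibrated table.
R0_RMS_TABLE = {"A": 6.1, "G": 5.9, "P": 7.0, "S": 6.0, "E": 6.2, "K": 6.3}


def composition_weighted(sequence, table):
    """Average a per-residue property, weighting each residue type by its fraction."""
    counts = Counter(sequence)
    n = len(sequence)
    return sum(table[aa] * count / n for aa, count in counts.items())
```

A homopolymer simply recovers its own table entry, and an equimolar mixture of two residues gives the midpoint of their two values.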

Mathematical formalism

End-to-end distance. The end-to-end distribution is the standard Gaussian chain result

\[P(r) = 4\pi r^2 \left( \frac{3}{2\pi \langle R_e^2 \rangle} \right)^{3/2} \exp\!\left( -\frac{3 r^2}{2 \langle R_e^2 \rangle} \right),\]

where the root-mean-square size follows the ideal-chain scaling law

\[\sqrt{\langle R_e^2 \rangle} = R_0^{\mathrm{rms}}\, N^{1/2}.\]

The prefactor \(R_0^{\mathrm{rms}}\) is the composition-weighted average of the per-residue \(A_0\) constants described above. An analogous prefactor \(R_0\) gives the mean end-to-end distance, \(\langle R_e \rangle = R_0\, N^{1/2}\).
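As a numerical check of the Gaussian \(P(r)\) equation above, the sketch below evaluates it on a 0.05 Å grid and confirms that it is normalised and that its second moment returns \(\langle R_e^2 \rangle\). The value \(R_0^{\mathrm{rms}} = 6.0\) Å is illustrative only.

```python
import numpy as np


def gaussian_end_to_end_pdf(r, rms):
    """Ideal-chain P(r) with <R_e^2> = rms**2."""
    msq = rms ** 2
    return (4.0 * np.pi * r ** 2 * (3.0 / (2.0 * np.pi * msq)) ** 1.5
            * np.exp(-3.0 * r ** 2 / (2.0 * msq)))


def trapezoid(y, x):
    """Trapezoidal integral, written out to avoid NumPy-version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


n = 100
rms = 6.0 * np.sqrt(n)                  # illustrative R0_rms = 6.0 Angstrom
r = np.arange(0.0, 10.0 * rms, 0.05)    # 0.05 Angstrom grid
p = gaussian_end_to_end_pdf(r, rms)
norm = trapezoid(p, r)
mean_r = trapezoid(r * p, r)
rms_r = np.sqrt(trapezoid(r ** 2 * p, r))
```

For an exactly Gaussian \(P(r)\), the mean and RMS sizes are related by \(\langle R_e \rangle = \sqrt{8/(3\pi)}\,\sqrt{\langle R_e^2 \rangle} \approx 0.921\,\sqrt{\langle R_e^2 \rangle}\). This is why two separate prefactors, \(R_0\) and \(R_0^{\mathrm{rms}}\), appear. Whether the calibrated constants obey this ratio exactly is not stated here.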

Radius of gyration. The \(R_g\) distribution uses the analytical fractal-polymer form of Lhuillier (1988):

\[P(R_g) \propto N^{-\nu d}\, \frac{\rho}{N^{\nu}} \exp\!\left[ -\left(\frac{N^{\nu}}{\rho}\right)^{\alpha d} -\left(\frac{\rho}{N^{\nu}}\right)^{\delta} \right],\]

with \(\rho = X_0\, R_g\), dimensionality \(d = 3\), \(\nu = 1/2\), \(\alpha = 1/(\nu d - 1) = 2\), and \(\delta = 1/(1-\nu) = 2\). The composition-weighted prefactor \(X_0\) again comes from the calibrated per-residue table. The mean radius of gyration can be taken either as the expectation of this distribution or from the ideal-chain relation \(\langle R_g^2 \rangle = \langle R_e^2 \rangle / 6\), which gives \(R_g \approx \langle R_e \rangle / \sqrt{6}\) (the relation is exact only for the mean-square values).
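A minimal sketch of the Lhuillier distribution follows. It evaluates the form above on a grid with \(\rho = X_0 R_g\) and normalises it numerically. The constant \(N^{-\nu d}\) factor drops out during normalisation. \(X_0 = 0.4\) is a hypothetical value, taken here in 1/Å so that \(X_0 R_g\) is dimensionless. Because \(R_g\) enters only through \(X_0 R_g / N^{\nu}\), the mean \(R_g\) should scale exactly as \(N^{1/2}\), and the sketch checks this.

```python
import numpy as np


def lhuillier_rg_pdf(rg, n, x0, nu=0.5, d=3):
    """Normalised Lhuillier P(R_g) on the grid rg, with rho = x0 * rg."""
    alpha = 1.0 / (nu * d - 1.0)     # = 2 for nu = 1/2, d = 3
    delta = 1.0 / (1.0 - nu)         # = 2 for nu = 1/2
    x = x0 * rg / n ** nu            # rho / N^nu
    # The N^(-nu d) prefactor is constant at fixed N and cancels on normalising.
    unnorm = x * np.exp(-(1.0 / x) ** (alpha * d) - x ** delta)
    return unnorm / np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(rg))


def mean_rg(n, x0, rg):
    """Expectation of R_g under the normalised distribution (trapezoid rule)."""
    integrand = rg * lhuillier_rg_pdf(rg, n, x0)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(rg)))


X0 = 0.4                                  # hypothetical prefactor, 1/Angstrom
rg = np.arange(0.05, 300.0, 0.05)         # start above 0, where 1/x diverges
ratio = mean_rg(200, X0, rg) / mean_rg(50, X0, rg)   # ideal scaling predicts 2
```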

Hydrodynamic radius. \(R_h\) is available either from the Kirkwood-Riseman relation applied to the full inter-residue distance map, or from the empirical \(R_g \to R_h\) conversion of Nygaard et al. (2017).
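The Kirkwood-Riseman route can be sketched under explicit assumptions. The sketch uses one common form of the Kirkwood double sum, \(1/R_h = N^{-2} \sum_{i \ne j} \langle 1/r_{ij} \rangle\). It treats every inter-residue distance as Gaussian with \(\sqrt{\langle r_{ij}^2 \rangle} = A_0 |i-j|^{1/2}\), for which \(\langle 1/r \rangle = \sqrt{6/\pi}/\sqrt{\langle r^2 \rangle}\). The package may instead integrate its own distance distributions. The function name and \(A_0 = 6\) Å are illustrative.

```python
import numpy as np


def kirkwood_riseman_rh(n, a0):
    """R_h from 1/R_h = (1/N^2) * sum_{i != j} <1/r_ij> for an ideal chain.

    Assumes sqrt(<r_ij^2>) = a0 * |i - j|**0.5 and a Gaussian distribution for
    each r_ij, for which <1/r> = sqrt(6 / pi) / sqrt(<r^2>).
    """
    k = np.arange(1, n)                            # separations |i - j|
    inv_r = np.sqrt(6.0 / np.pi) / (a0 * np.sqrt(k))
    pairs = 2.0 * (n - k)                          # ordered pairs with |i - j| = k
    return n ** 2 / np.sum(pairs * inv_r)


n, a0 = 1000, 6.0
rh = kirkwood_riseman_rh(n, a0)
rg = a0 * np.sqrt(n) / np.sqrt(6.0)                # ideal-chain R_g
```

For finite \(N\) the ratio `rg / rh` comes out a little below the ideal-chain limit \(8/(3\sqrt{\pi}) \approx 1.50\), and it approaches that limit slowly as \(N\) grows.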

Because the model also exposes every inter-residue distance, it additionally provides distance maps, contact-fraction maps, and per-residue PRE profiles for the same \(\theta\)-state reference.
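A distance map and a contact map can be sketched from the same scaling law. The sketch assumes each \(r_{ij}\) is Gaussian, so that \(r_{ij}\) is Maxwell distributed, and defines "contact" as \(r_{ij}\) below a cutoff. The 8 Å cutoff, the \(A_0 = 6\) Å value and both helper names are hypothetical. The package's own definition of contact fraction may differ.

```python
import math

import numpy as np


def distance_map(n, a0):
    """sqrt(<r_ij^2>) = a0 * |i - j|**0.5 for every residue pair."""
    idx = np.arange(n)
    return a0 * np.sqrt(np.abs(idx[:, None] - idx[None, :]))


def contact_probability(rms, cutoff):
    """P(r < cutoff) for a Gaussian segment with sqrt(<r^2>) = rms.

    Each Cartesian component has variance rms**2 / 3, so r is Maxwell distributed
    with CDF erf(z / sqrt 2) - sqrt(2 / pi) * z * exp(-z**2 / 2), z = cutoff / sigma.
    """
    if rms == 0.0:
        return 1.0                      # diagonal: a residue "contacts" itself
    sigma = rms / math.sqrt(3.0)
    z = cutoff / sigma
    return (math.erf(z / math.sqrt(2.0))
            - math.sqrt(2.0 / math.pi) * z * math.exp(-z * z / 2.0))


dmap = distance_map(50, 6.0)                              # illustrative a0
cmap = np.vectorize(contact_probability)(dmap, 8.0)       # hypothetical cutoff
```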

Behaviour and relationship to other models

By construction the AFRC behaves like a \(\nu\)-dependent self-avoiding walk (SAW) evaluated at \(\nu = 0.5\); the two distributions sit essentially on top of one another. Relative to the other reference models in this package, the AFRC is slightly more expanded than the worm-like chain (at a persistence length of 3 Å) and substantially more compact than the good-solvent SAW (\(\nu \approx 0.588\)). It therefore occupies the \(\theta\)-point, between the collapsed (poor-solvent) and fully solvated (good-solvent) extremes.

Intended use

Note

The AFRC is a reference (null) model, not a predictor of unfolded-protein dimensions. Real dimensions depend on sequence-encoded chain-chain and chain-solvent interactions that the AFRC deliberately omits. Its value is as a fixed, sequence-matched touchstone: deviations of a simulation or experiment from the AFRC are a direct readout of sequence-specific intramolecular interactions, and normalising to the AFRC lets chains of different lengths and compositions be compared on a common footing.

Parameters

The AFRC is deliberately parameter-free: the per-residue calibration constants (\(R_0\), \(R_0^{\mathrm{rms}}\), \(X_0\)) are fixed, and the only sequence information it uses is amino acid composition and chain length. There is consequently nothing to tune; the single argument below affects numerics only.

  • adaptable_P_res (default False). Numerical only: if True, the distribution grid spacing is set to \(d_{max}/500\) (with \(d_{max} = 3.7N\) Å) rather than the fixed 0.05 Å. This changes only the discretisation, not the model.
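The grid rule for adaptable_P_res is simple arithmetic, sketched below. The helper name is hypothetical, and the sketch does not attempt to reproduce how the package builds the grid itself (for example, whether endpoints are included). It only reports the spacing and the number of grid intervals between 0 and \(d_{max}\).

```python
def distance_grid_size(n, adaptable=False, fixed_spacing=0.05):
    """Return (spacing, number of grid intervals) for a grid from 0 to d_max.

    d_max = 3.7 * N Angstrom; the spacing is either fixed (0.05 Angstrom) or
    d_max / 500 when the adaptive option is on.
    """
    d_max = 3.7 * n
    spacing = d_max / 500.0 if adaptable else fixed_spacing
    return spacing, int(round(d_max / spacing))
```

With the fixed spacing, the number of grid intervals grows linearly with chain length (7400 for \(N = 100\)). The adaptive setting instead holds it at 500 for any \(N\), trading resolution for speed on long chains.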

What to expect for a protein. The apparent scaling exponent is \(\nu^{app} = 0.5\) by construction. With \(R_0 \approx 6\) Å, a disordered region of \(N\) residues has \(R_e \approx 6\sqrt{N}\) Å and \(R_g \approx R_e/\sqrt{6} \approx 2.5\sqrt{N}\) Å. Real intrinsically disordered regions scatter around these values: in the original study, the ratio of simulated or measured dimensions to AFRC dimensions ranged from roughly 0.7 (more compact) to 1.4 (more expanded). The AFRC is therefore best read as a \(\theta\)-point touchstone rather than a strict bound.
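The rule of thumb and the normalisation step can be worked through for \(N = 100\). In the sketch below, the "observed" \(R_g\) of 30 Å is a made-up number used only to show the arithmetic, and both helper names are hypothetical.

```python
import math


def afrc_estimates(n, r0=6.0):
    """Back-of-envelope AFRC sizes in Angstrom: R_e = r0 * sqrt(N), R_g = R_e / sqrt(6)."""
    re = r0 * math.sqrt(n)
    rg = re / math.sqrt(6.0)
    return re, rg


def normalised_size(observed, reference):
    """Observed / AFRC ratio: > 1 is more expanded, < 1 more compact than the AFRC."""
    return observed / reference


re, rg = afrc_estimates(100)          # re = 60.0, rg ~ 24.5
ratio = normalised_size(30.0, rg)     # hypothetical measured R_g of 30 Angstrom
```

Here the ratio is about 1.22, so this hypothetical region would read as more expanded than the \(\theta\)-point reference. It still falls inside the 0.7 to 1.4 range reported in the original study.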

Citations

  1. Alston, J. J., Ginell, G. M., Soranno, A., & Holehouse, A. S. (2023). The Analytical Flory Random Coil is a simple-to-use reference model for unfolded and disordered proteins. The Journal of Physical Chemistry B, 127(21), 4746-4760. https://doi.org/10.1021/acs.jpcb.3c01619

  2. Flory, P. J. (1969). Statistical Mechanics of Chain Molecules. Wiley-Interscience.

  3. Mao, A. H., Lyle, N., & Pappu, R. V. (2013). Describing sequence-ensemble relationships for intrinsically disordered proteins. Biochemical Journal, 449(2), 307-318.

  4. Lhuillier, D. (1988). A simple model for polymeric fractals in a good solvent and an improved version of the Flory approximation. Journal de Physique, 49(5), 705-710.

  5. Rubinstein, M., & Colby, R. H. (2003). Polymer Physics. Oxford University Press.

  6. Nygaard, M., Kragelund, B. B., Papaleo, E., & Lindorff-Larsen, K. (2017). An efficient method for estimating the hydrodynamic radius of disordered protein conformations. Biophysical Journal, 113(3), 550-557.

  7. Kirkwood, J. G., & Riseman, J. (1948). The intrinsic viscosities and diffusion constants of flexible macromolecules in solution. The Journal of Chemical Physics, 16(6), 565-573.