Genes: Only DNA chain segments, or functional units of information?

Jorge Barragán
 



INTRODUCTION

The capacity of the genome to self-organize, or to guide the self-organization of the organism, is a condition that every model of the genome respects (1) (2) (3) (4) (5). Even the Human Genome Project itself assumes that the organism's set of genes satisfies this condition (6) (7) (8) (9) (10). The model whose accuracy is evaluated here derives from the so-called attractor theory developed by Stuart Kauffman (1). It starts from the assumption that, in living beings, the amount of DNA (the "c" value) is proportional to the number of genes they have (not all models assume this) (2) (3) (9) (10).

The genome would behave like an NK2-type random Boolean network, that is, a network of n elements in which the state of each element depends on (is determined by) the states of two other elements of the network (K=2). In this type of network, order emerges spontaneously. The system also displays the capacity to absorb disturbances (homeostasis). Self-organization and homeostasis are two fundamental characteristics of biological physical systems (living beings) (11). It is worth noting that if the network were of the type in which K equals N (N=K), its behavior would be chaotic, and if it were of the NK1 type it would "crystallize" into a degree of order so high that it could not absorb (adapt to) possible disturbances.

Systems of the NK2 type stabilize (they run through all their possible expression patterns) after a number of cycles equal to the square root of n (√n). This allows the model, based on the number of elements in the genome, to predict the number of different cell types that can appear in the organism. In each cycle the system runs through one basin of attraction (one of its possible expression patterns), and it runs through its whole attractor (the full set of basins that make it up) after a number of cycles equal to the square root of n (√n) (1).
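The behavior of a K=2 random Boolean network can be explored numerically. The following is a minimal sketch, not Kauffman's own procedure: the function names, the small network size (n = 12), the random seeds and the sampling of initial states are all assumptions made for illustration. Sampling initial states only gives a lower bound on the number of distinct state cycles, and the sketch does not by itself confirm the square-root relation.

```python
import random


def make_network(n, k=2, seed=0):
    """Build a random Boolean network (hypothetical helper).

    Each element receives k inputs chosen among the OTHER elements,
    plus a random truth table with one output bit per input combination.
    """
    rng = random.Random(seed)
    inputs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        inputs.append(rng.sample(others, k))
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every element from the current states of its inputs."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for i in node_inputs:
            index = (index << 1) | state[i]  # pack the input bits into a table index
        new_state.append(table[index])
    return tuple(new_state)


def find_cycle(state, inputs, tables):
    """Iterate from `state` until a state repeats, then return the state cycle reached."""
    seen = set()
    while state not in seen:
        seen.add(state)
        state = step(state, inputs, tables)
    # `state` is the first repeated state, so it lies on the cycle.
    cycle = [state]
    nxt = step(state, inputs, tables)
    while nxt != state:
        cycle.append(nxt)
        nxt = step(nxt, inputs, tables)
    return cycle


def count_state_cycles(n, inputs, tables, samples=200, seed=1):
    """Count the distinct state cycles reached from randomly sampled initial states."""
    rng = random.Random(seed)
    found = set()
    for _ in range(samples):
        start = tuple(rng.randint(0, 1) for _ in range(n))
        cycle = find_cycle(start, inputs, tables)
        found.add(min(cycle))  # the smallest state identifies the cycle
    return len(found)


if __name__ == "__main__":
    n = 12  # assumed size, far smaller than any genome
    inputs, tables = make_network(n, k=2)
    print("distinct state cycles found:", count_state_cycles(n, inputs, tables))
```

Changing `k` in `make_network` from 2 to 1 or to `n - 1` gives a rough feel for the ordered and chaotic regimes mentioned in the text, although a network this small is only a toy.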


OBJECTIVES

To verify whether what is observed in reality agrees with what the model's predictions lead us to expect regarding the number of different cell types possible in the organism.

To analyze the relation between the number of genes found by the Human Genome Project and the number of different cell types in the organism. These are the quantitative aspects of the Human Genome Project to which the title of the work refers.

To propose a new concept of the gene as a functional unit of information, beyond the classical definition of "DNA chain segments that encode the information for polypeptide synthesis (structural molecular units)".


DESIGN AND RESULTS

The model's figures for the number of genes and for the number of possible different cell types are compared with those observed in the Human Genome Project. To evaluate one aspect of the model's accuracy, namely its calibration, and the relation between the number of different cell types and the findings of the genome project, the Hosmer-Lemeshow test is used. The model (1) assumes a base of 100,000 to 136,900 genes (the value of n), so if the number of different cell types is the square root of n, it would lie between 316 and 370 different cell types in the organism. The data from the Human Genome Project indicate that we have between 30,000 and 40,000 genes (12). This figure comes from actually identifying and counting genes, not from a "mathematical estimation by model". Following the formula √n = RB, where n is the number of elements in the system (genes) and RB is the number of possible basins of attraction (the number of different cell types), the number of different cell types in the organism can be expected to lie between 173 and 200.
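The four figures quoted above follow directly from the square-root relation and can be checked with a few lines of arithmetic. This is only a sketch of that check: the function name and the decision to round down to a whole number of cell types are assumptions, since the text does not state how the values were rounded.

```python
import math


def predicted_cell_types(gene_count):
    """Number of basins of attraction (RB) from sqrt(n) = RB, rounded down (assumed rounding)."""
    return math.isqrt(gene_count)


# Gene counts taken from the text.
MODEL_GENES = (100_000, 136_900)   # assumed by the model
HGP_GENES = (30_000, 40_000)       # counted by the Human Genome Project

model_range = tuple(predicted_cell_types(n) for n in MODEL_GENES)
hgp_range = tuple(predicted_cell_types(n) for n in HGP_GENES)

if __name__ == "__main__":
    print("model prediction:", model_range)          # (316, 370)
    print("genome project prediction:", hgp_range)   # (173, 200)
```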

According to the latest counts from histology, the number of different cell types in the organism is 200 to 254 (13) (14) (15) (1). This is a datum of interest when evaluating the model's predictions and the Human Genome Project's estimates. The statistical analysis shows that the model is valid (p<0.005), and likewise that the prediction of the number of different cell types derived from the genome project is of the correct order of magnitude (p<0.005). However, both predictions are only statistical approximations to reality: the number of genes is not 100,000, nor is the number of different cell types 173.


DISCUSSION

The results show that the model overestimates the number of genes by assuming that our genome consists of about 100,000 elements. The gene count carried out in the Human Genome Project gives no more than about 30,000 to 40,000 elements. The model therefore predicts the existence of 316 to 370 different cell types, while from the Human Genome Project's data it follows that there should be no more than 173 to 200 different cell types.
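The two predicted ranges can be set against the histological count of 200 to 254 cell types reported in the results. The sketch below simply compares closed intervals; the helper name `overlap` is hypothetical, and treating each range as a closed integer interval is an assumption made for the comparison.

```python
def overlap(a, b):
    """Return the shared part of two closed integer ranges, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


# Ranges taken from the text.
OBSERVED = (200, 254)          # cell types counted by histology
MODEL_PREDICTION = (316, 370)  # from 100,000 to 136,900 genes
HGP_PREDICTION = (173, 200)    # from 30,000 to 40,000 genes

if __name__ == "__main__":
    print("model vs observed:", overlap(MODEL_PREDICTION, OBSERVED))        # None
    print("genome project vs observed:", overlap(HGP_PREDICTION, OBSERVED))  # (200, 200)
```

Under these assumptions the model's range lies entirely above the observed one, while the range derived from the genome project touches the observed one only at its lower bound of 200.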

The formula √n = RB (square root of n = RB) is not applied arbitrarily; it follows from treating the genome as a self-regulated network. To behave as such, that is, as a network able to self-organize and to guide the self-organization of the organism, the system must have the characteristics of an NK2-type network. And in that type of network the number of basins of attraction (RB), or possible expression patterns, is the square root of n (√n).

This characteristic is not a "theoretical device" but an observation of reality, a feature of the intrinsic nature of our genomic system, and both the model and the Human Genome Project should be expected to satisfy it. But the actual number of different cell types in the organism is 200 to 254. This suggests that the evaluation of the model may reveal an anomaly (from an epistemological point of view) in the theoretical foundations of genetics. By identifying and counting DNA chain segments that encode the information for polypeptide synthesis (16) (17), the Human Genome Project conforms to the definition of the gene as a structural molecular unit that carries information. But its count of 30,000 to 40,000 genes leads us to expect 173 to 200 cell types, a number that does not match the actual observation of 200 to 254 different cell types (200 is not the upper bound on the number of different cell types but the lower one).

The genomic model analyzed assumes the existence of about 100,000 or more of these structural molecular units, a number that does not correspond to the actual values found by the Human Genome Project. In addition, on the false premise that there are about 100,000 (or slightly more) genes, the model predicts that the number of different cell types would be 316 to 370, a number that does not agree with the actual count of different cell types present in the organism either. The model overestimates the number of different cell types that can be found in the organism. But this is not because the known formula square root of n = river basins (√n = RB) is unsuitable for predicting the number of different cell types, since that formula accounts for the number of basins of attraction that make up the attractor of an NK2-type system such as the genome. If it did not behave this way, the genome would not constitute a self-regulated network, and that is a necessary condition both for the model and for the molecular findings of the Human Genome Project. The error in predicting the number of different cell types in the organism arises because the model starts from the false premise that the system (the genome) has 100,000 or more elements (genes).

The Human Genome Project, however, with its count of structural molecular units that carry information (genes), underestimates the number of different cell types the organism actually has. Again, the cause of the erroneous prediction should be sought not in the formula square root of n = RB (√n = RB) but in the number of genes. But since in this case that number is not merely a mathematical estimate (or only partially so) but an actual count, the only possible inference is that we do not yet know the true number of information-carrying units (genes) we have.

Since the estimated number of genes varies with the method used, the present work adopts the following point of view: the only real and reliable datum seems to be the number of different cell types (254), since it is easier to count cells than to count genes. We can therefore estimate the number of genes, or elements of the genomic system, from the formula square of RB = n (RB² = n). On this basis the number of information-carrying units (genes) would be 254² = 64,516 (square of 254 = 64,516) or 200² = 40,000 (square of 200 = 40,000). This estimate is close to Wright's estimate (65,000 genes) (18). But just as there is no doubt that we have 254 different cell types (1) (13), there is also no doubt that the gene count is 30,000 to 40,000 (12), and still less do we doubt that the genome behaves like a self-regulated network (1) (19).
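The inverse estimate, from the number of cell types back to the number of genes, is again simple arithmetic. The sketch below reproduces it and measures how far the result lies from the 65,000 genes attributed to Wright (18); the function and variable names are hypothetical, and the relative-difference measure is an assumption added for illustration.

```python
def genes_from_cell_types(cell_types):
    """Invert sqrt(n) = RB: the number of elements n is the square of RB."""
    return cell_types ** 2


WRIGHT_ESTIMATE = 65_000  # gene estimate cited in the text (18)

upper = genes_from_cell_types(254)  # 64,516
lower = genes_from_cell_types(200)  # 40,000

# Relative difference between the upper estimate and Wright's figure.
relative_gap = abs(WRIGHT_ESTIMATE - upper) / WRIGHT_ESTIMATE

if __name__ == "__main__":
    print(lower, upper)
    print(f"gap to Wright's estimate: {relative_gap:.2%}")
```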

These would be three objective characteristics of the genome that nevertheless conflict with each other. At this point it is worth asking what conclusions can be drawn from these conflicting facts or, better still, whether these genomic characteristics can be unified under one and the same explanation.


CONCLUSIONS

One possible conclusion is to assume that, since we have no more than 30,000 to 40,000 structural molecular units that carry information (genes), the genome does not behave like a self-regulated NK2-type network, so that the formula square root of n = river basins (√n = RB) is not valid for predicting the number of possible cell types in the organism, and that this number (254) results from some form of regulation of genomic expression not yet fully clarified. A conclusion of this type, whether or not it fits reality, does not integrate the three facts raised at the end of the discussion; at most it accommodates only two of them (the number of genes found in the Human Genome Project and the number of different cell types counted by human histology).

Another possible conclusion is that the value of 254 different cell types is not correct, and that our notion of which functional properties and morphological features differentiate cells should be revised. This would allow both the Human Genome Project's findings on the number of genes (30,000 to 40,000) and genomic self-regulation (√n = RB) to be considered valid, but at the cost of denying that we in fact have 200 to 254 different cell types. This is not an integrating conclusion either, and it is desirable to find one that is, since we cannot disregard that the genome is a self-regulated network, nor the Human Genome Project's count, nor the number of different cell types we have.

A third possible conclusion is that the structural molecular units, the DNA chain segments called genes, do not carry information in themselves; rather, the information arises from the interaction between the products of these molecular structures, so that genes are in fact functions, or functional units that carry information. In that case the finding of 30,000 to 40,000 structural units (Human Genome Project) would be compatible with the notion of the genome as a self-regulated network (√n = RB) and with the presence of 200 to 254 different cell types. The prediction that 64,516 (or as few as 40,000) functional units of information exist may be difficult to verify, but it deserves attention because, of the conclusions discussed, it is the only one that integrates all the conflicting facts mentioned: the genome as a self-regulated network, the results of the Human Genome Project, and the number of different cell types in our organism (there are other reasons to consider the existence of functional units that carry information). In any case, it is important to note that having structural units that carry information about function is one thing, and having functional units that carry information is quite another.


REFERENCES

  • 1- Kauffman, S. 1992 Anticaos y adaptación. Scient. Am. nº 184: 46-53
  • 2- Lee, T. et al. 2002 Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298: 799-804
  • 3- Mathe, C. et al. 2002 Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Research 30 (19): 4103-17
  • 4- Bockhorst, J. et al. 2003 A Bayesian network approach to operon prediction. Bioinformatics 19 (10): 1227-35
  • 5- Larsen, T.S. et al. 2003 EasyGene: a prokaryotic gene finder that ranks ORFs by statistical significance. Bioinformatics 4 (1): 21
  • 6- Hlusko, L.J. 2004 Integrating the genotype and phenotype in hominid paleontology. Proc. Natl. Acad. Sci. USA
  • 7- Moore, J.E. et al. 2003 Gene structure predictions in syntenic DNA segments. Nucleic Acids Research
  • 8- Wang, X. et al. 2003 A PCR primer bank for quantitative gene expression analysis. Nucleic Acids Research
  • 9- Bogue, C. 2003 Genetic models in applied physiology. J. Appl. Physiol. 94: 2502-9
  • 10- Barr, M.M. 2003 Super models. Physiol. Genomics 13: 15-24
  • 11- Kauffman, S. 1995 At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford University Press
  • 12- Venter, J. Craig 2001 The sequence of the human genome. Science 291: 1304-51
  • 13- Skusa, A. and Prank, K. 2003 Analyzing intercellular communication networks. ECCB Proceedings
  • 14- Wilsker, D. et al. 2002 ARID proteins: A diverse family of differentiation and development. Cell Growth Differ. 13: 95-106
  • 15- Köller, J. et al. 2003 Reconstruction of intercellular communication networks. ECCB Proceedings
  • 16- Cravchik, A. et al. 2001 Sequence analysis of the human genome in function and disease. Arch. Neurol. 58: 1772-78
  • 17- Gern, J.E. 2002 The sequence of the human genome. Pediatrics 110: 429
  • 18- Wright, F. et al. 2001 A draft annotation and overview of the human genome. Genome Biol. 2 (7): 25
  • 19- Barragán, J. et al. 2004 Evaluation of a relational model between genes number and the different cellular types number. Argentinian Histology Journey. Proceedings Posters. S.M. de Tucumán







© Copyright 2004 Jorge Barragán.
© Copyright 2004 Research Institute on Human Evolution.
All rights reserved.