Animal Evolution: The animal in the genome: comparative genomics and evolution

Comparisons between completely sequenced metazoan genomes have generally emphasized how similar their encoded protein content is, even when the comparison is between phyla. Given the manifest differences between phyla and, in particular, intuitive notions that some animals are more complex than others, this creates something of a paradox. Simplistic explanations have included arguments such as increased numbers of genes, greater numbers of protein products produced through alternative splicing, increased numbers of regulatory non-coding RNAs, and increased complexity of the cis-regulatory code. An obvious value of complete genome sequences lies in their ability to provide us with inventories of such components. Here I examine progress being made in linking genome content to the pattern of animal evolution, and argue that the gap between genome and phenotypic complexity can only be understood through the totality of interacting components.

Deus ex machina: ‘A power, event, person, or thing that comes in the nick of time to solve a difficulty; providential interposition . . .’
Oxford English Dictionary

14.1 Introduction

Complete genome sequences provide limits to our imaginations. Even just a few years before the human genome was available in rough draft form, it was widely believed to encode at least 50,000 genes (Fields et al., 1994; Editorial, 2000). In contrast, the initial publications estimated 25,000–40,000 protein-coding genes (Lander et al., 2001; Venter et al., 2001), and since then estimates have generally carried a downward momentum, most recently approaching 20,000 (Goodstadt and Ponting, 2006; Pennisi, 2007). Although this number is higher than the 16,000 or so found in invertebrate chordates (Dehal et al., 2002), it is basically the same total as in the nematode worm Caenorhabditis elegans (Hillier et al., 2005). Whether or not these low numbers of protein-coding genes for vertebrates stand the test of time, the sense of unease surrounding the lack of correlation between organismal complexity (often measured in number of distinct cell types) and protein-coding gene count is evident from the framing of the ‘g-value paradox’ by Hahn and Wray (2002), and the various explanations that have been put forward to ease it, including, for example, miRNAs (Sempere et al., 2006; Heimberg et al., 2008), non-protein-coding DNA (Taft et al., 2007), and alternative splicing (Kim et al., 2007).
Gene counts are, of course, a crude measure of biological complexity. There is no reason why two genomes should not encode very different sets of protein-coding genes, but still have similar overall totals. Within the field of animal evolution and the evolution of development (evo-devo), however, the g-value paradox has a particular resonance. Studies in different animal phyla have repeatedly shown the reuse of a core set of developmental genes, the so-called ‘toolkit’ (Carroll et al., 2005), with the HOX genes, in particular, taking on an iconic significance. Broadly, toolkit genes come from a handful of transcription factor families, defined by the presence of particular structural domains such as the helix–turn–helix (HTH, the class that includes the homeobox genes), zinc fingers (ZnF), leucine zippers, and the helix–loop–helix (HLH). As well as transcription factors there are seven well-conserved pathways responsible for intercellular signalling (Pires-daSilva and Sommer, 2003), many of which appear to be present in sponges, the earliest branching clade of animals (Nichols et al., 2006). An extreme interpretation of these data is provided by Davidson (2006): ‘if we focus explicitly on the genes encoding transcription factors, and [ . . . ] signalling systems required for developmental spatial regulation, there is almost no qualitative variation among the genomes of bilaterians’.
Given all this, where in the genome do the phenotypic differences between animal taxa arise? The undoubted conservation of the protein-coding developmental genes has, particularly in the evo-devo field with its morphological concerns, focused attention on cis-enhancer elements affecting transcription (Carroll et al., 2005; Davidson, 2006; Wray, 2007; Simpson, 2007), although there are alternative views emphasizing the importance of different kinds of regulatory elements (Alonso and Wilkins, 2005) and different protein classes, such as structural genes (Hoekstra and Coyne, 2007). Below I outline some major themes being developed by large-scale genome comparisons, principally of nematodes, insects, and vertebrates. My aim is not to present an exhaustive account, but to highlight areas where functionally relevant species-specific differences may arise, within apparently conserved systems. Although I concentrate on the evolution of the systems regulating animal development, this is not to lose sight of the things being regulated: the proteins involved in making nematode cuticles, or asynchronous flight muscles in insects, or the human brain and adaptive immune system, to name but a few, are what made it necessary to evolve those systems.

14.2 Gene duplication

Usefully summarizing the differences and similarities between more than 10,000 protein-coding genes from several species at once is not necessarily straightforward. Although pairwise similarities between sequences are easy to compute, they suffer from the imposition of arbitrary cut-offs and are less easy to interpret than measures that explicitly reflect phylogeny. Genes in different species are most obviously compared by grouping into sets of orthologues (that is, genes related by speciation events) and paralogues (genes related by intragenome duplication events). Closely related species share large numbers of orthologues: 93% of dog (Canis familiaris) and 82% of the marsupial Monodelphis domestica gene predictions have orthologues in human (Goodstadt et al., 2007). The Linnean hierarchy, however, is not necessarily a good guide to genomic relatedness by this definition of similarity. Within the nematodes, only 65% of C. elegans genes share an orthologue with Caenorhabditis briggsae, despite their being from the same genus (Stein et al., 2003). For more distantly related genomes, orthologue counts can drop rapidly. This may be as much a sign of difficulties in reliably assigning gene phylogenies on a large scale as a real indication of the extents of the conserved cores.
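As a rough illustration of how orthologue fractions like those above might be tallied, the sketch below uses the reciprocal-best-hit heuristic: two genes are paired when each is the other's highest-scoring match. This is a common shortcut, not the phylogeny-based assignment used in the studies cited, and the gene names and scores are invented for the example.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring hit in the other genome."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical similarity scores (for example alignment bit scores); not real data.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 205.0, ("b2", "a2"): 175.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
genes_in_a = {query for query, _ in a_vs_b}
fraction_with_orthologue = len(pairs) / len(genes_in_a)
print(pairs, fraction_with_orthologue)
```

Here a2 is left without a partner because its best hit, b2, prefers a3. Lineage-specific duplications produce exactly this kind of ambiguity, which is one reason tree-based methods are preferred for large-scale assignment.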
Paralogues often arise via tandem duplication of genes, giving rise to localized clusters of functionally related genes. As these are the regions where gene content is evolving most rapidly between closely related species, the functions of these genes are of special interest for understanding animal-specific differences. For the most part, for any two closely related vertebrate genomes the functional classes of genes duplicated in this way are similar (olfaction and chemosensation, reproduction, and effectors of the immune response), although the duplications have occurred independently in each lineage (Emes et al., 2003). These large groups of paralogues often show evidence of adaptive evolution in their amino acid sequences, suggesting that new functions have been selected for (Emes et al., 2004a,b).
The recurrent nature of duplications within particular functional classes, coupled with the observed diversifying selection, suggests that they are a standard adaptive genomic response to environmental challenges. Does similar rapid duplication occur in the kinds of genes, such as transcription factors, that might be implicated in development? A growing number of examples are known. Perhaps most dramatically, in mice a set of 32 tandemly duplicated homeoboxes has arisen from apparently one or two genes in the common ancestor of humans and rodents; they are believed to play a role in germ cell development and embryonic stem cell differentiation (Maclean et al., 2005; Jackson et al., 2006).
Zinc finger-containing transcription factors have undergone independent rounds of gene duplication in insects and tetrapods. In insects a set of zinc fingers are found to co-occur with a zinc finger-associated domain (ZAD) (Chung et al., 2007); this ZAD class is found in around 100 copies in Drosophila melanogaster and 150 copies in the mosquito Anopheles gambiae; there is only a single copy in vertebrates (Chung et al., 2007). In D. melanogaster, many are expressed in the female germline, suggesting a role in oocyte development or embryogenesis (Chung et al., 2007). An analogous story is found with Krüppel-associated box (KRAB) containing zinc fingers in tetrapods. Successive independent tandem duplication events have occurred in different mammalian lineages, leading to more than 400 copies in the human genome (Huntley et al., 2006). The KRAB domain itself appears to have been co-opted from a progenitor sequence conserved throughout eukaryotes (Birtle and Ponting, 2006); however, it has evolved so much as to make this similarity difficult to detect, and clearly identifiable KRAB domains are specific to tetrapods. Their functions are largely unknown, and have not been tied to any general aspects of tetrapod biology. As such, why the family as a whole has expanded is a puzzle.
Nematodes too exhibit lineage-specific expansions of particular transcription factor families, most notably the nuclear hormone receptors (NHRs). The C. elegans genome encodes 284, many more than the 48 in human and 21 in D. melanogaster. The bulk of these (>200) have arisen from an apparently nematode-specific expansion of a single gene (Lander et al., 2001; Robinson-Rechavi et al., 2005). Once more, the reasons for such a dramatic lineage-specific expansion of a particular transcription factor family, and any links to taxon-specific biology, are obscure, although it has been speculated that C. elegans relies less on combinatorial reuse of different transcription factors (Antebi, 2006). A less dramatic lineage-specific expansion occurs in the case of the T-box-containing transcription factors: there are 21 in C. elegans, with 17 arising from a lineage-specific expansion when compared with D. melanogaster and humans. Ascertaining when and in which clades these C. elegans duplications took place is currently frustrated by a lack of relevant genome sequences. As a set these T-box genes map to several genomic locations, suggesting that they have arisen over a more protracted timescale than the examples discussed above; some, at least, have known roles in the development of C. elegans (Poole and Hobert, 2006).

14.3 The ‘invention’ of new genes

A number of gene families appear to be metazoan novelties, with no clear sequence similarity to other genes outside the Metazoa, but present in the more basal animal phyla such as cnidarians and sponges. These include key families involved in animal development, like T-box and SMAD transcription factors, and signalling molecules such as WNTs and FGFs (Putnam et al., 2007). The most closely related non-metazoan eukaryote sequenced to date, the choanoflagellate Monosiga brevicollis, was reported to be missing true HOX, ETS, NHR, POU, and T-box class transcription factors, strongly suggesting their origin was coincident with that of the metazoans (King et al., 2008). Analysis of preliminary data from the sponge genome indicates that, although present, many of these gene families were much smaller in number prior to the divergence of sponges and cnidarians (Larroux et al., 2008).
Was the invention of such families a prerequisite for the evolution of the Metazoa, and were analogous protein inventions required for the evolution of particular taxa, such as insects and vertebrates? At the level of three-dimensional structures (i.e. the protein fold itself), there is some reason to be sceptical that this is the case. In many cases, examination of similarities in three-dimensional protein structures shows that these genes have distant homologues in non-metazoan genomes. The MH1 (DNA-binding) domain of SMADs, for instance, is probably homologous to a family of homing endonucleases found in all kingdoms of life (Grishin, 2001); the T-box shares structural similarities indicative of homology with a variety of other transcription factors, such as STAT DNA-binding domains, which are found in other eukaryotes (Murzin et al., 1995; Soler-Lopez et al., 2004); and the signalling domain of metazoan hedgehog proteins shares detailed similarities with members of a family of bacterial peptidases, suggesting that they too are likely to be homologous (Murzin et al., 1995). In these cases the novel families are likely to be products of rapid sequence evolution, accompanying functional shifts, within stem lineages leading to the Metazoa. Sparse sequence sampling of non-fungal, non-metazoan eukaryotic genomes may contribute to the apparent co-origin of these protein domains with the animals.
As this type of domain evolution is occurring from pre-existing domain types, the process fits within a standard framework of accelerated point mutation and selection for new functions. The invention of the domain type is not a key innovation in itself; rather, it can be seen as the extension of functional diversification of subfamilies of the kind that is apparent when comparing more closely related species. The fact that so many new domain types are found to be coincident with the origin of metazoans suggests that the selective pressures giving rise to this kind of accelerated sequence evolution were greater in the metazoan stem lineage.
An example of a more recent domain innovation is found in the Drosophila gene brinker, which plays a key role in the establishment of dorsoventral patterning. Although the protein-coding sequence of its DNA-binding domain is well conserved in insects, using current sequence databases it shows no significant sequence similarity to proteins from any other taxa (Figure 14.1 and Plate 9), although there is weak (non-significant) similarity to pogo-like transposases, and the structure, which is only folded when complexed with DNA, suggests similarity to various transcription factors (Cordier et al., 2006).
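The degree of conservation in an alignment such as that of the brinker DNA-binding domain can be summarized column by column. The sketch below is a minimal illustration, assuming an ungapped alignment supplied as equal-length strings; it reports, for each column, the fraction of rows sharing the most common residue. The three rows are copied from the Figure 14.1 alignment; the function name and the choice of rows are assumptions made for the example, and this is not the method used to derive the figure's 90% consensus line.

```python
from collections import Counter
from typing import List


def column_identity(alignment: List[str]) -> List[float]:
    """Fraction of rows that share the most common residue in each column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    fractions = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        fractions.append(most_common_count / len(alignment))
    return fractions


# Three rows taken from the Figure 14.1 alignment (50 residues each).
rows = [
    "CCLHKTYHAHSLLSVLDSYRQDSDCQGNQRATARKYGIHRRQIQKWLQTE",
    "MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE",
    "MGSRRIFTAQFKLQVLDSYRNDGDCKGNQRATARKYGIHRRQIQKWLQVE",
]

identity = column_identity(rows)
fully_conserved = sum(1 for fraction in identity if fraction == 1.0)
print(f"{fully_conserved} of {len(identity)} columns are identical in all rows")
```

With more rows, a threshold on these fractions (for instance 0.9) gives a simple consensus; real consensus lines usually also group residues by physicochemical class, which this sketch does not attempt.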

(a)
A. pisum          CCLHKTYHAHSLLSVLDSYRQDSDCQGNQRATARKYGIHRRQIQKWLQTE
B. mori           AGSRRIFPPQFKLQVLEAYRRDSQCRGNQRATARKFGIHRRQIQKWLQAE
A. mellifera      MGSRRIFAPAFKLKVLDSYRNDIDCRGNQRATARKYGIHRRQIQKWLQCE
N. vitripennis    MGSRRIFAPAFKLKVLDSYRKDIDCRGNQRATARKYGIHRRQIQKWLQCE
P. humanus        VGSRRIFSPHFKLQVLDSYRYDADCRGNQRATARKYNIHRRQIQKWLQCE
D. mojavensis     MGSRRIFTPQFKLQVLESYRHDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. melanogaster   MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. pseudoobscura  MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. ananassae      MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. erecta         MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. yakuba         MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. sechellia      MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. simulans       MGSRRIFTPHFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. grimshawi      MGSRRIFTPQFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
D. virilis        MGSRRIFTPQFKLQVLESYRNDNDCKGNQRATARKYNIHRRQIQKWLQCE
T. castaneum      IGSRRIFAPHFKLQVLDSYRNDADCKGNQRATARKYGIHRRQIQKWLQVE
C. pipiens        MGSRRIFTPQFKLQVLDSYRNDSDCKGNQRATARKYGIHRRQIQKWLQVE
A. aegypti        MGSRRIFTPQFKLQVLDSYRNDSDCKGNQRATARKYGIHRRQIQKWLQVE
A. gambiae        MGSRRIFTAQFKLQVLDSYRNDGDCKGNQRATARKYGIHRRQIQKWLQVE
Consensus/90%     hGSRRIFss.FKLpVL-SYRpD.DC+GNQRATARKYsIHRRQIQKWLQsE

(b) [Three-dimensional structure of the aligned region bound to DNA; image not reproduced.]

Figure 14.1 The DNA-binding domain of brinker is conserved within insects, but has no significantly similar sequences in other taxa. (a) The alignment shows the conserved core from a selection of insect species. Sequences of Drosophila species were taken from the UCSC web browser (http://genome.ucsc.edu/), Anopheles and Aedes from ENSEMBL (http://www.ensembl.org/); other predictions were made from sequences at the NCBI. GI accession numbers: N. vitripennis 146253130; T. castaneum 73486274; C. pipiens 145464888; P. humanus 145365328; A. mellifera 63051942; B. mori 91842977; A. pisum 47522326. (b) The three-dimensional structure of the aligned region when binding DNA. The structure was taken from the PDB file 2glo. (See also Plate 9.)

14.4 Evolution of transcription factors: the animal in the orthologue

Lineage-specific duplication followed by sequence divergence provides one route to species-specific biology, but what scope is there for lineage-specific functional shifts within orthologous genes? In the absence of gene duplication, it is hard to imagine how the DNA specificity of a particular factor might be significantly changed, in such a way that it targets new genes, without deleterious consequences. The modular structure of proteins, however, suggests that other routes of functional evolution are available. A protein may have pleiotropic effects, but that is not the same as saying that every amino acid in the protein will be directly involved in all those effects. A recent illustrative example from the hox gene Ultrabithorax is an insect-specific ‘QA’ protein motif, found outside the homeodomain. The region is involved in limb repression; the effects of deleting the motif are strong in some tissues but close to undetectable in others (Hittinger et al., 2005). Clearly, changes in the protein-coding sequences of transcription factors, apart from their more obvious DNA-binding residues, must be integrated into our understanding of the evolution of developmental regulation (Wagner and Lynch, 2008).
The majority of residues in metazoan transcription factors do not fall within regions of well-defined globular structure, with many belonging to so-called ‘intrinsically disordered’ regions, that is, regions that may form a structure when complexed with other macromolecules (J. Liu et al., 2006; Minezaki et al., 2006). The specific sequences of these regions are typically not obviously conserved between paralogues; because they are unique to particular families they are not covered in domain databases such as SMART and Pfam (Finn et al., 2006; Letunic et al., 2006). The lack of extreme conservation between distant species has sometimes masked the fact that, within closely related species, these regions are conserved. Comparisons of orthologous sequences from closely related genomes (e.g. vertebrates or drosophilids) often show that substantial proportions of these non-domain sequences are undergoing strong purifying selection (they accumulate many more synonymous nucleotide changes than non-synonymous changes) and are thus functional. For the most part, precisely what these biological functions are is unknown; two possibilities, however, suggest themselves. Firstly, they may have relatively uninteresting non-specific effects, such as facilitating folding of the major domain (for instance by reducing aggregation) or acting as spacers between globular domains. Secondly, and more interestingly from the point of view of animal evolution, they may include short linear peptide motifs that mediate protein–protein interactions (Dyson and Wright, 2005; Neduva et al., 2005; Neduva and Russell, 2005).
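The signature of purifying selection mentioned above, an excess of synonymous over non-synonymous changes, can be illustrated with a simple codon-by-codon tally. The sketch below assumes two in-frame, ungapped, equal-length coding sequences in upper-case DNA and the standard genetic code; the sequences are invented. It returns raw counts only: a proper estimate would normalize by the numbers of synonymous and non-synonymous sites and correct for multiple substitutions, which is not attempted here.

```python
from typing import Tuple

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codon_changes(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = non_synonymous = 0
    for pos in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Invented example: codons ATG GCT AAA CTG versus ATG GCC AAG ATG.
changes = count_codon_changes("ATGGCTAAACTG", "ATGGCCAAGATG")
print(changes)
```

In this toy pair, two codons change without altering the amino acid and one changes it (leucine to methionine), giving two synonymous differences and one non-synonymous difference.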
There are numerous examples of regulatory motifs found outside of transcription factor domains. Many hox proteins include a YPWM-like hexapeptide motif that interacts with other homeodomain-containing proteins (In der Rieden et al., 2004); Drosophila fushi tarazu (ftz) orthologues have lost this motif but acquired an LXXLL motif coupled to a new role in segmentation (Lohr and Pick, 2005); and an N-terminal SSYF-like motif believed to be involved in transcriptional activation is conserved across hox orthologues and paralogues from different phyla (Tour et al., 2005). Interaction motifs can be coupled with signalling pathways to create cell-type specificity. They can, for instance, be regulated by phosphorylation, such that the phosphorylation status governs what interactions can be made (e.g. Sapkota et al., 2007), or alternative splicing can result in protein–protein interaction motifs being included or excluded from particular cell types, providing additional layers of regulatory complexity that are likely to be species specific (Neduva and Russell, 2005).
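Short linear motifs of the kind described above can be located with simple pattern matching. The sketch below is illustrative only: it reduces the YPWM-like motif to its literal core and writes LXXLL as a regular expression in which X is any residue; both patterns and the toy protein are assumptions made for the example. Because such short patterns match by chance, real motif discovery also relies on conservation and structural context.

```python
import re
from typing import List, Tuple

# Simplified patterns; X in LXXLL stands for any residue.
MOTIF_PATTERNS = {
    "YPWM": "YPWM",
    "LXXLL": "L..LL",
}


def find_motifs(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched residues), ordered by position."""
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        # The lookahead allows overlapping occurrences to be reported.
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


toy_protein = "MSSAYPWMKELRSLLDQ"  # invented sequence, not a real protein
motif_hits = find_motifs(toy_protein)
print(motif_hits)
```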
The challenge of identifying small regulatory motifs means that their species distributions, and how their presence might produce taxon-specific differences in protein functions, have not been well studied. Examples that tie cleanly to one taxonomic group are less common, but an interesting case has been proposed in bilaterian orthologues of the Brachyury gene. These possess an N-terminal motif that is not found in non-bilaterian Metazoa (Marcellini, 2006), which instead have a well-defined EH1-like motif (Copley, 2005). The bilaterian motif is believed to be responsible for an interaction with Smad1, and hence to link gastrulation to bilateral pattern formation (Marcellini, 2006).

14.5 Enhancers: transcription factor binding sites and ultraconserved regions

Theoretical considerations have led to an intense focus on transcription factor binding sites (TFBSs) as a major molecular source of morphological novelty (Wray et al., 2003; Carroll et al., 2005; Davidson, 2006; Wray, 2007; although see Hoekstra and Coyne, 2007, for a critique). Individual TFBSs show rapid turnover in comparisons of closely related genomes, with many being lineage specific (Dermitzakis and Clark, 2002; Moses et al., 2006). This dynamic nature may not be revealed in the phenotype: patterns of gene expression may be conserved even though regulatory sequences change at the molecular level (Ludwig et al., 2000; Romano and Wray, 2003; Fisher et al., 2006). On the other hand, the gain and loss of individual TFBSs has been implicated in several recent cases of morphological evolution, in both vertebrates and invertebrates (reviewed in Wray, 2007, and Simpson, 2007). The relationship between individual TFBSs and enhancer function is clearly not straightforward, beyond the fact that clustering of individual binding sites can identify some enhancer regions (Markstein et al., 2002). Cases of functional linkages between particular transcription factors have been proposed, for example, between Dorsal, twist, Su(H), and an unidentified motif in neurogenic ectoderm formation in Diptera (Markstein et al., 2004), and even a coupling originating prior to the origin of Bilateria, of hairy and E(spl), promoting neural cell fate (Rebeiz et al., 2005).
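The idea that clusters of binding sites can flag candidate enhancers can be sketched as a sliding-window count. The example below is a simplified illustration, not the procedure of Markstein et al.: it searches the forward strand only for exact matches to literal motifs, and the motifs, window size, threshold, and sequence are all hypothetical values chosen for the example.

```python
import re
from typing import List, Sequence, Tuple


def site_positions(dna: str, motifs: Sequence[str]) -> List[int]:
    """0-based start positions of every exact motif match, overlaps included."""
    positions: List[int] = []
    for motif in motifs:
        pattern = f"(?={re.escape(motif)})"
        positions.extend(match.start() for match in re.finditer(pattern, dna))
    return sorted(positions)


def clustered_windows(dna: str, motifs: Sequence[str],
                      window: int = 30, min_sites: int = 3) -> List[Tuple[int, int]]:
    """Return (window start, site count) for windows with at least min_sites sites."""
    positions = site_positions(dna, motifs)
    hits = []
    for start in range(0, max(1, len(dna) - window + 1)):
        count = sum(1 for pos in positions if start <= pos < start + window)
        if count >= min_sites:
            hits.append((start, count))
    return hits


# Hypothetical motifs and an invented sequence with three sites close together.
toy_motifs = ["GGGAAA", "CACATG"]
toy_dna = "T" * 20 + "GGGAAA" + "TT" + "CACATG" + "TT" + "GGGAAA" + "T" * 20
windows = clustered_windows(toy_dna, toy_motifs)
print(windows[0], windows[-1], len(windows))
```

Overlapping windows that pass the threshold would normally be merged into a single candidate region; position weight matrices and both strands would be needed for anything beyond a toy case.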
Comparisons of vertebrate genomes have revealed large regions (more than 100 nucleotides) of extreme conservation of non-coding sequences (conserved non-coding elements, CNEs) (Bejerano et al., 2004). These regions are often found near transcription factors and other developmental genes (Sandelin et al., 2004). Outside of the vertebrates, there is evidence for similar regions occurring near developmental genes in flies (Glazov et al., 2005) and nematodes (Vavouri et al., 2007). Although in many cases the conserved regions are even found near orthologous genes, there is no evidence that they are homologous; they appear to have evolved independently in each of the phyla (Vavouri et al., 2007). Experimental evidence from vertebrates shows that many instances have roles as tissue-specific enhancer elements (Woolfe et al., 2005; Pennacchio et al., 2006).
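A crude way to picture how long conserved stretches are picked out of an alignment is to scan for runs of identical, ungapped columns. The sketch below assumes a pairwise alignment given as two equal-length strings with "-" for gaps, and uses the 100-nucleotide figure from the text as its default length cut-off. Requiring perfect identity between just two sequences is a simplification; published analyses compare several genomes and apply their own thresholds.

```python
from typing import List, Tuple


def conserved_runs(seq_a: str, seq_b: str, min_length: int = 100) -> List[Tuple[int, int]]:
    """Half-open (start, end) alignment intervals of identical, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    run_start = None
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        identical = a == b and a != "-"
        if identical and run_start is None:
            run_start = i
        elif not identical and run_start is not None:
            if i - run_start >= min_length:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(seq_a) - run_start >= min_length:
        runs.append((run_start, len(seq_a)))
    return runs


# Tiny invented alignment; a short cut-off is used so that runs are visible.
short_runs = conserved_runs("ACGTACGTAA", "ACGTTCGTAA", min_length=4)
print(short_runs)
```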
The length of CNEs, and their lack of conservation between phyla, contrast with the properties of individual TFBSs. The DNA specificity of orthologous transcription factors is usually well conserved over large phylogenetic distances, but typical TFBSs are short, of the order of six to ten nucleotides. An obvious possibility is that longer CNEs are composed of overlapping or adjacent TFBSs. This would suggest a tight packing of transcription factor proteins on the genomic DNA of these CNEs. There is direct evidence for this: some fragments of highly conserved non-coding sequences are present in crystal structures of transcription factor complexes. An atomic model based on known crystal structures of the interferon-β enhancer, for example, shows 50 consecutive nucleotides in contact with eight different proteins; these nucleotides are well conserved in mammalian species (Panne et al., 2007; see Figure 14.2 (also Plate 10) for another example). Given that such structures exist, it is not such a leap to imagine 16 proteins binding to 100 nucleotides, or even bigger complexes. This suggests a model where CNE enhancer regions controlling orthologous genes in different phyla are composed of multiple transcription factor binding sites, although not necessarily for the same transcription factors or in the same orientation. Moreover, the tight packing of transcription factors on the genomic DNA suggests that the proteins themselves may be co-adapted to interact with each other and aid the cooperative formation of enhancer complexes. Ruvinsky and Ruvkun (2003) have presented experimental evidence that enhancers and transcription factors co-evolve in this way, with neuronal and muscle-specific enhancer elements from D. melanogaster failing to drive expression in homologous tissue types in C. elegans, and Dover and co-workers (McGregor et al., 2001; Shaw et al., 2002) have argued for co-evolution of bicoid protein and hunchback regulatory regions. Wagner (2007) has proposed that the protein–protein interactions of co-adapted transcription factors may form the underpinnings of ‘character identity networks’; that is, the gene regulatory networks that control the development of homologous morphological characters.
If protein–protein interactions between transcription factors are often required for the formation of enhancer complexes, close analysis of transcription factor sequence and structure may reveal evidence for co-adaptation of proteins, such as the HOX hexapeptide motif, through which homeotic proteins form complexes with TALE class homeodomains (LaRonde-LeBlanc and Wolberger, 2003). We might expect instances of co-adapted transcription factor combinations to be taxon specific, to match the taxon specificity of enhancer sequences.

[Schematic and structure image not reproduced. Labels: CEBP (twice), each with a BRLZ domain; RUNX1, with Pfam: Runt and Pfam: RunxI domains.]

Human     GCAACCACAGAGTTTGGAAATCTT
Chimp     ........................
Rhesus    ........................
Mouse     G...T.....A.........A..
Rat       G...T...............A..
Rabbit    ........................
Dog       ........................
Cow       ........................
Elephant  ..........T..........-..
Tenrec    ..........G.............

Figure 14.2 Adjacent transcription factor binding sites cause extended regions of DNA sequence conservation. Structure of CEBPβ homodimer and RUNX1 (Tahirov et al., 2001). Three transcription factors (2× CEBPβ and RUNX1) bind in a region of 25 nucleotides conserved throughout placental mammals. The DNA-binding domains represented as three-dimensional structures are boxed and colour-coded in the schematic representation of the proteins. In each case, the majority of the protein is not represented in the structure; these regions could interact with other transcription factors, activators, and repressors. The human sequence coordinates are chromosome 5, bases 149,446,373–149,446,396 of the NCBI build 36. The alignment is taken from the UCSC web browser (http://genome.ucsc.edu/). (See also Plate 10.)

14.6 Alternative splicing

Not all CNEs are associated with enhancer regions. There is good evidence that many are involved in regulating alternative splicing events, including the alternative splicing of mRNAs of proteins which themselves regulate alternative splicing (Lareau et al., 2007; Ni et al., 2007). The presence of highly conserved control elements to regulate alternative splicing indicates that the functional consequences are of importance. Although large, highly conserved elements may be the exception rather than the rule, detailed comparative analyses have identified smaller conserved motifs regulating alternative splicing, for instance in nematodes (Kabat et al., 2006) and vertebrates (Sorek and Ast, 2003; Yeo et al., 2005).
Alternative splicing is often touted as a mechanism by which proteomic complexity is increased. Although early reports suggested that levels of alternative splicing were comparable in vertebrates and invertebrates (Brett et al., 2002), more recent studies indicate that there is indeed more alternative splicing of transcripts in vertebrates (Kim et al., 2007), suggesting a link with increased phenotypic complexity. How relevant is alternative splicing for species-specific biology and morphological differences? Quantitatively, the gene products that appear to be most affected by alternative splicing are typically involved in the functioning of the nervous and immune systems (Modrek et al., 2001). There are, however, ample examples of alternatively spliced transcription factors: as many as 63% of mouse transcription factors have variant exons (Taneri et al., 2004). Although the differences in molecular roles of the alternatively spliced products are often unknown, the genes themselves include developmental classics such as members of Hox, SMAD, and T-box families (Noro et al., 2006; Fan et al., 2004; Dunn et al., 2005), although they do not necessarily present obvious morphological correlates (Yoder and Carroll, 2006). Alternative splicing of modular proteins is an obvious route through which functions can be changed by including or excluding particular combinations of domains. In this regard, it is interesting that alternative splicing often affects intrinsically disordered regions outside known protein domains (Romero et al., 2006); this again points to a critical role for finely tuned protein–protein interactions among transcriptional regulators.
There are few known cases of distant conservation of alternative splice variants of transcription factors; typically, examples are conserved within phyla at best. Widening the search to other classes of gene again suggests that splice variants are not conserved over long periods, although it should be remembered that transcript coverage of most species, from which evidence of alternative splicing is obtained, is very restricted. Perhaps the best example is currently that of fibroblast growth factor receptor 2 (FGFR2), where an exon configuration diagnostic of mutually exclusive alternative splicing is found in both vertebrates and the sea urchin Strongylocentrotus purpuratus (Mistry et al., 2003). Examples of orthologous ion channel encoding genes showing similar alternative splicing patterns in D. melanogaster, C. elegans, and humans are likely to be cases of parallel evolution (Copley, 2004). The shared ability of vertebrates and at least insects and C. elegans to produce alternative transcripts in a regulated manner, alongside the absence of large numbers of conserved alternative splicing events between protostomes and deuterostomes, suggests that gene products have become alternatively spliced in parallel between different lineages, while at the same time hinting that the functions performed by alternative splice variants may, over time, be replaced by different genomic solutions.

14.7 Summary

Key genetic innovations, such as alternative splicing, the invention of hox genes or the advent of micro-RNAs, have held a strong appeal for those seeking to explain animal evolution in terms of genomes. Without denying the importance of such phenomena, a more nuanced outlook is preferable. Much of the molecular complexity found in animals could have its origins in non-adaptive processes attributable to small population sizes (Lynch, 2007a,b), but this complexity may then be exploited in the service of phenotypic adaptation (Lynch, 2007a) within a framework of point mutation and selection.
Although most major classes of protein involved in animal development may be conserved throughout the Metazoa, detailed comparative analysis of these gene types reveals a more dynamic picture, with frequent gene duplication, gene loss, couplings with new motifs, and other processes such as alternative splicing and regulation by micro-RNAs, all of which are likely to be important for a full understanding of function. Cis-regulatory variation may well be revealed to be quantitatively the most common form of variation between species, but it seems likely that the cumulative effects of multiple cis-regulatory changes will have required that protein networks evolve to accommodate and correctly regulate changed enhancer structures.
Our knowledge of animal evolution and the picture presented here is currently based on a very small sampling of almost exclusively nematode, insect, and vertebrate genomes. Although this situation is beginning to change, the fact that many important functional regions, especially those that do not encode proteins, are only revealed by having sets of closely related genome sequences, and that there are 35 or so animal phyla, gives some idea of the huge scale of the challenges ahead. The rapidly falling costs of genome sequencing do, however, give grounds for optimism.

Animal Evolution

Sunday, March 21, 2010

