Sunday, March 21, 2010

Improvement of molecular phylogenetic inference and the phylogeny of Bilateria

Inferring the relationships among Bilateria has been an active and controversial research area since the time of Haeckel. The lack of a sufficient number of phylogenetically reliable characters was the main limitation of traditional phylogenies based on morphology. With the advent of molecular data this problem has been replaced by another, statistical inconsistency, which stems from an erroneous interpretation of convergences induced by multiple changes. The analysis of alignments rich in both genes and species, combined with a probabilistic method (maximum likelihood or Bayesian) using sophisticated models of sequence evolution, should alleviate these two major limitations. We have applied this approach to a data set of 94 genes from 79 species using the CAT model, which accounts for site-specific amino acid replacement patterns. The resulting tree is in good agreement with current knowledge: the monophyly of most major groups (e.g. Chordata, Arthropoda, Lophotrochozoa, Ecdysozoa, Protostomia) was recovered with high support. Two results are surprising and are discussed in an evo-devo framework: the sister-group relationship of Platyhelminthes and Annelida to the exclusion of Mollusca, contradicting the Neotrochozoa hypothesis, and, with a lower statistical support, the paraphyly of Deuterostomia. These results, in particular the status of deuterostomes, need further confirmation, both through increased taxonomic sampling and future improvements of probabilistic models.
12.1 Introduction

12.1.1 The limits of morphology

The inference of animal phylogeny from morphological data has always been a difficult issue. Although a rapid consensus was obtained on the definition of phyla (with a few exceptions: vestimentiferans, pogonophores, or platyhelminths), the relationships among phyla have long remained unsolved (Brusca and Brusca, 1990; Nielsen, 2001). The dominant view, albeit far from being universally accepted, was traditionally biased in favour of the Scala Naturae concept of Aristotle, which postulates an evolution from simple to more complex organisms (Adoutte et al., 1999). Briefly, acoelomates (platyhelminths and nemertines) were considered as emerging first, followed by pseudocoelomates (nematodes), and then coelomates, representing the ‘crown group’ of Bilateria. A similar gradist view was proposed for deuterostomes, with the successive emergence of Chaetognatha, Echinodermata, Hemichordata, Urochordata, and Cephalochordata, culminating in Vertebrata (e.g. Conway Morris, 1993a).
Irrespective of its underlying ‘ideological’ preconceptions, however, this traditional bilaterian phylogeny was based on very few morphological and developmental characters (position of the nerve cord, cleavage patterns, modes of gastrulation, etc.) whose phylogenetic reliability may sometimes be disputable (either because of the description, or the coding and analysis; see Jenner, 2001). This general lack of homologous characters is related to the




128 AN I M AL EV O L UTI O N



wide disparity observed between body plans. For some phyla (such as echinoderms), the body plan is nearly exclusively characterized by idiosyncrasies, leaving few characters to compare with other bilaterian phyla. Traditional animal phylogenies based on morphological data were thus hampered by an insufficient amount of reliable primary signal.


12.1.2 The difficult beginnings of molecular phylogeny

Great hopes were placed in the use of molecular data (Zuckerkandl and Pauling, 1965). Unfortunately, the first phylogenies based on ribosomal RNA (rRNA) turned out to be quite controversial (Field et al., 1988). They contained some results that were difficult to accept, such as the polyphyly of animals. We will not review in detail this turbulent early history, but rather note that these trees were based on a scarce taxon sampling and inferred using overly simple methods (e.g. the Jukes and Cantor distance). As a consequence, tree building artefacts were frequent. The problem was mainly addressed through improved taxon sampling: over a period of about 10 years, rRNAs were sequenced from several hundred species. In part because of the sheer improvement due to a denser taxonomic sampling, but also thanks to a systematic selection of the slowest-evolving representatives of the majority of animal phyla, a consensus rapidly emerged, reducing the diversity of Bilateria into three main clades: Deuterostomia, Lophotrochozoa (Halanych et al., 1995), and Ecdysozoa (Aguinaldo et al., 1997). The statistical support for most of the nodes was nevertheless non-significant (Philippe et al., 1994; Abouheif et al., 1998), thus preventing any firm conclusions.
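To make the ‘overly simple methods’ concrete, here is a minimal Python sketch of the Jukes and Cantor distance for aligned nucleotide sequences such as rRNA. It assumes equal base frequencies and a single rate for all substitution types and all sites, which is the usual statement of that model; the function name and the way gaps are skipped are illustrative choices, not taken from the studies cited above.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Estimate substitutions per site between two aligned nucleotide sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    valid = set("ACGT")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable positions")
    # Observed proportion of differing positions.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined once the sequences look random.
        raise ValueError("sequences are saturated: distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With 30% observed differences the corrected distance is about 0.38 substitutions per site. Because the same correction is applied to every position, multiple substitutions concentrated at fast-evolving sites remain underestimated, which is one way such a method can mislead.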
This brief historical overview provides a clear illustration of the problems of phylogenetic inference. The resolution of the morphological and rRNA trees is limited because too few substitutions occurred during the evolution of this set of conserved characters, yielding too few synapomorphies. At the same time, unequal rates of evolution across characters imply that some characters concentrate numerous multiple substitutions (convergences and reversions). These multiple substitutions can be misinterpreted by tree reconstruction methods and lead to incorrect results. In particular, the well-known long branch attraction (LBA) artefact (Felsenstein, 1978) leads to an erroneous grouping of fast-evolving taxa, often resulting in an apparent earlier emergence (Philippe and Laurent, 1998). For instance, this is the reason for the initial absence of monophyly of animals, with Bilateria evolving too fast and being attracted towards the outgroup. Similarly, the recognition of the LBA problem played a major role in the establishment of the Ecdysozoa hypothesis (Aguinaldo et al., 1997); i.e. when the fast-evolving Caenorhabditis is considered, nematodes emerge at the base of Bilateria, but when the slowly-evolving Trichinella is included, nematodes cluster with arthropods. Compositional heterogeneity can also generate artefacts, especially for trees based on mitochondrial sequences (Foster and Hickey, 1999).
Twenty years ago, the expectations from molecular data were further boosted by the prospect of using genomic data. The underlying assumption is that the joint analysis of numerous genes potentially provides numerous synapomorphies, thus eliminating the problem of stochastic errors. Yet although it is true that stochastic errors will naturally vanish in such a phylogenomic context, systematic errors, which are due to the inconsistency of tree building methods, will not disappear. Indeed, they should even become more apparent (Philippe et al., 2005b).
We recognized two main avenues to circumvent systematic errors (Philippe and Laurent, 1998): (1) the use of rare, and putatively slowly evolving, complex characters, such as gene order (Boore, 2006), which should be homoplasy-free; (2) the use of numerous genes combined with inference methods that deal efficiently with multiple substitutions, which should avoid artefacts due to homoplasy. We will briefly review the application of the second approach to the question of the monophyly of Ecdysozoa as a way of demonstrating the importance of using numerous species and models that handle the heterogeneity of the evolutionary process across positions.


12.1.3 Illustration of the misleading effect of multiple substitutions in the case of Ecdysozoa

The first phylogenies based on numerous genes (up to 500) significantly rejected the monophyly of Ecdysozoa (e.g. Blair et al., 2002; Dopazo et al., 2004; Wolf et al., 2004). To exclude the possibility that this was due to a LBA artefact, the use of putatively rarely changing amino acids was proposed (Rogozin et al., 2007b), an approach that also supported Coelomata (i.e. arthropods as sister group of vertebrates rather than nematodes). At first, phylogenomics seemed to reject strongly the new animal phylogeny, which was mainly based on rRNA.
However, these phylogenomic analyses were characterized by a very sparse taxon sampling, and used only simple tree reconstruction methods, rendering them potentially sensitive to systematic errors. We will show that, as in the first rRNA phylogenies, the monophyly of Coelomata was an artefact due to the attraction of the fast-evolving Caenorhabditis to the distant outgroup (e.g. Fungi). As detailed below, three different and independent approaches that reduce the misleading effect of multiple substitutions lead to a change of topology from Coelomata to Ecdysozoa.


12.1.4 Removal of the fast-evolving positions

An obvious way to reduce systematic errors is to remove the fastest-evolving characters from the alignment (Olsen, 1987). In principle, the phylogeny has to be known to compute the evolutionary rate, rendering simplistic circular approaches potentially hazardous (Rodríguez-Ezpeleta et al., 2007). The SF method (Brinkmann and Philippe, 1999) partially circumvents this issue by computing rates within predefined monophyletic groups. Only the relationships among these groups can be studied and an equilibrated species sample should be available for each predefined group. When the SF method is applied to a large alignment of 146 genes with four representatives each from Fungi, Arthropoda, Nematoda, and Deuterostomia (Delsuc et al., 2005), the removal of fast-evolving sites leads to an almost total disappearance of the support in favour of Coelomata. Interestingly, this does not correspond to a loss of phylogenetic signal, since support in favour of Ecdysozoa steadily increases (up to a bootstrap support value of 91%). The simplest interpretation of this experiment is that the misleading effect of multiple substitutions creates a LBA artefact that disappears when fast-evolving positions are discarded. Note that this way of selecting slowly evolving characters (SF method) differs from the one used by Rogozin et al. (2007b) by the use of a rich taxon sampling that allows us to select positions that are more likely to reflect the ancestral state of the predefined monophyletic group, therefore reducing the risk of convergence along the long terminal branches.
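The idea of scoring sites only within predefined groups can be sketched in a few lines of Python. This is a simplified proxy and not the published SF implementation: for each site it sums, over the groups, the number of distinct amino acids minus one (a lower bound on the number of changes needed inside each group), then keeps the sites at or below a threshold. The function names, the threshold, and the toy alignment are all hypothetical.

```python
MISSING = {"-", "?", "X"}

def within_group_changes(alignment, groups):
    """Per-site lower bound on the number of changes inside the predefined groups.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    groups: dict mapping group name -> list of taxon names assumed monophyletic.
    """
    n_sites = len(next(iter(alignment.values())))
    scores = []
    for i in range(n_sites):
        score = 0
        for members in groups.values():
            states = {alignment[t][i] for t in members
                      if alignment[t][i] not in MISSING}
            # k distinct states in a group require at least k - 1 changes.
            score += max(len(states) - 1, 0)
        scores.append(score)
    return scores

def keep_slow_sites(alignment, groups, max_changes):
    """Return a new alignment containing only sites scoring <= max_changes."""
    scores = within_group_changes(alignment, groups)
    kept = [i for i, s in enumerate(scores) if s <= max_changes]
    return {t: "".join(seq[i] for i in kept) for t, seq in alignment.items()}

toy_alignment = {"fungus1": "AKD", "fungus2": "AKE",
                 "arthropod1": "ARD", "arthropod2": "ASE"}
toy_groups = {"Fungi": ["fungus1", "fungus2"],
              "Arthropoda": ["arthropod1", "arthropod2"]}
```

Because rates are computed inside the groups only, the score does not depend on how the groups are related to one another, which is what limits the circularity mentioned above. Lowering `max_changes` step by step and re-running the tree inference at each step mimics the progressive removal of fast-evolving positions.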


12.1.5 Improvement of taxon sampling

Another obvious way of reducing the misleading effect of multiple substitutions is to incorporate more species, breaking long branches (Hendy and Penny, 1989) and thus allowing one to detect convergences and reversions more easily. In the case of Bilateria, simply adding a close outgroup (Cnidaria) to an alignment containing only a distant outgroup (Ascomyceta) is sufficient to change from strong support for Coelomata to strong support for Ecdysozoa. This is true for the analysis of both complete primary sequences (Delsuc et al., 2005) and rare amino acid changes (Irimia et al., 2007). Undetected convergences between the fast-evolving nematodes and the distant outgroup therefore create a strong but erroneous signal that biases tree building methods. Accordingly, none of the phylogenomic studies that have used dense taxon sampling found any support in favour of Coelomata (e.g. Philippe et al., 2005a; Marlétaz et al., 2006; Matus et al., 2006b).


12.1.6 Improvement of the tree building method

Probabilistic methods are now widely recognized as the most accurate methods for phylogenetic reconstruction (Felsenstein, 2004). However, to avoid the problem of systematic errors, they require good models of sequence evolution. We recently developed a new model, named CAT, which partitions sites into categories, so as to take into account site-specific amino acid preferences (Lartillot and Philippe, 2004). When applied to the difficult case of the bilaterian tree rooted by a distant outgroup (Fungi), the CAT model provides strong support for Ecdysozoa, whereas the WAG model (Whelan and Goldman, 2001) strongly favours Coelomata (Lartillot et al., 2007). Posterior predictive analyses demonstrate that the CAT model predicts homoplasies more accurately than the WAG model. In other words, the CAT model detects multiple substitutions more efficiently and is therefore less sensitive to systematic errors.
The greater robustness of CAT against the Coelomata artefact is related to the fact that it accounts better for site-specific restrictions of the amino acid alphabet. Amino acid replacements in most proteins tend to be biochemically conservative, with typical variable positions in a protein accepting substitutions among only two or three amino acids. This has important consequences for phylogenetic reconstruction using amino acid sequences, since it implies that convergences and reversions (homoplasies) are much more frequent than what would be expected if all amino acids were considered equally acceptable at any given position. In practice, the typical number of amino acids observed per position is indeed overestimated by classical site-homogeneous models, based on empirical matrices such as WAG, which in turn results in a poor anticipation of the risk of homoplasy, and thereby in a greater prevalence of artefacts. In contrast, site-specific models such as CAT will anticipate these problems better, and will be less prone to systematic errors (Lartillot et al., 2007).
Figure 12.1 (see also Plate 8) illustrates the strikingly different behaviour of site-homogeneous and site-heterogeneous models when a position is saturated. Under WAG and GTR (general time reversible) models, the substitution process rapidly converges to a nearly flat distribution over the 20 amino acids. Therefore, according to these two models, a position cannot be saturated and at the same time display a strong preference for a few amino acids. This is at odds with common intuition about strong site-specific effects of purifying selection related to the protein’s conformational and functional constraints. In contrast, under the CAT model, the position underwent substitution almost solely between the two negatively charged amino acids aspartate and glutamate, and all other 18 amino acids are rarely encountered, even over long time periods (Figure 12.1). Thus, under CAT, the effective substitutional alphabet at the position under investigation is essentially of size 2, which automatically implies that convergence towards the same amino acid is very likely, much more likely than if all 20 amino acids were allowed at this site. Thanks to this phenomenon, the CAT model more easily detects saturation in protein alignments, compared with standard models such as WAG or GTR, and is therefore less sensitive to long branch attraction artefacts (Lartillot et al., 2007).

Figure 12.1 Posterior predictive tests to analyse the behaviour of the WAG, GTR, and CAT models under substitutional saturation. A column of the alignment displaying only aspartic acid (D) and glutamic acid (E) was chosen at random, and for the three models, the probability of observing each of the 20 amino acids after n substitutions (n = 0–7), and starting from an aspartic acid, was estimated and visualized graphically. The height of each letter is proportional to the probability of the corresponding amino acid. The parameters of the substitution process were taken at random from the posterior distribution under each model. (See also Plate 8.)
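The effect of a restricted alphabet on convergence can be illustrated with a small calculation. If a saturated position has stationary amino acid frequencies π, two unrelated lineages show the same amino acid by chance with probability Σ π_i², and 1/Σ π_i² behaves as an effective alphabet size. The Python sketch below compares a flat 20-letter profile (a crude stand-in for the nearly flat distribution reached under WAG or GTR, not those models themselves) with a hypothetical two-letter profile on aspartate and glutamate; the equal weights of 0.5 are an assumption made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def chance_identity(profile):
    """Probability that two independent draws from `profile` give the same state.

    profile: dict mapping amino acid -> non-negative weight (normalized here).
    """
    total = sum(profile.values())
    return sum((w / total) ** 2 for w in profile.values())

def effective_alphabet_size(profile):
    """Number of equally frequent states giving the same chance identity."""
    return 1.0 / chance_identity(profile)

# Hypothetical profiles: all 20 amino acids equally likely, versus D and E only.
flat_profile = {aa: 1.0 for aa in AMINO_ACIDS}
acidic_profile = {"D": 0.5, "E": 0.5}
```

Under the flat profile, two saturated lineages agree by chance with probability 0.05; under the two-letter profile the probability is 0.5. A model that assumes the first situation when the second is true will read many chance identities as shared ancestry.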
Of course, there are many other potential causes of error, all of which can in principle be traced back to model misspecification problems: in all cases, it is a matter of correctly modelling various features of the substitution process that may potentially lead to an increase in the level of homoplasy (e.g. compositional biases). Improving the models of sequence evolution is thus an essential requirement for phylogenetics, and is currently a very active area of research. In principle, it should be preferred to the two other approaches detailed above because (1) it avoids the risk of stochastic errors implied by the use of the rare slowly evolving positions and (2) it applies even when the taxon sampling is naturally sparse (e.g. coelacanth).
Finally, it should be noted that an incorrect handling of multiple substitutions does not necessarily lead to a robust incorrect tree (as in the case of Coelomata) but possibly to an unresolved tree. For instance, an analysis based on 50 genes using a sparse taxon sampling (21 species, with most of the animal phyla being represented by a single, often fast-evolving, species) and a simple model of sequence evolution (RtREV+I+G), resulted in a poorly resolved tree, in which even the monophyly of Bilateria was not supported (Rokas et al., 2005). Since the approach used does not allow efficient detection of multiple substitutions, we decided to do a comparable study in which we simultaneously improved the species sampling (from 21 to 57, including many slowly evolving species) and the model of sequence evolution (i.e. using the CAT model). Interestingly, the statistical support was high (e.g. bootstrap values > 95% for Bilateria, Ecdysozoa, and Lophotrochozoa) in the resulting tree (Baurain et al., 2007). This illustrates that incorrect handling of multiple substitutions can create an artefactual signal that is not sufficiently strong to overcome the genuine phylogenetic signal and to create a highly supported erroneous topology, but is sufficient to lead to a poorly resolved tree.
In summary, a combination of many positions, corresponding to multiple genes, and a dense taxonomic sampling are a necessary prerequisite to obtain reliable phylogenies. Ideally, these sequences should then be analysed with probabilistic models that correctly describe the true evolutionary patterns of the sequences under study. In practice, one may perform analyses with alternative models of evolution among those currently available, compare the fit of those models, check for possible model violations, and test the robustness of the analyses by site and taxon resampling. This is the method that we apply to the phylogeny of Bilateria. Full details of the materials and methods used can be found associated with the original article (Lartillot and Philippe, 2008).
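As an illustration of site resampling, the following Python sketch draws one bootstrap replicate by sampling alignment columns with replacement. It shows only the resampling step; the tree inference that would be repeated on each replicate is outside its scope, and the function name and seeded generator are illustrative assumptions rather than the procedure of the original article.

```python
import random

def bootstrap_sites(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that replicates are reproducible.
    Columns are drawn with replacement; the same columns are used for every taxon.
    """
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}

toy_alignment = {"taxonA": "DDKL", "taxonB": "DEKL", "taxonC": "EEKI"}
replicate = bootstrap_sites(toy_alignment, random.Random(0))
```

The bootstrap proportion of a clade is then the fraction of replicate trees in which that clade appears. Taxon resampling works on the other axis of the alignment, by removing or subsampling species rather than columns.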


12.2 Results

12.2.1 Comparison of phylogenies based on CAT and WAG models

We analysed our large data set (79 animal species and 19,993 positions) using two alternative models of amino acid replacement: the WAG empirical matrix (Whelan and Goldman, 2001), which is currently one of the standard models (Ronquist and Huelsenbeck, 2003; Jobb et al., 2004; Hordijk and Gascuel, 2005; Stamatakis et al., 2005); and the CAT mixture model (see above). The trees obtained under CAT (Figure 12.2) and WAG (Figure 12.3) models are very similar and in good agreement with current knowledge (Halanych, 2004). The following major aspects can be noted:

• Ecdysozoa and Lophotrochozoa receive a stronger bootstrap support under CAT (bootstrap proportion (BP) of 99% and 100%) than under WAG (53% and 71%). Under WAG, platyhelminths are slightly attracted by nematodes, as can be seen by the low bootstrap support values along the path between the two groups. The attraction is nevertheless less marked than with a poorer taxon sampling (Philippe et al., 2005a).
• Within Lophotrochozoa, many phyla are unsampled, but the three that are present (platyhelminths, molluscs, and annelids) are reasonably well represented. Interestingly, with the CAT model annelids and platyhelminths are sister groups (94% BP), while the analysis under WAG recovers a more traditional grouping of annelids and molluscs (Neotrochozoa, 97% BP). A sister-group relationship between annelids and platyhelminths had already been observed in a combined large subunit (LSU)–small subunit (SSU) rDNA analysis (Passamaneck and Halanych, 2006), and in an analysis based on mitochondrial gene order (Lavrov and Lang, 2005), but was not found in previous analyses based on expressed sequence tags (ESTs) (Philippe et al., 2005a).
• The relationships among Ecdysozoa are not well resolved. This is mainly due to the fluctuating position of tardigrades and priapulids, whose sequences are incomplete (39.9% and 75.8% of missing data, respectively). The most likely configuration displays priapulids at the base of all other Ecdysozoa, and tardigrades as sister group of nematodes, but two major alternatives are also proposed by the bootstrap analysis: tardigrades as sister group of priapulids, together at the base of nematodes and arthropods, or priapulids at the base of nematodes.

Figure 12.2 Phylogeny inferred using the CAT model. The alignment consists of 19,993 unambiguously aligned positions (94 genes and 79 species). The tree was rooted using sponges and cnidarians as outgroups. Nodes supported by 100% bootstrap values are denoted by black circles while lower values are given in plain style. The scale bar indicates the number of changes per site.


Figure 12.3 Phylogeny inferred using the WAG model. See legend of Figure 12.2 for details.

• Chaetognaths appear at the base of all protostomes (92% CAT BP, 52% WAG BP), which is in accordance with Marlétaz et al. (2006).
• The monophyly of deuterostomes is weakly supported under the WAG model (76% BP), whereas the CAT model favours a paraphyly, also weakly supported, with chordates emerging first (73% BP, only 19% for deuterostome monophyly).
• Chordates are monophyletic (98% CAT BP, 83% WAG BP), receiving a stronger support than in previous phylogenomic studies (Bourlat et al., 2006; Delsuc et al., 2006). In addition, under both analyses, urochordates are closer to vertebrates than cephalochordates with 100% BP, confirming the monophyly of Olfactores (Delsuc et al., 2006).
• The phylogenetic position of Xenoturbellida, as sister group of echinoderms + hemichordates (Bourlat et al., 2006), is also recovered (92% CAT BP, 73% WAG BP).

In summary, the WAG and CAT models agree with each other on 73 nodes, and disagree for a minor change at the base of insects and two major points: the monophyly of deuterostomes, and the relative order of molluscs, annelids, and platyhelminths. In the case of lophotrochozoans, the two models strongly disagree, whereas concerning deuterostomes, the difference is not statistically significant.


12.2.2 Model comparison and evaluation

The discrepancy between the two models indicates the presence of artefacts due to systematic errors. A statistical comparison of the two models may help in deciding which of them offers the most reliable phylogenetic tree. In addition, the observed artefacts are the symptoms of model violations, which we will analyse using a standard statistical method, namely posterior predictive analysis. Note that the WAG empirical matrix is just one among the available empirical matrices, and the results obtained with WAG may not be representative of the general class of site-homogeneous models. To address this point, we also tested the GTR model along with WAG and CAT.
First, based on cross-validation tests (see electronic supplementary material in Lartillot and Philippe, 2008), the CAT model was found to have a much better statistical fit than either WAG (a score of 3939 ± 163 in favour of CAT) or GTR (2765 ± 128). The better score of GTR relative to WAG (a difference of 1174 in favour of GTR) indicates that the data set is big enough for the parameters of the amino acid replacement matrix to be directly inferred, rather than taken from an empirically derived matrix. On the other hand, the improvement in doing so is less significant than that accomplished by using the site-heterogeneous CAT model (1174 versus 2765).
Second, we performed posterior predictive analyses, using two test statistics: one, vertical (i.e. computed along the columns of the alignment), is the mean number of distinct amino acids per column (mean site-specific diversity) (Lartillot et al., 2007); the other one, horizontal (i.e. computed along the rows of the alignment), is the chi-square compositional homogeneity test (Foster, 2004); see electronic supplementary material in Lartillot and Philippe (2008). Violation of the horizontal statistic indicates that the model does not handle compositional biases correctly, whereas violation of the vertical statistic means that the model does not correctly account for site-specific biochemical patterns.
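Both statistics are simple to compute on an alignment, as the Python sketch below suggests. The vertical statistic follows the definition given above (mean number of distinct amino acids per column). For the horizontal statistic the sketch uses the textbook chi-square statistic on a taxon-by-amino-acid count table; the exact variant used in the original analysis may differ, and the function names, the treatment of missing characters, and the toy alignment are assumptions.

```python
from collections import Counter

MISSING = {"-", "?", "X"}

def mean_site_diversity(alignment):
    """Vertical statistic: mean number of distinct amino acids per column."""
    seqs = list(alignment.values())
    n_sites = len(seqs[0])
    total = 0
    for i in range(n_sites):
        total += len({s[i] for s in seqs if s[i] not in MISSING})
    return total / n_sites

def composition_chi_square(alignment):
    """Horizontal statistic: chi-square of taxon-by-amino-acid counts.

    Expected counts assume that every taxon shares the pooled composition.
    """
    counts = {t: Counter(c for c in s if c not in MISSING)
              for t, s in alignment.items()}
    column_totals = Counter()
    for taxon_counts in counts.values():
        column_totals.update(taxon_counts)
    grand_total = sum(column_totals.values())
    chi2 = 0.0
    for taxon_counts in counts.values():
        row_total = sum(taxon_counts.values())
        for aa, column_total in column_totals.items():
            expected = row_total * column_total / grand_total
            chi2 += (taxon_counts[aa] - expected) ** 2 / expected
    return chi2

toy_alignment = {"taxonA": "DDKL", "taxonB": "DEKL",
                 "taxonC": "EEKI", "taxonD": "DERL"}
```

In a posterior predictive analysis, the same functions would be applied to the real alignment and to alignments simulated under the fitted model, and the observed value compared with the simulated distribution.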
Concerning the vertical test (Figure 12.4a), the mean number of distinct amino acids per column of the true alignment (mean observed diversity) is 4.45. Site-homogeneous models predict a much higher value (6.89 ± 0.04 for WAG, 6.53 ± 0.04 for GTR). Thus, the assumptions underlying the site-homogeneous models are strongly violated (P < 0.001, z = 62.3 for WAG; P < 0.001, z = 72.6 for GTR). In contrast, the CAT model performs much better (mean predicted diversity of 4.59 ± 0.03), although it is also weakly rejected (P < 0.001, z = 4.6). As explained above, since overestimating the number of states per position leads to underestimating the probability of convergence (Lartillot et al., 2007), one may expect a greater risk of LBA under WAG for the present analysis. On the other hand, all models (CAT, GTR, and WAG) fail the horizontal test to the same extent (compositional homogeneity, Figure 12.4b). This is not too surprising, given that all of them are time-homogeneous amino acid replacement processes. However, the violation is strong, as measured by the z-score (z > 11), which warns us that there is a risk, whichever model is used, of observing artefacts related to compositional biases.
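A minimal Python sketch of how such a comparison can be summarized is given below. It assumes that the z-score is the usual standardized deviation, i.e. the absolute difference between the observed statistic and the mean of the simulated values, divided by their standard deviation, and it reports an empirical tail fraction as a rough P value. The function name and the toy numbers are hypothetical and are not the values of the study.

```python
import statistics

def posterior_predictive_summary(observed, simulated):
    """Compare an observed test statistic with values simulated under a model.

    Returns (z, tail), where z is the absolute standardized deviation of the
    observed value and tail is the fraction of simulated values lying at least
    as far from the simulated mean as the observed value does.
    """
    mean = statistics.mean(simulated)
    sd = statistics.stdev(simulated)  # sample standard deviation
    deviation = abs(observed - mean)
    z = deviation / sd
    tail = sum(abs(x - mean) >= deviation for x in simulated) / len(simulated)
    return z, tail

# Toy numbers: a model that predicts more diversity per site than is observed.
z_score, tail_fraction = posterior_predictive_summary(4.5, [6.0, 7.0, 8.0])
```

With only a few simulated replicates the tail fraction is coarse (it cannot fall below one over the number of replicates unless it is zero), so large z-scores like those reported above are the more informative summary of how badly a model misses the observed statistic.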


12.3 Discussion

12.3.1 Towards better phylogenetic analyses

Phylogenetics is still a difficult and controversial field, because no foolproof method is yet available for avoiding systematic errors. In this study we have tried to combine two methods that have proven efficient at alleviating artefacts while obtaining sufficient statistical support. First, relying on EST projects, we have tried to combine an increase in the overall number of aligned sequence positions, so as to capture more of the primary phylogenetic signal, with an improved taxonomic sampling (Philippe and Telford, 2006). Second, we have also paid particular attention to the problem of probabilistic models underlying the phylogenetic reconstruction. As is obvious from our statistical evaluations, the standard model used in phylogenetics today, WAG, is not reliable, at least for deep-level phylogenies such as that of animals. Essentially, it is strongly rejected for its failure to explain either site-specific biochemical patterns or compositional differences between taxa. As indicated by our analysis of GTR, this failure is not specific to WAG and is likely to apply to all the site-homogeneous models. The alternative model used here, CAT, is significantly better, but may not be reliable enough, in particular against potential artefacts induced by compositional biases.

Figure 12.4 Posterior predictive tests. The observed value (arrow) of the test statistic is compared with the null distributions under CAT and WAG models: (a) mean biochemical diversity per site; (b) maximum compositional deviation over taxa.
Interestingly, the weaknesses of the WAG model also result in an overall lack of support, which is probably due to the unstable position of some fast-evolving groups (in particular, platyhelminths). This confirms previous observations (Baurain et al., 2007), and also illustrates that improving taxonomic sampling is not in itself a sufficient response to systematic errors but should be combined with an in-depth analysis of the probabilistic models used.
In the light of our model evaluation, the position of platyhelminths proposed by CAT as a sister group to annelids should be taken seriously. In this perspective, the Neotrochozoa (molluscs + annelids) found by WAG is interpreted as an artefact. This would not be too surprising, given the overall saturation of the platyhelminth sequences. Note that the vestige of artefactual attraction between platyhelminths and nematodes observed under WAG should in itself warn us that the position of platyhelminths within Lophotrochozoa may not be reliably inferred under WAG. The phylogenetic position of platyhelminths, relative to other lophotrochozoans, is a long-standing question, the potentially important implications of which have already been pointed out (Passamaneck and Halanych, 2006).
The other point of disagreement concerns deuterostomes: monophyletic under WAG, they appear to be paraphyletic under CAT. This progressive emergence of deuterostome phyla is unusual. In fact, Deuterostomia sensu stricto (echinoderms, hemichordates, and chordates) have long been considered as one of the most reliable phylogenetic groupings in animal phylogeny (Adoutte et al., 2000). A possible explanation of the monophyly of deuterostomes obtained with WAG in terms of LBA would be that the fast-evolving protostomes are attracted by the outgroup. Given its implications (see below), this potential artefact would certainly deserve further attention. On the other hand, caution is needed since the basal position of chordates observed under CAT does not receive a high bootstrap support (73%). It is also unstable upon small variations of the taxon sampling: for instance, deuterostome monophyly is recovered with CAT if either Spadella or Xenoturbella is removed from the analysis (data not shown). Similarly, it appears that the removal of the non-bilaterian outgroup leads to the non-monophyly of Xenambulacraria (Philippe et al., 2007). In addition, the fast-evolving acoels probably emerge close to the base of deuterostomes, further shortening internal branches (Philippe et al., 2007). In summary, this suggests that the signal for resolving this part of the tree is weak, all the more so when the outgroup is distantly related. Additional data should be analysed with improved methods before taking sides on this issue.
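As a reminder of what a figure such as 73% measures, the following minimal sketch computes the support for a group as the fraction of replicate trees in which that group appears as a clade. It is an illustration only, not the procedure used in this chapter: the nested-tuple tree encoding, the function names, and the four toy replicates are all assumptions, and the trees are treated as rooted for simplicity.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a tuple of subtrees (rooted).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree, found: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the leaf set under every internal node; return this node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, found) for child in tree))
    found.add(leaves)
    return leaves


def bootstrap_support(replicates: List[Tree], group: Set[str]) -> float:
    """Fraction of replicate trees in which `group` appears as a clade."""
    target = frozenset(group)
    hits = 0
    for tree in replicates:
        found: Set[FrozenSet[str]] = set()
        clades(tree, found)
        hits += target in found
    return hits / len(replicates)


# Toy replicates (invented for illustration, not the chapter's data).
paraphyly = ("Outgroup", ("Chordata", ("Xenambulacraria", ("Chaetognatha", "Protostomia"))))
monophyly = ("Outgroup", (("Chordata", "Xenambulacraria"), ("Chaetognatha", "Protostomia")))
replicates = [paraphyly, paraphyly, paraphyly, monophyly]

non_chordates = {"Xenambulacraria", "Chaetognatha", "Protostomia"}
print(bootstrap_support(replicates, non_chordates))  # 0.75
```

In this toy set, the grouping of all non-chordate bilaterians is found in three replicates out of four, while deuterostome monophyly (Chordata + Xenambulacraria) is found in the remaining one.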
In a few respects our analysis is in contradiction with the results found by Dunn et al. (2008) (see also Chapter 6). First, the support in favour of deuterostomes is higher in Dunn et al.’s study than in our own investigation, although it still remains lower than 90%. Second, in their study, chaetognaths are found in a sister-group relationship with Lophotrochozoa (albeit without support), whereas we found them at the base of Lophotrochozoa + Ecdysozoa. Third, Dunn et al. found molluscs and annelids to be closely related, whereas platyhelminths fall within a larger group of ‘Aschelminthes’ (Platyzoa), including gastrotrichs, rotifers, acoels, and myzostomids. Compared with our analysis, Dunn et al.’s investigation relies on a richer taxon sampling, which may confer more robustness to their conclusions. On the other hand, the Platyzoa hypothesis is not totally convincing. First, the relatively long branches of most of these groups (in particular acoels, but also platyhelminths) raise some suspicion about a possible artefactual attraction between these fast-evolving phyla. Second, Platyzoa are not congruent with recent findings about the phylogenetic position of acoels (Philippe et al., 2007). Third, missing data are much more frequent in the data set of Dunn et al. (2008) than in ours (44.5% versus 20%). In any case, the discrepancies observed between the two analyses suggest that both taxonomic sampling and probabilistic models still need to be improved before a consensus about the details of the animal phylogeny can be reached.
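The proportion of missing data quoted above can be understood as the share of cells in the taxon-by-position alignment matrix that carry no residue. A minimal sketch follows, assuming that gaps ('-'), unknown characters ('?') and ambiguous residues ('X') all count as missing. This convention, the function name, and the toy alignment are assumptions, not the definitions used in either study.

```python
from typing import Dict

# Assumption: gap, unknown and ambiguous symbols are all treated as missing.
MISSING = set("-?Xx")


def missing_fraction(alignment: Dict[str, str]) -> float:
    """Share of alignment cells (taxon x position) that hold a missing symbol."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need a non-empty alignment with sequences of equal length")
    total = sum(len(seq) for seq in alignment.values())
    missing = sum(ch in MISSING for seq in alignment.values() for ch in seq)
    return missing / total


# Toy alignment (invented): 6 missing cells out of 24.
toy = {
    "taxonA": "MKV-LLA?",
    "taxonB": "MKVALLAG",
    "taxonC": "----LLAG",
}
print(missing_fraction(toy))  # 0.25
```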


12.3.2 Implications for the evolution of Bilateria

Converging towards a reliable picture of the animal phylogenetic tree is an interesting objective in itself. But more important are the implications of this phylogenetic picture for our vision of the morphological evolution of Bilateria (Telford and Budd, 2003). As mentioned in Section 12.1, morphological and developmental characters were traditionally the primary source of data used to infer phylogenetic trees. It has now become clear that many of those characters, such as cleavage patterns or the fate of the blastopore, are not reliable phylogenetic markers. It is nevertheless interesting to map their evolution on a tree that has been inferred from independent (molecular) data and use this to learn as much as possible about the history of morphological diversification of animal body plans. In this respect, comparative embryology, or evo-devo, is probably the primary customer of animal molecular phylogenetics.
Much has already been said about how the ‘new animal phylogeny’ changes our way of looking at the evolution of animals (Adoutte et al., 1999; Halanych, 2004). One of the most important, and most frequent, messages has been that secondary simplifications of morphology and of developmental processes are common. This has been repeatedly implied by most of the successive changes brought to the animal phylogeny over the last 10 years, such as the repositioning of Nematoda and Platyhelminthes within coelomate protostomes, or of Tunicata as the sister group of Vertebrata.
In the context of the present chapter, the position of platyhelminths alongside two neotrochozoan phyla, as proposed by our CAT analysis, has similar implications, specifically concerning the evolution of development. Molluscs and annelids, together with sipunculans, have a canonical spiral development, characterized by a four-quartet spiral cleavage, an invariant and evolutionarily conserved cell lineage, including a single stem cell (4d, or mesentoblast) giving rise to the mesodermal germbands, and a typical trochophore larva (Nielsen, 2001). In contrast, platyhelminths display atypical forms of spiral cleavage, and pass through a larval stage (Mueller’s larva) that can only loosely be homologized to a trochophore. In this context, a basal position of platyhelminths in the lophotrochozoan group, as previously often found, is compatible with the intuitively appealing idea that evolution proceeds from simple to complex forms. Namely, platyhelminths would be ‘proto’-spiralians, outside a series of nested phyla, Trochozoa, Eutrochozoa, and Neotrochozoa (Peterson and Eernisse, 2001), corresponding to a graded series of increasingly complex forms of spiral development. Yet, the phylogeny favoured by CAT is at odds with the neotrochozoan hypothesis and implies that the development of platyhelminths is a secondarily modified (and ancestrally canonical) spiral development. Further taxonomic sampling within lophotrochozoans will be important, as it may not only allow a more robust inference of the position of platyhelminths, but also bring additional phyla that do not display a canonical spiral development among Eu- or Neotrochozoa, thereby leading to a completely different view of the evolution of spiral development.
The paraphyly of deuterostomes favoured by our CAT analysis, if confirmed, would also have deep implications concerning the way we interpret the evolution of Bilateria. First, it would result in a paraphyletic succession of three groups (Chordata, Xenambulacraria, and Chaetognatha), all of which display a radial cleavage, a deuterostomous gastrulation, and an enterocoelic mode of formation of the body cavity. Although these embryological characters are known to be evolutionarily labile (for instance, brachiopods have a deuterostomous gastrulation, and enterocoely is observed in nemerteans), this may be interpreted as phylogenetic evidence in favour of an ancestral deuterostomy. Similarly, the gill slits, found in chordates and hemichordates, would also have to be considered as ancestral to all Bilateria. In addition, with respect to all other Bilateria, chordates would then be of basal emergence, which turns the traditional preconceptions radically upside down: in the perspective of this new phylogenetic hypothesis, the chordate body plan is no longer the pinnacle of a progressive evolution through a succession of body plans of increasing complexity. Rather, chordates are one of the first bilaterian offshoots. This in turn would have consequences concerning the polarization of the morphological characters: thus far, most chordate-specific morphological and developmental features (for instance their unique dorsoventral polarity, with dorsal nerve cord and ventral heart) have generally been assumed to be derived (Arendt and Nübler-Jung, 1994). In the context of the more traditional hypothesis of deuterostome monophyly, this assumption is justified, provided that the ancestral condition is clearly and jointly recognized in protostomes and in Ambulacraria (Arendt and Nübler-Jung, 1994). But the argument does not hold if chordates are the sister group of all other Bilateria: in that case, it is possible that some characters of chordates, such as the dorsoventral polarity, were ancestral to all bilaterally symmetrical animals.
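The reasoning about an ancestral deuterostomy can be made concrete with a simple parsimony mapping. The sketch below applies the bottom-up pass of Fitch parsimony, a standard method that is swapped in here for illustration and is not part of the analyses of this chapter, to a simplified version of the topology favoured by CAT. The binary coding (1 = deuterostomous gastrulation, 0 = protostomous), the uniform coding of Lophotrochozoa and Ecdysozoa as 0 despite the exceptions noted above, and the omission of the outgroup are all assumptions.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[FrozenSet[int], int]:
    """Bottom-up Fitch pass: return (candidate state set at this node, changes below it)."""
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no change needed at this node.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Simplified CAT topology: chordates as sister group of all other Bilateria.
cat_tree = ("Chordata", ("Xenambulacraria", ("Chaetognatha", ("Lophotrochozoa", "Ecdysozoa"))))

# Assumed coding: 1 = deuterostomous gastrulation, 0 = protostomous.
gastrulation = {
    "Chordata": 1,
    "Xenambulacraria": 1,
    "Chaetognatha": 1,
    "Lophotrochozoa": 0,
    "Ecdysozoa": 0,
}

root_states, changes = fitch(cat_tree, gastrulation)
print(sorted(root_states), changes)  # [1] 1
```

Under these assumptions, the only state retained at the root of Bilateria is deuterostomy, with a single change on the branch leading to Lophotrochozoa + Ecdysozoa.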


12.4 Conclusion

Several phyla, in particular brachiopods and onychophorans, are still missing in phylogenomic analyses, and some others are poorly represented (aschelminths, chaetognaths, and hemichordates, among others), but the most species-rich phyla are now well sampled. Accordingly, one can be increasingly confident concerning the few robust aspects of the phylogeny of bilaterians that emerge from this and previous phylogenomic analyses. Essentially, the overall structure of protostomes (a split between lophotrochozoans and ecdysozoans, with chaetognaths at their base) seems stable, as well as the monophyly of Chordata and Ambulacraria. On the other hand, the monophyly of deuterostomes appears to be the most important point yet to be settled in order to draw a complete picture of the scaffold of the bilaterian tree. Many aspects of the detailed relationships within each supergroup (in particular, Ecdysozoa and Lophotrochozoa) remain to be investigated.

Ongoing EST projects will soon bring many new species into this emerging picture, which will not only inform us about the phylogenetic position of those new species, but also result in an enriched taxonomic sampling, having a positive impact on the overall accuracy of phylogenetic inference. Yet, as suggested by the present chapter, this will not be sufficient, and will have to be combined with a significant improvement of the underlying probabilistic models. Much work is still needed, both concerning the acquisition of primary data and the methodological side, if one wants to converge towards a reliable, possibly final, picture of the bilaterian tree.


12.5 Acknowledgements

We thank Max Telford and Tim Littlewood for giving us the opportunity to write this chapter. The Réseau Québécois de Calcul de Haute Performance provided computational resources. This work was supported by the Canadian Institute for Advanced Research, the Canadian Research Chair Program, the Centre National de la Recherche Scientifique (through the ACI-IMPBIO Model-Phylo funding program), and the Robert Cedergren Centre for bioinformatics and genomics. This work was financially supported in part by the ‘60ème Commission Franco-Québécoise de Coopération Scientifique’.
