ETE Toolkit Engine

A comprehensive skill for phylogenetic tree manipulation with the ETE toolkit. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill | Cliptics | scientific | v1.0.0 | MIT

A scientific computing skill for phylogenetic tree analysis using ETE (Environment for Tree Exploration) — the Python toolkit for manipulating, analyzing, and visualizing phylogenetic and hierarchical trees with publication-quality rendering.

When to Use This Skill

Choose ETE Toolkit Engine when:

  • Parsing, manipulating, and querying phylogenetic trees (Newick, NHX, PhyloXML)
  • Visualizing trees with custom node styles, colors, and annotations
  • Performing phylogenetic comparisons (Robinson-Foulds distance, topology tests)
  • Building automated phylogenetic analysis pipelines

Consider alternatives when:

  • You need tree inference from sequences (use IQ-TREE, RAxML, or MrBayes)
  • You need sequence alignment (use MAFFT, MUSCLE, or ClustalW)
  • You need interactive web-based visualization (use iTOL or Nextstrain)
  • You need simple tree plotting without analysis (use Bio.Phylo from BioPython)

Quick Start

```bash
claude "Load a Newick tree, annotate nodes, and render a publication figure"
```

```python
from ete3 import Tree, TreeStyle, NodeStyle, TextFace, CircleFace

# Load a phylogenetic tree
tree = Tree("((A:0.1,B:0.2)90:0.3,(C:0.15,D:0.25)85:0.4)100;", format=0)
print(tree.get_ascii(show_internal=True))
print(f"Leaves: {len(tree)}")
print(f"Internal nodes: {len(list(tree.traverse())) - len(tree)}")

# Traverse and annotate
for node in tree.traverse():
    if node.is_leaf():
        # Add species label
        face = TextFace(node.name, fsize=12, fgcolor="black")
        node.add_face(face, column=0, position="branch-right")
    else:
        # Color internal nodes by support
        if node.support >= 90:
            style = NodeStyle()
            style["fgcolor"] = "green"
            style["size"] = 8
            node.set_style(style)

# Render
ts = TreeStyle()
ts.show_leaf_name = False  # We added custom faces
ts.show_branch_support = True
ts.branch_vertical_margin = 15
tree.render("phylogeny.pdf", tree_style=ts, w=800)
```

Core Concepts

Tree Operations

| Operation    | Method                               | Description             |
|--------------|--------------------------------------|-------------------------|
| Load tree    | `Tree(newick_string)`                | Parse Newick/NHX format |
| Traverse     | `tree.traverse("postorder")`         | Visit all nodes         |
| Get leaves   | `tree.get_leaves()`                  | Terminal nodes only     |
| Find node    | `tree.search_nodes(name="A")`        | Search by attributes    |
| Get ancestor | `tree.get_common_ancestor("A", "B")` | MRCA of taxa            |
| Prune        | `tree.prune(["A", "B", "C"])`        | Keep only listed leaves |
| Root         | `tree.set_outgroup("A")`             | Reroot the tree         |
| Distance     | `tree.get_distance("A", "B")`        | Branch length distance  |

Tree Comparison

```python
from ete3 import Tree

t1 = Tree("((A,B),(C,D));")
t2 = Tree("((A,C),(B,D));")

# Robinson-Foulds distance
rf, max_rf, _, _, _, _, _ = t1.robinson_foulds(t2, unrooted_trees=True)
print(f"RF distance: {rf}")
print(f"Normalized RF: {rf/max_rf:.2f}")

# Topology comparison
result = t1.compare(t2, unrooted=True)
print(f"Source edges: {result['source_edges_in_ref']}")
print(f"Ref edges: {result['ref_edges_in_source']}")
```
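To make the RF number concrete, the following is a minimal plain-Python sketch of the idea behind the unrooted Robinson-Foulds distance: count the non-trivial bipartitions (splits of the leaf set) found in one tree but not the other. It does not use ETE and is not ETE's implementation. Trees are written as nested tuples, and the helper names `leaf_set` and `bipartitions` are illustrative only.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial leaf splits induced by internal branches."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        inside = leaf_set(node)
        outside = all_leaves - inside
        # Skip trivial splits: fewer than two leaves on either side
        if len(inside) > 1 and len(outside) > 1:
            # Store the split as an unordered pair so that AB|CD == CD|AB
            splits.add(frozenset([inside, outside]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


# Same topologies as the ETE example above, as nested tuples
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))

s1, s2 = bipartitions(t1), bipartitions(t2)
rf = len(s1 ^ s2)           # splits present in exactly one tree
max_rf = len(s1) + len(s2)  # value reached when no split is shared
print(f"RF distance: {rf}, maximum: {max_rf}, normalized: {rf / max_rf:.2f}")
```

Here each tree has a single non-trivial split (AB|CD versus AC|BD) and they disagree, so the two topologies are maximally different for four taxa.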

Phylogenetic Workflows

```python
from ete3 import PhyloTree, EvolTree

# Load alignment and tree together
ptree = PhyloTree("tree.nw")
ptree.link_to_alignment("alignment.fasta")

# Derive the species code from each leaf name (text before the first "_")
ptree.set_species_naming_function(lambda x: x.split("_")[0])

# Split the gene tree at duplication nodes into speciation-only subtrees.
# Returns the tree count, the duplication count, and an iterator of trees.
ntrees, ndups, sptrees = ptree.get_speciation_trees()

# Evolutionary analysis with CodeML (dN/dS); needs the PAML codeml binary
etree = EvolTree("tree.nw")
etree.link_to_alignment("codon_alignment.fasta")
etree.run_model("M0")  # One-ratio model
etree.run_model("M1")  # Nearly neutral
etree.run_model("M2")  # Positive selection

# Compare models: alternative (M2) first, null (M1) second
pvalue = etree.get_most_likely("M2", "M1")
print(f"Selection test p-value: {pvalue}")
```
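The model comparison above is a likelihood ratio test. As a rough illustration of the arithmetic, here is a plain-Python sketch that assumes the two models differ by two free parameters, as in the standard M1a versus M2a site-model comparison. The function name `lrt_pvalue_df2` and the log-likelihood values are hypothetical, chosen only for the example; real values come from the CodeML output.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test p-value for 2 degrees of freedom.

    The statistic 2 * (lnL_alt - lnL_null) is compared against a
    chi-square distribution. For df = 2 the survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # A nested alternative cannot fit worse; clamp small negative noise
    statistic = max(statistic, 0.0)
    return math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods, for illustration only
lnl_m1 = -2345.67  # null model (nearly neutral)
lnl_m2 = -2340.12  # alternative model (positive selection)

p = lrt_pvalue_df2(lnl_m1, lnl_m2)
print(f"LRT statistic: {2 * (lnl_m2 - lnl_m1):.2f}, p-value: {p:.4g}")
```

A small p-value means the extra parameters of the alternative model improve the fit more than chance would explain. For comparisons with a different number of extra parameters, the closed form no longer applies and a general chi-square survival function is needed.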

Configuration

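| Parameter          | Description                  | Default |
|--------------------|------------------------------|---------|
| tree_format        | Newick format variant (0-9)  | 0       |
| quoted_node_names  | Handle quoted names          | False   |
| render_engine      | Qt or SVG rendering          | Qt      |
| output_format      | PDF, PNG, SVG                | PDF     |
| branch_length_mode | Show branch lengths or not   | True    |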

Best Practices

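  1. Specify the correct Newick format. ETE numbers its Newick variants 0-9 (plus 100 for topology only), each with different conventions for internal node names, support values, and branch lengths. Using the wrong format misparses node labels: an internal label is read as a support value under format 0 but as a node name under format 1. Format 0 is the default.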

  2. Use traverse() with strategy parameter. "postorder" (leaves first) is best for bottom-up calculations like computing clade sizes. "preorder" (root first) is better for top-down annotation. "levelorder" visits nodes by depth.

  3. Prune trees before comparison. When comparing trees with different leaf sets, prune both to their shared taxa first. Robinson-Foulds distance is undefined for trees with different leaf sets.

  4. Use TreeStyle for publication figures. Customize branch colors, node sizes, and label positions through TreeStyle and NodeStyle rather than post-processing in image editors. ETE produces vector output (PDF/SVG) that scales perfectly for publications.

  5. Cache large trees in pickle format. Parsing very large Newick strings is slow. After first parsing, save the ETE tree object with Python's pickle module for faster subsequent loading.

Common Issues

Tree rendering fails with Qt errors. ETE3 renders through Qt, whatever the output format. On headless servers, install xvfb and run with xvfb-run python script.py. With PyQt5, setting the environment variable QT_QPA_PLATFORM=offscreen before importing ete3 is a commonly reported alternative that avoids the need for a display. ETE4 improves headless rendering.

Node names with special characters cause parsing errors. Newick format uses parentheses, commas, colons, and semicolons as delimiters. Node names containing these characters must be quoted. Use quoted_node_names=True when loading such trees.
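When building Newick strings by hand, a small check can flag the names that need quoting. The sketch below is plain Python with hypothetical helper names (`needs_quoting`, `quote_if_needed`). It assumes names contain no double quote of their own and covers only the delimiter characters listed above.

```python
# Characters that Newick uses as structural delimiters
NEWICK_DELIMITERS = set("(),:;")


def needs_quoting(name):
    """Return True if the name contains a Newick delimiter."""
    return any(ch in NEWICK_DELIMITERS for ch in name)


def quote_if_needed(name):
    """Wrap the name in double quotes when it contains a delimiter.

    Assumes the name itself contains no double quote character.
    """
    return f'"{name}"' if needs_quoting(name) else name


# Hypothetical leaf names, two of which contain delimiters
names = ["Homo_sapiens", "Ecoli_(K-12)", "clade:1"]
quoted = [quote_if_needed(n) for n in names]

# Assemble a flat (star) tree; load it with quoted_node_names=True
newick = "(" + ",".join(quoted) + ");"
print(newick)
```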

Robinson-Foulds distance seems wrong. Check that both trees are rooted/unrooted consistently. Set unrooted_trees=True for unrooted comparison. Also verify that leaf names match exactly between trees — different naming conventions (spaces, underscores) cause mismatches.
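A quick way to spot naming mismatches is to normalize both sets of leaf names and list what does not line up. This is a plain-Python sketch with hypothetical helper names; with ETE, the two input lists would come from each tree's `get_leaf_names()`. The normalization rule (spaces and underscores treated as equivalent) is an assumption to adapt to your data.

```python
def normalize(name):
    """Treat spaces and underscores as equivalent; trim extra whitespace."""
    return "_".join(name.replace("_", " ").split())


def leaf_name_report(names1, names2):
    """Compare two collections of leaf names after normalization."""
    set1 = {normalize(n) for n in names1}
    set2 = {normalize(n) for n in names2}
    return {
        "shared": sorted(set1 & set2),
        "only_in_first": sorted(set1 - set2),
        "only_in_second": sorted(set2 - set1),
    }


# Hypothetical leaf names using inconsistent conventions
names1 = ["Homo sapiens", "Pan_troglodytes", "Mus musculus "]
names2 = ["Homo_sapiens", "Pan troglodytes", "Rattus_norvegicus"]

report = leaf_name_report(names1, names2)
for key, values in report.items():
    print(f"{key}: {values}")
```

Names listed under `only_in_first` or `only_in_second` are either genuinely missing taxa or spelling variants that the normalization rule does not yet cover.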
