Source Code for Neurite Sequence Representation and Analysis
This code was used in the following two-part study:
Gillette TA and Ascoli GA (2015) Topological characterization of neuronal arbor morphology via sequence representation. I. Motif analysis. BMC Bioinformatics (submitted).
Gillette TA, Hosseini P, and Ascoli GA (2015) Topological characterization of neuronal arbor morphology via sequence representation. II. Global Alignment. BMC Bioinformatics (submitted).
The code is divided into three parts, organized by programming language and type of analysis.
- Part 1: Java (zip 1.5MB) - SWC i/o, tree representation and conversion, sequence encoding, and k-mer frequency extraction
  - Morphology (morpho) packages - handle SWC files and binary trees
  - Neurite Sequence (ns) packages - handle conversion to sequences and k-mer analysis
- Part 2: Python (zip 50KB, git) - Sequence alignment code (PASTA - Pattern Analysis via Sequence-based Tree Alignment)
  - Spaghetti - Pairwise global alignment
  - Penne - Multiple alignment
  - Orzo - Extraction of frequently occurring "domains" (not used in study)
- Part 3: R (zip 0.5MB)
  - Utility code
  - Motif analysis (including paper 1 figure production)
  - Analysis of alignment-space
  - Normalization of alignment scores
  - Cluster analysis
  - Extraction of sequences for multiple alignment
  - Paper 2 figure production
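To illustrate the k-mer frequency extraction listed under Part 1, here is a minimal sketch in Python. It is not the implementation from the Java ns packages: the function name `kmer_frequencies` and the example sequence are hypothetical, and the sketch simply assumes that a neurite sequence is a string of characters and that k-mers are counted over all overlapping windows.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative frequency of each overlapping k-mer in `sequence`.

    Hypothetical helper for illustration only; the Java ns packages may
    count or normalize k-mers differently.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    total = len(sequence) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        return {}  # sequence is shorter than k, so there are no k-mers
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Illustrative input only, not a sequence from the study.
example = kmer_frequencies("ACTACT", 3)
```

For the illustrative string above there are four overlapping 3-mers (ACT, CTA, TAC, ACT), so ACT has frequency 0.5 and the other two have frequency 0.25 each.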
Also available are several instruction manuals explaining how the code is organized, how to set it up, and what to run to produce various analyses or outputs. For instance, producing a motif-colored dendrogram image from an SWC file requires components of part 1 and part 3. Parts 3 and 4 both contain R utility files used by files in each.
You can download SWC files and retrieve their metadata from NeuroMorpho.Org, then convert them into sequence files (.fasta). Alternatively, you can download the sequence files used in the papers directly: neurite sequences (1.6MB), alignment baseline sequences (12.3MB compressed, 81.8MB uncompressed), and constrained surrogate sequences (252MB compressed, 1.4GB uncompressed).
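If you want to inspect the downloaded sequence files outside the provided code, a small reader along these lines may help. This is a sketch that assumes the files follow the common FASTA layout (a header line starting with ">" followed by one or more sequence lines); the function name `read_fasta` is hypothetical, and the exact header contents of the study's files are not assumed here.

```python
def read_fasta(path):
    """Read a FASTA file into a dict mapping header text to sequence string.

    Assumes the conventional layout: each record starts with a '>' header
    line, and the sequence may span several following lines.
    """
    records = {}
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    records[header] = "".join(chunks)  # close previous record
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records
```

Calling `read_fasta` on one of the uncompressed downloads returns a dictionary that can be looped over to check sequence counts or lengths. Note that the larger files (up to 1.4GB uncompressed) would be held entirely in memory by this sketch.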
For further information please contact Todd Gillette - todd <dot> gillette <at> gmail <dot> com.