Implement Codon-GO Analysis Pipeline with CUG-clade support
Created by: cursor[bot]
This PR implements a comprehensive Codon-GO Analysis Pipeline for analyzing codon usage and GO term enrichment in eukaryotic genomes.
Features Added
Core Pipeline
- Modular architecture with parsers, analysis, visualization, and utilities modules
- Multi-file EMBL/GenBank parsing for genome annotations
- GO data integration using GOATOOLS
- Adaptive GO-term enrichment at descending thresholds
- Wobble-modification filtering for specific amino acids
- Publication-quality visualizations (boxplots, heatmaps, PCA)
- YAML configuration system with CLI overrides
- Click-based command-line interface
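
The configuration layering described above (defaults, then the YAML file, then CLI overrides) can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the code in this PR: `merge_config`, the `DEFAULTS` keys, and the sample values are all hypothetical, and the YAML file is assumed to have already been loaded into a plain dict.

```python
from copy import deepcopy

# Hypothetical defaults; the real keys are defined by the pipeline's YAML files.
DEFAULTS = {
    "species": None,
    "cug_clade": False,
    "verbose": False,
    "plots": {"format": "png", "dpi": 150},
}


def merge_config(base, overrides):
    """Return a copy of `base` with non-None values from `overrides` applied.

    Nested dicts are merged key by key instead of being replaced wholesale.
    """
    merged = deepcopy(base)
    for key, value in overrides.items():
        if value is None:
            continue  # option was not supplied, keep the lower-priority value
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for a parsed YAML file (hypothetical content).
file_config = {"species": "candida_albicans", "plots": {"dpi": 300}}
# Stand-in for parsed CLI options; None means "flag not given".
cli_overrides = {"species": None, "cug_clade": True, "verbose": None}

# Priority, lowest to highest: defaults, config file, command line.
config = merge_config(merge_config(DEFAULTS, file_config), cli_overrides)
```

Treating `None` as "not supplied" lets unset CLI options fall through to the file or default value without special-casing each option.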
CUG-clade Fungi Support
- Non-standard genetic code support for CUG-clade fungi (CTG → Serine instead of Leucine)
- Species configuration with cug_clade parameter
- Specialized visualizations for CTG usage analysis
- CLI integration with --cug-clade flag
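
To illustrate the non-standard genetic code, here is a minimal sketch of translation with the CTG reassignment. It builds the standard nuclear code from its compact 64-letter representation and overrides a single codon when a `cug_clade` switch is set. The function names `build_codon_table` and `translate` are assumptions for illustration and may not match the pipeline's own modules.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def build_codon_table(cug_clade=False):
    """Map each DNA codon to a one-letter amino acid ('*' marks a stop)."""
    table = {
        "".join(codon): amino_acid
        for codon, amino_acid in zip(product(BASES, repeat=3), STANDARD_AMINO_ACIDS)
    }
    if cug_clade:
        # CUG-clade fungi decode CTG as serine instead of leucine.
        table["CTG"] = "S"
    return table


def translate(cds, cug_clade=False):
    """Translate a coding sequence; unknown codons become 'X'."""
    table = build_codon_table(cug_clade)
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


standard_protein = translate("ATGCTGTTGTAA")                   # "MLL*"
cug_protein = translate("ATGCTGTTGTAA", cug_clade=True)        # "MSL*"
```

Only CTG changes meaning; the other leucine codons (for example TTG) are still read as leucine, which is why CTG usage is worth analyzing separately in these species.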
User Experience Improvements
- Suppression of pkg_resources deprecation warnings for cleaner output
- Verbose mode for debugging
- Comprehensive documentation and examples
- Testing framework with validation scripts
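
One way to silence the pkg_resources deprecation warnings is with the standard `warnings` module, as in the sketch below. The helper name `suppress_pkg_resources_warnings` is hypothetical, and the PR may implement the suppression differently.

```python
import warnings


def suppress_pkg_resources_warnings():
    """Ignore any warning whose message mentions pkg_resources.

    The message pattern is a regular expression matched against the start
    of the warning text, so the leading '.*' allows a match anywhere.
    """
    warnings.filterwarnings("ignore", message=r".*pkg_resources.*")


# Install the filter before importing modules that trigger the warning.
suppress_pkg_resources_warnings()
```

Filtering on the message text instead of a category keeps unrelated deprecation warnings visible, which is useful when running in verbose mode.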
Commits
- Initial pipeline implementation with core functionality
- CUG-clade support for non-standard genetic code
- Warning suppression and configuration improvements
Testing
- Test and validation scripts are included
- Example configurations for various species
- Comprehensive error handling
Ready for review and testing with real genomic data.