Implement Codon-GO Analysis Pipeline with CUG-clade support

Created by: cursor[bot]

This PR implements a Codon-GO Analysis Pipeline that analyzes codon usage and GO term enrichment in eukaryotic genomes.

Features Added

Core Pipeline

  • Modular architecture with parsers, analysis, visualization, and utilities modules
  • Multi-file EMBL/GenBank parsing for genome annotations
  • GO data integration using GOATOOLS
  • Adaptive GO-term enrichment at descending thresholds
  • Wobble-modification filtering for specific amino acids
  • Publication-quality visualizations (boxplots, heatmaps, PCA)
  • YAML configuration system with CLI overrides
  • Click-based command-line interface
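
As an illustration of the "YAML configuration system with CLI overrides" item, the sketch below merges command-line values over a configuration dictionary. It is not taken from the PR: the function name apply_overrides and every key except cug_clade are hypothetical, and the configuration is assumed to have already been loaded into a plain dict (for example with PyYAML's yaml.safe_load).

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with non-None values from `overrides` applied.

    Hypothetical helper: nested dicts are merged key by key, and a value of
    None is treated as "option not given on the command line".
    """
    merged = copy.deepcopy(config)
    for key, value in overrides.items():
        if value is None:
            continue  # keep the value from the configuration file
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Assumed shape of a loaded configuration file (keys are illustrative).
file_config = {
    "species": {"name": "example_species", "cug_clade": False},
    "verbose": False,
}

# Assumed shape of parsed CLI options; None means the flag was not passed.
cli_options = {"species": {"cug_clade": True}, "verbose": None}

effective = apply_overrides(file_config, cli_options)
```

Treating None as "not given" lets options the user left unset fall back to the file values, while the original dictionary stays unmodified.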

CUG-clade Fungi Support

  • Non-standard genetic code support for CUG-clade fungi (CTG → Serine instead of Leucine)
  • Species configuration with cug_clade parameter
  • Specialized visualizations for CTG usage analysis
  • CLI integration with --cug-clade flag
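
The CTG reassignment can be sketched in a few lines. This is a minimal illustration, not the PR's implementation: build_codon_table and translate are hypothetical names, and the cug_clade argument is assumed to mirror the cug_clade parameter listed above. The table starts from the standard genetic code (NCBI translation table 1) and changes only CTG, which matches NCBI translation table 12 for that codon.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI translation table 1),
# listed in the order of codons generated from BASES (TTT, TTC, TTA, ...).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(cug_clade=False):
    """Map each DNA codon to a one-letter amino acid ('*' marks a stop)."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AAS))
    if cug_clade:
        # CUG-clade fungi read CTG as Serine instead of Leucine.
        table["CTG"] = "S"
    return table


def translate(seq, cug_clade=False):
    """Translate a coding DNA sequence; unknown codons become 'X'."""
    table = build_codon_table(cug_clade)
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


standard_protein = translate("ATGCTGTAA")
cug_protein = translate("ATGCTGTAA", cug_clade=True)
```

Only the CTG entry differs between the two tables, so codon-usage statistics for every other codon are unaffected by the flag.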

User Experience Improvements

  • Warning suppression for clean output (pkg_resources deprecation warnings)
  • Verbose mode for debugging
  • Comprehensive documentation and examples
  • Testing framework with validation scripts
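
One way to suppress the pkg_resources deprecation warnings uses the standard library's warnings module, as sketched below. The function name is hypothetical, and filtering on the message text is an assumption about how the PR does it; the PR may filter differently.

```python
import warnings


def suppress_pkg_resources_warnings():
    """Ignore warnings whose message mentions pkg_resources.

    Assumption: depending on the setuptools version, the deprecation notice
    is raised as DeprecationWarning or UserWarning, so both are filtered.
    """
    for category in (DeprecationWarning, UserWarning):
        warnings.filterwarnings(
            "ignore",
            message=r".*pkg_resources.*",
            category=category,
        )


# Demonstration: only the unrelated warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    suppress_pkg_resources_warnings()
    warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
    warnings.warn("unrelated deprecation", DeprecationWarning)

recorded_messages = [str(w.message) for w in caught]
```

Matching on the message keeps other deprecation warnings visible, which matters when debugging.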

Commits

  • Initial pipeline implementation with core functionality
  • CUG-clade support for non-standard genetic code
  • Warning suppression and configuration improvements

Testing

  • Test and validation scripts
  • Example configurations for various species
  • Comprehensive error handling

Ready for review and testing with real genomic data.
