Fix amino acid validation to accept three-letter codes

Created by: cursor[bot]

Problem

The configuration validation rejected three-letter amino acid codes such as "Leu", "Lys", and "Gln" in the wobble_aas setting, causing the pipeline to fail with:

ERROR - Error loading configuration: Invalid amino acid code: Leu

Solution

Updated the validate_config() function in src/codon_go/utils/config_loader.py to accept both single-letter and three-letter amino acid codes.

Changes Made

  • Added support for three-letter amino acid codes (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val)
  • Maintained backward compatibility with existing single-letter codes (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V)
  • Updated the validation loop to accept a code that matches either format

Before

valid_aas = set('ACDEFGHIKLMNPQRSTVWY')
for aa in wobble_aas:
    if aa not in valid_aas:
        raise ValueError(f"Invalid amino acid code: {aa}")
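The old check failed on "Leu" because set('ACDEFGHIKLMNPQRSTVWY') builds a set of 20 one-character strings. Membership tests compare whole strings, so a three-character string can never match. Below is a minimal reproduction, assuming wobble_aas arrives as a plain list of strings. old_validate is a hypothetical name for the loop above, not a function in config_loader.py.

```python
# Original single-letter-only check, wrapped in a hypothetical function
# so the failure can be reproduced in isolation.
valid_aas = set('ACDEFGHIKLMNPQRSTVWY')  # 20 one-character strings


def old_validate(wobble_aas):
    for aa in wobble_aas:
        if aa not in valid_aas:
            raise ValueError(f"Invalid amino acid code: {aa}")


old_validate(['L', 'K'])  # passes: single letters are members of the set

try:
    old_validate(['Leu', 'Lys'])
except ValueError as exc:
    print(exc)  # Invalid amino acid code: Leu
```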

After

valid_single_letter = set('ACDEFGHIKLMNPQRSTVWY')
valid_three_letter = {
    'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile',
    'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val'
}
for aa in wobble_aas:
    if aa not in valid_single_letter and aa not in valid_three_letter:
        raise ValueError(f"Invalid amino acid code: {aa}")
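If downstream code needs a single canonical form, one option is to map three-letter codes to single letters during validation. The sketch below is hypothetical: THREE_TO_ONE and normalize_wobble_aas are illustrative names that this merge request does not add. It assumes wobble_aas is a list of strings.

```python
# Hypothetical helper, not part of config_loader.py: validates each entry
# and returns the equivalent single-letter code.
THREE_TO_ONE = {
    'Ala': 'A', 'Arg': 'R', 'Asn': 'N', 'Asp': 'D', 'Cys': 'C',
    'Gln': 'Q', 'Glu': 'E', 'Gly': 'G', 'His': 'H', 'Ile': 'I',
    'Leu': 'L', 'Lys': 'K', 'Met': 'M', 'Phe': 'F', 'Pro': 'P',
    'Ser': 'S', 'Thr': 'T', 'Trp': 'W', 'Tyr': 'Y', 'Val': 'V',
}
# Derive the single-letter set from the mapping so the two lists cannot diverge.
VALID_SINGLE_LETTER = set(THREE_TO_ONE.values())


def normalize_wobble_aas(wobble_aas):
    """Return wobble_aas as single-letter codes, raising on unknown entries."""
    normalized = []
    for aa in wobble_aas:
        if aa in VALID_SINGLE_LETTER:
            normalized.append(aa)
        elif aa in THREE_TO_ONE:
            normalized.append(THREE_TO_ONE[aa])
        else:
            raise ValueError(f"Invalid amino acid code: {aa}")
    return normalized


print(normalize_wobble_aas(['Leu', 'K', 'Gln']))  # ['L', 'K', 'Q']
```

With one dict as the source of truth, the single-letter and three-letter lists stay in sync. Whether the pipeline should store normalized codes is a design choice this merge request does not make.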

Testing

  • Configuration validation now passes with three-letter codes
  • Pipeline starts successfully and loads configuration
  • All CLI commands work correctly
  • Backward compatibility maintained for single-letter codes
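The validation cases listed above could be covered by unit tests like the following sketch, which uses only the standard-library unittest module. validate_wobble_aas is a hypothetical stand-in that repeats the loop from the After snippet, because the full signature of validate_config() is not shown in this merge request. The project's actual tests may be organized differently.

```python
import unittest

VALID_SINGLE_LETTER = set('ACDEFGHIKLMNPQRSTVWY')
VALID_THREE_LETTER = {
    'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile',
    'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val',
}


def validate_wobble_aas(wobble_aas):
    # Hypothetical stand-in for the loop inside validate_config().
    for aa in wobble_aas:
        if aa not in VALID_SINGLE_LETTER and aa not in VALID_THREE_LETTER:
            raise ValueError(f"Invalid amino acid code: {aa}")


class TestWobbleAaValidation(unittest.TestCase):
    def test_three_letter_codes_accepted(self):
        validate_wobble_aas(['Leu', 'Lys', 'Gln', 'Glu', 'Phe', 'Trp', 'Ser'])

    def test_single_letter_codes_still_accepted(self):
        validate_wobble_aas(['L', 'K', 'Q'])

    def test_mixed_formats_accepted(self):
        validate_wobble_aas(['L', 'Lys'])

    def test_unknown_code_rejected(self):
        # 'Xaa' (unknown residue) is not in either set, so it must be rejected.
        with self.assertRaisesRegex(ValueError, 'Invalid amino acid code: Xaa'):
            validate_wobble_aas(['Xaa'])
```

If you save this as test_wobble_aas.py (a hypothetical filename), you can run it with python -m unittest test_wobble_aas.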

Example Configuration

The pipeline now accepts configurations such as:

wobble_aas:
  - Leu
  - Lys  
  - Gln
  - Glu
  - Phe
  - Trp
  - Ser  # Important for CUG-clade species

Impact

This fix resolves the configuration validation error and allows users to use the more descriptive three-letter amino acid codes in their configuration files while maintaining full backward compatibility.
