Fix amino acid validation to accept three-letter codes
Created by: cursor[bot]
Problem
Configuration validation rejected three-letter amino acid codes such as "Leu", "Lys", and "Gln" in the wobble_aas
configuration, causing the pipeline to fail with:
ERROR - Error loading configuration: Invalid amino acid code: Leu
Solution
Updated the validate_config() function in src/codon_go/utils/config_loader.py to accept both single-letter and three-letter amino acid codes.
Changes Made
- Added support for three-letter amino acid codes (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val)
- Maintained backward compatibility with existing single-letter codes (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V)
- Updated validation logic to check both formats
Before
valid_aas = set('ACDEFGHIKLMNPQRSTVWY')
for aa in wobble_aas:
    if aa not in valid_aas:
        raise ValueError(f"Invalid amino acid code: {aa}")
After
valid_single_letter = set('ACDEFGHIKLMNPQRSTVWY')
valid_three_letter = {
    'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile',
    'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val'
}
for aa in wobble_aas:
    if aa not in valid_single_letter and aa not in valid_three_letter:
        raise ValueError(f"Invalid amino acid code: {aa}")
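For reviewers who want to try the new check in isolation, here is a minimal standalone sketch of the same logic. The helper name check_wobble_aas and the module-level constants are hypothetical and used only for illustration; in the PR the check lives inside validate_config(). Note that, as written, the membership test is case-sensitive, so "leu" or "LEU" would still be rejected.

```python
# Standalone sketch of the updated check. The function name and the
# module-level constants are illustrative, not part of the actual PR.
VALID_SINGLE_LETTER = set("ACDEFGHIKLMNPQRSTVWY")
VALID_THREE_LETTER = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def check_wobble_aas(wobble_aas):
    """Raise ValueError on the first code that matches neither format."""
    for aa in wobble_aas:
        if aa not in VALID_SINGLE_LETTER and aa not in VALID_THREE_LETTER:
            raise ValueError(f"Invalid amino acid code: {aa}")


# Mixed single-letter and three-letter codes pass without error.
check_wobble_aas(["Leu", "Lys", "Q", "S"])

# The comparison is case-sensitive, so a lowercase code is rejected.
try:
    check_wobble_aas(["leu"])
except ValueError as err:
    print(err)
```

If case-insensitive input is ever wanted, normalizing each code before the membership test would be one option, but that is outside the scope of this change.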
Testing
Example Configuration
The pipeline now properly handles configurations like:
wobble_aas:
  - Leu
  - Lys
  - Gln
  - Glu
  - Phe
  - Trp
  - Ser  # Important for CUG-clade species
Impact
This fix resolves the configuration validation error and lets users write the more descriptive three-letter amino acid codes in their configuration files, while existing single-letter configurations continue to work unchanged.