MMCif behavior when auth_seq_id is missing #775
Comments
These tests document the current behavior. See biojava#775 for discussion about the correct behavior.
Thanks @sbliven, very nice tests! There's one detail I was forgetting: HETATMs have a '.' for their label_seq_id in deposited PDB files. I've added some HETATMs to your tests and they failed. I have made a fix that should handle that: josemduarte@699af8a. Please feel free to pull request all that together if you are happy with that.
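For illustration only, here is a minimal Java sketch of tolerant seq_id parsing, assuming the column value arrives as a raw string. The class and method names are hypothetical; this is not BioJava's API and not the fix in the commit linked above. It treats the mmCIF placeholders '.' and '?' as "no value" instead of letting Integer.parseInt throw a NumberFormatException.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of BioJava.
class SeqIdParsing {

    /**
     * Parses an mmCIF seq_id value.
     * Returns an empty OptionalInt for null, blank, '.', '?', or any
     * other non-integer text, so callers never see a NumberFormatException.
     */
    static OptionalInt parseSeqId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        String s = raw.trim();
        if (s.isEmpty() || s.equals(".") || s.equals("?")) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(s));
        } catch (NumberFormatException e) {
            // Assumption: unparsable text is treated the same as a missing value.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseSeqId("42"));
        System.out.println(parseSeqId("."));
        System.out.println(parseSeqId("?"));
    }
}
```

Returning an OptionalInt rather than a sentinel such as -1 keeps "missing" distinct from any real residue number, including negative ones.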
Parsing such a file again throws a NumberFormatException. Further work/discussion of this issue is on biojava#775, but it was blocking the merging of biojava#774.
I think this can't be properly fixed in 4.* because we need access to the seq_id. This is stored in I'm going to rebase these tests onto the master branch and do any fixes there. 4.* will continue to throw errors if auth_seq_id is missing or not numeric.
A related comment is that |
A discussion came up in PR #774 regarding the correct behavior when parsing an mmCif file without auth_seq_id.

BioJava 4.2.11 requires the auth_seq_id column. This is a problem because it is optional according to the spec and omitted by PyMOL.

In b207d34 I added code to use the label_seq_id column for creating the ResidueNumber for each group if auth_seq_id is missing. There was some concern that this could lead to inconsistent residue numbers if some residues used '?' (defaulting to label_) while the rest used the auth_ values specified. This worry is actually not justified due to another bug, which causes a NumberFormatException if '?' is used in that column.

@josemduarte suggested only doing the label_ fallback if ALL groups have null ResidueNumbers. This is probably the right solution, but it seems like such an edge case it might not be worth the hour it will take to fix it.
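To make the all-or-nothing suggestion concrete, here is a rough Java sketch. It assumes each group has already been reduced to a pair of nullable integers; the Row class and residueNumbers method are hypothetical stand-ins, not BioJava types. The label_ values are used only when no row carries an auth_ value, so the two numbering schemes are never mixed within one structure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not BioJava code.
class LabelFallback {

    /** Stand-in for one group; null means the value was missing ('.' or '?'). */
    static final class Row {
        final Integer authSeqId;
        final Integer labelSeqId;

        Row(Integer authSeqId, Integer labelSeqId) {
            this.authSeqId = authSeqId;
            this.labelSeqId = labelSeqId;
        }
    }

    /**
     * Returns one residue number per row.
     * If at least one row has an auth_seq_id, auth values are used throughout
     * and rows without one stay null. Only when every row lacks auth_seq_id
     * does the method fall back to label_seq_id.
     */
    static List<Integer> residueNumbers(List<Row> rows) {
        boolean anyAuth = false;
        for (Row r : rows) {
            if (r.authSeqId != null) {
                anyAuth = true;
                break;
            }
        }
        List<Integer> out = new ArrayList<>(rows.size());
        for (Row r : rows) {
            out.add(anyAuth ? r.authSeqId : r.labelSeqId);
        }
        return out;
    }
}
```

In the mixed case the sketch leaves the gaps as null rather than patching them with label_ values, which is what avoids the inconsistent numbering described above. What to do with those remaining nulls is left open here.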