StackOverflowError in Structure serialization #697

Open
lukeczapla opened this issue Aug 6, 2017 · 14 comments
Labels: bug

@lukeczapla commented Aug 6, 2017

Hi guys, it might be an issue with the code on my end, but although my BasePairParameters class serializes my data correctly, it still doesn't correctly serialize the Structure object. If I serialize and save, then reload into an empty BasePairParameters object, it correctly pulls out all the old data and prints 17 step parameters for a PDB structure (1P71, in my TestBasePairParameters class). However, if I call my analyze() method, which redoes all the work on the Structure object, I get "no data".

So it seems that something is lost in the translation, and it would help if someone could check this independently on another Structure. I'm pretty sure I synced to the latest changes, because my PR now passes all the tests. I will try to see what's going on, but I don't want to change the code too much because I made so many touch-ups to it.

@lukeczapla (Author) commented Aug 6, 2017

Ok, actually I was wrong: it still throws an error (I forgot to recompile! haha). So serialization of the Structure class doesn't work. The error is very similar to what I got when I tried to mark all the related Structure classes as Serializable myself. I am certain this is real, because I re-cloned the repository from your main branch, put my code folders back into it, and compiled it again.

java.lang.StackOverflowError
	at java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(ObjectStreamClass.java:2002)
	at java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1277)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
	at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
	at java.util.ArrayList.writeObject(ArrayList.java:755)
	at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)

... (it keeps going for about 50 times as many lines)

@lukeczapla (Author) commented Aug 6, 2017

Here's the code to reproduce it (NOT using my code at all, just serializing a Structure):

https://gist.github.com/lukeczapla/7b87e70947e8ed5732f04bd07d113629

@lafita (Member) commented Aug 7, 2017

I already implemented a test for Structure serialization and it is working. You might want to take a look at it (https://github.com/biojava/biojava/blob/master/biojava-structure/src/test/java/org/biojava/nbio/structure/TestStructureSerialization.java).

Note that your test class is not correct: the order of the two tests is not enforced, so the deserialization may be tested before the serialization. You have to put both parts in the same test. However, I don't think this is the cause of your problem.

You are getting a StackOverflowError, which I am not sure is related to serialization; I suspect it has to do with the file operations you do. As I remember, serialization exceptions are easy to spot. You might want to try the same test with a class you know is really serializable. Otherwise, take a look at my test and add anything you think is missing.
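A minimal sketch of the suggestion above, assuming plain Java serialization and a temporary file: the write and the read happen in one method, so the read can never run before the write, and a class known to be serializable (here an `ArrayList<String>`) acts as the control. The class and method names are hypothetical; in JUnit, the body of `main` would go into a single `@Test` method.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class RoundTripCheck {

    // Write and read back in ONE method, so the read can never run before the write
    static <T extends Serializable> T roundTripViaFile(T obj) throws IOException, ClassNotFoundException {
        Path tmp = Files.createTempFile("roundtrip", ".ser");
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(tmp))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } finally {
            Files.deleteIfExists(tmp); // clean up even if serialization fails
        }
    }

    public static void main(String[] args) throws Exception {
        // Control: a JDK class that is known to be serializable
        ArrayList<String> original = new ArrayList<>(Arrays.asList("1P71", "A", "B"));
        ArrayList<String> copy = roundTripViaFile(original);
        System.out.println(copy.equals(original) && copy != original); // true
    }
}
```

If the control passes but the same helper fails on another object, the problem lies in that object's graph rather than in the file handling.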

@lafita added the question label Aug 7, 2017

@lafita (Member) commented Aug 7, 2017

Related to #673, #676 and #693

@lukeczapla (Author) commented Aug 7, 2017

Oh yes, my original test used the same methodology on a class I knew was serializable (BasePairParameters, with the Structure object marked transient), and it worked. I understand the order is not enforced, but this exact same test works on several known serializable objects.
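For reference, a sketch of what marking a field `transient` does, using hypothetical classes (not BioJava's): default serialization skips the field entirely, so a non-serializable object can sit behind it without causing an error, and after deserialization the field comes back as `null`. Anything that later needs that field has to restore it separately. The stored numbers are arbitrary placeholders.

```java
import java.io.*;

public class TransientFieldDemo {

    // Hypothetical heavy object that is NOT Serializable (stands in for a Structure)
    static class HeavyStructure {
        final String pdbId;
        HeavyStructure(String pdbId) { this.pdbId = pdbId; }
    }

    // Hypothetical results holder: derived values are saved, the structure is not
    static class StepResults implements Serializable {
        private static final long serialVersionUID = 1L;
        transient HeavyStructure structure; // skipped by default serialization
        double[] values;                    // written to the stream
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StepResults r = new StepResults();
        r.structure = new HeavyStructure("1P71");
        r.values = new double[] {1.0, 2.0}; // arbitrary placeholder values
        StepResults copy = (StepResults) roundTrip(r);
        System.out.println(copy.structure == null); // true: transient fields are restored as null
        System.out.println(copy.values.length);     // 2
    }
}
```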

I'll try your tests later in the day. But the StackOverflowError is specific to this class and does not occur with other objects I've serialized using the same methodology, such as a trained Deeplearning4j model, a DNA simulation I wrote using Nd4j, BasePairParameters, etc.

[I marked the second test with @After to enforce the order in the gist]

@lafita (Member) commented Aug 7, 2017

Ok @lukeczapla, you are right! It seems that the StackOverflowError occurs when the Structure is parsed from an MMCIF or MMTF file format, but not from a PDB file format (which is the format of my test). I will work out a solution and let you know.

@lafita added the bug label and removed the question label Aug 7, 2017

@lukeczapla (Author) commented Aug 7, 2017

Thanks so much, I appreciate you looking into it. I had used StructureIO.getStructure(), which seems to fetch from the RCSB and choose mmCIF by default. I personally prefer PDB, but things seem to be moving to mmCIF because of the size limitations of the PDB format [I've built systems so big that I had to switch to coordinates with only 2 digits after the decimal point (%8.2f) to trick the PDB format].
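To illustrate the size limit mentioned above: in the PDB format, each ATOM/HETATM x, y and z coordinate occupies a fixed 8-column field written as `%8.3f`, so only values from -999.999 to 9999.999 fit. Dropping to `%8.2f` frees one column for an extra integer digit at the cost of precision. A small sketch (class and method names are hypothetical):

```java
import java.util.Locale;

public class PdbCoordinateWidth {

    // Each PDB coordinate (x, y or z) occupies 8 columns, Real(8.3) in the format spec
    static final int FIELD_WIDTH = 8;

    // Locale.ROOT guarantees '.' as the decimal separator regardless of system locale
    static String format(double v, int decimals) {
        return String.format(Locale.ROOT, "%8." + decimals + "f", v);
    }

    // %8.xf only sets a MINIMUM width, so longer numbers silently overflow the column
    static boolean fits(double v, int decimals) {
        return format(v, decimals).length() <= FIELD_WIDTH;
    }

    public static void main(String[] args) {
        double[] samples = {9999.999, 12345.678, -999.999, -1234.567};
        for (double v : samples) {
            System.out.println("[" + format(v, 3) + "] fits=" + fits(v, 3)
                    + "   [" + format(v, 2) + "] fits=" + fits(v, 2));
        }
    }
}
```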

@lafita (Member) commented Aug 7, 2017

Yes, you should use mmCIF where possible; the PDB format is now legacy and should be avoided.

The default in BioJava is MMTF, and apparently that is the format that has problems when serialized. PDB and mmCIF work fine. I will continue looking into it, thanks for reporting!

@lafita changed the title from "Structure serialization issue?" to "StackOverflowError in Structure serialization when parsed from MMTF fromat" Aug 7, 2017

@lafita (Member) commented Aug 7, 2017

For now, you can avoid the problem by setting the parsing file format to mmCIF (instead of MMTF); you should then be able to serialize Structure objects without problems. Instead of:

```java
Structure s = StructureIO.getStructure("pdbid");
```

Try the following:

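```java
import org.biojava.nbio.structure.Structure;
import org.biojava.nbio.structure.align.util.AtomCache;

// Parse from mmCIF instead of the default MMTF format
AtomCache cache = new AtomCache();
cache.setUseMmCif(true);
Structure s = cache.getStructure("pdbid");
```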

@lafita (Member) commented Aug 7, 2017

I have been looking into what might be causing the StackOverflowError when calling the writeObject() method, and the most probable cause seems to be objects referring to one another recursively.

This would mean that some references are corrupted (or misplaced) during MMTF parsing, since the error does not occur for Structures that come from the PDB or mmCIF file formats. One way to debug this would be to plot the dependency graph of the objects in the Structure. Maybe we need a new test for that, since this serialization issue has revealed another potential problem in BioJava Structures that is difficult to detect.
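A note on the mechanism, sketched with a hypothetical `Node` class: default Java serialization remembers objects it has already written and emits a back-reference handle for them, so a simple cycle does not by itself overflow the stack. What overflows is the depth of the recursive traversal. Each reference followed adds several nested frames (`writeObject0`, `writeOrdinaryObject`, `writeSerialData`, `defaultWriteFields`, as in the trace above), so a long chain of objects linked through their fields exhausts the stack. The chain length of 100,000 is an assumption; the exact threshold depends on the JVM's thread stack size (`-Xss`).

```java
import java.io.*;

public class DeepGraphDemo {

    // Hypothetical node standing in for objects (e.g. atoms/bonds) that reference each other
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node next;
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Build a linear chain head -> n1 -> n2 -> ... of length n
    static Node chain(int n) {
        Node head = new Node();
        Node cur = head;
        for (int i = 1; i < n; i++) {
            cur.next = new Node();
            cur = cur.next;
        }
        return head;
    }

    public static void main(String[] args) throws Exception {
        // A two-node cycle is fine: the second visit to 'a' is written as a back-reference
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        Node copy = (Node) roundTrip(a);
        System.out.println("cycle preserved: " + (copy.next.next == copy));

        // A long chain recurses once per link and overflows the default stack
        try {
            roundTrip(chain(100_000));
            System.out.println("chain serialized");
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError on deep chain");
        }
    }
}
```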

These threads are useful:

@pwrose do you have any idea or hint to the origin of the problem in the MMTF parser?

lafita added a commit to lafita/biojava that referenced this issue Aug 7, 2017

@pwrose (Member) commented Aug 7, 2017

@lafita (Member) commented Aug 8, 2017

@pwrose you are right, the bonds are causing the StackOverflowError. If I enable bond parsing for the mmCIF or PDB formats, the error appears there too, just as for MMTF.

I think we need to re-implement the writeObject() and readObject() methods if we want to allow Structure serialization. Another option could be to mark the bond information as transient, but I am not sure what side effects this could have.
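One way the `writeObject()`/`readObject()` route could look, sketched with hypothetical `Atom`, `Bond` and `Molecule` classes (not BioJava's own): the atom-to-bond references are `transient`, and the owning object writes the bonds as flat pairs of atom indices, then rebuilds them on read. The traversal depth then no longer depends on how the atoms are bonded. This also shows the main side effect of the transient option: the bonds must be reconstructed explicitly, or they come back as `null`.

```java
import java.io.*;
import java.util.*;

public class BondSerializationSketch {

    // Hypothetical atom: its bond list is transient, so serialization never follows atom -> bond -> atom
    static class Atom implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        transient List<Bond> bonds = new ArrayList<>();
        Atom(String name) { this.name = name; }
    }

    // Hypothetical bond with direct references to both atoms (deliberately not Serializable)
    static class Bond {
        final Atom a, b;
        Bond(Atom a, Atom b) {
            this.a = a;
            this.b = b;
            a.bonds.add(this);
            b.bonds.add(this);
        }
    }

    static class Molecule implements Serializable {
        private static final long serialVersionUID = 1L;
        List<Atom> atoms = new ArrayList<>();
        transient List<Bond> bonds = new ArrayList<>();

        void bond(int i, int j) { bonds.add(new Bond(atoms.get(i), atoms.get(j))); }

        // Write atoms with default serialization, then bonds as flat (index, index) pairs
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            Map<Atom, Integer> index = new IdentityHashMap<>();
            for (int i = 0; i < atoms.size(); i++) index.put(atoms.get(i), i);
            out.writeInt(bonds.size());
            for (Bond bd : bonds) {
                out.writeInt(index.get(bd.a));
                out.writeInt(index.get(bd.b));
            }
        }

        // Restore atoms, then rebuild the transient bond lists (initializers do not run on read)
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            for (Atom at : atoms) at.bonds = new ArrayList<>();
            bonds = new ArrayList<>();
            int n = in.readInt();
            for (int k = 0; k < n; k++) bond(in.readInt(), in.readInt());
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A 100,000-atom linear chain: following bonds recursively would likely overflow the stack
        Molecule m = new Molecule();
        for (int i = 0; i < 100_000; i++) m.atoms.add(new Atom("C" + i));
        for (int i = 0; i + 1 < m.atoms.size(); i++) m.bond(i, i + 1);
        Molecule copy = roundTrip(m);
        System.out.println(copy.atoms.size() + " atoms, " + copy.bonds.size() + " bonds");
    }
}
```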

@lafita changed the title from "StackOverflowError in Structure serialization when parsed from MMTF fromat" to "StackOverflowError in Structure serialization" Aug 8, 2017

@lafita (Member) commented Aug 8, 2017

Since the MMTF format is essentially an efficient serialization of a Structure, I was thinking that we could replace the writeObject() and readObject() methods of the Structure class with encoding to and decoding from the MMTF representation, respectively.

I think it should work fine, and it would be a nice application of the format as well. Do you think this would be possible, @pwrose?
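The MMTF idea maps naturally onto Java's serialization proxy pattern: `writeReplace()` substitutes a small object holding the encoded bytes, and that object's `readResolve()` decodes them back into a fresh instance. The sketch below uses a hypothetical `Coords` class and a trivial binary encoding as a stand-in; it does not call any real MMTF encoder or decoder, which would take the place of `encode`/`decode`.

```java
import java.io.*;
import java.util.Arrays;

public class ProxySketch {

    // Hypothetical stand-in for a Structure: just a flat coordinate array
    static class Coords implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] xyz;

        Coords(double[] xyz) { this.xyz = xyz; }

        // On write, serialize an encoded proxy instead of this object's fields
        private Object writeReplace() {
            return new EncodedProxy(encode(this));
        }

        // Reject streams that try to deserialize Coords directly, bypassing the proxy
        private void readObject(ObjectInputStream in) throws InvalidObjectException {
            throw new InvalidObjectException("Coords must be deserialized via EncodedProxy");
        }
    }

    // Serializable carrier for the encoded bytes
    static class EncodedProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final byte[] data;

        EncodedProxy(byte[] data) { this.data = data; }

        // On read, decode the bytes back into a new Coords instance
        private Object readResolve() {
            return decode(data);
        }
    }

    // Placeholder encoding (NOT MMTF): element count followed by raw doubles
    static byte[] encode(Coords c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(c.xyz.length);
            for (double v : c.xyz) out.writeDouble(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Coords decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double[] xyz = new double[in.readInt()];
            for (int i = 0; i < xyz.length; i++) xyz[i] = in.readDouble();
            return new Coords(xyz);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Coords original = new Coords(new double[] {1.5, -2.25, 3.0});
        Coords copy = (Coords) roundTrip(original);
        System.out.println(Arrays.toString(copy.xyz));
    }
}
```

Because the stream only ever contains the flat byte array, its depth is constant no matter how the original objects reference each other.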

@pwrose (Member) commented Aug 8, 2017

@josemduarte added this to the BioJava 5.0.1 milestone Mar 7, 2018