Midterm evaluation

The midterm-evaluation milestone is as follows:
Have JyNI detect and break reference-cycles in native objects backed by Java-GC. This must be done by Java-GC in order to deal with interfering non-native PyObjects. Further this functionality must be monitorable, so that it can transparently be observed and confirmed.

Sketch of some issues

The issues to overcome for this milestone were manifold:
  • The ordinary reference-counting for scenarios that actually should work without GC contained a lot of bugs in JyNI C-code. This had to be fixed. When I wrote this code initially, the GC-concept was still an early draft and in many scenarios it was unclear whether and how reference-counting should be applied. Now all this needed to be fixed (and there are probably still remaining issues of this type)
  • JNI defines a clear policy how to deal with provided jobject-pointers. Some of them must be freed explicitly. On the other hand some might be freed implicitly by the JVM - without your intention, if you don't get it right. Also on this front vast clean-up in JyNI-code was needed, also to avoid immortal trash.
  • JyNI used to keep alive Java-side-PyObjects that were needed by native objects indefinitely.
    Now these must be kept alive by the Java-copy of the native reference-graph instead. It was hard to make this mechanism sufficiently robust. Several bugs caused reference-loss and had to be found to make the entire construct work. On the other hand some bugs also caused hard references to persist, which kept Java-GC from collecting the right objects and triggering JyNI's GC-mechanism.
  • Issues with converting self-containing PyObjects between native side and Java-side had to be solved. These were actually bugs unrelated to GC, but still had to be solved to achieve the milestone.
  • A mechanism to monitor native references from Java-side, especially their malloc/free actions had to be established.
    Macros to report these actions to Java/JyNI were inserted into JyNI's native code directly before the actual calls to malloc or free. What made this edgy is the fact that some objects are not freed by native code (which was vastly inherited from CPython 2.7), but cached for future use (e.g. one-letter strings, small numbers, short tuples, short lists). Acquiring/returning an object from/to such a cache is now also reported as malloc/free, but specially flagged. For all these actions JyNI records timestamps and maintains a native object-log where one can transparently see the lifetime-cycle of each native object.
  • The original plan to explore native object's connectivity in the GC_Track-method is not feasible because for tuples and lists this method is usually called before the object is populated.
    JyNI will have a mechanism to make it robust of invalid exploration-attempts, but this mechanism should not be used for normal basic operation (e.g. tuple-allocation happens for every method-call) but only for edgy cases, e.g. if an extension defines its own types, registers instances of them in JyNI-GC and then does odd stuff with them.
    So now GC_track saves objects in a todo-list regarding exploration and actual exploration is performed at some critical JyNI-operations like on object sync-on-init or just before releasing the GIL. It is likely that this strategy will have to be fine-tuned later.

Proof of the milestone

To prove the achievement of the explained milestone I wrote a script that creates a reference-cycle of a tuple and a list such that naive reference-counting would not be sufficient to break it. CPython would have to make use of its garbage collector to free the corresponding references.
  1. I pass the self-containing tuple/list to a native method-call to let JyNI create native counterparts of the objects.
  2. I demonstrate that JyNI's reference monitor can display the corresponding native objects ("leaks" in some sense).
  3. The script runs Java-GC and confirms that it collects the Jython-side objects (using a weak reference).
  4. JyNI's GC-mechanism reports native references to clear. It found them, because the corresponding JyNI GC-heads were collected by Java-GC.
  5. Using JyNI's reference monitor again, I confirm that all native objects were freed. Also those in the cycle.

The GC demonstration-script

import time
from JyNI import JyNI
from JyNI import JyReferenceMonitor as monitor
from JyNI.gc import JyWeakReferenceGC
from java.lang import System
from java.lang.ref import WeakReference
import DemoExtension

# For now we attempt to verify JyNI's GC-functionality independently from
# Jython concepts like Jython weak references or Jython GC-module.
# So we use java.lang.ref.WeakReference and java.lang.System.gc
to monitor and control Java-gc.

JyWeakReferenceGC.monitorNativeCollection = True

l = (123, [0, "test"])
l[1][0] = l
#We create weak reference to l to monitor collection by Java-GC:
wkl = WeakReference(l)
print "weak(l): "+str(wkl.get())

# We pass down l to some native method. We don't care for the method itself,
# but conversion to native side causes creation of native PyObjects that
# correspond to l and its elements. We will then track the life-cycle of these.
print "make l native..."

print "Delete l... (but GC not yet ran)"
del l
print "weak(l) after del: "+str(wkl.get())
print ""
# monitor.list-methods display the following format:
# [native pointer]{'' | '_GC_J' | '_J'} ([type]) #[native ref-count]: [repr] *[creation time]
# _GC_J means that JyNI tracks the object
# _J means that a JyNI-GC-head exists, but the object is not actually treated by GC
# This can serve monitoring purposes or soft-keep-alive (c.f. java.lang.ref.SoftReference)
# for caching.
print "Leaks before GC:"
print ""

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
#JyWeakReferenceGC.nativecollectionEnabled = False

print "calling Java-GC..."
print "weak(l) after GC: "+str(wkl.get())
print ""
print ""
print "leaks after GC:"

print ""
print "===="
print "exit"
print "===="

It is contained in JyNI in the file JyNI-Demo/src/JyNIRefMonitor.py

Instructions to reproduce this evaluation

  1. You can get the JyNI-sources by calling
    git clone https://github.com/Stewori/JyNI
    Switch to JyNI-folder:
    cd JyNI
  2. (On Linux with gcc) edit the makefile (for OSX with llvm/clang makefile.osx) to contain the right paths for JAVA_HOME etc. You can place a symlink to jython.jar (2.7.0 or newer!) in the JyNI-folder or adjust the Jython-path in makefile.
  3. Run make (Linux with gcc)
    (for OSX with clang use make -f makefile.osx)
  4. To build the DemoExtension enter its folder:
    cd DemoExtension
    and run setup.py:
    python setup.py build
    cd ..
  5. Confirm that JyNI works:
  6. ./JyNI_GCDemo.sh

Discussion of the output

Running JyNI_GCDemo.sh:

JyNI: memDebug enabled!
weak(l): (123, [(123, [...]), 'test'])
make l native...
Delete l... (but GC not yet ran)
weak(l) after del: (123, [(123, [...]), 'test'])

Leaks before GC:
Current native leaks:
139971370108712_GC_J (list) #2: "[(123, [...]), 'test']" *28
139971370123336_J (str) #2: "test" *28
139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)" *28
139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])" *28

calling Java-GC...
weak(l) after GC: None

Native delete-attempts:
139971370108712_GC_J (list) #0: -jfreed- *28
139971370123336_J (str) #0: -jfreed- *28
139971370119272_GC_J (tuple) #0: -jfreed- *28
139971370108616_GC_J (tuple) #0: -jfreed- *28

leaks after GC:
no leaks recorded

Let's briefly discuss this output. We created a self-containing tuple called l. To allow it to self-contain we must put a list in between. Using a Java-WeakReference, we confirm that Java-GC collects our tuple. Before that we let JyNI's reference monitor print a list of native objects that are currently allocated. We refer to them as "leaks", because all native calls are over and there is no obvious need for natively allocated objects now. #x names their current native ref-count. It explains as follows (observe that it contains a cycle):
139971370108712_GC_J (list) #2: "[(123, [...]), 'test']"

This is l[1]. One reference is from JyNI to keep it alive, the second one is from l.

139971370123336_J (str) #2: "test"

This is l[1][1]. One reference is from JyNI to keep it alive, the second one is from l[1].

139971370119272_GC_J (tuple) #1: "((123, [(123, [...]), 'test']),)"
This is the argument-tuple that was used to pass l to the native method. The reference is from JyNI to keep it alive.
139971370108616_GC_J (tuple) #3: "(123, [(123, [...]), 'test'])" 
This is l. One reference is from JyNI to keep it alive, the second one is from the argument-tuple (139971370108616)and the third one is from l[1]. Thus it builds a reference-cycle with l[1].

After running Java-GC (and giving it some time to finnish) we confirm that our weak reference to l was cleared. And indeed, JyNI's GC-mechanism reported some references to clear, all reported leaks among them. Finally another call to JyNI's reference monitor does not list leaks any more.

Check that this behavior is not self-evident

In JyNI-Demo/src/JyNIRefMonitor.py go to the section:

# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:
#JyWeakReferenceGC.nativecollectionEnabled = False

Change it to
# By inserting this line you can confirm that native
# leaks would persist if JyNI-GC is not working:

JyWeakReferenceGC.nativecollectionEnabled = False

Run JyNI_GCDemo.sh again. You will notice that the native leaks persist.

Next steps

The mechanism currently does not cover all native types. While many should already work I expect that some bugfixing and clean-up will be required to make this actually work. With the demonstrated reference-monitor-mechanism the necessary tools to make this debugging straight forward are now available.

After fixing the remaining types and providing some tests for this, I will implement an improvement to the GC-mechanism that makes it robust against silent modification of native PyObjects (e.g. via macros). And provide tests for this.

Finally I will add support for the PyWeakReference builtin type. As far as time allows after that I'll try to get ctypes working.

No comments:

Post a Comment