Class gc

java.lang.Object
org.python.modules.gc

public class gc extends Object
In Jython, the gc module notably differs from that in CPython. This comes from the different ways Jython and CPython perform garbage collection. While CPython's garbage collection is based on reference counting, Jython is backed by Java's gc, which is based on a mark-and-sweep approach.

This difference becomes most noticeable when finalizers perform resurrection. The resurrected object itself behaves rather similarly in Jython and CPython. Things are more delicate for objects that are reachable (i.e. strongly referenced) exclusively via the resurrected object: CPython does not call the finalizers of such objects, whereas Jython/Java calls all of them. That is because Java detects the whole unreachable subgraph as garbage and thus calls all of its finalizers without any chance of direct intervention. CPython instead detects the unreachable object and calls its finalizer, which makes the object reachable again. All other objects of the subgraph are then reachable from it, so CPython does not treat them as garbage and does not call their finalizers at all. As a further consequence, weak references to such indirectly resurrected objects break in Jython, while they persist in CPython.

As of Jython 2.7, the gc module offers some options to emulate CPython behavior. See especially the flags PRESERVE_WEAKREFS_ON_RESURRECTION, DONT_FINALIZE_RESURRECTED_OBJECTS and DONT_FINALIZE_CYCLIC_GARBAGE.

Another difference is that CPython's gc module offers some debug features, such as counting collected cyclic trash, which are hard to support in Jython. As of Jython 2.7, the introduction of a traverseproc mechanism (cf. Traverseproc) made supporting these features feasible. Since this support comes with a significant emulation cost, it must be enabled explicitly. To make objects subject to cyclic trash counting, they must be gc-monitored in Jython. See monitorObject(PyObject), unmonitorObject(PyObject), MONITOR_GLOBAL and stopMonitoring() for this.

If at least one object is gc-monitored, collect() works synchronously in the sense that it blocks until all gc-monitored objects that are garbage have actually been collected and their finalizers have been called and completed. collect() reports the number of collected objects in the same manner as CPython, i.e. it counts only those that participate in reference cycles. This allows a unified test implementation across Jython and CPython (which applies to most tests in test_gc.py). If no object is gc-monitored, collect() simply delegates to System.gc(), runs asynchronously (i.e. non-blocking) and returns UNKNOWN_COUNT. See also DEBUG_SAVEALL, a useful gc debugging feature supported by Jython from version 2.7 onwards.

Implementing all these features in Jython involved a lot of synchronization logic. Care was taken to avoid timeouts as far as possible and to rely instead on locks, states and system/hardware-independent synchronization techniques, but this was not entirely feasible.
Two aspects could only be realized with a timeout: waiting for gc to enqueue all collected objects (i.e. weak references to monitored objects that were gc'ed) into the reference queue, and waiting for gc to run all PyObject finalizers.

In theory, waiting for trash could be strictly synchronized using MXBeans, i.e. GarbageCollectionNotificationInfo and the related API. However, experiments showed that the emitted gc notifications do not reliably indicate when enqueuing has finished for a specific gc run. The experimental implementation was kept in source code comments so that this issue can easily be reproduced. (Commented-out code contradicts the Jython style guide, but this instance is needed to document the infeasible approach and is explicitly marked as such.)
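For readers who want to reproduce the experiment, the following is a minimal sketch (not Jython's commented-out code) of subscribing to gc notifications through the standard java.lang.management and com.sun.management APIs. It assumes a HotSpot-based JVM, where the collector MXBeans implement NotificationEmitter; the class name GcNotificationProbe is hypothetical. As the text explains, such notifications report that a gc run happened, but not when reference enqueuing for that run is done.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical probe: listens to gc notifications (assumes a HotSpot-based JVM).
public class GcNotificationProbe {

    /** Subscribes to every collector MXBean that emits notifications; returns how many were subscribed. */
    static int subscribe(AtomicInteger notificationCount) {
        NotificationListener listener = (notification, handback) -> {
            if (GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                notificationCount.incrementAndGet();
                // Reports the gc run itself; nothing here says when reference enqueuing finished.
                System.out.println(info.getGcName() + " (" + info.getGcAction() + ", "
                        + info.getGcCause() + "): " + info.getGcInfo().getDuration() + " ms");
            }
        };
        int subscribed = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (bean instanceof NotificationEmitter) {
                ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
                subscribed++;
            }
        }
        return subscribed;
    }
}
```

Notifications are delivered asynchronously on a JMX thread, which is one reason why correlating them with the reference queue of a specific gc run is hard.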

So how is synchronization done now? We insert a sentinel before running gc and wait until this sentinel has been collected. Timestamps give us an idea of the time scale on which the gc of the current JVM operates. We then wait until twice the measured time (i.e. the duration from the call to System.gc() until the sentinel reference was enqueued) has passed after gc enqueued the last reference. While this approach is not entirely safe in theory, it has passed all tests on the various systems and machines available to us for testing so far. We consider it more robust than a fixed-length timeout and regard it as the best known feasible compromise for emulating synchronous gc runs in Java.
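The sentinel technique can be sketched in plain Java as follows. This is a hypothetical illustration under simplifying assumptions, not Jython's implementation: the class SentinelGcWait, its methods and the maxWaitMs bound are invented here, the weak references of monitored objects are assumed to be registered on the queue passed in, and the "twice the measured duration" rule is applied literally.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical sketch of the sentinel-based wait; not Jython's actual code.
public class SentinelGcWait {

    private static long nowMs() {
        return System.nanoTime() / 1_000_000;
    }

    /** True once twice the measured gc duration has elapsed since the last enqueue. */
    static boolean settled(long gcDurationMs, long lastEnqueueMs, long nowMs) {
        return nowMs - lastEnqueueMs >= 2 * gcDurationMs;
    }

    /**
     * Runs System.gc(), measures how long it takes until a sentinel's weak reference is
     * enqueued, then drains {@code queue} until 2x that duration passes without a new
     * enqueue. Returns the number of drained references, or -1 if the sentinel was not
     * collected within {@code maxWaitMs}.
     */
    static int gcAndDrain(ReferenceQueue<Object> queue, long maxWaitMs) throws InterruptedException {
        ReferenceQueue<Object> sentinelQueue = new ReferenceQueue<>();
        // The referent is unreachable right away; only the WeakReference itself is kept.
        WeakReference<Object> sentinel = new WeakReference<>(new Object(), sentinelQueue);
        long start = nowMs();
        System.gc();
        Reference<?> removed = sentinelQueue.remove(maxWaitMs);
        if (removed != sentinel) { // null means the wait timed out
            return -1;
        }
        long gcDuration = nowMs() - start;
        long lastEnqueue = nowMs();
        int drained = 0;
        while (true) {
            long now = nowMs();
            if (settled(gcDuration, lastEnqueue, now)) {
                return drained;
            }
            long remaining = 2 * gcDuration - (now - lastEnqueue); // strictly positive here
            if (queue.remove(remaining) != null) {
                drained++;
                lastEnqueue = nowMs(); // restart the quiet period
            }
        }
    }
}
```

ReferenceQueue.remove(long) treats a timeout of 0 as "wait forever", which is why the loop only calls it with a strictly positive remaining time. Comparing the removed reference against sentinel also keeps the WeakReference object reachable until it has been dequeued.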

The other timing-based synchronization issue, waiting for finalizers to run, is solved as follows. Since PyObject finalizers are based on FinalizeTriggers, Jython has full control over the finalization process from a central point. Before such a finalizer runs, it calls notifyPreFinalization(), and when it is done, it calls notifyPostFinalization(). While a finalizer can take arbitrarily long, it generally holds that Java's gc thread calls the next finalizer almost immediately after the previous one. That means that a timestamp taken in notifyPreFinalization() usually lags only a few milliseconds (often even reported as 0 milliseconds) behind the most recent timestamp taken in notifyPostFinalization() (i.e. the call made by the previous finalizer). Jython's gc module assumes that Java's finalization process has ended once postFinalizationTimeOut milliseconds have passed after a call of notifyPostFinalization() without another call to notifyPreFinalization() in that time. The default value of postFinalizationTimeOut is 100, which is far larger than the usual near-zero gap between finalizer calls.
This process can be disturbed by third-party finalizers of non-PyObjects that external libraries bring into play. If these finalizers are of short duration (which applies to typical finalizers), one can deal with this by adjusting postFinalizationTimeOut, which was declared public for exactly this purpose. However, if the external framework causing the issue is Jython-aware, a cleaner solution is to let its finalizers call notifyPreFinalization() and notifyPostFinalization() appropriately. In that case, these finalizers must not terminate by throwing an exception before notifyPostFinalization() has been called. This is a strict requirement, since otherwise a deadlock can occur.
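The timeout rule and the exception requirement can be illustrated with a small self-contained tracker. FinalizationTracker is a hypothetical class whose method names merely mirror notifyPreFinalization(), notifyPostFinalization() and postFinalizationTimeOut from the text; it is a sketch of the idea, not Jython's gc code. runFinalizer shows the try/finally pattern a Jython-aware finalizer would need so that a throwing finalizer cannot leave a waiting thread blocked forever.

```java
// Hypothetical sketch of timeout-based end-of-finalization detection; not Jython's code.
public class FinalizationTracker {
    private final long postFinalizationTimeOut; // ms; plays the role of gc.postFinalizationTimeOut
    private int running;                         // finalizers between pre and post notification
    private long lastPostMs = nowMs();

    public FinalizationTracker(long postFinalizationTimeOut) {
        this.postFinalizationTimeOut = postFinalizationTimeOut;
    }

    private static long nowMs() {
        return System.nanoTime() / 1_000_000;
    }

    public synchronized void notifyPreFinalization() {
        running++;
    }

    public synchronized void notifyPostFinalization() {
        running--;
        lastPostMs = nowMs();
        notifyAll();
    }

    /** Wraps a finalizer body so that notifyPostFinalization() runs even if the body throws. */
    public void runFinalizer(Runnable body) {
        notifyPreFinalization();
        try {
            body.run();
        } finally {
            notifyPostFinalization();
        }
    }

    /**
     * Blocks until no finalizer is running and postFinalizationTimeOut ms have passed since
     * the last notifyPostFinalization() without a new notifyPreFinalization().
     */
    public synchronized void awaitFinalizationEnd() throws InterruptedException {
        while (true) {
            if (running > 0) {
                // If a finalizer threw before notifying "post", running would stay > 0
                // and this wait would never end: the deadlock the text warns about.
                wait();
            } else {
                long idle = nowMs() - lastPostMs;
                if (idle >= postFinalizationTimeOut) {
                    return;
                }
                wait(postFinalizationTimeOut - idle); // argument is > 0 here
            }
        }
    }
}
```

In a Jython-aware library, the same try/finally pattern would wrap the finalizer body with calls to the gc module's notifyPreFinalization() and notifyPostFinalization() instead of the tracker's methods.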

Note that the management API (cf. com.sun.management.GarbageCollectionNotificationInfo) does not emit any notifications that would allow detecting the end of the finalization phase. So this API provides no alternative to the technique described above.

In general, Java's gc provides hardly any guarantees about its collection and finalization process. It does not even guarantee that finalizers are called at all (cf. http://howtodoinjava.com/2012/10/31/why-not-to-use-finalize-method-in-java). While at least the most common JVM implementations usually do call finalizers reliably under normal conditions, no specific finalization order is guaranteed (one might reasonably expect the order to be related to the topology of the reference graph, but this appears not to be the case). However, Jython now offers some functionality to compensate for this situation. Via registerPreFinalizationProcess(Runnable), registerPostFinalizationProcess(Runnable) and related methods, one can listen to the beginning and end of the finalization process. Note that this functionality relies on the technique described in the previous paragraphs (i.e. on calls to notifyPreFinalization() and notifyPostFinalization()) and is therefore subject to the same unreliability if third-party finalizers are involved. Such finalizers can cause false-positive runs of registered (pre/post) finalization processes, so this feature should be used with some care. It is recommended to use it only in such a way that false-positive runs cause no serious harm, but at most some loss in performance.
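One way to follow this recommendation is to make a registered process idempotent, so that a false-positive run merely does less work. The sketch below is a hypothetical example (DeferredCleanup is not part of Jython): finalizers only queue cleanup tasks, and a post-finalization process drains the queue. Assuming Jython's gc module is available, an instance could be passed to registerPostFinalizationProcess(Runnable); the sketch itself has no Jython dependency.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical post-finalization process that tolerates false-positive runs.
public class DeferredCleanup implements Runnable {
    private final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Called from finalizers: defer the real work instead of doing it inside the finalizer. */
    public void defer(Runnable task) {
        pending.add(task);
    }

    /**
     * Drains whatever is pending. A spurious run (e.g. triggered by a third-party finalizer)
     * just finds fewer or no tasks; tasks queued later are handled by the next run.
     */
    @Override
    public void run() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }
}
```

Each task runs at most once because poll() removes it, so extra runs of the process cost a little time but never repeat work.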