{"id":18,"date":"2014-07-02T15:58:13","date_gmt":"2014-07-02T15:58:13","guid":{"rendered":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18"},"modified":"2015-08-30T07:02:52","modified_gmt":"2015-08-30T07:02:52","slug":"garbage-collection","status":"publish","type":"post","link":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18","title":{"rendered":"Garbage collection"},"content":{"rendered":"<p>Contrarily to all versions of OrchIDS \u22641.1, OrchIDS 2.0 has <em>garbage-collected<\/em> memory.<\/p>\n<p>This means that you still allocate memory, but do <em>not<\/em> deallocate it explicitly. This imposes a certain number of constraints on the way you program.<!--more--><\/p>\n<p>The garbage collector is implemented in file <tt>src\/util\/gc.c<\/tt>. This is a <a title=\"Mark-during-sweep\" href=\"https:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.42.5135\">mark-during-sweep<\/a> garbage collector, an algorithm described by Queinnec, Beaudoing and Queille in 1989. This little-known algorithm is an incremental garbage collector that improves upon Dijkstra&#8217;s three-color incremental garbage collector.<\/p>\n<p>Let us say that, at any point in time, every garbage-collectable piece of data can be white, black, or grey.\u00a0 The black objects are those that have been marked as used by the garbage collector.\u00a0 The white objects are those that have not yet been marked by the garbage collector (the objects that eventually remain white after the end of the marking phase are those that will be deallocated).\u00a0 Finally, the grey objects are those that are <em>in the process of<\/em> being marked.<\/p>\n<p>One should remember that no black object should ever point directly to a white object.\u00a0 To store a pointer to a garbage-collectable object p into another one, one must always <em>shade<\/em> p first, namely, make it grey if it was white.\u00a0 This is the purpose of the <code>gc_touch()<\/code> function.<\/p>\n<p>The main functions are:<\/p>\n<ul>\n<li>The allocator itself:\n<pre>void *gc_alloc (gc_t *gc_ctx, size_t n, gc_class_t *class);<\/pre>\n<p>Allocates a garbage-collectable, <code>n<\/code> byte memory zone.<br \/>\nThe first argument, <code>gc_ctx<\/code>, is the garbage collector context.\u00a0 This is a struct containing all the information that the garbage collector needs to function propertly.\u00a0 You are not meant to touch it in any way.<br \/>\nThe\u00a0<code>class<\/code>\u00a0argument describes all that the garbage-collector needs to know about your structure, in particular, how to mark its subfields. \u00a0It should not\u00a0be <code>NULL<\/code>.\u00a0 See <code>gc_class_t<\/code> below for more explanations.<\/p>\n<p><code>gc_alloc()<\/code> returns an object of type <code>void *<\/code>. Internally, however, <code>gc_alloc()<\/code> allocates an object of size <code>n<\/code> that starts with a <code>gc_header_t<\/code> header.\u00a0 In particular, <code>n<\/code> should be \u2265 <code>sizeof(gc_header_t)<\/code>.<br \/>\nThe basic strategy to define a garbage-collectable type is to define it as a struct with a first field of type <code>gc_header_t<\/code>.\u00a0 For example, the Orchids native integer type <a title=\"The ovm_int_t type\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=21\"><code>ovm_int_t<\/code><\/a> is defined by:<\/p>\n<pre>struct ovm_int_s\r\n{\r\n  gc_header_t gc;\r\n  long val;\r\n};<\/pre>\n<p>and you would allocate an integer by calling:<\/p>\n<pre>struct ovm_int_s *s =\r\n   gc_alloc(gc_ctx, sizeof(struct ovm_int_s), &amp;int_class);\r\ns-&gt;gc.type = T_INT;\r\n<\/pre>\n<p>The <code>int_class<\/code> <code>gc_class<\/code> we are using for that structure, and we shall explain this bit later.<br \/>\nNote that you should follow the call to <code>gc_alloc()<\/code> by code that sets the type (here, <code>T_INT<\/code>).<br \/>\nIf you create a new <code>gc_class<\/code>, you must also enforce the following relation between the type (here, <code>T_INT<\/code>) and the <code>gc_class<\/code>: there is an array <code>gc_classes[256]<\/code> in <code>lang.c<\/code>, and its entry at index given by the type should be the corresponding <code>gc_class<\/code>; in our case, you should check that <code>gc_classes[T_INT]<\/code> indeed contains <code>&amp;int_class<\/code>.<\/li>\n<li>By the way, the purpose of declaring the return type of <code>gc_alloc()<\/code> to be <code>void *<\/code> instead of <code>gc_header_t *<\/code> is to allow you to write code similar to the above. Otherwise, you would need a cast, and would have to write: <code>struct ovm_int_s *s = (struct ovm_int_s *)gc_alloc(gc_ctx, sizeof(struct ovm_int_s), NULL);<\/code>.<br \/>\nAny data allocated by <code>gc_alloc()<\/code> is created white.\u00a0 This means that it can be deallocated at the next call to <code>gc()<\/code>, <code>gc_full()<\/code>, or <code>gc_alloc()<\/code> itself!\u00a0 If you mean to use the object you&#8217;ve just created, you should first store it into some other garbage-collectable object&#8230; calling <code>GC_TOUCH()<\/code> on it first. \u00a0This is a macro that encapsulates a call to the <code>gc_touch()<\/code> function, which does all the work.<\/li>\n<li>The shading function:\n<pre>void gc_touch (gc_t *gc_ctx, gc_header_t *p);\r\n#define GC_TOUCH(gc_ctx,p) gc_touch(gc_ctx,(gc_header_t *)p)<\/pre>\n<p>shades <code>p<\/code>. This is required before storing <code>p<\/code>\u00a0into an object. Shading means making <code>p<\/code> grey if it was white. This way, you can store <code>p<\/code> into some other object without running the risk of having a black object point to a white object.<br \/>\nHence, in particular, storing a garbage-collected object <code>p<\/code> at some location <code>loc<\/code> should always be written:<\/p>\n<pre>GC_TOUCH (gc_ctx, loc=p);<\/pre>\n<p>and never <code>loc=p;<\/code>, for example. To make it clear that you have not forgotten the call to <code>GC_TOUCH()<\/code>, you should always use the above form <code>GC_TOUCH (gc_ctx, loc=p);<\/code>, and not equivalent forms such as <code>loc=p; GC_TOUCH (gc_ctx, p);<\/code>.<br \/>\nBy the way, shading works slightly differently in one case: if the <code>gc_class<\/code> of <code>p<\/code> has a <code>mark_subfields<\/code> function pointer equal to <code>NULL<\/code>, this lets the garbage-collector know that <code>p<\/code> has not garbage-collectable subfield, and then <code>gc_touch()<\/code> will make <code>p<\/code> black directly, instead of grey. This makes the garbage-collector more efficient in those cases. It is therefore relatively important to be able to leave the <code>mark_subfields<\/code> entry <code>NULL<\/code> in <code>gc_class<\/code>es.<\/li>\n<li>The garbage-collector itself:\n<pre>void gc (gc_t *gc_ctx);\r\nvoid gc_full (gc_t *gc_ctx);<\/pre>\n<p><code>gc()<\/code> calls the incremental garbage collector, and tries to mark and to reclaim a few objects. By default, it marks 3 objects and tries to reclaim 2.5 objects on average. You should almost never need to call <code>gc()<\/code> yourself, as <code>gc_alloc()<\/code> already calls <code>gc()<\/code> to get back a bit of memory itself.<\/p>\n<p><code>gc_full()<\/code> does a complete round of garbage collection. You should call it twice in a row, in critical low-memory situations, to reclaim as much memory as you can. This is what <code>gc_alloc()<\/code> does anyway when memory runs low.<\/li>\n<li>Object classes:\n<pre>typedef struct gc_class_s gc_class_t;<\/pre>\n<p>A record of specific function hooks for specific kinds of garbage-collectable objects.\u00a0 The most useful one is the <code>mark_subfields<\/code> hook, which is used to mark garbage-collectable subobjects.\u00a0 For example, one may define a garbage-collectable type of binary trees as follows:<\/p>\n<pre>typedef struct tree {\r\n  gc_header_t h;\r\n  int n;\r\n  struct tree *left, *right;\r\n} tree;<\/pre>\n<p>If you allocated a tree object using a <code>NULL<\/code> class argument, the garbage collector would have no way of realizing that it should mark the <code>left<\/code> and <code>right<\/code> fields, too, recursively. To tell the garbage collector it has to do so, call <code>gc_alloc()<\/code> with the argument <code>&amp;tree_class<\/code>, where:<\/p>\n<pre>gc_class_t tree_class = {\r\n  GC_ID('t','r','e','e'),\r\n  tree_mark_subfields,\r\n  tree_finalize,\r\n  tree_traverse,\r\n  tree_save,\r\n  tree_restore\r\n};<\/pre>\n<\/li>\n<li>The <code>tree_mark_subfields()<\/code> function is defined as follows. It simply shades the two garbage-collectable subfields:\n<pre>static void tree_mark_subfields (gc_t *ctx, gc_header_t *p)\r\n{\r\n  tree *t = (tree *)p;\r\n\r\n  GC_TOUCH (ctx, t-&gt;left);\r\n  GC_TOUCH (ctx, t-&gt;right);\r\n}<\/pre>\n<p>It is also possible to leave the <code>mark_subfields<\/code>\u00a0hook equal to <code>NULL<\/code>, in which case the garbage-collector will consider that no subfield has to be marked. This would be wrong for binary trees, of course. However, it makes sense for types such as <code>ovm_int_t<\/code>, which have no garbage-collected subfields (only a <code>long<\/code> subfield). It is even advised to leave the <code>mark_subfields<\/code>\u00a0hook equal to <code>NULL<\/code> when there are no subfields, because it makes the garbage-collector more efficient on those types (see the earlier discussion on <code>gc_touch()<\/code>).<br \/>\nThere are four\u00a0other function hooks that need to be defined. The third one, <code>traverse<\/code> (here, defined as <code>tree_traverse()<\/code>) operates a more generic way of traversing the same subfields. We keep the <code>mark_subfields<\/code> hook separate for efficiency reason, although it could have been implemented in terms of the <code>traverse<\/code> hook. Here is the necessary (boilerplate) code for trees:<\/p>\n<pre>static int tree_traverse (gc_traverse_ctx_t *gtc, gc_header_t *p,\r\n\t\t\t  void *data)\r\n{\r\n  tree *t = (tree *)p;\r\n  int err = 0;\r\n\r\n  err = (*gtc-&gt;do_subfield) (gtc, (gc_header_t *)t-&gt;left, data);\r\n  if (err)\r\n    return err;\r\n  err = (*gtc-&gt;do_subfield) (gtc, (gc_header_t *)t-&gt;right, data);\r\n  return err;\r\n}<\/pre>\n<p>The <code>finalize<\/code> hook (here, <code>tree_finalize()<\/code>) is meant to be called to free up any other memory pointed to by the object in argument, provided such memory is not directly under the control of the garbage collector. It can also be used to close file descriptors, for example, when the object is deallocated. In our example, <code>tree_finalize()<\/code> does nothing:<\/p>\n<pre>static void tree_finalize (gc_t *ctx, gc_header_t *p)\r\n{\r\n  return;\r\n}<\/pre>\n<p>In that case, it is allowed to leave the <code>finalize<\/code> equal to <code>NULL<\/code>, instead of setting it to the do-nothing function <code>tree_finalize()<\/code>.<br \/>\nThe fourth hook, <code>save<\/code>, implements one half of a save-and-restore mechanism, which allows OrchIDS to dump its internal state to a file, and later recover from the saved file anew. The <code>save<\/code> hook must run through all the subfields of the object <code>p<\/code> and call the corresponding save functions. In the case of trees, for example, we would write:<\/p>\n<pre>static int tree_save (save_ctx_t *sctx, gc_header_t *p)\r\n{\r\n  tree *t = (tree *)p;\r\n  int err;\r\n\r\n  err = save_int (sctx, t-&gt;n);\r\n  if (err) return err;\r\n  err = save_gc_struct (sctx, (gc_header_t *)t-&gt;left);\r\n  if (err) return err;\r\n  err = save_gc_struct (sctx, (gc_header_t *)t-&gt;right);\r\n  return err;\r\n}<\/pre>\n<p>Note that this works slightly differently from the <code>mark_subfields<\/code> and <code>traverse<\/code> hooks: we also call save functions, recursively, on the fields that are not under the control of the garbage-collector, such as <code>n<\/code> here.<br \/>\nThe <code>save<\/code>\u00a0hook\u00a0<em>cannot<\/em> be left to <code>NULL<\/code>.<br \/>\nConversely, you must write a restore hook as follows.<\/p>\n<pre>static gc_header_t *tree_restore (restore_ctx_t *rctx)\r\n{\r\n  gc_t *gc_ctx = rctx-&gt;gc_ctx;\r\n  tree *t;\r\n  tree *left, *right;\r\n  int n;\r\n  int err;\r\n\r\n  t = NULL; \/* pre-set result to NULL, in case an error occurs. *\/\r\n  GC_START (gc_ctx, 2); \/* We shall need to protect the left and right fields\r\n                           against the GC before we can store them\r\n                           into the newly allocated tree. *\/\r\n  err = restore_int (gc_ctx, &amp;n); \/* get back the n field *\/\r\n  if (err) { errno = err; goto end; } \/* error reporting is through\r\n                                         the errno variable. *\/\r\n  left = (tree *)restore_gc_struct (rctx);\r\n  if (left==NULL &amp;&amp; errno!=0) goto end; \/* propagate error if one happened\r\n                                           in restoring left;\r\n                                           testing left==NULL is not\r\n                                           enough, as NULL might be\r\n                                           a legal tree. *\/\r\n  GC_UPDATE (gc_ctx, 0, left); \/* protect left against GC! *\/\r\n  right = (tree *)restore_gc_struct (rctx);\r\n  if (right==NULL &amp;&amp; errno!=0) goto end;\r\n  GC_UPDATE (gc_ctx, 1, right); \/* protect right against GC! *\/\r\n  t = gc_alloc (gc_ctx, sizeof(tree), &amp;tree_class);\r\n  t-&gt;gc.type = T_TREE; \/* assuming this tag is defined in lang.h---\r\n                          otherwise add it, as a number between 0 and 255,\r\n                          making sure it has not been taken already;\r\n                          then make sure the gc_classes[] array, at\r\n                          the end of lang.c, contains &amp;tree_class\r\n                          at index T_TREE. *\/\r\n  t-&gt;n = n;\r\n  GC_TOUCH (gc_ctx, t-&gt;left = left); \/* do not forget GC_TOUCH()! *\/\r\n  GC_TOUCH (gc_ctx, t-&gt;right = right); \/* do not forget GC_TOUCH()! *\/\r\nend:\r\n  GC_END (gc_ctx);\r\n  return t;\r\n}<\/pre>\n<p>The <code>restore<\/code>\u00a0hook\u00a0<em>cannot<\/em> be left to <code>NULL<\/code>.<\/li>\n<li>Debugging:\n<pre>void gc_check (gc_t *gc_ctx);<\/pre>\n<p>Looks for inconsistencies in memory. Works similarly to <code>fsck<\/code> on Unix systems: call it to check that the garbage collector is in a sane state, that is, that it satisfies all the invariants it is meant to satisfy. For example, this will check that no black object directly points to any white object, as mentioned above.<\/p>\n<p>To detect memory and garbage collector related bugs, sprinkle your code with calls to <code>gc_check()<\/code>, and launch OrchIDS under <tt>gdb<\/tt> (or <tt>lldb<\/tt> on Mavericks MacOSX and later). <code>gc_check()<\/code> should <code>abort()<\/code> and fall back to the debugger the first time it is called after something went wrong.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Contrarily to all versions of OrchIDS \u22641.1, OrchIDS 2.0 has garbage-collected memory. This means that you still allocate memory, but do not deallocate it explicitly. This imposes a certain number of constraints on the way you program.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-18","post","type-post","status-publish","format-standard","hentry","category-memory-management"],"_links":{"self":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/18","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=18"}],"version-history":[{"count":12,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/18\/revisions"}],"predecessor-version":[{"id":289,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/18\/revisions\/289"}],"wp:attachment":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=18"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=18"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=18"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}