{"id":150,"date":"2014-07-15T08:44:55","date_gmt":"2014-07-15T08:44:55","guid":{"rendered":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=150"},"modified":"2015-07-13T11:42:12","modified_gmt":"2015-07-13T11:42:12","slug":"writing-functions-that-allocate-memory","status":"publish","type":"post","link":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=150","title":{"rendered":"Writing functions that allocate memory"},"content":{"rendered":"<p>The purpose of this note is to explain how to write functions that allocate memory.\u00a0 One basic principle is that one should see the <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\">garbage collector<\/a> as an adversary, which always tries to deallocate memory.\u00a0 You should protect your data against it.\u00a0 Here are a few examples.<!--more--><\/p>\n<p>The simplest case is that of a function that simply allocates a garbage-collectable data structure, and fills in some fields.\u00a0 Just use <code>gc_alloc()<\/code>, as in the <code>ovm_int_new()<\/code> function:<\/p>\n<pre>ovm_var_t *ovm_int_new(gc_t *gc_ctx, long val)\r\n{\r\n\u00a0 ovm_int_t *n;\r\n\r\n\u00a0 n = gc_alloc (gc_ctx, sizeof (ovm_int_t), NULL);\r\n\u00a0 n-&gt;gc.type = T_INT;\r\n\u00a0 n-&gt;val = val;\r\n\u00a0 return OVM_VAR(n);\r\n}<\/pre>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>Now consider a slightly more complicated function: <code>build_integer()<\/code> builds an object of type <code>node_expr_t<\/code>, which holds a pointer in its <code>data<\/code> field to an <code>ovm_var_t<\/code> holding an integer.\u00a0 We need to allocate the <code>ovm_var_t<\/code>, then the <code>node_expr_t<\/code>, and store the former into the latter&#8217;s <code>data<\/code> field.\u00a0 The garbage collector context is in <code>ctx-&gt;gc_ctx<\/code>, so we might think of writing the following, but this is wrong.<\/p>\n<pre>node_expr_t *build_integer(rule_compiler_t *ctx, int i)\r\n{\r\n\u00a0 ovm_var_t\u00a0\u00a0\u00a0 *integer;\r\n\u00a0 node_expr_term_t\u00a0 *n;\r\n\r\n\u00a0 integer = ovm_int_new(ctx-&gt;gc_ctx, i);\r\n\u00a0 n = (node_expr_term_t *) gc_alloc (ctx-&gt;gc_ctx, sizeof(node_expr_term_t),\r\n\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 &amp;node_expr_term_class);\r\n\u00a0 n-&gt;type = NODE_CONST;\r\n\u00a0 n-&gt;data = integer;\r\n\r\n\u00a0 return (node_expr_t *)n;\r\n}<\/pre>\n<p>Here is what goes wrong with this code snippet.\u00a0 The object <code>integer<\/code> created by <code>ovm_int_new()<\/code> is not pointed to by any data structure that the garbage collector can mark. The subsequent call to <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\"><code>gc_alloc()<\/code><\/a> will invoke the garbage collector, to make sure some memory is deallocated, and might well deallocate the <code>integer<\/code> object we have just allocated.<\/p>\n<p>This problem is nasty. In particular, <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\"><code>gc_check()<\/code><\/a> has almost no way of detecting it.<\/p>\n<p>Oh, you might think that this will never happen, since the garbage collector would have to be damned quick to deallocate <code>integer<\/code> this fast.<\/p>\n<p>Please <strong>never<\/strong>, <strong>never<\/strong> reason that way.\u00a0 You are right in thinking this will almost never happen, but this is precisely the problem: when the bug occurs (and be sure it will), it will be undiagnosable, irreproducible, impossible to debug, and will make Orchids fail in horrible ways.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>Back to our problem: we need to protect <code>integer<\/code> against the garbage collector. The standard solution is by using the <code>GC_START<\/code>, <code>GC_UPDATE<\/code>, and <code>GC_END<\/code> macros:<\/p>\n<pre>node_expr_t *build_integer(rule_compiler_t *ctx, int i)\r\n{\r\n  ovm_var_t    *integer;\r\n  node_expr_term_t  *n;\r\n\r\n  GC_START(ctx-&gt;gc_ctx, 1);\r\n  integer = ovm_int_new(ctx-&gt;gc_ctx, i);\r\n  GC_UPDATE(ctx-&gt;gc_ctx, 0, integer);\r\n\r\n  n = (node_expr_term_t *) gc_alloc (ctx-&gt;gc_ctx, sizeof(node_expr_term_t),\r\n\t\t\t\t     &amp;node_expr_term_class);\r\n  n-&gt;type = NODE_CONST;\r\n  n-&gt;data = integer;\r\n\r\n  GC_END(ctx-&gt;gc_ctx);\r\n  return (node_expr_t *)n;\r\n}<\/pre>\n<p>The <code>GC_START<\/code> macro allocates an array of protected entries on the C stack. You need to tell it how many entries this array will contain: 1 in our example. The entries are initialized to <code>NULL<\/code>.\u00a0 We then use <code>GC_UPDATE<\/code> to protect <code>integer<\/code>, storing it at index 0 in the array.\u00a0 (But beware!\u00a0 This code still has problems.\u00a0 Read on.)<\/p>\n<p>Each call to <code>GC_START<\/code> must be matched by exactly one call to <code>GC_END<\/code>. To help check this, <code>GC_START<\/code> expands to some piece of code that contains an open curly bracket (<code>{<\/code>), and <code>GC_END<\/code> contains the matching closing curly bracket (<code>}<\/code>). Mismatches should be flagged by auto-indenting text editors.<\/p>\n<p>As a consequence, if your code is indented strangely, don&#8217;t dismiss that as a text formatting problem!\u00a0 It might be indicative of serious <code>GC_START<\/code>\/<code>GC_END<\/code> balancing bug.<\/p>\n<p>By the way, don&#8217;t try to be smart: call <code>GC_START<\/code> at the beginning of your function with the right number of entries,\u00a0<code>GC_END<\/code> at the end, and that&#8217;s it.<\/p>\n<p>And don&#8217;t use <code>return<\/code> in the middle of your code, either. For example, the following is wrong:<\/p>\n<pre>{\r\n  GC_START(gc_ctx, 3);\r\n  \/* do some stuff *\/\r\n  if (\/* somme error condition *\/)\r\n    return 1;\r\n  GC_END(gc_ctx);\r\n  return 0;\r\n}\r\n<\/pre>\n<p>This is wrong because the <code>return 1<\/code> instruction will fail to execute the code hidden inside the <code>GC_END<\/code> macro. Since this codes unlinks pointers stored on the stack, this will really cause havoc after <code>return<\/code>.<\/p>\n<hr \/>\n<p>Our latest attempt at writing <code>build_integer()<\/code> is still wrong. This time, <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\"><code>gc_check()<\/code><\/a> is more likely to spot the problem, but you&#8217;re safer if you don&#8217;t count on it.<\/p>\n<p>The problem is that the assignment <code>n-&gt;data = integer<\/code> may violate the invariant that no black object points directly to a white object. You should use <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\"><code>GC_TOUCH<\/code><\/a> to correct the problem, as follows:<\/p>\n<pre>node_expr_t *build_integer(rule_compiler_t *ctx, int i)\r\n{\r\n  ovm_var_t    *integer;\r\n  node_expr_term_t  *n;\r\n\r\n  GC_START(ctx-&gt;gc_ctx, 1);\r\n  integer = ovm_int_new(ctx-&gt;gc_ctx, i);\r\n  GC_UPDATE(ctx-&gt;gc_ctx, 0, integer); \/* protect integer *\/\r\n\r\n  n = (node_expr_term_t *) gc_alloc (ctx-&gt;gc_ctx, sizeof(node_expr_term_t),\r\n\t\t\t\t     &amp;node_expr_term_class);\r\n  n-&gt;type = NODE_CONST;\r\n  GC_TOUCH (ctx-&gt;gc_ctx, n-&gt;data = integer);\r\n\r\n  GC_END(ctx-&gt;gc_ctx);\r\n  return (node_expr_t *)n;\r\n}<\/pre>\n<p>In general, assignments of garbage-collectable objects should be enclosed in a call to <code>GC_TOUCH<\/code>. The style exemplified above is recommended. In particular, don&#8217;t use the (equivalent) idiom:<\/p>\n<pre>  n-&gt;data = integer; \/* suspicious assignement, may violate no-black-object-pointing-directly-to-white-object invariant *\/\r\n  GC_TOUCH (ctx-&gt;gc_ctx, n-&gt;data);<\/pre>\n<p>Code lines similar to <code>n-&gt;data = integer<\/code>, without an enclosing <code>GC_TOUCH<\/code>, should be seen as suspicious. You should always prefer the following style:<\/p>\n<pre>  GC_TOUCH (ctx-&gt;gc_ctx, n-&gt;data = integer);<\/pre>\n<p>which makes it clear that the `no black object pointing directly to a white object&#8217; invariant is satisfied.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>A final word (for today). One should protect data, not against <code>gc_alloc()<\/code>, as would seem to be the case looking at the above example, but against <em>any<\/em> function that may call the garbage collector.<\/p>\n<p>These include <code>gc_alloc()<\/code>, <code>gc()<\/code>, <code>gc_full()<\/code>, but also <code>gc_base_malloc()<\/code> and <code>gc_base_realloc()<\/code> (which do ordinary C-style memory allocation and reallocation, but also call the garbage collector to be sure they will have enough memory), and all functions that call them directly or indirectly.<\/p>\n<p>As an illustration, here is the actual code for <code>build_integer()<\/code>. The only difference with the above is that <code>build_integer()<\/code> also adds <code>integer<\/code> to a table of so-called static values, akin to Java&#8217;s constant pool, using <code>statics_add()<\/code>. The latter may need to allocate some memory, and in that case will do it using <code>gc_base_realloc()<\/code>. We therefore need to protect <code>n<\/code> itself. The safest way is to use <code>GC_UPDATE<\/code> again on it. Don&#8217;t forget to modify the call to <code>GC_START<\/code> so that it allocates an array of 2 elements, not 1, this time:<\/p>\n<pre>node_expr_t *build_integer(rule_compiler_t *ctx, int i)\r\n{\r\n  ovm_var_t    *integer;\r\n  node_expr_term_t  *n;\r\n\r\n  GC_START(ctx-&gt;gc_ctx, 2);\r\n  integer = ovm_int_new(ctx-&gt;gc_ctx, i);\r\n  GC_UPDATE(ctx-&gt;gc_ctx, 0, integer); \/* protect integer *\/\r\n\r\n  n = (node_expr_term_t *) gc_alloc (ctx-&gt;gc_ctx, sizeof(node_expr_term_t),\r\n\t\t\t\t     &amp;node_expr_term_class);\r\n  n-&gt;type = NODE_CONST;\r\n  GC_TOUCH (ctx-&gt;gc_ctx, n-&gt;data = integer);\r\n  n-&gt;res_id = ctx-&gt;statics_nb;\r\n  GC_UPDATE(ctx-&gt;gc_ctx, 1, n); \/* protect n *\/\r\n\r\n  statics_add(ctx, integer);\r\n  GC_END(ctx-&gt;gc_ctx);\r\n  return (node_expr_t *)n;\r\n}\r\n<\/pre>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>&nbsp;<\/p>\n<p>Well, I&#8217;ve lied. This is not quite the actual code to <code>build_integer()<\/code>. The actual code reuses slot 0 in the <code>GC_START <\/code>array: <code>integer<\/code> does not need to be stored there, since it will be protected by the <code>data<\/code> field from the protected <code>n<\/code> object.\u00a0 However, don&#8217;t try to be that smart. There are many ways to do smart things that will cause havoc with the garbage collector.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The purpose of this note is to explain how to write functions that allocate memory.\u00a0 One basic principle is that one should see the garbage collector as an adversary, which always tries to deallocate memory.\u00a0 You should protect your data against it.\u00a0 Here are a few examples.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-150","post","type-post","status-publish","format-standard","hentry","category-memory-management"],"_links":{"self":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/150","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=150"}],"version-history":[{"count":8,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/150\/revisions"}],"predecessor-version":[{"id":284,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/150\/revisions\/284"}],"wp:attachment":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=150"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=150"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=150"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}