{"id":203,"date":"2014-11-02T19:39:38","date_gmt":"2014-11-02T19:39:38","guid":{"rendered":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=203"},"modified":"2014-11-05T16:34:06","modified_gmt":"2014-11-05T16:34:06","slug":"type-punning-and-the-strict-aliasing-rule","status":"publish","type":"post","link":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=203","title":{"rendered":"Type-punning and the strict aliasing rule"},"content":{"rendered":"<p>The ANSI C standards were obviously largely influenced by compiler writers. This is a good thing: these are the guys would should know about the language. However, they sometimes enforce semantics assumptions that allow for very efficient optimizations. Ultimately, the main problem is that these assumptions are not or only partially checked by the compiler, but the compiler will still do as though these assumptions are true, and modify your code accordingly. I will concentrate on one of these assumptions here: the strict aliasing rule, and its pretty bad way of mixing with type-punning.<!--more--><\/p>\n<p><a title=\"Type-punning\" href=\"https:\/\/en.wikipedia.org\/wiki\/Type_punning\">Type-punning<\/a> is the practice of pretending an object of some type really is of some other type. This is accomplished in C by using casts or <code>union<\/code> types.<\/p>\n<p>Orchids relies <em>a lot<\/em> on casts between pointer types: there is some type-punning all over the place in its code! For example:<\/p>\n<ul>\n<li>In the Orchids virtual machine, all values that it handles are cast to the C type of pointers to <code>ovm_var_t<\/code>, but are actually pointers to objects of type <code>ovm_int_t<\/code>, or <code>ovm_str_t<\/code>, or <code>ovm_vstr_t<\/code>, or etc. The latter types are the member fields of the <code>union<\/code> type defining the C type <code>ovm_var_t<\/code>.<\/li>\n<li>In the Orchids parser, the nodes of the parse tree are of type <code>node_expr_t *<code>, but are actually pointers to objects of type <code>node_expr_binop_t<\/code>, <code>node_expr_sym_t<\/code>, etc. The latter types are not even designed as member fields of the type <code>node_expr_t<\/code>.<\/code><\/code><\/li>\n<li>All objects of a <a title=\"Garbage collection\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=18\">garbage-collectable<\/a> type (and that includes all of the above types) are pointers to objects of a <code>struct<\/code> type whose first field is <code>gc_header_t gc<\/code>; this way, the garbage collector knows it can cast all of these pointers to <code>gc_header_t *<\/code>, its basic data structure, and work on the latter.<\/li>\n<\/ul>\n<p>Any compiler for a langage with pointers must deal with the thorny issue of <em>pointer aliasing<\/em>. For example, if you write:<\/p>\n<pre>{\r\n  int n = 10;\r\n  int *p, *q;\r\n\r\n  p = &amp;n;\r\n  q = &amp;n;\r\n  *p = 29;\r\n}\r\n<\/pre>\n<p>then <code>p<\/code> and <code>q<\/code> are pointers which are <em>aliases<\/em>: they point to the same address, namely <code>&amp;n<\/code>. The final assignement to <code>*p<\/code> also changes <code>*q<\/code> and <code>n<\/code>.<\/p>\n<p>The dream of any compiler optimization writer would be to have a simple, efficient way of guaranteeing that a given pointer has no alias, or at least to find a short list of all pointers that may be aliases of it.<\/p>\n<p>A very simple trick that improves upon any alias detection technique in optimizers consists in realizing that pointers to objects of different types <em>cannot be aliases of each other<\/em>. However, while this would certainly be true in a strongly, statically typed language, this is obviously false in a language with casts.<\/p>\n<p>This is a problem: you cannot count on this trick to improve your C optimizer.<\/p>\n<p>Or can you? The ANSI C normalization committees have decided on the <em>strict aliasing<\/em> rule, which is basically to say that if you ever access the same data (read access, or write access) through two different pointers to different types, then the semantics of your program is undefined.\u00a0 Undefinedness is the standard trick of the ANSI C committee to say &#8220;you&#8217;ve been warned&#8221;.<\/p>\n<p>Compilers usually try to detect such undefined cases, and warn you about them.\u00a0 For example, <code>gcc<\/code> might tell you (in <code>-O2<\/code> mode) &#8220;type-punning to incomplete type might break strict aliasing rules&#8221;, or &#8220;type-punned pointer will break strict aliasing rules&#8221;.\u00a0 However, <code>gcc<\/code> also sometimes just remains silent and produces weird code&#8230;<\/p>\n<p>What is the effect of all this?<\/p>\n<p>For example, the <code>GC_START()<\/code>, <code>GC_UPDATE()<\/code> and <code>GC_END()<\/code> macros rely heavily on type-punning. I&#8217;ve once had a bug where, inside a <code>GC_START\/GC_END<\/code> section, I did some computations and stored the intermediate, garbage-collectable results by using <code>GC_UDPATE()<\/code>. This code failed miserably, with the following strange behavior:<\/p>\n<ul>\n<li>It core dumped, showing signs that all memory structures had been garbled (ASCII strings instead of memory addresses in pointers that the garbage collector was using), but <code>gc_check()<\/code> never found anything wrong until the bug actually fired<\/li>\n<li>The code worked perfectly if I did not use the <code>-O2<\/code> option, or even if I only used some moderate amount of optimization (<code>-O1<\/code>)<\/li>\n<li>After some modifications of the code, I realized that the pointer <code>gc_ctx-&gt;stack_data<\/code> had not been restored by <code>GC_END()<\/code> to the value it had before I did <code>GC_START()<\/code>; adding pointer equality checks into the code to check when this value changed suddenly spurred <code>gc<\/code> into complaining about type-punned pointers breaking the strict aliasing rule. Note that <code>gcc<\/code> complained about the bug-checking code, not on the actual function I was checking.<\/li>\n<\/ul>\n<p>I have no general solution to that kind of bug.\u00a0 Although the code of Orchids uses extensively that kind of type-punning in modules (the <code>REGISTER_EVENTS()<\/code> macro is a horrible case of type-punning), this seems to work at the moment.\u00a0 In the code mentioned above, I did things such as:<\/p>\n<pre>{\r\n\u00a0 varset_t *vs;\r\n\u00a0 GC_START(gc_ctx, 1);\r\n\r\n\u00a0 vs = &lt;whatever&gt;;\r\n\u00a0 GC_UPDATE(gc_ctx, 0, vs);<\/pre>\n<p>then I used both <code>vs<\/code> or <code>(varset_t *)GC_LOOKUP(0)<\/code> which, after all, are exactly the same object. The C compiler is entitled to consider that they are distinct pointers, since <code>vs<\/code> is a pointer to <code>varset_t<\/code> and <code>GC_LOOKUP(0)<\/code> is a pointer to <code>gc_header_t<\/code> (or to <code>void<\/code>, depending on versions). Therefore side effects to one pointed object are deemed to have no effect on the other one, and the C compiler will optimize away any instruction which, as a result, is deemed to have no effect on the final computed result.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ANSI C standards were obviously largely influenced by compiler writers. This is a good thing: these are the guys would should know about the language. However, they sometimes enforce semantics assumptions that allow for very efficient optimizations. Ultimately, the main problem is that these assumptions are not or only partially checked by the compiler, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-203","post","type-post","status-publish","format-standard","hentry","category-bugs-mean-bugs-and-c"],"_links":{"self":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=203"}],"version-history":[{"count":6,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/203\/revisions"}],"predecessor-version":[{"id":209,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/203\/revisions\/209"}],"wp:attachment":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}