{"id":238,"date":"2015-05-25T18:06:32","date_gmt":"2015-05-25T18:06:32","guid":{"rendered":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=238"},"modified":"2015-07-13T11:26:20","modified_gmt":"2015-07-13T11:26:20","slug":"thread-local-imperative-data-structures","status":"publish","type":"post","link":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=238","title":{"rendered":"Thread-local imperative data structures"},"content":{"rendered":"<p>All primitives and all constructions of the Orchids language must behave in a thread-local way (except for a few special-purpose features). \u00a0This is incompatible with the way some libraries work, such as <code>libxml2<\/code>. We explain the problem, and a standard solution: thread-local objects. We illustrate this on the <code>mod_xml<\/code> module.<!--more--><\/p>\n<h3>The problem<\/h3>\n<p>The <code>mod_xml<\/code> module offers a few Orchids primitives that manipulate in-memory XML documents, notably <code>xml_get_str<\/code> (takes an XML document, an XPath, and an attribute name, and returns the value of the attribute as a string) and <code>xml_set_str<\/code> (takes an XML document, an XPath, an attribute name, and a string, and sets the attribute to the given string).<\/p>\n<p>Consider the following toy example of Orchids code:<\/p>\n<pre>state p {\r\n  $doc = idmef_new_alert(); \/* new IDMEF XML document *\/\r\n  expect (...) goto ipv4_state;\r\n  expect (...) goto ipv6_state;\r\n}\r\n\r\nstate ipv4_state {\r\n  xml_set_str ($doc, \"\/*\/idmef:Alert\/idmef:Source\/idmef:Node\/idmef:Address\",\r\n               \"category\", \"ipv4addr\");\r\n  expect (...) goto alert;\r\n}\r\n\r\nstate ipv6_state {\r\n  xml_set_str ($doc, \"\/*\/idmef:Alert\/idmef:Source\/idmef:Node\/idmef:Address\",\r\n               \"category\", \"ipv6addr\");\r\n  expect (...) goto alert;\r\n}\r\n\r\nstate alert {\r\n  idmef_write_alert ($doc);\r\n}\r\n<\/pre>\n<p>This is a typical way of writing part of a signature that, in state <code>p<\/code>, knows it will have something to report, hence creates an IDMEF alert document <code>$doc<\/code>; but does not know yet the value of some of the attributes.<\/p>\n<p>Depending on some conditions (abbreviated as <code>...<\/code>) it will set the <code>\"category\"<\/code> attribute to <code>\"ipv4addr\"<\/code> or to <code>\"ipv6addr\"<\/code>, then waits for additional conditions to happen and go to the <code>alert<\/code> state, finally emitting the IDMEF alert document.<\/p>\n<p>Note the two <code>expect<\/code> clauses in state <code>p<\/code> will create two OrchIDS threads, call them\u00a0<em>T1<\/em> and <em>T2<\/em>. Each one waits for some specific form of event before it goes to <code>ipv4_state<\/code> or to <code>ipv6_state<\/code>.<\/p>\n<p>Now imagine XML documents were handled in the obvious way: <code>idmef_new_alert<\/code> creates an IDMEF conformant XML document, <code>xml_set_str<\/code> modifies it. That would be wrong, because the document would be shared between <code>T1<\/code> and <code>T2<\/code>, and any modification by one of the threads would change the other thread&#8217;s document.<\/p>\n<p>To make that clear, imagine a sequence of events numbered, say, from 1 to 100. At event 1, we are in state <code>p<\/code> and the two threads <em>T1<\/em> and <em>T2<\/em> are created.<\/p>\n<p>At event 2, <em>T1<\/em> finds an event that satisfies the condition of its <code>expect<\/code> clause, goes to state <code>ipv4_address<\/code>, and sets the <code>\"category\"<\/code> attribute to <code>\"ipv4addr\"<\/code>. It then waits on another event that would enable it to go to state <code>alert<\/code>; let&#8217;s say this happens for the first time at event 100.<\/p>\n<p>Meanwhile, <em>T2<\/em> waits for an event that would match its own <code>expect<\/code> clause, finds it at event 3. Imagine, for the purpose of illustration, that <em>T2<\/em> never gets the opportunity of going to state <code>alert<\/code>, so only one IDMEF alert will be reported, by thread <em>T1<\/em>, at event 100.<\/p>\n<p>One would expect this alert to report an alert with the <code>\"category\"<\/code> attribute equal to <code>\"ipv4addr\"<\/code>, since\u00a0<em>T1<\/em>\u00a0is responsible for it. However, the IDMEF document will have been clobbered by thread <em>T2<\/em> at event 3, and <em>T1<\/em> will erroneously report a <code>\"category\"<\/code> attribute equal to <code>\"ipv6addr\"<\/code>.<\/p>\n<p>The problem is that our naive implementation of XML documents is not <em>thread-local<\/em>. All data structures handled by the OrchIDS virtual machine should be thread-local, namely: modifications done by one thread should be invisible from any other thread. (There are exceptions to this rule&#8230; starting with the <code>mod_sharedvars<\/code> module, which aims at making two threads communicate through shared values.)<\/p>\n<h3>Applicative data structures<\/h3>\n<p>The ideal cure for that kind of problem would be to implement an API of <em>applicative<\/em> XML documents: instead of modifying XML documents, one would create new XML documents from old and never actually modify any. This is what was done with the database objects that the OrchIDS language offers: instead of adding a record <em>R<\/em> to a database <em>D<\/em>, we build the union of <em>R<\/em> with <em>D<\/em>, and store the new database into a variable.<\/p>\n<p>For some data structures, this would be impractical. The case of XML documents is a typical case. We handle XML documents by calling the <code>libxml2<\/code> library, and it is not applicative.<\/p>\n<p>How do we make non-applicative data structures thread-local?<\/p>\n<h3>Thread-local objects<\/h3>\n<p>A good solution is provided by the notion of <em>thread-local<\/em> objects, namely objects that OrchIDS will copy each time it creates a new thread. \u00a0This way, each thread will have its own copy of the object.<\/p>\n<p>You may object that copying all the time will be costly. \u00a0This is why most thread-local objects will be\u00a0<em>copy-on-write<\/em>: they won&#8217;t actually be copied, until we try to write into them. \u00a0I said &#8220;most&#8221;, not &#8220;all&#8221;. \u00a0Currently, OrchIDS only allows for 32 copy-on-write objects per thread. This should be sufficient for most envisioned applications: the only thread-local objects one usually needs are XML documents (IDMEF alerts, IODEF reports, prelude alerts), and one usually needs only one of each in a given rule.\u00a0 The other thread-local objects will actually be copied,\u00a0in an inefficient way\u2014however, we have just argued that there should hardly ever be any such inefficient thread-local object.<\/p>\n<p>Roughly, encapsulation works as follows. Each thread-local object appears as a\u00a0<em>handle<\/em>, namely, an unsigned integer, of type <code>uint16_t<\/code>. For each handle <em>k<\/em>, the value that the object has in the current thread is to be found is stored in a fake variable (a variable that the user has no access to), of number <code>HANDLE_FAKE_VAR(<\/code><em>k<\/em><code>)<\/code>. While user variables have low numbers 0, 1, 2, etc., those fake variables have high numbers <code>ULONG_MAX<\/code>, <code>ULONG_MAX-1<\/code>, etc.<\/p>\n<p>To create a new thread-local object with value <code>val, we call:<\/code><\/p>\n<pre>uint16_t create_fresh_handle (gc_t *gc_ctx, state_instance_t *si,\r\n                              ovm_var_t *val);\r\n<\/pre>\n<p>If the number returned is strictly less than <code>MAX_HANDLES<\/code> (=32), then this object will be copy-on-write. Otherwise, it will be copied each time a new thread (of type <code>state_instance_t<\/code>) is created by the OrchIDS engine (in <code>engine.c<\/code>).<\/p>\n<p>We can access the actual object referred to by the handle, by using one of the following two functions.<\/p>\n<pre>ovm_var_t *handle_get_rd (gc_t *gc_ctx, state_instance_t *si, uint16_t k);\r\n<\/pre>\n<p>returns the local copy of the object <code>val<\/code> referred to by handle <code>k This is very fast (just one variable lookup). However, by calling this function, you promise not to make any modification to the structure pointed to by the returned pointer: it may be shared with other threads. If you cannot hold that promise, you must use the following function.<\/code><\/p>\n<pre>ovm_var_t *handle_get_wr (gc_t *gc_ctx, state_instance_t *si, uint16_t k);\r\n<\/pre>\n<p>returns a private copy of the non-applicative data stored in the thread-local object <code>tl<\/code>. The copy is guaranteed to be used only with thread <code>si<\/code>. In fact, <code>handle_get_wr()<\/code> works by checking whether <code>si<\/code> <em>owns<\/em> the object behind the handle. If it does, then the object is returned directly, just as if we had used <code>handle_get_rd()<\/code>. Otherwise, <code>handle_get_wr()<\/code> copies the object, stores it as the value of fake variable <code>HANDLE_FAKE_VAR(k)<\/code>, and declares that it owns the new object.<\/p>\n<p>This is less efficient than calling <code>handle_get_rd()<\/code>, but it is safe.<\/p>\n<p>Copies are done by using the <code>issdl_clone()<\/code> function. XML documents, for instance, are (handles to) OrchIDS values of type <code>T_EXTERN<\/code>; that is, they are really pointers to objects of type <a title=\"The ovm_extern_t type\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=143\"><code>ovm_extern_t<\/code><\/a>. Every such object must provide function pointers that copy and that free the embedded objects, and that is how OrchIDS knows how to do copies.<\/p>\n<p>Finally, the following function is used to free the thread-local object, namely the local copy held by thread <code>si<\/code>, and also the handle <code>k<\/code>, for further reuse.<\/p>\n<pre>int release_handle (gc_t *gc_ctx, state_instance_t *si, uint16_t k);\r\n<\/pre>\n<p>Currently, <code>release_handle()<\/code> is not used in OrchIDS. That would mean, for example, allowing the user to manually call a function that releases a handle, causing possible run-time errors (undefined object). Since handles are meant to be created in very low numbers, and are anyway releases when threads terminate, it is probably not worth the trouble to provide a way to manually release them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>All primitives and all constructions of the Orchids language must behave in a thread-local way (except for a few special-purpose features). \u00a0This is incompatible with the way some libraries work, such as libxml2. We explain the problem, and a standard solution: thread-local objects. We illustrate this on the mod_xml module.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-238","post","type-post","status-publish","format-standard","hentry","category-data-types"],"_links":{"self":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/238","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=238"}],"version-history":[{"count":11,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/238\/revisions"}],"predecessor-version":[{"id":279,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/238\/revisions\/279"}],"wp:attachment":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=238"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=238"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=238"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}