{"id":170,"date":"2014-09-24T16:40:02","date_gmt":"2014-09-24T16:40:02","guid":{"rendered":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=170"},"modified":"2016-04-04T09:08:34","modified_gmt":"2016-04-04T09:08:34","slug":"extending-the-language","status":"publish","type":"post","link":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=170","title":{"rendered":"Extending the language"},"content":{"rendered":"<p>Imagine you want to add a new construction to the Orchids language.\u00a0 A simple way is to just add a <a title=\"Writing simple primitives\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=163\">new primitive<\/a>.\u00a0 However, in some cases this is not enough. Imagine Orchids did not have multiplication.\u00a0 We might add it as a primitive, and name it <code>mult<\/code> for example. However you would then be forced to write, say, <code>$x = mult($y, $z)<\/code> to multiply <code>$y<\/code> with <code>$z<\/code> in Orchids signatures instead of the cosier <code>$x = $y*$z<\/code>.<!--more--><\/p>\n<p>To allow for the latter kind of syntax, the first thing you should do is make sure you know how to write a primitive implementing multiplication, just like for <a title=\"Writing simple primitives\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=163\">simple primitives<\/a>. We shall use slightly different calling conventions (see <code>ovm_mul<\/code> later). For now, we assume that we have implemented multiplicaton through a function named <code>issdl_mul<\/code>, whose C prototype is:<\/p>\n<pre>ovm_var_t *issdl_mul(gc_t *gc_ctx, ovm_var_t *var1, ovm_var_t *var2)<\/pre>\n<p>This takes its arguments explicitly, as C parameters, not from the stack as Orchids primitives would. It also returns its result instead of pushing it onto the stack. Handling the stack will be done in the code of <code>ovm_mul<\/code>. For now, note that <code>ovm_mul<\/code> will have a slightly different calling mechanism than ordinary primitives.<\/p>\n<p>Do <em>not<\/em> register <code>issdl_mul<\/code> or <code>ovm_mul<\/code> as an Orchids primitive, that is, do not try to apply <code>register_lang_function<\/code> or to insert a row in the <code>issdl_function_g<\/code> table. That would add it as a primitive (named <code>mult<\/code>, say), and this is not what we want.<\/p>\n<p>Instead, we shall need to:<\/p>\n<ul>\n<li>Extend the lexer<\/li>\n<li>Extend the parser<\/li>\n<li>Extend the virtual machine<\/li>\n<li>Extend the rule compiler<\/li>\n<li>Extend the type-checker<\/li>\n<\/ul>\n<hr \/>\n<p><strong>Extending the lexer<\/strong><\/p>\n<p>Just add the following line to the lexer (<code>issdl.l<\/code>):<\/p>\n<pre>\"*\"\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 { return (O_TIMES); }\r\n<\/pre>\n<p>Of course this line is already present in the lexer&#8230;\u00a0 I am using a primitive that already exists as an example.\u00a0 In particular, if you ever wonder where you have to insert similar lines, just look where the above line lies in <code>issdl.l<\/code>.<\/p>\n<p>In the above line, we return a token <code>O_TIMES<\/code> to be used by the parser.\u00a0 This will be defined in <code>issdl.y<\/code>, see below.<\/p>\n<hr \/>\n<p><strong>Extending the parser<\/strong><\/p>\n<p>We declare\u00a0<code>O_TIMES<\/code> by one of the standard yacc declarations <code>%token<\/code>, <code>%left<\/code>, <code>%right<\/code>, or <code>%nonassoc <\/code>in <code>issdl.y<\/code>.\u00a0 (I&#8217;ll assume some familiarity with <a title=\"Yacc\" href=\"https:\/\/en.wikipedia.org\/wiki\/Yacc\">yacc<\/a> here.)<\/p>\n<p>Here we&#8217;ll add it as a left-associative operator with the same precedence as <code>\/<\/code> and <code>%<\/code>.\u00a0 Let me give you the surrounding lines as well.<\/p>\n<pre>%left O_PLUS O_PLUS_EVENT O_MINUS\r\n%left O_TIMES O_DIV O_MOD\r\n%right PLUSPLUS MINUSMINUS\r\n<\/pre>\n<p>This notably says that <code>*<\/code> (<code>O_TIMES<\/code>) binds tighter than <code>+<\/code> (<code>O_PLUS<\/code>) and slacker than <code>++<\/code> (<code>PLUSPLUS<\/code>).\u00a0 Please respect the principle of least surprise and use the same precedences than some known language, e.g., <a title=\"C operator precedence table\" href=\"https:\/\/www.difranco.net\/compsci\/C_Operator_Precedence_Table.htm\">those of C<\/a>.<\/p>\n<p>Then we add a new production to the grammar, or enrich a pre-existing production.\u00a0 Here we wish products to be expressions.\u00a0 We therefore add the following clause to the <code>expr<\/code> production:<\/p>\n<pre>expr:\r\n     ...\r\n  | expr O_TIMES expr\r\n\u00a0   { RESULT($$, build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3)); }\r\n     ...<\/pre>\n<p>The code between the curly brackets builds a piece of the abstract syntax tree that will later be compiled to bytecode.<\/p>\n<ul>\n<ul>\n<li><code>compiler_ctx_g<\/code> is the compiler&#8217;s global context.\u00a0 Despite what the <code>_g<\/code> ending says, this is not a global variable; rather, it is a macro, which gets the compiler context from the global Orchids context <code>ctx<\/code>.<\/li>\n<li><code>build_expr_binop<\/code> builds a binary expression node, tagged with the constant <code>OP_MUL<\/code>, and with two arguments <code>$1<\/code>, <code>$3<\/code> denoting the two expressions that we are multiplying. <code>OP_MUL<\/code> is the bytecode that the Orchids virtual machine uses to implement multiplication, and is defined in <code>ovm.h <\/code>(see below).\u00a0 The function <code>build_expr_binop<\/code> is used to build abstract syntax tree nodes for constructions that take two arguments and return a value. For unary constructions, use <code>build_expr_monop<\/code>. For binary constructions that return a Boolean value, meant to be tested by <code>if<\/code> tests, say, use <code>build_expr_cond<\/code> instead (e.g, <code>&lt;=<\/code> or <code>&amp;&amp;<\/code>).\u00a0 Other examples can be found in the grammar file <code>issdl.y<\/code>.<\/li>\n<li>The resulting abstract syntax node should be stored into <code>$$<\/code>. Instead of writing <code>$$ = build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3)<\/code>, please use the <code>RESULT <\/code>macro. This not only takes care of calling <code>GC_TOUCH<\/code>, but will also push <code>$$<\/code> onto a stack (<code>ctx-&gt;protected<\/code>, where <code>ctx<\/code> is the rule compiler context) that the garbage collector will go though to mark objects. This makes sure that none of these newly created nodes will be freed by the garbage collector. You may also use <code>RESULT_DROP<\/code>, instead of <code>RESULT<\/code>, to first pop that stack to a predefined level, then push the result. A typical example is the grammar production:\n<pre>rule:\r\n  rulekw SYMNAME synchro O_BRACE state states C_BRACE\r\n  { RESULT_DROP($$,$1, build_rule(compiler_ctx_g, &amp;($2), $5, $6, $3)); }\r\n;\r\n<\/pre>\n<p>which first pops the\u00a0<code>ctx-&gt;protected<\/code> stack to the level given by <code>$1<\/code>, then pushes the result of build_rule.\u00a0 The right level is obtained by calling <code>COMPILE_GC_DEPTH<\/code> on recognizing the first token (<code>RULE<\/code>):<\/p>\n<pre>rulekw : RULE { $$ = COMPILE_GC_DEPTH(compiler_ctx_g); }\r\n;<\/pre>\n<p>We did not do that for multiplication, because it would be impractical to obtain the value of <code>COMPILE_GC_DEPTH<\/code> before we even recognize the first <code>expr<\/code> given as multiplicand. In that case, the <code>ctx-&gt;protected<\/code> stack will probably be slightly larger than needed, but that should not cause much of a problem.<\/li>\n<\/ul>\n<\/ul>\n<hr \/>\n<p><strong>Extending the virtual machine<br \/>\n<\/strong><\/p>\n<p>That mostly means adding one bytecode in our example, namely <code>OP_MUL<\/code>.\u00a0 Add it at the end of the list of macros of the form <code>OP_<\/code>&#8230; in <code>ovm.h<\/code>.\u00a0 Make sure that the total number of opcodes is updated.\u00a0 At the time of writing, the last opcode becomes number 36, so we must write:<\/p>\n<pre>#define OPCODE_NUM 37<\/pre>\n<p>Now you need to write the code for the <code>OP_MUL<\/code> opcode:<\/p>\n<pre>static int ovm_mul(isn_param_t *param)\r\n{\r\n\u00a0 orchids_t *ctx = param-&gt;ctx;\r\n\u00a0 ovm_var_t *op1;\r\n\u00a0 ovm_var_t *op2;\r\n\u00a0 ovm_var_t *res = NULL;\r\n\r\n\u00a0 DebugLog(DF_OVM, DS_DEBUG, \"OP_MUL\\n\");\r\n\u00a0 op2 = (ovm_var_t *)STACK_ELT(ctx-&gt;ovm_stack, 1);\r\n\u00a0 op1 = (ovm_var_t *)STACK_ELT(ctx-&gt;ovm_stack, 2);\r\n\u00a0 param-&gt;ip += 1;\r\n\u00a0 if (!IS_NULL(op1) &amp;&amp; !IS_NULL(op2))\r\n\u00a0\u00a0\u00a0 res = issdl_mul(ctx-&gt;gc_ctx, op1, op2);\r\n\u00a0 STACK_DROP(ctx-&gt;ovm_stack, 2);\r\n\u00a0 PUSH_VALUE(ctx, res);\r\n\u00a0 return 0;\r\n}<\/pre>\n<p>As you can see, <code>ovm_mul<\/code> takes a unique parameter of type <code>isn_param_t *<\/code> as argument. This is a pointer to a struct that contains all the relevant information: the Orchids context <code>param-&gt;ctx<\/code> (from which we obtain the virtual machine&#8217;s stack, and then the two values to multiply) and the program counter <code>param-&gt;ip<\/code>, which we will have to increment.<\/p>\n<hr \/>\n<p><strong>Extending the rule compiler<\/strong><\/p>\n<p>We do not have to do anything here!\u00a0 Since we have already registered our new construction as building an abstract syntax tree through <code>build_expr_binop<\/code>, the rule compiler already knows how to compile this with the opcode given (<code>OP_MUL<\/code>).<\/p>\n<p>We would be in an equally cosy situation with <code>build_expr_monop<\/code>, or <code>build_expr_cond<\/code>.\u00a0 Exploring the various other grammar rules, you will see that various other node forming constructs are provided, for various situations.<\/p>\n<hr \/>\n<p><strong>Extending the type-checker<\/strong><\/p>\n<p>You need to instruct Orchids about the type(s) of your new construct.\u00a0 Contrarily to <a title=\"Writing simple primitives\" href=\"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/?p=163\">primitives<\/a>, there is no standardized way of doing that.\u00a0 The best advice I can tell you is: find an already existing primitive that looks similar to yours, and type it in a similar way.<\/p>\n<p>In our example, since we have added a binary operator, we should look at the <code>compute_stype_binop()<\/code> function in <code>rule_compiler.c<\/code>. There is a <code>switch<\/code> on the binary operator, and we add a new branch:<\/p>\n<pre>    case OP_MUL:\r\n      restype = stype_join (ltype, rtype);\r\n      if (restype==&amp;t_any)\r\n        goto type_error;\r\n      if (strcmp(restype-&gt;name, \"int\") &amp;&amp;\r\n          strcmp(restype-&gt;name, \"uint\") &amp;&amp;\r\n          strcmp(restype-&gt;name, \"float\"))\r\n        goto type_error;\r\n      set_type (ctx, myself, restype);\r\n      return;\r\n<\/pre>\n<p>This reads as follows. We first call the <code>stype_join<\/code>\u00a0function to compute the least common super types of the types <code>ltype<\/code> and <code>rtype<\/code> of the two arguments. (A super type is a type that contains a superset of values. For example, <code>db[int, uint]<\/code> is a super type of the type <code>db[*]<\/code>, the latter containing only the empty database.) This makes sure that we know a common type that the two arguments inhabit.<\/p>\n<p>Next, we compare the common type to one of the allowed types, <code>int<\/code>, <code>uint<\/code>, or <code>float<\/code>. Indeed <code>ovm_mul<\/code> calls <code>issdl_mul<\/code>, which will call a function pointer, depending on the runtime type of the arguments, and is able to multiply signed and unsigned integers, and floating-point values.<\/p>\n<p>Finally, if we have passed all tests, we call <code>set_type()<\/code> to set the type of the current expression, <code>myself<\/code>, to the common type. This does not really set the type of the expression, rather it pushes the task of doing so onto a queue, but that does not really matter.<\/p>\n<hr \/>\n<p><strong>Extending the static analyzer<\/strong><\/p>\n<p>We should also update the static analyzer. This means essentially the monotonicity checker.<\/p>\n<p>In our example, since we have added a binary operator, we should look at the <code>compute_monotony_bin()<\/code> function in <code>rule_compiler.c<\/code>. There is a <code>switch<\/code> on the binary operator, and we add a new branch:<\/p>\n<pre>    case OP_MUL:\r\n      if (ma==MONO_CONST &amp;&amp; mb==MONO_CONST)\r\n\tres = MONO_CONST;\r\n      break;\r\n<\/pre>\n<p>This means that <code>OP_MUL<\/code> is not particularly monotonic. We are just saying that if both arguments are constant, then the result is constant.<\/p>\n<p>That seems to be the absolute minimum, but that is not true. For example, a function that returns a new number each time it is called, or a construction that reads from a file, or writes to a file and returns an error code, should return <code>MONO_UNKNOWN<\/code>, not <code>MONO_CONST<\/code>.<\/p>\n<p>On the opposite, monotonic constructions such as addition are handled as follows:<\/p>\n<pre>    case OP_ADD: \/* + is meant to be monotonic, whatever the types\r\n\t\t    of the arguments *\/\r\n      \/* MONO+MONO=MONO, ANTI+ANTI=ANTI *\/\r\n      if ((ma &amp; MONO_MONO) &amp;&amp; (mb &amp; MONO_MONO))\r\n\tres |= MONO_MONO;\r\n      if ((ma &amp; MONO_ANTI) &amp;&amp; (mb &amp; MONO_ANTI))\r\n\tres |= MONO_ANTI;\r\n      break;\r\n<\/pre>\n<p>knowing that <code>res<\/code> is initialized to <code>MONO_UNKNOWN<\/code>, that is, to <code>0<\/code>. Also, it may be that the two arguments satisfy the two <code>if<\/code> tests. This happens when <code>ma<\/code> and <code>mb<\/code> are equal to <code>MONO_CONST<\/code>, which is just <code>MONO_MONO | MONO_ANTI<\/code>. In that case, <code>res<\/code> will end up containing <code>MONO_CONST<\/code>, which just says that the sum of two constant arguments is constant.<\/p>\n<p>As Orchids develops, the number of static analyzers grows. \u00a0How do you know which analyzers you should modify? \u00a0The simplest recipe is as follows. \u00a0Find a common byte code that already exists, say, \u00a0<span style=\"font-family: Consolas, Monaco, 'Lucida Console', monospace;\"><span style=\"font-size: 12px;\">OP_SUB<\/span><\/span>, and take one that would be the most like the one you want to add. \u00a0Then, look for all the places in the code where that is used, by typing\u00a0<code>grep OP_SUB src\/*.[ch] src\/modules\/*.[ch]<\/code>, and add your own code after each of the relevant lines that you have just\u00a0found.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Imagine you want to add a new construction to the Orchids language.\u00a0 A simple way is to just add a new primitive.\u00a0 However, in some cases this is not enough. Imagine Orchids did not have multiplication.\u00a0 We might add it as a primitive, and name it mult for example. However you would then be forced [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-170","post","type-post","status-publish","format-standard","hentry","category-virtual-machine"],"_links":{"self":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/170","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=170"}],"version-history":[{"count":24,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/170\/revisions"}],"predecessor-version":[{"id":299,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=\/wp\/v2\/posts\/170\/revisions\/299"}],"wp:attachment":[{"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=170"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/projects.lsv.ens-paris-saclay.fr\/orchidsdev\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}