{"id":1934,"date":"2019-04-12T12:52:21","date_gmt":"2019-04-12T16:52:21","guid":{"rendered":"http:\/\/codinggorilla.com\/?p=1934"},"modified":"2019-04-12T17:30:07","modified_gmt":"2019-04-12T21:30:07","slug":"rewriting-the-pattern-matching-engine-part-2","status":"publish","type":"post","link":"http:\/\/165.227.223.229\/index.php\/2019\/04\/12\/rewriting-the-pattern-matching-engine-part-2\/","title":{"rendered":"Rewriting the pattern matching engine &#8212; part 2"},"content":{"rendered":"\n<p>Patterns in Piggy are regular expressions of parenthesized expressions that represent trees. The conversion of the regular expressions to a finite state automaton is easy (<a href=\"http:\/\/www.cs.may.ie\/staff\/jpower\/Courses\/Previous\/parsing\/node5.html\">via Thompson&#8217;s Construction<\/a>), but the resulting automaton for the pattern may not work for an important reason: patterns that specify attributes and children of an AST node have an implicit &#8220;&#8230;&#8221; between each attribute or child pattern. While it&#8217;s possible to introduce additional states to recognize &#8220;&#8230;&#8221; between each attribute or child node in the pattern, the resulting automaton is ambiguous. For every input AST node, any attribute or child AST node can mismatch a &#8220;sub-pattern&#8221;, but can still match the complete pattern.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>To illustrate this issue, consider the following AST for an EnumDecl node, and the pattern &#8220;( EnumDecl ( EnumConstantDecl ) )&#8221;. The EnumDecl contains a FullComment child node and attributes for SrcLoc, Name, and Type.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">( EnumDecl SrcLoc=\"..\\src\\environ.h:286:1\"<br>     ( FullComment <br>       ( ParagraphComment <br>         ( TextComment Text=\" Simple search state variables \"<br>     ) ) )<br>     ( EnumConstantDecl Name=\"L_NOT_FOUND\" Type=\"int\"<br>       ( IntegerLiteral Type=\"int\" Value=\"0\"<br>     ) )<br>     ( EnumConstantDecl Name=\"L_FOUND\" Type=\"int\"<br>       ( IntegerLiteral Type=\"int\" Value=\"1\"<br>   ) ) )<\/pre>\n\n\n\n<p>This pattern should match this EnumDecl AST node, but it can&#8217;t if we use an automaton that looks for literally \u00e2\u20ac\u009c(&#8221; &#8220;EnumDecl&#8221; &#8220;(&#8221; &#8220;EnumConstantDecl&#8221; &#8220;)&#8221; &#8220;)\u00e2\u20ac\u009d in the token input stream. In fact, the automaton would fault on the input token &#8220;SrcLoc&#8221; because it would be expecting a &#8220;(&#8221; token. Even after removing the &#8220;SrcLoc=&#8230;.&#8221; the automaton would still fail because it would be looking for &#8220;EnumConstantDecl&#8221; after recognizing &#8220;(&#8220;, finding instead &#8220;FullComment&#8221;.<\/p>\n\n\n\n<p>In order to perform a lookahead and skip any attribute or child node not specified in the pattern, I extended the automaton with &#8220;subpatterns&#8221;, which are separate finite state automatons of the pattern used to match attributes or child nodes in a pattern.<\/p>\n\n\n\n<p>So, for &#8220;( EnumDecl ( EnumConstantDecl ) )&#8221;, an NFA is first constructed for the regular expression pattern. In this automaton, edges can be an input <g class=\"gr_ gr_97 gr-alert gr_gramm gr_inline_cards gr_run_anim Punctuation only-del replaceWithoutSep\" id=\"97\" data-gr-id=\"97\">symbol,<\/g> or a subpattern.<\/p>\n\n\n\n<p style=\"text-align:center\"><img class=\"wp-image-1973\" style=\"width: 150px;\" src=\"http:\/\/codinggorilla.com\/wordpress\/wp-content\/uploads\/2019\/04\/graphviz.png\" alt=\"\"><\/p>\n\n\n\n<p>A DFA-like automaton is constructed for the NFA using the usual epsilon-closure algorithm.<\/p>\n\n\n\n<p style=\"text-align:center\"><img class=\"wp-image-1972\" style=\"width: 150px;\" src=\"http:\/\/codinggorilla.com\/wordpress\/wp-content\/uploads\/2019\/04\/graphviz-1.png\" alt=\"\"><\/p>\n\n\n\n<p>When parsing the input, the sub-automaton is used to recognize an attribute or child node. If it does not match, the &#8220;Any&#8221; sub-automaton is used to recognized the input and the top-level automaton recognition restarted with the next input attribute or child node.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Patterns in Piggy are regular expressions of parenthesized expressions that represent trees. The conversion of the regular expressions to a finite state automaton is easy (via Thompson&#8217;s Construction), but the resulting automaton for the pattern may not work for an important reason: patterns that specify attributes and children of an AST node have an implicit &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/165.227.223.229\/index.php\/2019\/04\/12\/rewriting-the-pattern-matching-engine-part-2\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Rewriting the pattern matching engine &#8212; part 2&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[],"_links":{"self":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/1934"}],"collection":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/comments?post=1934"}],"version-history":[{"count":0,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/1934\/revisions"}],"wp:attachment":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/media?parent=1934"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/categories?post=1934"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/tags?post=1934"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}