{"id":1824,"date":"2019-02-27T21:14:30","date_gmt":"2019-02-28T02:14:30","guid":{"rendered":"http:\/\/codinggorilla.com\/?p=1824"},"modified":"2019-03-13T20:03:14","modified_gmt":"2019-03-14T00:03:14","slug":"program-transformational-systems-coccinelle","status":"publish","type":"post","link":"http:\/\/165.227.223.229\/index.php\/2019\/02\/27\/program-transformational-systems-coccinelle\/","title":{"rendered":"Series on program transformation systems: Coccinelle (2006)"},"content":{"rendered":"\n<p>Due to my work on Piggy, I&#8217;m starting to do a thorough review of the literature on program transformation systems, how Piggy relates to prior research, and what improvements I can make to Piggy. Note, a good list to start from is in <a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_program_transformation_systems\">Wikipedia<\/a>: ASF+SDF, CIL (for C), Coccinelle (for C), DMS, Fermat, Spoon (for Java), Stratego\/XT, TXL). This is the first entry in the series, on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coccinelle_(software)\">Coccinelle<\/a>, a system that modifies <g class=\"gr_ gr_5 gr-alert gr_tiny gr_gramm gr_inline_cards gr_run_anim Grammar only-ins replaceWithoutSep\" id=\"5\" data-gr-id=\"5\">C<\/g> source code.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Coccinelle has its roots in the analysis of software for detecting bugs in device drivers in Linux [Padioleau, Lawall, Muller 2006]. In this work, the authors were trying to solve the problem of taking a bug fix of some code and applying the fix across the entire source code base. <g class=\"gr_ gr_9 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace\" id=\"9\" data-gr-id=\"9\">Coccinelle<\/g> represents the program as a control-flow graph (CFG), and a language used to make transformations, <a href=\"http:\/\/coccinelle.lip6.fr\/docs\/main_grammar.pdf\">SmPL<\/a>, which represents computational tree logic (CTL). The pattern language uses a syntax that specifies the pattern and fix for source code transformations together. Patterns are written in C language, and the replacement code with &#8220;+&#8221; and &#8220;-&#8220;. So, patterns should be easy to construct. Note, Coccinelle does not guarantee semantics are preserved [Jones, Hansen 2007], but this isn&#8217;t usually a problem in practice. It also represents the C program prior to preprocessing in order to preserve the original look of the program.<\/p>\n\n\n\n<p>Taking a closer look at their example, the bug is in <a href=\"https:\/\/git.kernel.org\/pub\/scm\/linux\/kernel\/git\/torvalds\/linux.git\/tree\/drivers\/rtc\/rtc-pcf8563.c?id=57d54889cd00db2752994b389ba714138652e60c#n233\">pcf8563_remove<\/a>(), where the function fails to clear the client data before calling <g class=\"gr_ gr_4 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace\" id=\"4\" data-gr-id=\"4\">kfree<\/g>(). This is a dangling pointer, which is usually fixed by just clearing the buffer that contains the pointer so that it is never used. Patch can&#8217;t be used because the format of the code may interfere with the problem being recognized. Further, it doesn&#8217;t allow one to express missing function calls and various ways of writing the code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    static int pcf8563_remove(struct i2c_client *client)\n    {\n        struct pcf8563 *pcf8563 = i2c_get_clientdata(client);\n\n        if (pcf8563->rtc)\n            rtc_device_unregister(pcf8563->rtc);\n\n        kfree(pcf8563);\n\n        return 0;\n    }<\/code><\/pre>\n\n\n\n<p>The pattern to correct the problem detects whether there isn&#8217;t a call to i2c_set_clientdata() before the kfree(). If the pattern is detected, the call to  <br>i2c_set_clientdata () is inserted.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>    @@\n    type T;\n    identifier client, data;\n    @@\n\n    \/\/ Check if function uses clientdata\n    (\n        i2c_set_clientdata(client, data);\n    |\n        data = i2c_get_clientdata(client);\n    |\n        T data = i2c_get_clientdata(client);\n    )\n        \/\/ Anything in between is OK\n        ...\n    (\n        \/\/ If this pattern is found, clientdata is set to NULL before data is freed.\n        \/\/ Do nothing and skip the rest of the alternation\n        i2c_set_clientdata(client, NULL);\n        ...\n        kfree(data);\n    |\n        \/\/ If this pattern is found, clientdata is set to NULL after data is freed.\n        \/\/ Move it to the front and skip the rest of the alternation\n    +\ti2c_set_clientdata(client, NULL);\n        kfree(data);\n        ...\n    -\ti2c_set_clientdata(client, NULL);\n    |\n        \/\/ Otherwise apply a fix if kfree() has been found in some code path\n        \/\/ (doesn't need to be in all paths).\n    +\ti2c_set_clientdata(client, NULL);\n    ? \tkfree(data);\n    )<\/code><\/pre>\n\n\n\n<h2>Observations and notes<\/h2>\n\n\n\n<ul><li> Coccinelle uses a syntax that intermingles the pattern and the fix, which is also how Piggy patterns are specified. But, pattern and fix seem to be only for C code. Piggy&#8217;s program transformation language is for ASTs, not CTL. Coccinelle uses an ellipsis (&#8220;&#8230;&#8221;) to denote control flow patterns where arbitrary code may appear. Piggy does not have the equivalent, because it is assumed that all AST nodes can have arbitrary content between siblings in an AST. However, Piggy uses Kleene star to denote items at an arbitrary depth in the subtree. Coccinelle&#8217;s language, SmPL, is large and complicated. There is little explanation of many constructs, e.g., what <em>dep &amp;&amp; dep<\/em> means within the metavariable section. Indeed, they remark in one paper the value in not adding features for every damn thing.<\/li><\/ul>\n\n\n\n<ul><li>Coccinelle declares sub-expressions using <em>metavariables<\/em> within the &#8220;@@ &#8230; @@&#8221; metavariable sections. Variables can be used in the replacement code. Piggy doesn&#8217;t provide that, but provides access to the AST through the <em>Tree<\/em> type, and allows one to specify any number of C# fields in the generated class. <\/li><\/ul>\n\n\n\n<ul><li>Coccinelle alters the source code and provides next-in-chain pattern matching. Piggy currently does not provide a clear path to alter the AST.<\/li><\/ul>\n\n\n\n<ul><li>Coccinelle is written in OCAML.<\/li><\/ul>\n\n\n\n<ul><li>Coccinelle parses source code using a hand-written, top-down parser, then performs data flow analysis to derive a CTL. The system performs pattern matching on this graph, then converts it back to source code. It looks like it can handle #define&#8217;s but this was only added in the last few months.<\/li><\/ul>\n\n\n\n<ul><li>The source file <g class=\"gr_ gr_45 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling ins-del multiReplace\" id=\"45\" data-gr-id=\"45\"><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/master\/drivers\/rtc\/rtc-pcf8563.c\">rtc<\/a><\/g><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/master\/drivers\/rtc\/rtc-pcf8563.c\">-pcf8563.c<\/a> no longer contains pcf8563_remove().<\/li><\/ul>\n\n\n\n<ul><li>I really like the way Coccinelle allows you to use C source for the patterns. It makes sense to incorporate in Piggy C source code patterns in the language.<\/li><\/ul>\n\n\n\n<h2>References<\/h2>\n\n\n\n<p>Padioleau Y, Lawall JL, Muller G. Understanding collateral evolution in Linux device drivers. In ACM SIGOPS Operating Systems Review 2006 Apr 18 (Vol. 40, No. 4, pp. 59-71). ACM.  (<a href=\"https:\/\/dl.acm.org\/citation.cfm?id=1217942\">DOI<\/a>, <a href=\"https:\/\/hal.inria.fr\/file\/index\/docid\/70251\/filename\/RR-5769.pdf\">pdf<\/a>, <a href=\"https:\/\/scholar.google.com\/scholar?hl=en&amp;as_sdt=0%2C22&amp;q=Understanding+collateral+evolution+in+Linux+device+drivers.&amp;btnG=\">GS<\/a>)<\/p>\n\n\n\n<p>Jones, N.D. and Hansen, R.R., 2007, November. The semantics of \u00e2\u20ac\u009csemantic patches\u00e2\u20ac\u009d in <g class=\"gr_ gr_6 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling\" id=\"6\" data-gr-id=\"6\"><g class=\"gr_ gr_6 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling\" id=\"6\" data-gr-id=\"6\">coccinelle<\/g><\/g>: Program transformation for the working programmer. In&nbsp;<em><g class=\"gr_ gr_7 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins replaceWithoutSep\" id=\"7\" data-gr-id=\"7\">Asian<\/g> Symposium on Programming Languages and Systems<\/em> (pp. 303-318). Springer, Berlin, Heidelberg. (<a href=\"https:\/\/link.springer.com\/chapter\/10.1007\/978-3-540-76637-7_21\">DOI<\/a>, <a href=\"https:\/\/www.researchgate.net\/profile\/Neil_Jones3\/publication\/221323392_The_Semantics_of_Semantic_Patches_in_Coccinelle_Program_Transformation_for_the_Working_Programmer\/links\/00b4951922ad1171ac000000\/The-Semantics-of-Semantic-Patches-in-Coccinelle-Program-Transformation-for-the-Working-Programmer.pdf\">pdf<\/a>,<a href=\"https:\/\/scholar.google.com\/scholar?hl=en&amp;as_sdt=0%2C22&amp;q=The+semantics+of+%E2%80%9Csemantic+patches%E2%80%9D+in+coccinelle%3A+Program+transformation+for+the+working+programmer.&amp;btnG=\">GS<\/a>)<\/p>\n\n\n\n<h2>Additional Reading<\/h2>\n\n\n\n<p>Stuart, H., 2008. Hunting bugs with Coccinelle. <em>Master&#8217;s Thesis<\/em>. (<a href=\"http:\/\/web.imt-atlantique.fr\/x-info\/coccinelle\/stuart_thesis.pdf\">pdf<\/a>)<\/p>\n\n\n\n<p><a href=\"http:\/\/coccinellery.org\/\">http:\/\/coccinellery.org\/<\/a>, accessed Feb 27, 2019.<\/p>\n\n\n\n<p><a href=\"http:\/\/coccinelle.lip6.fr\/\">http:\/\/coccinelle.lip6.fr\/<\/a>, accessed Feb 27, 2019.<\/p>\n\n\n\n<p><a href=\"https:\/\/lwn.net\/Articles\/380835\/\">https:\/\/lwn.net\/Articles\/380835\/<\/a>, accessed Feb 27, 2019.<\/p>\n\n\n\n<p>Lawall, J. and Muller, G., 2018. Coccinelle: 10 years of automated evolution in the Linux kernel. In <em>2018 {USENIX} Annual Technical Conference ({USENIX}{ATC} 18)<\/em> (pp. 601-614). (<a href=\"https:\/\/www.usenix.org\/system\/files\/conference\/atc18\/atc18-lawall.pdf\">pdf<\/a>)<\/p>\n\n\n\n<h2>Open Source (GPLv2) at Github<\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/coccinelle\/coccinelle\">https:\/\/github.com\/coccinelle\/coccinelle<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Due to my work on Piggy, I&#8217;m starting to do a thorough review of the literature on program transformation systems, how Piggy relates to prior research, and what improvements I can make to Piggy. Note, a good list to start from is in Wikipedia: ASF+SDF, CIL (for C), Coccinelle (for C), DMS, Fermat, Spoon (for &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/165.227.223.229\/index.php\/2019\/02\/27\/program-transformational-systems-coccinelle\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Series on program transformation systems: Coccinelle (2006)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[],"_links":{"self":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/1824"}],"collection":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/comments?post=1824"}],"version-history":[{"count":0,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/1824\/revisions"}],"wp:attachment":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/media?parent=1824"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/categories?post=1824"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/tags?post=1824"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}