{"id":2868,"date":"2021-07-09T15:57:51","date_gmt":"2021-07-09T19:57:51","guid":{"rendered":"http:\/\/codinggorilla.com\/?p=2868"},"modified":"2021-12-13T11:21:13","modified_gmt":"2021-12-13T11:21:13","slug":"writing-a-universal-programming-language-extension","status":"publish","type":"post","link":"http:\/\/165.227.223.229\/index.php\/2021\/07\/09\/writing-a-universal-programming-language-extension\/","title":{"rendered":"Writing a universal programming language extension for VSCode"},"content":{"rendered":"\n<p>There was an interesting <a href=\"https:\/\/stackoverflow.com\/questions\/68309459\/is-it-possible-to-build-a-vs-code-extension-that-allows-users-to-supply-their-ow\">question on Stack Overflow<\/a> today: Is it possible to write an extension for VSCode that can accept any grammar and support semantic highlighting, defs\/refs, and other language features?<\/p>\n\n\n\n<p>I&#8217;ve always toyed with the idea of a &#8220;universal language server&#8221;, one that dynamically reads a grammar and then parses the input with it.<\/p>\n\n\n\n<p>So, I decided to write <a href=\"https:\/\/github.com\/kaby76\/uni-vscode\" class=\"ek-link\">such an extension<\/a>. It took me about four hours to get something working, but I suspect someone else could write something faster and cleaner.<\/p>\n\n\n\n<p>Is such an extension possible? The answer is yes, but how long it takes to write one depends on several factors: Have you written a VSCode extension before? Are you starting from an existing implementation? Are you a language tool expert? If you answer &#8220;no&#8221; to any of these questions, it could take a while to write.<\/p>\n\n\n\n<p>I started from the <a href=\"https:\/\/github.com\/kaby76\/AntlrVSIX\" class=\"ek-link\">AntlrVSIX<\/a> extension for VSCode, which I wrote a year or two ago. 
The server is written in C# and the client in TypeScript.<\/p>\n\n\n\n<p>The server code originally took a month or two to write because there weren&#8217;t any good examples to model it on. In addition, some infrastructure had to be written because the server-side LSP APIs did not implement version 3.16 of the Language Server Protocol, which includes &#8220;semantic highlighting&#8221;.<\/p>\n\n\n\n<p>For the grammar specification, one has to put a stake in the ground and decide which parser generator to use. There are many generators, but I used <a href=\"https:\/\/www.antlr.org\/\" class=\"ek-link\">Antlr4<\/a> because that is the one I am familiar with. And while Antlr4 can target C#, the parser it generates requires a driver, and the code has to be compiled. Therefore, I assumed this would be done in a step before running VSCode: &#8220;users&#8221; write a new grammar, generate a parser, build it, then tell the extension where the parser is located.<\/p>\n\n\n\n<p>I&#8217;ve noted that other parser implementations exist. One &#8220;fast&#8221; implementation in JavaScript is <a href=\"https:\/\/github.com\/Chevrotain\/chevrotain\" class=\"ek-link\">Chevrotain<\/a>. Unfortunately, its rules are written as JavaScript code, so while it would be easy to convert an EBNF-based grammar, such as one written for Antlr or <a href=\"https:\/\/hackage.haskell.org\/package\/BNFC\" class=\"ek-link\">BNFC<\/a>, to Chevrotain, scraping the code to extract a grammar from Chevrotain syntax would be difficult. That&#8217;s because the rules can be modified on the fly.<\/p>\n\n\n\n<p>The output of a parse is a parse tree. But a grammar by itself is insufficient to classify the tokens in a file.<\/p>\n\n\n\n<p>In addition to the grammar, you will need to declare classes of symbols. 
Here, I used <a href=\"https:\/\/en.wikipedia.org\/wiki\/XPath\" class=\"ek-link\">XPath<\/a> expressions, each of which matches particular nodes in the parse tree. I chose XPath because we want the context of a token to be used in its classification. For example, <a href=\"https:\/\/tomassetti.me\/how-to-create-an-editor-with-syntax-highlighting-dsl\/#chapter4\" class=\"ek-link\">this grammar-based editor<\/a> classifies tokens directly from the lexer. But this ignores the fact that in many languages, the same token has different meanings depending on the context in the parse where it occurs. If you ignore the parse tree context that a token occurs in, you may as well just use <a href=\"https:\/\/macromates.com\/\">TextMate<\/a>.<\/p>\n\n\n\n<p>Note, a <a href=\"https:\/\/macromates.com\/\" class=\"ek-link\">TextMate<\/a> language spec is an alternative solution, but TextMate grammars are difficult to write. A TextMate pattern is either a single regular expression or a &#8220;dual-level&#8221; regular expression pattern. As far as I know, there are no tools to convert a context-free grammar with class patterns into a TextMate specification. My gut feeling is that an XPath expression can often be translated into a &#8220;dual-level&#8221; pattern, but not always.<\/p>\n\n\n\n<h2>Putting it all together: Java as an example<\/h2>\n\n\n\n<p>Let&#8217;s start with the <a href=\"https:\/\/github.com\/antlr\/grammars-v4\/tree\/master\/java\/java\">Java grammar<\/a>. To get this extension to work, you will need to clone the grammars-v4 repo and build a parser for the Java grammar.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/antlr\/grammars-v4\ncd grammars-v4\/java\/java\ntrgen\ndotnet build\ncd ..\/..<\/code><\/pre>\n\n\n\n<p>Next, we need to set up the three <code>~\/.grammar-*<\/code> files. 
In <code>~\/.grammar-location<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>c:\/users\/foobar\/documents\/grammars-v4\/java\/java<\/code><\/pre>\n\n\n\n<p>In <code>~\/.grammar-classes<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>class\nproperty\nvariable\nmethod\nkeyword\nstring\n<\/code><\/pre>\n\n\n\n<p>In <code>~\/.grammar-classifiers<\/code>, with the XPath expressions listed in the same order as the classes:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/classDeclaration\/IDENTIFIER\n\/\/fieldDeclaration\/variableDeclarators\/variableDeclarator\/variableDeclaratorId\/IDENTIFIER\n\/\/variableDeclarator\/variableDeclaratorId\/IDENTIFIER\n\/\/methodDeclaration\/IDENTIFIER\n\/\/(ABSTRACT | ASSERT | BOOLEAN | BREAK | BYTE | CASE | CHAR | CLASS | CONST | CONTINUE | DEFAULT | DO | DOUBLE | ELSE | ENUM | EXTENDS | FINAL | FINALLY | FLOAT | FOR | IF | GOTO | IMPLEMENTS | IMPORT | INSTANCEOF | INT | INTERFACE | LONG | NATIVE | NEW | PACKAGE | PRIVATE | PROTECTED | PUBLIC | SHORT | STATIC | STRICTFP | SUPER | SWITCH | SYNCHRONIZED | THIS | THROW | THROWS | TRANSIENT | TRY | VOID | VOLATILE | WHILE)\n\/\/(DECIMAL_LITERAL | HEX_LITERAL | OCT_LITERAL | BINARY_LITERAL | HEX_FLOAT_LITERAL | BOOL_LITERAL | CHAR_LITERAL | STRING_LITERAL | NULL_LITERAL)\n<\/code><\/pre>\n\n\n\n<p>Finally, we need to clone the extension code, build it, then run VSCode.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>git clone https:\/\/github.com\/kaby76\/uni-vscode.git\ncd uni-vscode\ndotnet build\ncd VsCode\nbash install.sh\ncode .<\/code><\/pre>\n\n\n\n<h2>Semantic highlighting<\/h2>\n\n\n\n<p>I tried it out on Java, with the six classes of symbols defined above. 
Here&#8217;s how the editor displays a Java source file with the standard Java extension.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img src=\"http:\/\/codinggorilla.com\/wordpress\/wp-content\/uploads\/2021\/07\/Screenshot-28-1024x576.png\" alt=\"\" class=\"wp-image-2871\"\/><\/figure>\n\n\n\n<p>This is how the editor displays the same Java source file with the &#8220;universal language server&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img src=\"http:\/\/codinggorilla.com\/wordpress\/wp-content\/uploads\/2021\/07\/Screenshot-29-1024x576.png\" alt=\"\" class=\"wp-image-2872\"\/><\/figure>\n\n\n\n<h2>Hover, Select, Defs\/Refs<\/h2>\n\n\n\n<p>This implementation supports mouse-hover pop-ups, but the information displayed is just the classes the symbol belongs to.<\/p>\n\n\n\n<p>Select (aka &#8220;textDocument\/documentHighlight&#8221;) only highlights one occurrence of a symbol because no symbol table is implemented. Similarly, Defs\/Refs for a symbol only return the one occurrence that is selected.<\/p>\n\n\n\n<h2>Implementation problems<\/h2>\n\n\n\n<ul><li>Tagging is slow: O(number of tree nodes &#215; number of classifiers), because each classifier is run separately by the XPath engine.<\/li><li>Defs\/Refs\/Select only handle one occurrence of a symbol.<\/li><li>The grammar location, classes, and classifiers should be specified in one file rather than three.<\/li><li>The server can only handle one grammar per VSCode session.<\/li><\/ul>\n\n\n\n<p>&#8211;Ken, July 9, 2021<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There was an interesting question on Stack Overflow today: Is it possible to write an extension for VSCode that can accept any grammar and support semantic highlighting, defs\/refs, and other language features? I&#8217;ve always toyed with the idea of having a &#8220;universal language server&#8221; with the server dynamically reading a grammar then parsing the input. 
So, &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/165.227.223.229\/index.php\/2021\/07\/09\/writing-a-universal-programming-language-extension\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Writing a universal programming language extension for VSCode&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/2868"}],"collection":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/comments?post=2868"}],"version-history":[{"count":0,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/posts\/2868\/revisions"}],"wp:attachment":[{"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/media?parent=2868"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/categories?post=2868"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/165.227.223.229\/index.php\/wp-json\/wp\/v2\/tags?post=2868"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}