Parsing script files
From Monster Wiki
File names
By convention, script file names should be lowercase, contain no spaces, and end with the extension '.mn'. None of these rules are absolute, but breaking them might make it impossible for the VM to find class scripts automatically. (However, you can easily work around this by loading all the classes explicitly before you use them.) If you are not loading scripts from the file system (e.g. if you load them directly from an archive file stream), these rules do not apply.
For class scripts (see below), it's a good idea to give the script file the same name as the class, but in lowercase. So the class MyClass would reside in myclass.mn. For portability reasons, it's not recommended to put files which differ only in case (like MyScript.mn and myscript.mn) in the same directory. Some file systems (like those typically used on Unix) are case sensitive and will allow this, while others (like those typically used on Windows) will not.
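The naming convention above can be sketched in a few lines. This is an illustration in Python, not code from the Monster VM; the helper names `script_filename` and `case_collisions` are made up for this example.

```python
from collections import defaultdict

def script_filename(class_name):
    """Return the conventional file name for a class: lowercase plus '.mn'."""
    return class_name.lower() + ".mn"

def case_collisions(filenames):
    """Group file names that differ only in case (a portability hazard)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.lower()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

print(script_filename("MyClass"))
print(case_collisions(["MyScript.mn", "myscript.mn", "other.mn"]))
```

Running a check like `case_collisions` over a script directory before shipping it is one way to catch names that would clash on a case-insensitive file system.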
Unicode
All script files are expected to be encoded in Unicode (currently only UTF-8). The file may contain an optional byte-order mark. Normal ASCII text works as well, since ASCII is a subset of UTF-8.
TODO: The VM does not currently load files encoded with UTF-16 or UTF-32. This will be implemented upon request.
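As a rough illustration of the encoding rule, the following Python sketch decodes script bytes as UTF-8 and drops an optional byte-order mark. The function name `decode_script` is hypothetical, and this is not how the VM itself is implemented.

```python
import codecs

def decode_script(raw):
    """Decode script bytes as UTF-8, dropping an optional byte-order mark."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Plain ASCII decodes unchanged; invalid UTF-8 raises UnicodeDecodeError.
    return raw.decode("utf-8")

print(decode_script(b"\xef\xbb\xbfint i;"))
```

A UTF-16 or UTF-32 file would normally fail in the `decode` step here, which is in line with the TODO note above.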
Unix scripts
If the first line begins with a 'shebang' ('#!'), that line is ignored. This can be used on Unix systems to run scripts directly from the command line:
#!/usr/bin/mvm
writeln("Hello world!");
Note that the name and location of the 'mvm' program are not standardized yet.
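A loader could implement the shebang rule roughly as follows. This is a Python sketch under assumptions: `strip_shebang` is a hypothetical name, and keeping the newline (so that later line numbers are unchanged) is a design choice of this example, not something the text above specifies.

```python
def strip_shebang(source):
    """Drop a leading '#!' line but keep its newline, so line numbers stay stable."""
    if source.startswith("#!"):
        newline = source.find("\n")
        return "" if newline == -1 else source[newline:]
    return source

script = '#!/usr/bin/mvm\nwriteln("Hello world!");'
print(repr(strip_shebang(script)))
```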
Script types
Functions
The engine supports two types of script files. The first is a pure list of statements to execute. These are run basically like functions, and are often called function scripts.
// Example of function script. This is the entire contents of the file.
io.writeln("This is a script");
Function scripts may optionally have a function definition as the first statement in the file:
// Example of function script that takes parameters
function int doSomething(int i, int j);
i += j;
io.writeln("Returning", i);
return i;
Classes
The other type of script is the class script. A class script must begin with a valid class, module or singleton declaration.
class MyClass;
int i;
func() {}
See function declarations and class declarations for more information.
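The distinction between the two script types can be sketched as a check on the first word of the file. This Python example is a simplification: it assumes any leading comments and shebang line have already been removed, and the name `script_kind` is invented for illustration.

```python
import re

# Words that open a class script, according to the description above.
CLASS_STARTERS = {"class", "module", "singleton"}

def script_kind(source):
    """Return 'class' if the first word is class/module/singleton, else 'function'."""
    match = re.match(r"\s*([A-Za-z_]\w*)", source)
    first_word = match.group(1) if match else ""
    return "class" if first_word in CLASS_STARTERS else "function"

print(script_kind("class MyClass; int i;"))
print(script_kind('io.writeln("This is a script");'))
```

Note that a script starting with an optional function definition still counts as a function script in this sketch, since 'function' is not one of the three class-script openers.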
Tokens and parsing
Monster-script is in the C-family of languages (together with C++, D, Java, C# and many others). Like most of its brethren it is parsed as a series of tokens. These tokens include syntax characters like { and }, operators like +, && and *=, and identifiers like myFunctionName. A complete list of tokens is given below.
Whitespace (newlines, spaces, tabs, etc.) is optional and mostly ignored. Thus the following two snippets of code are equivalent:
int sum(int[] list)
{
    int res = 0;
    foreach( v; list )
        res += v;
    return res;
}
int sum(int[]list){int res=0 ;foreach(v;list )res+=v;return res;}
The only places where whitespace is not ignored are:
- when separating identifier tokens ('int res' is two identifiers while 'intres' is one)
- inside string and character literals
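To make the whitespace rule concrete, here is a toy tokenizer in Python. It is only a sketch: it knows just the handful of symbols used in the two snippets above, does not handle escapes in strings, and is not the Monster parser. It shows that both spellings of `sum` yield the same token sequence, while 'int res' and 'intres' do not.

```python
import re

TOKEN = re.compile(r"""
    \s+                             # whitespace: separates tokens, otherwise dropped
  | (?P<word>[A-Za-z_]\w*)          # identifiers and keywords
  | (?P<number>\d+)                 # integer literals
  | (?P<string>"[^"\n]*")           # string literals, whitespace inside is kept
  | (?P<symbol>\+=|[()\[\]{};=])    # only the symbols this example needs
""", re.VERBOSE)

def tokenize(source):
    """Split source into tokens, discarding whitespace between them."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup is not None:  # None means the whitespace branch matched
            tokens.append(match.group())
        pos = match.end()
    return tokens

spaced = "int sum(int[] list) { int res = 0; foreach( v; list ) res += v; return res; }"
dense = "int sum(int[]list){int res=0 ;foreach(v;list )res+=v;return res;}"
print(tokenize(spaced) == tokenize(dense))
print(tokenize("int res"), tokenize("intres"))
```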
Case sensitivity
Monster is case sensitive with regard to all identifiers and keywords:
int myInt;
myint = 3; // error, not defined
int abc;
int ABC;   // ok, abc and ABC are different names
int class; // error, 'class' is a keyword
int Class; // ok, 'Class' is not a keyword
Comments
A comment is a part of the source code that is completely ignored by the parser. Monster-script has three types of comments:
- Line comments start with '//' and terminate at the end of the line.
- Block comments start with '/*', terminate with '*/', and can span multiple lines.
- Nested block comments start with '/+' and terminate with '+/', but may contain any number of matching pairs of '/+' and '+/' in between.
Example:
// line comment: ignore the rest of this line
/* block comment: ignore a block of text
   that can span multiple lines */
/+ nested block comments: ignore blocks of text,
   including other /+ nested comments +/
   /+ You /+ can /+ nest /+ these /+ as +/ deep +/ as +/ you +/ like +/
+/
Nested comments are particularly useful for commenting out large pieces of unused code, which might themselves contain comment blocks.
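The three comment types can be handled with a single scan that keeps a depth counter for the nested form. The following Python sketch illustrates the idea; it deliberately ignores string literals (a '//' inside a string would be treated as a comment), and `strip_comments` is a hypothetical name, not part of the Monster VM.

```python
def strip_comments(source):
    """Remove //, /* */ and nestable /+ +/ comments (string literals not handled)."""
    out = []
    i, n = 0, len(source)
    while i < n:
        two = source[i:i + 2]
        if two == "//":
            end = source.find("\n", i)
            i = n if end == -1 else end  # the newline itself is kept
        elif two == "/*":
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            i = end + 2
        elif two == "/+":
            depth, i = 1, i + 2
            while depth > 0:  # each inner /+ must be closed by its own +/
                if i >= n:
                    raise ValueError("unterminated /+ comment")
                pair = source[i:i + 2]
                if pair == "/+":
                    depth, i = depth + 1, i + 2
                elif pair == "+/":
                    depth, i = depth - 1, i + 2
                else:
                    i += 1
        else:
            out.append(source[i])
            i += 1
    return "".join(out)

print(strip_comments("/+ You /+ can /+ nest +/ deep +/ +/done"))
```

The depth counter is what distinguishes '/+ +/' from '/* */': a plain block comment ends at the first '*/', while a nested comment ends only when every opened '/+' has been closed.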
Symbols
The tokens can be roughly divided into three categories: symbols, keywords, and literals and identifiers. Let's list the symbols first:
(
)
{
}
[
]
,
:
;
.
..
...
$
!
&&
||
++
--
==
!=
=i=
=I=
!=i=
!=I=
<
>
<=
>=
=
+
-
*
/
%
\
~
+=
-=
*=
/=
%=
\=
~=
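Several of these symbols are prefixes of others ('.', '..' and '...', or '!=' and '!=i='). The list does not say how such overlaps are resolved; lexers in the C family commonly pick the longest symbol that matches, and the Python sketch below assumes that rule. Both the rule and the helper name `match_symbol` are assumptions of this example, not documented behavior.

```python
# The symbols from the list above, as Python strings.
SYMBOLS = [
    "(", ")", "{", "}", "[", "]", ",", ":", ";", ".", "..", "...", "$", "!",
    "&&", "||", "++", "--", "==", "!=", "=i=", "=I=", "!=i=", "!=I=",
    "<", ">", "<=", ">=", "=", "+", "-", "*", "/", "%", "\\", "~",
    "+=", "-=", "*=", "/=", "%=", "\\=", "~=",
]

# Try longer symbols first so that '...' wins over '..' and '.'.
BY_LENGTH = sorted(SYMBOLS, key=len, reverse=True)

def match_symbol(source, pos=0):
    """Return the longest symbol starting at pos, or None (assumed longest-match rule)."""
    for symbol in BY_LENGTH:
        if source.startswith(symbol, pos):
            return symbol
    return None

print(match_symbol("!=i= b"), match_symbol("..x"), match_symbol("+= 1"))
```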
Keywords
The following names are reserved keywords. This means you cannot use any of them as names for your variables, functions, etc.
class module return for this new if else foreach foreach_reverse do while until continue break switch select state struct enum import typeof singleton clone static const abstract override final function with idle out ref public private protected true false native null goto var
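Because the language is case sensitive, a keyword check is an exact-match lookup against this list. The Python sketch below illustrates that; `KEYWORDS` is copied from the list above, and `is_reserved` is a name invented for this example.

```python
KEYWORDS = frozenset("""
    class module return for this new if else foreach foreach_reverse do while
    until continue break switch select state struct enum import typeof
    singleton clone static const abstract override final function with idle
    out ref public private protected true false native null goto var
""".split())

def is_reserved(name):
    """Exact, case-sensitive lookup: 'class' is reserved, 'Class' is not."""
    return name in KEYWORDS

print(is_reserved("class"), is_reserved("Class"))
```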
