Guide:Syntax

From IokeWiki
Revision as of 15:12, 27 March 2009 by Olabini (talk | contribs) (Text)

Syntax

Ioke has no keywords or statements. Everything is an expression composed of a chain of messages, where each message links to the next one. The result of one message becomes the receiver of the next message, until a "." message is received. The "." message is a terminator that throws away the current receiver. A newline serves as a "." message in the circumstances where it feels natural.
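
The receiver-chaining rule can be modelled in a few lines of Python. This is only an illustrative sketch, not how Ioke is implemented: the names evaluate_chain, ground and send are hypothetical, messages are reduced to plain names without arguments, and sending a message is assumed to mean calling the Python method of that name.

```python
def evaluate_chain(messages, ground, send):
    """Evaluate a flat list of message names against a starting receiver.

    Hypothetical model: "." discards the current receiver, so the next
    message is sent to `ground` again.
    """
    receiver = ground
    result = ground
    for name in messages:
        if name == ".":
            receiver = ground      # terminator: throw away the receiver
        else:
            receiver = send(receiver, name)
            result = receiver      # remember the last value produced
    return result


def send(receiver, name):
    # Assumption for this sketch: a message is a zero-argument method call.
    return getattr(receiver, name)()


# "strip upper" chains two messages; "." restarts from the ground object.
print(evaluate_chain(["strip", "upper"], "  ioke  ", send))
print(evaluate_chain(["strip", "upper", ".", "title"], "  ioke  ", send))
```

In the second call the value produced by upper is thrown away by the terminator, so title is sent to the original object rather than to the result of upper.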

An informal BNF description of Ioke looks like this:

program       ::=  messageChain?
messageChain  ::=  expression+
expression    ::=  message | brackets | literal | terminator
literal       ::=  Text | Regexp | Number | Decimal | Unit
message       ::=  Identifier ( "(" commated? ")" )?
commated      ::=  messageChain ( "," messageChain )*
brackets      ::=  ( "[" commated? "]" ) | ( "{" commated? "}" )
terminator    ::=  "." | "\n"
comment       ::=  ";" .* "\n"

What isn't visible here is that all whitespace -- except for newlines -- will work only as separators of messages, and is otherwise ignored. That means that message sending does not use the dot, as in most other languages. A phrase such as foo().bar(quux(42)).baaz() would be expressed as foo() bar(quux(42)) baaz(), or more succinctly foo bar(quux(42)) baaz in Ioke.

All the types of literals are actually turned into a message to create that literal, so the canonical form of the message chain contains no literals, just messages that create them. Any message can be given zero or more arguments. Arguments are separated with commas. If there are no arguments to a message, the parentheses can be left off, but they need to be there if there are arguments. Almost any combination of characters can be used as an Identifier, with some exceptions.

There used to be a parsing element called operators, but these have now been folded into identifiers. They are not parsed differently at all, but the operator shuffling step will handle them differently. Specifically, operators can be used in infix position, with different precedence rules. Assignment is a specific form of operator which gets its own kind of shuffling. These are both described below.

An identifier in Ioke can be one of several things. Ioke takes the rules for Java identifiers, and adds some more to them. All Unicode letters and digits can be part of an identifier, with restrictions on what can come first. Underscores are allowed, just like in Java. Ioke also allows colons in identifiers. Exclamation marks and question marks are allowed anywhere in the identifier except at the beginning. Identifiers can be broadly classified into regular identifiers and operators, where operators can be any combination of several sigils. There are also some special operators that have restrictions:

  • Opening and closing brackets are not allowed, except together with their counterparts, so [ is not a valid identifier, while [] is. So is {}. () is not valid either.
  • Two or more dots form a valid identifier.
  • A hash sign can be followed by any operator char, but isn't parsed as an identifier by itself.
  • Slash is not an operator char, but can be used as one except in combinations that look like regular expressions.

The operator chars are: +, -, *, %, <, >, !, ?, ~, &, |, ^, $, =, @, ', ` and :. These can be combined together in any order, and any number, except for the caveats noted before. That means the available operator space is infinite, and very wide. Combinations of letters and operator characters are generally not allowed, apart from the exceptions for :, ! and ?. This is to make it possible to have infix operations without spaces in some situations.

The two forms of brackets will get turned into a canonical form. Surrounding comma-separated message chains with square brackets is the same as calling the method [], giving it those message chains as arguments. So [foo, bar, quux] is exactly the same as [](foo, bar, quux). The same is true for curly brackets.
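
The bracket rewriting can be pictured with a small Python sketch. It is not Ioke's parser: the function name canonical_brackets is hypothetical, and the sketch assumes the input contains no text or regular expression literals and is not already in canonical form.

```python
# Each opening bracket becomes a call to the method named after the
# bracket pair; each closing bracket closes that call's argument list.
OPENERS = {"[": "[](", "{": "{}("}
CLOSERS = {"]": ")", "}": ")"}


def canonical_brackets(source):
    """Rewrite [a, b] as [](a, b) and {a, b} as {}(a, b), nesting included."""
    out = []
    for ch in source:
        out.append(OPENERS.get(ch) or CLOSERS.get(ch) or ch)
    return "".join(out)


print(canonical_brackets("[foo, bar, quux]"))
print(canonical_brackets("{a, [b, c]}"))
```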

Comments start with semicolon and end at the first newline. They can be used mostly anywhere, except inside of literal texts. The hash sign followed by an exclamation mark is also a comment, to allow the shebang line in Unix scripts.

How and when the actual evaluation of a message happens depends on what kind of cell the message refers to. If the cell is inactive, the value of that cell will be returned. If it's active, the cell will be activated and the result of that activation returned. How the activation happens depends on what kind of code the cell contains. The various kinds of code are described more closely in the chapter about code.

Literal values

Ioke currently contains four different kinds of literals: texts, regular expressions, integers and decimal numbers. There is also a fifth quasi-literal, the symbol, that isn't exactly parsed as a literal, but is evaluated differently based on its name. Symbols are actually parsed as regular identifiers, but they are handled a bit differently during evaluation.

Text

A literal text in Ioke is what is generally called a string in most languages. As in most languages, text is written inside of double quotes. Any characters are valid inside of those double quotes. That includes newlines, so you can write a literal text that extends over several lines. There is an alternate syntax for text when the value contains a lot of double quotes. As in most other languages, several escapes are valid inside of a text. Escapes are preceded by a backslash, and insert the character corresponding to the escape. These escapes are:

\b
Inserts the backspace character, that is represented in ASCII by the decimal value 8.
\e
Inserts the character that is represented in ASCII by the decimal value 27. This value is used for sending escape values to the TTYs in some operating systems.
\t
Inserts the TAB character - ASCII decimal 9.
\n
Inserts the newline character - ASCII decimal 10.
\f
Inserts the form feed character - ASCII decimal 12.
\r
Inserts the carriage return character - ASCII decimal 13.
\"
Inserts the double quote character - ASCII decimal 34.
\\
Inserts the backslash character - ASCII decimal 92.
\[newline]
Inserts nothing at all. Used to escape necessary newlines, without having them show up in the output text.
\#
Inserts a literal hash character - ASCII decimal 35.
\uABCD
Inserts the Unicode codepoint corresponding to the hexadecimal value of the four characters following the "u". All four hexadecimal characters need to be specified.
\7, \12, \316
Inserts the Unicode codepoint corresponding to the octal value of the one, two or three octal characters. The maximum value allowed is \377, and the minimum is obviously \0.

Ioke also supports an alternative text syntax that can be used when the text in question contains many double quotes. The alternative syntax starts with #[ and ends with ]. A right bracket will have to be escaped, but double quotes don't have to be.

The parsing of text will generate a message with name "internal:createText". This message will get one argument that is the raw Java String corresponding to the text.

Ioke allows automatic interpolation of arbitrary values in the same manner as Ruby. It uses the same syntax for this, which is the #{} syntax inside a text. These can be nested in any way. The elements will be parsed and sent as arguments to the message with name "internal:concatenateText". So an Ioke text such as "foo bar#{flux} will #{1+2}" will generate the message internal:concatenateText("foo bar", flux, " will ", 1+(2), ""). As you can see, there is a small amount of waste in the way this is generated -- but the simple model makes it easy to understand. It's not guaranteed that this will remain the same, although the message will definitely remain.

Some examples:

"foo"

"flax \
mux"

"one two #{three} \b four"

#[you don't really  "#{1+2+3}" believe that?]
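
The translation of interpolated text into message arguments, as described above, can be sketched in Python. This is a simplified model rather than Ioke's parser: the function names are hypothetical, escapes such as \# are ignored, and the embedded code is copied as written instead of being operator-shuffled (so 1+2 stays 1+2 rather than becoming 1+(2)).

```python
def split_interpolation(body):
    """Split the body of a text literal into ("text", ...) and ("code", ...) parts."""
    parts = []
    literal = []
    i = 0
    while i < len(body):
        if body.startswith("#{", i):
            parts.append(("text", "".join(literal)))
            literal = []
            depth = 1                      # track nested braces inside the code
            j = i + 2
            while j < len(body) and depth > 0:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unterminated #{ in text")
            parts.append(("code", body[i + 2:j - 1]))
            i = j
        else:
            literal.append(body[i])
            i += 1
    parts.append(("text", "".join(literal)))   # trailing piece, possibly empty
    return parts


def canonical_text(body):
    """Render the message that the text literal would be turned into."""
    parts = split_interpolation(body)
    if len(parts) == 1:
        return 'internal:createText("' + parts[0][1] + '")'
    args = ['"' + value + '"' if kind == "text" else value for kind, value in parts]
    return "internal:concatenateText(" + ", ".join(args) + ")"


print(canonical_text("foo bar#{flux} will #{1+2}"))
```

The trailing empty text argument falls out of the same simple splitting rule that the guide calls a small amount of waste.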

Regular expressions

Ioke has very capable regular expressions. Exactly what you can do with them can be found further down in this guide. The literal syntax allows regular expressions to be embedded in code directly. The syntax for this starts with a #/ and ends with another /. The last slash can optionally be followed by some flags that change the behavior of the expression. Regular expressions can also use an alternative syntax that starts with #r[ and ends with ]. Just as with Text, regular expressions can contain interpolation. This interpolation will be transformed into regular expressions and then combined with the outer regular expression. A few examples might be in order here:

#//
#r[]

#/foo/
#r[foo]

#/fo+/x
#r[fo+]x

#/bla #{"foo"} bar/
#r[bla #{"foo"} bar]

The first example is an empty regular expression. The second is an expression matching the word "foo". The third expression matches an "f" followed by one or more "o". It also allows extended regular expression syntax, due to the x flag. The flags supported in Ioke are x, i, u, m and s. The meanings of these match the meanings of the corresponding Ruby flags. Regular expressions allow most of the same escapes as Ioke text. Specifically, these escapes are supported: b, t, n, f, r, /, \ and newline. Unicode and octal escapes also work. The fourth example shows the insertion of a literal text inside of a regular expression.

Ioke regular expressions will be transformed into a call to internal:createRegexp. This message expects two Java strings, one with the actual pattern, and one with the flags.
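
As a rough illustration of the two arguments handed to internal:createRegexp, the Python sketch below pulls the pattern and the flags out of both literal forms. The function name regexp_arguments is hypothetical, interpolation is not handled, and the matching is deliberately naive (everything up to the last closing delimiter counts as the pattern).

```python
import re

# #/pattern/flags and #r[pattern]flags, with flags drawn from x, i, u, m, s
SLASH_FORM = re.compile(r"#/(.*)/([xiums]*)", re.DOTALL)
BRACKET_FORM = re.compile(r"#r\[(.*)\]([xiums]*)", re.DOTALL)


def regexp_arguments(literal):
    """Return (pattern, flags) for a regular expression literal."""
    for form in (SLASH_FORM, BRACKET_FORM):
        match = form.fullmatch(literal)
        if match:
            return match.group(1), match.group(2)
    raise ValueError("not a regular expression literal: " + literal)


for example in ["#//", "#r[]", "#/foo/", "#/fo+/x", "#r[fo+]x"]:
    print(example, regexp_arguments(example))
```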

Integers

Ioke supports arbitrarily sized numbers. It also contains a numerical tower that can be more closely explored in the reference documentation. The numerical tower is rooted in Number. Number Real mimics Number. Number Rational mimics Number Real, and so does Number Decimal. Finally, Number Integer and Number Ratio both mimic Number Rational. The interesting parts of this tower are Number Integer, which corresponds to integers, Number Ratio, which is any ratio between two integers, and Number Decimal, which corresponds to decimal numbers. These are arbitrarily sized and exact. There are no floats or doubles in Ioke. There is also a potential place for Number Complex at the same layer as Number Real, although complex numbers are not currently implemented. There are also plans for implementing a unit system further down the line.

Literal integers can be written using either decimal or hexadecimal notation. Hexadecimal notation begins with 0x or 0X and is then followed by one or more hexadecimal digits. The letters among them can be either upper or lower case. A decimal literal number is written using one or more decimal digits, but nothing else.

There is no literal to create ratios - these can only be created by division of integers. Negative numbers have no literal syntax, but preceding a number with a minus sign will call the message - on the number and generate the negative value.

A literal integer will be transformed into a call to internal:createNumber, which takes one native Java String from which to create the number.

Some examples:

1234444444444444444444444444444444444444235234534534

0

0xFFFFF

Decimals

Literal decimal values can be written either using exponential notation, or using a decimal dot. A decimal dot notation can be combined with exponential notation. Exponential notation starts with a number or a decimal number, followed by lower or upper case E, followed by an optional sign, and then followed by one or more decimal digits.

A literal decimal will be transformed into a call to internal:createDecimal, which takes one native Java String from which to create the decimal.

Some examples:

0.0

1E6

1E-32

23.4445e10
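
The integer and decimal literal forms described in this section and the previous one can be summarized with a few regular expressions. The Python sketch below is an approximation, not Ioke's lexer: the function name classify_literal is hypothetical, it assumes at least one digit on each side of a decimal dot, and it converts the literal to a Python value only to show that the forms are exact and arbitrarily sized (Ioke itself passes the string on to the creating message).

```python
import re
from decimal import Decimal

HEX_INTEGER = re.compile(r"0[xX][0-9a-fA-F]+")
DEC_INTEGER = re.compile(r"[0-9]+")
# digits, optional fraction, optional exponent with optional sign
DECIMAL = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def classify_literal(source):
    """Return (creating message name, exact value) for a number literal."""
    if HEX_INTEGER.fullmatch(source):
        return "internal:createNumber", int(source, 16)
    if DEC_INTEGER.fullmatch(source):
        return "internal:createNumber", int(source)
    if DECIMAL.fullmatch(source):
        return "internal:createDecimal", Decimal(source)
    raise ValueError("not a number literal: " + source)


for example in ["0", "0xFFFFF", "0.0", "1E6", "1E-32", "23.4445e10"]:
    print(example, classify_literal(example))
```

A leading minus sign is rejected by this sketch, which matches the rule that negative numbers have no literal syntax of their own.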

Symbols

Symbols aren't exactly syntax, but they aren't exactly messages either. Or rather, each is a message that evaluates to the symbol it represents. Symbol is a kind in Ioke. There are two kinds of symbols: simple symbols that can be parsed as is, and symbols that can't be parsed as is. Symbols are preceded by a colon and then directly followed by the symbol text. If the text can't be parsed correctly, it should be surrounded by quotes, and this will be turned into a call to the method :, which takes the text as argument. That means that you can actually get dynamic symbols by calling the : method.

Some examples:

:foo

:flaxBarFoo

:""

:"mux mex mox \n ::::::::"

Operator shuffling

One exception to the way message handling works in Ioke is operators. All the so-called operators in this section can be called directly in message passing position too -- but to make it possible to use them in a more natural way, the parsing step will handle them a bit differently, and then do a shuffling step that actually takes operator precedence into account. So all the common operators will generally work as you expect them to -- although I recommend adding parentheses when something is possibly unclear.

Ioke has a slightly larger number of operators than most other languages. Most of these are currently unused, but they are certainly available for any purpose the programmer wants to use them for. Many adherents of other languages (Java, I'm looking at you) claim that operator overloading is evil. I don't believe that is true, seeing as how it works so well in Ruby, so Ioke instead allows you quite large freedom with regards to operators.

The precedence rules for regular operators can be found in the cell 'Message OperatorTable operators', which is a regular Dict that can be updated with new values. The new values will obviously not take effect until the current code has run, and a new parse is started.

Note that the list below contains only the operators that have defined precedence rules. As noted in the section on syntax, you can really use any operator you want. It is easy to add new precedences to the table, either temporarily or permanently.

At the time of writing, the available operators - in order of precedence - are these:

  • !
  • ?
  • $
  • ~
  • #
  • **
  • *
  • /
  • %
  • +
  • -
  • <<
  • >>
  • <=>
  • >
  • <
  • <=
  • >=
  • <>
  • <>>
  • ==
  • !=
  • ===
  • =~
  • !~
  • &
  • ^
  • |
  • &&
  • ?&
  • ||
  • ?|
  • ..
  • ...
  • =>
  • <->
  • ->
  • +>
  • !>
  • &>
  • %>
  • #>
  • @>
  • />
  • *>
  • ?>
  • |>
  • ^>
  • ~>
  • ->>
  • +>>
  • !>>
  • &>>
  • %>>
  • #>>
  • @>>
  • />>
  • *>>
  • ?>>
  • |>>
  • ^>>
  • ~>>
  • =>>
  • **>
  • **>>
  • &&>
  • &&>>
  • ||>
  • ||>>
  • $>
  • $>>
  • +=
  • -=
  • **=
  • *=
  • /=
  • %=
  • and
  • nand
  • &=
  • &&=
  • ^=
  • or
  • xor
  • nor
  • |=
  • ||=
  • <<=
  • >>=
  • <-
  • return

And as mentioned above, all of these can be used for your own purpose, although some of them already have reserved meanings. This document will cover most of the used operators, while the rest can be found in the reference.

Since this operator shuffling happens, that also means that an Ioke program has a canonical inner form that can differ from the source text. When you use introspection of any kind, you will get back that canonical form which might not look exactly like you expected. Similarly, if you ask some code to print itself, it will use the canonical form instead of the operator skin. Macros that modify message chains should work against the canonical form, and nothing else.

What an operator does depends on the result of sending the message of that name to the receiver, just like regular messages. In fact, to Ioke there really isn't any difference, except that the parsing takes special notice about operators and assignment operators.
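
To make the shuffling step more concrete, here is a small Python sketch that turns an infix token list into the canonical message form. It is a generic precedence-climbing routine, not Ioke's actual algorithm: the function name shuffle is hypothetical, the numeric levels in PRECEDENCE are assumptions chosen only to respect the ordering listed above, every operator is treated as left-associative, and operands are single tokens.

```python
# Hypothetical levels: a larger number binds tighter.
PRECEDENCE = {"*": 2, "/": 2, "%": 2, "+": 1, "-": 1}


def shuffle(tokens):
    """Rewrite [operand, op, operand, ...] into canonical message form."""
    pos = 0

    def parse(min_level):
        nonlocal pos
        chain = tokens[pos]                 # the receiver of what follows
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_level:
            op = tokens[pos]
            pos += 1
            # The argument swallows every tighter-binding operator after it.
            argument = parse(PRECEDENCE[op] + 1)
            chain += " " + op + "(" + argument + ")"
        return chain

    return parse(0)


print(shuffle(["1", "+", "2"]))
print(shuffle(["1", "+", "2", "*", "3"]))
print(shuffle(["1", "*", "2", "+", "3"]))
```

In the last example the multiplication is sent first and its result becomes the receiver of +, which is the usual message-chain reading of the canonical form.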

Assignment shuffling

Much like with regular operators, trinary - assignment - operators are subject to a kind of shuffling. This shuffling differs from regular operator shuffling, in that it will shuffle around two things - the left hand side and the right hand side. This is true for every assignment operator except for the unary ones, which will only reshuffle one message.

A few examples might make the translation easier to perceive. The first item is the readable form, while the second form is the canonical form:

foo = 1 + 2
=(foo, 1 +(2))

Ground foo *= "text"
Ground *=(foo, "text")

bar foo(123) = 42
bar =(foo(123), 42)

flux++
++(flux)

These examples show some more advanced details -- specifically the fact that assignment operators generally work on "places", not on names or cells. This will be explored further in the chapter on assignment. The important thing to notice from the above examples is that for most assignments two things will be rearranged. For the unary operators only one thing will be moved.

Just as with regular operators, the assignment operators have information in the 'Message OperatorTable' cell. The specific cell is 'Message OperatorTable trinaryOperators', and it matches an assignment operator to either the integer 1, or the integer 2. Everything with 1 will be matched as being unary assignment.

The currently available assignment operators are:

  • =
  • ++
  • --
  • +=
  • -=
  • /=
  • **=
  • *=
  • %=
  • &=
  • &&=
  • |=
  • ||=
  • ^=
  • <<=
  • >>=

Just as with regular operators, what an assignment operator does depends on the result of sending the message of that name to the receiver object, just like with any type of message.
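
The rearrangement shown in the examples of this section can be sketched in Python. This is a simplified model, not Ioke's implementation: the function name shuffle_assignment is hypothetical, the ARITY table is a small stand-in for 'Message OperatorTable trinaryOperators' (1 for unary, 2 otherwise), the input is assumed to be already split into messages, and the right hand side is copied as written without operator shuffling.

```python
# Stand-in for the trinaryOperators table: 1 means unary assignment.
ARITY = {"=": 2, "+=": 2, "*=": 2, "++": 1, "--": 1}


def shuffle_assignment(tokens):
    """Move the place (and the value, if any) inside the assignment message."""
    for index, token in enumerate(tokens):
        if token in ARITY:
            break
    else:
        return " ".join(tokens)             # no assignment operator present
    if index == 0:
        raise ValueError("assignment needs a place on its left hand side")
    prefix = tokens[:index - 1]             # messages before the place stay put
    place = tokens[index - 1]               # the message right before the operator
    rest = tokens[index + 1:]
    if ARITY[token] == 1:
        return " ".join(prefix + [token + "(" + place + ")"] + rest)
    value = " ".join(rest)
    return " ".join(prefix + [token + "(" + place + ", " + value + ")"])


print(shuffle_assignment(["Ground", "foo", "*=", '"text"']))
print(shuffle_assignment(["bar", "foo(123)", "=", "42"]))
print(shuffle_assignment(["flux", "++"]))
```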

Inverted operators

In addition to the regular binary operators and the trinary assignment operators, Ioke also sports inverted operators. These aren't actually used anywhere in the core distribution, but they might be useful at some time or another. The basic idea is that sometimes you want to have the right hand side of an expression become the receiver of an operator call, and the left hand side become the argument to the operator. Inverted operators allow this.

As with both the binary and trinary operators, you can find and update information about inverted operators in the cell 'Message OperatorTable invertedOperators'. To make this a little less abstract, let us look at two simple examples and what they translate into:

"foo" :: [1, 2, 3, 4] map(asText) 
;; will be translated to
[1, 2, 3, 4] map(asText) ::("foo")

;; provided we have an inverted
;; operator called 'doit'
abc foo quux doit another time
;; will be translated to
another time doit(abc foo quux)
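
The two translations above follow the same pattern, which the Python sketch below reproduces on plain strings. It is only a model of the examples: the function name invert is hypothetical, and it assumes the operator occurs once, surrounded by spaces, with everything before it as the left hand side and everything after it as the right hand side.

```python
def invert(source, operator):
    """Make the right hand side the receiver and the left hand side the argument."""
    left, right = source.split(" " + operator + " ", 1)
    return right.strip() + " " + operator + "(" + left.strip() + ")"


print(invert('"foo" :: [1, 2, 3, 4] map(asText)', "::"))
print(invert("abc foo quux doit another time", "doit"))
```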