Wednesday, 16 May 2012

Occam-Pi - a review

Apologies for a lengthy absence. After a term with an easy workload, the past month or so has been very heavy, and I haven't had time to think much about my Musings. I anticipate this will continue for another couple of months, until I hand my dissertation in.

While there has been a lot of work, that's not to say there hasn't been fun: one of the more enjoyable things I've done this month is learn some Occam-Pi.

Occam-Pi: A brief history
For those of you who don't know what Occam-Pi is, and I'm guessing that's most of you, it is a programming language that extends the Occam language with ideas from the pi-calculus.

The Occam language was designed to run on transputers. Transputers are like computers, but run many processors independently of each other (i.e. with no shared memory, unlike multi-core computers), which instead talk to each other using defined data channels.

Occam-Pi has a compiler that allows it to be run on modern computers using threading to simulate having lots of independent processors. It doesn't work perfectly, but it runs well enough.

The review
Occam-Pi makes writing parallel programs very easy. The compiler can sometimes feel very strict, but that is a necessity of writing error-free concurrent programs. The unique nature of this language makes it wonderfully fun to write in, and actually very powerful as a tool to write multi-threaded programs. By enforcing a strict use of data channels it helps you think about multi-threaded programming in a way that other languages don't.
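To give a flavour of the channel-only style, here is a rough sketch. It is written in Go rather than Occam-Pi, on the assumption that Go's unbuffered channels are a close enough stand-in for Occam channels; the names producer, squarer and runPipeline are made up for illustration. Each process owns its own data and only ever talks to the others through a channel.

```go
package main

import "fmt"

// producer sends the integers 0..n-1 down out, then closes the channel
// to signal that it has finished. (Closing is a Go convention.)
func producer(n int, out chan<- int) {
	for i := 0; i < n; i++ {
		out <- i
	}
	close(out)
}

// squarer reads each value from in, squares it, and passes it on.
// It shares no variables with producer; the channel is the only link.
func squarer(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * v
	}
	close(out)
}

// runPipeline wires the two processes together and collects the results.
func runPipeline(n int) []int {
	nums := make(chan int)    // unbuffered: sender and receiver must meet
	squares := make(chan int) // unbuffered
	go producer(n, nums)
	go squarer(nums, squares)

	var results []int
	for v := range squares {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(runPipeline(5)) // prints [0 1 4 9 16]
}
```

Because the stages can only exchange values over channels, there is no shared state to lock, which is the habit of thought the language pushes you towards.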

That is not to say the language is perfect, however. There are many minor annoyances about the syntax, which would probably get ironed out with more development.

The most difficult thing about the language is the total lack of debugging facilities and documentation, and the terseness of the error messages. When an Occam program fails a typical error message would be something along the lines of "Error at C:\[path]\occam_program.occ:123. Program failed, state = e, eflags 00000001".

The first bit tells you which file failed, and at what line number. That's useful information, but not always accurate, and sometimes totally baffling. Nothing anywhere tells you what the "state = e" (or sometimes, = E) means, and the "eflags" presumably means something, but I have no idea where to find that information.

So often when a program crashes, I want to be able to see the exact state of the program at the time of the crash - which processes are running, what the values of all the variables are, what is going on at the time. I can't get this information without rigging every process I'm interested in to talk to the screen, but this act is likely to fundamentally alter the way the program runs anyway.

Some of the niceties of other languages are also missing. For example, I can't define my own data types. (Well, I sort of can, but not neatly, and they don't get enforced by the compiler.) If I want an integer value with a byte tagged onto it, I can define a data protocol to send that pair through communication channels, but I can't use that protocol directly, and always have to decompose it back to an integer and a byte.

There are other things that would be lovely to have. For example, trying to receive on a channel where the sending process has finished will crash the program. That sounds fair enough, but I may be about to give that channel a new process which will send down it. More frustratingly, this is true when the receive is part of an ALT statement, which can be used to listen to multiple channels, and act according to which one sends first. Any "dead" channels will never be sent down, but will still crash the program. (Worse, the error line points at the line with the ALT on it, not the line with the channel that failed, so you can't even tell which process had finished...) Much nicer behaviour here would be to simply ignore the channels that will never deliver anything, or are shut, and listen to the channels that are still open.
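The behaviour I'd like from ALT can be sketched in Go, whose select statement plays a similar role. This is only an analogy, not Occam-Pi: the function names send and merge are hypothetical, and the trick of setting a finished channel to nil so that select stops considering it is a Go idiom, not something Occam-Pi offers.

```go
package main

import "fmt"

// send writes each value to out, then closes it to mark the sender as finished.
func send(out chan<- string, values ...string) {
	for _, v := range values {
		out <- v
	}
	close(out)
}

// merge listens on a and b at once, like an ALT over two channels.
// When a sender finishes, its channel is dropped from the wait instead of
// bringing the program down, and merge returns once both are finished.
func merge(a, b <-chan string) []string {
	var got []string
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // a nil channel is never ready, so this case is now ignored
				continue
			}
			got = append(got, v)
		case v, ok := <-b:
			if !ok {
				b = nil // same treatment for the second channel
				continue
			}
			got = append(got, v)
		}
	}
	return got
}

func main() {
	a := make(chan string)
	b := make(chan string)
	go send(a, "a1")             // finishes early
	go send(b, "b1", "b2", "b3") // keeps going after a is dead
	fmt.Println(len(merge(a, b)), "messages received")
}
```

The order in which messages from the two channels interleave is not fixed, but every message arrives and the early finish of the first sender is harmless.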

Program flow is defined by indentation. This is the reason I stopped programming in Python, and it hasn't grown on me. Brackets are a far more intuitive way to indicate flow, and it's always easy to see which parts of the program belong where. The main arguments against brackets are that they clutter the screen and make source code look uglier, and that they allow programmers to write obfuscated programs. Both are poor arguments - bad programmers write obfuscated programs and you shouldn't use them; "ugliness" is debatable, and a poor reason to obfuscate the flow of the program.

There are other things, too, that would make Occam-Pi easier to program in. Currently, all IF statements are required to have at least one condition true. If an IF statement has no true conditions, it causes a crash. This often means appending "TRUE \\ SKIP" to IF statements. (True, ignore this sentence.) This is unnecessarily verbose, as the compiler can easily catch this error. Where the crash is desired, writing "TRUE \\ STOP" at the end of the IF statement induces a crash.
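For anyone who hasn't met this rule, here is a small model of it in Go (not Occam-Pi). The guard type and the occamIF and describe functions are invented for illustration, and returning false stands in for the crash that a real Occam-Pi program would suffer.

```go
package main

import "fmt"

// guard pairs a condition with the action to run when it holds.
type guard struct {
	cond   bool
	action func()
}

// occamIF runs the action of the first guard whose condition is true.
// It returns false when no guard is true, which is the case where an
// Occam-Pi IF would stop the program.
func occamIF(guards ...guard) bool {
	for _, g := range guards {
		if g.cond {
			g.action()
			return true
		}
	}
	return false
}

// describe classifies x. With fallback set, it appends a final branch that
// is always true and does nothing, mimicking the TRUE/SKIP branch.
func describe(x int, fallback bool) (result string, ok bool) {
	result = "unchanged"
	guards := []guard{
		{x > 0, func() { result = "positive" }},
		{x < 0, func() { result = "negative" }},
	}
	if fallback {
		guards = append(guards, guard{true, func() {}}) // always true, do nothing
	}
	ok = occamIF(guards...)
	return result, ok
}

func main() {
	r, ok := describe(0, true)
	fmt.Println(r, ok) // the fallback branch catches x == 0
	r, ok = describe(0, false)
	fmt.Println(r, ok) // no branch is true: Occam-Pi would crash here
}
```

In a bracket language the "do nothing" case is simply an if without an else, which is why having to spell it out every time feels like noise.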

Also, multi-line comments don't exist. I know this doesn't sound like much, but sometimes I want to comment out entire sections at a time to experiment. With only single-line comments, this is painful.

To sum up: Occam-Pi is a very interesting language with an easy learning curve at first. However, more difficult programs become much, much more difficult than they need to be, especially as the lack of debugging tools makes it almost impossible to track down any errors apart from the most trivial.
