$Id: writingPlugins.otl,v 1.3 2002/11/09 15:51:14 ned Exp $

Extending Squeak by Writing Plugins

by Ned Konz (mailto:ned@bike-nomad.com)

Imagine a world in which we couldn't choose what our programming languages could talk to; where programmers had to rely on language vendors for access to libraries, operating system calls, or devices. Fortunately, those of us using open source languages generally have an alternative. Today's most successful languages are all capable of being extended more or less easily to integrate with new systems and devices. This ease depends on a number of factors, but generally, dynamic languages (like Smalltalk, Ruby, or Perl) can be harder to integrate with external libraries than C, C++, or assembly language. Most of them provide memory management that is different from the memory management (if any) of external libraries written in C. Further, since the execution model is probably different, there is usually some glue code required between the language and the extension library.

I recently decided that I was going to make Squeak Smalltalk work with an open source package that is available as a static link library. It took me a while to learn how to do this correctly, so I thought I'd share what I learned with you.

Squeak (http://www.squeak.org) is an open-source Smalltalk language development system that comes with a powerful development environment, graphics frameworks, and a number of other tools.

You can write code to do most of the things you need to do in Squeak directly in the Smalltalk language. But it's also possible to extend Squeak using code that's written in C or another language.

These extensions are called plugins, and contain primitives, which are named subroutines that can be called directly from Squeak code.

This article will explain how to make your own plugin in Squeak, and will take you through the construction of an example plugin. There is an appendix (#appendix) at the end for use as a quick language reference.

I assume that the reader has some familiarity with both Smalltalk and C syntax, and at least some familiarity with non-blocking file I/O and the select() runtime library function.

Why bother writing primitives?

Why would you choose to write a primitive rather than writing a method in Smalltalk?

One reason is that the primitive will probably run faster than Smalltalk. Since Squeak uses a byte code interpreter, individual instructions run more slowly than native code produced by a good compiler would. For some applications -- realtime streaming video or audio, compression, crypto algorithms, and JPEG decoding, for instance -- this gain in speed can make the difference between an application being usable or not.

Another reason to write a primitive is to use the services of a pre-existing library. This could be anything from native OS services (like sockets, asynchronous file support, or serial port usage), to extension libraries like zlib (compression) or pcre (regular expressions), to interfaces with other programs (like OLE, AppleScript, or X11 servers).

Primitives also let you deal with callbacks from external sources -- somewhat. Unfortunately, the Squeak interpreter doesn't let you call Smalltalk code from external code. Because of this, the usual idiom is to receive the callback in a routine written in another language, and signal a Squeak Semaphore to let a Squeak Process continue running to handle the condition.

The other important justification for Squeak primitives is to make sure that Squeak doesn't block while waiting for I/O. The problem is that Squeak runs in a single OS process, and has its own multi-tasker internally. If one Squeak Process blocks at an OS level, no other Squeak Process can run. To make non-blocking I/O possible, most ports of Squeak have a provision for checking I/O events (files or sockets that have become readable or writable, or sockets that have exceptions) and calling back to user code in a plugin. This code then signals a Semaphore as described above in the discussion of callbacks. Using this scheme, a Squeak Process that needs I/O service can start the request and block on a Semaphore until the transfer is complete, letting other Processes run.

For some examples of existing primitives, you can look at the classes in the class category VMConstruction-Plugins. Good examples include:

About the Spread plugin

I can best demonstrate how to write a plugin by showing you a concrete example: my Spread plugin. This is a plugin that I made to interface with an external library, in this case the Spread library libsp. Let me introduce you to Spread and take you through the process of writing this plugin.

Spread (http://www.spread.org) is a group communications system that allows messaging to groups across the network. I want to add Spread capability to Squeak so that I can experiment with various broadcast, collaboration, and distributed object schemes. A Spread system consists of one or more Spread daemons that receive requests from Spread clients and pass messages between themselves and between Spread clients. A Spread client can be in as many groups as it wants, and it can send messages to as many groups as it wants (even ones that it doesn't belong to).

After looking at the Spread API documentation and source code, I saw two choices for connecting Squeak to Spread.

One was to duplicate all the client logic in Smalltalk, down to sending packets over the network. I could read the C or Java implementations and duplicate them in Smalltalk. This has the advantage of not requiring a compiled plugin, but has the serious disadvantage of being a lot of work.

The other choice that I saw was to hook Squeak up to the Spread client library libsp, which is written in C and is available as a static linker library. Although this choice requires compilation of a plugin, it has the advantage of being able to track new versions of Spread easily by a simple re-compile. But most important for me, it looks like much less work, so this is the strategy I chose.

libsp is written in reasonably portable C and is compilable on all the popular desktop platforms that support the standard Sockets API. This means that my plugin potentially can be used on most of the computers that run Squeak.

The Spread API itself is quite simple. At its core, it consists of the following functions:

Getting Squeak to use these functions seems straightforward enough, except for SP_receive(). I don't want to call a blocking function, because if I do, none of the other Squeak Processes will have a chance to run until the function completes. So I'm going to have to avoid calling SP_receive() until I know that it won't block. This requires knowing that there are bytes to be received on the socket that is being used by the Spread client connection. Luckily, one of the return values from SP_connect() is actually a socket file descriptor (though this isn't documented).

Using the socket file descriptor to test for readiness to receive requires Squeak to call the runtime library select() periodically to test whether the file is ready. This support has already been built into Squeak for the use of Squeak's native sockets and asynchronous file support, so I need to hook into it. Unfortunately, there isn't yet a standard API for this select() polling, so there will have to be a platform-specific portion of my Spread plugin.
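As a rough illustration of the kind of test involved (this is not the Squeak VM's own polling code), a select() call with a zero timeout reports whether a descriptor can be read without blocking. The function name fd_is_readable is made up for this sketch, and the code assumes a POSIX system.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of Squeak or Spread.
   Returns 1 if fd has data waiting, 0 if a read would block,
   and -1 on error or if fd cannot be stored in an fd_set. */
int fd_is_readable(int fd)
{
    fd_set readfds;
    struct timeval timeout = { 0, 0 };  /* zero timeout: poll, never wait */
    int n;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* The first argument is the highest descriptor number plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &timeout);
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

A plugin would only call the blocking receive function after a check like this returned 1 for the connection's descriptor.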

To make the SpreadPlugin easy to port, I should write it so that the Smalltalk part doesn't have to change for different platforms. So it looks like I'll end up with these files:

I'm writing this first for Linux, so my platform-specific file will be called spUnixSpread.c .

Anatomy of the plugin

Now that I've figured out a broad strategy, what do I need to get this plugin to work? The required pieces are independent of the way I choose to write the plugin code itself. From the top down, I will have:

It probably makes sense to define these from the top down, so that the client interface is as clear and Squeak-friendly as possible. So let's go over each of these pieces in order from the top down.

At the highest level, my requirements for using this plugin from Squeak seem pretty simple:

There are also some unknowns, and some code that I don't want to write right now:

The next level below the client code is the interface class. Since all of the operations in the Spread API either require or return mailboxes (which identify individual connections, and are represented by socket file descriptors), it makes sense to have the interface class represent a connection. I'll call it a SpreadConnection. It will have to present the appropriate API for client code, of course, but it will also have to hold whatever data I need to represent the state of the connection itself for the use of the plugin code.

This state data includes at least the file descriptor returned from SP_connect() and the Semaphore used to block a single Squeak Process while waiting for a receive. It might also be nice to maintain whatever data pertaining to the connection that SP_connect() returns for the sake of client code, though I might not actually need it. So I'll add the private group name that is returned by SP_connect(). Maybe later I'll also save the name and/or port of the Spread daemon for error reporting, but not now.

At first, I'll make the interface of the SpreadConnection class mirror the Spread API. This will make debugging easier, but may not be appropriate for final use. As I discover more about the needs of my programs that use this plugin, I can add to or change the interface.

All the Spread API calls return a numeric error code of some sort; some of the calls also pass back a byte count in the error code. I'm going to return the same error code from my Smalltalk API for the time being, because it makes testing easier.

So my initial interface will be:

After thinking about it a while, I realize that these connections will have to be able to withstand an image save and startup. However, if part of their state is a file descriptor, that descriptor will certainly not be valid when the image comes back up. I could either disconnect all the connections on a shutdown using a class shutDown method, or I could just keep track of whether they're valid somehow. I think I'll do both, because I also need to be able to close out connections when they get garbage collected, if someone forgets to disconnect them. And I need to tell if a connection that has been closed is safe to use. So I'm going to add a way to query validity from Smalltalk (I get to figure out how to do this later):
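How validity gets tracked is still an open question at this point. One common approach, shown here only as a sketch and not necessarily what this plugin ends up doing, is to stamp each connection record with an identifier for the VM session that opened it. After an image restart the session identifier changes, so stale descriptors are rejected. Every name below (Connection, session_start, and so on) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical connection record; not the real plugin's layout. */
typedef struct {
    int session_id;  /* session in which the descriptor was opened */
    int fd;          /* socket file descriptor, or -1 once closed */
} Connection;

/* Identifier of the running session; assumed to differ on every startup. */
static int current_session = 0;

/* Call once each time the plugin starts up. */
void session_start(int new_session_id)
{
    current_session = new_session_id;
}

/* Record a freshly opened descriptor as belonging to this session. */
void connection_open(Connection *c, int fd)
{
    c->session_id = current_session;
    c->fd = fd;
}

/* Mark the connection as closed so later validity checks fail. */
void connection_close(Connection *c)
{
    c->fd = -1;
}

/* Valid only if still open and opened during the current session. */
int connection_is_valid(const Connection *c)
{
    return c != NULL && c->fd >= 0 && c->session_id == current_session;
}
```

With a scheme like this, a connection that was saved in the image and reloaded fails the check without the plugin ever touching the dead descriptor.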

Now that I've mapped out the top SpreadConnection layer, I have to actually call from Squeak to the primitives in the plugin. I'll call these interface methods here. Squeak has a special syntax for these calls. They look like this:

primIsValid: conn
  <primitive: 'primitiveIsValid' module:'SpreadPlugin'>
  ^ false

The first line of these calls looks like the first line of a normal Smalltalk method, with the name of the method and arguments, if any. This is followed by the special syntax
<primitive: 'primitiveIsValid' module:'SpreadPlugin'>
which calls a named primitive (in this case, primitiveIsValid()) in a named plugin (here SpreadPlugin).

After the primitive call is Smalltalk code. This code is only run if the primitive call fails for some reason. Reasons for a primitive failing include not having the proper plugin, not being able to load it because of library dependencies, or sending the wrong kind of parameters. Since the SpreadPlugin requires the use of the primitives and can't be effectively replaced by Smalltalk code, all of the plugin's interface methods, except for primIsValid: and primConnect:..., raise an exception if the primitive fails.

Another thing that the interface code can do easily is to translate and prepare argument data for the primitive, and to modify the output data from the primitive. I've found that it's often easier to do this kind of translation in Smalltalk than down in the primitive code. For instance, C often wants NUL-terminated strings, but Squeak's strings have a count and no NUL. My first draft of a couple of these interface methods passed the primitive the string and its count, so that I could avoid counting the string in the primitive.

One translation that did survive my optimization is the packing and unpacking of group names. SP_join() has a list of groups as an input, and SP_receive() returns a list of groups. The C interface to the Spread plugin expects these names to be in fixed-size arrays, 32 bytes per group name. However, it's much more natural for Squeak to pass around collections of group names. Rather than requiring a specific data type to be passed (say, requiring an Array of Strings), I allocate and stuff a ByteArray with the characters going in to the SP_join() call, and I allocate a buffer of a nominal size to return the groups from the SP_receive() call.

Luckily, Spread will return an error code telling me if my buffers are too short, and will also indicate how long they have to be. So my Smalltalk code allocates nominally sized buffers, calls the primitive, and then reallocates buffers and calls the primitive again until the buffers are big enough.
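To show the fixed-slot layout concretely, here is a small C sketch of packing names into 32-byte slots and reading them back. The packing in this article is actually done on the Smalltalk side with a ByteArray; the C version just shows the same byte layout. The function names are made up, and the sketch assumes each name is NUL-terminated inside its slot, so a name can be at most 31 characters.

```c
#include <stddef.h>
#include <string.h>

#define GROUP_NAME_SIZE 32  /* bytes per slot, as described above */

/* Hypothetical helper: copy `count` names into `buf`, one per slot,
   zero-padded. Returns the number of names packed, or -1 if the buffer
   is too small or a name does not fit with its terminating NUL. */
int pack_group_names(const char *const names[], int count,
                     char *buf, size_t buf_size)
{
    int i;

    if (count < 0 || (size_t)count * GROUP_NAME_SIZE > buf_size)
        return -1;

    /* Zero everything first so each slot ends up NUL-padded. */
    memset(buf, 0, (size_t)count * GROUP_NAME_SIZE);

    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        if (len >= GROUP_NAME_SIZE)
            return -1;
        memcpy(buf + (size_t)i * GROUP_NAME_SIZE, names[i], len);
    }
    return count;
}

/* Hypothetical helper: pointer to the NUL-terminated name in slot `index`. */
const char *group_name_at(const char *buf, int index)
{
    return buf + (size_t)index * GROUP_NAME_SIZE;
}
```

Unpacking a returned buffer is the same walk in reverse: step through it 32 bytes at a time and read one name from the start of each slot.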

Now I'm ready to specify how the interface methods look. These look very similar to the higher level interface; the differences are mostly because some of these API calls have multiple return values that have to be returned in Smalltalk objects. The interface section of the SpreadConnection class looks like this:

The linkage between Squeak and the named primitives in the plugin is managed by Squeak, which will load the external library (or hook up the internal library) when needed and arrange for the method calls to call the primitives.

Primitives don't take arguments or return values like, say, C functions do; rather, they get inputs from and leave their output on the Squeak object stack. Further, the contents of the Squeak stack aren't C objects, so translation is usually needed before your C code can do anything with the arguments or receiver of the message. You also have to translate the C return value back into a Smalltalk object.
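The stack convention is easier to see in a toy model. The sketch below is not the Squeak VM interface: the stack here holds plain C longs rather than Squeak object references, and every name in it is invented for illustration. It only shows the shape of a primitive: no C arguments, inputs read from the stack, and the receiver and arguments replaced by a single result.

```c
#include <stdbool.h>

/* Toy stand-in for the interpreter's object stack (hypothetical). */
#define TOY_STACK_MAX 16
static long toy_stack[TOY_STACK_MAX];
static int  toy_sp = 0;          /* number of items currently on the stack */
static bool toy_failed = false;  /* set when a primitive fails */

void toy_push(long v)
{
    if (toy_sp < TOY_STACK_MAX)
        toy_stack[toy_sp++] = v;
    else
        toy_failed = true;
}

/* Offset 0 is the top of the stack, 1 the item below it, and so on. */
long toy_stack_value(int offset) { return toy_stack[toy_sp - 1 - offset]; }

void toy_pop(int n) { toy_sp -= n; }

int toy_stack_depth(void) { return toy_sp; }

/* A "primitive" for a two-argument message. It expects the receiver and
   then two arguments on the stack. On failure it leaves the stack
   untouched, so fallback code could still see the original arguments. */
void toy_primitive_add(void)
{
    long a, b;

    if (toy_sp < 3) {
        toy_failed = true;
        return;
    }
    b = toy_stack_value(0);  /* last argument is on top */
    a = toy_stack_value(1);
    toy_pop(3);              /* remove both arguments and the receiver */
    toy_push(a + b);         /* leave the result for the caller */
}
```

In the real system each stack entry would also have to be checked and converted from a Smalltalk object to a C value before use, and the result converted back.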

I could write my primitives directly in C, but decided instead to have the C generated by a compiler for a language called Slang. This compiler comes standard with Squeak. Its syntax is that of Smalltalk, but it generates plain old (non-object) C code. So method calls become regular function calls, plugin instance variables become plugin globals, and other Smalltalk expressions become C expressions.

All of the Squeak plugins that I know about have been compiled from C. And most (but not all) of this C has been generated from Slang code in the image. There's nothing magical about either C or Slang; since all that is needed is a library of functions that use the C calling convention, I could write my primitives in (non-object) C++, or assembly language, or Delphi, or any other language that was compatible.

If I were writing in C, I'd declare the primitives as taking no arguments and returning an (unused) int or void. However, I'm using Slang to generate my C, so the Slang compiler will automatically generate code to deal with the Squeak stack.

Because I'm using Slang, the primitives are declared just like the interface methods. However, their exported names are the names given in the interface methods (i.e. primitiveReceive rather than primitiveReceive:dropRecv:). They declare these exported names in the Slang code.

The declaration of the primitives looks familiar:

There are also three methods that will be called in plugins that define them; their names are fixed by the runtime system. Since I need both startup and shutdown processing, I define all of them: