Ned Konz, August 12, 2004
There has been a lot of discussion over the last year or so about ways in which we could make the ongoing development of Squeak more agile and responsive to the community's needs.
This document is the result of my own thinking as well as discussions with a number of the Guides and Viewpoints Research people. It consists of a discussion of the problems, the problems with some of the proposed solutions, and some suggestions of future directions that I think we can and should take.
I only speak for myself here; I know that there will be differences of opinion with any proposed solution.
Please note that I have tried not to talk about technical solutions to problems. I don't think that the problems we're having are primarily because of a lack of appropriate tools or system design. Rather, I think that we have ignored a number of important social factors in the design of the community.
I am sure that many of us tool makers will want to talk about technical implementations of some of my suggestions, but I think that is secondary to (and not nearly as useful or interesting as) deciding to do something about the community issues first. Tools will follow.
So I have deliberately tried to avoid too much discussion of the implementations of package dependencies, of bug trackers, etc. I think that the engineering effort would best go into getting the actual human interaction working first.
First, let me identify the problems that we've seen and that I'd like to solve with this proposal.
Squeak is different from typical programming languages because it's more than just a programming language and IDE. It's also an application framework, an authoring application, a multimedia application, etc.
We have different groups of users of Squeak. Each group has its own view of what Squeak is and should be. We don't have a single benevolent dictator pushing us in one direction.
Each of these groups defines their view of Squeak in terms of what they want to do with it, and what they want it to contain. And these lists of desires and contents are often quite different between the communities.
Of course, a given person might count themselves in more than one of these communities depending on what they're doing at a given instant.
Among these communities are:
Our largest group of users doesn't even realize that there is something called Smalltalk in Squeak. For them, Squeak is EToys, and is an educational application. This includes the 80000 users in Extremadura in Spain, as well as tens of thousands of users in Japan and around the world.
Even within this group, the view of Squeak is different. The Squeakland distribution is relatively sparse, and is intended for relatively quick download times. The SmallLand distribution contains a number of additional packages and projects.
A rapidly growing group of the Squeak community uses Squeak as a platform for developing and delivering web-based applications. They use packages like Seaside or HTTPView, as well as web server packages like Comanche.
These folks don't care about EToys, nor particularly about Morphic, MVC, sound, multimedia, etc. They do care about solid network, file system and XML support.
Because of its openness and simple architecture, Squeak is also appealing as a platform for programming language and systems research. We have a number of academic researchers using Squeak for their goals.
Squeak is very close to being a good multimedia authoring tool. There is interest from various people in making a "Power Point killer".
Squeak is being used as the basis for the Croquet system. The Croquet team is interested in having a stable base on which to build their system, and is happy to not have to re-invent pieces of infrastructure.
And, of course, Squeak is also a fairly refined and open Smalltalk development system. There are a number of us who use Squeak to develop applications of various sorts. This includes the development of TK4.
Because of the history of Squeak's development, we have historically only had a few "official" downloadable artifacts. These have included:
the Full Image
the Basic Image
the Squeakland Image
Now, however, as the needs of various groups have forced development along more specialized tracks, we also see:
the M17N Image
the Nihongo Image
the SmallLand Image
and so on. Every time someone starts talking about making a version of Squeak with different contents or focus, we start discussing whether a "fork" is necessary. The implication of the "fork" discussions is that two significantly different versions of Squeak can't both be developed or maintained at the same time. And it seems to be generally accepted that a "fork" is a bad thing, as it divides the community's efforts.
We also need to remember that a given image (or whatever form the downloadable Squeak artifacts are in) is more than just a bunch of code. There is also the actual content inside the image, including projects, pre-built morphs, preferences settings, color schemes, fonts, etc. which all are the result of making choices and setting things up carefully. Each community is likely to have different content needs, just as they need different packages of code.
Some problems have come along with the move toward separating various parts of Squeak into separately maintained packages.
Among these problems is that it is very hard to ensure that a given arbitrary group of packages will work together. There have been a number of technical proposals that have been presented as suggestions of how that particular problem might be managed.
I see several problems with our current package system:
lack of visibility: Because we have many packages, and because they aren't loaded by default in the versions of Squeak that people can download, they don't get used and tested as much as they should. Instead, people see a list of 400 or so packages with short descriptions. They have no guarantee that loading a given package into a given image or alongside other packages will work or in fact has ever been attempted.
it makes package maintainers into second-class citizens: As long as there is a distinction between a package being "in the image" and being "on SqueakMap" or elsewhere, we have diluted our community. And the maintainers of these packages are more loosely connected to the Squeak community than maintainers of the common packages that are in the image.
no idea of user community: As a side effect of the separation, it is much harder for a package maintainer to know who the users of their package are. We see bug reports and discussion of the Image on the list, but we don't see as much discussion of package evolution or bugs.
no idea of what will work together: Combining any group of packages together generally means that someone will have to do some work to make the combination work right. As a result, we generally don't and can't know that a given combination of packages will work together until they actually have been put together and tested. It is impossible to test all the combinations of packages and settings.
Some problems that are the direct result of the way we're handling bug reports and fixes include:
no way to commit to action: If a bug does get reported, there's really no way to tell whose responsibility it is, or in fact to commit to taking care of it. Yes, one could send an email saying that, but this hasn't happened very often in the past. I'm sure that there are bugs that don't get fixed because it's not clear whether there is someone already working on them.
no idea of who's doing what: Related to this is the fact that there is little awareness available to our community. Unless someone says on the list "I'm working on XXX and I want to hear about YYY" their work is invisible.
So we sometimes find ourselves wasting precious time duplicating effort, where a smarter solution would have been for the people who duplicated each other's efforts to have teamed up on the work.
excessively conservative update stream: Back in the Squeak Central days, there were two update streams. One (the "internal" update stream) was used by a small group of people who all lived in the resultant images. This gave them incentive to fix broken things. When they had lived with the updates for a while, fixed versions of these updates would be posted to the external update stream.
However, we don't have the equivalent of the internal update stream. So as a result there is no single combination of updates that can be tested together other than what appears in the official development stream. And since there are people actually using the pre-release images for real work (admit it!) we have to be careful.
This leads to the apparent conflict of:
inadequately tested update stream: Because there is no mechanism to provide an alternate, fast-moving update stream, active developers don't get much experience with proposed updates. If we did, we could avoid many of the problems we have had in the official update stream.
slow bug fixes: Because of the requirements of the process (approval stages, etc.) and because of the conservatism of the update stream policy, it can take quite a while between a bug fix being posted to the list and actually getting into the update stream so people can test it.
Other side effects of our current system include:
dead/unmaintained code: Because there's no official responsibility for individual packages in the Squeak image, there's also no way to locate code that is dead or not being maintained by anybody. We started steps in the direction of at least identifying packages and assigning volunteer maintainers with TFNR but that task didn't get finished.
the squeak-dev list is frequently over the heads of non-programmers: We have a couple of other mailing lists (Squeakland, SmallLand, etc.) that are aimed at non-programmers. But much potentially useful discussion takes place on squeak-dev that is unfortunately buried under the weight of lots of discussions that don't make any sense to non-programmers (or in fact to many programmers).
no way to make the contributions of various community members visible
users not able to contribute in meaningful (non-programmer) ways: It's hard to figure out where to help with Squeak if you're not a programmer. There isn't a downloadable version of Squeak that's really targeted to end users in such a way that it will let people contribute. It's even hard to figure out how to send a bug report properly without knowing something about programming.
But we have a number of needs for help that don't include programming. These include:
testing: If no one uses a given package or set of packages, we can't find and fix as many bugs.
documentation: There's lots of missing documentation. Not just class comments and the like, but also simple things like HOWTO and FAQ entries.
reorganization and packaging: Lots of design choices need to be re-thought in Squeak. We don't need programmers to do it (in fact, they're the reason that many of the existing choices are so useless for end users). For instance, how about World menu reorganization or Preference cleanup?
artwork: If it were simpler to actually incorporate design changes into the UI, we could take contributions from artistically inclined members of the community.
OK, so there's a number of problems. We've had lots of discussion over the last few years about how to fix these problems.
However, most of these discussions have dived directly into technical solutions (automatic package compatibility metrics, server mirroring, etc.) which, while interesting to programmers, do nothing directly to fix the source of the problems.
As toolmakers, we programmers have a tendency to see problems in terms of the tools we could make to fix them. Luckily, I don't think that our problems are at all technological in nature.
Communities must concern themselves primarily with the realities of human action. This is not different in the Squeak community just because we're working on software.
Alan Kay's vision has always been human-centered. He sees computers as a tool to enhance the abilities of humans. We need to acknowledge that our goals in the Squeak community are also human-centered.
If we had designed (say) a network communications protocol as carelessly as we've designed the social protocol of the Squeak community, we would have been fired or ignored.
What have we missed?
Simply speaking, we have failed to think about the various interactions that are needed to get our job done.
First, no social interaction is one-way. There must always be some kind of acknowledgement or feedback provided in a community.
For instance, we've largely missed the concept of responsibility. We currently have no reliable way, for instance, for someone to volunteer for a particular bug fix or other task and have their status visible to others.
We also are missing the indicators of awareness that help to clarify what it is we are actually doing as a community. So it's easy to feel separated from the effort at large. We should be able to tell at a glance what we're doing and working on, and who's responsible for what.
We also have no way to acknowledge or even to notice the contributions of our community members. So much work is done by so few people, and so much of it goes unacknowledged, that it is frustrating for them. Look, for instance, at the work done by the few active Harvesters and at the work done by Doug and Bruce in getting the updates and updated images out to the public.
Every time someone says that they need to fork Squeak or some other shared code, I hear them saying that they are not willing or able to follow through on the commitment that should be required to use a shared package and still be a member of the Squeak community.
They are, I think, missing the reciprocal responsibility that is implied by sharing code written by others. To me, that community responsibility goes further than just grabbing a copy of something and later making the modified code available.
I believe that they also have a responsibility to the maintainers and other users of the shared code that they modified: to help them with integrating their changes, or at least with separating, refactoring, and identifying those changes as being irrelevant to other users. This is not just altruism; if both sides spend the effort to do this, it will reduce the amount of duplicated work required later.
We should always view the co-development of a set of projects that share and modify artifacts like packages as a process of several steps:
start with a common version of a package: This requires being able to identify the package and name specific versions.
diverge the versions as needed: This is what we're already good at.
merge the diverged versions back into one or more packages: This is the part we've missed. Since this merge process requires effort from all the users of the shared package, an agreement as to who is going to do this work should be reached whenever a project wants to use a particular shared package.
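The three steps above can be illustrated with a minimal sketch of a three-way merge. It assumes, purely for illustration, that a package version can be reduced to a mapping from method names to source text; the function name, the sample packages, and the version strings are all hypothetical and do not describe any existing Squeak tool.

```python
def three_way_merge(base, ours, theirs):
    """Merge two diverged versions of a package against their common base.

    Each version is a dict mapping a method name to its source text
    (an assumed, simplified representation of a package).
    Returns (merged, conflicts); conflicts lists the method names that
    both sides changed in different ways and that a person must resolve.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree (including "both deleted it")
            result = o
        elif o == b:      # only "theirs" changed it
            result = t
        elif t == b:      # only "ours" changed it
            result = o
        else:             # both changed it, differently
            conflicts.append(name)
            result = b    # keep the common version until someone resolves it
        if result is not None:
            merged[name] = result
    return merged, conflicts


# Hypothetical common version and two diverged copies.
base = {"open": "v1", "close": "v1", "draw": "v1"}
etoys = {"open": "v1", "close": "v2-etoys", "draw": "v2-etoys"}
web = {"open": "v2-web", "close": "v1", "draw": "v2-web"}

merged, conflicts = three_way_merge(base, etoys, web)
print(merged)     # {'close': 'v2-etoys', 'draw': 'v1', 'open': 'v2-web'}
print(conflicts)  # ['draw']
```

The mechanical part of the merge is small. The conflicts list is where the human agreement matters: someone has to have committed in advance to resolving it.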
What can we change?
First, there is no need to talk about "forks" vs. the "official image". This is an artificial consequence of certain historical packaging and distribution choices, and only serves to separate various sub-communities in Squeak.
I suggest that we acknowledge the existence of specialized versions of Squeak and encourage the definition of others as their user communities come forward to take responsibility for doing that work.
I will use the term "distribution" here to refer to a particular image, and its code and other packaged content. Each distribution has downloadable versions that people can get and use immediately.
Each of these distributions has its own communities. This is not any more divisive than what we have now, just an acknowledgement of reality.
By naming distributions, defining what each of them includes, and letting people declare themselves as their developers and/or users, we can gain clarity and community visibility without separation or isolation.
I use the term package here to mean a specific, versionable, set of methods and classes, possibly combined with other content. I do not mean "executable scripts", as these are much more difficult to share and to combine at will. And they can't really be compared with any reliability.
To talk about what is in a distribution and what distributions share in common, we must be able to name the packages. This is a task that has been started several times (most recently with TFNR) but has not been finished. This is essential for my proposal to work, because as soon as we share packages (which we already do) we must be able both to take responsibility for them and to talk about changes to them.
So we must define the packages to whatever granularity is possible, and save versions of each of these packages with version names so that we can identify which versions of which packages comprise a given distribution.
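As a rough sketch of what named package versions make possible, suppose each distribution is described by a manifest that maps package names to version names. The manifest layout, the version names, and both function names below are assumptions made for illustration only; they are not an existing Squeak or SqueakMap format.

```python
# Hypothetical manifests: distribution name -> {package name: version name}.
distributions = {
    "Basic": {"Kernel": "Kernel-1", "Network": "Network-3"},
    "Squeakland": {"Kernel": "Kernel-1", "Network": "Network-2", "EToys": "EToys-5"},
}


def shared_packages(manifests):
    """Map each package used by two or more distributions to {distribution: version}."""
    users = {}
    for distribution, packages in manifests.items():
        for package, version in packages.items():
            users.setdefault(package, {})[distribution] = version
    return {package: used for package, used in users.items() if len(used) > 1}


def diverged_packages(manifests):
    """Names of shared packages whose version differs between distributions."""
    shared = shared_packages(manifests)
    return sorted(p for p, used in shared.items() if len(set(used.values())) > 1)


print(shared_packages(distributions))
print(diverged_packages(distributions))  # ['Network']
```

With nothing more than names and version names, it becomes possible to say which packages are shared and which of them currently need a merge.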
Every package that is included into a distribution implies more work, both for the maintainer of that package and for the maintainers of the distribution and the other packages used in that distribution. So inclusion of a package in a distribution requires the following (where I say 'someone' here I mean one or more people):
someone must commit to being globally responsible for maintaining each package: There must be a single point of contact for decisions regarding evolution and other maintenance of each package.
someone must commit to being responsible for maintaining each package within each distribution: Someone must commit to being the point of contact within a distribution for each package that is included in that distribution. This could be the package maintainer, or it could be someone from the distribution's sub-community who volunteers to do this work. This person is responsible for being the liaison between the developers and users of that distribution and the package maintainer. They must commit to knowing enough about the package to at least propose responsible changes or bug fixes to it, and to understand how it should be used within the distribution. They are also responsible for suggesting enhancements or other changes to the package maintainer, and they are the point of contact for the package maintainer when that person needs to communicate with the distribution maintainers. They represent the "diverge" phase of development for that package in that distribution.
someone must commit to spending the effort to merge the changes back into new common shared versions: That is, someone must be responsible for the "merge" phase, when the divergences from all the distributions get merged back together to produce one or more new packages. This is a commitment that has been lacking.
Specifically, the mere inclusion of a package in a distribution should not necessarily require more work on the part of the package maintainer. Instead, the inclusion of a package should be the result of a two-way conversation and commitment between the package maintainer and the person or people in the distribution community who are responsible for that package within that distribution.
If a distribution will be making significant changes to a shared package, they must realize that they also must commit to working with other users of that package after they have made the changes they needed for their own distribution. In other words, their work does not stop when they release a version of their own distribution that uses that package.
I suggest also that we adopt a model for each of these distributions that resembles the majority of open source development projects with multiple developers.
This would be equivalent to the model of having a trusted core of developers with commit rights to a CVS repository, with nightly builds being tested regularly by those developers and others. From time to time the development forks toward a deliverable, which is then provided for public download.
Specific people would be given roles with respect to a particular distribution (developer, reporter, etc.) that define their interaction with the process. If you commit to being a developer on a particular distribution, this requires a commitment on your part to actually use that distribution as it evolves and to spend time using, testing, and fixing it.
We must provide some way to ensure that the most recent development version of each distribution is always readily available for those interested in helping with development or testing. This is similar to the nightly builds/milestone releases/release fork model of Mozilla and other projects.
I imagine that an automated build process could incorporate the various changes made to each of the components of each distribution every day, and deposit the result for download by testers and developers.
For the community to stay unified and to present a unified face, there must be a single place for people to go to check on project status, report bugs, volunteer to do something, ask for new features or changes, etc.
This place should also be where potential new users of Squeak can be directed in order to find the version of Squeak that fits them best. Instead of each sub-community having its own web site, update mechanism, mailing list, Swiki, etc., why not have a central place where people can see the state of the entire Squeak community?
I would like to see us stop relying on email for bug tracking. There are a number of off-the-shelf bug tracking systems available; it would seem to make sense to use one of these until we decide why we need something different. It makes no sense to spend time rebuilding widely available infrastructure when that time would be better spent improving Squeak. And we don't really need too much availability of this system inside Squeak, at least at first. If we can submit bug reports (preferably requiring more content than the current system requires), and get updates, then we will have satisfied the minimal requirements of such a system. We can use a web browser for other things.
We have a number of media through which members of the Squeak community can talk with each other and find out what's going on with Squeak. At present, these include at least:
face to face meetings
the Squeak-dev and other email lists
the Minnow swiki
the forums on Squeakland.org
blogs on SqueakPeople
other web sites
There is always the danger that adding to the number of such channels will further fragment the awareness of the voice of the community. Still, different people are more comfortable with different media, and different media each have their own advantages.
To increase awareness, I suggest that we make some effort to make as many of these as possible searchable and viewable in one place.
For instance, if we could search the text of IRC logs, squeak-dev archives, forum messages on Squeakland, and the Minnow swiki all at once, it would be easier to answer questions.
Similarly, if there were a single timeline that could show in one place threads on squeak-dev, new changes to Minnow, new blog postings on SqP, new forum posts on Squeakland, etc., it would be easier to follow the pulse of the community by reading (or subscribing to) a single resource.
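A single timeline of this kind is mostly a matter of merging feeds that are each already in date order. The sketch below assumes that every channel can be exported as a list of (timestamp, source, title) entries sorted oldest first; the sample entries are invented for illustration, and how each channel would actually be exported is left open.

```python
import heapq
from datetime import datetime


def unified_timeline(*feeds):
    """Merge feeds, each already sorted oldest-first, into one timeline.

    Every entry is a (timestamp, source, title) tuple; only the timestamp
    is used for ordering.
    """
    return list(heapq.merge(*feeds, key=lambda entry: entry[0]))


# Hypothetical entries standing in for real channel exports.
squeak_dev = [
    (datetime(2004, 8, 10, 9, 0), "squeak-dev", "thread: update stream policy"),
    (datetime(2004, 8, 12, 14, 0), "squeak-dev", "thread: package naming"),
]
minnow = [
    (datetime(2004, 8, 11, 8, 30), "Minnow swiki", "page edited: FAQ"),
]

timeline = unified_timeline(squeak_dev, minnow)
for when, source, title in timeline:
    print(when.isoformat(), source, title)
```

Because `heapq.merge` reads its inputs lazily, the same approach would work for long archives without loading every channel into memory at once.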
One paper that I read recently was about a project called ToxicFarm (http://www.loria.fr/~molli/rech/dpd03/dpdFINAL.pdf) which discussed their model for shared, distributed development. I recommend that you look at this paper.
Their report is a thoughtful survey of some of the issues that we in the Squeak community also face.
Their model uses a central server that acts as a repository for the shared artifacts, as well as for the groupware database (bug tracking, blogs, forums, etc.)
On this central server there are also private workspaces for each participant in each project. These private workspaces are then mirrored and synchronized with local workspaces on computers local to the participants.
An update action copies changed items from the repository to the private workspaces. This is what you would do to get newer changes from other people.
A publish action copies changed items from a private workspace back to the repository. ToxicFarm requires that private workspaces be updated and conflicts dealt with before publishing.
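The update and publish actions, and the rule that a workspace must be updated before it can publish, can be sketched roughly as follows. This is my own simplified illustration and not ToxicFarm's implementation: the class and method names are hypothetical, items are plain strings, deletions are ignored, and resolving a reported conflict is left to the person who owns the workspace.

```python
class Repository:
    """Shared store of items, with a revision counter bumped on each publish."""

    def __init__(self):
        self.revision = 0
        self.items = {}


class Workspace:
    """A participant's private copy of the repository contents."""

    def __init__(self, repo):
        self.repo = repo
        self.base_revision = repo.revision
        self.base = dict(repo.items)   # repository contents at the last update
        self.items = dict(repo.items)
        self.changed = set()           # names edited locally since the last publish

    def edit(self, name, content):
        self.items[name] = content
        self.changed.add(name)

    def update(self):
        """Copy repository changes in; return names that conflict with local edits."""
        conflicts = []
        for name, content in self.repo.items.items():
            changed_upstream = content != self.base.get(name)
            if name not in self.changed:
                self.items[name] = content
            elif changed_upstream and content != self.items[name]:
                conflicts.append(name)  # the local edit is kept for manual resolution
        self.base = dict(self.repo.items)
        self.base_revision = self.repo.revision
        return conflicts

    def publish(self):
        """Copy local changes to the repository; refuse if the workspace is stale."""
        if self.base_revision != self.repo.revision:
            raise RuntimeError("workspace is out of date: update before publishing")
        for name in self.changed:
            self.repo.items[name] = self.items[name]
        if self.changed:
            self.repo.revision += 1
        self.base = dict(self.repo.items)
        self.base_revision = self.repo.revision
        self.changed.clear()


repo = Repository()
alice, bob = Workspace(repo), Workspace(repo)

alice.edit("Morph>>draw", "alice's version")
alice.publish()

bob.edit("Morph>>draw", "bob's version")
try:
    bob.publish()                 # refused: bob has not seen alice's change
except RuntimeError as error:
    print(error)

conflicts = bob.update()          # reports the item both of them edited
bob.edit("Morph>>draw", "merged by hand")
bob.publish()
print(conflicts, repo.revision)
```

The point of the check in `publish` is social as much as technical: nobody can push work to the shared repository without first looking at what everyone else has done.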
There were some other interesting ideas in the paper regarding coordination and awareness. They talked about both task coordination (or explicit coordination), which is based on the hypothesis that it is possible to define a process and enforce this process on working sites, and group awareness (or implicit coordination), which is based on the hypothesis that if the right information about what other people do is sent at the right time to the right people, this information will trigger communication between people that will result in automatic coordination of the virtual team.
They also talked about the different classes of awareness, including:
state awareness, based on the state of shared data (Molli et al., 2001),
availability awareness, based on the knowledge of the physical presence and current status of a member
process awareness, based on the knowledge of current activities and their place in a global process
divergence awareness, based on the quantity of conflicts introduced by integrating concurrent operations
As I read the paper, I thought that Squeak images would be a natural equivalent for their concept of workspaces, and could be built automatically every time that a publish operation happened.
We could provide synchronization between local and private workspaces using something as simple as Monticello "save" operations or CS fileOuts.
Anyway, it's a model that would seem to fit our desired process well.
There will be several welcome changes from adoption of my proposals.
For one thing, the Squeak community will be better able to define itself as a group of people who have agreed to work together, rather than as a group of people who all happen to be using some of the same code. This is a good change, and more honestly represents the two-way commitment implied by the use of shared code.
Second, it will showcase the efforts of the entire community. If the members of your distribution's user community agree to be part of the two-way process, and to work with the rest of the community to evolve the shared code, web site, tools, etc., then your distribution will be presented alongside all the others. Prospective Squeak community members will see well-maintained distributions side by side. They will be able to choose one or more well-defined distributions that they can help with.
Third, it will speed up the response to bug fixes and evolution of code. With changes being applied every day, the latency between bug reporting and fixing will be reduced. The status of each bug and feature request, for Squeak as well as for all the packages that are included in any distribution, will be available and visible in the same place. You will be able to see who is responsible for doing a certain job, and whether they've done it or not.
Please consider my ideas. Don't focus on technological solutions to them at first; rather, consider whether they will help our community situation. With the enthusiasm and participation of the vibrant Squeak community, technological problems are easy to fix!
Thanks again to all of you for being part of Squeak.