A More Inclusive Community-Based Model for Squeak Development

Ned Konz, August 12, 2004

Introduction

There has been a lot of discussion over the last year or so about ways in which we could make the ongoing development of Squeak more agile and more responsive to the community's needs.

This document is the result of my own thinking as well as discussions with a number of the Guides and Viewpoints Research people. It consists of a discussion of the problems, the problems with some of the proposed solutions, and some suggestions of future directions that I think we can and should take.

I only speak for myself here; I know that there will be differences of opinion about any proposed solution.

Please note that I have tried not to talk about technical solutions to problems. I don't think that the problems we're having are primarily because of a lack of appropriate tools or system design. Rather, I think that we have ignored a number of important social factors in the design of the community.

I am sure that many of us tool makers will want to talk about technical implementations of some of my suggestions, but I think that is secondary to (and not nearly as useful or interesting as) deciding to do something about the community issues first. Tools will follow.

So I have deliberately tried to avoid too much discussion of the implementations of package dependencies, of bug trackers, etc. I think that the engineering effort would best go into getting the actual human interaction working first.

The problems

First, let me identify the problems that we've seen and that I'd like to solve with this proposal.

Squeak's different communities

Squeak is different from typical programming languages because it's more than just a programming language and IDE. It's also an application framework, an authoring application, a multimedia application, etc.

We have different groups of users of Squeak. Each group has its own view of what Squeak is and should be. We don't have a single benevolent dictator pushing us in one direction.

Each of these groups defines its view of Squeak in terms of what it wants to do with Squeak and what it wants Squeak to contain. And these lists of desires and contents are often quite different between the communities.

Of course, a given person might count themselves in more than one of these communities depending on what they're doing at a given instant.

Among these communities are:

Choices and conflicts in packaging various images

Because of the history of Squeak's development, we have historically only had a few "official" downloadable artifacts. These have included:

However, as the needs of various different groups have forced development along a single more specialized track, we now also see:

and so on. Every time someone starts talking about making a version of Squeak with different contents or focus, we start discussing whether a "fork" is necessary. The implication of the "fork" discussions is that two significantly different versions of Squeak can't both be developed or maintained at the same time. And it seems to be generally accepted that a "fork" is a bad thing, as it divides the community's efforts.

We also need to remember that a given image (or whatever form the downloadable Squeak artifacts are in) is more than just a bunch of code. There is also the actual content inside the image, including projects, pre-built morphs, preferences settings, color schemes, fonts, etc., all of which are the result of making choices and setting things up carefully. Each community is likely to have different content needs, just as they need different packages of code.

Package separation and its problems

Some problems have come along with the move toward separating various parts of Squeak into separately maintained packages.

Among these problems is that it is very hard to ensure that a given arbitrary group of packages will work together. There have been a number of technical proposals that have been presented as suggestions of how that particular problem might be managed.

I see several problems with our current package system:

Process problems and bug fixes

Some problems that are the direct result of the way we're handling bug reports and fixes include:

Other community problems

Other side effects of our current system include:

Ways that non-programmers should be able to help

But we have a number of needs for help that don't include programming. These include:

The problems with recent proposals

OK, so there are a number of problems. We've had lots of discussion over the last few years about how to fix these problems.

However, most of these discussions have dived directly into technical solutions (automatic package compatibility metrics, server mirroring, etc.) which, while interesting to programmers, do nothing directly to fix the source of the problems.

Our problems are not because of missing or inadequate tools!

As toolmakers, we programmers have a tendency to see problems in terms of the tools we could make to fix them. Luckily, I don't think that our problems are at all technological in nature.

We have ignored the human element

Communities must concern themselves primarily with the realities of human action. This is not different in the Squeak community just because we're working on software.

Alan Kay's vision has always been human-centered. He sees computers as a tool to enhance the abilities of humans. We need to acknowledge that our goals in the Squeak community are also human-centered.

If we had designed (say) a network communications protocol as carelessly as we've designed the social protocol of the Squeak community, we would have been fired or ignored.

What have we missed?

Missing models of interactions

Simply speaking, we have failed to think about the various interactions that are needed to get our job done.

First, no social interaction is one-way. There must always be some kind of acknowledgement or feedback provided in a community.

For instance, we've largely missed the concept of responsibility. We currently have no reliable way, for instance, for someone to volunteer for a particular bug fix or other task and have their status visible to others.

We also are missing the indicators of awareness that help to clarify what it is we are actually doing as a community. So it's easy to feel separated from the effort at large. We should be able to tell at a glance what we're doing and working on, and who's responsible for what.

Missing acknowledgement and feedback

We also have no way to acknowledge or even to notice the contributions of our community members. So much work is done by so few people, and so much of it goes unacknowledged, that it is frustrating for them. Look, for instance, at the work done by the few active Harvesters and at the work done by Doug and Bruce in getting the updates and updated images out to the public.

Missing sane model of development

Every time someone says that they need to fork Squeak or some other shared code, I hear them saying that they are not willing or able to follow through on the commitment that should be required to use a shared package and still be a member of the Squeak community.

They are, I think, missing the reciprocal responsibility that is implied by sharing code written by others. To me, that community responsibility goes further than just grabbing a copy of something and later making the modified code available.

I believe that they also have a responsibility to the maintainers and other users of the shared code that they modified: to help them integrate the changes, or at least to separate, refactor, and identify those changes as being irrelevant to other users. This is not just altruism; if both sides spend the effort to do this, it will reduce the amount of duplicated work required later.

We should always view the co-development of a set of projects that share and modify artifacts like packages as a process of several steps:

What I think we should do

What can we change?

Define distributions

First, there is no need to talk about "forks" vs. the "official image". This is an artificial consequence of certain historical packaging and distribution choices, and only serves to separate various sub-communities in Squeak.

I suggest that we acknowledge the existence of specialized versions of Squeak and encourage the definition of others as their user communities come forward to take responsibility for doing that work.

I will use the term "distribution" here to refer to a particular image, and its code and other packaged content. Each distribution has downloadable versions that people can get and use immediately.

Each of these distributions has its own communities. This is not any more divisive than what we have now, just an acknowledgement of reality.

By naming distributions, defining what each of them includes, and letting people declare themselves as their developers and/or users, we can gain clarity and community visibility without separation or isolation.

Need to define specific packages

I use the term package here to mean a specific, versionable set of methods and classes, possibly combined with other content. I do not mean "executable scripts", as these are much more difficult to share and to combine at will. And they can't really be compared with any reliability.

To talk about what is in a distribution and what distributions share in common, we must be able to name the packages. This is a task that has been started several times (most recently with TFNR) but has not been finished. This is essential for my proposal to work, because as soon as we share packages (which we already do) we must be able both to take responsibility for them and to talk about changes to them.

So we must define the packages to whatever granularity is possible, and save versions of each of these packages with version names, so that we can identify which versions of which packages make up a given distribution.
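To make the idea concrete, here is a minimal Python sketch of a distribution described as a mapping from package name to version name, with a helper that reports what two distributions share. All of the distribution names, package names, and version strings are invented for illustration; nothing here reflects an existing Squeak tool or naming scheme.

```python
# Hypothetical manifests: every name and version string below is made up.
full_image = {
    "Kernel": "Kernel-3.7.1",
    "Morphic": "Morphic-3.7.4",
    "Network": "Network-3.7.2",
}
minimal_image = {
    "Kernel": "Kernel-3.7.1",
    "Network": "Network-3.6.9",
}


def compare_distributions(a, b):
    """Classify packages by whether both distributions carry the same version."""
    shared = sorted(set(a) & set(b))
    return {
        "same": [name for name in shared if a[name] == b[name]],
        "differing": [name for name in shared if a[name] != b[name]],
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
    }


report = compare_distributions(full_image, minimal_image)
print(report)
```

Once packages and versions have names, a report like this is the kind of thing that would tell two distribution teams where they already share work ("same") and where they need to talk to each other ("differing").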

Inclusion of a package in a distribution implies more work

Every package that is included in a distribution implies more work, both for the maintainer of that package and for the maintainers of the distribution and the other packages used in that distribution. So inclusion of a package in a distribution requires the following (where I say 'someone' here I mean one or more people):

Model closer to other open source projects

I suggest also that we adopt a model for each of these distributions that resembles the majority of open source development projects with multiple developers.

This would be equivalent to the model of having a trusted core of developers with commit rights to a CVS repository, with nightly builds being tested regularly by those developers and others. From time to time the development forks toward a deliverable, which is then provided for public download.

Specific people would be given roles with respect to a particular distribution (developer, reporter, etc.) that define their interaction with the process. If you commit to being a developer on a particular distribution, this requires a commitment on your part to actually use that distribution as it evolves and to spend time using, testing, and fixing it.

Distributions are always downloadable

We must provide some way to ensure that the most recent development version of each distribution is always readily available for those interested in helping with development or testing. This is similar to the nightly builds/milestone releases/release fork model of Mozilla and other projects.

I imagine that an automated build process could incorporate the various changes made to each of the components of each distribution every day, and deposit the result for download by testers and developers.

Share distribution sites, bug tracking databases, etc.

For the community to stay unified and to present a unified face, there must be a single place for people to go to check on project status, report bugs, volunteer to do something, ask for new features or changes, etc.

This place should also be where potential new users of Squeak can be directed to find the version of Squeak that fits them best. Instead of each sub-community having its own web site, update mechanism, mailing list, Swiki, etc., why not have a central place where people can see the state of the entire Squeak community?

Central database for bug tracking, feature requests, etc.

I would like to see us stop relying on email for bug tracking. There are a number of off-the-shelf bug tracking systems available; it would seem to make sense to use one of these until we decide why we need something different. It makes no sense to spend time rebuilding widely available infrastructure when that time would be better spent improving Squeak. And we don't really need too much availability of this system inside Squeak, at least at first. If we can submit bug reports (preferably requiring more content than the current system requires) and get updates, then we will have satisfied the minimal requirements of such a system. We can use a web browser for other things.

Communications channels

We have a number of media through which members of the Squeak community can talk with each other and find out what's going on with Squeak. At present, these include at least:

There is always the danger that adding to the number of such channels will further fragment the awareness of the voice of the community. Still, different people are more comfortable with different media, and different media each have their own advantages.

To increase awareness, I suggest that we make some effort to make as many of these as possible searchable and viewable in one place.

For instance, if we could search the text of IRC logs, squeak-dev archives, forum messages on Squeakland, and the Minnow swiki all at once, it would be easier to answer questions.

Similarly, if there were a single timeline that could show in one place threads on squeak-dev, new changes to Minnow, new blog postings on SqP, new forum posts on Squeakland, etc., it would be easier to follow the pulse of the community by reading (or subscribing to) a single resource.
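As a small illustration of the single-timeline idea, the following Python sketch merges several per-channel event lists into one chronological stream. The entries are invented sample data, and the sketch assumes that each channel can already supply its events sorted by timestamp; how those events would actually be collected is left open.

```python
import heapq

# Invented sample entries: (ISO timestamp, channel, title).
# Each list is assumed to be sorted by time already.
mailing_list = [
    ("2004-08-10T09:00", "squeak-dev", "Package naming thread"),
    ("2004-08-12T14:30", "squeak-dev", "Nightly build question"),
]
swiki = [("2004-08-11T08:15", "Minnow swiki", "Page edited: Harvesting")]
forum = [("2004-08-11T20:45", "Squeakland forum", "New post: first project")]

# heapq.merge lazily interleaves sorted inputs into one sorted stream.
timeline = list(heapq.merge(mailing_list, swiki, forum))

for stamp, channel, title in timeline:
    print(stamp, channel, title)
```

Because ISO timestamps sort correctly as plain strings, no date parsing is needed for a sketch like this.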

Lessons from ToxicFarm

I recently read a paper about a project called ToxicFarm (http://www.loria.fr/~molli/rech/dpd03/dpdFINAL.pdf), which discusses the authors' model for shared, distributed development. I recommend that you look at this paper.

Their report is a thoughtful survey of some of the issues that we in the Squeak community also face.

Their model uses a central server that acts as a repository for the shared artifacts, as well as for the groupware database (bug tracking, blogs, forums, etc.).

On this central server there are also private workspaces for each participant in each project. These private workspaces are then mirrored and synchronized with local workspaces on computers local to the participants.

An update action copies changed items from the repository to the private workspaces. This is what you would do to get newer changes from other people.

A publish action copies changed items from a private workspace back to the repository. ToxicFarm requires that private workspaces be updated and conflicts dealt with before publishing.
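The update and publish rules described above can be sketched in a few lines of Python. This is only my reading of the workflow, not ToxicFarm's actual implementation: the class names, the revision counter, and the item name are all assumptions, the private and local workspaces are collapsed into a single workspace, and conflict resolution itself is left to a human.

```python
class Repository:
    """Central store: item name -> (revision number, content)."""

    def __init__(self):
        self.items = {}


class Workspace:
    """One participant's copy of the shared items (hypothetical model)."""

    def __init__(self, repo):
        self.repo = repo
        self.base = {}        # item -> revision last received or published
        self.content = {}     # item -> local content
        self.changed = set()  # items edited locally since the last publish

    def edit(self, name, content):
        self.content[name] = content
        self.changed.add(name)

    def out_of_date(self):
        """Names whose repository revision is newer than what we last saw."""
        return sorted(name for name, (rev, _) in self.repo.items.items()
                      if self.base.get(name, 0) != rev)

    def update(self):
        """Copy newer items from the repository; return conflicting names."""
        conflicts = []
        for name in self.out_of_date():
            rev, content = self.repo.items[name]
            if name in self.changed:
                conflicts.append(name)  # local edit kept; a human must merge
            else:
                self.content[name] = content
            self.base[name] = rev
        return conflicts

    def publish(self):
        """Copy local changes to the repository, but only when up to date."""
        if self.out_of_date():
            raise RuntimeError("update and resolve conflicts before publishing")
        for name in sorted(self.changed):
            rev = self.base.get(name, 0) + 1
            self.repo.items[name] = (rev, self.content[name])
            self.base[name] = rev
        self.changed.clear()


repo = Repository()
alice, bob = Workspace(repo), Workspace(repo)

alice.edit("Parser", "alice's change")
alice.publish()

bob.edit("Parser", "bob's change")
try:
    bob.publish()          # refused: bob has not seen alice's revision yet
    blocked = False
except RuntimeError:
    blocked = True

conflicts = bob.update()   # reports the conflict on "Parser"
bob.edit("Parser", "merged change")  # bob resolves the conflict by hand
bob.publish()
alice.update()             # alice now receives the merged version
```

The point of the rule is that whoever publishes second carries the cost of integrating, so the repository never silently loses someone else's work.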

There were some other interesting ideas in the paper regarding coordination and awareness. They talked about both task coordination (or explicit coordination), which is based on the hypothesis that it is possible to define a process and enforce this process on working sites, and group awareness (or implicit coordination), which is based on the hypothesis that if the right information about what other people do is sent at the right time to the right people, this information will trigger communication between people that will result in automatic coordination of the virtual team.

They also talked about the different classes of awareness, including:

As I read the paper, I thought that Squeak images would be a natural equivalent for their concept of workspaces, and could be built automatically every time that a publish operation happened.

We could provide synchronization between local and private workspaces using something as simple as Monticello "save" operations or CS fileOuts.

Anyway, it's a model that would seem to fit our desired process well.

Conclusion

There will be several welcome changes from adoption of my proposals.

For one thing, the Squeak community will be better able to define itself as a group of people who have agreed to work together, rather than as a group of people who all happen to be using some of the same code. This is a good change, and more honestly represents the two-way commitment implied by the use of shared code.

Second, it will showcase the efforts of the entire community. If the members of your distribution's user community agree to be part of the two-way process, and to work with the rest of the community to evolve the shared code, web site, tools, etc., then your distribution will be presented alongside all the others. Prospective Squeak community members will see a choice of well-maintained distributions side by side. They will be able to choose one or more well-defined distributions that they can help with.

Third, it will speed up the response to bug reports and the evolution of code. With changes being applied every day, the latency between bug reporting and fixing will be reduced. The status of each bug and feature request, for Squeak as well as for all the packages that are included in any distribution, will be available and visible in the same place. You will be able to see who is responsible for doing a certain job, and whether they've done it or not.

Please consider my ideas. Don't focus on technological solutions to them at first; rather, consider whether they will help our community situation. With the enthusiasm and participation of the vibrant Squeak community, technological problems are easy to fix!

Thanks again to all of you for being part of Squeak.

$Id: community2.html,v 1.4 2004/08/13 21:28:16 ned Exp $