Array indexes: start from 0 or 1?

I will argue by means of
  • historical development,
  • mathematical extensions,
  • and ergonomics.
I will also consider the arguments of others.

Performance difference

Even low-level assembly is a translated language, and a translator can resolve the index base before the program runs - which neutralizes the dilemma completely. If a language really should allow tackling problems that rely on such micro-optimizations, syntactical constructs like [0 0] and [1 1] can be used to address this.
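To illustrate the point, here is a minimal sketch in C, a language whose storage is 0-based. The helper name get1 is hypothetical. The -1 is a constant known at compile time, so an optimizing compiler can usually fold it into the address computation; that is an expectation about typical compilers, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessor: a 1-based view of a 0-based C array. */
static inline int get1(const int *items, size_t i)
{
    assert(i >= 1);          /* index 0 is not an item in the 1-based view */
    return items[i - 1];     /* constant offset, known at compile time */
}
```

With int a[3] = {7, 8, 9}, get1(a, 1) yields the first item and get1(a, 3) the last one.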

Extendability of concept

Accessing items from the end of the array

+0 and -0 do exist in certain types of numerical encodings [1]. But usually no such concept is available, so if we begin an array with 0, we have to take -1 as the index of the first item from the end. Thinking in a 0-based paradigm from the start and in a 1-based paradigm from the end requires being skilled in both. Starting the index at 1 allows a simpler, symmetrical extension from the domain of unsigned numbers to the domain of signed numbers.
And so 0-based arrays could hold the line, somehow [3], but the extensibility concept does not end there. An array does not consist only of neatly ordered items.
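The two conventions for signed indexes can be sketched in C as follows. Both function names are hypothetical, and both map a signed index to a 0-based storage offset only because C stores arrays that way. This is a sketch under those assumptions, not a reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1-based from both ends.
 *   i =  1 ..  n   counts from the start (1 is the first item)
 *   i = -1 .. -n   counts from the end   (-1 is the last item)
 *   i =  0         is not an item index
 */
size_t offset_one_based(long i, size_t n)
{
    assert(i != 0 && i >= -(long)n && i <= (long)n);
    return i > 0 ? (size_t)(i - 1) : n - (size_t)(-i);
}

/* For comparison: 0-based from the start, but -1-based from the end. */
size_t offset_zero_based(long i, size_t n)
{
    assert(i >= -(long)n && i < (long)n);
    return i >= 0 ? (size_t)i : n - (size_t)(-i);
}
```

In the 1-based variant, i and -i address mirrored items (the i-th from the start and the i-th from the end). In the 0-based variant the mirror of index 0 is index -1.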

Accessing space between items

Inserting an item between two others is exactly that: accessing a gap. Usually, inserting is treated as a special operation, where the index passed as a parameter means inserting before the item at that index [2]. Yet there are two different semantics: inserting before and inserting after. Defining an item to be placed at position i+0.5 or i-0.5 is just that. The extension is from the domain of signed numbers to the domain of floating-point numbers.
The 1-based concept can straightforwardly encompass this - the 0-based concept breaks down completely, even in its native, positive domain.
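A minimal sketch of gap addressing in C, assuming items sit at the 1-based positions 1..n and gaps at the half-integer positions 0.5, 1.5, ..., n+0.5. The function name insert_at_gap and its parameters are hypothetical, and the caller is assumed to provide a buffer with room for one more element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert value into the gap at a half-integer
 * position. Gap i+0.5 lies after item i, gap i-0.5 lies before item i.
 * The buffer must have room for at least *n + 1 elements. */
void insert_at_gap(int *items, size_t *n, double gap, int value)
{
    assert(gap >= 0.5 && gap <= (double)*n + 0.5);

    size_t k = (size_t)gap;            /* items that stay in front of the gap */
    assert(gap - (double)k == 0.5);    /* a gap, not an item position */

    /* shift the items behind the gap one slot to the right */
    memmove(items + k + 1, items + k, (*n - k) * sizeof *items);
    items[k] = value;
    *n += 1;
}
```

Starting from the items 10, 20, 30, inserting 15 at gap 1.5 places it between the first and the second item. Gap 0.5 is the position before the first item and gap n+0.5 the position after the last.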

Further numerical extensions

Continuing along the IEEE floating-point encoding: 1-baseness allows 0, in addition to +Inf, -Inf and NaN, to have a special meaning. Languages might want to begin supporting +0 and -0 floating-point expressions. IEEE standards also define signals and exceptions.

[1] Wikipedia entry on signed zero.
[2] There is no other choice in 0-baseness.
[3] Also, IEEE floating-point supports +0 and -0.

Easier notation

form                    problems                   special chars  variation

   zero-based, upwards
for (i=0; i<n; i++)     invalid index: n           <              1, in 1 cluster and 1 logical group
for (i=0; i<=n-1; i++)  visually cumbersome: n-1   < = -          3, in 2 clusters and 2 logical groups
   zero-based, backwards
for (i=n-1; i>=0; i--)  visually cumbersome: n-1   - > =          3, in 2 clusters and 2 logical groups
for (i=n-1; i>-1; i--)  visually cumbersome: n-1   - > -          3, in 2 clusters and 3 logical groups
                        more groups than clusters
                        invalid index: -1
   one-based, upwards
for (i=1; i<=n; i++)    (none)                     < =            2, in 1 cluster and 1 logical group
for (i=1; i<n+1; i++)   unnatural boundary         < +            2, in 2 clusters and 2 logical groups
                        visually cumbersome: n+1
   one-based, backwards
for (i=n; i>=1; i--)    (none)                     > =            2, in 1 cluster and 1 logical group
for (i=n; i>0; i--)     invalid index: 0           >              1, in 1 cluster and 1 logical group

A complicated writing procedure and unnatural constructs interfere with the flow of thought. Visual simplicity makes reading and understanding easier, especially when the boundaries are in sync with how they are expressed. This aids natural perception and directly fulfils expectations.
TODO? The table does not contain the whole number extension utilizing negative indexes.
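The backwards loop forms can be tried out with the following sketch in C. The function names are hypothetical, and since C storage is 0-based, the one-based loop subtracts 1 at the point of access. The remark about unsigned counters is an observation about C, not something taken from the comparison itself.

```c
#include <assert.h>
#include <stddef.h>

/* zero-based, backwards: the counter must be signed, because with an
 * unsigned type the condition i >= 0 would always hold */
long sum_backwards_zero_based(const int *items, size_t n)
{
    assert(items != NULL || n == 0);
    long sum = 0;
    for (long i = (long)n - 1; i >= 0; i--)
        sum += items[i];
    return sum;
}

/* one-based, backwards: both bounds (n and 1) are valid indexes, and an
 * unsigned counter works because the loop ends when i reaches 0 */
long sum_backwards_one_based(const int *items, size_t n)
{
    assert(items != NULL || n == 0);
    long sum = 0;
    for (size_t i = n; i >= 1; i--)
        sum += items[i - 1];   /* 0-based storage, hence the -1 */
    return sum;
}
```

Both functions visit the same items in the same order. They differ only in how the loop bounds are written.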

vs.

Dijkstra

I found it to be a rather quickly written note.
Dijkstra anchored himself in the first section, on intervals, did not weigh the two variations against each other again in the later parts, and thus had no choice but to arrive at the 0-based index as the more natural one.
We part ways right at the first argument, because I expect constants to be readable on the screen as they are, rather than having to think in lengths of runs. There are two subtractions that give the correct length (expanding either the lower side or the upper side of the interval). Yet only one variation is visually in sync. The empirical evidence is 35 years old and comes without a reference to inspect it.
My argumentation worked through multiple areas independently, and in each of them found the 1-based array to be the better variation.

Others

Most discussions in favour of 0-baseness contain a reference to Dijkstra and simple arguments without analysis (like a naive argument about obvious performance).
Arguments for 1-baseness often refer to mathematical notation and to pedagogical reasons grounded in expectations and simplicity.
If you find or have a nice collection, pro or contra, please send it over - I will incorporate it here.

Conclusion

Well-defined concepts that provide sound extensions are what mathematics and its applied areas, such as programming, are all about. 0-based arrays seem to me like a missed decoupling of assembly language from binary code - which was bound to the hardware and its structure by necessity.

Links

Dijkstra: Why numbering should start at zero
Wiki of Cunningham & Cunningham: Zero And One Based Indexes
Wikipedia: Zero based numbering