Informatik Handwerk
Peter Fargaš
Programmer :: Prototyping, Research
PHP | JavaScript | Java
Informatik Handwerk
Peter Fargaš
Programmer :: Prototyping, Research
PHP | JavaScript | Java
Informatik Handwerk
Peter Fargaš | Programmer :: Prototyping, Research | PHP,JavaScript,Java
Release date: October 2016
Link to authoritative version https://knowledge-transfer.informatik-handwerk.de
/article/discussion/arrayIndexes.php

Array Indexes Must Start At 1, Otherwise Cripled

I will be arguing by means of
  • historical development
  • mathematical extensions
  • and ergonomics.
  • I will also consider argumentation of others.

Performance difference

Even the low-level assembler is a compiled language - which neutralizes the dilemma completely because the compiler can take care of the translation. If a language really should allow tackling problems relying on such micro-optimizations, syntactical constructs like [0 ] and [1 ] can be used to address this.

Extendability of concept

Accessing items from the end of the array

If we begin array with a 0, we have no choice, but to take -1 as the index of first item from the end. Thinking in 0-based paradigm from the start and in 1-based paradigm from the end - that requires to be skilled in thinking in both. To start indexing with the 1, allows a simpler, symmetrical extension from domain of unsigned to the domain of signed numbers.

+0 and -0 do exist in certain types of numerical encodings1. This representation exists at the level of source code, but there is no concept which would capture this distinction at runtime.

And so 0-based arrays could hold the line, somehow3, yet the extensibility concept does not end there. An array does not consist all just of neatly ordered items.

Addressing space between items

Inserting an item between some other two is exactly that, addressing a gap. Usually, inserting is treated as a special operation, where the index passed as parameter, talks about inserting before the item of that index2. Yet there are two different semantics: inserting before and after.

The same goes for addressing slices. Does the starting index include the element, is the ending index "up to, not including"? Now there are four options, when one has no choice but to stick to integers. Floating points offer a solution.

The concept of 1-based array can straightforwardly encompass this - defining an item to be placed at positions i+0.5 or i-0.5 is just that. The extension, is from the domain of signed numbers to the domain of floating-point numbers.

The 0-based concept breaks down here. In it's native, the positive domain, inserting item before 0, would mean to insert it at -0.5. Worse than counterintuitive, collision occurs, when inserting item in the negative domain after the end, at the index -1 + 0.5.

Further numerical extensions

One basedness allows 0 to have special meaning, but also +Inf, -Inf and NaN can be difined to have a meaning. Languages might want to begin supporting +0 and -0 floating point expressions.

1 Wikipedia entry of signed zero.
2 There is no other choice in 0-baseness, explained in a later paragraph.
3 As well, IEEE floating-point supports +0 and -0.

Notation Comparism

Here are all different notations of iteration upon zero- and one-based arrays. I only list iterating through the positive indexes, as it is always simpler and using the negative domain does not lead to any advantages.

There are always two alternatives on how to notate an iteration: having a sharp and unsharp comparism. I included both for completeness, just note that the upper one of the pairs is the better/usual way to notate the range.

form

problems

special chars variation

   zero-based, upwards
for
(i=0; i<n; i++)
invalid index: n
<
1
in 1 cluster
and 1 logical group
for
(i=0; i<=n-1; i++)
visually cumbersome: n-1
< = -
3
in 2 clusters
and 2 logical groups
   zero-based, backwards
for
(i=n-1; i>=0; i--)
visually cumbersome: n-1
- < =
3
in 2 clusters
and 2 logical groups
for
(i=n-1; i>-1; i--)
visually cumbersome: n-1
more groups than clusters
invalid index: -1
- > -
3
in 2 clusters
and 3 logical groups
   one-based, upwards
for
(i=1; i<=n; i++)
< =
2
in 1 cluster
and 1 logical group
for
(i=1; i<n+1; i++)
unnatural boundary
visually cumbersome: n+1
< +
2
in 2 clusters
and 2 logical groups
   one-based, backwards
for
(i=n; i>=1; i--)
> =
2
in 1 cluster
and 1 logical group
for
(i=n; i>0; i--)
invalid index: 0
>
1
in 1 cluster
and 1 logical group

1-basedness iterations are beautifully symmetrical, using the same unsharp compare operator (larger-equal / smaller-equal). This also shows how practical is, when the upper index is the same as element count. Imagine every n would be expressed as count(arr).

Visual simplicity makes reading and understanding simpler, especially boundaries which are in-sync with how they are expressed. This aids natural perception and direct expectancy fulfilment.

v.s.

Dijkstra

I found it was a rather quickly written note.

Dijkstra rooted himself in the first section about intervals, didn't put the two variations against each other anymore in the later parts and thus had no choice but arriving at 0-based index as the more natural one.

We sail apart right at the first argument, because having constants readable on the monitor is what I consider as expected, instead of thinking in length of runs. There are two subtractions which give correct length (expanding either the lower side or the upper side of the interval). Yet visually in-sync, is only one variation. The empirical evidence is 35 years old without a reference to inspect it.

My argumentation worked itself through multiple areas independently, and in each section found 1-based array to be a better variation in all areas.

Others

Most discussion pro 0-baseness contains reference to Dijkstra and simple arguments without analysis (like a naive argument on obvious performance).

1-based arguments often contain references to mathematical notation and pedagogical reasons reasoned by expectations and simplicity.

If you find or have a nice collection, pro or contra, please send it over - I will incorporate it here.

Conclusion

Well defined concepts which provide sound extensions - are in mathematics and it's applied areas, like programming, what it is all about. To me, 0-based arrays seem like a missed decision on decoupling of assembler language from binary code - which was bound on the hardware and it's structure by necessity. We are struggling with that ever since.

Links

Dijkstra: Why numbering should start at zero
Wiki of Cunningham & Cunningham: Zero And One Based Indexes
Wikipedia: Zero based numbering

Comments
Waiting for approval.