
Advanced counter arrays #221

@cardillan


The new Mindustry Logic ability to access variables indirectly by name (even in the @this processor) allows various arrangements of internal arrays.

We categorize the internal array mechanisms into two classes:

  • Regular arrays: the jump tables are generated separately for reads and writes, and they use an access variable to read/write a value from/to an array element. This is the usual @counter array supported by Mindustry Logic since version 6.
  • Compact arrays: there is just one jump table. The jump table stores the name of the variable corresponding to the array element into the access variable. A separate read or write instruction is then needed to read/write the element value from either a local or a remote processor. If the array element variables aren't accessed directly, they need to be created using draw triangle instructions. This mechanism requires Mindustry BE.
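As a hedged illustration (not code from the issue; the names retaddr, access and result are made up), a regular array's read jump table copies element values, while a compact array's single table only stores element names:

```
# Regular array, read jump table (sketch): each entry copies the
# element value into the access variable 'result', then returns.
set result array0
set @counter retaddr
set result array1
set @counter retaddr
# ...plus a separate, equally large write jump table.

# Compact array, the single jump table (sketch): each entry stores
# only the element *name*; a separate read/write resolves it.
set access "array0"
set @counter retaddr
set access "array1"
set @counter retaddr
# ...the caller then does e.g.: read result @this access
```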

Compact arrays may offer an advantage when elements are always read and written at the same index (e.g. array[index]++). Here, the compact array requires just one variable name lookup, making this actually faster than a regular array.
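For instance, array[index]++ on a compact array can resolve the element name once and reuse it for both the read and the write (a sketch under the assumptions above; access is the variable set by the compact jump table):

```
# after jumping through the compact table, 'access' holds e.g. "array3"
read value @this access      # read the element by name
op add value value 1         # update
write value @this access     # write back through the same name
```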

Single processor

The size of arrays in a single processor setting is limited by available instruction space. Compact arrays can be twice as large given the same space.

The decision to use compact versus regular arrays on a single processor is made by the compiler. The arrays will always be created as compact arrays; the optimizer will then generate additional regular jump tables, or replace compact tables with regular jump tables, using the optimization-for-speed mechanism.

On top of that, both compact and regular array access can be inlined.

Support for declaring compact arrays might be added (a new compact modifier keyword). These arrays won't be expanded to regular arrays.

Remote processors

The various configurations here are described from the point of view of a local processor, which uses a remote processor to store either the variables, or some portions of the access code, or both.

Some schemes might allow the remote processor to use the array for its own processing as well, but this is not part of this evaluation. The remote processor here serves only for providing access to the array to the local processor and doesn't contain other code.

Since the code is spread over multiple processors and potentially compiled separately, a concrete configuration needs to be specified by the programmer - it cannot be altered by the optimizer.

Remote storage, local access code

The only scenario where this arrangement brings some benefits for the local processor is when using a compact array. In this case, the variables need not be created in the local processor, only in the remote processor, saving the space needed for draw triangle instructions.

Pros:

  • Array access doesn't require synchronization.
  • Random access as fast as local compact arrays.
  • Saves some instruction space in the local processor (specifically, the draw triangle instructions, about one twelfth the size of the array jump table).

Cons:

  • Direct element access is converted to a read/write operation instead of a local variable access.

When compact arrays get implemented in Mindcode, this functionality will be provided just by declaring a remote array.
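A hedged sketch of this arrangement: the compact jump table stays in the local processor, but since the element variables exist only in the remote processor, the final access becomes a read/write against processor1 (the linked remote processor):

```
# local processor (sketch): jump into the local compact table,
# which sets 'access' to the element name, e.g. "array3"
# ...
# then access the element stored in the remote processor:
read value processor1 access     # to read
write value processor1 access    # to write
# no draw triangle instructions needed locally - the element
# variables exist only in the remote processor
```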

Local storage, remote access code: compact array

Supports up to 500 array elements - the largest contiguous internal array possible.

Pros:

  • Local storage means direct element access can be optimized down to actual variable.
  • Array access served in a single tick even on microprocessor.

Cons:

  • Requires synchronization; one element access takes one tick on average (however, read-update-write sequence still requires just one remote call).

Conclusion: should be supported.

Implementation

Remote processor: links the local processor as processor1. Code:

write "array0" processor1 "array"
wait 1e15 # or stop, both should work
write "array1" processor1 "array"
wait 1e15 
write "array2" processor1 "array"
wait 1e15 
...
write "array499" processor1 "array"
wait 1e15     # last (1000th) instruction  

Local processor: links the remote processor as processor1. Creates variables array0 to array499 using draw triangle.

Access to array element at position index:

op mul address index 2
write address processor1 `@counter`
wait 1e-12
# to read
read variable @this array
# to write
write value @this array
...
draw triangle array0 array1 array2 array3 array4 array5
...

The local processor uses wait to wait for the next tick, during which the array variable will be set even if the remote processor is a microprocessor. If the local processor accesses the array element once per tick on average, and in bursts of at most 4 accesses in a single tick, the average IPT (instructions per tick) of the local processor should not suffer.

It might break down at FPS higher than 60, though. To make sure it won't break, the array access would have to be modified to this:

op mul address index 2
set array null
write address processor1 `@counter`
loop:
wait 1e-12
jump loop strictEqual array null
# to read
read variable @this array
# to write
write value @this array

Another possibility might be to specify a time in wait precisely equal to one tick (or maybe a wee bit shorter). Will need some experiments to confirm.
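If that approach works, the guarded loop could collapse back to the simple sequence, just with a tick-length wait (the 0.0167 constant, one tick at 60 FPS, is an assumption pending those experiments):

```
op mul address index 2
write address processor1 `@counter`
wait 0.0167      # ~1/60 s; exact safe value to be confirmed
# to read
read variable @this array
# to write
write value @this array
```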

Local storage, remote access code: regular array

Supports up to 166 array elements.

For regular arrays, the remote processor needs to read/write the value from/to the proper access variable in local storage.

Pros:

  • Local storage means direct element access can be optimized down to actual variable.
  • Reads may be one instruction faster than compact arrays, depending on context.

Cons:

  • Writes are not faster than writes in compact arrays, and might even be slower sometimes.
  • Supports only 1/3 of the elements compared to a compact array.
  • Can't be proofed against FPS > 60 (or maybe it can, using the tick-long wait instruction).
  • Array access requires three instructions: a microprocessor is not fast enough.

Conclusion: won't be supported.

Implementation

Element #0 read, followed by element #0 write - remote code:

read value processor1 "array0"
write value processor1 "arrayRead"
wait 1e15
read value processor1 "arrayWrite"
write value processor1 "array0"
wait 1e15

Read at index - local code:

op mul address index 6
write address processor1 `@counter`
wait 1e-12
print arrayRead

Write at index - local code:

set arrayWrite value
op mul address index 6
op add address address 3
write address processor1 `@counter`
wait 1e-12

Remote storage, remote access code: compact array

Supports up to 461 elements.

This scheme is a variation on Local storage, remote access code: compact array.

Local code is modified to read array element from the remote processor instead of @this. Remote code must create instructions using draw triangle, reducing maximum array size to 461 elements.

Pros:

  • Array access served in a single tick even on microprocessor.
  • Frees up to 77 draw triangle instructions on the local processor.

Cons:

  • Requires synchronization; one element access takes one tick on average (however, read-update-write sequence still requires just one remote call).
  • Direct element access is converted to a read/write operation instead of a local variable access.

Conclusion: should be supported if the implementation proves to be easy (I expect it will).
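A sketch of the expected implementation (not spelled out in the issue): the remote code keeps the same jump table as in the local-storage variant and additionally creates array0 to array460 with draw triangle; the local code only changes the final access target from @this to processor1:

```
# local code (sketch)
op mul address index 2
write address processor1 `@counter`
wait 1e-12
# to read
read variable processor1 array
# to write
write value processor1 array
```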

Remote storage, remote access code: regular array

Supports up to 250 array elements.

Array elements are referenced directly by jump tables: no need for draw triangle initialization.

Pros:

  • Reads may be one instruction faster than compact arrays, depending on context.
  • Array access served in a single tick even on microprocessor.

Cons:

  • Writes are not faster than writes in compact arrays, and might even be slower sometimes.
  • Read-update-write can't be optimized.
  • Can't be proofed against FPS > 60.
  • Direct element access is converted to a read/write operation instead of a local variable access.

Conclusion: compared to a compact array, there are practically only downsides. Won't be supported.

Implementation

Element #0 read, followed by element #0 write - remote code:

write array0 processor1 "arrayRead"
wait 1e15
read array0 processor1 "arrayWrite"
wait 1e15

Read at index - local code:

op mul address index 4
write address processor1 `@counter`
wait 1e-12
print arrayRead

Write at index - local code:

set arrayWrite value
op mul address index 4
op add address address 2
write address processor1 `@counter`
wait 1e-12
