- Vincent Lyles

# Random Out Of Order Execution (ROOOX) - A Guide for Non-technical People

Our generalisable approach produces Random Out Of Order Execution [ROOOX] of an algorithm. Some or all of the non-commutative steps, and/or some or all of the commutative portions, of the algorithm are randomly presented for execution. There is random mixing both within individual runs and between different runs of the algorithm, with fresh randomisation between each run. Maximising ROOOX drives the side-channel leakage towards “white noise”, leaving no readable data in it.

For ROOOX, an algorithm is unrolled, de-nested, and split into portions of sufficiently fine granularity that they can all be randomised and presented in a wait-free, asynchronous manner with two critical fairness guarantees, viz:

1. Although each task will be presented for execution in a random order, it is guaranteed to be executed at least once within the execution time of the total number of tasks in the pipeline;

2. Although the tasks in the pipeline will be presented for execution randomly, the whole pipeline is guaranteed to be executed.

For a given plurality of sequential blocks of plaintext, blocks earlier in the sequence will either have finished enciphering or be having later tasks executed, while blocks later in the sequence will be having earlier tasks executed or be waiting to join. This relationship arises because the overall logic of execution is preserved.

The level of granularity may be chosen so that the arrangement tends towards an embarrassingly parallel problem, in which case linear scaling may apply. Moreover, the amount of ROOOX applied to an algorithm may be tailored to balance improved security against increased performance cost.
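The two fairness guarantees above can be sketched in a few lines of Python. This is a minimal illustration, not the invention's implementation: it assumes each "round" of presentation is a fresh random permutation of the task list, which trivially ensures every task runs exactly once within any window of N presentations. The function name `rooox_present` is my own label.

```python
import random

def rooox_present(tasks, rounds):
    """Present tasks in a random order, with the fairness guarantee that
    every task executes exactly once within each round of len(tasks)
    presentations - so the whole pipeline is always executed."""
    for _ in range(rounds):
        order = list(range(len(tasks)))
        random.shuffle(order)        # fresh permutation every round
        for i in order:
            tasks[i]()               # each task runs once per round
```

Because a permutation contains every index, no task can starve, yet the order of presentation is unpredictable from run to run.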

## Non-commutative ROOOX

Let us demonstrate the workings of the invention with reference to 128-bit AES, a common encryption algorithm. Each cycle of 128-bit AES comprises ten rounds involving the steps AddRoundKey, SubBytes, ShiftRows, and MixColumns, giving 40 steps per cycle to encrypt one block of plaintext, and these steps must be executed in order. To aid understanding, consider these 40 steps as the number cards in a pack [face cards removed], which must be played in the order: all the 1s, then the 2s, the 3s, and so on up to the 10s, taking Hearts, Clubs, Diamonds and Spades for each number. Imagine also a large spreadsheet with a long column of data blocks waiting to be processed and a top row giving the order of the cards. For native 128-bit AES the cards would be turned over in the order above and ticked off; for every deal of the deck, one plaintext block gets fully ticked off. This is predictable execution and therefore vulnerable to attack. Runtime could be altered by the addition of joker cards.

The invention shuffles the cards: 40 cards give 40! possible permutations, which is about 8E47 - a huge number. All the cards are dealt, and some will happen to be turned over in order [a bit like they can be played out in Patience or Solitaire]; those in-order cards are ticked [or checked] off on the spreadsheet. The cards are picked up and shuffled again. The steps just ticked off are now available to the next data block down the list, and the first row is ready to be ticked off from where the ticking stopped. The cards are dealt again, but there are now two data blocks being processed. At the end of this deal more boxes are ticked for the upper data block, and those steps become available to the block below; some of the cards for the lower block will have come out in order and become available to the data block below that one. The cards are picked up and shuffled again, and another data block begins to be processed. Typically, after about 20 cycles every card turned over elicits a tick somewhere on the spreadsheet, but no one can predict how many ticks each block being encrypted will receive, what the order of play between blocks will be, or how many blocks will have unfinished rows of ticks. As stated above, all one can say is that earlier blocks will have completed encryption or be having later steps executed, and later blocks will be having earlier steps executed or be waiting in a queue.

In this example, one doesn’t need to shuffle all the cards: one could shuffle only the even ones, only the reds, or only the Spades. The shuffled and non-shuffled parts are melded together, without disrupting their order, and played. Similar tailoring of randomness can be done on the 128-bit AES steps, and the technique can be applied to any algorithm.
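The melding of shuffled and non-shuffled parts might be sketched as follows. This is one interpretation of the analogy, not the source's method: the selected cards (e.g. the evens) are shuffled among themselves, while every unselected card keeps its original position.

```python
import random

def partial_shuffle(seq, selector):
    """Shuffle only the elements matching `selector`, melding them back
    into the positions those elements originally occupied; all other
    elements stay exactly where they were."""
    picked = [x for x in seq if selector(x)]
    random.shuffle(picked)
    it = iter(picked)
    return [next(it) if selector(x) else x for x in seq]
```

For instance, `partial_shuffle(list(range(10)), lambda x: x % 2 == 0)` randomises only the even cards while the odds stay fixed, giving a dial between full ROOOX and none.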

## Commutative ROOOX

If we break the steps of 128-bit AES into finer granularity and de-nest parts, the picture changes as follows.

Considering the cards and the spreadsheet again, the row of boxes to tick is now massive: each AddRoundKey step is replaced by 16 cards, each SubBytes step is replaced by 16 cards, and there are four cards for each ShiftRows and four for each MixColumns. The cards stay grouped as packs for each round, and these packs are dealt in round order; however, there is shuffling within each pack, so one moves back and forth across the spreadsheet during each deal. The pattern of moving to and fro is different each time the cards are dealt. The loss of sequential left-to-right ticking corresponds to the loss of pattern in executing 128-bit AES.
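The within-pack shuffling can be sketched directly. This is a minimal model assuming the breakdown stated in the text (16 byte-level cards for AddRoundKey, 16 for SubBytes, four for ShiftRows, four for MixColumns): the packs are dealt in a fixed step order, but the commutative sub-tasks inside each pack come out in a fresh random order every deal.

```python
import random

# Assumed per-step breakdown from the text: 16 + 16 + 4 + 4 = 40 cards
PACKS = [("AddRoundKey", 16), ("SubBytes", 16),
         ("ShiftRows", 4), ("MixColumns", 4)]

def deal_round():
    """One round's deal: packs keep their step order, but the commutative
    sub-tasks inside each pack are shuffled independently each time."""
    trace = []
    for name, n in PACKS:
        pack = [(name, i) for i in range(n)]
        random.shuffle(pack)          # within-pack randomisation only
        trace.extend(pack)
    return trace
```

Every call returns all 40 sub-tasks with the step order intact, yet the to-and-fro pattern inside each pack differs from deal to deal.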

Herbst et al., in their 2006 paper “An AES Smart Card Implementation Resistant to Power Analysis Attacks,” discuss permuting the first and last rounds to stop an attacker knowing when a run starts or ends. The top of p. 234 says:

“The statistical effects of randomization have been studied in [CCD00] and [Man04] in detail. Both papers come to the same conclusion. If the probability that the intermediate value occurs at a certain time is p, then the correlation coefficient decreases by a factor of p and the number of measurements needed for a successful attack increases by a factor of p^2 [sic - I think this should be 1/p^2].”

With our invention, there is great variation between each trace. If we randomise, say, all the commutative tasks within the 128-bit AES steps AddRoundKey and SubBytes, these tasks occur at 19 loci along each trace, and there are 16! [about 2.09E13] possible permutations at each locus. All the probability “p” values mentioned above will, it seems to me, become very small. If we additionally permuted the presentation of the 36 non-commutative steps of 128-bit AES, the likelihood of any intermediate value occurring at a certain time all but vanishes [36! [about 3.7E41] permutations => a one in 3.7E41 likelihood of a given value occurring at a given time].
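The permutation counts quoted above, and the measurement blow-up implied by the quoted result, are easy to check numerically. This sketch only verifies the arithmetic; the final two lines assume, purely for illustration, that an intermediate value's timing is uniform over the 16! orderings at one locus.

```python
import math

# Permutation counts quoted in the text, to three significant figures
print(format(math.factorial(40), ".2e"))  # 40 steps: ~8.16e47 orderings
print(format(math.factorial(16), ".2e"))  # 16 sub-tasks per locus: ~2.09e13
print(format(math.factorial(36), ".2e"))  # 36 non-commutative steps: ~3.72e41

# Per the quoted result: correlation scales by p, measurements by 1/p^2.
# Illustrative assumption: timing uniform over the 16! orderings at a locus.
p = 1 / math.factorial(16)
measurements_factor = 1 / p ** 2          # attack cost multiplier, ~(16!)^2
```

Even the single-locus assumption pushes the attacker's measurement count up by a factor on the order of (16!)^2 ≈ 4.4E26, which is the sense in which the leakage tends towards white noise.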

Increased security inevitably brings performance costs. However, working at the finer granularity level tends towards an embarrassingly parallel problem, and linear scaling may follow per the Gustafson-Barsis law. This formed part of Jeremy’s MSc by research.