The third Trustlines validator auction was a success. Afterwards, the Trustlines Protocol development team started preparing the update for the chain. Unexpected events took place, the validator period change was not smooth, and we learned valuable lessons.
We deployed a smart contract containing the list of validators for the third validator period to the Trustlines Blockchain. The update was set to take place at block 6259269. Based on average block times, this block was expected to be reached on the 22nd of March 2021.
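The date estimate above is simple arithmetic: remaining blocks multiplied by the average block time. A minimal sketch follows. The observed block number, observation time, and 5-second block time are hypothetical values chosen only to illustrate the calculation; only the fork block comes from this post.

```python
from datetime import datetime, timedelta, timezone


def estimate_block_arrival(current_block, observed_at, target_block, avg_block_seconds):
    """Project when target_block will be produced, assuming a constant block time."""
    remaining_blocks = target_block - current_block
    if remaining_blocks < 0:
        raise ValueError("target block has already been produced")
    return observed_at + timedelta(seconds=remaining_blocks * avg_block_seconds)


FORK_BLOCK = 6259269  # fork block named in this post

# Hypothetical observation, chosen only to illustrate the arithmetic.
observed_block = 6_000_069
observed_at = datetime(2021, 3, 7, 2, 0, tzinfo=timezone.utc)
avg_block_seconds = 5.0  # assumed average block time

eta = estimate_block_arrival(observed_block, observed_at, FORK_BLOCK, avg_block_seconds)
print(eta.isoformat())  # 2021-03-22T02:00:00+00:00
```

Because the projection assumes a constant block time, any slowdown in block production (for example, when validators are offline) pushes the real date later than the estimate.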
The fork block arrived on the expected date, but not everything went according to the original plan. At the time of the fork block, there weren't enough validators online to reach consensus and finality for the deployed smart contract. Following that, an update was released, but the update window turned out to be too short. After we gave the validators further instructions on how to apply the changes correctly if they had already produced blocks on the wrong chain, the new chain became the longest, and the third validator period went into effect.
Timeline of Events
22nd of March, 2021 02:00 – 14:00 UTC
The day of the calculated chain split arrived, and we expected a swift validator period change. After observing the chain for a while, we concluded that this was not the case. The expected fork block was reached, but the new validator set did not activate. On closer inspection, it turned out that fewer than 51% of the validators were online. This led to the nodes ignoring the proposed change of the set. An attempt to reach validators from the second period failed, and there still were not enough validators to reach finality.
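The threshold problem can be illustrated with a small check. This is a simplified model, not the consensus engine's actual logic: it assumes finality needs a strict majority of the validator set to be online and signing, and the validator counts are hypothetical.

```python
def has_finality_majority(online_validators, total_validators):
    """Return True if a strict majority of the validator set is online.

    Simplified assumption: finality needs more than half of all validators.
    """
    if total_validators <= 0:
        raise ValueError("validator set must not be empty")
    if not 0 <= online_validators <= total_validators:
        raise ValueError("online count must be between 0 and the set size")
    return 2 * online_validators > total_validators


# Hypothetical numbers for illustration only.
print(has_finality_majority(9, 20))   # False: 45% online, the set change cannot finalize
print(has_finality_majority(11, 20))  # True: 55% online
```

Under this model, a validator set change that depends on finality stalls for as long as the check returns False, regardless of how many blocks the online validators keep producing.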
Our first course of action was to try to reach as many of the validators as possible and ask them to bring their nodes back online. During this time, we prepared to move from a smart contract-based validator set to a list. With this solution, the validator list is hardcoded into the chain spec, so the change no longer depends on the smart contract reaching finality.
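A rough sketch of what such a switch could look like in a chain spec is given below. The key names (`multi`, `safeContract`, `list`) follow the OpenEthereum validator set format as we understand it. The addresses, the fork block, and the assumption that the contract-based set starts at block 0 are placeholders; this is not the actual Trustlines chain spec.

```python
import json


def build_validators_section(contract_address, fork_block, validator_addresses):
    """Build a 'validators' section that switches from a contract to a fixed list."""
    if fork_block <= 0:
        raise ValueError("fork block must come after the contract-based set starts")
    if not validator_addresses:
        raise ValueError("the hardcoded validator list must not be empty")
    return {
        "multi": {
            # Assumption: the contract-based set applies from block 0.
            "0": {"safeContract": contract_address},
            # From the fork block on, the set is a fixed list in the spec itself.
            str(fork_block): {"list": list(validator_addresses)},
        }
    }


# Placeholder values for illustration only.
section = build_validators_section(
    contract_address="0x" + "11" * 20,
    fork_block=7_000_000,
    validator_addresses=["0x" + "aa" * 20, "0x" + "bb" * 20],
)
print(json.dumps({"validators": section}, indent=2))
```

The trade-off is that every node must receive the new chain spec file before the fork block, since the list is no longer read from the chain.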
Going forward, we need to ensure that validators are incentivized to remain online until the next validator period switch has been completed. Taking active steps with the community to remove inactive validators should also prevent the issue of not reaching finality.
22nd of March, 2021 14:00 – 22:00 UTC
The new release, containing the chain spec with the hardcoded validator list, gave the validators a grace period of about three hours to update their nodes. Once the grace period was over and the new chain specification was meant to go into effect, we saw the chain split at about 8 PM CET on the 22nd of March. This time the chain forked successfully, but the result was not what we had expected.
In retrospect, the short grace period was a mistake. We operated under the assumption that most of the nodes would have the watchtower component and auto-updates enabled, which was not the case.
Unfortunately, this choice did not work out for the majority of validators, and we have not yet determined the exact cause. One possibility is that some nodes with watchtower and auto-updates enabled check for updates only every 24 hours rather than every 10 minutes, as was the case when TLBC launched. Another is that the Docker pull limit was reached. A few nodes did receive the updated chain spec, but the majority were still on the old chain spec when the three-hour grace period ended.
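To see why the polling interval matters, consider a back-of-the-envelope model. It assumes each node's next update check falls uniformly at random within its polling interval, and it ignores image download time, restarts, and pull limits. It is an illustration, not a measurement of what happened.

```python
def chance_of_check_in_window(poll_interval_seconds, window_seconds):
    """Probability that a node polls for updates at least once within the window.

    Assumes the node's next check is uniformly distributed over one poll interval.
    """
    if poll_interval_seconds <= 0 or window_seconds < 0:
        raise ValueError("interval must be positive and window non-negative")
    return min(1.0, window_seconds / poll_interval_seconds)


GRACE_PERIOD = 3 * 60 * 60  # the three-hour grace period from this post

print(chance_of_check_in_window(24 * 60 * 60, GRACE_PERIOD))  # 0.125
print(chance_of_check_in_window(10 * 60, GRACE_PERIOD))       # 1.0
```

Under these assumptions, only about one in eight nodes polling every 24 hours would even look for the new release within the grace period, while nodes polling every 10 minutes would all check in time.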
We prepared a second chain specification update. As we were already past the fork block and the change had not happened as expected, time was of the essence, so we pushed the update out the same day.
We made another communication push asking validators to apply the second update. Many performed the necessary upgrades but still remained on the wrong chain.
Going forward, when propagating a chain update, we will give validators enough time to update their nodes. We will also make sure that most validators have correctly applied the new chain specification before the expected chain split.
OpenEthereum doesn't seem to self-correct when the chain spec changes; this is something to keep in mind when applying updates.
23rd of March, 2021 08:00 – 16:00 UTC
It turned out that if a node had been validating on the wrong chain and its operator then applied the correct chain spec file, the node did not automatically correct the blocks it had "erroneously" validated. OpenEthereum appeared to always treat the longest chain as the correct one, ignoring the chain spec file.
We had hoped that once the new chain overtook the old chain, the nodes on the old chain would switch to the correct one, but that did not happen. Our current estimate is that those validators are, for some reason, running an incorrect chain spec, or that OpenEthereum does not support this kind of switch.
It turned out that a node operator had to erase the OpenEthereum database and then let the node resync the chain. By doing this, the nodes ended up on the correct chain, despite being thousands of blocks behind the other chain.
Once validators started performing the erases and resyncs, the new, shorter chain slowly began to produce blocks at a faster pace than the old chain.
Going forward, we will let validators know that they can erase their database and start a resync to correct their course if they miss a fork.
As no smart contract is involved in the current setup, another fork will be needed to switch back to a smart contract-based validator set. The smart contract is required for the full operability of the Trustlines Blockchain. It will enable the validators' slashing conditions and allow people to convert their TLN to TLC via the bridge.
Therefore, the Trustlines Foundation will propose another fork to remove the inactive validators from the validator set.
See the Proposal to remove inactive validators from the third validator period forum post for more information and developments regarding this proposed upcoming fork.