
Sina Mahmoodi


[Proposal] Tracing bottleneck analysis


It was reported that tracing a block often takes a long time and eventually times out, which causes issues for indexers such as Elliptic. Here we investigate the bottleneck and propose a solution.

Cause of issue

To trace a block or transaction, the pre-state needs to be prepared and the transaction re-executed in the EVM, with extra processing on top for the tracing itself. A few queries easily demonstrate that the issue lies in the re-execution. Let's observe:

❯ time curl -X POST -H "Content-Type: application/json" --data '{"method":"debug_traceTransaction","params":["0x52316c4e59d7db74801c8370044eff4a7eb58b79daec81bb9c0e147762cb44ba", { "tracer": "noopTracer", "timeout": "600s" }],"id":1,"jsonrpc":"2.0"}' https://rpc.xinfin.network
{"jsonrpc":"2.0","id":1,"result":{}}
curl -X POST -H "Content-Type: application/json" --data    0.02s user 0.01s system 4% cpu 0.474 total

This re-executes a simple transfer in the middle of the block. Time taken is 0.5s.

The following transaction executes a contract and uses 40k gas:

❯ time curl -X POST -H "Content-Type: application/json" --data '{"method":"debug_traceTransaction","params":["0x7204e1f9339ee3fc9ec7bf75bf077470a2326e6e314f3b5e8222b038d0314db4", { "tracer": "noopTracer", "timeout": "600s" }],"id":1,"jsonrpc":"2.0"}' https://rpc.xinfin.network
{"jsonrpc":"2.0","id":1,"result":{}}
curl -X POST -H "Content-Type: application/json" --data    0.02s user 0.01s system 5% cpu 0.463 total

Again around 0.5s.

Now here's a transaction that burns more than 6 million gas. First we use the noopTracer, which returns after roughly 9s. Then the callTracer, which actually processes the data, fails with a timeout.

❯ time curl -X POST -H "Content-Type: application/json" --data '{"method":"debug_traceTransaction","params":["0x4b5f5dc7c86fdc275fe5466c58ffca3ae6a76eded952a73c6d88b836183fa4a4", { "tracer": "noopTracer", "timeout": "600s" }],"id":1,"jsonrpc":"2.0"}' https://rpc.xinfin.network
{"jsonrpc":"2.0","id":1,"result":{}}
curl -X POST -H "Content-Type: application/json" --data    0.02s user 0.01s system 0% cpu 9.397 total

~ 9s
❯ time curl -X POST -H "Content-Type: application/json" --data '{"method":"debug_traceTransaction","params":["0x4b5f5dc7c86fdc275fe5466c58ffca3ae6a76eded952a73c6d88b836183fa4a4", { "tracer": "callTracer", "timeout": "600s" }],"id":1,"jsonrpc":"2.0"}' https://rpc.xinfin.network
{"error":"http server timeout"}curl -X POST -H "Content-Type: application/json" --data    0.02s user 0.01s system 0% cpu 10.387 total

This tells us the tracing bottleneck is (not surprisingly) in execution. To understand that better, keep in mind the difference between the noopTracer and callTracer used above. Both are JavaScript tracers that the client invokes on various events (most notably on every opcode execution). The difference is that noopTracer does no processing whatsoever, whereas callTracer runs JS code to infer the structure of a transaction (its calls and subcalls) from those opcodes. Executing JS code from Go is expensive, hence the long runtime.

Proposal

As mentioned above, the EVM calls into the JS step() function of the callTracer for every opcode executed. With 6 million gas that is potentially a lot of opcodes: since many opcodes cost only a few gas, such a transaction can trigger on the order of a million step() invocations. This issue has been largely mitigated in upstream go-ethereum:

Improving the JS callTracer

Better JS engine

Upstream go-ethereum has migrated from duktape to Goja as its JS interpreter. The key difference is that duktape is a C library bound via cgo, so every time JS code is executed (once per opcode) the runtime has to cross the Go-C barrier. Goja, on the other hand, is written in pure Go and performs faster. This is low-hanging fruit and will result in a 2x improvement. The code for the Goja tracer.
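
As a rough illustration of what driving a JS tracer from Go looks like with Goja, here is a minimal, self-contained sketch. It is not the actual tracer code: the tracer object and the opcode loop are simplified stand-ins, though the Goja calls themselves (RunString, AssertFunction, ToValue) are the real API. The point is that every step() call stays inside the Go runtime, with no cgo crossing.

package main

import (
	"fmt"

	"github.com/dop251/goja"
)

func main() {
	vm := goja.New()

	// A toy tracer in the spirit of noopTracer: step() does next to nothing,
	// result() returns a tiny summary object.
	_, err := vm.RunString(`
		var tracer = {
			count: 0,
			step: function(log) { this.count++; },
			result: function() { return { steps: this.count }; }
		};
	`)
	if err != nil {
		panic(err)
	}

	tracer := vm.Get("tracer").ToObject(vm)
	step, _ := goja.AssertFunction(tracer.Get("step"))
	result, _ := goja.AssertFunction(tracer.Get("result"))

	// Stand-in for the EVM interpreter loop: one step() call per "opcode".
	for pc := 0; pc < 1_000_000; pc++ {
		if _, err := step(tracer, vm.ToValue(map[string]interface{}{"pc": pc})); err != nil {
			panic(err)
		}
	}

	res, err := result(tracer)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Export()) // e.g. map[steps:1000000]
}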

OnEnter/OnExit events

The callTracer gathers information about all the EVM calls within a transaction. In the current version of XDC, this is done by meticulously checking every opcode: if it is a CALL/DELEGATECALL/STATICCALL or CREATE/CREATE2, the beginning of a call is marked; on a RETURN or REVERT, the end of that call is noted. However, this means the step() function is also invoked for every unrelated opcode such as PUSH*, LOG*, DUP*, etc.

In upstream go-ethereum, new event hooks have been added to the EVM that capture calls directly. After adopting them, the JS callTracer can be updated to use that information instead of having to reconstruct it from the opcodes. This will result in an order of magnitude improvement.
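
To make the difference concrete, here is a hedged sketch of what call-level hooks amount to. The interface below is a simplified stand-in modeled loosely on upstream go-ethereum's CaptureEnter/CaptureExit hooks (the real signatures use EVM types and differ in detail): a tracer built on it is invoked twice per call frame, no matter how many opcodes each frame executes.

package main

import "fmt"

// CallHooks is a simplified stand-in for upstream go-ethereum's
// enter/exit tracer hooks (assumed shape, not the real API).
type CallHooks interface {
	OnEnter(callType, from, to string, input []byte, gas uint64)
	OnExit(output []byte, gasUsed uint64, err error)
}

// frameCounter only counts call frames; it never sees individual opcodes.
type frameCounter struct{ frames int }

func (c *frameCounter) OnEnter(callType, from, to string, input []byte, gas uint64) {
	c.frames++
}

func (c *frameCounter) OnExit(output []byte, gasUsed uint64, err error) {}

func main() {
	counter := &frameCounter{}
	var hooks CallHooks = counter

	// Stand-in for the EVM: an outer call that makes two subcalls. The tracer
	// is notified 3 times on enter and 3 times on exit, regardless of how many
	// opcodes those frames execute.
	hooks.OnEnter("CALL", "0xsender", "0xcontract", nil, 6_000_000)
	hooks.OnEnter("STATICCALL", "0xcontract", "0xtoken", nil, 100_000)
	hooks.OnExit(nil, 2_600, nil)
	hooks.OnEnter("DELEGATECALL", "0xcontract", "0xlib", nil, 200_000)
	hooks.OnExit(nil, 40_000, nil)
	hooks.OnExit(nil, 450_000, nil)

	fmt.Println("call frames traced:", counter.frames) // 3
}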

Native tracers FTW

Interpreting JS code from Go is very expensive, especially since the XDC client still uses duktape as its JS engine and therefore also crosses the Go-C barrier. Upstream go-ethereum has migrated to a native Go callTracer, which is possibly 1-2 orders of magnitude faster. This is definitely the way to go long-term. Here's the code. It would, however, require a larger refactor of the tracing infrastructure.
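
For a sense of what a native tracer buys, here is a self-contained sketch of a Go call tracer that builds the nested call tree directly from enter/exit events. The hook signatures and the callFrame shape are simplified assumptions (the upstream go-ethereum tracer handles gas, value, input and output in more detail), but the structure is the same: plain Go structs and a stack, no JS interpretation at all.

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// callFrame roughly mirrors the JSON shape the JS callTracer produces.
type callFrame struct {
	Type    string      `json:"type"`
	From    string      `json:"from"`
	To      string      `json:"to"`
	Gas     uint64      `json:"gas"`
	GasUsed uint64      `json:"gasUsed"`
	Error   string      `json:"error,omitempty"`
	Calls   []callFrame `json:"calls,omitempty"`
}

// nativeCallTracer keeps a stack of open frames; there is no per-opcode work.
type nativeCallTracer struct {
	stack []*callFrame
	root  *callFrame
}

func (t *nativeCallTracer) OnEnter(typ, from, to string, gas uint64) {
	t.stack = append(t.stack, &callFrame{Type: typ, From: from, To: to, Gas: gas})
}

func (t *nativeCallTracer) OnExit(gasUsed uint64, err error) {
	n := len(t.stack)
	frame := t.stack[n-1]
	t.stack = t.stack[:n-1]
	frame.GasUsed = gasUsed
	if err != nil {
		frame.Error = err.Error()
	}
	if n == 1 {
		t.root = frame // outermost call finished
		return
	}
	// Attach the finished frame to its parent.
	parent := t.stack[n-2]
	parent.Calls = append(parent.Calls, *frame)
}

func main() {
	t := &nativeCallTracer{}

	// Stand-in for a traced transaction: an outer CALL with one reverting subcall.
	t.OnEnter("CALL", "0xsender", "0xcontract", 6_000_000)
	t.OnEnter("DELEGATECALL", "0xcontract", "0xlib", 200_000)
	t.OnExit(40_000, errors.New("execution reverted"))
	t.OnExit(450_000, nil)

	out, _ := json.MarshalIndent(t.root, "", "  ")
	fmt.Println(string(out))
}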
