Developers Forum for XinFin XDC Network

Lokesh

[WIP] HeadTracker is not capturing some blocks in Apothem / Mainnet

We have recently been seeing this issue since we set up Plugin2.0.

This issue did not occur before; it started only recently. To investigate, we ran Plugin2.0 against both "Polygon" and "XDC Mainnet". Logs for both runs are captured below.

What is the actual issue?

  • In Plugin, when we configure more than 4 jobs, execution fails with the following error:

"2023-09-25T12:48:34.707+0530 [ERROR] Error in new head subscription, unsubscribed headtracker/head_listener.go:67 err=websocket: close 1000 (normal) evmChainID=50 logger=EVM.HeadTracker.HeadListener stacktrace=github.com/GoPlugin/pluginV2/core/chains/evm/headtracker.(*headListener).ListenForNewHeads"

When we execute the same against "Polygon" chain, it runs without any issues.

Log:
[Log screenshot]

The log shows that the HeadTracker misses blocks, which leads to "RPC node is not reachable" errors; the connection keeps going on and off and is unstable.

Head & Block logs for Polygon: https://notepad.pw/g50JXvp3RIn1PNfe2Tea

Head & Block logs for XDC: https://notepad.pw/o7VtAjvuS4dFtN0YdLyd

You can see the difference in the XDC log, where blocks are missing.

Note: We have tested the available RPCs from "https://chainlist.org/" for Apothem & Mainnet, and the results are the same.

Let us know if you need more information to debug.

Kindly review and let us know the possible solutions.

Discussion (15)

wjrjerome
Blockchain Minions

Hi,

I believe I can provide some insights into the issue.

The problem stems from the configuration of the write buffer size inside the ethclient. In the new version of ethclient (available on GitHub at github.com/ethereum/go-ethereum/ethclient), the write buffer size is hardcoded to 1024 bytes. If the data being written exceeds this size, it will not function properly.

Conversely, the xdcpos version of ethclient (located at github.com/XinFinOrg/XDPoSChain/ethclient) uses an older version of ethclient that lacks this limitation on the write buffer size. Consequently, it works as expected.

You can view the relevant code at this link: github.com/ethereum/go-ethereum/bl...

It's important to note that this issue isn't the fault of the ethclient from go-ethereum. The root cause lies in our XDC node's handling of the WebSocket, which is somewhat outdated.

The latest ethclient is attempting to divide large payloads into smaller chunks, with each chunk having a maximum size of 1024 bytes. However, our XDC node lacks a mechanism to handle these chunked data. As a result, when the XDC node receives the first 1024 bytes, it attempts to decode them immediately, even though it's supposed to wait until all the chunks arrive. Because the data is chunked, decoding it prematurely will fail. This explains why you are encountering failed requests when the payload is larger.

To address this issue fundamentally, we are actively working on upgrading the entire WebSocket and its related RPC module in XDPoS.

In the meantime, as a temporary solution, I recommend limiting the batch size to 4 jobs when making requests.

logeswaran
Lokesh Author

Thank you so much for the detailed input! This really helps.

11ppm
11ppm

Thank you very much for your comment. My question is: when XDPoS 2.0 goes live on mainnet, will this issue be resolved? This is very important for decentralized oracles on the XDC Network, because a fix is necessary to run enough jobs. The same applies to any project that relies on oracles on the XDC Network.

wjrjerome
Blockchain Minions

We have merged the PR into the master branch. It is currently undergoing testing on our testnet.
PR: github.com/XinFinOrg/XDPoSChain/pu...

It will be fixed and working before V2 consensus is even enabled.

11ppm
11ppm

Thank you for your response, and I also appreciate the daily development efforts of the XDC team. I'm really looking forward to the new V2 consensus.

wanwiset25
Wanwiset Peerapatanapokin

Hi @logeswaran

Sorry for the delay. After spending way too much time checking the server side, I found an incompatibility on the client side. Would you mind trying the ethclient from the XDC repo?

Instead of:

    import (
        "github.com/ethereum/go-ethereum"
        "github.com/ethereum/go-ethereum/common"
        "github.com/ethereum/go-ethereum/core/types"
        "github.com/ethereum/go-ethereum/ethclient"
    )

please use:

    import (
        ethereum "github.com/XinFinOrg/XDPoSChain"
        "github.com/XinFinOrg/XDPoSChain/common"
        "github.com/XinFinOrg/XDPoSChain/core/types"
        "github.com/XinFinOrg/XDPoSChain/ethclient"
    )

logeswaran
Lokesh Author

Thank you @wanwiset25, it works fine (tested with a sample repo). However, nearly 1,300 files in Plugin Core would need to change for this update.

Is there any alternative fix?

wanwiset25
Wanwiset Peerapatanapokin

I'm still looking into the root cause. It is strange, because I found it to be a WebSocket message size issue on the server side.

logeswaran
Lokesh Author

Sure, please let us know when you find the root cause.

Thank you for your support here

logeswaran
Lokesh Author

Sure, we will test this and get back to you shortly.

wanwiset25
Wanwiset Peerapatanapokin

Hi @logeswaran

Thank you for reporting this issue. Please allow us some time to check on this.

wanwiset25
Wanwiset Peerapatanapokin

Hi @logeswaran

Would you mind sharing some additional info?

As I understand it, you are facing multiple issues:

  1. Execution failed when running more than 4 jobs
  2. Block skipping on "Headtracker"

These are my questions for further debugging:

  1. Is the problem still occurring, and in which time frame (in case it's a network issue)?
  2. What is the WebSocket URL used in problem 1, and which calls are being made?
  3. What is the RPC URL used in notepad.pw/o7VtAjvuS4dFtN0YdLyd, and which calls are used to get this info?
logeswaran
Lokesh Author

Hi @wanwiset25,

Thanks for your reply. Here are our responses.

1) Yes, the problem is still occurring. We tried all the RPC/WSS endpoints listed on chainlist.
2) chainlist.org/?search=xdc (we tried both Apothem and Mainnet).

You can try this repository to replicate the issue:
github.com/GoPlugin/Xdc_WebSocket_...

ruslan_wing
ruslan wing

Hello Lokesh,

Can you please run a test against the archive node RPC?

Mainnet RPC

RPC: arpc.xinfin.network
WS: wss://aws.xinfin.network

Apothem RPC

RPC: arpc.apothem.network
WS: wss://aws.apothem.network

logeswaran
Lokesh Author

Hi Ruslan, thanks for the input.

We tested with the Apothem WebSocket and got the same issue; please find the screenshots below.

with 5 jobs -> xdc.dev/uploads/articles/php9x2t5j...

with 4 jobs -> xdc.dev/uploads/articles/exxfa0geq...

Mainnet WebSocket: xdc.dev/uploads/articles/a7r5imeo7...
4 jobs -> no error
5 jobs -> errored

Kindly check.