Skip the joins with Semantic ABI
An open-source library for analytics-ready decoding of EVM transactions.
One of the hardest parts of understanding on-chain Web3 usage is just getting the data into the right shape for analysis, be it through SQL in Dune or through clicks in Evidenly. Even after decoding logs and traces from the hex in transactions using their ABIs, you’re left with semi-structured JSON of arbitrary nesting due to struct and array fields.
tl;dr Annotate your ABIs and skip all the SQL and slow runtime joins with this.
Why is Everything a Table?
To handle this semi-structured JSON, most Web3 data tools try to normalize everything to a table. A decoded project like Lens Protocol will result in dozens of tables, one for each event and function.
Even with these normalized to the lowest common denominator of a single table, more complex (but common) fields of struct and array types are still left as hard-to-use “blobs”. This forces things like orders (a struct with nested arrays of more structs) in the fulfillAvailableOrders Seaport call to be jammed into a single column as an array of strings that needs to be parsed out manually later in SQL.
After all this work to normalize, these single tables are not actually useful on their own. The next step is to denormalize and bring them back together through joins and unions so that the final “curated” table is ready for analytics and has all the fields needed to understand the behaviors and users of the project. This is usually done by joining on the hash of the transaction that produced all these tables, a pretty expensive, high-cardinality operation.
Kinda went in a circle there...
We broke down a nicely structured transaction into dozens of tables just to painstakingly stitch it back together by transaction to make sense of the transaction.
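To make that round trip concrete, here is a minimal sketch of the stitching step in plain Python. The table names, field names, and values are hypothetical and only stand in for the per-event tables a normalizing decoder would produce; it is not how any particular tool implements its joins.

```python
from collections import defaultdict

# Hypothetical per-event tables produced by a normalizing decoder.
order_fulfilled = [
    {"tx_hash": "0xaa", "offerer": "0x01", "recipient": "0x09"},
    {"tx_hash": "0xbb", "offerer": "0x02", "recipient": "0x09"},
]
transfers = [
    {"tx_hash": "0xaa", "from_address": "0x01", "value": 5},
    {"tx_hash": "0xbb", "from_address": "0x02", "value": 7},
    {"tx_hash": "0xcc", "from_address": "0x03", "value": 1},
]


def join_on_tx_hash(left, right):
    """Hash join: index one table by tx_hash, then probe it with the other."""
    index = defaultdict(list)
    for row in right:
        index[row["tx_hash"]].append(row)
    joined = []
    for row in left:
        # Every row on the left is paired with every right row from the same transaction.
        for match in index.get(row["tx_hash"], []):
            joined.append({**row, **match})
    return joined


curated = join_on_tx_hash(order_fulfilled, transfers)
```

Since nearly every transaction hash is distinct, the index grows with the size of the table, which is why this kind of join gets expensive at chain scale.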
Skip the Tables
What we really want is to skip the intermediate tables and get to the final dataset directly from the hierarchical logs and traces of a transaction. A smart contract ABI is the schema of this structure, so let’s start there and just give it a bit more meaning.
The Semantic ABI does just this: it allows us to annotate an ABI with additional meaning (semantics) so that we can get to an analytics-ready dataset in a single shot.
Explosive Unions
We often need to compare or aggregate data from different events or functions, requiring us to union and normalize multiple tables. We can do this easily with @isPrimary annotations on multiple items in a Semantic ABI, but let’s make it a little harder with one function containing an array.
Let’s create a single dataset containing orders from both fulfillAvailableAdvancedOrders and fulfillBasicOrder_efficient_6GL6yc traces for all Seaport transactions. The first step is to @explode advancedOrders in fulfillAvailableAdvancedOrders, creating a single row for each item in the array. Explode automatically applies values hierarchically, so fields like fulfilled will be copied to every exploded child row.
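Conceptually, exploding behaves like the sketch below. This is an illustration under assumptions, not the library's implementation: the `explode` helper, the flat column naming, and the sample values are all hypothetical.

```python
def explode(row, path):
    """Return one row per item in row[path], copying the parent's other fields."""
    # Parent-level fields (everything except the array) are shared by all children.
    parent = {key: value for key, value in row.items() if key != path}
    # Each array item becomes its own row; item fields are merged in flat (an assumption).
    return [{**parent, **item} for item in row[path]]


# Hypothetical decoded call with two orders in the exploded array.
decoded_call = {
    "tx_hash": "0xaa",
    "fulfilled": [True, True],
    "advancedOrders": [
        {"offerer": "0x01", "orderType": 0},
        {"offerer": "0x02", "orderType": 2},
    ],
}

rows = explode(decoded_call, "advancedOrders")
```

Each resulting row carries its own offerer and orderType alongside the parent's tx_hash and fulfilled values.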
All we need to do now is tag both as @isPrimary, which will align basic and exploded fields by name, with manual alignments possible with name in @transform, such as for parameters_orderType. The resulting table will contain all orders from both functions, ready for analytics.
The final Semantic ABI will look something like:
{
  "metadata": {
    "chains": ["ethereum"]
  },
  "abi": [
    {
      "name": "fulfillAvailableAdvancedOrders",
      "@isPrimary": true,
      "@explode": {
        "paths": ["advancedOrders"]
      },
      "inputs": [
        {
          "components": [
            {
              "components": [
                {
                  "internalType": "address",
                  "name": "offerer",
                  "type": "address"
                },
                {
                  "internalType": "enum OrderType",
                  "name": "orderType",
                  "type": "uint8",
                  "@transform": {
                    "name": "parameters_orderType"
                  }
                },
                ...
              ],
              "internalType": "struct OrderParameters",
              "name": "parameters",
              "type": "tuple"
            },
            ...
          ],
          "internalType": "struct AdvancedOrder[]",
          "name": "advancedOrders",
          "type": "tuple[]"
        },
        ...
      ],
      "outputs": [
        {
          "internalType": "bool[]",
          "name": "fulfilled",
          "type": "bool[]"
        },
        ...
      ],
      "stateMutability": "payable",
      "type": "function"
    },
    {
      "name": "fulfillBasicOrder_efficient_6GL6yc",
      "@isPrimary": true,
      "inputs": [
        {
          "components": [
            {
              "internalType": "address payable",
              "name": "offerer",
              "type": "address"
            },
            {
              "internalType": "enum BasicOrderType",
              "name": "basicOrderType",
              "type": "uint8",
              "@transform": {
                "name": "parameters_orderType"
              }
            },
            ...
          ],
          "internalType": "struct BasicOrderParameters",
          "name": "parameters",
          "type": "tuple"
        }
      ],
      "outputs": [
        {
          "internalType": "bool[]",
          "name": "fulfilled",
          "type": "bool[]"
        }
      ],
      "stateMutability": "payable",
      "type": "function"
    }
  ]
}
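The effect of tagging both functions as primary and aligning the order type column can be pictured with the following sketch. The `align` helper, the rename maps, and the sample rows are hypothetical stand-ins for what the annotations express; the library's actual mechanics may differ.

```python
def align(row, renames):
    """Rename columns so rows from different functions share one schema."""
    return {renames.get(name, name): value for name, value in row.items()}


# Hypothetical rows: exploded advanced orders and a single basic order.
advanced_rows = [
    {"offerer": "0x01", "orderType": 0},
    {"offerer": "0x02", "orderType": 2},
]
basic_rows = [
    {"offerer": "0x03", "basicOrderType": 1},
]

# Fields with the same name (offerer) line up on their own; the differently
# named order type fields are mapped to one shared column name.
unioned = (
    [align(row, {"orderType": "parameters_orderType"}) for row in advanced_rows]
    + [align(row, {"basicOrderType": "parameters_orderType"}) for row in basic_rows]
)
```

All three rows end up with the same two columns, so they can be aggregated together without any further reshaping.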
Dependent Joins
Let’s do some more analysis, on Seaport conduits this time. To do this we will need to join the fulfillAvailableAdvancedOrders trace, which contains the conduit field, with the OrderFulfilled log to validate which orders were actually fulfilled. Since we’re mildly paranoid, let’s also verify this with an actual transfer log.
Instead of complex joins, we simply use the @matches annotation, which will find events, traces, or transfers within the same transaction (much more performant than joins in SQL) to bring together all the fields required for analysis into a single dataset. No further joins required.
Matches are applied sequentially, so we can first match the single fulfill trace with multiple OrderFulfilled events and then match each resulting row with a single expected transfer for verification. We can be explicit with verifications by asserting on expected cardinality with many vs. onlyOne.
Here’s what that Semantic ABI will look like.
{
  "metadata": {
    "chains": ["ethereum"]
  },
  "abi": [
    {
      "name": "fulfillAvailableAdvancedOrders",
      "@isPrimary": true,
      "@matches": [
        {
          "type": "event",
          "signature": "OrderFulfilled(...)[])",
          "prefix": "fulfill",
          "assert": "many",
          "predicates": [
            {
              "type": "equal",
              "source": "recipient",
              "matched": "recipient"
            }
          ]
        },
        {
          "type": "transfer",
          "prefix": "transfer",
          "assert": "onlyOne",
          "predicates": [
            {
              "type": "equal",
              "source": "fulfill_offerer",
              "matched": "fromAddress"
            }
          ]
        }
      ],
      "inputs": [
        ...
        {
          "internalType": "bytes32",
          "name": "fulfillerConduitKey",
          "type": "bytes32"
        },
        ...
      ],
      "outputs": [...],
      "stateMutability": "payable",
      "type": "function"
    },
    {
      "name": "OrderFulfilled",
      "anonymous": false,
      "inputs": [...],
      "type": "event"
    }
  ]
}
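As a rough mental model, sequential matching within a transaction could work like the sketch below. It is not the library's code: the `match` helper and the sample data are hypothetical, and reading "many" as "at least one match" is an assumption. The prefixing mirrors how fulfill_offerer shows up as a source field in the second predicate.

```python
def match(rows, candidates, predicates, prefix, assert_mode):
    """Pair each row with same-transaction candidates that satisfy all equality predicates."""
    out = []
    for row in rows:
        hits = [
            cand for cand in candidates
            if cand["tx_hash"] == row["tx_hash"]
            and all(row[source] == cand[matched] for source, matched in predicates)
        ]
        # Cardinality checks; "many" is assumed here to mean "at least one".
        if assert_mode == "onlyOne" and len(hits) != 1:
            raise ValueError(f"expected exactly one match, found {len(hits)}")
        if assert_mode == "many" and not hits:
            raise ValueError("expected at least one match, found none")
        for hit in hits:
            # Matched fields are added under the prefix so later steps can refer to them.
            prefixed = {f"{prefix}_{k}": v for k, v in hit.items() if k != "tx_hash"}
            out.append({**row, **prefixed})
    return out


# Hypothetical decoded items from two transactions.
trace_rows = [{"tx_hash": "0xaa", "recipient": "0x09", "fulfillerConduitKey": "0x00"}]
events = [
    {"tx_hash": "0xaa", "recipient": "0x09", "offerer": "0x01"},
    {"tx_hash": "0xaa", "recipient": "0x09", "offerer": "0x02"},
    {"tx_hash": "0xbb", "recipient": "0x09", "offerer": "0x03"},
]
transfers = [
    {"tx_hash": "0xaa", "fromAddress": "0x01", "value": 5},
    {"tx_hash": "0xaa", "fromAddress": "0x02", "value": 7},
]

# Step 1: one trace row fans out to its OrderFulfilled events.
fulfilled = match(trace_rows, events, [("recipient", "recipient")], "fulfill", "many")
# Step 2: each resulting row must pair with exactly one transfer.
verified = match(fulfilled, transfers, [("fulfill_offerer", "fromAddress")], "transfer", "onlyOne")
```

Because candidates are only ever compared within one transaction, there is no global join on transaction hash; the second step simply builds on the columns the first step added.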
The Rest
There are a bunch more features, including adding derived fields through expressions, and the library is implemented in both Python and TypeScript. Take a look and contribute here!
