I’m Tim Baker, CTO at Hoops Finance. We build market infrastructure on Stellar with one principle: correctness first, then speed at scale. These devlogs walk through the actual implementation details—schemas, indexers, caching, and replay—so you can see how we make data precise and durable before we make it fast.

The Challenge

This update is about turning a finicky, version-sensitive parsing stack into a boring, reliable machine. Protocol 23 landed with Soroban v4 meta; contracts keep upgrading in the wild; and we refuse to re-ingest the same pages ever again. So we went deep on three fronts:

  1. Support Soroban v4 TransactionMeta (Protocol 23) while keeping v3 behavior intact.

  2. Make ingestion resumable so we can pick up exactly where we left off.

  3. Resolve historical function names/args even when contracts upgrade and rename things out from under us.

Some Notes about our Data

One important context point up front: Stellar.Expert is only one input. We run a captive core, a local Horizon API, and a local Soroban RPC, all backed by full archivers. Classic ledger state and all historical market data live in Postgres (for transactional correctness and SQL analytics). We persist “human readable” objects and raw transaction XDRs in a MongoDB replicaset (for fast API responses and horizontal scale). We also ingest Stellar ETL to build a Galaxie dataset locally—mirroring what’s available on Google Cloud—so we can do large-scale, columnar analysis without rate-limiting headaches. Neo4j powers our on-chain relationship graph. StellarExpert is our pragmatic backfill and gap-filler when we want batched historical pages with clean paging links; the canonical stream and replay capability come from our own archivers. We validate parts of our ingestion against third-party sources—including Stellar.Expert indexed views—precisely because they make it trivial to filter by account, asset, timestamp, and operation types. That validation step is also why building page-resume into our StellarExpert finder mattered so much: we can rerun comparisons without re-pulling the universe.

Hoopy, Hoops Finance’s mascot, feeling the power of Protocol 23 (Whisk)

Storage and compute: why we use Postgres, MongoDB, and Neo4j together

Each database has a job:

  • Postgres holds classic ledger state and historical market data. It’s our “sober accountant”: correct, durable, and perfect for SQL analytics and ledger-pinned snapshots.

  • MongoDB (replicaset) is our fast edge cache. It stores raw XDRs and humanized docs we serve to clients, plus the most common API aggregates: period stats, market summaries, hot paths, etc. It backs the dashboard for low-latency reads and horizontal scale.

  • Neo4j is our relationship brain. Contracts ↔ accounts ↔ assets ↔ pools, with edges for calls, swaps, approvals, liquidity moves. When the question is “how is this connected and what’s the real path capacity to USDC?”, a graph database is the right hammer.

We also maintain a local Galaxie dataset from Stellar ETL so we can run large, columnar jobs (think: fleet-level analytics, backfills, and consistency checks) without timeouts or per-request rate limits. The point isn’t to trust any single source; it’s to triangulate—our own archivers, ETL, and third-party indexed views (like Stellar.Expert) cross-validate each other so we can catch gaps early and fix them fast.

Why we validate against third-party indexes

Even with captive core, Horizon, RPC, ETL, and our own archives, external validation is non-optional. StellarExpert does a great job at discoverability—finding transactions by account, asset, timestamp, and filters—which makes it ideal for cross-checks and forensic queries. That’s why we made the page resume in our StellarExpert finder a first-class feature: when we compare windows or re-run point-in-time checks, we can land on the same page again, deterministically. It’s boring. That’s the point.

Never re-ingest a page again: resumable ingestion + raw page storage

Two practical changes unlocked stability and speed.

1) Store raw transactions before parsing

We created a Mongo collection transactionDetails and write one flat document per transaction before any parsing. Each doc includes:

  • _id (tx hash), ledger, ts (unix secs), protocol, paging_token, feebump?, sponsor?,

  • sourceaccount, optional accounts,

  • contractIds: string[] (we attach the lookup contractId so joins by contract are trivial),

  • txbodyxdr, txmetaxdr, txresultxdr (base64 strings as returned),

  • storedat: Date.

We indexed the obvious fields (_id, ts, ledger, contractIds+ts, paging_token). We added readers to query by hash/contract/time window and a batch writer for fast upserts. If a run dies mid-page, we still have the entire source as it was fetched, which makes replays deterministic.
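For concreteness, here is a minimal sketch of that collection using the official mongodb Node.js driver. The database name and helper names are placeholders; the field names and types follow the list above, and our actual writer may differ in detail.

import { MongoClient } from "mongodb";

// Shape of one flat document per transaction (field names from the list above).
interface TransactionDetailsDoc {
  _id: string;            // tx hash
  ledger: number;
  ts: number;             // unix seconds
  protocol: number;
  paging_token: string;
  feebump?: boolean;
  sponsor?: string;
  sourceaccount: string;
  accounts?: string[];
  contractIds: string[];  // lookup contractId attached so joins by contract are trivial
  txbodyxdr: string;      // base64 strings, exactly as returned
  txmetaxdr: string;
  txresultxdr: string;
  storedat: Date;
}

// "indexer" is a placeholder database name.
const client = new MongoClient(process.env.MONGO_URL ?? "mongodb://localhost:27017");
const txDetails = client.db("indexer").collection<TransactionDetailsDoc>("transactionDetails");

// The obvious indexes; _id is indexed by MongoDB automatically.
async function ensureIndexes() {
  await txDetails.createIndex({ ts: 1 });
  await txDetails.createIndex({ ledger: 1 });
  await txDetails.createIndex({ contractIds: 1, ts: 1 });
  await txDetails.createIndex({ paging_token: 1 });
}

// Batch writer: idempotent upserts keyed by tx hash, so a replayed page is a no-op.
async function upsertTransactionDetails(docs: TransactionDetailsDoc[]) {
  if (docs.length === 0) return;
  await txDetails.bulkWrite(
    docs.map(({ _id, ...rest }) => ({
      replaceOne: { filter: { _id }, replacement: rest, upsert: true },
    })),
    { ordered: false }
  );
}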

2) Resume from exactly where we left off

Our fetcher now accepts a startUrl and calls us back with the response’s _links.self. We store that URL under the protocol summary document (paging[<pairId>].lastSelf). On restart we load lastSelf and continue from the exact same page. No rewind. No double processing. This is especially useful for validation against StellarExpert: we can page to the same window repeatedly to compare our captive-core/horizon/RPC view with their indexed snapshot, without re-fetching the world.
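Here is a sketch of the resume loop. The startUrl/onPage interface and the summary document's _id are illustrative stand-ins for our actual fetcher, not its exact signature.

import { Collection } from "mongodb";

// Shape of the protocol summary document holding paging[<pairId>].lastSelf.
interface ProtocolSummaryDoc {
  _id: string;                                    // "protocol-summary" is a placeholder id
  paging?: Record<string, { lastSelf?: string }>;
}

async function runResumableFetch(
  summaries: Collection<ProtocolSummaryDoc>,
  pairId: string,
  fetchPages: (opts: { startUrl?: string; onPage: (selfUrl: string) => Promise<void> }) => Promise<void>
) {
  // 1) Load the last _links.self we committed for this pair, if any.
  const summary = await summaries.findOne({ _id: "protocol-summary" });
  const startUrl = summary?.paging?.[pairId]?.lastSelf;

  // 2) Stream pages starting from exactly that URL. After each page's raw
  //    transactions are stored, persist the page's own _links.self so a
  //    crash or restart resumes on the same page, not before it.
  await fetchPages({
    startUrl,
    onPage: async (selfUrl) => {
      await summaries.updateOne(
        { _id: "protocol-summary" },
        { $set: { [`paging.${pairId}.lastSelf`]: selfUrl } },
        { upsert: true }
      );
    },
  });
}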

Handling contract upgrades: historical specs for historical calls

Contracts upgrade. Functions get renamed. Args shift. If you fetch the current WASM to build a spec for an old transaction, you’ll eventually hit “no such entry.” We treated the code itself as a dataset and solved this properly.

A historical code store keyed by WASM hash

We added a ContractCode collection keyed by WASM hash. For each entry we persist:

  • _id (wasm hash), operation, ts, paging_token,

  • bytecode: Uint8Array (exactly as the RPC returns it; no base64 re-encoding),

  • contractIds: string[] (a set—code can be shared across contracts),

  • version: number (deterministic per contract, assigned 1..N by ts ascending),

  • storedat: Date.

We acquire the version list from StellarExpert’s contract versions endpoint (handy for paging) and fetch the raw bytes from our local Soroban RPC via getContractWasmByHash. Missing hashes are downloaded and upserted; we then recompute per-contract versions. Important detail: we store bytes exactly as returned. No base64 gymnastics—those transformations are easy to get wrong and will break spec decoding later.
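Roughly, that sync step looks like the sketch below. The VersionEntry shape is an illustrative stand-in for what the StellarExpert versions endpoint returns (already paged and passed in), and we assume a recent @stellar/stellar-sdk RPC client.

import { rpc } from "@stellar/stellar-sdk";
import { Collection } from "mongodb";

// One document per WASM hash, matching the field list above.
interface ContractCodeDoc {
  _id: string;            // wasm hash
  operation: string;
  ts: number;
  paging_token: string;
  bytecode: Uint8Array;   // exactly as the RPC returns it
  contractIds: string[];  // a set: code can be shared across contracts
  version?: number;       // assigned 1..N per contract, by ts ascending
  storedat: Date;
}

// Illustrative shape for one entry from the StellarExpert versions endpoint.
interface VersionEntry { wasm: string; operation: string; ts: number; paging_token: string }

async function syncContractCode(
  server: rpc.Server,
  codeStore: Collection<ContractCodeDoc>,
  contractId: string,
  versions: VersionEntry[]          // already paged out of StellarExpert
) {
  for (const v of versions) {
    const existing = await codeStore.findOne({ _id: v.wasm });
    if (!existing) {
      // Missing hash: download the raw bytes from the local Soroban RPC and
      // store them untouched (the hash is assumed hex-encoded here).
      const bytecode = await server.getContractWasmByHash(Buffer.from(v.wasm, "hex"));
      await codeStore.insertOne({
        _id: v.wasm,
        operation: v.operation,
        ts: v.ts,
        paging_token: v.paging_token,
        bytecode: new Uint8Array(bytecode),
        contractIds: [contractId],
        storedat: new Date(),
      });
    } else {
      // Code can back more than one contract; keep contractIds a set.
      await codeStore.updateOne({ _id: v.wasm }, { $addToSet: { contractIds: contractId } });
    }
  }

  // Recompute deterministic per-contract versions: 1..N by ts ascending.
  const history = await codeStore.find({ contractIds: contractId }).sort({ ts: 1 }).toArray();
  for (let i = 0; i < history.length; i++) {
    await codeStore.updateOne({ _id: history[i]._id }, { $set: { version: i + 1 } });
  }
}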

The Problem Code

In our code we parse every transaction, and one of the things we track is which function was called, by which account, and with which arguments. We then try to infer what the contract did by comparing the contract's spec with the effects the transaction had. The code snippet below shows where the problem surfaces.

//console.log(parsedTransaction);

if (TransactionParser.TxParser.isParsedInvokeContractFunction(parsedTransaction)) {
  let functionInputs: xdr.ScSpecFunctionInputV0[] | undefined;

  // 1) Try the current live spec for this contractId
  try {
    const spec = await getSpec(parsedTransaction.contract);
    functionInputs = spec.getFunc(parsedTransaction.function).inputs();
  } catch (err) {
    console.warn("Error fetching function inputs:", err, parsedTransaction);
    for (const arg of parsedTransaction.args) {
      console.log("arg:", JSON.stringify(arg));
    }
    // Do not abort the whole record; keep args unannotated
    functionInputs = undefined;
  }
The Error

                   /*
    Error fetching function inputs: Error: no such entry: multi_arb
    at Spec.findEntry (/home/admins/hoops/indexer_update/indexer/node_modules/@stellar/stellar-sdk/lib/contract/spec.js:546:15)
    at Spec.getFunc (/home/admins/hoops/indexer_update/indexer/node_modules/@stellar/stellar-sdk/lib/contract/spec.js:502:24)
    at <anonymous> (/home/admins/hoops/indexer_update/indexer/src/main.ts:153:43)
    at async Finder.fetchTransactionsStream (/home/admins/hoops/indexer_update/indexer/src/Fetchers/StellarExpertTxFinder.ts:303:25)
    at async Function.stream (/home/admins/hoops/indexer_update/indexer/src/Fetchers/StellarExpertTxFinder.ts:154:13)
    at async main (/home/admins/hoops/indexer_update/indexer/src/main.ts:95:9) {
  source: 'GDU5VNXTKRHXKM4EHWUX3MEDHRCZ65FKSMQZAJHQYKQN7A6ZT4NQZWJ6',
  isFeeBump: false,
  OperationType: 'InvokeContractFunction',
  contract: 'CCBVCCNPIFMCXW7S3GBM6IOQBD5TEUCSQ6WWJGB5VIXZCRVJJQHQQE23',
  function: 'multi_arb',
  args: [
    'CAS3J7GYLGXMF6TDJBBYYSE3HQ6BBSMLNUQ34T6TZMYMW2EVH34XOWMA',
    '0',
    [ [Array], [Array] ]
  ]
}*/

Seeing this error (shown inline with the surrounding code so you can see how it's used), we knew there was a problem: the function name the account called in the transaction simply didn't exist on the contract anymore. When a contract is upgraded with different function names, parsing gets difficult unless you track the bytecode that was active at each ledger height, which we didn't do at the time. That's the gap we needed to close for this multi_arb transaction, whose function is now called yeet.

So how did we fix it?
Well, the main problem is in these two lines:

const spec = await getSpec(parsedTransaction.contract);                     
functionInputs = spec.getFunc(parsedTransaction.function).inputs();

Basically, the problem is that at the time the transaction executed, the contract had a function called multi_arb. At some point afterward the contract admin pointed the contract at new bytecode that didn't have this function at all; the closest function it had was yeet, and even that took different arguments.

This of course means you cannot figure out what happened, since the transaction says "I called multi_arb" and the contract itself says "I don't have a multi_arb."

Choosing the right spec for the right time

At parse time, we attempt the spec from the current WASM first. If getFunc(name) throws, we fall back:

  1. Load the contract’s code history from the DB (already warmed by the prefetch/cache step).

  2. Choose the newest WASM with timestamp <= tx.timestamp (or the newest overall if none qualify).

  3. Build/lookup a compiled Spec from the raw bytes (kept in an in-memory cache keyed by hash).

  4. Return the first spec that actually exposes the function in that transaction, then decode the inputs (names/types) accordingly.

In practice this is fast—cache warming per contract means we’re usually pulling a spec from memory. And it’s robust: old transactions keep their original semantics, even if the live contract has since been refactored or renamed.
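Here is a condensed sketch of that fallback, assuming the ContractCode documents described earlier and a recent @stellar/stellar-sdk. getSpecForCall is an illustrative name (the real logic lives inside our parser), and the spec construction uses the same Client.fromWasm lookup shown in the snippet below.

import { contract } from "@stellar/stellar-sdk";
import { Collection } from "mongodb";

// Minimal view of the ContractCode documents described earlier.
interface CodeHistoryDoc { _id: string; ts: number; bytecode: Uint8Array; contractIds: string[] }

// In-memory cache of compiled specs, keyed by WASM hash.
const specCache = new Map<string, contract.Spec>();

async function getSpecForCall(
  codeStore: Collection<CodeHistoryDoc>,
  contractId: string,
  functionName: string,
  txTimestamp: number,                      // unix seconds, same as the tx doc
  clientOptions: contract.ClientOptions
): Promise<contract.Spec | undefined> {
  // 1) Load the contract's code history (already warmed by the prefetch/cache step).
  const history = await codeStore
    .find({ contractIds: contractId })
    .sort({ ts: -1 })                       // newest first
    .toArray();

  // 2) Prefer versions active at or before the transaction; if none qualify,
  //    fall back to the full history, newest overall first.
  const eligible = history.filter((c) => c.ts <= txTimestamp);
  const candidates = eligible.length > 0 ? eligible : history;

  // 3) Build (or reuse) a compiled Spec for each candidate, and
  // 4) return the first one that actually exposes the function.
  for (const code of candidates) {
    let spec = specCache.get(code._id);
    if (!spec) {
      // (In practice the driver hands back BSON Binary here; normalize to bytes first.)
      spec = (await contract.Client.fromWasm(Buffer.from(code.bytecode), clientOptions)).spec;
      specCache.set(code._id, spec);
    }
    try {
      spec.getFunc(functionName);           // throws "no such entry" if absent
      return spec;
    } catch {
      // This version doesn't know the function; try the next candidate.
    }
  }
  return undefined;                         // caller keeps the args unannotated
}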

So luckily in our getSpec function, we are able to look up a spec by contract id. Specifically we use some code like this:

const wasmByteCode = await config.rpc.getContractWasmByContractId(contractId);
const spec = (await Client.fromWasm(wasmByteCode, clientoptions)).spec;
return spec;

Luckily, the RPC server also provides a similar method keyed by WASM hash (getContractWasmByHash).
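By analogy with the getSpec snippet above (and reusing its config.rpc, Client, and clientoptions), the by-hash path plus an in-memory cache looks roughly like this. getSpecByWasmHash is our illustrative name for it, not a documented helper.

// Sketch, in the same style as the getSpec snippet above. Spec is imported
// from the SDK's contract module alongside Client. Compiled specs are cached
// by WASM hash so repeat transactions against the same contract version hit
// memory, not the RPC.
const specCacheByHash = new Map<string, Spec>();

async function getSpecByWasmHash(wasmHash: string): Promise<Spec> {
  const cached = specCacheByHash.get(wasmHash);
  if (cached) return cached;
  // The hash is assumed hex-encoded here; bytes are kept exactly as returned.
  const wasmByteCode = await config.rpc.getContractWasmByHash(Buffer.from(wasmHash, "hex"));
  const spec = (await Client.fromWasm(wasmByteCode, clientoptions)).spec;
  specCacheByHash.set(wasmHash, spec);
  return spec;
}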

So now we fetch the historical version hashes (for now from stellar.expert) and iterate through each version.

We also store the raw bytecode in our database for future lookup, and we also cache the spec, so we can quickly index future transactions that use this contract version.

And that’s about it for dynamically parsing an upgradable contract.

The sun is the money, all the rest is dark matter 🤣

Why does this matter for users

If you’re consuming our events and swap/liquidity data, here’s what you’ll notice:

Fewer gaps and retries. We process transactions on every ledger close from our captive core/Horizon/RPC stack, and we use StellarExpert only to grab backfill where batched paging is convenient. That means steadier real-time views and deterministic historical replays—no stampedes against upstream services, and no weird holes in charts.

Stable semantics across upgrades. Soroban lets contracts point to new bytecode. Great for dev velocity; also a potential footgun for indexers. We parse historical calls against the exact WASM that was active when the call happened. Not just for the AMM you’re watching, but for upstream contracts in the call graph that may have triggered the observable event. Old data stays true to its original meaning, and renamed functions don’t silently evaporate from history.

Richer context, not just events. If we’re indexing swaps for an AMM, we don’t stop at “a swap occurred.” We link the contract that caused the swap, the caller, and the relevant addresses. That context powers the dashboard and the API. MongoDB here acts as a caching layer for the dashboard and serves the most common API requests (period stats, market summaries, top routes) with sub-second latency.

Graph-first analytics that reflect reality. All of this rolls into a graph data model (Neo4j): contracts, accounts, assets, pools, and edges for calls, swaps, approvals, liquidity moves. Graphs let us ask structural questions that time-series can’t answer alone:

  • What are the executable off-ramp paths from Token A to whitelisted stables (USDC, EURC, USDX, XLM)?

  • What is the real capacity along those paths, accounting for slippage and per-edge liquidity?

  • Which nodes are critical connectors—if they vanish, connectivity to stables collapses?

This is how we separate appearance from reality. Example: a spam token pairs with USDC for $70 of actual off-ramp capacity, pairs with another spam token for a giant nominal pool, and suddenly some dashboards claim “billions TVL.” The graph shows the truth: the only safe off-ramps cap out around $140 total, and that drains as soon as someone exits. We price by executable path capacity to stables, not by wishful spot math.
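To make that concrete, here is a toy calculation with illustrative numbers mirroring the example above. The production version runs over the Neo4j graph with slippage-aware, per-edge liquidity; this sketch only shows why nominal TVL and executable exit capacity diverge.

// Toy pools: a spam token with ~$70 of real USDC depth, a giant nominal
// spam/spam pool, and the second spam token's own ~$70 USDC pool.
type Pool = { executableUsd: number; nominalUsd: number };

const pools: Record<string, Pool> = {
  "SPAM/USDC":  { executableUsd: 70, nominalUsd: 70 },
  "SPAM/SPAM2": { executableUsd: 2_000_000_000, nominalUsd: 2_000_000_000 }, // huge, but spam on both sides
  "SPAM2/USDC": { executableUsd: 70, nominalUsd: 70 },
};

// Wishful spot math: add up nominal pool sizes and you "see" billions of TVL.
const nominalTvl = Object.values(pools).reduce((sum, p) => sum + p.nominalUsd, 0);

// Executable capacity to stables: each off-ramp path is capped by its thinnest
// edge, and only edge-disjoint paths add up.
const pathCapacity = (edges: string[]) =>
  Math.min(...edges.map((id) => pools[id].executableUsd));

const offRampPaths = [
  ["SPAM/USDC"],                 // direct: $70
  ["SPAM/SPAM2", "SPAM2/USDC"],  // via the other spam token: min(huge, 70) = $70
];
const executableToStables = offRampPaths.reduce((sum, path) => sum + pathCapacity(path), 0);

console.log({ nominalTvl, executableToStables }); // billions of "TVL" vs ~$140 of real exit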

Better risk signals. By running convolutions over trading and liquidity events across time—and incorporating path-aware, executable liquidity—our models produce scores meant to predict how likely it is you’ll lose money with an asset. Those scores inform smart accounts that continuously rebalance across Stellar markets. Cleaner inputs → better decisions.

Easier access to the data. We’re exposing these datasets through both REST and GraphQL/RPC, with Postgres backing the canonical history, MongoDB serving the fast aggregates, and Neo4j powering relationship queries. Our local Galaxie dataset (from Stellar ETL) underpins heavy analytics jobs and backfills that would otherwise be impractical to run against live APIs.

Operational resilience. Because every fetched transaction lands in transactionDetails, we can reproduce issues without waiting on external APIs. That shrinks mean time to insight. The faster we index, the fresher your data. Our target remains real-time at every ledger close—and we’re getting closer.

Conclusion

Protocol 23 changed the shape of Soroban meta. Contracts keep upgrading. Indexers that pretend otherwise eventually drift into fiction. We decided to make the system boring: normalize v3/v4 meta to one event model, store raw pages before parsing, resume from _links.self so we never re-ingest, and resolve historical specs from a local code archive keyed by WASM hash. Add in captive core + Horizon + RPC archivers, Stellar ETL’s Galaxie dataset, third-party validation against StellarExpert, and a split data model (Postgres + MongoDB + Neo4j), and you get the same outcome every time: stable semantics, fast access, and fewer surprises.

These aren’t flashy features, but they compound. Cleaner ingestion and historical correctness flow downstream into our dashboards, our public APIs, our GraphQL endpoints, and ultimately the risk scores our smart accounts use to make decisions. The whole point of an indexer is to make the chain legible without distorting it. That’s what this release is about.

Thanks for reading—and for using Hoops Finance. If there’s a specific metric, route, or market pattern you want us to surface next, drop a note. We’re listening.

Start building with Hoops Finance’s API

One last note: our repos aren’t public, but the results are. You can feel the impact right now—the data is fresher, deeper, and more consistent end-to-end.

If you want to kick the tires, the API is the place to start.

Thanks for riding along on this deep dive. Onward—and happy shipping.

Tim
