feat: migration("re-indexing"), backfilling and diagnostics tooling for the ChainIndexer #12450

Open
wants to merge 27 commits into
base: feat/msg-eth-tx-index

aarshkshah1992
Contributor

@aarshkshah1992 aarshkshah1992 commented Sep 11, 2024

ChainIndexer Migration and Diagnostics Tooling

This PR implements the "migration" (really re-indexing/backfilling) and diagnostics tooling for the ChainIndexer implemented in PR #12450, and is part of the work for #12453. The tooling takes the form of both RPC APIs on the daemon and lotus-shed CLI commands.

Re-indexing Process

The re-indexing tool enables clients to index their entire existing ChainState in the ChainIndexer. This process is necessary due to the removal of the existing MsgIndex, EthTxIndex, and EventIndex from Lotus.

Why Re-index Instead of Migrate?

We've chosen to re-index rather than migrate data from existing indices for two primary reasons:

  1. Known issues: The existing indices have multiple known problems, and migrating could perpetuate incorrect index entries.
  2. Lack of garbage collection: Existing indices contain many entries for which the corresponding tipset messages/events no longer exist in the ChainStore due to splitstore GC.

Instead, we're re-indexing the Chainstore/Chainstate on the node into the ChainIndexer. This ensures that every re-indexed entry has gone through the indexing logic of the new ChainIndexer and that, after re-indexing, the Index reflects the actual contents of the Chainstore/Chainstate.

Diagnostics Tooling

This PR introduces diagnostic tools for detecting corrupt Index entries at specific epochs or epoch ranges.

While this PR implements functionality for optionally backfilling missing Index entries, it does not yet include the capability to "repair" corrupted Index entries. The repair functionality will be introduced in a subsequent PR. This approach allows us to first gather and analyze user reports, helping us understand the types and causes of corrupted Index entries (and whether they exist at all in the new ChainIndexer) before implementing repair mechanisms.

Core API

The fundamental building block for this tooling is the following RPC API:

type IndexValidation struct {
	TipsetKey string
	Height    uint64

	TotalMessages  uint64
	TotalEvents    uint64
	EventsReverted bool

	Backfilled bool
}

func (si *SqliteIndexer) ChainValidateIndex(ctx context.Context, epoch abi.ChainEpoch, backfill bool) (*types.IndexValidation, error)
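
For illustration, a client could build a JSON-RPC request for this call roughly as follows. This is a minimal sketch, not taken from the PR: it assumes the method is exposed under the name `Filecoin.ChainValidateIndex` (following the usual Lotus naming convention) and that `epoch` and `backfill` are passed as positional params. The `rpcRequest` type and `newValidateIndexRequest` helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a generic JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// newValidateIndexRequest builds the request body for one epoch.
// Assumption: the method name is "Filecoin.ChainValidateIndex" and the
// params are (epoch, backfill), in that order; the Go context is not sent.
func newValidateIndexRequest(id int, epoch int64, backfill bool) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		Method:  "Filecoin.ChainValidateIndex",
		Params:  []interface{}{epoch, backfill},
		ID:      id,
	})
}

func main() {
	body, err := newValidateIndexRequest(1, 1000, true)
	if err != nil {
		panic(err)
	}
	// The body would be POSTed to the node's RPC endpoint with an auth token.
	fmt.Println(string(body))
}
```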

This API has the following features:

  • Optionally backfills the Index with a tipset on the canonical chain for the given epoch if it is absent in the Index
  • Returns some aggregated stats for an indexed entry for diagnostics/inspection
  • Reports errors/corrupted indexed entries at the given epoch. Forms of Index corruption that can be diagnosed include:
    • Presence of multiple non-reverted tipsets at the given epoch
    • Complete absence of a non-reverted tipset at an epoch for which the Index does contain reverted tipsets
    • Mismatch between the Chainstore state and the Indexed entries (tipset messages/events)
    • Incorrect Indexing of null rounds at the given epoch
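
The corruption checks listed above can be sketched as a small classification function. This is a simplified illustration under stated assumptions, not the PR's implementation: `indexedTipset`, `checkEpoch`, and the error values are hypothetical, an empty canonical key stands in for a null round, and the message/event comparison against the Chainstore is reduced to comparing tipset keys.

```go
package main

import (
	"errors"
	"fmt"
)

// indexedTipset is a hypothetical, simplified view of one index row for a
// tipset at some epoch. It is not the real ChainIndexer schema.
type indexedTipset struct {
	Key      string
	Reverted bool
}

var (
	errMultipleNonReverted = errors.New("multiple non-reverted tipsets at epoch")
	errOnlyReverted        = errors.New("only reverted tipsets at epoch")
	errNullRoundIndexed    = errors.New("null round has a non-reverted indexed tipset")
	errTipsetMismatch      = errors.New("indexed tipset differs from canonical chain")
)

// checkEpoch compares the index rows for one epoch with the canonical tipset
// key at that epoch. An empty canonicalKey stands for a null round.
// missing is true when the index holds nothing for a non-null epoch, which is
// the case an optional backfill could fill in.
func checkEpoch(rows []indexedTipset, canonicalKey string) (missing bool, err error) {
	var live []indexedTipset
	for _, r := range rows {
		if !r.Reverted {
			live = append(live, r)
		}
	}
	switch {
	case len(live) > 1:
		return false, errMultipleNonReverted
	case canonicalKey == "" && len(live) == 1:
		return false, errNullRoundIndexed
	case canonicalKey == "":
		return false, nil // null round with no live entry: consistent
	case len(live) == 0 && len(rows) > 0:
		return false, errOnlyReverted
	case len(live) == 0:
		return true, nil // nothing indexed: candidate for backfill
	case live[0].Key != canonicalKey:
		return false, errTipsetMismatch
	}
	return false, nil
}

func main() {
	rows := []indexedTipset{{Key: "ts-a"}, {Key: "ts-b"}}
	missing, err := checkEpoch(rows, "ts-a")
	fmt.Println(missing, err)
}
```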

lotus-shed CLI tooling

The lotus-shed CLI tooling for both re-indexing/backfilling and diagnostics invokes this RPC API over epoch ranges. The corresponding commands, lotus-shed backfill index [from, to] and lotus-shed inspect index [from, to], backfill or inspect/diagnose the Index for the given epoch range.
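
The range-driven approach described above can be sketched as a loop that validates each epoch and collects failures rather than stopping at the first one. This is an assumption-laden illustration, not the lotus-shed code: `validateFn` is a hypothetical stand-in for a `ChainValidateIndex` call, the ascending inclusive iteration order is assumed, and a real caller would also thread a `context.Context` through.

```go
package main

import "fmt"

// validateFn is a hypothetical stand-in for one ChainValidateIndex call.
type validateFn func(epoch int64, backfill bool) error

// epochFailure records an epoch whose validation returned an error.
type epochFailure struct {
	Epoch int64
	Err   error
}

// validateRange calls validate for every epoch in [from, to], inclusive,
// and collects the failures so the whole range is always inspected.
func validateRange(from, to int64, backfill bool, validate validateFn) ([]epochFailure, error) {
	if from > to {
		return nil, fmt.Errorf("invalid range: from %d > to %d", from, to)
	}
	var failures []epochFailure
	for epoch := from; epoch <= to; epoch++ {
		if err := validate(epoch, backfill); err != nil {
			failures = append(failures, epochFailure{Epoch: epoch, Err: err})
		}
	}
	return failures, nil
}

// fakeValidator returns a validateFn that fails for the listed epochs,
// so the loop can be exercised without a running node.
func fakeValidator(bad ...int64) validateFn {
	badSet := make(map[int64]bool, len(bad))
	for _, e := range bad {
		badSet[e] = true
	}
	return func(epoch int64, _ bool) error {
		if badSet[epoch] {
			return fmt.Errorf("corrupt index entry at epoch %d", epoch)
		}
		return nil
	}
}

func main() {
	failures, err := validateRange(10, 15, false, fakeValidator(12, 14))
	if err != nil {
		fmt.Println("range error:", err)
		return
	}
	for _, f := range failures {
		fmt.Printf("epoch %d: %v\n", f.Epoch, f.Err)
	}
}
```

Collecting failures per epoch fits the diagnostics goal stated earlier: a single run reports every problematic epoch in the range.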

TODO

  • automated tests

chainindex/api.go Outdated
return &types.IndexValidation{
TipsetKey: ts.Key().String(),
Height: uint64(ts.Height()),
Backfilled: true,
Contributor Author

Fill out the other fields here using SQL queries.

@aarshkshah1992 aarshkshah1992 changed the title [WIP] Chain Index Validation API Chain Index Validation API Sep 12, 2024
@aarshkshah1992
Contributor Author

@rvagg Would be great to have a first round of review here when you get the time.

@aarshkshah1992 aarshkshah1992 changed the title Chain Index Validation API Migration and Diasnostics tooling for the ChainIndexer Sep 12, 2024
@aarshkshah1992 aarshkshah1992 changed the title Migration and Diasnostics tooling for the ChainIndexer Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Sep 12, 2024
@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Inspection tooling for the ChainIndexer Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer Sep 12, 2024
@aarshkshah1992 aarshkshah1992 self-assigned this Sep 12, 2024
@aarshkshah1992 aarshkshah1992 changed the title Migration("Re-Indexing") and Diasgnostics tooling for the ChainIndexer feat: migration("re-indexing") and diasgnostics tooling for the ChainIndexer Sep 12, 2024
aarshkshah1992 and others added 2 commits September 24, 2024 11:25
Suggestions from Steve's read of the user doc.

Co-authored-by: Steve Loeppky <[email protected]>
#### The `ChainValidateIndex` JSON RPC API

Please refer to the [Lotus API documentation](https://github.com/filecoin-project/lotus/blob/master/documentation/en/api-v1-unstable-methods.md) for detailed documentation of the `ChainValidateIndex` JSON RPC API.
Contributor Author

Link will have updated docs for the ChainIndexer RPC once we merge this to master.

@aarshkshah1992
Contributor Author

Okay, starting work on the automated tests as it is the last remaining dev task for this workstream.

Member

@BigLep BigLep left a comment

Great docs updates - thanks.

I read the updated docs and left some comments. I also went through everything yesterday and resolved any conversation that I saw had been incorporated. I think anything still open is still relevant.

The big feedback item from this revision is whether we want to write the ChainIndexer SQLite state to a different directory, since that simplifies the migration and rollback story.

I can meet in office on 2024-09-25 Pacific morning if that helps for closing anything out as I don't want to drag this out on you.

Comment on lines 96 to 111
2. **Backup Existing Index Databases**
- Before restarting your Lotus node, back up the directory containing your existing index databases (`MsgIndex`, `EthTxIndex`, and `EventIndex`).
- These databases are located in the `{$LOTUS_PATH/sqlite}` directory.
- Use the following command to copy the entire directory:
```bash
cp -r $LOTUS_PATH/sqlite {destination_path}
```
- **Note: If you have configured a custom directory path for the Index databases using the `Events.DatabasePath` config option, replace `{$LOTUS_PATH/sqlite}` with your custom path.**
- These backups are essential for potential rollbacks, even though they are not used in the migration process.

3. **Remove Old Index Files**
- After creating backups, remove the `{$LOTUS_PATH/sqlite}` directory (*or your custom index database path*) using the following command:
```bash
rm -rf $LOTUS_PATH/sqlite
```
- **Warning: Please ensure and validate that you have made backups of your existing index databases before removing the directory.**
Member

Suggested change
2. **Backup and Remove Existing Index Databases**
- The backup is accomplished by moving the legacy `{$LOTUS_PATH/sqlite}` directory so the existing index databases (`MsgIndex`, `EthTxIndex`, and `EventIndex`) don't get overwritten.
- Use the following command to move the entire directory:
```bash
mv $LOTUS_PATH/sqlite $LOTUS_PATH/sqlite-pre-chainindexer
```
- **Note: If you had configured a custom directory path for the Index databases using the `Events.DatabasePath` config option, replace `{$LOTUS_PATH/sqlite}` with your custom path.**
- These old indexes are essential for potential rollbacks, even though they are not used in the migration process.

I realized we can kill two birds with one stone here: moving accomplishes both the backup and the "removal".

That said, we could simplify this further if ChainIndexer didn't use $LOTUS_PATH/sqlite. What if we wrote to $LOTUS_PATH/chainindex instead? (I don't have access to a full Lotus node to come up with other ideas.) I assume we don't want to write under chain even though these indices are related to the chain. ChatGPT tells me there is an indices directory on full nodes, but I don't know what that holds.

Contributor Author

This is done.

In case you need to rollback to the previous indexing system (`EthTxIndex`, `MsgIndex`, and `EventIndex`), follow these steps:
1. Stop your Lotus node.
2. Remove the current `${LOTUS_PATH}/sqlite` directory and replace it with the backup taken in the "**Backup Existing Index Databases**" section of the [Migration Guide](#migration-guide).
Member

Suggested change
2. Replace the current `${LOTUS_PATH}/sqlite` with the backup taken in the "**Backup Existing Index Databases**" section of the [Migration Guide](#migration-guide).
```bash
mv $LOTUS_PATH/sqlite-pre-chainindexer $LOTUS_PATH/sqlite
```

Member

Related to previous discussion, this step goes away if we just write chainindexer sql state to somewhere else.

Contributor Author

Would love to know what @rvagg thinks about this. Writing the chainindexer sql state to another directory makes sense to me as it does simplify migration ops.

Member

@aarshkshah1992: @rvagg and I spoke verbally on 2024-09-25 (my time) and he was supportive of this idea of writing the ChainIndexer SQLite state to a different directory.

Contributor Author

@BigLep This was a great idea and is now done. These steps look much cleaner now. Please take a look.

api/api_full.go Outdated
// - error: An error object if the validation/backfill fails. The error message will contain details about the index
// corruption if the call fails because of an inconsistency between indexed data and the actual chain state.
//
// Note: The API returns an error if the index does not have data for the specified epoch and backfill is set to false.
Member

Inline this into your documentation above about "error"?

api/api_full.go Outdated
Comment on lines 69 to 70
// IndexValidation contains detailed information about the validation status of a specific chain epoch.
//type IndexValidation struct {
Member

Since this is docs about IndexValidation, should these comments be moved to where IndexValidation is defined?

@aarshkshah1992
Contributor Author

@BigLep Have addressed your second round of review on the RPC user doc.

Member

@BigLep BigLep left a comment

From a docs perspective, things look good to me. I left a couple of small suggestions.
