feat(import): add script tool for multiple hbase snapshot imports #4606
tianlei2 wants to merge 20 commits into
…sjava SPI conflict
…bFromHbaseSnapshotTest
3086725 to 958c187
40e4a02 to af3976d
END_SHARD=$2

# Configurations (Uses environment variables if set, otherwise defaults)
export PROJECT_ID="${PROJECT_ID:-google.com:cloud-bigtable-dev}"
export BUCKET="${BUCKET:-tianlei-beam-test-bucket}"
export REGION="${REGION:-us-central1}"

export TABLE_NAME="${TABLE_NAME:-validation_test}"
Where is the doc that lists the available environment variables and which of them are expected to be set?
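One possible way to address this inside the script itself is a help function. The following is a minimal sketch, not the author's implementation: `print_env_help` is a hypothetical name, and the one-word descriptions are only inferred from the variable names; the variables and defaults are the ones visible in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): prints the environment variables
# that the diff above shows the script reading, with their defaults.
print_env_help() {
  cat <<'EOF'
Environment variables (the default is used when a variable is unset or empty):
  PROJECT_ID   project      (default: google.com:cloud-bigtable-dev)
  BUCKET       bucket       (default: tianlei-beam-test-bucket)
  REGION       region       (default: us-central1)
  TABLE_NAME   table name   (default: validation_test)
EOF
}

print_env_help
```

The same text could also live in a README section next to the script; the function form just keeps the documentation and the defaults in one file.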
af3976d to 1c83a18
# We skip restore for all shards if running via --all because Step 1 handled it.
# If running manually via ranges, shard 0 will perform restore.
SKIP_RESTORE="true"
if [ $i -eq 0 ]; then
How does this work? If you run with --all and have already restored the snapshot, won't this restore it again?
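To make the concern concrete, here is a sketch of a decision that takes both the run mode and the shard index into account. It is an illustration only: `should_skip_restore` and its first argument (a flag saying whether the script was started with --all) are hypothetical and do not appear in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide the value passed to --skipRestoreStep.
#   $1 = "true" if the script was started with --all (restore already done in Step 1)
#   $2 = shard index
should_skip_restore() {
  if [ "$1" = "true" ] || [ "$2" -ne 0 ]; then
    # Either the restore already happened, or this is not the shard that owns it.
    echo "true"
  else
    # Manual range run, shard 0: this shard performs the restore.
    echo "false"
  fi
}

SKIP_RESTORE="$(should_skip_restore "false" 0)"
echo "SKIP_RESTORE=${SKIP_RESTORE}"
```

With this shape, shard 0 only restores in a manual range run, which is the behavior the comments in the diff describe.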
--serviceAccount=${SERVICE_ACCOUNT} \
--usePublicIps=false \
--enableSnappy=true \
--skipRestoreStep=${SKIP_RESTORE} \
This is good, but how are we passing the restorePath?
I feel like the script should define a custom restore path (made idempotent by adding a timestamp, etc.), use it for the restore, and pass it as the restorePath in every job.
Also, with this model, who cleans up the restore path? Is there a way to trigger a cleanup at the end of the script, or do we have a tool that can be used? We can also say it's a manual step, but then this script should output something to the tune of "the snapshot was imported, please clean up $RESTORE_PATH once validation succeeds."
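A rough sketch of what that could look like, under stated assumptions: `make_restore_path`, `print_cleanup_reminder`, and the `hbase-restore/` prefix are hypothetical names chosen for illustration. How the path is handed to each job is deliberately left out, since the option for it is not shown in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a restore path that is unique per script run.
#   $1 = bucket name, $2 = table name
make_restore_path() {
  # The UTC timestamp suffix keeps repeated runs from colliding.
  printf 'gs://%s/hbase-restore/%s-%s\n' "$1" "$2" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Hypothetical helper: tell the operator what to clean up manually.
#   $1 = restore path
print_cleanup_reminder() {
  printf 'The snapshot was imported, please clean up %s once validation succeeds.\n' "$1"
}

# Computed once, so that every job can be given the same path.
RESTORE_PATH="$(make_restore_path "${BUCKET:-tianlei-beam-test-bucket}" "${TABLE_NAME:-validation_test}")"

# ... jobs would be launched here ...

print_cleanup_reminder "$RESTORE_PATH"
```

Computing the path once at the top and reusing it avoids each shard deriving its own timestamp and ending up with a different path.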
Example for manual parallel execution:
```bash
./run-snapshot-import.sh 0 3 & # Run this first!
```
In this model, how does shard 1 know that it has to wait for shard 0 to finish the restore? Won't it run and fail because the restore path is empty?
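One way the later shards could wait is to poll for a completion marker before starting. The sketch below is an assumption, not something the script does today: `wait_for` is a hypothetical helper, and a local marker file stands in for whatever signal shard 0 would actually leave (for example an object under the restore path).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command until it succeeds or attempts run out.
#   $1 = maximum number of attempts, $2 = seconds to sleep after a failed attempt,
#   remaining arguments = the check command
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Demonstration with a local marker file (assumption: shard 0 writes a marker
# when the restore is complete).
MARKER="$(mktemp -d)/restore_done"
( sleep 1; touch "$MARKER" ) &   # stands in for shard 0 finishing the restore

if wait_for 10 1 test -f "$MARKER"; then
  echo "restore finished, starting shard"
else
  echo "timed out waiting for restore" >&2
fi
```

The timeout matters here: if shard 0 fails, the other shards should give up with a clear message instead of waiting forever.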
b/429250716
This is the second PR, containing the shell tool script. It is a child of #4600, so the only file that needs to be reviewed is the script itself (bigtable-dataflow-parent/bigtable-beam-import/run-snapshot-import.sh). If this looks OK, I will merge it into the parent PR.
The script is originally from https://docs.google.com/document/d/1T7BQF-AYY8xGbdbhwkV7zPA11A_Fh5SRjA_7nkzeTos/edit?tab=t.twzz7rrh6jz7#heading=h.m7p3xy7m76u2