feat(import): add support for multiple hbase snapshot imports #4600
tianlei2 wants to merge 68 commits into
…sjava SPI conflict
/gemini review
…m/google/cloud/bigtable/beam/hbasesnapshots/HBaseSnapshotRestoreTool.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…gtableMutationsFn
…ce ImportConfigTest
importConfig.getSourcepath(),
importConfig.getRestorepath());
DataflowPipelineDebugOptions debugOptions = options.as(DataflowPipelineDebugOptions.class);
debugOptions.setGCThrashingPercentagePerPeriod(100.00);
/gemini review
…Config#validate() and remove duplicate checks from callers
…nterException when sourcepath is null
/gemini review
…s for offline restore tool, and add unit tests
LOG.info(
String.format(
"Restore path %s already exists, deleting it for idempotency", restorePath));
fileSystem.delete(restorePath, true);
I understand why we do this. But the CL introduces many ways to run concurrent jobs, so we need to make sure that parallel shards can never delete the restored path.
I think we can do two things:
- Document this contract explicitly and loudly (run the restore step first, and launch the shards only after it completes).
- Make sure that this delete does not happen when skipRestore is set. The current code does not reach here when skipRestore is set, but an explicit check establishes the invariant that we never delete a restore path when skipRestore is set.
@mutianf what do you think?
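A minimal sketch of the second point, to make the invariant concrete. It is not the PR's code: `RestorePathGuard` and `prepareRestorePath` are hypothetical names, and `java.nio.file` stands in for the Hadoop `FileSystem` used in the diff so that the example stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper; java.nio.file stands in for the Hadoop FileSystem used in the PR. */
final class RestorePathGuard {
  private RestorePathGuard() {}

  /**
   * Clears an existing restore path so that a restore can be re-run, but never
   * when skipRestore is set. Returns true if the path was deleted.
   */
  static boolean prepareRestorePath(Path restorePath, boolean skipRestore) throws IOException {
    if (skipRestore) {
      // Invariant: a job that skips the restore step only reads restored data.
      return false;
    }
    if (!Files.exists(restorePath)) {
      return false;
    }
    deleteRecursively(restorePath);
    return true;
  }

  private static void deleteRecursively(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder())
          .forEach(
              p -> {
                try {
                  Files.delete(p);
                } catch (IOException e) {
                  throw new UncheckedIOException(e);
                }
              });
    }
  }
}
```

Putting the check at the top of the method keeps the guarantee local to the delete, so it does not depend on how callers happen to be wired today.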
/** Tests the {@link CreateBigtableMutationsFn} functionality. */
@RunWith(JUnit4.class)
public class CreateBigtableMutationsFnTest {
No assertion on the silent fully-dropped-row case. When every cell in a row exceeds filterLargeCellThresholdBytes, the method returns an empty list and emits nothing — but only droppedCells is incremented, never droppedRows. No test pins this behavior, and it's arguably a bug.
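To illustrate the accounting being asked for, here is a rough sketch under stated assumptions. `LargeCellFilter` is a hypothetical stand-in, not the PR's `CreateBigtableMutationsFn`; cells are modelled as plain `byte[]` values, and the two counters are simple fields rather than Beam metrics.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the cell filtering discussed above. */
final class LargeCellFilter {
  long droppedCells;
  long droppedRows;
  private final int thresholdBytes;

  LargeCellFilter(int thresholdBytes) {
    this.thresholdBytes = thresholdBytes;
  }

  /** Returns the cells that fit the threshold, counting dropped cells and fully dropped rows. */
  List<byte[]> filterRow(List<byte[]> cells) {
    List<byte[]> kept = new ArrayList<>();
    for (byte[] cell : cells) {
      if (cell.length > thresholdBytes) {
        droppedCells++;
      } else {
        kept.add(cell);
      }
    }
    // A non-empty row that lost every cell is counted as a dropped row,
    // so the case is visible instead of silently emitting nothing.
    if (kept.isEmpty() && !cells.isEmpty()) {
      droppedRows++;
    }
    return kept;
  }
}
```

A test for the real function could then pin both counters for a row in which every cell exceeds the threshold, alongside the existing partial-drop cases.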
SnapshotConfig snapshotConfig = SnapshotTestHelper.newSnapshotConfig("my-restore-path");

TableName tableName = TableName.valueOf("my-table");
RegionInfo regionInfo = RegionInfoBuilder.newBuilder(tableName).build();
We only round-trip regions whose start and end keys are empty (""). Let's test both empty and populated keys. Start and end keys can be raw bytes, right?
Shading is under-documented (the highest long-term cost). The revert to shaded hbase-shaded-mapreduce, the hbase-shaded-client exclusion, and especially the new shade exclude of META-INF/services/java.net.spi.InetAddressResolverProvider are all there to defeat the dnsjava SPI/LiteralByteString conflict, but none carries an inline comment saying so. The next person who bumps HBase or Beam will "clean these up" and reintroduce the NoClassDefFoundError. Add a comment at each of those three spots referencing the conflict. The provided-scope annotations are now justified in-thread and fine.
Shading workarounds need inline rationale: pom.xml needs three separate comments, on (a) the hbase-shaded-client exclusion, (b) the META-INF/services/java.net.spi.InetAddressResolverProvider shade exclude, and (c) the hbase-shaded-mapreduce dependency.
b/429250716
This is the first PR that incorporates changes from https://github.com/jhambleton/java-bigtable-hbase/commits/dataflow-v2-v2.15.6 and some fixes to make it pass the tests.
SnapshotUtilsTest.testGetHbaseConfiguration was failing because the static configuration field SnapshotUtils.hbaseConfiguration cached state between test cases, leaking stale data into subsequent tests.
SnapshotUtilsTest.testAppendCurrentTimestamp was throwing a NumberFormatException because the return value contained a UUID suffix (timestamp-UUID), but the test attempted to parse the entire string directly as a Long.
Integration tests failed on Java 8 and 11 in Kokoro because of unshaded transitive dependency conflicts (com.google.protobuf.LiteralByteString NoClassDefFoundError).
Several useful unit tests were commented out in ImportJobFromHbaseSnapshotTest because mockito-core lacked the ability to mock static methods.
Switched from mockito-core to mockito-inline in the pom.xml to allow static mocking.
Uncommented the code and restored the original formatting to prevent any lint errors, enabling JUnit to verify correct configuration parsing.
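For the testAppendCurrentTimestamp failure described above, a small sketch of why the parse broke and one way a test could read the timestamp. The `<epochMillis>-<UUID>` layout is assumed from the "timestamp-UUID" description, and `TimestampSuffixSketch`, `newSuffix`, and `parseTimestamp` are hypothetical names, not methods of `SnapshotUtils`.

```java
import java.util.UUID;

/** Hypothetical illustration; the real suffix format in SnapshotUtils may differ. */
final class TimestampSuffixSketch {
  private TimestampSuffixSketch() {}

  /** Builds a suffix in the assumed "<epochMillis>-<uuid>" layout. */
  static String newSuffix(long epochMillis) {
    return epochMillis + "-" + UUID.randomUUID();
  }

  /** Parses only the part before the first '-', because the UUID itself contains hyphens. */
  static long parseTimestamp(String suffix) {
    int dash = suffix.indexOf('-');
    if (dash <= 0) {
      throw new IllegalArgumentException("Expected <timestamp>-<uuid>, got: " + suffix);
    }
    return Long.parseLong(suffix.substring(0, dash));
  }
}
```

Calling `Long.parseLong` on the whole suffix throws a NumberFormatException, which matches the failure reported for the original test; parsing only the prefix avoids it.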