Add support for SplitBlob and SpliceBlob methods#345
Conversation
17fa168 to
5ef05b3
Compare
| _, err = ba.contentAddressableStorageClient.SpliceBlob(ctx, &remoteexecution.SpliceBlobRequest{ | ||
| InstanceName: digestFunction.GetInstanceName().String(), | ||
| DigestFunction: digestFunction.GetEnumValue(), | ||
| ChunkDigests: splitBlobResponse.(*remoteexecution.SplitBlobResponse).GetChunkDigests(), |
There was a problem hiding this comment.
Shouldn't this also set ChunkingFunction?
| func (contentAddressableStorageServer) SpliceBlob(ctx context.Context, in *remoteexecution.SpliceBlobRequest) (*remoteexecution.SpliceBlobResponse, error) { | ||
| return nil, status.Error(codes.Unimplemented, "This service does not support splicing blobs") | ||
| func (s *contentAddressableStorageServer) SpliceBlob(ctx context.Context, in *remoteexecution.SpliceBlobRequest) (*remoteexecution.SpliceBlobResponse, error) { | ||
| if s.chunkListStorage == nil { |
There was a problem hiding this comment.
Instead of doing this, why don't we require that ChunkListStorage is non-nil, and that an ErrorBlobAccess is passed in?
There was a problem hiding this comment.
We could but then the ContentAddressableStorageServer would be unable to behave differently when the ChunkListStorage is not defined. If we want the FindMissingBlobs implementation from below then we need to know when we need to verify chunk list storage and when we do not.
| } | ||
|
|
||
| func (clsBlobAccess) FindMissing(ctx context.Context, digests digest.Set) (digest.Set, error) { | ||
| return digest.EmptySet, status.Error(codes.Unimplemented, "Chunk Mapping does not support bulk existence checking") |
There was a problem hiding this comment.
I don't think that this is correct. This should likely issue a call against FindMissingBlobs() to make sure all ChunkLists exist and are touched.
There was a problem hiding this comment.
The chunk lists are stored in the CLS rather than the CAS and we don't have an api method to request bulk existence checking. We could do it one by one for blobs > maxChunkSize (requiring FindMissing to also to call GetCapabilities) or we could add a custom api for bulk existence checking chunk lists.
| } | ||
| } | ||
|
|
||
| func (s *contentAddressableStorageServer) FindMissingBlobs(ctx context.Context, in *remoteexecution.FindMissingBlobsRequest) (*remoteexecution.FindMissingBlobsResponse, error) { |
There was a problem hiding this comment.
Shouldn't this also be patched up to call chunkListStorage.FindMissing()?
There was a problem hiding this comment.
It makes sense to renew the lifetime of the chunk lists when calling FMB for their blob but if we replace the nil CLS with an ErrorBlobAccess how would we know which mode of operation to use?
|
|
||
| // NewChunkingProvider decorates a capabilities provider with support | ||
| // for the split and splice blob api with chunking parameters. | ||
| func NewChunkingProvider(base Provider, chunkingParameters *remoteexecution.RepMaxCdcParams) Provider { |
There was a problem hiding this comment.
Why can't we use StaticProvider in combination with MergingProvider for this?
There was a problem hiding this comment.
I don't see why not, I'll add it
| // store the chunk list. | ||
| blobReader := ba.contentAddressableStorage.Get(ctx, d).ToReader() | ||
| defer blobReader.Close() | ||
| chunker := NewReaderChunker(d.GetDigestFunction(), blobReader, int64(params.MinChunkSizeBytes), int64(params.HorizonSizeBytes)) |
There was a problem hiding this comment.
I think we need to have some safety belts in place that these parameters are within bounds. For example, MinChunkSizeBytes needs to be at least 64, as the Gear hashing doesn't work otherwise.
There was a problem hiding this comment.
Sure, I'll add the requirement when getting the capabilities above, erroring if minChunkSizeBytes is outside of [64, maxMessageSizeBytes/2]
|
|
||
| // fakeBlobAccess provides a thread-safe, in-memory BlobAccess for | ||
| // testing. | ||
| type fakeBlobAccess struct { |
There was a problem hiding this comment.
Can't we just use gomock for consistency with most of the other tests in Buildbarn?
There was a problem hiding this comment.
Eh we sort of could but it becomes noisy, for some of the tests a straight up EXPECT().Return() construct flows well but for the more complicated tests such as the ones where we want to verify chunking and that files have been properly touched it becomes very noisy.
For those tests we could use .EXPECT().DoAndReturn().AnyTimes() to implement pretty much the same thing as in fakeBlobAccess already does but that would basically be the same thing as fakeBlobAccess just with lambdas instead of functions.
If you prefer that anyway I can modify the code such that it uses gomock for consistency but at least in my mind fakeBlobAccess was easier to understand.
| if err != nil { | ||
| return util.StatusWrap(err, "Failed to create Chunk Map") | ||
| } | ||
| chunkListStorage = authorizedBackend |
There was a problem hiding this comment.
I would have imagined that this added a capabilities provider to cacheCapabilitiesAuthorizers. Why doesn't t do that?
There was a problem hiding this comment.
I don't need to add it because Capabilities are propagated from the CAS and not from the CLS. Do you want us to propagate the parameters from the CLS instead?
15ba5de to
c20835c
Compare
This commit adds support for the SplitBlob and SpliceBlob methods from the Remote Execution v2 (REv2) api. SplitBlob and SpliceBlob can be used to facilitate uploads and downloads of large files but a naïve implementation like this has some major drawbacks as well. The blobs must exist in both their chunked and non chunked form, which may significantly increase storage requirements for large blobs. The protocol gives no guarantee that a large blob stored in the CAS exists in its chunked form which forces you to perform a fairly heavy Split call that loads the entire large blob in order to decomposition it into its chunks. This implementation mostly exists as a stepping stone for a different implementation where Buildbarn internally manages all blobs as chunked blobs.
c20835c to
e8f71eb
Compare
343d61a to
9c637a9
Compare
This commit adds support for the SplitBlob and SpliceBlob methods from the Remote Execution v2 (REv2) api. SplitBlob and SpliceBlob can be used to facilitate uploads and downloads of large files but a naïve implementation like this has some major drawbacks as well.
The blobs must exist in both their chunked and non chunked form, which may significantly increase storage requirements for large blobs. The protocol gives no guarantee that a large blob stored in the CAS exists in its chunked form which forces you to perform a fairly heavy Split call that loads the entire large blob in order to decomposition it into its chunks.
This implementation mostly exists as a stepping stone for a different implementation where Buildbarn internally manages all blobs as chunked blobs.