Aliyun: Improve OSSFileIO read performance by fixing close() bug and implementing RangeReadable (#16863) #16865

Open
liuliquan-marshal wants to merge 2 commits into apache:main from liuliquan-marshal:oss-inputstream

@liuliquan-marshal

See step 2 in #16863.

code change summary

  1. Fix a performance bug in close() so that closing the stream no longer reads large amounts of unneeded data.
  2. Implement the RangeReadable interface. Random reads now call readFully instead of seeking and then reading.
  3. Add more OSSInputStream unit tests. Code coverage of OSSInputStream.java is now 96%.
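
To illustrate why items 1 and 2 matter, here is a minimal sketch and not the code in this PR. It assumes, as one plausible reading of item 1, that closing a partially read stream ends up consuming the unread remainder of the object. `CountingObject`, `openFrom`, `readRange` and `seekReadWithDrainingClose` are hypothetical names for an in-memory stand-in. Only the `readFully(position, buffer, offset, length)` shape is meant to mirror a positional read in the style of RangeReadable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative only: an in-memory stand-in for a remote object, not the OSS SDK.
final class RangeReadSketch {

  /** Hypothetical remote object that counts how many bytes were pulled from it. */
  static final class CountingObject {
    private final byte[] data;
    long bytesTransferred = 0;

    CountingObject(byte[] data) {
      this.data = data;
    }

    /** Open-ended stream starting at start; every byte read is counted. */
    InputStream openFrom(final long start) {
      return new InputStream() {
        private int pos = (int) start;

        @Override
        public int read() {
          if (pos >= data.length) {
            return -1;
          }
          bytesTransferred++;
          return data[pos++] & 0xFF;
        }
      };
    }

    /** Bounded fetch of exactly [position, position + length). */
    byte[] readRange(long position, int length) {
      if (position < 0 || length < 0 || position + length > data.length) {
        throw new IndexOutOfBoundsException("range outside object");
      }
      bytesTransferred += length;
      return Arrays.copyOfRange(data, (int) position, (int) position + length);
    }
  }

  /** Seek-and-read where closing drains the rest of the stream (the assumed costly pattern). */
  static byte[] seekReadWithDrainingClose(CountingObject obj, long position, int length) {
    byte[] out = new byte[length];
    try (InputStream in = obj.openFrom(position)) {
      for (int i = 0; i < length; i++) {
        int b = in.read();
        if (b < 0) {
          throw new IOException("unexpected end of object");
        }
        out[i] = (byte) b;
      }
      while (in.read() != -1) {
        // Assumption: this loop models a close() that consumes the unread remainder.
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }

  /** Positional read: copies [position, position + length) into buffer at offset. */
  static void readFully(CountingObject obj, long position, byte[] buffer, int offset, int length) {
    byte[] chunk = obj.readRange(position, length);
    System.arraycopy(chunk, 0, buffer, offset, length);
  }

  public static void main(String[] args) {
    byte[] data = new byte[1000];
    CountingObject viaSeek = new CountingObject(data);
    seekReadWithDrainingClose(viaSeek, 100, 10);

    CountingObject viaRange = new CountingObject(data);
    readFully(viaRange, 100, new byte[10], 0, 10);

    System.out.println("seek + draining close: " + viaSeek.bytesTransferred + " bytes");
    System.out.println("positional readFully:  " + viaRange.bytesTransferred + " bytes");
  }
}
```

In this toy setup, reading 10 bytes at offset 100 of a 1000-byte object transfers 900 bytes on the draining path and 10 bytes on the positional path. The real cost profile depends on the OSS client and connection handling, which this sketch does not model.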

performance comparison

seek and read

BEFORE

Benchmark                   (bufferSizeKB)                            (fileIOClass)  (fileSizeKB)  Mode  Cnt      Score        Error  Units
FileIOBenchmark.randomRead            1024       org.apache.iceberg.aws.s3.S3FileIO        131072  avgt    4   1817.108 ±    37.337  ms/op
FileIOBenchmark.randomRead            1024  org.apache.iceberg.aliyun.oss.OSSFileIO        131072  avgt    5  27164.064 ± 24437.452  ms/op

AFTER

Benchmark                   (bufferSizeKB)                            (fileIOClass)  (fileSizeKB)  Mode  Cnt     Score     Error  Units
FileIOBenchmark.randomRead            1024  org.apache.iceberg.aliyun.oss.OSSFileIO        131072  avgt    5  1974.035 ± 358.189  ms/op

range read

BEFORE

Benchmark                  (bufferSizeKB)                            (fileIOClass)  (fileSizeKB)  Mode  Cnt     Score    Error  Units
FileIOBenchmark.rangeRead            1024       org.apache.iceberg.aws.s3.S3FileIO        131072  avgt    5  1595.129 ± 84.628  ms/op
FileIOBenchmark.rangeRead            1024  org.apache.iceberg.aliyun.oss.OSSFileIO        131072  avgt    4  (not supported)

AFTER

Benchmark                  (bufferSizeKB)                            (fileIOClass)  (fileSizeKB)  Mode  Cnt     Score     Error  Units
FileIOBenchmark.rangeRead            1024  org.apache.iceberg.aliyun.oss.OSSFileIO        131072  avgt    5  1481.623 ± 254.501  ms/op

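With these changes, OSSFileIO randomRead drops from about 27164 ms/op (with a very large error margin) to about 1974 ms/op, roughly a 14x improvement. rangeRead, previously not supported, now runs at about 1482 ms/op. Both results are in the same range as the S3FileIO numbers above.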
