This article is co-authored with a generative AI. Facts have been cross-checked against official documentation where possible, but errors may remain. Please verify against primary sources before making any important decisions.

Target and architecture

This article covers a setup for delivering a high-resolution image of roughly 49000×28000 px (about 450 MB as a Pyramid TIFF, or PTIF) through the IIIF (International Image Interoperability Framework) Image API. The component used is Samvera serverless-iiif, deployed from the AWS Serverless Application Repository (SAR).

The overall architecture is as follows.

[Viewer (Mirador, etc.)] → [CloudFront]
                             ├── cache Hit  ─→ return as-is
                             └── cache Miss ─→ [Lambda (Serverless IIIF)]
                                                  └─→ [PTIF on S3]
  • Lambda is fronted by an AWS Lambda Function URL, with CloudFront in front of that
  • The PTIF is stored as an S3 object
  • Everything runs in ap-northeast-1 (Tokyo region), so cross-region latency is out of scope here

It worked initially with the default configuration, but as the image size grew, rendering timeouts and 429 responses appeared, so adjustments were made from several angles. They are described in order below.

PTIF encoding settings

When a PTIF was created from a JPEG source with vips tiffsave, the output PTIF (1.30GB) ended up nearly the same size as the source JPEG (1.37GB), and the expected reduction was not obtained. Compared with another image delivered through the same setup (a 230MB PTIF created from an uncompressed TIFF), the bytes-per-pixel ratio differed by roughly a factor of 6.

The command used at first was as follows.

vips tiffsave input.jpg output.tif --tile --pyramid --compression jpeg \
  --Q 90 --tile-width 256 --tile-height 256

Comparing the internal structure with tiffinfo showed a difference in the Photometric Interpretation.

$ tiffinfo small.tif   # 230MB
  Photometric Interpretation: YCbCr      # 4:2:0 chroma subsampling applies

$ tiffinfo large.tif   # 1.3GB
  Photometric Interpretation: RGB color  # stored as full-resolution RGB

When saved as YCbCr, 4:2:0 chroma subsampling applies: each of the two chroma channels is stored at half resolution in both directions, that is, with a quarter of the samples. When left as RGB, all three channels stay at full resolution.
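The sample-count arithmetic behind this can be sketched as follows. This is a minimal illustration rather than part of the original workflow: the function name samples_per_pixel is made up here, and it counts raw samples before JPEG entropy coding, so it indicates the direction of the effect rather than predicting file size.

```python
def samples_per_pixel(h_factor, v_factor):
    """Average stored samples per pixel for a 3-channel image.

    h_factor and v_factor are the chroma subsampling factors:
    (1, 1) means no subsampling (4:4:4), (2, 2) means 4:2:0.
    """
    luma = 1.0
    chroma_each = 1.0 / (h_factor * v_factor)
    return luma + 2 * chroma_each


if __name__ == "__main__":
    full = samples_per_pixel(1, 1)        # RGB, or YCbCr without subsampling
    subsampled = samples_per_pixel(2, 2)  # YCbCr 4:2:0
    print(f"full: {full}, 4:2:0: {subsampled}, ratio: {subsampled / full}")
```

With 4:2:0 the encoder handles half as many raw samples as with full-resolution channels.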

The branch on the Q value

The table below shows results from cropping an 8192×8192 sample and varying the Q value and tile size.

| Q | Tile size | Photometric | File size |
|---|---|---|---|
| 90 | 256 | RGB | 75.9 MB |
| 90 | 512 | RGB | 75.8 MB |
| 85 | 256 | YCbCr | 21.0 MB |
| 85 | 512 | YCbCr | 20.9 MB |

The jpegsave family in libvips has a subsample_mode parameter that defaults to auto. The behavior appears to be that chroma subsampling is disabled when Q is 90 or above, and YCbCr 4:2:0 subsampling is applied below that.

As far as could be checked, the tiffsave CLI has no option to override subsample_mode directly (confirmed with libvips 8.18), so setting Q to 89 or below became the practical approach.
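One way to keep the Q value from drifting back to 90 or above in a batch script is to build the command through a small guard. The sketch below is an assumption-laden illustration: the helper name tiffsave_command and the threshold constant are hypothetical, the threshold reflects the behavior observed in this article rather than a documented guarantee, and only flags already used in the command above appear.

```python
import shlex

# Observed in this article: Q >= 90 left the output as full-resolution RGB.
SUBSAMPLE_Q_THRESHOLD = 90


def tiffsave_command(src, dst, q=85, tile=512):
    """Return the argument list for vips tiffsave, refusing Q values
    that were observed to disable chroma subsampling."""
    if not 1 <= q < SUBSAMPLE_Q_THRESHOLD:
        raise ValueError(f"Q must be between 1 and {SUBSAMPLE_Q_THRESHOLD - 1}, got {q}")
    return [
        "vips", "tiffsave", src, dst,
        "--tile", "--pyramid", "--compression", "jpeg",
        "--Q", str(q),
        "--tile-width", str(tile), "--tile-height", str(tile),
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run() on a machine where vips is installed.
    print(shlex.join(tiffsave_command("input.jpg", "output.tif")))
```

Returning a list rather than a string keeps file names with spaces safe when the command is later handed to subprocess.run().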

The command adopted

vips tiffsave input.jpg output.tif --tile --pyramid --compression jpeg \
  --Q 85 --tile-width 512 --tile-height 512 --strip
  • --Q 85: keeping it below 90 causes YCbCr 4:2:0 to be selected
  • --tile-width 512 --tile-height 512: the tile count drops to a quarter, and the TIFF IFD (Image File Directory, the structure that serves as the file's internal table of contents) becomes smaller as well
  • --strip: removes the XMP history and ICC profile left over from Photoshop
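The effect of the tile size on the tile count can be estimated with a few lines of arithmetic. This is a rough sketch: the function names are hypothetical, and the pyramid is assumed to halve each level (rounding down) until the image fits in a single tile, which may not match how libvips rounds level sizes exactly. Each tile needs an offset and a byte-count entry in the TIFF directory, so fewer tiles means a smaller IFD.

```python
import math


def tiles_in_level(width, height, tile):
    """Number of tiles needed to cover one pyramid level."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def tiles_in_pyramid(width, height, tile):
    """Total tiles over all levels, assuming each level halves the
    previous one until the whole image fits in a single tile."""
    total = 0
    while True:
        total += tiles_in_level(width, height, tile)
        if width <= tile and height <= tile:
            return total
        width = max(1, width // 2)
        height = max(1, height // 2)


if __name__ == "__main__":
    for tile in (256, 512):
        print(tile,
              tiles_in_level(49000, 28000, tile),
              tiles_in_pyramid(49000, 28000, tile))
```

For the 49000×28000 px image in this article, the full-resolution level needs 21,120 tiles at 256 px and 5,280 tiles at 512 px.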

With this change, the three target files went from 4,232 MB to 1,158 MB in total (about a 3.7× reduction).

Lambda timeout

Even after shrinking the PTIF, rendering sometimes did not finish within the time limit, and responses like the following were returned.

$ curl '.../full/!1024,1024/0/default.jpg'
{"errorType":"Sandbox.Timedout",
 "errorMessage":"RequestId: xxx Error: Task timed out after 10.00 seconds"}

Sandbox.Timedout indicates that the Lambda timeout was reached. The function was deployed with a 10-second timeout, and an initial large render can exceed that value.

CloudFront received this response as 200 OK + application/octet-stream and kept serving it from cache.
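Because the error arrives with a 200 status, a smoke test has to look at the body rather than the status code. The following is a minimal sketch of such a check; the function name is hypothetical, and the only thing it relies on is the errorType field shown in the response above.

```python
import json


def is_lambda_timeout(body):
    """Return True if a response body is the Lambda timeout payload
    rather than image data. body is the raw response as bytes."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (for example JPEG bytes), so not the error payload.
        return False
    return isinstance(payload, dict) and payload.get("errorType") == "Sandbox.Timedout"


if __name__ == "__main__":
    sample = (b'{"errorType":"Sandbox.Timedout",'
              b'"errorMessage":"RequestId: xxx Error: Task timed out after 10.00 seconds"}')
    print(is_lambda_timeout(sample))
```

A check like this can run against freshly requested URLs before they are relied on, so that a cached error body is noticed early.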

The adjustment

When going through API Gateway, the integration timeout is capped at 29 seconds by default. A Lambda Function URL is bounded only by the function's own timeout, but keeping that path at around 30 seconds as well is easier to operate, so the change below was made.

aws lambda update-function-configuration \
  --function-name serverlessrepo-serverless-iiif-apne1-IiifFunction-xxx \
  --timeout 30

Memory was also raised to 10240 MB as a first move, but that turned out to be excessive, as described later.

S3 object metadata and dimension retrieval

Even after extending the timeout, rendering remained slow, and the log showed the following warning on every request.

WARN: Unable to get dimensions for <object-key> 
      using custom function. Falling back to sharp.metadata().

Looking at the Lambda Duration, it clustered around 5000 ms per request, and there was almost no change when memory was raised from 3008 MB to 10240 MB. Since processing time did not shrink even with more CPU allocation, the assumption was that some initialization step, rather than the image processing itself, was dominant, so the Lambda code was inspected.

dimensionRetriever inside the Lambda

Retrieving the Lambda function code as a zip and reading it showed that the dimensionRetriever function has a path that retrieves dimensions from S3 object metadata.

var dimensionRetriever = async (location) => {
  const s3 = new S3Client({});
  const cmd = new HeadObjectCommand(location);
  const response = await s3.send(cmd);
  const { Metadata } = response;
  if (Metadata?.width && Metadata?.height)
    return calculateDimensions(Metadata);
  return null;  // null here falls back to sharp.metadata()
};

It returns immediately if width and height are present in the S3 object metadata, and falls back to sharp.metadata() — re-reading the PTIF's IFD — when they are not.

The Image Metadata documentation for Samvera serverless-iiif also describes the metadata mechanism. The official repository includes a generation script, npm run create-metadata, and the design appears to assume its use in operation.

Setting the metadata

Dimensions were obtained with vips and applied with an in-place update via aws s3api copy-object.

SRC=ptif/example.tif
KEY=images/example.tif
BUCKET=my-iiif-bucket

W=$(vipsheader -f width   "$SRC")
H=$(vipsheader -f height  "$SRC")
P=$(vipsheader -f n-pages "$SRC")

aws s3api copy-object \
  --bucket "$BUCKET" --key "$KEY" \
  --copy-source "$BUCKET/$KEY" \
  --metadata-directive REPLACE \
  --metadata "width=$W,height=$H,pages=$P" \
  --content-type image/tiff
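After the copy, it is worth confirming that the object carries the keys that dimensionRetriever looks for. The sketch below mirrors the truthiness check from the JavaScript above; the helper names are hypothetical, and the dictionary is assumed to be the user metadata returned by a HeadObject call (for example the "Metadata" entry of a boto3 head_object response, where keys arrive in lowercase).

```python
def has_dimension_metadata(metadata):
    """Mirror the check in dimensionRetriever: both width and height
    must be present and non-empty."""
    return bool(metadata.get("width")) and bool(metadata.get("height"))


def metadata_argument(width, height, pages):
    """Build the value passed to --metadata in aws s3api copy-object."""
    return f"width={width},height={height},pages={pages}"


if __name__ == "__main__":
    print(metadata_argument(49000, 28000, 8))
    print(has_dimension_metadata({"width": "49000", "height": "28000", "pages": "8"}))
    print(has_dimension_metadata({"pages": "8"}))
```

The page count of 8 in the usage lines is a placeholder, not a measured value.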

The measured difference

The times below were measured with and without metadata, under the same Lambda settings (2048 MB of memory).

| Operation | Without metadata | With metadata | Reduction |
|---|---|---|---|
| info.json | 4.31 s | 0.05 s | −4.26 s |
| tile 256→256 | 8.69 s | 6.77 s | −1.92 s |
| region 2048→512 | 9.26 s | 6.49 s | −2.77 s |
| full→1024 | 9.71 s | 6.52 s | −3.19 s |

The reduction for info.json stands out. A reduction of 2–3 seconds was also seen on the image rendering side.
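The same figures can be expressed as relative savings, which makes the contrast between info.json and the rendering operations easier to see. This is a small sketch over the measured values above; the names MEASUREMENTS and reduction are introduced here for illustration.

```python
# (seconds without metadata, seconds with metadata), from the measurements above
MEASUREMENTS = {
    "info.json": (4.31, 0.05),
    "tile 256->256": (8.69, 6.77),
    "region 2048->512": (9.26, 6.49),
    "full->1024": (9.71, 6.52),
}


def reduction(without_s, with_s):
    """Return (seconds saved, percent saved) for one operation."""
    saved = round(without_s - with_s, 2)
    percent = round(100 * saved / without_s, 1)
    return saved, percent


if __name__ == "__main__":
    for name, (before, after) in MEASUREMENTS.items():
        saved, percent = reduction(before, after)
        print(f"{name}: -{saved:.2f} s ({percent}%)")
```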

Why the reduction on the rendering side is smaller

Without metadata, the PTIF's IFD is read twice — once by sharp.metadata() and once by the rendering itself. With metadata, dimension retrieval completes via the S3 HEAD response, leaving only the single IFD read that sharp performs when it opens the PTIF for rendering. Since the info.json response involves no rendering, having metadata lets it complete with an S3 HEAD, and the response time shrinks substantially.

Revisiting Lambda memory

Lambda memory settings scale with CPU allocation, so a compute-heavy task would be expected to run faster with more memory. In this case, working from the hypothesis that S3 data retrieval, rather than CPU, was dominant, memory was lowered and tested.

| Output | 10240 MB | 2048 MB |
|---|---|---|
| info.json | 0.04–0.27 s | 0.12–0.27 s |
| tile 256→256 | 5.64 s | 6.87 s |
| region 512→256 | 5.53 s | 5.22 s |
| region 2048→512 | 6.36 s | 6.31 s |
| region 4096→1024 | 6.17 s | 6.44 s |
| full→512 | 6.33 s | 6.36 s |

The user-visible curl times are nearly unchanged. Looking at the CloudWatch Duration metric, it was 1.3–3.6 seconds at 2048 MB and 5.5–6.3 seconds at 10240 MB, and there were cases where lowering the memory produced a shorter internal time. One possibility is that sharp's internal thread count increases with CPU allocation and causes contention (this was not verified).

Max Memory Used measured around 600 MB, so 2048 MB left ample headroom.
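Lambda compute is billed by memory multiplied by duration (GB-seconds), so the two settings can also be compared in those terms. The sketch below uses the CloudWatch Duration ranges quoted above; the function name is hypothetical and no price is assumed, since rates vary by region and architecture.

```python
def gb_seconds(memory_mb, duration_s):
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * duration_s


if __name__ == "__main__":
    # Upper bound observed at 2048 MB versus lower bound observed at 10240 MB.
    small = gb_seconds(2048, 3.6)
    large = gb_seconds(10240, 5.5)
    print(f"2048 MB: {small:.1f} GB-s, 10240 MB: {large:.1f} GB-s")
```

Even comparing the slowest 2048 MB invocation with the fastest 10240 MB one, the smaller setting consumes far fewer GB-seconds per request.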

Lambda concurrency limit and 429 responses

At a certain point, tiles began to be partially missing in the viewer, and inspecting the response showed headers like the following.

HTTP/2 429 
x-amzn-errortype: TooManyRequestsException
x-cache:          Error from cloudfront
content-type:     application/json

x-amzn-errortype: TooManyRequestsException is the header attached when the Lambda concurrency limit is reached.

Checking the configured value

$ aws lambda get-function-concurrency --function-name xxx
{
    "ReservedConcurrentExecutions": 50
}

ReservedConcurrentExecutions was 50, which was the direct cause. Given that a single request takes around 5–7 seconds, the theoretical throughput is only about 50 / 7 ≈ 7 to 50 / 5 = 10 req/s. For a viewer like Mirador, which requests many tiles in parallel the moment a map is opened, this limit is easy to hit.
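The throughput estimate follows from dividing the concurrency limit by the time each request occupies an execution slot. A minimal sketch, with a hypothetical function name and the 5–7 second range taken from the measurements in this article:

```python
def max_requests_per_second(concurrency, seconds_per_request):
    """Sustained request rate a concurrency limit can absorb when every
    request holds one execution slot for seconds_per_request."""
    return concurrency / seconds_per_request


if __name__ == "__main__":
    for limit in (50, 200):
        fast = max_requests_per_second(limit, 5)
        slow = max_requests_per_second(limit, 7)
        print(f"limit {limit}: {slow:.1f} to {fast:.1f} req/s")
```

This is a steady-state figure. A viewer that fires dozens of tile requests at once can exceed the limit momentarily even when its average rate is lower.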

Looking at recent throttle counts, throttling occurred on the order of hundreds to thousands of events during periods of concentrated use.

The change made

The account-wide concurrent quota is 1000, and no other function was consuming reserved concurrency, so it was raised to 200.

aws lambda put-function-concurrency \
  --function-name xxx \
  --reserved-concurrent-executions 200

After the change, throttling no longer occurred even when concurrent executions exceeded 100.

Extending the CloudFront cache TTL

Because the Lambda response carries no Cache-Control header, CloudFront was expiring the cache after the 24-hour default of the managed CachingOptimized policy. Since an IIIF tile response does not change in content as long as the identifier and coordinates are the same, extending the Time To Live (TTL) to one year was judged to cause no operational issue.

Creating a custom policy

{
  "Name": "iiif-long-cache",
  "DefaultTTL": 31536000,
  "MaxTTL": 31536000,
  "MinTTL": 86400,
  "ParametersInCacheKeyAndForwardedToOrigin": {
    "EnableAcceptEncodingGzip": true,
    "EnableAcceptEncodingBrotli": true,
    "HeadersConfig": {"HeaderBehavior": "none"},
    "CookiesConfig": {"CookieBehavior": "none"},
    "QueryStringsConfig": {"QueryStringBehavior": "none"}
  }
}

With the JSON above saved as policy.json, the policy is created as follows.

aws cloudfront create-cache-policy --cache-policy-config file://policy.json
# Set the returned ID as the Distribution's DefaultCacheBehavior.CachePolicyId
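The policy file can also be generated by a short script, which avoids typing the second counts by hand. This is a sketch under assumptions: the function name is hypothetical, and the ordering check is a local sanity check that may differ from the validation CloudFront itself performs. The field names are those of the JSON above.

```python
import json

ONE_DAY = 24 * 60 * 60
ONE_YEAR = 365 * ONE_DAY  # 31,536,000 seconds


def build_cache_policy(name, min_ttl, default_ttl, max_ttl):
    """Return a cache policy config with nothing added to the cache key."""
    if not 0 <= min_ttl <= default_ttl <= max_ttl:
        raise ValueError("expected 0 <= MinTTL <= DefaultTTL <= MaxTTL")
    return {
        "Name": name,
        "DefaultTTL": default_ttl,
        "MaxTTL": max_ttl,
        "MinTTL": min_ttl,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }


if __name__ == "__main__":
    policy = build_cache_policy("iiif-long-cache", ONE_DAY, ONE_YEAR, ONE_YEAR)
    print(json.dumps(policy, indent=2))  # redirect to policy.json
```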

Once generated, a tile is delivered from the CloudFront edge in around 50 ms. Since Lambda is not invoked, this is also favorable in terms of cost.

When a PTIF is replaced, an Invalidation for the corresponding identifier needs to be run.

aws cloudfront create-invalidation \
  --distribution-id XXX \
  --paths "/iiif/2/<encoded-identifier>*"
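When the identifier contains a slash, it appears percent-encoded in IIIF URLs, so the invalidation path has to be built from the encoded form. A minimal sketch follows; the function name and the sample identifier are hypothetical, and the /iiif/2/ prefix is the one used in the command above.

```python
from urllib.parse import quote


def invalidation_path(identifier, prefix="/iiif/2/"):
    """Build a wildcard invalidation path for every URL derived from
    one identifier. safe='' makes quote() encode '/' as %2F."""
    return f"{prefix}{quote(identifier, safe='')}*"


if __name__ == "__main__":
    print(invalidation_path("images/example"))
```

The trailing wildcard covers info.json as well as every region, size, and rotation variant generated for that identifier.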

The relationship between file size and rendering time

After all adjustments were in place, PTIFs of different sizes were measured in the state of metadata present, memory 2048 MB, and concurrency 200.

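| Image | File | Pixels | info.json | region 2048→512 |
|---|---|---|---|---|
| A (long scroll) | 151 MB | 126 Mpix | 0.43 s | 2.41 s |
| B (long scroll) | 182 MB | 224 Mpix | 0.14 s | 3.15 s |
| C (large-format) | 219 MB | 1,317 Mpix | 0.13 s | 2.64 s |
| D (large-format) | 342 MB | 1,208 Mpix | 0.14 s | 4.09 s |
| E (large-format) | 362 MB | 1,360 Mpix | 0.14 s | 4.68 s |
| F (large-format) | 454 MB | 1,416 Mpix | 0.15 s | 6.16 s |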

Within the observed range, rendering time appears to track file size more closely than pixel count. A has roughly a tenth of the pixels of C, yet its render time is only slightly shorter (2.41 s versus 2.64 s). Conversely, C and D have nearly the same pixel count, yet D, whose file is about 1.6 times larger, takes about 1.5 s longer. The trend is not strict: B is smaller than C in both file size and pixel count but measured slower (3.15 s versus 2.64 s).
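The comparison can be made a little more concrete by computing a correlation coefficient for each candidate explanation. The sketch below uses the six measurements from the table; the pearson helper is written out by hand to keep the example dependency-free. With only six data points the result is indicative at best.

```python
import math

# (file size in MB, pixels in Mpix, region 2048->512 time in s) for images A to F
ROWS = [
    (151, 126, 2.41),
    (182, 224, 3.15),
    (219, 1317, 2.64),
    (342, 1208, 4.09),
    (362, 1360, 4.68),
    (454, 1416, 6.16),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    sizes = [row[0] for row in ROWS]
    pixels = [row[1] for row in ROWS]
    times = [row[2] for row in ROWS]
    print(f"file size vs time: {pearson(sizes, times):.2f}")
    print(f"pixels vs time:    {pearson(pixels, times):.2f}")
```

On these figures the correlation with file size comes out clearly higher than the correlation with pixel count.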

In terms of perceived speed, the breakdown was roughly as follows.

| File size | Rendering time |
|---|---|
| under 200 MB | 2–3 s |
| 200–400 MB | 3–5 s |
| over 400 MB | 5–7 s |

The remaining rendering overhead

The table below estimates the breakdown of the 5–6 seconds for an initial render of a 450 MB-class PTIF.

| Component | Time |
|---|---|
| Lambda cold start (first time only) | 500 ms |
| S3 metadata retrieval (HEAD) | 50 ms |
| sharp opening the PTIF and parsing the IFD | about 5 s |
| Fetching the relevant tiles + JPEG output | about 500 ms |
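Summing the estimates above shows how much of the total the IFD step accounts for. This is simple arithmetic over the estimated figures, with "about 5 s" taken as 5000 ms; the names used are introduced here for illustration.

```python
# Estimated components of an initial render, in milliseconds
BREAKDOWN_MS = {
    "cold start": 500,  # first invocation only
    "S3 HEAD for metadata": 50,
    "open PTIF and parse IFD": 5000,
    "fetch tiles and encode JPEG": 500,
}


def total_ms(breakdown, include_cold_start=True):
    """Sum the components, optionally leaving out the cold start."""
    return sum(
        ms for name, ms in breakdown.items()
        if include_cold_start or name != "cold start"
    )


def share(breakdown, name):
    """Fraction of the cold-start-inclusive total spent in one component."""
    return breakdown[name] / total_ms(breakdown)


if __name__ == "__main__":
    print(total_ms(BREAKDOWN_MS), total_ms(BREAKDOWN_MS, include_cold_start=False))
    print(f"{share(BREAKDOWN_MS, 'open PTIF and parse IFD'):.0%}")
```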

"sharp opening the PTIF and parsing the IFD" scales with file size, and it appears difficult to cut significantly under a Lambda architecture. Since Lambda containers do not readily carry state across invocations, the same initialization is repeated for each request.

To reduce this part further, the following options can be considered.

| Approach | Mechanism |
|---|---|
| Run Cantaloupe persistently on ECS / EC2 | Open the PTIF into memory at startup, so subsequent requests cut tiles immediately |
| Lambda Provisioned Concurrency | Removes the cold-start portion (around 500 ms), but the 5-second IFD parse remains |
| Further enlarge the PTIF tile size (1024 or 2048) | There may be room for reduction by lowering the IFD entry count (not verified) |
| Convert to another format such as JPEG 2000 | The IFD structure differs, so the characteristics may change |

With the current CloudFront long-term cache, a second and later access to the same URL is delivered in around 50 ms, so the initial cost is acceptable depending on the usage pattern. If multiple users view the same image while the cached responses are still held at the edge, the initial cost is paid only once.

Summary checklist

As a result of this investigation, here are the items that seemed worth checking when delivering a PTIF via S3 and Lambda.

When creating a PTIF:

  • Set --Q to 89 or below in vips tiffsave
  • Use --tile-width 512 --tile-height 512 to keep the tile count down
  • Use --strip to remove metadata
  • Confirm Photometric Interpretation: YCbCr with tiffinfo

When uploading to S3:

  • Attach object metadata with --metadata "width=W,height=H,pages=N"
  • Specify --content-type image/tiff

Lambda settings:

  • Extend the timeout to around 30 seconds
  • 2048 MB of memory seemed sufficient (Max Memory Used measured around 600 MB)
  • Revisit Reserved Concurrent Executions to match the viewer's parallel requests (the value of 50 found in this deployment makes throttling likely under Mirador's burst requests)

CloudFront settings:

  • Set a longer Cache Policy TTL
  • Run an Invalidation when replacing a PTIF

Approximate performance to expect:

  • info.json: 50-300 ms
  • Tile / region rendering, first time: depends on file size; 5–7 s for the 450 MB class
  • Tile / region rendering, second and later (CloudFront cache Hit): 50-100 ms

References