Concise Description:
Use case: deploy models using prebuilt TensorFlow inference images. These models process large payloads, as expected for a SageMaker asynchronous endpoint.
Issue: When you pull the prebuilt container of choice (example: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu) and run inference locally with a payload larger than 100 MB, you see the error below:
Input:
time curl -v -X POST http://localhost:8080/invocations \
-H "Content-Type: application/json" \
-d @/tmp/large_payload.json
Output:
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /invocations HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 177237594
> Expect: 100-continue
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: nginx/1.26.3
< Date: Thu, 24 Jul 2025 09:52:00 GMT
< Content-Type: text/html
< Content-Length: 177
< Connection: close
<
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx/1.26.3</center>
</body>
</html>
* Closing connection 0
real 0m0.409s
user 0m0.169s
sys 0m0.153s
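For reference, a payload of this size can be generated with a small script like the one below. This is only a sketch for reproducing the issue: the JSON shape is arbitrary and /tmp/large_payload.json is simply the path used in the curl command above; any body larger than 100 MB triggers the error.
import json

# Write a roughly 200 MB JSON payload to the path used by the curl command above.
# The structure is arbitrary; only the total request size matters for reproduction.
row = [0.12345678] * 100
payload = {"instances": [row] * 180_000}
with open("/tmp/large_payload.json", "w") as f:
    json.dump(payload, f)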
Currently the parameters client_body_buffer_size and subrequest_output_buffer_size are set to 100m in nginx.conf.template, limiting payloads to 100 MB and producing the above error.
There are no environment variables to change this:
https://github.com/aws/deep-learning-containers/blob/master/tensorflow/inference/docker/build_artifacts/sagemaker/serve.py#L292-L303
To work around this I had to create a custom container; below is an example:
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu
# Copy custom nginx config template over the default one
COPY nginx.conf.template sagemaker/nginx.conf.template
The custom nginx.conf.template increases the parameters client_body_buffer_size and subrequest_output_buffer_size from the original 100m to a larger size (e.g. 200m, as my payload was 177 MB), as seen below, and this resolved the error:
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log /dev/stderr %NGINX_LOG_LEVEL%;

worker_rlimit_nofile 4096;

events {
    worker_connections 2048;
}

http {
    include /etc/nginx/mime.types;
    default_type application/json;
    access_log /dev/stdout combined;
    js_import tensorflowServing.js;

    proxy_read_timeout %PROXY_READ_TIMEOUT%;

    upstream tfs_upstream {
        %TFS_UPSTREAM%;
    }

    upstream gunicorn_upstream {
        server unix:/tmp/gunicorn.sock fail_timeout=1;
    }

    server {
        listen %NGINX_HTTP_PORT% deferred;
        client_max_body_size 0;
        client_body_buffer_size 200m;       # originally 100m
        subrequest_output_buffer_size 200m; # originally 100m

        set $tfs_version %TFS_VERSION%;
        set $default_tfs_model %TFS_DEFAULT_MODEL_NAME%;

        location /tfs {
            rewrite ^/tfs/(.*) /$1 break;
            proxy_redirect off;
            proxy_pass_request_headers off;
            proxy_set_header Content-Type 'application/json';
            proxy_set_header Accept 'application/json';
            proxy_pass http://tfs_upstream;
        }

        location /ping {
            %FORWARD_PING_REQUESTS%;
        }

        location /invocations {
            %FORWARD_INVOCATION_REQUESTS%;
        }

        location /models {
            proxy_pass http://gunicorn_upstream/models;
        }

        location / {
            return 404 '{"error": "Not Found"}';
        }

        keepalive_timeout 3;
    }
}
DLC image/dockerfile:
example: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu
Describe the solution you'd like
Introduce environment variables to change client_body_buffer_size and subrequest_output_buffer_size instead of requiring a custom container to change this limit.
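As a rough sketch of what this could look like: nginx.conf.template could expose two new placeholders (e.g. %CLIENT_BODY_BUFFER_SIZE% and %SUBREQUEST_OUTPUT_BUFFER_SIZE%), and serve.py could fill them from environment variables, defaulting to the current 100m. The variable names below (SAGEMAKER_NGINX_CLIENT_BODY_BUFFER_SIZE, SAGEMAKER_NGINX_SUBREQUEST_OUTPUT_BUFFER_SIZE) and the substitution code are illustrative only, not part of the current containers:
import os

def _nginx_buffer_sizes():
    # Hypothetical env vars; names are illustrative, defaults match today's nginx.conf.template.
    client_body = os.environ.get("SAGEMAKER_NGINX_CLIENT_BODY_BUFFER_SIZE", "100m")
    subrequest_output = os.environ.get("SAGEMAKER_NGINX_SUBREQUEST_OUTPUT_BUFFER_SIZE", "100m")
    return client_body, subrequest_output

def render_nginx_config(template_path, output_path):
    # Same placeholder-substitution pattern the template already uses for
    # %NGINX_HTTP_PORT%, %TFS_UPSTREAM%, etc.
    client_body, subrequest_output = _nginx_buffer_sizes()
    with open(template_path) as f:
        config = f.read()
    config = config.replace("%CLIENT_BODY_BUFFER_SIZE%", client_body)
    config = config.replace("%SUBREQUEST_OUTPUT_BUFFER_SIZE%", subrequest_output)
    with open(output_path, "w") as f:
        f.write(config)
With something along these lines, an endpoint could be configured for large payloads by setting the environment variables on the model or endpoint configuration, with no custom image required.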