
[feature-request] Environment variables for Nginx client_body_buffer_size and subrequest_output_buffer_size for prebuilt TensorFlow #5089

@Kaylee-Govender

Description


Concise Description:
Use case: deploying models with the prebuilt TensorFlow inference images. These models process large payloads, as is expected for a SageMaker asynchronous endpoint.

Issue: When you pull the prebuilt container of choice (example: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu) and run inference locally with a payload larger than 100 MB, you see the error below:
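
To reproduce, a dummy JSON body of roughly that size can be generated first. This snippet is illustrative only (the file name matches the curl command below; real model input will differ, and presumably only the size matters for triggering the nginx error):

# generate a ~190 MB dummy JSON body
python3 -c 'import json, sys; json.dump({"instances": [[0.0] * 128] * 300000}, sys.stdout)' > /tmp/large_payload.json
ls -lh /tmp/large_payload.json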

Input:

 time curl -v -X POST http://localhost:8080/invocations \
    -H "Content-Type: application/json" \
    -d @/tmp/large_payload.json

Output:

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /invocations HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 177237594
> Expect: 100-continue
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: nginx/1.26.3
< Date: Thu, 24 Jul 2025 09:52:00 GMT
< Content-Type: text/html
< Content-Length: 177
< Connection: close
< 
<html>
<head><title>500 Internal Server Error</title></head>
<body>
<center><h1>500 Internal Server Error</h1></center>
<hr><center>nginx/1.26.3</center>
</body>
</html>
* Closing connection 0

real    0m0.409s
user    0m0.169s
sys     0m0.153s

Workaround: extend the prebuilt image with a modified nginx.conf.template:

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu

# Copy the customized nginx configuration template
COPY nginx.conf.template sagemaker/nginx.conf.template
  • the nginx.conf.template increases the parameters client_body_buffer_size and subrequest_output_buffer_size from the original 100m to a larger size (e.g. 200m, as my payload was 177 MB), as seen below, and this resolved the error:
load_module modules/ngx_http_js_module.so;

worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log  /dev/stderr %NGINX_LOG_LEVEL%;

worker_rlimit_nofile 4096;

events {
  worker_connections 2048;
}

http {
  include /etc/nginx/mime.types;
  default_type application/json;
  access_log /dev/stdout combined;
  js_import tensorflowServing.js;

  proxy_read_timeout %PROXY_READ_TIMEOUT%;  

  upstream tfs_upstream {
    %TFS_UPSTREAM%;
  }

  upstream gunicorn_upstream {
    server unix:/tmp/gunicorn.sock fail_timeout=1;
  }

  server {
    listen %NGINX_HTTP_PORT% deferred;
    client_max_body_size 0;
    client_body_buffer_size 200m;         # originally 100m
    subrequest_output_buffer_size 200m;   # originally 100m

    set $tfs_version %TFS_VERSION%;
    set $default_tfs_model %TFS_DEFAULT_MODEL_NAME%;

    location /tfs {
        rewrite ^/tfs/(.*) /$1  break;
        proxy_redirect off;
        proxy_pass_request_headers off;
        proxy_set_header Content-Type 'application/json';
        proxy_set_header Accept 'application/json';
        proxy_pass http://tfs_upstream;
    }

    location /ping {
        %FORWARD_PING_REQUESTS%;
    }

    location /invocations {
        %FORWARD_INVOCATION_REQUESTS%;
    }

    location /models {
        proxy_pass http://gunicorn_upstream/models;
    }

    location / {
        return 404 '{"error": "Not Found"}';
    }

    keepalive_timeout 3;
  }
}
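
A custom image like this might be built and smoke-tested locally along the following lines (the image tag is a placeholder; port 8080, the /opt/ml/model mount, and the serve command follow the usual SageMaker hosting conventions, so adjust to your setup):

docker build -t tf-inference-custom-buffers .
docker run --rm -p 8080:8080 \
    -v "$PWD/model:/opt/ml/model" \
    tf-inference-custom-buffers serve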

DLC image/dockerfile:
example: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu

Describe the solution you'd like
Introduce environment variables to change client_body_buffer_size and subrequest_output_buffer_size, instead of having to build a custom container to change this limit.
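
From a user's perspective, the end state might look like the sketch below. The variable names are only suggestions (anything like SAGEMAKER_NGINX_* is hypothetical, not an existing container option), with 100m remaining the default when the variables are unset:

# hypothetical environment variables for the proposed feature
docker run --rm -p 8080:8080 \
    -e SAGEMAKER_NGINX_CLIENT_BODY_BUFFER_SIZE=200m \
    -e SAGEMAKER_NGINX_SUBREQUEST_OUTPUT_BUFFER_SIZE=200m \
    763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.18-cpu serve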
