Learn how to add a rate limit and a progress bar to Google Cloud Generative AI API calls: Google Cloud Tasks for rate limiting and retry in production

admin

May 26, 2023

You can now use Google Cloud's Generative AI models; at the time of writing, they are in public preview. One of the first things you will hit during development is the API rate limits. You can request a quota increase, but if you still hit the limits, you need a rate limiter. This is especially true if you use the pandas apply function, where you can easily run into nasty 429 ResourceExhausted errors.
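
Why does apply trigger 429 errors so readily? It calls your function once per row, back to back, with no pause in between. The toy sketch below makes that concrete with a hypothetical FakeQuotaAPI (a stand-in, not a Google Cloud class) that accepts 5 calls per 60-second window. A plain loop plays the role of DataFrame.apply so the example needs no third-party packages.

```python
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the 429 ResourceExhausted error."""


class FakeQuotaAPI:
    """Hypothetical API that accepts at most `quota` calls per `window` seconds."""

    def __init__(self, quota, window):
        self.quota = quota
        self.window = window
        self.window_start = time.monotonic()
        self.used = 0

    def predict(self, prompt):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A new quota window begins: reset the counter.
            self.window_start, self.used = now, 0
        if self.used >= self.quota:
            raise ResourceExhausted("429 Quota exceeded")
        self.used += 1
        return f"response to {prompt!r}"


api = FakeQuotaAPI(quota=5, window=60.0)
prompts = [f"prompt {i}" for i in range(10)]

# Like DataFrame.apply, this loop fires one call per row with no pause.
failures = 0
for p in prompts:
    try:
        api.predict(p)
    except ResourceExhausted:
        failures += 1
print(failures)  # 5: every call after the quota is used up fails
```

With a real DataFrame.apply there is no try/except around each row, so the first 429 aborts the whole apply and none of the results computed so far are returned.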

In this post, I show an example of how to apply a rate limit to LLM calls (and, in fact, to any API call). I use the ratelimit and backoff libraries to control the traffic.
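
To show what the rate-limiting half does, here is a minimal, dependency-free sketch of a blocking limiter decorator. It assumes a sliding window of "calls per period seconds"; the names rate_limited and call_model are hypothetical, and the ratelimit library may count calls differently internally. With ratelimit itself, the same behaviour is usually written as @sleep_and_retry stacked on top of @limits(calls=..., period=...).

```python
import functools
import threading
import time
from collections import deque


def rate_limited(calls, period):
    """Allow at most `calls` invocations in any sliding window of `period` seconds.

    A caller that would exceed the limit sleeps until the oldest recorded call
    leaves the window, then proceeds (it never raises).
    """
    def decorator(func):
        timestamps = deque()     # monotonic start times of recent calls
        lock = threading.Lock()  # waiting callers queue up behind this lock

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                while True:
                    now = time.monotonic()
                    # Forget calls that are at least `period` seconds old.
                    while timestamps and now - timestamps[0] >= period:
                        timestamps.popleft()
                    if len(timestamps) < calls:
                        timestamps.append(now)
                        break
                    # Window is full: wait until the oldest call expires.
                    time.sleep(period - (now - timestamps[0]))
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical model call; swap in the real Generative AI predict call.
@rate_limited(calls=3, period=0.2)
def call_model(prompt):
    return f"response to {prompt!r}"


start = time.monotonic()
answers = [call_model(f"prompt {i}") for i in range(7)]
elapsed = time.monotonic() - start
print(f"{len(answers)} calls took {elapsed:.2f}s")  # about 0.4 s: calls 4 and 7 each wait for a slot
```

Sleeping instead of raising suits a pandas apply loop: the caller simply slows down to the allowed QPS rather than aborting the whole column.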

The full Colab gist is above, and the Colab notebook itself is here.

If you follow the guide above, you will end up with the progress bar below, which can ease your anxiety while you wait for responses.
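
In a notebook, a library such as tqdm is the usual way to get a progress bar (its tqdm.pandas() hook adds a progress_apply method to DataFrames), but the idea fits in a few lines of standard-library Python. The sketch below uses a hypothetical helper, with_progress, that redraws a one-line text bar after each item has been processed; it assumes a sized collection such as a list, because it needs len() to compute the fraction done.

```python
import io
import sys


def with_progress(items, width=30, stream=sys.stderr):
    """Yield each item of a sized collection, redrawing a text bar after it is processed."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        yield item  # the caller processes the item before the generator resumes
        filled = width * done // total
        bar = "#" * filled + "-" * (width - filled)
        stream.write(f"\r[{bar}] {done}/{total}")  # \r overwrites the previous bar
        stream.flush()
    stream.write("\n")


# Capture the bar in a buffer so the demo output is easy to inspect.
buf = io.StringIO()
prompts = ["a", "b", "c", "d"]
responses = [p.upper() for p in with_progress(prompts, width=8, stream=buf)]
print(repr(buf.getvalue().split("\r")[-1]))  # '[########] 4/4\n'
```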

The rate limiter keeps us under the required QPS. If we receive a ResourceExhausted error from Google APIs, the code retries using exponential backoff, as shown below.
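
The retry half can also be sketched without dependencies. The decorator below, retry_with_backoff (a hypothetical name, not the backoff library's API), retries on a given exception type and doubles the wait after each failure. ResourceExhausted here is a local stand-in for google.api_core.exceptions.ResourceExhausted, and the sleep parameter is injectable so the demo can record the waits instead of actually sleeping.

```python
import functools
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted (HTTP 429)."""


def retry_with_backoff(exc_type, max_tries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Retry the wrapped call on exc_type, doubling the wait after each failure.

    Waits are base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay.
    No jitter is added here (the backoff library adds jitter by default).
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == max_tries - 1:
                        raise  # out of attempts: surface the last error
                    sleep(min(base_delay * 2 ** attempt, max_delay))

        return wrapper

    return decorator


delays = []  # record waits instead of sleeping, to keep the demo fast
attempts = {"n": 0}


@retry_with_backoff(ResourceExhausted, max_tries=5, sleep=delays.append)
def flaky_predict(prompt):
    """Hypothetical model call that is throttled three times, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] <= 3:
        raise ResourceExhausted("429 Quota exceeded")
    return f"response to {prompt!r}"


result = flaky_predict("hello")
print(result)  # response to 'hello'
print(delays)  # [1.0, 2.0, 4.0]
```

To combine both behaviours, one reasonable ordering is to put the retry decorator outside the rate limiter, so that every retry also waits for a free slot. With the libraries named in the post, that would mean @backoff.on_exception(backoff.expo, ResourceExhausted) above @sleep_and_retry and @limits(...); treat this as a suggestion rather than the Colab's exact code.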
