scholar_flux.api.rate_limiting package
Submodules
scholar_flux.api.rate_limiting.rate_limiter module
The scholar_flux.api.rate_limiting.rate_limiter module implements a simple, general RateLimiter.
ScholarFlux uses and builds upon this RateLimiter implementation to ensure that the number of requests to an API provider does not exceed the limit within the specified time interval.
- class scholar_flux.api.rate_limiting.rate_limiter.RateLimiter(min_interval: int | float | None = None)[source]
Bases:
object
A basic rate limiter used to ensure that function calls (such as API requests) do not exceed a specified rate.
The RateLimiter is used within ScholarFlux to throttle the total number of requests that can be made within a defined time interval (measured in seconds).
This class ensures that calls to RateLimiter.wait() (or any decorated function) are spaced by at least min_interval seconds.
For multithreading applications, the RateLimiter is not thread-safe. Instead, the ThreadedRateLimiter subclass can provide a thread-safe implementation when required.
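The spacing behavior described above can be illustrated with a minimal, self-contained sketch. This is not the ScholarFlux implementation; the class name and attributes below are illustrative assumptions, showing only the core idea of spacing calls by at least a minimum interval:

```python
import time
from typing import Optional


class MinimalRateLimiter:
    """Illustrative sketch: space successive calls by at least `min_interval` seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)  # block until the interval has elapsed
        self._last_call = time.monotonic()  # record the time of this call


limiter = MinimalRateLimiter(min_interval=0.2)
start = time.monotonic()
limiter.wait()  # first call: no prior call recorded, returns immediately
limiter.wait()  # second call: sleeps the remaining portion of the interval
elapsed = time.monotonic() - start
```

The real RateLimiter adds context-manager and decorator support on top of this basic check-and-sleep pattern.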
- Parameters:
min_interval (Optional[float | int]) – The minimum number of seconds that must elapse before another request is sent or a call is performed. If min_interval is not specified, the class attribute RateLimiter.DEFAULT_MIN_INTERVAL is assigned to RateLimiter.min_interval instead.
Examples
>>> import requests
>>> from scholar_flux.api import RateLimiter
>>> rate_limiter = RateLimiter(min_interval = 5)
>>> # The first call won't sleep, because a prior call using the rate limiter doesn't yet exist
>>> with rate_limiter:
...     response = requests.get("http://httpbin.org/get")
>>> # Will sleep if 5 seconds since the last call haven't elapsed
>>> with rate_limiter:
...     response = requests.get("http://httpbin.org/get")
>>> # Or simply call the `wait` method directly:
>>> rate_limiter.wait()
>>> response = requests.get("http://httpbin.org/get")
Note
The class-level history deque is a design choice: this attribute allows class-level monitoring and introspection into how request delays are computed. The HistoryDeque is thread-safe (it relies on CPython's atomic deque operations) and allows global observability, which is helpful for debugging, especially in cases where you need to adjust the total number of requests sent within a given interval to avoid 429 errors.
- DEFAULT_MIN_INTERVAL: float | int = 6.1
- __init__(min_interval: int | float | None = None)[source]
Initializes the rate limiter with the min_interval argument.
- Parameters:
min_interval (Optional[float | int]) – Minimum number of seconds to wait before the next call is performed or request sent.
- default_min_interval() float | int[source]
Returns the default minimum interval for the current rate limiter.
- history: HistoryDeque[RateLimitEvent] = HistoryDeque([])
- property min_interval: float | int
The minimum number of seconds that must elapse before another request is sent or action is taken.
- rate(min_interval: float | int, metadata: Dict[str, Any] | None = None) Iterator[Self][source]
Temporarily adjusts the minimum interval between function calls or requests when used with a context manager.
After the context manager exits, the minimum interval is restored to its previous value, and the time of the last call is recorded.
- Parameters:
min_interval (float | int) – Indicates the minimum interval to be temporarily used during the call
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Yields:
RateLimiter – The original rate limiter with a temporarily changed minimum interval
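The restore-on-exit behavior of rate() can be sketched with a plain contextlib-based context manager. This is a hedged sketch, not the ScholarFlux implementation; SketchLimiter and its attributes are illustrative:

```python
import contextlib
import time
from typing import Iterator, Optional


class SketchLimiter:
    """Illustrative stand-in for a rate limiter with a temporarily adjustable interval."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self.last_call: Optional[float] = None

    @contextlib.contextmanager
    def rate(self, min_interval: float) -> Iterator["SketchLimiter"]:
        """Temporarily swap in a new minimum interval, restoring the old one on exit."""
        previous = self.min_interval
        self.min_interval = min_interval
        try:
            yield self
        finally:
            self.min_interval = previous       # restore the original interval
            self.last_call = time.monotonic()  # record the time of the last call


limiter = SketchLimiter(min_interval=6.1)
with limiter.rate(1.0) as fast:
    interval_inside = fast.min_interval  # temporarily 1.0 inside the context
interval_after = limiter.min_interval    # restored to 6.1 after exit
```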
- classmethod resize_history(maxlen: int) None[source]
Resize the global history deque, preserving existing records up to the new limit.
- sleep(interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
A simple instance-level implementation of sleep that can be overridden when needed.
- Parameters:
interval (Optional[float | int]) – The time interval to sleep, in seconds. If None, the default minimum interval for the current rate limiter is used.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Raises:
APIParameterException – If the value provided is not an integer/float or is less than 0.
- wait(min_interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Block (time.sleep) until at least min_interval has passed since last call.
This method can be used with the min_interval attribute to determine when a search was last sent and throttle requests to make sure rate limits aren't exceeded. If not enough time has passed, the rate limiter will wait before the next request is sent.
- Parameters:
min_interval (Optional[float | int]) – The minimum time to wait before another call is sent. If both this argument and the min_interval attribute are None, the default min_interval value is used.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Raises:
APIParameterException – If the value provided is not an integer/float or is less than 0.
- wait_since(min_interval: int | float | None = None, timestamp: float | int | datetime | None = None, metadata: Dict[str, Any] | None = None) None[source]
Wait based on a reference timestamp or datetime.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
timestamp (Optional[float | int | datetime]) – Reference time such as a Unix timestamp or datetime. If None, sleeps for min_interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
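The core arithmetic behind wait_since — computing how much of the interval is still outstanding relative to a reference time — can be sketched independently. This is an illustrative helper, not the ScholarFlux implementation; the function name is an assumption:

```python
import time
from datetime import datetime
from typing import Union


def remaining_wait(min_interval: float, timestamp: Union[float, int, datetime, None]) -> float:
    """Sketch: seconds still to wait so that `min_interval` elapses after `timestamp`."""
    if timestamp is None:
        return min_interval  # no reference point: wait the full interval
    if isinstance(timestamp, datetime):
        timestamp = timestamp.timestamp()  # normalize datetimes to Unix time
    elapsed = time.time() - timestamp
    return max(0.0, min_interval - elapsed)  # never return a negative wait


# A call made 2 seconds ago with a 5-second interval leaves roughly 3 seconds to wait
delay = remaining_wait(5.0, time.time() - 2.0)
```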
scholar_flux.api.rate_limiting.retry_handler module
The scholar_flux.api.rate_limiting.retry_handler module implements a basic RetryHandler for dynamic request throttling.
This implementation defines a variable period of time to wait between successive unsuccessful requests to the same provider.
This class is used by default within the SearchCoordinator class to verify and retry each request until it succeeds or the maximum retry limit has been reached.
- class scholar_flux.api.rate_limiting.retry_handler.RetryHandler(max_retries: int = 3, backoff_factor: float = 0.5, max_backoff: int | float = 120, retry_statuses: set[int] | list[int] | None = None, raise_on_error: bool | None = None, min_retry_delay: int | float | None = None)[source]
Bases:
object
Core class used to send and dynamically retry failed requests with exponential backoff.
The RetryHandler automatically handles HTTP errors (429, 500, 501, 502, 503, 504) by retrying failed requests with increasing delays between attempts. Additional status codes can be added to the retry set via RetryHandler.DEFAULT_RETRY_STATUSES.add(<status_code>).
- Features:
Exponential backoff with configurable parameters
Respects Retry-After headers when provided
Thread-safe history tracking of all retry attempts
Configurable maximum retries and timeout limits
Example
>>> from scholar_flux import SearchCoordinator
>>> coordinator = SearchCoordinator(query="nutrition", provider_name="plos")
>>> # Configure retry behavior
>>> coordinator.retry_handler.max_retries = 5
>>> coordinator.retry_handler.backoff_factor = 1.0
>>> # The history is stored at the class level
>>> coordinator.retry_handler.history.clear_history()
>>> # Execute search with automatic retries
>>> result = coordinator.search_page(page=1)
>>> # Access retry statistics
>>> print(f"Retry attempts: {len(coordinator.retry_handler.history)}")
- max_retries
Maximum number of retry attempts (default: 3)
- Type:
int
- backoff_factor
The multiplier used for exponential backoff (default: 0.5)
- Type:
float
- max_backoff
Maximum delay between retries in seconds (default: 120). Also enforced as a hard ceiling for server-requested delays via Retry-After headers.
- Type:
float
- retry_statuses
HTTP status codes that trigger retries (default: {429, 500, 501, 502, 503, 504})
- Type:
set
- history
Thread-safe storage of all retry attempts
- Type:
HistoryDeque
Note
The retry handler is automatically used by SearchCoordinator for all requests. Each parameter is adjusted dynamically based on the provider. No manual intervention is required for basic usage.
If too many requests are sent to a single server within a specific time interval, it may return a 429 Too Many Requests error and indicate the delay that should be respected before sending another request. If the class attribute RAISE_ON_DELAY_EXCEEDED is True (the default), a RetryAfterDelayExceededException is raised when that delay exceeds max_backoff. To turn this feature off, either adjust the max_backoff parameter directly or set RetryHandler.RAISE_ON_DELAY_EXCEEDED = False to wait the full interval upon receiving a Retry-After header.
For observability, request information, delays, and response statuses are recorded in the RetryHandler.history class attribute for later inspection and can be referenced to help modify the rate limiting configuration when needed.
- DEFAULT_RAISE_ON_ERROR = False
- DEFAULT_RETRY_AFTER_HEADERS = ('retry-after', 'x-ratelimit-retry-after')
- DEFAULT_RETRY_STATUSES = {429, 500, 501, 502, 503, 504}
- DEFAULT_VALID_STATUSES = {200}
- RAISE_ON_DELAY_EXCEEDED: bool = True
- __init__(max_retries: int = 3, backoff_factor: float = 0.5, max_backoff: int | float = 120, retry_statuses: set[int] | list[int] | None = None, raise_on_error: bool | None = None, min_retry_delay: int | float | None = None) None[source]
Initializes the RetryHandler with configurable parameters for dynamically throttling successive requests.
- Parameters:
max_retries (int) – Indicates how many attempts should be performed before halting retries at retrieving a valid response.
backoff_factor (float) – Indicates the factor used to adjust when the next request should be attempted based on past unsuccessful attempts.
max_backoff (int | float) – Describes the maximum number of seconds to wait before submitting the next request.
retry_statuses (Optional[set[int]]) – Indicates the full list of status codes that should be retried if encountered.
raise_on_error (Optional[bool]) – A flag that indicates whether or not to raise an error upon encountering an invalid status_code or exception.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
Note
The class-level history deque is a design choice. While RetryHandler instances are designed to be stateless, this attribute enables class-level monitoring and allows introspection into how request delays are computed. The HistoryDeque is thread-safe (it relies on CPython's atomic deque operations) and allows global observability, which is helpful for debugging, especially in cases where you need to adjust the total number of requests sent within a given interval to avoid 429 errors.
- calculate_retry_delay(attempt_count: int, response: Response | ResponseProtocol | None = None, min_retry_delay: int | float | None = None, backoff_factor: int | float | None = None, max_backoff: int | float | None = None) int | float[source]
Calculates the delay in seconds to wait before the next retry attempt.
- Parameters:
attempt_count (int) – The number of attempts made so far.
response (Optional[requests.Response | ResponseProtocol]) – The response object from the last attempt.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
backoff_factor (Optional[int | float]) – The factor used to adjust the delay.
max_backoff (Optional[int | float]) – The maximum delay in seconds between requests.
- Returns:
The delay in seconds for the next retry attempt.
- Return type:
int | float
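The exact formula used by calculate_retry_delay is not reproduced in this page, but a conventional exponential-backoff calculation using the documented parameters (backoff_factor, max_backoff, min_retry_delay, and a server-provided Retry-After value) might look like the following sketch. The function name and formula are illustrative assumptions, not the library's implementation:

```python
from typing import Optional


def backoff_delay(attempt_count: int,
                  backoff_factor: float = 0.5,
                  max_backoff: float = 120.0,
                  min_retry_delay: float = 0.0,
                  retry_after: Optional[float] = None) -> float:
    """Sketch of a conventional exponential-backoff delay calculation."""
    if retry_after is not None:
        # A server-provided Retry-After takes precedence over the computed backoff
        return retry_after
    delay = backoff_factor * (2 ** attempt_count)  # grow the delay exponentially per attempt
    # Clamp between the configured floor and ceiling
    return min(max_backoff, max(min_retry_delay, delay))


delays = [backoff_delay(n) for n in range(5)]  # 0.5, 1.0, 2.0, 4.0, 8.0
```

With the defaults above, the delay doubles each attempt until it hits the max_backoff ceiling.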
- delay_exceeds_max_backoff(delay: int | float | None, max_backoff: int | float | None = None, *, error_message: str | None = None, warning_message: str | None = None, response: Response | ResponseProtocol | None = None, verbose: bool = True) bool[source]
Helper method for identifying and handling scenarios where an API-requested delay exceeds max_backoff.
This method centralizes the logic for identifying and handling delays that exceed the user-defined maximum duration to wait between requests. The RetryHandler is structured to cap calculated wait times using the max_backoff attribute, but Retry-After fields are the one scenario where API-mandated request delays can exceed max_backoff.
This helper method is designed to:
Raise an exception when delay > max_backoff and RetryHandler.RAISE_ON_DELAY_EXCEEDED is True
Log a warning message and return True (indicating an excessive delay) when delay > max_backoff and RAISE_ON_DELAY_EXCEEDED is False
Return False when delay is None or delay <= max_backoff
- Parameters:
delay (Optional[int | float]) – The delay in seconds to verify against the max_backoff.
max_backoff (Optional[int |float]) – The maximum allowable delay. Defaults to self.max_backoff if not provided.
error_message (Optional[str]) – A custom message to provide to the RetryAfterDelayExceededException. If None, this method raises the default error message indicating the server-requested delay.
warning_message (Optional[str]) – A custom message logged when RAISE_ON_DELAY_EXCEEDED is False. If None, this method logs a default warning to indicate that the RetryHandler will otherwise wait the full duration.
response (Optional[requests.Response | ResponseProtocol]) – The response object to add as additional context to the raised exception.
verbose (bool) – A flag for logging a default/custom warning when the delay exceeds the maximum duration and the RAISE_ON_DELAY_EXCEEDED flag is False.
- Returns:
True when the delay exceeds the maximum allowable delay and False otherwise.
- Return type:
bool
- Raises:
RetryAfterDelayExceededException – When the API-requested delay exceeds the max_backoff and the RAISE_ON_DELAY_EXCEEDED flag is True.
- execute_with_retry(request_func: Callable[[...], ResponseLike], validator_func: Callable | None = None, sleep_func: Callable[[float], None] | None = None, *args: Any, backoff_factor: int | float | None = None, max_backoff: int | float | None = None, min_retry_delay: int | float | None = None, **kwargs: Any) ResponseLike | None[source]
Sends a request and retries on failure based on predefined criteria and validation function.
- Parameters:
request_func (Callable) – The function to send the request.
validator_func (Optional[Callable]) – A function that takes a response and returns True if valid.
sleep_func (Optional[Callable[[float], None]]) – An optional function used for blocking the next request until a specified duration has passed.
*args – Positional arguments for the request function.
backoff_factor (Optional[int | float]) – Indicates the factor used to adjust when the next request should be attempted based on past unsuccessful attempts.
max_backoff (Optional[int | float]) – Describes the maximum number of seconds to wait before submitting the next request.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
**kwargs – Arbitrary keyword arguments for the request function.
- Returns:
The returned response-like object, when successful, or None if no valid response was obtained.
- Return type:
Optional[requests.Response | ResponseProtocol]
- Raises:
RequestFailedException – When a request raises an exception for any reason.
TimeoutError – When a request times out during response retrieval.
InvalidResponseException – When the number of retries has been exceeded and self.raise_on_error is True.
RetryAfterDelayExceededException – When the Retry-After delay requested from the server exceeds max_backoff
Note
If a Retry-After header exceeds the max_backoff and RetryHandler.RAISE_ON_DELAY_EXCEEDED=True, the exception will be raised immediately and halt the series of retry attempts.
Also note that response objects can be recovered from handled InvalidResponseException or RetryAfterDelayExceededException instances: to extract the raw response, handle the exception with a try/except block and read its response attribute:
Example
>>> from scholar_flux.api.rate_limiting.retry_handler import RetryHandler
>>> from scholar_flux.exceptions import InvalidResponseException, RetryAfterDelayExceededException
>>> import requests
>>> retry_handler = RetryHandler(raise_on_error=True)
>>> try:
...     response = retry_handler.execute_with_retry(requests.get, url="https://httpbin.org/status/200")
... except (RetryAfterDelayExceededException, InvalidResponseException) as e:
...     response = e.response
>>> print(response)
- classmethod extract_retry_after(headers: Mapping[str, Any] | None, keys: tuple | None = None) str | None[source]
Extracts the retry-after field from dictionary headers if the field exists.
- Parameters:
headers (Optional[Mapping[str, Any]]) – A headers dictionary or mapping to extract the retry-after field from
keys (Optional[tuple]) – The keys to look for in the headers. (case insensitive)
- Returns:
The retry-after field value, or None if not present.
- Return type:
Optional[str]
- classmethod extract_retry_after_from_response(response: Response | ResponseProtocol | None) str | None[source]
Extracts and parses retry-after delay from any response type.
This method handles both raw responses (Response/ResponseProtocol) and processed responses (ProcessedResponse/ErrorResponse), making it the single entry point for retry-after extraction.
- Parameters:
response (Optional[requests.Response | ResponseProtocol]) – Any response object with headers
- Returns:
The raw retry-after header value, or None if not present
- Return type:
Optional[str]
- classmethod get_retry_after(response: Response | ResponseProtocol | None) int | float | None[source]
Calculates the time that must elapse before the next request is sent according to the headers.
- Parameters:
response (requests.Response | ResponseProtocol) – The response object from the last attempt.
- Returns:
Indicates the number of seconds that must elapse before the next request is sent.
- Return type:
Optional[float]
- history: HistoryDeque[RetryAttempt] = HistoryDeque([])
- log_retry_attempt(delay: float, status_code: int | None = None) None[source]
Log an attempt to retry a request.
- Parameters:
delay (float) – The delay in seconds before the next retry attempt.
status_code (Optional[int]) – The status code of the response that triggered the retry.
- static log_retry_warning(message: str) None[source]
Log a warning when retries are exhausted or an error occurs.
- Parameters:
message (str) – The warning message to log.
- classmethod parse_retry_after(retry_after: str | None) int | float | None[source]
Parse the ‘Retry-After’ header to calculate delay.
- Parameters:
retry_after (str) – The value of ‘Retry-After’ header.
- Returns:
The total delay in seconds parsed from the response field if available.
- Return type:
Optional[int | float]
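Per RFC 7231, a Retry-After header carries either delta-seconds (e.g. "120") or an HTTP-date. The library's parsing logic is not shown on this page, but a standard-library sketch of the idea follows; the function name and the use of email.utils are illustrative assumptions:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after_value(retry_after: Optional[str]) -> Optional[float]:
    """Sketch: interpret Retry-After as delta-seconds or an HTTP-date."""
    if retry_after is None:
        return None
    try:
        return float(retry_after)  # delta-seconds form, e.g. "120"
    except ValueError:
        pass
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2025 07:28:00 GMT":
        # the delay is the time remaining until that date
        target = parsedate_to_datetime(retry_after)
        delta = (target - datetime.now(timezone.utc)).total_seconds()
        return max(0.0, delta)  # a date in the past means no further wait
    except (TypeError, ValueError):
        return None


seconds = parse_retry_after_value("120")
```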
- classmethod resize_history(maxlen: int) None[source]
Resize the global history deque, preserving existing records up to the new limit.
- Parameters:
maxlen (int) – The new maximum length of the history deque.
- should_retry(response: Response | ResponseProtocol) bool[source]
Determine whether the request should be retried.
scholar_flux.api.rate_limiting.threaded_rate_limiter module
The scholar_flux.api.rate_limiting.threaded_rate_limiter module implements ThreadedRateLimiter for thread safety.
The ThreadedRateLimiter extends the basic functionality of the original RateLimiter class and can be used in multithreaded scenarios to ensure that provider rate limits are not exceeded within a constant time interval.
This implementation provides thread-safe access to rate limiting functionality through the use of reentrant locks, making it suitable for use in concurrent environments where multiple threads may access the same rate limiter instance.
- class scholar_flux.api.rate_limiting.threaded_rate_limiter.ThreadedRateLimiter(min_interval: int | float | None = None)[source]
Bases:
RateLimiter
Thread-safe version of RateLimiter that can be safely used across multiple threads.
Inherits all functionality from RateLimiter but adds thread synchronization to prevent race conditions when multiple threads access the same limiter instance.
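The synchronization pattern can be sketched with a reentrant lock guarding the check-and-sleep step so it executes atomically per thread. This is an illustrative sketch, not the ScholarFlux implementation; the class name is an assumption:

```python
import threading
import time
from typing import Optional


class ThreadSafeSketchLimiter:
    """Illustrative sketch: serialize interval checks with a reentrant lock."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last_call: Optional[float] = None
        self._lock = threading.RLock()  # reentrant, so nested calls don't deadlock

    def wait(self) -> None:
        with self._lock:  # the check-and-sleep runs atomically per thread
            now = time.monotonic()
            if self._last_call is not None:
                remaining = self.min_interval - (now - self._last_call)
                if remaining > 0:
                    time.sleep(remaining)
            self._last_call = time.monotonic()


limiter = ThreadSafeSketchLimiter(min_interval=0.1)
start = time.monotonic()
threads = [threading.Thread(target=limiter.wait) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Three concurrent calls: the first runs immediately, the rest are spaced by >= 0.1 s
elapsed = time.monotonic() - start
```

Holding the lock across the sleep is a deliberate trade-off: it guarantees spacing at the cost of queueing concurrent callers behind one another.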
- __init__(min_interval: int | float | None = None) None[source]
Initializes a new ThreadedRateLimiter with thread safety.
- Parameters:
min_interval (Optional[float | int]) – The default minimum interval to wait. Uses default if None
- rate(min_interval: float | int, metadata: Dict[str, Any] | None = None) Iterator[Self][source]
Thread-safe version of .rate context manager.
- Parameters:
min_interval (float | int) – The minimum interval to temporarily use during the call
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Yields:
Self – The rate limiter with temporarily changed interval
- sleep(interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe version of .sleep that prevents race conditions.
This method provides thread-safe access to the sleep functionality by acquiring the internal lock before performing the sleep operation. This ensures that the sleep duration is calculated and executed atomically.
- Parameters:
interval (Optional[float | int]) – Optional interval to sleep for. If None, uses the default interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- wait(min_interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe version of the .wait method that prevents race conditions.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- wait_since(min_interval: float | int | None = None, timestamp: float | int | datetime | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe method for waiting until an interval from a reference timestamp or datetime has passed.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
timestamp (Optional[float | int | datetime]) – Reference time formatted as a Unix timestamp or datetime. If None, sleeps for min_interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
Module contents
The scholar_flux.api.rate_limiting module defines the rate-limiting behavior for all providers. The rate limiting module is designed to be straightforward to apply in a variety of contexts and extensible to the varying scenarios where rate limiting is required.
- Modules:
- rate_limiter:
Implements a basic rate limiter that enforces rate limiting for a specified interval of time. Rate limiting can be applied directly using RateLimiter.wait or through a context manager that records the time of execution and computes the amount of time to wait.
- threaded_rate_limiter:
Inherits from the basic RateLimiter class to account for multithreading scenarios that require the same resource. The usage is the same, but it is thread-safe.
- retry_handler:
Basic implementation that defines a period of time to wait in between requests that are unsuccessful. This class is used to automatically retry failed requests until successful or the maximum retry limit has been exceeded. The end-user can decide whether to retry specific status codes or whether to halt early.
- Classes:
- RateLimiter:
The most basic rate limiter used for throttling requests using a constant interval
- ThreadedRateLimiter:
A thread-safe implementation that inherits from the RateLimiter to apply in multithreading
- RetryHandler:
Used to define the period of time to wait before re-sending a failed request, applying max_backoff and backoff_factor to assist in dynamically timing requests on successive request failures.
In addition, a rate_limiter_registry and threaded_rate_limiter_registry are implemented to aid in the normalization of responses to the same provider across multiple search APIs. This is particularly relevant when using the scholar_flux.api.MultiSearchCoordinator for multi-threaded requests across queries and configurations, where the threaded_rate_limiter_registry is implemented under the hood for throttling across APIs.
Example usage:
>>> import requests
>>> # Both the RateLimiter and ThreadedRateLimiter are implemented similarly:
>>> from scholar_flux.api.rate_limiting import ThreadedRateLimiter
>>> rate_limiter = ThreadedRateLimiter(min_interval = 3)
>>> # Defines a simple decorated function that does the equivalent of calling `rate_limiter.wait()` between requests
>>> @rate_limiter
... def rate_limited_request(url = 'https://httpbin.org/get'):
...     return requests.get(url)
>>> # The first call won't be throttled
>>> rate_limited_request()
>>> # The second call will wait the minimum duration from the time `rate_limited_request` was last called
>>> rate_limited_request()
- class scholar_flux.api.rate_limiting.RateLimiter(min_interval: int | float | None = None)[source]
Bases:
object
A basic rate limiter used to ensure that function calls (such as API requests) do not exceed a specified rate.
The RateLimiter is used within ScholarFlux to throttle the total number of requests that can be made within a defined time interval (measured in seconds).
This class ensures that calls to RateLimiter.wait() (or any decorated function) are spaced by at least min_interval seconds.
For multithreading applications, the RateLimiter is not thread-safe. Instead, the ThreadedRateLimiter subclass can provide a thread-safe implementation when required.
- Parameters:
min_interval (Optional[float | int]) – The minimum number of seconds that must elapse before another request is sent or a call is performed. If min_interval is not specified, the class attribute RateLimiter.DEFAULT_MIN_INTERVAL is assigned to RateLimiter.min_interval instead.
Examples
>>> import requests
>>> from scholar_flux.api import RateLimiter
>>> rate_limiter = RateLimiter(min_interval = 5)
>>> # The first call won't sleep, because a prior call using the rate limiter doesn't yet exist
>>> with rate_limiter:
...     response = requests.get("http://httpbin.org/get")
>>> # Will sleep if 5 seconds since the last call haven't elapsed
>>> with rate_limiter:
...     response = requests.get("http://httpbin.org/get")
>>> # Or simply call the `wait` method directly:
>>> rate_limiter.wait()
>>> response = requests.get("http://httpbin.org/get")
Note
The class-level history deque is a design choice: this attribute allows class-level monitoring and introspection into how request delays are computed. The HistoryDeque is thread-safe (it relies on CPython's atomic deque operations) and allows global observability, which is helpful for debugging, especially in cases where you need to adjust the total number of requests sent within a given interval to avoid 429 errors.
- DEFAULT_MIN_INTERVAL: float | int = 6.1
- __init__(min_interval: int | float | None = None)[source]
Initializes the rate limiter with the min_interval argument.
- Parameters:
min_interval (Optional[float | int]) – Minimum number of seconds to wait before the next call is performed or request sent.
- default_min_interval() float | int[source]
Returns the default minimum interval for the current rate limiter.
- history: HistoryDeque[RateLimitEvent] = HistoryDeque([])
- property min_interval: float | int
The minimum number of seconds that must elapse before another request is sent or action is taken.
- rate(min_interval: float | int, metadata: Dict[str, Any] | None = None) Iterator[Self][source]
Temporarily adjusts the minimum interval between function calls or requests when used with a context manager.
After the context manager exits, the minimum interval is restored to its previous value, and the time of the last call is recorded.
- Parameters:
min_interval (float | int) – Indicates the minimum interval to be temporarily used during the call
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Yields:
RateLimiter – The original rate limiter with a temporarily changed minimum interval
- classmethod resize_history(maxlen: int) None[source]
Resize the global history deque, preserving existing records up to the new limit.
- sleep(interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
A simple instance-level implementation of sleep that can be overridden when needed.
- Parameters:
interval (Optional[float | int]) – The time interval to sleep, in seconds. If None, the default minimum interval for the current rate limiter is used.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Raises:
APIParameterException – If the value provided is not an integer/float or is less than 0.
- wait(min_interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Block (time.sleep) until at least min_interval has passed since last call.
This method can be used with the min_interval attribute to determine when a search was last sent and throttle requests to make sure rate limits aren't exceeded. If not enough time has passed, the rate limiter will wait before the next request is sent.
- Parameters:
min_interval (Optional[float | int]) – The minimum time to wait before another call is sent. If both this argument and the min_interval attribute are None, the default min_interval value is used.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Raises:
APIParameterException – If the value provided is not an integer/float or is less than 0.
- wait_since(min_interval: int | float | None = None, timestamp: float | int | datetime | None = None, metadata: Dict[str, Any] | None = None) None[source]
Wait based on a reference timestamp or datetime.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
timestamp (Optional[float | int | datetime]) – Reference time such as a Unix timestamp or datetime. If None, sleeps for min_interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- class scholar_flux.api.rate_limiting.RetryHandler(max_retries: int = 3, backoff_factor: float = 0.5, max_backoff: int | float = 120, retry_statuses: set[int] | list[int] | None = None, raise_on_error: bool | None = None, min_retry_delay: int | float | None = None)[source]
Bases:
object
Core class used to send and dynamically retry failed requests with exponential backoff.
The RetryHandler automatically handles HTTP errors (429, 500, 501, 502, 503, 504) by retrying failed requests with increasing delays between attempts. Additional status codes can be added to the retry set via RetryHandler.DEFAULT_RETRY_STATUSES.add(<status_code>).
- Features:
Exponential backoff with configurable parameters
Respects Retry-After headers when provided
Thread-safe history tracking of all retry attempts
Configurable maximum retries and timeout limits
Example
>>> from scholar_flux import SearchCoordinator
>>> coordinator = SearchCoordinator(query="nutrition", provider_name="plos")
>>> # Configure retry behavior
>>> coordinator.retry_handler.max_retries = 5
>>> coordinator.retry_handler.backoff_factor = 1.0
>>> # The history is stored at the class level
>>> coordinator.retry_handler.history.clear_history()
>>> # Execute search with automatic retries
>>> result = coordinator.search_page(page=1)
>>> # Access retry statistics
>>> print(f"Retry attempts: {len(coordinator.retry_handler.history)}")
- max_retries
Maximum number of retry attempts (default: 3)
- Type:
int
- backoff_factor
The multiplier used for exponential backoff (default: 0.5)
- Type:
float
- max_backoff
Maximum delay between retries in seconds (default: 120). Also enforced as a hard ceiling for server-requested delays via Retry-After headers.
- Type:
float
- retry_statuses
HTTP status codes that trigger retries (default: {429, 500, 501, 502, 503, 504})
- Type:
set
- history
Thread-safe storage of all retry attempts
- Type:
HistoryDeque
Note
The retry handler is automatically used by SearchCoordinator for all requests. Each parameter is adjusted dynamically based on the provider. No manual intervention is required for basic usage.
If too many requests are sent to a single server within a specific time interval, it may return a 429 Too Many Requests error along with a Retry-After header indicating the delay that should be respected before sending another request. If the class attribute RAISE_ON_DELAY_EXCEEDED is True (the default) and that delay exceeds max_backoff, a RetryAfterDelayExceededException is raised. To turn this behavior off, either increase the max_backoff parameter directly or set RetryHandler.RAISE_ON_DELAY_EXCEEDED=False to wait the full interval upon receiving a Retry-After header.
For observability, request information, delays, and response statuses are recorded in the RetryHandler.history class attribute for later inspection and can be referenced to help modify the rate limiting configuration when needed.
- DEFAULT_RAISE_ON_ERROR = False
- DEFAULT_RETRY_AFTER_HEADERS = ('retry-after', 'x-ratelimit-retry-after')
- DEFAULT_RETRY_STATUSES = {429, 500, 501, 502, 503, 504}
- DEFAULT_VALID_STATUSES = {200}
- RAISE_ON_DELAY_EXCEEDED: bool = True
- __init__(max_retries: int = 3, backoff_factor: float = 0.5, max_backoff: int | float = 120, retry_statuses: set[int] | list[int] | None = None, raise_on_error: bool | None = None, min_retry_delay: int | float | None = None) None[source]
Initializes the RetryHandler with configurable parameters for dynamically throttling successive requests.
- Parameters:
max_retries (int) – Indicates how many attempts should be performed before giving up on retrieving a valid response.
backoff_factor (float) – The factor used to determine when the next request should be attempted, based on past unsuccessful attempts.
max_backoff (int | float) – Describes the maximum number of seconds to wait before submitting the next request.
retry_statuses (Optional[set[int]]) – Indicates the full list of status codes that should be retried if encountered.
raise_on_error (Optional[bool]) – A flag that indicates whether or not to raise an error upon encountering an invalid status_code or exception.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
Note
The class-level history deque is a design choice: while RateLimiter instances are designed to be stateless, this attribute enables class-level monitoring and introspection into how request delays are computed. The HistoryDeque is thread-safe (its operations rely on CPython's atomic deque primitives) and provides global observability, which is helpful for debugging, especially when you need to adjust the total number of requests sent within a given interval to avoid 429 errors.
- calculate_retry_delay(attempt_count: int, response: Response | ResponseProtocol | None = None, min_retry_delay: int | float | None = None, backoff_factor: int | float | None = None, max_backoff: int | float | None = None) int | float[source]
Calculates the delay in seconds to wait before the next retry attempt.
- Parameters:
attempt_count (int) – The number of attempts made so far.
response (Optional[requests.Response | ResponseProtocol]) – The response object from the last attempt.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
backoff_factor (Optional[int | float]) – The factor used to adjust the delay.
max_backoff (Optional[int | float]) – The maximum delay in seconds between requests.
- Returns:
The delay in seconds for the next retry attempt.
- Return type:
int | float
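While the exact formula is internal to ScholarFlux, the interaction between these parameters can be illustrated with a typical exponential backoff calculation. This is a sketch under assumptions: the `2 ** attempt_count` growth and the clamping order are not confirmed by the source, and `sketch_retry_delay` is an illustrative name, not the library's API.

```python
def sketch_retry_delay(attempt_count: int,
                       backoff_factor: float = 0.5,
                       min_retry_delay: float = 0.0,
                       max_backoff: float = 120.0) -> float:
    """Illustrative exponential backoff, not the library's exact formula."""
    # Exponential growth: 0.5, 1.0, 2.0, 4.0, ... for backoff_factor=0.5
    delay = backoff_factor * (2 ** attempt_count)
    # Clamp between the minimum retry delay and the maximum backoff
    return min(max(delay, min_retry_delay), max_backoff)
```

Note how max_backoff caps the computed delay: after enough attempts, the exponential term would otherwise grow without bound.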
- delay_exceeds_max_backoff(delay: int | float | None, max_backoff: int | float | None = None, *, error_message: str | None = None, warning_message: str | None = None, response: Response | ResponseProtocol | None = None, verbose: bool = True) bool[source]
Helper method for identifying and handling scenarios where an API-requested delay exceeds max_backoff.
This method centralizes the logic for identifying and handling delays that exceed the user-defined maximum duration to wait in between requests. The RetryHandler is structured to cap calculated wait times using the max_backoff attribute, but Retry-After fields are the one scenario where API-mandated request delays can exceed max_backoff.
This helper method is designed to:
Raise an exception when delay > max_backoff and RetryHandler.RAISE_ON_DELAY_EXCEEDED is True
Log a warning and return True when delay > max_backoff and RetryHandler.RAISE_ON_DELAY_EXCEEDED is False (indicating an excessive delay)
Return False when delay is None or delay does not exceed max_backoff
- Parameters:
delay (Optional[int | float]) – The delay in seconds to verify against the max_backoff.
max_backoff (Optional[int | float]) – The maximum allowable delay. Defaults to self.max_backoff if not provided.
error_message (Optional[str]) – A custom message to provide to the RetryAfterDelayExceededException. If None, a default error message indicating the server-requested delay is used.
warning_message (Optional[str]) – A custom message logged when RAISE_ON_DELAY_EXCEEDED is False. If None, a default warning is logged indicating that the RetryHandler will otherwise wait the full duration.
response (Optional[requests.Response | ResponseProtocol]) – The response object to add as additional context to the raised exception.
verbose (bool) – A flag for logging a default/custom warning when the delay exceeds the maximum duration and the RAISE_ON_DELAY_EXCEEDED flag is False.
- Returns:
True when the delay exceeds the maximum allowable delay and False otherwise.
- Return type:
bool
- Raises:
RetryAfterDelayExceededException – When the API-requested delay exceeds the max_backoff and the RAISE_ON_DELAY_EXCEEDED flag is True.
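The raise/warn/return behavior described above can be sketched in plain Python. The exception class and function below are illustrative stand-ins, not the library's implementation:

```python
class RetryAfterDelayExceeded(Exception):
    """Stand-in for scholar_flux's RetryAfterDelayExceededException."""

def delay_exceeds_max_backoff(delay, max_backoff, raise_on_exceeded=True, verbose=True):
    """Illustrative check: raise, warn, or report whether a delay is excessive."""
    if delay is None or delay <= max_backoff:
        return False  # no delay, or within the allowed ceiling
    if raise_on_exceeded:
        raise RetryAfterDelayExceeded(
            f"Server requested a {delay}s delay, exceeding max_backoff={max_backoff}s"
        )
    if verbose:
        print(f"Warning: waiting the full {delay}s requested by the server")
    return True  # excessive, but tolerated
```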
- execute_with_retry(request_func: Callable[[...], ResponseLike], validator_func: Callable | None = None, sleep_func: Callable[[float], None] | None = None, *args: Any, backoff_factor: int | float | None = None, max_backoff: int | float | None = None, min_retry_delay: int | float | None = None, **kwargs: Any) ResponseLike | None[source]
Sends a request and retries on failure based on predefined criteria and validation function.
- Parameters:
request_func (Callable) – The function to send the request.
validator_func (Optional[Callable]) – A function that takes a response and returns True if valid.
sleep_func (Optional[Callable[[float], None]]) – An optional function used for blocking the next request until a specified duration has passed.
*args – Positional arguments for the request function.
backoff_factor (Optional[int | float]) – The factor used to determine when the next request should be attempted, based on past unsuccessful attempts.
max_backoff (Optional[int | float]) – Describes the maximum number of seconds to wait before submitting the next request.
min_retry_delay (Optional[int | float]) – The minimum delay in seconds between requests.
**kwargs – Arbitrary keyword arguments for the request function.
- Returns:
The response-like object when successful, or None if no valid response was obtained.
- Return type:
Optional[requests.Response | ResponseProtocol]
- Raises:
RequestFailedException – When a request raises an exception for any reason.
TimeoutError – When a request times out during response retrieval.
InvalidResponseException – When the number of retries has been exceeded and self.raise_on_error is True.
RetryAfterDelayExceededException – When the Retry-After delay requested from the server exceeds max_backoff
Note
If a Retry-After header exceeds the max_backoff and RetryHandler.RAISE_ON_DELAY_EXCEEDED=True, the exception will be raised immediately and halt the series of retry attempts.
Also note that raw response objects can be recovered from handled InvalidResponseException or RetryAfterDelayExceededException instances: catch the exception in a try/except block and read its response attribute:
Example
>>> from scholar_flux.api.rate_limiting.retry_handler import RetryHandler
>>> from scholar_flux.exceptions import InvalidResponseException, RetryAfterDelayExceededException
>>> import requests
>>> retry_handler = RetryHandler(raise_on_error=True)
>>> try:
...     response = retry_handler.execute_with_retry(requests.get, url="https://httpbin.org/status/200")
... except (RetryAfterDelayExceededException, InvalidResponseException) as e:
...     response = e.response
>>> print(response)
- classmethod extract_retry_after(headers: Mapping[str, Any] | None, keys: tuple | None = None) str | None[source]
Extracts the retry-after field from dictionary headers if the field exists.
- Parameters:
headers (Optional[Mapping[str, Any]]) – A headers dictionary or mapping to extract the retry-after field from.
keys (Optional[tuple]) – The keys to look for in the headers. (case insensitive)
- Returns:
The retry-after field value, or None if not present.
- Return type:
Optional[str]
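A case-insensitive header lookup of this kind can be sketched as follows. The default key tuple mirrors DEFAULT_RETRY_AFTER_HEADERS; the function name is illustrative, not the library's API:

```python
from typing import Any, Mapping, Optional

def find_retry_after(headers: Optional[Mapping[str, Any]],
                     keys: tuple = ("retry-after", "x-ratelimit-retry-after")) -> Optional[str]:
    """Return the first matching header value, comparing keys case-insensitively."""
    if not headers:
        return None
    # Normalize header names to lowercase for case-insensitive matching
    lowered = {str(k).lower(): v for k, v in headers.items()}
    for key in keys:
        if key.lower() in lowered:
            return str(lowered[key.lower()])
    return None
```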
- classmethod extract_retry_after_from_response(response: Response | ResponseProtocol | None) str | None[source]
Extracts and parses retry-after delay from any response type.
This method handles both raw responses (Response/ResponseProtocol) and processed responses (ProcessedResponse/ErrorResponse), making it the single entry point for retry-after extraction.
- Parameters:
response (Optional[requests.Response | ResponseProtocol]) – Any response object with headers
- Returns:
The unparsed retry-after header in seconds, or None if not present
- Return type:
Optional[str]
- classmethod get_retry_after(response: Response | ResponseProtocol | None) int | float | None[source]
Calculates the time that must elapse before the next request is sent according to the headers.
- Parameters:
response (requests.Response | ResponseProtocol) – The response object from the last attempt.
- Returns:
Indicates the number of seconds that must elapse before the next request is sent.
- Return type:
Optional[float]
- history: HistoryDeque[RetryAttempt] = HistoryDeque([])
- log_retry_attempt(delay: float, status_code: int | None = None) None[source]
Log an attempt to retry a request.
- Parameters:
delay (float) – The delay in seconds before the next retry attempt.
status_code (Optional[int]) – The status code of the response that triggered the retry.
- static log_retry_warning(message: str) None[source]
Log a warning when retries are exhausted or an error occurs.
- Parameters:
message (str) – The warning message to log.
- classmethod parse_retry_after(retry_after: str | None) int | float | None[source]
Parse the ‘Retry-After’ header to calculate delay.
- Parameters:
retry_after (str) – The value of ‘Retry-After’ header.
- Returns:
The total delay in seconds parsed from the response field if available.
- Return type:
Optional[int | float]
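Per RFC 9110, a Retry-After value is either delay-seconds or an HTTP-date, so a parser along these lines is plausible. This is a sketch, not ScholarFlux's implementation, and the function name is illustrative:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Union

def parse_retry_after_value(retry_after: Optional[str]) -> Optional[Union[int, float]]:
    """Interpret a Retry-After value as delay-seconds or an HTTP-date."""
    if retry_after is None:
        return None
    value = retry_after.strip()
    if value.isdigit():  # e.g. "120" -> wait 120 seconds
        return int(value)
    try:  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # neither seconds nor a parseable HTTP-date
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    # The delay is the time remaining until the target date (never negative)
    return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())
```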
- classmethod resize_history(maxlen: int) None[source]
Resize the global history deque, preserving existing records up to the new limit.
- Parameters:
maxlen (int) – The new maximum length of the history deque.
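The preservation semantics follow directly from collections.deque: constructing a deque with a smaller maxlen keeps only the newest items. A minimal sketch (the helper name is illustrative):

```python
from collections import deque

def resize_preserving(history: deque, maxlen: int) -> deque:
    """Rebuild a deque with a new maxlen, keeping the most recent records."""
    # deque(iterable, maxlen) retains only the LAST maxlen items, so
    # shrinking discards the oldest records first.
    return deque(history, maxlen=maxlen)
```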
- should_retry(response: Response | ResponseProtocol) bool[source]
Determine whether the request should be retried.
- class scholar_flux.api.rate_limiting.ThreadedRateLimiter(min_interval: int | float | None = None)[source]
Bases:
RateLimiter
Thread-safe version of RateLimiter that can be safely used across multiple threads.
Inherits all functionality from RateLimiter but adds thread synchronization to prevent race conditions when multiple threads access the same limiter instance.
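The core synchronization idea can be sketched with a minimal limiter: a single lock makes the read-sleep-record cycle atomic. This is an illustrative class, not ScholarFlux's implementation:

```python
import threading
import time
from typing import Optional

class MiniThreadedRateLimiter:
    """Illustrative thread-safe limiter: one lock serializes each wait cycle."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        # Holding the lock makes "compute remaining time, sleep, record the
        # timestamp" atomic, so two threads never race on _last_call.
        with self._lock:
            if self._last_call is not None:
                remaining = self.min_interval - (time.monotonic() - self._last_call)
                if remaining > 0:
                    time.sleep(remaining)
            self._last_call = time.monotonic()
```

As in the real RateLimiter, the first call does not sleep; subsequent calls block until min_interval has elapsed since the previous one.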
- __init__(min_interval: int | float | None = None) None[source]
Initializes a new ThreadedRateLimiter with thread safety.
- Parameters:
min_interval (Optional[float | int]) – The default minimum interval to wait. Uses default if None.
- rate(min_interval: float | int, metadata: Dict[str, Any] | None = None) Iterator[Self][source]
Thread-safe version of .rate context manager.
- Parameters:
min_interval (float | int) – The minimum interval to temporarily use during the call
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- Yields:
Self – The rate limiter with temporarily changed interval
- sleep(interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe version of .sleep that prevents race conditions.
This method provides thread-safe access to the sleep functionality by acquiring the internal lock before performing the sleep operation. This ensures that the sleep duration is calculated and executed atomically.
- Parameters:
interval (Optional[float | int]) – Optional interval to sleep for. If None, uses the default interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- wait(min_interval: int | float | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe version of the .wait method that prevents race conditions.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).
- wait_since(min_interval: float | int | None = None, timestamp: float | int | datetime | None = None, metadata: Dict[str, Any] | None = None) None[source]
Thread-safe method for waiting until an interval from a reference timestamp or datetime has passed.
- Parameters:
min_interval (Optional[float | int]) – Minimum interval to wait. Uses default if None.
timestamp (Optional[float | int | datetime]) – Reference time formatted as a Unix timestamp or datetime. If None, sleeps for min_interval.
metadata (Optional[Dict[str, Any]]) – Optional metadata for observability (e.g., url, caller, reason).