scholar_flux.api.models package

Submodules

scholar_flux.api.models.api_parameters module

The scholar_flux.api.models.api_parameters module implements the APIParameterMap and APIParameterConfig classes.

These two classes provide flexibility when building API requests, accounting for provider-specific differences in request parameters and configuration.

Classes:
APIParameterMap:

Extends the BaseAPIParameterMap to provide factory functions and utilities to more efficiently retrieve and use default parameter maps.

APIParameterConfig:

Uses or creates an APIParameterMap to prepare request parameters according to the specifications of the current provider’s API.

class scholar_flux.api.models.api_parameters.APIParameterConfig(parameter_map: APIParameterMap)[source]

Bases: object

Uses an APIParameterMap instance and runtime parameter values to build parameter dictionaries for API requests.

Parameters:

parameter_map (APIParameterMap) – The mapping of universal to API-specific parameter names.

Class Attributes:
DEFAULT_CORRECT_ZERO_INDEX (bool):

When True, autocorrects parameter building for zero-indexed APIs so that only positive page values are accepted. When False, page calculation starts from page 0 for zero-indexed APIs (e.g., arXiv).

Examples

>>> from scholar_flux.api import APIParameterConfig, APIParameterMap
>>> # The APIParameterMap translates universal parameter names into the API's own names
>>> api_parameter_map = APIParameterMap(
...     query='q', records_per_page='pagesize', start='page', auto_calculate_page=False
... )
>>> # The APIParameterConfig uses the map and its settings to build request parameters
>>> api_parameter_config = APIParameterConfig(api_parameter_map)
>>> # Builds parameters using the specification from the APIParameterMap
>>> parameters = api_parameter_config.build_parameters(query='ml', page=10, records_per_page=50)
>>> print(parameters)
# OUTPUT: {'q': 'ml', 'page': 10, 'pagesize': 50}
DEFAULT_CORRECT_ZERO_INDEX: ClassVar[bool] = True
__init__(*args: Any, **kwargs: Any) None
classmethod as_config(parameter_map: dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) APIParameterConfig[source]

Factory method for creating a new APIParameterConfig from a dictionary or APIParameterMap.

This helper class method resolves the structure of the APIParameterConfig against its basic building blocks to create a new configuration when possible.

Parameters:

parameter_map (dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) – A parameter mapping/config to use in the instantiation of an APIParameterConfig.

Returns:

A new APIParameterConfig created from the inputs.

Return type:

APIParameterConfig

Raises:

APIParameterException – If there is an error in the creation/resolution of the required parameters

build_parameters(query: str | None, page: int | None, records_per_page: int, **api_specific_parameters) Dict[str, Any][source]

Builds the dictionary of request parameters using the current parameter map and provided values at runtime.

Parameters:
  • query (Optional[str]) – The search query string.

  • page (Optional[int]) – The page number for pagination (1-based).

  • records_per_page (int) – Number of records to fetch per page.

  • **api_specific_parameters – Additional API-specific parameters to include.

Returns:

The fully constructed API request parameters dictionary, with keys as API-specific parameter names and values as provided.

Return type:

Dict[str, Any]
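The translation step described above can be sketched without the library. This is a minimal, self-contained illustration and not scholar_flux's implementation: SketchParameterMap and build_parameters_sketch are hypothetical names, and the start-index formula used when auto_calculate_page is true is an assumption about how a 1-based page could be converted into a record offset.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SketchParameterMap:
    """Hypothetical stand-in for a parameter map (not the library class)."""

    query: str
    records_per_page: str
    start: Optional[str] = None
    auto_calculate_page: bool = True
    zero_indexed_pagination: bool = False


def build_parameters_sketch(
    pmap: SketchParameterMap,
    query: Optional[str],
    page: Optional[int],
    records_per_page: int,
    **api_specific_parameters: Any,
) -> Dict[str, Any]:
    """Translate universal parameter names into the provider's names."""
    start_value = page
    if page is not None and pmap.auto_calculate_page:
        # Assumed formula: convert a 1-based page number into a record offset.
        offset = (page - 1) * records_per_page
        start_value = offset if pmap.zero_indexed_pagination else offset + 1

    params: Dict[str, Any] = {
        pmap.query: query,
        pmap.records_per_page: records_per_page,
    }
    if pmap.start is not None:
        params[pmap.start] = start_value
    params.update(api_specific_parameters)
    # Drop parameters that were never given a value.
    return {name: value for name, value in params.items() if value is not None}


# Page numbers are passed through unchanged when auto_calculate_page is False.
direct = SketchParameterMap(
    query="q", records_per_page="pagesize", start="page", auto_calculate_page=False
)
print(build_parameters_sketch(direct, query="ml", page=10, records_per_page=50))
```

With auto_calculate_page left at True, the same call would send a record offset under the start parameter name instead of the raw page number.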

classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterConfig[source]

Factory method to create APIParameterConfig instances with sensible defaults for known APIs.

If the provider_name does not exist, the code will raise an exception.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • api_key (Optional[str]) – API key value if required.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter config instance for the specified API.

Return type:

APIParameterConfig

Raises:

NotImplementedError – If the API name is unknown.

classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterConfig | None[source]

Factory method to create APIParameterConfig instances with sensible defaults for known APIs.

Avoids throwing an error if the provider name does not already exist.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter config instance for the specified API, or None if no mapping is found for the provider_name.

Return type:

Optional[APIParameterConfig]
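The difference between the strict from_defaults factory and the lenient get_defaults factory can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the _SKETCH_DEFAULTS table, both function names, and the parameter names stored for the example provider are hypothetical and do not reflect the library's registry.

```python
from typing import Dict, Optional

# Hypothetical defaults table; the real provider defaults live in the library.
_SKETCH_DEFAULTS: Dict[str, Dict[str, str]] = {
    "plos": {"query": "q", "records_per_page": "rows", "start": "start"},
}


def get_defaults_sketch(provider_name: str, **overrides: str) -> Optional[Dict[str, str]]:
    """Lenient lookup: returns None when the provider is unknown."""
    defaults = _SKETCH_DEFAULTS.get(provider_name.lower())
    if defaults is None:
        return None
    # Keyword overrides replace individual default mappings.
    return {**defaults, **overrides}


def from_defaults_sketch(provider_name: str, **overrides: str) -> Dict[str, str]:
    """Strict lookup: delegates to the lenient one and raises if nothing is found."""
    config = get_defaults_sketch(provider_name, **overrides)
    if config is None:
        raise NotImplementedError(f"Unknown provider: {provider_name!r}")
    return config
```

The lenient variant suits code that wants to fall back to a custom configuration, while the strict variant surfaces a misspelled provider name immediately.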

property map: APIParameterMap

Helper property that is an alias for the APIParameterMap attribute.

The APIParameterMap maps all universal parameters to the parameter names specific to the API provider.

Returns:

The mapping that the current APIParameterConfig will use to build a dictionary of parameter requests specific to the current API.

Return type:

APIParameterMap

parameter_map: APIParameterMap
show_parameters() list[source]

Helper method to show the complete list of all parameters that can be found in the current_mappings.

Returns:

The complete list of all universal and api specific parameters corresponding to the current API

Return type:

List

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the APIParameterConfig.

class scholar_flux.api.models.api_parameters.APIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]

Bases: BaseAPIParameterMap

Extends BaseAPIParameterMap by adding validation and the optional retrieval of provider defaults for known APIs.

This class also specifies default mappings for specific attributes such as API keys and additional parameter names.

query

The API-specific parameter name for the search query.

Type:

str

start

The API-specific parameter name for pagination (start index or page number).

Type:

Optional[str]

records_per_page

The API-specific parameter name for records per page.

Type:

str

api_key_parameter

The API-specific parameter name for the API key.

Type:

Optional[str]

api_key_required

Indicates whether an API key is required.

Type:

bool

auto_calculate_page

If True, calculates start index from page; if False, passes page number directly.

Type:

bool

zero_indexed_pagination

If True, treats 0 as an allowed page value when retrieving data from APIs.

Type:

bool

api_specific_parameters

Additional universal to API-specific parameter mappings.

Type:

Dict[str, APISpecificParameter]

api_key_parameter: str | None
api_key_required: bool
api_specific_parameters: Dict[str, APISpecificParameter]
auto_calculate_page: bool
classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterMap[source]

Factory method that uses the APIParameterMap.get_defaults classmethod to retrieve the provider config.

Raises an error if the provider does not exist.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter map for the specified API.

Return type:

APIParameterMap

Raises:

NotImplementedError – If the API name is unknown.

classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterMap | None[source]

Factory method to create APIParameterMap instances with sensible defaults for known APIs.

This class method attempts to pull from the list of known providers defined in the scholar_flux.api.providers.provider_registry and returns None if an APIParameterMap for the provider cannot be found.

Using the additional_parameters keyword arguments, users can specify optional overrides for specific parameters if needed. This is helpful in circumstances where an API’s specification overlaps with that of a known provider.

Valid providers (as indicated in provider_registry) include:

  • springernature

  • plos

  • arxiv

  • openalex

  • core

  • crossref

Parameters:
  • provider_name (str) – The name of the API provider to retrieve the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter map for the specified API.

Return type:

Optional[APIParameterMap]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
records_per_page: str
classmethod set_default_api_key_parameter(values: dict[str, Any]) dict[str, Any][source]

Sets the default for the API key parameter when api_key_required=True and api_key_parameter is None.

Parameters:

values (dict[str, Any]) – The dictionary of attributes to validate

Returns:

The updated parameter values passed to the APIParameterMap. api_key_parameter is set to “api_key” if a key is required but no parameter name is specified.

Return type:

dict[str, Any]
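The defaulting rule described above amounts to a small transformation of the input dictionary. The function below is a standalone sketch with a hypothetical name; the library applies the rule as a validator on the model, which is not reproduced here.

```python
from typing import Any, Dict


def set_default_api_key_parameter_sketch(values: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `values` with api_key_parameter defaulted when needed."""
    updated = dict(values)
    # Only fill in the name when a key is required and no name was supplied.
    if updated.get("api_key_required") and updated.get("api_key_parameter") is None:
        updated["api_key_parameter"] = "api_key"
    return updated
```

An explicitly supplied api_key_parameter is never overwritten, and nothing is added when no key is required.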

start: str | None
classmethod validate_api_specific_parameter_mappings(values: dict[str, Any]) dict[str, Any][source]

Validates the additional mappings provided to the APIParameterMap.

This method validates that the input is a dictionary of mappings that consists of only string-typed keys mapped to API-specific parameters as defined by the APISpecificParameter class.

Parameters:

values (dict[str, Any]) – The dictionary of attribute values to validate.

Returns:

The updated dictionary if validation passes.

Return type:

dict[str, Any]

Raises:

APIParameterException – If api_specific_parameters is not a dictionary or contains non-string keys/values.

zero_indexed_pagination: bool

scholar_flux.api.models.base_parameters module

The scholar_flux.api.models.base_parameters module implements the BaseAPIParameterMap and APISpecificParameter classes.

These classes define the core and API-specific fields required to interact with and create requests to API providers.

Classes:

BaseAPIParameterMap:

Defines parameters for interacting with a provider’s API specification.

APISpecificParameter:

Defines optional and required parameters specific to an API provider.

class scholar_flux.api.models.base_parameters.APISpecificParameter(name: str, description: str, validator: Callable[[Any], Any] | None = None, default: Any = None, required: bool = False)[source]

Bases: object

Dataclass that defines the specification of an API-specific parameter for an API provider.

Implements optionally specifiable defaults, validation steps, and indicators for optional vs. required fields.

Parameters:
  • name (str) – The name of the parameter used when sending requests to APIs.

  • description (str) – A description of the API-specific parameter.

  • validator (Optional[Callable[[Any], Any]]) – An optional function/method for verifying and pre-processing parameter input based on required types, constrained values, etc.

  • default (Any) – A default value used for the parameter if not specified by the user.

  • required (bool) – Indicates whether the current parameter is required for API calls.

__init__(*args: Any, **kwargs: Any) None
default: Any = None
description: str
name: str
required: bool = False
structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method for showing the structure of the current APISpecificParameter.

validator: Callable[[Any], Any] | None = None
property validator_name

Helper property for generating a human-readable string from the validator function, if used.
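How a default, a required flag, and a validator might combine for one API-specific parameter is sketched below. SketchSpecificParameter mirrors the documented fields, but its resolve method and the validate_sort function are hypothetical helpers added for illustration, and the validator_name logic is an assumption about how a readable name could be derived.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchSpecificParameter:
    """Hypothetical stand-in mirroring the documented fields."""

    name: str
    description: str
    validator: Optional[Callable[[Any], Any]] = None
    default: Any = None
    required: bool = False

    @property
    def validator_name(self) -> str:
        # Assumed behavior: use the callable's __name__ when it has one.
        if self.validator is None:
            return "None"
        return getattr(self.validator, "__name__", type(self.validator).__name__)

    def resolve(self, value: Any = None) -> Any:
        """Hypothetical helper: apply the default, required check, and validator."""
        if value is None:
            value = self.default
        if value is None:
            if self.required:
                raise ValueError(f"Missing required parameter: {self.name}")
            return None
        return self.validator(value) if self.validator else value


def validate_sort(value: Any) -> str:
    """Example validator restricting the parameter to two allowed values."""
    allowed = {"relevance", "date"}
    if value not in allowed:
        raise ValueError(f"sort must be one of {sorted(allowed)}")
    return value


sort_param = SketchSpecificParameter(
    name="sort", description="Sort order", validator=validate_sort, default="relevance"
)
```

Calling sort_param.resolve() falls back to the default, while sort_param.resolve("newest") would be rejected by the validator.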

class scholar_flux.api.models.base_parameters.BaseAPIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]

Bases: BaseModel

Base class for mapping universal SearchAPI parameter names to API-specific parameter names.

Includes core logic for distinguishing parameter names, indicating required API keys, and defining pagination logic.

query

The API-specific parameter name for the search query.

Type:

str

start

The API-specific parameter name for optional pagination (start index or page number).

Type:

Optional[str]

records_per_page

The API-specific parameter name for records per page.

Type:

str

api_key_parameter

The API-specific parameter name for the API key.

Type:

Optional[str]

api_key_required

Indicates whether an API key is required.

Type:

bool

page_required

If True, indicates that a page is required.

Type:

bool

auto_calculate_page

If True, calculates start index from page; if False, passes page number directly.

Type:

bool

zero_indexed_pagination

Treats page=0 as an allowed page value when retrieving data from the API.

Type:

bool

api_specific_parameters

Additional API-specific parameter mappings.

Type:

Dict[str, APISpecificParameter]

api_key_parameter: str | None
api_key_required: bool
api_specific_parameters: Dict[str, APISpecificParameter]
auto_calculate_page: bool
classmethod from_dict(obj: Dict[str, Any]) BaseAPIParameterMap[source]

Create a new instance of BaseAPIParameterMap from a dictionary.

Parameters:

obj (dict) – The dictionary containing the data for the new instance.

Returns:

A new instance created from the given dictionary.

Return type:

BaseAPIParameterMap

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
records_per_page: str
show_parameters() list[source]

Helper method to show the complete list of all parameters that can be found in the current ParameterMap.

Returns:

The complete list of all universal and api specific parameters corresponding to the current API

Return type:

List

start: str | None
structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the BaseAPIParameterMap.

to_dict() Dict[str, Any][source]

Convert the current instance into a dictionary representation.

Returns:

A dictionary representation of the current instance.

Return type:

Dict

update(other: BaseAPIParameterMap | Dict[str, Any]) BaseAPIParameterMap[source]

Update the current instance with values from another BaseAPIParameterMap or dictionary.

Parameters:

other (BaseAPIParameterMap | Dict) – The object containing updated values.

Returns:

A new instance with updated values.

Return type:

BaseAPIParameterMap
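Because update is documented as returning a new instance, the original map is left unchanged. The pattern can be sketched with a frozen dataclass; SketchMap is a hypothetical stand-in with a reduced set of fields and is not the pydantic model used by the library.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class SketchMap:
    """Hypothetical, reduced stand-in for a parameter map."""

    query: str
    records_per_page: str
    start: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, obj: Dict[str, Any]) -> "SketchMap":
        return cls(**obj)

    def update(self, other: Union["SketchMap", Dict[str, Any]]) -> "SketchMap":
        # Accept either another map or a plain dictionary of changes.
        changes = other.to_dict() if isinstance(other, SketchMap) else dict(other)
        # replace() builds a new instance; `self` is never mutated.
        return replace(self, **changes)


base = SketchMap(query="q", records_per_page="rows")
paged = base.update({"start": "offset"})
```

Here paged carries the new start name while base still has start set to None, and SketchMap.from_dict(paged.to_dict()) reproduces paged.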

zero_indexed_pagination: bool

scholar_flux.api.models.provider_config module

The scholar_flux.api.models.provider_config module implements the basic provider configuration necessary for interacting with APIs.

It provides the foundational information necessary for the SearchAPI to resolve provider names to the URLs of the providers as well as basic defaults necessary for interaction.

class scholar_flux.api.models.provider_config.ProviderConfig(*, provider_name: Annotated[str, MinLen(min_length=1)], base_url: str, parameter_map: BaseAPIParameterMap, records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: Annotated[float, Ge(ge=0)] = 6.1, api_key_env_var: str | None = None, docs_url: str | None = None)[source]

Bases: BaseModel

Configuration holding the basic instructions and settings needed to interact with a provider. Instances for the default providers are created in the scholar_flux.api.providers submodule when the package is initialized. A new custom provider or an override can be added to the provider_registry (a custom user dictionary) from the scholar_flux.api.providers module.

Parameters:
  • provider_name (str) – The name of the provider to be associated with the config.

  • base_url (str) – The URL of the provider to send requests with the specified parameters.

  • parameter_map (BaseAPIParameterMap) – The parameter map indicating the specific semantics of the API.

  • records_per_page (int) – Generally the upper limit (for some APIs) or reasonable limit for the number of retrieved records per request (specific to the API provider).

  • request_delay (float) – Indicates exactly how many seconds to wait before sending successive requests. Note that the requested interval may vary based on the API provider.

  • api_key_env_var (Optional[str]) – Indicates the environment variable to look for if the API requires or accepts API keys.

  • docs_url (Optional[str]) – An optional URL that indicates where documentation related to the use of the API can be found.

Example Usage:
>>> from scholar_flux.api import ProviderConfig, APIParameterMap, SearchAPI
>>> # Maps each of the individual parameters required to interact with the Guardian API
>>> parameters = APIParameterMap(query='q',
...                              start='page',
...                              records_per_page='page-size',
...                              api_key_parameter='api-key',
...                              auto_calculate_page=False,
...                              api_key_required=True)
>>> # Creates the config object that holds the basic configuration necessary to interact with the API
>>> guardian_config = ProviderConfig(provider_name='GUARDIAN',
...                                  parameter_map=parameters,
...                                  base_url='https://content.guardianapis.com/search',
...                                  records_per_page=10,
...                                  api_key_env_var='GUARDIAN_API_KEY',
...                                  request_delay=6)
>>> api = SearchAPI.from_provider_config(query='economic welfare',
...                                      provider_config=guardian_config,
...                                      use_cache=True)
>>> assert api.provider_name == 'guardian'
>>> response = api.search(page=1) # assumes that you have the GUARDIAN_API_KEY stored as an env variable
>>> assert response.ok
api_key_env_var: str | None
base_url: str
docs_url: str | None
model_config: ClassVar[ConfigDict] = {'str_strip_whitespace': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod normalize_provider_name(v: str) str[source]

Helper method for normalizing the names of providers to a consistent structure.

parameter_map: BaseAPIParameterMap
provider_name: str
records_per_page: int
request_delay: float
search_config_defaults() dict[str, Any][source]

Convenience method for retrieving ProviderConfig fields as a dict. Useful for providing the missing information needed to create a SearchAPIConfig object for a provider when only the provider_name has been provided.

Returns:

A dictionary containing the URL, name, records_per_page, and request_delay for the current provider.

Return type:

dict

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the ProviderConfig.

classmethod validate_base_url(v: str) str[source]

Validates the current url and raises an APIParameterException if invalid.

classmethod validate_docs_url(v: str | None) str | None[source]

Validates the documentation url and raises an APIParameterException if invalid.

scholar_flux.api.models.provider_registry module

The scholar_flux.api.models.provider_registry module implements the ProviderRegistry class, which extends a dictionary to map provider names to their scholar_flux ProviderConfig.

When scholar_flux uses a provider_name to create a SearchAPI or SearchCoordinator, the package-level provider_registry is instantiated and referenced to retrieve the necessary configuration for easier interaction and specification of APIs.

class scholar_flux.api.models.provider_registry.ProviderRegistry(dict=None, /, **kwargs)[source]

Bases: BaseProviderDict

The ProviderRegistry implementation allows the smooth and efficient retrieval of API parameter maps and default configuration settings to aid in the creation of a SearchAPI that is specific to the current API.

Note that the ProviderRegistry uses the ProviderConfig._normalize_name to ignore underscores and case-sensitivity.

- ProviderRegistry.from_defaults

Dynamically imports configurations stored within scholar_flux.api.providers, and fails gracefully if a provider’s module does not contain a ProviderConfig.

- ProviderRegistry.get

Resolves a provider name to its ProviderConfig if it exists in the registry.

- ProviderRegistry.get_from_url

Resolves a provider URL to its ProviderConfig if it exists in the registry.

add(provider_config: ProviderConfig) None[source]

Helper method for adding a new provider to the provider registry.

create(provider_name: str, **kwargs) ProviderConfig[source]

Helper method that creates and registers a new ProviderConfig with the current provider registry.

Parameters:
  • provider_name (str) – The name of the provider to create a new provider_config for.

  • **kwargs – Additional keyword arguments to pass to scholar_flux.api.models.ProviderConfig

classmethod from_defaults() ProviderRegistry[source]

Helper method that dynamically loads providers from the scholar_flux.api.providers module specifically reserved for default provider configs.

Returns:

A new registry containing the loaded default provider configurations

Return type:

ProviderRegistry

get_from_url(provider_url: str | None) ProviderConfig | None[source]

Attempts to retrieve a ProviderConfig instance by resolving the provided URL to a registered provider’s URL. Does not raise an error if the provider does not exist.

Parameters:

provider_url (Optional[str]) – The URL of the provider to look up

Returns:

Instance configuration for the provider if it exists, else None

Return type:

Optional[ProviderConfig]

remove(provider_name: str) None[source]

Helper method for removing a provider configuration from the provider registry.
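The name normalization and URL lookup described for the registry can be sketched with a small dictionary subclass. Everything here is an assumption made for illustration: SketchRegistry stores base URLs rather than ProviderConfig objects, normalize_name is a hypothetical rule (lowercase, underscores removed), and get_from_url matches on the host only.

```python
from collections import UserDict
from typing import Optional
from urllib.parse import urlparse


def normalize_name(name: str) -> str:
    """Assumed normalization: lowercase and drop underscores."""
    return name.lower().replace("_", "")


class SketchRegistry(UserDict):
    """Hypothetical registry mapping normalized provider names to base URLs."""

    def __setitem__(self, key: str, value: str) -> None:
        super().__setitem__(normalize_name(key), value)

    def __getitem__(self, key: str) -> str:
        return super().__getitem__(normalize_name(key))

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and super().__contains__(normalize_name(key))

    def get_from_url(self, provider_url: Optional[str]) -> Optional[str]:
        """Return the registered name whose base URL shares a host with the input."""
        if not provider_url:
            return None
        host = urlparse(provider_url).netloc.lower()
        for name, base_url in self.data.items():
            if urlparse(base_url).netloc.lower() == host:
                return name
        return None


registry = SketchRegistry()
registry["Example_Provider"] = "https://api.example.org/search"
```

Lookups such as registry.get("exampleprovider") and registry.get("EXAMPLE_PROVIDER") then resolve to the same entry, and an unknown name or URL yields None instead of an error.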

scholar_flux.api.models.reconstructed_response module

The scholar_flux.api.models.reconstructed_response module implements a basic ReconstructedResponse data structure.

The ReconstructedResponse class was designed to be request-client agnostic to improve flexibility in the request clients that can be used to retrieve data from APIs and load response data from cache.

The ReconstructedResponse is a minimal implementation of a response-like object that can transform response classes from requests, httpx, and asyncio into a singular representation of the same response.

class scholar_flux.api.models.reconstructed_response.ReconstructedResponse(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any)[source]

Bases: object

Helper class for retaining the most relevant fields when reconstructing responses from different sources such as requests and httpx (if chosen). The primary purpose of the ReconstructedResponse in scholar_flux is to create a minimal representation of a response when a ProcessedResponse must be constructed without an actual response and its content fields must be verified.

In applications such as retrieving cached data from a scholar_flux.data_storage.DataCacheManager, if an original or cached response is not available, then a ReconstructedResponse is created from the cached response fields when available.

Parameters:
  • status_code (int) – The integer code indicating the status of the response

  • reason (str) – Indicates the reasoning associated with the status of the response

  • headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type, etc.)

  • content (bytes) – The content within the response

  • url (Any) – The URL from which the response was received

Note

The ReconstructedResponse.build factory method is recommended in cases when one property may contain the needed fields but may need to be processed and prepared first before being used. Examples include instances where one has text or json data instead of content, a reason_phrase field instead of reason, etc.

Example

>>> from scholar_flux.api.models import ReconstructedResponse
>>> # Build a response using a factory method that infers fields from existing ones when not directly specified
>>> response = ReconstructedResponse.build(status_code=200, content=b"success", url="https://google.com")
>>> # Check whether the current class follows a ResponseProtocol and contains valid fields
>>> assert response.is_response()
>>> response.validate() # raises an error if invalid
>>> response.raise_for_status() # no error for 200 status codes
>>> assert response.reason == 'OK' == response.status  # inferred from the status_code attribute
__init__(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any) None
asdict() dict[str, Any][source]

Helper method for converting the ReconstructedResponse into a dictionary containing attributes and their corresponding values.

classmethod build(response: Any | None = None, **kwargs) ReconstructedResponse[source]

Helper method for building a new ReconstructedResponse from a regular response object. This classmethod can either construct a new ReconstructedResponse object from a response object or response-like object or create a new ReconstructedResponse altogether with its inputs.

Parameters:

  • response (Optional[Any]) – A response or response-like object of unknown type, or None.

  • **kwargs – The underlying components needed to construct a new response. Ideally, this set of key-value pairs is specific only to the types expected by the ReconstructedResponse.

content: bytes
classmethod fields() list[source]

Helper method for retrieving a list containing the names of all fields associated with the ReconstructedResponse class.

Returns:

A list containing the name of each attribute in the ReconstructedResponse.

Return type:

list[str]

classmethod from_keywords(**kwargs) ReconstructedResponse[source]

Uses the provided keyword arguments to create a ReconstructedResponse. Keywords may be the default attributes of the ReconstructedResponse or other keywords from which those attributes can be inferred and processed.

Parameters:
  • status_code (int) – The integer code indicating the status of the response

  • reason (str) – Indicates the reasoning associated with the status of the response

  • headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type)

  • content (bytes) – The content within the response

  • url (Any) – The URL from which the response was received

Some fields can be either provided directly or inferred from other similarly common fields:

  • content: [‘content’, ‘_content’, ‘text’, ‘json’]

  • headers: [‘headers’, ‘_headers’]

  • reason: [‘reason’, ‘status’, ‘reason_phrase’, ‘status_code’]

Returns:

A newly reconstructed response from the given keyword components

Return type:

ReconstructedResponse
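The fallback order listed above can be sketched with two small helpers built on the standard library. The function names are hypothetical, and the exact precedence and encoding choices (UTF-8, json.dumps for JSON input) are assumptions rather than the library's documented behavior.

```python
import json
from http import HTTPStatus
from typing import Any, Dict, Optional


def infer_reason(fields: Dict[str, Any]) -> Optional[str]:
    """Pick a reason from the first usable key, else derive it from status_code."""
    for key in ("reason", "status", "reason_phrase"):
        value = fields.get(key)
        if isinstance(value, str) and value:
            return value
    status_code = fields.get("status_code")
    if isinstance(status_code, int):
        try:
            # HTTPStatus maps a known code to its standard phrase.
            return HTTPStatus(status_code).phrase
        except ValueError:
            return None
    return None


def infer_content(fields: Dict[str, Any]) -> Optional[bytes]:
    """Pick bytes content directly, or encode it from text or JSON data."""
    for key in ("content", "_content"):
        value = fields.get(key)
        if isinstance(value, bytes):
            return value
    text = fields.get("text")
    if isinstance(text, str):
        return text.encode("utf-8")
    if fields.get("json") is not None:
        return json.dumps(fields["json"]).encode("utf-8")
    return None
```

Under these assumptions, a call that supplies only status_code=200 and text="success" would still yield a reason of "OK" and content of b"success".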

headers: MutableMapping[str, str]
is_response() bool[source]

Method for directly validating the fields that indicate that a response has been minimally recreated successfully. The fields that are validated include:

  1. status codes (should be an integer)

  2. URLs (should be a valid url)

  3. reasons (should originate from a reason attribute or inferred from the status code)

  4. content (should be a bytes field or encoded from a string text field)

  5. headers (should be a dictionary with string fields and preferably a content type)

Returns:

Indicates whether the current reconstructed response minimally recreates a response object.

Return type:

bool
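The five checks listed above can be approximated with a standalone function. This is only a sketch: looks_like_response is a hypothetical name, the URL test assumes a string with an http or https scheme, and the library's own validation may be stricter or more lenient.

```python
from typing import Any
from urllib.parse import urlparse


def looks_like_response(
    status_code: Any, reason: Any, headers: Any, content: Any, url: Any
) -> bool:
    """Hypothetical check mirroring the five validated fields."""
    parsed = urlparse(url) if isinstance(url, str) else None
    checks = [
        # 1. status code is an integer (bool is excluded on purpose)
        isinstance(status_code, int) and not isinstance(status_code, bool),
        # 2. URL has a web scheme and a host
        parsed is not None and parsed.scheme in ("http", "https") and bool(parsed.netloc),
        # 3. reason is a non-empty string
        isinstance(reason, str) and bool(reason),
        # 4. content is bytes
        isinstance(content, bytes),
        # 5. headers is a dictionary of string keys and values
        isinstance(headers, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in headers.items()),
    ]
    return all(checks)
```

A single failing field, such as string content that was never encoded to bytes, is enough for the check to return False.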

json() Dict[str, Any] | List[Any] | None[source]

Return JSON-decoded body from the underlying response, if available.

property ok: bool

Indicates whether the current response reflects a successful request (200 <= status_code < 400) or whether an invalid response has been received.

Returns:

True if the status code is an integer value within the range of 200 and 399, False otherwise

Return type:

bool

raise_for_status() None[source]

Method that imitates the capability of the requests and httpx response types to raise errors when encountering status codes that are indicative of failed responses.

As scholar_flux processes data that is generally only sent when status codes are within the 200s (or exactly 200 [ok]), an error is raised when encountering a value outside of this range.

Raises:
reason: str
property status: str | None

Helper property for retrieving a human-readable status description of the status.

Returns:

The status description associated with the response (if available)

Return type:

Optional[str]

status_code: int
property text: str | None

Helper property for retrieving the text from the bytes content as a string.

Returns:

The decoded text from the content of the response

Return type:

Optional[str]

url: Any
validate() None[source]

Raises an error if the recreated response object does not contain valid properties expected of a response. If validation succeeds, no error is raised and nothing is returned.

Raises:

InvalidResponseReconstructionException – if at least one field is determined to be invalid and unexpected of a true response object.

scholar_flux.api.models.response_types module

Helper module used to define response types returned by scholar-flux after API response retrieval and processing.

The APIResponseType is a union of different possible response types that can be received from a SearchCoordinator:
  • ProcessedResponse: A successfully processed response containing parsed response metadata, and processed records.

  • ErrorResponse: Indicates that an error has occurred during response retrieval and/or processing when unsuccessful.

  • NonResponse: ErrorResponse subclass indicating when an error prevents the successful retrieval of a response.
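Because NonResponse is a subclass of ErrorResponse, code that branches on the union should test the more specific type first. The sketch below uses hypothetical stand-in dataclasses with assumed attribute names (records, message) instead of the library's models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union


@dataclass
class SketchProcessedResponse:
    """Stand-in for a successfully processed response."""

    records: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class SketchErrorResponse:
    """Stand-in for a response that failed during retrieval or processing."""

    message: Optional[str] = None


@dataclass
class SketchNonResponse(SketchErrorResponse):
    """Stand-in for the case where no response was received at all."""


SketchAPIResponseType = Union[
    SketchProcessedResponse, SketchErrorResponse, SketchNonResponse
]


def summarize(result: SketchAPIResponseType) -> str:
    # Check the subclass first; the ErrorResponse branch would otherwise catch it.
    if isinstance(result, SketchNonResponse):
        return f"no response: {result.message}"
    if isinstance(result, SketchErrorResponse):
        return f"error: {result.message}"
    return f"{len(result.records)} records"
```

Reversing the first two isinstance checks would silently report every NonResponse as a generic error.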

scholar_flux.api.models.responses module

The scholar_flux.api.models.responses module contains the core response types used to indicate whether the retrieval and processing of API responses was successful or unsuccessful. Each class uses pydantic to ensure type-validated responses while ensuring flexibility in how responses can be used and applied.

Classes:
ProcessedResponse:

Indicates that an API response was successfully retrieved, parsed, and processed. This model is designed to facilitate the inspection of intermediate results and retrieval of extracted response records.

ErrorResponse:

Indicates that an error occurred somewhere in the retrieval or processing of an API response. This class is designed to allow inspection of error messages and failure results to aid in debugging in case of unexpected scenarios.

NonResponse:

Inherits from ErrorResponse and is designed to indicate that an error occurred in the preparation of a request or the sending/retrieval of a response.

class scholar_flux.api.models.responses.APIResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None)[source]

Bases: BaseModel

A response wrapper for responses of different types that allows consistency when using several possible backends. The purpose of this class is to serve as the base for managing responses received from scholarly APIs while processing each component in a predictable, reproducible manner.

This class uses pydantic’s data validation and serialization/deserialization methods to aid caching and includes properties that refer back to the original response for displaying valid response codes, URLs, etc.

All processing and error-based response classes inherit from and build on this class.

Parameters:
  • cache_key (Optional[str]) – A string for recording cache keys for use in later steps of the response orchestration involving processing, cache storage, and cache retrieval

  • response (Any) – A response or response-like object to be validated and used/re-used in later caching and response processing/orchestration steps.

  • created_at (Optional[str]) – A value indicating the time in which a response or response-like object was created.

Example

>>> from scholar_flux.api import APIResponse
# Using keyword arguments to build a basic APIResponse data container:
>>> response = APIResponse.from_response(
...     cache_key = 'test-response',
...     status_code = 200,
...     content=b'success',
...     url='https://example.com',
...     headers={'Content-Type': 'application/text'}
... )
>>> response
# OUTPUT: APIResponse(cache_key='test-response', response = ReconstructedResponse(
#    status_code=200, reason='OK', headers={'Content-Type': 'application/text'},
#    text='success', url='https://example.com'
# ))
# The assertions below pass silently when the response was built as expected
>>> assert response.status == 'OK' and response.text == 'success' and response.url == 'https://example.com'
>>> assert response.validate_response()
classmethod as_reconstructed_response(response: Any) ReconstructedResponse[source]

Classmethod designed to create a reconstructed response from an original response object. This method coerces response attributes into a reconstructed response that retains the original content, status code, headers, URL, reason, etc.

Returns:

A minimal response object that contains the core attributes needed to support other processes in the scholar_flux module such as response parsing and caching.

Return type:

ReconstructedResponse

cache_key: str | None
property content: bytes | None

Return content from the underlying response, if available and valid.

Returns:

The bytes from the original response content

Return type:

Optional[bytes]

created_at: str | None
encode_response(response: Any) Dict[str, Any] | List[Any] | None[source]

Helper method for serializing a response into a JSON-compatible format. Accounts for special cases such as CaseInsensitiveDict fields that are otherwise unserializable.

From this step, pydantic can safely use json internally to dump the encoded response fields.
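The general idea can be sketched with the standard library alone. The helper below is hypothetical (`encode_field` is not part of scholar_flux) and uses `MappingProxyType` as a stand-in for a header mapping that `json.dumps` cannot serialize directly; decoding bytes as UTF-8 is also an assumption of this sketch.

```python
import json
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


def encode_field(value: Any) -> Any:
    """Hypothetical encoder: convert values json.dumps cannot handle into plain types."""
    if isinstance(value, bytes):
        # Assumption: content is UTF-8; undecodable bytes are replaced, not raised
        return value.decode("utf-8", errors="replace")
    if isinstance(value, Mapping):
        # Covers dict-like header containers that are not plain dicts
        return {str(key): encode_field(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode_field(item) for item in value]
    return value


# A read-only mapping stands in for a non-dict header container
headers = MappingProxyType({"Content-Type": "application/text"})
encoded = encode_field({"status_code": 200, "headers": headers, "content": b"success"})
dumped = json.dumps(encoded)
```

Once every field is reduced to plain dictionaries, lists, strings, and numbers, the result can be dumped and reloaded without custom encoders.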

classmethod from_response(response: Any | None = None, cache_key: str | None = None, auto_created_at: bool | None = None, **kwargs) Self[source]

Construct an APIResponse from a response object or from keyword arguments.

If response is not a valid response object, builds a minimal response-like object from kwargs.

classmethod from_serialized_response(response: Any | None = None, **kwargs) ReconstructedResponse | None[source]

Helper method for creating a new APIResponse from the original dumped object. Because responses are not easily serialized, this method decodes the response dictionary that was loaded from a string using json.loads from the json module in the standard library.

If the response input is still a serialized string, this method will manually load the response dict with the APIResponse._deserialize_response_dict class method before further processing.

Parameters:

response (Any) – A prospective response value to load into the API Response.

Returns:

A reconstructed response object, if possible. Otherwise returns None

Return type:

Optional[ReconstructedResponse]

property headers: MutableMapping[str, str] | None

Return headers from the underlying response, if available and valid.

Returns:

A dictionary of headers from the response

Return type:

MutableMapping[str, str]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

raise_for_status()[source]

Uses an underlying response object to validate the status code associated with the request.

If the attribute isn’t a response or reconstructed response, the code will coerce the class into a response object to verify the status code for the request URL and response.

property reason: str | None

Uses the underlying reason attribute on the response object, if available, to create a human readable status description.

Returns:

The status description associated with the response.

Return type:

Optional[str]

response: Any | None
classmethod serialize_response(response: Response | ResponseProtocol) str | None[source]

Helper method for serializing a response into a json format. The response object is first converted into a serialized string and subsequently dumped after ensuring that the field is serializable.

Parameters:

response (Response, ResponseProtocol)

property status: str | None

Helper property for retrieving a human-readable status description from the APIResponse.

Returns:

The status description associated with the response (if available).

Return type:

Optional[str]

property status_code: int | None

Helper property for retrieving a status code from the APIResponse.

Returns:

The status code associated with the response (if available)

Return type:

Optional[int]

property text: str | None

Attempts to retrieve the response text by first decoding the bytes of its content. If the content is not available, this property falls back to referencing the text attribute directly.

Returns:

A text string if the text is available in the correct format, otherwise None

Return type:

Optional[str]
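The fallback order described above can be illustrated with a small standalone sketch. The function name `response_text` is hypothetical, and decoding as UTF-8 is an assumption; the property in scholar_flux may choose the encoding differently.

```python
from types import SimpleNamespace
from typing import Any, Optional


def response_text(response: Any) -> Optional[str]:
    """Hypothetical helper: prefer decoded content, then fall back to a text attribute."""
    content = getattr(response, "content", None)
    if isinstance(content, bytes):
        # Assumption: UTF-8 content; undecodable bytes are replaced rather than raised
        return content.decode("utf-8", errors="replace")
    text = getattr(response, "text", None)
    return text if isinstance(text, str) else None


from_bytes = response_text(SimpleNamespace(content=b"success"))
from_text = response_text(SimpleNamespace(content=None, text="fallback"))
missing = response_text(object())
```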

classmethod transform_response(v: Any) Response | ResponseProtocol | None[source]

Attempts to resolve a response object as an original or ReconstructedResponse: All original response objects (duck-typed or requests response) with valid values will be returned as is.

If the passed object is a string, this method will attempt to deserialize it and then parse the result as a dictionary.

Dictionary fields will be decoded, if originally encoded, and parsed as a ReconstructedResponse object, if possible.

Otherwise, the original object is returned as is.

property url: str | None

Return URL from the underlying response, if available and valid.

Returns:

A string of the original URL if available. Accounts for objects that indicate the original URL when converted to a string.

Return type:

Optional[str]

classmethod validate_iso_timestamp(v: str | datetime | None) str | None[source]

Helper method for validating that the timestamp follows the ISO 8601 format.
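As a rough illustration of what this kind of validation involves, the sketch below normalizes a str, datetime, or None into an ISO 8601 string using only the standard library. The function name `to_iso_timestamp` is hypothetical and is not the library's validator, which may accept or reject different inputs.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def to_iso_timestamp(value: Union[str, datetime, None]) -> Optional[str]:
    """Hypothetical helper: return an ISO 8601 string, or None when no value is given."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, str):
        # datetime.fromisoformat raises ValueError for strings it cannot parse
        datetime.fromisoformat(value)
        return value
    raise TypeError(f"Expected str, datetime, or None; got {type(value).__name__}")


stamp = to_iso_timestamp(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```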

validate_response() bool[source]

Helper method for determining whether the response attribute is truly a response. If the response isn’t a requests response, duck-typing is used to determine whether the response attribute has the expected attributes of a response. Each property is checked for a value of the expected type and yields None when the attribute isn’t the expected type.

Returns:

An indicator of whether the current APIResponse.response attribute is actually a response

Return type:

bool
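A duck-typing check of this kind might look like the following sketch. The function name `looks_like_response` and the particular set of attributes checked are assumptions for illustration; the attributes that scholar_flux actually inspects are not listed in this section.

```python
from types import SimpleNamespace
from typing import Any


def looks_like_response(obj: Any) -> bool:
    """Hypothetical duck-typing check: True when `obj` exposes typical response attributes."""
    expected_types = {
        "status_code": int,
        "url": str,
        "content": bytes,
    }
    # A missing attribute resolves to None, which fails every isinstance check
    return all(
        isinstance(getattr(obj, name, None), expected)
        for name, expected in expected_types.items()
    )


fake_response = SimpleNamespace(status_code=200, url="https://example.com", content=b"success")
bad_response = SimpleNamespace(status_code="200", url="https://example.com", content=b"")
```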

class scholar_flux.api.models.responses.ErrorResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]

Bases: APIResponse

Returned when something goes wrong but raising an exception immediately is undesirable; the failure details are handed back instead.

The class is formatted for compatibility with the ProcessedResponse.

cache_key: str | None
created_at: str | None
property data: None

Provided for type hinting + compatibility.

error: str | None
property extracted_records: None

Provided for type hinting + compatibility.

classmethod from_error(message: str, error: Exception, cache_key: str | None = None, response: Response | ResponseProtocol | None = None) Self[source]

Creates and logs the processing error if one occurs during response processing.

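Parameters:
  • message (str) – A message describing the error that occurred.

  • error (Exception) – The exception raised during response retrieval or processing.

  • cache_key (Optional[str]) – Cache key for storing results.

  • response (Optional[Response | ResponseProtocol]) – Raw API response, if one was received.

Returns:

An ErrorResponse that contains the error response data and background information on what precipitated the error.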

Return type:

ErrorResponse

message: str | None
property metadata: None

Provided for type hinting + compatibility.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property parsed_response: None

Provided for type hinting + compatibility.

property processed_records: None

Provided for type hinting + compatibility.

response: Any | None
class scholar_flux.api.models.responses.NonResponse(*, cache_key: str | None = None, response: None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]

Bases: ErrorResponse

Response class used to indicate that an error occurred in the preparation of a request or in the retrieval of a response object from an API.

This class is used to signify the error that occurred within the search process using an interface similar to that of the other scholar_flux response classes.

cache_key: str | None
created_at: str | None
error: str | None
message: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

response: None
class scholar_flux.api.models.responses.ProcessedResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, parsed_response: Any | None = None, extracted_records: List[Any] | None = None, processed_records: List[Dict[Any, Any]] | None = None, metadata: Any | None = None, message: str | None = None)[source]

Bases: APIResponse

Helper class for returning a ProcessedResponse object that contains information on the original, cached, or reconstructed_response received and processed after retrieval from an API in addition to the cache key. This object also allows storage of intermediate steps including:

  1. parsed responses

  2. extracted records and metadata

  3. processed records (aliased as data)

  4. any additional messages

An error field is provided for compatibility with the ErrorResponse class.

cache_key: str | None
created_at: str | None
property data: List[Dict[Any, Any]] | None

Alias to the processed_records attribute that holds a list of dictionaries, when available.

property error: None

Provided for type hinting + compatibility.

extracted_records: List[Any] | None
message: str | None
metadata: Any | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

parsed_response: Any | None
processed_records: List[Dict[Any, Any]] | None
response: Any | None

scholar_flux.api.models.search_api_config module

The scholar_flux.api.models.search_api_config module implements the core SearchAPIConfig used to drive API searches.

The SearchAPIConfig is used by the SearchAPI to interact with API providers via a unified interface for orchestrating response retrieval.

This configuration defines settings such as rate limiting, the number of records retrieved per request, API keys, and the API provider/URL where requests will be sent.

Under the hood, the SearchAPIConfig can use both pre-created and custom defaults to create a new configuration with minimal code.

class scholar_flux.api.models.search_api_config.SearchAPIConfig(*, provider_name: str = '', base_url: str = '', records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: float = -1, api_key: SecretStr | None = None, api_specific_parameters: dict[str, Any] | None = None)[source]

Bases: BaseModel

The SearchAPIConfig class provides the core tools necessary to set and interact with the API. The SearchAPI uses this class to retrieve data from an API using universal parameters to simplify the process of retrieving raw responses.

provider_name

Indicates the name of the API to use when making requests to a provider. If the provider name matches a known default and the base_url is unspecified, the base URL for the current provider is used instead.

Type:

str

base_url

Indicates the API URL where data will be searched and retrieved.

Type:

str

records_per_page

Controls the number of records that will appear on each page

Type:

int

request_delay

Indicates the minimum delay between each request to avoid exceeding API rate limits

Type:

float

api_key

This is an API-specific parameter for validating the current user’s identity. If a str type is provided, it is converted into a SecretStr.

Type:

Optional[str | SecretStr]

api_specific_parameters

A dictionary containing all parameters specific to the current API. API-specific parameters include the following.

  1. mailto (Optional[str | SecretStr]):

    An optional email address for receiving feedback on usage from providers. This parameter is currently applicable only to the Crossref API.

  2. db (str):

    The parameter used by the NIH to direct requests for data to the pubmed database. This parameter defaults to pubmed and does not require direct specification.

Type:

dict[str, APISpecificParameter]

Examples

>>> from scholar_flux.api import SearchAPIConfig, SearchAPI, provider_registry
# to create a CROSSREF configuration with minimal defaults and provide an api_specific_parameter:
>>> config = SearchAPIConfig.from_defaults(provider_name = 'crossref', mailto = 'your_email_here@example.com')
# the configuration automatically retrieves the configuration for the "Crossref" API
>>> assert config.provider_name == 'crossref' and config.base_url == provider_registry['crossref'].base_url
>>> api = SearchAPI.from_settings(query = 'q', config = config)
>>> assert api.config == config
# to retrieve all defaults associated with a provider and automatically read an API key if needed
>>> config = SearchAPIConfig.from_defaults(provider_name = 'pubmed', api_key = 'your api key goes here')
# the API key is retrieved automatically if you have the API key specified as an environment variable
>>> assert config.api_key is not None
# Default provider API specifications are already pre-populated if they are set with defaults
>>> assert config.api_specific_parameters['db'] == 'pubmed'  # required by pubmed and defaults to pubmed
# Update a provider and automatically retrieve its API key - the previous API key will no longer apply
>>> updated_config = SearchAPIConfig.update(config, provider_name = 'core')
# The API key should have been overwritten to use core. Looks for a `CORE_API_KEY` env variable by default
>>> assert updated_config.provider_name  == 'core' and  updated_config.api_key != config.api_key
DEFAULT_PROVIDER: ClassVar[str] = 'PLOS'
DEFAULT_RECORDS_PER_PAGE: ClassVar[int] = 25
DEFAULT_REQUEST_DELAY: ClassVar[float] = 6.1
MAX_API_KEY_LENGTH: ClassVar[int] = 512
api_key: SecretStr | None
api_specific_parameters: dict[str, Any] | None
base_url: str
classmethod default_request_delay(v: int | float | None, provider_name: str | None = None) float[source]

Helper method enabling the retrieval of the most appropriate rate limit for the current provider.

Defaults to the SearchAPIConfig default rate limit when the current provider is unknown and a valid rate limit has not yet been provided.

Parameters:
  • v (Optional[int | float]) – The value received for the current request_delay

  • provider_name (Optional[str]) – The name of the provider to retrieve a rate limit for

Returns:

The inputted non-negative request delay, the retrieved rate limit for the current provider if available, or the SearchAPIConfig.DEFAULT_REQUEST_DELAY, in that order of priority.

Return type:

float
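The priority order described above can be sketched as a small standalone function. `resolve_request_delay` and the `PROVIDER_DELAYS` table are hypothetical names used only for illustration; in scholar_flux the provider-specific values would come from the provider configuration rather than a hard-coded dictionary.

```python
from typing import Optional, Union

DEFAULT_REQUEST_DELAY = 6.1  # mirrors SearchAPIConfig.DEFAULT_REQUEST_DELAY

# Hypothetical per-provider delays, used here only to make the sketch runnable
PROVIDER_DELAYS = {"example_provider": 1.0}


def resolve_request_delay(
    value: Union[int, float, None], provider_name: Optional[str] = None
) -> float:
    """Pick the first available delay: user input, provider default, class default."""
    if value is not None and value >= 0:
        return float(value)
    if provider_name is not None and provider_name in PROVIDER_DELAYS:
        return PROVIDER_DELAYS[provider_name]
    return DEFAULT_REQUEST_DELAY


explicit = resolve_request_delay(2)
from_provider = resolve_request_delay(-1, "example_provider")
fallback = resolve_request_delay(None, "unknown_provider")
```

A negative or missing value acts as a sentinel meaning "not provided", which is why -1 falls through to the provider or class default.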

classmethod from_defaults(provider_name: str, **overrides) SearchAPIConfig[source]

Uses the default configuration for the chosen provider to create a SearchAPIConfig object containing configuration parameters. Note that additional parameters and field overrides can be added via the **overrides field.

Parameters:
  • provider_name (str) – The name of the provider to create the config

  • **overrides – Optional keyword arguments to specify overrides and additional arguments

Returns:

A default SearchAPIConfig object based on the chosen parameters

Return type:

SearchAPIConfig

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

provider_name: str
records_per_page: int
request_delay: float
classmethod set_records_per_page(v: int | None)[source]

Sets the records_per_page parameter with the default if the supplied value is not valid.

Triggers a validation error when records_per_page is an invalid type. Otherwise uses the DEFAULT_RECORDS_PER_PAGE class attribute if the supplied value is missing or is a negative number.

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method for retrieving a string representation of the overall structure of the current SearchAPIConfig.

classmethod update(current_config: SearchAPIConfig, **overrides) SearchAPIConfig[source]

Create a new SearchAPIConfig by updating an existing config with new values and/or switching to a different provider. This method ensures that the new provider’s base_url and defaults are used if provider_name is given, and that API-specific parameters are prioritized and merged as expected.

Parameters:
  • current_config (SearchAPIConfig) – The existing configuration to update.

  • **overrides – Any fields or API-specific parameters to override or add.

Returns:

A new config with the merged and prioritized values.

Return type:

SearchAPIConfig

property url_basename: str

Returns the basename of the provider URL associated with the current config instance, extracted with the _extract_url_basename method.

classmethod validate_api_key(v: SecretStr | str | None) SecretStr | None[source]

Validates the api_key attribute and triggers a validation error if it is not valid.

classmethod validate_provider_name(v: str | None) str[source]

Validates the provider_name attribute and triggers a validation error if it is not valid.

classmethod validate_request_delay(v: int | float | None) int | float | None[source]

Sets the request delay (delay between each request) for valid request delays. This validator triggers a validation error when the request delay is an invalid type.

If a request delay is left None or is a negative number, this class method returns -1, and further validation is performed by cls.default_request_delay to retrieve the provider’s default request delay.

If not available, SearchAPIConfig.DEFAULT_REQUEST_DELAY is used.

validate_search_api_config_parameters() Self[source]

Validation method that resolves URLs and/or provider names to provider_info when one or the other is not explicitly provided.

Occurs as the last step in the validation process.

classmethod validate_url(v: str)[source]

Validates the base_url and triggers a validation error if it is not valid.

classmethod validate_url_type(v: str | None) str[source]

Validates the type for the base_url attribute and triggers a validation error if it is not valid.

scholar_flux.api.models.search_inputs module

The scholar_flux.api.models.search_inputs module implements the PageListInput RootModel for multi-page searches.

The PageListInput model is designed to validate and prepare lists and iterables of page numbers for multi-page retrieval using the SearchCoordinator.search_pages method.

class scholar_flux.api.models.search_inputs.PageListInput(root: RootModelRootType = PydanticUndefined)[source]

Bases: RootModel[Sequence[int]]

Helper class for processing page information in a predictable manner. The PageListInput class expects to receive a list, string, or generator that contains at least one page number. If a singular integer is received, the result is transformed into a single-item list containing that integer.

Parameters:

root (Sequence[int]) – A list containing at least one page number.

Examples

>>> from scholar_flux.api.models import PageListInput
>>> PageListInput(5)
PageListInput([5])
>>> PageListInput(range(5))
PageListInput([1, 2, 3, 4])
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property page_numbers: Sequence[int]

Returns the sequence of validated page numbers as a list.

classmethod page_validation(v: str | int | Sequence[int | str]) Sequence[int][source]

Processes the page input to ensure that a list of integers is returned if the received page list is in a valid format.

Parameters:

v (str | int | Sequence[int | str]) – A page or sequence of pages to be formatted as a list of pages.

Returns:

A validated, formatted sequence of page numbers assuming successful page validation

Return type:

Sequence[int]

Raises:

ValidationError – Internally raised via pydantic if a ValueError is encountered (if the input is not exclusively a page or list of page numbers)

classmethod process_page(page_value: str | int) int[source]

Helper method for ensuring that each value in the sequence is a numeric string or whole number.

Note that this function will not throw an error for negative pages as that is handled at a later step in the page search process.

Parameters:

page_value (str | int) – The value to be converted if it is not already an integer

Returns:

A validated integer if the page can be converted to an integer and is not a float

Return type:

int

Raises:

ValueError – When the value is not an integer or numeric string to be converted to an integer
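The validation steps for page inputs can be approximated with plain Python. The functions below are a hypothetical re-implementation for illustration (`normalize_pages` is not a scholar_flux name), and the real validators run inside pydantic, which wraps the ValueError in a ValidationError.

```python
from typing import Iterable, List, Union


def process_page(page_value: Union[str, int]) -> int:
    """Accept whole numbers and numeric strings only; floats and other types are rejected."""
    if isinstance(page_value, bool) or not isinstance(page_value, (str, int)):
        raise ValueError(f"Expected an integer or numeric string, got {page_value!r}")
    # int() raises ValueError for strings such as "1.5" or "abc"
    return int(page_value)


def normalize_pages(pages: Union[str, int, Iterable[Union[str, int]]]) -> List[int]:
    """Wrap a single page in a list, then convert every element to an integer."""
    if isinstance(pages, (str, int)):
        pages = [pages]
    return [process_page(page) for page in pages]


single = normalize_pages(5)
mixed = normalize_pages(["1", 2, "3"])
from_range = normalize_pages(range(1, 4))
```

As in the description above, negative values pass through this step unchanged; rejecting them is left to a later stage.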

scholar_flux.api.models.search_results module

The scholar_flux.api.models.search_results module defines the SearchResult and SearchResultList implementations that aid in the retrieval of multi-page and multi-provider searches.

These implementations allow increased organization for the API output of multiple searches by defining the provider, page, query, and response result retrieved from multi-page searches from the SearchCoordinator and multi-provider/page searches using the MultiSearchCoordinator.

Classes:
SearchResult:

Pydantic Base class that stores the search result as well as the query, provider name, and page.

SearchResultList:

Inherits from a basic list to constrain the output to a list of SearchResults while providing data preparation convenience functions for downstream frameworks.

class scholar_flux.api.models.search_results.SearchResult(*, query: str, provider_name: str, page: int, response_result: ProcessedResponse | ErrorResponse | None = None)[source]

Bases: BaseModel

Core class used to store data in the retrieval and processing of API searches when iterating and searching over a range of pages, queries, and providers at a time. This class uses pydantic to validate fields automatically, ensuring the integrity and reliability of response processing in multi-page searches that link each response result to a particular query, page, and provider.

Parameters:
  • query (str) – The query used to retrieve records and response metadata

  • provider_name (str) – The name of the provider where data is being retrieved

  • page (int) – The page number associated with the request for data

  • response_result (Optional[ProcessedResponse | ErrorResponse]) – The response result containing the specifics of the data retrieved from the response or the error messages recorded if the request is not successful.

For convenience, the properties of the response_result are referenced as properties of the SearchResult, including: response, parsed_response, processed_records, etc.

property cache_key: str | None

Extracts the cache key from the API Response if available.

This cache key is used when storing and retrieving data from response processing cache storage.

property created_at: str | None

Extracts the time in which the ErrorResponse or ProcessedResponse was created, if available.

property data: list[dict[Any, Any]] | None

Alias referring back to the processed records from the ProcessedResponse or ErrorResponse.

Contains the processed records from the APIResponse processing step after a successfully received response has been processed. If an error response was received instead, the value of this property is None.

property error: str | None

Extracts the error name associated with the result from the base class, indicating the name/category of the error in the event that the response_result is an ErrorResponse.

property extracted_records: list[Any] | None

Contains the extracted records from the APIResponse handling steps that extract individual records from successfully received and parsed response.

If an ErrorResponse was received instead, the value of this property is None.

property message: str | None

Extracts the message associated with the result from the base class, indicating why an error occurred in the event that the response_result is an ErrorResponse.

property metadata: Any | None

Contains the metadata from the APIResponse handling steps that extract response metadata from successfully received and parsed responses.

If an ErrorResponse was received instead, the value of this property is None.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

page: int
property parsed_response: Any | None

Contains the parsed response content from the APIResponse handling steps that extract the JSON, XML, or YAML content from a successfully received response.

If an ErrorResponse was received instead, the value of this property is None.

property processed_records: list[dict[Any, Any]] | None

Contains the processed records from the APIResponse processing step after a successfully received response has been processed.

If an error response was received instead, the value of this property is None.

provider_name: str
query: str
property response: Response | ResponseProtocol | None

Helper property directly referencing the original or reconstructed response or response-like object from the API Response if available.

If the received response is not available (None in the response_result), then this value will also be absent (None).

response_result: ProcessedResponse | ErrorResponse | None
class scholar_flux.api.models.search_results.SearchResultList(iterable=(), /)[source]

Bases: list[SearchResult]

A helper class used to store multiple SearchResult instances with enhanced type safety. This class inherits from list and tailors its functionality to the APIResponses received from SearchCoordinators and MultiSearchCoordinators.

- SearchResultList.append

Basic list.append implementation extended to accept only SearchResults

- SearchResultList.extend

Basic list.extend implementation extended to accept only iterables of SearchResults

- SearchResultList.filter

Removes NonResponses and ErrorResponses from the list of SearchResults

- SearchResultList.join

Combines all records from ProcessedResponses into a list of dictionary-based records

Note: Attempts to add objects other than SearchResults to the SearchResultList will raise a TypeError.
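The type restriction can be pictured as a list subclass that checks each item before storing it. The classes below are hypothetical stand-ins (`Result` for SearchResult, `ResultList` for SearchResultList) and only sketch the pattern; the library's own checks may differ in detail.

```python
from typing import Iterable


class Result:
    """Hypothetical stand-in for SearchResult."""

    def __init__(self, page: int) -> None:
        self.page = page


class ResultList(list):
    """List that only accepts Result instances, in the spirit of SearchResultList."""

    def append(self, item: Result) -> None:
        if not isinstance(item, Result):
            raise TypeError(f"Expected a Result, got {type(item).__name__}")
        super().append(item)

    def extend(self, other: Iterable[Result]) -> None:
        # Materialize first so that one invalid item leaves the list unchanged
        items = list(other)
        for item in items:
            if not isinstance(item, Result):
                raise TypeError(f"Expected a Result, got {type(item).__name__}")
        super().extend(items)


results = ResultList()
results.append(Result(page=1))
results.extend([Result(page=2), Result(page=3)])
```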

append(item: SearchResult)[source]

Overrides the default list.append method to ensure that only SearchResult objects can be appended to the custom list.

Parameters:

item (SearchResult) – The response result containing the API response data, the provider name, and page associated with the response.

extend(other: SearchResultList | MutableSequence[SearchResult] | Iterable[SearchResult])[source]

Overrides the default list.extend method to ensure that only an iterable of SearchResult objects can be added to the SearchResultList.

Parameters:
  • other (Iterable[SearchResult]) – An iterable/sequence of response results containing the API response data, the provider name, and the page associated with the response.

filter() SearchResultList[source]

Helper method that retains only elements from the original response that indicate successful processing.

join() list[dict[str, Any]][source]

Helper method for joining all successfully processed API responses into a single list of dictionaries that can be loaded into a pandas or polars dataframe.

Note that this method will only load processed responses that contain records that were also successfully extracted and processed.

Returns:

A single list containing all records retrieved from each page

Return type:

list[dict[str, Any]]
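A minimal sketch of the filter-then-join pattern follows. `PageResult`, `keep_successful`, and `join_records` are hypothetical names, and treating `data is None` as the marker of an unsuccessful page is an assumption of this sketch rather than the library's exact test.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PageResult:
    """Hypothetical stand-in for a SearchResult; `data` is None when processing failed."""

    page: int
    data: Optional[List[Dict[str, Any]]] = None


def keep_successful(results: List[PageResult]) -> List[PageResult]:
    """Retain only the results that carry processed records."""
    return [result for result in results if result.data is not None]


def join_records(results: List[PageResult]) -> List[Dict[str, Any]]:
    """Flatten the records of every successful page into one list."""
    return [record for result in keep_successful(results) for record in result.data]


pages = [
    PageResult(1, [{"title": "A"}]),
    PageResult(2, None),  # e.g. an error occurred on this page
    PageResult(3, [{"title": "B"}, {"title": "C"}]),
]
records = join_records(pages)
```

A flat list of dictionaries like `records` is the shape that dataframe constructors in pandas or polars typically accept.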

Module contents

The scholar_flux.api.models module includes the configuration classes needed to configure APIs for specific providers and to ensure that the process is orchestrated in a robust way.

Core Models:
  • APIParameterMap: Contains the mappings and settings used to customize common and API-specific parameters to the requirements of each API.

  • APIParameterConfig: Encapsulates the created APIParameterMap as well as the methods used to create each request.

  • SearchAPIConfig: Defines the core logic to abstract the creation of requests with parameters specific to each API.

  • ProviderConfig: Allows users to define each of the defaults and mappings settings needed to create a Search API.

  • ProviderRegistry: A customized dictionary mapping provider names to their dynamically retrieved configuration.

  • ProcessedResponse: Indicates a successfully retrieved and processed response from an API provider.

  • ErrorResponse: Indicates that an exception occurred somewhere in the process of response retrieval and processing.

  • NonResponse: Indicates that a response of any status code could not be retrieved due to an exception.

class scholar_flux.api.models.APIParameterConfig(parameter_map: APIParameterMap)[source]

Bases: object

Uses an APIParameterMap instance and runtime parameter values to build parameter dictionaries for API requests.

Parameters:

parameter_map (APIParameterMap) – The mapping of universal to API-specific parameter names.

Class Attributes:
DEFAULT_CORRECT_ZERO_INDEX (bool):

Autocorrects zero-indexed API parameter building specifications to only accept positive values when True. If otherwise False, page calculation APIs will start from page 0 if zero-indexed (e.g., arXiv).

Examples

>>> from scholar_flux.api import APIParameterConfig, APIParameterMap
>>> # the API parameter map is defined and used to resolve parameters to the API's language
>>> api_parameter_map = APIParameterMap(
... query='q', records_per_page = 'pagesize', start = 'page', auto_calculate_page = False
... )
# The APIParameterConfig defines class and settings that indicate how to create requests
>>> api_parameter_config = APIParameterConfig(api_parameter_map, auto_calculate_page = False)
# Builds parameters using the specification from the APIParameterMap
>>> page = api_parameter_config.build_parameters(query= 'ml', page = 10, records_per_page=50)
>>> print(page)
# OUTPUT {'q': 'ml', 'page': 10, 'pagesize': 50}
DEFAULT_CORRECT_ZERO_INDEX: ClassVar[bool] = True
__init__(*args: Any, **kwargs: Any) None
classmethod as_config(parameter_map: dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) APIParameterConfig[source]

Factory method for creating a new APIParameterConfig from a dictionary or APIParameterMap.

This helper class method resolves the structure of the APIParameterConfig against its basic building blocks to create a new configuration when possible.

Parameters:

parameter_map (dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) – A parameter mapping/config to use in the instantiation of an APIParameterConfig.

Returns:

A new APIParameterConfig created from the inputs

Return type:

APIParameterConfig

Raises:

APIParameterException – If there is an error in the creation/resolution of the required parameters

build_parameters(query: str | None, page: int | None, records_per_page: int, **api_specific_parameters) Dict[str, Any][source]

Builds the dictionary of request parameters using the current parameter map and provided values at runtime.

Parameters:
  • query (Optional[str]) – The search query string.

  • page (Optional[int]) – The page number for pagination (1-based).

  • records_per_page (int) – Number of records to fetch per page.

  • **api_specific_parameters – Additional API-specific parameters to include.

Returns:

The fully constructed API request parameters dictionary, with keys as API-specific parameter names and values as provided.

Return type:

Dict[str, Any]
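
As a rough illustration of the renaming that build_parameters performs, the sketch below maps universal parameter names to provider-specific ones with a plain dictionary. It does not call scholar_flux: the function build_parameters_sketch, the mapping dictionary, and the start-index formula are assumptions made for illustration, and the library's own page calculation may differ.

```python
from typing import Any, Dict, Optional


def build_parameters_sketch(
    mapping: Dict[str, str],
    query: Optional[str],
    page: Optional[int],
    records_per_page: int,
    auto_calculate_page: bool = False,
    **api_specific_parameters: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in that renames universal parameters for one provider."""
    start = page
    if auto_calculate_page and page is not None:
        # Assumed formula: convert a 1-based page number into a 1-based record offset.
        start = (page - 1) * records_per_page + 1

    universal = {"query": query, "start": start, "records_per_page": records_per_page}
    parameters = {
        mapping[name]: value
        for name, value in universal.items()
        if name in mapping and value is not None
    }
    # Extra provider-specific parameters are passed through unchanged in this sketch.
    parameters.update(api_specific_parameters)
    return parameters


mapping = {"query": "q", "start": "page", "records_per_page": "pagesize"}
params = build_parameters_sketch(mapping, query="ml", page=10, records_per_page=50)
print(params)  # {'q': 'ml', 'page': 10, 'pagesize': 50}
```

With auto_calculate_page=True, the same call would send a record offset instead of the page number under the mapped start parameter.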

classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterConfig[source]

Factory method to create APIParameterConfig instances with sensible defaults for known APIs.

If the provider_name does not exist, the code will raise an exception.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • api_key (Optional[str]) – API key value if required.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter config instance for the specified API.

Return type:

APIParameterConfig

Raises:

NotImplementedError – If the API name is unknown.

classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterConfig | None[source]

Factory method to create APIParameterConfig instances with sensible defaults for known APIs.

Avoids throwing an error if the provider name does not already exist.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter config instance for the specified API. Returns None if no mapping is found for the provider_name.

Return type:

Optional[APIParameterConfig]
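
The difference between from_defaults (raises for unknown providers) and get_defaults (returns None) can be sketched with a plain dictionary lookup. The registry contents and the function names below are hypothetical and stand in for the library's provider registry.

```python
from typing import Dict, Optional

# Hypothetical registry of known provider defaults (not the library's real data).
KNOWN_DEFAULTS: Dict[str, Dict[str, str]] = {
    "example": {"query": "q", "start": "offset", "records_per_page": "limit"},
}


def get_defaults_sketch(provider_name: str) -> Optional[Dict[str, str]]:
    """Return the defaults for a provider, or None when the provider is unknown."""
    return KNOWN_DEFAULTS.get(provider_name.lower())


def from_defaults_sketch(provider_name: str) -> Dict[str, str]:
    """Return the defaults for a provider, raising when the provider is unknown."""
    defaults = get_defaults_sketch(provider_name)
    if defaults is None:
        raise NotImplementedError(f"Unknown provider: {provider_name!r}")
    return defaults
```

The None-returning variant suits optional lookups, while the raising variant is the safer choice when a missing provider should stop the program early.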

property map: APIParameterMap

Helper property that is an alias for the APIParameterMap attribute.

The APIParameterMap maps all universal parameters to the parameter names specific to the API provider.

Returns:

The mapping that the current APIParameterConfig will use to build a dictionary of parameter requests specific to the current API.

Return type:

APIParameterMap

parameter_map: APIParameterMap
show_parameters() list[source]

Helper method to show the complete list of all parameters that can be found in the current_mappings.

Returns:

The complete list of all universal and API-specific parameters corresponding to the current API.

Return type:

List

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the APIParameterConfig.

class scholar_flux.api.models.APIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]

Bases: BaseAPIParameterMap

Extends BaseAPIParameterMap by adding validation and the optional retrieval of provider defaults for known APIs.

This class also specifies default mappings for specific attributes such as API keys and additional parameter names.

query

The API-specific parameter name for the search query.

Type:

str

start

The API-specific parameter name for pagination (start index or page number).

Type:

Optional[str]

records_per_page

The API-specific parameter name for records per page.

Type:

str

api_key_parameter

The API-specific parameter name for the API key.

Type:

Optional[str]

api_key_required

Indicates whether an API key is required.

Type:

bool

auto_calculate_page

If True, calculates start index from page; if False, passes page number directly.

Type:

bool

zero_indexed_pagination

If True, treats 0 as an allowed page value when retrieving data from APIs.

Type:

bool

api_specific_parameters

Additional universal to API-specific parameter mappings.

Type:

Dict[str, APISpecificParameter]

api_key_parameter: str | None
api_key_required: bool
api_specific_parameters: Dict[str, APISpecificParameter]
auto_calculate_page: bool
classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterMap[source]

Factory method that uses the APIParameterMap.get_defaults classmethod to retrieve the provider config.

Raises an error if the provider does not exist.

Parameters:
  • provider_name (str) – The name of the API to create the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter map for the specified API.

Return type:

APIParameterMap

Raises:

NotImplementedError – If the API name is unknown.

classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterMap | None[source]

Factory method to create APIParameterMap instances with sensible defaults for known APIs.

This class method attempts to pull from the list of known providers defined in the scholar_flux.api.providers.provider_registry and returns None if an APIParameterMap for the provider cannot be found.

Using the additional_parameters keyword arguments, users can specify optional overrides for specific parameters if needed. This is helpful in circumstances where an API’s specification overlaps with that of a known provider.

Valid providers (as indicated in provider_registry) include:

  • springernature

  • plos

  • arxiv

  • openalex

  • core

  • crossref

Parameters:
  • provider_name (str) – The name of the API provider to retrieve the parameter map for.

  • additional_parameters (dict) – Additional parameter mappings.

Returns:

Configured parameter map for the specified API.

Return type:

Optional[APIParameterMap]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
records_per_page: str
classmethod set_default_api_key_parameter(values: dict[str, Any]) dict[str, Any][source]

Sets the default for the API key parameter when api_key_required=True and api_key_parameter is None.

Parameters:

values (dict[str, Any]) – The dictionary of attributes to validate

Returns:

The updated parameter values passed to the APIParameterMap. api_key_parameter is set to “api_key” if a key is required but the parameter name is not specified.

Return type:

dict[str, Any]
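
The defaulting rule described for set_default_api_key_parameter can be sketched as a function over a plain dictionary. The function name below is hypothetical, and the sketch leaves out the pydantic validator wiring that the real class method relies on.

```python
from typing import Any, Dict


def set_default_api_key_parameter_sketch(values: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in 'api_key' as the parameter name when a key is required but unnamed."""
    updated = dict(values)  # copy so the caller's dictionary is left untouched
    if updated.get("api_key_required") and updated.get("api_key_parameter") is None:
        updated["api_key_parameter"] = "api_key"
    return updated


print(set_default_api_key_parameter_sketch({"query": "q", "api_key_required": True}))
```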

start: str | None
classmethod validate_api_specific_parameter_mappings(values: dict[str, Any]) dict[str, Any][source]

Validates the additional mappings provided to the APIParameterMap.

This method validates that the input is a dictionary of mappings that consists of only string-typed keys mapped to API-specific parameters as defined by the APISpecificParameter class.

Parameters:

values (dict[str, Any]) – The dictionary of attribute values to validate.

Returns:

The updated dictionary if validation passes.

Return type:

dict[str, Any]

Raises:

APIParameterException – If api_specific_parameters is not a dictionary or contains non-string keys/values.

zero_indexed_pagination: bool
class scholar_flux.api.models.APIResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None)[source]

Bases: BaseModel

A wrapper for responses of different types that provides consistency when using several possible backends. The purpose of this class is to serve as the base for managing responses received from scholarly APIs while processing each component in a predictable, reproducible manner.

This class uses pydantic’s data validation and serialization/deserialization methods to aid caching and includes properties that refer back to the original response for displaying valid response codes, URLs, etc.

All future processing/error-based responses classes inherit from and build off of this class.

Parameters:
  • cache_key (Optional[str]) – A string for recording cache keys for use in later steps of the response orchestration involving processing, cache storage, and cache retrieval

  • response (Any) – A response or response-like object to be validated and used/re-used in later caching and response processing/orchestration steps.

  • created_at (Optional[str]) – A value indicating the time in which a response or response-like object was created.

Example

>>> from scholar_flux.api import APIResponse
>>> # Using keyword arguments to build a basic APIResponse data container:
>>> response = APIResponse.from_response(
...     cache_key='test-response',
...     status_code=200,
...     content=b'success',
...     url='https://example.com',
...     headers={'Content-Type': 'application/text'}
... )
>>> response
# OUTPUT: APIResponse(cache_key='test-response', response = ReconstructedResponse(
#    status_code=200, reason='OK', headers={'Content-Type': 'application/text'},
#    text='success', url='https://example.com'
# ))
>>> # Both assertions pass silently when the response was built as expected
>>> assert response.status == 'OK' and response.text == 'success' and response.url == 'https://example.com'
>>> assert response.validate_response()
classmethod as_reconstructed_response(response: Any) ReconstructedResponse[source]

Classmethod designed to create a reconstructed response from an original response object. This method coerces response attributes into a reconstructed response that retains the original content, status code, headers, URL, reason, etc.

Returns:

A minimal response object that contains the core attributes needed to support other processes in the scholar_flux module such as response parsing and caching.

Return type:

ReconstructedResponse

cache_key: str | None
property content: bytes | None

Return content from the underlying response, if available and valid.

Returns:

The bytes from the original response content

Return type:

Optional[bytes]

created_at: str | None
encode_response(response: Any) Dict[str, Any] | List[Any] | None[source]

Helper method for serializing a response into a JSON-compatible format. Accounts for special cases such as CaseInsensitiveDict fields that are otherwise unserializable.

From this step, pydantic can safely use json internally to dump the encoded response fields
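
To illustrate why an encoding step is needed, the sketch below uses a hypothetical HeaderMapSketch class as a stand-in for a non-dict header mapping such as CaseInsensitiveDict. The helper encode_fields_sketch and its handling of bytes are assumptions for illustration, not the library's implementation.

```python
import json
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class HeaderMapSketch(Mapping):
    """Hypothetical case-insensitive mapping that json.dumps cannot serialize directly."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = {key.lower(): (key, value) for key, value in data.items()}

    def __getitem__(self, key: str) -> str:
        return self._data[key.lower()][1]

    def __iter__(self) -> Iterator[str]:
        return (original for original, _ in self._data.values())

    def __len__(self) -> int:
        return len(self._data)


def encode_fields_sketch(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Convert values that json.dumps rejects into JSON-friendly equivalents."""
    encoded: Dict[str, Any] = {}
    for key, value in fields.items():
        if isinstance(value, Mapping) and not isinstance(value, dict):
            encoded[key] = dict(value)  # plain dicts serialize without a custom encoder
        elif isinstance(value, bytes):
            encoded[key] = value.decode("utf-8", errors="replace")  # assumed text content
        else:
            encoded[key] = value
    return encoded


fields = {
    "status_code": 200,
    "headers": HeaderMapSketch({"Content-Type": "application/json"}),
    "content": b"success",
}
serialized = json.dumps(encode_fields_sketch(fields))
print(serialized)
```

Calling json.dumps(fields) without the encoding step raises a TypeError because neither the custom mapping nor the bytes value is JSON serializable.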

classmethod from_response(response: Any | None = None, cache_key: str | None = None, auto_created_at: bool | None = None, **kwargs) Self[source]

Construct an APIResponse from a response object or from keyword arguments.

If response is not a valid response object, builds a minimal response-like object from kwargs.

classmethod from_serialized_response(response: Any | None = None, **kwargs) ReconstructedResponse | None[source]

Helper method for creating a new APIResponse from the original dumped object. This method accounts for the difficulty of serializing responses by decoding the response dictionary that was loaded from a string using json.loads from the json module in the standard library.

If the response input is still a serialized string, this method will manually load the response dict with the APIResponse._deserialize_response_dict class method before further processing.

Parameters:

response (Any) – A prospective response value to load into the API Response.

Returns:

A reconstructed response object, if possible. Otherwise returns None

Return type:

Optional[ReconstructedResponse]

property headers: MutableMapping[str, str] | None

Return headers from the underlying response, if available and valid.

Returns:

A dictionary of headers from the response

Return type:

MutableMapping[str, str]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

raise_for_status()[source]

Uses an underlying response object to validate the status code associated with the request.

If the attribute isn’t a response or reconstructed response, the code will coerce the class into a response object to verify the status code for the request URL and response.

property reason: str | None

Uses the underlying reason attribute on the response object, if available, to create a human readable status description.

Returns:

The status description associated with the response.

Return type:

Optional[str]

response: Any | None
classmethod serialize_response(response: Response | ResponseProtocol) str | None[source]

Helper method for serializing a response into a json format. The response object is first converted into a serialized string and subsequently dumped after ensuring that the field is serializable.

Parameters:

response (Response, ResponseProtocol)

property status: str | None

Helper property for retrieving a human-readable status description from the APIResponse.

Returns:

The status description associated with the response (if available).

Return type:

Optional[str]

property status_code: int | None

Helper property for retrieving a status code from the APIResponse.

Returns:

The status code associated with the response (if available)

Return type:

Optional[int]

property text: str | None

Attempts to retrieve the response text by first decoding the bytes of its content. If the content is not available, this property attempts to reference the text attribute directly.

Returns:

A text string if the text is available in the correct format, otherwise None

Return type:

Optional[str]

classmethod transform_response(v: Any) Response | ResponseProtocol | None[source]

Attempts to resolve a response object as an original response or a ReconstructedResponse. All original response objects (duck-typed or requests responses) with valid values will be returned as is.

If the passed object is a string, this function will attempt to deserialize it and parse it as a dictionary.

Dictionary fields will be decoded, if originally encoded, and parsed as a ReconstructedResponse object, if possible.

Otherwise, the original object is returned as is.

property url: str | None

Return URL from the underlying response, if available and valid.

Returns:

A string of the original URL if available. Accounts for objects that indicate the original URL when converted to a string.

Return type:

Optional[str]

classmethod validate_iso_timestamp(v: str | datetime | None) str | None[source]

Helper method for validating and ensuring that the timestamp accurately follows the ISO 8601 format.
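
A minimal sketch of this kind of check, using only the standard library, is shown below. The function name is hypothetical, and the library's validator may accept or reject a different set of inputs. Note that datetime.fromisoformat only parses a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def validate_iso_timestamp_sketch(value: Union[str, datetime, None]) -> Optional[str]:
    """Return an ISO 8601 string, or None when no timestamp was supplied."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.isoformat()
    # datetime.fromisoformat raises ValueError for strings it cannot parse.
    return datetime.fromisoformat(value).isoformat()


print(validate_iso_timestamp_sketch(datetime(2024, 1, 15, tzinfo=timezone.utc)))
print(validate_iso_timestamp_sketch("2024-01-15T10:30:00+00:00"))
```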

validate_response() bool[source]

Helper method for determining whether the response attribute is truly a response. If the response isn’t a requests response, duck typing is used to determine whether the response attribute has the expected attributes of a response. The check relies on the class properties, which return None when an attribute is not of the expected type.

Returns:

An indicator of whether the current APIResponse.response attribute is actually a response.

Return type:

bool
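
The duck-typing idea can be sketched without any HTTP library: an object counts as response-like when the expected attributes exist and hold values of the expected types. The function name and the exact set of attributes checked below are assumptions for illustration.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any


def looks_like_response(candidate: Any) -> bool:
    """Duck-typed check: the attributes must exist and hold the expected types."""
    status_code = getattr(candidate, "status_code", None)
    content = getattr(candidate, "content", None)
    headers = getattr(candidate, "headers", None)
    url = getattr(candidate, "url", None)
    return (
        isinstance(status_code, int)
        and isinstance(content, bytes)
        and isinstance(headers, Mapping)
        and url is not None
    )


valid = SimpleNamespace(status_code=200, content=b"ok", headers={}, url="https://example.com")
invalid = SimpleNamespace(status_code="200", content="ok")
print(looks_like_response(valid), looks_like_response(invalid))  # True False
```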

class scholar_flux.api.models.APISpecificParameter(name: str, description: str, validator: Callable[[Any], Any] | None = None, default: Any = None, required: bool = False)[source]

Bases: object

Dataclass that defines the specification of an API-specific parameter for an API provider.

Implements optionally specifiable defaults, validation steps, and indicators for optional vs. required fields.

Parameters:
  • name (str) – The name of the parameter used when sending requests to APIs.

  • description (str) – A description of the API-specific parameter.

  • validator (Optional[Callable[[Any], Any]]) – An optional function/method for verifying and pre-processing parameter input based on required types, constrained values, etc.

  • default (Any) – A default value used for the parameter if not specified by the user.

  • required (bool) – Indicates whether the current parameter is required for API calls.

__init__(*args: Any, **kwargs: Any) None
default: Any = None
description: str
name: str
required: bool = False
structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method for showing the structure of the current APISpecificParameter.

validator: Callable[[Any], Any] | None = None
property validator_name

Helper method for generating a human readable string from the validator function, if used.
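
The interplay of default, required, and validator can be sketched with a small dataclass that mirrors the documented fields. The class ParameterSpecSketch, its resolve method, and the positive_int validator are hypothetical; the library may apply these steps in a different place or order.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ParameterSpecSketch:
    """Hypothetical stand-in mirroring the documented APISpecificParameter fields."""

    name: str
    description: str
    validator: Optional[Callable[[Any], Any]] = None
    default: Any = None
    required: bool = False

    def resolve(self, value: Any = None) -> Any:
        """Apply the default, enforce `required`, then run the validator if present."""
        if value is None:
            value = self.default
        if value is None and self.required:
            raise ValueError(f"The parameter '{self.name}' is required")
        if value is not None and self.validator is not None:
            value = self.validator(value)
        return value


def positive_int(value: Any) -> int:
    """Example validator: coerce to int and reject non-positive values."""
    number = int(value)
    if number <= 0:
        raise ValueError("expected a positive integer")
    return number


max_results = ParameterSpecSketch(
    name="max_results",
    description="Upper bound on returned records",
    validator=positive_int,
    default=25,
)
print(max_results.resolve(), max_results.resolve("10"))  # 25 10
```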

class scholar_flux.api.models.BaseAPIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]

Bases: BaseModel

Base class for mapping universal SearchAPI parameter names to API-specific parameter names.

Includes core logic for distinguishing parameter names, indicating required API keys, and defining pagination logic.

query

The API-specific parameter name for the search query.

Type:

str

start

The API-specific parameter name for optional pagination (start index or page number).

Type:

Optional[str]

records_per_page

The API-specific parameter name for records per page.

Type:

str

api_key_parameter

The API-specific parameter name for the API key.

Type:

Optional[str]

api_key_required

Indicates whether an API key is required.

Type:

bool

page_required

If True, indicates that a page is required.

Type:

bool

auto_calculate_page

If True, calculates start index from page; if False, passes page number directly.

Type:

bool

zero_indexed_pagination

Treats page=0 as an allowed page value when retrieving data from the API.

Type:

bool

api_specific_parameters

Additional API-specific parameter mappings.

Type:

Dict[str, APISpecificParameter]

api_key_parameter: str | None
api_key_required: bool
api_specific_parameters: Dict[str, APISpecificParameter]
auto_calculate_page: bool
classmethod from_dict(obj: Dict[str, Any]) BaseAPIParameterMap[source]

Create a new instance of BaseAPIParameterMap from a dictionary.

Parameters:

obj (dict) – The dictionary containing the data for the new instance.

Returns:

A new instance created from the given dictionary.

Return type:

BaseAPIParameterMap

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

query: str
records_per_page: str
show_parameters() list[source]

Helper method to show the complete list of all parameters that can be found in the current ParameterMap.

Returns:

The complete list of all universal and API-specific parameters corresponding to the current API.

Return type:

List

start: str | None
structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the BaseAPIParameterMap.

to_dict() Dict[str, Any][source]

Convert the current instance into a dictionary representation.

Returns:

A dictionary representation of the current instance.

Return type:

Dict

update(other: BaseAPIParameterMap | Dict[str, Any]) BaseAPIParameterMap[source]

Update the current instance with values from another BaseAPIParameterMap or dictionary.

Parameters:

other (BaseAPIParameterMap | Dict) – The object containing updated values.

Returns:

A new instance with updated values.

Return type:

BaseAPIParameterMap
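
Because update returns a new instance, the original map is left unchanged. The sketch below shows that pattern with a frozen dataclass; ParameterMapSketch is a hypothetical stand-in with only three of the documented fields, not the pydantic model itself.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict, Optional, Union


@dataclass(frozen=True)
class ParameterMapSketch:
    """Hypothetical, reduced stand-in for a parameter map."""

    query: str
    records_per_page: str
    start: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    def update(self, other: Union["ParameterMapSketch", Dict[str, Any]]) -> "ParameterMapSketch":
        """Return a copy with the fields from `other` applied on top of this map."""
        changes = other.to_dict() if isinstance(other, ParameterMapSketch) else dict(other)
        return replace(self, **changes)


base = ParameterMapSketch(query="q", records_per_page="rows")
updated = base.update({"start": "offset"})
print(base.start, updated.start)  # None offset
```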

zero_indexed_pagination: bool
class scholar_flux.api.models.BaseProviderDict(dict=None, /, **kwargs)[source]

Bases: UserDict[str, Any]

The BaseProviderDict extends the dictionary to resolve minor naming variations in keys to the same provider name.

The BaseProviderDict uses the ProviderConfig._normalize_name method to ignore underscores and case-sensitivity.
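
A dictionary that treats naming variants as the same key can be sketched on top of collections.UserDict. The normalization below (dropping underscores and lowercasing) follows the description above, but the helper and class names are hypothetical and the library's normalization may handle additional cases.

```python
from collections import UserDict
from typing import Any


def normalize_name_sketch(name: str) -> str:
    """Assumed normalization: drop underscores and lowercase the name."""
    return name.replace("_", "").lower()


class ProviderDictSketch(UserDict):
    """Hypothetical dictionary that resolves key variants to one provider name."""

    def __setitem__(self, key: str, value: Any) -> None:
        super().__setitem__(normalize_name_sketch(key), value)

    def __getitem__(self, key: str) -> Any:
        return super().__getitem__(normalize_name_sketch(key))

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and super().__contains__(normalize_name_sketch(key))


registry = ProviderDictSketch()
registry["Springer_Nature"] = "config placeholder"
print(registry["springernature"], "SPRINGER_NATURE" in registry)
```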

class scholar_flux.api.models.ErrorResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]

Bases: APIResponse

Returned when something goes wrong but raising an exception immediately is not desired; the failure details are handed back instead.

The class is formatted for compatibility with the ProcessedResponse.

cache_key: str | None
created_at: str | None
property data: None

Provided for type hinting + compatibility.

error: str | None
property extracted_records: None

Provided for type hinting + compatibility.

classmethod from_error(message: str, error: Exception, cache_key: str | None = None, response: Response | ResponseProtocol | None = None) Self[source]

Creates and logs the processing error if one occurs during response processing.

Parameters:
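  • message (str) – A message describing the error that occurred.

  • error (Exception) – The exception raised during response processing.

  • cache_key (Optional[str]) – Cache key for storing results.

  • response (Optional[Response | ResponseProtocol]) – Raw API response, if available.

Returns:

An object that contains the error response data and background information on what precipitated the error.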

Return type:

ErrorResponse

message: str | None
property metadata: None

Provided for type hinting + compatibility.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property parsed_response: None

Provided for type hinting + compatibility.

property processed_records: None

Provided for type hinting + compatibility.

response: Any | None
class scholar_flux.api.models.NonResponse(*, cache_key: str | None = None, response: None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]

Bases: ErrorResponse

Response class used to indicate that an error occurred in the preparation of a request or in the retrieval of a response object from an API.

This class is used to signify the error that occurred within the search process using a similar interface as the other scholar_flux Response dataclasses.

cache_key: str | None
created_at: str | None
error: str | None
message: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

response: None
class scholar_flux.api.models.PageListInput(root: RootModelRootType = PydanticUndefined)[source]

Bases: RootModel[Sequence[int]]

Helper class for processing page information in a predictable manner. The PageListInput class expects to receive a list, string, or generator that contains at least one page number. If a singular integer is received, the result is transformed into a single-item list containing that integer.

Parameters:

root (Sequence[int]) – A list containing at least one page number.

Examples

>>> from scholar_flux.api.models import PageListInput
>>> PageListInput(5)
PageListInput([5])
>>> PageListInput(range(1, 5))
PageListInput([1, 2, 3, 4])
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property page_numbers: Sequence[int]

Returns the sequence of validated page numbers as a list.

classmethod page_validation(v: str | int | Sequence[int | str]) Sequence[int][source]

Processes the page input to ensure that a list of integers is returned if the received page list is in a valid format.

Parameters:

v (str | int | Sequence[int | str]) – A page or sequence of pages to be formatted as a list of pages.

Returns:

A validated, formatted sequence of page numbers assuming successful page validation

Return type:

Sequence[int]

Raises:

ValidationError – Internally raised via pydantic if a ValueError is encountered (if the input is not exclusively a page or list of page numbers)

classmethod process_page(page_value: str | int) int[source]

Helper method for ensuring that each value in the sequence is a numeric string or whole number.

Note that this function will not throw an error for negative pages as that is handled at a later step in the page search process.

Parameters:

page_value (str | int) – The value to be converted if it is not already an integer

Returns:

A validated integer if the page can be converted to an integer and is not a float

Return type:

int

Raises:

ValueError – When the value is not an integer or numeric string to be converted to an integer
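
The per-value conversion can be sketched as follows. The function name is hypothetical, and details such as how booleans or whitespace are treated are assumptions rather than documented behavior. As noted above, negative values are not rejected at this stage.

```python
from typing import Union


def process_page_sketch(page_value: Union[str, int]) -> int:
    """Convert a whole number or numeric string to int; reject everything else."""
    # bool is a subclass of int, so it is excluded explicitly in this sketch.
    if isinstance(page_value, int) and not isinstance(page_value, bool):
        return page_value
    if isinstance(page_value, str) and page_value.strip().lstrip("-").isdigit():
        return int(page_value)
    raise ValueError(f"Expected an integer or numeric string, received {page_value!r}")


pages = [process_page_sketch(value) for value in ("1", 2, "3")]
print(pages)  # [1, 2, 3]
```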

root: RootModelRootType
class scholar_flux.api.models.ProcessedResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, parsed_response: Any | None = None, extracted_records: List[Any] | None = None, processed_records: List[Dict[Any, Any]] | None = None, metadata: Any | None = None, message: str | None = None)[source]

Bases: APIResponse

Helper class for returning a ProcessedResponse object that contains information on the original, cached, or reconstructed_response received and processed after retrieval from an API in addition to the cache key. This object also allows storage of intermediate steps including:

  1) parsed responses

  2) extracted records and metadata

  3) processed records (aliased as data)

  4) any additional messages

An error field is provided for compatibility with the ErrorResponse class.

cache_key: str | None
created_at: str | None
property data: List[Dict[Any, Any]] | None

Alias to the processed_records attribute that holds a list of dictionaries, when available.

property error: None

Provided for type hinting + compatibility.

extracted_records: List[Any] | None
message: str | None
metadata: Any | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

parsed_response: Any | None
processed_records: List[Dict[Any, Any]] | None
response: Any | None
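
The compatibility properties on ProcessedResponse and ErrorResponse let calling code branch on shared attributes instead of checking concrete types. The sketch below shows the idea with two hypothetical dataclasses; ProcessedSketch, ErrorSketch, and summarize are illustrative names and carry only a few of the documented fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class ProcessedSketch:
    """Hypothetical stand-in for a successful result."""

    processed_records: Optional[List[Dict[str, Any]]] = None
    message: Optional[str] = None

    @property
    def data(self) -> Optional[List[Dict[str, Any]]]:
        return self.processed_records  # alias for the processed records

    @property
    def error(self) -> None:
        return None  # present only so both result types share one interface


@dataclass
class ErrorSketch:
    """Hypothetical stand-in for a failed result."""

    message: Optional[str] = None
    error: Optional[str] = None

    @property
    def data(self) -> None:
        return None


def summarize(result: Union[ProcessedSketch, ErrorSketch]) -> str:
    """Branch on the shared attributes without checking the concrete type."""
    if result.error is not None:
        return f"failed: {result.error}"
    return f"{len(result.data or [])} record(s)"


print(summarize(ProcessedSketch(processed_records=[{"title": "A"}, {"title": "B"}])))
print(summarize(ErrorSketch(message="request failed", error="TimeoutError")))
```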
class scholar_flux.api.models.ProviderConfig(*, provider_name: Annotated[str, MinLen(min_length=1)], base_url: str, parameter_map: BaseAPIParameterMap, records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: Annotated[float, Ge(ge=0)] = 6.1, api_key_env_var: str | None = None, docs_url: str | None = None)[source]

Bases: BaseModel

Config for creating the basic instructions and settings necessary to interact with new providers. Configs for the default providers are created on package initialization in the scholar_flux.api.providers submodule. A new, custom provider or override can be added to the provider_registry (a custom user dictionary) from the scholar_flux.api.providers module.

Parameters:
  • provider_name (str) – The name of the provider to be associated with the config.

  • base_url (str) – The URL of the provider to send requests with the specified parameters.

  • parameter_map (BaseAPIParameterMap) – The parameter map indicating the specific semantics of the API.

  • records_per_page (int) – Generally the upper limit (for some APIs) or reasonable limit for the number of retrieved records per request (specific to the API provider).

  • request_delay (float) – Indicates how many seconds to wait before sending successive requests. Note that the required interval may vary based on the API provider.

  • api_key_env_var (Optional[str]) – Indicates the environment variable to look for if the API requires or accepts API keys.

  • docs_url (Optional[str]) – An optional URL that indicates where documentation related to the use of the API can be found.

Example Usage:
>>> from scholar_flux.api import ProviderConfig, APIParameterMap, SearchAPI
>>> # Maps each of the individual parameters required to interact with the Guardian API
>>> parameters = APIParameterMap(query='q',
...                              start='page',
...                              records_per_page='page-size',
...                              api_key_parameter='api-key',
...                              auto_calculate_page=False,
...                              api_key_required=True)
>>> # creating the config object that holds the basic configuration necessary to interact with the API
>>> guardian_config = ProviderConfig(provider_name = 'GUARDIAN',
...                                  parameter_map = parameters,
...                                  base_url = 'https://content.guardianapis.com/search',
...                                  records_per_page=10,
...                                  api_key_env_var='GUARDIAN_API_KEY',
...                                  request_delay=6)
>>> api = SearchAPI.from_provider_config(query = 'economic welfare',
...                                      provider_config = guardian_config,
...                                      use_cache = True)
>>> assert api.provider_name == 'guardian'
>>> response = api.search(page = 1) # assumes that you have the GUARDIAN_API_KEY stored as an env variable
>>> assert response.ok
api_key_env_var: str | None
base_url: str
docs_url: str | None
model_config: ClassVar[ConfigDict] = {'str_strip_whitespace': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod normalize_provider_name(v: str) str[source]

Helper method for normalizing the names of providers to a consistent structure.

parameter_map: BaseAPIParameterMap
provider_name: str
records_per_page: int
request_delay: float
search_config_defaults() dict[str, Any][source]

Convenience method for retrieving ProviderConfig fields as a dict. Useful for providing the missing information needed to create a SearchAPIConfig object for a provider when only the provider_name has been provided.

Returns:

A dictionary containing the URL, name, records_per_page, and request_delay for the current provider.

Return type:

dict[str, Any]

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method that shows the current structure of the ProviderConfig.

classmethod validate_base_url(v: str) str[source]

Validates the current url and raises an APIParameterException if invalid.

classmethod validate_docs_url(v: str | None) str | None[source]

Validates the documentation url and raises an APIParameterException if invalid.

class scholar_flux.api.models.ProviderRegistry(dict=None, /, **kwargs)[source]

Bases: BaseProviderDict

The ProviderRegistry implementation allows the smooth and efficient retrieval of API parameter maps and default configuration settings to aid in the creation of a SearchAPI that is specific to the current API.

Note that the ProviderRegistry uses the ProviderConfig._normalize_name to ignore underscores and case-sensitivity.

- ProviderRegistry.from_defaults

Dynamically imports configurations stored within scholar_flux.api.providers, and fails gracefully if a provider’s module does not contain a ProviderConfig.

- ProviderRegistry.get

Resolves a provider name to its ProviderConfig if it exists in the registry.

- ProviderRegistry.get_from_url

Resolves a provider URL to its ProviderConfig if it exists in the registry.

add(provider_config: ProviderConfig) None[source]

Helper method for adding a new provider to the provider registry.

create(provider_name: str, **kwargs) ProviderConfig[source]

Helper method that creates and registers a new ProviderConfig with the current provider registry.

Parameters:
  • provider_name (str) – The name of the provider to create a new provider_config for.

  • **kwargs – Additional keyword arguments to pass to scholar_flux.api.models.ProviderConfig

classmethod from_defaults() ProviderRegistry[source]

Helper method that dynamically loads providers from the scholar_flux.api.providers module specifically reserved for default provider configs.

Returns:

A new registry containing the loaded default provider configurations

Return type:

ProviderRegistry

get_from_url(provider_url: str | None) ProviderConfig | None[source]

Attempts to retrieve a ProviderConfig instance for the given provider by resolving the provided URL to the provider’s URL. Will not throw an error in the event that the provider does not exist.

Parameters:

provider_url (Optional[str]) – URL of the default provider

Returns:

Instance configuration for the provider if it exists, else None

Return type:

Optional[ProviderConfig]
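
One way such a URL lookup could work is sketched below with urllib.parse. The matching rule (same host and path, ignoring case in the host and a trailing slash) and the registry contents are assumptions for illustration; the library may resolve URLs differently.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical mapping of provider names to base URLs.
BASE_URLS_SKETCH: Dict[str, str] = {
    "exampleprovider": "https://api.example.org/search",
}


def get_from_url_sketch(provider_url: Optional[str]) -> Optional[str]:
    """Return the provider name whose base URL matches, or None if nothing matches."""
    if not provider_url:
        return None
    target = urlparse(provider_url)
    for name, base_url in BASE_URLS_SKETCH.items():
        known = urlparse(base_url)
        same_host = target.netloc.lower() == known.netloc.lower()
        same_path = target.path.rstrip("/") == known.path.rstrip("/")
        if same_host and same_path:
            return name
    return None


print(get_from_url_sketch("https://API.example.org/search/"))  # exampleprovider
print(get_from_url_sketch("https://unknown.example.com"))  # None
```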

remove(provider_name: str) None[source]

Helper method for removing a provider configuration from the provider registry.

class scholar_flux.api.models.ReconstructedResponse(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any)[source]

Bases: object

Helper class for retaining the most relevant fields when reconstructing responses from different sources such as requests and httpx (if chosen). The primary purpose of the ReconstructedResponse in scholar_flux is to create a minimal representation of a response when we need to construct a ProcessedResponse without an actual response and verify content fields.

In applications such as retrieving cached data from a scholar_flux.data_storage.DataCacheManager, if an original or cached response is not available, then a ReconstructedResponse is created from the cached response fields when available.

Parameters:
  • status_code (int) – The integer code indicating the status of the response

  • reason (str) – Indicates the reasoning associated with the status of the response

  • headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type, etc.)

  • content (bytes) – The content within the response

  • url (Any) – The URL from which the response was received

Note

The ReconstructedResponse.build factory method is recommended in cases when one property may contain the needed fields but may need to be processed and prepared first before being used. Examples include instances where one has text or json data instead of content, a reason_phrase field instead of reason, etc.

Example

>>> from scholar_flux.api.models import ReconstructedResponse
# build a response using a factory method that infers fields from existing ones when not directly specified
>>> response = ReconstructedResponse.build(status_code = 200, content = b"success", url = "https://google.com")
# check whether the current class follows a ResponseProtocol and contains valid fields
>>> assert response.is_response()
# OUTPUT: True
>>> response.validate() # raises an error if invalid
>>> response.raise_for_status() # no error for 200 status codes
>>> assert response.reason == 'OK' == response.status  # inferred from the status_code attribute
__init__(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any) None
asdict() dict[str, Any][source]

Helper method for converting the ReconstructedResponse into a dictionary containing attributes and their corresponding values.

classmethod build(response: Any | None = None, **kwargs) ReconstructedResponse[source]

Helper method for building a new ReconstructedResponse from a regular response object. This classmethod can either construct a new ReconstructedResponse object from a response object or response-like object or create a new ReconstructedResponse altogether with its inputs.

Parameters:

  • response (Optional[Any]) – A response or response-like object of unknown type, or None

  • **kwargs – The underlying components needed to construct a new response. Ideally, this set of key-value pairs would be specific only to the types expected by the ReconstructedResponse.

content: bytes
classmethod fields() list[source]

Helper method for retrieving a list containing the names of all fields associated with the ReconstructedResponse class.

Returns:

A list containing the name of each attribute in the ReconstructedResponse.

Return type:

list[str]

classmethod from_keywords(**kwargs) ReconstructedResponse[source]

Uses the provided keyword arguments to create a ReconstructedResponse. The keywords can be the default attributes of the ReconstructedResponse, or values from which those attributes can be inferred and processed.

Parameters:
  • status_code (int) – The integer code indicating the status of the response

  • reason (str) – Indicates the reasoning associated with the status of the response

  • headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type)

  • content (bytes) – The content within the response

  • url (Any) – The URL from which the response was received

Some fields can either be provided directly or inferred from other similarly common fields:

  • content: [‘content’, ‘_content’, ‘text’, ‘json’]

  • headers: [‘headers’, ‘_headers’]

  • reason: [‘reason’, ‘status’, ‘reason_phrase’, ‘status_code’]

Returns:

A newly reconstructed response from the given keyword components

Return type:

ReconstructedResponse
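The field inference described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: infer_fields is a hypothetical helper, and the fallback order it applies is assumed from the keyword lists above.

```python
import json
from http import HTTPStatus
from typing import Any


def infer_fields(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical helper: fills `reason`, `headers`, and `content` from related keywords."""
    # reason: prefer explicit values, then fall back to the status code's standard phrase
    reason = kwargs.get("reason") or kwargs.get("status") or kwargs.get("reason_phrase")
    if reason is None and kwargs.get("status_code") is not None:
        try:
            reason = HTTPStatus(int(kwargs["status_code"])).phrase
        except ValueError:
            reason = None  # unknown status code: leave the reason unset

    # content: prefer raw bytes, then encode text, then serialize JSON-like data
    content = kwargs.get("content", kwargs.get("_content"))
    if content is None and isinstance(kwargs.get("text"), str):
        content = kwargs["text"].encode("utf-8")
    elif content is None and kwargs.get("json") is not None:
        content = json.dumps(kwargs["json"]).encode("utf-8")

    return {
        "status_code": kwargs.get("status_code"),
        "reason": reason,
        "headers": kwargs.get("headers", kwargs.get("_headers", {})),
        "content": content,
        "url": kwargs.get("url"),
    }


inferred = infer_fields(status_code=404, text="not found", url="https://example.com")
# inferred["reason"] is "Not Found" and inferred["content"] is b"not found"
```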

headers: MutableMapping[str, str]
is_response() bool[source]

Method for directly validating the fields that indicate that a response has been minimally recreated successfully. The fields that are validated include:

  1. status codes (should be an integer)

  2. URLs (should be a valid url)

  3. reasons (should originate from a reason attribute or inferred from the status code)

  4. content (should be a bytes field or encoded from a string text field)

  5. headers (should be a dictionary with string fields and preferably a content type)

Returns:

Indicates whether the current reconstructed response minimally recreates a response object.

Return type:

bool
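The five checks listed above might look roughly like the following standalone sketch. The function name looks_like_response is hypothetical, and the sketch assumes the URL is passed as a string with an http or https scheme, which is narrower than the Any type the class accepts.

```python
from collections.abc import Mapping
from urllib.parse import urlparse


def looks_like_response(status_code, reason, headers, content, url) -> bool:
    """Hypothetical check mirroring the five validated fields listed above."""
    parsed = urlparse(url) if isinstance(url, str) else None
    checks = [
        # 1. the status code is an integer
        isinstance(status_code, int),
        # 2. the URL has a web scheme and a host (an assumption of this sketch)
        parsed is not None and parsed.scheme in ("http", "https") and bool(parsed.netloc),
        # 3. the reason is a non-empty string
        isinstance(reason, str) and bool(reason),
        # 4. the content is raw bytes
        isinstance(content, bytes),
        # 5. the headers map strings to strings
        isinstance(headers, Mapping)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in headers.items()),
    ]
    return all(checks)


valid = looks_like_response(
    200, "OK", {"Content-Type": "text/plain"}, b"success", "https://example.com"
)
```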

json() Dict[str, Any] | List[Any] | None[source]

Return JSON-decoded body from the underlying response, if available.

property ok: bool

Indicates whether the current response indicates a successful request (200 <= status_code < 400) or whether an invalid response has been received.

Returns:

True if the status code is an integer value within the range of 200 and 399, False otherwise

Return type:

bool
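As a minimal sketch of the range check stated above (is_ok is a hypothetical name, and excluding booleans is a choice made for this illustration rather than documented behavior):

```python
def is_ok(status_code: object) -> bool:
    """Returns True only for integer status codes in the range 200-399."""
    # bool is a subclass of int, so it is excluded explicitly in this sketch
    if not isinstance(status_code, int) or isinstance(status_code, bool):
        return False
    return 200 <= status_code < 400


outcomes = {code: is_ok(code) for code in (200, 302, 399, 400, 500)}
```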

raise_for_status() None[source]

Method that imitates the capability of the requests and httpx response types to raise errors when encountering status codes that are indicative of failed responses.

As scholar_flux processes data that is generally only sent when status codes are within the 200s (or exactly 200 [ok]), an error is raised when encountering a value outside of this range.

Raises:
reason: str
property status: str | None

Helper property for retrieving a human-readable description of the status code.

Returns:

The status description associated with the response (if available)

Return type:

Optional[str]

status_code: int
property text: str | None

Helper property for retrieving the text from the bytes content as a string.

Returns:

The decoded text from the content of the response

Return type:

Optional[str]

url: Any
validate() None[source]

Raises an error if the recreated response object does not contain the valid properties expected of a response. If validation succeeds, no error is raised and nothing is returned.

Raises:

InvalidResponseReconstructionException – if at least one field is determined to be invalid and unexpected of a true response object.

class scholar_flux.api.models.SearchAPIConfig(*, provider_name: str = '', base_url: str = '', records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: float = -1, api_key: SecretStr | None = None, api_specific_parameters: dict[str, Any] | None = None)[source]

Bases: BaseModel

The SearchAPIConfig class provides the core tools necessary to set and interact with the API. The SearchAPI uses this class to retrieve data from an API using universal parameters to simplify the process of retrieving raw responses.

provider_name

Indicates the name of the API to use when making requests to a provider. If the provider name matches a known default and the base_url is unspecified, the base URL for the current provider is used instead.

Type:

str

base_url

Indicates the API URL where data will be searched and retrieved.

Type:

str

records_per_page

Controls the number of records that will appear on each page

Type:

int

request_delay

Indicates the minimum delay between each request to avoid exceeding API rate limits

Type:

float

api_key

This is an API-specific parameter for validating the current user’s identity. If a str type is provided, it is converted into a SecretStr.

Type:

Optional[str | SecretStr]

api_specific_parameters

A dictionary containing all parameters specific to the current API. API-specific parameters include the following.

  1. mailto (Optional[str | SecretStr]):

    An optional email address for receiving feedback on usage from providers. This parameter is currently applicable only to the Crossref API.

  2. db (str):

    The parameter used by the NIH to direct requests for data to the pubmed database. This parameter defaults to pubmed and does not require direct specification.

Type:

dict[str, APISpecificParameter]

Examples

>>> from scholar_flux.api import SearchAPIConfig, SearchAPI, provider_registry
# to create a CROSSREF configuration with minimal defaults and provide an api_specific_parameter:
>>> config = SearchAPIConfig.from_defaults(provider_name = 'crossref', mailto = 'your_email_here@example.com')
# the configuration automatically retrieves the configuration for the "Crossref" API
>>> assert config.provider_name == 'crossref' and config.base_url == provider_registry['crossref'].base_url
>>> api = SearchAPI.from_settings(query = 'q', config = config)
>>> assert api.config == config
# to retrieve all defaults associated with a provider and automatically read an API key if needed
>>> config = SearchAPIConfig.from_defaults(provider_name = 'pubmed', api_key = 'your api key goes here')
# the API key is retrieved automatically if you have the API key specified as an environment variable
>>> assert config.api_key is not None
# Default provider API specifications are already pre-populated if they are set with defaults
>>> assert config.api_specific_parameters['db'] == 'pubmed'  # required by pubmed and defaults to pubmed
# Update a provider and automatically retrieve its API key - the previous API key will no longer apply
>>> updated_config = SearchAPIConfig.update(config, provider_name = 'core')
# The API key should have been overwritten to use core. Looks for a `CORE_API_KEY` env variable by default
>>> assert updated_config.provider_name  == 'core' and  updated_config.api_key != config.api_key
DEFAULT_PROVIDER: ClassVar[str] = 'PLOS'
DEFAULT_RECORDS_PER_PAGE: ClassVar[int] = 25
DEFAULT_REQUEST_DELAY: ClassVar[float] = 6.1
MAX_API_KEY_LENGTH: ClassVar[int] = 512
api_key: SecretStr | None
api_specific_parameters: dict[str, Any] | None
base_url: str
classmethod default_request_delay(v: int | float | None, provider_name: str | None = None) float[source]

Helper method enabling the retrieval of the most appropriate rate limit for the current provider.

Defaults to the SearchAPIConfig default rate limit when the current provider is unknown and a valid rate limit has not yet been provided.

Parameters:
  • v (Optional[int | float]) – The value received for the current request_delay

  • provider_name (Optional[str]) – The name of the provider to retrieve a rate limit for

Returns:

The inputted non-negative request delay, the retrieved rate limit for the current provider if available, or the SearchAPIConfig.DEFAULT_REQUEST_DELAY, in that order of priority.

Return type:

float
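The priority order described above can be sketched as a plain function. This is an illustration only: resolve_request_delay and PROVIDER_DELAYS are hypothetical, the per-provider value is made up, and only the 6.1 second default is taken from the class attributes listed earlier.

```python
from typing import Optional

DEFAULT_REQUEST_DELAY = 6.1  # mirrors SearchAPIConfig.DEFAULT_REQUEST_DELAY

# Hypothetical per-provider rate limits in seconds (illustrative values only)
PROVIDER_DELAYS = {"exampleprovider": 2.0}


def resolve_request_delay(v: Optional[float], provider_name: Optional[str] = None) -> float:
    # 1. a non-negative value supplied by the caller wins
    if v is not None and v >= 0:
        return float(v)
    # 2. otherwise use the provider's known rate limit, if there is one
    if provider_name is not None and provider_name.lower() in PROVIDER_DELAYS:
        return PROVIDER_DELAYS[provider_name.lower()]
    # 3. otherwise fall back to the class-level default
    return DEFAULT_REQUEST_DELAY


explicit = resolve_request_delay(1.5, "exampleprovider")     # caller value is kept
from_provider = resolve_request_delay(-1, "ExampleProvider")  # -1 acts as "not set"
fallback = resolve_request_delay(None, "unknown")             # no provider match
```

Using -1 as a sentinel lets an earlier validation step signal "no usable value" without losing the float type of the field.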

classmethod from_defaults(provider_name: str, **overrides) SearchAPIConfig[source]

Uses the default configuration for the chosen provider to create a SearchAPIConfig object containing configuration parameters. Note that additional parameters and field overrides can be added via the **overrides field.

Parameters:
  • provider_name (str) – The name of the provider for which to create the config

  • **overrides – Optional keyword arguments to specify overrides and additional arguments

Returns:

A default SearchAPIConfig object based on the chosen parameters

Return type:

SearchAPIConfig

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

provider_name: str
records_per_page: int
request_delay: float
classmethod set_records_per_page(v: int | None)[source]

Sets the records_per_page parameter with the default if the supplied value is not valid.

Triggers a validation error when records_per_page is an invalid type. Otherwise uses the DEFAULT_RECORDS_PER_PAGE class attribute if the supplied value is missing or is a negative number.

structure(flatten: bool = False, show_value_attributes: bool = True) str[source]

Helper method for retrieving a string representation of the overall structure of the current SearchAPIConfig.

classmethod update(current_config: SearchAPIConfig, **overrides) SearchAPIConfig[source]

Create a new SearchAPIConfig by updating an existing config with new values and/or switching to a different provider. This method ensures that the new provider’s base_url and defaults are used if provider_name is given, and that API-specific parameters are prioritized and merged as expected.

Parameters:
  • current_config (SearchAPIConfig) – The existing configuration to update.

  • **overrides – Any fields or API-specific parameters to override or add.

Returns:

A new config with the merged and prioritized values.

Return type:

SearchAPIConfig

property url_basename: str

Applies the _extract_url_basename method to the provider URL associated with the current config instance.

classmethod validate_api_key(v: SecretStr | str | None) SecretStr | None[source]

Validates the api_key attribute and triggers a validation error if it is not valid.

classmethod validate_provider_name(v: str | None) str[source]

Validates the provider_name attribute and triggers a validation error if it is not valid.

classmethod validate_request_delay(v: int | float | None) int | float | None[source]

Sets the request delay (delay between each request) for valid request delays. This validator triggers a validation error when the request delay is an invalid type.

If a request delay is left None or is a negative number, this class method returns -1, and further validation is performed by cls.default_request_delay to retrieve the provider’s default request delay.

If not available, SearchAPIConfig.DEFAULT_REQUEST_DELAY is used.

validate_search_api_config_parameters() Self[source]

Validation method that resolves URLs and/or provider names to provider_info when one or the other is not explicitly provided.

Occurs as the last step in the validation process.

classmethod validate_url(v: str)[source]

Validates the base_url and triggers a validation error if it is not valid.

classmethod validate_url_type(v: str | None) str[source]

Validates the type for the base_url attribute and triggers a validation error if it is not valid.

class scholar_flux.api.models.SearchResult(*, query: str, provider_name: str, page: int, response_result: ProcessedResponse | ErrorResponse | None = None)[source]

Bases: BaseModel

Core class used to store data in the retrieval and processing of API searches when iterating and searching over a range of pages, queries, and providers at a time. This class uses pydantic so that field validation is automatic, ensuring the integrity and reliability of response processing. It supports multi-page searches by linking each response result to a particular query, page, and provider.

Parameters:
  • query (str) – The query used to retrieve records and response metadata

  • provider_name (str) – The name of the provider where data is being retrieved

  • page (int) – The page number associated with the request for data

  • response_result (Optional[ProcessedResponse | ErrorResponse]) – The response result containing the specifics of the data retrieved from the response or the error messages recorded if the request is not successful.

For convenience, the properties of the response_result are referenced as properties of the SearchResult, including: response, parsed_response, processed_records, etc.
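The delegation pattern described above can be sketched with plain dataclasses. None of the classes below come from scholar_flux: SketchProcessed, SketchError, and SketchResult are hypothetical stand-ins that only show how a wrapper can expose the attributes of whichever result it holds.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchProcessed:
    """Hypothetical stand-in for a successfully processed response."""
    processed_records: list[dict[str, Any]]


@dataclass
class SketchError:
    """Hypothetical stand-in for an error response."""
    message: str


@dataclass
class SketchResult:
    query: str
    provider_name: str
    page: int
    response_result: Optional[object] = None

    @property
    def processed_records(self) -> Optional[list[dict[str, Any]]]:
        # delegate to the wrapped result; errors and missing results yield None
        return getattr(self.response_result, "processed_records", None)

    @property
    def message(self) -> Optional[str]:
        # only error results carry a message in this sketch
        return getattr(self.response_result, "message", None)


ok = SketchResult("ml", "plos", 1, SketchProcessed([{"title": "A"}]))
failed = SketchResult("ml", "plos", 2, SketchError("timeout"))
empty = SketchResult("ml", "plos", 3)
```

Returning None for attributes that the wrapped object lacks lets callers loop over mixed results without checking the result type first.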

property cache_key: str | None

Extracts the cache key from the API Response if available.

This cache key is used when storing and retrieving data from response processing cache storage.

property created_at: str | None

Extracts the time at which the ErrorResponse or ProcessedResponse was created, if available.

property data: list[dict[Any, Any]] | None

Alias referring back to the processed records from the ProcessedResponse or ErrorResponse.

Contains the processed records from the APIResponse processing step after a successfully received response has been processed. If an error response was received instead, the value of this property is None.

property error: str | None

Extracts the error name associated with the result from the base class, indicating the name/category of the error in the event that the response_result is an ErrorResponse.

property extracted_records: list[Any] | None

Contains the extracted records from the APIResponse handling steps that extract individual records from successfully received and parsed responses.

If an ErrorResponse was received instead, the value of this property is None.

property message: str | None

Extracts the message associated with the result from the base class, indicating why an error occurred in the event that the response_result is an ErrorResponse.

property metadata: Any | None

Contains the metadata from the APIResponse handling steps that extract response metadata from successfully received and parsed responses.

If an ErrorResponse was received instead, the value of this property is None.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

page: int
property parsed_response: Any | None

Contains the parsed response content from the APIResponse handling steps that extract the JSON, XML, or YAML content from a successfully received response.

If an ErrorResponse was received instead, the value of this property is None.

property processed_records: list[dict[Any, Any]] | None

Contains the processed records from the APIResponse processing step after a successfully received response has been processed.

If an error response was received instead, the value of this property is None.

provider_name: str
query: str
property response: Response | ResponseProtocol | None

Helper method directly referencing the original or reconstructed response or response-like object from the API Response if available.

If the received response is not available (None in the response_result), then this value will also be absent (None).

response_result: ProcessedResponse | ErrorResponse | None
class scholar_flux.api.models.SearchResultList(iterable=(), /)[source]

Bases: list[SearchResult]

A helper class used to store multiple SearchResult instances with enhanced type safety. This class inherits from list and tailors its functionality to the APIResponses received from SearchCoordinators and MultiSearchCoordinators.

- SearchResultList.append

Basic list.append implementation extended to accept only SearchResults

- SearchResultList.extend

Basic list.extend implementation extended to accept only iterables of SearchResults

- SearchResultList.filter

Removes NonResponses and ErrorResponses from the list of SearchResults

- SearchResultList.join

Combines all records from ProcessedResponses into a list of dictionary-based records

Note Attempts to add other classes to the SearchResultList other than SearchResults will raise a TypeError.
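A type-restricted list of this kind can be sketched as follows. Result and ResultList are hypothetical stand-ins for SearchResult and SearchResultList, and the error messages are invented for the illustration.

```python
from typing import Iterable


class Result:
    """Hypothetical stand-in for SearchResult."""


class ResultList(list):
    """List that accepts only Result instances (illustrative sketch)."""

    def append(self, item: Result) -> None:
        if not isinstance(item, Result):
            raise TypeError(f"Expected a Result, received {type(item).__name__}")
        super().append(item)

    def extend(self, other: Iterable[Result]) -> None:
        items = list(other)  # materialize first so that a bad item adds nothing
        for item in items:
            if not isinstance(item, Result):
                raise TypeError(f"Expected a Result, received {type(item).__name__}")
        super().extend(items)


results = ResultList()
results.append(Result())

try:
    results.append("not a result")
    rejected = False
except TypeError:
    rejected = True
```

Validating the whole iterable before extending keeps the list unchanged when any single element is invalid.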

append(item: SearchResult)[source]

Overrides the default list.append method to ensure that only SearchResult objects can be appended to the custom list.

Parameters:

item (SearchResult) – The response result containing the API response data, the provider name, and page associated with the response.

extend(other: SearchResultList | MutableSequence[SearchResult] | Iterable[SearchResult])[source]

Overrides the default list.extend method to ensure that only an iterable of SearchResult objects can be added to the SearchResultList.

Parameters:
  • other (Iterable[SearchResult]) – An iterable/sequence of response results containing the API response data, the provider name, and page associated with the response.

filter() SearchResultList[source]

Helper method that retains only the elements of the original list that indicate successful processing.

join() list[dict[str, Any]][source]

Helper method for joining all successfully processed API responses into a single list of dictionaries that can be loaded into a pandas or polars dataframe.

Note that this method will only load processed responses that contain records that were also successfully extracted and processed.

Returns:

A single list containing all records retrieved from each page

Return type:

list[dict[str, Any]]
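The joining step can be sketched independently of scholar_flux. In this illustration each page is represented directly by its list of processed records, with None standing in for a page that produced an error; join_records is a hypothetical name.

```python
from typing import Any, Optional


def join_records(pages: list[Optional[list[dict[str, Any]]]]) -> list[dict[str, Any]]:
    """Flattens per-page record lists, skipping pages without processed records."""
    return [record for page in pages if page for record in page]


# page 2 failed (None) and page 3 returned no records (empty list)
pages = [[{"id": 1}, {"id": 2}], None, [], [{"id": 3}]]
combined = join_records(pages)
```

The resulting flat list of dictionaries is the shape that dataframe constructors in libraries such as pandas or polars generally accept.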