scholar_flux.api.models package
Submodules
scholar_flux.api.models.api_parameters module
The scholar_flux.api.models.api_parameters module implements the APIParameterMap and APIParameterConfig classes.
These two classes are designed for flexibility in the creation and handling of API Responses given provider-specific differences in request parameters and configuration.
- Classes:
- APIParameterMap:
Extends the BaseAPIParameterMap to provide factory functions and utilities to more efficiently retrieve and use default parameter maps.
- APIParameterConfig:
Uses or creates an APIParameterMap to prepare request parameters according to the specifications of the current provider’s API.
- class scholar_flux.api.models.api_parameters.APIParameterConfig(parameter_map: APIParameterMap)[source]
Bases: object
Uses an APIParameterMap instance and runtime parameter values to build parameter dictionaries for API requests.
- Parameters:
parameter_map (APIParameterMap) – The mapping of universal to API-specific parameter names.
- Class Attributes:
- DEFAULT_CORRECT_ZERO_INDEX (bool):
When True, autocorrects zero-indexed API parameter building specifications to only accept positive values. When False, page calculation starts from page 0 for zero-indexed APIs (e.g., arXiv).
Examples
>>> from scholar_flux.api import APIParameterConfig, APIParameterMap
>>> # the API parameter map is defined and used to resolve parameters to the API's language
>>> api_parameter_map = APIParameterMap(
...     query='q', records_per_page='pagesize', start='page', auto_calculate_page=False
... )
>>> # The APIParameterConfig defines the class and settings that indicate how to create requests
>>> api_parameter_config = APIParameterConfig(api_parameter_map, auto_calculate_page=False)
>>> # Builds parameters using the specification from the APIParameterMap
>>> page = api_parameter_config.build_parameters(query='ml', page=10, records_per_page=50)
>>> print(page)
# OUTPUT {'q': 'ml', 'page': 10, 'pagesize': 50}
- DEFAULT_CORRECT_ZERO_INDEX: ClassVar[bool] = True
- __init__(*args: Any, **kwargs: Any) None
- classmethod as_config(parameter_map: dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) APIParameterConfig[source]
Factory method for creating a new APIParameterConfig from a dictionary or APIParameterMap.
This helper class method resolves the structure of the APIParameterConfig against its basic building blocks to create a new configuration when possible.
- Parameters:
parameter_map (dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) – A parameter mapping/config to use in the instantiation of an APIParameterConfig.
- Returns:
A new APIParameterConfig created from the inputs.
- Return type:
APIParameterConfig
- Raises:
APIParameterException – If there is an error in the creation/resolution of the required parameters
- build_parameters(query: str | None, page: int | None, records_per_page: int, **api_specific_parameters) Dict[str, Any][source]
Builds the dictionary of request parameters using the current parameter map and provided values at runtime.
- Parameters:
query (Optional[str]) – The search query string.
page (Optional[int]) – The page number for pagination (1-based).
records_per_page (int) – Number of records to fetch per page.
**api_specific_parameters – Additional API-specific parameters to include.
- Returns:
The fully constructed API request parameters dictionary, with keys as API-specific parameter names and values as provided.
- Return type:
Dict[str, Any]
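The renaming step described for build_parameters can be illustrated with a small standalone sketch. It does not import scholar_flux and is not the library's implementation: PARAMETER_MAP, build_parameters_sketch, and the offset formula used when auto_calculate_page is True are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for a parameter map: universal name -> API-specific name.
PARAMETER_MAP = {"query": "q", "start": "page", "records_per_page": "pagesize"}


def build_parameters_sketch(
    query: Optional[str],
    page: Optional[int],
    records_per_page: int,
    auto_calculate_page: bool = False,
    **api_specific_parameters: Any,
) -> Dict[str, Any]:
    """Translate universal parameter names into API-specific ones."""
    start = page
    if auto_calculate_page and page is not None:
        # Assumed formula: turn a 1-based page number into a 1-based record offset.
        start = 1 + (page - 1) * records_per_page
    values = {"query": query, "start": start, "records_per_page": records_per_page}
    # Rename each universal key and drop parameters that were not provided.
    parameters = {
        PARAMETER_MAP[key]: value for key, value in values.items() if value is not None
    }
    # API-specific extras are passed through unchanged in this sketch.
    parameters.update(api_specific_parameters)
    return parameters


params = build_parameters_sketch(query="ml", page=10, records_per_page=50)
print(params)  # {'q': 'ml', 'page': 10, 'pagesize': 50}
```

With auto_calculate_page=False the page number is forwarded as-is, which matches the output shown in the class example above.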
- classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterConfig[source]
Factory method to create APIParameterConfig instances with sensible defaults for known APIs.
If the provider_name does not exist, the code will raise an exception.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
api_key (Optional[str]) – API key value if required.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter config instance for the specified API.
- Return type:
APIParameterConfig
- Raises:
NotImplementedError – If the API name is unknown.
- classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterConfig | None[source]
Factory method to create APIParameterConfig instances with sensible defaults for known APIs.
Avoids throwing an error if the provider name does not already exist.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter config instance for the specified API. Returns None if a mapping for the provider_name cannot be retrieved.
- Return type:
Optional[APIParameterConfig]
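The difference between from_defaults (raises for unknown providers) and get_defaults (returns None) follows a common factory pattern. The sketch below shows that pattern only; KNOWN_DEFAULTS, the "example" provider, and both function names are hypothetical and do not reflect the contents of the real provider registry.

```python
from typing import Dict, Optional

# Hypothetical defaults keyed by provider name; the real registry holds richer objects.
KNOWN_DEFAULTS: Dict[str, Dict[str, str]] = {
    "example": {"query": "q", "start": "page", "records_per_page": "pagesize"},
}


def get_defaults_sketch(provider_name: str, **overrides: str) -> Optional[Dict[str, str]]:
    """Return a copy of the provider defaults with overrides applied, or None if unknown."""
    defaults = KNOWN_DEFAULTS.get(provider_name)
    if defaults is None:
        return None
    return {**defaults, **overrides}


def from_defaults_sketch(provider_name: str, **overrides: str) -> Dict[str, str]:
    """Same lookup, but unknown providers raise instead of returning None."""
    config = get_defaults_sketch(provider_name, **overrides)
    if config is None:
        raise NotImplementedError(f"No default configuration for provider: {provider_name}")
    return config
```

The None-returning variant suits optional lookups, while the raising variant fails early when a provider name is mistyped.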
- property map: APIParameterMap
Helper property that is an alias for the APIParameterMap attribute.
The APIParameterMap maps all universal parameters to the parameter names specific to the API provider.
- Returns:
The mapping that the current APIParameterConfig will use to build a dictionary of parameter requests specific to the current API.
- Return type:
APIParameterMap
- parameter_map: APIParameterMap
- class scholar_flux.api.models.api_parameters.APIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]
Bases: BaseAPIParameterMap
Extends BaseAPIParameterMap by adding validation and the optional retrieval of provider defaults for known APIs.
This class also specifies default mappings for specific attributes such as API keys and additional parameter names.
- query
The API-specific parameter name for the search query.
- Type:
str
- start
The API-specific parameter name for pagination (start index or page number).
- Type:
Optional[str]
- records_per_page
The API-specific parameter name for records per page.
- Type:
str
- api_key_parameter
The API-specific parameter name for the API key.
- Type:
Optional[str]
- api_key_required
Indicates whether an API key is required.
- Type:
bool
- auto_calculate_page
If True, calculates start index from page; if False, passes page number directly.
- Type:
bool
- zero_indexed_pagination
If True, treats 0 as an allowed page value when retrieving data from APIs.
- Type:
bool
- api_specific_parameters
Additional universal to API-specific parameter mappings.
- Type:
Dict[str, APISpecificParameter]
- api_key_parameter: str | None
- api_key_required: bool
- api_specific_parameters: Dict[str, APISpecificParameter]
- auto_calculate_page: bool
- classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterMap[source]
Factory method that uses the APIParameterMap.get_defaults classmethod to retrieve the provider config.
Raises an error if the provider does not exist.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter map for the specified API.
- Return type:
APIParameterMap
- Raises:
NotImplementedError – If the API name is unknown.
- classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterMap | None[source]
Factory method to create APIParameterMap instances with sensible defaults for known APIs.
This class method attempts to pull from the list of known providers defined in the scholar_flux.api.providers.provider_registry and returns None if an APIParameterMap for the provider cannot be found.
Using the additional_parameters keyword arguments, users can specify optional overrides for specific parameters if needed. This is helpful in circumstances where an API’s specification overlaps with that of a known provider.
Valid providers (as indicated in provider_registry) include:
springernature
plos
arxiv
openalex
core
crossref
- Parameters:
provider_name (str) – The name of the API provider to retrieve the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter map for the specified API.
- Return type:
Optional[APIParameterMap]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- records_per_page: str
- classmethod set_default_api_key_parameter(values: dict[str, Any]) dict[str, Any][source]
Sets the default for the API key parameter when api_key_required=True and api_key_parameter is None.
- Parameters:
values (dict[str, Any]) – The dictionary of attributes to validate
- Returns:
The updated parameter values passed to the APIParameterMap. api_key_parameter is set to “api_key” if a key is required but the parameter name is not specified.
- Return type:
dict[str, Any]
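The defaulting rule described above can be written as a plain function over a dictionary of field values. This is a sketch of the documented behaviour, not the validator itself; the function name is hypothetical and the real method runs as part of model validation.

```python
from typing import Any, Dict


def set_default_api_key_parameter_sketch(values: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in 'api_key' when a key is required but no parameter name was given."""
    updated = dict(values)  # copy so the caller's dictionary is left untouched
    if updated.get("api_key_required") and updated.get("api_key_parameter") is None:
        updated["api_key_parameter"] = "api_key"
    return updated
```

An explicitly provided name such as 'api-key' is kept, and nothing is added when no key is required.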
- start: str | None
- classmethod validate_api_specific_parameter_mappings(values: dict[str, Any]) dict[str, Any][source]
Validates the additional mappings provided to the APIParameterMap.
This method validates that the input is a dictionary of mappings that consists of only string-typed keys mapped to API-specific parameters as defined by the APISpecificParameter class.
- Parameters:
values (dict[str, Any]) – The dictionary of attribute values to validate.
- Returns:
The updated dictionary if validation passes.
- Return type:
dict[str, Any]
- Raises:
APIParameterException – If api_specific_parameters is not a dictionary or contains non-string keys/values.
- zero_indexed_pagination: bool
scholar_flux.api.models.base_parameters module
The scholar_flux.api.models.base_parameters module implements BaseAPIParameterMap and APISpecificParameter classes.
These classes define the core and API-specific fields required to interact with and create requests to API providers.
- Classes:
BaseAPIParameterMap: Defines parameters for interacting with a provider’s API specification.
APISpecificParameter: Defines optional and required parameters specific to an API provider.
- class scholar_flux.api.models.base_parameters.APISpecificParameter(name: str, description: str, validator: Callable[[Any], Any] | None = None, default: Any = None, required: bool = False)[source]
Bases: object
Dataclass that defines the specification of an API-specific parameter for an API provider.
Implements optionally specifiable defaults, validation steps, and indicators for optional vs. required fields.
- Parameters:
name (str) – The name of the parameter used when sending requests to APIs.
description (str) – A description of the API-specific parameter.
validator (Optional[Callable[[Any], Any]]) – An optional function/method for verifying and pre-processing parameter input based on required types, constrained values, etc.
default (Any) – A default value used for the parameter if not specified by the user.
required (bool) – Indicates whether the current parameter is required for API calls.
- __init__(*args: Any, **kwargs: Any) None
- default: Any = None
- description: str
- name: str
- required: bool = False
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method for showing the structure of the current APISpecificParameter.
- validator: Callable[[Any], Any] | None = None
- property validator_name
Helper property for generating a human-readable string from the validator function, if used.
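To make the roles of default, validator, and required concrete, here is a minimal sketch built around a stand-in dataclass. ParameterSpec mirrors only the documented field names; resolve_value and validate_sort_order are hypothetical helpers, and the order in which the library applies defaults and validators is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ParameterSpec:
    """Hypothetical stand-in mirroring the documented APISpecificParameter fields."""

    name: str
    description: str
    validator: Optional[Callable[[Any], Any]] = None
    default: Any = None
    required: bool = False


def resolve_value(spec: ParameterSpec, value: Any = None) -> Any:
    """Apply the default first, then the validator (assumed ordering)."""
    if value is None:
        value = spec.default
    if value is None:
        if spec.required:
            raise ValueError(f"Missing required parameter: {spec.name}")
        return None
    return spec.validator(value) if spec.validator else value


def validate_sort_order(value: Any) -> str:
    """Example validator: normalize case and restrict to two allowed values."""
    normalized = str(value).lower()
    if normalized not in {"asc", "desc"}:
        raise ValueError(f"Unsupported sort order: {value!r}")
    return normalized


sort_spec = ParameterSpec(
    name="sort",
    description="Sort direction",
    validator=validate_sort_order,
    default="asc",
)
```

Here resolve_value(sort_spec) falls back to the default, while resolve_value(sort_spec, "DESC") is normalized by the validator.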
- class scholar_flux.api.models.base_parameters.BaseAPIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]
Bases: BaseModel
Base class for mapping universal SearchAPI parameter names to API-specific parameter names.
Includes core logic for distinguishing parameter names, indicating required API keys, and defining pagination logic.
- query
The API-specific parameter name for the search query.
- Type:
str
- start
The API-specific parameter name for optional pagination (start index or page number).
- Type:
Optional[str]
- records_per_page
The API-specific parameter name for records per page.
- Type:
str
- api_key_parameter
The API-specific parameter name for the API key.
- Type:
Optional[str]
- api_key_required
Indicates whether an API key is required.
- Type:
bool
- page_required
If True, indicates that a page is required.
- Type:
bool
- auto_calculate_page
If True, calculates start index from page; if False, passes page number directly.
- Type:
bool
- zero_indexed_pagination
Treats page=0 as an allowed page value when retrieving data from the API.
- Type:
bool
- api_specific_parameters
Additional API-specific parameter mappings.
- Type:
Dict[str, APISpecificParameter]
- api_key_parameter: str | None
- api_key_required: bool
- api_specific_parameters: Dict[str, APISpecificParameter]
- auto_calculate_page: bool
- classmethod from_dict(obj: Dict[str, Any]) BaseAPIParameterMap[source]
Create a new instance of BaseAPIParameterMap from a dictionary.
- Parameters:
obj (dict) – The dictionary containing the data for the new instance.
- Returns:
A new instance created from the given dictionary.
- Return type:
BaseAPIParameterMap
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- records_per_page: str
- show_parameters() list[source]
Helper method to show the complete list of all parameters that can be found in the current ParameterMap.
- Returns:
The complete list of all universal and API-specific parameters corresponding to the current API.
- Return type:
List
- start: str | None
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method that shows the current structure of the BaseAPIParameterMap.
- to_dict() Dict[str, Any][source]
Convert the current instance into a dictionary representation.
- Returns:
A dictionary representation of the current instance.
- Return type:
Dict
- update(other: BaseAPIParameterMap | Dict[str, Any]) BaseAPIParameterMap[source]
Update the current instance with values from another BaseAPIParameterMap or dictionary.
- Parameters:
other (BaseAPIParameterMap | Dict) – The object containing updated values.
- Returns:
A new instance with updated values.
- Return type:
BaseAPIParameterMap
- zero_indexed_pagination: bool
scholar_flux.api.models.provider_config module
The scholar_flux.api.models.provider_config module implements the basic provider configuration necessary for interacting with APIs.
It provides the foundational information necessary for the SearchAPI to resolve provider names to the URLs of the providers as well as basic defaults necessary for interaction.
- class scholar_flux.api.models.provider_config.ProviderConfig(*, provider_name: Annotated[str, MinLen(min_length=1)], base_url: str, parameter_map: BaseAPIParameterMap, records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: Annotated[float, Ge(ge=0)] = 6.1, api_key_env_var: str | None = None, docs_url: str | None = None)[source]
Bases: BaseModel
Config for creating the basic instructions and settings necessary to interact with new providers. This config is created for each default provider on package initialization in the scholar_flux.api.providers submodule. A new, custom provider or override can be added to the provider_registry (a custom user dictionary) from the scholar_flux.api.providers module.
- Parameters:
provider_name (str) – The name of the provider to be associated with the config.
base_url (str) – The URL of the provider to send requests with the specified parameters.
parameter_map (BaseAPIParameterMap) – The parameter map indicating the specific semantics of the API.
records_per_page (int) – Generally the upper limit (for some APIs) or reasonable limit for the number of retrieved records per request (specific to the API provider).
request_delay (float) – Indicates exactly how many seconds to wait before sending successive requests. Note that the requested interval may vary based on the API provider.
api_key_env_var (Optional[str]) – Indicates the environment variable to look for if the API requires or accepts API keys.
docs_url (Optional[str]) – An optional URL that indicates where documentation related to the use of the API can be found.
- Example Usage:
>>> from scholar_flux.api import ProviderConfig, APIParameterMap, SearchAPI
>>> # Maps each of the individual parameters required to interact with the Guardian API
>>> parameters = APIParameterMap(query='q',
...                              start='page',
...                              records_per_page='page-size',
...                              api_key_parameter='api-key',
...                              auto_calculate_page=False,
...                              api_key_required=True)
>>> # creating the config object that holds the basic configuration necessary to interact with the API
>>> guardian_config = ProviderConfig(provider_name='GUARDIAN',
...                                  parameter_map=parameters,
...                                  base_url='https://content.guardianapis.com/search',
...                                  records_per_page=10,
...                                  api_key_env_var='GUARDIAN_API_KEY',
...                                  request_delay=6)
>>> api = SearchAPI.from_provider_config(query='economic welfare',
...                                      provider_config=guardian_config,
...                                      use_cache=True)
>>> assert api.provider_name == 'guardian'
>>> response = api.search(page=1)  # assumes that you have the GUARDIAN_API_KEY stored as an env variable
>>> assert response.ok
- api_key_env_var: str | None
- base_url: str
- docs_url: str | None
- model_config: ClassVar[ConfigDict] = {'str_strip_whitespace': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod normalize_provider_name(v: str) str[source]
Helper method for normalizing the names of providers to a consistent structure.
- parameter_map: BaseAPIParameterMap
- provider_name: str
- records_per_page: int
- request_delay: float
- search_config_defaults() dict[str, Any][source]
Convenience method for retrieving ProviderConfig fields as a dict. Useful for supplying the missing information needed to create a SearchAPIConfig object when only the provider_name has been provided.
- Returns:
A dictionary containing the URL, name, records_per_page, and request_delay for the current provider.
- Return type:
dict
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method that shows the current structure of the ProviderConfig.
scholar_flux.api.models.provider_registry module
The scholar_flux.api.models.provider_registry module implements the ProviderRegistry class, which extends a dictionary to map provider names to their scholar_flux ProviderConfig.
When scholar_flux uses a provider_name to create a SearchAPI or SearchCoordinator, the package-level provider_registry is instantiated and referenced to retrieve the necessary configuration for easier interaction and specification of APIs.
- class scholar_flux.api.models.provider_registry.ProviderRegistry(dict=None, /, **kwargs)[source]
Bases: BaseProviderDict
The ProviderRegistry implementation allows the smooth and efficient retrieval of API parameter maps and default configuration settings to aid in the creation of a SearchAPI that is specific to the current API.
Note that the ProviderRegistry uses ProviderConfig._normalize_name to normalize provider names so that lookups ignore underscores and letter case.
- ProviderRegistry.from_defaults:
Dynamically imports configurations stored within scholar_flux.api.providers, and fails gracefully if a provider’s module does not contain a ProviderConfig.
- ProviderRegistry.get:
Resolves a provider name to its ProviderConfig if it exists in the registry.
- ProviderRegistry.get_from_url:
Resolves a provider URL to its ProviderConfig if it exists in the registry.
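A dictionary whose lookups ignore underscores and letter case can be sketched in a few lines. The example below is only an illustration of that idea: RegistrySketch and normalize_name are hypothetical, and the exact normalization rules used by scholar_flux are assumed rather than taken from its source.

```python
from collections import UserDict
from typing import Any


def normalize_name(name: str) -> str:
    """Assumed normalization: trim whitespace, lowercase, and drop underscores."""
    return name.strip().lower().replace("_", "")


class RegistrySketch(UserDict):
    """Dictionary that normalizes provider names on both writes and reads."""

    def __setitem__(self, key: str, value: Any) -> None:
        super().__setitem__(normalize_name(key), value)

    def __getitem__(self, key: str) -> Any:
        return super().__getitem__(normalize_name(key))

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and normalize_name(key) in self.data


registry = RegistrySketch()
# The stored value is a placeholder; a real registry would hold a provider config.
registry["Springer_Nature"] = {"base_url": "https://example.org/api"}
```

After this, registry.get("springernature") and registry["SPRINGER_NATURE"] resolve to the same entry, and registry.get("missing") returns None.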
- add(provider_config: ProviderConfig) None[source]
Helper method for adding a new provider to the provider registry.
- create(provider_name: str, **kwargs) ProviderConfig[source]
Helper method that creates and registers a new ProviderConfig with the current provider registry.
- Parameters:
provider_name (str) – The name of the provider to create a new provider_config for.
**kwargs – Additional keyword arguments to pass to scholar_flux.api.models.ProviderConfig
- classmethod from_defaults() ProviderRegistry[source]
Helper method that dynamically loads providers from the scholar_flux.api.providers module specifically reserved for default provider configs.
- Returns:
A new registry containing the loaded default provider configurations
- Return type:
ProviderRegistry
- get_from_url(provider_url: str | None) ProviderConfig | None[source]
Attempts to retrieve a ProviderConfig instance for the given provider by resolving the provided URL to the provider’s base URL. Will not throw an error if the provider does not exist.
- Parameters:
provider_url (Optional[str]) – URL of the default provider
- Returns:
Instance configuration for the provider if it exists, else None
- Return type:
Optional[ProviderConfig]
scholar_flux.api.models.reconstructed_response module
The scholar_flux.api.models.reconstructed_response module implements a basic ReconstructedResponse data structure.
The ReconstructedResponse class was designed to be request-client agnostic to improve flexibility in the request clients that can be used to retrieve data from APIs and load response data from cache.
The ReconstructedResponse is a minimal implementation of a response-like object that can transform response classes from requests, httpx, and asyncio into a singular representation of the same response.
- class scholar_flux.api.models.reconstructed_response.ReconstructedResponse(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any)[source]
Bases: object
Helper class for retaining the most relevant fields when reconstructing responses from different sources such as requests and httpx (if chosen). The primary purpose of the ReconstructedResponse in scholar_flux is to create a minimal representation of a response when we need to construct a ProcessedResponse without an actual response and verify content fields.
In applications such as retrieving cached data from a scholar_flux.data_storage.DataCacheManager, if an original or cached response is not available, then a ReconstructedResponse is created from the cached response fields when available.
- Parameters:
status_code (int) – The integer code indicating the status of the response
reason (str) – Indicates the reasoning associated with the status of the response
headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type, etc.)
content (bytes) – The content within the response
url (Any) – The URL from which the response was received
Note
The ReconstructedResponse.build factory method is recommended in cases when one property may contain the needed fields but may need to be processed and prepared first before being used. Examples include instances where one has text or json data instead of content, a reason_phrase field instead of reason, etc.
Example
>>> from scholar_flux.api.models import ReconstructedResponse
>>> # build a response using a factory method that infers fields from existing ones when not directly specified
>>> response = ReconstructedResponse.build(status_code=200, content=b"success", url="https://google.com")
>>> # check whether the current class follows a ResponseProtocol and contains valid fields
>>> assert response.is_response()  # OUTPUT: True
>>> response.validate()  # raises an error if invalid
>>> response.raise_for_status()  # no error for 200 status codes
>>> assert response.reason == 'OK' == response.status  # inferred from the status_code attribute
- __init__(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any) None
- asdict() dict[str, Any][source]
Helper method for converting the ReconstructedResponse into a dictionary containing attributes and their corresponding values.
- classmethod build(response: Any | None = None, **kwargs) ReconstructedResponse[source]
Helper method for building a new ReconstructedResponse from a regular response object. This classmethod can either construct a new ReconstructedResponse object from a response object or response-like object or create a new ReconstructedResponse altogether with its inputs.
- Parameters:
response (Optional[Any]) – A response or response-like object of unknown type or None.
**kwargs – The underlying components needed to construct a new response. Ideally, this set of key-value pairs would be specific only to the types expected by the ReconstructedResponse.
- content: bytes
- classmethod fields() list[source]
Helper method for retrieving a list containing the names of all fields associated with the ReconstructedResponse class.
- Returns:
A list containing the name of each attribute in the ReconstructedResponse.
- Return type:
list[str]
- classmethod from_keywords(**kwargs) ReconstructedResponse[source]
Uses the provided keyword arguments to create a ReconstructedResponse. Keywords can be the default attributes of the ReconstructedResponse or other keywords from which those attributes can be inferred and processed.
- Parameters:
status_code (int) – The integer code indicating the status of the response
reason (str) – Indicates the reasoning associated with the status of the response
headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type)
content (bytes) – The content within the response
url (Any) – The URL from which the response was received
Some fields can be either provided directly or inferred from other similarly common fields:
content: [‘content’, ‘_content’, ‘text’, ‘json’]
headers: [‘headers’, ‘_headers’]
reason: [‘reason’, ‘status’, ‘reason_phrase’, ‘status_code’]
- Returns:
A newly reconstructed response from the given keyword components
- Return type:
ReconstructedResponse
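The field inference listed above (content from text or json, reason from the status code) can be approximated with the standard library. The helpers below are a hedged sketch: infer_content and infer_reason are hypothetical names, and the lookup order is an assumption based on the order of the documented field lists.

```python
import json
from http import HTTPStatus
from typing import Any, Optional


def infer_content(**kwargs: Any) -> Optional[bytes]:
    """Pick the first available content-like field and coerce it to bytes."""
    for key in ("content", "_content"):
        value = kwargs.get(key)
        if isinstance(value, bytes):
            return value
    text = kwargs.get("text")
    if isinstance(text, str):
        return text.encode("utf-8")
    payload = kwargs.get("json")
    if payload is not None:
        return json.dumps(payload).encode("utf-8")
    return None


def infer_reason(**kwargs: Any) -> Optional[str]:
    """Prefer an explicit reason-like field, then fall back to the status code."""
    for key in ("reason", "status", "reason_phrase"):
        value = kwargs.get(key)
        if isinstance(value, str) and value:
            return value
    try:
        # HTTPStatus raises ValueError for unknown or missing codes.
        return HTTPStatus(kwargs.get("status_code")).phrase
    except ValueError:
        return None
```

For example, infer_reason(status_code=200) yields 'OK', which is consistent with the class example in which reason is inferred from status_code.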
- headers: MutableMapping[str, str]
- is_response() bool[source]
Method for directly validating the fields that indicate that a response has been minimally recreated successfully. The fields that are validated include:
status codes (should be an integer)
URLs (should be a valid url)
reasons (should originate from a reason attribute or inferred from the status code)
content (should be a bytes field or encoded from a string text field)
headers (should be a dictionary with string fields and preferably a content type)
- Returns:
Indicates whether the current reconstructed response minimally recreates a response object.
- Return type:
bool
- json() Dict[str, Any] | List[Any] | None[source]
Return JSON-decoded body from the underlying response, if available.
- property ok: bool
Indicates whether the current response indicates a successful request (200 <= status_code < 400) or whether an invalid response has been received.
- Returns:
True if the status code is an integer value within the range of 200 and 399, False otherwise
- Return type:
bool
- raise_for_status() None[source]
Method that imitates the capability of the requests and httpx response types to raise errors when encountering status codes that are indicative of failed responses.
As scholar_flux processes data that is generally only sent when status codes are within the 200s (or exactly 200 [ok]), an error is raised when encountering a value outside of this range.
- Raises:
InvalidResponseReconstructionException – If the structure of the ReconstructedResponse is invalid
RequestException – If the expected response is not within the range of 200-399
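The relationship between ok and raise_for_status can be shown with a minimal response-like class. This is a sketch under stated assumptions: MinimalResponse and ResponseStatusError are hypothetical stand-ins, whereas the documented method raises RequestException or InvalidResponseReconstructionException.

```python
from dataclasses import dataclass


class ResponseStatusError(Exception):
    """Hypothetical error type used only in this sketch."""


@dataclass
class MinimalResponse:
    status_code: int
    url: str

    @property
    def ok(self) -> bool:
        # Mirrors the documented range: 200 <= status_code < 400.
        return isinstance(self.status_code, int) and 200 <= self.status_code < 400

    def raise_for_status(self) -> None:
        """Raise when the status code falls outside the successful range."""
        if not self.ok:
            raise ResponseStatusError(f"{self.status_code} error for URL: {self.url}")
```

Calling raise_for_status on a 200 or 302 response returns None, while a 404 or 500 response raises.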
- reason: str
- property status: str | None
Helper property for retrieving a human-readable status description of the status.
- Returns:
The status description associated with the response (if available)
- Return type:
Optional[str]
- status_code: int
- property text: str | None
Helper property for retrieving the text from the bytes content as a string.
- Returns:
The decoded text from the content of the response
- Return type:
Optional[str]
- url: Any
- validate() None[source]
Raises an error if the recreated response object does not contain valid properties expected of a response. If the response validation is successful, no error is raised and nothing is returned.
- Raises:
InvalidResponseReconstructionException – if at least one field is determined to be invalid and unexpected of a true response object.
scholar_flux.api.models.response_types module
Helper module used to define response types returned by scholar-flux after API response retrieval and processing.
- The APIResponseType is a union of different possible response types that can be received from a SearchCoordinator:
ProcessedResponse: A successfully processed response containing parsed response metadata, and processed records.
ErrorResponse: Indicates that an error has occurred during response retrieval and/or processing when unsuccessful.
NonResponse: ErrorResponse subclass indicating when an error prevents the successful retrieval of a response.
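Because NonResponse is described as a subclass of ErrorResponse, code that branches on the response type needs to test for the subclass first. The sketch below uses stand-in classes to show that ordering; the class names ending in Sketch and their fields (records, message) are hypothetical and are not the attributes of the real response models.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ProcessedResponseSketch:
    records: List[dict] = field(default_factory=list)


@dataclass
class ErrorResponseSketch:
    message: Optional[str] = None


@dataclass
class NonResponseSketch(ErrorResponseSketch):
    """Error raised before any response could be retrieved."""


APIResponseTypeSketch = Union[ProcessedResponseSketch, ErrorResponseSketch, NonResponseSketch]


def describe(response: APIResponseTypeSketch) -> str:
    # Check the subclass first: every NonResponseSketch is also an ErrorResponseSketch.
    if isinstance(response, NonResponseSketch):
        return f"no response received: {response.message}"
    if isinstance(response, ErrorResponseSketch):
        return f"retrieval or processing failed: {response.message}"
    return f"processed {len(response.records)} records"
```

Reversing the first two checks would report every NonResponseSketch as a generic error.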
scholar_flux.api.models.responses module
The scholar_flux.api.models.responses module contains the core response types used to indicate whether the retrieval and processing of API responses was successful or unsuccessful. Each class uses pydantic to ensure type-validated responses while ensuring flexibility in how responses can be used and applied.
- Classes:
- ProcessedResponse:
Indicates that an API response was successfully retrieved, parsed, and processed. This model is designed to facilitate the inspection of intermediate results and retrieval of extracted response records.
- ErrorResponse:
Indicates that an error occurred somewhere in the retrieval or processing of an API response. This class is designed to allow inspection of error messages and failure results to aid in debugging in case of unexpected scenarios.
- NonResponse:
Inherits from ErrorResponse and is designed to indicate that an error occurred in the preparation of a request or the sending/retrieval of a response.
- class scholar_flux.api.models.responses.APIResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None)[source]
Bases: BaseModel
A response wrapper for responses of different types that allows consistency when using several possible backends. The purpose of this class is to serve as the base for managing responses received from scholarly APIs while processing each component in a predictable, reproducible manner.
This class uses pydantic’s data validation and serialization/deserialization methods to aid caching and includes properties that refer back to the original response for displaying valid response codes, URLs, etc.
All future processing/error-based responses classes inherit from and build off of this class.
- Parameters:
cache_key (Optional[str]) – A string for recording cache keys for use in later steps of the response orchestration involving processing, cache storage, and cache retrieval
response (Any) – A response or response-like object to be validated and used/re-used in later caching and response processing/orchestration steps.
created_at (Optional[str]) – A value indicating the time in which a response or response-like object was created.
Example
>>> from scholar_flux.api import APIResponse
>>> # Using keyword arguments to build a basic APIResponse data container:
>>> response = APIResponse.from_response(
...     cache_key='test-response',
...     status_code=200,
...     content=b'success',
...     url='https://example.com',
...     headers={'Content-Type': 'application/text'}
... )
>>> response
# OUTPUT: APIResponse(cache_key='test-response', response = ReconstructedResponse(
#    status_code=200, reason='OK', headers={'Content-Type': 'application/text'},
#    text='success', url='https://example.com'
# )
>>> assert response.status == 'OK' and response.text == 'success' and response.url == 'https://example.com'
# OUTPUT: True
>>> assert response.validate_response()
# OUTPUT: True
- classmethod as_reconstructed_response(response: Any) ReconstructedResponse[source]
Classmethod designed to create a reconstructed response from an original response object. This method coerces response attributes into a reconstructed response that retains the original content, status code, headers, URL, reason, etc.
- Returns:
A minimal response object that contains the core attributes needed to support other processes in the scholar_flux module such as response parsing and caching.
- Return type:
ReconstructedResponse
- cache_key: str | None
- property content: bytes | None
Return content from the underlying response, if available and valid.
- Returns:
The bytes from the original response content
- Return type:
(bytes)
- created_at: str | None
- encode_response(response: Any) Dict[str, Any] | List[Any] | None[source]
Helper method for serializing a response into a json format. Accounts for special cases such as CaseInsensitiveDict fields that are otherwise unserializable.
From this step, pydantic can safely use json internally to dump the encoded response fields.
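As a rough illustration of the encoding step, the sketch below shows one way mapping-like header objects and raw bytes can be coerced into JSON-friendly values before dumping. It uses only the standard library. `HeaderMap` and `encode_for_json` are hypothetical names standing in for a CaseInsensitiveDict-style object and an encoding helper; this is not scholar_flux's actual implementation, and the conversion rules are assumptions.

```python
import json
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class HeaderMap(Mapping):
    """Hypothetical stand-in for a CaseInsensitiveDict-like header object."""

    def __init__(self, data: Dict[str, str]) -> None:
        # Keys are stored lowercased so lookups ignore case.
        self._data = {key.lower(): value for key, value in data.items()}

    def __getitem__(self, key: str) -> str:
        return self._data[key.lower()]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def encode_for_json(value: Any) -> Any:
    """Recursively convert values that json.dumps cannot handle (assumed rules)."""
    if isinstance(value, Mapping):
        return {str(key): encode_for_json(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode_for_json(item) for item in value]
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


raw = {
    "status_code": 200,
    "headers": HeaderMap({"Content-Type": "application/text"}),
    "content": b"success",
}
encoded = encode_for_json(raw)
dumped = json.dumps(encoded)  # succeeds because only plain dicts and strings remain
```

Passing `raw` directly to `json.dumps` would raise a `TypeError`, since neither the custom mapping nor the bytes value is serializable by default.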
- classmethod from_response(response: Any | None = None, cache_key: str | None = None, auto_created_at: bool | None = None, **kwargs) Self[source]
Construct an APIResponse from a response object or from keyword arguments.
If response is not a valid response object, builds a minimal response-like object from kwargs.
- classmethod from_serialized_response(response: Any | None = None, **kwargs) ReconstructedResponse | None[source]
Helper method for creating a new APIResponse from the original dumped object. This method accounts for the difficulty of serializing responses by decoding the response dictionary that was loaded from a string using json.loads from the json module in the standard library.
If the response input is still a serialized string, this method will manually load the response dict with the APIResponse._deserialize_response_dict class method before further processing.
- Parameters:
response (Any) – A prospective response value to load into the API Response.
- Returns:
A reconstructed response object, if possible. Otherwise returns None
- Return type:
Optional[ReconstructedResponse]
- property headers: MutableMapping[str, str] | None
Return headers from the underlying response, if available and valid.
- Returns:
A dictionary of headers from the response
- Return type:
MutableMapping[str, str]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- raise_for_status()[source]
Uses an underlying response object to validate the status code associated with the request.
If the attribute isn’t a response or reconstructed response, the code will coerce the class into a response object to verify the status code for the request URL and response.
- property reason: str | None
Uses the underlying reason attribute on the response object, if available, to create a human readable status description.
- Returns:
The status description associated with the response.
- Return type:
Optional[str]
- response: Any | None
- classmethod serialize_response(response: Response | ResponseProtocol) str | None[source]
Helper method for serializing a response into a json format. The response object is first converted into a serialized string and subsequently dumped after ensuring that the field is serializable.
- Parameters:
response (Response, ResponseProtocol)
- property status: str | None
Helper property for retrieving a human-readable status description from the APIResponse.
- Returns:
The status description associated with the response (if available).
- Return type:
Optional[str]
- property status_code: int | None
Helper property for retrieving a status code from the APIResponse.
- Returns:
The status code associated with the response (if available)
- Return type:
Optional[int]
- property text: str | None
Attempts to retrieve the response text by first decoding the bytes of its content. If the content is not available, this property attempts to reference the text attribute directly.
- Returns:
A text string if the text is available in the correct format, otherwise None
- Return type:
Optional[str]
- classmethod transform_response(v: Any) Response | ResponseProtocol | None[source]
Attempts to resolve a response object as an original or ReconstructedResponse: All original response objects (duck-typed or requests response) with valid values will be returned as is.
If the passed object is a string, this function will attempt to deserialize it before attempting to parse it as a dictionary.
Dictionary fields will be decoded, if originally encoded, and parsed as a ReconstructedResponse object, if possible.
Otherwise, the original object is returned as is.
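The resolution order described above can be sketched with the standard library alone. `MinimalResponse` and `resolve_response` are hypothetical names, and the set of required keys is an assumption made for illustration; the library's own checks may differ.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MinimalResponse:
    """Hypothetical stand-in for a reconstructed response."""
    status_code: int
    url: str
    content: bytes = b""
    headers: Dict[str, str] = field(default_factory=dict)


def resolve_response(value: Any) -> Any:
    """Return a response-like object when possible, else the input unchanged."""
    # Objects that already look like responses pass through untouched.
    if hasattr(value, "status_code") and hasattr(value, "url"):
        return value
    candidate = value
    # Strings are treated as JSON text and loaded into a dictionary first.
    if isinstance(value, str):
        try:
            candidate = json.loads(value)
        except json.JSONDecodeError:
            return value
    # Dictionaries with the expected keys are rebuilt into a response object.
    if isinstance(candidate, dict) and all(key in candidate for key in ("status_code", "url")):
        content = candidate.get("content", b"")
        if isinstance(content, str):
            content = content.encode("utf-8")
        return MinimalResponse(
            candidate["status_code"], candidate["url"], content, candidate.get("headers", {})
        )
    # Anything else is returned as is.
    return value


rebuilt = resolve_response('{"status_code": 200, "url": "https://example.com", "content": "success"}')
```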
- property url: str | None
Return URL from the underlying response, if available and valid.
- Returns:
A string of the original URL if available. Accounts for objects that indicate the original URL when converted to a string.
- Return type:
str
- classmethod validate_iso_timestamp(v: str | datetime | None) str | None[source]
Helper method for validating that the timestamp follows the ISO 8601 format.
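A minimal sketch of this kind of check, using only `datetime` from the standard library, is shown below. The function name `normalize_iso_timestamp` is hypothetical and the handling of each input type is an assumption, not the validator's actual logic. Note that before Python 3.11, `datetime.fromisoformat` accepts only a subset of ISO 8601.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def normalize_iso_timestamp(value: Union[str, datetime, None]) -> Optional[str]:
    """Return an ISO 8601 string, None for missing input, or raise ValueError."""
    if value is None:
        return None
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, str):
        # fromisoformat raises ValueError for strings it cannot parse.
        return datetime.fromisoformat(value).isoformat()
    raise ValueError(f"Unsupported timestamp type: {type(value).__name__}")


from_string = normalize_iso_timestamp("2024-01-02T03:04:05")
from_datetime = normalize_iso_timestamp(datetime(2024, 1, 2, tzinfo=timezone.utc))
```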
- validate_response() bool[source]
Helper method for determining whether the response attribute is truly a response. If the response isn’t a requests response, duck typing is used to determine whether the response attribute itself has the expected attributes of a response. The check relies on the class properties, which return None when an attribute isn’t the expected type.
- Returns:
An indicator of whether the current APIResponse.response attribute is actually a response.
- Return type:
bool
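To make the duck-typing idea concrete, here is a small sketch that accepts any object exposing the expected attributes with the expected types. The attribute list and the name `looks_like_response` are assumptions chosen for illustration, not the set of checks scholar_flux performs.

```python
from types import SimpleNamespace
from typing import Any


def looks_like_response(candidate: Any) -> bool:
    """Duck-typed check: every expected attribute exists with the expected type."""
    expected = {"status_code": int, "url": str, "content": bytes}
    return all(
        isinstance(getattr(candidate, name, None), kind)
        for name, kind in expected.items()
    )


# Not a requests.Response, but it quacks like one.
good = SimpleNamespace(status_code=200, url="https://example.com", content=b"success")
# Wrong type for status_code and no content attribute.
bad = SimpleNamespace(status_code="200", url="https://example.com")
```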
- class scholar_flux.api.models.responses.ErrorResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]
Bases:
APIResponse
Returned when something goes wrong but raising immediately is not desired; the failure details are handed back instead.
The class is formatted for compatibility with the ProcessedResponse.
- cache_key: str | None
- created_at: str | None
- property data: None
Provided for type hinting + compatibility.
- error: str | None
- property extracted_records: None
Provided for type hinting + compatibility.
- classmethod from_error(message: str, error: Exception, cache_key: str | None = None, response: Response | ResponseProtocol | None = None) Self[source]
Creates and logs the processing error if one occurs during response processing.
- Parameters:
response (Response) – Raw API response.
cache_key (Optional[str]) – Cache key for storing results.
- Returns:
An ErrorResponse object that contains the error response data and background information on what precipitated the error.
- Return type:
ErrorResponse
- message: str | None
- property metadata: None
Provided for type hinting + compatibility.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- property parsed_response: None
Provided for type hinting + compatibility.
- property processed_records: None
Provided for type hinting + compatibility.
- response: Any | None
- class scholar_flux.api.models.responses.NonResponse(*, cache_key: str | None = None, response: None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]
Bases:
ErrorResponse
Response class used to indicate that an error occurred in the preparation of a request or in the retrieval of a response object from an API.
This class is used to signify the error that occurred within the search process using a similar interface as the other scholar_flux Response dataclasses.
- cache_key: str | None
- created_at: str | None
- error: str | None
- message: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- response: None
- class scholar_flux.api.models.responses.ProcessedResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, parsed_response: Any | None = None, extracted_records: List[Any] | None = None, processed_records: List[Dict[Any, Any]] | None = None, metadata: Any | None = None, message: str | None = None)[source]
Bases:
APIResponse
Helper class for returning a ProcessedResponse object that contains information on the original, cached, or reconstructed response received and processed after retrieval from an API, in addition to the cache key. This object also allows storage of intermediate steps including:
1) parsed responses
2) extracted records and metadata
3) processed records (aliased as data)
4) any additional messages
An error field is provided for compatibility with the ErrorResponse class.
- cache_key: str | None
- created_at: str | None
- property data: List[Dict[Any, Any]] | None
Alias to the processed_records attribute that holds a list of dictionaries, when available.
- property error: None
Provided for type hinting + compatibility.
- extracted_records: List[Any] | None
- message: str | None
- metadata: Any | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- parsed_response: Any | None
- processed_records: List[Dict[Any, Any]] | None
- response: Any | None
scholar_flux.api.models.search_api_config module
The scholar_flux.api.models.search_api_config module implements the core SearchAPIConfig used to drive API searches.
The SearchAPIConfig is used by the SearchAPI to interact with API providers via a unified interface for orchestrating response retrieval.
This configuration defines settings such as rate limiting, the number of records retrieved per request, API keys, and the API provider/URL where requests will be sent.
Under the hood, the SearchAPIConfig can use both pre-created and custom defaults to create a new configuration with minimal code.
- class scholar_flux.api.models.search_api_config.SearchAPIConfig(*, provider_name: str = '', base_url: str = '', records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: float = -1, api_key: SecretStr | None = None, api_specific_parameters: dict[str, Any] | None = None)[source]
Bases:
BaseModel
The SearchAPIConfig class provides the core tools necessary to set and interact with the API. The SearchAPI uses this class to retrieve data from an API using universal parameters to simplify the process of retrieving raw responses.
- provider_name
Indicates the name of the API to use when making requests to a provider. If the provider name matches a known default and the base_url is unspecified, the base URL for the current provider is used instead.
- Type:
str
- base_url
Indicates the API URL where data will be searched and retrieved.
- Type:
str
- records_per_page
Controls the number of records that will appear on each page.
- Type:
int
- request_delay
Indicates the minimum delay between requests, used to avoid exceeding API rate limits.
- Type:
float
- api_key
This is an API-specific parameter for validating the current user’s identity. If a str type is provided, it is converted into a SecretStr.
- Type:
Optional[str | SecretStr]
- api_specific_parameters
A dictionary containing all parameters specific to the current API. API-specific parameters include the following.
- mailto (Optional[str | SecretStr]):
An optional email address for receiving feedback on usage from providers. This parameter is currently applicable only to the Crossref API.
- db (str):
The parameter used by the NIH to direct requests for data to the PubMed database. This parameter defaults to pubmed and does not require direct specification.
- Type:
dict[str, APISpecificParameter]
Examples
>>> from scholar_flux.api import SearchAPIConfig, SearchAPI, provider_registry
>>> # to create a CROSSREF configuration with minimal defaults and provide an api_specific_parameter:
>>> config = SearchAPIConfig.from_defaults(provider_name = 'crossref', mailto = 'your_email_here@example.com')
>>> # the configuration automatically retrieves the configuration for the "Crossref" API
>>> assert config.provider_name == 'crossref' and config.base_url == provider_registry['crossref'].base_url
>>> api = SearchAPI.from_settings(query = 'q', config = config)
>>> assert api.config == config
>>> # to retrieve all defaults associated with a provider and automatically read an API key if needed
>>> config = SearchAPIConfig.from_defaults(provider_name = 'pubmed', api_key = 'your api key goes here')
>>> # the API key is retrieved automatically if you have the API key specified as an environment variable
>>> assert config.api_key is not None
>>> # Default provider API specifications are already pre-populated if they are set with defaults
>>> assert config.api_specific_parameters['db'] == 'pubmed' # required by pubmed and defaults to pubmed
>>> # Update a provider and automatically retrieve its API key - the previous API key will no longer apply
>>> updated_config = SearchAPIConfig.update(config, provider_name = 'core')
>>> # The API key should have been overwritten to use core. Looks for a `CORE_API_KEY` env variable by default
>>> assert updated_config.provider_name == 'core' and updated_config.api_key != config.api_key
- DEFAULT_PROVIDER: ClassVar[str] = 'PLOS'
- DEFAULT_RECORDS_PER_PAGE: ClassVar[int] = 25
- DEFAULT_REQUEST_DELAY: ClassVar[float] = 6.1
- MAX_API_KEY_LENGTH: ClassVar[int] = 512
- api_key: SecretStr | None
- api_specific_parameters: dict[str, Any] | None
- base_url: str
- classmethod default_request_delay(v: int | float | None, provider_name: str | None = None) float[source]
Helper method enabling the retrieval of the most appropriate rate limit for the current provider.
Defaults to the SearchAPIConfig default rate limit when the current provider is unknown and a valid rate limit has not yet been provided.
- Parameters:
v (Optional[int | float]) – The value received for the current request_delay
provider_name (Optional[str]) – The name of the provider to retrieve a rate limit for
- Returns:
- The inputted non-negative request delay, the retrieved rate limit for the current provider
if available, or the SearchAPIConfig.DEFAULT_REQUEST_DELAY - all in order of priority.
- Return type:
float
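The priority order in the Returns description can be sketched as a simple three-step fallback. The function name `resolve_request_delay` and the `PROVIDER_DELAYS` table are hypothetical; only the 6.1 second default is taken from the DEFAULT_REQUEST_DELAY class attribute listed above.

```python
from typing import Dict, Optional, Union

DEFAULT_REQUEST_DELAY = 6.1  # mirrors the documented class attribute

# Hypothetical provider rate limits, for illustration only.
PROVIDER_DELAYS: Dict[str, float] = {"example_provider": 1.0}


def resolve_request_delay(
    value: Union[int, float, None], provider_name: Optional[str] = None
) -> float:
    """Pick a request delay using the documented order of priority."""
    # 1) A non-negative value supplied by the caller wins.
    if value is not None and value >= 0:
        return float(value)
    # 2) Otherwise fall back to the provider's known delay, if any.
    if provider_name is not None and provider_name in PROVIDER_DELAYS:
        return PROVIDER_DELAYS[provider_name]
    # 3) Finally use the global default.
    return DEFAULT_REQUEST_DELAY
```

Using -1 as the "unset" sentinel, as the signature's default suggests, lets a caller distinguish "no delay requested" (0) from "use whatever the provider recommends".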
- classmethod from_defaults(provider_name: str, **overrides) SearchAPIConfig[source]
Uses the default configuration for the chosen provider to create a SearchAPIConfig object containing configuration parameters. Note that additional parameters and field overrides can be added via the **overrides field.
- Parameters:
provider_name (str) – The name of the provider to create the config
**overrides – Optional keyword arguments to specify overrides and additional arguments
- Returns:
A default SearchAPIConfig object based on the chosen parameters.
- Return type:
SearchAPIConfig
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- provider_name: str
- records_per_page: int
- request_delay: float
- classmethod set_records_per_page(v: int | None)[source]
Sets the records_per_page parameter to the default if the supplied value is not valid.
Triggers a validation error when records_per_page is an invalid type. Otherwise uses the DEFAULT_RECORDS_PER_PAGE class attribute if the supplied value is missing or is a negative number.
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method for retrieving a string representation of the overall structure of the current SearchAPIConfig.
- classmethod update(current_config: SearchAPIConfig, **overrides) SearchAPIConfig[source]
Create a new SearchAPIConfig by updating an existing config with new values and/or switching to a different provider. This method ensures that the new provider’s base_url and defaults are used if provider_name is given, and that API-specific parameters are prioritized and merged as expected.
- Parameters:
current_config (SearchAPIConfig) – The existing configuration to update.
**overrides – Any fields or API-specific parameters to override or add.
- Returns:
A new config with the merged and prioritized values.
- Return type:
SearchAPIConfig
- property url_basename: str
Uses the _extract_url_basename method from the provider URL associated with the current config instance.
- classmethod validate_api_key(v: SecretStr | str | None) SecretStr | None[source]
Validates the api_key attribute and triggers a validation error if it is not valid.
- classmethod validate_provider_name(v: str | None) str[source]
Validates the provider_name attribute and triggers a validation error if it is not valid.
- classmethod validate_request_delay(v: int | float | None) int | float | None[source]
Sets the request delay (delay between each request) for valid request delays. This validator triggers a validation error when the request delay is an invalid type.
If a request delay is left None or is a negative number, this class method returns -1, and further validation is performed by cls.default_request_delay to retrieve the provider’s default request delay.
If not available, SearchAPIConfig.DEFAULT_REQUEST_DELAY is used.
- validate_search_api_config_parameters() Self[source]
Validation method that resolves URLs and/or provider names to provider_info when one or the other is not explicitly provided.
Occurs as the last step in the validation process.
scholar_flux.api.models.search_inputs module
The scholar_flux.api.models.search_inputs module implements the PageListInput RootModel for multi-page searches.
The PageListInput model is designed to validate and prepare lists and iterables of page numbers for multi-page retrieval using the SearchCoordinator.search_pages method.
- class scholar_flux.api.models.search_inputs.PageListInput(root: RootModelRootType = PydanticUndefined)[source]
Bases:
RootModel[Sequence[int]]
Helper class for processing page information in a predictable manner. The PageListInput class expects to receive a list, string, or generator that contains at least one page number. If a single integer is received, it is transformed into a single-item list containing that integer.
- Parameters:
root (Sequence[int]) – A list containing at least one page number.
Examples
>>> from scholar_flux.api.models import PageListInput
>>> PageListInput(5)
PageListInput([5])
>>> PageListInput(range(5))
PageListInput([1, 2, 3, 4])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- property page_numbers: Sequence[int]
Returns the sequence of validated page numbers as a list.
- classmethod page_validation(v: str | int | Sequence[int | str]) Sequence[int][source]
Processes the page input to ensure that a list of integers is returned if the received page list is in a valid format.
- Parameters:
v (str | int | Sequence[int | str]) – A page or sequence of pages to be formatted as a list of pages.
- Returns:
A validated, formatted sequence of page numbers assuming successful page validation
- Return type:
Sequence[int]
- Raises:
ValidationError – Internally raised via pydantic if a ValueError is encountered (if the input is not exclusively a page or list of page numbers)
- classmethod process_page(page_value: str | int) int[source]
Helper method for ensuring that each value in the sequence is a numeric string or whole number.
Note that this function will not throw an error for negative pages as that is handled at a later step in the page search process.
- Parameters:
page_value (str | int) – The value to be converted if it is not already an integer
- Returns:
A validated integer if the page can be converted to an integer and is not a float
- Return type:
int
- Raises:
ValueError – When the value is not an integer or numeric string to be converted to an integer
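The per-value conversion and the single-page case can be approximated without pydantic as follows. `to_page_number` and `to_page_list` are hypothetical helpers. Treating a bare string as a single page and rejecting booleans are assumptions of this sketch, not documented behavior.

```python
from typing import Iterable, List, Union


def to_page_number(page_value: Union[str, int]) -> int:
    """Accept whole numbers and numeric strings; reject everything else."""
    # bool is a subclass of int, so it is excluded explicitly (sketch assumption).
    if isinstance(page_value, int) and not isinstance(page_value, bool):
        return page_value
    if isinstance(page_value, str) and page_value.strip().lstrip("-").isdigit():
        return int(page_value)
    raise ValueError(f"Expected an integer or numeric string, got {page_value!r}")


def to_page_list(pages: Union[str, int, Iterable[Union[str, int]]]) -> List[int]:
    """Normalize a page or an iterable of pages into a list of integers."""
    # A single page becomes a one-item list.
    if isinstance(pages, (str, int)):
        return [to_page_number(pages)]
    return [to_page_number(page) for page in pages]


single = to_page_list(5)
mixed = to_page_list(["1", "2", 3])
```

Negative values pass through unchanged here, matching the note that negative pages are handled at a later step.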
scholar_flux.api.models.search_results module
The scholar_flux.api.models.search_results module defines the SearchResult and SearchResultList implementations that aid in the retrieval of multi-page and multi-coordinated searches.
These implementations allow increased organization for the API output of multiple searches by defining the provider, page, query, and response result retrieved from multi-page searches from the SearchCoordinator and multi-provider/page searches using the MultiSearchCoordinator.
- Classes:
- SearchResult:
Pydantic Base class that stores the search result as well as the query, provider name, and page.
- SearchResultList:
Inherits from a basic list to constrain the output to a list of SearchResults while providing data preparation convenience functions for downstream frameworks.
- class scholar_flux.api.models.search_results.SearchResult(*, query: str, provider_name: str, page: int, response_result: ProcessedResponse | ErrorResponse | None = None)[source]
Bases:
BaseModel
Core class used to store data in the retrieval and processing of API searches when iterating and searching over a range of pages, queries, and providers at a time. This class uses pydantic so that field validation is automatic, ensuring the integrity and reliability of response processing in multi-page searches that link each response result to a particular query, page, and provider.
- Parameters:
query (str) – The query used to retrieve records and response metadata
provider_name (str) – The name of the provider where data is being retrieved
page (int) – The page number associated with the request for data
response_result (Optional[ProcessedResponse | ErrorResponse]) – The response result containing the specifics of the data retrieved from the response or the error messages recorded if the request is not successful.
For convenience, the properties of the response_result are referenced as properties of the SearchResult, including: response, parsed_response, processed_records, etc.
- property cache_key: str | None
Extracts the cache key from the API Response if available.
This cache key is used when storing and retrieving data from response processing cache storage.
- property created_at: str | None
Extracts the time in which the ErrorResponse or ProcessedResponse was created, if available.
- property data: list[dict[Any, Any]] | None
Alias referring back to the processed records from the ProcessedResponse or ErrorResponse.
Contains the processed records from the APIResponse processing step after a successfully received response has been processed. If an error response was received instead, the value of this property is None.
- property error: str | None
Extracts the error name associated with the result from the base class, indicating the name/category of the error in the event that the response_result is an ErrorResponse.
- property extracted_records: list[Any] | None
Contains the extracted records from the APIResponse handling steps that extract individual records from successfully received and parsed responses.
If an ErrorResponse was received instead, the value of this property is None.
- property message: str | None
Extracts the message associated with the result from the base class, indicating why an error occurred in the event that the response_result is an ErrorResponse.
- property metadata: Any | None
Contains the metadata from the APIResponse handling steps that extract response metadata from successfully received and parsed responses.
If an ErrorResponse was received instead, the value of this property is None.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- page: int
- property parsed_response: Any | None
Contains the parsed response content from the APIResponse handling steps that extract the JSON, XML, or YAML content from a successfully received response.
If an ErrorResponse was received instead, the value of this property is None.
- property processed_records: list[dict[Any, Any]] | None
Contains the processed records from the APIResponse processing step after a successfully received response has been processed.
If an error response was received instead, the value of this property is None.
- provider_name: str
- query: str
- property response: Response | ResponseProtocol | None
Helper method directly referencing the original or reconstructed response or response-like object from the API Response if available.
If the received response is not available (None in the response_result), then this value will also be absent (None).
- response_result: ProcessedResponse | ErrorResponse | None
- class scholar_flux.api.models.search_results.SearchResultList(iterable=(), /)[source]
Bases:
list[SearchResult]
A helper class used to store multiple SearchResult instances for enhanced type safety. This class inherits from list and tailors its functionality to the APIResponses received from SearchCoordinators and MultiSearchCoordinators.
- - SearchResultList.append
Basic list.append implementation extended to accept only SearchResults
- - SearchResultList.extend
Basic list.extend implementation extended to accept only iterables of SearchResults
- - SearchResultList.filter
Removes NonResponses and ErrorResponses from the list of SearchResults
- - SearchResultList.join
Combines all records from ProcessedResponses into a list of dictionary-based records
Note
Attempts to add classes other than SearchResults to the SearchResultList will raise a TypeError.
- append(item: SearchResult)[source]
Overrides the default list.append method to ensure that only SearchResult objects can be appended to the custom list.
- Parameters:
item (SearchResult) – The response result containing the API response data, the provider name, and page associated with the response.
- extend(other: SearchResultList | MutableSequence[SearchResult] | Iterable[SearchResult])[source]
Overrides the default list.extend method to ensure that only an iterable of SearchResult objects can be added to the SearchResultList.
- Parameters:
other (Iterable[SearchResult]) – An iterable/sequence of response results containing the API response data, the provider name, and page associated with the response.
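The type-restricted append and extend behavior can be illustrated with a small list subclass. `Result` and `ResultList` are hypothetical stand-ins for SearchResult and SearchResultList; validating the whole iterable before extending is a design choice of this sketch and may not match the library.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Result:
    """Hypothetical stand-in for SearchResult."""
    query: str
    provider_name: str
    page: int


class ResultList(list):
    """List subclass that rejects items that are not Result instances."""

    def append(self, item: Result) -> None:
        if not isinstance(item, Result):
            raise TypeError(f"Expected a Result, got {type(item).__name__}")
        super().append(item)

    def extend(self, other: Iterable[Result]) -> None:
        # Materialize first so a bad item does not leave a partial extension.
        items = list(other)
        for item in items:
            if not isinstance(item, Result):
                raise TypeError(f"Expected a Result, got {type(item).__name__}")
        super().extend(items)


results = ResultList()
results.append(Result("ml", "plos", 1))
results.extend([Result("ml", "plos", 2)])
```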
- filter() SearchResultList[source]
Helper method that retains only elements from the original response that indicate successful processing.
- join() list[dict[str, Any]][source]
Helper method for joining all successfully processed API responses into a single list of dictionaries that can be loaded into a pandas or polars dataframe.
Note that this method will only load processed responses that contain records that were also successfully extracted and processed.
- Returns:
A single list containing all records retrieved from each page
- Return type:
list[dict[str, Any]]
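As a sketch of what joining amounts to, the snippet below flattens the records of every successful page into one list and skips pages without data. `PageResult` and `join_records` are hypothetical names. Modeling a failed page as `data=None` follows the documented behavior of the data property on error responses.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PageResult:
    """Hypothetical stand-in pairing a page with its processed records."""
    page: int
    data: Optional[List[Dict[str, Any]]] = None  # None models an error response


def join_records(results: List[PageResult]) -> List[Dict[str, Any]]:
    """Flatten records from every result that has processed data."""
    return [record for result in results if result.data for record in result.data]


results = [
    PageResult(page=1, data=[{"title": "A"}, {"title": "B"}]),
    PageResult(page=2, data=None),  # failed page contributes nothing
    PageResult(page=3, data=[{"title": "C"}]),
]
records = join_records(results)
```

A list of dictionaries in this shape can be passed directly to `pandas.DataFrame(records)` when pandas is installed.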
Module contents
The scholar_flux.api.models module includes the configuration classes needed to configure APIs for specific providers and to ensure that the request and response process is orchestrated in a robust way.
- Core Models:
- APIParameterMap: Contains the mappings and settings used to customize common and API-specific parameters to the requirements of each API.
APIParameterConfig: Encapsulates the created APIParameterMap as well as the methods used to create each request.
SearchAPIConfig: Defines the core logic to abstract the creation of requests with parameters specific to each API.
ProviderConfig: Allows users to define each of the defaults and mappings settings needed to create a Search API.
ProviderRegistry: A customized dictionary mapping provider names to their dynamically retrieved configuration.
ProcessedResponse: Indicates a successfully retrieved and processed response from an API provider.
ErrorResponse: Indicates that an exception occurred somewhere in the process of response retrieval and processing.
NonResponse: Indicates that a response of any status code could not be retrieved due to an exception.
- class scholar_flux.api.models.APIParameterConfig(parameter_map: APIParameterMap)[source]
Bases:
object
Uses an APIParameterMap instance and runtime parameter values to build parameter dictionaries for API requests.
- Parameters:
parameter_map (APIParameterMap) – The mapping of universal to API-specific parameter names.
- Class Attributes:
- DEFAULT_CORRECT_ZERO_INDEX (bool):
Autocorrects zero-indexed API parameter building specifications to only accept positive values when True. If False, page calculation for APIs that are zero-indexed (e.g., arXiv) will start from page 0.
Examples
>>> from scholar_flux.api import APIParameterConfig, APIParameterMap
>>> # the API parameter map is defined and used to resolve parameters to the API's language
>>> api_parameter_map = APIParameterMap(
...     query='q', records_per_page = 'pagesize', start = 'page', auto_calculate_page = False
... )
>>> # The APIParameterConfig defines class and settings that indicate how to create requests
>>> api_parameter_config = APIParameterConfig(api_parameter_map, auto_calculate_page = False)
>>> # Builds parameters using the specification from the APIParameterMap
>>> page = api_parameter_config.build_parameters(query= 'ml', page = 10, records_per_page=50)
>>> print(page)
# OUTPUT {'q': 'ml', 'page': 10, 'pagesize': 50}
- DEFAULT_CORRECT_ZERO_INDEX: ClassVar[bool] = True
- __init__(*args: Any, **kwargs: Any) None
- classmethod as_config(parameter_map: dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) APIParameterConfig[source]
Factory method for creating a new APIParameterConfig from a dictionary or APIParameterMap.
This helper class method resolves the structure of the APIParameterConfig against its basic building blocks to create a new configuration when possible.
- Parameters:
parameter_map (dict | BaseAPIParameterMap | APIParameterMap | APIParameterConfig) – A parameter mapping/config to use in the instantiation of an APIParameterConfig.
- Returns:
A new APIParameterConfig created from the inputs.
- Return type:
APIParameterConfig
- Raises:
APIParameterException – If there is an error in the creation/resolution of the required parameters
- build_parameters(query: str | None, page: int | None, records_per_page: int, **api_specific_parameters) Dict[str, Any][source]
Builds the dictionary of request parameters using the current parameter map and provided values at runtime.
- Parameters:
query (Optional[str]) – The search query string.
page (Optional[int]) – The page number for pagination (1-based).
records_per_page (int) – Number of records to fetch per page.
**api_specific_parameters – Additional API-specific parameters to include.
- Returns:
The fully constructed API request parameters dictionary, with keys as API-specific parameter names and values as provided.
- Return type:
Dict[str, Any]
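The translation from universal names to provider-specific names can be sketched with a plain dictionary. `build_request_parameters` is a hypothetical function. The offset formula used when `auto_calculate_page` is True and the pass-through of extra parameters are assumptions of this sketch; the library's own calculation, including its zero-index handling, may differ.

```python
from typing import Any, Dict, Optional


def build_request_parameters(
    name_map: Dict[str, str],
    query: Optional[str],
    page: Optional[int],
    records_per_page: int,
    auto_calculate_page: bool = True,
    **api_specific_parameters: Any,
) -> Dict[str, Any]:
    """Translate universal parameter names into an API's own names."""
    start = page
    if auto_calculate_page and page is not None:
        # Assumed formula: convert a 1-based page into a 1-based record offset.
        start = 1 + (page - 1) * records_per_page
    parameters = {
        name_map["query"]: query,
        name_map["start"]: start,
        name_map["records_per_page"]: records_per_page,
        **api_specific_parameters,  # passed through unmapped in this sketch
    }
    # Drop parameters that were not supplied.
    return {key: value for key, value in parameters.items() if value is not None}


name_map = {"query": "q", "start": "page", "records_per_page": "pagesize"}
direct = build_request_parameters(name_map, "ml", 10, 50, auto_calculate_page=False)
offset = build_request_parameters(name_map, "ml", 2, 50, auto_calculate_page=True)
```

With `auto_calculate_page=False` the page number is sent as given, which reproduces the dictionary shown in the class example.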
- classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterConfig[source]
Factory method to create APIParameterConfig instances with sensible defaults for known APIs.
If the provider_name does not exist, the code will raise an exception.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
api_key (Optional[str]) – API key value if required.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter config instance for the specified API.
- Return type:
APIParameterConfig
- Raises:
NotImplementedError – If the API name is unknown.
- classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterConfig | None[source]
Factory method to create APIParameterConfig instances with sensible defaults for known APIs.
Avoids throwing an error if the provider name does not already exist.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter config instance for the specified API. Returns None if a mapping for the provider_name isn’t retrieved
- Return type:
Optional[APIParameterConfig]
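The difference between the two factory methods is a common "strict versus lenient lookup" pattern, sketched below with a plain dictionary registry. `KNOWN_MAPPINGS` and its contents are hypothetical; only the NotImplementedError for unknown providers and the None return are taken from the descriptions above.

```python
from typing import Dict, Optional

# Hypothetical registry of known parameter mappings.
KNOWN_MAPPINGS: Dict[str, Dict[str, str]] = {
    "example_provider": {"query": "q", "start": "page", "records_per_page": "pagesize"},
}


def get_defaults(provider_name: str) -> Optional[Dict[str, str]]:
    """Lenient lookup: return a copy of the mapping, or None when unknown."""
    mapping = KNOWN_MAPPINGS.get(provider_name)
    return dict(mapping) if mapping is not None else None


def from_defaults(provider_name: str) -> Dict[str, str]:
    """Strict lookup: raise when the provider has no default mapping."""
    mapping = get_defaults(provider_name)
    if mapping is None:
        raise NotImplementedError(f"No default mapping for provider {provider_name!r}")
    return mapping


known = from_defaults("example_provider")
missing = get_defaults("unknown_provider")
```

The lenient form suits code that wants to fall back to a custom mapping, while the strict form surfaces configuration mistakes early.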
- property map: APIParameterMap
Helper property that is an alias for the APIParameterMap attribute.
The APIParameterMap maps all universal parameters to the parameter names specific to the API provider.
- Returns:
The mapping that the current APIParameterConfig will use to build a dictionary of parameter requests specific to the current API.
- Return type:
APIParameterMap
- parameter_map: APIParameterMap
- class scholar_flux.api.models.APIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]
Bases:
BaseAPIParameterMap
Extends BaseAPIParameterMap by adding validation and the optional retrieval of provider defaults for known APIs.
This class also specifies default mappings for specific attributes such as API keys and additional parameter names.
- query
The API-specific parameter name for the search query.
- Type:
str
- start
The API-specific parameter name for pagination (start index or page number).
- Type:
Optional[str]
- records_per_page
The API-specific parameter name for records per page.
- Type:
str
- api_key_parameter
The API-specific parameter name for the API key.
- Type:
Optional[str]
- api_key_required
Indicates whether an API key is required.
- Type:
bool
- auto_calculate_page
If True, calculates start index from page; if False, passes page number directly.
- Type:
bool
- zero_indexed_pagination
If True, treats 0 as an allowed page value when retrieving data from APIs.
- Type:
bool
- api_specific_parameters
Additional universal to API-specific parameter mappings.
- Type:
Dict[str, APISpecificParameter]
- api_key_parameter: str | None
- api_key_required: bool
- api_specific_parameters: Dict[str, APISpecificParameter]
- auto_calculate_page: bool
- classmethod from_defaults(provider_name: str, **additional_parameters) APIParameterMap[source]
Factory method that uses the APIParameterMap.get_defaults classmethod to retrieve the provider config.
Raises an error if the provider does not exist.
- Parameters:
provider_name (str) – The name of the API to create the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter map for the specified API.
- Return type:
APIParameterMap
- Raises:
NotImplementedError – If the API name is unknown.
- classmethod get_defaults(provider_name: str, **additional_parameters) APIParameterMap | None[source]
Factory method to create APIParameterMap instances with sensible defaults for known APIs.
This class method attempts to pull from the list of known providers defined in the scholar_flux.api.providers.provider_registry and returns None if an APIParameterMap for the provider cannot be found.
Using the additional_parameters keyword arguments, users can specify optional overrides for specific parameters if needed. This is helpful in circumstances where an API’s specification overlaps with that of a known provider.
Valid providers (as indicated in provider_registry) include:
springernature
plos
arxiv
openalex
core
crossref
- Parameters:
provider_name (str) – The name of the API provider to retrieve the parameter map for.
additional_parameters (dict) – Additional parameter mappings.
- Returns:
Configured parameter map for the specified API.
- Return type:
Optional[APIParameterMap]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- records_per_page: str
- classmethod set_default_api_key_parameter(values: dict[str, Any]) dict[str, Any][source]
Sets the default for the API key parameter when api_key_required=True and api_key_parameter is None.
- Parameters:
values (dict[str, Any]) – The dictionary of attributes to validate
- Returns:
The updated parameter values passed to the APIParameterMap. api_key_parameter is set to “api_key” if key is required but not specified
- Return type:
dict[str, Any]
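The defaulting rule described above amounts to a small pre-validation step on the raw attribute dictionary. The following sketch shows that rule in isolation; the function name is hypothetical and it operates on a plain dictionary rather than on the pydantic model.

```python
from typing import Any, Dict


def set_default_api_key_parameter_sketch(values: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in 'api_key' when a key is required but no parameter name was given."""
    updated = dict(values)  # avoid mutating the caller's dictionary
    if updated.get("api_key_required") and updated.get("api_key_parameter") is None:
        updated["api_key_parameter"] = "api_key"
    return updated


print(set_default_api_key_parameter_sketch({"query": "q", "api_key_required": True}))
# {'query': 'q', 'api_key_required': True, 'api_key_parameter': 'api_key'}
```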
- start: str | None
- classmethod validate_api_specific_parameter_mappings(values: dict[str, Any]) dict[str, Any][source]
Validates the additional mappings provided to the APIParameterMap.
This method validates that the input is a dictionary of mappings that consists of only string-typed keys mapped to API-specific parameters as defined by the APISpecificParameter class.
- Parameters:
values (dict[str, Any]) – The dictionary of attribute values to validate.
- Returns:
The updated dictionary if validation passes.
- Return type:
dict[str, Any]
- Raises:
APIParameterException – If api_specific_parameters is not a dictionary or contains non-string keys/values.
- zero_indexed_pagination: bool
- class scholar_flux.api.models.APIResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None)[source]
Bases:
BaseModel
A Response wrapper for responses of different types that allows consistency when using several possible backends. The purpose of this class is to serve as the base for managing responses received from scholarly APIs while processing each component in a predictable, reproducible manner.
This class uses pydantic’s data validation and serialization/deserialization methods to aid caching and includes properties that refer back to the original response for displaying valid response codes, URLs, etc.
All future processing/error-based responses classes inherit from and build off of this class.
- Parameters:
cache_key (Optional[str]) – A string for recording cache keys for use in later steps of the response orchestration involving processing, cache storage, and cache retrieval
response (Any) – A response or response-like object to be validated and used/re-used in later caching and response processing/orchestration steps.
created_at (Optional[str]) – A value indicating the time in which a response or response-like object was created.
Example
>>> from scholar_flux.api import APIResponse
>>> # Using keyword arguments to build a basic APIResponse data container:
>>> response = APIResponse.from_response(
...     cache_key='test-response',
...     status_code=200,
...     content=b'success',
...     url='https://example.com',
...     headers={'Content-Type': 'application/text'}
... )
>>> response
# OUTPUT: APIResponse(cache_key='test-response', response = ReconstructedResponse(
#    status_code=200, reason='OK', headers={'Content-Type': 'application/text'},
#    text='success', url='https://example.com'
# )
>>> assert response.status == 'OK' and response.text == 'success' and response.url == 'https://example.com'
# OUTPUT: True
>>> assert response.validate_response()
# OUTPUT: True
- classmethod as_reconstructed_response(response: Any) ReconstructedResponse[source]
Classmethod designed to create a reconstructed response from an original response object. This method coerces response attributes into a reconstructed response that retains the original content, status code, headers, URL, reason, etc.
- Returns:
A minimal response object that contains the core attributes needed to support other processes in the scholar_flux module such as response parsing and caching.
- Return type:
ReconstructedResponse
- cache_key: str | None
- property content: bytes | None
Return content from the underlying response, if available and valid.
- Returns:
The bytes from the original response content
- Return type:
(bytes)
- created_at: str | None
- encode_response(response: Any) Dict[str, Any] | List[Any] | None[source]
Helper method for serializing a response into a json format. Accounts for special cases such as CaseInsensitiveDict fields that are otherwise unserializable.
From this step, pydantic can safely use json internally to dump the encoded response fields
- classmethod from_response(response: Any | None = None, cache_key: str | None = None, auto_created_at: bool | None = None, **kwargs) Self[source]
Construct an APIResponse from a response object or from keyword arguments.
If response is not a valid response object, builds a minimal response-like object from kwargs.
- classmethod from_serialized_response(response: Any | None = None, **kwargs) ReconstructedResponse | None[source]
Helper method for creating a new APIResponse from the original dumped object. Because responses are not easily serialized, this method decodes the response dictionary that was loaded from a string using json.loads from the json module in the standard library.
If the response input is still a serialized string, this method will manually load the response dict with the APIResponse._deserialize_response_dict class method before further processing.
- Parameters:
response (Any) – A prospective response value to load into the API Response.
- Returns:
A reconstructed response object, if possible. Otherwise returns None
- Return type:
Optional[ReconstructedResponse]
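The "load if still a string, then decode the fields" flow can be sketched with the standard library alone. This is an illustration under stated assumptions: the function name load_response_fields is hypothetical, and the choice to store content as UTF-8 text in the serialized form is an assumption, not the encoding scholar_flux is known to use.

```python
import json
from typing import Any, Dict, Optional


def load_response_fields(response: Any) -> Optional[Dict[str, Any]]:
    """Return decoded response fields, or None when the input cannot be used."""
    if isinstance(response, str):
        # The input is still serialized: parse it into a Python object first.
        try:
            response = json.loads(response)
        except json.JSONDecodeError:
            return None
    if not isinstance(response, dict):
        return None
    fields = dict(response)
    content = fields.get("content")
    if isinstance(content, str):
        # Assumed encoding: content was dumped as text, so restore the bytes.
        fields["content"] = content.encode("utf-8")
    return fields


dumped = json.dumps({"status_code": 200, "content": "success", "url": "https://example.com"})
print(load_response_fields(dumped))
```

Returning None instead of raising mirrors the documented Optional return type: callers can treat an unusable payload as a cache miss.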
- property headers: MutableMapping[str, str] | None
Return headers from the underlying response, if available and valid.
- Returns:
A dictionary of headers from the response
- Return type:
MutableMapping[str, str]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- raise_for_status()[source]
Uses an underlying response object to validate the status code associated with the request.
If the attribute isn’t a response or reconstructed response, the code will coerce the class into a response object to verify the status code for the request URL and response.
- property reason: str | None
Uses the underlying reason attribute on the response object, if available, to create a human readable status description.
- Returns:
The status description associated with the response.
- Return type:
Optional[str]
- response: Any | None
- classmethod serialize_response(response: Response | ResponseProtocol) str | None[source]
Helper method for serializing a response into a json format. The response object is first converted into a serialized string and subsequently dumped after ensuring that the field is serializable.
- Parameters:
response (Response, ResponseProtocol)
- property status: str | None
Helper property for retrieving a human-readable status description from the APIResponse.
- Returns:
The status description associated with the response (if available).
- Return type:
Optional[str]
- property status_code: int | None
Helper property for retrieving a status code from the APIResponse.
- Returns:
The status code associated with the response (if available)
- Return type:
Optional[int]
- property text: str | None
Attempts to retrieve the response text by first decoding the bytes of its content. If the content is not available, this property falls back to referencing the text attribute directly.
- Returns:
A text string if the text is available in the correct format, otherwise None
- Return type:
Optional[str]
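The "decode content first, then fall back to text" order can be sketched as a small helper. The function name response_text is hypothetical, and the use of UTF-8 for decoding is an assumption made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Optional


def response_text(response: Any) -> Optional[str]:
    """Prefer decoded content bytes; otherwise use a string text attribute."""
    content = getattr(response, "content", None)
    if isinstance(content, bytes):
        try:
            return content.decode("utf-8")  # assumed encoding
        except UnicodeDecodeError:
            pass  # fall through to the text attribute
    text = getattr(response, "text", None)
    return text if isinstance(text, str) else None


print(response_text(SimpleNamespace(content=b"success")))      # success
print(response_text(SimpleNamespace(content=None, text="ok")))  # ok
print(response_text(SimpleNamespace()))                         # None
```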
- classmethod transform_response(v: Any) Response | ResponseProtocol | None[source]
Attempts to resolve a response object as an original or ReconstructedResponse: All original response objects (duck-typed or requests response) with valid values will be returned as is.
If the passed object is a string, this function will attempt to deserialize it before attempting to parse it as a dictionary.
Dictionary fields will be decoded, if originally encoded, and parsed as a ReconstructedResponse object, if possible.
Otherwise, the original object is returned as is.
- property url: str | None
Return URL from the underlying response, if available and valid.
- Returns:
A string of the original URL if available. Accounts for objects that indicate the original URL when converted to a string.
- Return type:
str
- classmethod validate_iso_timestamp(v: str | datetime | None) str | None[source]
Helper method for validating that the timestamp follows the ISO 8601 format.
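A timestamp check of this kind can be sketched with datetime.fromisoformat from the standard library. The function name is hypothetical, and the behavior for datetime inputs (converting them with isoformat) is an assumption. Note also that fromisoformat accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime
from typing import Optional, Union


def validate_iso_timestamp_sketch(v: Union[str, datetime, None]) -> Optional[str]:
    """Return an ISO 8601 string, None for missing input, or raise ValueError."""
    if v is None:
        return None
    if isinstance(v, datetime):
        # Assumed behavior: datetime objects are converted to their ISO string.
        return v.isoformat()
    # Raises ValueError when the string cannot be parsed as an ISO timestamp.
    datetime.fromisoformat(v)
    return v


print(validate_iso_timestamp_sketch(datetime(2024, 1, 2, 3, 4, 5)))  # 2024-01-02T03:04:05
print(validate_iso_timestamp_sketch("2024-01-02T03:04:05"))          # 2024-01-02T03:04:05
```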
- validate_response() bool[source]
Helper method for determining whether the response attribute is truly a response. If the response isn’t a requests response, duck-typing is used to determine whether the response attribute has the expected attributes of a response: each property is checked for its expected type and resolves to None when the type does not match.
- Returns:
An indicator of whether the current APIResponse.response attribute is actually a response.
- Return type:
bool
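Duck-typed validation of this kind can be sketched as a series of attribute type checks. The helper looks_like_response is hypothetical, and the exact set of attributes it inspects is an assumption chosen for illustration, not the library's confirmed list.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any


def looks_like_response(obj: Any) -> bool:
    """Check for response-like attributes without requiring a specific class."""
    return (
        isinstance(getattr(obj, "status_code", None), int)
        and isinstance(getattr(obj, "content", None), bytes)
        and isinstance(getattr(obj, "headers", None), Mapping)
        and getattr(obj, "url", None) is not None
    )


candidate = SimpleNamespace(
    status_code=200, content=b"success", headers={}, url="https://example.com"
)
print(looks_like_response(candidate))           # True
print(looks_like_response({"status_code": 200}))  # False
```

Checking against Mapping rather than dict lets case-insensitive header containers pass the same test.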
- class scholar_flux.api.models.APISpecificParameter(name: str, description: str, validator: Callable[[Any], Any] | None = None, default: Any = None, required: bool = False)[source]
Bases:
object
Dataclass that defines the specification of an API-specific parameter for an API provider.
Implements optionally specifiable defaults, validation steps, and indicators for optional vs. required fields.
- Parameters:
name (str) – The name of the parameter used when sending requests to APIs.
description (str) – A description of the API-specific parameter.
validator (Optional[Callable[[Any], Any]]) – An optional function/method for verifying and pre-processing parameter input based on required types, constrained values, etc.
default (Any) – A default value used for the parameter if not specified by the user.
required (bool) – Indicates whether the current parameter is required for API calls.
- __init__(*args: Any, **kwargs: Any) None
- default: Any = None
- description: str
- name: str
- required: bool = False
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method for showing the structure of the current APISpecificParameter.
- validator: Callable[[Any], Any] | None = None
- property validator_name
Helper method for generating a human readable string from the validator function, if used.
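How a default, a required flag, and a validator might interact can be sketched with a small dataclass that mirrors the documented fields. The class ParameterSpecSketch, its resolve method, and the positive_int validator are all hypothetical; the order in which the checks are applied is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ParameterSpecSketch:
    name: str
    description: str
    validator: Optional[Callable[[Any], Any]] = None
    default: Any = None
    required: bool = False

    def resolve(self, value: Any = None) -> Any:
        """Hypothetical helper: apply the default, the required check, then the validator."""
        if value is None:
            value = self.default
        if value is None:
            if self.required:
                raise ValueError(f"Parameter {self.name!r} is required")
            return None
        return self.validator(value) if self.validator else value


def positive_int(value: Any) -> int:
    """Example validator: coerce to int and reject values below 1."""
    number = int(value)
    if number < 1:
        raise ValueError("expected a positive integer")
    return number


spec = ParameterSpecSketch(
    name="max_results", description="Upper bound on results", validator=positive_int, default=10
)
print(spec.resolve())      # 10
print(spec.resolve("25"))  # 25
```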
- class scholar_flux.api.models.BaseAPIParameterMap(*, query: str, records_per_page: str, start: str | None = None, api_key_parameter: str | None = None, api_key_required: bool = False, auto_calculate_page: bool = True, zero_indexed_pagination: bool = False, api_specific_parameters: ~typing.Dict[str, ~scholar_flux.api.models.base_parameters.APISpecificParameter] = <factory>)[source]
Bases:
BaseModel
Base class for mapping universal SearchAPI parameter names to API-specific parameter names.
Includes core logic for distinguishing parameter names, indicating required API keys, and defining pagination logic.
- query
The API-specific parameter name for the search query.
- Type:
str
- start
The API-specific parameter name for optional pagination (start index or page number).
- Type:
Optional[str]
- records_per_page
The API-specific parameter name for records per page.
- Type:
str
- api_key_parameter
The API-specific parameter name for the API key.
- Type:
Optional[str]
- api_key_required
Indicates whether an API key is required.
- Type:
bool
- page_required
If True, indicates that a page is required.
- Type:
bool
- auto_calculate_page
If True, calculates start index from page; if False, passes page number directly.
- Type:
bool
- zero_indexed_pagination
Treats page=0 as an allowed page value when retrieving data from the API.
- Type:
bool
- api_specific_parameters
Additional API-specific parameter mappings.
- Type:
Dict[str, APISpecificParameter]
- api_key_parameter: str | None
- api_key_required: bool
- api_specific_parameters: Dict[str, APISpecificParameter]
- auto_calculate_page: bool
- classmethod from_dict(obj: Dict[str, Any]) BaseAPIParameterMap[source]
Create a new instance of BaseAPIParameterMap from a dictionary.
- Parameters:
obj (dict) – The dictionary containing the data for the new instance.
- Returns:
A new instance created from the given dictionary.
- Return type:
BaseAPIParameterMap
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- query: str
- records_per_page: str
- show_parameters() list[source]
Helper method to show the complete list of all parameters that can be found in the current ParameterMap.
- Returns:
The complete list of all universal and api specific parameters corresponding to the current API
- Return type:
List
- start: str | None
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method that shows the current structure of the BaseAPIParameterMap.
- to_dict() Dict[str, Any][source]
Convert the current instance into a dictionary representation.
- Returns:
A dictionary representation of the current instance.
- Return type:
Dict
- update(other: BaseAPIParameterMap | Dict[str, Any]) BaseAPIParameterMap[source]
Update the current instance with values from another BaseAPIParameterMap or dictionary.
- Parameters:
other (BaseAPIParameterMap | Dict) – The object containing updated values.
- Returns:
A new instance with updated values.
- Return type:
BaseAPIParameterMap
- zero_indexed_pagination: bool
- class scholar_flux.api.models.BaseProviderDict(dict=None, /, **kwargs)[source]
Bases:
UserDict[str, Any]
The BaseProviderDict extends the dictionary to resolve minor naming variations in keys to the same provider name.
The BaseProviderDict uses the ProviderConfig._normalize_name method to ignore underscores and case-sensitivity.
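Key normalization on a UserDict can be sketched as follows. The class NormalizedKeyDict and the normalize_name function are hypothetical; the normalization rule (drop underscores, lowercase) is inferred from the description above and may differ from the library's actual method.

```python
from collections import UserDict
from typing import Any


def normalize_name(name: str) -> str:
    """Assumed rule: ignore underscores and letter case."""
    return name.replace("_", "").lower()


class NormalizedKeyDict(UserDict):
    """Dictionary whose string keys are normalized on every access."""

    def __setitem__(self, key: str, value: Any) -> None:
        super().__setitem__(normalize_name(key), value)

    def __getitem__(self, key: str) -> Any:
        return super().__getitem__(normalize_name(key))

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and normalize_name(key) in self.data


providers = NormalizedKeyDict()
providers["Springer_Nature"] = "config"
print(providers["springernature"])    # config
print("SPRINGERNATURE" in providers)  # True
print(providers.get("cross_ref"))     # None
```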
- class scholar_flux.api.models.ErrorResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]
Bases:
APIResponse
Returned when something goes wrong but raising an exception immediately is not desired; the failure details are handed back instead.
The class is formatted for compatibility with the ProcessedResponse.
- cache_key: str | None
- created_at: str | None
- property data: None
Provided for type hinting + compatibility.
- error: str | None
- property extracted_records: None
Provided for type hinting + compatibility.
- classmethod from_error(message: str, error: Exception, cache_key: str | None = None, response: Response | ResponseProtocol | None = None) Self[source]
Creates and logs the processing error if one occurs during response processing.
- Parameters:
message (str) – The message describing the error.
error (Exception) – The exception that was encountered.
cache_key (Optional[str]) – Cache key for storing results.
response (Response) – Raw API response.
- Returns:
An object that contains the error response data and background information on what precipitated the error.
- Return type:
ErrorResponse
- message: str | None
- property metadata: None
Provided for type hinting + compatibility.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- property parsed_response: None
Provided for type hinting + compatibility.
- property processed_records: None
Provided for type hinting + compatibility.
- response: Any | None
- class scholar_flux.api.models.NonResponse(*, cache_key: str | None = None, response: None = None, created_at: str | None = None, message: str | None = None, error: str | None = None)[source]
Bases:
ErrorResponse
Response class used to indicate that an error occurred in the preparation of a request or in the retrieval of a response object from an API.
This class is used to signify the error that occurred within the search process using a similar interface as the other scholar_flux Response dataclasses.
- cache_key: str | None
- created_at: str | None
- error: str | None
- message: str | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- response: None
- class scholar_flux.api.models.PageListInput(root: RootModelRootType = PydanticUndefined)[source]
Bases:
RootModel[Sequence[int]]
Helper class for processing page information in a predictable manner. The PageListInput class expects to receive a list, string, or generator that contains at least one page number. If a singular integer is received, the result is transformed into a single-item list containing that integer.
- Parameters:
root (Sequence[int]) – A list containing at least one page number.
Examples
>>> from scholar_flux.api.models import PageListInput
>>> PageListInput(5)
PageListInput([5])
>>> PageListInput(range(1, 5))
PageListInput([1, 2, 3, 4])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- property page_numbers: Sequence[int]
Returns the sequence of validated page numbers as a list.
- classmethod page_validation(v: str | int | Sequence[int | str]) Sequence[int][source]
Processes the page input to ensure that a list of integers is returned if the received page list is in a valid format.
- Parameters:
v (str | int | Sequence[int | str]) – A page or sequence of pages to be formatted as a list of pages.
- Returns:
A validated, formatted sequence of page numbers assuming successful page validation
- Return type:
Sequence[int]
- Raises:
ValidationError – Internally raised via pydantic if a ValueError is encountered (if the input is not exclusively a page or list of page numbers)
- classmethod process_page(page_value: str | int) int[source]
Helper method for ensuring that each value in the sequence is a numeric string or whole number.
Note that this function will not throw an error for negative pages as that is handled at a later step in the page search process.
- Parameters:
page_value (str | int) – The value to be converted if it is not already an integer
- Returns:
A validated integer if the page can be converted to an integer and is not a float
- Return type:
int
- Raises:
ValueError – When the value is not an integer or numeric string to be converted to an integer
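The page coercion rules described for page_validation and process_page can be sketched with two plain functions. Both names are hypothetical and no pydantic model is involved; like the documented method, the sketch accepts negative integers and leaves range checks to a later step.

```python
from typing import Iterable, List, Union


def process_page_sketch(page_value: Union[str, int]) -> int:
    """Accept integers and numeric strings; reject floats and other text."""
    if isinstance(page_value, int):
        return page_value
    if isinstance(page_value, str) and page_value.strip().lstrip("-").isdigit():
        return int(page_value)
    raise ValueError(f"Expected an integer or numeric string, got {page_value!r}")


def page_list_sketch(v: Union[str, int, Iterable[Union[str, int]]]) -> List[int]:
    """Wrap a single page in a list; otherwise convert each item in turn."""
    if isinstance(v, (str, int)):
        return [process_page_sketch(v)]
    return [process_page_sketch(item) for item in v]


print(page_list_sketch(5))              # [5]
print(page_list_sketch(["1", 2, "3"]))  # [1, 2, 3]
print(page_list_sketch(range(1, 5)))    # [1, 2, 3, 4]
```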
- root: RootModelRootType
- class scholar_flux.api.models.ProcessedResponse(*, cache_key: str | None = None, response: Any | None = None, created_at: str | None = None, parsed_response: Any | None = None, extracted_records: List[Any] | None = None, processed_records: List[Dict[Any, Any]] | None = None, metadata: Any | None = None, message: str | None = None)[source]
Bases:
APIResponse
Helper class for returning a ProcessedResponse object that contains information on the original, cached, or reconstructed_response received and processed after retrieval from an API, in addition to the cache key. This object also allows storage of intermediate steps including:
1) parsed responses
2) extracted records and metadata
3) processed records (aliased as data)
4) any additional messages
An error field is provided for compatibility with the ErrorResponse class.
- cache_key: str | None
- created_at: str | None
- property data: List[Dict[Any, Any]] | None
Alias to the processed_records attribute that holds a list of dictionaries, when available.
- property error: None
Provided for type hinting + compatibility.
- extracted_records: List[Any] | None
- message: str | None
- metadata: Any | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- parsed_response: Any | None
- processed_records: List[Dict[Any, Any]] | None
- response: Any | None
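The compatibility properties on ProcessedResponse and ErrorResponse (data on one, error on the other) let calling code treat both result types uniformly. The sketch below illustrates that idea with two hypothetical stand-in dataclasses; it does not use the scholar_flux classes, and the collect_records helper is an assumption about how a caller might consume them.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Union


@dataclass
class ProcessedSketch:
    processed_records: Optional[List[Dict[str, Any]]] = None

    @property
    def data(self) -> Optional[List[Dict[str, Any]]]:
        return self.processed_records  # alias, as in the documented class

    @property
    def error(self) -> None:
        return None  # present only for interface compatibility


@dataclass
class ErrorSketch:
    message: Optional[str] = None
    error: Optional[str] = None

    @property
    def data(self) -> None:
        return None  # present only for interface compatibility


def collect_records(responses: Iterable[Union[ProcessedSketch, ErrorSketch]]) -> List[Dict[str, Any]]:
    """Gather records from successful results and skip failed ones."""
    records: List[Dict[str, Any]] = []
    for response in responses:
        if response.error is not None:
            continue
        records.extend(response.data or [])
    return records


results = [
    ProcessedSketch(processed_records=[{"title": "A"}]),
    ErrorSketch(message="request failed", error="TimeoutError"),
    ProcessedSketch(processed_records=[{"title": "B"}]),
]
print(collect_records(results))  # [{'title': 'A'}, {'title': 'B'}]
```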
- class scholar_flux.api.models.ProviderConfig(*, provider_name: Annotated[str, MinLen(min_length=1)], base_url: str, parameter_map: BaseAPIParameterMap, records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: Annotated[float, Ge(ge=0)] = 6.1, api_key_env_var: str | None = None, docs_url: str | None = None)[source]
Bases:
BaseModel
Config for creating the basic instructions and settings necessary to interact with new providers. Configs for the default providers are created on package initialization in the scholar_flux.api.providers submodule. A new, custom provider or override can be added to the provider_registry (a custom user dictionary) from the scholar_flux.api.providers module.
- Parameters:
provider_name (str) – The name of the provider to be associated with the config.
base_url (str) – The URL of the provider to send requests with the specified parameters.
parameter_map (BaseAPIParameterMap) – The parameter map indicating the specific semantics of the API.
records_per_page (int) – Generally the upper limit (for some APIs) or reasonable limit for the number of retrieved records per request (specific to the API provider).
request_delay (float) – Indicates exactly how many seconds to wait before sending successive requests. Note that the requested interval may vary based on the API provider.
api_key_env_var (Optional[str]) – Indicates the environment variable to look for if the API requires or accepts API keys.
docs_url (Optional[str]) – An optional URL that indicates where documentation related to the use of the API can be found.
- Example Usage:
>>> from scholar_flux.api import ProviderConfig, APIParameterMap, SearchAPI
>>> # Maps each of the individual parameters required to interact with the Guardian API
>>> parameters = APIParameterMap(query='q',
...                              start='page',
...                              records_per_page='page-size',
...                              api_key_parameter='api-key',
...                              auto_calculate_page=False,
...                              api_key_required=True)
>>> # creating the config object that holds the basic configuration necessary to interact with the API
>>> guardian_config = ProviderConfig(provider_name='GUARDIAN',
...                                  parameter_map=parameters,
...                                  base_url='https://content.guardianapis.com/search',
...                                  records_per_page=10,
...                                  api_key_env_var='GUARDIAN_API_KEY',
...                                  request_delay=6)
>>> api = SearchAPI.from_provider_config(query='economic welfare',
...                                      provider_config=guardian_config,
...                                      use_cache=True)
>>> assert api.provider_name == 'guardian'
>>> response = api.search(page=1)  # assumes that you have the GUARDIAN_API_KEY stored as an env variable
>>> assert response.ok
- api_key_env_var: str | None
- base_url: str
- docs_url: str | None
- model_config: ClassVar[ConfigDict] = {'str_strip_whitespace': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod normalize_provider_name(v: str) str[source]
Helper method for normalizing the names of providers to a consistent structure.
- parameter_map: BaseAPIParameterMap
- provider_name: str
- records_per_page: int
- request_delay: float
- search_config_defaults() dict[str, Any][source]
Convenience Method for retrieving ProviderConfig fields as a dict. Useful for providing the missing information needed to create a SearchAPIConfig object for a provider when only the provider_name has been provided.
- Returns:
A dictionary containing the URL, name, records_per_page, and request_delay for the current provider.
- Return type:
(dict)
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method that shows the current structure of the ProviderConfig.
- class scholar_flux.api.models.ProviderRegistry(dict=None, /, **kwargs)[source]
Bases:
BaseProviderDict
The ProviderRegistry implementation allows the smooth and efficient retrieval of API parameter maps and default configuration settings to aid in the creation of a SearchAPI that is specific to the current API.
Note that the ProviderRegistry uses the ProviderConfig._normalize_name to ignore underscores and case-sensitivity.
- ProviderRegistry.from_defaults
Dynamically imports configurations stored within scholar_flux.api.providers, and fails gracefully if a provider’s module does not contain a ProviderConfig.
- ProviderRegistry.get
Resolves a provider name to its ProviderConfig if it exists in the registry.
- ProviderRegistry.get_from_url
Resolves a provider URL to its ProviderConfig if it exists in the registry.
- add(provider_config: ProviderConfig) None[source]
Helper method for adding a new provider to the provider registry.
- create(provider_name: str, **kwargs) ProviderConfig[source]
Helper method that creates and registers a new ProviderConfig with the current provider registry.
- Parameters:
provider_name (str) – The name of the provider to create a new provider_config for.
**kwargs – Additional keyword arguments to pass to scholar_flux.api.models.ProviderConfig
- classmethod from_defaults() ProviderRegistry[source]
Helper method that dynamically loads providers from the scholar_flux.api.providers module specifically reserved for default provider configs.
- Returns:
A new registry containing the loaded default provider configurations
- Return type:
ProviderRegistry
- get_from_url(provider_url: str | None) ProviderConfig | None[source]
Attempts to retrieve a ProviderConfig instance for the given provider by resolving the provided URL to the provider’s URL. Will not throw an error in the event that the provider does not exist.
- Parameters:
provider_url (Optional[str]) – URL of the default provider
- Returns:
Instance configuration for the provider if it exists, else None
- Return type:
Optional[ProviderConfig]
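Resolving a URL to a registered provider can be sketched by comparing normalized URLs. The functions below are hypothetical, the registry is a plain dictionary of provider names to base URLs with illustrative values, and the normalization rule (ignore scheme, case, and a trailing slash) is an assumption rather than the library's documented matching logic.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Assumed rule: compare host and path only, ignoring case and trailing slashes."""
    parsed = urlparse(url.strip().lower())
    return parsed.netloc + parsed.path.rstrip("/")


def get_from_url_sketch(registry: Dict[str, str], provider_url: Optional[str]) -> Optional[str]:
    """Return the provider name whose base URL matches, or None."""
    if not provider_url:
        return None
    target = normalize_url(provider_url)
    for name, base_url in registry.items():
        if normalize_url(base_url) == target:
            return name
    return None


registry = {
    "crossref": "https://api.crossref.org/works",  # illustrative entries
    "plos": "https://api.plos.org/search",
}
print(get_from_url_sketch(registry, "https://api.crossref.org/works/"))  # crossref
print(get_from_url_sketch(registry, "https://example.com"))              # None
```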
- class scholar_flux.api.models.ReconstructedResponse(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any)[source]
Bases:
object
Helper class for retaining the most relevant fields when reconstructing responses from different sources such as requests and httpx (if chosen). The primary purpose of the ReconstructedResponse in scholar_flux is to create a minimal representation of a response when we need to construct a ProcessedResponse without an actual response and verify content fields.
In applications such as retrieving cached data from a scholar_flux.data_storage.DataCacheManager, if an original or cached response is not available, then a ReconstructedResponse is created from the cached response fields when available.
- Parameters:
status_code (int) – The integer code indicating the status of the response
reason (str) – Indicates the reasoning associated with the status of the response
headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type, etc.)
content (bytes) – The content within the response
url (Any) – The URL from which the response was received
Note
The ReconstructedResponse.build factory method is recommended in cases when one property may contain the needed fields but may need to be processed and prepared first before being used. Examples include instances where one has text or json data instead of content, a reason_phrase field instead of reason, etc.
Example
>>> from scholar_flux.api.models import ReconstructedResponse
>>> # build a response using a factory method that infers fields from existing ones when not directly specified
>>> response = ReconstructedResponse.build(status_code=200, content=b"success", url="https://google.com")
>>> # check whether the current class follows a ResponseProtocol and contains valid fields
>>> assert response.is_response()
# OUTPUT: True
>>> response.validate()  # raises an error if invalid
>>> response.raise_for_status()  # no error for 200 status codes
>>> assert response.reason == 'OK' == response.status  # inferred from the status_code attribute
- __init__(status_code: int, reason: str, headers: MutableMapping[str, str], content: bytes, url: Any) None
- asdict() dict[str, Any][source]
Helper method for converting the ReconstructedResponse into a dictionary containing attributes and their corresponding values.
- classmethod build(response: Any | None = None, **kwargs) ReconstructedResponse[source]
Helper method for building a new ReconstructedResponse from a regular response object. This classmethod can either construct a new ReconstructedResponse object from a response object or response-like object or create a new ReconstructedResponse altogether with its inputs.
- Parameters:
response (Optional[Any]) – A response or response-like object of unknown type or None
kwargs – The underlying components needed to construct a new response. Note that ideally, this set of key-value pairs would be specific only to the types expected by the ReconstructedResponse.
- content: bytes
- classmethod fields() list[source]
Helper method for retrieving a list containing the names of all fields associated with the ReconstructedResponse class.
- Returns:
A list containing the name of each attribute in the ReconstructedResponse.
- Return type:
list[str]
- classmethod from_keywords(**kwargs) ReconstructedResponse[source]
Uses the provided keyword arguments to create a ReconstructedResponse. Keywords can name the default attributes of the ReconstructedResponse directly, or the attributes can be inferred and processed from other keywords.
- Parameters:
status_code (int) – The integer code indicating the status of the response
reason (str) – Indicates the reasoning associated with the status of the response
headers (MutableMapping[str, str]) – Indicates metadata associated with the response (e.g. Content-Type)
content (bytes) – The content within the response
url (Any) – The URL from which the response was received
Some fields can be either provided directly or inferred from other similarly common fields:
content: [‘content’, ‘_content’, ‘text’, ‘json’]
headers: [‘headers’, ‘_headers’]
reason: [‘reason’, ‘status’, ‘reason_phrase’, ‘status_code’]
- Returns:
A newly reconstructed response from the given keyword components
- Return type:
ReconstructedResponse
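The alias lists above can be illustrated with a minimal, standalone sketch. This is not the scholar_flux implementation: the helper name `infer_response_fields`, the order in which aliases are tried, and the UTF-8 and JSON encoding choices are all assumptions made for illustration. Only the alias names themselves come from the lists above.

```python
import json
from http import HTTPStatus
from typing import Any

# Alias names copied from the lists above; the lookup logic is an assumption.
CONTENT_KEYS = ("content", "_content", "text", "json")
HEADER_KEYS = ("headers", "_headers")
REASON_KEYS = ("reason", "status", "reason_phrase")


def infer_response_fields(**kwargs: Any) -> dict[str, Any]:
    """Hypothetical helper: normalize response-like keywords into five core fields."""
    status_code = int(kwargs["status_code"])

    # Use the first content-like keyword that is present, encoding it to bytes if needed.
    content = b""
    for key in CONTENT_KEYS:
        value = kwargs.get(key)
        if value is None:
            continue
        if isinstance(value, bytes):
            content = value
        elif isinstance(value, str):
            content = value.encode("utf-8")
        else:  # e.g. an already-decoded JSON payload
            content = json.dumps(value).encode("utf-8")
        break

    # Take the first header mapping that was supplied, or fall back to an empty dict.
    headers = next((dict(kwargs[k]) for k in HEADER_KEYS if kwargs.get(k) is not None), {})

    # Prefer an explicit textual reason; otherwise derive the standard phrase from the code.
    reason = next((kwargs[k] for k in REASON_KEYS if isinstance(kwargs.get(k), str)), None)
    if reason is None:
        try:
            reason = HTTPStatus(status_code).phrase
        except ValueError:  # non-standard status code
            reason = ""

    return {
        "status_code": status_code,
        "reason": reason,
        "headers": headers,
        "content": content,
        "url": kwargs.get("url"),
    }


fields = infer_response_fields(status_code=200, text="success", url="https://example.com")
print(fields["reason"], fields["content"])
```

The fallback through `http.HTTPStatus` mirrors the idea that a reason can be inferred from the status_code when no reason-like keyword is given.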
- headers: MutableMapping[str, str]
- is_response() bool[source]
Method for directly validating the fields that indicate that a response has been minimally recreated successfully. The fields that are validated include:
status codes (should be an integer)
URLs (should be a valid url)
reasons (should originate from a reason attribute or inferred from the status code)
content (should be a bytes field or encoded from a string text field)
headers (should be a dictionary with string fields and preferably a content type)
- Returns:
Indicates whether the current reconstructed response minimally recreates a response object.
- Return type:
bool
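As a rough illustration of the five checks listed above, the following standalone sketch validates a set of response-like fields using only the standard library. The function name `looks_like_response` and the exact rules (for instance, requiring an http or https scheme) are assumptions and may differ from what is_response actually enforces.

```python
from typing import Any, Mapping
from urllib.parse import urlparse


def looks_like_response(status_code: Any, reason: Any, headers: Any, content: Any, url: Any) -> bool:
    """Hypothetical check mirroring the five field rules listed above."""
    parsed = urlparse(url) if isinstance(url, str) else None
    checks = [
        # status codes should be integers (bool is excluded because it subclasses int)
        isinstance(status_code, int) and not isinstance(status_code, bool),
        # URLs should have a scheme and a network location
        parsed is not None and parsed.scheme in ("http", "https") and bool(parsed.netloc),
        # reasons should be non-empty strings
        isinstance(reason, str) and bool(reason),
        # content should be bytes
        isinstance(content, bytes),
        # headers should map strings to strings
        isinstance(headers, Mapping)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in headers.items()),
    ]
    return all(checks)


print(looks_like_response(200, "OK", {"Content-Type": "text/plain"}, b"success", "https://example.com"))
```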
- json() Dict[str, Any] | List[Any] | None[source]
Return JSON-decoded body from the underlying response, if available.
- property ok: bool
Indicates whether the current response indicates a successful request (200 <= status_code < 400) or whether an invalid response has been received.
- Returns:
True if the status code is an integer value within the range of 200 and 399, False otherwise
- Return type:
bool
- raise_for_status() None[source]
Method that imitates the capability of the requests and httpx response types to raise errors when encountering status codes that are indicative of failed responses.
As scholar_flux processes data that is generally only sent when status codes are within the 200s (or exactly 200 [ok]), an error is raised when encountering a value outside of this range.
- Raises:
InvalidResponseReconstructionException – If the structure of the ReconstructedResponse is invalid
RequestException – If the status code of the response is not within the range of 200-399
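The relationship between ok and raise_for_status can be sketched with a small dataclass. The names `MinimalResponse` and `SketchRequestError` are hypothetical stand-ins, and the sketch follows the 200-399 range stated for the ok property and the Raises entry above. It is an illustration of the pattern, not the library's code.

```python
from dataclasses import dataclass


class SketchRequestError(Exception):
    """Hypothetical stand-in for the RequestException named above."""


@dataclass
class MinimalResponse:
    status_code: int
    reason: str = ""

    @property
    def ok(self) -> bool:
        # Successful when 200 <= status_code < 400, as documented for the ok property.
        return 200 <= self.status_code < 400

    def raise_for_status(self) -> None:
        # Imitates requests/httpx: do nothing on success, raise on failure.
        if not self.ok:
            raise SketchRequestError(f"{self.status_code} {self.reason}".strip())


MinimalResponse(200, "OK").raise_for_status()  # no error for a successful status
try:
    MinimalResponse(404, "Not Found").raise_for_status()
except SketchRequestError as exc:
    print(f"request failed: {exc}")
```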
- reason: str
- property status: str | None
Helper property for retrieving a human-readable description of the response status.
- Returns:
The status description associated with the response (if available)
- Return type:
Optional[str]
- status_code: int
- property text: str | None
Helper property for retrieving the text from the bytes content as a string.
- Returns:
The decoded text from the content of the response
- Return type:
Optional[str]
- url: Any
- validate() None[source]
Raises an error if the recreated response object does not contain valid properties expected of a response. If the response validation is successful, no error is raised and nothing is returned.
- Raises:
InvalidResponseReconstructionException – if at least one field is determined to be invalid and unexpected of a true response object.
- class scholar_flux.api.models.SearchAPIConfig(*, provider_name: str = '', base_url: str = '', records_per_page: Annotated[int, Ge(ge=0), Le(le=1000)] = 20, request_delay: float = -1, api_key: SecretStr | None = None, api_specific_parameters: dict[str, Any] | None = None)[source]
Bases:
BaseModel
The SearchAPIConfig class provides the core tools necessary to set and interact with the API. The SearchAPI uses this class to retrieve data from an API using universal parameters to simplify the process of retrieving raw responses.
- provider_name
Indicates the name of the API to use when making requests to a provider. If the provider name matches a known default and the base_url is unspecified, the base URL for the current provider is used instead.
- Type:
str
- base_url
Indicates the API URL where data will be searched and retrieved.
- Type:
str
- records_per_page
Controls the number of records that will appear on each page
- Type:
int
- request_delay
Indicates the minimum delay between each request to avoid exceeding API rate limits
- Type:
float
- api_key
This is an API-specific parameter for validating the current user’s identity. If a str type is provided, it is converted into a SecretStr.
- Type:
Optional[str | SecretStr]
- api_specific_parameters
A dictionary containing all parameters specific to the current API. API-specific parameters include the following.
- mailto (Optional[str | SecretStr]):
An optional email address for receiving feedback on usage from providers. This parameter is currently applicable only to the Crossref API.
- db (str):
The parameter used by the NIH to direct requests for data to the pubmed database. This parameter defaults to pubmed and does not require direct specification.
- Type:
dict[str, APISpecificParameter]
Examples
>>> from scholar_flux.api import SearchAPIConfig, SearchAPI, provider_registry
>>> # to create a CROSSREF configuration with minimal defaults and provide an api_specific_parameter:
>>> config = SearchAPIConfig.from_defaults(provider_name = 'crossref', mailto = 'your_email_here@example.com')
>>> # the configuration automatically retrieves the configuration for the "Crossref" API
>>> assert config.provider_name == 'crossref' and config.base_url == provider_registry['crossref'].base_url
>>> api = SearchAPI.from_settings(query = 'q', config = config)
>>> assert api.config == config
>>> # to retrieve all defaults associated with a provider and automatically read an API key if needed
>>> config = SearchAPIConfig.from_defaults(provider_name = 'pubmed', api_key = 'your api key goes here')
>>> # the API key is retrieved automatically if you have the API key specified as an environment variable
>>> assert config.api_key is not None
>>> # Default provider API specifications are already pre-populated if they are set with defaults
>>> assert config.api_specific_parameters['db'] == 'pubmed' # required by pubmed and defaults to pubmed
>>> # Update a provider and automatically retrieve its API key - the previous API key will no longer apply
>>> updated_config = SearchAPIConfig.update(config, provider_name = 'core')
>>> # The API key should have been overwritten to use core. Looks for a `CORE_API_KEY` env variable by default
>>> assert updated_config.provider_name == 'core' and updated_config.api_key != config.api_key
- DEFAULT_PROVIDER: ClassVar[str] = 'PLOS'
- DEFAULT_RECORDS_PER_PAGE: ClassVar[int] = 25
- DEFAULT_REQUEST_DELAY: ClassVar[float] = 6.1
- MAX_API_KEY_LENGTH: ClassVar[int] = 512
- api_key: SecretStr | None
- api_specific_parameters: dict[str, Any] | None
- base_url: str
- classmethod default_request_delay(v: int | float | None, provider_name: str | None = None) float[source]
Helper method enabling the retrieval of the most appropriate rate limit for the current provider.
Defaults to the SearchAPIConfig default rate limit when the current provider is unknown and a valid rate limit has not yet been provided.
- Parameters:
v (Optional[int | float]) – The value received for the current request_delay
provider_name (Optional[str]) – The name of the provider to retrieve a rate limit for
- Returns:
- The inputted non-negative request delay, the retrieved rate limit for the current provider
if available, or the SearchAPIConfig.DEFAULT_REQUEST_DELAY - all in order of priority.
- Return type:
float
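The priority order described in the Returns entry above can be sketched as a plain function. The provider table `EXAMPLE_PROVIDER_DELAYS` and the function name `resolve_request_delay` are hypothetical. Only the 6.1-second default is taken from the DEFAULT_REQUEST_DELAY class attribute listed in this reference.

```python
from typing import Optional

DEFAULT_REQUEST_DELAY = 6.1  # mirrors SearchAPIConfig.DEFAULT_REQUEST_DELAY as listed above

# Hypothetical per-provider rate limits, for illustration only.
EXAMPLE_PROVIDER_DELAYS = {"example_provider": 2.0}


def resolve_request_delay(v: Optional[float], provider_name: Optional[str] = None) -> float:
    """Sketch of the priority order: explicit value, then provider default, then global default."""
    # 1. A non-negative value supplied by the caller always wins.
    if v is not None and v >= 0:
        return float(v)
    # 2. Otherwise use the provider's known rate limit, if one is registered.
    if provider_name is not None:
        provider_delay = EXAMPLE_PROVIDER_DELAYS.get(provider_name.lower())
        if provider_delay is not None:
            return provider_delay
    # 3. Fall back to the global default for unknown providers.
    return DEFAULT_REQUEST_DELAY


print(resolve_request_delay(1.5, "example_provider"))
print(resolve_request_delay(-1, "example_provider"))
print(resolve_request_delay(None, "unknown"))
```

Using -1 as the "not yet resolved" input matches the sentinel default shown in the SearchAPIConfig signature.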
- classmethod from_defaults(provider_name: str, **overrides) SearchAPIConfig[source]
Uses the default configuration for the chosen provider to create a SearchAPIConfig object containing configuration parameters. Note that additional parameters and field overrides can be added via the **overrides field.
- Parameters:
provider_name (str) – The name of the provider to create the config
**overrides – Optional keyword arguments to specify overrides and additional arguments
- Returns:
A default SearchAPIConfig object based on the chosen parameters
- Return type:
SearchAPIConfig
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- provider_name: str
- records_per_page: int
- request_delay: float
- classmethod set_records_per_page(v: int | None)[source]
Sets the records_per_page parameter, falling back to the default if the supplied value is not valid.
Triggers a validation error when records_per_page is an invalid type. Otherwise uses the DEFAULT_RECORDS_PER_PAGE class attribute if the supplied value is missing or is a negative number.
- structure(flatten: bool = False, show_value_attributes: bool = True) str[source]
Helper method for retrieving a string representation of the overall structure of the current SearchAPIConfig.
- classmethod update(current_config: SearchAPIConfig, **overrides) SearchAPIConfig[source]
Create a new SearchAPIConfig by updating an existing config with new values and/or switching to a different provider. This method ensures that the new provider’s base_url and defaults are used if provider_name is given, and that API-specific parameters are prioritized and merged as expected.
- Parameters:
current_config (SearchAPIConfig) – The existing configuration to update.
**overrides – Any fields or API-specific parameters to override or add.
- Returns:
A new config with the merged and prioritized values.
- Return type:
SearchAPIConfig
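The merge behavior described for update can be sketched with plain dictionaries. Everything here is an assumption made for illustration: the `EXAMPLE_DEFAULTS` table, the placeholder provider names and URLs, the choice to carry records_per_page across a provider switch, and the reset of api_key and request_delay. The actual method operates on SearchAPIConfig models and may resolve fields differently.

```python
from typing import Any

# Hypothetical provider defaults; names and URLs are placeholders, not registry entries.
EXAMPLE_DEFAULTS = {
    "provider_a": {"base_url": "https://a.example.org/api", "api_specific_parameters": {"db": "main"}},
    "provider_b": {"base_url": "https://b.example.org/api", "api_specific_parameters": {}},
}
CORE_FIELDS = {"provider_name", "base_url", "records_per_page", "request_delay", "api_key"}


def update_config(current: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Sketch: build a new config dict without mutating the current one."""
    new_provider = overrides.get("provider_name")
    switching = new_provider is not None and new_provider != current["provider_name"]

    if switching:
        # Start from the new provider's defaults so its base_url and parameters apply.
        defaults = EXAMPLE_DEFAULTS[new_provider]
        merged = {
            "provider_name": new_provider,
            "base_url": defaults["base_url"],
            "records_per_page": current["records_per_page"],
            "request_delay": -1,  # sentinel: re-resolve the delay for the new provider
            "api_key": None,  # the previous provider's key no longer applies
            "api_specific_parameters": dict(defaults["api_specific_parameters"]),
        }
    else:
        merged = {**current, "api_specific_parameters": dict(current.get("api_specific_parameters") or {})}

    # Explicit overrides take priority; unknown keys are treated as API-specific parameters.
    for key, value in overrides.items():
        if key in CORE_FIELDS:
            merged[key] = value
        else:
            merged["api_specific_parameters"][key] = value
    return merged


current = {
    "provider_name": "provider_a",
    "base_url": "https://a.example.org/api",
    "records_per_page": 50,
    "request_delay": 1.0,
    "api_key": "secret-a",
    "api_specific_parameters": {"db": "main"},
}
print(update_config(current, provider_name="provider_b")["base_url"])
```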
- property url_basename: str
Extracts the URL basename from the provider URL associated with the current config instance, using the _extract_url_basename method.
- classmethod validate_api_key(v: SecretStr | str | None) SecretStr | None[source]
Validates the api_key attribute and triggers a validation error if it is not valid.
- classmethod validate_provider_name(v: str | None) str[source]
Validates the provider_name attribute and triggers a validation error if it is not valid.
- classmethod validate_request_delay(v: int | float | None) int | float | None[source]
Sets the request delay (delay between each request) for valid request delays. This validator triggers a validation error when the request delay is an invalid type.
If a request delay is left None or is a negative number, this class method returns -1, and further validation is performed by cls.default_request_delay to retrieve the provider’s default request delay.
If not available, SearchAPIConfig.DEFAULT_REQUEST_DELAY is used.
- validate_search_api_config_parameters() Self[source]
Validation method that resolves URLs and/or provider names to provider_info when one or the other is not explicitly provided.
Occurs as the last step in the validation process.
- class scholar_flux.api.models.SearchResult(*, query: str, provider_name: str, page: int, response_result: ProcessedResponse | ErrorResponse | None = None)[source]
Bases:
BaseModel
Core class used to store data in the retrieval and processing of API searches when iterating and searching over a range of pages, queries, and providers at a time. This class uses pydantic to validate fields automatically, ensuring the integrity and reliability of response processing in multi-page searches that link each response result to a particular query, page, and provider.
- Parameters:
query (str) – The query used to retrieve records and response metadata
provider_name (str) – The name of the provider where data is being retrieved
page (int) – The page number associated with the request for data
response_result (Optional[ProcessedResponse | ErrorResponse]) – The response result containing the specifics of the data retrieved from the response or the error messages recorded if the request is not successful.
For convenience, the properties of the response_result are referenced as properties of the SearchResult, including: response, parsed_response, processed_records, etc.
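The convenience properties can be pictured as thin delegations to response_result that return None when the result, or the requested attribute, is absent. The sketch below uses plain dataclasses rather than pydantic models, and the class names `FakeProcessedResponse`, `FakeErrorResponse`, and `ResultSketch` are hypothetical stand-ins rather than the library's types.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeProcessedResponse:
    """Hypothetical stand-in for a successfully processed response."""
    processed_records: list[dict[str, Any]] = field(default_factory=list)
    metadata: Optional[dict[str, Any]] = None


@dataclass
class FakeErrorResponse:
    """Hypothetical stand-in for an error response."""
    error: str = "HTTPError"
    message: str = "request failed"


@dataclass
class ResultSketch:
    query: str
    provider_name: str
    page: int
    response_result: Any = None

    def _from_result(self, name: str) -> Any:
        # A missing result and a missing attribute both resolve to None.
        return getattr(self.response_result, name, None)

    @property
    def processed_records(self) -> Optional[list[dict[str, Any]]]:
        return self._from_result("processed_records")

    @property
    def data(self) -> Optional[list[dict[str, Any]]]:
        # Alias for processed_records.
        return self.processed_records

    @property
    def error(self) -> Optional[str]:
        return self._from_result("error")

    @property
    def message(self) -> Optional[str]:
        return self._from_result("message")


success = ResultSketch("ml", "example", 1, FakeProcessedResponse([{"title": "A"}]))
failure = ResultSketch("ml", "example", 2, FakeErrorResponse())
print(success.data, failure.data, failure.error)
```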
- property cache_key: str | None
Extracts the cache key from the API Response if available.
This cache key is used when storing and retrieving data from response processing cache storage.
- property created_at: str | None
Extracts the time in which the ErrorResponse or ProcessedResponse was created, if available.
- property data: list[dict[Any, Any]] | None
Alias referring back to the processed records from the ProcessedResponse or ErrorResponse.
Contains the processed records from the APIResponse processing step after a successfully received response has been processed. If an error response was received instead, the value of this property is None.
- property error: str | None
Extracts the error name associated with the result from the base class, indicating the name/category of the error in the event that the response_result is an ErrorResponse.
- property extracted_records: list[Any] | None
Contains the extracted records from the APIResponse handling steps that extract individual records from a successfully received and parsed response.
If an ErrorResponse was received instead, the value of this property is None.
- property message: str | None
Extracts the message associated with the result from the base class, indicating why an error occurred in the event that the response_result is an ErrorResponse.
- property metadata: Any | None
Contains the metadata from the APIResponse handling steps that extract response metadata from successfully received and parsed responses.
If an ErrorResponse was received instead, the value of this property is None.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- page: int
- property parsed_response: Any | None
Contains the parsed response content from the APIResponse handling steps that extract the JSON, XML, or YAML content from a successfully received response.
If an ErrorResponse was received instead, the value of this property is None.
- property processed_records: list[dict[Any, Any]] | None
Contains the processed records from the APIResponse processing step after a successfully received response has been processed.
If an error response was received instead, the value of this property is None.
- provider_name: str
- query: str
- property response: Response | ResponseProtocol | None
Helper property directly referencing the original or reconstructed response or response-like object from the API Response if available.
If the received response is not available (None in the response_result), then this value will also be absent (None).
- response_result: ProcessedResponse | ErrorResponse | None
- class scholar_flux.api.models.SearchResultList(iterable=(), /)[source]
Bases:
list[SearchResult]
A helper class used to store multiple SearchResult instances with enhanced type safety. This class inherits from list and tailors its functionality to the API responses received from SearchCoordinators and MultiSearchCoordinators.
- - SearchResultList.append
Basic list.append implementation extended to accept only SearchResults
- - SearchResultList.extend
Basic list.extend implementation extended to accept only iterables of SearchResults
- - SearchResultList.filter
Removes NonResponses and ErrorResponses from the list of SearchResults
- - SearchResultList.join
Combines all records from ProcessedResponses into a list of dictionary-based records
Note: Attempting to add objects other than SearchResults to the SearchResultList will raise a TypeError.
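The type-safety behavior noted above follows a common pattern: subclass list and validate items in append and extend. The sketch below shows that pattern with a hypothetical `Item` element type and `TypedList` class. It is not the SearchResultList source, and validating every item before extending is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Item:
    """Hypothetical element type standing in for SearchResult."""
    page: int


class TypedList(list):
    """List that only accepts Item instances (a sketch of the pattern)."""

    def __init__(self, iterable: Iterable[Item] = ()):
        super().__init__()
        self.extend(iterable)

    def append(self, item: Item) -> None:
        if not isinstance(item, Item):
            raise TypeError(f"Expected an Item, received {type(item).__name__}")
        super().append(item)

    def extend(self, other: Iterable[Item]) -> None:
        # Materialize once so generators are not exhausted by validation,
        # and validate everything first so a failed call leaves the list unchanged.
        items = list(other)
        for item in items:
            if not isinstance(item, Item):
                raise TypeError(f"Expected an Item, received {type(item).__name__}")
        super().extend(items)


results = TypedList([Item(1)])
results.append(Item(2))
try:
    results.append("not an item")
except TypeError as exc:
    print(exc)
```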
- append(item: SearchResult)[source]
Overrides the default list.append method to ensure that only SearchResult objects can be appended to the custom list.
- Parameters:
item (SearchResult) – The response result containing the API response data, the provider name, and page associated with the response.
- extend(other: SearchResultList | MutableSequence[SearchResult] | Iterable[SearchResult])[source]
Overrides the default list.extend method to ensure that only an iterable of SearchResult objects can be added to the SearchResultList.
- Parameters:
other (Iterable[SearchResult]) – An iterable/sequence of response results containing the API response data, the provider name, and page associated with the response
- filter() SearchResultList[source]
Helper method that retains only the elements of the original list that indicate successful processing.
- join() list[dict[str, Any]][source]
Helper method for joining all successfully processed API responses into a single list of dictionaries that can be loaded into a pandas or polars dataframe.
Note that this method will only load processed responses that contain records that were also successfully extracted and processed.
- Returns:
A single list containing all records retrieved from each page
- Return type:
list[dict[str, Any]]
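The filter and join steps can be sketched with ordinary functions over a list of page results. The `PageResult` dataclass and both function names are hypothetical. The sketch assumes a failed page is represented by `data` being None, in line with the SearchResult.data property described earlier, and it does not attempt to reproduce any extra fields the real join method may attach to each record.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PageResult:
    """Hypothetical stand-in for a SearchResult; data is None when the request failed."""
    page: int
    data: Optional[list[dict[str, Any]]] = None


def keep_successful(results: list[PageResult]) -> list[PageResult]:
    # Retain only results that carry processed records (possibly an empty list).
    return [result for result in results if result.data is not None]


def join_records(results: list[PageResult]) -> list[dict[str, Any]]:
    # Flatten the records of every successful page into one list of dictionaries.
    return [record for result in results if result.data for record in result.data]


pages = [
    PageResult(1, [{"title": "A"}, {"title": "B"}]),
    PageResult(2, None),  # failed request
    PageResult(3, [{"title": "C"}]),
]
print(len(keep_successful(pages)))
print(join_records(pages))
```

The flattened list of dictionaries has the shape that a pandas or polars dataframe constructor accepts.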