scholar_flux.sessions package

Subpackages

Submodules

scholar_flux.sessions.encryption module

The scholar_flux.sessions.encryption module implements an EncryptionPipelineFactory that creates serializers accepted by requests_cache CachedSession objects, so that the requests cache can be stored securely.

This factory combines a safe (signing) serializer with encryption in two steps:
  1. Signing the serialized requests cache so that unexpected data changes or tampering invalidate the cached data

  2. Encrypting the request cache after serialization for storage, and decrypting it before deserialization on retrieval

If no key is provided and none is found in the environment, the EncryptionPipelineFactory generates a new Fernet key for these steps.

class scholar_flux.sessions.encryption.EncryptionPipelineFactory(secret_key: str | bytes | SecretStr | None = None, salt: str | None = '')[source]

Bases: object

Helper factory class that creates serializer pipelines for encrypting and decrypting session cache data with a secret key.

Note that unpickling untrusted data is unsafe: pickle can execute arbitrary code while loading serialized data. This implementation therefore uses a safe serializer that signs the serialized data with the Fernet secret key and verifies that signature before deserialization. If the cached data was modified by a malicious source, verification fails and the data is never deserialized.

The EncryptionPipelineFactory can also be used for general use cases requiring encryption outside of scholar_flux, as follows:

>>> from scholar_flux.sessions import EncryptionPipelineFactory
>>> from requests_cache import CachedSession, CachedResponse
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
>>> encryption_serializer = encryption_pipeline_factory()
>>> cached_session = CachedSession('encrypted_cache', backend='filesystem', serializer=encryption_serializer)
>>> endpoint = "https://docs.python.org/3/library/typing.html"
>>> response = cached_session.get(endpoint)
>>> cached_response = cached_session.get(endpoint)
>>> assert isinstance(cached_response, CachedResponse)
ENCODING: Final[str] = 'utf-8'
__init__(secret_key: str | bytes | SecretStr | None = None, salt: str | None = '')[source]

Initializes the EncryptionPipelineFactory class that generates an encryption pipeline for use with CachedSession objects.

If no secret_key is provided, the factory attempts to retrieve one from the SCHOLAR_FLUX_CACHE_SECRET_KEY environment variable via the package config.

If that variable is also unset, a random Fernet key is generated and used to encrypt the session cache.

Parameters:
  • secret_key (Optional[str | bytes | SecretStr]) – The key to use for encrypting and decrypting the data that flows through the pipeline.

  • salt (Optional[str]) – An optional salt used together with the secret key when creating and validating signatures.

create_pipeline() SerializerPipeline[source]

Create a serializer that uses pickle + itsdangerous for signing and cryptography for encryption.

On serialization, this pipeline first signs the response data and then encrypts it. On load, the data is decrypted, and the signature previously generated with the secret key is verified before the response is deserialized.

Returns:

A new serializer pipeline that enforces signature validation and encryption.

Return type:

SerializerPipeline
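The ordering described above (sign, then encrypt on write; decrypt, then verify, then unpickle on read) follows from how a requests_cache SerializerPipeline applies its stages: each stage's dumps runs in order when writing, and each stage's loads runs in reverse order when reading. The sketch below reproduces that ordering with hypothetical MiniStage/MiniPipeline classes rather than the real requests_cache ones. Signing uses a real HMAC-SHA256 tag, but the encryption step is only a placeholder: reversing bytes is not encryption and merely stands in for Fernet so the example runs with the standard library alone.

```python
import hashlib
import hmac
import pickle
from dataclasses import dataclass
from typing import Any, Callable

SECRET = b"example-secret"  # hypothetical key, for illustration only
trace: list[str] = []       # records the order in which stage functions run


def traced(name: str, fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a stage function so that each call is recorded in `trace`."""
    def wrapper(value: Any) -> Any:
        trace.append(name)
        return fn(value)
    return wrapper


def sign(data: bytes) -> bytes:
    # Prepend a 32-byte HMAC-SHA256 tag computed with the secret key
    return hmac.new(SECRET, data, hashlib.sha256).digest() + data


def verify(blob: bytes) -> bytes:
    tag, data = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(SECRET, data, hashlib.sha256).digest()):
        raise ValueError("signature mismatch: refusing to deserialize")
    return data


@dataclass
class MiniStage:
    """Hypothetical stand-in for requests_cache's Stage: a dumps/loads pair."""
    dumps: Callable[[Any], Any]
    loads: Callable[[Any], Any]


class MiniPipeline:
    """Runs stage.dumps in order on write and stage.loads in reverse order on read."""

    def __init__(self, stages: list[MiniStage]) -> None:
        self.stages = stages

    def dumps(self, value: Any) -> Any:
        for stage in self.stages:
            value = stage.dumps(value)
        return value

    def loads(self, value: Any) -> Any:
        for stage in reversed(self.stages):
            value = stage.loads(value)
        return value


pipeline = MiniPipeline([
    MiniStage(traced("pickle.dumps", pickle.dumps), traced("pickle.loads", pickle.loads)),
    MiniStage(traced("sign", sign), traced("verify", verify)),
    # Placeholder for Fernet encrypt/decrypt: reversing bytes is NOT encryption
    MiniStage(traced("encrypt", lambda b: b[::-1]), traced("decrypt", lambda b: b[::-1])),
])

stored = pipeline.dumps({"status": 200})
restored = pipeline.loads(stored)
print(trace)     # ['pickle.dumps', 'sign', 'encrypt', 'decrypt', 'verify', 'pickle.loads']
print(restored)  # {'status': 200}
```

Because verification sits between decryption and unpickling, a modified payload raises before pickle.loads ever sees it.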

encryption_stage() Stage[source]

Creates a new serializer stage that encrypts and decrypts data with Fernet, using the factory's Fernet key.

Returns:

A new serializer stage that encrypts data when dumped and decrypts data when loaded.

Return type:

Stage

property fernet: Fernet

Returns a Fernet instance created from the validated 32-byte, URL-safe base64-encoded key.

static generate_secret_key() bytes[source]

Generates a secret key for Fernet encryption using the cryptography package.

Returns:

A new 32-byte key, encoded as URL-safe base64

Return type:

bytes

property secret_key: bytes

Returns the secret key used for encrypting and decrypting the cache serialization pipeline.

signer_stage() Stage[source]

Creates a stage that uses itsdangerous to add a signature to responses during serialization.

This signature is generated on write and uses the provided secret key to enforce signature validation on deserialization, verifying that the response data hasn’t been tampered with when the response is reloaded.

Returns:

A new stage that uses the secret key and salt for signature creation and validation.

Return type:

Stage
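The sign-on-write, verify-before-load idea can be illustrated with the standard library's hmac module. This is a hypothetical sketch, not itsdangerous: HypotheticalSigner and its salt-mixing key derivation are assumptions made for illustration, and itsdangerous uses its own key derivation and signature format. What it shows is that pickle.loads only ever receives bytes whose signature matched, and that a different salt (or key) rejects data signed with another one.

```python
import hashlib
import hmac
import pickle


class HypotheticalSigner:
    """Illustrative signer; itsdangerous uses its own key derivation and format."""

    def __init__(self, secret_key: bytes, salt: bytes = b"") -> None:
        # Mix the salt into the signing key so that different salts yield different signatures
        self._key = hashlib.sha256(salt + b"|" + secret_key).digest()

    def sign(self, payload: bytes) -> bytes:
        tag = hmac.new(self._key, payload, hashlib.sha256).digest()
        return payload + tag  # 32-byte tag appended to the payload

    def unsign(self, signed: bytes) -> bytes:
        payload, tag = signed[:-32], signed[-32:]
        expected = hmac.new(self._key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("bad signature")
        return payload


def safe_loads(signer: HypotheticalSigner, signed: bytes):
    # Verify first: pickle.loads only sees bytes that were signed with this key and salt
    return pickle.loads(signer.unsign(signed))


signer = HypotheticalSigner(b"secret", salt=b"cache")
blob = signer.sign(pickle.dumps(["cached", "response"]))
print(safe_loads(signer, blob))  # ['cached', 'response']

other_salt = HypotheticalSigner(b"secret", salt=b"other")
try:
    safe_loads(other_salt, blob)
except ValueError as exc:
    print(exc)  # bad signature
```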

scholar_flux.sessions.session_manager module

The scholar_flux.sessions.session_manager module implements the SessionManager & CachedSessionManager for requests.

These classes serve as factories for creating requests.Session and requests_cache.CachedSession objects.

Calling the configure_session method on a session manager creates a new session, either basic or cached, depending on which session manager class was instantiated.

Classes:

  • SessionManager: Base class holding the configuration for non-cached sessions

  • CachedSessionManager: Extensible factory class allowing users to define cached sessions with the selected backend

class scholar_flux.sessions.session_manager.CachedSessionManager(user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False)[source]

Bases: SessionManager

This session manager is a wrapper around requests-cache that creates requests-cache sessions with defaults that abstract away the complexity of cached session management.

These defaults are chosen to integrate well with the scholar_flux package. Because the requests_cache package is built on top of the requests library, the resulting CachedSession can be injected into the scholar_flux SearchAPI, just like a requests.Session, to make cached queries.

Examples

>>> from scholar_flux.sessions import CachedSessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests_cache import CachedSession
### creates a sqlite cached session in a package-writable directory
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_user_agent')
>>> cached_session = session_manager() # defaults to a sqlite session in the package directory
### Which is equivalent to:
>>> cached_session = session_manager.configure_session() # defaults to a sqlite session in the package directory
>>> assert isinstance(cached_session, CachedSession)
### As with a basic requests.Session, this can be injected into the SearchAPI as a dependency:
>>> SearchAPI(query = 'history of software design', session = cached_session)
DEFAULT_EXPIRE_AFTER: int | None = 86400
__init__(user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False) None[source]

Initializes the CachedSessionManager, defining and validating the options for CachedSession generation.

The inputs, once validated via pydantic, are passed to the self.configure_session method to generate a session object.

Parameters:
  • user_agent (Optional[str]) – Specifies the value to use for the User-Agent header sent with each request.

  • cache_name (Optional[str]) – The name to associate with the current cache. For filesystem/sqlite storage it is used as a file name; for storages such as Redis it is used as the cache name. If not provided, the cache name defaults to the value of SCHOLAR_FLUX_SESSION_CACHE_NAME, if set, and otherwise to “search_requests_cache”.

  • cache_directory (Optional[str | Path]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (sqlite, filesystem storage, etc.), defaults to the first writable directory location found using the scholar_flux.package_metadata.get_default_writable_directory method.

  • backend (Optional[Literal["dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"] | requests_cache.BaseCache]) –

    Defines the backend to use when creating a requests-cache session. The default is sqlite. Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb.

    Users can also pass cache storage instances from requests_cache directly, such as RedisCache, MongoCache, or SQLiteCache. If left as None, the backend is read from the SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND environment variable, and defaults to sqlite if that variable is missing.

    For more information, visit the following link: https://requests-cache.readthedocs.io/en/stable/user_guide/backends.html#choosing-a-backend

  • serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer used to prepare cached responses for storage (serialization) and to deserialize them on retrieval.

  • expire_after (Optional[int|float|str|datetime.datetime|datetime.timedelta]) – Sets the time after which successfully cached responses expire. If not set, the value of CachedSessionManager.DEFAULT_EXPIRE_AFTER is used (86400 seconds, i.e. one day, unless that class attribute is modified).

  • raise_on_error (bool) – Whether to raise an error on instantiation if an error is encountered in the creation of a session. If raise_on_error = False, the error is logged, and a requests.Session is created instead.

property backend: SessionCacheBackendType

Makes the config’s backend storage device for requests-cache accessible from the CachedSessionManager.

property cache_directory: Path | None

Makes the config’s cache directory accessible by the CachedSessionManager.

property cache_name: str

Makes the config’s base file name for the cache accessible by the CachedSessionManager.

property cache_path: str

Makes the config’s cache path (the path the cache is written to, or its name, depending on the backend) accessible by the CachedSessionManager.

configure_session(verify_connection: bool = False) Session | CachedSession[source]

Creates and returns a new CachedSession using the settings stored in the current CachedSessionConfig.

Note

If the cached session cannot be configured due to permission or connection errors, the session manager falls back to creating a requests.Session when the self.raise_on_error attribute is set to False.

Parameters:

verify_connection (bool) – Indicates whether CachedSession validation should occur by reading from cache. Useful for verifying whether the cache for remote connections (redis, mongodb) is accessible and the deserialization pipeline is operating as intended when a serializer is specified.

Returns:

A CachedSession if successful; otherwise, a requests.Session when an error occurs and raise_on_error is False.

Return type:

requests.Session | requests_cache.CachedSession
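The fallback behavior described in the note above follows a common try/except pattern, sketched here with plain factory callables instead of real sessions. The helper `create_with_fallback` is hypothetical, and it catches only the builtin PermissionError and ConnectionError as stand-ins; backend client libraries (e.g. Redis or MongoDB drivers) may raise their own exception types, and the package may wrap errors in its own exceptions.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
SessionT = TypeVar("SessionT")


def create_with_fallback(
    create_cached: Callable[[], SessionT],
    create_plain: Callable[[], SessionT],
    raise_on_error: bool = False,
) -> SessionT:
    """Hypothetical helper mirroring the documented fallback behavior."""
    try:
        return create_cached()
    except (PermissionError, ConnectionError) as exc:
        if raise_on_error:
            raise
        # Log the failure and degrade gracefully to an uncached session
        logger.warning("Cached session unavailable (%s); using an uncached session", exc)
        return create_plain()


def unwritable_cache() -> str:
    raise PermissionError("cache directory is not writable")


print(create_with_fallback(unwritable_cache, lambda: "plain session"))          # plain session
print(create_with_fallback(lambda: "cached session", lambda: "plain session"))  # cached session
```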

classmethod default_session_backend(raise_on_error: bool = False) Literal['dynamodb', 'filesystem', 'gridfs', 'memory', 'mongodb', 'redis', 'sqlite'][source]

Reads the default backend from the SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND environment variable, defaulting to sqlite otherwise.

Parameters:

raise_on_error (bool) – If True, an exception is raised when the environment variable exists but names an unknown requests_cache backend. If False, this method instead emits a warning and defaults to sqlite.

Returns:

The name of the backend to use as the default session cache.

Return type:

str
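The environment lookup described above can be sketched as follows. This is a hypothetical stand-in for the classmethod, not its actual code: it assumes backend names are matched case-insensitively, uses `warnings.warn` for the warning, and raises a plain ValueError where the package may use its own exception type.

```python
import os
import warnings

BACKEND_ENV_VAR = "SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND"
KNOWN_BACKENDS = ("dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite")


def default_session_backend(raise_on_error: bool = False) -> str:
    """Hypothetical sketch: read the backend name from the environment, else use sqlite."""
    value = os.environ.get(BACKEND_ENV_VAR)
    if not value:
        return "sqlite"
    backend = value.strip().lower()  # assumption: names are matched case-insensitively
    if backend in KNOWN_BACKENDS:
        return backend
    if raise_on_error:
        raise ValueError(f"Unknown requests_cache backend: {value!r}")
    warnings.warn(f"Unknown requests_cache backend {value!r}; defaulting to 'sqlite'")
    return "sqlite"


os.environ[BACKEND_ENV_VAR] = "Redis"
print(default_session_backend())  # redis
os.environ[BACKEND_ENV_VAR] = "not-a-backend"
print(default_session_backend())  # sqlite (after a UserWarning)
del os.environ[BACKEND_ENV_VAR]
```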

property expire_after: int | float | str | datetime | timedelta | None

Makes the config’s value used for response cache expiration accessible from the CachedSessionManager.

classmethod get_cache_directory(cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None) Path | None[source]

Finds a directory path for use with session cache, favoring explicitly assigned directories if provided.

Note that this method only attempts to find a cache directory when one is needed, such as when the backend is given as the string “filesystem” or “sqlite”.

Resolution order (highest to lowest priority):
  1. Explicit cache_directory argument

  2. config_settings.config[‘CACHE_DIRECTORY’] (can be set via environment variable)

  3. Package or home directory defaults (depending on writability)

If the resolved cache_directory is a string, it is coerced into a Path before being returned. Returns None if the backend does not require a cache directory (e.g., dynamodb, mongodb, etc.).

Parameters:
  • cache_directory (Optional[Path | str]) – Explicit directory to use, if provided.

  • backend (Optional[str | requests_cache.BaseCache]) – Backend type, used to determine whether a directory is needed.

Returns:

The resolved cache directory as a Path or None if not applicable

Return type:

Optional[Path]

classmethod get_cache_name(cache_name: str | None = None) str[source]

Retrieves a valid, non-missing cache_name when an input for the parameter is not provided.

When cache_name is None, this method attempts to retrieve a cache name from the SCHOLAR_FLUX_SESSION_CACHE_NAME environment variable, when available. Otherwise, it falls back to the default name: search_requests_cache.

Parameters:

cache_name (Optional[str]) – The name to associate with the current session cache backend.

Returns:

The resolved cache name, either retrieved via a user-specified input, the OS environment, or the search_requests_cache default.

Return type:

str

property kwargs: dict[str, Any]

Additional keyword arguments that can be passed to CachedSession on the creation of the session.

property serializer: SessionCacheSerializer | None

Makes the serializer from the config accessible from the CachedSessionManager.

classmethod validate_cached_session(session: CachedSession) None[source]

Verifies that created CachedSession objects can successfully retrieve from cache without error.

Note: This method is useful for verifying that Redis/MongoDB backends can successfully retrieve from cache, and whether CachedSession objects using serializers created by the EncryptionPipelineFactory can retrieve and deserialize previously cached responses with the current secret key.

Parameters:

session (requests_cache.CachedSession) – A session object to validate cache retrieval for. Note that if the cache is empty, this validation step will not raise an error.

Raises:

CachedSessionValidationError – If an error occurs during the validation of the cached session instance.
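A rough sketch of this check, under assumptions: the cache is modeled as a plain dict of serialized entries rather than a real requests_cache backend (where stored responses would be reachable through something like session.cache.responses), and CachedSessionValidationError is redefined locally as a stand-in for the package's exception. The sketch tries to deserialize only the first stored entry; the actual method may read the cache differently.

```python
import pickle
from typing import Any, Callable


class CachedSessionValidationError(Exception):
    """Local stand-in for the package's exception of the same name."""


def validate_cache(storage: dict[str, bytes], loads: Callable[[bytes], Any] = pickle.loads) -> None:
    """Hypothetical sketch: try to deserialize one stored entry."""
    first_key = next(iter(storage), None)
    if first_key is None:
        return  # an empty cache cannot be checked, so it passes
    try:
        loads(storage[first_key])
    except Exception as exc:
        # e.g. a corrupted entry, or one written with a different secret key
        raise CachedSessionValidationError(f"could not read cached entry {first_key!r}") from exc


validate_cache({})                                        # passes: nothing to read
validate_cache({"key-1": pickle.dumps({"status": 200})})  # passes: entry deserializes
try:
    validate_cache({"key-1": b"not a pickle"})
except CachedSessionValidationError as exc:
    print(exc)  # could not read cached entry 'key-1'
```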

classmethod with_session(backend: SessionCacheBackendType | None = None, *, user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False, verify_connection: bool = False) Session | CachedSession[source]

Convenience factory method for creating and configuring a new CachedSession.

Note: For consistency with the DataCacheManager (Layer 2 processing cache), this method is designed to use the backend parameter as the only positional parameter while all others are designated as keyword only arguments.

Parameters:
  • backend (Optional[Literal["dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"] | requests_cache.BaseCache]) – Defines the backend to use when creating a requests-cache session. The default is sqlite (or the value of SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND, if set). Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb.

  • user_agent (Optional[str]) – Specifies the value to use for the User-Agent header sent with each request.

  • cache_name (Optional[str]) – The name to associate with the current cache. For filesystem/sqlite storage it is used as a file name; for storages such as Redis it is used as the cache name.

  • cache_directory (Optional[str | Path]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (sqlite, filesystem storage, etc.), defaults to the first writable directory location found using the scholar_flux.package_metadata.get_default_writable_directory method.

  • serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer used to prepare cached responses for storage (serialization) and to deserialize them on retrieval.

  • expire_after (Optional[int|float|str|datetime.datetime|datetime.timedelta]) – Sets the time after which successfully cached responses expire. If not set, the value of CachedSessionManager.DEFAULT_EXPIRE_AFTER is used (86400 seconds, i.e. one day, unless that class attribute is modified).

  • raise_on_error (bool) – Whether to raise an error on instantiation if an error is encountered in the creation of a session.

  • verify_connection (bool) – Indicates whether CachedSession validation should occur by reading from cache.

Returns:

A new session created by calling configure_session on the current session manager instance.

Return type:

requests.Session | requests_cache.CachedSession

class scholar_flux.sessions.session_manager.SessionManager(user_agent: str | None = None)[source]

Bases: BaseSessionManager

Manager that creates a simple requests session using the default settings and the provided User-Agent.

Example

>>> from scholar_flux.sessions import SessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests import Session
>>> session_manager = SessionManager(user_agent='scholar_flux_user_agent')
### Creating the session object
>>> session = session_manager.configure_session()
### Which is also equivalent to:
>>> session = session_manager()
### This implementation returns a requests.Session object, which is compatible with the SearchAPI:
>>> isinstance(session, Session)
# OUTPUT: True
>>> api = SearchAPI(query='history of software design', session = session)
__init__(user_agent: str | None = None) None[source]

Initializes a basic session manager that sets the user agent if provided.

Parameters:

user_agent (Optional[str]) – The User-Agent to be passed as a parameter in the creation of the session object. When no user_agent is provided, the User-Agent header is left to the requests package default (e.g., python-requests/2.32.5).

configure_session() Session[source]

Configures a basic requests session with the provided user_agent attribute.

Returns:

A regular requests.Session object with the default settings and an optional User-Agent header.

Return type:

requests.Session

classmethod with_session(*, user_agent: str | None = None) Session[source]

Convenience factory method for creating and configuring a new requests.Session instance.

Note: This method is designed to first instantiate the current SessionManager class with the specified User-Agent.

Parameters:

user_agent (Optional[str]) – The User-Agent to be passed as a parameter when creating the session object.

Returns:

A new requests.Session object with the configured User-Agent.

Return type:

requests.Session

Module contents

The scholar_flux.sessions module contains helper classes to set up HTTP sessions, both cached and uncached, with relatively straightforward configurations and a unified interface. The SessionManager and CachedSessionManager are designed as factory classes that return a constructed session object with the parameters provided.

Classes:
  • SessionManager:

    Creates a standard requests.Session that simply takes a user-agent parameter.

  • CachedSessionManager:

    Creates a requests_cache.CachedSession with configurable options. This implementation uses pydantic to validate the parameters used to create the requests_cache.CachedSession object.

Basic Usage:
>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import SessionManager
>>> session_manager = SessionManager(user_agent='scholar_flux_session')
>>> requests_session = session_manager.configure_session() # or session_manager()
>>> api = SearchAPI(query = 'functional programming', session = requests_session)
Cached Sessions:
>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import CachedSessionManager
### And for cached sessions, the following defaults to sqlite in the package_cache subfolder
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session')
### Or initialize the session manager with a custom requests_cache backend (RedisCache requires a reachable Redis server)
>>> from requests import Response
>>> from requests_cache import RedisCache, CachedResponse
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session', backend = RedisCache())
>>> cached_requests_session = session_manager() # or session_manager.configure_session()
>>> api_with_cache = SearchAPI(query = 'functional programming', session = cached_requests_session)
>>> response = api_with_cache.search(page=1) # will be cached on subsequent runs
>>> isinstance(response, Response)
# OUTPUT: True
>>> cached_response = api_with_cache.search(page=1) # is now cached
>>> isinstance(cached_response, CachedResponse)
# OUTPUT: True
Encrypted Cached Sessions:
>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import CachedSessionManager, EncryptionPipelineFactory
### For encrypting requests we can create a serializer that encrypts data before it's stored:
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
### If a Fernet key is neither provided nor saved in a .env file that is read on import,
### the factory generates a random Fernet key by default.
>>> fernet = encryption_pipeline_factory.fernet
>>> print(fernet)
# OUTPUT: <cryptography.fernet.Fernet object at 0x7efd9de62450>
### Make sure to save the underlying key: the encrypted cache cannot be decrypted without it
>>> secret_key = encryption_pipeline_factory.secret_key
### The encryption has to be specified when creating a cached session:
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session',
...                                        backend='filesystem',
...                                        serializer=encryption_pipeline_factory())
### Now assignment to a SearchAPI occurs similarly as before
>>> api_with_encrypted_cache = SearchAPI(query = 'object oriented programming', session = session_manager())
raises - SessionCreationError:

The base exception class for errors involving the creation/use of a CachedSessionManager

raises - SessionConfigurationError:

When an error occurs during the creation of a CachedSessionManager

raises - SessionInitializationError:

When an exception prevents the initialization of a new session from a session manager

raises - SessionCacheDirectoryError:

When cache directory setup fails for file-based backends (sqlite, filesystem)

Cached Session Support:

Cached sessions support all built-in BaseCache subclasses in requests-cache, including the following:

  • DynamoDB

  • Filesystem

  • GridFS

  • In-memory

  • MongoDB

  • Redis

  • SQLite

Custom implementations of BaseCache are also supported.

class scholar_flux.sessions.BaseSessionManager(*args: Any, **kwargs: Any)[source]

Bases: ABC

An abstract base class used as a factory to create session objects.

This base class can be extended to validate session inputs and abstract away the complexity of session creation.

__init__(*args: Any, **kwargs: Any) None[source]

Initializes BaseSessionManager subclasses given the provided arguments.

abstract configure_session(*args: Any, **kwargs: Any) Session | CachedSession[source]

Configure the session.

Should be overridden by subclasses.

classmethod get_cache_directory(*args: Any, **kwargs: Any) Path | None[source]

Defines defaults used in the creation of subclasses.

Can optionally be overridden by cached session managers.

classmethod with_session(*args: Any, **kwargs: Any) Session | CachedSession[source]

Convenience factory method for creating and configuring a new session instance.

Note: This method is designed to first instantiate the current SessionManager class using the provided positional or keyword arguments. Subclasses can define the exact parameters and type annotations required for instantiation if needed.

Parameters:
  • *args – Positional arguments to pass to the __init__ method of the current class

  • **kwargs – Keyword arguments to pass to the __init__ method of the current class

Returns:

A new session created by calling configure_session on the current session manager instance.

Return type:

requests.Session | CachedSession

class scholar_flux.sessions.CachedSessionConfig(*, cache_name: str, backend: SessionCacheBackendType, cache_directory: Path | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, user_agent: str | None = None, kwargs: dict[str, Any] = <factory>)[source]

Bases: BaseModel

A helper model used to validate the inputs provided when creating a CachedSessionManager.

This config is used to validate the inputs to the session manager prior to attempting its creation.

backend: SessionCacheBackendType
cache_directory: Path | None
cache_name: str
property cache_path: str

Helper method for retrieving the path that the cache will be written to or named, depending on the backend.

Assumes that the cache_name provided to the config is not None.

expire_after: int | float | str | datetime.datetime | datetime.timedelta | None
kwargs: dict[str, Any]
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

serializer: SessionCacheSerializer | None
user_agent: str | None
classmethod validate_backend_dependency(v: str | SessionCacheBackendType) SessionCacheBackendType[source]

Validates the chosen backend and raises an error if its dependency is missing.

If the backend has unmet dependencies, this validator will trigger a ValidationError.

Parameters:

v (str | Optional[Literal["dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"] | requests_cache.BaseCache]) – A valid backend for requests_cache (not case sensitive)

Returns:

A BaseCache or name of a backend supported by requests-cache

Return type:

Optional[Literal[“dynamodb”, “filesystem”, “gridfs”, “memory”, “mongodb”, “redis”, “sqlite”] | requests_cache.BaseCache]

validate_backend_filepath() Self[source]

Helper method for validating when file storage is a necessity vs when it’s not required.

classmethod validate_cache_directory(v: Path | str | None) Path | None[source]

Validates the cache_directory field to flag simple cases where the value is an empty string.

classmethod validate_cache_name(v: str) str[source]

Validates the cache_name field to flag simple cases where the value is an empty string.

classmethod validate_expire_after(v: int | float | str | datetime | timedelta | None) int | float | datetime | timedelta | None[source]

Validates the expire_after field, flagging numeric values below 0 as invalid.

class scholar_flux.sessions.CachedSessionManager(user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False)[source]

Bases: SessionManager

This session manager is a wrapper around requests-cache that creates requests-cache sessions with defaults that abstract away the complexity of cached session management.

These defaults are chosen to integrate well with the scholar_flux package. Because the requests_cache package is built on top of the requests library, the resulting CachedSession can be injected into the scholar_flux SearchAPI, just like a requests.Session, to make cached queries.

Examples

>>> from scholar_flux.sessions import CachedSessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests_cache import CachedSession
### creates a sqlite cached session in a package-writable directory
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_user_agent')
>>> cached_session = session_manager() # defaults to a sqlite session in the package directory
### Which is equivalent to:
>>> cached_session = session_manager.configure_session() # defaults to a sqlite session in the package directory
>>> assert isinstance(cached_session, CachedSession)
### As with a basic requests.Session, this can be injected into the SearchAPI as a dependency:
>>> SearchAPI(query = 'history of software design', session = cached_session)
DEFAULT_EXPIRE_AFTER: int | None = 86400
__init__(user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False) None[source]

Initializes the CachedSessionManager, defining and validating the options for CachedSession generation.

The inputs, once validated via pydantic, are passed to the self.configure_session method to generate a session object.

Parameters:
  • user_agent (Optional[str]) – Specifies the value to use for the User-Agent header sent with each request.

  • cache_name (Optional[str]) – The name to associate with the current cache. For filesystem/sqlite storage it is used as a file name; for storages such as Redis it is used as the cache name. If not provided, the cache name defaults to the value of SCHOLAR_FLUX_SESSION_CACHE_NAME, if set, and otherwise to “search_requests_cache”.

  • cache_directory (Optional[str | Path]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (sqlite, filesystem storage, etc.), defaults to the first writable directory location found using the scholar_flux.package_metadata.get_default_writable_directory method.

  • backend (Optional[Literal["dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"] | requests_cache.BaseCache]) –

    Defines the backend to use when creating a requests-cache session. The default is sqlite. Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb.

    Users can also pass cache storage instances from requests_cache directly, such as RedisCache, MongoCache, or SQLiteCache. If left as None, the backend is read from the SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND environment variable, and defaults to sqlite if that variable is missing.

    For more information, visit the following link: https://requests-cache.readthedocs.io/en/stable/user_guide/backends.html#choosing-a-backend

  • serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer used to prepare cached responses for storage (serialization) and to deserialize them on retrieval.

  • expire_after (Optional[int|float|str|datetime.datetime|datetime.timedelta]) – Sets the time after which successfully cached responses expire. If not set, the value of CachedSessionManager.DEFAULT_EXPIRE_AFTER is used (86400 seconds, i.e. one day, unless that class attribute is modified).

  • raise_on_error (bool) – Whether to raise an error on instantiation if an error is encountered in the creation of a session. If raise_on_error = False, the error is logged, and a requests.Session is created instead.

property backend: SessionCacheBackendType

Makes the config’s backend storage device for requests-cache accessible from the CachedSessionManager.

property cache_directory: Path | None

Makes the config’s cache directory accessible by the CachedSessionManager.

property cache_name: str

Makes the config’s base file name for the cache accessible by the CachedSessionManager.

property cache_path: str

Makes the config’s cache path accessible by the CachedSessionManager.

configure_session(verify_connection: bool = False) → Session | CachedSession

Creates and returns a new CachedSession using the same settings shown in the current CachedSessionConfig.

Note

If the cached session cannot be configured due to permission or connection errors, the session manager falls back to creating a requests.Session when the self.raise_on_error attribute is set to False.

Parameters:

verify_connection (bool) – Indicates whether CachedSession validation should occur by reading from cache. Useful for verifying whether the cache for remote connections (redis, mongodb) is accessible and the deserialization pipeline is operating as intended when a serializer is specified.

Returns:

A CachedSession object if successful; otherwise, a requests.Session object in the event of an error.

Return type:

requests.Session | requests_cache.CachedSession
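The fallback behavior described in the note above follows a common try/except pattern. This is a minimal, dependency-free sketch of it, assuming hypothetical PlainSession and CachedSessionStub classes in place of requests.Session and requests_cache.CachedSession; the real method's construction and logging details may differ.

```python
import logging
from typing import Union

logger = logging.getLogger(__name__)


class PlainSession:
    """Hypothetical stand-in for requests.Session."""


class CachedSessionStub(PlainSession):
    """Hypothetical stand-in for requests_cache.CachedSession."""

    def __init__(self, backend: str) -> None:
        if backend == "redis":
            # Simulate a remote backend that cannot be reached
            raise ConnectionError("could not connect to the redis cache")
        self.backend = backend


def configure_session(backend: str, raise_on_error: bool = False) -> Union[PlainSession, CachedSessionStub]:
    """Build a cached session, falling back to an uncached one unless raise_on_error is True."""
    try:
        return CachedSessionStub(backend)
    except (PermissionError, ConnectionError) as exc:
        if raise_on_error:
            raise
        logger.warning("Falling back to an uncached session: %s", exc)
        return PlainSession()


session = configure_session("redis")
print(type(session).__name__)  # PlainSession
```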

classmethod default_session_backend(raise_on_error: bool = False) → Literal['dynamodb', 'filesystem', 'gridfs', 'memory', 'mongodb', 'redis', 'sqlite']

Reads the default backend from the SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND environment variable, defaulting to sqlite otherwise.

Parameters:

raise_on_error (bool) – If True, an exception is raised when the environment variable exists but names an unknown requests_cache backend. If False, this method instead issues a warning and defaults to sqlite.

Returns:

The name of the backend to use as the default session cache.

Return type:

str
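A sketch of the environment-variable lookup that default_session_backend describes. It assumes case-insensitive matching and ValueError as the raised exception; both are assumptions, not confirmed by the documentation.

```python
import os
import warnings

ENV_VAR = "SCHOLAR_FLUX_DEFAULT_SESSION_CACHE_BACKEND"
KNOWN_BACKENDS = ("dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite")


def default_session_backend(raise_on_error: bool = False) -> str:
    """Read the default backend from the environment, falling back to sqlite."""
    value = os.environ.get(ENV_VAR)
    if not value:
        return "sqlite"  # variable missing or empty
    backend = value.strip().lower()  # assumption: matching is case-insensitive
    if backend in KNOWN_BACKENDS:
        return backend
    message = f"Unknown requests-cache backend {value!r} in {ENV_VAR}"
    if raise_on_error:
        raise ValueError(message)  # assumption: the exception type is illustrative
    warnings.warn(f"{message}; defaulting to sqlite")
    return "sqlite"


os.environ[ENV_VAR] = "redis"
print(default_session_backend())  # redis
```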

property expire_after: int | float | str | datetime | timedelta | None

Makes the config’s value used for response cache expiration accessible from the CachedSessionManager.

classmethod get_cache_directory(cache_directory: str | Path | None = None, backend: SessionCacheBackendType | None = None) → Path | None

Finds a directory path for use with session cache, favoring explicitly assigned directories if provided.

Note that this method only attempts to find a cache directory if one is needed, such as when the backend is specified as the string “filesystem” or “sqlite”.

Resolution order (highest to lowest priority):
  1. Explicit cache_directory argument

  2. config_settings.config[‘CACHE_DIRECTORY’] (can be set via environment variable)

  3. Package or home directory defaults (depending on writability)

If the resolved cache_directory is a string, it is coerced into a Path before being returned. Returns None if the backend does not require a cache directory (e.g., dynamodb, mongodb, etc.).

Parameters:
  • cache_directory (Optional[Path | str]) – Explicit directory to use, if provided.

  • backend (Optional[str | requests_cache.BaseCache]) – Backend type, used to determine if a directory is needed.

Returns:

The resolved cache directory as a Path or None if not applicable

Return type:

Optional[Path]
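The resolution order above can be sketched as a small function. This is hypothetical: configured_directory stands in for config_settings.config['CACHE_DIRECTORY'], default_directory stands in for the package/home writability check, and only string backends are handled.

```python
from pathlib import Path
from typing import Optional, Union

# Assumption: only local, file-based backends need a directory
DIRECTORY_BACKENDS = {"filesystem", "sqlite"}


def get_cache_directory(
    cache_directory: Union[str, Path, None] = None,
    backend: Optional[str] = None,
    configured_directory: Union[str, Path, None] = None,
    default_directory: Path = Path.home() / ".scholar_flux_cache",  # hypothetical default
) -> Optional[Path]:
    """Resolve a cache directory: explicit argument, then config, then a default."""
    backend = backend or "sqlite"  # assumption: an unset backend means the sqlite default
    if backend not in DIRECTORY_BACKENDS:
        return None  # e.g. redis, mongodb, dynamodb, memory
    for candidate in (cache_directory, configured_directory):
        if candidate:
            return Path(candidate)  # strings are coerced into Path objects
    return default_directory
```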

classmethod get_cache_name(cache_name: str | None = None) → str

Retrieves a valid, non-missing cache_name when an input for the parameter is not provided.

When cache_name is None, this method attempts to retrieve a valid cache name from the SCHOLAR_FLUX_SESSION_CACHE_NAME environment variable, when available. Otherwise, this method falls back to the default name: search_requests_cache.

Parameters:

cache_name (Optional[str]) – The name to associate with the current session cache backend.

Returns:

The resolved cache name, either retrieved via a user-specified input, the OS environment, or the search_requests_cache default.

Return type:

str
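A minimal sketch of the cache-name lookup, assuming that an empty string is treated the same as a missing value:

```python
import os
from typing import Optional

ENV_VAR = "SCHOLAR_FLUX_SESSION_CACHE_NAME"
DEFAULT_CACHE_NAME = "search_requests_cache"


def get_cache_name(cache_name: Optional[str] = None) -> str:
    """Explicit name first, then the environment variable, then the default."""
    if cache_name:
        return cache_name
    return os.environ.get(ENV_VAR) or DEFAULT_CACHE_NAME
```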

property kwargs: dict[str, Any]

Additional keyword arguments that can be passed to CachedSession when the session is created.

property serializer: SessionCacheSerializer | None

Makes the serializer from the config accessible from the CachedSessionManager.

classmethod validate_cached_session(session: CachedSession) → None

Verifies that created CachedSession objects can successfully retrieve from cache without error.

Note: This method is useful for verifying that Redis/MongoDB backends can successfully retrieve from cache, and that CachedSession instances using serializers created by the EncryptionPipelineFactory can retrieve and deserialize previously cached responses with the current secret key.

Parameters:

session (requests_cache.CachedSession) – A session object to validate cache retrieval for. Note that if the cache is empty, this validation step will not raise an error.

Raises:

CachedSessionValidationError – If an error occurs during the validation of the cached session instance.
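The validation idea can be illustrated without a live backend. In this hypothetical sketch a plain dict stands in for the cache storage, loads stands in for the session's deserialization pipeline, and only the first stored entry is read back (an assumption; the real method may check entries differently).

```python
import pickle
from typing import Any, Callable, Dict


class CachedSessionValidationError(Exception):
    """Hypothetical stand-in for the exception documented above."""


def validate_cache(storage: Dict[str, bytes], loads: Callable[[bytes], Any]) -> None:
    """Deserialize one stored entry to confirm the cache can be read back."""
    first = next(iter(storage.items()), None)
    if first is None:
        return  # an empty cache has nothing to read, so validation passes
    key, raw = first
    try:
        loads(raw)
    except Exception as exc:  # any read failure means the session cannot use this cache
        raise CachedSessionValidationError(f"Could not deserialize cached entry {key!r}") from exc


validate_cache({}, pickle.loads)  # empty: no error
validate_cache({"response-1": pickle.dumps({"status_code": 200})}, pickle.loads)  # readable: no error
```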

classmethod with_session(backend: SessionCacheBackendType | None = None, *, user_agent: str | None = None, cache_name: str | None = None, cache_directory: str | Path | None = None, serializer: SessionCacheSerializer | None = None, expire_after: int | float | str | datetime | timedelta | None = None, raise_on_error: bool = False, verify_connection: bool = False) → Session | CachedSession

Convenience factory method for creating and configuring a new CachedSession.

Note: For consistency with the DataCacheManager (Layer 2 processing cache), this method accepts backend as its only positional parameter; all other parameters are keyword-only.

Parameters:
  • backend (Optional[Literal["dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"] | requests_cache.BaseCache]) – Defines the backend to use when creating a requests-cache session. The default is sqlite. Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb.

  • user_agent (Optional[str]) – The value to send as the User-Agent header with each request.

  • cache_name (Optional[str]) – The name to associate with the current cache. It names the cache file or directory for sqlite/filesystem storage and is used as the cache name for storage backends such as Redis.

  • cache_directory (Optional[str | Path]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (sqlite, filesystem storage, etc.), defaults to the first writable directory location found by the scholar_flux.package_metadata.get_default_writable_directory method.

  • serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer used to prepare cached responses for storage (serialization) and to restore them on retrieval (deserialization).

  • expire_after (Optional[int|float|str|datetime.datetime|datetime.timedelta]) – Sets the time after which successfully cached responses expire. If not set, CachedSessionManager.DEFAULT_EXPIRE_AFTER (default=86400 seconds) is used; this class attribute can be modified to change the default.

  • raise_on_error (bool) – Whether to raise an error on instantiation if an error is encountered in the creation of a session.

  • verify_connection (bool) – Indicates whether CachedSession validation should occur by reading from cache.

Returns:

A new session created by calling configure_session on the current session manager instance.

Return type:

requests.Session | requests_cache.CachedSession
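The with_session convenience pattern (instantiate the manager, then call configure_session, with backend as the only positional parameter) can be sketched with a hypothetical class that returns a plain dict instead of a real session:

```python
from typing import Any, Dict, Optional


class SessionManagerSketch:
    """Hypothetical manager illustrating the with_session factory pattern."""

    def __init__(self, backend: Optional[str] = None, *, cache_name: Optional[str] = None,
                 raise_on_error: bool = False) -> None:
        self.backend = backend or "sqlite"
        self.cache_name = cache_name or "search_requests_cache"
        self.raise_on_error = raise_on_error

    def configure_session(self) -> Dict[str, Any]:
        # A real manager would build a CachedSession here; a dict keeps the sketch dependency-free
        return {"backend": self.backend, "cache_name": self.cache_name}

    @classmethod
    def with_session(cls, backend: Optional[str] = None, *, cache_name: Optional[str] = None,
                     raise_on_error: bool = False) -> Dict[str, Any]:
        # backend is the only positional parameter; the rest are keyword-only
        manager = cls(backend, cache_name=cache_name, raise_on_error=raise_on_error)
        return manager.configure_session()


print(SessionManagerSketch.with_session("memory", cache_name="demo"))
# {'backend': 'memory', 'cache_name': 'demo'}
```

Making everything after backend keyword-only means a call such as with_session("memory", "demo") fails loudly with a TypeError instead of silently binding "demo" to the wrong parameter.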

class scholar_flux.sessions.EncryptionPipelineFactory(secret_key: str | bytes | SecretStr | None = None, salt: str | None = '')

Bases: object

Helper class used to create a factory for encrypting and decrypting session cache and pipelines using a secret key.

Note that unpickling untrusted serialized data is a known vulnerability: pickle can execute arbitrary code while loading. To mitigate this, this implementation signs the serialized data with the secret key and verifies that signature after decryption and before deserialization. If the cached data has been modified by a malicious source, the signature check fails and reading halts before the data is unpickled.

The EncryptionPipelineFactory can also be used outside of scholar_flux wherever an encrypted requests-cache serializer is needed, for example:

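>>> from scholar_flux.sessions import EncryptionPipelineFactory
>>> from requests_cache import CachedSession, CachedResponse
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
>>> # Calling the factory returns a serializer for requests-cache
>>> encryption_serializer = encryption_pipeline_factory()
>>> # The first positional argument of CachedSession is the cache name, so the backend is passed by keyword
>>> cached_session = CachedSession('encrypted_cache', backend='filesystem', serializer=encryption_serializer)
>>> endpoint = "https://docs.python.org/3/library/typing.html"
>>> response = cached_session.get(endpoint)  # fetched over the network, then encrypted and cached
>>> cached_response = cached_session.get(endpoint)  # read back from the encrypted cache
>>> assert isinstance(cached_response, CachedResponse)
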
ENCODING: Final[str] = 'utf-8'
__init__(secret_key: str | bytes | SecretStr | None = None, salt: str | None = '')

Initializes the EncryptionPipelineFactory class that generates an encryption pipeline for use with CachedSession objects.

If no secret_key is provided, the factory attempts to retrieve one from the SCHOLAR_FLUX_CACHE_SECRET_KEY environment variable via the package config.

If that is also unavailable, a random Fernet key is generated and used to encrypt the session cache.

Parameters:
  • secret_key (Optional[str | bytes | SecretStr]) – The key to use for encrypting and decrypting the data that flows through the pipeline.

  • salt (Optional[str]) – An optional salt applied when signing data on write, adding a further layer of security.
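The secret-key resolution order described for __init__ (explicit argument, then SCHOLAR_FLUX_CACHE_SECRET_KEY, then a new random key) might look like the following sketch. It reads the environment variable directly rather than through the package config, and it does not handle pydantic's SecretStr; both are simplifications.

```python
import base64
import os
from typing import Optional, Union

ENV_VAR = "SCHOLAR_FLUX_CACHE_SECRET_KEY"
ENCODING = "utf-8"


def resolve_secret_key(secret_key: Union[str, bytes, None] = None) -> bytes:
    """Explicit key first, then the environment variable, then a freshly generated key."""
    key: Optional[Union[str, bytes]] = secret_key or os.environ.get(ENV_VAR)
    if not key:
        # Same format as cryptography's Fernet.generate_key(): 32 random bytes, URL-safe base64
        return base64.urlsafe_b64encode(os.urandom(32))
    return key.encode(ENCODING) if isinstance(key, str) else key
```

Because a generated key exists only in memory in this sketch, data encrypted with it cannot be read back by a later process that generates a different key.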

create_pipeline() → SerializerPipeline

Creates a serializer pipeline that uses pickle for serialization, itsdangerous for signing, and cryptography for encryption.

This pipeline encrypts the response data after generating a signature when serialized. On load, the data is then decrypted and the signature that was previously generated with the secret key is verified prior to deserialization of the response.

Returns:

A new serializer pipeline that enforces signature validation and encryption.

Return type:

SerializerPipeline
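The ordering that create_pipeline describes (serialize, sign, encrypt on write; decrypt, verify, deserialize on read) can be demonstrated with standard-library stages. In this sketch the Stage tuple, SECRET_KEY, and the HMAC signer are hypothetical stand-ins for requests-cache stages and itsdangerous, and base64 only marks the position of the Fernet encryption stage; it provides no confidentiality.

```python
import base64
import hashlib
import hmac
import pickle
from typing import Any, Callable, List, NamedTuple


class Stage(NamedTuple):
    """Hypothetical (dumps, loads) pair, loosely modeled on a requests-cache Stage."""
    dumps: Callable[[Any], Any]
    loads: Callable[[Any], Any]


SECRET_KEY = b"example-secret"  # hypothetical key for illustration only
SIGNATURE_SIZE = hashlib.sha256().digest_size


def sign(data: bytes) -> bytes:
    return hmac.new(SECRET_KEY, data, hashlib.sha256).digest() + data


def verify(blob: bytes) -> bytes:
    signature, data = blob[:SIGNATURE_SIZE], blob[SIGNATURE_SIZE:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle modified data")
    return data


STAGES: List[Stage] = [
    Stage(pickle.dumps, pickle.loads),  # 1. serialize the response object
    Stage(sign, verify),  # 2. sign the serialized bytes
    # 3. placeholder for Fernet encryption: base64 is an encoding, NOT encryption
    Stage(base64.urlsafe_b64encode, base64.urlsafe_b64decode),
]


def pipeline_dumps(obj: Any) -> bytes:
    for stage in STAGES:  # on write, apply stages first to last
        obj = stage.dumps(obj)
    return obj


def pipeline_loads(blob: bytes) -> Any:
    for stage in reversed(STAGES):  # on read, undo stages last to first
        blob = stage.loads(blob)
    return blob


stored = pipeline_dumps({"url": "https://example.com", "status_code": 200})
print(pipeline_loads(stored))
```

The key design point is that verify runs before pickle.loads on the read path, so tampered bytes are rejected before they can be unpickled.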

encryption_stage() → Stage

Creates a new serializer stage that performs Fernet encryption and decryption with the factory’s secret key.

Returns:

A new serializer stage that encrypts data when dumped and decrypts data when loaded.

Return type:

Stage

property fernet: None

Returns the Fernet instance created from the validated 32-byte URL-safe base64-encoded key.

static generate_secret_key() → bytes

Generates a secret key for Fernet encryption using the cryptography package.

Returns:

A new URL-safe base64-encoded 32-byte key.

Return type:

bytes
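For reference, a Fernet key is 32 random bytes encoded as URL-safe base64, which is also what cryptography's Fernet.generate_key() returns. The sketch below generates a key in that format with the standard library and checks the format; is_valid_fernet_key is a hypothetical helper, not part of scholar_flux.

```python
import base64
import os


def generate_secret_key() -> bytes:
    """32 random bytes encoded as URL-safe base64 (44 ASCII bytes)."""
    return base64.urlsafe_b64encode(os.urandom(32))


def is_valid_fernet_key(key: bytes) -> bool:
    """Check the format only: URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except ValueError:  # binascii.Error (bad padding) is a subclass of ValueError
        return False


key = generate_secret_key()
print(len(key), is_valid_fernet_key(key))  # 44 True
```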

property secret_key: bytes

Returns the secret key used for encrypting and decrypting the cache serialization pipeline.

signer_stage() → Stage

Creates a stage that uses itsdangerous to add a signature to responses during serialization.

This signature is generated on write using the provided secret key and is validated on deserialization, verifying that the response data has not been tampered with when the response is reloaded.

Returns:

A new stage that uses the secret key and salt for signature creation and validation.

Return type:

Stage
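To show how a salt separates signatures, the hypothetical make_signer below signs data with HMAC-SHA256 using a key derived from the secret key and salt. It is loosely modeled on what itsdangerous does but is not its API, and its key derivation differs from the library's.

```python
import hashlib
import hmac
from typing import Callable, Tuple

SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes


def make_signer(secret_key: bytes, salt: str = "") -> Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]:
    """Return (dumps, loads) callables that add and verify an HMAC signature."""
    # Derive a salt-specific key so a signature made under one salt fails under another
    derived_key = hmac.new(secret_key, salt.encode("utf-8"), hashlib.sha256).digest()

    def dumps(data: bytes) -> bytes:
        return hmac.new(derived_key, data, hashlib.sha256).digest() + data

    def loads(blob: bytes) -> bytes:
        signature, data = blob[:SIGNATURE_SIZE], blob[SIGNATURE_SIZE:]
        expected = hmac.new(derived_key, data, hashlib.sha256).digest()
        if not hmac.compare_digest(signature, expected):
            raise ValueError("bad signature: the cached data was tampered with")
        return data

    return dumps, loads


dumps, loads = make_signer(b"example-secret", salt="requests-cache")
signed = dumps(b"serialized response")
print(loads(signed))  # b'serialized response'
```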

class scholar_flux.sessions.SessionCacheBackend(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Known session cache backends compatible with requests-cache.

DYNAMODB = 'dynamodb'
FILESYSTEM = 'filesystem'
GRIDFS = 'gridfs'
MEMORY = 'memory'
MONGODB = 'mongodb'
REDIS = 'redis'
SQLITE = 'sqlite'
classmethod get(backend: str | SessionCacheBackend) → SessionCacheBackend | None

Helper method for retrieving a known, valid requests-cache backend. Returns the matching SessionCacheBackend member, or None if the backend is not recognized.
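A re-creation of the documented members with one possible lookup strategy for get. SessionCacheBackendSketch is hypothetical, and the whitespace-stripping, case-insensitive matching is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional, Union


class SessionCacheBackendSketch(str, Enum):
    """Hypothetical copy of the documented backend members."""

    DYNAMODB = "dynamodb"
    FILESYSTEM = "filesystem"
    GRIDFS = "gridfs"
    MEMORY = "memory"
    MONGODB = "mongodb"
    REDIS = "redis"
    SQLITE = "sqlite"

    @classmethod
    def get(cls, backend: Union[str, "SessionCacheBackendSketch"]) -> Optional["SessionCacheBackendSketch"]:
        if isinstance(backend, cls):
            return backend  # already a member
        try:
            return cls(str(backend).strip().lower())  # look up by value
        except ValueError:
            return None  # unknown backend name


print(SessionCacheBackendSketch.get("Redis"))  # SessionCacheBackendSketch.REDIS
```

Because the enum also subclasses str, members compare equal to their plain string values, so code expecting a backend name string keeps working.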

class scholar_flux.sessions.SessionManager(user_agent: str | None = None)

Bases: BaseSessionManager

Manager that creates a simple requests session using the default settings and the provided User-Agent.

Example

>>> from scholar_flux.sessions import SessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests import Session
>>> session_manager = SessionManager(user_agent='scholar_flux_user_agent')
>>> # Creating the session object
>>> session = session_manager.configure_session()
>>> # Which is equivalent to:
>>> session = session_manager()
>>> # This implementation returns a requests.Session object, which is compatible with the SearchAPI:
>>> isinstance(session, Session)
True
>>> api = SearchAPI(query='history of software design', session=session)

__init__(user_agent: str | None = None) → None

Initializes a basic session manager that sets the user agent if provided.

Parameters:

user_agent (Optional[str]) – The User-Agent to be passed as a parameter in the creation of the session object. When no user_agent is provided, the User-Agent header is left to the requests package default (e.g., python-requests/2.32.5).

configure_session() → Session

Configures a basic requests session with the provided user_agent attribute.

Returns:

A regular requests.Session object with the default settings and an optional User-Agent header.

Return type:

requests.Session

classmethod with_session(*, user_agent: str | None = None) → Session

Convenience factory method for creating and configuring a new requests.Session instance.

Note: This method first instantiates the current SessionManager class with the specified User-Agent and then configures the session.

Parameters:

user_agent (Optional[str]) – The User-Agent to be passed as a parameter when creating the session object.

Returns:

A new requests.Session object with the configured User-Agent.

Return type:

requests.Session