scholar_flux.sessions package

Subpackages

scholar_flux.sessions.models package

Submodules

scholar_flux.sessions.encryption module

The scholar_flux.sessions.encryption module implements the EncryptionPipelineFactory. This factory creates a serializer that CachedSession objects accept when storing the requests cache.

The factory combines encryption with a safe serializer for two steps:

Signing the cached requests so that unexpected data changes or tampering invalidate the stored data
Encrypting the request cache after serialization for storage, and decrypting it before deserialization on retrieval

If a key is neither provided nor already available, the EncryptionPipelineFactory creates a new Fernet key for these steps.
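The two steps above fit the pipeline model used by requests-cache serializers: stages are applied in order when writing and undone in reverse order when reading. The following is a minimal, standard-library sketch of that ordering only. `Stage` and `Pipeline` here are hypothetical stand-ins rather than the requests_cache classes, and toy pickle and base64 stages take the place of the signing and Fernet encryption stages.

```python
import base64
import pickle
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    """Hypothetical stand-in for a serializer stage: a pair of inverse functions."""
    dumps: Callable[[Any], Any]
    loads: Callable[[Any], Any]


class Pipeline:
    """Hypothetical stand-in for a serializer pipeline."""

    def __init__(self, stages: list):
        self.stages = stages

    def dumps(self, value: Any) -> Any:
        # On write, each stage transforms the output of the previous stage.
        for stage in self.stages:
            value = stage.dumps(value)
        return value

    def loads(self, value: Any) -> Any:
        # On read, the stages are undone in reverse order.
        for stage in reversed(self.stages):
            value = stage.loads(value)
        return value


# Toy stages: serialize with pickle, then base64-encode the resulting bytes.
# In the real factory, signing and Fernet encryption would occupy these slots.
pipeline = Pipeline([
    Stage(dumps=pickle.dumps, loads=pickle.loads),
    Stage(dumps=base64.urlsafe_b64encode, loads=base64.urlsafe_b64decode),
])

stored = pipeline.dumps({"status": 200, "body": "ok"})
restored = pipeline.loads(stored)
```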

class scholar_flux.sessions.encryption.EncryptionPipelineFactory(secret_key: str | bytes | None = None, salt: str | None = '')[source]

Bases: object

Helper factory that creates pipelines for encrypting and decrypting the session cache with a secret key.

Note that pickle, in common use, carries the potential for vulnerabilities: reading untrusted serialized data can result in arbitrary code execution. This implementation uses a safe serializer that validates the serialized data with a Fernet-generated secret_key before it is read and decrypted. If the cached data was modified by a malicious source, validation fails and reading halts instead of loading the data.

The EncryptionPipelineFactory can also be used for general encryption use cases outside scholar_flux, as follows:

>>> from scholar_flux.sessions import EncryptionPipelineFactory
>>> from requests_cache import CachedSession, CachedResponse
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
>>> encryption_serializer = encryption_pipeline_factory()
>>> cached_session = CachedSession('filesystem', serializer = encryption_serializer)
>>> endpoint = "https://docs.python.org/3/library/typing.html"
>>> response = cached_session.get(endpoint)
>>> cached_response = cached_session.get(endpoint)
>>> assert isinstance(cached_response, CachedResponse)

__init__(secret_key: str | bytes | None = None, salt: str | None = '')[source]

Initializes the EncryptionPipelineFactory class that generates an encryption pipeline for use with CachedSession objects.

If no secret_key is provided, the factory attempts to retrieve one from the SCHOLAR_FLUX_CACHE_SECRET_KEY environment variable via the config.

If neither is available, a random Fernet key is generated and used to encrypt the session cache.
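A minimal sketch of that key-resolution order, assuming a hypothetical `resolve_secret_key` helper that reads the environment variable directly (the actual class reads it through the scholar_flux config). The fallback key uses the same construction as cryptography's `Fernet.generate_key()`: 32 random bytes, URL-safe base64-encoded.

```python
from __future__ import annotations

import base64
import os


def generate_secret_key() -> bytes:
    # Same construction as cryptography's Fernet.generate_key():
    # 32 random bytes, URL-safe base64-encoded (44 bytes of output).
    return base64.urlsafe_b64encode(os.urandom(32))


def resolve_secret_key(secret_key: str | bytes | None = None,
                       env_var: str = "SCHOLAR_FLUX_CACHE_SECRET_KEY") -> bytes:
    """Hypothetical helper: explicit key > environment variable > new random key."""
    if secret_key is None:
        # Treat an unset or empty variable as "no key available".
        secret_key = os.environ.get(env_var) or None
    if secret_key is None:
        return generate_secret_key()
    return secret_key.encode() if isinstance(secret_key, str) else secret_key
```

A randomly generated key only lives as long as the process, so cache entries written with it cannot be decrypted after a restart unless the key is saved.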

Parameters:

secret_key (Optional[str | bytes]) – The key used to encrypt and decrypt the data that flows through the pipeline.
salt (Optional[str]) – An optional salt used to further increase security on write.

create_pipeline() → SerializerPipeline[source]: Creates a serializer pipeline that uses pickle for serialization, itsdangerous for signing, and cryptography for encryption.

encryption_stage() → Stage[source]: Create a stage that uses Fernet encryption.

property fernet: None: Returns a fernet key using the validated 32 byte url-safe base64 key.

static generate_secret_key() → bytes[source]: Generate a secret key for Fernet encryption.

signer_stage() → Stage[source]: Create a stage that uses itsdangerous to add a signature to responses on write, and validate that signature with a secret key on read.
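The sketch below shows the sign-on-write, verify-on-read idea with the standard library's `hmac` module swapped in for itsdangerous, so the tag format differs from what itsdangerous produces. The `sign` and `unsign` names are hypothetical. The key point is ordering: the signature is checked before `pickle.loads` ever runs, so tampered bytes are rejected instead of unpickled.

```python
import hashlib
import hmac
import pickle

SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign(payload: bytes, secret_key: bytes) -> bytes:
    # Prepend an HMAC-SHA256 tag computed over the payload.
    tag = hmac.new(secret_key, payload, hashlib.sha256).digest()
    return tag + payload


def unsign(signed: bytes, secret_key: bytes) -> bytes:
    # Recompute the tag and compare in constant time before returning the payload.
    tag, payload = signed[:SIGNATURE_SIZE], signed[SIGNATURE_SIZE:]
    expected = hmac.new(secret_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("Signature mismatch: cached data may have been modified")
    return payload


key = b"example-secret-key"
signed = sign(pickle.dumps({"status": 200}), key)
# Only unpickle after the signature has been verified.
restored = pickle.loads(unsign(signed, key))
```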

scholar_flux.sessions.session_manager module

The scholar_flux.sessions.session_manager module implements the SessionManager and CachedSessionManager classes, which serve as factories for requests.Session and requests_cache.CachedSession objects, respectively.

Calling the configure_session method of a session manager creates a new session, which is basic or cached depending on which session manager was used.

Classes:

SessionManager: Base class holding the configuration for non-cached sessions
CachedSessionManager: Extensible factory class allowing users to define cached sessions with the selected backend

class scholar_flux.sessions.session_manager.CachedSessionManager

Bases: SessionManager

This session manager is a wrapper around requests-cache that creates requests-cache sessions with defaults that abstract away the complexity of cached session management.

Its defaults are designed to integrate well with the scholar_flux package. Because the requests_cache package is built on the base requests library, the resulting cached sessions can likewise be injected into the scholar_flux SearchAPI to make cached queries.

Examples

>>> from scholar_flux.sessions import CachedSessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests_cache import CachedSession
### creates a sqlite cached session in a package-writable directory
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_user_agent')
>>> cached_session = session_manager() # defaults to a sqlite session in the package directory
### Which is equivalent to:
>>> cached_session = session_manager.configure_session() # defaults to a sqlite session in the package directory
>>> assert isinstance(cached_session, CachedSession)
### Like a basic requests.Session, the cached session can be dependency-injected into the SearchAPI:
>>> SearchAPI(query = 'history of software design', session = cached_session)

Initializing the CachedSessionManager defines the options that are later passed to the self.configure_session method, which returns a session object after parameter validation.

Parameters:

user_agent (str) – Specifies the name to use for the User-Agent parameter that is to be provided in each request header.
cache_name (str) – The name to associate with the current cache. It is used as the file name for filesystem/SQLite storage and as the cache name for storage such as Redis.
cache_directory (Optional[Path | str]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (SQLite, filesystem storage, etc.), defaults to the first writable directory location found via scholar_flux.package_metadata.get_default_writable_directory.
backend (str | requests_cache.BaseCache) – Defines the backend used when creating a requests-cache session. The default is sqlite. Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb. Users can also pass cache storage implementations from requests_cache directly, including RedisCache, MongoCache, SQLiteCache, etc. For more information, see https://requests-cache.readthedocs.io/en/stable/user_guide/backends.html#choosing-a-backend
serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer that prepares cached responses for storage (serialization) and restores them on retrieval (deserialization).
expire_after (Optional[int | float | str | datetime.datetime | datetime.timedelta]) – Sets the expiration time after which successfully cached responses expire.
raise_on_error (bool) – Whether to raise an error on instantiation if an error occurs while creating the session. If raise_on_error = False, the error is logged and a requests.Session is created instead.

property backend: str | BaseCache: Makes the config’s backend storage device for requests-cache accessible from the CachedSessionManager.

property cache_directory: Path | None: Makes the config’s cache directory accessible by the CachedSessionManager.

property cache_name: str: Makes the config’s base file name for the cache accessible by the CachedSessionManager.

property cache_path: str: Makes the config’s cache directory accessible by the CachedSessionManager.

configure_session() → Session | CachedSession[source]

Configures and returns a cached session object with the options provided to the config when creating the CachedSessionManager.

Note

If the cached session cannot be configured due to permission errors or connection errors, the session manager will fall back to creating a requests.Session if the self.raise_on_error attribute is set to False.

Returns:: A cached session object if successful; otherwise, a requests.Session object in the event of an error.
Return type:: requests.Session | requests_cache.CachedSession
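A minimal sketch of the fallback behavior described in the note, assuming hypothetical stand-ins (`PlainSession` and a local `SessionCreationError`) so it runs without requests or requests-cache installed. It catches `OSError`, which covers `PermissionError` and the built-in `ConnectionError`; which exceptions the actual CachedSessionManager catches is not specified here.

```python
import logging

logger = logging.getLogger(__name__)


class SessionCreationError(Exception):
    """Hypothetical local stand-in for scholar_flux's exception of the same name."""


class PlainSession:
    """Hypothetical stand-in for requests.Session."""


def configure_session(build_cached_session, raise_on_error: bool = False):
    """Try to build a cached session; optionally fall back to a plain session."""
    try:
        return build_cached_session()
    except OSError as exc:
        # PermissionError and the built-in ConnectionError are both OSError subclasses.
        if raise_on_error:
            raise SessionCreationError("Could not create cached session") from exc
        logger.warning("Falling back to an uncached session: %s", exc)
        return PlainSession()
```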

property expire_after: int | float | str | datetime | timedelta | None: Makes the config’s value used for response cache expiration accessible from the CachedSessionManager.

classmethod get_cache_directory(cache_directory, backend) → Path | None

Determines what directory will be used for session cache storage, favoring an explicitly assigned cache_directory if provided.

Note that this method only attempts to find a cache directory when one is needed, such as when “filesystem” or “sqlite” is selected as the backend by string.

Resolution order (highest to lowest priority):

Explicit cache_directory argument
config_settings.config[‘CACHE_DIRECTORY’] (can be set via environment variable)
Package or home directory defaults (depending on writability)

If the resolved cache_directory is a string, it is coerced into a Path before being returned. Returns None if the backend does not require a cache directory (e.g., dynamodb, mongodb, etc.).
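The resolution order above can be sketched as follows. This is an illustration under stated assumptions: `resolve_cache_directory` is a hypothetical name, a plain `config` dict stands in for config_settings.config, and the home-directory default path is a placeholder rather than the package's real default.

```python
from __future__ import annotations

import os
from pathlib import Path

FILE_BACKENDS = {"sqlite", "filesystem"}


def resolve_cache_directory(cache_directory: Path | str | None = None,
                            backend: object = "sqlite",
                            config: dict | None = None,
                            defaults: tuple | None = None) -> Path | None:
    """Hypothetical sketch of the documented resolution order."""
    # Only string backends that store files locally need a directory;
    # redis, mongodb, dynamodb, or BaseCache instances return None.
    if not isinstance(backend, str) or backend not in FILE_BACKENDS:
        return None
    # 1. An explicit argument wins (strings are coerced into a Path).
    if cache_directory:
        return Path(cache_directory)
    # 2. Then a configured value such as config['CACHE_DIRECTORY'].
    configured = (config or {}).get("CACHE_DIRECTORY")
    if configured:
        return Path(configured)
    # 3. Finally, the first writable default (placeholder location).
    if defaults is None:
        defaults = (Path.home() / ".hypothetical_scholar_flux_cache",)
    for candidate in defaults:
        parent = candidate if candidate.exists() else candidate.parent
        if os.access(parent, os.W_OK):
            return candidate
    return None
```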

Parameters:

cache_directory (Optional[Path | str]) – Explicit directory to use, if provided.
backend (Optional[str | requests_cache.BaseCache]) – Backend type, used to determine if a directory is needed.

Returns:

The resolved cache directory as a Path or None if not applicable

Return type:

Optional[Path]

property serializer: str | SerializerPipeline | Stage | None: Makes the serializer from the config accessible from the CachedSessionManager.

class scholar_flux.sessions.session_manager.SessionManager(user_agent: str | None = None)[source]

Bases: BaseSessionManager

Manager that creates a simple requests session using the default settings and the provided User-Agent.

Parameters:: user_agent (Optional[str]) – The User-Agent to be passed as a parameter in the creation of the session object.

Example

>>> from scholar_flux.sessions import SessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests import Session
>>> session_manager = SessionManager(user_agent='scholar_flux_user_agent')
### Creating the session object
>>> session = session_manager.configure_session()
### Which is also equivalent to:
>>> session = session_manager()
### This implementation returns a requests.Session object, which is compatible with the SearchAPI:
>>> assert isinstance(session, Session)
>>> api = SearchAPI(query='history of software design', session = session)

__init__(user_agent: str | None = None) → None[source]: Initializes a basic session manager that sets the user agent if provided.

configure_session() → Session[source]

Configures a basic requests session with the provided user_agent attribute.

Returns:: A regular requests.Session object with default settings and an optional User-Agent header.
Return type:: requests.Session

Module contents

The scholar_flux.sessions module contains helper classes to set up HTTP sessions, both cached and uncached, with relatively straightforward configurations and a unified interface. The SessionManager and CachedSessionManager are designed as factory classes that return a constructed session object with the parameters provided.

Classes:

SessionManager:
Creates a standard requests.Session that simply takes a user-agent parameter.
CachedSessionManager:
Creates a requests_cache.CachedSession with configurable options. This implementation uses pydantic to validate the parameters used to create the requests_cache.CachedSession object.

Basic Usage:

>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import SessionManager, CachedSessionManager, EncryptionPipelineFactory
>>> from requests import Response
>>> from requests_cache import CachedResponse
>>> session_manager = SessionManager(user_agent='scholar_flux_session')
>>> requests_session = session_manager.configure_session() # or session_manager()
>>> api = SearchAPI(query = 'functional programming', session = requests_session)

Cached Sessions:

>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import CachedSessionManager
### And for cached sessions, the following defaults to sqlite in the package_cache subfolder
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session')
### Or initialize the session manager with a custom requests_cache backend (RedisCache requires a reachable Redis server)
>>> from requests_cache import RedisCache, CachedResponse
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session', backend = RedisCache())
>>> cached_requests_session = session_manager() # or session_manager.configure_session()
>>> api_with_cache = SearchAPI(query = 'functional programming', session = cached_requests_session)
>>> response = api_with_cache.search(page=1) # will be cached on subsequent runs
>>> isinstance(response, Response)
# OUTPUT: True
>>> cached_response = api_with_cache.search(page=1) # is now cached
>>> isinstance(cached_response, CachedResponse)
# OUTPUT: True

Encrypted Cached Sessions

>>> from scholar_flux.api import SearchAPI
>>> from scholar_flux.sessions import CachedSessionManager, EncryptionPipelineFactory
### For encrypting requests we can create a serializer that encrypts data before it's stored:
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
### If a Fernet key is neither provided nor saved in a .env file that is read on import,
### a random Fernet key is generated by default.
>>> fernet = encryption_pipeline_factory.fernet # (make sure you save this)
>>> print(fernet)
# OUTPUT: <cryptography.fernet.Fernet at 0x7efd9de62450>
### The encryption has to be specified when creating a cached session:
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_session',
...                                        backend='filesystem',
...                                        serializer=encryption_pipeline_factory())
### Now assignment to a SearchAPI occurs similarly as before
>>> api_with_encrypted_cache = SearchAPI(query = 'object oriented programming', session = session_manager())

Raises:

SessionCreationError
SessionConfigurationError
SessionInitializationError
SessionCacheDirectoryError

Cached Session Support:

Cached sessions support all built-in subclasses originating from the BaseCache base class in requests-cache. This includes the following built-ins:

DynamoDB
Filesystem
GridFS
In-memory
MongoDB
Redis
SQLite

Custom implementations of BaseCache are also supported.

See also

https://requests-cache.readthedocs.io/

class scholar_flux.sessions.BaseSessionManager(*args, **kwargs)[source]

Bases: ABC

An abstract base class used as a factory to create session objects.

This base class can be extended to validate session inputs and to abstract away the complexity of session creation.

__init__(*args, **kwargs) → None[source]: Initializes BaseSessionManager subclasses given the provided arguments.

abstract configure_session(*args, **kwargs) → Session | CachedSession[source]

Configure the session.

Should be overridden by subclasses.
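A minimal sketch of the factory pattern this base class describes, using only the standard library. `BaseManager`, `UserAgentManager`, and `HypotheticalSession` are illustrative stand-ins for BaseSessionManager, SessionManager, and requests.Session. The `__call__` delegation mirrors the documented equivalence between `session_manager()` and `session_manager.configure_session()`, but is an assumption about how that equivalence is implemented.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class HypotheticalSession:
    """Stand-in for requests.Session so the sketch runs without dependencies."""

    def __init__(self):
        self.headers: dict[str, str] = {}


class BaseManager(ABC):
    """Sketch of a session factory; names are hypothetical."""

    @abstractmethod
    def configure_session(self) -> HypotheticalSession:
        """Subclasses decide how the session is built."""

    def __call__(self) -> HypotheticalSession:
        # Calling the manager is shorthand for configure_session().
        return self.configure_session()


class UserAgentManager(BaseManager):
    def __init__(self, user_agent: str | None = None):
        self.user_agent = user_agent

    def configure_session(self) -> HypotheticalSession:
        session = HypotheticalSession()
        # Only set the header when a user agent was provided.
        if self.user_agent:
            session.headers["User-Agent"] = self.user_agent
        return session
```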

classmethod get_cache_directory(*args, **kwargs) → Path | None[source]

Defines the cache directory defaults that subclasses use when creating sessions.

Can optionally be overridden when creating cached session managers.

Bases: BaseModel

A helper model used to validate the inputs provided when creating a CachedSessionManager.

This config is used to validate the inputs to the session manager prior to attempting its creation.

backend: Literal['dynamodb', 'filesystem', 'gridfs', 'memory', 'mongodb', 'redis', 'sqlite'] | BaseCache

cache_directory: Path | None

cache_name: str

property cache_path: str

Helper method for retrieving the path that the cache will be written to or named, depending on the backend.

Assumes that the cache_name provided to the config is not None.
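A hedged sketch of how a cache path could depend on the backend, assuming that file-based backends (sqlite, filesystem) join the cache directory and cache name while other backends use the name alone. The `cache_path` function here is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

FILE_BACKENDS = {"sqlite", "filesystem"}


def cache_path(cache_name: str, cache_directory: Path | None, backend: str) -> str:
    """Hypothetical: a file path for file-based backends, otherwise just the cache name."""
    if backend in FILE_BACKENDS and cache_directory is not None:
        return str(Path(cache_directory) / cache_name)
    # Backends such as redis address the cache by name only.
    return cache_name
```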

expire_after: int | float | str | datetime | timedelta | None

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

serializer: str | SerializerPipeline | Stage | None

user_agent: str | None

classmethod validate_backend_dependency(v)[source]

Validates the choice of backend and raises an error if its dependency is missing.

If the backend has unmet dependencies, this validator will trigger a ValidationError.
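A dependency check of this kind can be sketched with `importlib.util.find_spec`, which reports whether a module is importable without importing it. The backend-to-package mapping reflects the client libraries requests-cache uses for these backends (redis, pymongo, boto3); the function name is hypothetical. Inside a pydantic validator, a raised `ValueError` is reported as a `ValidationError`.

```python
import importlib.util

# Optional third-party packages that requests-cache backends rely on.
# memory, sqlite, and filesystem need nothing beyond the standard library.
BACKEND_DEPENDENCIES = {
    "redis": "redis",
    "mongodb": "pymongo",
    "gridfs": "pymongo",
    "dynamodb": "boto3",
}


def check_backend_dependency(backend: str) -> str:
    """Hypothetical validator: raise if the backend's client library is not importable."""
    module_name = BACKEND_DEPENDENCIES.get(backend)
    if module_name is not None and importlib.util.find_spec(module_name) is None:
        raise ValueError(f"The {backend!r} backend requires the {module_name!r} package")
    return backend
```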

validate_backend_filepath() → Self[source]: Helper method that validates whether file storage is required for the selected backend.

classmethod validate_cache_directory(v) → Path | None[source]: Validates the cache_directory field to flag simple cases where the value is an empty string.

classmethod validate_cache_name(v) → str[source]: Validates the cache_name field to flag simple cases where the value is an empty string.

classmethod validate_expire_after(v)[source]: Validates the expire_after field, flagging numeric values below 0 as invalid.
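The validators above can be approximated with a standard-library dataclass and `__post_init__`, as in this sketch. `SessionConfigSketch` is a hypothetical stand-in for the pydantic config model; unlike that model, it only accepts backend names as strings, not BaseCache instances.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

VALID_BACKENDS = {"dynamodb", "filesystem", "gridfs", "memory", "mongodb", "redis", "sqlite"}


@dataclass
class SessionConfigSketch:
    """Hypothetical stdlib stand-in for the pydantic config model."""
    cache_name: str
    backend: str = "sqlite"
    expire_after: int | float | str | datetime | timedelta | None = None

    def __post_init__(self):
        # Mirrors validate_cache_name: reject empty strings.
        if not self.cache_name:
            raise ValueError("cache_name must be a non-empty string")
        # Mirrors the Literal constraint on backend names.
        if self.backend not in VALID_BACKENDS:
            raise ValueError(f"Unsupported backend: {self.backend!r}")
        # Mirrors validate_expire_after: negative numbers are invalid.
        if isinstance(self.expire_after, (int, float)) and self.expire_after < 0:
            raise ValueError("expire_after must not be negative")
```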

class scholar_flux.sessions.CachedSessionManager

Bases: SessionManager

This session manager is a wrapper around requests-cache that creates requests-cache sessions with defaults that abstract away the complexity of cached session management.

Its defaults are designed to integrate well with the scholar_flux package. Because the requests_cache package is built on the base requests library, the resulting cached sessions can likewise be injected into the scholar_flux SearchAPI to make cached queries.

Examples

>>> from scholar_flux.sessions import CachedSessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests_cache import CachedSession
### creates a sqlite cached session in a package-writable directory
>>> session_manager = CachedSessionManager(user_agent='scholar_flux_user_agent')
>>> cached_session = session_manager() # defaults to a sqlite session in the package directory
### Which is equivalent to:
>>> cached_session = session_manager.configure_session() # defaults to a sqlite session in the package directory
>>> assert isinstance(cached_session, CachedSession)
### Like a basic requests.Session, the cached session can be dependency-injected into the SearchAPI:
>>> SearchAPI(query = 'history of software design', session = cached_session)

Initializing the CachedSessionManager defines the options that are later passed to the self.configure_session method, which returns a session object after parameter validation.

Parameters:

user_agent (str) – Specifies the name to use for the User-Agent parameter that is to be provided in each request header.
cache_name (str) – The name to associate with the current cache. It is used as the file name for filesystem/SQLite storage and as the cache name for storage such as Redis.
cache_directory (Optional[Path | str]) – Defines the directory where the cache file is stored. If not provided, the cache_directory, when needed (SQLite, filesystem storage, etc.), defaults to the first writable directory location found via scholar_flux.package_metadata.get_default_writable_directory.
backend (str | requests_cache.BaseCache) – Defines the backend used when creating a requests-cache session. The default is sqlite. Other backends include memory, filesystem, mongodb, redis, gridfs, and dynamodb. Users can also pass cache storage implementations from requests_cache directly, including RedisCache, MongoCache, SQLiteCache, etc. For more information, see https://requests-cache.readthedocs.io/en/stable/user_guide/backends.html#choosing-a-backend
serializer (Optional[str | requests_cache.serializers.pipeline.SerializerPipeline | requests_cache.serializers.pipeline.Stage]) – An optional serializer that prepares cached responses for storage (serialization) and restores them on retrieval (deserialization).
expire_after (Optional[int | float | str | datetime.datetime | datetime.timedelta]) – Sets the expiration time after which successfully cached responses expire.
raise_on_error (bool) – Whether to raise an error on instantiation if an error occurs while creating the session. If raise_on_error = False, the error is logged and a requests.Session is created instead.

property backend: str | BaseCache: Makes the config’s backend storage device for requests-cache accessible from the CachedSessionManager.

property cache_directory: Path | None: Makes the config’s cache directory accessible by the CachedSessionManager.

property cache_name: str: Makes the config’s base file name for the cache accessible by the CachedSessionManager.

property cache_path: str: Makes the config’s cache directory accessible by the CachedSessionManager.

configure_session() → Session | CachedSession[source]

Configures and returns a cached session object with the options provided to the config when creating the CachedSessionManager.

Note

If the cached session cannot be configured due to permission errors or connection errors, the session manager will fall back to creating a requests.Session if the self.raise_on_error attribute is set to False.

Returns:: A cached session object if successful; otherwise, a requests.Session object in the event of an error.
Return type:: requests.Session | requests_cache.CachedSession

property expire_after: int | float | str | datetime | timedelta | None: Makes the config’s value used for response cache expiration accessible from the CachedSessionManager.

classmethod get_cache_directory(cache_directory, backend) → Path | None

Determines what directory will be used for session cache storage, favoring an explicitly assigned cache_directory if provided.

Note that this method only attempts to find a cache directory when one is needed, such as when “filesystem” or “sqlite” is selected as the backend by string.

Resolution order (highest to lowest priority):

Explicit cache_directory argument
config_settings.config[‘CACHE_DIRECTORY’] (can be set via environment variable)
Package or home directory defaults (depending on writability)

If the resolved cache_directory is a string, it is coerced into a Path before being returned. Returns None if the backend does not require a cache directory (e.g., dynamodb, mongodb, etc.).

Parameters:

cache_directory (Optional[Path | str]) – Explicit directory to use, if provided.
backend (Optional[str | requests_cache.BaseCache]) – Backend type, used to determine if a directory is needed.

Returns:

The resolved cache directory as a Path or None if not applicable

Return type:

Optional[Path]

property serializer: str | SerializerPipeline | Stage | None: Makes the serializer from the config accessible from the CachedSessionManager.

class scholar_flux.sessions.EncryptionPipelineFactory(secret_key: str | bytes | None = None, salt: str | None = '')[source]

Bases: object

Helper factory that creates pipelines for encrypting and decrypting the session cache with a secret key.

Note that pickle, in common use, carries the potential for vulnerabilities: reading untrusted serialized data can result in arbitrary code execution. This implementation uses a safe serializer that validates the serialized data with a Fernet-generated secret_key before it is read and decrypted. If the cached data was modified by a malicious source, validation fails and reading halts instead of loading the data.

The EncryptionPipelineFactory can also be used for general encryption use cases outside scholar_flux, as follows:

>>> from scholar_flux.sessions import EncryptionPipelineFactory
>>> from requests_cache import CachedSession, CachedResponse
>>> encryption_pipeline_factory = EncryptionPipelineFactory()
>>> encryption_serializer = encryption_pipeline_factory()
>>> cached_session = CachedSession('filesystem', serializer = encryption_serializer)
>>> endpoint = "https://docs.python.org/3/library/typing.html"
>>> response = cached_session.get(endpoint)
>>> cached_response = cached_session.get(endpoint)
>>> assert isinstance(cached_response, CachedResponse)

__init__(secret_key: str | bytes | None = None, salt: str | None = '')[source]

Initializes the EncryptionPipelineFactory class that generates an encryption pipeline for use with CachedSession objects.

If no secret_key is provided, the factory attempts to retrieve one from the SCHOLAR_FLUX_CACHE_SECRET_KEY environment variable via the config.

If neither is available, a random Fernet key is generated and used to encrypt the session cache.

Parameters:

secret_key (Optional[str | bytes]) – The key used to encrypt and decrypt the data that flows through the pipeline.
salt (Optional[str]) – An optional salt used to further increase security on write.

create_pipeline() → SerializerPipeline[source]: Creates a serializer pipeline that uses pickle for serialization, itsdangerous for signing, and cryptography for encryption.

encryption_stage() → Stage[source]: Create a stage that uses Fernet encryption.

property fernet: None: Returns a fernet key using the validated 32 byte url-safe base64 key.

static generate_secret_key() → bytes[source]: Generate a secret key for Fernet encryption.

signer_stage() → Stage[source]: Create a stage that uses itsdangerous to add a signature to responses on write, and validate that signature with a secret key on read.

class scholar_flux.sessions.SessionManager(user_agent: str | None = None)[source]

Bases: BaseSessionManager

Manager that creates a simple requests session using the default settings and the provided User-Agent.

Parameters:: user_agent (Optional[str]) – The User-Agent to be passed as a parameter in the creation of the session object.

Example

>>> from scholar_flux.sessions import SessionManager
>>> from scholar_flux.api import SearchAPI
>>> from requests import Session
>>> session_manager = SessionManager(user_agent='scholar_flux_user_agent')
### Creating the session object
>>> session = session_manager.configure_session()
### Which is also equivalent to:
>>> session = session_manager()
### This implementation returns a requests.Session object, which is compatible with the SearchAPI:
>>> assert isinstance(session, Session)
>>> api = SearchAPI(query='history of software design', session = session)

__init__(user_agent: str | None = None) → None[source]: Initializes a basic session manager that sets the user agent if provided.

configure_session() → Session[source]

Configures a basic requests session with the provided user_agent attribute.

Returns:: A regular requests.Session object with default settings and an optional User-Agent header.
Return type:: requests.Session