scholar_flux.api.workflows package

Submodules

scholar_flux.api.workflows.models module

Implements the base classes that scholar_flux workflows build on for the customizable, multi-step retrieval and processing of API responses.

Classes:

  • BaseStepContext: Base class for step contexts

  • BaseWorkflowStep: Base class for workflow steps

  • BaseWorkflowResult: Base class for returning the results from a Workflow

  • BaseWorkflow: Base class for defining and fully executing a workflow

class scholar_flux.api.workflows.models.BaseStepContext[source]

Bases: BaseModel

Base class for step contexts.

Passed between workflow steps to communicate the context and history of the current workflow before and after the execution of each step.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.models.BaseWorkflow[source]

Bases: BaseModel, ABC

Base class for defining and fully executing a workflow.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.models.BaseWorkflowResult[source]

Bases: BaseModel

Base class for returning the results from a Workflow.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.models.BaseWorkflowStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>)[source]

Bases: BaseModel

Base class for workflow steps.

Used to define the behavior and actions of each step in a workflow.

Parameters:

additional_kwargs (Dict[str, Any]) – A dictionary of optional keyword parameters used to modify the functionality of future WorkflowStep subclass instances.

additional_kwargs: Dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

post_transform(ctx: Any, *args: Any, **kwargs: Any) Any[source]

Defines an optional transformation applied to the results retrieved after the workflow step executes, used to modify its output.

Parameters:
  • ctx (Any) – Defines the inputs that are used by the BaseWorkflowStep after execution to modify its output.

  • *args – Optional positional arguments to pass to change its output behavior

  • **kwargs – Optional keyword arguments to pass to change its output behavior

Returns:

A modified or copied version of the output to be returned or prepared for the next step

Return type:

Any

pre_transform(ctx: Any, *args: Any, **kwargs: Any) Self[source]

Defines an optional transformation applied to the BaseWorkflowStep before the step executes, used to modify its behavior.

Parameters:
  • ctx (Any) – Defines the inputs that are used by the BaseWorkflowStep to modify its function before execution

  • *args – Optional positional arguments to pass to change runtime behavior

  • **kwargs – Optional keyword arguments to pass to change runtime behavior

Returns:

A modified or copied version of the original BaseWorkflowStep

Return type:

BaseWorkflowStep

with_context(*args: Any, **kwargs: Any) Generator[Self, None, None][source]

Helper method to be overridden by subclasses to customize the behavior of the workflow step.

Subclasses that override with_context should ideally implement it as a context manager so that the override remains fully compatible with the current method.

Yields:

Self – The current workflow step within a context
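The hook contract described above (a copy-returning pre_transform, a pass-through post_transform, and with_context as a context manager) can be sketched with the standard library alone. This sketch does not import scholar_flux: SketchStep is a hypothetical stand-in built on a frozen dataclass instead of a pydantic model, and merging extra keyword arguments into additional_kwargs is an assumption made for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Iterator


@dataclass(frozen=True)
class SketchStep:
    """Hypothetical stand-in for a BaseWorkflowStep subclass, not the scholar_flux class."""

    additional_kwargs: Dict[str, Any] = field(default_factory=dict)

    def pre_transform(self, ctx: Any, *args: Any, **kwargs: Any) -> "SketchStep":
        # Return a modified copy rather than mutating the original step.
        # Assumption: extra keyword arguments are merged into additional_kwargs.
        return replace(self, additional_kwargs={**self.additional_kwargs, **kwargs})

    def post_transform(self, ctx: Any, *args: Any, **kwargs: Any) -> Any:
        # Pass the output through unchanged; a subclass could reshape it here.
        return ctx

    @contextmanager
    def with_context(self, *args: Any, **kwargs: Any) -> Iterator["SketchStep"]:
        # An override could temporarily change external state before the yield
        # and restore it afterwards; this sketch simply yields the step itself.
        yield self


step = SketchStep()
configured = step.pre_transform(None, retries=2)  # the original `step` is untouched
with configured.with_context() as active:
    output = active.post_transform({"records": [1, 2, 3]})
```

Returning a copy from pre_transform keeps the configured step reusable across runs, which mirrors the "modified or copied version" wording in the docstrings.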

scholar_flux.api.workflows.pubmed_workflow module

The scholar_flux.api.workflows.pubmed_workflow module defines the core steps for retrieving records from the PubMed API.

These two steps are combined into a single workflow so that PubMed's two-step article/abstract retrieval process runs automatically as one operation.

Classes:
PubMedSearchStep:

The first of the two steps in the article/metadata retrieval process: retrieves the IDs of records matching a query

PubMedFetchStep:

The second of two steps in the article/metadata response retrieval process that resolves IDs into their corresponding article data and metadata.

Note that this workflow is registered as a default in the workflow_defaults.py module and is retrieved automatically when a new SearchCoordinator is created with provider_name='pubmed'. If workflows are enabled in the SearchCoordinator, the SearchCoordinator.search() method then retrieves records and metadata automatically, without either step needing to be executed directly.
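The two requests that these steps wrap can be illustrated against the public NCBI E-utilities endpoints (esearch.fcgi and efetch.fcgi). This is a minimal sketch that only builds URLs and sends nothing; the helper names, the page-to-retstart arithmetic, and the placeholder IDs are assumptions, and scholar_flux's own request construction may differ.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(query: str, page: int, records_per_page: int = 20) -> str:
    # Step 1 (eSearch): ask for the IDs of records matching the query.
    # Assumption: pages are 1-based and map onto retstart offsets.
    params = {
        "db": "pubmed",
        "term": query,
        "retstart": (page - 1) * records_per_page,
        "retmax": records_per_page,
    }
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(ids: list) -> str:
    # Step 2 (eFetch): resolve the IDs from step 1 into full records.
    params = {"db": "pubmed", "id": ",".join(ids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


search = esearch_url("Machine Learning in Hospitals", page=1)
# In a real run, the IDs would be parsed from the eSearch response.
fetch = efetch_url(["12345", "67890"])  # placeholder IDs, not real results
```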

class scholar_flux.api.workflows.pubmed_workflow.PubMedFetchStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = 'pubmedefetch', search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = 'Fetches each record/article corresponding to a PubMed ID from the PubMedSearchStep.', step_number: int | None = 1)[source]

Bases: WorkflowStep

Next and final step of the PubMed workflow; uses the eFetch API to resolve article/abstract IDs.

These IDs are retrieved from the metadata of the previous step and passed to eFetch to retrieve their associated articles and/or abstracts.

Parameters:
  • provider_name (Optional[str]) – Defines the PubMed eFetch API as the location where the next/final request is sent.

  • step_number – Metadata indicating the intended position in the workflow sequence. This is for documentation purposes only; the actual execution order is determined by the step’s position in the workflow’s steps list.

  • description – Metadata indicating the purpose of the current workflow step. This is for documentation purposes only.

description: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

pre_transform(ctx: StepContext | None = None, provider_name: str | None = None, search_parameters: dict | None = None, config_parameters: dict | None = None) PubMedFetchStep[source]

Overrides the pre_transform of WorkflowStep to use the IDs retrieved in the previous step as input parameters for the PubMed eFetch API request.

Parameters:
  • ctx (Optional[StepContext]) – Defines the inputs that are used by the current PubMedFetchStep to modify its function before execution.

  • provider_name (Optional[str]) – Provided for API compatibility. The step uses pubmedefetch by default.

  • search_parameters – Defines optional keyword arguments to pass to SearchCoordinator._search()

  • config_parameters – Defines optional keyword arguments that modify the step’s SearchAPIConfig

Returns:

A modified or copied version of the current pubmed workflow step

Return type:

PubMedFetchStep

provider_name: str | None
step_number: int | None
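A minimal sketch of the ID hand-off that PubMedFetchStep.pre_transform performs. FetchStepSketch is a hypothetical stand-in: it takes the previous step's metadata dictionary directly instead of a StepContext, and the "IdList" key is an assumption about where the eSearch IDs are stored.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FetchStepSketch:
    """Hypothetical stand-in for PubMedFetchStep, not the scholar_flux class."""

    provider_name: Optional[str] = "pubmedefetch"
    search_parameters: Dict[str, Any] = field(default_factory=dict)

    def pre_transform(self, previous_metadata: Optional[Dict[str, Any]] = None) -> "FetchStepSketch":
        # Assumption: the eSearch metadata exposes the matched IDs under "IdList".
        ids: List[str] = list((previous_metadata or {}).get("IdList", []))
        if not ids:
            return self  # Nothing to fetch; keep the step as-is.
        # Return a copy whose search parameters carry the IDs for eFetch.
        params = {**self.search_parameters, "id": ",".join(ids)}
        return replace(self, search_parameters=params)


fetch_step = FetchStepSketch()
ready = fetch_step.pre_transform({"IdList": ["12345", "67890"], "Count": "2"})
```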
class scholar_flux.api.workflows.pubmed_workflow.PubMedSearchStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = 'pubmed', search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = 'Retrieves IDs of records matching a particular query from the PubMed database.', step_number: int | None = 0)[source]

Bases: WorkflowStep

Initial step of the PubMed workflow that retrieves the IDs of articles/abstracts matching the query.

This step is equivalent to retrieving a single page from the PubMed API without a workflow. Its default search/config parameter settings can be overridden to customize how the step is executed.

After the IDs of records matching the current query and page are retrieved, the workflow passes them as context to the following PubMedFetchStep, which resolves each ID into its associated article and/or abstract.

provider_name

Defines the PubMed eSearch API as the location where the initial request is sent.

Type:

Optional[str]

step_number

Metadata indicating the intended position in the workflow sequence. This is for documentation purposes only; the actual execution order is determined by the step’s position in the workflow’s steps list.

Type:

Optional[int]

description

Metadata indicating the purpose of the current workflow step. This is for documentation purposes only.

Type:

Optional[str]

description: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

provider_name: str | None
step_number: int | None
class scholar_flux.api.workflows.pubmed_workflow.PubMedSearchWorkflow(*, steps: list[~scholar_flux.api.workflows.search_workflow.WorkflowStep] = <factory>, stop_on_error: bool = True)[source]

Bases: SearchWorkflow

SearchWorkflow implementation for PubMed’s two-step article retrieval process.

PubMed’s API requires a two-step retrieval process:

  1. eSearch (PubMedSearchStep): Searches for articles matching the query and returns a list of article IDs along with metadata about the search (query info, pagination, result counts, etc.)

  2. eFetch (PubMedFetchStep): Takes the article IDs from step 1 and retrieves the full article data including abstracts, authors, and other detailed information.

This workflow coordinates both steps automatically and ensures that metadata from the initial eSearch is preserved in the final result, providing consumers with both the full article data and the search context.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

steps: list[WorkflowStep]

scholar_flux.api.workflows.search_workflow module

Implements the workflow steps, runner, and context needed to orchestrate a workflow that retrieves and processes API responses sequentially. These classes form the basis of workflow design: they can be used directly to create a multi-step workflow or subclassed to further customize its behavior.

Classes:

  • StepContext: Defines the step context to be transferred to the next step in a workflow to modify its function

  • WorkflowStep: Contains the necessary logic and instructions for executing the current step of the SearchWorkflow

  • WorkflowResult: Class that holds the history and final result of a workflow after successful execution

  • SearchWorkflow: Defines and fully executes a workflow and the steps used to arrive at the final result

class scholar_flux.api.workflows.search_workflow.SearchWorkflow(*, steps: List[WorkflowStep], stop_on_error: bool = True)[source]

Bases: BaseWorkflow

Front-end SearchWorkflow class that is further refined for particular providers through subclassing. This class defines the full workflow used to arrive at a result and records the history of the search at each step.

Parameters:
  • steps (List[WorkflowStep]) – Defines the steps to be iteratively executed to arrive at a result.

  • stop_on_error (bool) – Defines whether to stop workflow step iteration when an error occurs in a preceding step. If True, the workflow halts and the ErrorResponse from the previous step is returned.

  • history (List[StepContext]) – Defines the full context of all steps taken and results recorded to arrive at the final result on the completion of an executed workflow.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

steps: List[WorkflowStep]
stop_on_error: bool
class scholar_flux.api.workflows.search_workflow.StepContext(*, step_number: int, step: WorkflowStep, result: ProcessedResponse | ErrorResponse | None = None)[source]

Bases: BaseStepContext

Helper class that holds a workflow step, its step number, and its result after execution. A StepContext is passed before and after the execution of each WorkflowStep so that the behavior of each step can be adjusted dynamically at runtime.

Parameters:
  • step_number (int) – The position of the step in the workflow’s execution order

  • step (WorkflowStep) – Defines the instructions for response retrieval, processing, and pre/post transforms for each step of a workflow. This value records the step that was taken to arrive at the result.

  • result (Optional[ProcessedResponse | ErrorResponse]) – Indicates the result that was retrieved and processed in the current step

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

result: ProcessedResponse | ErrorResponse | None
step: WorkflowStep
step_number: int
class scholar_flux.api.workflows.search_workflow.WorkflowResult(*, history: List[StepContext], result: Any)[source]

Bases: BaseWorkflowResult

Helper class that encapsulates the result and history in an object.

Parameters:
  • history (List[StepContext]) – Defines the context of steps taken to arrive at the final result.

  • result (Any) – The final result after the execution of a workflow

history: List[StepContext]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

result: Any
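Taken together, SearchWorkflow, StepContext, and WorkflowResult suggest a runner loop along the following lines. This is a stdlib-only sketch, not the scholar_flux implementation: steps are plain callables rather than WorkflowStep objects executed against a SearchCoordinator, and ErrorResult, Ctx, and Outcome are hypothetical stand-ins for ErrorResponse, StepContext, and WorkflowResult.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ErrorResult:
    """Hypothetical stand-in for ErrorResponse."""

    message: str


@dataclass
class Ctx:
    """Hypothetical stand-in for StepContext."""

    step_number: int
    step: Callable[[Optional["Ctx"]], Any]
    result: Any = None


@dataclass
class Outcome:
    """Hypothetical stand-in for WorkflowResult."""

    history: List[Ctx] = field(default_factory=list)
    result: Any = None


def run_workflow(steps: List[Callable[[Optional[Ctx]], Any]], stop_on_error: bool = True) -> Outcome:
    outcome = Outcome()
    ctx: Optional[Ctx] = None
    for number, step in enumerate(steps):
        # Each step receives the context (and result) of the step before it.
        result = step(ctx)
        ctx = Ctx(step_number=number, step=step, result=result)
        outcome.history.append(ctx)
        if stop_on_error and isinstance(result, ErrorResult):
            break  # Halt and surface the failed step's error as the final result.
    outcome.result = ctx.result if ctx is not None else None
    return outcome


def search(prev: Optional[Ctx]) -> List[str]:
    return ["12345", "67890"]  # placeholder IDs


def fetch(prev: Optional[Ctx]) -> List[str]:
    return [f"record-{pubmed_id}" for pubmed_id in prev.result]


def broken(prev: Optional[Ctx]) -> ErrorResult:
    return ErrorResult("request failed")


ok = run_workflow([search, fetch])
halted = run_workflow([broken, fetch])
```

With stop_on_error=True, the failing first step ends the loop, so the history holds a single context and the error becomes the final result.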
class scholar_flux.api.workflows.search_workflow.WorkflowStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = None, search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = None)[source]

Bases: BaseWorkflowStep

Defines a single step in a SearchWorkflow, including its processing metadata and the instructions applied before, during, and after the search performed in that step.

Parameters:
  • provider_name (Optional[str]) – The provider to use for this step. Allows the provider to change between steps for multifaceted searches.

  • search_parameters – API search parameters for this step. Defines optional keyword arguments to pass to SearchCoordinator._search()

  • config_parameters – Optional config parameters for this step. Defines optional keyword arguments that modify the step’s SearchAPIConfig.

  • description (Optional[str]) – An optional description explaining the execution and/or purpose of the current step

config_parameters: Dict[str, Any]
description: str | None
classmethod format_provider_name(v: str | None) str | None[source]

Validator that type-checks the provided provider name and then normalizes it.
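The check-then-normalize order might look like the sketch below. The exact normalization scholar_flux applies is not described here; trimming whitespace and lowercasing is an assumption, and this is a plain function rather than a pydantic validator.

```python
from typing import Optional


def format_provider_name(v: Optional[str]) -> Optional[str]:
    # Leave unset provider names alone so other defaults can apply.
    if v is None:
        return None
    # Type check first, then normalize.
    if not isinstance(v, str):
        raise TypeError(f"provider_name must be a string or None, got {type(v).__name__}")
    # Assumed normalization: trim surrounding whitespace and lowercase.
    return v.strip().lower()
```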

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

post_transform(ctx: StepContext, *args: Any, **kwargs: Any) StepContext[source]

Helper method that validates whether the current ctx is a StepContext before returning the result.

Parameters:

ctx (StepContext) – The context to verify as a StepContext

Returns:

The same step context to be passed to the next step of the current workflow

Return type:

StepContext

Raises:

TypeError – If the current ctx is not a StepContext

pre_transform(ctx: StepContext | None = None, provider_name: str | None = None, search_parameters: dict | None = None, config_parameters: dict | None = None) Self[source]

Overrides the pre_transform of the base workflow step so that runtime search behavior can be modified for the current search.

This method will use the current configuration of the WorkflowStep by default (provider_name, config_parameters, search_parameters).

If the provider_name is not specified, the context from the preceding workflow step, if available, is used to transform the current WorkflowStep before runtime.

Parameters:
  • ctx (Optional[StepContext]) – Defines the inputs that are used by the current WorkflowStep to modify its function before execution.

  • provider_name (Optional[str]) – Allows the current provider to be changed for multifaceted searches

  • search_parameters – Defines optional keyword arguments to pass to SearchCoordinator._search()

  • config_parameters – Defines optional keyword arguments that modify the step’s SearchAPIConfig

Returns:

A modified or copied version of the current search workflow step

Return type:

WorkflowStep
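The fallback and override behavior described above can be sketched as follows. StepSketch is a hypothetical stand-in for WorkflowStep that accepts the preceding step directly instead of a StepContext; the precedence order (explicit argument, then the step's own value, then the preceding step's provider) and the shallow dictionary merge are assumptions based on this description.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical stand-in for WorkflowStep, not the scholar_flux class."""

    provider_name: Optional[str] = None
    search_parameters: Dict[str, Any] = field(default_factory=dict)
    config_parameters: Dict[str, Any] = field(default_factory=dict)

    def pre_transform(
        self,
        previous: Optional["StepSketch"] = None,
        provider_name: Optional[str] = None,
        search_parameters: Optional[dict] = None,
        config_parameters: Optional[dict] = None,
    ) -> "StepSketch":
        # Assumed precedence: explicit argument, then this step's own value,
        # then the provider used by the preceding step.
        provider = (
            provider_name
            or self.provider_name
            or (previous.provider_name if previous is not None else None)
        )
        # Overrides are layered on top of the step's defaults in a new copy.
        return replace(
            self,
            provider_name=provider,
            search_parameters={**self.search_parameters, **(search_parameters or {})},
            config_parameters={**self.config_parameters, **(config_parameters or {})},
        )


first = StepSketch(provider_name="pubmed", search_parameters={"page": 1})
second = StepSketch(search_parameters={"page": 1}).pre_transform(first, search_parameters={"page": 2})
```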

provider_name: str | None
search_parameters: Dict[str, Any]
with_context(search_coordinator: BaseCoordinator) Generator[Self, None, None][source]

Helper method that temporarily applies this step’s configuration to the search_coordinator.

This method uses a context manager together with the with_config_parameters method of the SearchAPI to temporarily modify the search location, the default API-specific parameters, and any other options that affect the SearchAPIConfig. Tying these overrides to the step’s configuration makes it easier to override behavior on a per-step basis.

Parameters:

search_coordinator (BaseCoordinator) – The search coordinator to modify the configuration for

Yields:

WorkflowStep – The current step with the modification applied
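The snapshot, apply, and restore pattern behind with_context can be sketched with contextlib.contextmanager. CoordinatorSketch and step_context are hypothetical: the real method works through SearchAPI.with_config_parameters and yields the WorkflowStep, whereas this sketch edits a plain dictionary and yields the coordinator.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator


class CoordinatorSketch:
    """Hypothetical stand-in for a coordinator that holds an API configuration."""

    def __init__(self, **config: Any) -> None:
        self.config: Dict[str, Any] = dict(config)


@contextmanager
def step_context(coordinator: CoordinatorSketch, **overrides: Any) -> Iterator[CoordinatorSketch]:
    saved = dict(coordinator.config)      # Snapshot the current configuration.
    coordinator.config.update(overrides)  # Apply the step's overrides.
    try:
        yield coordinator
    finally:
        coordinator.config = saved        # Always restore, even if the step raises.


coordinator = CoordinatorSketch(provider_name="pubmed", records_per_page=20)
with step_context(coordinator, provider_name="pubmedefetch") as active:
    during = active.config["provider_name"]
after = coordinator.config["provider_name"]
```

The try/finally guarantees the original configuration comes back even when a step fails mid-search.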

scholar_flux.api.workflows.workflow_defaults module

The scholar_flux.api.workflows.workflow_defaults module defines the default workflows that are used automatically when a new SearchCoordinator is set up with a provider name registered in the WORKFLOW_DEFAULTS enumeration.

Currently, only the PubMed API has a default workflow, which consolidates its two-step article/metadata retrieval.

class scholar_flux.api.workflows.workflow_defaults.WORKFLOW_DEFAULTS(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Enumerated class specifying default workflows for different providers.

classmethod get(workflow_name: str) SearchWorkflow | None[source]

Attempts to retrieve a SearchWorkflow instance for the given workflow name. Returns None instead of raising an error if the workflow does not exist.

Parameters:

workflow_name (str) – Name of the default Workflow

Returns:

The workflow instance if it exists, otherwise None

Return type:

SearchWorkflow | None

pubmed = PubMedSearchWorkflow(steps=[PubMedSearchStep(additional_kwargs={}, provider_name='pubmed', search_parameters={}, config_parameters={}, description='Retrieves IDs of records matching a particular query from the PubMed database.', step_number=0), PubMedFetchStep(additional_kwargs={}, provider_name='pubmedefetch', search_parameters={}, config_parameters={}, description='Fetches each record/article corresponding to a PubMed ID from the PubMedSearchStep.', step_number=1)], stop_on_error=True)
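A lookup that returns None for unknown names, as get is documented to do, can be sketched on a plain Enum. DefaultsSketch and its placeholder string value are hypothetical; whether the real method normalizes the name or returns a copy of the stored workflow is not stated here.

```python
from enum import Enum
from typing import Optional


class DefaultsSketch(Enum):
    """Hypothetical stand-in for WORKFLOW_DEFAULTS; the value is a placeholder string."""

    pubmed = "pubmed-workflow"

    @classmethod
    def get(cls, workflow_name: str) -> Optional[str]:
        # Look the member up by name and fall back to None instead of raising KeyError.
        try:
            return cls[workflow_name].value
        except KeyError:
            return None
```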

Module contents

The scholar_flux.api.workflows module contains the core logic for integrating workflows into the SearchCoordinator. This enables customizable, integrated workflows that can add extra steps at any point in the retrieval process.

Examples include:
  1. Searching for articles of a particular type in customized year ranges

  2. Performing several searches and aggregating the results into one ProcessedResponse

  3. Using the parameters of a previous search to guide how a subsequent search is performed

Modules:
models: Contains the core components needed to build a workflow. This includes:
  1. BaseWorkflow - the overarching runnable that orchestrates each step of a workflow integrating context

  2. BaseWorkflowStep - the core component that corresponds to a single search or action

  3. BaseStepContext - Passed to later steps so that the results of previous steps can be integrated into the workflow logic of the current step

  4. BaseWorkflowResult - The result returned when a workflow completes. It contains the results of all steps (history) as well as the result of the final step (result)

search_workflow: Contains the default classes from which workflows are further subclassed and instantiated.

By default, these classes behave like a regular call to SearchCoordinator.search(…). This module includes:

  1. SearchWorkflow - The first concrete workflow. Runs each call to SearchCoordinator._search, step by step, in a custom workflow

  2. WorkflowStep - Contains the core logic indicating which provider and default parameter overrides are used to perform the next search

  3. StepContext - Basic wrapper holding the result of each step along with its step number and WorkflowStep

  4. WorkflowResult - Contains the history of each step in the SearchWorkflow and stores the final result in the result attribute

pubmed_workflow: Contains the steps needed to interact with the PubMed API. This API generally requires a two-step workflow: the first step retrieves the IDs of articles matching a query (eSearch), and the second step uses these IDs to fetch the actual abstracts/records and supporting information (eFetch).

To account for this, PubMedSearchStep and PubMedFetchStep encapsulate these two steps in a reusable format and are combined in a predefined workflow for later use.

WORKFLOW_DEFAULTS: Currently contains the PubMed workflow for retrieving article data from PubMed.

This enumeration is also intended to hold future workflows that further customize searches via SearchCoordinator.search.

Example use:

>>> from scholar_flux.api import SearchCoordinator, SearchAPI
>>> from scholar_flux.sessions import CachedSessionManager
>>> from scholar_flux.api.workflows import WORKFLOW_DEFAULTS, SearchWorkflow
>>> # PubMed requires an API key, which is read automatically from the user's environment variables if available
>>> session = CachedSessionManager(user_agent='sam_research', backend='redis').configure_session()
>>> api = SearchAPI.from_defaults(query='Machine Learning in Hospitals', provider_name='pubmed', session=session)
>>> # The workflow is read automatically from WORKFLOW_DEFAULTS
>>> pubmed_search = SearchCoordinator(api)
>>> isinstance(pubmed_search.workflow, SearchWorkflow)
True
>>> pubmed_workflow = WORKFLOW_DEFAULTS.get('pubmed')
>>> pubmed_search_with_workflow = SearchCoordinator(api, workflow=pubmed_workflow)
>>> # Both coordinators resolve to the same default workflow
>>> assert pubmed_search.workflow == pubmed_workflow == pubmed_search_with_workflow.workflow
>>> # Assuming an API key is available, the workflow runs automatically:
>>> response = pubmed_search.search(page=1, use_workflow=True)
class scholar_flux.api.workflows.BaseStepContext[source]

Bases: BaseModel

Base class for step contexts.

Passed between workflow steps to communicate the context and history of the current workflow before and after the execution of each step.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.BaseWorkflow[source]

Bases: BaseModel, ABC

Base class for defining and fully executing a workflow.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.BaseWorkflowResult[source]

Bases: BaseModel

Base class for returning the results from a Workflow.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class scholar_flux.api.workflows.BaseWorkflowStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>)[source]

Bases: BaseModel

Base class for workflow steps.

Used to define the behavior and actions of each step in a workflow.

Parameters:

additional_kwargs (Dict[str, Any]) – A dictionary of optional keyword parameters used to modify the functionality of future WorkflowStep subclass instances.

additional_kwargs: Dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

post_transform(ctx: Any, *args: Any, **kwargs: Any) Any[source]

Defines an optional transformation applied to the results retrieved after the workflow step executes, used to modify its output.

Parameters:
  • ctx (Any) – Defines the inputs that are used by the BaseWorkflowStep after execution to modify its output.

  • *args – Optional positional arguments to pass to change its output behavior

  • **kwargs – Optional keyword arguments to pass to change its output behavior

Returns:

A modified or copied version of the output to be returned or prepared for the next step

Return type:

Any

pre_transform(ctx: Any, *args: Any, **kwargs: Any) Self[source]

Defines an optional transformation applied to the BaseWorkflowStep before the step executes, used to modify its behavior.

Parameters:
  • ctx (Any) – Defines the inputs that are used by the BaseWorkflowStep to modify its function before execution

  • *args – Optional positional arguments to pass to change runtime behavior

  • **kwargs – Optional keyword arguments to pass to change runtime behavior

Returns:

A modified or copied version of the original BaseWorkflowStep

Return type:

BaseWorkflowStep

with_context(*args: Any, **kwargs: Any) Generator[Self, None, None][source]

Helper method to be overridden by subclasses to customize the behavior of the workflow step.

Subclasses that override with_context should ideally implement it as a context manager so that the override remains fully compatible with the current method.

Yields:

Self – The current workflow step within a context

class scholar_flux.api.workflows.PubMedFetchStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = 'pubmedefetch', search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = 'Fetches each record/article corresponding to a PubMed ID from the PubMedSearchStep.', step_number: int | None = 1)[source]

Bases: WorkflowStep

Next and final step of the PubMed workflow; uses the eFetch API to resolve article/abstract IDs.

These IDs are retrieved from the metadata of the previous step and passed to eFetch to retrieve their associated articles and/or abstracts.

Parameters:
  • provider_name (Optional[str]) – Defines the PubMed eFetch API as the location where the next/final request is sent.

  • step_number – Metadata indicating the intended position in the workflow sequence. This is for documentation purposes only; the actual execution order is determined by the step’s position in the workflow’s steps list.

  • description – Metadata indicating the purpose of the current workflow step. This is for documentation purposes only.

additional_kwargs: Dict[str, Any]
config_parameters: Dict[str, Any]
description: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

pre_transform(ctx: StepContext | None = None, provider_name: str | None = None, search_parameters: dict | None = None, config_parameters: dict | None = None) PubMedFetchStep[source]

Overrides the pre_transform of WorkflowStep to use the IDs retrieved in the previous step as input parameters for the PubMed eFetch API request.

Parameters:
  • ctx (Optional[StepContext]) – Defines the inputs that are used by the current PubMedFetchStep to modify its function before execution.

  • provider_name (Optional[str]) – Provided for API compatibility. The step uses pubmedefetch by default.

  • search_parameters – Defines optional keyword arguments to pass to SearchCoordinator._search()

  • config_parameters – Defines optional keyword arguments that modify the step’s SearchAPIConfig

Returns:

A modified or copied version of the current pubmed workflow step

Return type:

PubMedFetchStep

provider_name: str | None
search_parameters: Dict[str, Any]
step_number: int | None
class scholar_flux.api.workflows.PubMedSearchStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = 'pubmed', search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = 'Retrieves IDs of records matching a particular query from the PubMed database.', step_number: int | None = 0)[source]

Bases: WorkflowStep

Initial step of the PubMed workflow that retrieves the IDs of articles/abstracts matching the query.

This step is equivalent to retrieving a single page from the PubMed API without a workflow. Its default search/config parameter settings can be overridden to customize how the step is executed.

After the IDs of records matching the current query and page are retrieved, the workflow passes them as context to the following PubMedFetchStep, which resolves each ID into its associated article and/or abstract.

provider_name

Defines the PubMed eSearch API as the location where the initial request is sent.

Type:

Optional[str]

step_number

Metadata indicating the intended position in the workflow sequence. This is for documentation purposes only; the actual execution order is determined by the step’s position in the workflow’s steps list.

Type:

Optional[int]

description

Metadata indicating the purpose of the current workflow step. This is for documentation purposes only.

Type:

Optional[str]

additional_kwargs: Dict[str, Any]
config_parameters: Dict[str, Any]
description: str | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

provider_name: str | None
search_parameters: Dict[str, Any]
step_number: int | None
class scholar_flux.api.workflows.PubMedSearchWorkflow(*, steps: list[~scholar_flux.api.workflows.search_workflow.WorkflowStep] = <factory>, stop_on_error: bool = True)[source]

Bases: SearchWorkflow

SearchWorkflow implementation for PubMed’s two-step article retrieval process.

PubMed’s API requires a two-step retrieval process:

  1. eSearch (PubMedSearchStep): Searches for articles matching the query and returns a list of article IDs along with metadata about the search (query info, pagination, result counts, etc.)

  2. eFetch (PubMedFetchStep): Takes the article IDs from step 1 and retrieves the full article data including abstracts, authors, and other detailed information.

This workflow coordinates both steps automatically and ensures that metadata from the initial eSearch is preserved in the final result, providing consumers with both the full article data and the search context.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

steps: list[WorkflowStep]
stop_on_error: bool
class scholar_flux.api.workflows.SearchWorkflow(*, steps: List[WorkflowStep], stop_on_error: bool = True)

Bases: BaseWorkflow

Front-end SearchWorkflow class that can be further refined for particular providers through subclassing. This class defines the full workflow used to arrive at a result and records the context of every step executed along the way.

Parameters:
  • steps (List[WorkflowStep]) – Defines the steps to be iteratively executed to arrive at a result.

  • stop_on_error (bool) – Defines whether to stop workflow step iteration when an error occurs in a preceding step. If True, the workflow halts and the ErrorResponse from the previous step is returned.

  • history (List[StepContext]) – Defines the full context of all steps taken and results recorded to arrive at the final result on the completion of an executed workflow.
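A minimal sketch of how steps, stop_on_error, and history might interact during execution. MiniWorkflow, Context, and ErrorResult are hypothetical stand-ins for SearchWorkflow, StepContext, and ErrorResponse; each step is modeled as a plain callable that receives the previous context, which simplifies the real WorkflowStep interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ErrorResult:
    """Hypothetical stand-in for an ErrorResponse."""
    message: str


@dataclass
class Context:
    """Simplified stand-in for StepContext: the step, its position, and its result."""
    step_number: int
    step: Callable[[Optional["Context"]], Any]
    result: Any = None


@dataclass
class MiniWorkflow:
    """Hypothetical runner; each step is a callable taking the previous context."""
    steps: List[Callable[[Optional[Context]], Any]]
    stop_on_error: bool = True
    history: List[Context] = field(default_factory=list)

    def run(self) -> Any:
        self.history.clear()
        previous: Optional[Context] = None
        for number, step in enumerate(self.steps):
            # Execution order follows the order of the steps list.
            ctx = Context(step_number=number, step=step, result=step(previous))
            self.history.append(ctx)
            if self.stop_on_error and isinstance(ctx.result, ErrorResult):
                return ctx.result  # halt and return the error from this step
            previous = ctx
        return previous.result if previous is not None else None


ok = MiniWorkflow(steps=[lambda prev: [1, 2], lambda prev: sum(prev.result)])
print(ok.run())  # 3

failing = MiniWorkflow(steps=[lambda prev: ErrorResult("timeout"), lambda prev: "never runs"])
print(failing.run(), len(failing.history))  # ErrorResult(message='timeout') 1
```

With stop_on_error=False, the sketch keeps iterating and hands the error context to the next step instead of returning early.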

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

steps: List[WorkflowStep]
stop_on_error: bool
class scholar_flux.api.workflows.StepContext(*, step_number: int, step: WorkflowStep, result: ProcessedResponse | ErrorResponse | None = None)

Bases: BaseStepContext

Helper class that holds the workflow step, its step number, and its result after execution. A StepContext is passed before and after the execution of each workflow step so that the behavior of the next step can be adjusted dynamically at runtime.

Parameters:
  • step_number (int) – Indicates the order in which the step is executed for a particular step context

  • step (WorkflowStep) – Defines the instructions for response retrieval, processing, and pre/post transforms for each step of a workflow. This value records the step that was taken to arrive at the result.

  • result (Optional[ProcessedResponse | ErrorResponse]) – Indicates the result that was retrieved and processed in the current step

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

result: ProcessedResponse | ErrorResponse | None
step: WorkflowStep
step_number: int
class scholar_flux.api.workflows.WORKFLOW_DEFAULTS(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enumerated class specifying default workflows for different providers.

classmethod get(workflow_name: str) → SearchWorkflow | None

Attempts to retrieve a SearchWorkflow instance for the given workflow name. Returns None instead of raising an error if the workflow does not exist.

Parameters:

workflow_name (str) – Name of the default Workflow

Returns:

The configured SearchWorkflow instance if it exists, otherwise None

Return type:

SearchWorkflow | None
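The non-raising lookup described for get can be illustrated with a plain Enum. In this sketch, DefaultWorkflows is hypothetical and its member values are placeholder tuples standing in for configured SearchWorkflow instances; whether the real method also normalizes the name is not stated here, so the sketch performs an exact lookup.

```python
from enum import Enum
from typing import Optional, Tuple


class DefaultWorkflows(Enum):
    """Hypothetical enum mapping workflow names to preconfigured values."""

    pubmed = ("esearch", "efetch")  # placeholder for a configured workflow object
    demo = ("search",)

    @classmethod
    def get(cls, workflow_name: str) -> Optional[Tuple[str, ...]]:
        """Return the member's value, or None if no member has that name."""
        try:
            return cls[workflow_name].value
        except KeyError:
            return None  # unknown names are not an error


print(DefaultWorkflows.get("pubmed"))   # ('esearch', 'efetch')
print(DefaultWorkflows.get("unknown"))  # None
```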

pubmed = PubMedSearchWorkflow(steps=[PubMedSearchStep(additional_kwargs={}, provider_name='pubmed', search_parameters={}, config_parameters={}, description='Retrieves IDs of records matching a particular query from the PubMed database.', step_number=0), PubMedFetchStep(additional_kwargs={}, provider_name='pubmedefetch', search_parameters={}, config_parameters={}, description='Fetches each record/article corresponding to a PubMed ID from the PubMedSearchStep.', step_number=1)], stop_on_error=True)
class scholar_flux.api.workflows.WorkflowResult(*, history: List[StepContext], result: Any)

Bases: BaseWorkflowResult

Helper class that encapsulates the result and history in an object.

Parameters:
  • history (List[StepContext]) – Defines the context of steps taken to arrive at the final result.

  • result (Any) – The final result after the execution of a workflow

history: List[StepContext]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

result: Any
class scholar_flux.api.workflows.WorkflowStep(*, additional_kwargs: ~typing.Dict[str, ~typing.Any] = <factory>, provider_name: str | None = None, search_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, config_parameters: ~typing.Dict[str, ~typing.Any] = <factory>, description: str | None = None)

Bases: BaseWorkflowStep

Defines a single step of a SearchWorkflow, including its processing metadata and the instructions applied before, during, and after the search performed in that step.

Parameters:
  • provider_name (Optional[str]) – The provider to use for this step. Allows the current provider to be changed for multifaceted searches.

  • search_parameters (Dict[str, Any]) – API search parameters for this step. Defines optional keyword arguments to pass to SearchCoordinator._search().

  • config_parameters (Dict[str, Any]) – Optional config parameters for this step. Defines optional keyword arguments that modify the step’s SearchAPIConfig.

  • description (Optional[str]) – An optional description explaining the execution and/or purpose of the current step.

additional_kwargs: Dict[str, Any]
config_parameters: Dict[str, Any]
description: str | None
classmethod format_provider_name(v: str | None) → str | None

Formats the provided provider name by applying name normalization after type checking.
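A sketch of a type-check-then-normalize step, assuming that normalization means trimming whitespace and lowercasing. The exact rule scholar_flux applies is not documented in this section, so treat format_provider_name below as hypothetical. The real method is a classmethod on a Pydantic model and likely runs as a field validator; the sketch uses a plain function so it runs without Pydantic.

```python
from typing import Optional


def format_provider_name(v: Optional[str]) -> Optional[str]:
    """Hypothetical normalizer: type-check first, then normalize the name."""
    if v is None:
        return None  # provider_name is optional, so None passes through
    if not isinstance(v, str):
        raise TypeError(f"provider_name must be a str or None, got {type(v).__name__}")
    return v.strip().lower()  # assumed normalization rule, not the library's actual one


print(format_provider_name("  PubMed "))  # pubmed
```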

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

post_transform(ctx: StepContext, *args: Any, **kwargs: Any) → StepContext

Helper method that validates whether the current ctx is a StepContext before returning the result.

Parameters:

ctx (StepContext) – The context to verify as a StepContext

Returns:

The same step context to be passed to the next step of the current workflow

Return type:

StepContext

Raises:

TypeError – If the current ctx is not a StepContext
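The pass-through check performed by post_transform amounts to a type guard. The sketch below uses a hypothetical StepContextStub dataclass in place of StepContext and accepts and ignores extra arguments, mirroring the documented signature.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StepContextStub:
    """Hypothetical stand-in for StepContext."""
    step_number: int
    result: Any = None


def post_transform(ctx: Any, *args: Any, **kwargs: Any) -> StepContextStub:
    """Return ctx unchanged if it is a context object; otherwise raise TypeError."""
    if not isinstance(ctx, StepContextStub):
        raise TypeError(f"Expected a StepContext, received {type(ctx).__name__}")
    return ctx  # the same object is handed to the next step


ctx = StepContextStub(step_number=0, result=["101", "102"])
print(post_transform(ctx) is ctx)  # True
```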

pre_transform(ctx: StepContext | None = None, provider_name: str | None = None, search_parameters: dict | None = None, config_parameters: dict | None = None) → Self

Overrides the pre_transform of the base workflow step so that the behavior of the current search can be modified at runtime.

This method will use the current configuration of the WorkflowStep by default (provider_name, config_parameters, search_parameters).

If the provider_name is not specified, the context from the preceding workflow step, if available, is used to transform the current WorkflowStep before runtime.

Parameters:
  • ctx (Optional[StepContext]) – Defines the inputs that the current workflow step uses to modify its behavior before execution.

  • provider_name (Optional[str]) – Allows the current provider to be changed for multifaceted searches.

  • search_parameters (Optional[dict]) – Optional keyword arguments to pass to SearchCoordinator._search().

  • config_parameters (Optional[dict]) – Optional keyword arguments that modify the step’s SearchAPIConfig.

Returns:

A modified or copied version of the current search workflow step

Return type:

WorkflowStep
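A minimal sketch of the override behavior described for pre_transform, assuming a precedence of explicit argument first, then the step's own provider_name, then the provider of the preceding step's context. StepSketch, ContextSketch, and the parameter names records_per_page and page are hypothetical. The sketch returns a modified copy and leaves the original step unchanged, in line with the documented return value.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class StepSketch:
    """Hypothetical, simplified stand-in for WorkflowStep."""
    provider_name: Optional[str] = None
    search_parameters: Dict[str, Any] = field(default_factory=dict)
    config_parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ContextSketch:
    """Hypothetical stand-in for the StepContext of the preceding step."""
    step: StepSketch


def pre_transform(
    step: StepSketch,
    ctx: Optional[ContextSketch] = None,
    provider_name: Optional[str] = None,
    search_parameters: Optional[Dict[str, Any]] = None,
    config_parameters: Optional[Dict[str, Any]] = None,
) -> StepSketch:
    # Assumed precedence: explicit argument, then the step's own value,
    # then the provider used by the preceding step (if a context exists).
    provider = (
        provider_name
        or step.provider_name
        or (ctx.step.provider_name if ctx is not None else None)
    )
    # Return a modified copy; runtime overrides are layered over stored values.
    return replace(
        step,
        provider_name=provider,
        search_parameters={**step.search_parameters, **(search_parameters or {})},
        config_parameters={**step.config_parameters, **(config_parameters or {})},
    )


previous = ContextSketch(step=StepSketch(provider_name="pubmed"))
fetch_step = StepSketch(search_parameters={"records_per_page": 20})
updated = pre_transform(fetch_step, ctx=previous, search_parameters={"page": 2})
print(updated.provider_name, updated.search_parameters)
```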

provider_name: str | None
search_parameters: Dict[str, Any]
with_context(search_coordinator: BaseCoordinator) → Generator[Self, None, None]

Helper method that temporarily applies the step configuration to the search_coordinator.

This method combines a context manager with the with_config_parameters method of the SearchAPI to temporarily modify the search location, the default API-specific parameters, and any other options that affect the SearchAPIConfig. Tying these overrides to the step lets each step adjust the configuration it runs under without permanently altering the coordinator.

Parameters:

search_coordinator (BaseCoordinator) – The search coordinator to modify the configuration for

Yields:

WorkflowStep – The current step with the modification applied
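The temporary-override pattern behind with_context can be sketched with contextlib.contextmanager. This is an illustration under assumptions: CoordinatorSketch stores its configuration in a plain dictionary and ConfiguredStep applies its config_parameters directly, whereas the real method delegates to the SearchAPI's with_config_parameters. As documented, the context manager yields the step itself; the sketch restores the original configuration on exit, even if the step raises.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator


@dataclass
class CoordinatorSketch:
    """Hypothetical stand-in for a coordinator that holds a mutable API config."""
    config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ConfiguredStep:
    """Hypothetical stand-in for a WorkflowStep carrying config overrides."""
    config_parameters: Dict[str, Any] = field(default_factory=dict)

    @contextmanager
    def with_context(self, coordinator: CoordinatorSketch) -> Iterator["ConfiguredStep"]:
        original = dict(coordinator.config)                # snapshot the current config
        coordinator.config.update(self.config_parameters)  # apply this step's overrides
        try:
            yield self                                     # step runs with overrides active
        finally:
            coordinator.config.clear()                     # restore the snapshot, even on error
            coordinator.config.update(original)


coordinator = CoordinatorSketch(config={"base_url": "https://example.org/search"})
step = ConfiguredStep(config_parameters={"base_url": "https://example.org/fetch"})
with step.with_context(coordinator):
    print(coordinator.config["base_url"])  # https://example.org/fetch
print(coordinator.config["base_url"])      # https://example.org/search
```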