scholar_flux.security package

Submodules

scholar_flux.security.filters module

The scholar_flux.security.filters module implements the foundational filter that logging utilities use to mask and redact text before it is stored by file handlers or written to the console.

This module uses the SensitiveDataMasker class to flag and redact text prior to it ever reaching the console or file via the logging module.

class scholar_flux.security.filters.MaskingFilter(masker: SensitiveDataMasker | None = None)[source]

Bases: Filter

Custom logging filter that adds masking to the logs. It uses the SensitiveDataMasker to enforce the rules detailing which fields should be masked before they are logged. By default, the SensitiveDataMasker masks API keys and email parameters in requests to APIs sent via scholar_flux.

Initialized MaskingFilters can also be updated without re-instantiation by adding or removing patterns from the set of all patterns within the received masker.

This class can also be added to other loggers with minimal effort:

>>> import logging
>>> from scholar_flux.security import MaskingFilter # contains the filter
>>> formatting = "%(name)s - %(levelname)s - %(message)s" # custom format
>>> logger = logging.getLogger('security_logger') # gets or creates a new logger
>>> logging.basicConfig(level=logging.DEBUG, format=formatting) # set the level and formatting for the log
>>> logging_filter = MaskingFilter() # creating a new filter
>>> logger.addFilter(logging_filter) # adding the filter and formatting rules
>>> logger.info("The following api key should be filtered: API_KEY='an_api_key_that_needs_to_be_filtered'")
# OUTPUT: security_logger - INFO - The following api key should be filtered: API_KEY='***'

__init__(masker: SensitiveDataMasker | None = None)[source]

By default, this implementation is applied in the initialization of the package in scholar_flux.__init__ on import, so this class does not need to be applied directly.

Parameters:: masker (Optional[SensitiveDataMasker]) – The actual implementation responsible for masking text matching patterns. Note that if a masker is not passed, the MaskingFilter will initialize a masker directly.

filter(record) → bool[source]

Helper method used by the logging.Logger class when adding custom filters to the logging module.

This method always returns True after an attempt to mask sensitive fields is completed.
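The filter contract described above (mask the record, then always return True) can be sketched with the standard library alone. This is a minimal illustration and not the scholar_flux implementation: `SketchMaskingFilter`, `API_KEY_RE`, and the logger name are hypothetical stand-ins for MaskingFilter and the patterns held by a SensitiveDataMasker.

```python
import io
import logging
import re

# Hypothetical stand-in pattern: matches the quoted value that follows API_KEY=
API_KEY_RE = re.compile(r"(API_KEY=')[^']*(')")


class SketchMaskingFilter(logging.Filter):
    """Illustrative filter that masks the message and never drops a record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Resolve %-style arguments first so masking sees the final text.
        message = record.getMessage()
        record.msg = API_KEY_RE.sub(r"\1***\2", message)
        record.args = None  # the arguments are already merged into msg
        return True  # always keep the record; only its text is altered


# Write to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

sketch_logger = logging.getLogger("sketch_security_logger")
sketch_logger.setLevel(logging.DEBUG)
sketch_logger.propagate = False
sketch_logger.addHandler(handler)
sketch_logger.addFilter(SketchMaskingFilter())

sketch_logger.info("key: API_KEY='%s'", "an_api_key_that_needs_to_be_filtered")
print(stream.getvalue().strip())
```

Because the filter returns True unconditionally, it changes what is written without ever suppressing a log record.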

scholar_flux.security.masker module

The scholar_flux.security.masker module defines the SensitiveDataMasker that is used during API retrieval and processing.

The SensitiveDataMasker implements the logic necessary to determine text strings to mask and is used to identify and mask potentially sensitive fields based on dictionary fields and string-based patterns.

This class is also used during initialization and within the scholar_flux.SearchAPI class to identify and mask API keys, emails, and other forms of sensitive data with the aim of redacting text from both console and file system logs.

class scholar_flux.security.masker.SensitiveDataMasker(register_defaults: bool = True)[source]

Bases: object

The main interface used by the scholar_flux API for masking all text identified as sensitive.

This class is used by scholar_flux to ensure that all sensitive text sent to the scholar_flux.logger is masked.

The SensitiveDataMasker operates through the registration of patterns that identify the text to mask.

Components:

KeyMaskingPattern:
identifies specific keys and regex patterns that will signal text to filter
StringMaskingPattern:
identifies strings to filter either by fixed or pattern matching
MaskingPatternSet:
A customized set accepting only subclasses of MaskingPatterns that specify the rules for filtering text of sensitive fields.

By default, this structure implements masking for email addresses, API keys, bearer tokens, etc. that are identified as sensitive parameters/secrets.

Parameters:: register_defaults (bool) – Determines whether to add the patterns that filter API keys, email parameters, and auth bearers.

Examples

>>> from scholar_flux.security import SensitiveDataMasker # imports the class
>>> masker = SensitiveDataMasker(register_defaults = True) # initializes a masker with defaults
>>> masked = masker.mask_text("'API_KEY' = 'This_Should_Be_Masked_1234', email='a.secret.email@address.com'")
>>> print(masked)
# Output: "'API_KEY' = '***', email='***'"

>>> new_secret = "This string should be filtered"
# specifies a new secret to filter - uses regex by default
>>> masker.add_sensitive_string_patterns(name='custom', patterns=new_secret, use_regex = False)
# applying the filter
>>> masked = masker.mask_text(f"The following string should be masked: {new_secret}")
>>> print(masked)
# Output: "The following string should be masked: ***"

__init__(register_defaults: bool = True)[source]

Initializes the SensitiveDataMasker for registering and applying different masking patterns, each with a name and pattern that will be scrubbed from text with the use of the mask_text method.

Parameters:: register_defaults (bool) – Indicates whether to register the default patterns for scrubbing emails, API keys, bearer tokens (Authorization), etc. from the text when applying self.mask_text.

self.patterns

Indicates the full list of patterns that will be applied when scrubbing text of sensitive fields using masking patterns.

Type:: Set[MaskingPattern]

add_pattern(pattern: MaskingPattern) → None[source]: Adds a pattern to the self.patterns attribute.

add_sensitive_key_patterns(name: str, fields: List[str] | str, fuzzy: bool = False, **kwargs) → None[source]

Adds patterns that identify potentially sensitive keys with the aim of filtering their values from logs.

The parameters provided to the method are used to create new key patterns.

Parameters:

name (str) – The name associated with the pattern (aids identification of patterns)
fields (List[str] | str) – The field or list of fields to search for and remove from logs.
pattern (str) – An optional parameter for filtering and removing sensitive fields that match a given pattern. By default, this is already set to remove API keys, which are typically denoted by alphanumeric fields
fuzzy (bool) – If True, regular expressions are used to identify keys. Otherwise, fixed (field) key matching is used through the implementation of a basic KeyMaskingPattern.
**kwargs – Other fields, specifiable via additional keyword arguments that are passed to KeyMaskingPattern

add_sensitive_string_patterns(name: str, patterns: List[str] | str, **kwargs) → None[source]

Adds patterns that identify potentially sensitive strings with the aim of filtering them from logs.

The parameters provided to the method are used to create new string patterns.

Parameters:

name (str) – The name associated with the pattern (aids identification of patterns)
patterns (List[str] | str) – The list of patterns to search for and remove from logs
**kwargs – Other fields, specifiable via additional keyword arguments that are passed to StringMaskingPattern

clear() → None[source]

Clears the SensitiveDataMasker.patterns set of all previously registered MaskingPatterns including those that were registered by default.

The masker would otherwise use the available patterns set to determine what text strings would be masked when the mask_text method is called. Calling mask_text after clearing all MaskingPatterns from the current masker will leave all text unmasked and return the inputted text as is.
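The pattern-registry behaviour described for add_pattern, clear, and mask_text can be illustrated with a small standard-library sketch. The names `SketchPattern` and `SketchMasker` are hypothetical and chosen only for illustration; the real SensitiveDataMasker may store and apply its patterns differently.

```python
import re
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class SketchPattern:
    """Hypothetical named pattern; frozen so that instances are hashable."""

    name: str
    regex: str
    replacement: str = "***"

    def apply_masking(self, text: str) -> str:
        return re.sub(self.regex, self.replacement, text)


class SketchMasker:
    """Illustrative registry that applies every registered pattern in turn."""

    def __init__(self) -> None:
        self.patterns: Set[SketchPattern] = set()

    def add_pattern(self, pattern: SketchPattern) -> None:
        self.patterns.add(pattern)

    def get_patterns_by_name(self, name: str) -> Set[SketchPattern]:
        return {p for p in self.patterns if p.name == name}

    def remove_pattern_by_name(self, name: str) -> int:
        matched = self.get_patterns_by_name(name)
        self.patterns -= matched
        return len(matched)  # number of patterns that were removed

    def clear(self) -> None:
        self.patterns.clear()

    def mask_text(self, text: str) -> str:
        for pattern in self.patterns:
            text = pattern.apply_masking(text)
        return text


sketch_masker = SketchMasker()
sketch_masker.add_pattern(SketchPattern(name="digits", regex=r"\d{4,}"))
masked_before = sketch_masker.mask_text("token 123456")

sketch_masker.clear()  # with no patterns left, text passes through unchanged
masked_after = sketch_masker.mask_text("token 123456")
```

After clear() the sketch returns its input as is, which mirrors the behaviour described above for a masker with an empty pattern set.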

get_patterns_by_name(name: str) → Set[MaskingPattern][source]: Get all patterns with a specific name.

classmethod is_secret(obj: Any) → bool[source]

Utility method for verifying whether the current value is a secret. This method delegates the verification of the value type to the SecretUtils helper class to abstract the implementation details in cases where the implementation details might require modification in the future for special cases.

Parameters:: obj (Any) – The object to check
Returns:: True if the object is a SecretStr, False otherwise
Return type:: bool

static mask_secret(obj: Any) → SecretStr | None[source]

Method for ensuring that any non-secret keys will be masked as secrets.

Parameters:: obj (Any) – An object to mask as a secret string if it is not one already
Returns:: A SecretStr representation of the original object, or None if the object is None
Return type:: SecretStr | None

mask_text(text: str) → str[source]

Public method for removing sensitive data from text/logs. Note that the data that is obfuscated depends on which patterns were previously defined in the SensitiveDataMasker. By default, this includes API keys, emails, and auth headers.

Parameters:: text (str) – the text to scrub of sensitive data
Returns:: the cleaned text that excludes sensitive fields

register_secret_if_exists(field: str, value: SecretStr | Any, name: str | None = None, use_regex: bool = False, ignore_case: bool = True) → bool[source]

Identifies fields already registered as secret strings and adds a relevant pattern for ensuring that the field, when unmasked for later use, doesn’t display in logs. Note that if the current field is not a SecretStr, the method will return False without modification or side-effects.

The parameters provided to the method are used to create new string patterns when a SecretStr is detected.

Parameters:

field (str) – The field, parameter, or key associated with the secret key
value (SecretStr | Any) – The value, if typed as a secret string, to be registered as a pattern
name (Optional[str]) – The name to add to identify the relevant pattern by within the pattern set. If not provided, defaults to the field name.
use_regex (bool) – Indicates whether the current function should use regular expressions when matching the pattern in text. Defaults to False.
ignore_case (bool) – Whether to ignore case when determining whether or not to filter a string. Defaults to True.

Returns:

If the value is a SecretStr, a string masking pattern is registered for the value and True is returned. If the value is not a SecretStr, False is returned and no side-effects occur.

Return type:

bool

Example

>>> from pydantic import SecretStr
>>> from scholar_flux.security import SensitiveDataMasker
>>> masker = SensitiveDataMasker()
>>> api_key = SecretStr("sk-123456")
>>> registered = masker.register_secret_if_exists("api_key", api_key)
>>> print(registered)  # True
>>> registered = masker.register_secret_if_exists("normal_field", "normal_value")
>>> print(registered)  # False
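The conditional registration described above can be sketched without third-party dependencies. In this illustration `SketchSecret` is a hypothetical stand-in for pydantic.SecretStr, and `SketchSecretRegistry` is an assumed simplification; the actual method registers a StringMaskingPattern rather than a compiled regex.

```python
import re
from typing import Any, Dict, Optional


class SketchSecret:
    """Hypothetical stand-in for pydantic.SecretStr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "SketchSecret('**********')"


class SketchSecretRegistry:
    """Illustrative registry that only accepts values wrapped as secrets."""

    def __init__(self) -> None:
        self.patterns: Dict[str, re.Pattern] = {}

    def register_secret_if_exists(
        self,
        field: str,
        value: Any,
        name: Optional[str] = None,
        use_regex: bool = False,
        ignore_case: bool = True,
    ) -> bool:
        if not isinstance(value, SketchSecret):
            return False  # not a secret: no pattern added, no side effects
        raw = value.get_secret_value()
        # Fixed-string matching escapes the value so it is matched literally.
        source = raw if use_regex else re.escape(raw)
        flags = re.IGNORECASE if ignore_case else 0
        self.patterns[name or field] = re.compile(source, flags)
        return True

    def mask_text(self, text: str) -> str:
        for compiled in self.patterns.values():
            text = compiled.sub("***", text)
        return text


registry = SketchSecretRegistry()
registered = registry.register_secret_if_exists("api_key", SketchSecret("sk-123456"))
skipped = registry.register_secret_if_exists("normal_field", "normal_value")
masked_url = registry.mask_text("GET /search?api_key=SK-123456")
```

Registering the unwrapped value as a pattern means the secret stays out of masked text even after it has been unmasked for use in a request.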

remove_pattern_by_name(name: str) → int[source]: Remove patterns by name, return count of removed patterns.

structure(flatten: bool = False, show_value_attributes: bool = False) → str[source]

Helper method for creating an in-memory cache without overloading the representation with the specifics of what is being cached.

By default, nested MaskingPatterns will not be shown.

static unmask_secret(obj: Any) → Any[source]

Method for ensuring that usable values can be successfully extracted from objects. If the current value is a secret string, this method will return the secret value from the object.

Parameters:: obj (Any) – An object to attempt to unmask if it is a secret string
Returns:: The object’s original type before being converted into a secret string
Return type:: obj (Any)

update(pattern: MaskingPattern | Set[MaskingPattern] | Set[KeyMaskingPattern] | Set[StringMaskingPattern] | MutableSequence[MaskingPattern | KeyMaskingPattern | StringMaskingPattern]) → None[source]: Adds one or more patterns to the self.patterns attribute.

scholar_flux.security.patterns module

The scholar_flux.security.patterns module implements the foundational patterns required to implement a light-weight fixed/regex pattern matching utility that determines keys to mask in both text and JSON-formatted parameter dictionaries.

Classes:

MaskingPattern:: Implements the abstract base class that defines how MaskingPatterns are created and formatted.
MaskingPatternSet:: Defines a subclass of a set that accepts only subclasses of MaskingPatterns for robustness.
KeyMaskingPattern:: Defines the class and methods necessary to mask text based on the presence or absence of a specific field name when determining what patterns to mask.
StringMaskingPattern:: Defines the class and methods necessary to mask text based on the presence or absence of specific patterns. These patterns can either be fixed or regular expressions, and accept both case-sensitive and case-insensitive pattern matching settings.

class scholar_flux.security.patterns.FuzzyKeyMaskingPattern(name: str, pattern: str | SecretStr = '[A-Za-z0-9\\-_]+', field: str = <factory>, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: KeyMaskingPattern

A KeyMaskingPattern subclass that allows the field parameter to use regular-expression pattern matching.

name

The name to be associated with a particular pattern - can help in later identification and retrieval of rules associated with pattern masks of a particular category.

Type:: str

field

The regular expression field to look for when determining whether to mask a specific parameter.

Type:: str

pattern

The pattern to use to remove sensitive fields, contingent on a parameter being defined. By default, the pattern is set to allow for the removal of dashes and alphanumeric fields but can be overridden based on API-specific requirements.

Type:: str

replacement

Indicates the replacement string for the value in the key-value pair if matched (’***’ by default)

Type:: str

use_regex

Indicates whether the current function should use regular expressions

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default).

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

apply_masking(text: str) → str[source]

Uses fuzzy field matching to identify fields containing sensitive data in text.

This method is revised to account for circumstances where several fields might be present in the same text string using the | delimiter. The masker can be customized using the following fields: field, pattern, replacement, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields
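Fuzzy field matching of this kind can be sketched with the standard `re` module. The function `mask_fuzzy_key_value` below is a hypothetical illustration, not the library's implementation; it assumes that `field` is itself a regular expression and that keys and values are separated by `=` or `:`.

```python
import re


def mask_fuzzy_key_value(
    text: str,
    field: str,
    pattern: str = r"[A-Za-z0-9\-_]+",
    replacement: str = "***",
    ignore_case: bool = True,
) -> str:
    """Mask values whose key matches the regular expression `field`."""
    flags = re.IGNORECASE if ignore_case else 0
    # The non-capturing group keeps an alternation such as "a|b" scoped to the key.
    regex = re.compile(
        rf"""(['"]?(?:{field})['"]?\s*[=:]\s*['"]?)({pattern})""", flags
    )
    # Keep the key and separator (group 1); swap only the value for the mask.
    return regex.sub(lambda m: m.group(1) + replacement, text)


fuzzy_masked = mask_fuzzy_key_value(
    "api_key=abc123&apikey=def456&q=test", field=r"api[_-]?key|token"
)
```

A single field expression covers several spellings of the same parameter, which is useful when parameter names vary between APIs.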

class scholar_flux.security.patterns.KeyMaskingPattern(name: str, pattern: str | SecretStr = '[A-Za-z0-9\\-_]+', field: str = <factory>, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: MaskingPattern

Masks values associated with specific keys/fields/parameters in text and API requests.

The KeyMaskingPattern identifies fields in dumped JSON-formatted data that are commonly prepared in the creation of request URLs. After identifying the assigned fixed-string field or key in a request URL or string-formatted dictionary, the pattern conditionally masks its associated value using a fixed or regular expression pattern.

By default, the masking pattern is set to filter string combinations of dashes and alphanumeric fields that are commonly observed in API keys, secrets, etc. The pattern parameter can be overridden to identify sensitive text such as birthdays, combinations of digits, and addresses using regular expressions.

name

The name to be associated with a particular pattern. Facilitates the identification and retrieval of pattern masks by name/category in later steps.

Type:: str

field

The fixed field string to look for when determining whether to mask a specific parameter.

Type:: str

pattern

The pattern that will be used to identify and mask sensitive fields when its corresponding field/JSON key has been located.

Type:: str

replacement

Indicates the replacement string for the value in the key-value pair if matched (’***’ by default)

Type:: str

use_regex

Indicates whether the current function should use regular expressions

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default).

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

__init__(*args: Any, **kwargs: Any) → None

apply_masking(text: str) → str[source]

Uses the defined settings to remove sensitive fields from text based on the attributes specified for field, pattern, replacement, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields
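As a rough illustration of fixed-key masking, the sketch below masks the value that follows a literal field name in a URL query string or a JSON-like string. `mask_key_value` is a hypothetical helper written for this example; the separators and quoting it recognises are assumptions, and the real KeyMaskingPattern may build its expression differently.

```python
import re


def mask_key_value(
    text: str,
    field: str,
    pattern: str = r"[A-Za-z0-9\-_]+",
    replacement: str = "***",
    ignore_case: bool = True,
) -> str:
    """Mask the value that follows a fixed field name."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the field a literal match rather than a regular expression.
    regex = re.compile(
        rf"""(['"]?{re.escape(field)}['"]?\s*[=:]\s*['"]?)({pattern})""", flags
    )
    # Keep the key and separator (group 1); swap only the value for the mask.
    return regex.sub(lambda m: m.group(1) + replacement, text)


masked_query = mask_key_value(
    "https://api.example.org/search?q=ml&api_key=abc-123_XYZ&page=2", "api_key"
)
masked_json = mask_key_value('{"API_KEY": "abc123", "page": 1}', "api_key")
```

Only the value attached to the named field is replaced, so unrelated parameters such as `q` and `page` are left untouched.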

field: str = FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])

ignore_case: bool = True

mask_pattern: bool = True

name: str

pattern: str | SecretStr = '[A-Za-z0-9\\-_]+'

replacement: str = '***'

use_regex: bool = True

class scholar_flux.security.patterns.MaskingPattern(name: str, pattern: str | SecretStr)[source]

Bases: ABC

The base class for creating MaskingPattern objects that can be used to mask fields based on defined rules.

__init__(*args: Any, **kwargs: Any) → None

abstract apply_masking(text: str) → str[source]: Base method that will be overridden by subclasses to later remove sensitive values and fields from text.

name: str

pattern: str | SecretStr

class scholar_flux.security.patterns.MaskingPatternSet[source]

Bases: set[MaskingPattern]

Defines the subclass of a set that implements type safety to ensure that only subclasses of MaskingPatterns can be added.

As a result, robustness increases, and the likelihood of unexpected runtime errors from the use of incorrect types decreases when using the scholar_flux API for response retrieval and sensitive pattern masking.

__init__()[source]: Initializes the MaskingPatternSet as an empty set.

add(item: MaskingPattern) → None[source]: Overrides the basic add method to ensure that each item is type-checked prior to entering the set.

update(*others: Iterable[MaskingPattern]) → None[source]: Overrides the basic update method to ensure that all items are type-checked prior to entering the set.
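A type-checked set of this kind can be sketched as follows. `SketchBasePattern`, `SecretWordMask`, and `SketchPatternSet` are hypothetical names used only for illustration; the validation strategy (checking every item before mutating the set) is an assumption rather than the documented behaviour of MaskingPatternSet.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class SketchBasePattern(ABC):
    """Hypothetical base class standing in for MaskingPattern."""

    @abstractmethod
    def apply_masking(self, text: str) -> str: ...


class SecretWordMask(SketchBasePattern):
    def apply_masking(self, text: str) -> str:
        return text.replace("SECRET", "***")


class SketchPatternSet(set):
    """Illustrative set that rejects anything that is not a pattern."""

    @staticmethod
    def _validate(item: Any) -> None:
        if not isinstance(item, SketchBasePattern):
            raise TypeError(f"Expected a pattern, received {type(item).__name__}")

    def add(self, item: SketchBasePattern) -> None:
        self._validate(item)
        super().add(item)

    def update(self, *others: Iterable[SketchBasePattern]) -> None:
        # Materialize and validate everything first so that a bad item
        # leaves the set unchanged.
        groups = [list(other) for other in others]
        for group in groups:
            for item in group:
                self._validate(item)
        super().update(*groups)


pattern_set = SketchPatternSet()
pattern_set.add(SecretWordMask())

try:
    pattern_set.add("not a pattern")
    rejected = False
except TypeError:
    rejected = True
```

Failing early at insertion time surfaces a wrong type where it is introduced instead of later, when the masker iterates over the set.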

class scholar_flux.security.patterns.StringMaskingPattern(name: str, pattern: str | SecretStr, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: MaskingPattern

Masks values associated with a particular pattern or fixed string in text and API requests.

name

The name to be associated with a particular pattern - can help in later identification and retrieval of rules associated with pattern masks of a particular category.

Type:: str

pattern

The pattern to use to remove sensitive fields, contingent on a parameter being defined. By default, the pattern is set to allow for the removal of dashes and alphanumeric fields but can be overridden based on API-specific requirements.

Type:: str

replacement

Indicates the replacement string for the value in the string if matched (’***’ by default)

Type:: str

use_regex

Indicates whether the current function should use regular expressions

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default).

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

__init__(*args: Any, **kwargs: Any) → None

apply_masking(text: str) → str[source]

Uses the defined settings in order to remove sensitive fields from text based on the attributes specified for pattern, replacement, use_regex, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields

Returns:: The text after scrubbing sensitive fields
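The difference between fixed-string and regular-expression matching can be shown with a short standard-library sketch. `mask_string` is a hypothetical helper, not the library's implementation; it assumes that fixed matching is equivalent to escaping the pattern before it is compiled.

```python
import re


def mask_string(
    text: str,
    pattern: str,
    replacement: str = "***",
    use_regex: bool = True,
    ignore_case: bool = True,
) -> str:
    """Replace every occurrence of `pattern` in `text` with `replacement`."""
    flags = re.IGNORECASE if ignore_case else 0
    # For fixed matching, escape metacharacters so the pattern is matched literally.
    source = pattern if use_regex else re.escape(pattern)
    # A callable replacement keeps any backslashes in `replacement` literal.
    return re.sub(source, lambda _match: replacement, text, flags=flags)


fixed_masked = mask_string("price is $5.00 (final)", "$5.00", use_regex=False)
regex_masked = mask_string(
    "Contact: Jane@Example.com", r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]+"
)
```

Fixed matching is the safer choice for literal secrets such as API keys, since characters like `$`, `.`, or `+` in the secret would otherwise be read as regular-expression syntax.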

ignore_case: bool = True

mask_pattern: bool = True

name: str

pattern: str | SecretStr

replacement: str = '***'

use_regex: bool = True

scholar_flux.security.utils module

The scholar_flux.security.utils module defines the SecretUtils class that implements the basic set of tools for both masking and unmasking text, and identifying if a field is masked.

This class uses the pydantic.SecretStr class to mask and unmask fields and can be further extended to encrypt and decrypt text as needed before and after conversion to a secret string, respectively.

class scholar_flux.security.utils.SecretUtils[source]

Bases: object

Helper utility for both masking and unmasking strings.

Class methods are defined so that they can be used directly or implemented as a mixin so that subclasses can implement the class methods directly.

classmethod is_secret(obj: Any) → bool[source]

Utility class method used to verify whether the current variable is a secret string. This method abstracts the implementation details into a single method to aid further extensibility.

Parameters:: obj (Any) – The object to check
Returns:: True if the object is a SecretStr, False otherwise
Return type:: bool

classmethod mask_secret(obj: Any) → SecretStr | None[source]

Helper method for masking variables as secret strings.

Parameters:: obj (Any | SecretStr) – An object to mask as a secret string if it is not one already
Returns:: A SecretStr representation of the original object, or None if the object is None
Return type:: SecretStr | None

Examples

>>> from pydantic import SecretStr
>>> from scholar_flux.security import SecretUtils
>>> string = 'a secret'
>>> secret_string = SecretUtils.mask_secret(string)
>>> isinstance(secret_string, SecretStr) is True
# OUTPUT: True

>>> no_string = None
>>> non_secret = SecretUtils.mask_secret(no_string)
>>> non_secret is None
# OUTPUT: True

classmethod unmask_secret(obj: Any) → Any[source]

Helper method for unmasking a variable from a SecretStr into its native type if a secret string.

Parameters:: obj (Any | SecretStr) – An object to attempt to unmask if it is a secret string
Returns:: The object’s original type before being converted into a secret string
Return type:: obj (Any)

Examples

>>> from pydantic import SecretStr
>>> from scholar_flux.security import SecretUtils
>>> string = 'a secret'
>>> secret_string = SecretUtils.mask_secret(string)
>>> isinstance(secret_string, SecretStr) is True
# OUTPUT: True
>>> SecretUtils.unmask_secret(secret_string) == string
# OUTPUT: True
>>> SecretUtils.unmask_secret(None) is None
# OUTPUT: True
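The mixin usage mentioned in the class description can be sketched as follows. Everything here is hypothetical: `SketchSecret` stands in for pydantic.SecretStr so that the example has no third-party dependencies, and `SketchSecretUtils` and `SketchClient` only imitate the is_secret/mask_secret/unmask_secret contract described above.

```python
from typing import Any, Dict, Optional


class SketchSecret:
    """Hypothetical stand-in for pydantic.SecretStr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "SketchSecret('**********')"


class SketchSecretUtils:
    """Illustrative helper whose class methods work directly or as a mixin."""

    @classmethod
    def is_secret(cls, obj: Any) -> bool:
        return isinstance(obj, SketchSecret)

    @classmethod
    def mask_secret(cls, obj: Any) -> Optional[SketchSecret]:
        if obj is None:
            return None  # missing values stay missing
        if cls.is_secret(obj):
            return obj  # already masked
        return SketchSecret(str(obj))

    @classmethod
    def unmask_secret(cls, obj: Any) -> Any:
        return obj.get_secret_value() if cls.is_secret(obj) else obj


class SketchClient(SketchSecretUtils):
    """Hypothetical client that inherits the helpers as a mixin."""

    def __init__(self, api_key: Optional[str]) -> None:
        self.api_key = self.mask_secret(api_key)

    def auth_header(self) -> Dict[str, str]:
        # Unmask only at the point where the raw value is actually needed.
        return {"Authorization": f"Bearer {self.unmask_secret(self.api_key)}"}


client = SketchClient("abc-123")
header = client.auth_header()
shown = repr(client.api_key)
```

Keeping the value wrapped until the moment of use means an accidental print or log of the client's attributes shows the placeholder instead of the key.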

Module contents

The scholar_flux.security module contains classes and models created specifically for ensuring that console and file logs do not contain sensitive data. The modules use pattern matching to ensure that any known API keys and other sensitive values sent with a request are filtered from the logs.

Core classes:

SecretUtils: Class with basic class methods for masking and unmasking non-missing strings with pydantic.SecretStr
MaskingPattern: Basic pattern from which all subclasses inherit in order to define rules for masking strings
KeyMaskingPattern: Matches key-value pairs for commonly sensitive fixed string fields (e.g. api_key, mailto)
FuzzyKeyMaskingPattern: Extends the KeyMaskingPattern for fuzzy field matching when parameter names may vary
StringMaskingPattern: Identifies and masks known sensitive strings using either regex or fixed string matching
MaskingFilter: Defines the core logging filter used by the dedicated scholar_flux.logger to hide sensitive info
MaskingPatternSet: Container that will hold a set of all String- and Key-based patterns used in the package
SensitiveDataMasker: Main entry point for managing/adding to/deleting from the list of all patterns to be filtered

Note that the global package level SensitiveDataMasker is instantiated on package loading and can be imported:

>>> from scholar_flux import masker
>>> print(masker) # view all currently masked strings and keys
# Output: "SensitiveDataMasker(patterns=MaskingPatternSet(...))"
# set up and remove all matching email-like strings
>>> email_pattern = r"[a-zA-Z0-9._%+-]+(@|%40)[a-zA-Z0-9.-]+[.][a-zA-Z]+"
>>> masker.add_sensitive_string_patterns(name="email_strings", patterns=email_pattern, use_regex=True)
>>> masker.mask_text("here_is_my_fake123@email.com")
# Output: "***"

class scholar_flux.security.FuzzyKeyMaskingPattern(name: str, pattern: str | SecretStr = '[A-Za-z0-9\\-_]+', field: str = <factory>, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: KeyMaskingPattern

A KeyMaskingPattern subclass that allows the field parameter to use regular-expression pattern matching.

name

The name to be associated with a particular pattern - can help in later identification and retrieval of rules associated with pattern masks of a particular category.

Type:: str

field

The regular expression field to look for when determining whether to mask a specific parameter.

Type:: str

pattern

The pattern to use to remove sensitive fields, contingent on a parameter being defined. By default, the pattern is set to allow for the removal of dashes and alphanumeric fields but can be overridden based on API-specific requirements.

Type:: str

replacement

Indicates the replacement string for the value in the key-value pair if matched (’***’ by default)

Type:: str

use_regex

Indicates whether the current function should use regular expressions

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default).

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

apply_masking(text: str) → str[source]

Uses fuzzy field matching to identify fields containing sensitive data in text.

This method is revised to account for circumstances where several fields might be present in the same text string using the | delimiter. The masker can be customized using the following fields: field, pattern, replacement, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields

class scholar_flux.security.KeyMaskingPattern(name: str, pattern: str | SecretStr = '[A-Za-z0-9\\-_]+', field: str = <factory>, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: MaskingPattern

Masks values associated with specific keys/fields/parameters in text and API requests.

The KeyMaskingPattern identifies fields in dumped JSON-formatted data that are commonly prepared in the creation of request URLs. After identifying the assigned fixed-string field or key in a request URL or string-formatted dictionary, the pattern conditionally masks its associated value using a fixed or regular expression pattern.

By default, the masking pattern is set to filter string combinations of dashes and alphanumeric fields that are commonly observed in API keys, secrets, etc. The pattern parameter can be overridden to identify sensitive text such as birthdays, combinations of digits, and addresses using regular expressions.

name

The name to be associated with a particular pattern. Facilitates the identification and retrieval of pattern masks by name/category in later steps.

Type:: str

field

The fixed field string to look for when determining whether to mask a specific parameter.

Type:: str

pattern

The pattern that will be used to identify and mask sensitive fields when its corresponding field/JSON key has been located.

Type:: str

replacement

Indicates the replacement string for the value in the key-value pair if matched (’***’ by default)

Type:: str

use_regex

Indicates whether the current function should use regular expressions

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default).

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

__init__(*args: Any, **kwargs: Any) → None

apply_masking(text: str) → str[source]

Uses the defined settings to remove sensitive fields from text based on the attributes specified for field, pattern, replacement, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields

field: str = FieldInfo(annotation=NoneType, required=True, metadata=[MinLen(min_length=1)])

ignore_case: bool = True

mask_pattern: bool = True

name: str

pattern: str | SecretStr = '[A-Za-z0-9\\-_]+'

replacement: str = '***'

use_regex: bool = True

class scholar_flux.security.MaskingFilter(masker: SensitiveDataMasker | None = None)[source]

Bases: Filter

Custom logging filter that adds masking to the logs. It uses the SensitiveDataMasker to enforce the rules detailing which fields should be masked before they are logged. By default, the SensitiveDataMasker masks API keys and email parameters in requests to APIs sent via scholar_flux.

Initialized MaskingFilters can also be updated without re-instantiation by adding or removing patterns from the set of all patterns within the received masker.

This class can also be added to other loggers with minimal effort:

>>> import logging
>>> from scholar_flux.security import MaskingFilter # contains the filter
>>> formatting = "%(name)s - %(levelname)s - %(message)s" # custom format
>>> logger = logging.getLogger('security_logger') # gets or creates a new logger
>>> logging.basicConfig(level=logging.DEBUG, format=formatting) # set the level and formatting for the log
>>> logging_filter = MaskingFilter() # creating a new filter
>>> logger.addFilter(logging_filter) # adding the filter and formatting rules
>>> logger.info("The following api key should be filtered: API_KEY='an_api_key_that_needs_to_be_filtered'")
# OUTPUT: security_logger - INFO - The following api key should be filtered: API_KEY='***'

__init__(masker: SensitiveDataMasker | None = None)[source]

By default, this implementation is applied in the initialization of the package in scholar_flux.__init__ on import, so this class does not need to be applied directly.

Parameters:: masker (Optional[SensitiveDataMasker]) – The actual implementation responsible for masking text matching patterns. Note that if a masker is not passed, the MaskingFilter will initialize a masker directly.

filter(record) → bool[source]

Helper method used by the logging.Logger class when adding custom filters to the logging module.

This method always returns True after an attempt to mask sensitive fields is completed.

class scholar_flux.security.MaskingPattern(name: str, pattern: str | SecretStr)[source]

Bases: ABC

The base class for creating MaskingPattern objects that can be used to mask fields based on defined rules.

__init__(*args: Any, **kwargs: Any) → None

abstract apply_masking(text: str) → str[source]: Base method that will be overridden by subclasses to later remove sensitive values and fields from text.

name: str

pattern: str | SecretStr

class scholar_flux.security.MaskingPatternSet[source]

Bases: set[MaskingPattern]

Defines the subclass of a set that implements type safety to ensure that only subclasses of MaskingPatterns can be added.

As a result, robustness increases, and the likelihood of unexpected runtime errors from the use of incorrect types decreases when using the scholar_flux API for response retrieval and sensitive pattern masking.

__init__()[source]: Initializes the MaskingPatternSet as an empty set.

add(item: MaskingPattern) → None[source]: Overrides the basic add method to ensure that each item is type-checked prior to entering the set.

update(*others: Iterable[MaskingPattern]) → None[source]: Overrides the basic update method to ensure that all items are type-checked prior to entering the set.

class scholar_flux.security.SecretUtils[source]

Bases: object

Helper utility for both masking and unmasking strings.

Class methods are defined so that they can be used directly or implemented as a mixin so that subclasses can implement the class methods directly.

classmethod is_secret(obj: Any) → bool[source]

Utility class method used to verify whether the current variable is a secret string. This method abstracts the implementation details into a single method to aid further extensibility.

Parameters:: obj (Any) – The object to check
Returns:: True if the object is a SecretStr, False otherwise
Return type:: bool

classmethod mask_secret(obj: Any) → SecretStr | None[source]

Helper method for masking variables as secret strings.

Parameters:: obj (Any | SecretStr) – An object to mask as a secret string
Returns:: A SecretStr representation of the original object, or None if the object is None
Return type:: SecretStr | None

Examples

>>> from pydantic import SecretStr # assumption: SecretStr refers to pydantic's SecretStr
>>> from scholar_flux.security import SecretUtils
>>> string = 'a secret'
>>> secret_string = SecretUtils.mask_secret(string)
>>> isinstance(secret_string, SecretStr) is True
# OUTPUT: True

>>> no_string = None
>>> non_secret = SecretUtils.mask_secret(no_string)
>>> non_secret is None
# OUTPUT: True

classmethod unmask_secret(obj: Any) → Any[source]

Helper method for unmasking a variable from a SecretStr into its native type if it is a secret string.

Parameters:: obj (Any | SecretStr) – An object to attempt to unmask if it is a secret string
Returns:: The object's original value before it was converted into a secret string
Return type:: Any

Examples

>>> from pydantic import SecretStr # assumption: SecretStr refers to pydantic's SecretStr
>>> from scholar_flux.security import SecretUtils
>>> string = 'a secret'
>>> secret_string = SecretUtils.mask_secret(string)
>>> isinstance(secret_string, SecretStr) is True
# OUTPUT: True
>>> SecretUtils.unmask_secret(secret_string) == string
# OUTPUT: True
>>> SecretUtils.unmask_secret(None) is None
# OUTPUT: True
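The mixin usage mentioned for SecretUtils can be illustrated without any third-party dependency. The sketch below is an assumption-laden stand-in: SketchSecret replaces SecretStr, SketchSecretUtils mirrors the three class methods, and ApiClientConfig is a hypothetical class that inherits them. Converting non-string inputs with str() is a choice made for this sketch only.

```python
from typing import Any, Optional


class SketchSecret:
    """Hypothetical stand-in for a secret string type with a masked repr."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "SketchSecret('**********')"


class SketchSecretUtils:
    """Illustrative helper exposing class methods for masking and unmasking."""

    @classmethod
    def is_secret(cls, obj: Any) -> bool:
        return isinstance(obj, SketchSecret)

    @classmethod
    def mask_secret(cls, obj: Any) -> Optional[SketchSecret]:
        if obj is None:
            return None
        # Values that are already secrets are returned unchanged.
        return obj if cls.is_secret(obj) else SketchSecret(str(obj))

    @classmethod
    def unmask_secret(cls, obj: Any) -> Any:
        # Non-secret values pass through untouched.
        return obj.get_secret_value() if cls.is_secret(obj) else obj


class ApiClientConfig(SketchSecretUtils):
    """Hypothetical class that inherits the helpers as a mixin."""

    def __init__(self, api_key: str) -> None:
        self.api_key = self.mask_secret(api_key)

    def auth_header(self) -> dict:
        # The secret is only unmasked at the point of use.
        return {"Authorization": f"Bearer {self.unmask_secret(self.api_key)}"}


config = ApiClientConfig("sk-123456")
print(repr(config.api_key))
```

Keeping the value wrapped until auth_header is called means that printing or logging the config object does not reveal the key.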

class scholar_flux.security.SensitiveDataMasker(register_defaults: bool = True)[source]

Bases: object

The main interface used by the scholar_flux API for masking all text identified as sensitive.

This class is used by scholar_flux to ensure that all sensitive text sent to the scholar_flux.logger is masked.

The SensitiveDataMasker operates through the registration of patterns that identify the text to mask.

Components:

KeyMaskingPattern: Identifies specific keys and regex patterns that will signal text to filter.
StringMaskingPattern: Identifies strings to filter either by fixed or pattern matching.
MaskingPatternSet: A customized set accepting only subclasses of MaskingPatterns that specify the rules for filtering text of sensitive fields.

By default, this structure implements masking for email addresses, API keys, bearer tokens, etc. that are identified as sensitive parameters/secrets.

Parameters:: register_defaults (bool) – Determines whether or not to add the default patterns that filter API keys, email parameters, and auth bearers.

Examples

>>> from scholar_flux.security import SensitiveDataMasker # imports the class
>>> masker = SensitiveDataMasker(register_defaults = True) # initializes a masker with defaults
>>> masked = masker.mask_text("'API_KEY' = 'This_Should_Be_Masked_1234', email='a.secret.email@address.com'")
>>> print(masked)
# Output: "'API_KEY' = '***', email='***'"

>>> new_secret = "This string should be filtered"
# specifies a new secret to filter: regex matching is the default, so it is disabled here
>>> masker.add_sensitive_string_patterns(name='custom', patterns=new_secret, use_regex = False)
# applies the filter
>>> masked = masker.mask_text(f"The following string should be masked: {new_secret}")
>>> print(masked)
# Output: "The following string should be masked: ***"

__init__(register_defaults: bool = True)[source]

Initializes the SensitiveDataMasker for registering and applying different masking patterns, each with a name and pattern that will be scrubbed from text with the use of the mask_text method.

Parameters:: register_defaults (bool) – Indicates whether to register the default patterns for scrubbing emails, API keys, Authorization bearers, etc. from the text when applying self.mask_text.

self.patterns

The full set of patterns that will be applied when scrubbing text of sensitive fields using masking patterns.

Type:: Set[MaskingPattern]

add_pattern(pattern: MaskingPattern) → None[source]: Adds a pattern to the self.patterns attribute.

add_sensitive_key_patterns(name: str, fields: List[str] | str, fuzzy: bool = False, **kwargs) → None[source]

Adds patterns that identify potentially sensitive keys (fields) with the aim of filtering their values from logs.

The parameters provided to the method are used to create new key patterns.

Parameters:

name (str) – The name associated with the pattern (aids identification of patterns)
fields (List[str] | str) – The field or list of fields to search for and remove from logs.
pattern (str) – An optional parameter for filtering and removing sensitive fields that match a given pattern. By default, this is already set to remove API keys, which are typically denoted by alphanumeric fields.
fuzzy (bool) – If True, regular expressions are used to identify keys. Otherwise, fixed (field) key matching is used through the implementation of a basic KeyMaskingPattern.
**kwargs – Other fields, specifiable via additional keyword arguments that are passed to KeyMaskingPattern
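The difference between fixed and fuzzy key matching can be sketched with the re module. This is an illustration under stated assumptions, not the KeyMaskingPattern implementation: build_key_regex, mask_keys, VALUE_PATTERN, and QUOTE are hypothetical names, and the assumed value shape (letters, digits, dashes, underscores) is chosen only for the example.

```python
import re
from typing import List

VALUE_PATTERN = r"[A-Za-z0-9\-_]+"  # assumed shape of a secret value
QUOTE = "['\"]?"  # optional single or double quote around keys and values


def build_key_regex(fields: List[str], fuzzy: bool = False) -> "re.Pattern[str]":
    """Build a regex that captures `key=` or `key:` and matches the value after it."""
    if fuzzy:
        # Fuzzy: each field is treated as a regular expression for the key.
        keys = "|".join(f"(?:{field})" for field in fields)
    else:
        # Fixed: each field is matched literally.
        keys = "|".join(re.escape(field) for field in fields)
    return re.compile(
        rf"((?:{keys}){QUOTE}\s*[=:]\s*{QUOTE}){VALUE_PATTERN}",
        re.IGNORECASE,
    )


def mask_keys(text: str, fields: List[str], fuzzy: bool = False) -> str:
    """Replace the value that follows any matching key with '***'."""
    return build_key_regex(fields, fuzzy).sub(r"\g<1>***", text)


fixed = mask_keys("GET /search?api_key=abc-123&q=test", ["api_key"])
fuzzy = mask_keys("token_v2=xyz789 other=1", [r"token\w*"], fuzzy=True)
print(fixed)  # GET /search?api_key=***&q=test
print(fuzzy)  # token_v2=*** other=1
```

With fuzzy matching disabled, the field `token\w*` would be escaped and treated as a literal key name, so `token_v2` would be left untouched.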

add_sensitive_string_patterns(name: str, patterns: List[str] | str, **kwargs) → None[source]

Adds patterns that identify potentially sensitive strings with the aim of filtering them from logs.

The parameters provided to the method are used to create new string patterns.

Parameters:

name (str) – The name associated with the pattern (aids identification of patterns)
patterns (List[str] | str) – The pattern or list of patterns to search for and remove from logs
**kwargs – Other fields, specifiable via additional keyword arguments that are passed to StringMaskingPattern

clear() → None[source]

Clears the SensitiveDataMasker.patterns set of all previously registered MaskingPatterns including those that were registered by default.

The masker otherwise uses the available set of patterns to determine which text strings are masked when the mask_text method is called. Calling mask_text after clearing all MaskingPatterns from the current masker leaves all text unmasked and returns the input text as is.

get_patterns_by_name(name: str) → Set[MaskingPattern][source]: Get all patterns with a specific name.

classmethod is_secret(obj: Any) → bool[source]

Utility method for verifying whether the current value is a secret. This method delegates the type check to the SecretUtils helper class so that the implementation details can be modified in one place if special cases arise in the future.

Parameters:: obj (Any) – The object to check
Returns:: True if the object is a SecretStr, False otherwise
Return type:: bool

static mask_secret(obj: Any) → SecretStr | None[source]

Method for ensuring that any non-secret keys will be masked as secrets.

Parameters:: obj (Any) – An object to mask as a secret string
Returns:: A SecretStr representation of the original object, or None if the object is None
Return type:: SecretStr | None

mask_text(text: str) → str[source]

Public method for removing sensitive data from text/logs. Note that the data that is obfuscated depends on which patterns were previously defined in the SensitiveDataMasker. By default, this includes API keys, emails, and auth headers.

Parameters:: text (str) – the text to scrub of sensitive data
Returns:: the cleaned text that excludes sensitive fields

register_secret_if_exists(field: str, value: SecretStr | Any, name: str | None = None, use_regex: bool = False, ignore_case: bool = True) → bool[source]

Identifies fields already registered as secret strings and adds a relevant pattern for ensuring that the field, when unmasked for later use, doesn’t display in logs. Note that if the current field is not a SecretStr, the method will return False without modification or side-effects.

The parameters provided to the method are used to create new string patterns when a SecretStr is detected.

Parameters:

field (str) – The field, parameter, or key associated with the secret key
value (SecretStr | Any) – The value, if typed as a secret string, to be registered as a pattern
name (Optional[str]) – The name to add to identify the relevant pattern by within the pattern set. If not provided, defaults to the field name.
use_regex (bool) – Indicates whether the current function should use regular expressions when matching the pattern in text. Defaults to False.
ignore_case (bool) – Whether to ignore case when determining whether or not to filter a string. Defaults to True.

Returns:

If the value is a SecretStr, a string masking pattern is registered for the value and True is returned. If the value is not a SecretStr, False is returned and no side effects occur.

Return type:

bool

Example

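>>> from pydantic import SecretStr # assumption: SecretStr refers to pydantic's SecretStr
>>> from scholar_flux.security import SensitiveDataMasker
>>> masker = SensitiveDataMasker()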
>>> api_key = SecretStr("sk-123456")
>>> registered = masker.register_secret_if_exists("api_key", api_key)
>>> print(registered)  # True
>>> registered = masker.register_secret_if_exists("normal_field", "normal_value")
>>> print(registered)  # False

remove_pattern_by_name(name: str) → int[source]: Remove patterns by name, return count of removed patterns.
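Name-based retrieval and removal over a set of patterns can be sketched as below. NamedPattern and SketchRegistry are hypothetical stand-ins that only mirror the documented method names and return types; the real SensitiveDataMasker may store and compare patterns differently.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class NamedPattern:
    """Hypothetical hashable pattern record with a category name."""

    name: str
    pattern: str


class SketchRegistry:
    """Illustrative registry that groups patterns by their name."""

    def __init__(self) -> None:
        self.patterns: Set[NamedPattern] = set()

    def add_pattern(self, pattern: NamedPattern) -> None:
        self.patterns.add(pattern)

    def get_patterns_by_name(self, name: str) -> Set[NamedPattern]:
        return {pattern for pattern in self.patterns if pattern.name == name}

    def remove_pattern_by_name(self, name: str) -> int:
        # Collect the matches first, then remove them and report the count.
        matches = self.get_patterns_by_name(name)
        self.patterns -= matches
        return len(matches)

    def clear(self) -> None:
        self.patterns.clear()


registry = SketchRegistry()
registry.add_pattern(NamedPattern("custom", "first secret"))
registry.add_pattern(NamedPattern("custom", "second secret"))
registry.add_pattern(NamedPattern("api_key", "sk-123456"))

removed = registry.remove_pattern_by_name("custom")
remaining = len(registry.patterns)
print(removed, remaining)  # 2 1
```

Sharing one name across several patterns makes it possible to retire a whole category of rules in a single call.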

structure(flatten: bool = False, show_value_attributes: bool = False) → str[source]

Helper method for creating a string representation of the SensitiveDataMasker without overloading the representation with the specifics of each registered pattern.

By default, nested MaskingPatterns will not be shown.

static unmask_secret(obj: Any) → Any[source]

Method for ensuring that usable values can be successfully extracted from objects. If the current value is a secret string, this method will return the secret value from the object.

Parameters:: obj (Any) – An object to attempt to unmask if it is a secret string
Returns:: The object’s original type before being converted into a secret string
Return type:: obj (Any)

update(pattern: MaskingPattern | Set[MaskingPattern] | Set[KeyMaskingPattern] | Set[StringMaskingPattern] | MutableSequence[MaskingPattern | KeyMaskingPattern | StringMaskingPattern]) → None[source]: Adds a single pattern or a collection of patterns to the self.patterns attribute.

class scholar_flux.security.StringMaskingPattern(name: str, pattern: str | SecretStr, replacement: str = '***', use_regex: bool = True, ignore_case: bool = True, mask_pattern: bool = True)[source]

Bases: MaskingPattern

Masks values associated with a particular pattern or fixed string in text and API requests.

name

The name to be associated with a particular pattern - can help in later identification and retrieval of rules associated with pattern masks of a particular category.

Type:: str

pattern

The pattern to use to remove sensitive fields, contingent on a parameter being defined. By default, the pattern is set to allow for the removal of dashes and alphanumeric fields but can be overridden based on API-specific requirements.

Type:: str | SecretStr

replacement

Indicates the replacement string for the value in the string if matched ('***' by default)

Type:: str

use_regex

Indicates whether the pattern should be matched using regular expressions rather than as a fixed string (True by default)

Type:: bool

ignore_case

Whether to ignore case when determining whether or not to filter a string (True by default)

Type:: bool

mask_pattern

Indicates whether we should, by default, mask pattern strings that are registered in the MaskingPattern. This is True by default.

Type:: bool

__init__(*args: Any, **kwargs: Any) → None

apply_masking(text: str) → str[source]

Uses the defined settings in order to remove sensitive fields from text based on the attributes specified for pattern, replacement, use_regex, and ignore_case.

Parameters:: text (str) – The text to clean of sensitive fields
Returns:: The text after scrubbing sensitive fields
Return type:: str
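How the pattern, replacement, use_regex, and ignore_case settings could interact is sketched below with the re module. mask_string is a hypothetical function written for illustration; it reuses the documented parameter names and defaults but does not reproduce the StringMaskingPattern implementation.

```python
import re


def mask_string(
    text: str,
    pattern: str,
    replacement: str = "***",
    use_regex: bool = True,
    ignore_case: bool = True,
) -> str:
    """Replace every occurrence of the pattern in the text with the replacement."""
    flags = re.IGNORECASE if ignore_case else 0
    # Fixed-string matching is emulated by escaping regex metacharacters.
    expression = pattern if use_regex else re.escape(pattern)
    # A callable replacement keeps backslashes in the replacement literal.
    return re.sub(expression, lambda _match: replacement, text, flags=flags)


fixed = mask_string("token: a.b+c", "a.b+c", use_regex=False)
regex = mask_string(
    "user a.secret.email@address.com logged in",
    r"[\w.+-]+@[\w-]+\.[\w.-]+",
)
case_sensitive = mask_string("SECRET secret", "secret", use_regex=False, ignore_case=False)
case_insensitive = mask_string("SECRET secret", "secret", use_regex=False)

print(fixed)             # token: ***
print(regex)             # user *** logged in
print(case_sensitive)    # SECRET ***
print(case_insensitive)  # *** ***
```

Escaping the pattern when use_regex is False matters for secrets that contain characters such as `.` or `+`, which would otherwise be interpreted as regex operators.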

ignore_case: bool = True

mask_pattern: bool = True

name: str

pattern: str | SecretStr

replacement: str = '***'

use_regex: bool = True