Text Selectors

Text substring selectors for anchoring annotations.

Based on parts of the W3C Web Annotation Data Model.

class anchorpoint.textselectors.TextPositionSelector(**data)

Describes a textual segment by start and end positions.

Based on the Web Annotation Data Model Text Position Selector standard.

Parameters
  • start – The starting position of the segment of text. The first character in the full text is character position 0, and the character is included within the segment.

  • end – The end position of the segment of text. The character is not included within the segment.
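Because start is inclusive and end is exclusive, the pair behaves like a Python slice. The sketch below uses plain Python only and does not call the library; the helper name `select` is hypothetical.

```python
def select(text: str, start: int, end: int) -> str:
    """Return the characters from start (inclusive) to end (exclusive)."""
    return text[start:end]


passage = "Some text."
# Position 0 is "S", position 5 is "t", and position 10 is one past the final ".".
selected = select(passage, 5, 10)
print(selected)
```

With this convention the length of the selected segment is always end minus start.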

__add__(value)

Make a new selector covering the combined ranges of self and value.

Parameters
  • value – selector for another text interval

  • margin – allowable distance between two selectors that can still be added together

Return type

Union[TextPositionSelector, TextPositionSet, None]

Returns

a selector reflecting the combined range if possible, otherwise None

__and__(other)

Make a new selector covering the overlapping range of self and other.

Parameters

other (Union[TextPositionSelector, TextPositionSet, Range, RangeSet]) – selector for another text interval

Return type

Optional[TextPositionSelector]

Returns

a selector reflecting the combined range
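If `&` follows the usual set meaning of overlap, which the Optional return type suggests but this page does not state outright, the result can be pictured with plain (start, end) tuples. This is a sketch under that assumption; the helper name `overlap` is hypothetical and nothing here calls the library.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, end), with end exclusive


def overlap(a: Span, b: Span) -> Optional[Span]:
    """Return the span shared by a and b, or None if they share nothing."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


shared = overlap((5, 20), (15, 30))
disjoint = overlap((0, 5), (10, 15))
print(shared, disjoint)
```

Spans that merely touch, such as (0, 5) and (5, 10), share no characters under the half-open convention, so the sketch returns None for them.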

__ge__(other)

Return self>=value.

Return type

bool

__gt__(other)

Check if self is greater than other.

Parameters

other (Union[TextPositionSelector, TextPositionSet]) – selector for another text interval

Return type

bool

Returns

whether self is greater than other

__hash__ = None

__or__(other)

Make a new selector covering the combined ranges of self and other.

Parameters

other (Union[TextPositionSelector, TextPositionSet, Range, RangeSet]) – selector for another text interval

Return type

Union[TextPositionSelector, TextPositionSet]

Returns

a selector reflecting the combined range
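The two possible return types suggest that ranges which can be joined come back as one selector, and ranges that cannot come back as a set. A minimal sketch of that idea with plain tuples follows; the helper name `union` is hypothetical, and treating touching ranges as mergeable is an assumption.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), with end exclusive


def union(a: Span, b: Span) -> List[Span]:
    """Merge two spans into one if they overlap or touch, else keep both."""
    first, second = sorted([a, b])
    if second[0] <= first[1]:
        return [(first[0], max(first[1], second[1]))]
    return [first, second]


merged = union((5, 20), (15, 30))
separate = union((0, 5), (10, 15))
print(merged, separate)
```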

as_quote(text, left_margin=0, right_margin=0)

Make a quote selector, creating prefix and suffix from specified lengths of text.

Parameters
  • text (str) – the passage where an exact quotation needs to be located

  • left_margin (int) – number of characters to look backward to create TextQuoteSelector.prefix

  • right_margin (int) – number of characters to look forward to create TextQuoteSelector.suffix

Return type

TextQuoteSelector
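The margins can be read as slice widths on either side of the selected span. The following sketch shows that reading with plain strings; `quote_parts` is a hypothetical helper, and the library may treat whitespace or text boundaries differently.

```python
from typing import Tuple


def quote_parts(
    text: str, start: int, end: int, left_margin: int = 0, right_margin: int = 0
) -> Tuple[str, str, str]:
    """Return (prefix, exact, suffix) for the span start..end of text."""
    prefix_start = max(0, start - left_margin)  # do not run past the beginning
    prefix = text[prefix_start:start]
    exact = text[start:end]
    suffix = text[end : end + right_margin]  # slicing clips at the end of text
    return prefix, exact, suffix


passage = "process, system, method of operation, concept, principle"
parts = quote_parts(passage, 17, 36, left_margin=8, right_margin=9)
print(parts)
```

With both margins left at 0, the prefix and suffix are empty strings and only the exact quotation is kept.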

combine(other, text)

Make new selector combining ranges of self and other if it will fit in text.

difference(rng)

Apply Range difference method replacing RangeSet with TextPositionSet in return value.

Return type

Union[TextPositionSet, TextPositionSelector]

range()

Get the range of the text.

Return type

Range

rangeset()

Get the range set of the text.

Return type

RangeSet

select_text(text)

Get the quotation from text identified by start and end positions.

Return type

str

classmethod start_not_negative(v)

Check if start position is not negative.

Return type

bool

Returns

whether the start position is not negative

unique_quote_selector(text)

Add text to prefix and suffix as needed to make selector unique in the source text.

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

TextQuoteSelector
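One way to picture "as needed" is to widen the context around the span until the widened string occurs only once in the text. The sketch below grows both sides one character at a time; this growth strategy and the helper names are assumptions, not the library's documented algorithm.

```python
from typing import Tuple


def occurs_once(text: str, fragment: str) -> bool:
    """Return True if fragment appears exactly once, counting overlapping hits."""
    first = text.find(fragment)
    return first != -1 and text.find(fragment, first + 1) == -1


def unique_quote_parts(text: str, start: int, end: int) -> Tuple[str, str, str]:
    """Grow prefix and suffix until prefix + exact + suffix is unique in text."""
    left, right = start, end
    while not occurs_once(text, text[left:right]) and (left > 0 or right < len(text)):
        left = max(0, left - 1)
        right = min(len(text), right + 1)
    return text[left:start], text[start:end], text[end:right]


passage = "the cat saw the dog"
parts = unique_quote_parts(passage, 12, 15)  # the second "the"
print(parts)
```

A quotation that is already unique needs no context, so the sketch returns empty prefix and suffix strings in that case.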

verify_text_positions(text)

Verify that selector’s text positions exist in text.

Return type

None
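A check of this kind usually compares the positions with the length of the text. The sketch below is an assumption about what such a check might look like: the conditions tested, the helper name `verify_positions`, and the use of a local stand-in exception class are all illustrative.

```python
class TextSelectionError(Exception):
    """Local stand-in for the library's exception of the same name."""


def verify_positions(text: str, start: int, end: int) -> None:
    """Raise if the span start..end does not fit inside text."""
    if start < 0 or end > len(text):
        raise TextSelectionError(
            f"Span ({start}, {end}) does not fit in text of length {len(text)}"
        )


verify_positions("Some text.", 5, 10)  # fits, so nothing is raised

try:
    verify_positions("Some text.", 5, 50)
except TextSelectionError as error:
    message = str(error)
    print(message)
```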

class anchorpoint.textselectors.TextPositionSet(**data)

A set of TextPositionSelectors.

__add__(value)

Increase all startpoints and endpoints by the given amount.

Parameters

value (Union[int, TextPositionSelector, TextPositionSet]) – selector for another text interval, or integer to add to every start and end value in self’s position selectors

Return type

TextPositionSet

Returns

a selector reflecting the combined range if possible, otherwise None

__ge__(other)

Test if self’s rangeset includes all of other’s rangeset.

Return type

bool

__gt__(other)

Test if self’s rangeset includes all of other’s rangeset, but is not identical.

Return type

bool

__hash__ = None

__str__()

Return str(self).

__sub__(value)

Decrease all startpoints and endpoints by the given amount.

Return type

TextPositionSet
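Shifting every start and end by the same integer is useful when text is added or removed in front of the selected passages. The sketch below shows the arithmetic with plain tuples; the helper name `shift` and the sample text are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), with end exclusive


def shift(spans: List[Span], amount: int) -> List[Span]:
    """Move every span by amount characters (a negative amount moves left)."""
    return [(start + amount, end + amount) for start, end in spans]


section = "Sec. 1. The grass is wet."
spans = [(8, 11), (12, 17)]  # "The" and "grass"

# Dropping the 8-character heading "Sec. 1. " moves the same words 8 places earlier.
body = section[8:]
shifted = shift(spans, -8)
print([body[start:end] for start, end in shifted])
```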

add_margin(text, margin_width=3, margin_characters=', ."\\' ;[]()')

Expands selected position selectors to include margin of punctuation.

This can cause multiple selections to be merged into a single one.

Ignores quote selectors.

Parameters
  • text (str) – The text that passages are selected from

  • margin_width (int) – The width of the margin to add

  • margin_characters (str) – The characters to include in the margin

Return type

TextPositionSet

Returns

A new TextPositionSet with the margin added
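One reading of the merging behaviour, consistent with the example given for this method, is that two neighbouring selections are joined when the gap between them is at most margin_width characters long and contains only margin characters. The sketch below implements only that gap-bridging step with plain tuples; the helper name `bridge_gaps` and the `MARGIN_CHARACTERS` constant are assumptions, not the library's code or its default value.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), with end exclusive

# Assumed set of margin characters, chosen for this illustration only.
MARGIN_CHARACTERS = ", .\"' ;[]()"


def bridge_gaps(
    text: str,
    spans: List[Span],
    margin_width: int = 3,
    margin_characters: str = MARGIN_CHARACTERS,
) -> List[Span]:
    """Merge neighbouring spans separated only by a short run of margin characters."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = text[prev_end:start]  # empty when the spans touch or overlap
            if len(gap) <= margin_width and all(c in margin_characters for c in gap):
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


text = "I predict that the grass is wet. (It rained.)"
bridged = bridge_gaps(text, [(15, 31), (34, 43)])
print(bridged)
```

Here the gap between the two spans is the three characters ". (", so the spans are joined into one.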

>>> from anchorpoint.schemas import TextPositionSetFactory
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "I predict that the grass is wet. (It rained.)"
>>> factory = TextPositionSetFactory(text=text)
>>> selectors = [TextQuoteSelector(exact="the grass is wet"), TextQuoteSelector(exact="it rained")]
>>> position_set = factory.from_selection(selection=selectors)
>>> len(position_set.ranges())
2
>>> new_position_set = position_set.add_margin(text=text)
>>> len(new_position_set.ranges())
1
>>> new_position_set.ranges()[0].start
15
>>> new_position_set.ranges()[0].end
43

as_quotes(text)

Copy self’s quote and position selectors, converting all position selectors to quote selectors.

Return type

List[TextQuoteSelector]

as_string(text)

Return a string representing the selected parts of text.

>>> from anchorpoint.textselectors import TextPositionSelector, TextPositionSet
>>> selectors = [TextPositionSelector(start=5, end=10)]
>>> selector_set = TextPositionSet(positions=selectors)
>>> sequence = selector_set.as_text_sequence("Some text.")
>>> selector_set.as_string("Some text.")
'…text.'

Return type

str

as_text_sequence(text, include_nones=True)

List the phrases in a text passage selected by this TextPositionSet.

Parameters
  • text (str) – A passage to select text from

  • include_nones (bool) – Whether the list of phrases should include None to indicate a block of unselected text

Return type

TextSequence

Returns

A TextSequence of the phrases in the text
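The role of include_nones can be sketched with a plain list in place of a TextSequence: each selected phrase is listed in order, and None marks a block of text that no selector covers. The helper name `phrases` is hypothetical, and the sketch assumes the spans do not overlap.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end), with end exclusive


def phrases(
    text: str, spans: List[Span], include_nones: bool = True
) -> List[Optional[str]]:
    """List selected phrases in order, optionally marking unselected blocks with None."""
    result: List[Optional[str]] = []
    cursor = 0
    for start, end in sorted(spans):
        if include_nones and start > cursor:
            result.append(None)  # unselected text before this span
        result.append(text[start:end])
        cursor = end
    if include_nones and cursor < len(text):
        result.append(None)  # unselected text after the last span
    return result


with_nones = phrases("Some text.", [(5, 10)])
without_nones = phrases("Some text.", [(5, 10)], include_nones=False)
print(with_nones, without_nones)
```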

>>> from anchorpoint.textselectors import TextPositionSelector, TextPositionSet
>>> selectors = [TextPositionSelector(start=5, end=10)]
>>> selector_set = TextPositionSet(positions=selectors)
>>> selector_set.as_text_sequence("Some text.")
TextSequence([None, TextPassage("text.")])

convert_quotes_to_positions(text)

Return new TextPositionSet with all quotes replaced by their positions in the given text.

Return type

TextPositionSet

merge_rangeset(rangeset)

Merge another RangeSet into this one, returning a new TextPositionSet.

Parameters

rangeset (RangeSet) – the RangeSet to merge

Return type

TextPositionSet

Returns

a new TextPositionSet representing the combined ranges

classmethod order_of_selectors(v)

Ensure that selectors are in order.

positions_as_quotes(text)

Copy self’s position selectors, converted to quote selectors.

Return type

List[TextQuoteSelector]

classmethod quote_selectors_are_in_list(selectors)

Put single selector in list and convert strings to selectors.

select_text(text, margin_width=3, margin_characters=', ."\\' ;[]()')

Return the selected text from text.

Parameters
  • text (str) – The text that passages are selected from

  • margin_width (int) – The width of the margin to add

  • margin_characters (str) – The characters to include in the margin

Return type

str

Returns

The selected text

>>> from anchorpoint.schemas import TextPositionSetFactory
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "I predict that the grass is wet. (It rained.)"
>>> factory = TextPositionSetFactory(text=text)
>>> selectors = [TextQuoteSelector(exact="the grass is wet"), TextQuoteSelector(exact="it rained")]
>>> position_set = factory.from_selection(selection=selectors)
>>> position_set.select_text(text=text)
'the grass is wet. (It rained.)'

classmethod selectors_are_in_list(selectors)

Put single selector in list.

class anchorpoint.textselectors.TextPositionSetFactory(text)

Factory for constructing TextPositionSet from text passages and various kinds of selector.

__init__(text)

Store text passage that will be used to generate text selections.

from_bool(selection)

Select either the whole passage or none of it.

Return type

TextPositionSet

from_exact_strings(selection)

Construct TextPositionSet from a sequence of strings representing exact quotations.

First converts the sequence to TextQuoteSelectors, and then to TextPositionSelectors.

Return type

TextPositionSet
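The conversion from exact strings to positions can be pictured as a search for each string in the stored passage. The sketch below uses a case-sensitive search for the first occurrence only; both choices, and the helper name `positions_of`, are assumptions rather than the library's documented behaviour.

```python
from typing import List, Tuple


def positions_of(text: str, quotes: List[str]) -> List[Tuple[int, int]]:
    """Find the first occurrence of each quote and return its (start, end) span."""
    spans = []
    for quote in quotes:
        start = text.find(quote)
        if start == -1:
            raise ValueError(f"Quote not found: {quote!r}")
        spans.append((start, start + len(quote)))
    return sorted(spans)  # keep spans in the order they appear in the text


passage = "I predict that the grass is wet. (It rained.)"
spans = positions_of(passage, ["It rained", "the grass is wet"])
print(spans)
```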

from_quote_selectors(quotes)

Construct TextPositionSet from a sequence of TextQuoteSelectors.

Return type

TextPositionSet

from_selection(selection)

Construct TextPositionSet for a provided text passage, from any type of selector.

Return type

TextPositionSet

from_selection_sequence(selections)

Construct TextPositionSet from one or more of: strings, Quote Selectors, and Position Selectors.

First converts strings to TextQuoteSelectors, and then to TextPositionSelectors.

Return type

TextPositionSet

class anchorpoint.textselectors.TextQuoteSelector(**data)

Describes a textual segment by quoting it, or passages before or after it.

Based on the Web Annotation Data Model Text Quote Selector standard.

Parameters
  • exact – a copy of the text which is being selected

  • prefix – a snippet of text that occurs immediately before the text which is being selected

  • suffix – a snippet of text that occurs immediately after the text which is being selected

__hash__ = None

as_position(text)

Get the interval where the selected quote appears in “text”.

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

TextPositionSelector

Returns

the position selector for the location of the exact quotation
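Locating a quotation with its context can be sketched as one regular expression in which only the exact text is captured. The helper name `locate` and the choice to allow optional whitespace between the three parts are assumptions made for this illustration.

```python
import re
from typing import Optional, Tuple


def locate(
    text: str, exact: str, prefix: str = "", suffix: str = ""
) -> Optional[Tuple[int, int]]:
    """Return the span of exact where it follows prefix and precedes suffix."""
    pattern = (
        re.escape(prefix) + r"\s*(" + re.escape(exact) + r")\s*" + re.escape(suffix)
    )
    match = re.search(pattern, text)
    # Group 1 is the exact text, so its span excludes the prefix and suffix.
    return match.span(1) if match else None


passage = "process, system, method of operation, concept, principle"
span = locate(passage, "method of operation", prefix="system,")
print(span)
```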

as_unique_position(text)

Get the interval where the selected quote appears in “text”.

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

TextPositionSelector

Returns

the position selector for the location of the exact quotation

find_match(text)

Get the first match for the selector within a string.

Parameters

text (str) – text to search for a match to the selector

Return type

Optional[Match]

Returns

a regular expression match, or None

>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "process, system, method of operation, concept, principle"
>>> selector = TextQuoteSelector(exact="method of operation")
>>> selector.find_match(text)
<re.Match object; span=(17, 36), match='method of operation'>

classmethod from_text(text)

Create a selector from a text string.

“prefix” and “suffix” fields may be created by separating part of the text with a pipe character (“|”).

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

TextQuoteSelector

Returns

a selector for the location of the exact quotation

>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "process, system,|method of operation|, concept, principle"
>>> selector = TextQuoteSelector.from_text(text)
>>> selector.prefix
'process, system,'
>>> selector.exact
'method of operation'
>>> selector.suffix
', concept, principle'

is_unique_in(text)

Test if selector refers to exactly one passage in text.

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

bool

Returns

whether the passage appears exactly once
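A uniqueness test amounts to counting the places where the quotation starts. The sketch below counts overlapping occurrences with a lookahead pattern; it considers only the exact text, not any prefix or suffix, and the helper name `is_unique` is hypothetical.

```python
import re


def is_unique(text: str, exact: str) -> bool:
    """Return True if exact occurs exactly once in text, counting overlaps."""
    # A lookahead matches at every starting index, so overlapping hits are counted.
    pattern = "(?=" + re.escape(exact) + ")"
    return len(re.findall(pattern, text)) == 1


passage = "the cat saw the dog"
print(is_unique(passage, "cat"), is_unique(passage, "the"))
```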

classmethod no_none_for_prefix(value)

Ensure that ‘prefix’, ‘exact’, and ‘suffix’ are not None.

passage_regex()

Get regex to identify the selected text.

prefix_regex()

Get regex for the text before any whitespace and the selection.

rebuild_from_text(text)

Make new selector with the “exact” value found in a given text.

Used for building a complete selector when exact has not been specified.

Parameters

text (str) – the passage where an exact quotation needs to be located

Return type

Optional[TextQuoteSelector]

Returns

a new selector with the “exact” value found in the provided text

select_text(text)

Get the passage matching the selector, minus any whitespace at ends.

Parameters

text (str) – the passage where an exact quotation needs to be located.

Return type

Optional[str]

Returns

the passage between prefix and suffix in text.

>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "process, system, method of operation, concept, principle"
>>> selector = TextQuoteSelector(prefix="method of operation,")
>>> selector.select_text(text)
'concept, principle'

static split_anchor_text(text)

Break up shorthand text selector format into three fields.

Tries to break up the string into prefix, exact, and suffix, by splitting on exactly two pipe characters.

Parameters

text (str) – a string or dict representing a text passage

Return type

Tuple[str, …]

Returns

a tuple of the three values
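The shorthand format can be split with a single call to str.split. In the sketch below, raising an error when the string does not contain exactly two pipes is an assumption, since this page only says the method "tries" to split; the helper name `split_on_pipes` is hypothetical.

```python
from typing import Tuple


def split_on_pipes(text: str) -> Tuple[str, str, str]:
    """Split 'prefix|exact|suffix' into its three fields."""
    parts = text.split("|")
    if len(parts) != 3:
        raise ValueError("Expected exactly two pipe characters")
    prefix, exact, suffix = parts
    return prefix, exact, suffix


fields = split_on_pipes("process, system,|method of operation|, concept, principle")
print(fields)
```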

suffix_regex()

Get regex for the text following the selection and any whitespace.

exception anchorpoint.textselectors.TextSelectionError

Exception for failing to select text as described by user.
