Text Selectors
Text substring selectors for anchoring annotations.
Based on parts of the W3C Web Annotation Data Model.
- class anchorpoint.textselectors.TextPositionSelector(**data)
Describes a textual segment by start and end positions.
Based on the Web Annotation Data Model Text Position Selector standard.
- Parameters
start – The starting position of the segment of text. The first character in the full text is character position 0, and the character is included within the segment.
end – The end position of the segment of text. The character is not included within the segment.
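The start and end convention described above is the same half-open convention used by Python slicing. The following is a minimal standalone sketch in plain Python, not anchorpoint's own code; `slice_segment` is a hypothetical helper introduced only for illustration.

```python
def slice_segment(text: str, start: int, end: int) -> str:
    """Return the characters from start (inclusive) to end (exclusive)."""
    if start < 0:
        # Assumption: a negative start is rejected, mirroring the
        # start_not_negative validator listed for this class.
        raise ValueError("start must not be negative")
    return text[start:end]


passage = "Some text."
segment = slice_segment(passage, start=5, end=10)
print(segment)  # text.
```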
- __add__(value)
Make a new selector covering the combined ranges of self and other.
- Parameters
other – selector for another text interval
margin – allowable distance between two selectors that can still be added together
- Return type
- Returns
a selector reflecting the combined range if possible, otherwise None
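One way to read the margin rule is that two intervals can be combined only when the gap between them is no wider than the margin. The sketch below illustrates that reading with plain tuples; `add_intervals` is a hypothetical helper, and the exact margin semantics of the library may differ.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), with end exclusive


def add_intervals(a: Interval, b: Interval, margin: int = 0) -> Optional[Interval]:
    """Combine two intervals if the gap between them is at most `margin`."""
    first, second = sorted([a, b])
    gap = second[0] - first[1]  # zero or negative when they touch or overlap
    if gap > margin:
        return None  # too far apart to combine
    return (first[0], max(first[1], second[1]))


print(add_intervals((0, 5), (5, 10)))            # (0, 10)
print(add_intervals((0, 5), (7, 10)))            # None
print(add_intervals((0, 5), (7, 10), margin=2))  # (0, 10)
```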
- __and__(other)
Make a new selector covering the combined ranges of self and other.
- Parameters
other (Union[TextPositionSelector, TextPositionSet, Range, RangeSet]) – selector for another text interval
- Return type
- Returns
a selector reflecting the combined range
- __gt__(other)
Check if self is greater than other.
- Parameters
other (Union[TextPositionSelector, TextPositionSet]) – selector for another text interval
- Return type
- Returns
whether self is greater than other
- __hash__ = None
- __or__(other)
Make a new selector covering the combined ranges of self and other.
- Parameters
other (Union[TextPositionSelector, TextPositionSet, Range, RangeSet]) – selector for another text interval
- Return type
- Returns
a selector reflecting the combined range
- as_quote(text, left_margin=0, right_margin=0)
Make a quote selector, creating prefix and suffix from specified lengths of text.
- Parameters
- Return type
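To illustrate how a prefix and suffix could be cut from specified lengths of the surrounding text, here is a standalone sketch. `position_to_quote` is a hypothetical function that returns a plain dict; whether the library trims whitespace from the margins is not stated above, so this sketch does not trim.

```python
from typing import Dict


def position_to_quote(text: str, start: int, end: int,
                      left_margin: int = 0, right_margin: int = 0) -> Dict[str, str]:
    """Build prefix/exact/suffix strings around a half-open [start, end) span."""
    prefix_start = max(0, start - left_margin)  # do not run past the text start
    return {
        "prefix": text[prefix_start:start],
        "exact": text[start:end],
        "suffix": text[end:end + right_margin],
    }


text = "process, system, method of operation, concept, principle"
quote = position_to_quote(text, 17, 36, left_margin=8, right_margin=9)
print(quote["prefix"] + "|" + quote["exact"] + "|" + quote["suffix"])
```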
- combine(other, text)
Make new selector combining ranges of self and other if it will fit in text.
- difference(rng)
Apply Range difference method replacing RangeSet with TextPositionSet in return value.
- Return type
- range()
Get the range of the text.
- Return type
Range
- rangeset()
Get the range set of the text.
- Return type
RangeSet
- select_text(text)
Get the quotation from text identified by start and end positions.
- Return type
- classmethod start_not_negative(v)
Check if start position is not negative.
- Return type
- Returns
whether the start position is not negative
- class anchorpoint.textselectors.TextPositionSet(**data)
A set of TextPositionSelectors.
- __add__(value)
Increase all startpoints and endpoints by the given amount.
- Parameters
value (Union[int, TextPositionSelector, TextPositionSet]) – selector for another text interval, or integer to add to every start and end value in self’s position selectors
- Return type
- Returns
a selector reflecting the combined range if possible, otherwise None
- __gt__(other)
Test if self’s rangeset includes all of other’s rangeset, but is not identical.
- Return type
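The comparison described here behaves like a proper-superset test. A minimal sketch using Python's built-in sets of character indices follows; `covered_positions` and `is_strictly_greater` are hypothetical helpers, not part of anchorpoint.

```python
from typing import Iterable, Set, Tuple


def covered_positions(ranges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand half-open (start, end) ranges into the set of covered indices."""
    return {i for start, end in ranges for i in range(start, end)}


def is_strictly_greater(a, b) -> bool:
    """True when a covers everything b covers and the two are not identical."""
    # For Python sets, ">" is the proper-superset operator.
    return covered_positions(a) > covered_positions(b)


print(is_strictly_greater([(0, 10)], [(2, 4)]))   # True
print(is_strictly_greater([(0, 10)], [(0, 10)]))  # False
```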
- __hash__ = None
- __str__()
Return str(self).
- __sub__(value)
Decrease all startpoints and endpoints by the given amount.
- Return type
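Adding or subtracting an integer amounts to shifting every start and end by the same offset, which is useful when a passage moves within a longer document. The sketch below shows the idea on plain tuples; `shift_ranges` is hypothetical, and raising an error when a start would drop below zero is an assumption rather than documented behavior.

```python
from typing import List, Tuple


def shift_ranges(ranges: List[Tuple[int, int]], amount: int) -> List[Tuple[int, int]]:
    """Move every start and end by `amount` (use a negative amount to decrease)."""
    shifted = [(start + amount, end + amount) for start, end in ranges]
    if any(start < 0 for start, _ in shifted):
        # Assumption: negative positions are treated as invalid.
        raise IndexError("shift would move a start position below zero")
    return shifted


print(shift_ranges([(5, 10), (20, 25)], 3))   # [(8, 13), (23, 28)]
print(shift_ranges([(5, 10), (20, 25)], -5))  # [(0, 5), (15, 20)]
```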
- add_margin(text, margin_width=3, margin_characters=', ."\\' ;[]()')
Expands selected position selectors to include margin of punctuation.
This can cause multiple selections to be merged into a single one.
Ignores quote selectors.
- Parameters
- Return type
- Returns
A new TextPositionSet with the margin added
>>> from anchorpoint.schemas import TextPositionSetFactory
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "I predict that the grass is wet. (It rained.)"
>>> factory = TextPositionSetFactory(text=text)
>>> selectors = [TextQuoteSelector(exact="the grass is wet"), TextQuoteSelector(exact="it rained")]
>>> position_set = factory.from_selection(selection=selectors)
>>> len(position_set.ranges())
2
>>> new_position_set = position_set.add_margin(text=text)
>>> len(new_position_set.ranges())
1
>>> new_position_set.ranges()[0].start
15
>>> new_position_set.ranges()[0].end
43
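The merging effect in this example can be approximated in plain Python: two neighbouring selections are joined when the gap between them is short and consists only of margin characters. This is a rough sketch of that one effect, not the library's implementation; `merge_across_margins` and `MARGIN_CHARACTERS` are hypothetical names.

```python
from typing import List, Tuple

# The same characters as the default margin_characters shown in the signature.
MARGIN_CHARACTERS = ", .\"';[]()"


def merge_across_margins(text: str, ranges: List[Tuple[int, int]],
                         margin_width: int = 3,
                         margin_characters: str = MARGIN_CHARACTERS) -> List[Tuple[int, int]]:
    """Merge neighbouring ranges whose gap is short and made only of margin characters."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = text[prev_end:start]  # empty when the ranges touch or overlap
            if len(gap) <= margin_width and all(c in margin_characters for c in gap):
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


text = "I predict that the grass is wet. (It rained.)"
result = merge_across_margins(text, [(15, 31), (34, 43)])
print(result)  # [(15, 43)]
```

Here the gap `". ("` is three characters long and contains only margin characters, so the two selections become one.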
- as_quotes(text)
Copy self’s quote and position selectors, converting all position selectors to quote selectors.
- Return type
- as_string(text)
Return a string representing the selected parts of text.
>>> from anchorpoint.textselectors import TextPositionSelector, TextPositionSet
>>> selectors = [TextPositionSelector(start=5, end=10)]
>>> selector_set = TextPositionSet(positions=selectors)
>>> sequence = selector_set.as_text_sequence("Some text.")
>>> selector_set.as_string("Some text.")
'…text.'
- Return type
- as_text_sequence(text, include_nones=True)
List the phrases in a text passage selected by this TextPositionSet.
- Parameters
text – A passage to select text from
include_nones (bool) – Whether the list of phrases should include None to indicate a block of unselected text
- Return type
- Returns
A TextSequence of the phrases in the text
>>> from anchorpoint.textselectors import TextPositionSelector, TextPositionSet
>>> selectors = [TextPositionSelector(start=5, end=10)]
>>> selector_set = TextPositionSet(positions=selectors)
>>> selector_set.as_text_sequence("Some text.")
TextSequence([None, TextPassage("text.")])
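The role of `None` as a placeholder for unselected text can be sketched without the library. In the sketch below, `phrases` and `as_string` are hypothetical plain-Python stand-ins that work on lists and strings rather than TextSequence objects, and they assume the ranges do not overlap.

```python
from typing import List, Optional, Tuple


def phrases(text: str, ranges: List[Tuple[int, int]],
            include_nones: bool = True) -> List[Optional[str]]:
    """List selected phrases in order, with None standing in for unselected gaps."""
    result: List[Optional[str]] = []
    cursor = 0  # first position not yet accounted for
    for start, end in sorted(ranges):
        if include_nones and start > cursor:
            result.append(None)
        result.append(text[start:end])
        cursor = end
    if include_nones and cursor < len(text):
        result.append(None)  # unselected text after the last selection
    return result


def as_string(text: str, ranges: List[Tuple[int, int]]) -> str:
    """Join the phrases, showing each unselected gap as an ellipsis."""
    return "".join("…" if p is None else p for p in phrases(text, ranges))


print(phrases("Some text.", [(5, 10)]))    # [None, 'text.']
print(as_string("Some text.", [(5, 10)]))  # …text.
```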
- convert_quotes_to_positions(text)
Return new TextPositionSet with all quotes replaced by their positions in the given text.
- Return type
- merge_rangeset(rangeset)
Merge another RangeSet into this one, returning a new TextPositionSet.
- Parameters
rangeset (RangeSet) – the RangeSet to merge
- Return type
- Returns
a new TextPositionSet representing the combined ranges
- classmethod order_of_selectors(v)
Ensure that selectors are in order.
- positions_as_quotes(text)
Copy self’s position selectors, converted to quote selectors.
- Return type
- classmethod quote_selectors_are_in_list(selectors)
Put single selector in list and convert strings to selectors.
- select_text(text, margin_width=3, margin_characters=', ."\\' ;[]()')
Return the selected text from text.
- Parameters
- Return type
- Returns
The selected text
>>> from anchorpoint.schemas import TextPositionSetFactory
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "I predict that the grass is wet. (It rained.)"
>>> factory = TextPositionSetFactory(text=text)
>>> selectors = [TextQuoteSelector(exact="the grass is wet"), TextQuoteSelector(exact="it rained")]
>>> position_set = factory.from_selection(selection=selectors)
>>> position_set.select_text(text=text)
'the grass is wet. (It rained.)'
- classmethod selectors_are_in_list(selectors)
Put single selector in list.
- class anchorpoint.textselectors.TextPositionSetFactory(text)
Factory for constructing TextPositionSet from text passages and various kinds of selector.
- __init__(text)
Store text passage that will be used to generate text selections.
- __weakref__
list of weak references to the object (if defined)
- from_bool(selection)
Select either the whole passage or none of it.
- Return type
- from_exact_strings(selection)
Construct TextPositionSet from a sequence of strings representing exact quotations.
First converts the sequence to TextQuoteSelectors, and then to TextPositionSelectors.
- Return type
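The two-step conversion from strings to positions can be pictured as locating each exact quotation in the passage and recording where it starts and ends. The sketch below uses `str.find` on plain strings; `positions_from_exact_strings` is a hypothetical helper, and raising `ValueError` for a missing quotation is an assumption, not the library's documented behavior.

```python
from typing import List, Sequence, Tuple


def positions_from_exact_strings(text: str, quotes: Sequence[str]) -> List[Tuple[int, int]]:
    """Locate each exact quotation and return sorted half-open (start, end) pairs."""
    positions = []
    for quote in quotes:
        start = text.find(quote)  # first occurrence only
        if start == -1:
            # Assumption: a quotation that cannot be found is an error.
            raise ValueError(f"quotation not found in text: {quote!r}")
        positions.append((start, start + len(quote)))
    return sorted(positions)


text = "I predict that the grass is wet. (It rained.)"
found = positions_from_exact_strings(text, ["It rained", "the grass is wet"])
print(found)  # [(15, 31), (34, 43)]
```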
- from_quote_selectors(quotes)
Construct TextPositionSet from a sequence of TextQuoteSelectors.
- Return type
- from_selection(selection)
Construct TextPositionSet for a provided text passage, from any type of selector.
- Return type
- from_selection_sequence(selections)
Construct TextPositionSet from one or more of: strings, Quote Selectors, and Position Selectors.
First converts strings to TextQuoteSelectors, and then to TextPositionSelectors.
- Return type
- class anchorpoint.textselectors.TextQuoteSelector(**data)
Describes a textual segment by quoting it, or passages before or after it.
Based on the Web Annotation Data Model Text Quote Selector standard.
- Parameters
exact – a copy of the text which is being selected
prefix – a snippet of text that occurs immediately before the text which is being selected
suffix – a snippet of text that occurs immediately after the text which is being selected
- __hash__ = None
- as_position(text)
Get the interval where the selected quote appears in “text”.
- Parameters
text (str) – the passage where an exact quotation needs to be located
- Return type
- Returns
the position selector for the location of the exact quotation
- as_unique_position(text)
Get the interval where the selected quote appears in “text”.
- Parameters
text (str) – the passage where an exact quotation needs to be located
- Return type
- Returns
the position selector for the location of the exact quotation
- find_match(text)
Get the first match for the selector within a string.
- Parameters
text (str) – text to search for a match to the selector
- Return type
Optional[Match]
- Returns
a regular expression match, or None
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "process, system, method of operation, concept, principle"
>>> selector = TextQuoteSelector(exact="method of operation")
>>> selector.find_match(text)
<re.Match object; span=(17, 36), match='method of operation'>
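A quote selector of this kind can be matched with a regular expression assembled from its three fields. The sketch below is one plausible construction, not the library's actual pattern: `find_quote` is hypothetical, and the whitespace tolerance and case-insensitive matching are assumptions.

```python
import re
from typing import Optional


def find_quote(text: str, exact: str, prefix: str = "",
               suffix: str = "") -> Optional[re.Match]:
    """Search for prefix + exact + suffix, capturing `exact` as group 1."""
    pattern = (
        re.escape(prefix)
        + r"\s*(" + re.escape(exact) + r")\s*"  # allow whitespace around the quote
        + re.escape(suffix)
    )
    return re.search(pattern, text, flags=re.IGNORECASE)


text = "process, system, method of operation, concept, principle"
match = find_quote(text, exact="method of operation")
print(match.span(1))  # (17, 36)
```

Group 1 holds only the quoted text, so `match.span(1)` gives the start and end positions without the surrounding whitespace.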
- classmethod from_text(text)
Create a selector from a text string.
“prefix” and “suffix” fields may be created by separating part of the text with a pipe character (“|”).
- Parameters
text (str) – the passage where an exact quotation needs to be located
- Return type
- Returns
a selector for the location of the exact quotation
>>> text = "process, system,|method of operation|, concept, principle"
>>> selector = TextQuoteSelector.from_text(text)
>>> selector.prefix
'process, system,'
>>> selector.exact
'method of operation'
>>> selector.suffix
', concept, principle'
- is_unique_in(text)
Test if selector refers to exactly one passage in text.
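A uniqueness test like this can be thought of as counting matches and checking that the count is exactly one. The sketch below does that for an exact quotation only; `is_unique` is a hypothetical helper, and case-insensitive matching is an assumption.

```python
import re


def is_unique(text: str, exact: str) -> bool:
    """True when `exact` occurs exactly once in `text` (non-overlapping matches)."""
    matches = re.findall(re.escape(exact), text, flags=re.IGNORECASE)
    return len(matches) == 1


print(is_unique("the cat and the dog", "cat"))  # True
print(is_unique("the cat and the dog", "the"))  # False
```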
- classmethod no_none_for_prefix(value)
Ensure that ‘prefix’, ‘exact’, and ‘suffix’ are not None.
- passage_regex()
Get regex to identify the selected text.
- prefix_regex()
Get regex for the text before any whitespace and the selection.
- rebuild_from_text(text)
Make new selector with the “exact” value found in a given text.
Used for building a complete selector when exact has not been specified.
- Parameters
text (str) – the passage where an exact quotation needs to be located
- Return type
- Returns
a new selector with the “exact” value found in the provided text
- select_text(text)
Get the passage matching the selector, minus any whitespace at ends.
- Parameters
text (str) – the passage where an exact quotation needs to be located.
- Return type
- Returns
the passage between prefix and suffix in text.
>>> from anchorpoint.textselectors import TextQuoteSelector
>>> text = "process, system, method of operation, concept, principle"
>>> selector = TextQuoteSelector(prefix="method of operation,")
>>> selector.select_text(text)
'concept, principle'
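When only a prefix or suffix is given, the selected passage is whatever lies between them, with surrounding whitespace removed. The following sketch reproduces that behavior with plain string searching; `select_between` is a hypothetical helper, and returning `None` when a boundary is missing is an assumption.

```python
from typing import Optional


def select_between(text: str, prefix: str = "", suffix: str = "") -> Optional[str]:
    """Return the text after `prefix` and before `suffix`, stripped at both ends."""
    start = 0
    if prefix:
        found = text.find(prefix)
        if found == -1:
            return None  # prefix does not occur in the text
        start = found + len(prefix)
    end = len(text)
    if suffix:
        end = text.find(suffix, start)  # first suffix after the prefix
        if end == -1:
            return None
    return text[start:end].strip()


text = "process, system, method of operation, concept, principle"
print(select_between(text, prefix="method of operation,"))  # concept, principle
```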
- static split_anchor_text(text)
Break up shorthand text selector format into three fields.
Tries to break up the string into prefix, exact, and suffix, by splitting on exactly two pipe characters.
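The pipe shorthand can be illustrated with a small standalone function. `split_on_pipes` is a hypothetical name; treating text without pipes as all `exact`, and rejecting any other pipe count, are assumptions made for this sketch.

```python
from typing import Tuple


def split_on_pipes(text: str) -> Tuple[str, str, str]:
    """Split 'prefix|exact|suffix' shorthand into its three fields."""
    pipe_count = text.count("|")
    if pipe_count == 0:
        return ("", text, "")  # assumption: no pipes means the text is all `exact`
    if pipe_count != 2:
        raise ValueError("expected exactly two pipe characters, or none")
    prefix, exact, suffix = text.split("|")
    return (prefix, exact, suffix)


fields = split_on_pipes("process, system,|method of operation|, concept, principle")
print(fields)
```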
- suffix_regex()
Get regex for the text following the selection and any whitespace.