Title: | Manipulate Matrix Row and Column Labels with Ease |
---|---|
Description: | Functions to assist manipulation of matrix row and column labels for all types of matrix mathematics where row and column labels are to be respected. |
Authors: | Matthew Heun [aut, cre] |
Maintainer: | Matthew Heun <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.11 |
Built: | 2025-02-12 06:13:06 UTC |
Source: | https://github.com/matthewheun/rclabels |
A description of arrow notation.
arrow_notation
A vector of notational symbols that provides an arrow separator ("a -> b") between prefix and suffix.
arrow_notation
A description of bracket arrow notation.
bracket_arrow_notation
A vector of notational symbols that provides bracket arrow ("a [-> b]") notation.
bracket_arrow_notation
A description of bracket notation.
bracket_notation
A vector of notational symbols that provides bracket ("a [b]") notation.
bracket_notation
A description of dash notation.
dash_notation
A vector of notational symbols that provides a dash separator ("a - b") between prefix and suffix.
dash_notation
A description of first dot notation. Note that "a.b.c" splits into prefix ("a") and suffix ("b.c").
first_dot_notation
A vector of notational symbols that provides first dot ("a.b") notation.
first_dot_notation
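The first-dot rule described above ("a.b.c" splits into prefix "a" and suffix "b.c") can be illustrated with a small base R sketch. The helper `split_first_dot()` is hypothetical and is not part of RCLabels; it only shows the splitting rule, not how the package implements it.

```r
# Hypothetical helper (not part of RCLabels): split a label at its FIRST dot only.
split_first_dot <- function(label) {
  # Position of the first literal dot, or -1 if the label contains no dot.
  pos <- as.integer(regexpr(".", label, fixed = TRUE))
  if (pos == -1L) {
    return(c(pref = label, suff = ""))
  }
  c(pref = substr(label, 1, pos - 1),
    suff = substr(label, pos + 1, nchar(label)))
}

split_first_dot("a.b.c")  # pref is "a", suff is "b.c"
```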
A description of from notation.
from_notation
A vector of notational symbols that provides from ("a [from b]") notation.
from_notation
Nouns are the first part of a row-column label, "a" in "a [b]".
Internally, this function calls get_pref_suff(which = "pref").
get_nouns( labels, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = TRUE )
labels |
A list or vector of labels from which nouns are to be extracted. |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting nouns.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
A list of nouns from row and column labels.
get_nouns("a [b]", notation = bracket_notation)
# Also works with vectors and lists.
get_nouns(c("a [b]", "c [d]"))
get_nouns(list("a [b]", "c [d]"))
This function extracts the objects of prepositional phrases from row and column labels.
The format of the output is a list of named items, one name for each preposition encountered in labels.
Objects are NA if there is no prepositional phrase starting with that preposition.
get_objects( labels, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE, prepositions = RCLabels::prepositions_list )
labels |
The row and column labels from which prepositional phrases are to be extracted. |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositions.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
prepositions |
A vector of strings to be treated as prepositions.
Note that a space is appended to each word internally,
so, e.g., "to" becomes "to ".
Default is |
A list of objects of prepositional phrases, with names being prepositions, and values being objects.
get_objects(c("a [of b into c]", "d [of Coal from e -> f]"))
This is a wrapper function for get_pref_suff(), get_nouns(), and get_objects().
It returns a piece of a row or column label.
get_piece( labels, piece = "all", inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE, prepositions = RCLabels::prepositions_list )
labels |
The row and column labels from which prepositional phrases are to be extracted. |
piece |
The name of the item to return. |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositions.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
prepositions |
A vector of strings to be treated as prepositions.
Note that a space is appended to each word internally,
so, e.g., "to" becomes "to ".
Default is |
piece is typically one of
"all" (which returns labels directly),
"pref" (for the prefixes),
"suff" (for the suffixes),
"noun" (returns the noun),
"pps" (prepositional phrases, returns prepositional phrases in full),
"prepositions" (returns a list of prepositions),
"objects" (returns a list of objects with prepositions as names), or
a preposition in prepositions (as a string), which will return the object of that preposition named by the preposition itself.
piece must be a character vector of length 1.
If a piece is missing in a label, "" (empty string) is returned.
If specifying more than one notation, be sure the notations are in a list.
notation = c(RCLabels::bracket_notation, RCLabels::arrow_notation) is unlikely to produce the desired result, because the notations are concatenated together to form a long string vector.
Rather say notation = list(RCLabels::bracket_notation, RCLabels::arrow_notation).
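The difference between c() and list() noted above can be seen with plain base R. The two vectors below are hypothetical stand-ins for notation objects; their element names follow the pref_start, pref_end, suff_start and suff_end elements mentioned for infer_notation(), but the values are assumptions chosen for illustration, not copied from RCLabels.

```r
# Hypothetical stand-ins for two notation vectors (values are assumptions).
bracket_like <- c(pref_start = "", pref_end = " [", suff_start = " [", suff_end = "]")
arrow_like   <- c(pref_start = "", pref_end = " -> ", suff_start = " -> ", suff_end = "")

# c() flattens both notations into one long character vector.
combined_c <- c(bracket_like, arrow_like)
length(combined_c)     # 8 strings, no longer two separate notations

# list() keeps each notation as its own element.
combined_list <- list(bracket_like, arrow_like)
length(combined_list)  # 2 notations
```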
A piece of labels.
labs <- c("a [from b in c]", "d [of e in f]", "Export [of Coal from USA to MEX]")
get_piece(labs, "pref")
get_piece(labs, "suff")
get_piece(labs, piece = "noun")
get_piece(labs, piece = "pps")
get_piece(labs, piece = "prepositions")
get_piece(labs, piece = "objects")
get_piece(labs, piece = "from")
get_piece(labs, piece = "in")
get_piece(labs, piece = "of")
get_piece(labs, piece = "to")
This function extracts prepositional phrases from suffixes of row and column labels of the form "a [preposition b]", where "preposition b" is the prepositional phrase.
get_pps( labels, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE, prepositions = RCLabels::prepositions_list )
labels |
A list or vector of labels from which prepositional phrases are to be extracted. |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositional phrases.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
prepositions |
A list of prepositions for which to search.
Default is |
All prepositional phrases in a suffix.
get_pps(c("a [in b]", "c [of d]"))
get_pps(c("a [of b in c]", "d [-> e of f]"))
This function extracts prepositions from a list of row and column labels. The list has outer structure of the number of labels and an inner structure of each prepositional phrase in the specific label.
get_prepositions( labels, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE, prepositions = RCLabels::prepositions_list )
labels |
The row and column labels from which prepositional phrases are to be extracted. |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositions.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
prepositions |
A vector of strings to be treated as prepositions.
Note that a space is appended to each word internally,
so, e.g., "to" becomes "to ".
Default is |
If labels are in the form of from_notation, to_notation or similar, it is probably best to give bracket_notation in the notation argument.
Providing from_notation, to_notation or similar in the notation argument will lead to empty results.
The preposition is discarded when extracting the suffix, yielding empty strings for the prepositions.
A list of prepositions.
get_prepositions(c("a [of b into c]", "d [-> e of f]"))
get_prepositions(c("a [of b]", "d [-> e of f]"),
                 inf_notation = FALSE,
                 notation = bracket_notation)
# Best to *not* specify notation by the preposition,
# as the result will be empty strings.
# Rather, give the notation as `bracket_notation`
# as shown above, or infer the notation
# as shown below.
get_prepositions(c("a [of b]", "d [-> e of f]"),
                 inf_notation = TRUE)
# The suffix is extracted, and the preposition
# is lost before looking for the preposition.
get_prepositions(c("a [of b]", "d [of f]"),
                 inf_notation = FALSE,
                 notation = of_notation)
A description of in notation.
in_notation
A vector of notational symbols that provides in ("a [in b]") notation.
in_notation
It is convenient to know which notation is applicable to row or column labels.
This function infers which notations are appropriate for x.
infer_notation( x, inf_notation = TRUE, notations = RCLabels::notations_list, allow_multiple = FALSE, retain_names = FALSE, choose_most_specific = TRUE, must_succeed = TRUE )
x |
A row or column label (or vector of labels). |
inf_notation |
A boolean that tells whether to infer notation for |
notations |
A list of notations from which matches will be inferred.
This function might not work as expected if
|
allow_multiple |
A boolean that tells whether multiple notation matches
are allowed.
If |
retain_names |
A boolean that tells whether to retain names from |
choose_most_specific |
A boolean that indicates whether the most-specific notation
will be returned when more than one of |
must_succeed |
A boolean that if |
This function is vectorized.
Thus, x can be a vector, in which case the output is a list of notations.
notations is treated as a store from which matches for each label in x can be determined.
notations should be a named list of notations.
When retain_names = TRUE, the names on notations will be retained, and the return value is always a list.
By default (allow_multiple = FALSE), a single notation object is returned for each item in x if only one notation in notations is appropriate for x.
If allow_multiple = FALSE (the default) and more than one notation is applicable to x, an error is thrown.
Multiple matches can be returned when allow_multiple = TRUE.
If multiple notations are matched, the return value is a list.
When choose_most_specific = TRUE (the default), the most specific notation in notations is returned.
"Most specific" is defined as the matching notation whose sum of characters in the pref_start, pref_end, suff_start and suff_end elements is greatest.
If choose_most_specific = TRUE and two matching notations in notations have the same number of characters, only the first match is returned.
When choose_most_specific = TRUE, the value of allow_multiple no longer matters.
allow_multiple = FALSE is implied and at most one of the notations will be returned.
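The "most specific" rule above (largest total character count across pref_start, pref_end, suff_start and suff_end, with the first match winning ties) can be sketched in base R. The notation vectors and the helpers `specificity()` and `most_specific()` are hypothetical; the element values are assumptions for illustration, and this is not the package's own implementation.

```r
# Hypothetical stand-in notations; element values are assumptions.
bracket_like <- c(pref_start = "", pref_end = " [",
                  suff_start = " [", suff_end = "]")
from_like    <- c(pref_start = "", pref_end = " [from ",
                  suff_start = " [from ", suff_end = "]")
candidates <- list(bracket_like = bracket_like, from_like = from_like)

# Total number of characters in the four delimiter elements.
specificity <- function(notation) {
  elements <- c("pref_start", "pref_end", "suff_start", "suff_end")
  as.numeric(sum(nchar(notation[elements])))
}

# Keep the candidate with the greatest score.
# which.max() returns the first maximum, so ties go to the earlier notation.
most_specific <- function(notations) {
  scores <- vapply(notations, specificity, numeric(1))
  notations[which.max(scores)]
}

names(most_specific(candidates))  # "from_like"
```

Because ties resolve to the earlier entry, the order of the candidate list matters in this sketch.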
When inf_notation = FALSE (default is TRUE), notations are returned unmodified, essentially disabling this function.
Although calling with inf_notation = FALSE seems daft, this behavior enables cleaner code elsewhere.
A single notation object (if x is a single row or column label) or a list of notation objects (if x is a vector or a list).
If no notations match x, NULL is returned, either alone or in a list.
# Does not match any notations in RCLabels::notations_list
# and throws an error, because the default value for `must_succeed`
# is `TRUE`.
## Not run: 
infer_notation("abc")
## End(Not run)
# This returns `NULL`, because `must_succeed = FALSE`.
infer_notation("abc", must_succeed = FALSE)
# This succeeds, because the label is in the form of a
# notation in `RCLabels::notations_list`,
# the default value of the `notations` argument.
infer_notation("a -> b")
# Names of the notations can be retained, in which case
# the return value is always a list.
infer_notation("a -> b", retain_names = TRUE)
# This function is vectorized.
# The list of labels matches
# all known notations in `RCLabels::notations_list`.
infer_notation(c("a -> b", "a (b)", "a [b]", "a [from b]", "a [of b]",
                 "a [to b]", "a [in b]", "a [-> b]", "a.b"),
               retain_names = TRUE)
# By default, the most specific notation is returned.
# But when two or more matches are present,
# multiple notations can be returned, too.
infer_notation("a [from b]",
               allow_multiple = TRUE, retain_names = TRUE,
               choose_most_specific = FALSE)
infer_notation(c("a [from b]", "c [to d]"),
               allow_multiple = TRUE, retain_names = TRUE,
               choose_most_specific = FALSE)
# As shown above, "a [from b]" matches 2 notations:
# `RCLabels::bracket_notation` and `RCLabels::from_notation`.
# The default value for the `notations` argument is
# `RCLabels::notations_list`,
# which includes `RCLabels::bracket_notation`
# and `RCLabels::from_notation` in that order.
# Thus, there is some flexibility to how this function works
# if the value of the `notations` argument is a list of notations
# ordered from least specific to most specific,
# as `RCLabels::notations_list` is ordered.
# To review, the next call returns both `RCLabels::bracket_notation` and
# `RCLabels::from_notation`, because `allow_multiple = TRUE` and
# `choose_most_specific = FALSE`, neither of which are default.
infer_notation("a [from b]",
               allow_multiple = TRUE,
               choose_most_specific = FALSE,
               retain_names = TRUE)
# The next call returns `RCLabels::from_notation`, because
# the most specific notation is requested, and
# `RCLabels::from_notation` has more characters in its specification than
# `RCLabels::bracket_notation`.
infer_notation("a [from b]",
               choose_most_specific = TRUE,
               retain_names = TRUE)
# The next call returns the `RCLabels::bracket_notation`, because
# `choose_most_specific = FALSE`, and the first matching
# notation in `RCLabels::notations_list` is `RCLabels::bracket_notation`.
infer_notation("a [from b]",
               choose_most_specific = FALSE,
               retain_names = TRUE)
This is a non-public helper function for vectorized infer_notation().
infer_notation_for_one_label( x, inf_notation = TRUE, notations = RCLabels::notations_list, allow_multiple = FALSE, retain_names = FALSE, choose_most_specific = TRUE, must_succeed = TRUE )
x |
A single row or column label. |
inf_notation |
A boolean that tells whether to infer notation for |
notations |
A list of notations from which matches will be inferred.
This function might not work as expected if
|
allow_multiple |
A boolean that tells whether multiple notation matches
are allowed.
If |
retain_names |
A boolean that tells whether to retain names on the
outgoing matches.
Default is |
choose_most_specific |
A boolean that indicates if the most-specific notation
will be returned when more than one of |
must_succeed |
A boolean that if |
A single matching notation object (if allow_multiple = FALSE, the default) or possibly multiple matching notation objects (if allow_multiple = TRUE).
If no notations match x, NULL is returned.
Repeats x as necessary to make n of them.
Does not try to simplify x.
make_list(x, n, lenx = ifelse(is.vector(x), length(x), 1))
x |
The object to be duplicated. |
n |
The number of times to be duplicated. |
lenx |
The length of item |
If x is itself a vector or list, you may want to override the default value for lenx.
For example, if x is a list that should be duplicated several times, set lenx = 1.
A list of x duplicated n times.
m <- matrix(c(1:6), nrow = 3, dimnames = list(c("r1", "r2", "r3"), c("c2", "c1")))
make_list(m, n = 1)
make_list(m, n = 2)
make_list(m, n = 5)
make_list(list(c(1, 2), c(1, 2)), n = 4)
m <- matrix(1:4, nrow = 2)
l <- list(m, m + 100)
make_list(l, n = 4)
make_list(l, n = 1) # Warning because l is trimmed.
make_list(l, n = 5) # Warning because length(l) (i.e., 2) not evenly divisible by 5
make_list(list(c("r10", "r11"), c("c10", "c11")), n = 2) # Confused by x being a list
make_list(list(c("r10", "r11"), c("c10", "c11")), n = 2, lenx = 1) # Fix by setting lenx = 1
This function makes "or" regex patterns from vectors or lists of strings.
This function can be used with the matsbyname::select_rows_byname() and matsbyname::select_cols_byname() functions.
make_or_pattern() correctly escapes special characters in strings, such as ( and ), as needed.
Thus, it is highly recommended that make_or_pattern() be used when constructing patterns for row and column selections with matsbyname::select_rows_byname() and matsbyname::select_cols_byname().
make_or_pattern( strings, pattern_type = c("exact", "leading", "trailing", "anywhere", "literal") )
strings |
A vector of row and column names. |
pattern_type |
One of "exact", "leading", "trailing", "anywhere", or "literal". Default is "exact". |
pattern_type controls the type of pattern created:
exact produces a regex pattern that selects row or column names by exact match.
leading produces a regex pattern that selects row or column names if the item in strings matches the beginnings of row or column names.
trailing produces a regex pattern that selects row or column names if the item in strings matches the ends of row or column names.
anywhere produces a regex pattern that selects row or column names if the item in strings matches any substring of row or column names.
literal returns strings unmodified, and it is up to the caller to formulate a correct regex.
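As a rough illustration of the pattern types listed above, the following base R sketch builds comparable "or" patterns by escaping regex metacharacters, anchoring each string, and joining the alternatives with "|". The helpers `escape_regex()` and `or_pattern()` are hypothetical and may differ from what make_or_pattern() actually returns; the "literal" type is omitted because it involves no transformation.

```r
# Hypothetical helper: backslash-escape regex metacharacters (PCRE syntax).
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# Hypothetical helper: anchor each escaped string, then join with "|".
or_pattern <- function(strings,
                       type = c("exact", "leading", "trailing", "anywhere")) {
  type <- match.arg(type)
  esc <- escape_regex(strings)
  alternatives <- switch(type,
    exact    = paste0("^", esc, "$"),  # whole name must match
    leading  = paste0("^", esc),       # name must start with the string
    trailing = paste0(esc, "$"),       # name must end with the string
    anywhere = esc                     # string may appear anywhere
  )
  paste(alternatives, collapse = "|")
}

row_names <- c("a (b)", "c", "cc", "a b")
grepl(or_pattern(c("a (b)", "c"), "exact"), row_names)
grepl(or_pattern(c("a (b)", "c"), "leading"), row_names)
```

Escaping matters here because an unescaped "a (b)" would treat the parentheses as a regex group and would therefore match "a b" rather than "a (b)".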
An "or" regex pattern suitable for selecting row and column names.
Amenable for use with matsbyname::select_rows_byname() or matsbyname::select_cols_byname().
make_or_pattern(strings = c("a", "b"), pattern_type = "exact")
make_or_pattern(strings = c("a", "b"), pattern_type = "leading")
make_or_pattern(strings = c("a", "b"), pattern_type = "trailing")
make_or_pattern(strings = c("a", "b"), pattern_type = "anywhere")
make_or_pattern(strings = c("a", "b"), pattern_type = "literal")
Typical pieces include "noun" or a preposition, such as "in" or "from".
See RCLabels::prepositions for additional examples.
This argument may be a single string or a character vector.
modify_label_pieces( labels, piece, mod_map, prepositions = RCLabels::prepositions_list, inf_notation = TRUE, notation = RCLabels::bracket_notation, choose_most_specific = FALSE )
labels |
A vector of row or column labels in which pieces will be modified. |
piece |
The piece (or pieces) of the row or column label that will be modified. |
mod_map |
A modification map. See details. |
prepositions |
A list of prepositions, used to detect prepositional phrases.
Default is |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositions.
Default is |
choose_most_specific |
A boolean that tells whether the most specific
notation is selected when more than one notation match.
Default is |
This function modifies pieces of row and column labels according to mod_map, which defines "one or many to one" relationships.
This function is useful for aggregations.
For example, replacing nouns can be done by modify_label_pieces(labels, piece = "noun", mod_map = list(new_noun = c("a", "b", "c"))).
The string "new_noun" will replace any of "a", "b", or "c" when they appear as nouns in a row or column label.
See examples for details.
The mod_map argument should consist of a named list of character vectors in which names indicate strings to be inserted and values indicate values that should be replaced.
The sense is new = old or new = olds, where "new" is the new name (the replacement) and "old"/"olds" is/are a string/vector of strings, any one of which will be replaced by "new".
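The new = olds sense of a modification map can be sketched in base R on a plain character vector. The helper `apply_mod_map()` is hypothetical; it only illustrates the many-to-one replacement idea and is not how modify_label_pieces() is implemented, nor does it parse labels.

```r
# Hypothetical helper: replace any of the "olds" with the name of their entry.
apply_mod_map <- function(x, mod_map) {
  for (new in names(mod_map)) {
    x[x %in% mod_map[[new]]] <- new
  }
  x
}

nouns <- c("a", "d", "z")
mod_map <- list(new_noun1 = c("a", "b", "c"),
                new_noun2 = c("d", "e", "f"))
apply_mod_map(nouns, mod_map)  # "new_noun1" "new_noun2" "z"
```

Entries are applied in order in this sketch, so a replacement name that also appears among a later entry's old values would be replaced again.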
Note that piece can be "pref"/"suff" or "noun"/"prepositions".
If any piece is "pref" or "suff", all pieces are assumed to be a prefix or a suffix.
If none of the pieces are "pref" or "suff", all pieces are assumed to be nouns or prepositions, such as "in" or "from".
See RCLabels::prepositions for additional examples.
This argument may be a single string or a character vector.
labels with replacements according to piece and mod_map.
# Simple case
modify_label_pieces("a [of b in c]",
                    piece = "noun",
                    mod_map = list(new_noun = c("a", "b")))
# Works with a vector or list of labels
modify_label_pieces(c("a [of b in c]", "d [-> e in f]"),
                    piece = "noun",
                    mod_map = list(new_noun = c("d", "e")))
# Works with multiple items in the mod_map
modify_label_pieces(c("a [of b in c]", "d [-> e in f]"),
                    piece = "noun",
                    mod_map = list(new_noun1 = c("a", "b", "c"),
                                   new_noun2 = c("d", "e", "f")))
# Works with multiple pieces to be modified
modify_label_pieces(c("a [of b in c]", "d [-> e in f]"),
                    piece = c("noun", "in"),
                    mod_map = list(new_noun = c("a", "b", "c"),
                                   new_in = c("c", "f")))
This function modifies the nouns of row and column labels.
The length of new_nouns must be the same as the length of labels.
modify_nouns( labels, new_nouns, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE )
labels |
The row and column labels in which the nouns will be modified. |
new_nouns |
The new nouns to be set in |
inf_notation |
A boolean that tells whether to infer notation for |
notation |
The notation type to be used when extracting prepositions.
Default is |
choose_most_specific |
A boolean that tells whether to choose the most specific
notation from |
A character vector of the same length as labels with nouns modified to be new_nouns.
labels <- c("a [of b in c]", "d [of e in USA]")
modify_nouns(labels, c("a_plus", "g"))
A list of all bundled notations.
This list is organized by least specific to most specific, thereby enabling some unique behaviors in infer_notation().
See the examples for infer_notation().
notations_list
A list of bundled notations.
notations_list
A description of of notation.
of_notation
A vector of notational symbols that provides of ("a [of b]") notation.
of_notation
A description of parenthetical notation.
paren_notation
A vector of notational symbols that provides a parenthetical ("a (b)") notation.
paren_notation
This function recombines (unsplits) row or column labels that have been separated by split_noun_pp().
paste_noun_pp( splt_labels, notation = RCLabels::bracket_notation, squish = TRUE )
splt_labels |
A vector of split row or column labels, probably created by |
notation |
The notation object that describes the labels.
Default is |
squish |
A boolean that tells whether to remove extra spaces in the output of |
Recombined row and column labels.
labs <- c("a [of b in c]", "d [from Coal mines in USA]")
labs
split <- split_noun_pp(labs)
split
paste_noun_pp(split)
# Also works in a data frame
df <- tibble::tibble(labels = c("a [in b]", "c [of d into USA]",
                                "e [of f in g]", "h [-> i in j]"))
recombined <- df %>%
  dplyr::mutate(
    splits = split_noun_pp(labels),
    recombined = paste_noun_pp(splits)
  )
all(recombined$labels == recombined$recombined)
This constant is deprecated.
Please use prepositions_list instead.
prepositions
A vector of prepositions used in row and column labels.
Prepositions used in row and column labels.
prepositions_list
A vector of prepositions used in row and column labels.
prepositions_list
match_by_pattern() tells whether row or column labels match a regular expression.
Internally, grepl() decides whether a match occurs.
replace_by_pattern() replaces portions of row or column labels when a regular expression is matched.
Internally, gsub() performs the replacements.
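Since matching relies on grepl() and replacement on gsub(), the whole-label case can be previewed directly with those base R functions. This sketch covers only complete labels and does not reproduce the piece-wise handling that the pieces argument provides; the replacement string "Oil" is an arbitrary choice for illustration.

```r
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")

# Which complete labels start with "Production"?
matches <- grepl("^Production", labels)
matches   # TRUE FALSE FALSE

# Replace "Coal" wherever it occurs in a complete label.
replaced <- gsub("Coal", "Oil", labels)
replaced
```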
match_by_pattern( labels, regex_pattern, pieces = "all", prepositions = RCLabels::prepositions_list, notation = RCLabels::bracket_notation, inf_notation = TRUE, choose_most_specific = FALSE, ... )
replace_by_pattern( labels, regex_pattern, replacement, pieces = "all", prepositions = RCLabels::prepositions_list, notation = RCLabels::bracket_notation, ... )
labels | The row and column labels to be modified. |
regex_pattern | The regular expression pattern to determine matches and replacements. Consider using |
pieces | The pieces of row or column labels to be checked for matches or replacements. See details. |
prepositions | A vector of strings that count as prepositions. Default is prepositions_list. Used to detect prepositional phrases if |
notation | The notation used in |
inf_notation | A boolean that tells whether to infer notation for |
choose_most_specific | A boolean that tells whether to choose the most specific notation from |
... | Other arguments passed to |
replacement | For |
By default (pieces = "all"), complete labels (as strings) are checked for matches and replacements. If pieces == "pref" or pieces == "suff", only the prefix or the suffix is checked for matches and replacements. Alternatively, pieces = "noun" or pieces = <<preposition>> indicate that only specific pieces of labels are to be checked for matches and replacements. When pieces = <<preposition>>, only the object of <<preposition>> is checked for matches and replacements.
pieces can be a vector, indicating multiple pieces to be checked for matches and replacements. But if any of the pieces are "all", all pieces are checked and replaced. If pieces is "pref" or "suff", only one can be specified.
A logical vector of the same length as labels, where TRUE indicates a match was found and FALSE indicates otherwise.
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")
# With default `pieces` argument, matching is done for whole labels.
match_by_pattern(labels, regex_pattern = "Production")
match_by_pattern(labels, regex_pattern = "Coal")
match_by_pattern(labels, regex_pattern = "USA")
# Check beginnings of labels
match_by_pattern(labels, regex_pattern = "^Production")
# Check at ends of labels: no match.
match_by_pattern(labels, regex_pattern = "Production$")
# Can match on nouns or prepositions.
match_by_pattern(labels, regex_pattern = "Production", pieces = "noun")
# Gives FALSE, because "Production" is a noun.
match_by_pattern(labels, regex_pattern = "Production", pieces = "in")
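As a rough illustration of the default whole-label behaviour (pieces = "all"), the following base R sketch applies grepl() and gsub() directly to the label strings. It is not the package's implementation, and it ignores the pieces, prepositions, and notation arguments. The replacement string "Oil" and the variable names are chosen only for illustration.

```r
labels <- c("Production [of b in c]", "d [of Coal in f]", "g [of h in USA]")

# Whole-label matching: one logical value per label.
has_production <- grepl(pattern = "Production", x = labels)

# Anchors work as usual: no label ends with "Production".
ends_with_production <- grepl(pattern = "Production$", x = labels)

# Whole-label replacement: every occurrence of the pattern is replaced.
with_oil <- gsub(pattern = "Coal", replacement = "Oil", x = labels)

print(has_production)
print(ends_with_production)
print(with_oil)
```

Restricting the match to a noun or to the object of a preposition requires splitting the label first, which is what the pieces argument is for.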
This function removes pieces from row and column labels.
remove_label_pieces( labels, pieces_to_remove, prepositions = RCLabels::prepositions_list, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE )
labels | The row and column labels from which prepositional phrases will be removed. |
pieces_to_remove | The names of pieces of the label to be removed, typically "noun" or a preposition such as "of" or "in". See |
prepositions | A list of prepositions, used to detect prepositional phrases. Default is |
inf_notation | A boolean that tells whether to infer notation for |
notation | The notation type to be used when extracting prepositions. Default is |
choose_most_specific | A boolean that tells whether the most specific notation is selected when more than one notation matches. Default is |
labels with pieces removed.
labs <- c("a [of b in c]", "d [-> e in f]")
remove_label_pieces(labs, pieces_to_remove = "of")
remove_label_pieces(labs, pieces_to_remove = c("of", "->"))
remove_label_pieces(labs, pieces_to_remove = c("in", "into"))
remove_label_pieces(labs, pieces_to_remove = c("of", "in"))
It is often convenient to represent matrix row and column names with notation that includes a prefix and a suffix, with corresponding separators or start-end string sequences. Several functions generate specialized notations or otherwise manipulate such labels, whether the labels stand alone or serve as the row or column names of a matrix.
flip_pref_suff()
Switches the location of prefix and suffix, such that the prefix becomes the suffix, and the suffix becomes the prefix. E.g., "a -> b" becomes "b -> a" or "a [b]" becomes "b [a]".
get_pref_suff()
Selects only prefix or suffix, discarding notational elements and the rejected part. Internally, this function calls split_pref_suff() and selects only the desired portion.
notation_vec()
Builds a vector of notation symbols in a standard format. By default, it builds a vector of notation symbols that provides an arrow separator (" -> ") between prefix and suffix.
paste_pref_suff()
Pastes prefixes and suffixes together with paste0(); the inverse of split_pref_suff(). Always returns a character vector.
preposition_notation()
Builds a list of notation symbols that provides (by default) square brackets around the suffix with a preposition ("prefix [preposition suffix]").
split_pref_suff()
Splits prefixes from suffixes, returning each in a list with names pref and suff. If no prefix or suffix delimiters are found, x is returned in the pref item, unmodified, and the suff item is returned as "" (an empty string). If there is no prefix, an empty string is returned for the pref item. If there is no suffix, an empty string is returned for the suff item.
switch_notation()
Switches from one type of notation to another based on the from and to arguments. Optionally, prefix and suffix can be flipped (see the flip argument).
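The splitting rules described for split_pref_suff() can be sketched in base R for the simplest case, a single label and a single separator string. The helper split_on_sep() below is hypothetical and is not part of RCLabels; it handles neither bracket-style notations nor notation inference, and it only mirrors the documented fallback of returning the whole label as the prefix when no delimiter is found.

```r
# Hypothetical helper: split one label at the first occurrence of `sep`.
split_on_sep <- function(x, sep = " -> ") {
  # Position of the first literal (non-regex) match, or -1 if there is none.
  pos <- as.integer(regexpr(sep, x, fixed = TRUE))
  if (pos == -1L) {
    # No delimiter found: whole label is the prefix, suffix is empty.
    return(list(pref = x, suff = ""))
  }
  list(
    pref = substr(x, 1, pos - 1),
    suff = substr(x, pos + nchar(sep), nchar(x))
  )
}

arrow_split <- split_on_sep("a -> b")
dot_split <- split_on_sep("a.b.c", sep = ".")
no_delim <- split_on_sep("abc")
str(arrow_split)
str(dot_split)
str(no_delim)
```

Splitting at the first occurrence is what makes "a.b.c" yield the prefix "a" and the suffix "b.c" in this sketch.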
Parts of a notation vector are "pref_start", "pref_end", "suff_start", and "suff_end". None of the strings in a notation vector are considered part of the prefix or suffix. E.g., "a -> b" in arrow notation means that "a" is the prefix and "b" is the suffix.
If only sep is specified for notation_vec() (default is " -> "), pref_start, pref_end, suff_start, and suff_end are set appropriately.
For functions where the notation argument is used to identify portions of the row or column label (such as split_pref_suff(), get_pref_suff(), and the from argument to switch_notation()), if notation is a list, it is treated as a store from which the most appropriate notation is inferred by infer_notation(choose_most_specific = TRUE). Because the default is RCLabels::notations_list, notation is inferred by default. (Note: flip_pref_suff() cannot infer notation, because it switches prefix and suffix in a known, single notation.)
The argument choose_most_specific tells what to do when two notations match a label: if TRUE (the default), the notation with the most characters is selected. If FALSE, the first matching notation in notation will be selected. See details at infer_notation().
If specifying more than one notation, be sure the notations are in a list. notation = c(RCLabels::bracket_notation, RCLabels::arrow_notation) is unlikely to produce the desired result, because the notations are concatenated together to form a long string vector. Rather say notation = list(RCLabels::bracket_notation, RCLabels::arrow_notation).
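The difference between c() and list() here is plain base R behaviour and can be checked without the package. In the sketch below, bracket_like and arrow_like are hypothetical stand-ins for notation vectors; their names follow the four parts listed above, but their values are illustrative assumptions rather than the package's constants.

```r
# Hypothetical stand-ins for two notation vectors (values are illustrative only).
bracket_like <- c(pref_start = "", pref_end = " [", suff_start = " [", suff_end = "]")
arrow_like <- c(pref_start = "", pref_end = " -> ", suff_start = " -> ", suff_end = "")

# c() flattens both notations into one character vector of 8 strings.
combined_c <- c(bracket_like, arrow_like)

# list() keeps each notation as a separate 4-element item.
combined_list <- list(bracket_like, arrow_like)

print(length(combined_c))
print(length(combined_list))
```

Only the list form preserves the boundary between the two notations, which is what a function needs in order to try them one at a time.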
For functions that construct labels (such as paste_pref_suff()), notation can be a list of notations over which the paste task is mapped. If notation is a list, it must have as many items as there are prefix/suffix pairs to be pasted.
If either pref or suff is a zero-length character vector (essentially an empty character vector such as obtained from character()) when passed to paste_pref_suff(), an error is thrown. Instead, use an empty character string (such as obtained from "").
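The distinction between a zero-length character vector and an empty string is easy to overlook, so here is a short base R check. The helper or_empty() is hypothetical and not part of RCLabels; it merely sketches one way a caller might guard inputs before pasting.

```r
empty_vec <- character()  # zero-length character vector: contains no strings
empty_str <- ""           # one string that happens to have zero characters

print(length(empty_vec))
print(length(empty_str))
print(nchar(empty_str))

# Hypothetical guard: substitute "" when a zero-length vector is supplied.
or_empty <- function(x) if (length(x) == 0) "" else x

guarded <- or_empty(character())
unchanged <- or_empty("b")
```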
notation_vec(sep = " -> ", pref_start = "", pref_end = "", suff_start = "", suff_end = "")
preposition_notation(preposition, suff_start = " [", suff_end = "]")
split_pref_suff(x, transpose = FALSE, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = TRUE)
paste_pref_suff(ps = list(pref = pref, suff = suff), pref = NULL, suff = NULL, notation = RCLabels::arrow_notation, squish = TRUE)
flip_pref_suff(x, notation = RCLabels::notations_list, inf_notation = TRUE, choose_most_specific = TRUE)
get_pref_suff(x, which = c("pref", "suff"), inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = TRUE)
switch_notation(x, from = RCLabels::notations_list, to, flip = FALSE, inf_notation = TRUE)
sep | A string separator between prefix and suffix. Default is " -> ". |
pref_start | A string indicating the start of a prefix. Default is |
pref_end | A string indicating the end of a prefix. Default is the value of |
suff_start | A string indicating the start of a suffix. Default is the value of |
suff_end | A string indicating the end of a suffix. Default is |
preposition | A string used to indicate position for energy flows, typically "from" or "to" in different notations. |
x | A string or vector of strings to be operated upon. |
transpose | A boolean that tells whether to |
inf_notation | A boolean that tells whether to infer notation for |
notation | A notation vector generated by one of the |
choose_most_specific | A boolean that tells whether to choose the most specific notation from the |
ps | A list of prefixes and suffixes in which each item of the list is itself a list with two items named |
pref | A string or list of strings that are prefixes. Default is |
suff | A string or list of strings that are suffixes. Default is |
squish | A boolean that tells whether to remove extra spaces in the output of |
which | Tells which to keep, the prefix ("pref") or the suffix ("suff"). |
from | The |
to | The |
flip | A boolean that tells whether to also flip the notation. Default is |
For notation_vec(), arrow_notation, and bracket_notation, a string vector with named items pref_start, pref_end, suff_start, and suff_end.
For split_pref_suff(), a string list with named items pref and suff.
For paste_pref_suff(), split_pref_suff(), and switch_notation(), a string list in notation format specified by various notation arguments, including from and to.
For get_pref_suff(), one of the prefix or suffix or a list of prefixes or suffixes.
notation_vec()
arrow_notation
bracket_notation
split_pref_suff("a -> b", notation = arrow_notation)
# Or infer the notation (by default from notations_list)
split_pref_suff("a -> b")
split_pref_suff(c("a -> b", "c -> d", "e -> f"))
split_pref_suff(c("a -> b", "c -> d", "e -> f"), transpose = TRUE)
flip_pref_suff("a [b]", notation = bracket_notation)
# Infer notation
flip_pref_suff("a [b]")
get_pref_suff("a -> b", which = "suff")
switch_notation("a -> b", from = arrow_notation, to = bracket_notation)
# Infer notation and flip prefix and suffix
switch_notation("a -> b", to = bracket_notation, flip = TRUE)
# Also works for vectors
switch_notation(c("a -> b", "c -> d"), from = arrow_notation, to = bracket_notation)
# Functions can infer the correct notation and return multiple matches
infer_notation("a [to b]", allow_multiple = TRUE, choose_most_specific = FALSE)
# Or choose the most specific notation
infer_notation("a [to b]", allow_multiple = TRUE, choose_most_specific = TRUE)
# When setting the from notation, only that type of notation will be switched
switch_notation(c("a -> b", "c [to d]"), from = arrow_notation, to = bracket_notation)
# But if notations are inferred, all notations can be switched
switch_notation(c("a -> b", "c [to d]"), to = bracket_notation)
# A double-switch can be accomplished.
# In this first example, `RCLabels::first_dot_notation` is inferred.
switch_notation("a.b.c", to = arrow_notation)
# In this second example,
# it is easier to specify the `from` and `to` notations.
switch_notation("a.b.c", to = arrow_notation) %>%
  switch_notation(from = first_dot_notation, to = arrow_notation)
# "" can be used as an input
paste_pref_suff(pref = "a", suff = "", notation = RCLabels::from_notation)
This function is similar to split_pref_suff() in that it returns a list. However, this function's list is more detailed than that of split_pref_suff(). The return value from this function is a list with the first named item being the prefix (with the name noun) followed by objects of prepositional phrases (with names being prepositions that precede the objects).
split_noun_pp( labels, inf_notation = TRUE, notation = RCLabels::notations_list, choose_most_specific = FALSE, prepositions = RCLabels::prepositions_list )
labels | The row and column labels from which prepositional phrases are to be extracted. |
inf_notation | A boolean that tells whether to infer notation for |
notation | The notation type to be used when extracting prepositions. Default is |
choose_most_specific | A boolean that tells whether to choose the most specific notation from |
prepositions | A vector of strings to be treated as prepositions. Note that a space is appended to each word internally, so, e.g., "to" becomes "to ". Default is |
Unlike split_pref_suff(), it does not make sense to have a transpose argument on split_noun_pp(). Labels may not have the same structure, e.g., they may have different prepositions.
A list of lists with items named noun and pp.
# Specify the notation
split_noun_pp(c("a [of b in c]", "d [of e into f]"), notation = bracket_notation)
# Infer the notation via default arguments
split_noun_pp(c("a [of b in c]", "d [of e into f]"))
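To make the idea of "objects of prepositional phrases, named by their prepositions" concrete, the base R sketch below splits an already-extracted suffix such as "from Coal mines in USA" into named objects. The helper split_suffix() and its short preposition list are hypothetical; this is a word-based approximation under the assumption that every preposition is followed by at least one object word, and it is not how RCLabels does the work internally.

```r
# Hypothetical helper: name each object by the preposition that precedes it.
split_suffix <- function(suff, prepositions = c("of", "in", "from", "into")) {
  words <- strsplit(suff, " ", fixed = TRUE)[[1]]
  is_prep <- words %in% prepositions
  # Each preposition starts a new group; object words inherit that group.
  group <- cumsum(is_prep)
  objs <- tapply(words[!is_prep], group[!is_prep], paste, collapse = " ")
  stats::setNames(as.character(objs), words[is_prep])
}

pieces <- split_suffix("from Coal mines in USA")
print(pieces)
```

Multi-word objects such as "Coal mines" stay together because grouping is driven by the prepositions rather than by individual spaces.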
This function should only ever see a single label (x) and a single notation.
strip_label_part(x, notation, part, pattern_pref = "", pattern_suff = "")
x | The label(s) to be split. |
notation | The notations to be used for each |
part | The part of the label to work on, such as "pref_start", "pref_end", "suff_start", or "suff_end". |
pattern_pref | The prefix to a regex pattern to be used in |
pattern_suff | The suffix to a regex pattern to be used in |
If notation is NULL, x is returned, unmodified.
A label shorn of the part to be stripped.
A description of to notation.
to_notation
A vector of notational symbols that provides to ("a [to b]") notation.
to_notation