Identify context words using user-provided patterns
Usage
textstat_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  n = 1,
  skip = 0,
  ...
)

char_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  p = 0.001,
  n = 1,
  skip = 0
)
Arguments
- x: a tokens object created by quanteda::tokens().
- pattern: a pattern to specify target words; see quanteda::pattern for details.
- valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See quanteda::valuetype() for details.
- case_insensitive: if TRUE, ignore case when matching.
- window: size of the window for collocation analysis.
- min_count: minimum frequency of words within the window to be considered as collocations.
- remove_pattern: if TRUE, keywords do not contain the target words.
- n: integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector defines an \(n\) in the \(n\)-gram(s) that are produced.
- skip: integer vector specifying the adjacency skip size for tokens forming the n-grams; the default is 0, for only immediately neighbouring words. For skip-grams, skip can be a vector of integers, as the "classic" approach to forming skip-grams is to set skip = \(k\), where \(k\) is the distance for which \(k\) or fewer skips are used to construct the \(n\)-gram. Thus a "4-skip-n-gram" defined as skip = 0:4 produces results that include 4 skips, 3 skips, 2 skips, 1 skip, and 0 skips (where 0 skips are typical n-grams formed from adjacent words). See Guthrie et al. (2006).
- ...: additional arguments passed to textstat_keyness().
- p: threshold for statistical significance of collocations.
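Examples

A minimal usage sketch. It assumes quanteda is installed and that textstat_context() and char_context() are available (e.g. from the LSX package); the corpus data_corpus_inaugural ships with quanteda, and the pattern "freedom*" is only illustrative.

```r
library(quanteda)

# Tokenize a built-in corpus
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# Words co-occurring with "freedom*" within a 10-token window,
# ranked by keyness against the rest of the corpus
context <- textstat_context(toks, pattern = "freedom*",
                            window = 10, min_count = 5)
head(context)

# Only the context words whose association passes the
# significance threshold p
words <- char_context(toks, pattern = "freedom*",
                      window = 10, min_count = 5, p = 0.001)
head(words)
```

Because remove_pattern = TRUE by default, the matches of "freedom*" themselves are excluded from the returned keywords.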