Skip to contents

Identify context words using user-provided patterns

Usage

textstat_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  n = 1,
  skip = 0,
  ...
)

char_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  p = 0.001,
  n = 1,
  skip = 0
)

Arguments

x

a tokens object created by quanteda::tokens().

pattern

quanteda::pattern() to specify target words.

valuetype

the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See quanteda::valuetype() for details.

case_insensitive

if TRUE, ignore case when matching.

window

size of window for collocation analysis.

min_count

minimum frequency of words within the window to be considered as collocations.

remove_pattern

if TRUE, keywords do not contain target words.

n

integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector will define a \(n\) in the \(n\)-gram(s) that are produced.

skip

integer vector specifying the adjacency skip size for tokens forming the n-grams, default is 0 for only immediately neighbouring words. For skipgrams, skip can be a vector of integers, as the "classic" approach to forming skip-grams is to set skip = \(k\) where \(k\) is the distance for which \(k\) or fewer skips are used to construct the \(n\)-gram. Thus a "4-skip-n-gram" defined as skip = 0:4 produces results that include 4 skips, 3 skips, 2 skips, 1 skip, and 0 skips (where 0 skips are typical n-grams formed from adjacent words). See Guthrie et al (2006).

...

additional arguments passed to textstat_keyness().

p

threshold for statistical significance of collocations.

See also

tokens_select() and textstat_keyness()