Symbolic NLP
- The lexical level
- Phonetics and phonology - understanding how speech sounds relate to text.
- Orthography - understanding spelling systems.
- Morphology - understanding how words are constructed.
- Tokenization - splitting text into words, tokens, punctuation, etc.
- Tokenizing by word - splitting text into words, the smallest units that still make sense on their own.
This allows, e.g., identifying the words used most often.
- Filtering - removing stop words, i.e., words that contribute little to meaning, e.g., "in", "is", etc.
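- Stemming and lemmatizing - reducing words to a root form, e.g., "helping" and
"helper" share the root "help". Stemming strips affixes by rule; lemmatizing maps
a word to its dictionary form (lemma), e.g., "was" to "be".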
- Understemming - two words should be reduced to the same stem but aren't.
- Overstemming - two unrelated words are reduced to the same stem.
- Word-level analysis
- Frequency - how often each word occurs in the text.
- Concordance - each occurrence of a word shown in its immediate context.
- Collocations - words that occur together more often than chance, e.g., "strong tea".
- Dispersion - where a word occurs across the text, e.g., only near the start or throughout.
- Tokenizing by sentence - splitting text into sentences, which allows analysis of how words
relate to one another, e.g., negative words within a sentence.
- Part-of-speech (POS) tagging - labeling each token with its grammatical role, e.g.,
noun, verb, adjective, etc.
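The lexical-level steps above can be sketched in plain Python. This is a minimal, standard-library-only sketch, not the NLTK pipeline: the regex tokenizers, the tiny `STOP_WORDS` set, and the suffix-stripping `crude_stem` function are hypothetical simplifications (a real stemmer such as Porter's applies many more rules), and the sample text is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to", "was"}

# Hypothetical suffix list for a crude stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "er", "ed", "es", "s")


def sent_tokenize(text):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    # A token is a word (optionally with an apostrophe part) or one punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def filter_stop_words(tokens):
    # Keep word tokens that are not stop words (case-insensitive); drop punctuation.
    return [t for t in tokens if re.match(r"\w", t) and t.lower() not in STOP_WORDS]


def crude_stem(word):
    # Strip the first matching suffix, but keep a stem of at least 3 characters.
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concordance(tokens, target, width=2):
    # Each occurrence of target with `width` tokens of context on either side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def dispersion(tokens, target):
    # Token offsets at which target occurs, i.e., where it appears across the text.
    return [i for i, tok in enumerate(tokens) if tok.lower() == target.lower()]


text = "The helper was helping in the garden. The garden is big. The helper helps again."
sentences = sent_tokenize(text)
tokens = word_tokenize(text)
content_words = filter_stop_words(tokens)
stem_counts = Counter(crude_stem(w) for w in content_words)

print(len(sentences))                 # 3
print(stem_counts.most_common(1))     # [('help', 4)]
print(concordance(tokens, "garden"))  # ['in the [garden] . The', '. The [garden] is big']
print(dispersion(tokens, "helper"))   # [1, 14]
```

The crude stemmer also shows both failure modes: "running" becomes "runn" while "run" stays "run" (understemming), and the unrelated "number" and "numb" both become "numb" (overstemming).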
- The syntactic level
- Syntactic parsing - understanding sentence structure, dependencies, phrases.
- Phrase chunking - grouping tokens into phrases, e.g., the noun phrase "a big fat zero".
- Chinking - excluding tokens that match a pattern from chunks.
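Chunking and chinking work on POS-tagged tokens. The sketch below is a hypothetical, minimal implementation in the spirit of regular-expression chunkers such as NLTK's `RegexpParser`, but it is not that API: the tags are assigned by hand, `NP_PATTERN` is an assumed noun-phrase rule, and `chunk`/`chink` are illustrative helpers.

```python
import re

# Hand-tagged example sentence (Penn Treebank-style tags, assigned by hand here).
tagged = [("He", "PRP"), ("got", "VBD"), ("a", "DT"),
          ("big", "JJ"), ("fat", "JJ"), ("zero", "NN")]

# Assumed noun-phrase rule: optional determiner, any number of adjectives, then a noun.
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*<NN>"


def chunk(tagged, pattern):
    # Encode tags as "<PRP><VBD><DT>..." so an ordinary regex can match tag sequences.
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for m in re.finditer(pattern, tag_string):
        # Every "<" before an offset corresponds to one earlier token.
        start = tag_string.count("<", 0, m.start())
        end = tag_string.count("<", 0, m.end())
        chunks.append(tagged[start:end])
    return chunks


def chink(chunks, chink_tags):
    # Remove tokens whose tag is in chink_tags, splitting the chunk where they were.
    result = []
    for ch in chunks:
        current = []
        for word, tag in ch:
            if tag in chink_tags:
                if current:
                    result.append(current)
                current = []
            else:
                current.append((word, tag))
        if current:
            result.append(current)
    return result


print(chunk(tagged, NP_PATTERN))
# [[('a', 'DT'), ('big', 'JJ'), ('fat', 'JJ'), ('zero', 'NN')]]

everything = chunk(tagged, r"(?:<[^>]+>)+")  # one chunk spanning all tokens
print(chink(everything, {"JJ"}))
# [[('He', 'PRP'), ('got', 'VBD'), ('a', 'DT')], [('zero', 'NN')]]
```

Chunking everything and then chinking out the adjectives splits the sentence around "big fat", which is the "exclude a pattern" idea in its simplest form.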
- The semantic level
- Named-Entity Recognition (NER) - identifying entities, e.g., people, places,
organizations, dates, etc.
- Semantic parsing - mapping sentences to representations of their meaning; must handle coreference, metaphor, and ambiguity.
- Word meaning vs. world knowledge
- Compositional semantics - how word meanings combine into the meanings of phrases and sentences.
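A common symbolic approach to NER is a gazetteer, i.e., a list of known names with their entity types. The sketch below assumes a tiny, hand-made `GAZETTEER` (hypothetical entries) and uses longest-match lookup, so "University of London" is labeled as one organization rather than containing a separate place. It is a simplification: it cannot recognize any name missing from the list.

```python
# Hypothetical gazetteer: lowercased token sequences mapped to entity labels.
GAZETTEER = {
    ("ada", "lovelace"): "PERSON",
    ("london",): "GPE",
    ("university", "of", "london"): "ORG",
}


def gazetteer_ner(tokens, gazetteer):
    # Scan left to right, preferring the longest gazetteer entry at each position.
    max_len = max(len(key) for key in gazetteer)
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                entities.append((" ".join(tokens[i:i + n]), gazetteer[key]))
                i += n  # skip past the matched entity
                break
        else:
            i += 1  # no entry starts here; move to the next token


    return entities


tokens = "Ada Lovelace lived in London , not at the University of London .".split()
print(gazetteer_ner(tokens, GAZETTEER))
# [('Ada Lovelace', 'PERSON'), ('London', 'GPE'), ('University of London', 'ORG')]
```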
- The discourse level
- Coreference resolution - determining when different words/phrases refer to the same
entity.
- Discourse analysis - analyzing relations between sentences or paragraphs, e.g.,
elaboration, contrast, cause-effect, narrative coherence.
- Dialogue coherence.
- Pragmatics - how meaning depends on the intentions of the speaker or writer and on the context.
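A very simple symbolic heuristic for coreference resolution links each pronoun to the most recent earlier mention that agrees with it. The sketch below assumes hand-written `PRONOUNS` and `MENTIONS` lexicons (hypothetical) and checks only gender/animacy agreement plus recency; practical resolvers also use syntax, number agreement, and salience.

```python
# Hypothetical lexicons: pronoun -> required label, known mention -> its label.
PRONOUNS = {"he": "male", "him": "male", "his": "male",
            "she": "female", "her": "female", "it": "neuter"}
MENTIONS = {"Ada": "female", "Charles": "male", "engine": "neuter"}


def resolve_pronouns(tokens):
    # Link each pronoun to the most recent earlier mention with a matching label.
    history, links = [], []
    for i, tok in enumerate(tokens):
        if tok in MENTIONS:
            history.append((i, tok))
        elif tok.lower() in PRONOUNS:
            wanted = PRONOUNS[tok.lower()]
            for _, mention in reversed(history):  # most recent mention first
                if MENTIONS[mention] == wanted:
                    links.append((i, tok, mention))
                    break
    return links


tokens = "Ada met Charles . She admired his engine because it could compute .".split()
print(resolve_pronouns(tokens))
# [(4, 'She', 'Ada'), (6, 'his', 'Charles'), (9, 'it', 'engine')]
```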
- Views from Behrooz Mansouri and Diyi Yang.
Exam Style Questions