Hypothesis

Usually, rule-based approaches have high precision but low recall. Now, another approach would be to build some machine learning system. To do that, first of all you need some training data. So you need a corpus with some markup. So here, you have a sequence of words and you know that these certain phrases have these certain texts. Right? Like origin, destination, and date. After you have your training data, you need to do some feature engineering. So you need to create features like for example is the word capitalized? Or does this word occur in some list of Cities or something like that. Then, you need to define your model. So the probabilistic model would for example produce the probabilities of your text given your words.

Steps required before Training Data for NLP

#nlp

Tags

Annotators

URL