Saturday, 6 May 2023

How to use serialized CRFClassifier with StanfordCoreNLP prop 'ner'

I'm using the StanfordCoreNLP API to programmatically do some basic NLP. I need to train a model on my own corpus, but I'd like to use the StanfordCoreNLP interface to do it, because it handles a lot of the dry mechanics behind the scenes and I don't need much specialization there.

I've trained a CRFClassifier that I'd like to use for NER, serialized to a file. Based on the documentation, I'd think the following would work, but it doesn't seem to find my model and instead barfs on not being able to find the standard models (I'm not sure why I don't have those model files, but I'm not concerned about it since I don't want to use them anyway):

    // String constants
    final String serializedClassifierFilename = "/absolute/path/to/model.ser.gz";

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, ner");
    props.setProperty("ner.models", serializedClassifierFilename);

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String fileContents = IOUtils.slurpFileNoExceptions("test.txt");
    Annotation document = new Annotation(fileContents);

Results in:

    Adding annotator tokenize
    TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
    Adding annotator ssplit
    Adding annotator ner
    Loading classifier from /path/build/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1554)

etc., etc.

I know that I don't have their built-in model (again, not sure why; I just cloned their git repo and compiled with ant compile). Regardless, I don't want to use their model anyway; I want to use the one I trained.

How can I get the StanfordCoreNLP interface to use my model in the ner step? Is this possible?

------------

A: The property name is ner.model, not ner.models, so your code is still trying to load the default models.
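
As a minimal sketch, the corrected property setup could look like the following. The class name NerProps and its build method are hypothetical, introduced only for illustration; the property keys and the model path come from the question. The pipeline construction is left as a comment so the snippet compiles without CoreNLP on the classpath.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of CoreNLP.
class NerProps {
    // Builds pipeline properties that point the ner annotator
    // at a custom serialized CRF model.
    static Properties build(String serializedClassifierFilename) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, ner");
        // Singular "ner.model". The key "ner.models" from the question
        // is not the name the answer gives, so the defaults were loaded.
        props.setProperty("ner.model", serializedClassifierFilename);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("/absolute/path/to/model.ser.gz");
        System.out.println(props.getProperty("ner.model"));
        // With CoreNLP on the classpath, pass these to the pipeline
        // exactly as in the question:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

The only change from the code in the question is the property key; the rest of the pipeline setup stays the same.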

Let me know if this is documented incorrectly somewhere.

