Artificial Intelligence, Machine Learning and Data Science Hubspot


Saturday, 6 May 2023

How to use serialized CRFClassifier with StanfordCoreNLP prop 'ner'

I'm using the StanfordCoreNLP API interface to programmatically do some basic NLP. I need to train a model on my own corpus, but I'd like to use the StanfordCoreNLP interface to do it, because it handles a lot of the dry mechanics behind the scenes and I don't need much specialization there.

I've trained a CRFClassifier that I'd like to use for NER, serialized to a file. Based on the documentation, I'd think the following would work, but it doesn't seem to find my model and instead barfs on not being able to find the standard models (I'm not sure why I don't have those model files, but I'm not concerned about it since I don't want to use them anyway):

    // String constants
    final String serializedClassifierFilename = "/absolute/path/to/model.ser.gz";

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, ner");
    props.setProperty("ner.models", serializedClassifierFilename);

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String fileContents = IOUtils.slurpFileNoExceptions("test.txt");
    Annotation document = new Annotation(fileContents);

Results in:

    Adding annotator tokenize
    TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
    Adding annotator ssplit
    Adding annotator ner
    Loading classifier from /path/build/edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... java.io.FileNotFoundException: edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1554)

etc., etc.

I know that I don't have their built-in model (again, not sure why; I just cloned their git repo and compiled with ant compile). Regardless, I don't want to use their model anyway; I want to use the one I trained.

How can I get the StanfordCoreNLP interface to use my model in the ner step? Is this possible?

------------

A: The property name is ner.model, not ner.models, so your code is still trying to load the default models.

Let me know if this is documented incorrectly somewhere.
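For illustration, here is a minimal sketch of the corrected property setup, reusing the path and annotator list from the question. The class name NerModelProps and the helper buildProps are made up for this example and are not part of CoreNLP. Only java.util.Properties is used, so the snippet runs without the CoreNLP jar; the pipeline construction from the question is left as a comment.

```java
import java.util.Properties;

public class NerModelProps {

    // Hypothetical helper: builds the same properties as in the question,
    // but with the singular key "ner.model" that the answer points to.
    static Properties buildProps(String serializedClassifierFilename) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, ner");
        // Singular "ner.model"; with "ner.models" the pipeline falls back to
        // the default models, as seen in the stack trace in the question.
        props.setProperty("ner.model", serializedClassifierFilename);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps("/absolute/path/to/model.ser.gz");
        System.out.println("ner.model = " + props.getProperty("ner.model"));

        // With CoreNLP on the classpath, the rest is unchanged from the question:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Because java.util.Properties accepts any key, a misspelled property name such as ner.models raises no error on its own. The only symptom is the attempt to load the default models.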
