Artificial Intelligence, Machine Learning and Data Science Hubspot

Sunday, 7 May 2023

Hugging Face conversational AI error: argument --model: invalid choice: 'models/' (choose from 'openai-gpt', 'gpt2')

I'm trying to replicate the results of this repo:

https://github.com/huggingface/transfer-learning-conv-ai

To do that, I'm following the basic example, which does not use Docker:

git clone https://github.com/huggingface/transfer-learning-conv-ai
cd transfer-learning-conv-ai
pip install -r requirements.txt
python -m spacy download en

Then I try:

python3 interact.py --model models/

And there I get this error:

  np_resource = np.dtype([("resource", np.ubyte, 1)])
usage: interact.py [-h] [--dataset_path DATASET_PATH]
                   [--dataset_cache DATASET_CACHE] [--model {openai-gpt,gpt2}]
                   [--model_checkpoint MODEL_CHECKPOINT]
                   [--max_history MAX_HISTORY] [--device DEVICE] [--no_sample]
                   [--max_length MAX_LENGTH] [--min_length MIN_LENGTH]
                   [--seed SEED] [--temperature TEMPERATURE] [--top_k TOP_K]
                   [--top_p TOP_P]
interact.py: error: argument --model: invalid choice: 'models/' (choose from 'openai-gpt', 'gpt2')

The first thing I noticed was that there was no "models" directory, so I created one and tried again, but I got the same error.

The second thing I tried was downloading the model, as the repo's README says:

We make a pretrained and fine-tuned model available on our S3 here

From that link I tried:

wget https://s3.amazonaws.com/models.huggingface.co/transfer-learning-chatbot/finetuned_chatbot_gpt.tar.gz

I extracted the files into both the main directory and the models directory, tried a third time, and got the same error.

This is the current structure of my working dir:

Dockerfile   config.json                   interact.py              pytorch_model.bin       train.py
LICENCE      convai_evaluation.py          merges.txt               requirements.txt        utils.py
README.md    example_entry.py              model_training_args.bin  special_tokens.txt      vocab.json
__pycache__  finetuned_chatbot_gpt.tar.gz  models                   test_special_tokens.py

EDIT

I tried Kimbo's suggestion:

python3 interact.py --model gpt2

Now I get this error:

  File "interact.py", line 154, in <module>
    run()
  File "interact.py", line 114, in run
    raise ValueError("Interacting with GPT2 requires passing a finetuned model_checkpoint")
ValueError: Interacting with GPT2 requires passing a finetuned model_checkpoint

I also tried just running:

python3 interact.py

That did not produce an error, but it seems to get stuck at this point:

INFO:/home/lramirez/transfer-learning-conv-ai/utils.py:Download dataset from https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json
INFO:/home/lramirez/transfer-learning-conv-ai/utils.py:Tokenize and encode the dataset

It has been stuck there for about 30 minutes.

New Update

Tokenization is taking so long because the script tokenizes the entire dataset, which is a 200 MB JSON file.

To make it much faster, load only part of the dataset.

Open up utils.py and change the tokenize function:

def tokenize(obj):
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return dict((n, tokenize(o)) for n, o in obj.items())
    limit = 100  # <- this is the number of items in the dataset to load
    return list(tokenize(o) for o in obj[:limit])  # <- change it here

That tokenizes only the first 100 items of each list in the dataset (the slice applies at every nesting level).
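
The effect of the limit can be checked without downloading anything. The following is a minimal sketch, not the code from utils.py: ToyTokenizer and the tiny dataset are hypothetical stand-ins, and the tokenizer and limit are passed as parameters here only so the snippet is self-contained.

```python
class ToyTokenizer:
    """Hypothetical stand-in for the real tokenizer: splits on whitespace."""

    def __init__(self):
        self.vocab = {}

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        # Assign ids in order of first appearance.
        return [self.vocab.setdefault(token, len(self.vocab)) for token in tokens]


def tokenize(obj, tokenizer, limit=None):
    """Recursively encode strings; keep at most `limit` items of every list."""
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return {name: tokenize(value, tokenizer, limit) for name, value in obj.items()}
    # obj[:None] is the whole list, so limit=None means "no truncation".
    return [tokenize(item, tokenizer, limit) for item in obj[:limit]]


# Made-up data shaped loosely like a dialog dataset (an assumption).
dataset = {
    "train": [
        {"personality": ["i like tea"], "utterances": ["hello there"]}
        for _ in range(5)
    ]
}

encoded = tokenize(dataset, ToyTokenizer(), limit=2)
print(len(encoded["train"]))  # 2 of the 5 entries are kept
```

Because the slice runs at every level of the recursion, a small limit would also shorten any nested list longer than the limit, which is worth keeping in mind before training on the truncated data.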

Old Answer

When I'm unsure how to use a Python script (or anything you run from the command line, really), I usually try a few things to figure it out.

Run python script.py -h or python script.py --help. Often that will print out an explanation of the arguments the script is expecting and how to run it.
If it's an executable command you installed, I always try man <executable>. That probably won't work in this case, since you just cloned the repo from GitHub and didn't install anything.
If I still don't understand how to use the script because the above didn't work, I go online and look for some documentation (a GitHub README, a wiki, readthedocs, etc.).
If it's documented poorly, I just look at the source code. Sometimes I skip straight to this part because for smaller stuff it's often quicker.

In this case, I read the README on GitHub and that didn't tell me all that much, so I took a look at interact.py. If you look starting at line 139 (https://github.com/huggingface/transfer-learning-conv-ai/blob/master/interact.py#L139), it appears the script is in a while loop, waiting for you to input something to feed to the model.

/end update

This part:

(choose from 'openai-gpt', 'gpt2')

should tell you all you need to know.

Try running

python3 interact.py --model gpt2

or

python3 interact.py --model openai-gpt

The ValueError for gpt2 shown in the question's edit is raised by this check in interact.py:

if args.model == 'gpt2':
    raise ValueError("Interacting with GPT2 requires passing a finetuned model_checkpoint")
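
Both errors in this thread can be reproduced with a small stand-alone sketch. It is a simplified imitation, not the actual parser from interact.py: only the --model and --model_checkpoint options listed in the usage message are recreated, and the defaults, help strings, and the resolve_checkpoint helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical, reduced version of the script's argument parser.
    parser = argparse.ArgumentParser(prog="interact.py")
    parser.add_argument("--model", type=str, default="openai-gpt",
                        choices=["openai-gpt", "gpt2"],
                        help="Model type (a name, not a path)")
    parser.add_argument("--model_checkpoint", type=str, default="",
                        help="Path to a fine-tuned model directory")
    return parser


def resolve_checkpoint(args):
    # Hypothetical helper mirroring the check quoted above.
    if args.model_checkpoint == "":
        if args.model == "gpt2":
            raise ValueError(
                "Interacting with GPT2 requires passing a finetuned model_checkpoint")
        # Assumption: the real script falls back to some default model here.
        return None
    return args.model_checkpoint


parser = build_parser()

# A directory passed to --model is rejected: it is not one of the choices.
try:
    parser.parse_args(["--model", "models/"])
    rejected = False
except SystemExit:  # argparse prints the usage message and exits
    rejected = True

# The model name goes to --model, the directory to --model_checkpoint.
args = parser.parse_args(["--model", "openai-gpt", "--model_checkpoint", "models/"])
checkpoint = resolve_checkpoint(args)
print(rejected, checkpoint)
```

If the real script behaves like this sketch, the extracted archive would be passed as python3 interact.py --model openai-gpt --model_checkpoint models/ rather than through --model. The archive name (finetuned_chatbot_gpt) suggests it matches openai-gpt, but that pairing is a guess.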
