I am using a HuggingFace summarisation pipeline and I noticed something odd. If I train a model for 3 epochs and then, with fixed random seeds, run evaluation on the checkpoint from each epoch, I get different results depending on whether I restart the Python console three times or load each checkpoint onto the same summariser object in a loop. I would like to understand why this happens.
While my real results are ROUGE scores over a large dataset, I have made this small reproducible example to show the issue. Instead of using the weights of the same model at different training epochs, I demonstrate it with two different summarisation models, but the effect is the same. Grateful for any help.
Notice how in the first run I first use the facebook/bart-large-cnn model and then the lidiya/bart-large-xsum-samsum model without restarting the Python terminal. In the second run I use only the lidiya/bart-large-xsum-samsum model and get a different output (which should not be the case).
NOTE: this reproducible example won't work on a CPU machine: the CPU path does not seem to respect torch.use_deterministic_algorithms(True) and may give different results on every run, so please reproduce it on a GPU.
FIRST RUN
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
Output from the lidiya/bart-large-xsum-samsum model should be:
[{'summary_text': 'The UK economy is in crisis because of inflation. The government has been slow to react to it. Boris Johnson is on holiday.'}]
SECOND RUN (you must restart Python to conduct the experiment)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
Output should be:
[{'summary_text': 'The government has been slow to deal with inflation. Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation.'}]
Why is the first output different from the second one?
ANSWER
Re-seed the generator before loading each model. With the extra seeding block below, the single-session run reproduces the fresh-session output:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
# re-seed before the second pipeline so it starts from the same RNG state
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
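Why the re-seeding matters: sampling draws from a shared global generator, so the first pipeline's draws advance the state that the second pipeline then sees. A minimal sketch with Python's built-in random module (standing in here for torch's CUDA generator):

```python
import random

# Single session: model A samples first, then model B samples
# from the state that A's draws have already advanced.
random.seed(42)
model_a_draws = [random.random() for _ in range(3)]    # consumes RNG state
model_b_after_a = [random.random() for _ in range(3)]  # sees the advanced state

# Fresh session: model B samples immediately after seeding.
random.seed(42)
model_b_alone = [random.random() for _ in range(3)]

# model_b_alone matches model_a_draws (same post-seed state),
# not model_b_after_a -- hence the different summaries.
```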
-
COMMENTS
Hi @joe32140, that's a great point, thanks. May I ask as a follow-up why we need to do this? What happens if we don't re-seed? Why would the seed only be used by the first pipeline? – andrea
-
The seeding ensures reproducibility within the same script. In your case you are running two different scripts, so we should not expect them to generate the same result. My example is a simple way to explain why you got different summaries from the model, but it might not be the right way to use seeds. – joe32140
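One pattern the last sentence hints at: instead of re-seeding a shared global generator, give each consumer its own generator object so one run's draws cannot perturb another's. This is a general Python sketch of the principle (transformers' pipeline itself samples from torch's global generator, so it is an illustration, not a drop-in fix):

```python
import random

# Hypothetical: each "model run" owns a private generator
# instead of sharing the module-level global one.
rng_a = random.Random(42)
rng_b = random.Random(42)

draws_a = [rng_a.random() for _ in range(3)]  # does not touch rng_b's state
draws_b = [rng_b.random() for _ in range(3)]  # identical, regardless of ordering
```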