I am using a HuggingFace summarisation pipeline and I noticed something odd. If I train a model for 3 epochs and then, with fixed random seeds, run evaluation on the checkpoint from each epoch, I get different results depending on whether I restart the Python console three times or load each checkpoint onto the same summariser object in a loop. I would like to understand why this happens.
While my real results are ROUGE scores over a large dataset, I have made this small reproducible example to show the issue. Instead of using the weights of the same model at different training epochs, I demonstrate it with two different summarisation models, but the effect is the same. Grateful for any help.
Notice how in the first run I first use the facebook/bart-large-cnn model and then the lidiya/bart-large-xsum-samsum model without restarting the Python terminal. In the second run I use only the lidiya/bart-large-xsum-samsum model and get a different output (which should not be the case).
NOTE: this reproducible example won't work on a CPU machine: the CPU path does not seem to respect torch.use_deterministic_algorithms(True) and may give different results on every run, so please reproduce it on a GPU.
FIRST RUN
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
Output from the lidiya/bart-large-xsum-samsum model should be:
[{'summary_text': 'The UK economy is in crisis because of inflation. The government has been slow to react to it. Boris Johnson is on holiday.'}]
SECOND RUN (you must restart Python to conduct the experiment)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
Output should be:
[{'summary_text': 'The government has been slow to deal with inflation. Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation.'}]
Why is the first output different from the second one?
ANSWER
Re-seed the generator before loading each model. With the extra seeding block below, the single-session run reproduces the fresh-session output:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch
# random text taken from UK news website
text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
# re-seed before the second pipeline so it starts from the same RNG state
seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
"summarization", model=model, tokenizer=tokenizer,
num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)
output = summarizer(text, truncation=True)
print(output)
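Why the re-seeding matters: sampling draws from a shared global generator, so the first pipeline's draws advance the state that the second pipeline then sees. A minimal sketch with Python's built-in random module (standing in here for torch's CUDA generator):

```python
import random

# Single session: model A samples first, then model B samples
# from the state that A's draws have already advanced.
random.seed(42)
model_a_draws = [random.random() for _ in range(3)]    # consumes RNG state
model_b_after_a = [random.random() for _ in range(3)]  # sees the advanced state

# Fresh session: model B samples immediately after seeding.
random.seed(42)
model_b_alone = [random.random() for _ in range(3)]

# model_b_alone matches model_a_draws (same post-seed state),
# not model_b_after_a -- hence the different summaries.
```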
-
COMMENTS
Hi @joe32140, that's a great point, thanks. May I ask as a follow-up why we need to do this? What happens if we don't re-seed? Why would the seed only be used by the first pipeline? – andrea
-
The seeding ensures reproducibility within the same script. In your case you are running two different scripts, so we should not expect them to generate the same result. My example is a simple way to explain why you got different summaries from the model, but it might not be the right way to use seeds. – joe32140
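One pattern the last sentence hints at: instead of re-seeding a shared global generator, give each consumer its own generator object so one run's draws cannot perturb another's. This is a general Python sketch of the principle (transformers' pipeline itself samples from torch's global generator, so it is an illustration, not a drop-in fix):

```python
import random

# Hypothetical: each "model run" owns a private generator
# instead of sharing the module-level global one.
rng_a = random.Random(42)
rng_b = random.Random(42)

draws_a = [rng_a.random() for _ in range(3)]  # does not touch rng_b's state
draws_b = [rng_b.random() for _ in range(3)]  # identical, regardless of ordering
```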