Artificial Intelligence, Machine Learning and Data Science Hubspot

Saturday, 6 May 2023

BertModel weights are randomly initialized?

Recently, I've been trying to re-implement DiffCSE.

While refactoring the code that the authors uploaded to GitHub, I ran into some issues.

I have two questions:

1. If I set a seed with something like set_seed(30), I was under the impression that the model would always get the same initial weights and therefore produce the same result when training. But it seems I was wrong. For example:

from transformers import AutoConfig, BertModel

# Only the configuration is loaded here, not the pretrained weights
config = AutoConfig.from_pretrained('bert-base-uncased')
a = BertModel(config)
b = BertModel(config)
a_query = a.encoder.layer[0].attention.self.query.weight
b_query = b.encoder.layer[0].attention.self.query.weight
a_query == b_query
# tensor([[False, False, False,  ..., False, False, False],
#         [False, False, False,  ..., False, False, False],
#         [False, False, False,  ..., False, False, False],
#         ...,
#         [False, False, False,  ..., False, False, False],
#         [False, False, False,  ..., False, False, False],
#         [False, False, False,  ..., False, False, False]])

print(a_query, b_query)
Parameter containing:
tensor([[ 0.0168, -0.0072,  0.0141,  ...,  0.0060, -0.0098, -0.0361],
        [ 0.0121, -0.0106,  0.0169,  ..., -0.0512,  0.0154, -0.0251],
        [ 0.0252,  0.0375,  0.0215,  ..., -0.0097, -0.0009, -0.0102],
        ...,
        [ 0.0038,  0.0120, -0.0205,  ..., -0.0082, -0.0066,  0.0125],
        [ 0.0032, -0.0330,  0.0073,  ...,  0.0072,  0.0484,  0.0143],
        [-0.0153,  0.0207, -0.0086,  ..., -0.0087, -0.0032,  0.0022]],
       requires_grad=True) Parameter containing:
tensor([[ 0.0239,  0.0236,  0.0181,  ..., -0.0331,  0.0062,  0.0142],
        [-0.0116,  0.0417, -0.0379,  ...,  0.0059,  0.0207,  0.0155],
        [ 0.0178,  0.0017,  0.0064,  ..., -0.0007,  0.0405, -0.0170],
        ...,
        [ 0.0115,  0.0039, -0.0508,  ...,  0.0187,  0.0043, -0.0048],
        [ 0.0025, -0.0079, -0.0132,  ..., -0.0003, -0.0079,  0.0320],
        [-0.0105, -0.0097, -0.0076,  ...,  0.0214, -0.0068,  0.0016]],
       requires_grad=True)

I can't understand why this happens. Also, every time I execute this code, the weights are different from the previous run.

2. There are many models provided by Hugging Face. When it comes to BERT, they have BertModel, BertForPreTraining, BertForMaskedLM, etc. As far as I know, the only difference between these BERT models is whether they have a head on top of the base model or not.
So, are the heads also pretrained, or are they just randomly initialized weights provided for users' convenience?

====
A:

You have a small misunderstanding of how seeds work. The seed defines the sequence in which random values are sampled; it doesn't reset after each sample. This means that the sampled sequence will be the same whenever you start from the same seed. For example, if you have code like:

import random
random.seed(1)  # fix the starting point of the generator
sample = [random.random() for _ in range(4)]  # draw four values

You should always get the same four values, because the seed defines this sequence. In your case, you create two BERT models without resetting the seed in between, so the starting point for each initialization isn't the same. To get the same weights, reset the seed before each initialization of BERT.
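As a rough sketch of the advice above, the snippet below compares two models built after seeding once with two models built after resetting the seed each time. It assumes `transformers` and `torch` are installed. The tiny `BertConfig` sizes and the `query_weight` helper are hypothetical choices made for this illustration so that nothing has to be downloaded; they are not part of the original question.

```python
import torch
from transformers import BertConfig, BertModel, set_seed

# Deliberately tiny, hypothetical configuration: built locally, no download.
config = BertConfig(hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)


def query_weight(model):
    """Return the query projection weight of the first encoder layer."""
    return model.encoder.layer[0].attention.self.query.weight


# Seed once: the second model continues the random sequence
# from where the first model's initialization stopped.
set_seed(30)
a = BertModel(config)
b = BertModel(config)
seeded_once_equal = torch.equal(query_weight(a), query_weight(b))

# Reset the seed before each initialization: both models
# start sampling from the same point.
set_seed(30)
c = BertModel(config)
set_seed(30)
d = BertModel(config)
reseeded_equal = torch.equal(query_weight(c), query_weight(d))

print(seeded_once_equal, reseeded_equal)
```

With the seed reset before each construction, the two query matrices should compare equal, while seeding only once should leave them different.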

## Edit

To better understand what the seed does, think of it as a starting point. Imagine that setting the seed to 30 tells the computer to sample the following numbers: 1,2,3,5,6. Calling the sample function once will return 1. Calling it again will return 2, and so on. What you are basically doing is sampling twice, but the second sample begins where the first one stopped, so each time your starting point is different.
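The starting-point picture can be sketched with Python's standard `random` module. This is only a stand-in for illustration: BERT's weights are drawn through PyTorch's generator, not this one, and `sample_values` is a hypothetical helper.

```python
import random


def sample_values(n):
    """Draw n values from the module-level generator (hypothetical helper)."""
    return [random.random() for _ in range(n)]


# Seed once, then sample twice: the second draw continues the sequence.
random.seed(30)
first = sample_values(4)
second = sample_values(4)

# Reset the seed before sampling again: the sequence restarts.
random.seed(30)
first_again = sample_values(4)

print(first == second)       # the two draws differ
print(first == first_again)  # resetting the seed reproduces the first draw
```

Drawing eight values in one go after `random.seed(30)` would give `first` followed by `second`, which is the sense in which the seed fixes the whole sequence rather than each individual sample.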

===============

When you use AutoConfig to load a model, only the model's configuration is loaded, such as the number of layers and the dimension of each layer, so BertModel(config) is built with randomly initialized weights. When you use AutoModel.from_pretrained to load the model, the pretrained parameters are loaded as well, so you should do the following:

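from transformers import AutoModel

# from_pretrained loads the pretrained weights, so a and b hold the same parameters
a = AutoModel.from_pretrained('bert-base-uncased')
b = AutoModel.from_pretrained('bert-base-uncased')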
