RNN Application: Classify Text with BERT

TensorFlow also provides BERT (Bidirectional Encoder Representations from Transformers) for text classification. The BERT family of models uses the Transformer encoder architecture to process each token of input text in the full context of all tokens before and after. The models come pre-trained on a large text corpus.

There are multiple BERT models available on TensorFlow Hub. Among them is ALBERT ("A Lite BERT"), which comes in four sizes and reduces model size (but not computation time) by sharing parameters between layers.

To practice using BERT, I will again go through the whole process with the IMDB movie review dataset, which contains the text of 50,000 movie reviews, and then migrate to the hydrogen dataset classification.

import os
import shutil

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the ops required by the preprocessing model
from official.nlp import optimization  # AdamW optimizer, from the tf-models-official package

# note: this downloads the IMDB dataset; what if my files come from a Google Doc?
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'
dataset = tf.keras.utils.get_file('aclImdb_v1.tar.gz', url,
                                  untar=True, cache_dir='.',
                                  cache_subdir='')
dataset_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')
train_dir = os.path.join(dataset_dir, 'train')
# remove unused folders to make it easier to load the data
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)
# now split the data into train, validation and test sets and build the input pipelines
# AUTOTUNE lets tf.data tune the prefetch buffer; cache() and prefetch() keep the pipeline fast
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

raw_train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)
class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',
    seed=seed)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=batch_size)

test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)
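
As a quick sanity check on the pipeline, a few training examples can be printed. This is a minimal sketch that assumes the directory structure downloaded above, where class_names comes from raw_train_ds:

for text_batch, label_batch in train_ds.take(1):
  for i in range(3):
    print(f'Review: {text_batch.numpy()[i][:100]}')
    label = label_batch.numpy()[i]
    print(f'Label : {label} ({class_names[label]})')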

Then choose a BERT model from TensorFlow Hub:

BERT model selected: https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1
Preprocessing model auto-selected: https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3
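
The later snippets refer to these two handles through the variables tfhub_handle_encoder and tfhub_handle_preprocess, so assign them first (only the two URLs above come from this post):

tfhub_handle_encoder = 'https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1'
tfhub_handle_preprocess = 'https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3'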

After the BERT model is selected, try a test text with the preprocessing model:

bert_preprocess_model = hub.KerasLayer(tfhub_handle_preprocess)
text_test = ['this is such an amazing movie!']
text_preprocessed = bert_preprocess_model(text_test)

Now apply the BERT encoder to the preprocessed text:

bert_model = hub.KerasLayer(tfhub_handle_encoder)
bert_results = bert_model(text_preprocessed)
print(f'Loaded BERT: {tfhub_handle_encoder}')
print(f'Pooled Outputs Shape:{bert_results["pooled_output"].shape}')
print(f'Pooled Outputs Values:{bert_results["pooled_output"][0, :12]}')
print(f'Sequence Outputs Shape:{bert_results["sequence_output"].shape}')
print(f'Sequence Outputs Values:{bert_results["sequence_output"][0, :12]}')
Loaded BERT: https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1
Pooled Outputs Shape:(1, 512)
Pooled Outputs Values:[ 0.7626284   0.9928099  -0.18611862  0.36673862  0.15233698  0.6550447
  0.9681154  -0.94862705  0.00216154 -0.98777324  0.0684273  -0.97630596]
Sequence Outputs Shape:(1, 128, 512)
Sequence Outputs Values:[[-0.28946292  0.3432122   0.3323146  ...  0.2130091   0.7102076
  -0.05771127]
 [-0.28742066  0.31980988 -0.23018472 ...  0.58454984 -0.21329743
   0.7269208 ]
 [-0.6615696   0.688769   -0.87432975 ...  0.1087725  -0.26173288
   0.47855526]
 ...
 [-0.22561064 -0.28925598 -0.07064444 ...  0.4756608   0.832771
   0.40025362]
 [-0.2982425  -0.27473134 -0.05450555 ...  0.48849788  1.0955352
   0.18163432]
 [-0.44378024  0.00930739  0.07223781 ...  0.1729011   1.1833246
   0.0789801 ]]

pooled_output is the embedding of each input sequence as a whole (here, one movie review), sequence_output represents each input token in its context, and encoder_outputs are the intermediate activations of the L Transformer blocks.
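
The intermediate activations can be inspected too. A small sketch, assuming the encoder returns 'encoder_outputs' as a list with one tensor per Transformer block, as the small BERT models on TF Hub do:

for i, layer_output in enumerate(bert_results['encoder_outputs']):
  # each entry has shape (batch_size, seq_length, hidden_size), here (1, 128, 512)
  print(f'Transformer block {i} output shape: {layer_output.shape}')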

With the preprocessing model, the selected BERT encoder, a Dropout layer and a Dense layer, we can create a fine-tunable classifier model.

def build_classifier_model():
  text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
  preprocessing_layer = hub.KerasLayer(tfhub_handle_preprocess, name='preprocessing')
  encoder_inputs = preprocessing_layer(text_input)
  encoder = hub.KerasLayer(tfhub_handle_encoder, trainable=True, name='BERT_encoder')
  outputs = encoder(encoder_inputs)
  net = outputs['pooled_output']
  net = tf.keras.layers.Dropout(0.1)(net)
  net = tf.keras.layers.Dense(1, activation=None, name='classifier')(net)
  return tf.keras.Model(text_input, net)

Now run this (still untrained) classifier on the test text:

classifier_model = build_classifier_model()
bert_raw_result = classifier_model(tf.constant(text_test))
print(tf.sigmoid(bert_raw_result))
#the result is meaningless because the model is not trained yet
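
The steps so far can be visualized by plotting the model graph. A small sketch, assuming pydot and graphviz are installed (they are required by tf.keras.utils.plot_model):

# draws text input -> preprocessing -> BERT_encoder -> dropout -> classifier
tf.keras.utils.plot_model(classifier_model, to_file='classifier_model.png')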

Since it is a binary classification problem and the model outputs a single logit (one Dense unit with no activation), you’ll use the losses.BinaryCrossentropy loss function with from_logits=True.

loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
metrics = tf.metrics.BinaryAccuracy()
#then set up the AdamW optimizer with a linear warmup schedule
epochs = 5
steps_per_epoch = tf.data.experimental.cardinality(train_ds).numpy()
num_train_steps = steps_per_epoch * epochs
num_warmup_steps = int(0.1*num_train_steps)

init_lr = 3e-5
optimizer = optimization.create_optimizer(init_lr=init_lr,
                                          num_train_steps=num_train_steps,
                                          num_warmup_steps=num_warmup_steps,
                                          optimizer_type='adamw')

Now we can compile the model and train it, then evaluate it and plot the accuracy and loss.

classifier_model.compile(optimizer=optimizer,
                         loss=loss,
                         metrics=metrics)
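
The training and evaluation calls themselves are not shown in the post; a minimal sketch, assuming the datasets and epochs defined above, would look like this:

print(f'Training model with {tfhub_handle_encoder}')
history = classifier_model.fit(x=train_ds,
                               validation_data=val_ds,
                               epochs=epochs)

# evaluate on the held-out test set
loss, accuracy = classifier_model.evaluate(test_ds)
print(f'Loss: {loss}')
print(f'Accuracy: {accuracy}')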

Lastly, export it for inference

dataset_name = 'imdb'
saved_model_path = './{}_bert'.format(dataset_name.replace('/', '_'))

classifier_model.save(saved_model_path, include_optimizer=False)
reloaded_model = tf.saved_model.load(saved_model_path)

def print_my_examples(inputs, results):
  result_for_printing = \
    [f'input: {inputs[i]:<30} : score: {results[i][0]:.6f}'
                         for i in range(len(inputs))]
  print(*result_for_printing, sep='\n')
  print()
examples = [
    'this is such an amazing movie!',  # this is the same sentence tried earlier
    'The movie was great!',
    'The movie was meh.',
    'The movie was okish.',
    'The movie was terrible...'
]
reloaded_results = tf.sigmoid(reloaded_model(tf.constant(examples)))
original_results = tf.sigmoid(classifier_model(tf.constant(examples)))
print('Results from the saved model:')
print_my_examples(examples, reloaded_results)
print('Results from the model in memory:')
print_my_examples(examples, original_results)

Results from the saved model:
input: this is such an amazing movie! : score: 0.999579
input: The movie was great!           : score: 0.992434
input: The movie was meh.             : score: 0.850666
input: The movie was okish.           : score: 0.028051
input: The movie was terrible...      : score: 0.001664

Results from the model in memory:
input: this is such an amazing movie! : score: 0.999579
input: The movie was great!           : score: 0.992434
input: The movie was meh.             : score: 0.850666
input: The movie was okish.           : score: 0.028051
input: The movie was terrible...      : score: 0.001664
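
The exported SavedModel can also be called through its serving signature, which is what a deployment (e.g. TF Serving) would use. A rough sketch, assuming the default signature name and that the output tensor is keyed by the final 'classifier' layer name:

serving_results = reloaded_model.signatures['serving_default'](tf.constant(examples))
serving_results = tf.sigmoid(serving_results['classifier'])
print_my_examples(examples, serving_results)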
