Thursday, 18 May 2023

DeepFake with Python

"DeepFake" combines "deep learning" and "fake": it means taking a person in an image or video and replacing them with someone else's likeness, using techniques such as deep artificial neural networks [1].

This post uses the Kaggle Deepfake Detection Challenge dataset: https://www.kaggle.com/c/deepfake-detection-challenge/data

Github Reference: https://github.com/ageitgey/face_recognition

!pip install face_recognition

Data

  • The data consists of .mp4 files, split into sets of ~10GB apiece. A metadata.json file accompanies each set of .mp4 files and contains the filename, label (REAL/FAKE), original and split columns, described below under Metadata Columns.
  • The full training set is just over 470 GB.

References: https://deepfakedetectionchallenge.ai/faqs

Data exploration

import os

DATA_FOLDER = '../input/deepfake-detection-challenge'
TRAIN_SAMPLE_FOLDER = 'train_sample_videos'
TEST_FOLDER = 'test_videos'

print(f"Train samples: {len(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))}")
print(f"Test samples: {len(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER)))}")

Files

  • train_sample_videos.zip
  • sample_submission.csv
  • test_videos.zip

Metadata Columns

  • filename - the filename of the video
  • label - whether the video is REAL or FAKE
  • original - in the case that a train set video is FAKE, the original video is listed here
  • split - this is always equal to “train”.
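
As a rough illustration of the columns above, here is a minimal sketch that counts REAL and FAKE labels. It assumes metadata.json is a single JSON object that maps each filename to its label, split and original fields. The helper names load_metadata and count_labels and the sample filenames are made up for this example.

```python
import json
from collections import Counter


def load_metadata(path):
    """Read a metadata.json file into a dict keyed by video filename."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def count_labels(metadata):
    """Count how many videos carry each label (REAL or FAKE)."""
    return Counter(entry["label"] for entry in metadata.values())


# Tiny hand-written sample in the assumed layout (filenames are made up).
sample = {
    "fake_clip.mp4": {"label": "FAKE", "split": "train", "original": "real_clip.mp4"},
    "real_clip.mp4": {"label": "REAL", "split": "train", "original": None},
}
print(count_labels(sample))
```

On Kaggle, the same count could be obtained with count_labels(load_metadata(path)), where path points at the metadata.json of the set being explored.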

Check file types

train_list = list(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))
ext_dict = []
for file in train_list:
    # Take the text after the last dot so names with extra dots still work
    file_ext = file.split('.')[-1]
    if file_ext not in ext_dict:
        ext_dict.append(file_ext)
print(f"Extensions: {ext_dict}")

Video data exploration

We check first if the list of files in the meta info and the list from the folder are the same.

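import numpy as np
import pandas as pd

# Assumption: meta_train_df is the sample set's metadata.json loaded with
# filenames as the index; the original post does not show this step.
meta_train_df = pd.read_json(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER, 'metadata.json')).T

meta = np.array(list(meta_train_df.index))
storage = np.array([file for file in train_list if file.endswith('mp4')])
print(f"Metadata: {meta.shape[0]}, Folder: {storage.shape[0]}")
print(f"Files in metadata and not in folder: {np.setdiff1d(meta, storage, assume_unique=False).shape[0]}")
print(f"Files in folder and not in metadata: {np.setdiff1d(storage, meta, assume_unique=False).shape[0]}")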

Example of Fake video aagfhgtpmv.mp4

Let's use the face_recognition package to detect faces in the video

I learned how to capture a frame from a video file from this great kernel: https://www.kaggle.com/brassmonkey381/a-quick-look-at-the-first-frame-of-each-video

import os
import cv2 as cv
import matplotlib.pyplot as plt

train_dir = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/'
train_video_files = [train_dir + x for x in os.listdir(train_dir)]
# video_file = train_video_files[30]
video_file = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/akxoopqjqz.mp4'

# Read only the first frame of the video
cap = cv.VideoCapture(video_file)
success, image = cap.read()
cap.release()
if not success:
    raise RuntimeError(f"Could not read a frame from {video_file}")

# OpenCV returns BGR; matplotlib expects RGB
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)

fig, ax = plt.subplots(1, 1, figsize=(15, 15))
ax.imshow(image)
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.title.set_text(f"FRAME 0: {video_file.split('/')[-1]}")
plt.grid(False)

Now we detect faces with the face_recognition package; OpenCV is used only to read frames from the video. The package must be pip-installed first (see the command near the top of this post), so make sure internet is turned on in your kernel.

Reference: https://github.com/ageitgey/face_recognition
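
A minimal sketch of face detection on a single frame, assuming the frame is an RGB array like the image variable produced above. face_recognition.face_locations returns boxes as (top, right, bottom, left) tuples. The helpers detect_faces and box_to_xywh are illustrative names, not part of the package.

```python
def detect_faces(rgb_image):
    """Return face boxes as (top, right, bottom, left) tuples."""
    # Imported lazily so the helper below can be used without the package.
    import face_recognition
    return face_recognition.face_locations(rgb_image)


def box_to_xywh(box):
    """Convert (top, right, bottom, left) to (x, y, width, height)."""
    top, right, bottom, left = box
    return left, top, right - left, bottom - top


# Example with a hypothetical box, no image required.
print(box_to_xywh((10, 50, 40, 20)))
```

The (x, y, width, height) form is convenient for drawing a rectangle on a matplotlib axis, while the original tuple can be used directly to crop the face as rgb_image[top:bottom, left:right].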

Frame by Frame Face Detection

  • First we will loop through the frames of the video file and append them to a list called frames.

import cv2

video_file = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/akxoopqjqz.mp4'

cap = cv2.VideoCapture(video_file)

frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        # No more frames (or the read failed)
        break
    frames.append(frame)
cap.release()

print('The number of frames saved: ', len(frames))
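
Keeping every decoded frame in a list can use a lot of memory for long or high-resolution videos. One possible alternative, sketched here under that assumption, is to yield frames one at a time and keep only every Nth one. The names iter_frames and every_nth are hypothetical helpers, not from the original post.

```python
def every_nth(items, step):
    """Yield items whose position is a multiple of step (0, step, 2*step, ...)."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, item in enumerate(items):
        if index % step == 0:
            yield item


def iter_frames(video_path):
    """Yield frames from a video one at a time instead of storing them all."""
    import cv2  # imported lazily; only needed when a video is actually read
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            yield frame
    finally:
        cap.release()


# Stand-in for frames: keep every 3rd of 10 items.
print(list(every_nth(range(10), 3)))
```

With a real video, list(every_nth(iter_frames(video_file), 10)) would keep one frame in ten. Every frame is still decoded, but only the sampled ones stay in memory.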
