The term "DeepFake" combines "Deep Learning" and "Fake": it refers to taking a person in an image or video and replacing them with someone else's likeness using techniques such as deep artificial neural networks [1].
The Kaggle Deepfake Detection Challenge dataset is used: https://www.kaggle.com/c/deepfake-detection-challenge/data
GitHub reference: https://github.com/ageitgey/face_recognition
!pip install face_recognition
Data
- The data consists of .mp4 files, split into compressed sets of ~10GB apiece. A metadata.json accompanies each set of .mp4 files and contains the filename, label (REAL/FAKE), original and split columns, listed below under Metadata Columns.
- The full training set is just over 470 GB.
References: https://deepfakedetectionchallenge.ai/faqs
Data exploration
import os

DATA_FOLDER = '../input/deepfake-detection-challenge'
TRAIN_SAMPLE_FOLDER = 'train_sample_videos'
TEST_FOLDER = 'test_videos'

print(f"Train samples: {len(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))}")
print(f"Test samples: {len(os.listdir(os.path.join(DATA_FOLDER, TEST_FOLDER)))}")
Files
- train_sample_videos.zip
- sample_submission.csv
- test_videos.zip
Metadata Columns
- filename - the filename of the video
- label - whether the video is REAL or FAKE
- original - in the case that a train set video is FAKE, the original video is listed here
- split - this is always equal to “train”.
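As a sketch of what this metadata looks like once loaded, here is a tiny hand-built example (the filenames below are invented for illustration); on Kaggle, metadata.json maps filename to columns, so you would read it with pandas and transpose it so that filenames become the index:

```python
import pandas as pd

# Hypothetical records mimicking the structure of metadata.json (filenames invented)
records = {
    "aaaaaaaaaa.mp4": {"label": "FAKE", "original": "bbbbbbbbbb.mp4", "split": "train"},
    "bbbbbbbbbb.mp4": {"label": "REAL", "original": None, "split": "train"},
}

# Equivalent to pd.read_json('metadata.json').T: one row per video file
meta_train_df = pd.DataFrame(records).T
print(meta_train_df["label"].value_counts())
```

A FAKE row points back to its source clip through the original column, which is what lets you pair each fake with the real video it was generated from.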
Check file types
train_list = list(os.listdir(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER)))
ext_list = []
for file in train_list:
    # Take the last dot-separated token so filenames containing dots are handled correctly
    file_ext = file.split('.')[-1]
    if file_ext not in ext_list:
        ext_list.append(file_ext)
print(f"Extensions: {ext_list}")
Video data exploration
We first check whether the list of files in the metadata and the list of files in the folder match.

import numpy as np
import pandas as pd

# metadata.json maps filename -> columns; transpose so filenames become the index
meta_train_df = pd.read_json(os.path.join(DATA_FOLDER, TRAIN_SAMPLE_FOLDER, 'metadata.json')).T

meta = np.array(list(meta_train_df.index))
storage = np.array([file for file in train_list if file.endswith('mp4')])
print(f"Metadata: {meta.shape[0]}, Folder: {storage.shape[0]}")
print(f"Files in metadata and not in folder: {np.setdiff1d(meta, storage, assume_unique=False).shape[0]}")
print(f"Files in folder and not in metadata: {np.setdiff1d(storage, meta, assume_unique=False).shape[0]}")
Example of a fake video: aagfhgtpmv.mp4
Let's use the face_recognition package to detect faces in the video.
See this kernel, https://www.kaggle.com/brassmonkey381/a-quick-look-at-the-first-frame-of-each-video, which shows how to capture a frame from a video file.
import os
import cv2 as cv
import matplotlib.pyplot as plt

train_dir = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/'
train_video_files = [train_dir + x for x in os.listdir(train_dir)]
# video_file = train_video_files[30]
video_file = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/akxoopqjqz.mp4'

# Grab the first frame of the video
cap = cv.VideoCapture(video_file)
success, image = cap.read()
cap.release()

fig, ax = plt.subplots(1, 1, figsize=(15, 15))
image = cv.cvtColor(image, cv.COLOR_BGR2RGB)  # OpenCV reads BGR; matplotlib expects RGB
ax.imshow(image)
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.title.set_text(f"FRAME 0: {video_file.split('/')[-1]}")
ax.grid(False)
Now let's detect faces using the face_recognition package! First, we need to pip install it; make sure internet access is turned on in your kernel.
Reference: https://github.com/ageitgey/face_recognition
Frame by Frame Face Detection
- First, we loop through the frames of the video file and append each one to a list called frames.
import cv2

video_file = '/kaggle/input/deepfake-detection-challenge/train_sample_videos/akxoopqjqz.mp4'
cap = cv2.VideoCapture(video_file)
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # no more frames to read
    frames.append(frame)
cap.release()
print('The number of frames saved: ', len(frames))