Getting text scores
Import required packages
[2]:
from oceanai.modules.lab.build import Run
Build
[3]:
_b5 = Run(
lang = 'en', # Inference language
color_simple = '#333', # Plain text color (hexadecimal code)
color_info = '#1776D2', # The color of the text containing the information (hexadecimal code)
color_err = '#FF0000', # Error text color (hexadecimal code)
color_true = '#008001', # Text color containing positive information (hexadecimal code)
bold_text = True, # Bold text
num_to_df_display = 30, # Number of rows to display in tables
text_runtime = 'Runtime', # Runtime text
metadata = True # Displaying information about library
)
[2023-12-14 18:07:43] OCEANAI - personality traits: Authors: Elena Ryumina [ryumina_ev@mail.ru] Dmitry Ryumin [dl_03.03.1991@mail.ru] Alexey Karpov [karpov@iias.spb.su] Maintainers: Elena Ryumina [ryumina_ev@mail.ru] Dmitry Ryumin [dl_03.03.1991@mail.ru] Version: 1.0.0a16 License: BSD License
Getting and displaying versions of installed libraries
_b5.df_pkgs_
- DataFrame with versions of installed libraries
[4]:
_b5.libs_vers(runtime = True, run = True)
No. | Package | Version |
---|---|---|
1 | TensorFlow | 2.15.0 |
2 | Keras | 2.15.0 |
3 | OpenCV | 4.8.1 |
4 | MediaPipe | 0.9.0 |
5 | NumPy | 1.26.2 |
6 | SciPy | 1.11.4 |
7 | Pandas | 2.1.3 |
8 | Scikit-learn | 1.3.2 |
9 | OpenSmile | 2.5.0 |
10 | Librosa | 0.10.1 |
11 | AudioRead | 3.0.1 |
12 | IPython | 8.18.1 |
13 | PyMediaInfo | 6.1.0 |
14 | Requests | 2.31.0 |
15 | JupyterLab | 4.0.9 |
16 | LIWC | 0.5.0 |
17 | Transformers | 4.36.0 |
18 | Sentencepiece | 0.1.99 |
19 | Torch | 2.0.1+cpu |
20 | Torchaudio | 2.0.2+cpu |
— Runtime: 0.006 sec. —
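Outside the notebook, a comparable version listing can be assembled with the standard library alone. The sketch below is not how `libs_vers` works internally; it is a minimal stand-in built on `importlib.metadata`. The helper name `installed_versions` and the distribution names passed to it are illustrative assumptions, and distribution names on PyPI may differ from the display names in the table above.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Return {distribution name: version string, or None if it is not installed}."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is missing from the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative subset of distributions (hypothetical choice, not the library's list)
    for name, version in installed_versions(["numpy", "pandas", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing package is reported as `None` rather than raising, which keeps the listing usable in partially configured environments.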
Loading a dictionary with hand-crafted features
[5]:
# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000 # File download size from network in one step
res_load_text_features = _b5.load_text_features(
force_reload = True, # Force re-download of the file
out = True, # Display
runtime = True, # Runtime calculation
run = True # Run blocking
)
[2023-12-14 18:07:43] Loading a dictionary with hand-crafted features …
[2023-12-03 00:30:00] Loading the “LIWC2007.txt” file 100.0% …
— Runtime: 0.232 sec. —
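The `chunk_size_` setting controls how much data is fetched per download step. As a rough illustration of that idea, the sketch below copies a binary stream in fixed-size chunks. It assumes the value is a byte count, which the notebook does not state, and the function name `copy_in_chunks` is hypothetical; it does not reproduce the library's own download code.

```python
import io


def copy_in_chunks(src, dst, chunk_size=2_000_000):
    """Copy a binary stream to another in fixed-size chunks; return the bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # Reads at most chunk_size bytes
        if not chunk:  # An empty read means the source is exhausted
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for a network response and a local file
source = io.BytesIO(b"x" * 5_000_000)
target = io.BytesIO()
copied = copy_in_chunks(source, target)
print(f"Copied {copied} bytes")
```

Reading in chunks keeps memory use bounded by `chunk_size` regardless of the file size, and it gives a natural point at which to report progress.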
Building tokenizer and translation model (RU -> EN)
[6]:
res_setup_translation_model = _b5.setup_translation_model(
out = True, # Display
runtime = True, # Runtime calculation
run = True # Run blocking
)
[2023-12-14 18:07:43] Building tokenizer and translation model …
— Runtime: 1.71 sec. —
Building tokenizer and BERT model (for word encoding)
[7]:
# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000 # File download size from network in one step
res_setup_bert_encoder = _b5.setup_bert_encoder(
force_reload = True, # Force re-download of the file
out = True, # Display
runtime = True, # Runtime calculation
run = True # Run blocking
)
[2023-12-14 18:07:45] Building tokenizer and BERT model …
[2023-12-14 18:07:47] Loading the “bert-base-multilingual-cased.zip” file
[2023-12-14 18:07:47] Unzipping an archive “bert-base-multilingual-cased.zip” …
— Runtime: 4.188 sec. —
First Impressions V2 (FI V2)
Formation of the neural network architecture of the model for obtaining scores by hand-crafted features
_b5.text_model_hc_
- Neural network model tf.keras.Model for obtaining scores by hand-crafted features
[8]:
res_load_text_model_hc_fi = _b5.load_text_model_hc(
corpus = "fi", # Corpus selection: 'fi' for models trained on First Impressions V2, 'mupta' for models trained on MuPTA
show_summary = False, # Displaying the formed neural network architecture of the model
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:49] Formation of the neural network architecture of the model for obtaining scores by hand-crafted features (text modality) …
— Runtime: 0.647 sec. —
Downloading the weights of the neural network model for obtaining scores by hand-crafted features
_b5.text_model_hc_
- Neural network model tf.keras.Model for obtaining scores by hand-crafted features
[9]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000 # File download size from network in 1 step
url = _b5.weights_for_big5_['text']['fi']['hc']['sberdisk']
res_load_text_model_weights_hc_fi = _b5.load_text_model_weights_hc(
url = url, # Full path to the file with weights of the neural network model
force_reload = True, # Forced download of a file with weights of a neural network model from the network
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:50] Downloading the weights of a neural network model to obtain hand-crafted features (text modality) …
[2023-12-14 18:07:50] File download “weights_2023-07-15_10-52-15.h5” 100.0% …
— Runtime: 0.289 sec. —
Formation of the neural network architecture of the model to obtain scores by deep features
_b5.text_model_nn_
- Neural network model tf.keras.Model for obtaining scores by deep features
[10]:
res_load_text_model_nn_fi = _b5.load_text_model_nn(
corpus = "fi", # Corpus selection: 'fi' for models trained on First Impressions V2, 'mupta' for models trained on MuPTA
show_summary = False, # Displaying the formed neural network architecture of the model
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:50] Formation of a neural network architecture for obtaining scores by deep features (text modality) …
— Runtime: 0.279 sec. —
Downloading the weights of the neural network model for obtaining scores by deep features
_b5.text_model_nn_
- Neural network model tf.keras.Model for obtaining scores by deep features
[11]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000 # File download size from network in 1 step
url = _b5.weights_for_big5_['text']['fi']['nn']['sberdisk']
res_load_text_model_weights_nn_fi = _b5.load_text_model_weights_nn(
url = url, # Full path to the file with weights of the neural network model
force_reload = True, # Forced download of a file with weights of a neural network model from the network
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:50] Downloading the weights of a neural network model to obtain deep features (text modality) …
[2023-12-14 18:07:51] File download “weights_2023-07-03_15-01-08.h5” 100.0% …
— Runtime: 0.337 sec. —
Formation of the neural network architecture of the model to obtain personality traits scores
_b5.text_models_b5_
- Neural network models tf.keras.Model for obtaining the personality traits scores
[12]:
res_load_text_model_b5 = _b5.load_text_model_b5(
show_summary = False, # Displaying the formed neural network architecture of the model
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:51] Formation of neural network architectures of models for obtaining the personality traits scores (text modality) …
— Runtime: 0.015 sec. —
Downloading weights of neural network models for obtaining the personality traits scores
_b5.text_models_b5_
- Neural network models tf.keras.Model for obtaining the personality traits scores
[13]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000 # File download size from network in 1 step
url = _b5.weights_for_big5_['text']['fi']['b5']['sberdisk']
res_load_text_model_weights_b5 = _b5.load_text_model_weights_b5(
url = url, # Full path to the file with weights of the neural network models
force_reload = False, # Forced download of a file with weights of a neural network model from the network
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 18:07:51] Downloading the weights of neural network models to obtain the personality traits scores (text modality) …
[2023-12-14 18:07:51] File download “ft_fi_2023-12-09_14-25-13.h5”
— Runtime: 0.163 sec. —
Getting scores (text modality)
_b5.df_files_
- DataFrame with data
_b5.df_accuracy_
- DataFrame with accuracy
[14]:
# Core settings
_b5.path_to_dataset_ = 'E:/Databases/FirstImpressionsV2/test' # Dataset directory
# Directories not included in the selection
_b5.ignore_dirs_ = []
# Key names for DataFrame dataset
_b5.keys_dataset_ = ['Path', 'Openness', 'Conscientiousness', 'Extraversion', 'Agreeableness', 'Non-Neuroticism']
_b5.ext_ = ['.mp4'] # Search file extensions
_b5.path_to_logs_ = './logs' # Directory for saving LOG files
# Full path to the file containing the ground truth scores for the accuracy calculation
url_accuracy = _b5.true_traits_['fi']['sberdisk']
res_get_text_union_predictions = _b5.get_text_union_predictions(
depth = 1, # Hierarchy depth for receiving video
recursive = False, # Recursive data search
asr = True, # Using a model for ASR
lang = 'en', # Language selection: 'en' for models trained on First Impressions V2, 'ru' for models trained on MuPTA
accuracy = True, # Accuracy calculation
url_accuracy = url_accuracy,
logs = True, # If necessary, generate a LOG file
out = True, # Display
runtime = True, # Runtime count
run = True # Run blocking
)
[2023-12-14 19:00:14] Feature extraction (hand-crafted and deep) from text …
[2023-12-14 19:00:15] Getting scores and accuracy calculation (text modality) …
2000 from 2000 (100.0%) … test80_25_Q4wOgixh7E.004.mp4 …
ID | Path | Openness | Conscientiousness | Extraversion | Agreeableness | Non-Neuroticism |
---|---|---|---|---|---|---|
1 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.624434 | 0.588915 | 0.53729 | 0.601771 | 0.587032 |
2 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.518305 | 0.405696 | 0.440837 | 0.486431 | 0.42919 |
3 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.516165 | 0.482939 | 0.419187 | 0.520959 | 0.46346 |
4 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.653522 | 0.645953 | 0.5613 | 0.63864 | 0.635908 |
5 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.672823 | 0.563164 | 0.597474 | 0.618239 | 0.627377 |
6 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.571563 | 0.49441 | 0.477624 | 0.548336 | 0.509708 |
7 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.579048 | 0.590844 | 0.470888 | 0.580203 | 0.545247 |
8 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.547369 | 0.540064 | 0.441378 | 0.55407 | 0.52564 |
9 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.630611 | 0.546466 | 0.548925 | 0.592785 | 0.576801 |
10 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.643665 | 0.650126 | 0.561841 | 0.63202 | 0.636658 |
11 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.610431 | 0.509742 | 0.532337 | 0.563182 | 0.548405 |
12 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.501841 | 0.438787 | 0.408134 | 0.493867 | 0.433236 |
13 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.516751 | 0.521908 | 0.412392 | 0.535759 | 0.475492 |
14 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.625826 | 0.595756 | 0.545166 | 0.608196 | 0.601571 |
15 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.506065 | 0.466968 | 0.428299 | 0.497129 | 0.451425 |
16 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.638552 | 0.564402 | 0.561068 | 0.599493 | 0.594701 |
17 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.51764 | 0.588128 | 0.392461 | 0.569938 | 0.512308 |
18 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.581101 | 0.516556 | 0.489761 | 0.557651 | 0.521073 |
19 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.545621 | 0.467661 | 0.46827 | 0.518607 | 0.478676 |
20 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.619155 | 0.529129 | 0.535892 | 0.58141 | 0.571938 |
21 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.58491 | 0.489063 | 0.500084 | 0.538159 | 0.525135 |
22 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.504319 | 0.449576 | 0.427531 | 0.488319 | 0.441239 |
23 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.587255 | 0.591969 | 0.50329 | 0.578679 | 0.566444 |
24 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.6448 | 0.58204 | 0.558367 | 0.61345 | 0.60149 |
25 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.575514 | 0.517498 | 0.481397 | 0.548056 | 0.514953 |
26 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.561977 | 0.594428 | 0.456222 | 0.562595 | 0.536081 |
27 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.522762 | 0.468697 | 0.426084 | 0.510566 | 0.451157 |
28 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.642535 | 0.538425 | 0.564254 | 0.602641 | 0.595872 |
29 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.615789 | 0.54139 | 0.522493 | 0.585496 | 0.570682 |
30 | E:\Databases\FirstImpressionsV2\test\test80_01... | 0.620333 | 0.522955 | 0.543902 | 0.569043 | 0.559107 |
[2023-12-14 19:00:16] Trait-wise accuracy …
Metrics | Openness | Conscientiousness | Extraversion | Agreeableness | Non-Neuroticism | Mean |
---|---|---|---|---|---|---|
MAE | 0.1097 | 0.114 | 0.115 | 0.1019 | 0.1154 | 0.1112 |
Accuracy | 0.8903 | 0.886 | 0.885 | 0.8981 | 0.8846 | 0.8888 |
[2023-12-14 19:00:16] Mean absolute errors: 0.1112, average accuracy: 0.8888 …
Log files saved successfully …
— Runtime: 3131.846 sec. —
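In the accuracy table above, each Accuracy value equals one minus the corresponding MAE (for example, 1 - 0.1097 = 0.8903 for Openness), and the Mean column is the average over the five traits. The sketch below reproduces that calculation with NumPy under this reading of the table. The function name `trait_wise_metrics` and the toy arrays are hypothetical and only illustrate the arithmetic; they are not taken from the library or from the First Impressions V2 data.

```python
import numpy as np

TRAITS = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Non-Neuroticism"]


def trait_wise_metrics(y_true, y_pred):
    """Return per-trait MAE and accuracy (1 - MAE), plus their means over traits."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.abs(y_true - y_pred).mean(axis=0)  # One value per trait (column)
    acc = 1.0 - mae
    return {
        "MAE": dict(zip(TRAITS, mae)),
        "Accuracy": dict(zip(TRAITS, acc)),
        "Mean MAE": float(mae.mean()),
        "Mean accuracy": float(acc.mean()),
    }


# Made-up ground truth and predictions for two recordings (rows) and five traits (columns)
y_true = [[0.5, 0.5, 0.5, 0.5, 0.5], [0.7, 0.6, 0.5, 0.4, 0.3]]
y_pred = [[0.6, 0.5, 0.4, 0.5, 0.5], [0.7, 0.5, 0.5, 0.5, 0.3]]

metrics = trait_wise_metrics(y_true, y_pred)
for trait in TRAITS:
    print(f"{trait}: MAE={metrics['MAE'][trait]:.4f}, accuracy={metrics['Accuracy'][trait]:.4f}")
print(f"Mean MAE={metrics['Mean MAE']:.4f}, mean accuracy={metrics['Mean accuracy']:.4f}")
```

Because the scores lie in the range 0 to 1, the MAE also stays in that range, so 1 - MAE can be read directly as an accuracy-like score.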