Getting text scores


Import required packages

[2]:
from oceanai.modules.lab.build import Run

Build

[3]:
_b5 = Run(
    lang = 'en', # Inference language
    color_simple = '#333', # Plain text color (hexadecimal code)
    color_info = '#1776D2', # The color of the text containing the information (hexadecimal code)
    color_err = '#FF0000', # Error text color (hexadecimal code)
    color_true = '#008001', # Text color containing positive information (hexadecimal code)
    bold_text = True, # Bold text
    num_to_df_display = 30, # Number of rows to display in tables
    text_runtime = 'Runtime', # Runtime text
    metadata = True # Displaying information about library
)

[2023-12-14 18:07:43] OCEANAI - personaly traits:
    Authors:
        Elena Ryumina [ryumina_ev@mail.ru]
        Dmitry Ryumin [dl_03.03.1991@mail.ru]
        Alexey Karpov [karpov@iias.spb.su]
    Maintainers:
        Elena Ryumina [ryumina_ev@mail.ru]
        Dmitry Ryumin [dl_03.03.1991@mail.ru]
    Version: 1.0.0a16
    License: BSD License

Getting and displaying versions of installed libraries

  • _b5.df_pkgs_ - DataFrame with versions of installed libraries

[4]:
_b5.libs_vers(runtime = True, run = True)
Package Version
1 TensorFlow 2.15.0
2 Keras 2.15.0
3 OpenCV 4.8.1
4 MediaPipe 0.9.0
5 NumPy 1.26.2
6 SciPy 1.11.4
7 Pandas 2.1.3
8 Scikit-learn 1.3.2
9 OpenSmile 2.5.0
10 Librosa 0.10.1
11 AudioRead 3.0.1
12 IPython 8.18.1
13 PyMediaInfo 6.1.0
14 Requests 2.31.0
15 JupyterLab 4.0.9
16 LIWC 0.5.0
17 Transformers 4.36.0
18 Sentencepiece 0.1.99
19 Torch 2.0.1+cpu
20 Torchaudio 2.0.2+cpu

— Runtime: 0.006 sec. —
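For readers who want a similar version report outside the library, the following is a minimal sketch using only the standard library. It is not how `libs_vers` is implemented; the function name `installed_versions` and the package names passed to it are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # Not installed in this environment
    return versions


# Illustrative distribution names; the last one is deliberately nonexistent
report = installed_versions(["numpy", "pandas", "definitely-not-a-real-package-xyz"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

A missing package is reported as `None` rather than raising, which keeps the report usable in partially configured environments.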

Loading a dictionary with hand-crafted features

[5]:
# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000      # File download size from network in one step

res_load_text_features = _b5.load_text_features(
    force_reload = True,       # Force re-download of the file
    out = True,                # Display
    runtime = True,            # Runtime calculation
    run = True                 # Run blocking
)

[2023-12-14 18:07:43] Loading a dictionary with hand-crafted features …

[2023-12-03 00:30:00] Loading the “LIWC2007.txt” file 100.0% …

— Runtime: 0.232 sec. —
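The two settings used above, `chunk_size_` and `force_reload`, can be illustrated with a small standalone sketch. It assumes that a chunk size bounds how many bytes are read per step and that `force_reload = False` reuses a file already on disk; both are assumptions about the intent of these options, not the library's actual code. The helper `save_stream` is hypothetical, and an in-memory stream stands in for the network response.

```python
import io
import tempfile
from pathlib import Path


def save_stream(source, destination, chunk_size=2_000_000, force_reload=False):
    """Write a binary stream to destination in chunks of at most chunk_size bytes.

    Returns the number of bytes written, or 0 when an existing file is reused.
    """
    destination = Path(destination)
    if destination.exists() and not force_reload:
        return 0  # Assumed meaning of force_reload = False: keep the existing file
    destination.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(destination, "wb") as fh:
        while True:
            chunk = source.read(chunk_size)  # Read one step of the transfer
            if not chunk:
                break
            fh.write(chunk)
            written += len(chunk)
    return written


payload = b"x" * 5_000_000  # Stand-in for a downloaded file
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "models" / "demo.bin"
    first = save_stream(io.BytesIO(payload), target, force_reload=True)
    second = save_stream(io.BytesIO(payload), target, force_reload=False)
    size_on_disk = target.stat().st_size
```

Here the first call writes the file in three steps (2,000,000 + 2,000,000 + 1,000,000 bytes), and the second call returns without writing because the file already exists.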

Building tokenizer and translation model (RU -> EN)

[6]:
res_setup_translation_model = _b5.setup_translation_model(
    out = True,     # Display
    runtime = True, # Runtime calculation
    run = True      # Run blocking
)

[2023-12-14 18:07:43] Building tokenizer and translation model …

— Runtime: 1.71 sec. —

Building tokenizer and BERT model (for word encoding)

[7]:
# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000      # File download size from network in one step

res_setup_bert_encoder = _b5.setup_bert_encoder(
    force_reload = True,       # Force re-download of the file
    out = True,                # Display
    runtime = True,            # Runtime calculation
    run = True                 # Run blocking
)

[2023-12-14 18:07:45] Building tokenizer and BERT model …

[2023-12-14 18:07:47] Loading the “bert-base-multilingual-cased.zip” file

[2023-12-14 18:07:47] Unzipping an archive “bert-base-multilingual-cased.zip” …

— Runtime: 4.188 sec. —

FI V2

Formation of neural network architectures of models for obtaining scores by hand-crafted features

  • _b5.text_model_hc_ - Neural network model tf.keras.Model for obtaining scores by hand-crafted features

[8]:
res_load_text_model_hc_fi = _b5.load_text_model_hc(
    corpus = "fi", # Corpus selection: 'fi' for models trained on First Impressions V2, 'mupta' for models trained on MuPTA
    show_summary = False, # Displaying the formed neural network architecture of the model
    out = True, # Display
    runtime = True, # Runtime count
    run = True # Run blocking
)

[2023-12-14 18:07:49] Formation of the neural network architecture of the model for obtaining scores by hand-crafted features (text modality) …

— Runtime: 0.647 sec. —

Downloading the weights of the neural network model for obtaining scores by hand-crafted features

  • _b5.text_model_hc_ - Neural network model tf.keras.Model for obtaining scores by hand-crafted features

[9]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000      # File download size from network in 1 step

url = _b5.weights_for_big5_['text']['fi']['hc']['sberdisk']

res_load_text_model_weights_hc_fi = _b5.load_text_model_weights_hc(
    url = url, # Full path to the file with weights of the neural network model
    force_reload = True, # Forced download of a file with weights of a neural network model from the network
    out = True,     # Display
    runtime = True, # Runtime count
    run = True      # Run blocking
)

[2023-12-14 18:07:50] Downloading the weights of a neural network model to obtain hand-crafted features (text modality) …

[2023-12-14 18:07:50] File download “weights_2023-07-15_10-52-15.h5” 100.0% …

— Runtime: 0.289 sec. —

Formation of the neural network architecture of the model to obtain scores by deep features

  • _b5.text_model_nn_ - Neural network model tf.keras.Model for obtaining scores by deep features

[10]:
res_load_text_model_nn_fi = _b5.load_text_model_nn(
    corpus = "fi", # Corpus selection: 'fi' for models trained on First Impressions V2, 'mupta' for models trained on MuPTA
    show_summary = False, # Displaying the formed neural network architecture of the model
    out = True, # Display
    runtime = True, # Runtime count
    run = True # Run blocking
)

[2023-12-14 18:07:50] Formation of a neural network architecture for obtaining scores by deep features (text modality) …

— Runtime: 0.279 sec. —

Downloading the weights of the neural network model for obtaining scores by deep features

  • _b5.text_model_nn_ - Neural network model tf.keras.Model for obtaining scores by deep features

[11]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000      # File download size from network in 1 step

url = _b5.weights_for_big5_['text']['fi']['nn']['sberdisk']

res_load_text_model_weights_nn_fi = _b5.load_text_model_weights_nn(
    url = url, # Full path to the file with weights of the neural network model
    force_reload = True, # Forced download of a file with weights of a neural network model from the network
    out = True,     # Display
    runtime = True, # Runtime count
    run = True      # Run blocking
)

[2023-12-14 18:07:50] Downloading the weights of a neural network model to obtain deep features (text modality) …

[2023-12-14 18:07:51] File download “weights_2023-07-03_15-01-08.h5” 100.0% …

— Runtime: 0.337 sec. —

Formation of the neural network architecture of the model to obtain personality traits scores

  • _b5.text_models_b5_ - Neural network models tf.keras.Model for obtaining the personality traits scores

[12]:
res_load_text_model_b5 = _b5.load_text_model_b5(
    show_summary = False, # Displaying the formed neural network architecture of the model
    out = True, # Display
    runtime = True, # Runtime count
    run = True # Run blocking
)

[2023-12-14 18:07:51] Formation of neural network architectures of models for obtaining the personality traits scores (text modality) …

— Runtime: 0.015 sec. —

Downloading weights of neural network models for obtaining the personality traits scores

  • _b5.text_models_b5_ - Neural network models tf.keras.Model for obtaining the personality traits scores

[13]:
# Core settings
_b5.path_to_save_ = './models' # Directory to save the file
_b5.chunk_size_ = 2000000 # File download size from network in 1 step

url = _b5.weights_for_big5_['text']['fi']['b5']['sberdisk']

res_load_text_model_weights_b5 = _b5.load_text_model_weights_b5(
    url = url,
    force_reload = False, # Forced download of a file with weights of a neural network model from the network
    out = True, # Display
    runtime = True, # Runtime count
    run = True # Run blocking
)

[2023-12-14 18:07:51] Downloading the weights of neural network models to obtain the personality traits scores (text modality) …

[2023-12-14 18:07:51] File download “ft_fi_2023-12-09_14-25-13.h5”

— Runtime: 0.163 sec. —

Getting scores (text modality)

  • _b5.df_files_ - DataFrame with data

  • _b5.df_accuracy_ - DataFrame with accuracy

[14]:
# Core settings
_b5.path_to_dataset_ = 'E:/Databases/FirstImpressionsV2/test' # Dataset directory
# Directories not included in the selection
_b5.ignore_dirs_ = []
# Key names for the dataset DataFrame
_b5.keys_dataset_ = ['Path', 'Openness', 'Conscientiousness', 'Extraversion', 'Agreeableness', 'Non-Neuroticism']
_b5.ext_ = ['.mp4'] # Search file extensions
_b5.path_to_logs_ = './logs' # Directory for saving LOG files

# Full path to the file containing the ground truth scores for the accuracy calculation
url_accuracy = _b5.true_traits_['fi']['sberdisk']

res_get_text_union_predictions = _b5.get_text_union_predictions(
    depth = 1,         # Hierarchy depth for receiving video
    recursive = False, # Recursive data search
    asr = True,        # Using a model for ASR
    lang = 'en', # Language selection: 'en' for models trained on First Impressions V2, 'ru' for models trained on MuPTA
    accuracy = True,   # Accuracy calculation
    url_accuracy = url_accuracy,
    logs = True,       # If necessary, generate a LOG file
    out = True,        # Display
    runtime = True,    # Runtime count
    run = True         # Run blocking
)
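The settings `ext_`, `ignore_dirs_` and `recursive` control which files are picked up from the dataset directory. The sketch below shows one plausible reading of them: filter by extension, skip ignored directory names, and optionally descend into subdirectories. The helper `find_media` is hypothetical, the exact semantics inside the library (including `depth`, which is not modeled here) are assumptions, and a temporary directory stands in for the dataset.

```python
import tempfile
from pathlib import Path


def find_media(root, extensions, ignore_dirs=(), recursive=False):
    """Return sorted file paths under root whose suffix is in extensions."""
    root = Path(root)
    pattern = "**/*" if recursive else "*"
    wanted = {ext.lower() for ext in extensions}
    ignored = set(ignore_dirs)
    found = []
    for path in root.glob(pattern):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        # Directory names between root and the file itself
        parent_dirs = path.relative_to(root).parts[:-1]
        if ignored.intersection(parent_dirs):
            continue  # File lives inside an ignored directory
        found.append(path)
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "skip").mkdir()
    (root / "sub").mkdir()
    for rel in ["a.mp4", "notes.txt", "skip/b.mp4", "sub/c.mp4"]:
        (root / rel).write_bytes(b"")  # Empty placeholder files

    flat = [p.relative_to(root).as_posix()
            for p in find_media(root, [".mp4"])]
    deep = [p.relative_to(root).as_posix()
            for p in find_media(root, [".mp4"], ignore_dirs=["skip"], recursive=True)]
```

With `recursive = False` only files directly inside the root are considered, which matches a flat layout where all `.mp4` files sit in a single directory.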

[2023-12-14 19:00:14] Feature extraction (hand-crafted and deep) from text …

[2023-12-14 19:00:15] Getting scores and accuracy calculation (text modality) …

2000 from 2000 (100.0%) … test80_25_Q4wOgixh7E.004.mp4 …

Path Openness Conscientiousness Extraversion Agreeableness Non-Neuroticism
ID
1 E:\Databases\FirstImpressionsV2\test\test80_01... 0.624434 0.588915 0.53729 0.601771 0.587032
2 E:\Databases\FirstImpressionsV2\test\test80_01... 0.518305 0.405696 0.440837 0.486431 0.42919
3 E:\Databases\FirstImpressionsV2\test\test80_01... 0.516165 0.482939 0.419187 0.520959 0.46346
4 E:\Databases\FirstImpressionsV2\test\test80_01... 0.653522 0.645953 0.5613 0.63864 0.635908
5 E:\Databases\FirstImpressionsV2\test\test80_01... 0.672823 0.563164 0.597474 0.618239 0.627377
6 E:\Databases\FirstImpressionsV2\test\test80_01... 0.571563 0.49441 0.477624 0.548336 0.509708
7 E:\Databases\FirstImpressionsV2\test\test80_01... 0.579048 0.590844 0.470888 0.580203 0.545247
8 E:\Databases\FirstImpressionsV2\test\test80_01... 0.547369 0.540064 0.441378 0.55407 0.52564
9 E:\Databases\FirstImpressionsV2\test\test80_01... 0.630611 0.546466 0.548925 0.592785 0.576801
10 E:\Databases\FirstImpressionsV2\test\test80_01... 0.643665 0.650126 0.561841 0.63202 0.636658
11 E:\Databases\FirstImpressionsV2\test\test80_01... 0.610431 0.509742 0.532337 0.563182 0.548405
12 E:\Databases\FirstImpressionsV2\test\test80_01... 0.501841 0.438787 0.408134 0.493867 0.433236
13 E:\Databases\FirstImpressionsV2\test\test80_01... 0.516751 0.521908 0.412392 0.535759 0.475492
14 E:\Databases\FirstImpressionsV2\test\test80_01... 0.625826 0.595756 0.545166 0.608196 0.601571
15 E:\Databases\FirstImpressionsV2\test\test80_01... 0.506065 0.466968 0.428299 0.497129 0.451425
16 E:\Databases\FirstImpressionsV2\test\test80_01... 0.638552 0.564402 0.561068 0.599493 0.594701
17 E:\Databases\FirstImpressionsV2\test\test80_01... 0.51764 0.588128 0.392461 0.569938 0.512308
18 E:\Databases\FirstImpressionsV2\test\test80_01... 0.581101 0.516556 0.489761 0.557651 0.521073
19 E:\Databases\FirstImpressionsV2\test\test80_01... 0.545621 0.467661 0.46827 0.518607 0.478676
20 E:\Databases\FirstImpressionsV2\test\test80_01... 0.619155 0.529129 0.535892 0.58141 0.571938
21 E:\Databases\FirstImpressionsV2\test\test80_01... 0.58491 0.489063 0.500084 0.538159 0.525135
22 E:\Databases\FirstImpressionsV2\test\test80_01... 0.504319 0.449576 0.427531 0.488319 0.441239
23 E:\Databases\FirstImpressionsV2\test\test80_01... 0.587255 0.591969 0.50329 0.578679 0.566444
24 E:\Databases\FirstImpressionsV2\test\test80_01... 0.6448 0.58204 0.558367 0.61345 0.60149
25 E:\Databases\FirstImpressionsV2\test\test80_01... 0.575514 0.517498 0.481397 0.548056 0.514953
26 E:\Databases\FirstImpressionsV2\test\test80_01... 0.561977 0.594428 0.456222 0.562595 0.536081
27 E:\Databases\FirstImpressionsV2\test\test80_01... 0.522762 0.468697 0.426084 0.510566 0.451157
28 E:\Databases\FirstImpressionsV2\test\test80_01... 0.642535 0.538425 0.564254 0.602641 0.595872
29 E:\Databases\FirstImpressionsV2\test\test80_01... 0.615789 0.54139 0.522493 0.585496 0.570682
30 E:\Databases\FirstImpressionsV2\test\test80_01... 0.620333 0.522955 0.543902 0.569043 0.559107

[2023-12-14 19:00:16] Trait-wise accuracy …

Metrics   Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism  Mean
MAE       0.1097    0.114              0.115         0.1019         0.1154           0.1112
Accuracy  0.8903    0.886              0.885         0.8981         0.8846           0.8888

[2023-12-14 19:00:16] Mean absolute errors: 0.1112, average accuracy: 0.8888 …

Log files saved successfully …

— Runtime: 3131.846 sec. —
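The reported figures are consistent with accuracy being computed as 1 - MAE for each trait, with the "Mean" column averaging over the five traits. The sketch below reproduces that arithmetic in plain Python. The function `trait_wise_accuracy` is hypothetical, and the ground truth values in the small demo are invented for illustration; only the predicted rows and the reported MAE values are taken from the output above.

```python
TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Non-Neuroticism"]


def trait_wise_accuracy(predicted, truth):
    """Return (mae, accuracy, mean_mae, mean_accuracy) over rows of trait scores."""
    n_rows = len(predicted)
    mae = []
    for t in range(len(TRAITS)):
        total = sum(abs(p[t] - g[t]) for p, g in zip(predicted, truth))
        mae.append(total / n_rows)
    accuracy = [1.0 - m for m in mae]  # Assumed definition: accuracy = 1 - MAE
    return mae, accuracy, sum(mae) / len(mae), sum(accuracy) / len(accuracy)


# Predictions: rows 1 and 2 of the table above
predicted = [
    [0.624434, 0.588915, 0.53729, 0.601771, 0.587032],
    [0.518305, 0.405696, 0.440837, 0.486431, 0.42919],
]
# Ground truth: made-up values, for illustration only
truth = [
    [0.6, 0.6, 0.5, 0.6, 0.6],
    [0.5, 0.5, 0.4, 0.5, 0.5],
]
mae, accuracy, mean_mae, mean_accuracy = trait_wise_accuracy(predicted, truth)

# Consistency check against the reported trait-wise MAE values
reported_mae = [0.1097, 0.114, 0.115, 0.1019, 0.1154]
reported_mean_mae = round(sum(reported_mae) / len(reported_mae), 4)
reported_accuracy = [round(1.0 - m, 4) for m in reported_mae]
reported_mean_accuracy = round(sum(reported_accuracy) / len(reported_accuracy), 4)
```

Applied to the reported MAE row, this arithmetic gives a mean MAE of 0.1112 and a mean accuracy of 0.8888, matching the summary line in the log.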