Solution of practical task 1

Task: Ranking of potential candidates by professional responsibilities

The practical task is solved in two stages. In the first stage, the OCEAN-AI library is used to obtain predictions (personality trait scores). In the second stage, the _candidate_ranking and _priority_skill_calculation methods of the OCEAN-AI library are applied to these predictions to rank the candidates. Examples of the implementation and its results are presented below.

Thus, the OCEAN-AI library provides tools to analyze the personality traits of candidates and their suitability for the position, which can significantly improve the recruitment process and help to make more objective and systematic decisions when ranking candidates.

FI V2

[2]:
# Import required tools
import os
import pandas as pd

# Module import
from oceanai.modules.lab.build import Run

# Creating an instance of a class
_b5 = Run(lang = 'en')

# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000      # File download size from network in one step

corpus = 'fi'

# Building audio models
res_load_model_hc = _b5.load_audio_model_hc()
res_load_model_nn = _b5.load_audio_model_nn()

# Loading audio model weights
url = _b5.weights_for_big5_['audio'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_audio_model_weights_hc(url = url)

url = _b5.weights_for_big5_['audio'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_audio_model_weights_nn(url = url)

# Building video models
res_load_model_hc = _b5.load_video_model_hc(lang='en')
res_load_model_deep_fe = _b5.load_video_model_deep_fe()
res_load_model_nn = _b5.load_video_model_nn()

# Loading video model weights
url = _b5.weights_for_big5_['video'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_video_model_weights_hc(url = url)

url = _b5.weights_for_big5_['video'][corpus]['fe']['sberdisk']
res_load_model_weights_deep_fe = _b5.load_video_model_weights_deep_fe(url = url)

url = _b5.weights_for_big5_['video'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_video_model_weights_nn(url = url)

# Loading a dictionary with hand-crafted features (text modality)
res_load_text_features = _b5.load_text_features()

# Building text models
res_setup_translation_model = _b5.setup_translation_model()
res_setup_bert_encoder = _b5.setup_bert_encoder()
res_load_text_model_hc_fi = _b5.load_text_model_hc(corpus=corpus)
res_load_text_model_nn_fi = _b5.load_text_model_nn(corpus=corpus)

# Loading text model weights
url = _b5.weights_for_big5_['text'][corpus]['hc']['sberdisk']
res_load_text_model_weights_hc_fi = _b5.load_text_model_weights_hc(url = url)

url = _b5.weights_for_big5_['text'][corpus]['nn']['sberdisk']
res_load_text_model_weights_nn_fi = _b5.load_text_model_weights_nn(url = url)

# Building model for multimodal information fusion
res_load_avt_model_b5 = _b5.load_avt_model_b5()

# Loading model weights for multimodal information fusion
url = _b5.weights_for_big5_['avt'][corpus]['b5']['sberdisk']
res_load_avt_model_weights_b5 = _b5.load_avt_model_weights_b5(url = url)

PATH_TO_DIR = './video_FI/'
PATH_SAVE_VIDEO = './video_FI/test/'

_b5.path_to_save_ = PATH_SAVE_VIDEO

# Loading 10 test files from the First Impressions V2 corpus
# URL: https://chalearnlap.cvc.uab.cat/dataset/24/description/
domain = 'https://download.sberdisk.ru/download/file/'
test_name_files = [
    '429713680?token=FqHdMLSSh7zYSZt&filename=_plk5k7PBEg.003.mp4',
    '429713681?token=Hz9b4lQkrLfic33&filename=be0DQawtVkE.002.mp4',
    '429713683?token=EgUXS9Xs8xHm5gz&filename=2d6btbaNdfo.000.mp4',
    '429713684?token=1U26753kmPYdIgt&filename=300gK3CnzW0.003.mp4',
    '429713685?token=LyigAWLTzDNwKJO&filename=300gK3CnzW0.001.mp4',
    '429713686?token=EpfRbCKHyuc4HPu&filename=cLaZxEf1nE4.004.mp4',
    '429713687?token=FNTkwqBr4jOS95l&filename=g24JGYuT74A.004.mp4',
    '429713688?token=qDT95nz7hfm2Nki&filename=JZNMxa3OKHY.000.mp4',
    '429713689?token=noLguEGXDpbcKhg&filename=nvlqJbHk_Lc.003.mp4',
    '429713679?token=9L7RQ0hgdJlcek6&filename=4vdJGgZpj4k.003.mp4'
]

for curr_files in test_name_files:
    _b5.download_file_from_url(url = domain + curr_files, out = True)

# Getting scores
_b5.path_to_dataset_ = PATH_TO_DIR # Dataset directory
_b5.ext_ = ['.mp4'] # Search file extensions

# Full path to the file with ground truth scores for accuracy calculation
url_accuracy = _b5.true_traits_[corpus]['sberdisk']

_b5.get_avt_predictions(url_accuracy = url_accuracy, lang = 'en')

[2023-12-16 18:42:02] Feature extraction (hand-crafted and deep) from text …

[2023-12-16 18:42:05] Getting scores and accuracy calculation (multimodal fusion) …

10 from 10 (100.0%) … GitHub\OCEANAI\docs\source\user_guide\notebooks\video_FI\test\_plk5k7PBEg.003.mp4 …

Person ID  Path                 Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1          2d6btbaNdfo.000.mp4  0.581159  0.628822           0.466609      0.622129       0.553832
2          300gK3CnzW0.001.mp4  0.463991  0.418851           0.41301       0.493329       0.423093
3          300gK3CnzW0.003.mp4  0.454281  0.415049           0.39189       0.485114       0.420741
4          4vdJGgZpj4k.003.mp4  0.588461  0.643233           0.530789      0.603038       0.593398
5          be0DQawtVkE.002.mp4  0.633433  0.533295           0.523742      0.608591       0.588456
6          cLaZxEf1nE4.004.mp4  0.636944  0.542386           0.558461      0.570975       0.558983
7          g24JGYuT74A.004.mp4  0.531518  0.376987           0.393309      0.4904         0.447881
8          JZNMxa3OKHY.000.mp4  0.610342  0.541418           0.563163      0.595013       0.569461
9          nvlqJbHk_Lc.003.mp4  0.495809  0.458526           0.414436      0.469152       0.435461
10         _plk5k7PBEg.003.mp4  0.60707   0.591893           0.520662      0.603938       0.565726

[2023-12-16 18:42:05] Trait-wise accuracy …

Metrics   Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism  Mean
MAE       0.0589    0.0612             0.0864        0.0697         0.0582           0.0669
Accuracy  0.9411    0.9388             0.9136        0.9303         0.9418           0.9331

[2023-12-16 18:42:05] Mean absolute errors: 0.0669, average accuracy: 0.9331 …

Log files saved successfully …

— Runtime: 64.481 sec. —

[2]:
True
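
In the trait-wise accuracy table above, every accuracy value equals one minus the corresponding MAE (for Openness, 1 - 0.0589 = 0.9411). The following minimal sketch shows that relationship on hypothetical toy data; the function name mae_and_accuracy and the numbers are illustrative assumptions and are not part of the OCEAN-AI API or of the corpus.

```python
import numpy as np


def mae_and_accuracy(y_true, y_pred):
    """Return per-trait MAE and accuracy (1 - MAE) for scores in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Average the absolute error over persons (rows), one value per trait (column)
    mae = np.abs(y_true - y_pred).mean(axis=0)
    return mae, 1.0 - mae


# Hypothetical toy data: 3 persons x 2 traits (not taken from any corpus)
y_true = [[0.60, 0.50], [0.40, 0.70], [0.55, 0.45]]
y_pred = [[0.58, 0.56], [0.46, 0.62], [0.55, 0.49]]

mae, acc = mae_and_accuracy(y_true, y_pred)
print("MAE:     ", mae.round(4))
print("Accuracy:", acc.round(4))
print("Mean MAE:", round(float(mae.mean()), 4))
```

The "Mean" column of the log is likewise consistent with averaging the five per-trait values.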

Thus, the OCEAN-AI library provides tools to analyze the personality traits of candidates and their suitability for the position, which can significantly improve the recruitment process and help to make more objective and systematic decisions when ranking candidates.

Weight coefficients for five professions are proposed, based on the following scientific articles:

  1. Sajjad H. et al. Personality and Career Choices // African Journal of Business Management. – 2012. – Vol. 6(6). – pp. 2255-2260.

  2. Alkhelil A. H. The Relationship between Personality Traits and Career Choice: A Case Study of Secondary School Students // International Journal of Academic Research in Progressive Education and Development. – 2016. – Vol. 5(2). – pp. 2226-6348.

  3. De Jong N. et al. Personality Traits and Career Role Enactment: Career Role Preferences as a Mediator // Frontiers in Psychology. – 2019. – Vol. 10. – pp. 1720.

The user can set their own weights; the sum of the weights must be equal to 100.
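
Before custom weights are passed on for ranking, it can help to check that they satisfy the constraint above. A minimal sketch, assuming the weights are kept in a plain dictionary; the helper validate_weights and the example profile are hypothetical and not part of OCEAN-AI, and the non-negativity check is an added assumption.

```python
TRAITS = [
    "Openness",
    "Conscientiousness",
    "Extraversion",
    "Agreeableness",
    "Non-Neuroticism",
]


def validate_weights(weights):
    """Check a {trait: weight} mapping and return the weights in trait order."""
    missing = [t for t in TRAITS if t not in weights]
    if missing:
        raise ValueError(f"Missing weights for: {missing}")
    if any(weights[t] < 0 for t in TRAITS):
        # Assumption: negative weights are not meaningful here
        raise ValueError("Weights must be non-negative")
    total = sum(weights[t] for t in TRAITS)
    if total != 100:
        raise ValueError(f"Weights must sum to 100, got {total}")
    return [weights[t] for t in TRAITS]


# Hypothetical custom profile (not one of the five professions in the table)
custom = {
    "Openness": 40,
    "Conscientiousness": 20,
    "Extraversion": 10,
    "Agreeableness": 20,
    "Non-Neuroticism": 10,
}
print(validate_weights(custom))
```

The returned list follows the column order of the weights table, so its elements line up with the five weight arguments used in the ranking cells below.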

[3]:
# Loading a dataframe with weights
url = 'https://download.sberdisk.ru/download/file/478675798?token=fF5fNZVpthQlEV0&filename=traits_priority_for_professions.csv'
traits_priority_for_professions = pd.read_csv(url)

traits_priority_for_professions.index.name = 'ID'
traits_priority_for_professions.index += 1
traits_priority_for_professions.index = traits_priority_for_professions.index.map(str)

traits_priority_for_professions
[3]:
ID  Profession                             Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1   Managers/executives                    15        35                 15            30             5
2   Entrepreneurship                       30        30                 5             5              30
3   Social/Non profit making professions   5         5                  35            35             20
4   Public sector professions              15        50                 15            15             5
5   Scientists/researchers, and engineers  50        15                 5             15             15

Ranking of candidates for the position of engineer

[4]:
weights = traits_priority_for_professions.iloc[4].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'engineer_candidate_ranking_fi_en', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[4]:
Person ID  Path                 OPE    CON    EXT    AGR    NNEU   Candidate score
5          be0DQawtVkE.002.mp4  0.633  0.533  0.524  0.609  0.588  60.246
6          cLaZxEf1nE4.004.mp4  0.637  0.542  0.558  0.571  0.559  59.725
4          4vdJGgZpj4k.003.mp4  0.588  0.643  0.531  0.603  0.593  59.672
10         _plk5k7PBEg.003.mp4  0.607  0.592  0.521  0.604  0.566  59.380
8          JZNMxa3OKHY.000.mp4  0.610  0.541  0.563  0.595  0.569  58.921
1          2d6btbaNdfo.000.mp4  0.581  0.629  0.467  0.622  0.554  58.463
7          g24JGYuT74A.004.mp4  0.532  0.377  0.393  0.490  0.448  48.271
9          nvlqJbHk_Lc.003.mp4  0.496  0.459  0.414  0.469  0.435  47.310
2          300gK3CnzW0.001.mp4  0.464  0.419  0.413  0.493  0.423  45.294
3          300gK3CnzW0.003.mp4  0.454  0.415  0.392  0.485  0.421  44.487
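
The candidate scores in the table above are consistent with a plain weighted sum of the five trait scores, using the weights of the selected profession. The sketch below recomputes three rows from the unrounded First Impressions V2 predictions shown earlier; it illustrates the arithmetic only and is not taken from the library's source code, so small differences in the last digit are possible.

```python
import pandas as pd

TRAITS = [
    "Openness",
    "Conscientiousness",
    "Extraversion",
    "Agreeableness",
    "Non-Neuroticism",
]

# Three rows copied from the prediction table (Person ID 5, 4 and 3)
scores = pd.DataFrame(
    [
        [0.633433, 0.533295, 0.523742, 0.608591, 0.588456],
        [0.588461, 0.643233, 0.530789, 0.603038, 0.593398],
        [0.454281, 0.415049, 0.391890, 0.485114, 0.420741],
    ],
    columns=TRAITS,
    index=pd.Index([5, 4, 3], name="Person ID"),
)

# Weights for "Scientists/researchers, and engineers" from the weights table
weights = pd.Series([50, 15, 5, 15, 15], index=TRAITS)

# Weighted sum per person; weights sum to 100, so the score lies in [0, 100]
candidate_score = scores.dot(weights)
ranking = candidate_score.sort_values(ascending=False)
print(ranking.round(3))
```

Because the weights sum to 100 and every trait score lies between 0 and 1, a score computed this way can be read as a percentage of the best possible match.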

Ranking of candidates for the position of manager

[5]:
weights = traits_priority_for_professions.iloc[0].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'executive_candidate_ranking_fi_en', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[5]:
Person ID  Path                 OPE    CON    EXT    AGR    NNEU   Candidate score
4          4vdJGgZpj4k.003.mp4  0.588  0.643  0.531  0.603  0.593  60.360
1          2d6btbaNdfo.000.mp4  0.581  0.629  0.467  0.622  0.554  59.158
10         _plk5k7PBEg.003.mp4  0.607  0.592  0.521  0.604  0.566  58.579
8          JZNMxa3OKHY.000.mp4  0.610  0.541  0.563  0.595  0.569  57.250
5          be0DQawtVkE.002.mp4  0.633  0.533  0.524  0.609  0.588  57.223
6          cLaZxEf1nE4.004.mp4  0.637  0.542  0.558  0.571  0.559  56.839
9          nvlqJbHk_Lc.003.mp4  0.496  0.459  0.414  0.469  0.435  45.954
2          300gK3CnzW0.001.mp4  0.464  0.419  0.413  0.493  0.423  44.730
7          g24JGYuT74A.004.mp4  0.532  0.377  0.393  0.490  0.448  44.018
3          300gK3CnzW0.003.mp4  0.454  0.415  0.392  0.485  0.421  43.876

To rank candidates by skills, two correlation coefficients must be set for each pair of personality trait and skill (one for a high trait score and one for a low trait score), together with a polarity threshold for the traits. The threshold determines which of the two coefficients is applied: the "high" coefficient when a person’s trait score is above the threshold, and the "low" coefficient when it is below.

As an example, the correlation coefficients between the five traits and four professional skills reported in the following article are used:

  1. Wehner C., de Grip A., Pfeifer H. Do recruiters select workers with different personality traits for different tasks? A discrete choice experiment // Labour Economics. – 2022. – Vol. 78. – pp. 102186.

There are 4 professional skills presented:

  1. Analytical. The ability to effectively solve new problems that require in-depth analysis.

  2. Interactive. The ability to persuade and compromise with clients and colleagues.

  3. Routine. The ability to perform routine tasks effectively with accuracy and attention to detail.

  4. Non-Routine. The ability to respond to and solve problems that have no set order, demonstrating adaptability and creative problem solving skills.

Users can set their own correlation coefficients and rank candidates by other professional skills.

Ranking candidates by professional skills

[6]:
# Loading a dataframe with correlation coefficients
url = 'https://download.sberdisk.ru/download/file/478678231?token=0qiZwliLtHWWYMv&filename=professional_skills.csv'
df_professional_skills = pd.read_csv(url)

df_professional_skills.index.name = 'ID'
df_professional_skills.index += 1
df_professional_skills.index = df_professional_skills.index.map(str)

df_professional_skills
[6]:
ID  Trait              Score_level  Analytical  Interactive  Routine  Non-Routine
1   Openness           high         0.082       0.348        0.571    0.510
2   Openness           low          0.196       0.152        0.148    0.218
3   Conscientiousness  high         0.994       1.333        1.507    1.258
4   Conscientiousness  low          0.241       0.188        0.191    0.267
5   Extraversion       high         0.169       -0.060       0.258    0.017
6   Extraversion       low          0.181       0.135        0.130    0.194
7   Agreeableness      high         1.239       0.964        1.400    1.191
8   Agreeableness      low          0.226       0.180        0.189    0.259
9   Non-Neuroticism    high         0.636       0.777        0.876    0.729
10  Non-Neuroticism    low          0.207       0.159        0.166    0.238
[7]:
_b5._priority_skill_calculation(
    correlation_coefficients = df_professional_skills,
    threshold = 0.5,
    out = True
)

_b5._save_logs(df = _b5.df_files_priority_skill_, name = 'skill_candidate_ranking_fi_en', out = True)

# Optional
df = _b5.df_files_priority_skill_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[7]:
Person ID  Path                 OPE    CON    EXT    AGR    NNEU   Analytical  Interactive  Routine  Non-Routine
4          4vdJGgZpj4k.003.mp4  0.588  0.643  0.531  0.603  0.593  0.380       0.415        0.561    0.454
1          2d6btbaNdfo.000.mp4  0.581  0.629  0.467  0.622  0.554  0.376       0.427        0.539    0.465
10         _plk5k7PBEg.003.mp4  0.607  0.592  0.521  0.604  0.566  0.367       0.398        0.543    0.439
5          be0DQawtVkE.002.mp4  0.633  0.533  0.524  0.609  0.588  0.360       0.389        0.534    0.431
8          JZNMxa3OKHY.000.mp4  0.610  0.541  0.563  0.595  0.569  0.357       0.383        0.528    0.425
6          cLaZxEf1nE4.004.mp4  0.637  0.542  0.558  0.571  0.559  0.350       0.379        0.523    0.421
9          nvlqJbHk_Lc.003.mp4  0.496  0.459  0.414  0.469  0.435  0.096       0.074        0.075    0.107
2          300gK3CnzW0.001.mp4  0.464  0.419  0.413  0.493  0.423  0.093       0.072        0.073    0.104
3          300gK3CnzW0.003.mp4  0.454  0.415  0.392  0.485  0.421  0.091       0.071        0.072    0.102
7          g24JGYuT74A.004.mp4  0.532  0.377  0.393  0.490  0.448  0.082       0.094        0.119    0.136
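
The skill scores in the table above can be reproduced, for the rows checked here, by multiplying each trait score by the "high" or "low" coefficient selected with the threshold and averaging the five products. The sketch below is a reconstruction from the printed numbers, not the library's source code; the function name skill_scores is hypothetical, and treating a score exactly equal to the threshold as "high" is an assumption.

```python
import pandas as pd

TRAITS = [
    "Openness",
    "Conscientiousness",
    "Extraversion",
    "Agreeableness",
    "Non-Neuroticism",
]
SKILLS = ["Analytical", "Interactive", "Routine", "Non-Routine"]

# Coefficients copied from the table above (rows alternate high/low per trait)
coef = pd.DataFrame(
    {
        "Trait": [t for t in TRAITS for _ in range(2)],
        "Score_level": ["high", "low"] * 5,
        "Analytical": [0.082, 0.196, 0.994, 0.241, 0.169, 0.181, 1.239, 0.226, 0.636, 0.207],
        "Interactive": [0.348, 0.152, 1.333, 0.188, -0.060, 0.135, 0.964, 0.180, 0.777, 0.159],
        "Routine": [0.571, 0.148, 1.507, 0.191, 0.258, 0.130, 1.400, 0.189, 0.876, 0.166],
        "Non-Routine": [0.510, 0.218, 1.258, 0.267, 0.017, 0.194, 1.191, 0.259, 0.729, 0.238],
    }
)


def skill_scores(traits, coef, threshold=0.5):
    """Average of trait score x coefficient, choosing 'high' or 'low' per trait."""
    total = pd.Series(0.0, index=SKILLS)
    for trait in TRAITS:
        level = "high" if traits[trait] >= threshold else "low"
        mask = (coef["Trait"] == trait) & (coef["Score_level"] == level)
        total += traits[trait] * coef.loc[mask, SKILLS].iloc[0]
    return total / len(TRAITS)


# Unrounded predictions for Person ID 4 and 9 from the FI V2 prediction table
person_4 = dict(zip(TRAITS, [0.588461, 0.643233, 0.530789, 0.603038, 0.593398]))
person_9 = dict(zip(TRAITS, [0.495809, 0.458526, 0.414436, 0.469152, 0.435461]))

p4 = skill_scores(person_4, coef)
p9 = skill_scores(person_9, coef)
print(p4.round(3))
print(p9.round(3))
```

The large gap between the two groups of candidates in the table follows from this rule: once all of a candidate's traits fall below the threshold, only the much smaller "low" coefficients contribute.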

MuPTA (ru)

[9]:
import os
import pandas as pd

# Module import
from oceanai.modules.lab.build import Run

# Creating an instance of a class
_b5 = Run(lang = 'en')

corpus = 'mupta'
lang = 'ru'

# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000      # File download size from network in one step

# Building audio models
res_load_model_hc = _b5.load_audio_model_hc()
res_load_model_nn = _b5.load_audio_model_nn()

# Loading audio model weights
url = _b5.weights_for_big5_['audio'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_audio_model_weights_hc(url = url)

url = _b5.weights_for_big5_['audio'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_audio_model_weights_nn(url = url)

# Building video models
res_load_model_hc = _b5.load_video_model_hc(lang=lang)
res_load_model_deep_fe = _b5.load_video_model_deep_fe()
res_load_model_nn = _b5.load_video_model_nn()

# Loading video model weights
url = _b5.weights_for_big5_['video'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_video_model_weights_hc(url = url)

url = _b5.weights_for_big5_['video'][corpus]['fe']['sberdisk']
res_load_model_weights_deep_fe = _b5.load_video_model_weights_deep_fe(url = url)

url = _b5.weights_for_big5_['video'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_video_model_weights_nn(url = url)

# Loading a dictionary with hand-crafted features (text modality)
res_load_text_features = _b5.load_text_features()

# Building text models
res_setup_translation_model = _b5.setup_translation_model()
res_setup_bert_encoder = _b5.setup_bert_encoder()
res_load_text_model_hc_fi = _b5.load_text_model_hc(corpus=corpus)
res_load_text_model_nn_fi = _b5.load_text_model_nn(corpus=corpus)

# Loading text model weights
url = _b5.weights_for_big5_['text'][corpus]['hc']['sberdisk']
res_load_text_model_weights_hc_fi = _b5.load_text_model_weights_hc(url = url)

url = _b5.weights_for_big5_['text'][corpus]['nn']['sberdisk']
res_load_text_model_weights_nn_fi = _b5.load_text_model_weights_nn(url = url)

# Building model for multimodal information fusion
res_load_avt_model_b5 = _b5.load_avt_model_b5()

# Loading model weights for multimodal information fusion
url = _b5.weights_for_big5_['avt'][corpus]['b5']['sberdisk']
res_load_avt_model_weights_b5 = _b5.load_avt_model_weights_b5(url = url)

PATH_TO_DIR = './video_MuPTA/'
PATH_SAVE_VIDEO = './video_MuPTA/test/'

_b5.path_to_save_ = PATH_SAVE_VIDEO

# Loading 10 test files from the MuPTA corpus
# URL: https://hci.nw.ru/en/pages/mupta-corpus
domain = 'https://download.sberdisk.ru/download/file/'
test_name_files = [
    '477995979?token=2cvyk7CS0mHx2MJ&filename=speaker_06_center_83.mov',
    '477995980?token=jGPtBPS69uzFU6Y&filename=speaker_01_center_83.mov',
    '477995967?token=zCaRbNB6ht5wMPq&filename=speaker_11_center_83.mov',
    '477995966?token=B1rbinDYRQKrI3T&filename=speaker_15_center_83.mov',
    '477995978?token=dEpVDtZg1EQiEQ9&filename=speaker_07_center_83.mov',
    '477995961?token=o1hVjw8G45q9L9Z&filename=speaker_19_center_83.mov',
    '477995964?token=5K220Aqf673VHPq&filename=speaker_23_center_83.mov',
    '477995965?token=v1LVD2KT1cU7Lpb&filename=speaker_24_center_83.mov',
    '477995962?token=tmaSGyyWLA6XCy9&filename=speaker_27_center_83.mov',
    '477995963?token=bTpo96qNDPcwGqb&filename=speaker_10_center_83.mov',
]

for curr_files in test_name_files:
    _b5.download_file_from_url(url = domain + curr_files, out = True)

# Getting scores
_b5.path_to_dataset_ = PATH_TO_DIR # Dataset directory
_b5.ext_ = ['.mov'] # Search file extensions

# Full path to the file with ground truth scores for accuracy calculation
url_accuracy = _b5.true_traits_['mupta']['sberdisk']

_b5.get_avt_predictions(url_accuracy = url_accuracy, lang = lang)

[2023-12-16 18:51:57] Feature extraction (hand-crafted and deep) from text …

[2023-12-16 18:52:01] Getting scores and accuracy calculation (multimodal fusion) …

10 from 10 (100.0%) … GitHub\OCEANAI\docs\source\user_guide\notebooks\video_MuPTA\test\speaker_27_center_83.mov …

Person ID  Path                      Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1          speaker_01_center_83.mov  0.758137  0.693356           0.650108      0.744589       0.488671
2          speaker_06_center_83.mov  0.681602  0.654339           0.607156      0.731282       0.417908
3          speaker_07_center_83.mov  0.666104  0.656836           0.567863      0.685067       0.378102
4          speaker_10_center_83.mov  0.694171  0.596195           0.571414      0.66223        0.348639
5          speaker_11_center_83.mov  0.712885  0.594764           0.571709      0.716696       0.37802
6          speaker_15_center_83.mov  0.664158  0.670411           0.60421       0.696056       0.399842
7          speaker_19_center_83.mov  0.761213  0.652635           0.651028      0.788677       0.459676
8          speaker_23_center_83.mov  0.692788  0.68324            0.616737      0.795205       0.447242
9          speaker_24_center_83.mov  0.705923  0.658382           0.610645      0.697415       0.411988
10         speaker_27_center_83.mov  0.753417  0.708372           0.654608      0.816416       0.504743

[2023-12-16 18:52:01] Trait-wise accuracy …

Metrics   Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism  Mean
MAE       0.0673    0.0789             0.1325        0.102          0.1002           0.0962
Accuracy  0.9327    0.9211             0.8675        0.898          0.8998           0.9038

[2023-12-16 18:52:01] Mean absolute errors: 0.0962, average accuracy: 0.9038 …

Log files saved successfully …

— Runtime: 415.41 sec. —

[9]:
True

Thus, the OCEAN-AI library provides tools to analyze the personality traits of candidates and their suitability for the position, which can significantly improve the recruitment process and help to make more objective and systematic decisions when ranking candidates.

Weight coefficients for five professions are proposed, based on the following scientific articles:

  1. Sajjad H. et al. Personality and Career Choices // African Journal of Business Management. – 2012. – Vol. 6(6). – pp. 2255-2260.

  2. Alkhelil A. H. The Relationship between Personality Traits and Career Choice: A Case Study of Secondary School Students // International Journal of Academic Research in Progressive Education and Development. – 2016. – Vol. 5(2). – pp. 2226-6348.

  3. De Jong N. et al. Personality Traits and Career Role Enactment: Career Role Preferences as a Mediator // Frontiers in Psychology. – 2019. – Vol. 10. – pp. 1720.

The user can set their own weights; the sum of the weights must be equal to 100.
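
When custom priorities start out as free-form importance ratings, they first have to be converted to integer weights that sum to exactly 100. One way to do this is largest-remainder rounding, sketched below; the helper normalize_to_100 and the example ratings are hypothetical, and the use of integer weights is an assumption based on the integer cast in the ranking cells of this notebook.

```python
def normalize_to_100(raw):
    """Scale non-negative ratings to integers that sum to exactly 100."""
    if any(w < 0 for w in raw):
        raise ValueError("Ratings must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("At least one rating must be positive")
    exact = [w * 100 / total for w in raw]
    result = [int(x) for x in exact]  # round down first
    leftover = 100 - sum(result)
    # Give the remaining points to the entries with the largest fractional parts
    order = sorted(range(len(raw)), key=lambda i: exact[i] - result[i], reverse=True)
    for i in order[:leftover]:
        result[i] += 1
    return result


# Hypothetical ratings in the order: Openness, Conscientiousness,
# Extraversion, Agreeableness, Non-Neuroticism
ratings = [3, 2, 1, 1, 1]
weights = normalize_to_100(ratings)
print(weights, sum(weights))
```

Rounding each value independently could produce a total of 99 or 101; distributing the leftover points explicitly keeps the sum constraint satisfied.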

[10]:
# Loading a dataframe with weights
url = 'https://download.sberdisk.ru/download/file/478675798?token=fF5fNZVpthQlEV0&filename=traits_priority_for_professions.csv'
traits_priority_for_professions = pd.read_csv(url)

traits_priority_for_professions.index.name = 'ID'
traits_priority_for_professions.index += 1
traits_priority_for_professions.index = traits_priority_for_professions.index.map(str)

traits_priority_for_professions
[10]:
ID  Profession                             Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1   Managers/executives                    15        35                 15            30             5
2   Entrepreneurship                       30        30                 5             5              30
3   Social/Non profit making professions   5         5                  35            35             20
4   Public sector professions              15        50                 15            15             5
5   Scientists/researchers, and engineers  50        15                 5             15             15

Ranking of candidates for the position of engineer

[11]:
weights = traits_priority_for_professions.iloc[4].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'engineer_candidate_ranking_mupta_ru', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[11]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Candidate score
10         speaker_27_center_83.mov  0.753  0.708  0.655  0.816  0.505  71.387
1          speaker_01_center_83.mov  0.758  0.693  0.650  0.745  0.489  70.057
7          speaker_19_center_83.mov  0.761  0.653  0.651  0.789  0.460  69.831
8          speaker_23_center_83.mov  0.693  0.683  0.617  0.795  0.447  66.608
9          speaker_24_center_83.mov  0.706  0.658  0.611  0.697  0.412  64.866
2          speaker_06_center_83.mov  0.682  0.654  0.607  0.731  0.418  64.169
5          speaker_11_center_83.mov  0.713  0.595  0.572  0.717  0.378  63.845
6          speaker_15_center_83.mov  0.664  0.670  0.604  0.696  0.400  62.724
3          speaker_07_center_83.mov  0.666  0.657  0.568  0.685  0.378  61.945
4          speaker_10_center_83.mov  0.694  0.596  0.571  0.662  0.349  61.672

Ranking of candidates for the position of manager

[12]:
weights = traits_priority_for_professions.iloc[0].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'executive_candidate_ranking_mupta_ru', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[12]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Candidate score
10         speaker_27_center_83.mov  0.753  0.708  0.655  0.816  0.505  72.930
1          speaker_01_center_83.mov  0.758  0.693  0.650  0.745  0.489  70.172
7          speaker_19_center_83.mov  0.761  0.653  0.651  0.789  0.460  69.985
8          speaker_23_center_83.mov  0.693  0.683  0.617  0.795  0.447  69.649
2          speaker_06_center_83.mov  0.682  0.654  0.607  0.731  0.418  66.261
9          speaker_24_center_83.mov  0.706  0.658  0.611  0.697  0.412  65.774
6          speaker_15_center_83.mov  0.664  0.670  0.604  0.696  0.400  65.371
3          speaker_07_center_83.mov  0.666  0.657  0.568  0.685  0.378  63.941
5          speaker_11_center_83.mov  0.713  0.595  0.572  0.717  0.378  63.477
4          speaker_10_center_83.mov  0.694  0.596  0.571  0.662  0.349  61.461
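
The two MuPTA rankings above place several candidates in a different order depending on the profession. The sketch below scores four of the speakers against both weight profiles in one step and compares their positions; it assumes the weighted-sum scoring that the printed candidate scores are consistent with, and the ranks are relative to this four-person subset only, not to the full table.

```python
import pandas as pd

TRAITS = [
    "Openness",
    "Conscientiousness",
    "Extraversion",
    "Agreeableness",
    "Non-Neuroticism",
]

# Unrounded MuPTA (ru) predictions for Person ID 9, 2, 5 and 3
scores = pd.DataFrame(
    [
        [0.705923, 0.658382, 0.610645, 0.697415, 0.411988],
        [0.681602, 0.654339, 0.607156, 0.731282, 0.417908],
        [0.712885, 0.594764, 0.571709, 0.716696, 0.378020],
        [0.666104, 0.656836, 0.567863, 0.685067, 0.378102],
    ],
    columns=TRAITS,
    index=pd.Index([9, 2, 5, 3], name="Person ID"),
)

# Weight profiles copied from rows 5 and 1 of the weights table
profiles = pd.DataFrame(
    {"engineer": [50, 15, 5, 15, 15], "manager": [15, 35, 15, 30, 5]},
    index=TRAITS,
)

totals = scores.dot(profiles)                      # persons x professions
ranks = totals.rank(ascending=False).astype(int)   # 1 = best within the subset
ranks["shift"] = ranks["engineer"] - ranks["manager"]
print(totals.round(3))
print(ranks)
```

A positive shift means the candidate ranks higher for the manager profile than for the engineer profile, which here reflects the larger weight on Conscientiousness and Agreeableness.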

To rank candidates by skills, two correlation coefficients must be set for each pair of personality trait and skill (one for a high trait score and one for a low trait score), together with a polarity threshold for the traits. The threshold determines which of the two coefficients is applied: the "high" coefficient when a person’s trait score is above the threshold, and the "low" coefficient when it is below.

As an example, the correlation coefficients between the five traits and four professional skills reported in the following article are used:

  1. Wehner C., de Grip A., Pfeifer H. Do recruiters select workers with different personality traits for different tasks? A discrete choice experiment // Labour Economics. – 2022. – Vol. 78. – pp. 102186.

There are 4 professional skills presented:

  1. Analytical. The ability to effectively solve new problems that require in-depth analysis.

  2. Interactive. The ability to persuade and compromise with clients and colleagues.

  3. Routine. The ability to perform routine tasks effectively with accuracy and attention to detail.

  4. Non-Routine. The ability to respond to and solve problems that have no set order, demonstrating adaptability and creative problem solving skills.

Users can set their own correlation coefficients and rank candidates by other professional skills.

Ranking candidates by professional skills

[13]:
# Loading a dataframe with correlation coefficients
url = 'https://download.sberdisk.ru/download/file/478678231?token=0qiZwliLtHWWYMv&filename=professional_skills.csv'
df_professional_skills = pd.read_csv(url)

df_professional_skills.index.name = 'ID'
df_professional_skills.index += 1
df_professional_skills.index = df_professional_skills.index.map(str)

df_professional_skills
[13]:
ID  Trait              Score_level  Analytical  Interactive  Routine  Non-Routine
1   Openness           high         0.082       0.348        0.571    0.510
2   Openness           low          0.196       0.152        0.148    0.218
3   Conscientiousness  high         0.994       1.333        1.507    1.258
4   Conscientiousness  low          0.241       0.188        0.191    0.267
5   Extraversion       high         0.169       -0.060       0.258    0.017
6   Extraversion       low          0.181       0.135        0.130    0.194
7   Agreeableness      high         1.239       0.964        1.400    1.191
8   Agreeableness      low          0.226       0.180        0.189    0.259
9   Non-Neuroticism    high         0.636       0.777        0.876    0.729
10  Non-Neuroticism    low          0.207       0.159        0.166    0.238
[14]:
_b5._priority_skill_calculation(
    correlation_coefficients = df_professional_skills,
    threshold = 0.5,
    out = True
)

_b5._save_logs(df = _b5.df_files_priority_skill_, name = 'skill_candidate_ranking_mupta_ru', out = True)

# Optional
df = _b5.df_files_priority_skill_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[14]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Analytical  Interactive  Routine  Non-Routine
10         speaker_27_center_83.mov  0.753  0.708  0.655  0.816  0.505  0.442       0.469        0.650    0.525
8          speaker_23_center_83.mov  0.693  0.683  0.617  0.795  0.447  0.384       0.391        0.554    0.455
7          speaker_19_center_83.mov  0.761  0.653  0.651  0.789  0.460  0.379       0.386        0.553    0.454
1          speaker_01_center_83.mov  0.758  0.693  0.650  0.745  0.489  0.377       0.389        0.554    0.455
2          speaker_06_center_83.mov  0.682  0.654  0.607  0.731  0.418  0.360       0.369        0.525    0.430
6          speaker_15_center_83.mov  0.664  0.670  0.604  0.696  0.400  0.354       0.365        0.517    0.423
9          speaker_24_center_83.mov  0.706  0.658  0.611  0.697  0.412  0.353       0.365        0.520    0.425
3          speaker_07_center_83.mov  0.666  0.657  0.568  0.685  0.378  0.346       0.359        0.508    0.416
5          speaker_11_center_83.mov  0.713  0.595  0.572  0.717  0.378  0.343       0.352        0.503    0.413
4          speaker_10_center_83.mov  0.694  0.596  0.571  0.662  0.349  0.328       0.339        0.485    0.397

MuPTA (en)

[15]:
import os
import pandas as pd

# Module import
from oceanai.modules.lab.build import Run

# Creating an instance of a class
_b5 = Run(lang = 'en')

corpus = 'fi'
lang = 'en'

# Core setup
_b5.path_to_save_ = './models' # Directory to save the models
_b5.chunk_size_ = 2000000      # File download size from network in one step

# Building audio models
res_load_model_hc = _b5.load_audio_model_hc()
res_load_model_nn = _b5.load_audio_model_nn()

# Loading audio model weights
url = _b5.weights_for_big5_['audio'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_audio_model_weights_hc(url = url)

url = _b5.weights_for_big5_['audio'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_audio_model_weights_nn(url = url)

# Building video models
res_load_model_hc = _b5.load_video_model_hc(lang=lang)
res_load_model_deep_fe = _b5.load_video_model_deep_fe()
res_load_model_nn = _b5.load_video_model_nn()

# Loading video model weights
url = _b5.weights_for_big5_['video'][corpus]['hc']['sberdisk']
res_load_model_weights_hc = _b5.load_video_model_weights_hc(url = url)

url = _b5.weights_for_big5_['video'][corpus]['fe']['sberdisk']
res_load_model_weights_deep_fe = _b5.load_video_model_weights_deep_fe(url = url)

url = _b5.weights_for_big5_['video'][corpus]['nn']['sberdisk']
res_load_model_weights_nn = _b5.load_video_model_weights_nn(url = url)

# Loading a dictionary with hand-crafted features (text modality)
res_load_text_features = _b5.load_text_features()

# Building text models
res_setup_translation_model = _b5.setup_translation_model()
res_setup_bert_encoder = _b5.setup_bert_encoder()
res_load_text_model_hc_fi = _b5.load_text_model_hc(corpus=corpus)
res_load_text_model_nn_fi = _b5.load_text_model_nn(corpus=corpus)

# Loading text model weights
url = _b5.weights_for_big5_['text'][corpus]['hc']['sberdisk']
res_load_text_model_weights_hc_fi = _b5.load_text_model_weights_hc(url = url)

url = _b5.weights_for_big5_['text'][corpus]['nn']['sberdisk']
res_load_text_model_weights_nn_fi = _b5.load_text_model_weights_nn(url = url)

# Building model for multimodal information fusion
res_load_avt_model_b5 = _b5.load_avt_model_b5()

# Loading model weights for multimodal information fusion
url = _b5.weights_for_big5_['avt'][corpus]['b5']['sberdisk']
res_load_avt_model_weights_b5 = _b5.load_avt_model_weights_b5(url = url)

PATH_TO_DIR = './video_MuPTA/'
PATH_SAVE_VIDEO = './video_MuPTA/test/'

_b5.path_to_save_ = PATH_SAVE_VIDEO

# Loading 10 test files from the MuPTA corpus
# URL: https://hci.nw.ru/en/pages/mupta-corpus
domain = 'https://download.sberdisk.ru/download/file/'
test_name_files = [
    '477995979?token=2cvyk7CS0mHx2MJ&filename=speaker_06_center_83.mov',
    '477995980?token=jGPtBPS69uzFU6Y&filename=speaker_01_center_83.mov',
    '477995967?token=zCaRbNB6ht5wMPq&filename=speaker_11_center_83.mov',
    '477995966?token=B1rbinDYRQKrI3T&filename=speaker_15_center_83.mov',
    '477995978?token=dEpVDtZg1EQiEQ9&filename=speaker_07_center_83.mov',
    '477995961?token=o1hVjw8G45q9L9Z&filename=speaker_19_center_83.mov',
    '477995964?token=5K220Aqf673VHPq&filename=speaker_23_center_83.mov',
    '477995965?token=v1LVD2KT1cU7Lpb&filename=speaker_24_center_83.mov',
    '477995962?token=tmaSGyyWLA6XCy9&filename=speaker_27_center_83.mov',
    '477995963?token=bTpo96qNDPcwGqb&filename=speaker_10_center_83.mov',
]

for curr_files in test_name_files:
    _b5.download_file_from_url(url = domain + curr_files, out = True)

# Getting scores
_b5.path_to_dataset_ = PATH_TO_DIR # Dataset directory
_b5.ext_ = ['.mov'] # Search file extensions

# Full path to the file with ground truth scores for accuracy calculation
url_accuracy = _b5.true_traits_['mupta']['sberdisk']

_b5.get_avt_predictions(url_accuracy = url_accuracy, lang = lang)

[2023-12-16 19:00:49] Feature extraction (hand-crafted and deep) from text …

[2023-12-16 19:00:52] Getting scores and accuracy calculation (multimodal fusion) …

10 from 10 (100.0%) … GitHub\OCEANAI\docs\source\user_guide\notebooks\video_MuPTA\test\speaker_27_center_83.mov …

Person ID  Path                      Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1          speaker_01_center_83.mov  0.564985  0.539052           0.440615      0.59251        0.488763
2          speaker_06_center_83.mov  0.650774  0.663849           0.607308      0.643847       0.620627
3          speaker_07_center_83.mov  0.435976  0.486683           0.313828      0.415446       0.396618
4          speaker_10_center_83.mov  0.498542  0.511243           0.412592      0.468947       0.44399
5          speaker_11_center_83.mov  0.394776  0.341608           0.327082      0.427304       0.354936
6          speaker_15_center_83.mov  0.566107  0.543811           0.492766      0.587411       0.499433
7          speaker_19_center_83.mov  0.506271  0.438215           0.430894      0.456177       0.44075
8          speaker_23_center_83.mov  0.486463  0.521755           0.309894      0.432291       0.433601
9          speaker_24_center_83.mov  0.417404  0.473339           0.320714      0.445086       0.414649
10         speaker_27_center_83.mov  0.526112  0.661107           0.443167      0.558965       0.554224

[2023-12-16 19:00:52] Trait-wise accuracy …

Metrics   Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism  Mean
MAE       0.1727    0.1672             0.1661        0.2579         0.107            0.1742
Accuracy  0.8273    0.8328             0.8339        0.7421         0.893            0.8258

[2023-12-16 19:00:52] Mean absolute errors: 0.1742, average accuracy: 0.8258 …

Log files saved successfully …

--- Runtime: 372.823 sec. ---

[15]:
True
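
The trait-wise metrics above appear to be related by accuracy = 1 - MAE. The following minimal sketch checks that relationship using the MAE values copied from the output. It is an observation about the reported numbers, not a description of the library's internal code, and the variable names are illustrative.

```python
# MAE values copied from the "Trait-wise accuracy" output above
mae = {
    'Openness': 0.1727,
    'Conscientiousness': 0.1672,
    'Extraversion': 0.1661,
    'Agreeableness': 0.2579,
    'Non-Neuroticism': 0.107,
}

# Assumption: accuracy is reported as 1 - MAE for each trait
accuracy = {trait: round(1 - value, 4) for trait, value in mae.items()}

# Mean over the five traits
mean_mae = round(sum(mae.values()) / len(mae), 4)
mean_accuracy = round(1 - mean_mae, 4)

print(accuracy)
print(mean_mae, mean_accuracy)
```

Because the inputs are already rounded to four decimal places, small differences in the last digit are possible for other data.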

Thus, the OCEAN-AI library provides tools to analyze the personality traits of candidates and their suitability for the position, which can significantly improve the recruitment process and help to make more objective and systematic decisions when ranking candidates.

Weight coefficients for five professions are proposed, based on the following scientific articles:

  1. Sajjad H. et al. Personality and Career Choices // African Journal of Business Management. – 2012. – Vol. 6(6). – pp. 2255-2260.

  2. Alkhelil A. H. The Relationship between Personality Traits and Career Choice: A Case Study of Secondary School Students // International Journal of Academic Research in Progressive Education and Development. – 2016. – Vol. 5(2). – pp. 2226-6348.

  3. De Jong N. et al. Personality Traits and Career Role Enactment: Career Role Preferences as a Mediator // Frontiers in Psychology. – 2019. – Vol. 10. – pp. 1720.

The user can set their own weights; the sum of the weights must be equal to 100.
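
As a minimal sketch of the constraint above, custom weights can be checked before they are passed to the ranking method. The weight values and the helper `weights_are_valid` are hypothetical and not part of the OCEAN-AI library; the non-negativity check is an added assumption.

```python
# Hypothetical custom weights for a single profession (illustrative values only)
custom_weights = {
    'Openness': 20,
    'Conscientiousness': 30,
    'Extraversion': 10,
    'Agreeableness': 25,
    'Non-Neuroticism': 15,
}

def weights_are_valid(weights, expected_total = 100):
    """Return True if all weights are non-negative and sum to expected_total."""
    values = list(weights.values())
    return all(w >= 0 for w in values) and sum(values) == expected_total

print(weights_are_valid(custom_weights))
```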

[16]:
# Loading a dataframe with weights
url = 'https://download.sberdisk.ru/download/file/478675798?token=fF5fNZVpthQlEV0&filename=traits_priority_for_professions.csv'
traits_priority_for_professions = pd.read_csv(url)

traits_priority_for_professions.index.name = 'ID'
traits_priority_for_professions.index += 1
traits_priority_for_professions.index = traits_priority_for_professions.index.map(str)

traits_priority_for_professions
[16]:
ID  Profession                             Openness  Conscientiousness  Extraversion  Agreeableness  Non-Neuroticism
1   Managers/executives                    15        35                 15            30             5
2   Entrepreneurship                       30        30                 5             5              30
3   Social/Non profit making professions   5         5                  35            35             20
4   Public sector professions              15        50                 15            15             5
5   Scientists/researchers, and engineers  50        15                 5             15             15

Ranking of candidates for the position of engineer

[17]:
weights = traits_priority_for_professions.iloc[4].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'engineer_candidate_ranking_mupta_en', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[17]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Candidate score
2          speaker_06_center_83.mov  0.651  0.664  0.607  0.644  0.621  64.500
6          speaker_15_center_83.mov  0.566  0.544  0.493  0.587  0.499  55.229
10         speaker_27_center_83.mov  0.526  0.661  0.443  0.559  0.554  55.136
1          speaker_01_center_83.mov  0.565  0.539  0.441  0.593  0.489  54.757
4          speaker_10_center_83.mov  0.499  0.511  0.413  0.469  0.444  48.353
7          speaker_19_center_83.mov  0.506  0.438  0.431  0.456  0.441  47.495
8          speaker_23_center_83.mov  0.486  0.522  0.310  0.432  0.434  46.687
3          speaker_07_center_83.mov  0.436  0.487  0.314  0.415  0.397  42.849
9          speaker_24_center_83.mov  0.417  0.473  0.321  0.445  0.415  42.470
5          speaker_11_center_83.mov  0.395  0.342  0.327  0.427  0.355  38.232
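
The candidate scores in this table are consistent with a weighted sum of the five trait scores, using the profession weights (which sum to 100). The sketch below reproduces two of the scores from the unrounded predictions shown earlier. It is an illustration inferred from the outputs, not the library's implementation, and `candidate_score` is a hypothetical helper.

```python
# Weights for "Scientists/researchers, and engineers" (row 5 of the weights table),
# in the order: Openness, Conscientiousness, Extraversion, Agreeableness, Non-Neuroticism
engineer_weights = [50, 15, 5, 15, 15]

# Unrounded trait scores copied from the predictions table, same trait order
trait_scores = {
    'speaker_06_center_83.mov': [0.650774, 0.663849, 0.607308, 0.643847, 0.620627],
    'speaker_11_center_83.mov': [0.394776, 0.341608, 0.327082, 0.427304, 0.354936],
}

def candidate_score(scores, weights):
    """Weighted sum of trait scores (assumed scoring rule)."""
    return sum(w * s for w, s in zip(weights, scores))

for path, scores in trait_scores.items():
    print(path, round(candidate_score(scores, engineer_weights), 3))
```

Because the weights sum to 100 and each trait score lies between 0 and 1, the resulting candidate score lies between 0 and 100.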

Ranking of candidates for the position of manager

[18]:
weights = traits_priority_for_professions.iloc[0].values[1:]
weights = list(map(int, weights))

_b5._candidate_ranking(
    weigths_openness = weights[0],
    weigths_conscientiousness = weights[1],
    weigths_extraversion = weights[2],
    weigths_agreeableness = weights[3],
    weigths_non_neuroticism = weights[4],
    out = False
)

_b5._save_logs(df = _b5.df_files_ranking_, name = 'executive_candidate_ranking_mupta_en', out = True)

# Optional
df = _b5.df_files_ranking_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[18]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Candidate score
2          speaker_06_center_83.mov  0.651  0.664  0.607  0.644  0.621  64.524
10         speaker_27_center_83.mov  0.526  0.661  0.443  0.559  0.554  57.218
6          speaker_15_center_83.mov  0.566  0.544  0.493  0.587  0.499  55.036
1          speaker_01_center_83.mov  0.565  0.539  0.441  0.593  0.489  54.170
4          speaker_10_center_83.mov  0.499  0.511  0.413  0.469  0.444  47.849
8          speaker_23_center_83.mov  0.486  0.522  0.310  0.432  0.434  45.344
7          speaker_19_center_83.mov  0.506  0.438  0.431  0.456  0.441  45.284
9          speaker_24_center_83.mov  0.417  0.473  0.321  0.445  0.415  43.064
3          speaker_07_center_83.mov  0.436  0.487  0.314  0.415  0.397  42.727
5          speaker_11_center_83.mov  0.395  0.342  0.327  0.427  0.355  37.378

To rank candidates by skills, two correlation coefficients must be set for each pair of personality trait and skill, together with a polarity threshold for the trait scores. One coefficient applies when the candidate’s trait score is above the threshold (high) and the other when it is below the threshold (low), so the coefficients describe how the trait relates to the skill on each side of the threshold.

As an example, we suggest using the correlation coefficients between the five traits and four professional skills presented in the following article:

  1. Wehner C., de Grip A., Pfeifer H. Do recruiters select workers with different personality traits for different tasks? A discrete choice experiment // Labour Economics. – 2022. – Vol. 78. – pp. 102186.

Four professional skills are considered:

  1. Analytical. The ability to effectively solve new problems that require in-depth analysis.

  2. Interactive. The ability to persuade and compromise with clients and colleagues.

  3. Routine. The ability to perform routine tasks effectively with accuracy and attention to detail.

  4. Non-Routine. The ability to respond to and solve problems that have no set order, demonstrating adaptability and creative problem solving skills.

Users can set their own correlation coefficients and rank candidates by other professional skills.

Ranking candidates by professional skills

[19]:
# Loading a dataframe with correlation coefficients
url = 'https://download.sberdisk.ru/download/file/478678231?token=0qiZwliLtHWWYMv&filename=professional_skills.csv'
df_professional_skills = pd.read_csv(url)

df_professional_skills.index.name = 'ID'
df_professional_skills.index += 1
df_professional_skills.index = df_professional_skills.index.map(str)

df_professional_skills
[19]:
ID  Trait              Score_level  Analytical  Interactive  Routine  Non-Routine
1   Openness           high         0.082       0.348        0.571    0.510
2   Openness           low          0.196       0.152        0.148    0.218
3   Conscientiousness  high         0.994       1.333        1.507    1.258
4   Conscientiousness  low          0.241       0.188        0.191    0.267
5   Extraversion       high         0.169       -0.060       0.258    0.017
6   Extraversion       low          0.181       0.135        0.130    0.194
7   Agreeableness      high         1.239       0.964        1.400    1.191
8   Agreeableness      low          0.226       0.180        0.189    0.259
9   Non-Neuroticism    high         0.636       0.777        0.876    0.729
10  Non-Neuroticism    low          0.207       0.159        0.166    0.238
[20]:
_b5._priority_skill_calculation(
    correlation_coefficients = df_professional_skills,
    threshold = 0.5,
    out = True
)

_b5._save_logs(df = _b5.df_files_priority_skill_, name = 'skill_candidate_ranking_mupta_en', out = True)

# Optional
df = _b5.df_files_priority_skill_.rename(columns = {'Openness':'OPE', 'Conscientiousness':'CON', 'Extraversion': 'EXT', 'Agreeableness': 'AGR', 'Non-Neuroticism': 'NNEU'})
columns_to_round = df.columns[1:]
df[columns_to_round] = df[columns_to_round].apply(lambda x: [round(i, 3) for i in x])
df
[20]:
Person ID  Path                      OPE    CON    EXT    AGR    NNEU   Analytical  Interactive  Routine  Non-Routine
2          speaker_06_center_83.mov  0.651  0.664  0.607  0.644  0.621  0.402       0.436        0.595    0.479
10         speaker_27_center_83.mov  0.526  0.661  0.443  0.559  0.554  0.365       0.419        0.524    0.451
6          speaker_15_center_83.mov  0.566  0.544  0.493  0.587  0.499  0.301       0.327        0.422    0.377
1          speaker_01_center_83.mov  0.565  0.539  0.441  0.593  0.489  0.299       0.325        0.421    0.375
4          speaker_10_center_83.mov  0.499  0.511  0.413  0.469  0.444  0.176       0.194        0.212    0.212
8          speaker_23_center_83.mov  0.486  0.522  0.310  0.432  0.434  0.172       0.192        0.210    0.208
9          speaker_24_center_83.mov  0.417  0.473  0.321  0.445  0.415  0.088       0.068        0.069    0.099
3          speaker_07_center_83.mov  0.436  0.487  0.314  0.415  0.397  0.087       0.068        0.069    0.098
7          speaker_19_center_83.mov  0.506  0.438  0.431  0.456  0.441  0.084       0.094        0.118    0.136
5          speaker_11_center_83.mov  0.395  0.342  0.327  0.427  0.355  0.078       0.060        0.061    0.087
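
The skill scores in this table are consistent with the following rule: for each trait, multiply the trait score by the "high" coefficient if the score reaches the threshold and by the "low" coefficient otherwise, then average the five products. The sketch below reproduces the Analytical score for two candidates from the unrounded predictions. It is inferred from the outputs rather than taken from the library's source; `skill_score` is a hypothetical helper, and treating a score exactly equal to the threshold as "high" is an assumption.

```python
# (high, low) Analytical coefficients copied from the coefficients table
analytical = {
    'Openness': (0.082, 0.196),
    'Conscientiousness': (0.994, 0.241),
    'Extraversion': (0.169, 0.181),
    'Agreeableness': (1.239, 0.226),
    'Non-Neuroticism': (0.636, 0.207),
}

# Unrounded trait scores copied from the predictions table
speaker_06 = {'Openness': 0.650774, 'Conscientiousness': 0.663849, 'Extraversion': 0.607308,
              'Agreeableness': 0.643847, 'Non-Neuroticism': 0.620627}
speaker_10 = {'Openness': 0.498542, 'Conscientiousness': 0.511243, 'Extraversion': 0.412592,
              'Agreeableness': 0.468947, 'Non-Neuroticism': 0.44399}

def skill_score(trait_scores, coefficients, threshold = 0.5):
    """Average of trait score times the high or low coefficient (assumed rule)."""
    products = []
    for trait, score in trait_scores.items():
        high, low = coefficients[trait]
        # Assumption: a score equal to the threshold counts as "high"
        coefficient = high if score >= threshold else low
        products.append(score * coefficient)
    return sum(products) / len(products)

print(round(skill_score(speaker_06, analytical), 3))
print(round(skill_score(speaker_10, analytical), 3))
```

For speaker_10 only Conscientiousness exceeds the threshold of 0.5, so four of the five traits use the "low" coefficients, which explains the much lower skill score compared with speaker_06.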