Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items

TL;DR: Value Portrait is a benchmark for assessing the values of language models across diverse real-world scenarios.

[Figure: Overview of the Value Portrait framework]

Abstract

The importance of benchmarks for assessing the values of language models has grown with the increasing need for more authentic, human-aligned responses. However, existing benchmarks rely on human or machine annotations that are vulnerable to value-related biases. Furthermore, the tested scenarios often diverge from the real-world contexts in which models are commonly used to generate text and express values. To address these issues, we propose the Value Portrait benchmark, a reliable framework for evaluating LLMs' value orientations with two key characteristics. First, the benchmark consists of items that capture real-life user-LLM interactions, enhancing the relevance of assessment results to real-world LLM usage. Second, each item is rated by human subjects on its similarity to their own thoughts, and correlations between these ratings and the subjects' actual value scores are computed. This psychometrically validated approach ensures that items strongly correlated with a given value serve as reliable items for assessing that value. By evaluating 44 LLMs with our benchmark, we find that these models prioritize Benevolence, Security, and Self-Direction values while placing less emphasis on Tradition, Power, and Achievement values. Our analysis also reveals biases in how LLMs perceive various demographic groups, deviating from real human data.

Motivation

Existing works measure the perceived values that annotators believe a text expresses. However, this approach does not guarantee that a person who prioritizes a certain value would actually say the text.

Also, existing value-oriented datasets either focus on safety scenarios or rely heavily on standardized psychometric questionnaires. Hence, they do not comprehensively capture the diverse range of real-world scenarios in which LLMs are commonly used and express values through generated text.

This motivated us to construct Value Portrait from a carefully curated set of human-LLM conversations from ShareGPT and LMSYS, supplemented with human-to-human advisory interactions from Reddit and the Dear Abby archives.

[Figure: Motivation for Value Portrait]

Evaluation Framework

Our evaluation framework is organized into three key steps:
(1) filtering query-response pairs,
(2) collecting responses from LLMs, and
(3) assessing their value orientations.

First, for each value dimension, we retain only the items whose human ratings correlate with the corresponding value at r ≥ 0.3 (p < 0.05).
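As a concrete illustration, here is a minimal sketch of that filter, assuming the human similarity ratings and per-subject value scores are stored as NumPy arrays; the data layout and function name are hypothetical and not the benchmark's released code.

import numpy as np
from scipy.stats import pearsonr

def filter_items(item_ratings: np.ndarray, value_scores: np.ndarray,
                 r_min: float = 0.3, p_max: float = 0.05) -> list[int]:
    """Keep items whose ratings correlate with one value dimension.

    item_ratings: (n_subjects, n_items) similarity ratings per item.
    value_scores: (n_subjects,) each subject's score on the value dimension.
    Returns the indices of items passing the r >= r_min, p < p_max filter.
    """
    kept = []
    for j in range(item_ratings.shape[1]):
        r, p = pearsonr(item_ratings[:, j], value_scores)
        if r >= r_min and p < p_max:
            kept.append(j)
    return kept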

Second, we present each item to the LLMs and collect their ratings on a 6-point Likert scale. For each item, we ask, "How similar is this response to your own thoughts?", maintaining consistency with our dataset construction methodology.

Since LLMs are sensitive to prompt phrasing, we use six prompts in our evaluation: three adapted from previous work to suit our research context, and three obtained by reversing the order of the answer options. The final results are obtained by averaging the LLM's responses across the six prompts.
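For illustration, here is a minimal sketch of the rating-collection loop. The prompt templates, the option wording, and the query_llm stand-in are assumptions made for this example; only the six-variant averaging and the re-mapping of reversed options follow the procedure described above.

from statistics import mean

# 6-point Likert options (illustrative wording, lowest to highest).
SCALE = ["Not like me at all", "Not like me", "A little like me",
         "Somewhat like me", "Like me", "Very much like me"]

# Three hypothetical phrasings of the question; the actual three prompts
# are adapted from prior work.
PROMPT_TEMPLATES = [
    'Response: "{item}"\nHow similar is this response to your own thoughts?\n{options}',
    'Consider this response: "{item}"\nHow similar is it to your own thoughts?\n{options}',
    'Given the response "{item}", rate how similar it is to your own thoughts.\n{options}',
]

def format_options(options):
    return "\n".join(f"{i + 1}. {text}" for i, text in enumerate(options))

def rate_item(item, query_llm):
    """Average a model's rating over six prompt variants:
    three templates x {original, reversed} option order.
    query_llm(prompt) is assumed to return the chosen option position (1-6).
    """
    scores = []
    for template in PROMPT_TEMPLATES:
        for reverse in (False, True):
            options = SCALE[::-1] if reverse else SCALE
            prompt = template.format(item=item, options=format_options(options))
            position = query_llm(prompt)
            # Map the chosen position back to the 1-6 similarity scale.
            scores.append(7 - position if reverse else position)
    return mean(scores)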

For the final step, the evaluation of an LLM’s value orientation follows a two-step process:
(1) calculate the mean score for each value dimension across its corresponding items, and
(2) adjust each score by subtracting the model's average rating across all items. This methodology, adapted from Schwartz's research on human value assessment, lets us identify relative value priorities by correcting for differences in how LLMs use response scales. The resulting normalized scores across value dimensions represent the LLM's value orientation.
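A compact sketch of this two-step scoring follows; the data layout (ratings keyed by item id, items grouped per value dimension) is an assumption for illustration.

import numpy as np

def value_profile(responses: dict[str, float],
                  items_by_value: dict[str, list[str]]) -> dict[str, float]:
    """responses: item id -> the model's 1-6 rating (averaged over prompts).
    items_by_value: value dimension -> ids of its validated items.
    Returns centered scores representing relative value priorities.
    """
    mrat = np.mean(list(responses.values()))  # mean rating across ALL items
    profile = {}
    for value, item_ids in items_by_value.items():
        raw = np.mean([responses[i] for i in item_ids])  # step 1: per-value mean
        profile[value] = raw - mrat                      # step 2: center on MRAT
    return profile

Centering on the mean rating across all items (the MRAT correction used to score Schwartz's Portrait Values Questionnaire) removes individual differences in scale use, so the remaining differences reflect relative priorities rather than a tendency to rate everything high or low.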

[Figure: Overview of the filtering process]

Results

We evaluate 44 LLMs with our benchmark and find that these models prioritize Benevolence, Security, and Self-Direction values while placing less emphasis on Tradition, Power, and Achievement values.


Bias Analysis

Our analysis reveals biases in how LLMs perceive various demographic groups, with the models' value profiles deviating from real human data. This evaluation helps identify value-related biases in language models.

[Figure: Gender bias analysis]


BibTeX

@article{han2025value,
  title={Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items},
  author={Han, Jongwook and Choi, Dongmin and Song, Woojung and Lee, Eun-Ju and Jo, Yohan},
  journal={arXiv preprint arXiv:2505.01015},
  year={2025}
}