Insights from interval-valued ratings of consumer products—a DECSYS appraisal

The capture and analysis of interval-valued data has seen increased interest over recent years. This offers a direct means to capture and reason about uncertainty in data, whether obtained from sensors or from people. Open-source software (DECSYS [1]) was recently released to facilitate the efficient capture of interval-valued survey responses. Potential real-world applications are broad ranging, and this paper documents an initial test case of the software and its underpinning methodology in a marketing-centric application. It provides an illustration of the insights offered by interval-valued responses, in this case relating to consumer preferences. We apply two approaches to describe and draw insights from the data: inferential statistics and descriptive visualisation methods. Statistical results indicate that overall purchase intention was well-described by four factors: value, healthiness, taste and brand. The capture of uncertainty information, afforded by intervals, also permitted identification of six factors that contribute to purchase intention uncertainty, relating to taste, ethics and visual appearance. Visualisations of interval-valued responses, using the IAA [2]–[5], also highlighted factors with high degrees of uncertainty, in particular product ethics. This information could prove valuable for retailers in determining how to focus future marketing campaigns. It may prove equally valuable for market regulators, by informing where to improve product labelling information. More generally, the case study provides an overview of capturing and analysing intervals, highlighting some of the challenges, but also the unique potential to gain additional insights not available using conventional, ‘crisp’, approaches.


I. INTRODUCTION AND BACKGROUND
Collecting interval-valued responses offers the potential to obtain greater information content than point responses. The variable dimension of width possessed by an interval value conveys the degree of certainty, variability or fuzziness associated with the response in question. This information is often overlooked by traditional discrete, or point, response modes (such as Likert-type [6], or Visual Analogue [7] scales), for which the forced specificity may give a false impression of certainty or precision that is not truly warranted; in fact, this may increase the noise present in the response.
Conventional methods of obtaining interval-valued responses, e.g. through soliciting two separate (minimum and maximum) endpoints, are substantially more effortful and time consuming than equivalent point-response modes, for both survey participants and administrators. In combination with relatively low awareness concerning the potential benefits of interval-valued data, this has contributed to low levels of general adoption of interval-valued data capture across the wider research community.
This work was part funded by the Horizon Centre for Doctoral Training, and the EPSRC EP/P011918/1 grant.
Recently released open-source software, 'DECSYS' [1], permits electronic capture of interval-valued responses, using an ellipse response format that aims to be quick, simple and intuitive to use, thus maximising the efficiency of response information capture. A tool therefore now exists to facilitate the efficient capture of interval-valued responses, which may prove valuable across a range of research areas. However, interval-valued responses have yet to become widely adopted.
Complementary research studies are ongoing to address socio-technical considerations (concerning the basic efficacy of this response mode to capture multiple sources of respondent uncertainty and response variability) and also to demonstrate the value of the resulting data in real-world contexts [8]. The present study was conducted to demonstrate both the value provided by interval-valued responses and the use of the DECSYS software's ellipse response mode to capture said intervals. To achieve this, we describe the implementation of this method in the context of one of many real-world applications for which the additional information captured by interval-valued responses could provide valuable insights. The area chosen is an important and widely studied area of market research, that of consumer preferences, for which the breadth of literature ranges from wine [9]–[14], to apples [15]–[20], to electric vehicles [21]–[26]. In the present case, we investigate consumer ratings in relation to a range of snack food products.
Understanding consumer attitudes, and the factors that influence these, is fundamental to this far-reaching area of research [27]. Moreover, existing literature already acknowledges the importance of understanding consumer conviction, or attitude certainty, both in terms of influencing future choice behaviour and in determining openness to persuasion versus resistance to future attitude change [28]–[31]. This may be a key consideration in informing which attributes it may be most effective to focus advertising material upon, when the aim is to encourage or reinforce positive attitudes towards a product. Identification of areas for which consumers are highly uncertain may also be of value to market regulators, whose objective may instead be to ensure that consumers are sufficiently well-informed to make sensible purchase decisions.
This study represents a test case, designed to examine consumer values, product perceptions, and the uncertainty surrounding these, by capturing these data in interval-valued format using the DECSYS software. We use these data to build statistical models, and also provide data visualisations using the Interval Agreement Approach (IAA) [2]–[5], to demonstrate the insights that may be provided, both into consumer values and product perceptions and into how these contribute towards overall product purchase intention.
Potential benefits of obtaining interval-valued responses, as identified a priori, include:
• Added predictive value, in terms of overall purchase intention, provided by uncertainty associated with factor ratings, as indicated by interval width.
• Added information regarding uncertainty surrounding overall purchase intention, as indicated by interval width, as well as allowing identification of the factors associated with this.
• Added information regarding consumer uncertainty, in relation to both self-reported values and product attribute perceptions.
In addition to consumer preference ratings, we collect participant ratings concerning their use of the ellipse response mode itself. These focus on perceived ease of use, unnecessary complexity, effective communication and overall liking. This allows a secondary assessment of whether respondents subjectively felt that the new response format offered additional value, and whether this was at the cost of increased workload.
In Section II we describe the participants, stimuli and procedure of the experimental study, including details concerning the processes of both data collection and analysis. In Section III we report the findings of both descriptive and inferential strands of analysis of the data obtained in the study. In Section IV we summarise and provide a discussion of the key findings and lessons learned.

A. Study Participants
A total of forty participants were recruited to take part in the study, in an opportunity sample. The study was open to anyone who wished to take part, but as it was conducted on campus at the University of Nottingham, it can be assumed that a substantially greater proportion of participants were students or staff of an academic institution than would be expected in the general population. Ages ranged from 18 to 55 (M=24.7, SD=8.0).

B. Experimental Stimuli
A total of eight high-street convenience food products, i.e. 'snacks', were selected as stimuli. Each participant viewed each product and its associated packaging, as well as tasting a small portion of each product when prompted to do so. The eight snack-food products were chosen to represent diversity along six dimensions, corresponding to the six attribute ratings made for each product (Visual Appeal, Value for Money, Healthiness, Taste, Branding and Ethics). For example, three products were chocolate bars; each of these was therefore of relatively low nutritional value and likely to be rated highly on taste, but one of these was a popular confectionery brand, another a budget supermarket own brand, and a third a brand that focuses on ethical sourcing at a higher price point. Other products are marketed as offering greater nutritional value, with options presented across a range of price points. The specific products sampled and rated in this study were 1 : •

C. Data Collection Procedure
The study procedure was approved by the Ethics Committee at the University of Nottingham School of Computer Science. Before beginning the task, participants were provided with two information sheets. One provided general information, concerning the nature of the study, the total estimated time to take part (30 minutes), the inconvenience allowance provided (£6.50), and their right to withdraw at any time. The second provided a brief explanation of the use of the ellipse response format, including basic example responses (cf. Fig. 1). Having had the opportunity to review these information sheets and ask questions, participants who wished to proceed signed the consent form and began the study.
The primary survey was both created and administered using the DECSYS software [1]; Workshop Mode was used, with all survey responses collected locally. Questions were presented via a Microsoft Surface Pro tablet computer. The touchscreen capabilities of this device enable users to interact with the screen via a stylus (Microsoft Surface Pen). This hardware is particularly well-suited to the DECSYS ellipse response mode, because it allows for precise and intuitive provision of ellipse responses, with minimal differences from how this would be achieved using pen and paper.
Survey questions were administered in three key stages. First, before product exposure, participants were asked to provide hypothetical value ratings concerning the importance of each of the six product attributes, e.g. 'When making a product choice, how important is visual appeal?' Second, during product exposure, participants were asked, as appropriate, either to view or taste each product, or to consider its packaging or the information provided on it, and to provide a rating for the associated product attribute. Specific questions for each attribute are shown in Table I.
Third, after product exposure, participants were asked to provide overall ratings for their purchase intention towards each product.
For each question, the response scale was continuous, without visual markers to indicate increments or specific intermediate values. Pen colour was red, and range markers blue (see Fig. 2 for illustration). The scale was labelled at the far left with 'Not at all' and at the far right with 'Very much'. The survey was programmed to ask each of the six attribute questions for each product. Question order was randomised within each section of the survey (i.e. Before, During and After product exposure). This was to keep participants engaged in reading each question carefully before responding and to preclude any potential order effects.
Following completion of the primary product rating survey, participants were also asked to provide their level of agreement with four statements concerning their subjective user experience of the ellipse response format, as administered through the DECSYS software. These were adapted from the 'System Usability Scale' [32], and administered separately, on paper, using a traditional 5-point ordinal response scale ranging from 1 ('Strongly Disagree') to 5 ('Strongly Agree'). The statements were as follows:
• 'I found the response-format easy to use.'
• 'I found the response-format unnecessarily complex.'
• 'I found that the response-format allowed me to effectively communicate my desired response.'
• 'Overall, I liked the response-format.'
Following completion of the study, participants were paid an inconvenience allowance of £6.50; they were also offered the opportunity to spend up to £1.50 from this allowance to purchase products that they had sampled, at a rate discounted from retail value.

D. Analysis Procedure
Two forms of data representation and analysis are reported in this paper. First, descriptive visualisations: these are based on the IAA method [4], which provides a fuzzy set and thus allows a graphic illustration of all response intervals combined as a whole. This gives an idea of the overall degree and areas of agreement, along with the degree of uncertainty that is held within the group as a whole. The resulting fuzzy sets also enable subsequent reasoning over the data using, for example, fuzzy logic or similarity measures, an area not explored further in this paper. Second, inferential statistics are applied to key characteristics of the interval-valued ratings. In this case, each interval is decomposed into two point-values: the midpoint (m) and the width (w). These independently represent the mean position along, and spread across, the response scale, with the latter representing the uncertainty, or range, associated with each response. We then apply linear mixed effects modelling (an extension of linear regression) to assess the influence of each of these characteristics, in relation to each product attribute, upon the corresponding values for overall product purchase intention. This approach estimates the contribution of each attribute's midpoint and width together upon the model outcome variable, i.e. the purchase intention rating midpoint for Model 1 and the purchase intention rating width for Model 2. These contributions are represented in the form of β weights. The product attribute rating characteristics are entered as fixed effects, alongside two-way interaction terms (m · w). These represent combined effects of both response dimensions, which permit the model to identify effects that concern both a given attribute rating's position and width; e.g. high certainty around disliking a product's taste may have an opposite effect on overall purchase intention than high certainty around liking a product's taste.
These models also account for varying baseline purchase intention ratings between participants and between products, through the inclusion of random intercepts.
Two separate analyses are conducted, pertaining respectively to the dependent variables of overall purchase intention position (midpoint), and overall purchase intention uncertainty (width).
Due to the presence of a high number of initial fixed effects (18), an iterative process of backwards stepwise variable reduction was used to 'prune' the variables present within each model, leaving only those that were found to contribute significantly to the outcome variable. This was done for the purposes of increasing model interpretability, although it is important to bear in mind that this method leads to inflation of the Type I error rate for variables retained in the final model, by comparison with retaining all initial factors. When interpreting results, confidence in the robustness of effects in relation to their reported p-values should therefore be adjusted accordingly.
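The decomposition of each interval into a midpoint and a width, and the resulting count of 18 initial fixed effects (six midpoints, six widths and six interaction terms), can be illustrated with a minimal sketch. The attribute labels, the 0 to 100 scale and the function names below are assumptions made for illustration only; they are not the paper's notation (see Table I) nor part of DECSYS.

```python
# Illustrative labels only; the paper's own variable notation is given in its Table I.
ATTRIBUTES = ["appearance", "value", "health", "taste", "brand", "ethics"]


def decompose(interval):
    """Split an interval (left, right) into its midpoint m and width w."""
    left, right = interval
    if right < left:
        raise ValueError("interval endpoints must satisfy left <= right")
    return (left + right) / 2.0, right - left


def fixed_effect_row(ratings):
    """Build the initial fixed-effect values for one participant-product pair.

    `ratings` maps each attribute label to a (left, right) interval.
    """
    row = {}
    for name in ATTRIBUTES:
        m, w = decompose(ratings[name])
        row[f"{name}_m"] = m       # position along the response scale
        row[f"{name}_w"] = w       # uncertainty (range) of the response
        row[f"{name}_mw"] = m * w  # two-way interaction term m * w
    return row


# Hypothetical responses on an assumed 0-100 scale.
example = {name: (20.0, 60.0) for name in ATTRIBUTES}
example["taste"] = (70.0, 90.0)
row = fixed_effect_row(example)
```

Each row holds 18 values, one per candidate fixed effect, before any stepwise pruning is applied.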
Refer to Table I for variable notations. The sum of all simple effects for product attribute rating positions (midpoints) is denoted $A_m$. Within it, $x^{am}_{i,j}$ is the value $m$ (midpoint) of attribute $a$ (visual appearance) for $i$ (a given participant) and $j$ (a given product), and $z$ reflects the model's outcome variable, which may be either $m$ (midpoints) or $w$ (widths) of the overall purchase intention.
The sum of all simple effects for product attribute rating uncertainties (widths) is denoted $A_w$, where $x^{aw}_{i,j}$ is the width $w$ of attribute $a$ (visual appearance) for $i$ (a given participant) and $j$ (a given product).
The sum of the interactions between the midpoints and widths of the product attributes is denoted $A_{mw}$. Our initial model formula to explain the overall purchase intention midpoints ($\gamma^{Aom}_{i,j}$) and widths ($\gamma^{Aow}_{i,j}$) is then

$$\gamma^{Aoz}_{i,j} = \beta_0 + A_m + A_w + A_{mw} + \mu_i + \mu_j + \epsilon_{i,j}$$

where $z$ reflects the model's outcome variable, which may be either $m$ (midpoints) or $w$ (widths), for participant $i$ relating to product $j$; $\beta_0$ denotes the fixed intercept; $\mu_i$ and $\mu_j$ denote the respective random intercepts for participant and product; and $\epsilon_{i,j}$ represents the error. The remaining $\beta$ terms (within $A_m$, $A_w$ and $A_{mw}$) denote the coefficients of the fixed effects of the product attributes.
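For orientation, the three sums of fixed effects can be written in a compact form. This is a sketch under stated assumptions: the index $k$, the set $K$ of the six product attributes, and the coefficient labels $\beta_{km}$, $\beta_{kw}$ and $\beta_{kmw}$ are introduced here purely for illustration and are not the notation of Table I.

```latex
\begin{align}
A_m    &= \sum_{k \in K} \beta_{km}\,  x^{km}_{i,j}, \\
A_w    &= \sum_{k \in K} \beta_{kw}\,  x^{kw}_{i,j}, \\
A_{mw} &= \sum_{k \in K} \beta_{kmw}\, x^{km}_{i,j}\, x^{kw}_{i,j}.
\end{align}
```

With six attributes in $K$, each sum contributes six coefficients, which matches the 18 initial fixed effects described for the models.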
Each of the initial models, as presented above, was then subjected to a backwards stepwise variable elimination procedure. During this, fixed effects were iteratively assessed and those that did not significantly contribute to the overall model were removed. Specifically, this process began by selection, from the pool of all non-significant fixed effects, of the effect with the t-statistic closest to zero. This variable was then removed, and the resulting model directly compared with the preceding one, using the Theoretical Likelihood Ratio test. This was implemented through the MATLAB fitlme and compare functions. If the benefit of retaining the variable in question was calculated to be non-significant, then the model with the lower Bayesian Information Criterion (BIC) was retained into the next iteration. This procedure continued until a final model was determined, within which all fixed effects were statistically significant. Final model parameters were then re-estimated using the Restricted Maximum Likelihood (REML) method, to reduce bias in random effect estimates.
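A single comparison step of this kind can be sketched as follows. This is not the MATLAB implementation used in the study; it is a minimal stdlib illustration that assumes a chi-square reference distribution with one degree of freedom (the usual form when one coefficient is dropped) and takes the log-likelihoods of the two fitted models as given. The function names, the parameter count and the log-likelihood values are hypothetical; the 320 observations correspond to 40 participants rating 8 products.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def lr_test_one_df(ll_full, ll_reduced):
    """Likelihood ratio test for dropping a single fixed effect.

    Returns the test statistic and its p-value under a chi-square
    distribution with one degree of freedom.
    """
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def keep_reduced(ll_full, k_full, ll_reduced, n_obs, alpha=0.05):
    """Decide whether the model without the candidate variable is carried forward."""
    _, p = lr_test_one_df(ll_full, ll_reduced)
    if p < alpha:
        return False  # the variable contributes significantly, so it stays
    # Otherwise carry forward whichever model has the lower BIC.
    return bic(ll_reduced, k_full - 1, n_obs) <= bic(ll_full, k_full, n_obs)


# Made-up log-likelihoods for illustration only.
drop_it = keep_reduced(ll_full=-500.0, k_full=21, ll_reduced=-500.5, n_obs=320)
```

In a full procedure this decision would sit inside a loop that refits the model after each removal and stops once every remaining fixed effect is significant.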

A. Descriptive Visualisations
Example visualisations of the interval-valued data, modelled as fuzzy sets using the IAA, are shown in Figures 3, 4 and 5.
Table II shows the factors retained in the final predictive model for mean position of overall purchase intention rating, subsequent to the stepwise variable removal process. Model results identify four key factors that hold substantial influence over the overall purchase intention rating position. These relate to four product attributes, and all relate to the mean position, rather than the width, of the attribute rating. Of these, taste was found to be the most robust factor, followed by brand, then value for money, and then perceived healthiness. Neither the visual appearance of the actual product nor the perceived ethics associated with the product made significant contributions to overall mean purchase intention. Moreover, in this case the width of none of the interval-valued attribute ratings significantly influenced mean purchase intention rating. Furthermore, no significant two-way interaction terms (i.e. xm · xw) were found, indicating that the attribute rating widths did not substantially moderate the influence of the attribute rating positions on overall purchase intention.
Table III shows the factors retained in the final predictive model for uncertainty surrounding overall purchase intention, as represented by rating width, subsequent to the stepwise variable removal process. Model results identify six factors that significantly influenced uncertainty around overall purchase intention. These relate to three different product attributes: visual appearance, taste, and ethics. Three of the six effects relate to mean attribute rating position, one to the width, and two to the interaction of mean position and width (i.e. xm · xw).
Overall purchase intention was more certain when taste, ethics and visual appearance mean rating positions were low, implying that participants were more certain about their intention to purchase a product when they held negative rather than positive sentiment toward it on these three factors. When visual appearance ratings were more certain, so were overall ratings. Two-way interaction terms between position and width of ratings for both ethics and appearance show that the combined effects of these differed significantly from the additive effects of

C. Attribute Importance-Subjective Ratings vs Model Effects
We aggregate and summarise the hypothetical product attribute importance ratings, as provided by participants in the first stage of the survey (before product exposure), using three different descriptive statistics. We then compare these against the output of the first mixed effects model, which determined the relative influence of attribute ratings, provided during product exposure, on overall purchase intention ratings, provided post-product exposure. The aim was to assess whether there was any substantial discrepancy between subjective and model-derived attribute importance. Results are shown in Table IV.
Interestingly, the rank order of participants' self-reported factor importance ratings, according to both the Mean of Midpoints and the IAA Centroid metrics, was: taste > value > health > appearance > ethics > brand. The only difference for the IAA Mean of Maxima metric was the ordering of appearance and health. All subjective measures agreed that brand was the least important factor. By contrast, the order of importance determined by the mixed-effects model was: taste > brand > value > health > appearance ≈ ethics, suggesting that when predicting actual overall purchase intention ratings, brand rating provided the second greatest contribution.
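The two IAA-based summary metrics named above can be illustrated with a simplified sketch. It assumes the basic form of the IAA for a single set of crisp intervals, in which the membership at a point is the proportion of intervals that contain it, and it approximates the centroid and the mean of maxima on a regular grid. The 0 to 100 scale, the grid resolution, the function names and the three example responses are all hypothetical and are not taken from the study data.

```python
def iaa_membership(intervals, x):
    """Proportion of intervals (left, right) that contain the point x."""
    covering = sum(1 for left, right in intervals if left <= x <= right)
    return covering / len(intervals)


def iaa_summary(intervals, lo=0.0, hi=100.0, steps=1000):
    """Discretise the agreement function and return (centroid, mean of maxima)."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    mus = [iaa_membership(intervals, x) for x in xs]
    total = sum(mus)
    if total == 0:
        raise ValueError("no interval overlaps the evaluation grid")
    # Centroid: membership-weighted mean position.
    centroid = sum(x * mu for x, mu in zip(xs, mus)) / total
    # Mean of maxima: average position of the points with peak agreement.
    peak = max(mus)
    at_peak = [x for x, mu in zip(xs, mus) if mu == peak]
    mean_of_maxima = sum(at_peak) / len(at_peak)
    return centroid, mean_of_maxima


# Hypothetical importance ratings from three respondents.
responses = [(20.0, 60.0), (40.0, 80.0), (50.0, 70.0)]
centroid, mom = iaa_summary(responses)
```

For these three intervals all respondents agree on the region from 50 to 60, so the mean of maxima sits at 55, while the centroid lies close to 52 because the wider first interval pulls it towards the lower end of the scale. This kind of divergence is one reason the two metrics can rank attributes differently.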

D. Subjective Feedback
Subjective feedback ratings concerning participant sentiment towards use of the ellipse response mode were obtained along four key characteristics. These were collected using a conventional five-point ordinal scale, ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). Results, shown in Table V, are re-scaled with zero as the scale midpoint (range -2.5, 2.5), such that negative values show disagreement and positive values agreement. It is clear from the mean and 95% confidence intervals that, as a group, participants rated their agreement as significantly greater than zero on the three positive factors, and significantly lower than zero on the one negative factor.
A linear multiple regression model was fitted to assess the influence of the three former factors on overall liking of the response format; results are shown in Table VI. These indicate that ease of use and how effectively participants perceived the ellipse response mode to allow them to communicate their desired responses were the primary predictors of overall liking.

IV. SUMMARY, CONCLUSIONS AND FUTURE WORK
This paper documents a real-world application of the DECSYS software and interval-valued ellipse response format. We argue that collecting interval-valued responses provides greater information content within each individual response, and that the ellipse response mode permits coherent, efficient and intuitive capture of the uncertainty, variability or vagueness associated with each answer. In this study, we demonstrate the capability of DECSYS to capture interval-valued responses in a relevant practical context, consumer preference research, and evaluate how this may deliver insights into consumer values, product perceptions, and the importance of different factors in forming purchase intentions. This appraisal was designed to provide an initial exposition of the potential value offered by the use of interval-valued data generally, and DECSYS and the ellipse response format specifically, to encourage broader uptake of and engagement with these methods.
In the study, participants sampled a total of eight snack food products, and rated these on a range of six attributes, each of which could potentially have substantial influence on overall sentiment and purchase intention towards the products in question. One primary objective was to determine which factors held substantial influence over overall product purchase intention, across the full range of products; we did not aim to robustly establish superiority or preference for any given product across the broader population. Crucially, as we collected interval-valued ratings, we were also able to assess the influence of uncertainty around product ratings, in terms of both product attributes and overall purchase intention.
Results showed that the mean interval-valued ratings representing product purchase intention were substantially influenced by four product attributes: taste, branding, value for money and perceived healthiness. The uncertainty around these, or other, product attribute ratings was not found to add significant explanatory value to this model. However, the capture of uncertainty associated with overall purchase intentions permitted identification, through a second model, of a number of factors that influenced this degree of uncertainty. These included attributes that were not found to significantly affect average purchase intention rating. For example, perceptions around both the appearance and ethics of a product were found to significantly influence uncertainty around overall purchase intention, although they did not affect mean degree of purchase intention.
The capture of interval-valued responses can also provide interesting insights concerning consumer perceptions of different product attributes, as illustrated through fuzzy-set based data visualisations such as the IAA. These illustrate differences between subjective importance ratings provided for the six attributes before product exposure, between overall purchase intention for each of the eight products following product exposure, and between ratings of each of the six attributes themselves during product exposure. For instance, these clearly show a high relative degree of uncertainty, across a large proportion of respondents, in relation to product ethics. In addition, these plots highlight whether group ratings more closely approximate uni-, bi- or multi-modal distributions. Combined with further analysis, e.g. to break down the characteristics or consumer archetypes of respondents, this may provide further valuable insights into drivers behind consumer sentiment and purchase intention.
Before sampling the products, participants estimated how important each of the six factors was when making a product choice. This allowed comparison between these initial subjective ratings and the importance as estimated by the statistical model, based upon subsequent product and purchase intention ratings. Interestingly, this analysis highlighted one substantial discrepancy between self-reported and model-determined factor influence. This was in the importance of branding, which was rated by participants, on average, as the least important of the six attributes when making a product choice, but identified by the statistical model as having the second largest effect on average purchase intention.
As an addendum to the primary study, we also solicited participant feedback regarding their use of the (interval-valued) ellipse response mode, as administered through the DECSYS survey software. We found that user feedback was consistently positive. Participants reported that they found the survey easy to use, that it was not unnecessarily complex, that it allowed them to effectively communicate their desired responses, and that they liked it overall. Of course, these ratings should not be over-interpreted in the absence of comparable ratings for traditional, or other alternative response formats. In the future, we plan to report findings from more comprehensive research designs, which will focus on both user feedback and more objective ease-of-use measures, pertaining to the ellipse response mode and alternatives of varying complexity, in order to empirically inform the 'effort vs information trade-off'. Nonetheless, the present results are promising in respect to the potential uptake of this response format from the perspective of maintaining a very manageable participant workload.
To conclude, an absence of tools to allow easy collection, collation and analysis of interval-valued responses has held back their use in wider research. The DECSYS software facilitates electronic capture of interval-valued responses, using an efficient and intuitive ellipse response format. This paper documents an initial appraisal of this approach, as applied to the practical research problem of determining factors that influence consumer sentiment and purchase intention for a range of snack food products. The study demonstrates insights obtained from interval-valued response data, using both descriptive data visualisations and inferential statistical analyses. Although the statistical methods applied in this study involved both position and width dimensions of interval-valued survey responses, these dimensions were in this case represented independently, by point-values. In the future we plan to develop and apply tailored statistical approaches and tools to enable more holistic collation and analysis of interval-valued responses. Note that due to the sample of the present study (considering both size and location) we do not claim that outcomes are representative of consumer opinion within the general population. Nevertheless, we believe that these findings do provide credible evidence for the importance of further exploring the utility of intervals, and thus reinvesting effort into associated research. We propose that there is substantial scope for such research, pertaining to many aspects of this subject area, from mathematics, statistics, and information theory, to general methodology and broader qualitative analysis. In terms of fuzzy sets, significant potential exists to leverage both existing tools and future research to support the modelling and representation of (as shown here with the IAA), and reasoning with, uncertain, interval-valued data.
Future work, some of which is under way, will also be necessary to reinforce and further build upon the foundations of empirical evidence for the value of capturing this type of response-data, both in general and as applied to a variety of specific contexts. Once a critical mass of such evidence is achieved, and coupled with a sufficient degree of exposure and acceptance within the wider research community, it is hoped that the use of interval-valued survey responses will see much broader adoption and provide improvements across a multitude of research domains.