“Speak of the Devil… and he Shall Appear”: Religiosity, Unconsciousness, and the Effects of Explicit Priming in the Misperception of Immorality

Psychological theory and research suggest that religious individuals could have differences in the appraisal of immoral behaviours and cognitions compared to non-religious individuals. This effect could occur due to adherence to prescriptive and inviolate deontic religious-moral rules and socio-evolutionary factors, such as increased autonomic nervous system responsivity to indirect threat. The latter thesis has been used to suggest that immoral elicitors could be processed subliminally by religious individuals. In this manuscript, we employed masking to test this hypothesis. We rated and pre-selected IAPS images for moral impropriety. We presented these images masked with and without negatively manipulating a pre-image moral label. We measured detection, moral appraisal and discrimination, and physiological responses. We found that religious individuals experienced higher responsivity to masked immoral images. Bayesian and hit-versus-miss response analyses revealed that the differences in appraisal and physiological responses were reported only for consciously perceived immoral images. Our analysis showed that when a negative moral label was presented, religious individuals experienced the interval following the label as more physiologically arousing and responded with lower specificity for moral discrimination. We propose that religiosity involves higher conscious perceptual and physiological responsivity for discerning moral impropriety but also higher susceptibility for the misperception of immorality.


Introduction
Much heated debate surrounds whether religious and non-religious individuals differ in relation to understanding immorality (Shariff, 2015) and, if so, in what way (Magee & Hardin, 2010). This subject is relevant now more than ever for the wider public (The Guardian, 2019) and has been approached mainly by social-evolutionary psychology (Ebstyne King & Furrow, 2008) and neuroscience (Shariff, Willard, Andersen & Norenzayan, 2016). Psychological researchers have not reached consensus on this most important of traditional (see, for example, Freud, 1950) and contemporary psychological subjects (see, for example, McKay & Whitehouse, 2015). Nevertheless, several experimental and theoretical models have been employed to explain whether and how religious and non-religious individuals differ in relation to the processing of immoral elicitors (Baumard & Boyer, 2013).
From a theoretical point of view, it has been argued that religious and non-religious individuals could differ in relation to the processing of immoral elicitors due to evolutionary factors. Baumard and Boyer (2013), drawing also on seminal previous texts on the nature of evolution and religion (see, for example, Barrett, 2000), have provided one possibly formative theory in this area. Their theory holds that early societal intuitional processes and behaviours were formed to provide order in archaic human groups. These intuitive behaviours occurred early during human societal evolution and predated the emergence of conscious moral reasoning, and organised and reflective morality (Baumard & Boyer, 2013, p. 278).
As human societies evolved, conscious moral reasoning emerged in the form of secular laws while moral intuition became part of religious beliefs related to proportionality. Proportionality in this context marks the passage from archaic religions that involved, arguably (McKay & Whitehouse, 2015), Gods indifferent to moral behaviours to moral religions, such as Christianity, Judaism, Islam and Sikhism, that reward, most frequently in the afterlife (see, for example, Maclean, Walker & Matsuba, 2004), the moral qualities of religious believers (Baumard, Hyafil, Morris & Boyer, 2015; but see also Graham & Haidt, 2010).
Arguably (Shariff, 2015), this binary evolution of conscious moral reasoning to secular organised morality and pre-conscious moral intuition to religious belief induced an evolutionary dichotomy between a slower, reflective and conscious moral system of secular moral reasoning and a fast, automatic and implicit system of religious moral processing, respectively (Baumard & Boyer, 2013; p. 273). The latter in particular has been suggested to have retained its unconscious and/or pre-conscious (Dehaene, Changeux, Naccache, Sackur & Sergent, 2006) links to intuition. These links are suggested to have been preserved via participation in the "collective effervescence" (Durkheim, 1965) of communal ritualistic religious practice (Boyer & Lienard, 2006). They have also been suggested to have been preserved due to the experience of moral intuition as supernatural moral invigilation (Shariff, 2015). According to this suggested phenomenon, religious believers experience automatic and implicit moral intuitions as directly linked to, accessible to and supervised by a punishing or rewarding transcendental entity (Boyer & Bergstrom, 2008).
This compelling and influential theoretical model is possibly the leading but not the only perspective (van Slyke, 2016) in this area (Cohen & Rozin, 2001; Cohen, Malka, Rozin & Cherfas, 2006; Saroglou & Cohen, 2011). For example, evolutionary anthropology, and the psychology and neuroscience of religious rituals and ritualistic behaviour (Visala, 2016) have been used as rally points for researchers to suggest the theory of the Evolved Hazard-Precaution System (EH-PS). This model suggests that acute perception, cognitive processing, and conscious detection are necessary for processing immoral elicitors for religious and non-religious individuals (Sun, 2012). This model proposes that religious and non-religious individuals possess a processing system adapted for avoiding contamination, such as contact with agents with pathogenic consequences. This system is related to inferred-indirect (Boyer & Lienard, 2006) as opposed to predatory-direct (Pessoa & Adolphs, 2010) environmental danger, and it is suggested that, therefore, it cannot involve unconscious and subcortical neural response pathways (Brooks et al., 2012; van der Ploeg, 2017; but see also Tsikandilakis, Bali, Derrfuss & Chapman, 2019a, 2020a). This moral processing system is suggested to enhance the salience of perception of an environmental occurrence relating to inferred danger to achieve avoidance of contact with pathogenic agents (Boyer & Lienard, 2006; p. 604). This pathogen detection system has been suggested to have distinctly evolved in contemporary settings to the acute processing of possible environmental immoral elicitors to facilitate the maintenance of spiritual purity in religious individuals (Preston & Ritter, 2012). This system is suggested to be more potent in religious individuals and further enhanced, even in the form of ex-post-facto metacognition (see Boyer & Lienard, 2006; p. 602), due to the participation of religious individuals in collective religious practices (Barrett, 2000; p. 31; Boyer & Bergstrom, 2008; p. 119-120). The basis for this argument is that attendance to religious practices communicates and stimulates strict moral self-regulation, abstinence from perceived immorality and conscious adherence to prescriptive and inviolate divine moral laws (Shariff, 2015).
Experimental explorations of these theoretical perspectives have been provided to some extent by previous research. For example, in a seminal study in this area, Pichon, Boccato and Saroglou (2007) presented religious and non-religious-related words for 15 ms. The words were backwards masked by a string of X letters for 500 ms that were subsequently followed by neutral or anagrams of neutral words for 500 ms. For religious individuals, the presentation of religious words for 15 ms led to higher intention for pro-social behaviour, such as willingness to participate in a post-trial charity task (Pichon, Boccato & Saroglou, 2007;p. 1035). Similar effects for subliminal priming with religious words resulting in higher moral sensitivity for religious participants have been reported by subsequent research that found increased pro-social moral behaviour in moral dilemma tasks when religious individuals were primed with religion related words (see for example, Ahmed & Salas, 2008; see also Shariff, Willard, Andersen & Norenzayan, 2016). Previous research has also found decreased reports for self-authorship for a subsequent morally related behaviour when religious believers were primed with religion related words (Dijksterhuis, Preston, Wegner & Aarts, 2008). These findings suggest that religious individuals could have higher sensitivity to primes related to morality and immorality but also that they possibly experience their moral-related cognition and behaviour as ''guided'' by a non-secular, possibly transcendental and deontic, moral code (Shariff, 2015; but see also Atran, 2002).
Surprisingly, the majority of experimental studies in the area of unconscious moral processing in relation to religiosity have been conducted using either masking of linguistic concepts or implicit processing of religious words, such as unscrambling word search tasks and implicit association tests (see Shariff et al., 2016). Unlike other areas of psychology, where masking of faces and images is possibly the predominant technique for exploring subliminal processing (see, for example, van der Ploeg et al., 2017), morality and religiosity have been explored using masked or primed image stimuli only in a handful of studies. These few studies used non-morality related masked image primes and focused on whether these primes can induce disgust sensitivity (Preston, Ritter & Ivan Hernandez, 2010) or political and secular conservatism (Samuel, 2016), and on whether the primes were associated with developmental attachment (Birgegard & Granqvist, 2004), racial prejudice (Howard & Sommers, 2017), the regulation of negative emotion (Harenski & Hamann, 2006) and moral and immoral attitudes as personality traits (Luo et al., 2006; see also Garrigan, Adlam & Langdon, 2016).
The scarcity of experimental implementations using morally related images in this area will become more clear if we take into account that in research relating to subliminality and emotion the scientific community has provided a plethora of acknowledged, extensively used and validated facial and image datasets (see, for example, Brooks et al., 2012). In the area of moral processing the development of set and acknowledged image databases for presenting moral and, particularly, immoral image stimuli (Crone, Bode, Murawski & Laham, 2018) is not sufficiently developed (see McCullough & Willoughby, 2009; see also Shariff et al., 2016). Experimental research in this area-as also in other relevant areas of psychology (see, for example, Axelrod, Bar & Rees, 2015)-would either have to go to the length of creating a relevant dataset (see, for example, Clifford, Iyengar, Cabeza & Sinnott-Armstrong, 2015;Tsikandilakis et al., 2019a, b, c;Tsikandilakis et al., 2021b;in press) or thoroughly and reliably validate a subcategory of images from an existing database (see, for example, Harenski, Antonenko, Shane & Kiehl, 2010;Tsikandilakis, Bali & Chapman, 2019a, b, c). This hurdle is particularly important because both traditional (see, for example, Freedberg, 1989) and more contemporary (see, for example, Warren, 2009) psychological research has shown that images are potent elicitors for the activation of neural structures such as the amygdala, and the cingulate and insular cortices that are associated with automatic processing and peripheral nervous system arousal (Brooks et al., 2012). 
Faced with this challenge, in the current research, we opted for the rigorous pre-selection of immoral stimuli from an established dataset to provide the first exploration of the effects of image masking of immoral and morally neutral elicitors in appraisal responses, and physiological processes, such as skin conductance (SCR), heart rate (HR) and facial expression responses in religious and non-religious individuals. We carefully and rigorously preselected from the International Affective Picture System (Lang, Bradley & Cuthbert, 1997) a subset of stimuli that were related to moral impropriety in previous reviews (Ellemers, van der Toorn, Paunov, & van Leeuwen, 2019; see also Supplementary Material 3.1). Subsequently, we replicated the psychophysics of previous studies that claimed subliminal perception of IAPS images (see, for example, Ponseti & Bosinski, 2010;Wetherill et al., 2014;Huang, Sun & Vaina, 2019), presented these images with backwards masking and measured self-report, physiological and facial expressive responses with and without using different moral labels as primes for the presentation.
We explored whether the stimuli were processed subliminally using our previously developed model for the assessment of unconscious processing (see Tsikandilakis, Bali, Derrfuss & Chapman, 2019a, 2020a), including the calculation of detection performance using unbiased ROC characteristics (Zhang & Mueller, 2005), Bayesian analysis for evidence of chance-level performance (Dienes, 2016) that would indicate subliminal processing (Erdelyi, 2004), without inferring subliminality via non-significance (Dienes, 2014), and separate hits and misses analysis for appraisal and physiological responses (Pessoa, Padmala & Morland, 2005a, b; Pessoa, 2005; Pessoa, Japee, Sturman & Ungerleider, 2005a, b). As an exploratory objective, we tested whether religious individuals would report higher physiological responses to masked immoral elicitors and whether these responses would be due to conscious (above chance-level) or unconscious processing (see Tsikandilakis, Peirce & Chapman, 2018; Tsikandilakis, Bali, Derrfuss & Chapman, 2019a, 2020a). Finally, we hypothesized that in the presence of a negative pre-trial moral label religious individuals would respond with higher miss-discrimination and lower specificity for seeing an immoral image due to susceptibility to prescriptive and deontic moral rules (Shariff, 2015).
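The non-parametric sensitivity index referred to above can be illustrated with a short sketch. It follows the piecewise formula for the statistic A as we understand it from Zhang and Mueller (2005), computed from a hit rate and a false-alarm rate, with A = 0.5 indicating chance-level detection. The function name `roc_a` and the mirroring used for below-chance performance are assumptions of this sketch and are not taken from the analysis scripts of the current research.

```python
def roc_a(hit_rate: float, fa_rate: float) -> float:
    """Non-parametric sensitivity A; 0.5 means chance, 1.0 means perfect."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h < f:
        # Assumption of this sketch: mirror below-chance performance.
        return 1.0 - roc_a(f, h)
    base = 0.75 + (h - f) / 4.0
    if f <= 0.5 <= h:
        return base - f * (1.0 - h)
    if h < 0.5:
        # Here f < h < 0.5.
        return base - f / (4.0 * h)
    # Here 0.5 < f < h.
    return base - (1.0 - h) / (4.0 * (1.0 - f))


if __name__ == "__main__":
    print(roc_a(0.8, 0.2))  # clearly above chance
    print(roc_a(0.5, 0.5))  # chance level
```

A value of A close to 0.5 is only descriptive; in the approach described above, evidence for chance-level performance is assessed separately with Bayesian analysis.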

Aims
The first objective of the current stage was to present religious and non-religious individuals with IAPS images for one second and explore differences in ratings relating to emotionality, valence, salience and moral impropriety. Our hypothesis was that religious individuals will respond with higher ratings for moral impropriety to the presented images (Shariff, 2015;Zinnbauer et al., 2015). The second objective of the current stage was to select morally improper and morally innocuous images for use in the next stage of the current research.

Participants
Participants were invited to a preliminary screening study one week prior to the first main experiment. A power calculation based on medium effect sizes (partial eta squared = 0.06; f = 0.25) revealed that n = 58 would be required to achieve a power of (1 − β) ≥ 0.8 (Faul et al., 2009). A total of sixty-nine British individuals volunteered to participate in the screening stage. The participants in this stage were allocated to two groups. For the religiosity group the inclusion criteria were being affiliated with a religion, being a practising member of the same religion and believing in the existence of the God or Gods that their religion of practice and affiliation supports (Zinnbauer et al., 2015). For the non-religiosity group the inclusion criteria were not being affiliated with a religion, not being a practising member of a religion and not believing in the existence of a God or Gods associated with an organized religion or spiritual/religious group (see Peterson & Ruse, 2016).
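As a brief illustration of how the two effect-size metrics in the power calculation relate, the sketch below converts between Cohen's f and eta squared using the standard identity f² = η² / (1 − η²). The function names are hypothetical, and the sketch does not reproduce the G*Power sample-size computation itself.

```python
import math


def f_from_eta_squared(eta_sq: float) -> float:
    """Cohen's f from (partial) eta squared."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def eta_squared_from_f(f: float) -> float:
    """(Partial) eta squared from Cohen's f."""
    return (f * f) / (1.0 + f * f)


if __name__ == "__main__":
    # A conventional medium effect of f = 0.25 corresponds to an eta squared near 0.06.
    print(round(eta_squared_from_f(0.25), 4))
```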
The participants were assessed with several questionnaire assessments to ensure that potential differences between the two groups were due to differences in religious beliefs and not emotionality-related and life-experience-related or other possibly confounding factors, such as socioeconomic status and education (see Tsikandilakis et al., 2019a, b, c; p. 923; see also Russell et al., 2003; p. 331-337). The participants were screened with questionnaire assessments related to emotional, moral and religious characteristics with order randomized. The participants were screened with the Emotional Reactivity Scale (ERC; Nock, Wedig, Holmberg & Hooley, 2008), the Santa Clara Strength of Religious Faith Questionnaire (SCS; Plante, Vallaeys, Sherman & Wallston, 2002), the Intratextual Fundamentalism Scale (IFS; Sharrock, 2019), the Social and Political Conservatism Scale (SECS; Everett, 2013) and the Moral Foundation Questionnaire (MFQ-20; Davies, Sibley & Liu, 2014). The participants were also assessed for eligibility for being included in the current stage using the Personality Disorder Questionnaire (PDQ; American Psychiatric Association, 2013), a modified version of the Moral Injury Questionnaire (MIQ; Koenig et al., 2018; see Supplementary Material 1.1), the Somatic and Psychological Health Report Questionnaire (SPHRQ; Hickie et al., 2001) and an online Alexithymia/Emotional Blindness questionnaire (Alexithymia, 2018). Data from one participant were excluded from further analysis due to having a SPHRQ score (> 3) that indicated a possible psychiatric diagnosis. Data from one participant were excluded from further analysis due to PDQ scores (> 0.5) that indicated a possible personality disorder. Data from nine participants were excluded because they were non-practising religious believers (see Shariff, 2015). The final population sample included fifty-eight participants (thirty females).
For the religiosity group the final population sample consisted of eight individuals who identified as Catholics (five females), seven Church of England Protestants (four females), eleven individuals who identified as Muslims (seven females) and three individuals who identified as Sikhs (no females; see Table 1).

Stimuli
Two hundred and twenty-two images, which were approved for ethical permission for inclusion in this stage, were selected for presentation during the main experiments. The images were part of the International Affective Picture System (Lang, Bradley & Cuthbert, 1997). The images included seventy-four pictures with themes that were linked in previous research to moral impropriety (Ellemers, van der Toorn, Paunov, & van Leeuwen, 2019), such as images with partial male and female nudity, images showing potential and mild violence, alcohol and drug consumption, unlawful behaviour, racial segregation, poverty, environmental pollution and offensive behaviour. The images also included seventy-four positive and negative, low and high arousal pictures that had not been directly related to moral impropriety in previous research (Haidt, 2003), such as athletic, musical, mental, cultural and technological images and activities, environmental and planetary phenomena, and images of ordinary personal and interpersonal behaviours (Moll, de Oliveira-Souza & Eslinger, 2003). The images also included seventy-four pictures of items and scenery with neutral ratings for valence, arousal and dominance (Lang, Bradley & Cuthbert, 1997). No images related to political and religious themes, such as pictures including ministers, clerics and representatives of a religion and political parties, and religious and political supporters, and religious and political locations were included (Malka, Lelkes, Srivastava, Cohen & Miller, 2012). The images did not include pictures of animals (Colden, Bruder & Manstead, 2008) and pictures showing blood, gore, death and sexual violence (Kimura, Yoshino, Takahash & Nomura, 2004). Images of identifiable people, events and locations were also excluded (see Lang & Bradley, 2007).
The selected stimuli were sorted and labelled (Mikels et al., 2005). They were transformed to grey scale and resized to a standard 1280 × 768 resolution. They were then transformed to visual vectors using CorelDRAW Technical Suite Pro 2019 and their luminance was averaged in the SHINE toolbox for MATLAB. Two hundred and twenty-two randomly generated pattern images were also created using permutation of grey scale pixels. They were resized, transformed to visual vectors and averaged for luminance using the SHINE toolbox for MATLAB. The pattern images were assessed using MATLAB & Simulink, Pattern Matching and MATLAB, Histogram of Oriented Features Thresholds and Optical Character Recognition for whether they showed repeated patterns and classifier features. Three pattern images were replaced with newly randomized pattern images based on this assessment (Qidwai & Chen, 2009).
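To illustrate why permuting grey scale pixels is a convenient way to build pattern images, the following minimal sketch shuffles a flat list of pixel intensities. The shuffled image keeps exactly the same intensity histogram, and therefore the same mean luminance, while its spatial structure is destroyed. The function name and the toy pixel values are hypothetical; the stimuli in the current research were prepared in MATLAB, not with this code.

```python
import random


def permuted_pattern(pixels, seed=None):
    """Return a copy of the pixel intensities in random order."""
    rng = random.Random(seed)
    shuffled = list(pixels)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    source = [0, 32, 64, 96, 128, 160, 192, 224, 255]  # toy 3 x 3 image, row by row
    pattern = permuted_pattern(source, seed=1)
    # Same histogram, hence the same mean luminance as the source.
    print(sum(pattern) / len(pattern) == sum(source) / len(source))
```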

Procedures
Participants were invited to a quiet laboratory space in the School of Psychology of the University of Nottingham.
The participants attended two experimental sessions scheduled one week apart at the same timeslot. Each session included the presentation of one hundred and eleven IAPS images and an equal number of pattern images. In both sessions, the stimuli were presented on an HD Lenovo monitor set to 60 Hz (16.67 ms per frame). The presentation was programmed in the coder and builder components of PsychoPy (Peirce, 2007).
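The relation between refresh rate and frame duration mentioned above can be checked with a few lines. This is only an arithmetic sketch with hypothetical function names. It does not use the PsychoPy API.

```python
def frame_duration_ms(refresh_hz: float) -> float:
    """Duration of a single screen refresh in milliseconds."""
    return 1000.0 / refresh_hz


def frames_for(duration_ms: float, refresh_hz: float) -> int:
    """Nearest whole number of frames for a requested duration."""
    return round(duration_ms / frame_duration_ms(refresh_hz))


if __name__ == "__main__":
    print(round(frame_duration_ms(60), 2))  # one frame at 60 Hz, in ms
    print(frames_for(1000, 60))             # frames in a one-second presentation
```

Because a display can only change on a refresh, stimulus durations are effectively multiples of the frame duration, which is why timing is often specified in frames rather than milliseconds.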
Both experimental sessions started with a 5-min training session during which participants were briefed concerning how to respond to the engagement tasks and the terminology of the engagement tasks (Mikels et al., 2005;Gray, Adams, Hedger, Newton & Garner, 2013;Ellemers, van der Toorn, Paunov, & van Leeuwen, 2019). An interval screen was then presented, and participants were asked whether they understood the instructions, and whether they were ready to proceed to the main experiment. The participants were given the choice to ask the researcher questions before they decided to proceed; no instances of required researcher feedback were reported.
Each experimental trial started with a fixation cross for two seconds (± one second). After the fixation cross, a single IAPS image or pattern image was presented at fixation for one second; order was randomized. The image was immediately followed by a black and white mask for one second. The black and white mask was included to make the stimulus sequence identical to the subsequent masking stage (see also Kim et al., 2010). After the presentation, a blank screen was presented for five seconds. Participants were then asked to respond to a set of questions using the mouse. They were asked to rate the presentation for valence and intensity (consecutive assessments), salience (Gray, Adams, Hedger, Newton & Garner, 2013; but see also Jerram, Lee, Negreira & Gansler, 2014; see also Supplementary Material 2.1) and moral impropriety (Ellemers, van der Toorn, Paunov, & van Leeuwen, 2019; see also Supplementary Material 3.1); order was randomized (Fig. 1). After the engagement tasks, a five-second blank screen interval was presented before the next trial.
Table 1. Demographic characteristics and questionnaire scores for the final population sample for stage one. Three participants (three females) from the religious group and four participants (two females) from the non-religious group chose not to disclose their salary scale. Assessment comparisons show that differences between groups were due to religious moral values questionnaire scores (SCS) and not random sampling differences (see Tsikandilakis et al., 2019a, b, c; p. 923; see also Russell et al., 2003; p. 331-337). For each comparison, a Bayes factor was calculated for mean differences to test evidence for the null (B < 0.33; Dienes, 2016).
Psychological Research (2022) 86:37-65

Analysis and discussion: image ratings
We employed both frequentist and Bayesian analyses in the statistical processes of the current manuscript. For the Bayesian analyses we used the Dienes calculator (Dienes, 2014). Due to the lack of a previous meta-analysis or similar studies (see Dienes, 2019) that would allow us to define a plausible predicted effect size P (see particularly, https://psyarxiv.com/yc7s5/), we assumed a basic and conventional continuous uniform distribution for the analyses (Rossman, Short & Parks, 1998; see footnote 3). We defined the two-tailed credible intervals between two means or samples using the standard error between a higher and a lower bound to apply a Bayesian equivalent of significance testing (see Dienes, 2019). We tested whether the resulting Bayes factor provided evidence that the mean difference between two scores or samples was within the predefined credible intervals and suggested evidence for the null hypothesis (B < 0.33), or whether it was outside these intervals and provided evidence for the alternate hypothesis (B > 3), or whether the data were inconclusive (see Dienes, 2016) and the analysis suggested that the data provided evidence for being insensitive to both hypotheses (0.33 < B < 3) (Dienes, 2011; see also Supplementary Material 8).
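A minimal sketch of a Bayes factor with a uniform prior may help to make the decision rule concrete. It assumes a normal likelihood for the observed mean difference with the standard error as its standard deviation, a uniform prior on the true difference between a lower and a higher bound under the alternate hypothesis, and a point null of zero. These are assumptions of this sketch, in the spirit of the Dienes (2014) calculator; it is not the calculator itself, and the function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor_uniform(mean_diff: float, se: float, lower: float, upper: float) -> float:
    """B = p(data | H1) / p(data | H0) with a uniform H1 prior on [lower, upper]."""
    # Marginal likelihood under H1: the normal likelihood averaged over the prior.
    mass = normal_cdf((upper - mean_diff) / se) - normal_cdf((lower - mean_diff) / se)
    like_h1 = mass / (upper - lower)
    # Likelihood under the point null hypothesis of no difference.
    like_h0 = normal_pdf(mean_diff, 0.0, se)
    return like_h1 / like_h0


def interpret(b: float) -> str:
    if b < 0.33:
        return "evidence for the null"
    if b > 3:
        return "evidence for the alternate"
    return "insensitive"


if __name__ == "__main__":
    b = bayes_factor_uniform(mean_diff=0.0, se=0.1, lower=-0.5, upper=0.5)
    print(round(b, 3), interpret(b))
```

With a small standard error and an observed difference near zero, the null predicts the data better than a prior spread across the whole interval, so B falls below 0.33. A difference several standard errors from zero pushes B above 3.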
To explore whether religious and non-religious individuals differed in ratings for the IAPS images, we compared their scores for valence and its intensity, salience and moral impropriety. Religious (M = 5.12, SD = 1.01) and non-religious individuals (M = 5.21, SD = 0.94) did not respond significantly differently in ratings for valence . This effect suggests that religious individuals considered more scenes as morally improper (Shariff, 2015). Previous research suggests that male and female participants differ in emotional sensitivity due to biological, developmental and social characteristics (Kret & De Gelder, 2012). In the current data we found that female (M = 5.69, SD = 1.17) and male individuals (M = 4.45, SD = 1.34) differed significantly in ratings for emotional intensity (F (1, 54) = 13.45; p = 0.001; partial eta-squared = 0.19; d = 0.99); no evidence was found for an interaction between gender and religiosity for differences in emotional intensity (F (1, 54) = 0.48; p = 0.49; partial eta-squared = 0.009); no other gender effects were found.
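The reported standardized difference for emotional intensity can be approximately recovered from the group means and standard deviations. The sketch below uses the root mean square of the two standard deviations as the pooled value, which assumes roughly equal group sizes. That simplification and the function name are assumptions of this sketch.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d with an equal-n pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


if __name__ == "__main__":
    # Female versus male ratings for emotional intensity, as reported above.
    print(round(cohens_d(5.69, 1.17, 4.45, 1.34), 2))
```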

Criteria, analysis and discussion: image selection
To control for possible differences in the processing for moral impropriety in the subsequent experimental stage we used Bayesian analysis (Dienes, 2016) to select twenty morally improper and morally innocuous IAPS images, and forty pattern images. These images were required to provide evidence for being within a-priori criteria for rating characteristics for religious and non-religious individuals (Shariff, 2015). For morally improper images, the criterion was a mean rating greater than or equal to seven (≥ 7) for moral impropriety. For morally innocuous images the criterion was a mean rating less than or equal to three (≤ 3) for moral impropriety. Due to the lack of positive valence ratings for morally improper IAPS images (M = 3.48, SD = 0.92), morally improper and innocuous images were required to be rated with a mean between two and four for valence (2 ≤ M ≤ 4) and between four and six for emotional intensity (4 ≤ M ≤ 6). Due to the lack of high ratings for salience in response to pattern images (M = 3.79, SD = 0.41) and to balance the visual characteristics between the IAPS images and the pattern images (Gray et al., 2013), the selected morally improper and morally innocuous images, and the pattern images were required to have a mean between two and four (2 ≤ M ≤ 4) for salience. Pattern images were also required to have mean ratings for valence and emotional intensity that were between four and six (4 ≤ M ≤ 6) and mean ratings for moral impropriety that were less than or equal to three (≤ 3). Bayesian analysis (Dienes, 2016) with corrected degrees of freedom (Berry, 1996) and higher and lower bounds set at -0.5 and 0.5 was run for each rating characteristic (see Table 2).
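The a-priori rating criteria described above can be written as a simple filter over mean ratings. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical, and it covers the range criteria alone, not the additional Bayesian check that ratings did not differ between religious and non-religious participants.

```python
def _between(value: float, low: float, high: float) -> bool:
    return low <= value <= high


def meets_criteria(kind: str, ratings: dict) -> bool:
    """Check mean ratings against the selection ranges for one stimulus type.

    kind is "improper", "innocuous" or "pattern"; ratings holds mean scores
    under the keys "valence", "intensity", "salience" and "impropriety".
    """
    if kind not in ("improper", "innocuous", "pattern"):
        raise ValueError(f"unknown stimulus type: {kind}")
    # All selected stimuli share the same salience range.
    if not _between(ratings["salience"], 2, 4):
        return False
    if not _between(ratings["intensity"], 4, 6):
        return False
    if kind == "pattern":
        return _between(ratings["valence"], 4, 6) and ratings["impropriety"] <= 3
    if not _between(ratings["valence"], 2, 4):
        return False
    if kind == "improper":
        return ratings["impropriety"] >= 7
    return ratings["impropriety"] <= 3


if __name__ == "__main__":
    example = {"valence": 3.2, "intensity": 5.1, "salience": 3.0, "impropriety": 7.4}
    print(meets_criteria("improper", example))
```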
The analysis confirmed that the included stimuli did not differ (B < 0.33) between religious and non-religious participants (Table 2a), differed (B > 3) between morally improper and morally innocuous images only in regard to moral impropriety (Table 2b) and did not differ in regard to salience between IAPS images and pattern images (Table 2c); for image codes and IAPS ratings (Lang, Bradley & Cuthbert, 1997)
Footnote 3: Note that there are additional ways to define distributions and priors (see Tendeiro & Kiers, 2019) and boundary conditions (see Dienes, 2019) for providing evidence for the null hypothesis and that the Bayesian process could be justified and performed based on different assumptions compared to the ones used in the current analyses (see Dienes, 2014).
Footnote 4: LB stands for Lower Bound. HB stands for Higher Bound. SE stands for Standard Error (see Dienes, 2014, 2016).

Aims
The first objective of the current stage was to present religious and non-religious individuals with pre-selected morally improper and morally innocuous masked IAPS images and explore differences in perceptual and physiological responses, and moral impropriety ratings. The second objective of the current stage was to present religious and non-religious individuals with different pre-trial moral labels and then pre-selected masked IAPS images and explore differences in perceptual and physiological responses, and moral impropriety ratings. We assessed whether the images were processed subliminally. Our hypothesis for this stage was that religious individuals would have higher perceptual sensitivity, moral impropriety ratings and physiological responses to morally improper pre-image labels and brief masked images showing moral impropriety. During the primed masking stage, we examined the physiological responses of religious and non-religious individuals to different pre-image moral labels both separately and in interaction with the presented masked image. Our hypothesis for this analysis was that labels suggesting the occurrence of a morally improper image would result in higher physiological arousal both before and after the presentation of a masked image for religious individuals (Shariff, 2015; Zinnbauer et al., 2015). Our exploratory hypothesis was that changes in physiology and higher ratings for moral impropriety for religious individuals would involve conscious awareness, such as being recorded due to higher than chance-level ROC performance and for hits for detection of an image but not miss responses for detection of an image (see Tsikandilakis, Bali, Derrfuss & Chapman, 2020a).

Participants
A power calculation based on medium effect sizes (partial eta squared = 0.06; f = 0.25) revealed that n = 58 would be required to achieve a power of P (1 − β) ≥ 0.8 (Faul et al., 2009). A total of seventy-two participants volunteered to take part in the screening stage. The participant recruitment and assessment in stage two was identical to stage one. Participants were invited to a preliminary screening study one week prior to the first main experiment. Data from one participant were excluded from further analysis due to having a SPHRQ score (> 3) that indicated a possible psychiatric diagnosis. Data from two participants were excluded due to having scores that indicated traits for Alexithymia (> 94). Data from one participant were excluded due to having scores that indicated traumatic morality-related experiences in the MIQ (MIQ Items 1, 4 and 7 ≥ 8). Data from four participants were also excluded because they were non-practising religious believers. The final population sample included sixty-four participants (thirty-six females). For the religiosity group the final population sample consisted of four individuals who identified as Catholics (three females), fifteen Church of England Protestants (seven females), twelve individuals who identified as Muslims (eight females) and one individual who identified as a Sikh (male; see Table 3). No participant from stage one was included in stage two.

In (a), mean and standard deviation per group for ratings characteristics, with Bayes factor and effect size (Cohen's d) for comparisons. In (b) and (b), mean and standard deviation per type of image, with Bayes factor and effect size (Cohen's d) for comparisons. When the analysis suggested evidence for the alternative hypothesis (B ≥ 3), p values were calculated to confirm the report. In this table an asterisk (*) indicates insensitivity for the null (≤ 0.33) and the alternate hypothesis (≥ 3) at 0.33 < B < 3 (Dienes, 2016); a single instance was reported (A.; Morally Improper Images, Intensity; SE = 0.09; B = 0.6; t (57) = 0.79; p = 0.43).
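The relation between partial eta squared and Cohen's f that underlies the power calculation can be illustrated with a short sketch. The function names are ours and purely illustrative; the sketch only shows that a medium effect of f = 0.25 corresponds to a partial eta squared of roughly 0.06, and it does not reproduce the full G*Power computation (Faul et al., 2009).

```python
import math


def eta_squared_to_f(eta_sq: float) -> float:
    """Convert (partial) eta squared to Cohen's f."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def f_to_eta_squared(f: float) -> float:
    """Convert Cohen's f back to (partial) eta squared."""
    return f ** 2 / (1.0 + f ** 2)


# A medium effect: eta squared of 0.06 maps onto f of about 0.25.
print(round(eta_squared_to_f(0.06), 3))
print(round(f_to_eta_squared(0.25), 3))
```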

Physiological assessment
Combined physiological assessment was used as described in detail in our previous research (Tsikandilakis & Chapman, 2018; Tsikandilakis, Chapman & Peirce, 2018; Tsikandilakis, Bali, Derrfuss & Chapman, 2020a, b). Skin conductance and heart rate were used to assess physiological responses. Skin-conductance responses were measured from the left hand (index/first and middle/second fingers) of each participant using disposable Ag/AgCl gelled electrodes. The signals were received by a BIOPAC System EDA100C in units of microsiemens (μS) and recorded in AcqKnowledge (Braithwaite, Watson, Jones, & Rowe, 2013). Heart rate was measured via a single-finger sensor from the left hand (ring/third finger). The signal was measured by a BIOPAC System PPG100C using infra-red photoplethysmography of blood flow fluctuations and converted and recorded in beats per minute (bpm) in AcqKnowledge. The presence of a phasic skin-conductance response was defined as an unambiguous increase occurring up to three seconds post-stimulus offset (van der Ploeg et al., 2017). The presence of a heart-rate response was defined as an event-related heart-rate peak in beats per minute occurring up to five seconds post-stimulus offset (Cacioppo, Tassinary & Berntson, 2007; p. 182). Each score was calculated using the inbuilt derive phasic from tonic and find cycles routines as the highest peak in physiological responses (d) in respect to a tonic baseline averaged across a period (dT) of one second before each stimulus onset, using parallel port-input derived pre-stimulus onset markers (Braithwaite, Watson, Jones, & Rowe, 2013; p. 38-41). Non-responders for physiological arousal were included in the analysis (see van der Ploeg et al., 2017).
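The scoring rule described above (highest peak relative to a one-second pre-stimulus baseline) can be sketched as follows. This is a simplified illustration and not the AcqKnowledge routine itself: the function name, the sampling-rate argument and the choice to search from stimulus onset to the end of the post-offset window are assumptions made for the example.

```python
def baseline_corrected_peak(signal, fs, onset_idx, offset_idx,
                            window_s, baseline_s=1.0):
    """Peak response minus the mean of a pre-stimulus baseline.

    signal     : list of samples (e.g. microsiemens or bpm)
    fs         : sampling rate in Hz (assumed known)
    onset_idx  : sample index of stimulus onset
    offset_idx : sample index of stimulus offset
    window_s   : seconds after offset to search (3 for SCR, 5 for heart rate)
    baseline_s : seconds before onset used as the tonic baseline
    """
    n_base = int(round(baseline_s * fs))
    base = signal[max(0, onset_idx - n_base):onset_idx]
    if not base:
        raise ValueError("no pre-stimulus samples available for the baseline")
    baseline = sum(base) / len(base)
    end = min(len(signal), offset_idx + int(round(window_s * fs)) + 1)
    return max(signal[onset_idx:end]) - baseline


# Toy trace sampled at 10 Hz: flat at 2.0, brief rise to 2.5, back to 2.0.
trace = [2.0] * 10 + [2.0, 2.1, 2.5, 2.3] + [2.0] * 30
score = baseline_corrected_peak(trace, fs=10, onset_idx=10,
                                offset_idx=10, window_s=3)
print(round(score, 3))
```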

Facial recognition software
Computer-based analysis of the participants' facial expressions was conducted with Noldus FaceReader 7.1 using an HD camera mounted on the bottom of the presenting screen and centred on the participant's face. The analysis was run using the maximum video capture frames per second allowed by the FaceReader equipment (thirty fps). The analysis was run using the Viola-Jones cascaded algorithm and an active appearance model (AAM) that employed a 500-point Euclidean transformation to eliminate static identification variability for image quality, lighting, background variation and orientation (Lewinski, den Uyl & Butler, 2014). Each participant was evaluated in respect to the expressed emotion after controlling for the influence of action units that were present in their own neutral expressions using the participant calibration module (Noldus, 2018). The analysis included the in-built emotional categorization labels (anger, fear, surprise, happiness, sadness, disgust and neutral). Facial-emotional recognition of an emotion was defined as a categorical classification for a facial response up to five seconds post-stimulus offset. Participants were aware that their facial expressions were recorded.

Main experiment
Participants were invited to a quiet laboratory space in the School of Psychology of the University of Nottingham. The participants attended two experimental sessions scheduled one week apart at the same timeslot with order randomized. In both sessions, the stimuli were presented on an HD Lenovo monitor set to 60 Hz (16.67 ms per frame). To ensure that brief stimuli would be appropriately presented during the main experiment, an iPad camera with 240 Hz refresh rate (4.17 ms) recorded two pilot runs of the experiment and the presentation was assessed frame by frame; no instances of dropped frames were detected. An in-house dropped frame report script with a one-frame (16.67 ms) tolerance threshold was coded in Python. Two pilot experimental diagnostic sessions were run. The presenting monitor reported no dropped frames; prognostic dropped frame rate was estimated at 1/5000 trials. Experimental stages were subsequently run using dropped frames diagnostics; no instances of dropped frames were reported. The presentation was programmed in the coder and builder components of PsychoPy. Both experimental sessions started with a five-minute training session during which participants were briefed concerning how to respond to the engagement tasks and the terminology of the engagement tasks. An interval screen was then presented, and participants were asked whether they understood the training and the terminology, and whether they were ready to proceed to the main experiment. The participants were given the choice to ask the researcher questions before they decided to proceed; no instances of required researcher feedback were reported.
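A dropped-frame check of the kind described above can be sketched in a few lines. This is not the in-house script itself: the function names and the rule of flagging an inter-flip interval once it spans at least one additional frame are assumptions chosen to illustrate a one-frame tolerance at 60 Hz.

```python
REFRESH_HZ = 60.0
FRAME_MS = 1000.0 / REFRESH_HZ  # about 16.67 ms per frame


def ms_to_frames(duration_ms, frame_ms=FRAME_MS):
    """Number of whole frames closest to a requested duration."""
    return round(duration_ms / frame_ms)


def find_dropped_frames(flip_times_ms, frame_ms=FRAME_MS):
    """Return (flip index, missed frames) for every interval between
    consecutive screen flips that lasted one or more extra frames."""
    dropped = []
    for i in range(1, len(flip_times_ms)):
        interval = flip_times_ms[i] - flip_times_ms[i - 1]
        missed = round(interval / frame_ms) - 1
        if missed >= 1:
            dropped.append((i, missed))
    return dropped


# A 33.33 ms image spans two frames and a 100 ms mask spans six frames.
print(ms_to_frames(33.33), ms_to_frames(100))

# Hypothetical flip timestamps with one missed frame before the fourth flip.
print(find_dropped_frames([0.0, 16.67, 33.33, 66.67, 83.33]))
```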
In one experimental session (masked presentation) the experimental trial started with a fixation cross for two seconds (± 1 s). After the fixation cross a single IAPS image or pattern image was presented at fixation for 33.33 ms; order randomized. The image was immediately followed by a black and white mask for 100 ms. After the presentation, a blank screen was presented for seven seconds (Cacioppo, Tassinary & Berntson, 2007; p. 164). After the blank screen interval, participants were asked to respond to a set of questions with order randomized. They were asked by an on-screen message to decide whether a real-life image or a pattern image was presented using the mouse. After this task participants were asked by an on-screen message to rate the confidence for their response from one (not confident at all) to ten (extremely confident) using the mouse. Participants were also asked how emotionally intense and how morally improper the presentation was from one (not emotional/improper at all) to ten (extremely emotional/improper) using the mouse (see Fig. 2). After the engagement tasks a seven-second blank screen interval was presented before the next trial to allow physiology to return to baseline (Cacioppo, Tassinary & Berntson, 2007; p. 165). A total of twenty preselected morally inappropriate and twenty morally innocuous images, and forty pattern images were presented during this session.
In one experimental session (primed masked presentation), the experimental trial started with a fixation cross for two seconds (± one second). After the fixation cross an on-screen message for three seconds informed participants that a "morally improper image will be presented" (Label One: morally improper label) or "a morally innocuous image will be presented" (Label Two: morally innocuous label) or "an image will be presented" (Label Three: non-moral label). After the message, a five-second blank screen was presented. After the blank screen, a fixation cross was presented for two seconds (± one second). After the fixation cross a single IAPS image or pattern image was presented at fixation for 33.33 ms; order randomized. The image was immediately followed by a black and white mask for 100 ms. After the presentation, a blank screen was presented for seven seconds. After the blank screen interval participants were asked to respond to a set of questions with order randomized. They were asked by an on-screen message to decide whether a real-life image or a pattern image was presented using the mouse (Macmillan, 2002). After this task participants were asked by an on-screen message to rate the confidence for their response from one (not confident at all) to ten (extremely confident) using the mouse. Participants were also asked how emotional and how morally improper the presentation was from one (not emotional/improper at all) to ten (extremely emotional/improper) using the mouse (see Fig. 2). After the engagement tasks a seven-second blank screen interval was presented before the next trial to allow physiology to return to baseline. Five morally improper IAPS images, five morally innocuous IAPS images and ten pattern images were presented for each of Label One and Label Two. Ten morally improper IAPS images, ten morally innocuous IAPS images and twenty pattern images were presented for Label Three. A total of twenty preselected morally inappropriate and twenty morally innocuous images, and forty pattern images were presented during this session.
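The allocation of images to labels in the primed masked session can be checked with a small sketch that builds a randomized trial list. The dictionary keys and the function name are illustrative and are not taken from the experimental code; only the counts per label come from the description above.

```python
import random

# Images per label and image type, as described for the primed masked session.
TRIALS_PER_LABEL = {
    "morally improper label": {"improper": 5, "innocuous": 5, "pattern": 10},
    "morally innocuous label": {"improper": 5, "innocuous": 5, "pattern": 10},
    "non-moral label": {"improper": 10, "innocuous": 10, "pattern": 20},
}


def build_primed_trials(seed=None):
    """Return a shuffled list of trials, one dict per trial."""
    rng = random.Random(seed)
    trials = [
        {"label": label, "image_type": image_type}
        for label, counts in TRIALS_PER_LABEL.items()
        for image_type, n in counts.items()
        for _ in range(n)
    ]
    rng.shuffle(trials)
    return trials


trials = build_primed_trials(seed=1)
totals = {}
for trial in trials:
    totals[trial["image_type"]] = totals.get(trial["image_type"], 0) + 1
print(len(trials), totals)
```

The totals recover the twenty morally improper, twenty morally innocuous and forty pattern images reported for the session.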

Analysis and discussion: masked presentation session, subliminality
The first analysis that we performed for each session related to whether the images were processed subliminally. We used our previous method for assessing subliminality (Tsikandilakis, Bali, Derrfuss & Chapman, 2020a). We used the non-parametric sensitivity index A (e.g., $\mathrm{Sensitivity} = \frac{TP}{TP + FN} = 1 - FN$, where FN in the last term denotes the false-negative rate; for a comprehensive review see Krupinski, 2017) for the measurement of detection performance (Zhang & Mueller, 2005). This choice was based on advantages that A has compared to hit rates (Stanislaw & Todorov, 1999; p. 137-141) and the sensitivity indexes d' (Macmillan & Creelman, 2002; p. 45-57) and A'' (Pastore et al., 2003; p. 556-559).⁶ Along the same lines, the contemporary canon for subliminality is that participants should detect (Brooks et al., 2012) or recognize (Pessoa et al., 2005a, b) a presented cue at chance to report subliminal presentation (Tsikandilakis, Bali, Derrfuss & Chapman, 2019a; p. 6-8; Erdelyi, 2004; p. 74). Previous research has used a one-sample t-test methodology for inferring this criterion. According to this statistical approach the reported detection or recognition performance is compared to absolute chance (e.g., A = 0.5). In case of non-significant findings, the researchers claim that the reported detection or recognition performance was not significantly different to chance and, therefore, that this was evidence for unconscious processing. The problem with this approach is that a result that is not significantly different to chance, that is, a lack of evidence for the alternate hypothesis, is interpreted as evidence for the null (see Dienes, 2014). In the current section, we present results using Bayesian analysis.

Fig. 2 Trial sequences and tasks for stage two. In a and b, experimental sequences for the masked and primed masked sessions are given. In c, the engagement tasks for both sessions are given; order randomized. Experimental code can be found at https://osf.io/tgnbm/
Bayesian analysis can be used to define the lower and upper bounds for chance-level performance (e.g., Lower Bound A = 0.45 and Higher Bound A = 0.55) and to calculate a Bayes factor that would indicate, at B < 0.33, evidence for the null hypothesis, meaning that detection or recognition performance was within a-priori criteria for subliminality (see also Dienes, 2019). A Bayesian analysis with corrected degrees of freedom (Berry, 1996) was run using the Dienes calculator (Dienes, 2016). We defined subliminality as chance-level processing (A = 0.5; Erdelyi, 2004), with substantial evidence for the null hypothesis defined as a Bayes factor (B) below 1/3 (chance-level performance) and evidence for the alternate defined as a Bayes factor above 3 (different to chance-level performance). The intervals were conservatively defined at -0.05 (0.45; lower bound) and 0.05 (0.55; higher bound) with 0 (A = 0.5) representing chance-level performance. Detection performance using non-parametric receiver operating characteristics for the masked presentation session was overall above chance (M = 0.6445; SE = 0.0119; B = +∞; for scores per group and image type see Fig. 3). This result suggests that the images were not processed subliminally during this session (Carlson, Fee & Reinke, 2009; Lewis-Evans, De Waard, Jolij & Brookhuis, 2012).
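The logic of a Bayes factor with lower and upper bounds can be sketched as follows. This is an approximation under stated assumptions and not the Dienes calculator itself: it assumes a uniform distribution between the two bounds for the alternative, a point null at zero and a normal likelihood without the degrees-of-freedom correction. The function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution, stable in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bayes_factor_uniform(obs, se, lower, upper):
    """Likelihood of the observed difference under a uniform prior on
    [lower, upper] divided by its likelihood under a point null at 0."""
    like_h1 = (normal_cdf((upper - obs) / se)
               - normal_cdf((lower - obs) / se)) / (upper - lower)
    like_h0 = normal_pdf(obs / se) / se
    return like_h1 / like_h0


# Detection far above chance: A = 0.6445 is 0.1445 above 0.5, SE = 0.0119.
print(bayes_factor_uniform(0.1445, 0.0119, -0.05, 0.05) > 3)

# A small difference with a wide interval gives B close to 1/3.
print(round(bayes_factor_uniform(-0.08, 0.09, -0.5, 0.5), 2))
```

Under these assumptions a value below 1/3 is read as evidence for the null and a value above 3 as evidence for the alternate, in line with the thresholds used in this section.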

Analysis and discussion: primed masked presentation session-subliminality
Hit rate performance per image type and group was transformed to non-parametric sensitivity index A. A Bayesian analysis with corrected degrees of freedom was run using the Dienes calculator and the same parameters as the masked presentation session. Detection performance using non-parametric receiver operating characteristics for the primed masked presentation sessions was overall above chance (M = 0.6328; SE = 0.0136; B = +∞; for scores per image type and group see Fig. 3). This result suggests that the images were not processed subliminally during this session.

Analysis and discussion: masked presentation session-ratings
Descriptive statistics for ratings for the masked presentation session can be found in Table 4a. To explore whether there were differences in ratings between the two groups during the masked presentation session, we compared rating responses for emotionality, impropriety and signal detection performance between religious and non-religious individuals, and further explored significant results for responses for hits and misses for detection performance. A mixed model ANOVA with independent variables Group (religious vs non-religious) and Image Type (morally improper vs morally innocuous vs pattern images) was run with dependent variable Emotionality Ratings. The analysis revealed that there were no differences between the two groups (F (1, 31) = 0.47; p = 0.59; partial eta-squared = 0.015) and no interaction was found between the two groups and the image types (see Table 4). As predicted, the analysis revealed that religious individuals responded with higher ratings for moral impropriety compared to non-religious individuals (F (1, 31) = 6.39; p = 0.017; partial eta-squared = 0.17). We also found a significant effect of Image Type (F (1.52, 47.12) = 122.71; p < 0.001; Greenhouse-Geisser Corrected; partial eta-squared = 0.79; see Table 4b). As predicted, for ROC scores we found that religious individuals performed with higher perceptual sensitivity compared to non-religious individuals (F (1, 31) = 8.03; p = 0.008; partial eta-squared = 0.21). We also found a significant effect of Image Type (F (1, 31) = 4.36; p = 0.045; partial eta-squared = 0.12) and a significance trend for a Group to Image Type interaction (F (1, 31) = 3.59; p = 0.067; partial eta-squared = 0.1). Bonferroni-corrected comparisons can be found in Table 4. These results suggest that religious individuals were more sensitive to images showing moral impropriety.

⁶ Compared to hit rates, A is not susceptible to noise variance due to response strategies, such as conservative or liberal biases for signal detection (Tsikandilakis, Bali, Derrfuss & Chapman, 2019a). Compared to d', A is a non-parametric sensitivity index and does not involve any assumptions concerning the shape of the underlying distributions and their interactions (Swets, 2014; but see also Hajian-Tilaki et al., 1997). A can also provide a sensitivity index for zero values, such as zero hits or miss responses, and provides diagonal Euclidean corrections to the A' and A'' algorithms for scores that lie in the upper left quadrant of the ROC curve (see Robin et al., 2011).
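For readers unfamiliar with the index, the non-parametric A of Zhang and Mueller (2005) can be computed from a hit rate and a false-alarm rate as in the sketch below. The function name is ours, and the handling of below-chance performance by reflection (one minus the index with the two rates swapped) and of equal rates (returned as 0.5) are assumptions of this illustration.

```python
def a_index(hit_rate, fa_rate):
    """Non-parametric sensitivity index A (Zhang & Mueller, 2005).

    hit_rate : proportion of signal trials answered "signal"
    fa_rate  : proportion of noise trials answered "signal"
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # on the chance diagonal, including 0/0 and 1/1
    if h < f:
        return 1.0 - a_index(f, h)  # assumed reflection for below-chance scores
    base = 0.75 + (h - f) / 4.0
    if f <= 0.5 <= h:
        return base - f * (1.0 - h)
    if h < 0.5:
        return base - f / (4.0 * h)
    return base - (1.0 - h) / (4.0 * (1.0 - f))


print(a_index(0.5, 0.5))            # chance-level performance
print(a_index(1.0, 0.0))            # perfect detection
print(round(a_index(0.8, 0.2), 2))  # intermediate performance
```

The index stays defined when a participant produces zero hits or zero false alarms, which is one of the advantages noted in the text.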
To further explore the reported significant findings for whether they provided evidence for subliminal processing, a mixed model ANOVA was run with independent variables Group (religious vs non-religious) and Detection Performance (hits vs misses) with dependent variable Moral Impropriety ratings for morally improper images. The analysis revealed a significant effect of Group (F (1, 31) = 9.53; p = 0.004; partial eta-squared = 0.24), a significant effect of Detection Performance (F (1, 31) = 28.25; p < 0.001; partial eta-squared = 0.48) and a significant interaction (F (1, 31) = 12.69; p = 0.001; partial eta-squared = 0.29). Hits for religious individuals (M = 5.86, SD = 0.61) were higher than hits for non-religious individuals (M = 5.13, SD = 0.6; p < 0.001; d = 1.21; LB = -0.5; HB = 0.5; SE = 0.11; B = 1.92). Misses for religious individuals (M = 4.93, SD = 0.59) were not higher than misses for non-religious individuals (M = 5.01, SD = 0.55; p = 0.56; d = 0.14) and provided evidence for being proximate for ratings for impropriety (LB = -0.5; HB = 0.5; SE = 0.09; B = 0.33). Overall, these results suggest that religious individuals rated the morally improper images higher for moral impropriety compared to non-religious individuals, but that the reported differences in response to morally improper images between the two groups were not due to subliminal processing such as miss responses for the detection of an image.
For the ROC analysis we used specificity ($\mathrm{Specificity} = \frac{TN}{TN + FP} = 1 - FP$, where FP in the last term denotes the false-positive rate) because it is an unbiased metric of false positive responses, that is, of incorrectly responding that a target stimulus was presented (Zhang & Mueller, 2005). As predicted, religious individuals were lower for specificity (F (1, 31) = 7.43; p = 0.01; partial eta-squared = 0.19). A trend was reported for lower specificity to the morally improper label (F (2, 62) = 2.78; p = 0.07; partial eta-squared = 0.08). Significant differences were also revealed between image types (F (1, 31) = 16.86; p < 0.001; partial eta-squared = 0.35). A significant interaction was also found between different groups, labels, and image types (F (2, 62) = 17.36; p < 0.001; partial eta-squared = 0.36; see Table 6). These findings suggest that religious individuals responded with lower specificity when a morally improper label was presented and, therefore, that they were more susceptible to moral misperception following the presentation of a negatively valenced moral label compared to non-religious individuals (see Tables 5 and 6).
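As a worked illustration of the specificity metric, the sketch below computes sensitivity and specificity from trial counts. The counts in the example are invented for illustration only and are not data from the study.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of presented targets that were reported as targets."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-targets that were correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical participant: 40 non-target trials, 10 of them wrongly
# reported as targets, so the false-positive rate is 0.25.
spec = specificity(true_neg=30, false_pos=10)
print(spec, 1 - 10 / 40)
```

Lower specificity therefore corresponds directly to a higher rate of false positive responses, which is how the term misperception is used in this section.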
To further explore the reported significant findings for whether they provided evidence for subliminal processing, a mixed model ANOVA was run with independent variables Group (religious vs non-religious) and Detection Performance (hits vs misses) for the morally improper label for morally improper images with dependent variable Moral Impropriety ratings. As predicted, the analysis revealed a significant effect of Group (F (1, 31) = 151.17; p < 0.001; partial eta-squared = 0.83), a significant effect of Detection Performance (F (1, 31) = 5.42; p = 0.027; partial eta-squared = 0.15) and a significant interaction (F (1, 31) = 7.76; p = 0.009; partial eta-squared = 0.2). Hits for religious individuals (M = 6.09, SD = 0.59) were higher than hits for non-religious individuals (M = 5.72, SD = 0.32; p = 0.003; d = 0.78; LB = -0.5; HB = 0.5; SE = 0.08; B = 8,394.86). Misses for religious individuals (M = 5.03, SD = 0.29) were not higher than misses for non-religious individuals (M = 5.04, SD = 0.39; p = 0.89; d = 0.03) and provided very strong evidence for being proximate for ratings for impropriety (LB = -0.5; HB = 0.5; SE = 0.03; B = 0.08). Overall, these results suggest that religious individuals were more responsive to the moral label compared to non-religious individuals but that the reported differences for higher moral impropriety ratings between the two groups were not due to subliminal processing such as miss responses for the detection of an image. Although previous studies suggest that female participants could respond with higher sensitivity to masked emotional stimuli (Williams et al., 2005), we did not find any gender effects. We also did not find session order effects for rating responses for the masked presentation and the primed masked presentation sessions (Schwarz, 1999; see Supplementary Material 7.1).

Analysis and discussion: masked presentation session-physiology
Graphical illustrations for the descriptive statistics for physiological responses for the masked presentation session can be found in Fig. 4. To explore whether there were differences in physiological responses between the two groups during the masked presentation session, we compared SCR, heart rate responses and facial-emotional responses between religious and non-religious individuals, and further explored significant results for responses for hits and misses for detection performance. A mixed model ANOVA with independent variables Group (religious vs non-religious) and Image Type (morally improper vs morally innocuous vs non-moral) and dependent variable skin conductance responses was run. The analysis revealed a significance trend for higher SCR for religious individuals (F (1, 31) = 3.59; p = 0.067; partial eta-squared = 0.1), significant differences between different image types (F (2, 62) = 138.75; p < 0.001; partial eta-squared = 0.82) and a significant interaction (F (2, 62) = 5.69; p = 0.005; partial eta-squared = 0.16; see Table 7). A similar but more pronounced pattern of results was revealed for heart rate responses. Religious individuals experienced higher heart rate responses than non-religious individuals (F (1, 31) = 22.39; p < 0.001; partial eta-squared = 0.42) and we also found significant differences in heart rate responses between different image types (F (2, 62) = 156.17; p < 0.001; partial eta-squared = 0.83). A significant interaction was also found (F (2, 62) = 12.27; p < 0.001; partial eta-squared = 0.28; see Table 7).
As also reported in our previous publications, facial-emotional recognition was not as sensitive as SCR and HR responses to masked stimuli (Tsikandilakis, Bali, Derrfuss & Chapman, 2020a). Nevertheless, we found a significance trend for higher rates of expressions of disgust by religious individuals, possibly confirming the evolutionary and developmental role of disgust sensitivity in the appraisal of moral stimuli (Tybur et al., 2010; Fig. 4c). These findings suggest that religious individuals responded with higher physiological arousal and relatively more discernible facial-expressive characteristics to masked images showing moral impropriety compared to non-religious individuals (see Table 7).
To further explore the reported significant findings for whether they provided evidence for subliminal processing, a mixed model ANOVA was run with independent variables Group (religious vs non-religious) and Detection Performance (hits vs misses) for morally improper images with dependent variable Skin Conductance Responses. The analysis revealed a significant effect of Group (F (1, 31) = 457.26; p < 0.001; partial eta-squared = 0.94), a significant effect of Detection Performance (F (1, 31) = 10.02; p = 0.003; partial eta-squared = 0.24) and a significant interaction (F (1, 31) = 6.46; p = 0.016; partial eta-squared = 0.17). Hits for religious individuals (M = 0.0778, SD = 0.019) were higher than hits for non-religious individuals (M = 0.0643, SD = 0.017; p = 0.005; d = 0.74). Misses for religious individuals (M = 0.0198, SD = 0.006) were not significantly higher than misses for non-religious individuals (M = 0.0188, SD = 0.006; p = 0.517; d = 0.17) and provided some evidence for being proximate for SCR (LB = -0.005; HB = 0.005; SE = 0.001; B = 0.41). The same pattern of results was reported for heart rate responses. The analysis revealed a significant effect of Group (F (1, 31) = 450.91; p < 0.001; partial eta-squared = 0.94), a significant effect of Detection Performance (F (1, 31) = 10.16; p = 0.003; partial eta-squared = 0.25) and a significant interaction (F (1, 31) = 4.86; p = 0.035; partial eta-squared = 0.14). Hits for religious individuals (M = 3.21, SD = 0.58) were higher than hits for non-religious individuals (M = 2.73, SD = 0.47; p = 0.002; d = 0.91). Misses for religious individuals (M = 1.24, SD = 0.44) were not significantly higher than misses for non-religious individuals (M = 1.14, SD = 0.41; p = 0.367; d = 0.21) and were insensitive to the alternate and the null hypotheses for heart rate responses when these were defined within conservative higher and lower bounds (LB = -0.25; HB = 0.25; SE = 0.08; B = 0.85). These results suggest that when using assessments of involuntary and automatic responses, such as physiological measures, the differences found between the two groups in response to morally improper images were not due to subliminal processing such as miss responses for the detection of an image.
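The effect sizes reported for the hit comparisons can be reproduced from the group means and standard deviations. The sketch below assumes equal group sizes, so that the pooled standard deviation is the root mean square of the two group standard deviations; the function name is illustrative.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two groups of equal size, using the pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Heart-rate responses for hits: religious vs non-religious individuals.
print(round(cohens_d_equal_n(3.21, 0.58, 2.73, 0.47), 2))

# Skin-conductance responses for hits, in microsiemens.
print(round(cohens_d_equal_n(0.0778, 0.019, 0.0643, 0.017), 2))
```

Under the equal-n assumption the first comparison gives a value of about 0.91 and the second a value close to the reported 0.74, with the small difference attributable to rounding of the published means and standard deviations.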

Analysis and discussion: primed masked presentation session-physiology
We started the analysis in this session by analysing separately the effect of the moral label on the physiology of religious and non-religious individuals to explore whether religious individuals were more physiologically responsive to moral classification (Shariff, 2015). Graphical illustrations for the descriptive statistics for physiology for the primed masked presentation session specifically for the post-label interval can be found in Fig. 5. To explore whether there were differences in physiological responses between the two groups during the primed masked presentation session for the post-label interval, we compared SCR, heart-rate responses and facial-emotional responses between religious and non-religious individuals, and further explored significant results for responses for hits and misses for detection performance. A mixed model ANOVA was run with independent variables Group (religious vs non-religious) and Moral Label (morally improper vs morally innocuous vs non-moral) and dependent variable SCR for the post-label interval (see Fig. 2b). As predicted, the analysis revealed that religious individuals experienced higher skin conductance responses (F (1, 31) = 24.61; p < 0.001; partial eta-squared = 0.44). We also found significant differences between different labels (F (2, 62) = 353.07; p < 0.001; partial eta-squared = 0.92) and a significant interaction (F (2, 62) = 23.73; p < 0.001; partial eta-squared = 0.43). The same pattern of results was reported for heart rate responses. Religious individuals experienced higher heart rate responses compared to non-religious individuals (F (1, 31) = 126.64; p < 0.001; partial eta-squared = 0.8). We also found a significant effect of Moral Label (F (2, 62) = 327.85; p < 0.001; partial eta-squared = 0.91) and a significant interaction (F (2, 62) = 117.34; p < 0.001; partial eta-squared = 0.79).
These findings suggest that religious individuals experienced the post-label interval following a negative label preceding a brief masked image as more physiologically arousing than non-religious individuals (see Table 8). We did not find significant differences for facial-emotional recognition between different groups (F (1, 31) = 0.493; p = 0.493; partial eta-squared = 0.016) and we did not find a significant effect for an interaction between Group and Moral Label (F (2, 62) = 0.326; p = 0.723; partial eta-squared = 0.01). Graphical illustrations for the descriptive statistics for physiology for the primed masked presentation session specifically for the post-image interval⁸ can be found in Fig. 6. To explore whether there were differences in physiological responses between the two groups during the primed masked presentation session for the post-image interval, we compared SCR, heart rate responses and facial-emotional responses between religious and non-religious individuals, and further explored significant results for responses for hits and misses for detection performance. We implemented an analysis of variance model for this session that involved all the available relevant variables to explore the effects of Group, Moral Label, Image Type and their interactions in the experience of post-masked-image physiological arousal (Moll et al., 2003; Graham & Haidt, 2010; Shariff, 2015; Ellemers et al., 2019).

⁸ The post-image interval physiological responses were calculated as the maximum deferral (highest peak in physiological responses) up to 3 s (SCR) or 5 s (heart rate responses and facial-emotional recognition) post-stimulus offset in respect to a tonic baseline averaged across a period (dT) of one second before each stimulus onset, as measured from the presentation of the first fixation cross in each experimental trial sequence (see Fig. 2b).
To further explore the reported significant findings for whether they provided evidence for subliminal processing, a mixed model ANOVA was run with independent variables Group (religious vs non-religious) and Detection Performance (hits vs misses) for morally improper images for the morally improper label with dependent variable Skin Conductance Responses. The analysis revealed a  . Misses for religious individuals (M = 3.33, SD = 0.35) were not significantly higher than misses for non-religious individuals (M = 3.28, SD = 0.37; p = 0.556; d = 0.14) and provided evidence for being proximate for heart rate responses (LB = -0.25; HB = 0.25; SE = 0.07; B = 0.35). These results suggest that the physiological differences found between the two groups in response to morally improper images when preceded by a morally improper label were not due to subliminal processing such as miss responses for the detection of an image. Although previous studies suggest that female participants respond with higher sensitivity to masked emotional stimuli, we did not report any gender effects. We also did not report session order effects for physiological responses for the masked presentation and the primed masked presentations sessions (see Supplementary Material 7.1).

Summary of Findings
In the current manuscript, we presented a series of experiments to explore a socially significant and challenging hypothesis. We explored whether there were differences in moral processing between religious and non-religious individuals. We used rigorous participant selection criteria and tested a healthy and systematically statistically powered population sample of religious and non-religious individuals. These participants differed in all stages and across all implemented assessments only in respect to religiosity. The rigorous selection criteria for the current population sample allowed us to provide the first to our knowledge instance of Bayesian sampling (Dienes, 2019).
In the current context, this signifies that we were able to report evidence across all stages and assessments for the comparison groups having baseline and proximate characteristics for possible confounding variables, such as fundamentalism, political conservatism and orientation, and morality-related life experiences (see Tables 1 and 3). Based on this rigorously controlled population sample of religious and non-religious individuals, we presented a cornucopia of questionnaire outcomes, behavioural and physiological outcomes that can inform further research and research hypotheses, and meta-analytic research in this area. The length and detail of our report relates to the novelty of implementing image masking for showing moral impropriety in the current area. It also relates to the reliability of the current population sample. The analyses presented an unprecedented opportunity to formally illustrate the differences in a novel design as regards the moral processing between religious and non-religious individuals that, moreover, differed significantly only in respect to the intended key independent variable we were exploring: religiosity.
Along these lines, we were able to report several possibly formative findings. Religious participants were more perceptually and physiologically sensitive to masked images relating to moral impropriety. They were also more susceptible to priming and negative moral misperception of masked images as morally improper (see Tables 7 and 8). These reported differences between religious and non-religious individuals were not due to subliminal processing. Our analysis suggested that these effects involved conscious awareness, such as higher than chance-level ROC detection performance, the report of statistical differences between religious and non-religious individuals only for hits for detection of an image and Bayesian evidence for null responses to moral impropriety for miss responses for detection performance (see Fig. 3; see also Analysis and discussion: stage two, physiology).
The current findings suggest that religious individuals have higher sensitivity but also lower specificity when it comes to the perception and experience of moral impropriety. Nevertheless, the current findings should not be taken to suggest that non-religious individuals are either insensitive to moral impropriety or immune to moral labelling and misclassification. Instead, our findings suggest that perceptual and physiological sensitivity to moral impropriety are universal human phenomena and could occur irrespective of religiosity (see Figs. 5 and 6). They also suggest that religious individuals have higher sensitivity and lower specificity for the perception of immoral elicitors.

General discussion
The first acknowledgement that we should emphasize as part of our general discussion is that the current findings did not support that images were subliminally processed. This conclusion is based on Bayesian analysis of non-parametric sensitivity index metrics and further hit versus miss response analyses (see Tsikandilakis, Bali, Haralabopoulos, Derrfuss & Chapman, 2020b; pp. 10-17). This finding is important because in the current study we used the same parameters for backward masked presentation as previous research (see van der Ploeg et al., 2017). We applied rigorous analytical methods and provided thorough evidence counter to previous research as regards subliminality (see Brooks et al., 2012). Our analyses suggested that religious and, in fact, non-religious individuals responded with above chance-level detection and discrimination performance to masked immoral images when using Bayesian analysis (Dienes, 2016) of signal detection theory metrics (Zhang & Mueller, 2005). Both groups also responded with physiological arousal and correct appraisal responses to immoral images only when these images were consciously recognized as having been presented in a post-trial detection assessment task (see also Pessoa, Japee, Sturman & Ungerleider, 2005a, b).
This casts some doubt on the extent to which the Baumard and Boyer (2013) theory for the evolution of archaic moral intuition to the unconscious appraisal of immorality is valid, at least regarding the possibility that the processing of immorality is and remains unconscious. Instead, our findings for higher conscious perceptual and physiological sensitivity to immoral elicitors for religious compared to non-religious individuals seemingly provided more support for the Evolved Hazard-Perception model, which, it must be noted, was also proposed by Pascal Boyer (Boyer & Lienard, 2006). Our findings suggested that religious individuals are more sensitive to consciously perceiving immoral elicitors even when these are presented for very brief durations, such as 33.33 ms, and masked with an overt stimulus. This effect occurred possibly because these elicitors have evolutionary detection and avoidance value for the maintenance of spiritual and behavioural purity (Preston & Ritter, 2012). Our findings relating to significance trends for higher rates of expressions of disgust by religious individuals in response to non-labelled masked immoral elicitors also provided support for the latter model (Ritter & Preston, 2011).
If we were to explore the current outcomes in additional depth in relation to explaining, from an evolutionary perspective, the processing of immorality by religious individuals, we would have to confront a previously unaddressed hurdle (Shariff, 2015). The proposition put forth as a primary consideration in the current area is that religious individuals have retained an archaic societal, intuitional and possibly subliminal system for processing immorality, either phylogenetically, via genetic processes, or ontogenetically, via, for example, participation in religious rituals, or via both (Baumard & Boyer, 2013). The evolution and evolutionary value of this purportedly subliminal processing system should have been interrupted during the emergence of religious proportionality (Baumard, Hyafil, Morris & Boyer, 2015). The emergence of morally interactive religions (Shariff, 2015), such as religions related to supernaturally invigilated moral self-regulation that deviate from the archaic ritualistic appeasement of emotionally anthropomorphic, autocratic and self-interested deities (Westh, 2009), should have prompted the emergence of conscious self-reflection on what is "prescriptively immoral" (Dijksterhuis, Preston, Wegner & Aarts, 2008). This should serve towards the achievement of a harmonious supervisory relationship with the prescribing transcendental entity (Boyer & Bergstrom, 2008). In simpler terms, the evolutionary shift to proportionality, as a process of moral cohabitation between the believer and the rules of the involved transcendental entity, should have also shifted the evolutionary advantage of religious cognition from unconscious intuition to conscious and self-reflective inhibitory cognitive and behavioural processing mechanisms (see for example, Whitehead & Whitehead, 2001).
This should not be misinterpreted to suggest either that religious moral intuitions are obsolete from an evolutionary perspective or that religious individuals cannot, or do not, experience moral intuitions (Gervais, Willard, Norenzayan & Henrich, 2011). Instead, it suggests that even if religious individuals do experience automatic and involuntary moral appraisals and physiological arousal in response to immorality, these can be consciously apprehended (Pessoa, Japee, Sturman & Ungerleider, 2005a, b). That means that there is possibly more evolutionary value in apprehending responses to immorality than in retaining the neural workings of a possible archaic system for the unconscious processing of immoral elicitors (Baumard & Boyer, 2013). It also signifies that even if the evaluation of immorality occurs due to pre-conscious mechanisms (see Dehaene, Changeux, Naccache, Sackur & Sergent, 2006), the outcomes of this processing, such as the physiological correlates, are, at least at this point in religious evolution (Shariff, 2015), accessible by conscious introspection and meta-awareness (see Bargh & Morsella, 2008).
This suggestion is in line with our own theory of Physiological Meta-Cognition (Tsikandilakis, Bali, Derrfuss & Chapman, 2019a; Tsikandilakis, Bali, Derrfuss & Chapman, 2020a; Tsikandilakis et al., 2021a). The latter suggests that when an experience reaches or exceeds a certain level of physiological arousal, conscious awareness is recruited for the perception of the physiological experience and/or the meta-cognitive processing of the eliciting stimulus (see also Pessoa, Japee, Sturman & Ungerleider, 2005a, b). An intense physiological response that does not automatically and involuntarily elicit the involvement of meta-awareness and meta-cognition, and does not alert both limbic and higher executive structures to the potential presence of endangering or sociobiologically relevant environmental cues, would be an evolutionary disadvantage. It would contradict the evolutionary utility of a module for processing both inferred threat, such as immorality, and direct threat, such as predatory danger (see Tsikandilakis, Bali, Derrfuss & Chapman, 2019a). Therefore, the current findings are not unexpected from an evolutionary perspective. Arguably, reporting subliminal effects that would suggest the absence of meta-cognition and/or meta-awareness in response to physiologically stimulating immoral elicitors (Brooks et al., 2012) would be conceptually more challenging as regards the workings of our inferred threat-detection evolutionary processes (see for example, Pessoa & Adolphs, 2010). It should be emphasized that these findings mean that we could not report significant effects for differences between different stimuli without meta-awareness (Bachmann & Francis, 2013), such as reporting an accurate engagement task response or physiological effect in the absence of one's ability to correctly recall that a stimulus was presented during the trial in a post-trial engagement task (Dehaene, Lau & Kouider, 2017). According to this definition, the presented images were not processed subliminally (see Dehaene, Changeux, Naccache, Sackur & Sergent, 2006).
The formative contribution of the current study is that, at least at this stage in religious evolution, religious individuals display higher conscious perceptual and physiological sensitivity to immorality. They can also display lower specificity for the discrimination of immorality. This suggests that religious individuals can respond with higher error rates for seeing an immoral image when an immoral image is not presented. This was the result of the religious participants being primed with an overt prescriptive immoral label. If we were to conceptualize our experimental outcomes in a simple message, we could propose a higher sensitivity-lower specificity hypothesis for the processing of immorality as regards religious individuals. The suggestion that could be made, based on the current outcomes, is that religious individuals are more susceptible to the perception and misperception of immoral events. Religious individuals are also less specific as to whether an immoral event occurred when primed with a prior negative description. In more colloquial terms, our study can be interpreted to have provided empirical evidence for the occurrence of a perceptual misclassification and emotional experience that is described proverbially, and with variations in its phrasing, in most religions related to proportionality (see Beal, 2002); namely, "Speak of the devil…and he shall appear".

Limitations
The current research took place in Britain and could reflect cultural elements associated with the religious and non-religious individuals that took part in the current studies. The religious individuals who took part in the current studies were selected based on the religious demographic availability in Britain. Catholic and Protestant Christians, Muslims and Sikhs formed the religious population sample. Sampling issues were controlled as well as we possibly could in the current study, for example by assessing and showing Bayesian evidence for proximity between the two groups for confounding variables, such as fundamentalism, political conservatism and morality-related life experiences. Nevertheless, the careful reader will notice that in the SECS assessment in both stages we reported a trend for Bayesian proximity between the two groups. This was due to the inclusion of an abortion-related item in the SECS questionnaire to which collectively all religious individuals, and specifically more so Catholic individuals, responded with highly negative scores (see Supplementary Material 4.1). In the current manuscript the arduous length and the challenging experimental requirements of testing for moral impropriety, as thoroughly described throughout the Methods and Results sections, prevented us from proceeding to further testing of additional variables. An important consideration, therefore, is whether the current findings will replicate for moral masked images and positive moral labels (Shariff, 2015). Finally, it must be noted that the current research was completed over a period of two years (see also Quing Leong, Yu, Tsikandilakis & Eddie, 2021; in press). This was due to difficulties with recruiting religious and non-religious individuals without creating confounds that could result in effects due to economic status, education, political orientation, age, life experiences and so on. The current authors would like to share cordially, and with collegial solidarity, that replicating the current or a similar design could require dedication and perseverance for the completion of the intended research.

Conclusions
In the current studies, we explored whether religious and non-religious individuals differ as regards the processing of backward masked images showing moral impropriety when these were and were not preceded by a pre-trial moral label. We could not find any evidence for subliminal or unconscious processing as theorized by several models relating to evolutionary moral psychology and previous experimental research that used masking in this area. We found that religious individuals appraised more non-labelled masked images as immoral. Religious individuals also had higher rates of misclassification of backward masked images as immoral when a negative moral label preceded the presentation.