The effect of input modes and number of exposures on the learning of L2 binomials


Despite the importance of mastering different types of formulaic sequences in a second language, little is known about the relative effect of different input modes on their acquisition. This study explores the learning of a particular type of formulaic language (binomials) in three input modes (reading-only, listening-only, and reading-while-listening) at different frequencies of exposure (2, 4, 5 and 6 occurrences). Arabic learners of English were presented with three stories, each in a different mode, that contained novel binomials (e.g., wires and pipes) and existing binomials (e.g., brother and sister). Two post-tests (multiple-choice and familiarity ratings) assessed learners’ knowledge of the binomials. Results showed that reading-only and reading-while-listening led to better performance on the tasks than listening-only. Frequency of exposure had an effect on the perceived familiarity of binomials.


Introduction
Multi-word sequences (MWSs), also referred to as 'formulaic language' (Wray, 2002), are recurring patterns consisting of multiple words. MWSs are generally identified and/or defined based on how frequently the combination of words occurs in corpora and whether the co-occurrence of words is random or not (e.g., Biber, Johansson, Leech, Conrad & Finegan, 1999). MWSs are ubiquitous in the English language and are therefore a major component of language competence. Although estimates have varied, research has indicated that formulaic sequences are widespread in both spoken (e.g., Biber et al., 1999) and written discourse (Erman & Warren, 2000) and that they serve a crucial function in language use, especially when their pragmatic value is considered (e.g., Conklin & Schmitt, 2008). The importance of acquiring formulaic sequences for achieving high levels of competence has led to a surge in research exploring the conditions that facilitate their acquisition. Many studies have shown that formulaic sequences can be learnt from explicit, focused instruction (e.g., Laufer & Girsai, 2008; Webb & Kagimoto, 2009). However, the large amount of formulaic language that learners need to acquire, together with the often-limited classroom time, points to the need to explore methods that could be exploited both inside and outside the classroom (Pellicer-Sánchez, 2017). Recent studies have suggested that formulaic sequences can be learnt incidentally from different input modes: reading (Pellicer-Sánchez, 2017; Vilkaitė, 2017), reading-while-listening (Webb, Newton, & Chang, 2013; Webb & Chang, 2020), listening (Webb & Chang, 2020), and viewing (Puimège & Peters, 2019). Importantly, with the exception of Peters (2019, 2020), who looked at a range of formulaic sequences, the majority of these studies have focused on the acquisition of collocations. The effectiveness of different input modes on the incidental acquisition of other types of formulaic sequences needs to be explored more fully.
Binomials are one type of formulaic language, and although many binomials are more frequent than idioms, they have been studied far less (Siyanova-Chanturia, Conklin, & Van Heuven, 2011). Binomials consist of two lexical items of the same lexical class joined by a coordinating conjunction and generally have a conventional word order that may be determined by semantic, orthographical, social, phonological and frequency-based factors (e.g., male before female as in men and women; Benor & Levy, 2006). They can be reversible, that is, the words can change position (e.g., salt and pepper) or irreversible (i.e., the meaning is anomalous when reversed, as in hit and run; Malkiel, 1959). Binomials vary on a continuum from transparent to opaque; the meaning of transparent binomials is directly derived from the two content words (e.g., tall and short), while the meaning of opaque binomials is not evident from the content words (e.g., by and large; Farghal & Jabber, 1995).
For transparent binomials, when the individual words are already known to learners, developing the form-meaning connection should not pose a challenge. What might be more difficult is learning the conventional or preferred word order (it is salt and pepper not pepper and salt). The present study examined whether L2 learners could incidentally acquire new (transparent) binomials in three different input modes (reading-only, listening-only and reading-while-listening) and compared performance on novel binomials to existing binomials.

Incidental Learning of Individual Words and Formulaic Sequences
In language acquisition, learning new words without the intention of doing so is referred to as incidental vocabulary acquisition (van Zeeland & Schmitt, 2013). It occurs when vocabulary learning is the 'by-product' of another activity like reading (Huckin & Coady, 1999).
A few studies have looked at the incidental learning of formulaic sequences from reading. Szudarski (2012) compared EFL learners' acquisition of verb-noun collocations in two conditions: reading-plus treatment and reading-only. Participants in the reading-plus treatment group read stories that contained target collocations and completed explicit exercises focused on collocational patterns, while those in the reading-only group were simply asked to read the same stories. The results showed that the collocational knowledge of those in the reading-plus treatment group was significantly better than that of learners in the reading-only group and that this was true on both the productive and the receptive tests. In contrast, there was no significant difference between the reading-only group and the no-treatment control group, which led the author to conclude that reading-only did not contribute much to the learners' knowledge. Similarly, Szudarski and Carter (2014) investigated L2 learners' acquisition of infrequent verb-noun (e.g., take a swipe) and adjective-noun (e.g., quick retort) collocations in two conditions: reading-only and reading with target words underlined. They found that reading a story with underlined collocations led to significant gains in form recall and recognition, but that reading-only did not.
Pellicer-Sánchez (2017) examined the incidental acquisition of collocational knowledge when learners encountered adjective-pseudoword collocations (e.g., dangerous bancel for dangerous criminal) while reading. She found that exposure to the input led to collocational gains in terms of form recall and form recognition at a rate similar to that of learning the form and meaning of a single word. Vilkaitė (2017) looked at the incidental acquisition of both adjacent verb-noun collocations (i.e., the components directly follow each other, as in to spend time) and nonadjacent verb-noun collocations (i.e., intervening words between the two components, as in to spend a lot of time) from reading. The results showed that learners had equivalent gains for adjacent and nonadjacent collocations at the recognition level, while the learning gains were negligible at the recall level for both types.
Despite aural input playing a major role in overall language development (Vandergrift, 1999), research on learning from L2 listening is relatively scarce and limited to the learning of single words. Vidal (2003) found that listening to lectures containing target words resulted in vocabulary learning and that learners with a larger vocabulary size gained greater vocabulary knowledge. Van Zeeland and Schmitt (2013) showed that listening to passages containing target items that were repeated 3, 7 and 11 times resulted in gains in different aspects of vocabulary knowledge, with very small gains on a meaning-recall test compared to those on form- and grammar-recognition tests.
Due to the ephemeral nature of listening, it might be difficult for L2 learners to recognize words in aural input and learn new lexical items from it. It is, however, thought that exposure to written plus spoken input may help low-proficiency learners to develop their 'auditory discrimination skills' (Vandergrift, 2007). Reading-while-listening allows learners to follow the written words as they listen to the pronunciation. Conklin, Alotaibi, Vilkaitė, and Pellicer-Sánchez (2020) used eye movement measures to explore how reading patterns were impacted by the simultaneous presentation of auditory input. They examined whether fixations on words in a reading-while-listening mode were aligned with the audio, that is, whether readers fixated on the word tea when they heard it. The overall results showed that L2 speakers were ahead about 60.7% of the time, behind 7.1% of the time, and aligned with the audio only 21.4% of the time. Thus, reading was mostly unaligned with the audio. It is unclear how the auditory and written input are integrated when reading and listening are not aligned and, therefore, how reading-while-listening might benefit learners.
The authors speculated that exposure to written text provides listeners with visual cues for the upcoming spoken word, hence helping to (1) speed word identification, (2) segment individual words from the continuous speech stream, and (3) match the spoken form with the written form.
Research with single words has shown that reading-while-listening is an effective source of L2 vocabulary growth. Chang (2009) showed that reading-while-listening to stories resulted in better performance than listening-only in terms of vocabulary learning and comprehension. Reading-while-listening has also been shown to be effective for the acquisition of formulaic sequences. Webb et al. (2013) investigated the incidental acquisition of verb-noun collocations (e.g., blow your nose) when learners simultaneously read and listened to a modified graded reader. Overall, the results confirmed that reading-while-listening led to increased collocational knowledge at the level of form recall and recognition.
Some studies have compared the effectiveness of different sources of input. Brown, Waring, and Donkaewbua (2008) explored incidental vocabulary acquisition of single words in three input modes (reading-only, listening-only and reading-while-listening) and the effect of number of exposures. Results showed that new words were learned in all three input modes, but the most learning occurred from reading-while-listening and the least in listening-only. Reading-while-listening and reading-only led to similar gains. Vidal (2011) compared incidental learning from academic reading and listening and found that both immediate and delayed vocabulary gains were higher for reading than listening, especially for low-proficiency students. Further, Malone (2018) found that a reading-while-listening group had an advantage over a reading-only group in a form-recognition task. In a recent study, however, Vu and Peters (2020) did not find a difference in single word gains between reading-while-listening and reading-only. Similarly, Feng and Webb (2020) compared the acquisition of single words from reading, listening and viewing and found similar gains in the three modes.
To the authors' knowledge, only one study has compared the effectiveness of different input modes on the acquisition of formulaic sequences. Webb and Chang (2020) compared the effect of reading, listening, and reading-while-listening with EFL students using graded readers. Findings showed that all three input modes contributed to the incidental learning of L2 multi-word combinations (e.g., unpack bag), with the reading-while-listening condition resulting in the highest immediate post-test gains, followed by both the listening-only and the reading-only conditions. Unlike the findings of Brown et al. (2008) and Vidal (2011), these results showed that listening-only had an effect on learning that was similar to that of reading-only, suggesting that listening may play a more important role in the learning of formulaic sequences than for single words. This advantage is interpreted by the authors as pointing to the important role of intonation, which may have contributed to acquiring the prosodic form of the sequences. As Lin (2018) argues, the prosodic cues in spoken input can (1) guide the segmentation of input, hence promoting the acquisition of L2 formulaic sequences, and (2) potentially facilitate how they are processed and stored in the brain.
Overall, research suggests that successful incidental learning of new lexical items can take place from all three input modes, but that reading-while-listening seems to have an advantage over the other modes. However, there is contradictory evidence about the benefit of reading-while-listening relative to reading-only, and of reading-only relative to listening-only. Webb and Chang (2020) argue that these differences may be due to the input modes having differing effects on the incidental learning of single versus multi-word units.
However, there is very limited empirical research to support such a claim. Crucially, we need evidence about a much wider range of MWSs.
As a final point, it is important to note that in the literature on incidental vocabulary learning, most studies have employed a battery of tests to measure learners' knowledge of new words at different levels of mastery. For example, in Brown et al.'s (2008) study, the three learning conditions (reading-only, listening-only, and reading-while-listening) were each followed by a test. It might be argued that after the first test, participants would expect a test after the other input conditions. This makes it difficult to draw concrete conclusions about incidental learning. Indeed, this is a common problem in incidental learning studies: once assessment has been carried out, any subsequent learning may no longer be incidental as participants might try to learn (i.e., intentionally learn) in case there are further tests. While keeping this caveat in mind, in line with much of the literature, learning in the current study is referred to as incidental, as items were not explicitly taught nor were participants told that they would be tested.

The Effect of Number of Exposures on Incidental Vocabulary Learning
It is widely acknowledged that new lexical items are learned gradually and incrementally through repeated encounters in context. Research on single words has demonstrated a positive effect of number of exposures on learners' incidental gains from reading (e.g., Brown et al., 2008; Chen & Truscott, 2010; Vidal, 2011; Waring & Takaki, 2003; Webb, 2007; Zahar, Cobb, & Spada, 2001). However, in the listening modality, studies have demonstrated either no or marginal effects for one or repeated exposures (e.g., Brown et al., 2008; van Zeeland & Schmitt, 2013; Vidal, 2011). More recently, Feng and Webb (2020) found that frequency of exposure was not related to incidental vocabulary gains in any of the conditions they examined (i.e., reading, listening, and viewing), calling for more research in the area.
Very little evidence exists about the effect of frequency of exposure on the acquisition of formulaic language, and it is somewhat contradictory. Pellicer-Sánchez (2017) found that differences in the number of exposures to collocations (4 and 8 times) in reading had no significant effect on their acquisition. This could be due to the narrow range in frequencies of the target items. In Szudarski and Carter's (2014) study, there were greater gains in terms of form recognition of collocations with 6 encounters and form recall with 12 encounters.
Similar to single-word research, this suggests that different levels of exposure are needed for form recognition and recall of MWSs.
In reading-while-listening, Webb et al. (2013) found that repeated exposure (1, 5, 10 and 15 times) had a positive effect on the learning of collocations at both the receptive and productive levels. In particular, 5 encounters to target items led to significant learning and 15 encounters contributed to significantly greater gains than the other frequencies. Similarly, Webb and Chang (2020) found a significant positive correlation (r = .61) between number of exposures (between 1 and 16) and vocabulary gains for collocations. Importantly, this was modulated by input mode, with a significant positive correlation between number of exposures and vocabulary gains in reading-while-listening but only moderate and nonsignificant correlations in reading-only and listening-only.
A recent meta-analysis by Uchihara, Webb, and Yanagisawa (2019) provides evidence for the complex relationship between number of exposures and learning. Uchihara and colleagues examined studies reporting the correlation coefficient between number of exposures and single word learning. Their goal was to explore the overall relationship between frequency of encounters and incidental vocabulary learning, as well as whether and to what extent other variables might account for the variability in repetition effects. Results showed that there was a medium effect (r = .34) of repetition on incidental vocabulary learning. There was significant variation in the size of frequency effects across the studies, which was explained by a range of variables: learner variables (e.g., age, vocabulary knowledge); methodological differences (e.g., nonword use, whether or not participants were informed of an upcoming test, test format); and treatment (spacing, mode of input, visual aids, engagement, range in number of exposures). Particularly relevant for the present investigation is the finding of the association between input mode and repetition. Although there were no significant differences between four input modes (reading-only, listening-only, reading-while-listening, and viewing), repetition appeared more beneficial in reading-only (r = .41) and listening-only (r = .39) than in reading-while-listening (r = .28) and viewing (r = .22). The authors concluded that, although number of exposures is an important predictor of incidental vocabulary learning, it is one of many factors affecting vocabulary learning in meaning-focused input. This provides a rationale for the current study, which explores the impact of and relationship between input mode and number of exposures in the incidental learning of MWSs.
As pointed out by Uchihara et al. (2019), the seemingly conflicting findings across studies could be explained by methodological differences: for instance, the aspects of knowledge measured, test formats, and the length of time between encountering the items and the testing phase. More research is needed to gain a better understanding of the role of number of exposures and how it interacts with factors such as input mode. Importantly, the focus of previous MWS research has been on the acquisition of collocations. While this is a frequent and important type of MWS, the multifarious nature of formulaic sequences means that we cannot necessarily generalise findings from one type of MWS to others. A notable exception to the focus on collocations is the research by Peters (2019, 2020) that looked at the acquisition of different types of formulaic sequences (e.g., compounds, idioms, phrasal verbs, collocations, similes, binomials) from watching English language television. Further, an eye-tracking study on non-idiomatic reversible binomials by Siyanova-Chanturia et al. (2011) found that binomials in their more frequent 'forward' form (e.g., bride and groom) were read more quickly than their less frequent 'reversed' form (e.g., groom and bride) by L1 speakers and more proficient L2 speakers.
Studies have mainly looked at gains in learners' knowledge of form and meaning (at receptive and productive levels of mastery) using multiple-choice and translation tests. The use of other measures would allow us to explore gains at potentially earlier and less explicit stages of learning. For example, familiarity has been shown to be related to the comprehension of formulaic language (e.g., Aljabri, 2013; Nippold & Taylor, 2002) and is therefore an important aspect to examine when investigating the learning of new formulaic sequences. While a number of psycholinguistic studies have used familiarity ratings as a criterion for item selection (e.g., Carrol & Conklin, 2019) and to provide evidence supporting the role of word familiarity in word/phrase recognition (e.g., Isobe, 2011), to the best of our knowledge, no research on vocabulary acquisition has used familiarity ratings to assess whether target items were familiar after exposure.
To address the gaps in the literature, the present study examined the effect of three input modalities (reading-only, listening-only, and reading-while-listening) on the incidental acquisition of an under-researched type of MWS, binomials. The study focused on transparent, compositional binomials where the constituents are known to the learners and the lexical aspect to be learnt is the association between the two components and the order in which those components appear (e.g., fish and chips, not chips and fish). The study aimed to identify whether L2 learners can acquire knowledge of the correct order of the binomials incidentally from various input modes. The role of number of exposures was also examined.
In order to assess knowledge of binomials, different measures were used: a multiple-choice form recognition task and a familiarity rating task. The following research questions were addressed:
1. Do L2 learners acquire new binomials (e.g., wires and pipes) from meaning-focused input?
2. Do the different input modes (reading-only, listening-only or reading-while-listening) lead to different performance on post-tests?

Participants
Thirty-nine female students studying at a university in Saudi Arabia took part in this study (BA level = 34; MA level = 5). For practical reasons, participants were recruited from two classes: 19 participants were majoring in English and 20 were non-English majors learning English at the English Language Institute. Participants were asked to complete a language background questionnaire in which they self-rated their proficiency in speaking, listening, reading and writing, as well as how often they used English in various contexts: speaking (with family, friends etc.), writing (to family and friends etc.), reading (things in English for academic purposes, work, etc.) and watching/hearing/listening (to TV, radio in English etc.).
This helped establish that the participants' exposure to English was similar. The purpose of the self-rating measure was to ensure that none of the participants had difficulties with reading and listening (i.e., ratings below 4), which might influence their ability to understand the written/spoken texts. Participants were also asked to complete a shortened version of the VST (modified from Nation & Beglar, 2007), which will be referred to as a vocabulary knowledge test (VKT). The VKT consisted of 20 items, with two words randomly selected from each of the first 10 bands. A maximum score of 20 could be achieved, which would roughly correspond to a vocabulary size of 10,000 words (Carrol & Conklin, 2014; Carrol, Conklin, & Gyllstad, 2016). This vocabulary measure simply provided an estimate of participants' vocabulary knowledge and was included in the analysis. Data from five participants, whose shortened VKT indicated a vocabulary of less than 4000 word families, were discarded. Finally, data from two participants were not included in the analysis because they did not complete the experimental procedure. This resulted in 32 participants being included in the analyses (see Table 1 for a summary of their characteristics).
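The relation between the shortened VKT score and the estimated vocabulary size can be illustrated with a minimal sketch. It assumes a simple linear scaling in which a score of 20 maps onto roughly 10,000 word families; the function names and the strictly linear mapping are illustrative assumptions, not a description of the scoring actually applied in the study.

```python
# Illustrative sketch only: the linear scaling and all names are assumptions.
MAX_SCORE = 20       # two items from each of the first 10 frequency bands
MAX_VOCAB = 10_000   # word families roughly represented by a perfect score
CUTOFF = 4_000       # data below this estimated size were discarded


def estimated_vocab_size(score: int) -> float:
    """Scale a shortened-VKT score (0-20) to an approximate vocabulary size."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score must be between 0 and 20")
    return score * MAX_VOCAB / MAX_SCORE


def is_excluded(score: int) -> bool:
    """True if the estimated vocabulary falls below the 4000-family cutoff."""
    return estimated_vocab_size(score) < CUTOFF


print(estimated_vocab_size(20))  # 10000.0
print(is_excluded(7))            # True (estimated 3500 word families)
```

Under this assumed scaling, each correct item stands for about 500 word families, so the 4000-family cutoff would correspond to a score below 8.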
Table 1. Summary of participants' mean age, years studying English, self-rating on a 7-point scale (1 = very low to 7 = native-like) of speaking, auditory comprehension, reading and writing, use of English language in everyday life (0% English to 100% English), as well as their score on the shortened version of the VKT (maximum score of 20) with standard deviations in parentheses.


Binomials and Passages
The stimuli consisted of a set of novel and existing binomials. By using a set of novel binomials (e.g., wires and pipes), we could examine learning without having to pre-test knowledge. The inclusion of existing binomials allowed us to determine whether repeated exposures influence performance for novel binomials, such that they begin to behave like existing binomials. The nine existing and 24 novel (invented) binomials were taken from a study by Conklin and Carrol (2020). Table 2 summarizes the characteristics of the stimuli.
1 The novel items did not violate the word order constraints considered in Morgan and Levy (2016).
2 The Zipf scale goes from 1 to 6 or 7 and can be interpreted as follows: values of 3 and less are low-frequency words, and values of 4 or more are high-frequency words (Van Heuven, Mandera, Keuleers & Brysbaert, 2014). The calculation of Zipf values is straightforward, as it equals log10 (frequency per billion words) or log10 (frequency per million words) + 3. As such, words with a frequency of 1 per 100 million words get a Zipf value of 1, words with a frequency of 1 per 10 million words get a Zipf value of 2, and words with a frequency of 1 per million words get a Zipf value of 3, and so on.

The three passages that were about daily life were taken from a study by Conklin and Carrol (2020) on adult L1 speaker acquisition and adapted such that each contained three existing binomials and eight novel ones (see Appendix 2 for a sample text). The three existing binomials appeared twice in a text and the eight novel ones occurred at different frequency levels (two at each frequency level: 2, 4, 5 and 6). All of the words that made up the texts were high frequency words (belonging to the first 3000 most frequent word families in English), and therefore were likely to be known by the participants. No comprehension difficulties were reported by any of the participants, either during or after the treatment.
The mean length of the passages/stories was 1443 words (Max = 1494, Min = 1402).
The stories were divided into 8-9 paragraphs, with a mean length of 167 words per paragraph.
Each passage was presented in one of the three modes in a counterbalanced design (see Table 3 below). For listening, the passages were recorded by an L1 speaker of British English at a  -Eisler, 1961), and were all around six minutes long.
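The Zipf calculation described in the note on word frequency in this section can be sketched as follows. The function names are illustrative assumptions; the two formulas are the ones given in the note (log10 of the frequency per billion words, or log10 of the frequency per million words plus 3).

```python
import math


def zipf_from_per_million(freq_per_million: float) -> float:
    """Zipf value = log10(frequency per million words) + 3."""
    return math.log10(freq_per_million) + 3


def zipf_from_per_billion(freq_per_billion: float) -> float:
    """Zipf value = log10(frequency per billion words)."""
    return math.log10(freq_per_billion)


# The worked examples from the note:
# 1 per 100 million words = 0.01 per million -> Zipf value of about 1
# 1 per 10 million words  = 0.1 per million  -> Zipf value of about 2
# 1 per million words                        -> Zipf value of 3
for per_million in (0.01, 0.1, 1.0):
    print(per_million, round(zipf_from_per_million(per_million), 2))
```

Both functions describe the same scale, since a frequency of 1 per million words equals 1000 per billion words.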

Post-tests
Two tests were created to assess different aspects of learners' knowledge of the existing and novel binomials presented in the passage: 1. a multiple-choice test and 2. familiarity ratings.
For both the multiple-choice test and the familiarity ratings two categories of 'filler' binomials were created to avoid making the target binomials overly salient: 'filler existing binomials' and 'filler novel binomials'. The filler existing binomials were from Siyanova-Chanturia et al. (2011) and the filler novel ones were created by the researchers. Care was taken to ensure that the filler existing binomials were of low frequency (Mean frequency = 1.5 per million words in the BNC, SD = .53). The filler novel binomials had a frequency of 0 in the BNC. Both types of filler items were matched with the target items for part of speech, length, and frequency. The multiple-choice test and familiarity ratings after each passage contained six filler existing binomials and ten filler novel binomials.
In the multiple-choice test, participants' recognition of the form of the binomials was measured. Participants were presented with a list of three existing binomials and eight novel binomials. Each item included three options: the correct form of a binomial, the reversed form and a neither option to reduce guessing. 3 It is worth noting that this test does not reflect the standard format of a multiple-choice test used in L2 vocabulary research. Instead, it is designed to measure learners' ability to recognize the correct word order of binomials by asking them to choose between options that vary only in the word order, which is likely very challenging (see Appendix 3).
The familiarity ratings asked participants to rate how familiar they were with the existing and novel binomials on a five-point scale (1 = I have never heard/used this phrase, 2 = I've very rarely heard/used this phrase, 3 = I've rarely heard/used this phrase, 4 = I've frequently heard/used this phrase, and 5 = I have very frequently heard/used this phrase). 4 This test included both the existing and novel target binomials that had been encountered in the just seen and/or heard passage as well as the filler existing and novel binomials that had not been encountered in the text (see Appendix 3).

Procedure
The study was carried out in the participants' English class. They provided their informed consent for taking part in the research. Participants were given the three passages, each one in a different mode, and were told that they would answer some comprehension questions (see Appendix 4) about them. A within-participant, counterbalanced design was adopted. Thus, a participant read three texts, each in a different input mode. The three passages were presented in the different modes to different but equal numbers of participants. Table 3 demonstrates the counterbalanced design. Participants were instructed to read as naturally as possible for comprehension (no announcement was made about the post-tests). After each passage, participants were given three true-false statements to (1) ensure that they read/listened to the text and (2) check their overall understanding (questions did not contain the target binomials), followed by the two post-tests for the items presented in that passage. The tests were administered and collected after each passage to prevent participants going back to previous tests. The medium of test administration was paper and pencil, and participants' reading was self-paced in the reading-only condition. The scoring of the multiple-choice task involved giving 1 point for a correct response and 0 points if the response was incorrect or the 'neither' option was selected. The vocabulary test and language background questionnaire were completed at the end of the session. The entire procedure took approximately 50 minutes.

GEE (generalized estimating equations) is based on the number of observations (the number of participants multiplied by the number of items) rather than on the test scores, as in an AN(C)OVA (Peters, 2016). This means that the combination of values of the specified variables uniquely defines subjects within the dataset. For example, the combination of 'Participant', 'Test Type' and 'Score' defines, for each case, a particular score (correct or incorrect) on a particular test type for a particular participant. In GEEs the odds ratio (= expβ, or exponential parameter estimate) provides a measure of effect size and expresses the relative chance of an event happening under the different conditions analysed. The odds ratio predicts the change from one level to the next in a factor variable relative to the lowest or highest level. 5 It should be noted that a GEE model does not allow for pairwise comparisons of categorical data; hence, only the odds ratio is reported in the analyses of familiarity-ratings tasks.

Table 4 presents accuracy in the multiple-choice test. The first set of GEEs explored performance in this task and included input mode (reading-only, listening-only, and reading-while-listening) and number of exposures (2x, 4x, 5x, and 6x) as within-subject factors, vocabulary knowledge (measured by the shortened VKT) as a covariate, and the two-way interactions between them. We included the Group variable (English majors vs. non-English majors) in the initial analyses, but since there were no differences in vocabulary learning between the two groups it was excluded from the final analyses.
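The scoring rule for the multiple-choice task and the observation-level layout that a GEE works on can be illustrated with a minimal sketch. The response labels, variable names and the way observations are enumerated are assumptions made for illustration; only the scoring rule (1 point for the correct order, 0 otherwise) and the counts (32 participants, three passages, eight novel binomials per passage) come from the description of the study.

```python
from itertools import product


def score_response(chosen: str) -> int:
    """1 point for the correct order; 0 for the reversed form or 'neither'."""
    return 1 if chosen == "correct" else 0


# One row per participant x passage x novel item, i.e. one observation each,
# rather than one total test score per participant.
participants = range(1, 33)   # 32 participants retained for analysis
passages = range(1, 4)        # three passages, one per input mode
novel_items = range(1, 9)     # eight novel binomials per passage

observations = list(product(participants, passages, novel_items))

print(score_response("correct"), score_response("reversed"), score_response("neither"))
print(len(observations))  # 768 observation rows for the novel binomials
```

Each row would then carry the binary score together with the predictors (input mode, number of exposures, vocabulary knowledge) for that participant and item.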

Multiple-choice test
With a binary dependent variable in each case (correct vs. incorrect), the model employed a binomial distribution and logit link. In these analyses (Tables 5 and 6), only the novel binomials were considered, as they had an exposure manipulation, while the existing binomials did not. In a second set of analyses, we compared performance on the existing and a subset of the novel binomials to look at performance on the novel binomials relative to ones that are likely to have been encountered before.
The model for the multiple-choice test revealed a significant main effect of input mode on participants' performance (χ²(2) = 25.9, p < .001). The odds of correct answers in listening-only did not differ from those in reading-while-listening (exp -.19 = .83). Further, the odds of correct answers in reading-only did not significantly differ from those in reading-while-listening (exp .13 = 1.1). The post-hoc pairwise comparisons showed that listening-only led to lower scores than both reading-only (p = .01) and reading-while-listening (p < .001). There was no significant difference between reading-only and reading-while-listening (p = .23). The number of exposures did not predict performance (χ²(3) = 6.32, p = .09). The interaction between input mode and number of exposures was not significant (χ²(6) = 12.22, p = .06), indicating a lack of difference in performance between input modes at the different levels of exposure. The effect of vocabulary knowledge was not significant (χ²(1) = .002, p = .96).
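The odds ratios quoted above are simply the exponentiated regression coefficients. The short sketch below recomputes them from the two coefficients reported for the multiple-choice model; the function name `odds_ratio` and the dictionary labels are illustrative only.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a logit-scale coefficient into an odds ratio, exp(beta)."""
    return math.exp(beta)


# Coefficients as reported in the text (reference level: reading-while-listening).
reported = {
    "listening-only vs. reading-while-listening": -0.19,
    "reading-only vs. reading-while-listening": 0.13,
}

for label, beta in reported.items():
    print(f"{label}: beta = {beta:+.2f}, odds ratio = {odds_ratio(beta):.2f}")
```

An odds ratio below 1 indicates lower odds of a correct answer than in the reference mode, a value above 1 indicates higher odds, and a value of exactly 1 (beta = 0) indicates no difference.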
There was no effect of number of exposures. However, differences between 2, 4, 5, and 6 exposures are expected to be relatively small and are unlikely to be picked up when all of the levels are considered together. To further explore the effect of number of exposures, we compared performance on the minimum and maximum number of exposures (two vs. six) for each mode, using paired-samples t-tests. Results revealed that items repeated six times were better learned than those repeated two times after listening-only (t(30) = 2.9, p = .01). There were no significant differences between items repeated six and two times after reading-only (t(28) = 1.1, p = .28) or reading-while-listening (t(31) = -.22, p = .83).
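For readers less familiar with the paired comparison used here, the following is a minimal sketch of how a paired-samples t statistic is computed from per-participant scores at two exposure levels. The scores are hypothetical, and `paired_t` is an illustrative helper rather than the software routine the authors used.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical scores for five participants (not the study's data).
six_exposures = [3, 4, 4, 5, 6]
two_exposures = [2, 2, 1, 1, 1]

t, df = paired_t(six_exposures, two_exposures)
print(f"t({df}) = {t:.2f}")
```

A positive t value here means higher scores at six exposures than at two; the degrees of freedom equal the number of participants minus one, which is why the reported values differ slightly across modes.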
In sum, reading-only and reading-while-listening had a similar effect on participants' knowledge of binomials. Further, there was no main effect of number of exposures on participants' performance on the multiple-choice task. However, when comparing the minimum and maximum number of exposures, results showed a significant difference between two and six exposures in the listening-only mode. Finally, participants' vocabulary knowledge, as measured by the shortened VKT, did not modulate their performance on the task.
While the analysis thus far looked at performance on the novel binomials across the number of exposures, another important comparison is that between the novel binomials and the existing ones (which were only presented twice). In this subsequent analysis, we explored whether: 1) the novel binomials that were presented twice and the existing binomials that were also presented twice differ in accuracy; and 2) the novel binomials presented six times (the maximum number of exposures) differ in accuracy from the existing binomials. The analyses of performance on the multiple-choice task included input mode (reading-only, listening-only, and reading-while-listening) and binomial type (existing 2x, novel 2x, novel 6x) as within-subject factors, vocabulary knowledge (VKT) as a covariate, and the two-way interactions between them. As in the previous analysis, the model employed a binomial distribution and logit link function. Table 4 presents the percentage of correct responses in the multiple-choice test.
The model in Table 7 revealed a main effect of binomial type (χ²(2) = 34.76, p < .001). From Table 8, we see that the odds of correct answers for novel binomials repeated six times were lower than for existing binomials repeated twice (exp -.77 = .47). The odds of correct answers for novel binomials repeated twice were lower than for existing binomials that also appeared twice (exp -1.08 = .33). The post-hoc pairwise comparisons indicate that existing binomials occurring twice yielded better performance than novel binomials repeated twice (p < .001) and novel binomials repeated six times (p = .003). They also showed that performance on novel binomials seen six times was significantly better than on those seen twice (p = .02). Furthermore, results revealed a main effect of input mode (χ²(2) = 8.71, p = .01). The odds of correct answers in listening-only were not significantly different from those in reading-while-listening (exp -.53 = .59). (As an example, this result reflects the overall main effect of 'input mode' only, not the interaction between input mode and binomial type; it compares the odds of correct answers for all types of target items in the listening-only vs. reading-while-listening modes.) The odds of correct answers in reading-while-listening were not significantly different from those in reading-only (exp -.30 = .74). The post-hoc pairwise comparisons revealed that performance in listening-only was worse than in both reading-only (p = .01) and reading-while-listening (p = .01). There was no difference between reading-only and reading-while-listening (p = .76). The interaction between binomial type and input mode was not significant (p = .22), indicating that the type of target items (novel or existing phrases) did not modulate performance across the input modes. Vocabulary knowledge contributed significantly to the model (χ²(1) = 5.28, p = .02): learners with larger vocabulary knowledge performed better on the multiple-choice test.
Table 7. Generalized Estimating Equations analysis on the multiple-choice test scores: Test of model effects (binomial type and gains)
Note. a Set to zero because this parameter is redundant; β: regression coefficient.
Taken together, these results indicate that participants were more likely to recognize the 'correct' order of existing binomials than novel binomials. Exposure played a role in the recognition of novel binomials; seeing a new phrase six times significantly increased participants' ability to recognize its correct order relative to seeing it twice. However, even after participants had seen the novel binomials six times, their knowledge of the correct order remained significantly lower than for existing binomials. Learners' vocabulary knowledge appeared to modulate their performance on the multiple-choice task.

Multiple-choice test
It is possible that responses in the multiple-choice task reflected guessing. In the test, participants were presented with two possible orders for the same phrase (e.g., fish and chips, chips and fish), and a third 'neither' option. They were instructed to choose the option that sounded more natural or 'neither' if neither sounded better. It could be argued that participants would simply ignore the 'neither' option, which would mean that the chance level would be 50% rather than 33.33%. Finally, we also examined whether there was a difference between the target novel items and filler novel items. Paired-samples t-tests showed that there was a significant difference in the scores for the target novel items and filler novel items, t(31) = 7.74, p < .001. The fact that there was a difference in performance between target items (which appeared in the treatment texts) and filler novel items (which participants had not been exposed to) indicates that target items were learnt from the treatment.
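To make the chance-level issue concrete, the sketch below shows one way of asking whether a given number of correct responses exceeds chance under the two candidate baselines (one in three if all options are used, one in two if 'neither' is ignored). It uses an exact one-sided binomial test, which is an assumption made for illustration and not necessarily the test used in the study; the item count and score are hypothetical.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical learner: 9 correct answers out of 12 items.
n_items, n_correct = 12, 9

baselines = {
    "three options (chance = 1/3)": 1 / 3,
    "two options, 'neither' ignored (chance = 1/2)": 1 / 2,
}

for label, chance in baselines.items():
    p_value = p_at_least(n_correct, n_items, chance)
    print(f"{label}: one-sided p = {p_value:.4f}")
```

The 50% baseline is the stricter of the two, so a score that is above chance under that assumption is also above chance under the 33.33% assumption.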

Familiarity Ratings
Another goal of this study was to determine whether number of exposures and input modes affected participants' familiarity level with novel binomials. Given that the dependent variable was categorical (familiarity ratings), we used an ordinal logistic GEE model with a multinomial distribution, a cumulative logit link function, and an independent correlation structure. The following factors were included: input mode (reading-only, listening-only, and reading-while-listening) and number of exposures (2x, 4x, 5x, and 6x) as within-subject factors; vocabulary knowledge (measured by the vocabulary test) as a covariate; and the two-way interactions between them. Table 9 shows participants' familiarity ratings for existing and novel binomials. The results in Table 10 revealed a significant main effect of input mode on familiarity (χ²(2) = 25.11, p < .001). Based on the odds ratio, presented in Table 11, we see that familiarity ratings were lower in listening-only than in reading-while-listening (exp -1.07 = .34). In contrast, familiarity ratings in reading-only did not significantly differ from those in reading-while-listening. [...] Looking at the difference between input modes at each exposure, familiarity ratings for items repeated twice were lower in listening-only than in reading-while-listening (exp -.95 = 2.59). Familiarity ratings for items repeated five times were higher in listening-only than in reading-while-listening (exp 1.43 = 4.19) and in reading-only than in reading-while-listening (exp .51 = 1.66). The main effect of vocabulary knowledge was significant (χ²(1) = 4.57, p = .03). Greater vocabulary knowledge led to higher familiarity ratings.
These results showed that participants developed a certain level of familiarity with the novel binomials in the three input modes, with reading-while-listening having an advantage over the listening-only mode. Number of exposures also appeared to have an effect, with greater familiarity appearing at four exposures. This analysis informed us about learners' familiarity with novel binomials only. In order to investigate differences between familiarity ratings for the different types of binomials, another GEE (ordinal model) was carried out that included input mode (reading-only, listening-only, reading-while-listening) and binomial type (existing 2x, novel 2x, and novel 6x) as within-subject variables, vocabulary knowledge (VKT) as a covariate, and the interaction between them. As in the previous analysis, the model employed a multinomial distribution, a cumulative logit link function, and an independent correlation structure. Results (see Table 12) revealed a main effect of binomial type (χ²(2) = 76.76, p < .001). Based on the odds ratio (see Table 13), there was no significant difference in familiarity ratings between novel binomials seen six times and existing binomials (exp -.08 = .93). In contrast, familiarity ratings for novel binomials seen twice were lower than for existing binomials (exp -1.42 = .24). In addition, there was a main effect of input mode (χ²(2) = 16.14, p < .001). Familiarity ratings were lower in listening-only than in reading-only (exp -.49 = .61). Also, familiarity ratings in reading-while-listening did not differ from those in reading-only (exp -.28 = .75). The interaction between binomial type and input mode was not significant, p = .69. This shows that learners' familiarity with binomials (novel or existing) did not depend on the input mode. Vocabulary knowledge did not contribute significantly to the model, p = .92.
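To show how an odds ratio from a cumulative logit model translates into rating probabilities, the sketch below applies the coefficient reported for listening-only relative to reading-only (-.49, odds ratio .61) to a hypothetical four-point rating scale. The number of rating categories, the threshold values, and the parameterization logit P(rating <= j) = threshold_j - eta are all assumptions made for illustration; they are not taken from the study.

```python
import math


def category_probs(thresholds, eta):
    """Category probabilities under logit P(Y <= j) = threshold_j - eta."""
    cumulative = [1 / (1 + math.exp(-(theta - eta))) for theta in thresholds]
    cumulative.append(1.0)  # the highest category closes the scale
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


def expected_rating(probs):
    """Mean rating, with categories numbered from 1 upwards."""
    return sum((j + 1) * p for j, p in enumerate(probs))


thresholds = [-1.0, 0.0, 1.0]  # hypothetical cut-points for a 4-point scale
beta_listening = -0.49         # coefficient reported in the text

reference = category_probs(thresholds, 0.0)            # reading-only
listening = category_probs(thresholds, beta_listening)  # listening-only

print([round(p, 3) for p in reference], round(expected_rating(reference), 2))
print([round(p, 3) for p in listening], round(expected_rating(listening), 2))
```

Under these assumptions, a negative coefficient shifts probability mass towards the lower rating categories, which is what an odds ratio below 1 expresses for the familiarity ratings.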
In sum, participants demonstrated a higher familiarity level with existing binomials than with novel binomials seen twice. However, with greater exposure, the novel binomials seen six times became as familiar as existing ones. This indicates that repeated exposure to new phrases in a meaningful context can promote lexical development.

Discussion
The purpose of this study was to investigate the effect of different input modes and number of exposures on the incidental learning of binomials. It also explored an aspect of formulaic language knowledge that previous studies on vocabulary acquisition have not investigated (i.e., the development of learners' familiarity with target items, using familiarity ratings). [...] familiarity with them from all three input modes. Notably, in response to the second research question, there were generally no differences in performance in the reading-only and the reading-while-listening modes, and both led to better performance than the listening-only mode. This supports previous research on single words (e.g., Brown et al., 2008; Vidal, 2011; Vu & Peters, 2020) and points to an interesting similarity between the learning of single words and MWSs. However, the current findings seem to contradict those of Webb and Chang (2020), which indicated that reading-while-listening led to the best performance for MWSs, while listening-only and reading-only yielded similar results. They also appear to contrast with the findings of Malone (2018), who found that simultaneous input modalities (i.e., reading-while-listening) led to higher learning outcomes in the acquisition of single words. The relatively small sample size in the current study could explain the lack of difference in performance between reading-while-listening and reading-only modes, and hence the contradictory findings between the current study and previous research (e.g., Webb & Chang, 2020; Malone, 2018). In addition, the difference in results could point to different input modes being more or less beneficial for the learning of different types of MWSs. Listening-only may be more beneficial when learning collocations than binomials. While prosodic cues in spoken input are thought to play a role in the development of formulaic sequences (e.g., Vidal, 2011; Webb & Chang, 2020), it is unclear whether they are equally important for the different types of formulaic language. Future research will need to look at prosodic cues across the types of formulaic sequences and how they might benefit learners.
Another possible explanation for the differing results may be related to the use of different tasks. In their post-test, Webb and Chang presented the collocations aurally (participants heard the collocations and had to write their meanings on a corresponding test sheet), while in the current study they were presented in a 'reading' mode. This could have disadvantaged items that were learned aurally. The lack of a clear advantage for reading-while-listening in the current study may imply that its advantages are limited to tasks involving aural input. There is evidence suggesting a test-modality congruency effect (Sydorenko, 2010; Jelani & Boers, 2018). Sydorenko (2010) found that a video-with-audio group performed better on an aural than on a written recognition test, and the reverse pattern was found for a video-with-captions group. Jelani and Boers (2018) also found that a group watching video clips with captions outperformed a group watching uncaptioned clips on a written test, but not on a test presented in aural format.
As discussed earlier, much of the L2 vocabulary research suggests that reading-while-listening leads to greater gains than either reading-only or listening-only. However, the results of the current study do not generally show a difference in performance in the reading-while-listening and the reading-only modes. This may be related to narration pace in the reading-while-listening mode, which may not have been aligned with the participants' reading speed. While the reading pace was intended to be natural, it is possible that (some of) our participants were unable to follow the narration. Research on children's reading comprehension from reading-while-listening has demonstrated that some readers do not benefit from combining auditory input with reading because the narration rate is either slower or faster than the readers' reading rate (e.g., McMahon, 1983). As Chang (2009) [...] (Feng & Webb, 2020; Vu & Peters, 2020).
It is important to note that in the current study, while we have talked about the results in terms of incidental learning, the frequent occurrence of binomials in the texts and the fact that learners completed tasks after each session could have increased the salience of the items, leading to more intentional learning. Further, demonstrating lexical gains in the current study is considerably different from doing so in other studies in the literature. The binomials in this study were transparent (the meaning is the sum of the parts) and 'learning' amounts to demonstrating command over the word order (form); thus, it is fish and chips not chips and fish for the existing binomials and plates and glasses not glasses and plates for the novel ones. On the multiple-choice task learners had to choose from three options: the 'forward' form (e.g., plates and glasses), the reversed form (e.g., glasses and plates), and a 'neither' option. Importantly, performance on the task implies that it is possible for L2 learners to develop knowledge of novel binomial word order.
Concerning the effect of binomial type (novel versus existing), results indicated that learners performed better on existing binomials (e.g., brother and sister) than on novel ones (e.g., wires and pipes). Given that the EFL students in the current study have been exposed to English for a number of years (approximately 5 years studying English), these results are promising; they indicate that the students have had exposure to English MWSs, which is reflected in their performance.
With regard to the effect of number of exposures on the incidental learning of binomials in the multiple-choice task, encountering novel binomials an increasing number of times did not have an impact on overall performance in any of the modes. This finding aligns with both Pellicer-Sánchez's (2017) and Szudarski's (2012) studies, which showed that number of exposures did not affect incidental collocation learning from reading. However, this finding contradicts several studies on single words (Brown et al., 2008; Vidal, 2011; Webb, 2007) and studies on collocations (Webb et al., 2013; Webb & Chang, 2020). The lack of an effect of exposure in the present study is likely due to frequency levels that are minimally different (2, 4, 5 and 6 exposures). A significant role of number of exposures in the incidental learning of MWSs has been found in studies with a higher number of exposures and a wider range (e.g., Webb et al., 2013; Webb & Chang, 2020). Another possible explanation is that more items at each frequency level are needed in order to find a significant difference. Further, as has been speculated in the literature (van Zeeland & Schmitt, 2013; Vidal, 2003, 2011), number of exposures may be more effective if the encounters are spread over a greater period of time, as in a naturalistic learning environment, and not a short experimental context.
However, a significant effect of frequency of exposures emerged in the familiarity rating task. Seeing a novel phrase multiple times had a positive effect on its familiarity. Learners rated the existing binomials as being more familiar than novel binomials that were encountered twice. However, they showed similar levels of familiarity with the existing binomials and novel binomials that were encountered six times in the text. This suggests that repeated exposure incrementally builds knowledge of new MWSs that regularly occur in a classroom 'story' context. It is worth noting that we seem to see earlier development of binomial knowledge in a familiarity-rating task than in a form recognition task, one that shows a clear sensitivity to frequency of exposure. Notably, an effect of exposure for familiarity ratings aligns with the finding in the recognition task that there was an effect of exposure when comparing novel binomials that occurred twice to those that occurred six times. It may be that to find an effect of exposure on recognition tasks, the exposures need to be sufficiently different from each other, whereas familiarity ratings are more sensitive to smaller differences in number of exposures.
There are a few limitations of the current research that are important to mention. This study only attempted to gather data on L2 learners' immediate knowledge of binomials.
However, as recognized by Waring and Takaki (2003), using immediate posttests to assess learners' knowledge, when new lexical items are still fresh in the mind, may result in higher gains than might be found at a delay. Therefore, future research should assess whether learners retain this knowledge over time. In addition, as discussed earlier, the items in the post-tests were presented in a written format. While participants in the listening-only condition had only been exposed to binomials in the listening modality, the post-test presented the binomials in a 'reading' mode, which could have impacted performance.
Presenting the items aurally in the post-tests may have revealed differences in performance between reading-only and reading-while-listening that were not captured by the tests in the present study. Other limitations relate more specifically to the multiple-choice test format. As discussed earlier, while we checked that the learning reported in the multiple-choice test was above chance, the participants' guessing could have affected scores. Future research could also include a control group with participants only completing the tests and receiving no treatment, in order to further control for the possibility of guessing.

Conclusion
This study demonstrates that incidental learning is not limited to the type of lexical items examined thus far in the literature but can be extended to other types of formulaic sequences, namely binomials. All three input modes (reading-only, listening-only, and reading-while-listening) contributed to binomial learning. Notably, both reading-while-listening and reading-only led to better performance than listening-only, while no differences were found between reading-while-listening and reading-only. In addition, the results of this study provide some indication of the role of number of exposures in binomial acquisition, but indicate that at an initial stage, frequency might have a clearer effect at the level of familiarity than at the level of form recognition. Six exposures were not enough to develop knowledge of the correct order of binomials at a level comparable to existing binomials, while they were enough to develop a level of familiarity similar to that of existing items. More work needs to be done to explore frequency and its relation to other factors in order to gain a more accurate picture of its role in incidental learning across a range of MWSs.

Appendix 2: A Sample Text
We've recently moved house so I've been very busy trying to get the new place into shape. It needs a bit of work, and in particular I think a lot of the wires and pipes in the garage are going to need looking at. When we first went in, the floor was flooded, and after a bit of investigation we realised that a whole load of leaves had blocked the guttering and caused the problem. The rest of the house is generally ok I think, but I'd really like to get some new shelves and drawers for the kitchen as the old ones were well past their best. I'm also fairly sure that quite a lot of our plates and glasses got broken during the move, so we'll probably have to stock up on a lot of things before we can cook our first meal. I found a box with spoons and bowls in it so I guess we can live on soup for a while if we need to! I'll maybe need to clean and polish a few things before we use them since the move was a little on the dusty side, but we should be ok for now. One thing we do have plenty of is tea and coffee since I know how useless I am without it! One of the main reasons we've moved is to be nearer to my mum and dad. I'm aware that they're slowing down quite a lot now, so it'll be nice to spend time with them while I can. I used to write and phone a lot but it's not the same as being within walking distance. I called round to see them this morning actually as my mother said that she had some knives and forks that we could have, so that's one less thing that I'll need to buy. One thing we certainly don't need is any more bags and coats since my wife has brought boxes of the things with her in the move. She's not a big fan of shopping in general, so I guess I should count myself lucky, but I was rather hoping that the move would be a good opportunity to get rid of a few things like that.
When I arrived to see my parents, my father was in the lounge. He told me that my mother had been keen to clean and polish everything before I arrived, so when I went into the kitchen, I was presented with quite a sight. There were plates and glasses everywhere and they were all sparkling, so I certainly didn't need to worry about buying more. The shelves and drawers were all empty and it looked like everything had been piled on the table in the middle of the room. I thought that my wife was bad with her bags and coats but that was nothing compared to what I was seeing here! There were also bottles and tins of all kinds piled up all over the floor. I was amazed at what I was seeing.
My mother stopped what she was doing and smiled as I came in. She explained that she wanted to make sure that we had everything we needed in the new house, so she had dug out everything she could think of. She said that she had been to the supermarket and stocked up on plenty of things, including tea and coffee since she knew how much we went through as a family. She had also been collecting various items and had already boxed up some spoons and bowls for us (the one thing I knew we didn't need!) and had been busy all morning trying to clean and polish everything so that it was ready for me to take away. I asked where all of this had come from and she said that it was just what had accumulated over the years. The knives and forks were very good quality, so she was very keen for me to have them.
I came back from my visit to see mum and dad with several boxes full of stuff. After everything that I'd said to my wife about the bags and coats she was going to tease me about this, I was sure! But she was very pleased to learn that we now had plates and glasses for the kitchen, and even more pleased to learn that she wouldn't have to clean and polish them since my mother had already done so. We wanted to unpack the boxes, but since the shelves and drawers were yet to be replaced, we didn't really have anywhere to put the contents, so we decided that it was as good an excuse as any to head to the big shopping park that was located a few miles away. The bottles and tins were all going to go straight into the garage, so I unpacked those, but I made sure that the wires and pipes were kept clear, ready for when the men came to look at them. I also thought that I must make sure to clear the leaves out of the gutters to make sure that the garage didn't flood again in future. I had a rather wicked thought that we could maybe use the bags and coats as a barrier, but I didn't think I'd mention it to my wife!
We got in the car and drove out to the shopping centre. I was glad that we didn't need plates and glasses anymore, as it meant that we could think of much more interesting things to spend our money on. We had never been people with particularly expensive tastes; as far as I was concerned having nice things to look after was just annoying, so not having fancy jewellery or things like that just meant fewer things to have to clean and polish on a regular basis! It was nice to know that we wouldn't have to spend money on things for the kitchen since I had a nasty feeling that the wires and pipes in the garage were going to end up costing us quite a bit to fix, so saving a bit of money here was a big bonus.
We walked around for quite a while just browsing some of the shops. I knew that we no longer needed spoons and bowls, but my wife found a set that she absolutely fell in love with, so I decided to buy it for her as a moving in gift. They had quite a striking pattern which would match our colour scheme, so I was happy to get them for her. I did insist that along with the plates and glasses from my mother, we would now definitely need shelves and drawers though, or else she would have nowhere to put them, so we asked if there was a shop that might sell such things in the shopping centre. A helpful young lady told us that there was, so we set off to find it.
We passed right by a shop selling bags and coats and I was a bit concerned that my wife would insist on going in, but she didn't seem to notice. We went on to find the shop that the girl had mentioned. It was right at the far end of the shopping centre, so it took a few minutes to get there, but when we did, we found an amazing selection awaiting us. It took us a while to browse through everything but eventually we found something we could agree on. The shelves and drawers we picked out were not cheap, but they would certainly be sturdy and would last us a long time. While I was there, I also asked if the shop dealt with wires and pipes to see whether they could send someone to look at our garage. They said they could, so I booked a man to come out later in the week, although I must make sure I had time to clear out the leaves before he came.
We headed home very satisfied with our day's shopping. I was particularly pleased that we had gotten our kitchen sorted out, and that we had acquired so much stuff from my parents to fill the cupboards! And the fact that we didn't need to clean and polish any of it ourselves was a bonus, as was the fact that we'd gone shopping and bags and coats hadn't even entered my wife's mind! She was happy with her spoons and bowls so that would probably keep her happy for a while, but getting them did remind me that I would need to buy white paint for the kitchen walls. As we drove home, I thought about how happy we were going to be in our new home, and I decided that I really must remember to write and phone to tell all of our friends our new address.