Proposing an Expanded Measure for Comparing Online/Hybrid to Face-to-face Courses

Online education continues to increase. With increased online offerings, it is important to evaluate the integrity or equivalence of online/hybrid courses relative to face-to-face (F2F) courses. This study used three separate samples of business undergraduates taking both online/hybrid and F2F courses in the same semester (i.e., mixed course delivery format), across summer, fall, and spring semesters. Eight items were used to assess students’ perceived favorability of online courses (PFoOC) compared to F2F courses. Across all three samples, two related but distinct sources for course comparison consistently emerged: instructor-related and peer-related. An eight-item measure represents a necessary improvement over a previously developed four-item PFoOC measure, because it allows for additional relevant item comparisons between online/hybrid and F2F courses. It is hoped that this measure can be used to further research evaluating online education.


Introduction
The Babson Survey Research Group (2017) found that for 2016, distance/online education college student enrollment had increased for the fourteenth straight year. Between 2012 and 2016, the number of college students studying on a campus dropped by over one million. Increasingly, many universities and colleges are viewing online education as a critical component of their enrollment strategy (Comer, Lenaghan, & Sengupta, 2015; Jain, 2015). In addition to full-time online Bachelor of Business Administration (BBA) programs, undergraduate business students are increasingly taking online courses to complement their more traditional face-to-face (F2F) course-delivered education. Reasons for taking online courses include flexible scheduling (Daymont, Blau, & Campbell, 2011) and convenience (Cochran, Baker, Benson, & Rhea, 2016), as well as motivation-related factors such as challenging course material (Eom & Ashill, 2016) and self-discipline (Comer et al., 2015). Students who take both F2F and online/hybrid classes simultaneously in a semester represent a "mixed course delivery format" sample (Blau & Kapanjie, 2016; Blau, Pred, Drennan, & Kapanjie, 2016; Blau & Drennan, 2017). The goal of this study was to propose and test an expanded measure for comparing the perceived favorability of online/hybrid classes (PFoOC) to F2F classes using three mixed course delivery format samples of undergraduate business students. Prior research on measuring the perceived favorability of online (versus F2F) classes is reviewed below.

Measuring PFoOC
When examining PFoOC, some researchers have looked at specific practices (e.g., discussion), while others have looked at more global assessments (e.g., learning). For example, Meyer (2007) examined discussions and found that students preferred F2F over online discussion, but acknowledged there were advantages to each medium. Based on media richness theory (Daft & Lengel, 1986), the advantages of F2F discussion included emotional content, energy, ease, the ability to read nonverbal signs, and more immediate feedback. Alternatively, based partially on compensatory adaptation theory (Kock, 2005), the advantages of written online discussion were the opportunity to take time and care to reflect on what response should be made; the fact that the discussions were more reasoned, more informative, and contained deeper analyses; and the opportunity for quieter students to open up online. Other research investigated general learning comparisons between online and F2F courses. Eom, Wen, and Ashill (2006, p. 233) asked about a global general learning comparison between online and F2F in their three-item measure, i.e., "I feel like I learn more in online courses than in face-to-face courses." In a later study, Eom and Ashill (2016) asked students to compare the quantity and quality of learning in online versus F2F classes. One item from Sun, Tsai, Finger, Chen, and Yeh's (2008, p. 1198) three-item "e-learning course quality" scale was that "conducting the course via the Internet improved the quality of the course compared to other courses."
Blended or hybrid courses represent some combination of F2F and online activities (Arbaugh, 2014), and can allow students to directly compare these specific components within a course. However, hybrid versions of a course tend to be offered less often than either F2F or online versions. Hybrid classes can require "space allocation coordination," e.g., Class A versus Class B meeting alternate weeks/days/times in Room X, where Room X meets specific pedagogical requirements, e.g., allows for class video capture, or collaborative physical set-up. Another factor limiting hybrid course offerings can involve determining the "optimal blend" (Arbaugh, 2014, p. 800) for a course, i.e., the combination of classroom-based and online activities that best promotes perceived student learning. Finally, hybrid courses can present a challenge related to faculty meeting teaching load requirements. For example, if the faculty member uses the same online activities for two sections (F2F and hybrid) of a class, but only meets the F2F section once a month in the hybrid format, does this fulfill the requirement of a two-course teaching load?
Studying undergraduates taking both online/hybrid and F2F courses simultaneously allows for a more direct comparison of PFoOC because this mixed course delivery format student has recent, salient experiences with both course-delivery modalities. This is a study strength compared to prior research, which has typically used only online students (Beqiri, Chase, & Bishka, 2010; Eom et al., 2006; Eom & Ashill, 2016; Comer et al., 2015; Sun et al., 2008). In addition, prior empirical studies have generally not compared specific features of online/hybrid versus F2F courses in a comprehensive manner, for example, items comparing video lectures (online) versus F2F class lectures, written discussion board (online) versus F2F classroom participation, or synchronous discussion (online) versus F2F classroom discussion. It is important and necessary to directly compare such specific course features to measure the "integrity" of a course (Daymont et al., 2011), i.e., approximating the same content and process in an online/hybrid course as its F2F equivalent. Using the above specific item comparisons, Blau and Kapanjie (2016) found that a four-item PFoOC scale had a Cronbach alpha of .91 and .89 at two separate times. Cronbach alpha is a measure of internal consistency (or reliability). Ideally a measure should have an internal consistency of at least .70 (Nunnally, 1978), with a still higher number (e.g., .80, .90) indicating greater reliability. Blau, Pred, Drennan, and Kapanjie (2016) and Blau and Drennan (2017), using separate samples of mixed course delivery business undergraduates, found that this same four-item PFoOC scale had a Cronbach alpha of .85 and .90.
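To make the reliability statistic concrete, the following is a minimal sketch of how Cronbach alpha can be computed for complete responses. The function name and the response values are hypothetical illustrations (they are not data from this or any prior PFoOC study), and the sketch uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of the summed score).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses on four 7-point items (not data from this study).
responses = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 6],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because the hypothetical items move together closely across respondents, the resulting alpha is high; items that vary independently of one another would pull it toward zero.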

Additional Items to Enhance Online/Hybrid to F2F Comparison
Despite its strong reliability, the four-item PFoOC measure has limitations for comparing online/hybrid to F2F classes. Many undergraduate classes, business as well as non-business, have some type of group project as part of a course grade (Hazari & Thompson, 2015), and this group project assessment is not included in the four-item PFoOC measure. In addition, comparing different types of learning between online/hybrid and F2F classes, such as peer learning and transfer learning (Alavi, 1994), may be important. Peer learning measures student perceptions about whether they are learning from other students in class (Arbaugh, 2014). Transfer learning measures student perceptions on whether they can apply present course material to future courses or work situations (Hart Research Associates, 2015). An expanded measure, incorporating these additional items beyond the initial four items (Blau & Kapanjie, 2016; Blau & Drennan, 2017), is needed to make a more complete comparison when assessing course integrity between online/hybrid and F2F classes. Such an expanded measure could help faculty who have taught F2F classes evaluate their transition to teaching online or hybrid classes (Wingo, Ivankova, & Moss, 2017). Therefore, the research question (RQ) for this study was: Can a psychometrically-sound expanded instrument for measuring the perceived favorability of online/hybrid courses (PFoOC) be developed?

Samples and Procedure
Three separate undergraduate business student samples were gathered at the end of the summer 2017, fall 2017, and spring 2018 semesters. These samples will be referred to as summer semester, fall semester, and spring semester. The business school is part of a large urban state-supported university located in the Mid-Atlantic region of the United States. The summer semester sample represented two separate six-week terms, aggregated into one semester, while the fall semester and spring semester samples each represented 14-week terms. Near the end of each semester, all business undergraduate students who enrolled in at least one synchronous online or hybrid course and also an F2F course were contacted by school email address and asked to voluntarily complete an online survey. Data collections were approved by the University Institutional Review Board as part of a routine program evaluation. As an incentive to complete the online survey, two prizes were offered each semester, e.g., two Apple AirPods, the winners to be chosen by random number lottery. Prior research has suggested that incentives can improve online survey response rates (Fan & Yan, 2010). A student could fill out a separate survey for each online/hybrid course taken in that semester, and the student's name was entered in the lottery for each. Only respondents who completed a survey were eligible to win. Survey reminders were sent one week after the initial invitation. Across the three semesters, the following number of at least partially completed survey responses was collected: summer, n = 250; fall, n = 783; and spring, n = 742. A comparison of this number of responses to the total number of eligible students in each semester indicated that the response rate was approximately 20% per semester. Prior literature has shown that a lower response rate may not be evidence of survey bias (Rindfuss, Choe, Tsuya, Bumpass, & Tamaki, 2015). Across all three samples, over 90% of the respondents were full-time students, i.e., taking at least twelve credits/semester.

Measures
Demographic and background variables. In each survey, 12 variables were measured: Gender; Ethnic background; Transfer status; Commuter status; Currently working; Grade Point Average (GPA); Age; Number of prior online courses taken; Number of prior hybrid courses taken; Number of current online classes; Number of current hybrid classes; and Number of current F2F classes. Gender was indicated as 1 = male, 2 = female. Ethnic background was indicated as 1 = Caucasian, 2 = African American, 3 = Asian, 4 = Hispanic or Latino, and 5 = other, e.g., mixed, biracial, American Indian. Transfer status was indicated as 1 = no (entered as a first-semester freshman), or 2 = transferred in after first semester. Commuter status was indicated as 1 = no (living on campus or in walking distance), or 2 = yes. Currently working was indicated as 1 = no, or 2 = yes. GPA (cumulative) was measured in incremental tenth response categories, e.g., 2.0, 2.1 …, where 1 = less than 2.0 to 22 = 4.0. Age was measured in yearly response categories, from 1 = 18 years old or less to 34 = 51 or older. Number of prior online and prior hybrid courses taken were each measured from 0 to 9 or more. Number of current online, hybrid, and F2F classes were each measured from 0 to 6 or more. A breakdown of responses for each of these variables is given in Table 1.
Items comparing F2F to online classes. Eight items were asked, using a 7-point response scale, from 1 = very inferior to 7 = very superior. In addition, an eighth response point was coded "not applicable". If this response was chosen, it was counted as missing data. The exact content of these items is shown in Table 2.

Data Analysis
SPSS version 24 (SPSS, 2013) was used for all data analyses. Listwise deletion was used to test the research question. Missing data across all studied variables reduced the complete data sample size to 149/250 (60%) for the summer sample, 359/783 (46%) for the fall sample, and 372/742 (50%) for the spring sample. This deletion also included multiple submissions from the same person in each sample, to eliminate autocorrelation as a bias (Stevens, 1996). Since these eight items for comparing F2F to online classes had never been tested before, exploratory factor analysis (EFA) was used for the summer sample. For EFA, based on the recommendations of Costello and Osborne (2005), the following four criteria were applied: (1) using maximum likelihood for factor extraction; (2) using the scree test (not eigenvalue greater than one) to determine the number of factors to retain; (3) having an item loading on a factor of at least .50, with minimal cross-loadings on other factors; and (4) using oblique rotation, which recognizes an underlying correlation between factors. The subjects-to-items ratio of 149:8 exceeded the recommended 10:1 ratio for a stable factor solution (Costello & Osborne, 2005).
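The data-screening step described above can be sketched as follows. This is an illustration only, not the SPSS procedure used in the study: the function name, the numeric code 8 for "not applicable", and the rule of keeping a respondent's first submission are all assumptions, and the sketch screens only the eight comparison items rather than all studied variables. The last lines check the reported retention percentages and the subjects-to-items ratio from the counts given in the text.

```python
NOT_APPLICABLE = 8  # assumed numeric code for the "not applicable" response point


def listwise_complete(submissions, n_items=8):
    """Keep at most one complete submission per respondent.

    submissions: list of (respondent_id, item_scores) tuples, where a score of
    None (skipped) or NOT_APPLICABLE counts as missing.
    """
    seen = set()
    kept = []
    for respondent_id, scores in submissions:
        if respondent_id in seen:
            continue  # assumption: ignore repeat submissions, consider only the first
        seen.add(respondent_id)
        complete = len(scores) == n_items and all(
            s is not None and s != NOT_APPLICABLE for s in scores
        )
        if complete:
            kept.append((respondent_id, scores))
    return kept


# Hypothetical submissions (not data from this study).
demo = [
    ("s1", [5, 6, 5, 4, 4, 5, 6, 5]),
    ("s1", [4, 4, 4, 4, 4, 4, 4, 4]),     # repeat submission from s1
    ("s2", [5, 6, 8, 4, 4, 5, 6, 5]),     # one "not applicable" response
    ("s3", [5, None, 5, 4, 4, 5, 6, 5]),  # one skipped item
    ("s4", [3, 3, 2, 2, 3, 4, 4, 3]),
]
complete_cases = listwise_complete(demo)
print([rid for rid, _ in complete_cases])

# Counts reported in the text: (complete cases, collected responses).
reported = {"summer": (149, 250), "fall": (359, 783), "spring": (372, 742)}
retention = {s: round(100 * c / t) for s, (c, t) in reported.items()}
subjects_per_item = 149 / 8  # summer sample, eight items
print(retention, round(subjects_per_item, 1))
```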
Confirmatory Factor Analysis (CFA) was used for the fall and spring samples using the same item breakdown for each factor found in the summer sample. Amos, as part of the SPSS (2013) package, allows for conducting CFA. Following prior research recommendations (Jackson, Gillaspy, & Purc-Stephenson, 2009), two fit statistics, i.e., comparative fit index (CFI) and Tucker-Lewis Index (TLI), and two error statistics, Standardized Root Mean Square Residual Error (RMR) and Root Mean Square Error of Approximation (RMSEA), are reported.

Sample Characteristics
Table 1 reports the background variables' descriptive statistics for all three samples. All samples have higher percentages of respondents who were female, Caucasian, non-transfer, on campus/within walking distance, and currently working. Respondents had generally taken a larger number of online versus hybrid courses prior to being sampled, and the reported number of total current semester courses supports the high percentage of participating full-time students across the three samples. By comparison, the fall 2017 demographics for matriculated University undergraduates (N = 29,732) were: 53% female; ethnicity: Caucasian (56%), Asian (11%), African American (13%), and Hispanic or Latino (7%); with 13% either unknown/other or international, according to the University's student profile data. These comparison demographics suggest that the participating undergraduate samples were generally representative of the University, with the exceptions of Asian undergraduates being over-represented, and the "other" category being under-represented, in the participating study samples.

Factor Analyses
Table 2 shows the results of the EFA using the summer sample. Applying the scree test (Costello & Osborne, 2005) indicated two factors, and subsequent extraction with oblique rotation showed clean item loadings between two factors. Items #1, 2, 6, 7, and 8 loaded on the first factor, while items #3, 4, and 5 loaded on the second factor. Inspection of the five factor-one items suggested an "instructor-related" factor, while inspection of the three factor-two items suggested a "peer-related" factor. Reliability estimates (coefficient alpha) for each scale in this summer sample were .91 for instructor-related and .86 for peer-related. A scale mean for each variable was then computed by adding up the relevant items and dividing the total score by the number of items, to allow for easier interpretation. For the instructor-related scale, the mean (M) was 4.63 and the standard deviation (SD) was 1.16. For the peer-related scale, M = 4.03, SD = 1.35. The correlation (r) between these scales was r(147) = .76, p < .01. Although this indicates substantial overlap, (.76)² = .58, or 58% shared variance, it is below the threshold cutoff of .80 for multicollinearity (Stevens, 1996). Using a paired sample t-test, the instructor-related mean was significantly higher than the peer-related mean, t(148) = 8.25, p < .01.
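As a worked check on the summer-sample comparison, the paired-sample t statistic can be approximated from the summary values reported above, since the variance of the difference scores equals var1 + var2 - 2 × r × SD1 × SD2. The sketch below is an illustration with a hypothetical function name; it uses the rounded means, standard deviations, and correlation from the text, so small rounding differences from the SPSS output are to be expected.

```python
import math


def paired_t_from_summary(m1, sd1, m2, sd2, r, n):
    """Paired-samples t statistic from means, SDs, their correlation, and n."""
    # Variance of the difference scores: var1 + var2 - 2 * r * sd1 * sd2.
    sd_diff = math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)
    t = (m1 - m2) / (sd_diff / math.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom


# Summer-sample values as reported in the text (rounded to two decimals).
t, df = paired_t_from_summary(4.63, 1.16, 4.03, 1.35, r=0.76, n=149)
shared_variance = 0.76**2  # proportion of variance the two scales share
print(f"t({df}) = {t:.2f}, shared variance = {shared_variance:.0%}")
```

With these inputs the statistic comes out close to the reported t(148) = 8.25, which suggests the reported means, standard deviations, and correlation are mutually consistent.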
CFA was then separately performed for the fall and spring samples to test the fit of each data set to this two-factor model. For the fall sample, initial model, the following statistics were found: χ²(19, N = 359) = 99.43, p < .01; CFI = .97, TLI = .95, RMR = .11, and RMSEA = .11. These statistics indicated that although the fit statistics (CFI and TLI) were at least .90, which is good, the two error statistics (RMR and RMSEA) were greater than .10, which is not good (Browne & Cudeck, 1993). This indicated that a post hoc modification should be done to further improve the data-model fit. As Jackson et al. (2009) noted in their recommendations for CFA, any post hoc model modifications must be reported. Accordingly, one modification, a correlation in the error terms between the WebEx item (#2, Table 2) and Discussion Board item (#3, Table 2), was allowed. This modification improved the data-model fit, so it was acceptable using all four evaluation indices: χ²(18, N = 359) = 69.92, p < .01; CFI = .98, TLI = .97, RMR = .09, and RMSEA = .09. Reliability estimates (coefficient alpha) for each scale in this fall sample were .90 for instructor-related and .85 for peer-related. For the instructor-related scale, M = 4.40, SD = 1.34. For the peer-related scale, M = 3.97, SD = 1.43. The correlation between these scales was r(357) = .77, p < .01. Using a paired sample t-test, the instructor-related mean was significantly higher than the peer-related mean, t(358) = 8.63, p < .01.

Table 2 (excerpt; rows as recoverable from the source). Loadings are listed as Factor 1 (instructor-related), Factor 2 (peer-related); the trailing value comes from an unlabeled third column.
Item 4 (item wording not recoverable): .04, .74; 71
Item 5. Compared to face-to-face courses, the quality of peer learning (i.e., students learning from other students in the class) in the online course was: -.01, .85; 69
Item 6. Compared to face-to-face courses, the quality of instructor-guided learning in the online course was: .70, .17; 71
Item 7. Compared to face-to-face classes, the quality of transfer of learning (i.e., using course material in future courses or in work situations) in the online course was: .95, -.05; 70

For the spring sample, initial model, the following statistics were found: χ²(19, N = 372) = 89.00, p < .01; CFI = .97, TLI = .96, RMR = .08, and RMSEA = .10. These statistics indicated a good data-model fit (Browne & Cudeck, 1993) since all four evaluation indices were acceptable, i.e., CFI and TLI were at least .90, and RMR and RMSEA were not greater than .10. Therefore, no post hoc modification was needed. Reliability estimates (coefficient alpha) for each scale in this spring sample were .92 for instructor-related and .84 for peer-related. For the instructor-related scale, M = 4.37, SD = 1.31. For the peer-related scale, M = 3.91, SD = 1.44. The correlation between these scales was r(370) = .79, p < .01. Using a paired sample t-test, the instructor-related mean was significantly higher than the peer-related mean, t(371) = 10.25, p < .01. Overall, there was collective support for the research question, with the data analyses supporting a psychometrically-sound expanded measure of PFoOC.
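The RMSEA values reported above for the CFA models can be checked from each model's chi-square, degrees of freedom, and sample size. The sketch below uses a commonly cited point-estimate formula, RMSEA = sqrt(max(χ² - df, 0) / (df × (N - 1))); the function name is hypothetical, and it is an assumption that Amos computes the index in exactly this way, so the output is best read as a consistency check rather than a replication of the software.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# (chi-square, degrees of freedom, N) as reported in the text.
models = {
    "fall, initial": (99.43, 19, 359),
    "fall, modified": (69.92, 18, 359),
    "spring, initial": (89.00, 19, 372),
}
estimates = {label: rmsea(*values) for label, values in models.items()}
for label, value in estimates.items():
    print(f"{label}: RMSEA = {value:.2f}")
```

Rounded to two decimals, the three estimates agree with the reported values of .11, .09, and .10. The spring estimate sits just under .10, which is why that model meets the "not greater than .10" criterion only narrowly.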

Missing Data Bias
Given the large amount of missing data for each sample, it was important to check for systematic missing data bias (Roth, 1994). Using independent samples t-tests comparing missing versus complete data respondents, there were few significant differences across the study variables in each sample. For the summer sample, the only significant difference was that complete data respondents (M = 1.43) were less likely to be transfers than missing data respondents (M = 1.68), t(190) = 2.41, p < .05. For the fall sample, the only significant difference was that missing data respondents were older (M = 22 years old) than complete data respondents (M = 21 years old), t(633) = 2.67, p < .05. For the spring sample, missing data respondents were again older (M = 25 years old) than complete data respondents (M = 21 years old), t(610) = 7.76, p < .01. In addition, spring sample missing data respondents were more likely to be working (M = 1.77) than complete data respondents (M = 1.57), t(610) = 4.12, p < .01. Across the samples, there was some evidence that age affected missing data, but the larger sample sizes also contributed to smaller age differences being significant. Overall, across all the variables collected, there did not seem to be a concerning systematic bias, but rather a random pattern of missing data (Roth, 1994).
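The missing-data comparison above can be illustrated with a small independent-samples t-test. This sketch assumes the equal-variance (pooled) form of the test, which the text does not state explicitly; the function name and the transfer-status codes are hypothetical and are not data from this study.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic assuming equal group variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: each group's sample variance weighted by its df.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical transfer-status codes (1 = no, 2 = yes); not data from this study.
missing_group = [2, 2, 1, 2, 1, 2]
complete_group = [1, 2, 1, 1, 2, 1, 1, 1]
t_stat, df = pooled_t(missing_group, complete_group)
print(f"t({df}) = {t_stat:.2f}")
```

A positive statistic here means the first group (respondents with missing data) has the higher mean code, i.e., a larger share of transfer students in this made-up example.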

Discussion
To the authors' knowledge, this is the first study testing an expanded eight-item PFoOC measure. Prior research has used a four-item PFoOC version (Blau & Kapanjie, 2016; Blau & Drennan, 2017). The results of this study are promising for the revised eight-item PFoOC measure. As online course offerings increase (Babson Survey Research Group, 2017; Jain, 2015), a more detailed PFoOC measure, making specific item comparisons between online/hybrid versus F2F courses, is needed to better evaluate course integrity (Daymont et al., 2011). As traditional or F2F faculty transition to teaching online or hybrid classes, this eight-item measure can help them evaluate the effectiveness of their methods (Wingo et al., 2017).
For this study, across three separate samples, there were two related but distinct sources for comparison: instructor-related and peer-related. The instructor-related PFoOC scale was consistently rated significantly higher than the peer-related PFoOC scale. This suggests that, perhaps not surprisingly, the instructor continues to play the most important role in the perceived favorability of an online/hybrid course. Thus, it is critical to make sure that faculty feel as comfortable as possible with their technical skills (e.g., leading a WebEx session, developing a high-quality video lecture) as they prepare to teach in a virtual environment (Wingo et al., 2017). To be fair, this study did not control for the content of the online/hybrid course, and it is possible that a peer-based source could be higher in some courses (e.g., advanced, qualitative) versus others (e.g., introductory, quantitative). Neither course size nor instructors were controlled for in this study. Ideally, to best study PFoOC, an experimental design would be used, with students randomly assigned to either an online or an F2F section of the same-size course. Both sections would also be taught by the same instructor. This would allow for the strongest comparison of both groups' perceived favorability of the respective delivery methods. However, such a research design was not possible, and some elements, particularly randomly assigning students, would be difficult to carry out.

Study Limitations and Future Research
This study did not measure student learning styles, which may have affected how they evaluated online/hybrid versus F2F classes (Fendler, Ruff, & Shrikhande, 2016). All data collected were self-reported. However, a one-factor test (Podsakoff, Mackenzie, Lee, & Podsakoff, 2003) found that for the summer sample, the first factor accounted for 27% of the total variance, and there were six factors with eigenvalues of at least one. For the fall sample, the first factor accounted for 26% of the total variance, and there were six factors with eigenvalues of at least one. Finally, for the spring sample, the first factor accounted for 28% of the total variance, with six factors having eigenvalues over one. Thus, if the first factor represents method variance, it is not an overriding limitation. Since the research design was cross-sectional, no causal inference can be made about the relationship between the two PFoOC scales.
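The "first factor" share reported in the one-factor test can be illustrated as the largest eigenvalue of the correlation matrix divided by the number of variables. The sketch below is an approximation under stated assumptions: it uses the unrotated principal-component version of the test and finds the largest eigenvalue by power iteration, whereas the study's SPSS extraction settings are not reported; the function names and the data are hypothetical.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlations between the columns of a complete data matrix."""
    n = len(rows)
    z_cols = []
    for col in zip(*rows):
        m, s = mean(col), pstdev(col)
        z_cols.append([(x - m) / s for x in col])  # standardize each column
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_cols] for zi in z_cols]


def first_factor_share(corr, iterations=500):
    """Share of total variance carried by the largest eigenvalue of corr."""
    p = len(corr)
    v = [1.0 + 0.01 * i for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue / p  # a correlation matrix has total variance p


# Hypothetical scores on four variables (not data from this study).
scores = [
    [5, 6, 2, 3],
    [3, 3, 4, 4],
    [6, 7, 1, 2],
    [2, 3, 5, 5],
    [4, 4, 3, 5],
    [5, 5, 2, 1],
]
share = first_factor_share(correlation_matrix(scores))
print(f"first factor share = {share:.0%}")
```

A share well below half, as in the 26% to 28% reported for the three samples, is usually read as a sign that a single method factor does not dominate the covariance among the measures.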
As noted above, the instructor-related PFoOC scale was consistently rated significantly higher than the peer-related PFoOC scale by students. However, the role of the instructor in affecting PFoOC needs to be more fully assessed in future research. For example, have students taken a class with the professor previously? Although this item was not specifically asked, the correlation of the number of prior online courses students had taken with both PFoOC scales was analyzed for each sample. For the summer and fall samples, the correlations between the number of prior online courses taken and both PFoOC scales were non-significant. However, for the spring sample, there were small but significant positive correlations between the number of prior online courses taken and both instructor-related PFoOC, r(370) = .17, p < .01, and peer-related PFoOC, r(370) = .12, p < .05. Although multiple submissions from the same student within a sample were deleted (as noted earlier), it is possible that a non-graduating student could have participated in each of the independently gathered samples. As such, a "repeat" sample student could have been taking the same professor again in a different online/hybrid course, or the same online/hybrid course if re-taking it. Gathering independent instructor data, such as years of teaching experience; number of online or hybrid courses taught; number of times the instructor has taught a particular class; instructor perceptions of online versus hybrid versus F2F teaching methods; and, if possible, the instructor's teaching evaluations, could help to further understand the instructor's role. This would allow a stronger comparison between student and faculty perceptions of teaching online versus hybrid versus F2F classes.
All three samples were business undergraduates at a large public university. Testing the generalizability of this study's results using other business school undergraduate samples (e.g., nonurban, private college, and different region), as well as more general undergraduate samples, is important. Across each sample, there was a significant sample size loss due to missing data. This happened despite the use of incentives. Although the missing data seemed to be random, one option to consider in future research is requiring a respondent to complete all items on a survey page before being allowed to continue. The currently working measure was limited to either Yes or No, and did not measure how many hours/week a student worked. Future research is also needed considering course-level factors, such as quantitative versus qualitative or introductory versus advanced, since these factors may also impact student perceptions of their online learning (Comer et al., 2015).

Conclusion
The goal of this study was to test whether a psychometrically-sound expanded measure of PFoOC could be developed. Using three separate samples, the results were supportive. Two related but distinct source scales for comparison emerged: instructor-related and peer-related. The instructor-related scale was rated significantly higher than the peer-related scale. As online education continues to grow, ongoing research evaluating student perceptions comparing online/hybrid versus F2F courses is needed to assure course integrity across delivery modalities. The eight-item measure presented in this study is one tool for assessing this equivalence. We hope this measure is useful for future research evaluating online education as it continues to increase.

Table 1.
Frequencies and percentages for summer 2017, fall 2017 and spring 2018 demographic and background variables

Table 2.
Exploratory factor analysis for comparing face-to-face class to online class item loadings with two-factor extraction and oblique rotation