Cultural Adaptation of Research Instruments – The Case of Materialism Scales Culturally Adapted for Use in China

Introduction: This study seeks to understand whether a back translation process can produce culturally adapted statements. Economic materialism levels are often studied using materials designed in English for the Western subject. This study revolves around culturally adapting, for use in China, economic materialism research instruments that were originally created in English for use in the West. Methods: This study used an instrument translation process followed by a validation step in order to culturally adapt research constructs not originally designed for subjects in China. The study consisted of six main steps: forward translation, reconciliation, blind back translation, expert committee review, validation, and statistical analysis. Results: Inadequate back translated items were identified. The analysis revealed several items that should be redesigned specifically for the Chinese cultural context. This study shows that the back translation process may be insufficient, and that Western-developed research instruments can be improved through a process of cultural adaptation.


Introduction
China today is a far different place from the China of yesteryear. Where once its society was rabidly anti-materialistic, as it practiced "puritan communism" (Zhao, 1997, p. 42), today the Chinese people are hungry for consumer goods, as though in response to the many years of deprivation (Zhao, 1997). Chan and Prendergast (2007) define materialism as "a set of attitudes which regard possessions as symbols of success, where possessions occupy a central part of life, and which include holding the belief that more possessions lead to more happiness" (p. 213). However, efforts to research materialism and the values of material possession and acquisition, whether in China or elsewhere, commonly centre on research instruments such as surveys that were originally written in English and designed for a Western subject. Given the substantial differences in both culture and language between China and Western countries (Doctoroff, 2005; Nisbett, 2003), this approach is problematic. Research instruments that have been identified, validated, and declared reliable for Western contexts may generate invalid results in China, due to important differences between Western and Chinese cultures (Chan & Prendergast, 2007). The translation process used to adapt these Western research constructs for the Chinese population commonly involves a verification step, back translation, in which the material that has been translated into Chinese is translated back into English to verify its adherence to the original document. As a result, the English language and Western culture remain the hegemonic force in these translations. Nonetheless, back translations are commonly used in cross-cultural research.
The purpose of this study was to analyze two materialism value scales (MVS) designed for Western consumers, and to translate and culturally adapt them for use with Chinese consumers, with the goal of achieving more reliable results. Materialism has been scrutinized in some depth by researchers, but cross-cultural research concerning the Asian consumer market is now becoming important due to the rise of Asia in the global markets. The cultural and language differences between the East and West call for new approaches to understanding the values of the population (Podoshen et al., 2011; Watchravesringkan, 2012; McCarthy et al., 2013).
This study translated, and culturally adapted for use in China, the 18-item Materialism Value Scale (MVS) (Richins & Dawson, 1992) and the 24-item Materialism Scale (Belk, 1984). These are the two scales most frequently used by researchers (Watchravesringkan, 2012). Some researchers have recently suggested shortening the scales, such as Opree et al. (2011) in their work with children, and Trinh and Phau (2012) in their development of a 16-item scale. Nonetheless, the 18-item MVS of Richins and Dawson and Belk's 24-item MVS remain the classic scales used in research. In addition, there is a positive relationship between the two materialism instruments, as the Richins MVS correlates with Belk's instrument of materialism (Richins, 2004). The Richins MVS approaches materialism from the values perspective, in which materialism and the desire for possession and acquisition of goods result from commonly held values in a society; Belk's MVS approaches materialism as resulting from character traits (Zarco, 2014). Therefore, the two scales were expected to measure materialism from complementary, overlapping perspectives. The intention was to observe whether a simple back translation is sufficient for translating research instruments appropriately for research.

Literature Review
a. Translation and Cultural Adaptation of Research Instruments
Translation processes can include two approaches: forward translation and back translation. Forward translation is the process of translating the research document, or instrument, from its original language into a target language. Back translation is the process of translating the result of the forward translation back into the original language. When a different culture is involved, however, forward and back translation are not sufficient to attain cultural equivalence with the original instrument (Canino, Bird, Rubio-Stipec, & Bravo, 1997; Watchravesringkan, 2012). Each item has to contain the nuances of the original instrument, and if the items are not translated in exactly the same way, there is a risk that the translated item will refer to an adjacent concept or meaning. In some cases, only part of the original meaning can be preserved after translation (Canino et al., 1997). The loss of meaning, the narrowing of item meaning, or the broadening of the coverage of an item can be significant issues during both forward and back translation. To avoid such issues, a process of cultural adaptation that follows a mixed qualitative and quantitative approach is superior to arbitrary decisions made by individuals during research, because it quantifies the extent of agreement on the appropriateness of the translation and its cultural fit (Sumathipala & Murray, 2000).
The cultural adaptation of research instruments is among the most challenging and sensitive issues in translation, in part because of the multitude of languages and cultures. South Africa has eleven official spoken languages, and India's constitution recognizes twenty-two scheduled languages. The French language is significantly different in France than in Canada, and Spanish is different across Spanish-speaking countries. German is also spoken differently in Belgium, Switzerland, and Germany.
In China, there are five mutually unintelligible varieties of the Chinese language, and under each are multiple dialects that are variations and modifications of the five large groups. There is also a distinction between the written and the spoken language: Zhongwen (中文), the written language, and Hanyu (汉语/漢語), the spoken language. The Chinese language is supported by a pictographic script, and the language needs to provide pictures even in speech (Ozolins, 2009). For example, concepts like 'length' are explained as 'how short' or 'how long,' whereas 'very difficult' is a concept translated to the more pictographic description of 'large difficulties' (Ozolins, 2009).
In addition to these differences, there are nuances, idioms, and significant differences in local culture, understanding of concepts, literacy levels, and other issues (Andriesen, 2008). Cultural differences such as that of individualism versus collectivism may also play a role in the equivalency of concepts. Although it is generally believed that "materialistic values are common traits that exist in individuals across cultures" (Watchravesringkan, 2012, p. 236), nuances of these values may not be as common as we think. For instance, "[w]hile the United States is an individualistic culture where people view themselves as independent of others, Thailand is a collectivist society where individuals view themselves as interdependent with others and their behaviors are mainly driven by social norms" (Watchravesringkan, 2012, p. 237). Cultural differences such as this may influence the reception of a given value cross-culturally. In China, recent historical and cultural changes have altered certain time-honored values, such as that of the head of the household, generally the man, being the dominant driver of familial behaviors. Whereas the most highly desirable goods are those which most benefit the family, and family-centeredness remains of prime importance in China, the official one-child policy, which began in the early 1980s, has resulted in "an unprecedented generation of single children in cities, nicknamed the 'little emperors of China' because of the popular belief about the 'imperial' position they occupy within the family. This policy has, in a way, turned the traditional power structure of a Chinese family 'upside down'" (Zhao, 1997, p. 8).
A culturally equivalent research instrument has to take into account the culture's shared norms and socially desirable behaviors, together with shared beliefs, ideas, and assumptions about the world. Moreover, the translated instrument has to be in line with the local culture's expectations and moral standards, and, of course, it is desirable to consider linguistic appropriateness for the local target population's comprehension, reading skills, and education level. Even the simplest terms can have low equivalence; for example, intensity words such as 'moderate' and 'frequent' can be understood differently in different cultures, resulting in a lack of conceptual equivalence unless such words are carefully analyzed.
Translation may stand as support for any ethnographic research, and translating a research instrument is a critical task. Researchers in different fields need consistency in their research across multiple populations of different languages, and have only recently begun to identify the best practices for translating research instruments into other languages and adapting them for other cultures (Harkness & Schoua-Glusberg, 1998; Harkness, 2003). Standards for translating, and for evaluating translated instruments, vary across researchers and fields, and, at times, budgets and attention are not dedicated to translation efforts (Harkness, 2003; Rode, 2005; Montoya, 2011). This can result in poorly translated instruments that do not capture the full range of differences between languages and cultures (Harkness, Pennell, & Schoua-Glusberg, 2004; Harkness & Schoua-Glusberg, 1998; Harkness, 2003; Rode, 2005; Montoya, 2011).

b. Translating Research Data
Data collection for the purposes of research can be costly and time-consuming, and questionnaire development requires specialized knowledge; therefore, researchers often rely on data collection instruments already in existence. Adopting an available instrument is easier than creating one, and a number of validated instruments in most fields of research are used repeatedly by different researchers. A major problem arises when these instruments need to be used in different countries, requiring the instrument to be translated. The research questionnaires or other documents have been prepared for a population different from the one the researchers may be studying. Researchers use the 'ask-the-same-question' model as a means of preserving the properties of the instrument. The "translated items must present the same stimulus as the source questionnaire items and do so by referring to the same entities (abstract and concrete) as do the source items" (Harkness, van der Vijver, & Johnson, 2003, p. 37).
Generally, this results in a verbatim translation of the questions, which is precisely where the problem lies. Different cultures have different cultural schemas and worldviews, different lenses through which they see the world, as well as different labels and descriptions of concepts. A simple back translation is insufficient in numerous cases, including in medical back translation (Rode, 2005; Smith, 2003; Rosdolsky, 2010; Muller et al., 2013). The wording of items within surveys is culturally anchored (Harkness, van der Vijver, & Johnson, 2003). Therefore, some questions may only be appropriate in some cultural contexts, and the change of context may render the scale invalid due to more or less subtle cultural differences. Furthermore, connotations can be lost after translation, and words may have historical connotations in certain countries (Rode, 2005). The context can also have a significant influence on meaning (Smith, 2003).
All these issues can result in using a back-translated instrument that does not measure the intended concept. Back translation is not a sufficient solution to these issues, as it focuses mostly on semantics and less on comprehensibility, naturalness, or connotations (Van der Vijver & Leung, 1997). Current practices of back translation have been criticized as insufficient in cross-cultural research. Only parts of the original meaning can be expressed in different languages (Sumathipala & Murray, 2000). A simple back translation may result in translated items that are too narrow, capturing only a portion of the initial question, or items that have been extended to contain more than the original item. Sumathipala and Murray (2000) and Harkness (2003) both conclude that one individual producing a translation is far less effective than a group: "A group is better placed to translate, modify, or eliminate inadequate or ambiguous items and generate culturally appropriate translation with semantic and conceptual equivalents" (Sumathipala & Murray, 2000, p. 87). In the committee approach proposed by Harkness (2003), following the first back translation, a group of experts reviews the translation, produces different versions of the instrument, and then collaborates to reach a consensus on which version of the instrument is best. At the end of the process, they pretest the instrument, documenting the issues and problems encountered while providing solutions for future use (Harkness, 2003). The drawback of this method is that it is costly and time-consuming, and so it is usually avoided in everyday research. Many researchers consider that simple back translations are "often successful enough for the purpose of the research at hand," and "if some of the aforementioned problems occur due to mistakes in translation, there is no immediate way to see them so researchers can pretend they do not exist" (Rode, 2005, p. 17). This path is wrong for research for multiple reasons: it damages reputations, it is unethical, and it produces knowledge that cannot be relied upon. Rode's suggestion that rigor must be preserved in research is correct: "No matter how low the budget for research is, special care should be taken to check the quality of the translated instrument" (Rode, 2005, p. 25).
The present study follows a slightly modified version of the process that was tested and recommended by Harkness (2003). An additional step was added to the process to improve the chance of identifying items that were not culturally adapted. The added validation step was carried out with a different group of bilingual translators, and the results were analyzed statistically. This process resulted in the identification of several items considered to lack conceptual or normative equivalence. This happened when terms or concepts could not be translated, when the translation was considered difficult because the exact idiom or concept was not the same in the local language and culture, and when the modification of meaning led to a loss of connotation, making items too narrow or too wide (Sumathipala & Murray, 2000). To identify the items that lacked cultural adaptation, the committee had to fully agree on the quality of each translation; lack of consensus after long discussions revealed the items that were not translated to every member's satisfaction.
This study also used more translators than the guidelines generally recommend. In general, a small number of translators is considered sufficient for each direction of translation, whereas the present study used more than 100 translators for the two translation directions combined. Their suggestions, notes, observations, and ideas helped to improve the final instrument items, and allowed for a more in-depth analysis of each item.
In addition, since an in-depth literature review is considered to be a vital step for cultural adaptation (Gjersing et al., 2010), this study contains a review of literature in the traits and values domain. Each item of a research instrument has to be analyzed separately and assessed as to whether it is as relevant and acceptable in the target population as in the original population (Borsa, Damasio, & Bandeira, 2012). Such conceptual and item equivalence cannot be assessed without an in-depth literature review. Beaton, Bombardier, Guillemin, and Ferraz (2002), Harkness (2003), and Solano-Flores et al. (2005), as well as institutions such as the U.S. Census Bureau, have developed guidelines and translation processes to be used in cross-cultural research to avoid validity problems. The steps, following the principles of good practice recommended by Wild et al. (2005), are instrument creation, forward translation, reconciliation, back translation, and expert committee review. Usually, recommendations are made to return to the original target population and update the adapted instrument after consensus is reached; however, in the present study this was not possible, as the original and the target populations were different (Gjersing et al., 2010). Rather, this study included the additional validation step described above, in which the research instrument was tested with the aim of evaluating the understanding and the similarity of each item.

c. Instrument Translation Processes and Validity
The cultural adaptation of research instruments is of vital importance, as is clear from the literature that focuses on medical research (Beaton et al., 1998; Bullinger, 2003). However, this principle is equally true when psychological questions are at issue (Borsa, Damasio, & Bandeira, 2012; Watchravesringkan, 2012; McCarthy et al., 2013). Gjersing, Caplehorn, and Clausen (2010) used an instrument in Norway that had not been adapted, and the instrument failed confirmatory analysis. The authors explained that instruments need to be adapted from the perspectives of language, time, and context. Yet, despite the importance of cultural adaptation, there is little agreement as to how to achieve it. Gjersing et al. (2010) emphasized that previous validation of an instrument does not mean that the instrument is valid in another time, culture, or context. Linguistic translation does not necessarily ensure the validity and reliability of an instrument, and so it is vital that each item is adapted properly (Gjersing et al., 2010).
Back translation has been defined as translating while trying to change as little as possible in the final version (Drennan, Levett, & Swarts, 1991). Back translation has not been sufficiently studied, even though this procedure is ubiquitous in business, technical, and medical instrument translation (Ozolins, 2009). The generic guidelines that exist, for example the World Health Organization (1994) guidelines that insist on a process of forward translation, synthesis, back translation, and pilot testing, are challenged by the literature coming from sensitive areas of research, such as cross-cultural psychological studies. Bullinger's (2003) review of methods and debate over cultural adaptation and Ozolins's (2008) overview of methods describe this criticism. The criticism describes the most common issues, namely grammar discrepancies, translation noise, and cultural issues, which can ultimately be solved more or less rigorously, depending on the degree of confidence that authors want to have in the final version of the instrument. This differs across researchers, as some may treat translation as a less important step in their research process and not observe every nuance and variation that can exist or be missed in translation (Ozolins, 2009).
In studying translation, researchers have identified specific issues for cross-cultural research. For example, Behling and Law (2000) explored three levels of equivalence and found differences depending on the types of questions asked. They identified different types of questions, such as demographic, behavioral, and knowledge related, and concluded that semantic and conceptual equivalence are relatively easy to achieve, especially in the case of demographic questions, where ideas and words are more universal and commonly used. However, they warned that it is more difficult to achieve normative equivalence, as cultures differ radically. For example, even in the case of demographic questions, while semantic and conceptual equivalence can be achieved, cultures differ in their willingness to share personal information. From a normative perspective, equivalence is more difficult to achieve, as abstract concepts and ideas may not be equivalent or equally relevant across cultures (Behling & Law, 2000). Furthermore, there are often no equivalent terms for a given concept, and Western ideas of risk, health, and need, among others, may be viewed differently and be less dominant in other cultures (Hunt & Bhopal, 2004; Watchravesringkan, 2012).
A properly translated instrument must have semantic, conceptual, and normative equivalence to the original instrument. Semantic equivalence refers to the words and syntax of sentences, an important dimension, but less important than the conceptual or normative dimensions (Borsa, Damasio, & Bandeira, 2012). Conceptual equivalence refers to measuring the same concept as the source instrument, a dimension that might force changes in syntax and wording for the sake of moving closer to the original concept (Borsa et al., 2012). Normative equivalence refers to the successful measurement of different social norms across cultures, a dimension that creates further incentives to modify the instrument items for cultural considerations (Borsa et al., 2012). In addition to these dimensions of assessment, other types of assessment have been identified, such as experiential equivalence, or whether each item is applicable in the target culture (Borsa et al., 2012). The final choice of item translation to be used has to be made through a consensus among the judges (Gjersing et al., 2010). It has been recommended that the entire process be transcribed, and that item choice or removal be described, to provide a qualitative overview of the process; the present study followed this recommendation by providing the entire set of information related to the translation and cultural adaptation process. Full scripts describing requirements for each stage of the process are available in Beaton, Bombardier, Guillemin, and Ferraz (2002).
Researchers have identified specific problems concerning validity when translating research instruments across cultures. During translation, there are five kinds of validity problems (Bravo, Woodbury-Farina, Canino, & Rubio-Stipec, 1993; Gaviria et al., 1984): content validity, semantic validity, technical validity, criterion validity, and conceptual validity. Content validity refers to whether each item assesses concepts that are equally relevant in the investigated culture. Semantic validity refers to the use of words that carry a meaning similar to that of the original version. Technical validity refers to the achievement of similar effects in different cultures. Criterion validity refers to measuring the same concept. Finally, conceptual validity refers to asking questions that are relevant to the construct in both the original and the investigated culture.

Research Design
The objective of this study was to adapt two Western materialism scales for use in China: the 18-item Materialism Value Scale (MVS) (Richins & Dawson, 1992) and the 24-item Materialism Scale (Belk, 1984). To do so, the instrument translation process recommended by Harkness (2003) was followed, improved with the validation step adapted from Montoya, Llopis, and Gilaberte (2011), and an additional test was added using the median and the Mann-Whitney analyses. This study consisted of six main steps: forward translation, reconciliation, blind back translation, expert committee review, validation, and statistical analysis.
In the first step of the study, the materialism scales were translated into Chinese by Chinese students from large universities in Shanghai who had good levels of English. During this forward translation step, the students provided notes, observations regarding confusing items, issues, and suggestions. After the forward translation, the reconciliation step consisted of analyzing each translation, together with the comments and notes, and, with the help of two assistants, assembling the best synthesized Chinese translation for each item. This step took 2 days; the help of native Chinese assistants was required because the researcher is neither a native of China, able to understand cultural nuances, nor fluent in the Chinese language.
Next, during the blind back translation, other Chinese students, with similarly high English language skills but unfamiliar with the original English version, translated the Chinese version that resulted from the forward translation back into the English language, providing the researcher with another set of notes, observations, issues, and suggestions. The intention was to have an in-depth image regarding each item's issues, discrepancies, and translation concerns.
After the back translation, an expert committee, composed of seven members with similarly good English skills, was convened for a translation review and ample discussion regarding the translation and cultural differences, student notes, observations, and suggestions; this committee discussed each item in both the forward and back translations. Based on absolute consensus, the expert committee chose the best translation option for each item so that it fit best with the Chinese culture, worldview, and life concept. The committee received the materials before the committee meeting, and agreed on each translation by consensus; if no consensus was found, the ambiguous or inadequate items were eliminated. The process whereby an equivocal response or disagreement was considered to be failure to reach consensus is similar to the one used by Sumathipala and Murray (2000). Each translation stage (forward and back) took around 40 minutes, and the expert committee discussion took approximately 4 hours.
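The committee's decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the item identifiers, the vote labels, and the function name `apply_consensus_rule` are hypothetical, and the votes are made up. The only rule taken from the text is that any disagreement or equivocal response counts as failure to reach consensus.

```python
from typing import Dict, List, Tuple

# Hypothetical vote label; the study does not describe how votes were recorded.
AGREE = "agree"


def apply_consensus_rule(votes: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split items into (retained, eliminated) lists.

    An item is retained only if every committee member voted AGREE.
    Any other response (disagreement or an equivocal answer) counts as
    failure to reach consensus, so the item is eliminated.
    """
    retained: List[str] = []
    eliminated: List[str] = []
    for item, item_votes in votes.items():
        if item_votes and all(vote == AGREE for vote in item_votes):
            retained.append(item)
        else:
            eliminated.append(item)
    return retained, eliminated


# Made-up votes from a seven-member committee on three illustrative items.
example_votes = {
    "item_01": ["agree"] * 7,
    "item_02": ["agree"] * 6 + ["unsure"],
    "item_03": ["agree"] * 5 + ["disagree", "agree"],
}
retained, eliminated = apply_consensus_rule(example_votes)
print("retained:", retained)
print("eliminated:", eliminated)
```

Requiring unanimity makes the rule conservative: a single doubtful member is enough to flag an item for elimination or redesign.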
The items that remained after this committee deliberation were further assessed during an added validation step to evaluate the comparability of the language. Specifically, this involved evaluating the formal similarity of words; the similarity of interpretability, that is, the degree to which the original and translated versions engendered the same response, even if the wording was different; and the degree of understandability, that is, the degree of comprehension of the two versions, even if the wording was different. Two statistical methods were used to identify whether all items could be considered to be fully culturally adapted.
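As a rough illustration of the two statistical methods named in the research design (the median and the Mann-Whitney analysis), the sketch below computes both for one item. It rests on several assumptions that do not come from the study: the ratings are made up, the 1 to 7 rating scale and the two rater groups are hypothetical, and the helper names `midranks` and `mann_whitney_u` are invented for this example. Only the U statistics are computed; a significance test would normally be taken from a statistics package.

```python
from statistics import median
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Made-up 1-7 ratings of one item from two hypothetical rater groups.
group_a = [6, 7, 5, 6, 7, 6]
group_b = [3, 4, 5, 2, 4, 3]
print("medians:", median(group_a), median(group_b))
print("U statistics:", mann_whitney_u(group_a, group_b))
```

A low median rating, or a U statistic far from its expected value of half the product of the two sample sizes, would mark an item as a candidate for further review; the thresholds actually applied are not specified in this section.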

a. Support and Rationale
Consensus building and consensus measurement techniques were central in the cultural adaptation of the research instruments in this study. The committee approach was the means to consensus development, and the validity step was a statistical measurement for consensus (Jones & Hunter, 1995). Furthermore, the selection of committee members was completed in line with Hunter et al.'s (1994) guidelines: members should have very good knowledge of the English language, have expressed interest in the topic, and, further, should represent a variety of backgrounds, to avoid situations where a particular opinion or interest would dominate (Hunter, McKie, Sanderson, & Black, 1994). The researcher of this study participated as facilitator for group meetings but did not participate as a decision maker due to insufficient experience in the Chinese culture.
This study followed a methodology similar to one used in the medical field for the translation of medical instruments used in research (Wild et al., 2005). The lesson learned in the medical world was that, when language issues are not well understood, they can be taken for granted until something goes wrong. In medicine, one error can result in serious consequences and immense costs. Likewise, in business, errors can result in large costs and unexpected problems. Accordingly, brokering knowledge from the medical world is appropriate to ensure both efficiency and effectiveness during the translation and adaptation of instruments for business research.
The present study followed the best practices of most translation guidelines (Behling & Law, 2000; Solano-Flores & Hurtado, 2005). As suggested by the guidelines, at least two bilingual translators created independent forward translations. Written documentation was kept for each item and for each step of the process. The bilingual translators used in the second translation were unaware of the original version of the instrument and did not have access to the first translation. An expert committee composed of seven experienced bilingual speakers met to discuss and consolidate the best version of the two translations (Beaton et al., 2002; Harkness, 2003; Solano-Flores & Hurtado, 2005; United States Census Bureau, 2015). Moreover, the translators were provided with a description of the scope of the project, as indicated by the U.S. Census Bureau guidelines.
The procedure followed during the present study rested on the principle that no one individual should decide anything alone. Therefore, at least two people were involved in most steps. For example, after the translation from English to Chinese, both an assistant and an expert participated in deciding the best translation of each item into Chinese, resulting in a synthesized translated version. The expert used in this stage was a native speaker who had not been involved in the forward translation stage. This is consistent with Wild et al.'s (2005) description of best practices for the translation and cultural adaptation process. Wild et al. (2005) specified that, in reconciliation stages, it is acceptable to use "an independent native speaker of the target language who had not been involved in any of the forward translations" (p. 99). This study used two independent native speakers: the expert and an assistant. No participant was remunerated for participating in this study, as it was considered that material compensation could have a negative impact on the results.
The purpose of the large number of translators selected for this study was to provide a variety of interpretations and views, observations, and notes, avoiding the personal interpretations that could arise if only one or two translators had been used. The Belk (1984) MVS trait scale contains several items that are potentially ambiguous, especially in a Chinese context. Chan and Prendergast (2007) specify the difference, for instance, between the Western and the Eastern sense of hierarchy and group norms; therefore, this study deemed it preferable to use more translators in order to adjust more accurately to the cultural context.
In addition, the back translation method has been criticized for its bluntness and for its inability to identify subtle errors. The use of many translators, combined with the expert committee and the statistical validation, was intended as a guard against many of the limitations of the back translation method (Harkness & Schoua-Glusberg, 1998; Wild et al., 2005).
In addition, this study aimed to adapt the Western materialism research instruments for use in both urban and rural areas of China. Therefore, the adaptation needed to be appropriate for several reading and educational levels. In this case, the use of many translators with different levels of understanding was meant to reduce demographic limitations (Solano-Flores & Hurtado, 2005).
The use of multiple translators, committee members, and the statistical validation of consensus eliminated additional potential issues signaled by other studies (Gjersing et al., 2010), such as personal idiosyncrasies added by a limited number of translators, or a rushed back translation in which the researcher might have total decision power. It was necessary to remove the idiosyncrasies from the source instruments, as well as the outdated items, as the instruments were created in the 1980s and 1990s, and new settings due to changes in society over time had to be taken into consideration (Gjersing et al., 2010).

b. Methods
This study adapted the methodology of Harkness (2003) for translations. In this approach, in addition to forward translation, synthesis, and back translation, a group of experts reviewed translations, produced different versions of the instrument, and then collaborated to reach a consensus regarding which version of the instrument was best; at the end of the process, they pretested the instrument and documented the issues and problems encountered, while providing solutions for future use (Harkness, 2003). The back translation was performed by a large number of students, and the expert committee collaborated to reach a consensus regarding the best version for each item.
During the first stage, seventy-four Chinese college students translated the questionnaire from English to Chinese. The students completed the form Scale Translation Form ENG to CN, translating the original items from English into Chinese. They were asked to make notes wherever the items, phrasing, or concepts were confusing. After that, the translations were synthesized with the help of a Chinese university faculty member and a Chinese assistant, who chose the most common translations to create the first Chinese questionnaire. During the back translation step, forty-four college students from another university translated the synthesized questionnaire from Chinese back into English. These were different students from a different university, and they had no access to the original version of the questionnaires. This was a double blind translation, as the students did not see the original English version of the items. Moreover, each stage was performed with different groups of students from different universities to ensure the reliability of translations. During this stage, the students used the Scale Translation Form CN to Eng. The two translations led to a number of discrepancies: some items confused the translators; some items were difficult to understand conceptually and difficult to translate into Chinese; and other items did not offer equal choices (e.g., everybody would have chosen only one option).
After translation, an expert committee was summoned to decide the best translation for each item. The committee method is one of the most widely used methods in cross-cultural validation of instruments (Harkness, 2003). For this stage, the expert committee was composed of faculty members from two large public universities in Shanghai, all of them native Chinese with excellent English language skills. The committee members were seven Chinese natives with an average age of thirty, all working in higher education, all college graduates. The committee's objective was to review the generated scale translation and choose the best possible translation adapted for Chinese customs and culture. For each questionnaire item, there were seventy-four choices from English to Chinese, and forty-four translations from Chinese to English. The committee members were given lists with the most common translations for each item. The session took four hours. The agenda of the meeting had three points: (a) discuss problematic items, define, and choose the best translation for each; (b) discuss and answer any questions regarding any item; and (c) review all items and agree on a final translation.
The condition for accepting translated items in the present study was full consensus among the committee members, who were asked to explain whether they fully agreed or fully disagreed with the resulting translation. Disagreements were discussed and alternative translations were sought. Translations failing to achieve consensus were discussed in depth, and the initial translations were again scrutinized for these items.
The χ^2 table is used to find the critical value at the 5% significance level to test the hypothesis. If the calculated value is greater than the critical value, then H0 (i.e., k items have equal medians) is rejected; that is, at least one item has a different median than the others. If the calculated value is lower than the critical value at the 5% significance level, H0 is accepted (i.e., k items have equal medians). If the Median test is significant (calculated value > critical value at the 5% significance level), then the item with the largest number of 'poor' observations is removed, and the Median test is repeated on the rest of the items until H0 is accepted.
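The decision rule above can be illustrated with a minimal sketch in plain Python. It is not the procedure run in the study; the function name, the rating data, and the convention of counting scores equal to the grand median as "not above" are assumptions made for illustration only. The critical values are the standard upper 5% points of the χ^2 distribution.

```python
from statistics import median

# Upper 5% critical values of the chi-square distribution by degrees of
# freedom (standard table values; extend the table for more items).
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def median_test_statistic(groups):
    """Return (chi_square, degrees_of_freedom) for the k-sample median test."""
    pooled = [x for g in groups for x in g]
    grand_median = median(pooled)
    # 2 x k table: scores above the grand median vs. at or below it
    # (assumed tie convention).
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    total = len(pooled)
    total_above = sum(above)
    total_below = total - total_above
    if total_above == 0 or total_below == 0:
        # Degenerate table: no evidence of differing medians.
        return 0.0, len(groups) - 1
    chi_square = 0.0
    for g, a in zip(groups, above):
        exp_above = len(g) * total_above / total
        exp_below = len(g) * total_below / total
        chi_square += (a - exp_above) ** 2 / exp_above
        chi_square += ((len(g) - a) - exp_below) ** 2 / exp_below
    return chi_square, len(groups) - 1


# Hypothetical ratings (1 = poor ... 5 = excellent) for three items.
item_a = [4, 4, 5, 5, 5, 5]
item_b = [4, 5, 5, 4, 5, 5]
item_c = [1, 1, 2, 2, 3, 3]

chi_square, df = median_test_statistic([item_a, item_b, item_c])
reject_h0 = chi_square > CHI2_CRIT_5PCT[df]  # True: medians are not all equal
```

With these made-up ratings the statistic is about 7.2 on 2 degrees of freedom, which exceeds 5.991, so H0 would be rejected and the weakest item would be the candidate for removal.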

d. The Mann Whitney Test
The Mann Whitney test can be used to check whether two groups are identical. This powerful nonparametric test is an alternative to the parametric t test when the sample size is small or t-test assumptions need to be avoided. The test encompasses the following hypotheses. Suppose we have two groups, A and B. H0: There are no differences between the responses of the two groups. H1: There are differences between the responses of the two groups. The Mann Whitney test's methodology follows these steps: first, determine the sample sizes of the two groups (e.g., let n = sample size of Group A and m = sample size of Group B); second, pool the data and rank together the scores of both groups, assigning the rank of one to the score which is algebraically lowest.
Test statistics. The test statistic U is obtained by focusing on each of the groups in turn: U_1 = nm + n(n+1)/2 - R_1, where R_1 is the sum of the ranks assigned to the group of size n, and U_2 = nm + m(m+1)/2 - R_2, where R_2 is the sum of the ranks assigned to the group of size m. The smaller of the two U values should be taken as the test statistic.
Decision rule. If the smaller U ≤ critical value at the 5% significance level, then the null hypothesis is rejected.
When the sample sizes (n, m) increase, the sampling distribution of U rapidly approaches the normal distribution with mean μ = nm/2 and variance σ^2 = nm(n+m+1)/12; the test statistic Z = (U-μ)/σ ~ N(0, 1) can then be used to test the hypothesis. If the p value of Z is less than 0.05, then the null hypothesis is rejected at the 5% significance level (i.e., the compared item is significantly different from Item 1 for that particular characteristic). Note: If the p value of Z is less than 0.1, then the null hypothesis is rejected at the 10% significance level.
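The ranking, U statistic, and normal approximation described above can be sketched as follows. This is an illustrative plain-Python version, not the MINITAB routine used in the study; the function names are hypothetical, tied scores are assumed to share their mean rank, and the variance uses the formula given in the text without a correction for ties.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(group_a, group_b):
    """Return (U, Z, two-sided p value) using the normal approximation."""
    n, m = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    r1 = sum(ranks[:n])  # rank sum of the group of size n
    r2 = sum(ranks[n:])  # rank sum of the group of size m
    u1 = n * m + n * (n + 1) / 2 - r1
    u2 = n * m + m * (m + 1) / 2 - r2
    u = min(u1, u2)
    mu = n * m / 2
    sigma = sqrt(n * m * (n + m + 1) / 12)  # no tie correction (as in the text)
    z = (u - mu) / sigma
    p_value = 2 * NormalDist().cdf(-abs(z))
    return u, z, p_value


# Hypothetical scores for two small groups; real use needs larger samples
# for the normal approximation to be reasonable.
u, z, p = mann_whitney([1, 2, 3], [4, 5, 6])
significant_at_5pct = p < 0.05
significant_at_10pct = p < 0.10
```

Because rating data contain many ties, a statistical package would normally apply a tie correction to σ; the sketch omits it to stay close to the formula as stated.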

Details of Analysis and Results
During the first translation from English to Chinese, the seventy-four translators had to read and translate each item from the questionnaire. They were given ample space for notes and observations and were asked to explain whether the English item made sense to them, to avoid items that led all respondents to only one choice. The respondents were also asked to explain confusing items, reasons for confusion, and potential interpretations and translations into Chinese. The notes were compiled during reconciliation with the help of two native Chinese people who were not involved in the forward translation. After reconciliation, the best translation was synthesized into the first version of the questionnaire in Chinese. This synthesized version was used in the back translation into English.

a. Translations from Chinese to English
After deciding the most frequent, best translation from English to Chinese, a new form was compiled. A group of forty-three translators translated the new Chinese questionnaire into English. This was a double-blind translation, as the students translating the new Chinese questionnaire into English did not see the original English version of the items that resulted in the development of the new Chinese questionnaire. Moreover, each stage was performed with groups of students from different universities to ensure the reliability of translations, and to conform to the requirements of a blind back translation. In translating from Chinese back to English, there were significantly fewer issues than in the earlier English to Chinese translation.

b. Expert Committee
After the back translations were completed, a group of experts reviewed these translations and collaborated to reach a consensus regarding which version of the instrument was best. This review happened in a session that took approximately four hours. The committee members received all material in advance and were asked to bring their notes to the meeting. Because of their interest in the topic, the experts spent many hours reading the materials and preparing for the meeting. The agenda of the meeting had three points: (a) discuss all items, define, and agree on each translation; (b) brainstorm, ask and answer any questions regarding any item; and (c) review all problematic items. The following issues emerged from the discussion. First, a number of expressions seemed more difficult to translate and comprehend, given the rigor/rigidity of the Chinese language. Expressions such as "pay much attention" (6), "all that important" (9), "don't mind" (32), "having luxury in one's life" (12), "wouldn't be any happier" (16), and "appealing" (19) posed concerns for committee members. Second, some items raised questions regarding meaning in English, as previously discussed. The committee reached a consensus for each item analyzed. The problematic items where consensus for translation was not reached were considered inadequate.

c. Translation Validation
After the committee meeting, forty-four students not involved in the earlier translation processes evaluated the original English items and the resulting Chinese translation in terms of equivalence. The criteria used included the following: comparability of language, which refers to formal similarity of words; similarity of interpretability, which refers to the degree to which the original and the translated versions engender the same response, even if the wording is different; and degree of understandability, which refers to the degree of comprehension of the two versions, even if the wording is different.
Participants were asked to evaluate the translated items, and the validation form presented each item in both the original version and back-translated version. The scale used for the evaluation of each item was as follows: (1) extremely comparable, similar, or understandable; (4) moderately comparable, similar, or understandable; and (7) not at all comparable, similar, or understandable.
Typically, the middle value should be neutral, such as don't know/not sure/no opinion; however, the researcher intended to use the validation methodology as proposed by Montoya, Llopis and Gilaberte (2011), avoiding the potential for central tendency bias where many respondents choose the neutral, middle value, regardless of how they think. In addition, especially for the validation stage, the intention was to use the respondents' responses in a meaningful way; accordingly, the decision was made to drop the midpoint for this stage. As this stage was testing the level of quality of a translation, it made sense to eliminate the middle value and look in more depth at quality by adding a third positive. In addition, if the strongly disagree to strongly agree scale is theoretically a continuum from -1 to +1, in the case of this study (and in general in cases of quality evaluation, as this study concerns a quality evaluation scale), the scale is a continuum from 0 to 1: it is good (and how good it is, on various levels) or not.
There is no equivalent of strongly disagree or -1. Poor or very poor is the equivalent of 0, unsatisfactory quality, while excellent is the +1. Forty-four respondents, all Chinese students studying English language at the university level, completed the validation form. The 32 initial items were numbered in order.

d. Validation Methodology
The quality of each item is scaled using five categories: poor, fair, good, very good, and excellent. These categories are assigned values to aid in analysis, as follows: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.
Step 1 - All k items are separately compared for three characteristics using the Median test (i.e., comparability of language, similarity of interpretability, and degree of understandability of each item).
Step 2 - (a) If the Median test is significant (the null hypothesis is rejected) for a particular characteristic, Step 1 is repeated after removing the item that has the most observations below a score of 3 (i.e., the item that has more 'poor' and 'fair' observations). (b) If the Median test is not significant, Step 3 is performed after removing the significant items from Step 1.
Step 3 - Each remaining item is compared using the Mann Whitney test. These are compared with Item 1, which has no 'poor' or 'fair' observations. Since the number of observations is larger than 20, the Z statistic can be used to test the hypothesis.
If an item concerned is significant (different) in a particular characteristic when compared to Item 1, then that item is marked as such.
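The removal loop in Steps 1 and 2 can be sketched as below. This is an illustrative reading of the procedure, not the study's actual analysis: the function names, the item labels, and the ratings are hypothetical, scores equal to the grand median are assumed to count as "not above", and the small critical-value table only covers up to four items.

```python
from statistics import median

# Standard upper 5% chi-square critical values for 1 to 3 degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}


def median_chi_square(ratings):
    """Chi-square statistic of the k-sample median test for {item: [scores]}."""
    pooled = [x for scores in ratings.values() for x in scores]
    grand = median(pooled)
    above = {k: sum(x > grand for x in v) for k, v in ratings.items()}
    total_above = sum(above.values())
    total_below = len(pooled) - total_above
    if total_above == 0 or total_below == 0:
        return 0.0  # degenerate table: treat as not significant
    chi = 0.0
    for k, v in ratings.items():
        exp_above = len(v) * total_above / len(pooled)
        exp_below = len(v) * total_below / len(pooled)
        chi += (above[k] - exp_above) ** 2 / exp_above
        chi += (len(v) - above[k] - exp_below) ** 2 / exp_below
    return chi


def prune_items(ratings, crit=CHI2_CRIT_5PCT, threshold=3):
    """Repeat the Median test, dropping the weakest item while it is significant."""
    kept = dict(ratings)
    removed = []
    while len(kept) > 1 and median_chi_square(kept) > crit[len(kept) - 1]:
        # Weakest item: most observations below the threshold ('poor' or 'fair').
        worst = max(kept, key=lambda k: sum(x < threshold for x in kept[k]))
        removed.append(worst)
        del kept[worst]
    return kept, removed


# Hypothetical ratings for one characteristic (1 = poor ... 5 = excellent).
ratings = {
    "item_1": [4, 4, 5, 5, 5, 5],
    "item_2": [4, 5, 5, 4, 5, 5],
    "item_3": [1, 1, 2, 2, 3, 3],
}
kept, removed = prune_items(ratings)
```

In this made-up example the first pass is significant, "item_3" is removed because it has the most scores below 3, and the second pass is no longer significant, so the remaining items would move on to the Mann Whitney comparison of Step 3.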
The Median test results for each characteristic are presented in Figure 3. The Median test concluded that, apart from Items 21 and 23, the other items have similar medians (i.e., the other 30 items are not significantly different from the median of 3). This observation is based on the number of observations for comparability of language, similarity of interpretability, and degree of understandability. Items 21 and 23 are significantly different and therefore not fully culturally adapted.
The Mann Whitney test was performed to compare the remaining items with Item 1 on each of the three characteristics. This test was performed using the MINITAB statistical package. The results were interpreted as follows: if at least one characteristic is significantly different from Item 1, that item is considered not culturally adapted, and therefore not suitable for use in the collection of information. As a result, Items 4, 8, 14, 15, 17, 18, 20, 24, and 32 were considered significantly different and not culturally adapted.

e. Validation Discussion
Using both non-parametric test results, Items 4, 8, 14, 21, and 23 were significantly different from their original version. This result is based on the comparison of the comparability of language, similarity of interpretability, and degree of understandability of the translated version. Comparability of language was significantly different in Items 15, 24, and 32, but the other two characteristics were not significant at the 5% significance level. Items 20 and 18 were significantly different for similarity of interpretability, but not for the other two characteristics. Item 17 was not significant for similarity of interpretability but was significant for the other two characteristics.
Translated items (Items 4, 8, 14, 15, 17, 18, 20, 21, 23, 24, and 32), with a 5% significance level (95% confidence), cannot be validated for use in the survey. This is relatively consistent with the results. However, the results do permit the final questionnaire to contain items with several "poor" evaluations in all characteristics, with a 5% significance level (i.e., Items 11 and 13). This was believed to have the potential of skewing the results. Therefore, the Mann Whitney test was repeated with a 10% significance level.
In addition to the items rejected at the 5% significance level, Items 9, 11, 12, 13, and 29 are significantly different from Item 1 at the 10% significance level.
When a 10% significance level (90% confidence) was considered for statistical comparison, the items that were found to be not culturally adapted could not be validated for use in the survey.

Conclusion
During the translation and validation stages, the major takeaway was the importance of an in-depth translation and validation. The simple back translation does not work in the Chinese culture, which is so different from Western cultures. Beyond cultural differences, language poses multiple difficulties, as the culture reflects a different system of thinking that exists in China. For instance, the rigidity of the Chinese structure made literal translations almost impossible. The visual ways of the Chinese language found "expensive homes" translated as "luxurious" and "expensive clothes" as "fine" or "designer clothes." Moreover, verbs can express different meanings, such as the verb "羡慕 [xiànmù]," which means both "admire" and "envy." Furthermore, buying can mean shopping 购物 [gòuwù] or paying 买东西 [mǎi dōng xī], and pleasure can be expressed as "fun," "delight" 乐趣 [lèqù], or "happiness" 快乐 [kuàilè]. Each of these terms has uses in certain contexts, and missing such nuances can result in a poorly designed question or survey and, ultimately, in poor findings and conclusions.
A different type of confusion related to questions that were contradictory or that asked obvious things from a Chinese cultural perspective. For example, a question asked to what extent "acquiring material possessions" is important. This acquisition is key in the Chinese culture, and so everyone will say it is important. This response does not necessarily reflect materialism, but rather cultural aspects of family life and face. Another example, the item "I have all the things I need to enjoy life," is interpreted this way: things one really needs are not used for enjoying life but for satisfying basic needs. When this point was discussed during the English-to-Chinese translation stage, very few translators seemed to believe that having all the things one needs to enjoy life is ever possible.
An important recommendation emerging from this study concerns the practice of back translation for research instruments, which is insufficient in cross-cultural contexts when the cultural distance is significant, as it is between the Western cultures for which instruments were first developed and the Chinese culture. Most research instruments must be culturally adapted for China in a process of translation and cultural adaptation as portrayed here. Subtle cultural details cannot be uncovered by a simple back translation, but a problem exists with this method in that it is a costly, time-consuming process usually avoided in everyday research. Because of this obstacle, many researchers consider that simple back translations are "often successful enough for the purpose of the research at hand" (Rode, 2005, p. 17), which is contradicted by the present study. Rode's (2005) suggestion that a rigorous manner and direction must be preserved in research is correct and supported by the current study, and indeed, special care should be taken to check the quality of the translated instrument.
Studies by Wallendorf and Arnould (1988) and Rudmin (1988) advocated that Belk's (1984) materialism scale is more suitable for the United States' culture than for other cultures, particularly more suitable than for those of the so-called "Third World" (Rudmin, 1988; Wallendorf & Arnould, 1988). Ger (1990) attempted to remedy this situation, but the conclusion was that, "It is clear that despite the attempt to construct a cross-culturally reliable materialism scale, the resulting scale is more reliable in the United States and Europe than in Turkey" (Ger, 1990). He argued that this does not mean the scale can't be used, as he thought the materialistic consumer culture is being increasingly emulated around the world. The present study contradicts this theory and shows that the cultural adaptation process of a Western instrument can result in significant modifications of the original instrument, requiring revalidation and reliability analysis.
If the concepts are interpreted and viewed in significantly different ways across cultures, there is a need to improve the research instrument. Creating equivalent items might be impossible, but researchers can follow the indications of Sumathipala and Murray (2000) and remove those items that are inappropriate after translation. Elimination of items can also be based on item format or duplication, criteria used by Farmer and Sundberg (1986).
Another possible solution could be creating research instruments that are designed specifically for use in China. These instruments could be shortened versions of Western instruments, where inadequate statements would be removed. The idiosyncrasies from the source instruments must also be removed, as well as the outdated items, as many research instruments were created in the 1980s and 1990s, and new settings due to changes in society over time have to be taken into consideration (Gjersing et al., 2010).
Elimination of items could be performed based on feedback, literature review, and ultimately by expert committee decision-making. Certainly, previous research has employed significant modifications, such as adding or removing items. As this study has shown, the resulting instruments have to be evaluated and checked for validity and reliability, but in some previous research this step has been overlooked; moreover, no criteria were defined for reaching item elimination or addition decisions, and every decision was based solely on the subjective judgment of the researcher and group discussions (Gjersing et al., 2010). This did not occur in the present study, where all guidelines were strictly followed and additional steps were created for the sake of reliably adapting the two materialism scales. Furthermore, the in-depth information provided in this study will act as support for further efforts in this direction of adapting and using better research instruments.
Very little has been written regarding how a scale could be shortened. From Richins' (2004) examination of forty-four studies, thirteen were found to have used shortened versions of the materialism (MVS) scale, and ten of these studies used an ad hoc way to shorten the scale (Richins, 2004). Most of the studies analyzed used only six or seven items, chosen entirely at the discretion of the researcher. Therefore, Richins identified a need for a shorter materialism scale and evaluated the original scale of materialism, identifying the best and worst performing items. The worst items were 6, 7, 10, and 13, and the present study also identified Items 7, 10, and 13 from the Richins MVS scale as being difficult to culturally adapt. Items 1 and 5 were found to perform best in terms of validity, and these items were successfully validated in the present study. Comparing the results of the Richins MVS item analysis with the present study's evaluation of translations, in most cases poorer items were identified during the present study's translation and validation stages, and items that were kept were also strong in Richins' study. In addition, the shorter versions of 15, 9, 6, and 3 items were all found reliable and valid, as was the result in the Richins study's statistical analysis, and all original and shortened scales were found to correlate at analogous scale levels with the Belk scale (Richins, 2004). Therefore, the combination of all the translated and culturally adapted items from both scales could provide useful results, but further investigation is required before this conclusion can be drawn with confidence.
There is a positive relationship between the two materialism instruments, as the Richins MVS scale was found to correlate with Belk's instrument of materialism (Richins, 2004). Therefore, the two scales could be expected to measure materialism from complementary perspectives. In addition, there are simplified versions of each instrument, each of these versions measuring materialism with certain validity and reliability. Therefore, it can be assumed that the two scales overlap, and it might make sense to merge the culturally adapted items into one instrument and attempt to identify whether it can provide any useful result.