Connaway and Powell (2007)

__Chapter 9 Analysis of Data__

There are three steps in data analysis:

**Step 1 Establish a set of categories or values**

- Categories should be exhaustive and mutually exclusive.
- Examples: reference questions (directional or non-directional); reference collection size (number of volumes).

**Step 2 Coding data**

- Coding data means converting responses to numerical codes (if they are not already in numerical form). It may involve regrouping.
- Inaccuracy may arise from a poorly worded questionnaire or from assigning a response to the wrong category.
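The coding step can be sketched in a few lines of Python. This is only an illustration: the codebook, the numerical codes, and the catch-all code `9` are assumptions made for the example, not values taken from Connaway and Powell.

```python
# Hypothetical codebook: maps each response category to a numerical code.
CODEBOOK = {"directional": 1, "non-directional": 2}
UNCODABLE = 9  # assumed catch-all code so that the categories stay exhaustive


def code_response(response: str) -> int:
    """Return the numerical code for a raw questionnaire response."""
    key = response.strip().lower()  # normalize spacing and capitalization
    return CODEBOOK.get(key, UNCODABLE)


raw_responses = ["Directional", "non-directional ", "directional", "unclear"]
coded = [code_response(r) for r in raw_responses]
print(coded)  # [1, 2, 1, 9]
```

Normalizing the text before the lookup reduces one source of inaccuracy mentioned above: the same answer landing in different categories because of spelling or spacing.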

**Step 3 Analyzing data**

We may use:

- Descriptive statistics – e.g. frequency distributions, graphs, charts, measures of central tendency (mean, median, mode), measures of dispersion or variability (mean deviation, standard deviation), and correlation coefficients (cross-variation or bivariate frequency)
- Inferential statistics – parametric or non-parametric

Wildemuth (2009)

__Chapter 29 Content Analysis (Quantitative)__

**Quantitative content analysis** is the systematic, objective, quantitative analysis of message characteristics. A message travels from a source to a destination and may be recorded in paper, digital, analog, audio, video, or other formats.

**Types of content to be analyzed:**

- Manifest content – exists unambiguously in a message. It is countable and easy to observe, e.g. the occurrence of a word.
- Latent content – conceptual content. It cannot be directly observed and is difficult (or impossible) to count. It is usually measured with manifest content indicators.
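Counting manifest content, such as the occurrence of a word, is easy to automate. The sketch below is a minimal illustration; the sample message and the tokenization rule (lowercase letters and apostrophes only) are assumptions made for the example.

```python
import re
from collections import Counter


def count_words(message: str) -> Counter:
    """Count occurrences of each word, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", message.lower())
    return Counter(words)


message = "The library opens at nine. The Library closes at five."
counts = count_words(message)
print(counts["library"])  # 2
```

Latent content cannot be counted this way directly; a researcher would first have to decide which countable words or phrases serve as its indicators.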

Wildemuth (2009)

__Chapter 30 Qualitative Content Analysis__

**Qualitative content analysis** analyzes speeches/texts in their specific contexts. It goes beyond counting words or extracting objective content. It allows the researcher to understand social reality in a subjective but scientific manner. Unlike quantitative content analysis, it uses an inductive reasoning process. (Quantitative content analysis is criticized for missing semantic information.)

**Steps in Qualitative Content Analysis:**

- Prepare the data (e.g. all questions or only the main questions, transcribed literally or in summary)
- Define the unit of analysis (individual themes are usually used as the unit of analysis rather than physical units like words)
- Develop categories and a coding scheme (a unit of text may be assigned to more than one category; the categories need not be mutually exclusive)
- Test your coding scheme on a sample text
- Code all the text (as coding proceeds, new themes and concepts may emerge and be added to the coding manual)
- Assess coding consistency
- Draw conclusions from data
- Report methods and findings
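As a rough illustration of the "assess coding consistency" step, the sketch below compares the codes two coders assigned to the same text units. Simple percent agreement is an assumption chosen for brevity; the notes do not say which consistency measure to use, and the category names are hypothetical.

```python
# Hypothetical codes assigned by two coders to the same four text units.
# A unit may belong to more than one category, so codes are stored as sets.
coder_a = [{"access"}, {"cost", "access"}, {"staff"}, {"cost"}]
coder_b = [{"access"}, {"cost"}, {"staff"}, {"cost"}]


def percent_agreement(a, b):
    """Share of units on which both coders assigned exactly the same categories."""
    if len(a) != len(b):
        raise ValueError("Both coders must code the same units")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


agreement = percent_agreement(coder_a, coder_b)
print(agreement)  # 0.75
```

The disagreement on the second unit is the kind of case that would prompt a revision of the coding manual before the remaining text is coded.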

Wildemuth (2009)

__Chapter 31 Discourse Analysis__

**Discourse analysis** is the analysis of discourse/communication. Discourse covers all kinds of spoken interaction, formal and informal, as well as written text (e.g. reference interviews, professional literature). The analysis looks for agreed-upon themes as well as differences, within a single text or across texts. The underlying assumption is that people use speech/text to construct versions of their social world. Different people may have different interpretations, and one person may hold multiple perspectives. Its weakness is subjectivity at every step.

**Suggested steps:**

- Research question
- Select sample discourse
- Collect records and documents
- Coding the data
- Identifying themes within categories that emerge and take shape as you examine texts.
- Develop category scheme – texts may be categorized in more than one category.
- Analyze data – this involves close reading and rereading of texts: search for patterns, similarities, contradictions, and vagueness; consider and reconsider patterns; search for evidence for and against your hypotheses; and make notes.
- Validate your findings

Wildemuth (2009)

__Chapter 32 Analytic induction__

**Analytic induction** is a form of inductive reasoning. It is used to analyze data from various case studies, ethnographic observations, participant observations, semi-structured interviews and structured interviews.

**Steps in analytic induction**

- Rough definition of the phenomenon
- Develop hypothetical explanation
- Choose cases, study them, and check whether they are consistent with the hypothesis.
- When a negative case occurs, either redefine the phenomenon to exclude the negative case or reformulate the hypothesis.
- Continue to study more cases.
- Consider known negative cases as well.
- Deliberately selecting likely negative cases, and accounting for them, is a requirement.

Wildemuth (2009)

__Chapter 33 Descriptive Statistics__

The role of descriptive statistics is to summarize your results.

**Variables and levels of measurement**

- Nominal variables (categorical) – they have no true numerical value, and no calculations can be done on them.
- Ordinal variables – rank-ordered variables, e.g. level of education.
- Interval variables – these are also ordered, but have a uniform distance between possible values. We can perform some basic calculations on them.
- Ratio variables – these are also ordered with equal intervals, and they have a true zero point. Ratios can be calculated on these.

**Measures of central tendency**

The purpose is to find one value that best describes the data, e.g. the mean, median, or mode.

- For nominal data, we can only use the mode.
- The mean is the most stable measure and can be used in further calculations. However, it may be distorted by outliers (very high or very low values).
- The mode is an actual value from the data. It is not affected by outliers.
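The effect of an outlier on each measure can be checked with Python's standard `statistics` module. The scores below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 4, 5]
with_outlier = scores + [40]  # one extreme high value added

print(statistics.mean(scores), statistics.median(scores), statistics.mode(scores))
# 3.4 3 3
print(statistics.mean(with_outlier), statistics.median(with_outlier),
      statistics.mode(with_outlier))
# 9.5 3.5 3
```

The single outlier pulls the mean from 3.4 to 9.5, while the median moves only slightly and the mode does not move at all.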

**Measures of dispersion**

Measures of dispersion show how far scores spread out around the central point.

E.g. range, interquartile range (range of the middle 50%), variance, standard deviation, confidence intervals.
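Several of these measures can be computed with the standard `statistics` module, as in the sketch below. The scores are made up, the population (rather than sample) formulas are an assumption, and `statistics.quantiles` requires Python 3.8 or later. Confidence intervals are left out because they need additional distributional assumptions.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)
# Quartile cut points; the exact values depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                             # spread of the middle 50%
variance = statistics.pvariance(scores)   # population variance
std_dev = statistics.pstdev(scores)       # population standard deviation

print(value_range, variance, std_dev)  # 7 4 2.0
```

The standard deviation is the square root of the variance, which puts it back in the same units as the original scores.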

Wildemuth (2009)

__Chapter 34 Frequencies, Cross-tabulation and the Chi-Square Statistic__

**Cross-tabulation tables (contingency tables/bivariate tables)** show the categories of one variable as rows and the categories of the second variable as columns. Each cell reports the number of cases that belong in that cell, i.e. cases that fit that particular combination of categories of the two variables.
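A cross-tabulation can be built by counting how often each pair of categories occurs. The sketch below also computes the Pearson chi-square statistic named in the chapter title, using the standard formula (sum of squared differences between observed and expected counts, each divided by the expected count). The notes do not spell out that formula, so treat this as an assumption; the survey records are hypothetical, and no p-value is computed.

```python
from collections import Counter

# Hypothetical survey records: (user type, preferred format).
records = [
    ("student", "print"), ("student", "digital"), ("student", "digital"),
    ("student", "digital"), ("faculty", "print"), ("faculty", "print"),
    ("faculty", "print"), ("faculty", "digital"),
]

cell_counts = Counter(records)
rows = sorted({r for r, _ in records})   # ['faculty', 'student']
cols = sorted({c for _, c in records})   # ['digital', 'print']
table = [[cell_counts[(r, c)] for c in cols] for r in rows]


def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


print(table)              # [[1, 3], [3, 1]]
print(chi_square(table))  # 2.0
```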

**Pie charts** are used to understand how a particular variable is distributed/to show proportions.

Don’t use pie charts:

- When respondents may have selected more than 1 choice
- When there are too many categories (don't use them for more than 6 categories)

Wildemuth (2009)

__Chapter 35 Sequential event analysis__

**Markov model approach**

It examines sequences step by step, breaking a sequence into individual steps to examine their order. It assumes that the conditional probability of the occurrence of a certain event depends only on the event immediately preceding it. Types of Markov model:

- Zero-order model – shows only frequency of occurrence of each event.
- 1st-order Markov model (state transition matrix) – shows the probability, as a percentage, of moving from the row state to the column state.
- 2nd-order Markov model – considers the 2 preceding steps.
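The zero-order and first-order models can be sketched as follows. The event sequence is hypothetical, and the nested-dictionary layout of the transition matrix is just one convenient representation, not the one used in the book. Probabilities are given as fractions of 1 rather than percentages.

```python
from collections import Counter, defaultdict

# Hypothetical sequence of events in one search session.
events = ["query", "view", "query", "view", "save", "query", "view"]

# Zero-order model: frequency of each event, ignoring order.
zero_order = Counter(events)


def transition_matrix(sequence):
    """Return {row_state: {column_state: probability}} for adjacent event pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        pair_counts[current][following] += 1
    matrix = {}
    for state, counts in pair_counts.items():
        total = sum(counts.values())
        matrix[state] = {nxt: n / total for nxt, n in counts.items()}
    return matrix


first_order = transition_matrix(events)
print(zero_order["query"])          # 3
print(first_order["view"]["save"])  # 0.5
```

Each row of the matrix sums to 1, because every occurrence of the row state (except possibly the final event) is followed by exactly one next state.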

**Optimal matching approach** – compares the similarities and dissimilarities of two complete sequences.
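Optimal matching is commonly based on an edit distance: the minimum cost of insertions, deletions, and substitutions needed to turn one sequence into the other. The sketch below assumes a cost of 1 for every operation, which is a simplification; real studies often tune these costs. The session data are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn seq_a into seq_b (unit cost per operation)."""
    previous = list(range(len(seq_b) + 1))  # distances from an empty prefix
    for i, a in enumerate(seq_a, start=1):
        current = [i]
        for j, b in enumerate(seq_b, start=1):
            substitution = previous[j - 1] + (a != b)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


session_1 = ["query", "view", "save"]
session_2 = ["query", "view", "view", "save"]
print(edit_distance(session_1, session_2))  # 1
```

A smaller distance means the two sequences are more similar; here a single inserted "view" event separates them.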

Wildemuth (2009)

__Chapter 36 Correlation__

**Correlation** is used to examine the relationship between two variables. The squared correlation coefficient gives the proportion of the variability in one variable that is explained by the variability in the other variable. A scatter diagram may be used to visualize the relationship.

**Types of Correlation**

- When two variables are perfectly correlated, all the variability in one variable is explained by variability in the other. When there is no correlation, none of the variability in one is explained by the other.
- Positive correlation means that when one variable increases, the other increases as well. Negative correlation means that when one variable increases, the other decreases.
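Perfect positive and negative correlation can be illustrated with a small Pearson correlation function. Pearson's r is assumed here because it is the most common coefficient; the notes do not name a specific one, and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either variable is constant.
    return covariation / math.sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]
visits = [2, 4, 6, 8, 10]   # rises with hours
fines = [10, 8, 6, 4, 2]    # falls as hours rise

print(pearson_r(hours, visits))  # 1.0
print(pearson_r(hours, fines))   # -1.0
```

Squaring either result gives 1.0, meaning all of the variability in one variable is accounted for by the other.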

Wildemuth (2009)

__Chapter 37 Comparing means__

**T-tests** are used to compare results from two groups (two samples) by comparing their mean scores. The t-score expresses the difference between the two sample means relative to the variability in the data. The p-value shows the probability that a difference this large would arise by chance if there were no real difference between the groups.
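A t-score can be computed by hand as the difference between the two means divided by its standard error. The sketch below assumes Welch's unequal-variance form, since the notes do not say which variant of the t-test is meant, and it stops short of the p-value, which requires the t distribution (typically taken from a statistics package). The group scores are made up.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """t-score for the difference between two sample means (Welch's form)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


group_a = [4, 5, 6, 7, 8]
group_b = [1, 2, 3, 4, 5]
print(welch_t(group_a, group_b))  # 3.0
```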

**Analysis of variance (ANOVA)** compares means like t-tests, but across more than two groups. Variance between groups is treated as “signal” and the variance within each group as “noise”. When there is a lot of signal but not much noise, there is a meaningful difference among the means. When there is little signal but a lot of noise, there is little evidence of a difference.
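The signal-to-noise idea corresponds to the F statistic of a one-way ANOVA: the between-group mean square divided by the within-group mean square. The sketch below is a minimal illustration with made-up groups; it computes only the F statistic, not the associated p-value.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # "Signal": how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # "Noise": how far individual scores lie from their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f_statistic(groups))  # 27.0
```

A large F, as here, corresponds to a lot of signal relative to noise.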

Sources

Connaway, L. S. & Powell, R. R. (2007). Chapter 9. In *Basic Research Methods for Librarians* (5th ed.). California.

Wildemuth, B. M. (2009). Chapters 29-37. In *Applications of Social Research Methods to Questions in Information and Library Science.* Connecticut.