Wednesday, October 19, 2011

Key Concepts Blog: Data Analysis Techniques Readings

Connaway and Powell (2007)

Chapter 9 Analysis of Data

There are three steps in data analysis:

Step 1: Establish a set of categories or values
  • Categories should be exhaustive and mutually exclusive.
  • Examples: reference questions (directional or non-directional); reference collection size (number of volumes).
Step 2: Code the data
  • Coding converts responses into numerical codes (if they are not already in numerical form). It may involve regrouping responses.
  • Inaccuracy may arise from a poorly worded questionnaire or from assigning a response to the wrong category.

Step 3: Analyze the data
We may use:
  • Descriptive statistics – e.g., frequency distributions, graphs and charts, measures of central tendency (mean, median, mode), measures of dispersion or variability (mean deviation, standard deviation), and correlation coefficients (cross-variation or bivariate frequency)
  • Inferential statistics – parametric or non-parametric
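As a rough illustration of Steps 1 to 3, the sketch below defines a category set with numerical codes, codes some responses, and produces a frequency distribution. The codebook, the responses, and the helper name `code_responses` are hypothetical; the normalization rule (trimming spaces, ignoring case) is an assumption, not something the chapter prescribes.

```python
from collections import Counter

# Hypothetical codebook (Step 1): each category of the "reference question type"
# variable gets exactly one numerical code, so categories are mutually exclusive.
CODEBOOK = {"directional": 1, "non-directional": 2}

def code_responses(responses, codebook):
    """Convert text responses to numerical codes (Step 2).

    A response that matches no category raises an error, which signals
    that the category set is not exhaustive.
    """
    codes = []
    for response in responses:
        key = response.strip().lower()  # assumed normalization of minor recording differences
        if key not in codebook:
            raise ValueError(f"No category for response: {response!r}")
        codes.append(codebook[key])
    return codes

responses = ["directional", "Non-directional", "directional ", "directional"]
codes = code_responses(responses, CODEBOOK)
frequencies = Counter(codes)  # a simple frequency distribution (Step 3)
print(codes)        # [1, 2, 1, 1]
print(frequencies)  # Counter({1: 3, 2: 1})
```

Raising an error on an unmatched response is one way to surface categories that are not exhaustive during coding rather than at analysis time.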

Wildemuth (2009)
Chapter 29 Content Analysis (Quantitative)

Quantitative content analysis is the systematic, objective, quantitative analysis of message characteristics. A message travels from a source to a destination and may be recorded in any format (paper, digital, analog, audio, video, etc.).

Types of content to be analyzed:
  • Manifest content – exists unambiguously in a message. It is countable and easy to observe, e.g., the occurrence of a word.
  • Latent content – conceptual content. It cannot be directly observed and is difficult (or impossible) to count, so it is usually measured through manifest content indicators.
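Because manifest content is directly countable, counting word occurrences is the simplest case. The sketch below is a minimal example under an assumed tokenization rule (lowercase letters, with hyphenated words such as "e-books" kept as one term); the message text and the function name `count_terms` are hypothetical.

```python
import re
from collections import Counter

def count_terms(text):
    """Count occurrences of each word (manifest content) in a message.

    Words are lowercased; hyphenated words such as "e-books" count as one term.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(tokens)

message = "The library offers e-books. Library users borrow e-books."
counts = count_terms(message)
print(counts["library"], counts["e-books"])  # 2 2
```

Latent content has no such direct count: the researcher would first have to decide which manifest indicators (words, phrases) stand for the concept.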

Wildemuth (2009)
Chapter 30 Qualitative Content Analysis

Qualitative content analysis analyzes speech and texts in their specific contexts. It goes beyond counting words or extracting objective content, allowing the researcher to understand social reality in a subjective but scientific manner. Unlike quantitative content analysis, it uses an inductive reasoning process. (Quantitative content analysis is criticized for missing semantic information.)

Steps in Qualitative Content Analysis:
  • Prepare the data (e.g., transcribe all questions or only the main questions, verbatim or in summary)
  • Define the unit of analysis (individual themes, rather than physical units such as words, are usually used as the unit of analysis)
  • Develop categories and a coding scheme (a unit of text may be assigned to more than one category, so categories need not be mutually exclusive)
  • Test your coding scheme on a sample of text
  • Code all the text (new themes and concepts may emerge during coding and can be added to the coding manual)
  • Assess coding consistency
  • Draw conclusions from data
  • Report methods and findings
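To make the "assess coding consistency" step concrete, the sketch below compares two coders who may each assign several categories to a unit of text. It uses simple percent agreement (the share of units given an identical set of categories); that measure, the unit IDs, and the category names are assumptions for illustration. Chance-corrected measures such as Cohen's kappa also exist, and this sketch is not the chapter's prescribed procedure.

```python
def percent_agreement(coder_a, coder_b):
    """Share of text units to which both coders assigned the same set of categories.

    Each coder maps a unit ID to a *set* of categories, because in qualitative
    content analysis a unit may belong to more than one category.
    """
    units = coder_a.keys() & coder_b.keys()  # compare only units both coders coded
    if not units:
        raise ValueError("No units coded by both coders")
    agreed = sum(1 for unit in units if coder_a[unit] == coder_b[unit])
    return agreed / len(units)

# Hypothetical units and categories, for illustration only.
coder_a = {"u1": {"access"}, "u2": {"cost", "access"}, "u3": {"staff"}}
coder_b = {"u1": {"access"}, "u2": {"cost"}, "u3": {"staff"}}
print(round(percent_agreement(coder_a, coder_b), 2))  # 0.67
```

Requiring an exact match of category sets is a strict choice; a per-category comparison would give partial credit for unit "u2".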

Wildemuth (2009)
Chapter 31 Discourse Analysis

Discourse analysis is the analysis of discourse or communication. Discourse includes all kinds of spoken interaction, formal and informal, as well as written texts (e.g., a reference interview, professional literature). It looks for agreed-upon themes as well as differences, within a single text or across texts. The underlying assumption is that people use speech and text to construct versions of their social world. Different people may have different interpretations, and one person may hold multiple perspectives. Its weakness is the subjectivity involved in every step.

Suggested steps:
  • Research question
  • Select sample discourse
  • Collect records and documents
  • Coding the data
    • Identify themes within categories; these emerge and take shape as you examine the texts.
    • Develop a category scheme – a text may be placed in more than one category.
  • Analyze the data – this involves close reading and rereading of the texts; searching for patterns, similarities, contradictions, and vagueness; considering and reconsidering patterns; searching for evidence for and against your hypotheses; and making notes.
  • Validate your findings

Wildemuth (2009)
Chapter 32 Analytic induction

Analytic induction is a form of inductive reasoning.  It is used to analyze data from various case studies, ethnographic observations, participant observations, semi-structured interviews and structured interviews. 

Steps in analytic induction
  • Rough definition of the phenomenon
  • Develop hypothetical explanation
  • Select cases, study them, and check whether they fit the hypothesis.
  • When a negative case occurs, either redefine the phenomenon to exclude the negative case or reformulate the hypothesis.
  • Continue to study more cases.
  • Consider known negative cases as well
  • Deliberately select likely negative cases; this is a requirement, and every negative case must be accounted for.
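The loop of testing cases and revising after negative cases can be sketched as a toy procedure. Here, as a simplifying assumption, each case is described by a hypothetical set of attributes, the hypothesis is the set of attributes every case is expected to share, and "reformulating" means keeping only the attributes the negative case also has. The optional `exclude` function stands in for the researcher's judgment to redefine the phenomenon instead. Real analytic induction is an interpretive process; this only mirrors its control flow.

```python
def analytic_induction(cases, hypothesis, exclude=None):
    """Test a hypothesis against cases one at a time, revising on negative cases.

    cases:      list of (case_id, attributes) pairs, each a case of the phenomenon.
    hypothesis: set of attributes assumed to be present in every case.
    exclude:    optional function (case_id, attributes) -> bool; when it returns
                True for a negative case, the phenomenon is redefined to leave that
                case out instead of reformulating the hypothesis.
    """
    hypothesis = set(hypothesis)
    negatives, excluded = [], []
    for case_id, attributes in cases:
        if hypothesis <= attributes:
            continue  # the case fits the current hypothesis
        negatives.append(case_id)
        if exclude is not None and exclude(case_id, attributes):
            excluded.append(case_id)  # redefine the phenomenon to exclude this case
        else:
            # Reformulate: keep only attributes this case shares. The new hypothesis
            # is a subset of the old one, so earlier cases still fit.
            hypothesis &= attributes
    return hypothesis, negatives, excluded

# Hypothetical cases, for illustration only.
cases = [
    ("A", {"novice", "no_training", "large_library"}),
    ("B", {"novice", "no_training"}),
    ("C", {"expert", "no_training"}),
]
print(analytic_induction(cases, {"novice", "no_training"}))
# ({'no_training'}, ['C'], [])
```

If the hypothesis shrinks to an empty set, no shared explanation survives, which would prompt a fresh rough definition of the phenomenon.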

Wildemuth (2009)
Chapter 33 Descriptive Statistics

The role of descriptive statistics is to summarize your results.

Variables and levels of measurement
  • Nominal (categorical) variables – have no true numerical value, so no arithmetic calculations can be done on them.
  • Ordinal variables – rank-ordered variables, e.g., level of education.
  • Interval variables – also ordered, but with uniform distances between possible values, so some basic calculations can be performed on them.
  • Ratio variables – also ordered with equal intervals, but they have a true zero point, so ratios can be calculated.

Measures of central tendency
The purpose is to find the single value that best represents a set of values, e.g., the mean, median, or mode.
  • For nominal data, only the mode can be used.
  • The mean is the most stable measure and can be used in further calculations, but it can be skewed by outliers (extremely high or low values).
  • The mode is an actual value from the data and is not affected by outliers.
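The sketch below uses Python's standard `statistics` module to pick measures by level of measurement and to show how an outlier pulls the mean while the mode stays put. The data are hypothetical, and the rule for ordinal data (median and mode, with `median_low` so the median is an actual data value) is an added assumption beyond the notes above; ordinal values are assumed to be numeric codes so that sorting respects their rank.

```python
import statistics

def central_tendency(values, level):
    """Return the measures of central tendency suited to a level of measurement."""
    if level == "nominal":
        return {"mode": statistics.mode(values)}  # only the mode is meaningful
    if level == "ordinal":
        # Rank order only: median_low returns an actual value, never an average.
        return {"median": statistics.median_low(values), "mode": statistics.mode(values)}
    if level in ("interval", "ratio"):
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "mode": statistics.mode(values),
        }
    raise ValueError(f"Unknown level of measurement: {level!r}")

print(central_tendency(["directional", "non-directional", "directional"], "nominal"))
# {'mode': 'directional'}
print(central_tendency([2, 3, 3, 4, 5], "ratio"))      # {'mean': 3.4, 'median': 3, 'mode': 3}
print(central_tendency([2, 3, 3, 4, 5, 40], "ratio"))  # {'mean': 9.5, 'median': 3.5, 'mode': 3}
```

Adding the single outlier (40) moves the mean from 3.4 to 9.5, while the mode remains 3.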

Measures of dispersion
These show how far scores spread out around the central point, e.g., the range, interquartile range (the range of the middle 50% of scores), variance, standard deviation, and confidence intervals.
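Several of these measures can be computed with Python's standard `statistics` module, as in the sketch below (the scores are hypothetical). Note that quartile conventions differ between tools; `statistics.quantiles` uses its "exclusive" method by default, so another package may report a slightly different interquartile range. The sample (n − 1) versions of variance and standard deviation are used here.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

value_range = max(scores) - min(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles (Python 3.8+)
iqr = q3 - q1                                  # spread of the middle 50% of scores
variance = statistics.variance(scores)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(scores)             # sample standard deviation

print(value_range, iqr)                        # 7 2.5
print(round(variance, 3), round(std_dev, 3))   # 4.571 2.138
```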

Wildemuth (2009)
Chapter 34 Frequencies, Cross-tabulation and the Chi-Square Statistic

Cross-tabulation tables (also called contingency tables or bivariate tables) show the categories of one variable as rows and the categories of a second variable as columns. Each cell reports the number of cases that belong in it, i.e., the cases that fit that particular combination of categories of the two variables.
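A cross-tabulation is just a count of cases per (row category, column category) pair. The sketch below builds one with `collections.Counter` and prints it as a small table; the variables (user status by question type) and observations are hypothetical.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Build a contingency table from (row_category, column_category) pairs.

    Returns the sorted row labels, the sorted column labels, and the cell counts.
    """
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, cells

# Hypothetical data: user status (rows) by question type (columns).
observations = [
    ("faculty", "directional"), ("student", "directional"),
    ("student", "non-directional"), ("student", "directional"),
    ("faculty", "non-directional"), ("faculty", "non-directional"),
]
rows, cols, cells = cross_tabulate(observations)
print(" " * 10 + "".join(c.ljust(17) for c in cols))
for r in rows:
    # A Counter returns 0 for combinations that never occurred.
    print(r.ljust(10) + "".join(str(cells[(r, c)]).ljust(17) for c in cols))
```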

Pie charts are used to show how a particular variable is distributed, i.e., the proportion of cases in each category.
Don’t use pie charts:
  • When respondents could select more than one choice (the slices would then add up to more than the whole)
  • When there are too many categories (don't use them for more than six categories)
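The reason for the first rule is easy to check numerically: with check-all-that-apply questions, the percentage of respondents choosing each option can total more than 100%, so the slices of a pie no longer represent parts of one whole. The responses below are hypothetical.

```python
from collections import Counter

# Hypothetical check-all-that-apply responses: each respondent gives a set of choices.
responses = [{"books"}, {"books", "dvds"}, {"ebooks"}, {"books", "ebooks"}]

counts = Counter(choice for chosen in responses for choice in chosen)
percentages = {choice: 100 * n / len(responses) for choice, n in counts.items()}
print(sorted(percentages.items()))  # [('books', 75.0), ('dvds', 25.0), ('ebooks', 50.0)]
print(sum(percentages.values()))    # 150.0, more than the 100% a pie chart implies
```

A bar chart of these percentages would avoid implying that they sum to a whole.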

Wildemuth (2009)
Chapter 35 Sequential event analysis

Markov model approach
This approach examines a sequence step by step, breaking it into individual steps (transitions from one event to the next) to examine their order. It assumes that the conditional probability of an event occurring depends only on the event immediately preceding it. Types of Markov models:
  • Zero-order model – shows only the frequency of occurrence of each event.
  • First-order Markov model (state transition matrix) – shows, as a percentage, the probability of moving from each row state to each column state.
  • Second-order Markov model – considers the two preceding steps.
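The zero-order and first-order models can be sketched directly from coded event sequences: the first counts events, the second counts transitions from each event to the next and converts each row to percentages. The session data and event names below are hypothetical. A second-order model would key each row on the pair of preceding events instead, e.g. by iterating over `zip(seq, seq[1:], seq[2:])`.

```python
from collections import Counter, defaultdict

def zero_order(sequence):
    """Zero-order model: how often each event occurs."""
    return Counter(sequence)

def transition_matrix(sequences):
    """First-order model: percentage of transitions from each state (row) to each next state (column)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: 100 * n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

# Hypothetical search sessions coded as sequences of moves.
sessions = [["query", "view", "query", "view", "save"],
            ["query", "query", "view", "save"]]
matrix = transition_matrix(sessions)
print(zero_order(sessions[0]))  # Counter({'query': 2, 'view': 2, 'save': 1})
print(matrix["query"])          # {'view': 75.0, 'query': 25.0}
```

Each row of `matrix` sums to 100%, which matches the percentage form of a state transition matrix described above.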

Optimal matching approach – compares two complete sequences to assess how similar or dissimilar they are.
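Optimal matching is commonly implemented as an edit distance: the minimum total cost of insertions, deletions, and substitutions needed to turn one sequence into the other. The sketch below uses dynamic programming; the cost values (1 per insertion or deletion, 2 per substitution) are assumptions, since researchers choose costs to suit their data, and the sequences are hypothetical.

```python
def optimal_matching_distance(a, b, indel_cost=1.0, sub_cost=2.0):
    """Minimum total cost of turning sequence a into sequence b using
    insertions/deletions (indel_cost each) and substitutions (sub_cost each).
    A lower distance means the two sequences are more similar.
    """
    # dist[i][j] = cost of transforming a[:i] into b[:j]
    dist = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dist[i][0] = i * indel_cost
    for j in range(1, len(b) + 1):
        dist[0][j] = j * indel_cost
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                        # delete a[i-1]
                dist[i][j - 1] + indel_cost,                        # insert b[j-1]
                dist[i - 1][j - 1] + (0.0 if same else sub_cost),   # match or substitute
            )
    return dist[len(a)][len(b)]

print(optimal_matching_distance(["query", "view", "save"], ["query", "view", "save"]))  # 0.0
print(optimal_matching_distance(["query", "view", "save"], ["query", "save"]))          # 1.0
```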

Wildemuth (2009)
Chapter 36 Correlation

Correlation is used to examine the relationship between two variables. The square of the correlation coefficient (r²) gives the proportion of the variability in one variable that is explained by the variability in the other. A scatter diagram may be used to visualize the relationship.

Types of Correlation
  • When two variables are perfectly correlated, all of the variability in one is explained by variability in the other; when there is no correlation, none of the variability in one is explained by the other.
  • Positive correlation means that as one variable increases, the other increases as well. Negative correlation means that as one variable increases, the other decreases.
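The Pearson correlation coefficient r can be computed from deviations around each variable's mean; squaring it gives the proportion of explained variability. The sketch below is written without third-party libraries, and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return covariation / denominator

hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))  # 1.0 (perfect positive)
print(pearson_r(hours, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
r = pearson_r(hours, [1, 3, 2, 5, 4])
print(round(r, 2), round(r ** 2, 2))       # 0.8 0.64 (64% of variability explained)
```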

Wildemuth (2009)
Chapter 37 Comparing means

T-tests are used to compare the mean scores of two groups (two samples). The t statistic expresses the difference between the two sample means relative to the variability within the samples. The p-value is the probability of obtaining a difference at least this large by chance alone, that is, if there were no real difference between the groups.
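The sketch below computes Student's t statistic for two independent samples, assuming equal variances in the two groups (a pooled-variance version; Welch's version drops that assumption). The scores are hypothetical. Turning t and its degrees of freedom into a p-value requires the t distribution, e.g. from statistical tables or a library such as SciPy's `scipy.stats.ttest_ind`; that step is not shown here.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t statistic for two independent samples, assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, n_a + n_b - 2

group_1 = [1, 2, 3, 4, 5]
group_2 = [3, 4, 5, 6, 7]
print(independent_t(group_1, group_2))  # (-2.0, 8)
```

The sign of t only reflects which group was listed first; its size reflects how large the mean difference is relative to the standard error.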

Analysis of variance (ANOVA) compares means like a t-test, but across more than two groups. The variance between groups is treated as “signal” and the variance within each group as “noise”; their ratio is the F statistic. When there is a lot of signal but not much noise, there is a meaningful difference among the means. When the signal is low and the noise is high, the means do not differ much.
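The signal-to-noise idea corresponds to the F ratio of a one-way ANOVA: the between-group mean square divided by the within-group mean square. The sketch below computes it for hypothetical groups; as with the t-test, converting F to a p-value needs the F distribution (e.g. SciPy's `scipy.stats.f_oneway`), which is not shown.

```python
import statistics

def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: between-group ("signal") over within-group ("noise") variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)  # signal
    ms_within = ss_within / (n - k)    # noise
    return ms_between / ms_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

Here the group means (2, 5, 8) are far apart relative to the small spread inside each group, so F is large.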

Sources


Connaway, L. S., & Powell, R. R. (2007). Chapter 9. In Basic Research Methods for Librarians (5th ed.). California.
Wildemuth, B. M. (2009). Chapters 29-37. In Applications of Social Research Methods to Questions in Information and Library Science. Connecticut.
