Association scores measure the relationship between two concepts or groups of concepts.
Association scores range on a scale from -1.00 to 1.00. These extremes represent the weakest and the strongest possible relationships:
- An association score of 1.00 represents the relationship between a concept and itself, while -1.00 represents the relationship between a concept and the least related concept possible within the same data set.
- A score of 0.00 represents how often we would expect two concepts to be discussed together as a result of random chance alone.
What do they mean?
Association scores above 0 indicate that two concepts are discussed together more often than we would expect from random chance, while scores below 0 indicate that two concepts are discussed together less often than chance would predict. (This typically indicates that consumers are not associating those concepts with each other in the data set being analyzed.)
For example, let's say we're analyzing three concepts: "happy," "product," and "packaging." "Happy" and "product" have an association score of 0.26, "happy" and "packaging" have an association score of -0.18, and "product" and "packaging" have an association score of -0.01.
The 0.26 association score between "happy" and "product" indicates that consumers frequently associate happiness with the product, and we can thus infer that something about the product itself is pleasing to the consumer.
The -0.18 association score between "happy" and "packaging" simply indicates that consumers use the term "happy" to describe the packaging less often than random chance would suggest. This does not necessarily mean that they are unhappy with the packaging. It just means that they typically do not use the term "happy" when discussing the packaging. They may be using a different term (such as "easy to open"), or may be speaking of the packaging in a neutral way. These are important distinctions in interpreting the data.
The -0.01 association score between "product" and "packaging" is effectively 0.00: consumers discuss the product and the packaging together about as often as random chance would predict. A clearly negative score would indicate no notable association among consumers between these two concepts. The more negative the score, the more remote the association between the concepts.
How are they calculated?
Using Luminoso's proprietary algorithms (a supercharged version of latent semantic analysis), each term is assigned a vector in a 150-dimensional vector space, based on data from ConceptNet and the text being analyzed.
Once vectors are assigned to terms, they are scaled to unit length, and the association score between two terms is the dot product of their unit vectors, which equals the cosine of the angle between them. Two vectors that point in nearly the same direction have a dot product near 1, two vectors that are orthogonal to each other have a dot product of 0, and two vectors that point in opposite directions have a dot product of -1.
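The calculation described above can be sketched in Python. This is a minimal illustration, not Luminoso's implementation: the helper names `normalize` and `association_score` are hypothetical, and the toy 3-dimensional vectors stand in for the real 150-dimensional term vectors, which this sketch does not attempt to build.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def association_score(a, b):
    """Dot product of the unit-length versions of a and b, i.e. cos(angle)."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


# Toy vectors (hypothetical); real term vectors have 150 dimensions.
print(round(association_score([1, 2, 0], [2, 4, 0]), 6))    # same direction: 1.0
print(round(association_score([1, 0, 0], [0, 3, 0]), 6))    # orthogonal: 0.0
print(round(association_score([1, 2, 0], [-1, -2, 0]), 6))  # opposite: -1.0
```

The results are rounded because floating-point arithmetic can return values such as 0.9999999999999999 instead of exactly 1.0.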
How can they be interpreted?
Topic-Topic association scores:
- A score above 0.70 usually indicates a definitional relationship (e.g., awesome/amazing, screen/display, alert/notification, or uninstall/reinstall). If two terms are not synonymous (or very close in meaning) and still score above 0.70, this indicates a strong and meaningful relationship between the concepts.
- A score between 0.20 and 0.70 is often where the most interesting findings in a project lie, so this is where you should focus your attention. Concepts in this range are strongly related.
- A score between 0.00 and 0.20 indicates that two concepts have a weak association in the data.
- A score of 0.00 indicates no meaningful relationship between two concepts beyond random chance.
- A score below 0.00 (a negative association score) indicates that two topics are unrelated to one another in your dataset.
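The Topic-Topic guidelines above can be expressed as a simple lookup. This is a hypothetical helper, not part of any Luminoso product: the `near_zero` tolerance that treats scores such as -0.01 as effectively 0.00, and the choice to place exact boundary values (0.20, 0.70) in the lower band, are assumptions made for illustration. It is applied to the three scores from the "happy," "product," and "packaging" example.

```python
def topic_topic_band(score, near_zero=0.02):
    """Map a Topic-Topic association score to the rough interpretation bands.

    `near_zero` is a hypothetical tolerance (not from the guidelines) for
    treating scores very close to 0.00 as "no relationship beyond chance".
    Exactly 0.20 or 0.70 falls into the lower band (also an assumption).
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("association scores range from -1.00 to 1.00")
    if abs(score) <= near_zero:
        return "about what chance predicts"
    if score > 0.70:
        return "definitional or very strong"
    if score > 0.20:
        return "strongly related, often the most interesting"
    if score > 0.0:
        return "weak association"
    return "unrelated in this dataset"


# Scores from the "happy" / "product" / "packaging" example.
examples = {
    ("happy", "product"): 0.26,
    ("happy", "packaging"): -0.18,
    ("product", "packaging"): -0.01,
}
for (a, b), score in examples.items():
    print(f"{a}/{b}: {score:+.2f} -> {topic_topic_band(score)}")
```

The bands are rules of thumb, so a helper like this is best used to flag pairs for a closer look rather than to draw conclusions automatically.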
Topic-Filter association scores:
- Topic-Filter association scores generally fall within a narrower range, closer to 0.00, than Topic-Topic scores; they typically fall between -0.30 and 0.30. Scores well above this range indicate an exceptionally strong relationship.
- Topic-Filter association scores also vary less from one another. A difference of 0.05 between two Topic-Filter association scores is meaningful and interesting. Differences smaller than 0.05 should be evaluated with caution, as they may be influenced by the differing document volume of each filter or topic.
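The 0.05 rule of thumb for comparing Topic-Filter scores can be checked with a small function. This is a sketch: the function name `compare_topic_filter_scores` and the example scores (0.12 and 0.07 for one topic against two filters) are hypothetical. The difference is rounded before comparison because, in binary floating point, 0.12 - 0.07 evaluates to slightly less than 0.05, which would otherwise flip the result.

```python
def compare_topic_filter_scores(score_a, score_b, threshold=0.05):
    """Return (difference, is_meaningful) for two Topic-Filter association scores.

    A difference of at least `threshold` (0.05 per the guidance) is treated as
    meaningful; smaller differences should be read with caution because they
    may reflect differing document volumes per filter or topic.
    """
    # Scores are reported to two decimals; rounding removes float artifacts
    # such as 0.12 - 0.07 evaluating to 0.04999999999999999.
    diff = round(score_a - score_b, 4)
    return diff, abs(diff) >= threshold


# Hypothetical scores for one topic against two different filters.
diff, meaningful = compare_topic_filter_scores(0.12, 0.07)
print(f"difference = {diff:+.2f}, meaningful: {meaningful}")
# difference = +0.05, meaningful: True
```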
It's important to note that negative association scores do NOT indicate negative sentiment.