Daylight: The Luminoso product for analyzing a batch of text documents as a “project” and producing insights and metrics about those documents. Daylight uses Luminoso’s QuickLearn technology to learn the specific meanings of words and phrases in those documents. It identifies the words and phrases that are most relevant, identifies words and phrases that are conceptual matches because they’re close in meaning, and lets you visualize and export all of this information.
Compass: The Luminoso product for analyzing text as it arrives in real time and classifying it based on labeled examples or defined topics.
Document: A document is a row in the .csv file that you upload to Daylight; it is also referred to as a verbatim. Source documents contain natural-language text such as product reviews, support tickets, and survey responses.
Project: The basic unit of analysis in Luminoso Daylight. A project is built out of a set of documents; the documents and metrics in one project have no influence on or connection to other projects.
Term: A word or phrase. I think we might avoid using the word “term” outside of technical discussions and instead just call it an instance of a “concept”.
Subsets: Documents can contain categorical, numeric, or date subset fields. For example, a categorical field such as "State" might include values such as "MA," "RI," and "CT." A numeric field such as "Age" might include values such as "29" or "54." A date field such as "Review date" might include values such as "05/31/2017."
Filter: Documents can be filtered by subset to drill further into the data. For example, a filter of "State": "MA" or "RI" combined with "Age": "45-65" shows only the documents from 45- to 65-year-olds in Massachusetts or Rhode Island.
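The filter in this example combines an OR within one subset field ("MA" or "RI") with an AND across fields (State and Age). The sketch below assumes each document is a Python dictionary holding its text plus its subset values; the field names and sample rows are hypothetical and do not describe Daylight's internal data model.

```python
# Hypothetical documents: each holds its text plus its subset fields.
documents = [
    {"text": "Battery died after a week", "State": "MA", "Age": 52},
    {"text": "Setup was painless", "State": "RI", "Age": 47},
    {"text": "Screen is too dim", "State": "CT", "Age": 50},
    {"text": "Love the keyboard", "State": "MA", "Age": 29},
]

def matches_filter(doc, states, age_min, age_max):
    """True if State is one of `states` AND Age lies in [age_min, age_max]."""
    return doc["State"] in states and age_min <= doc["Age"] <= age_max

# "State": "MA" or "RI", combined with "Age": 45-65
filtered = [d for d in documents if matches_filter(d, {"MA", "RI"}, 45, 65)]
for d in filtered:
    print(d["State"], d["Age"], d["text"])
```

Only the first two rows survive: the CT row fails the State condition and the 29-year-old fails the Age condition.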
Concept: A concept is a word or phrase that appears in your dataset. It can be a single word, such as "amazing" or "smelly," or a phrase (also known as a "collocation"), such as "love my Kindle" or "not helpful."
Exact Match: An exact match is a match between the concept you've searched for or are analyzing and any occurrence of that same concept, ignoring differences in capitalization and simple word forms. For instance, if you are analyzing the concept "app," searching for exact matches would return the documents that include "app," "App," "apps," and so on, but not closely related concepts such as "tablet" or "download."
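The "app" example implies that exact matching ignores capitalization and simple inflection. A minimal sketch, assuming lowercasing plus a crude plural-stripping rule; both are hypothetical stand-ins for whatever normalization Daylight actually performs. Note that it does not return related words such as "tablet" or "download."

```python
import re

def normalize(token):
    """Crude normalization (an assumption): lowercase, then strip a plural -s."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def tokens(text):
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9']+", text)]

def exact_match_docs(concept, docs):
    """Documents containing `concept` (after normalization) as a contiguous token run."""
    target = tokens(concept)
    n = len(target)
    if n == 0:
        return []
    hits = []
    for doc in docs:
        toks = tokens(doc)
        if any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            hits.append(doc)
    return hits

docs = [
    "The app crashes on startup",
    "App store rating is unfair",
    "Too many apps preinstalled",
    "My tablet won't download anything",
]
print(exact_match_docs("app", docs))  # the first three documents
```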
Conceptual Match: A conceptual match is a match between a term or phrase you've selected and another concept that is closely related to it, though not exactly the same. For instance, if you are analyzing the concept "delicious," conceptual matches might include concepts like "yummy" and "tasty."
Association Score: An association score measures how strongly related two concepts are on a scale from -1.00 to 1.00. These limits represent the weakest and the strongest possible relationships; a concept has an association score of 1.00 with itself. A score of 0 suggests that the two concepts are only as associated with one another in your project as one would expect by random chance. See the article "Association Scores" for more information.
Relevance: A word or a phrase is relevant if it appears more often in a project than one would expect from its frequency in general language. In a consumer electronics project, for example, the word “wifi” may occur 1000 times more frequently than it does in English as a whole, and this factor of 1000 is its relevance.
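The relevance factor described here is a ratio of relative frequencies: how often a term occurs per word in the project, divided by how often it occurs per word in background English. A minimal sketch with hypothetical counts chosen to reproduce the factor of 1000; it assumes plain word counts for both corpora and does no smoothing, so a term absent from the background text (count 0) would raise a division error.

```python
def relevance(project_count, project_total, background_count, background_total):
    """Relative frequency in the project divided by relative frequency in background text."""
    # Equivalent to (project_count / project_total) / (background_count / background_total),
    # rearranged so the arithmetic stays in integers until the final division.
    return (project_count * background_total) / (project_total * background_count)

# Hypothetical counts: "wifi" appears 500 times in a 100,000-word project
# and 5 times per 1,000,000 words of general English.
print(relevance(500, 100_000, 5, 1_000_000))  # 1000.0
```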
Saved concept: You can “save” a concept. Once saved, it appears in the sidebar, you can associate a color with it in the Galaxy, and you can measure how strongly documents are associated with it. A saved concept can be defined by a word, a phrase, or multiple words or phrases. Previously, we called these “topics”.
Conceptual match vs. exact match
An exact match is when a document contains exactly the term you’re looking for. Exact matches can miss documents that express the same idea in different words, so we also provide conceptual matches, which occur when a document contains a word or phrase highly related to the term you’re looking for. Conceptual matches for “wifi” may include “wi-fi”, “wireless”, “connection”, and “antenna”.
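One way to picture conceptual matching is as a nearest-neighbor search in a space of word vectors, such as the embeddings QuickLearn produces. The sketch below assumes that model, uses made-up 3-dimensional vectors, and picks a hypothetical similarity threshold of 0.8; it is an illustration, not Daylight's matching procedure.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical concept vectors; a real project would use learned embeddings.
vectors = {
    "wifi": [0.9, 0.3, 0.1],
    "wi-fi": [0.88, 0.32, 0.1],
    "wireless": [0.8, 0.4, 0.2],
    "connection": [0.7, 0.5, 0.2],
    "antenna": [0.6, 0.6, 0.3],
    "pizza": [0.0, 0.1, 0.95],
}

def conceptual_matches(query, vectors, threshold=0.8):
    """Other concepts whose similarity to `query` is at least `threshold`, most similar first."""
    q = vectors[query]
    scored = [(c, cosine(q, v)) for c, v in vectors.items() if c != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, s in scored if s >= threshold]

print(conceptual_matches("wifi", vectors))
# ['wi-fi', 'wireless', 'connection', 'antenna']
```

Raising the threshold trades recall for precision: at 0.95, only "wi-fi" and "wireless" would remain.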
Drivers: Terms that are relevant and are associated with a significant difference in some measurable aspect of your data, such as a score. For example, in a project of casual-restaurant reviews where the scores are star ratings, the term “long line” might be a negative driver. A driver's impact is the average difference in score associated with it. If “long line” has an impact of -0.6 stars, reviews containing “long line” (or its conceptual matches) average 0.6 stars lower than reviews without it.
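The impact calculation in this entry is a difference of two averages. A minimal sketch with hypothetical reviews; it assumes a lowercase term found by plain substring matching, so unlike the definition above it ignores conceptual matches, and it performs no significance test.

```python
from statistics import mean

# Hypothetical restaurant reviews paired with star ratings.
reviews = [
    ("Food was fine but there was a long line", 3.0),
    ("Long line at lunch, slow service", 3.5),
    ("Great tacos, friendly staff", 4.0),
    ("Quick and tasty", 3.75),
]

def impact(term, reviews):
    """Mean score of reviews containing `term` minus mean score of reviews without it."""
    with_term = [score for text, score in reviews if term in text.lower()]
    without_term = [score for text, score in reviews if term not in text.lower()]
    return mean(with_term) - mean(without_term)

print(impact("long line", reviews))  # -0.625
```

Here the two "long line" reviews average 3.25 stars and the others average 3.875, so the impact is -0.625 stars.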
Sentiment: Generally, positive vs. negative feelings about things. Many NLP systems (including Daylight) can assign a sentiment score to text: a positive or negative number indicating whether the text expresses something positive or negative. Sentiment can be subjective or unclear, and sometimes we want sentiment to span more dimensions than just positive vs. negative.
QuickLearn: The underlying transfer learning technology. QuickLearn takes two inputs: a background space trained on ConceptNet and a large amount of freely available text, and domain text (the text documents that you want to understand). From these it creates a new space of word embeddings that is tuned for the domain. It understands the specific meanings of in-domain words by seeing them used in context, and the general meanings of common words because those are already in the background space. The topic-based classifier is one of the features QuickLearn makes possible: a classifier that you build by defining topics in a few words, as in Daylight’s “saved concepts”, instead of with lots of labeled examples.
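A topic-based classifier can be sketched as follows, assuming a word-embedding space is already available. Each topic becomes the average vector of a few seed words. Each document becomes the average vector of its known words, and it is assigned to the most similar topic. The 3-dimensional vectors, the topic names, and the averaging scheme are all hypothetical stand-ins; this is not QuickLearn's actual method.

```python
import math

# Hypothetical word vectors standing in for a domain-tuned embedding space.
space = {
    "price": [0.9, 0.1, 0.0],
    "expensive": [0.85, 0.2, 0.05],
    "cheap": [0.8, 0.1, 0.1],
    "cost": [0.88, 0.15, 0.0],
    "rude": [0.1, 0.9, 0.1],
    "staff": [0.2, 0.8, 0.2],
    "friendly": [0.15, 0.85, 0.1],
    "waiter": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def embed(words):
    """Average the vectors of the words found in the space; None if no word is known."""
    known = [space[w] for w in words if w in space]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]

# Topics defined by a few seed words each, instead of labeled examples.
topics = {
    "Pricing": embed(["price", "expensive", "cheap"]),
    "Service": embed(["rude", "staff", "friendly"]),
}

def classify(text):
    """Name of the topic most similar to the text, or None if no word is known."""
    vec = embed(text.lower().split())
    if vec is None:
        return None
    return max(topics, key=lambda name: cosine(vec, topics[name]))

print(classify("The cost was too high"))  # Pricing
print(classify("The waiter ignored us"))  # Service
```

Neither test sentence contains a seed word. "cost" and "waiter" are classified correctly only because their vectors lie near the topic vectors, which is the point of building topics from embeddings rather than from keyword lists.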