
Data Formatting: General Guidelines

Last Updated: May 19, 2017

Your data must be saved in CSV (Comma-Separated Values) format.  

To ensure that your data is processed correctly, make sure that each row represents a single text entry and that each column is correctly labeled. All column labels must appear in the first row of the .csv file. The system skips improperly labeled columns. Your .csv file (which can be viewed in Microsoft Excel) should use the following column labels: Text, Date, Title, Subset_, and URL.


Text – This is the only required column label. This column contains the text data to be analyzed.

Date (optional) – Having the date on each text entry allows for collection of trending data and other time-based analysis. All Excel date formats are accepted by our system.

European date formats (such as DD/MM/YYYY) must be converted to the ISO format (YYYY-MM-DD) or to an American format (MM/DD/YY or MM/DD/YYYY). See the article What does the date label do? for more information.
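If your dates are in a European day-first format, a short script can convert them before upload. The following is a minimal sketch, not part of the product: it assumes the input is always DD/MM/YYYY, and the function names are hypothetical.

```python
from datetime import datetime

# Assumption: every input date is written day-first as DD/MM/YYYY.
EUROPEAN_FORMAT = "%d/%m/%Y"


def european_to_iso(date_text: str) -> str:
    """Convert a DD/MM/YYYY string to ISO format (YYYY-MM-DD)."""
    return datetime.strptime(date_text, EUROPEAN_FORMAT).strftime("%Y-%m-%d")


def european_to_american(date_text: str) -> str:
    """Convert a DD/MM/YYYY string to American format (MM/DD/YYYY)."""
    return datetime.strptime(date_text, EUROPEAN_FORMAT).strftime("%m/%d/%Y")


print(european_to_iso("19/05/2017"))
print(european_to_american("19/05/2017"))
```

A date that does not match the assumed format raises a ValueError, which is usually preferable to silently swapping the day and month.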


Title (optional) – This column provides the title of the text entry and will be used as an identifier. This text is not analyzed. This column can include a review or post title, or metadata about the document's author (e.g., gender, age, and/or geographic information).  See article What does the 'Title' label do? for more information.

Subset_ (or Subset_title) (optional) – Subsets allow you to filter your data by metadata and explore the text against metadata categories. To group related subsets (which are listed alphabetically), you may annotate the column label with a title (descriptor): name the column Subset, followed by an underscore, and then your desired title, such as 'Subset_Gender' or 'Subset_Star Rating'. You may have multiple subset columns in your file, as long as each column is labeled properly.

Bear in mind that not all metadata makes a useful (or even accurate) subset for your needs. Examples include the number of Twitter followers a user has, or a hometown that a blog lists in place of the poster's current location.

If you have metadata that you do not wish to include as a subset (without deleting the column from the spreadsheet), simply use a column header that is either blank or does not read 'Subset'. So long as the header cell of a metadata column does not contain an accepted column label, the system will skip over the entire column.

As an example, let's say that you have data for online product reviews and want to create subsets for quality rating and price rating, both expressed as a rating between 1 and 5. Just labeling the columns 'Subset' can get confusing, because in the tool you would just see 1-5 scores without knowing if they belong to Quality or Price. So, instead you want to use subset descriptors. This is a way to add additional detail to your subset label.  You do this by titling the column Subset, followed by an underscore (_), and then your description. So in this example, title the columns 'Subset_quality rating' and 'Subset_price rating', and you will then see them displayed in the user interface as "quality rating: 1 star", "price rating: 1 star", etc. 
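The product review example above could be prepared with a script along these lines. This is only an illustrative sketch: the review text and ratings are made-up sample data, and the helper name subset_label is hypothetical.

```python
import csv
import io


def subset_label(descriptor: str) -> str:
    """Build a subset column label: 'Subset', an underscore, then the descriptor."""
    return f"Subset_{descriptor}"


quality = subset_label("quality rating")
price = subset_label("price rating")

# Made-up sample rows; each row is a single text entry.
rows = [
    {"Text": "Sturdy and well made.", quality: "5", price: "3"},
    {"Text": "Broke after a week.", quality: "1", price: "4"},
]

# The column labels must appear as the first row of the file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Text", quality, price])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

To save the result as a file instead of printing it, open the target with open("reviews.csv", "w", newline="") and pass that file object to csv.DictWriter in place of the buffer.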

See article What does the 'Subset' label do? for more information.


URL (optional) – This column contains a URL associated with each document. The URL is not analyzed, but it is matched to its document and is clickable from Luminoso Analytics. This enables you to open a document in its original format (for example, on Twitter) by clicking on it within Luminoso. See What does the 'URL' label do? for more information.

Blank columns, and columns named something other than “Text”, “Date”, “Title”, “Subset_”, or “URL”, will not be processed.
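Before uploading, you may want to check which columns of your file would be processed. The sketch below is an assumption-laden illustration rather than the system's actual logic: it assumes labels are matched exactly and case-sensitively, and that a column labeled 'Subset' or starting with 'Subset_' counts as a subset. The function names are hypothetical.

```python
# Assumption: labels must match exactly (case-sensitive, no extra spaces).
EXACT_LABELS = {"Text", "Date", "Title", "URL"}


def is_accepted_label(header: str) -> bool:
    """Return True if the header looks like one of the accepted column labels."""
    return (
        header in EXACT_LABELS
        or header == "Subset"
        or header.startswith("Subset_")
    )


def check_header(headers):
    """Split headers into processed and skipped, and report if 'Text' is present."""
    processed = [h for h in headers if is_accepted_label(h)]
    skipped = [h for h in headers if not is_accepted_label(h)]
    has_text = "Text" in headers  # Text is the only required column
    return processed, skipped, has_text


processed, skipped, has_text = check_header(
    ["Text", "Date", "Subset_Gender", "Notes", ""]
)
print("Processed:", processed)
print("Skipped:", skipped)
print("Required Text column present:", has_text)
```

In practice, the list of headers would come from the first row of your .csv file, for example via next(csv.reader(f)) from Python's csv module.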

