Prepare the dataset for upload
Once you choose a natural language source, include all the information you’ll need for your analysis so you get the best results from Daylight. Take the time to prepare the data now: Daylight learns directly from the data you upload, and data can’t be modified after it’s uploaded.
Daylight’s Create a project feature accepts only comma-separated value (CSV) files with appropriately formatted columns. At a minimum, every uploaded file must be a CSV file with a column titled text.
A verbatim is the conversational text component of the sample you have collected.
A document is a row of your source data, including the conversational text and any associated metadata.
Metadata is structured data that creates context for text responses. Metadata may include demographics, dates, scores, or product details.
A CSV file, or comma-separated value file, is a plain-text file format used to organize data. Unlike an Excel XLS or XLSX file, a CSV file contains no styling information. You can create a CSV file from most spreadsheet editors.
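Because data can’t be changed after upload, it can help to check a file locally first. The following is a minimal sketch, not part of Daylight, that uses Python’s standard csv module to confirm that a file has a column titled text and to count documents with an empty verbatim. The function name check_csv_has_text_column and the sample rating column are hypothetical, and the exact, case-sensitive match on the header is an assumption.

```python
import csv
import io


def check_csv_has_text_column(csv_file):
    """Return (ok, message) for a file-like object containing CSV data."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return False, "file is empty"
    # Assumption: the header must match "text" exactly (case-sensitive).
    if "text" not in header:
        return False, f"no column titled 'text'; found {header}"

    text_index = header.index("text")
    documents = 0
    empty = 0
    for row in reader:
        # Each row is one document; the value under "text" is its verbatim.
        documents += 1
        if text_index >= len(row) or not row[text_index].strip():
            empty += 1
    return True, f"{documents} documents, {empty} with an empty verbatim"


if __name__ == "__main__":
    # Hypothetical sample: one verbatim column and one metadata-like column.
    sample = io.StringIO("text,rating\nGreat battery life,5\n,3\n")
    print(check_csv_has_text_column(sample))
```

To check a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the resulting file object to the function.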
Supported languages and multilingual datasets
Daylight includes 15 natural language processing pipelines that analyze unstructured text in one language at a time. For best results with a multilingual dataset, split your data into one language per CSV file. Then, select the appropriate language when uploading each file.
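One way to split a multilingual dataset is sketched below. It assumes the source data already has a column that records each row’s language. The column name language, the language codes in the sample, and the function name split_by_language are all hypothetical. The sketch groups rows by that column and produces one CSV per language, each with the original header row.

```python
import csv
import io
from collections import defaultdict


def split_by_language(csv_file, language_column="language"):
    """Group rows by language and return {language: CSV text}."""
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames

    # Collect the rows for each language value found in the column.
    groups = defaultdict(list)
    for row in reader:
        groups[row[language_column]].append(row)

    # Write one CSV per language, repeating the original header.
    outputs = {}
    for language, rows in groups.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[language] = buffer.getvalue()
    return outputs


if __name__ == "__main__":
    # Hypothetical sample with two languages mixed in one file.
    source = io.StringIO(
        "text,language\nGreat product,en\nTrès bien,fr\nLoved it,en\n"
    )
    for language, content in split_by_language(source).items():
        print(f"--- {language} ---")
        print(content)
```

Each value in the returned dictionary can be written to its own file, which is then uploaded with the matching language selected.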
To add metadata, designate specific headers as you create your CSV file. These headers tell Daylight how to treat the contents of a column. Columns without a Daylight-compatible header will be ignored.