
Formatting your data

Modified on: Mon, Oct 7, 2019 at 9:48 AM


Prepare the dataset for upload

Once you choose a natural language source, include all the information you’ll need in your analysis so you can get the best results from Daylight. Invest time now, since Daylight learns directly from the data you upload and data can’t be modified once it’s uploaded.  


Daylight’s Create a project feature accepts only comma-separated value (CSV) files with appropriately formatted columns. At a minimum, every uploaded file must include a column titled text.
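As a rough illustration of the minimum requirement, the sketch below writes a CSV with a single text column using Python's standard csv module. The function name write_minimal_csv and the sample sentence are made up for this example and are not part of Daylight.

```python
import csv
import io


def write_minimal_csv(verbatims, handle):
    """Write one document per row under the required `text` header."""
    writer = csv.writer(handle)
    writer.writerow(["text"])          # the one column every file must have
    for verbatim in verbatims:
        writer.writerow([verbatim])    # csv quotes commas and newlines for us


# Usage: write to an in-memory buffer; pass an open file handle to save to disk.
buffer = io.StringIO()
write_minimal_csv(["I loved the free coffee, but the room smelled of smoke."], buffer)
print(buffer.getvalue())
```

Using the csv module rather than joining strings by hand keeps verbatims that contain commas or quotation marks in a single cell.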

Key vocabulary

  • A verbatim is the conversational text component of a sample you have collected.

  • A document is a row of your source data, including the conversational text and any associated metadata.

  • Metadata is structured data that creates context for text responses. Metadata may include demographics, dates, scores, or product details. 

  • CSV file, or comma-separated value file, is a plain-text file format that is used to organize data. CSV files exclude styling information that is included in an Excel XLS or XLSX file format. You can create a CSV file from most spreadsheet editors. 

Supported languages and multilingual datasets 

Daylight includes 15 natural language processing pipelines that analyze unstructured text in one language at a time. For best results with a multilingual dataset, split your data into one language per CSV file. Then, select the appropriate language when uploading each file.
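One way to split a multilingual dataset is sketched below. It assumes your source rows already carry a language label in a column named language, which is a hypothetical name chosen for this example; Daylight does not require or define such a column, and detecting the language of unlabeled text is outside this sketch.

```python
import csv
import io
from collections import defaultdict


def split_by_language(rows, language_field="language"):
    """Group row dicts by the value of the (hypothetical) language column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[language_field]].append(row)
    return dict(groups)


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, keeping only the listed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Usage: one CSV string per language, ready to save as separate files.
rows = [
    {"text": "Great stay", "language": "en"},
    {"text": "Très bien", "language": "fr"},
]
for language, group in split_by_language(rows).items():
    print(language, repr(to_csv(group, ["text"])))
```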

Metadata

To add metadata, designate specific headers as you create your CSV file. These headers tell Daylight how to treat the contents of a column. Columns without a Daylight-compatible header will be ignored. 
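Before uploading, it can help to check which headers would be recognized. The sketch below classifies headers using the names shown in this article (text, title, and the string_, number_, score_, and date_ prefixes, the last taken from the example header date_CheckoutDate). It assumes headers must match exactly as written, including lowercase; that is an assumption, not documented Daylight behavior, and the function names are invented for this example.

```python
PREFIXES = ("string_", "number_", "score_", "date_")
EXACT = ("text", "title")


def classify_header(header):
    """Return the data type a header maps to, or None if it would be ignored."""
    if header in EXACT:
        return header
    for prefix in PREFIXES:
        # Require a field name after the prefix, e.g. string_MemberLevel.
        if header.startswith(prefix) and len(header) > len(prefix):
            return prefix.rstrip("_")
    return None


def check_headers(headers):
    """Return (problems, ignored) for a list of CSV headers."""
    problems = []
    if headers.count("text") != 1:
        problems.append("expected exactly one 'text' column")
    if headers.count("title") > 1:
        problems.append("expected at most one 'title' column")
    ignored = [h for h in headers if classify_header(h) is None]
    return problems, ignored


# Usage
print(check_headers(["text", "title", "score_OverallExperience", "RoomNumber"]))
```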


Data type

Examples

Text (Required)

Column header: text

  • The natural language samples (verbatims) for Daylight to analyze

  • Only one text column is permitted per file

  • Each piece of text may not exceed 500,000 characters in length

Examples:

  • I loved the free coffee and the room was very clean, but it smelled strongly of cigarette smoke.

  • I booked this room last-minute when my travel plans changed. The price was ok considering it was last-minute but it was way out of the way.

  • We come to this hotel every year, and we appreciate the consistently top-notch experience!
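To catch text that exceeds the 500,000-character limit before upload, a check along these lines may be useful. It counts characters with Python's len, which counts Unicode code points; whether Daylight counts characters the same way is an assumption. The function name is hypothetical.

```python
MAX_TEXT_CHARS = 500_000  # limit stated for the text column


def find_overlong_texts(texts, limit=MAX_TEXT_CHARS):
    """Return (row_index, length) for each text longer than `limit`."""
    return [(i, len(t)) for i, t in enumerate(texts) if len(t) > limit]


# Usage: an empty list means every text is within the limit.
print(find_overlong_texts(["We'll definitely be back!"]))
```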

Title

Column header: title

  • Any identifier that is associated with text

  • Only one title column is permitted per data file

  • Isn’t analyzed as part of the language sample, but can help organize text

Examples:

  • Recent stay

  • Hotel visit

  • We’ll definitely be back!

String

Column header: string_[FieldName] 

Example: string_MemberLevel

  • Information that helps categorize text

  • Include as many string columns as needed

  • Daylight can filter only on fields that have up to 10,000 values

  • Helps filter your data by category

Examples:

  • None

  • Business

  • Loyalty
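If you are unsure whether a string column stays within the 10,000-value filter limit, you can count its values before uploading. The sketch below assumes the limit refers to distinct values per column, which is an interpretation rather than something this article states; the function names are invented for this example.

```python
from collections import defaultdict

MAX_FILTER_VALUES = 10_000  # filter limit stated for string fields


def count_distinct_string_values(rows):
    """Count distinct values in every column whose header starts with string_."""
    values = defaultdict(set)
    for row in rows:
        for header, value in row.items():
            if header.startswith("string_"):
                values[header].add(value)
    return {header: len(seen) for header, seen in values.items()}


def columns_over_limit(rows, limit=MAX_FILTER_VALUES):
    """Return the string columns with more distinct values than `limit`."""
    counts = count_distinct_string_values(rows)
    return sorted(h for h, n in counts.items() if n > limit)


# Usage
rows = [
    {"text": "a", "string_MemberLevel": "None"},
    {"text": "b", "string_MemberLevel": "Business"},
]
print(count_distinct_string_values(rows))
```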


Number

Column header: number_[FieldName] 

Example: number_MemberSince

  • Any numeric-only data associated with text

  • Include as many number columns as needed

  • Can optionally be used in the Drivers feature

Examples:

  • 2015

  • 2018

  • 1998

Score

Column header: score_[FieldName] 

Example: score_OverallExperience

  • Any score or rating data associated with text

  • Include as many score columns as needed

  • Recommended for use with the Drivers feature

Examples:

  • 7

  • 10

Date

Column header: date_[FieldName] 

Example: date_CheckoutDate

ISO 8601 formatted dates:

  • 2018-04-10

  • 2018-04-10T13:45

  • 2018-04-10T13:45:00Z

US-style dates:

  • 04/10/2018

  • 04/10/2018 13:45:15

  • 4/10/18 1:45 PM
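If your source dates arrive in mixed styles, you may want to check that each one matches a listed format before uploading. The sketch below tries the six example formats above with Python's datetime.strptime. It is only a local pre-check under stated assumptions: it treats the trailing Z as a literal character without attaching a time zone, it relies on an English locale for AM and PM, and it says nothing about how Daylight itself parses dates. The name parse_date is made up for this example.

```python
from datetime import datetime

# One strptime pattern per example format listed above.
FORMATS = [
    "%Y-%m-%d",               # 2018-04-10
    "%Y-%m-%dT%H:%M",         # 2018-04-10T13:45
    "%Y-%m-%dT%H:%M:%SZ",     # 2018-04-10T13:45:00Z (Z matched literally)
    "%m/%d/%Y",               # 04/10/2018
    "%m/%d/%Y %H:%M:%S",      # 04/10/2018 13:45:15
    "%m/%d/%y %I:%M %p",      # 4/10/18 1:45 PM
]


def parse_date(value):
    """Return a naive datetime for the first matching format, else raise."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


# Usage: normalize a US-style date to an ISO 8601 string.
print(parse_date("4/10/18 1:45 PM").isoformat())
```

Note that US-style dates put the month first, so 04/10/2018 is read as April 10, not October 4.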



