Yelp Polarity | 600,000+ Polar E-commerce reviews
Uploaded by Admin
Last Updated Jun 2022
Large Yelp Review Dataset. This is a dataset for binary sentiment classification. It provides 560,000 highly polar Yelp reviews for training and 38,000 for testing.

ORIGIN

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data. For more information, please refer to http://www.yelp.com/dataset_challenge

The Yelp reviews polarity dataset was constructed by Xiang Zhang (firstname.lastname@example.org) from the above dataset. It was first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION

The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly. In total there are 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive polarity is class 2.

The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 2 columns: the class index (1 or 2) and the review text. The review texts are enclosed in double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
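The file format described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the function name read_reviews, the LABELS mapping, and the two sample rows are made up for illustration. It also assumes that every backslash followed by "n" in a review stands for a line break.

```python
import csv
import io

# Class index as stored in the file -> readable label (names chosen here).
LABELS = {"1": "negative", "2": "positive"}


def read_reviews(handle):
    """Yield (label, text) pairs from an open train.csv/test.csv-style file."""
    for class_index, text in csv.reader(handle):
        # csv.reader already turns a doubled quote ("") back into a single ".
        # Line breaks are stored as the two characters backslash + n,
        # so they are converted to real newlines here (assumption: no
        # review contains a literal backslash-n that should be kept).
        yield LABELS[class_index], text.replace("\\n", "\n")


# Two invented rows in the documented format, held in memory for the demo.
sample = io.StringIO(
    '"1","Cold food.\\nNever again."\n'
    '"2","The chef said ""welcome back"" and meant it."\n'
)
rows = list(read_reviews(sample))

for label, text in rows:
    print(label, repr(text))
```

For the real files, the same function should work on a handle opened with open("train.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption, since the description does not state it.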