Yelp Polarity | 600,000+ Polar E-commerce reviews

Uploaded by Admin

Last Updated Jun 2022


Large Yelp Review Dataset. This is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar Yelp reviews for training, and 38,000 for testing.

ORIGIN

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data. For more information, please refer to the Yelp Dataset Challenge. The Yelp reviews polarity dataset was constructed by Xiang Zhang from the above dataset. It was first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION

The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity, 280,000 training samples and 19,000 testing samples are taken randomly. In total there are 560,000 training samples and 38,000 testing samples. Negative polarity is class 1, and positive polarity is class 2.

The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated values. Each file has 2 columns, corresponding to class index (1 or 2) and review text. The review texts are enclosed in double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
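The file format described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the function name read_reviews and the two sample rows are made up for illustration, and it assumes the files have no header row.

```python
import csv
import io

# Class indices as described above: 1 = negative, 2 = positive.
LABELS = {1: "negative", 2: "positive"}


def read_reviews(handle):
    """Yield (label, text) pairs from an open CSV file object.

    Hypothetical helper; assumes two columns and no header row.
    """
    # csv.reader handles the surrounding quotes and doubled ("") inner quotes.
    for class_index, text in csv.reader(handle):
        # Turn the literal backslash + "n" sequence back into a real newline.
        yield LABELS[int(class_index)], text.replace("\\n", "\n")


# A tiny in-memory sample in the described format (invented rows, not real data).
sample = io.StringIO(
    '"1","Cold food.\\nNever again."\n'
    '"2","She said ""wow"" twice."\n'
)
rows = list(read_reviews(sample))
for label, text in rows:
    print(label, repr(text))
```

To read the real files, the same function could be passed a handle such as open("train.csv", newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and may need adjusting.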


Sentiment Analysis


Total Size

166.5 MB

File Types

2 other



Open Source
Python Integration


