Deriving the Pricing Power of Product Features by Mining Consumer Reviews

Increasingly, user-generated product reviews serve as a valuable source of information for customers making product choices online.  The existing literature typically incorporates the impact of product reviews on sales based on numeric variables representing the valence and volume of reviews.  In this paper, we posit that the information embedded in product reviews cannot be captured by a single scalar value.  Rather, we argue that product reviews are multifaceted, and hence the textual content of product reviews is an important determinant of consumers’ choices, over and above the valence and volume of reviews.  To demonstrate this, we use text mining to incorporate review text in a consumer choice model by decomposing textual reviews into segments describing different product features.  We estimate our model based on a unique data set from Amazon containing sales data and consumer review data for two different groups of products (digital cameras and camcorders) over a 15-month period.  We alleviate the problems of data sparsity and of omitted variables by providing two experimental techniques: clustering rare textual opinions based on pointwise mutual information and using externally imposed review semantics.  This paper demonstrates how textual data can be used to learn consumers’ relative preferences for different product features and also how text can be used for predictive modeling of future changes in sales.
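For readers unfamiliar with pointwise mutual information (PMI), the sketch below shows how it can be computed from co-occurrence counts: PMI(x, y) = log2[ p(x, y) / (p(x) p(y)) ]. It is an illustration only, not the procedure used in this paper. The function name `pmi_table`, the toy reviews, and the choice of one review as the co-occurrence window are all assumptions, and the step that groups rare opinions into clusters from these scores is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(reviews):
    """Return PMI (base 2) for every pair of opinion phrases that co-occur.

    `reviews` is a list of sets; each set holds the opinion phrases found
    in one review (hypothetical input format chosen for this sketch).
    """
    n = len(reviews)
    single = Counter()  # number of reviews containing each phrase
    pair = Counter()    # number of reviews containing both phrases of a pair
    for phrases in reviews:
        single.update(phrases)
        pair.update(combinations(sorted(phrases), 2))

    table = {}
    for (x, y), c_xy in pair.items():
        p_xy = c_xy / n
        p_x = single[x] / n
        p_y = single[y] / n
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


# Toy data, invented for illustration only.
reviews = [
    {"great", "superb"},
    {"great", "superb"},
    {"great", "blurry"},
    {"blurry", "grainy"},
]

table = pmi_table(reviews)
for pair_key, score in sorted(table.items()):
    print(pair_key, round(score, 3))
```

Pairs with high PMI co-occur more often than independence would predict, so a rare phrase could plausibly be merged with a more frequent phrase it is strongly associated with. Pairs that never co-occur are simply absent from the table rather than assigned negative infinity.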