
Against Monopoly

defending the right to innovate

Monopoly corrupts. Absolute monopoly corrupts absolutely.





Copyright Notice: We don't think much of copyright, so you can do what you want with the content on this blog. Of course we are hungry for publicity, so we would be pleased if you avoided plagiarism and gave us credit for what we have written. We encourage you not to impose copyright restrictions on your "derivative" works, but we won't try to stop you. For the legally or statist minded, you can consider yourself subject to a Creative Commons Attribution License.



Long Tail Innovation

Over on Ed Felten's blog there is an argument that the transaction costs imposed by IP law are devastatingly bad because of the high cumulative value of many small transactions. The specific fact quoted was:

For those not in the know, “long tail” is one of the current buzzphrases of tech punditry. The term was coined by Chris Anderson in a famous Wired article. The idea is that in markets for creative works, niche works account for a surprisingly large fraction of consumer demand. For example, Anderson writes that about one-fourth of Amazon's book sales come from titles not among the 135,000 most popular. These books may sell in ones and twos, but there are so many of them that collectively they make up a big part of the market.

I was pretty surprised by this "fact." Michele and I have gathered some data on book revenues, and the striking thing is how concentrated sales are among a few very successful authors. While we argue that this isn't a very good fact for those who support IP, it also doesn't seem consistent with the Chris Anderson fact cited above. (Does anyone know what data he used?)

Anyway, I went back and looked at our data, and I found that 75% of book revenue is earned by roughly the top 200 books out of our sample of 1712 titles. That is, the bottom 88% of books earn only a quarter of the revenue. This doesn't seem like that long a tail. In our sample, the top 700 books generally sold more than 100 copies during the two-week period of the sample, while the bottom 900 books generally sold fewer than 20 copies. Yet this bottom 900 accounted for only about one-third of one percent of book revenue. So the idea that books that "sell in ones and twos" make up a big part of the market isn't supported by our data. In short, I am skeptical of Chris Anderson's fact.
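For readers who want to run the same kind of check on their own data, here is a minimal sketch of how concentration figures like these could be computed from a list of per-title revenues. The function names are hypothetical, and the toy revenue list in the usage line is made-up illustration data, not the Ingram sample.

```python
from typing import Sequence


def top_k_revenue_share(revenues: Sequence[float], k: int) -> float:
    """Fraction of total revenue earned by the k highest-revenue titles."""
    ranked = sorted(revenues, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


def smallest_k_for_share(revenues: Sequence[float], share: float) -> int:
    """Smallest number of top titles whose combined revenue reaches `share` of the total."""
    ranked = sorted(revenues, reverse=True)
    total = sum(ranked)
    running = 0.0
    for i, r in enumerate(ranked, start=1):
        running += r
        if running >= share * total:
            return i
    return len(ranked)


# Arithmetic from the post: the top ~200 of 1712 titles are about 11.7% of
# the sample, so the remaining titles make up roughly 88% of the sample.
n_titles, n_top = 1712, 200
print(f"bottom fraction of titles: {1 - n_top / n_titles:.1%}")

# Usage on a hypothetical toy list of per-title revenues.
toy = [50, 30, 10, 5, 3, 1, 1]
print(top_k_revenue_share(toy, 2), smallest_k_for_share(toy, 0.75))
```

Sorting once and accumulating from the top is all that is needed here; with real data one would read the revenue column from the spreadsheet instead of the toy list.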


Comments

Anderson's article says that his data came from Amazon.

I'm not sure that your data are inconsistent with his. You assert that a few hits account for most of the revenue. He asserts that a large number of niche works account collectively for about 25% of unit sales, at stores that actually stock those niche works. Neither claim invalidates the other.

I don't think your data is in conflict with Chris's. His main point is that there are so many titles selling in ones and twos that together they make up a large part of total sales. That kind of situation could not show up in a sample of 2000 books.

How did you make your sample? I would be very surprised if it was a truly random sample of, say, all books that Amazon sells, rather than one skewed toward the best-selling titles. If so, the share of all sales going to the best-selling titles can only fall as you increase the sample size (bringing more books from the long tail into the sample).

So can you tell how you sampled the data?
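The sampling worry above can be illustrated with a small simulation. This is a sketch under a loudly stated assumption: sales by rank are taken to follow a hypothetical Zipf-like curve (sales proportional to 1/rank), and the "sample" is simply the 2,000 best sellers out of a hypothetical catalogue of a million titles. Nothing here is fitted to the Amazon or Ingram data.

```python
from typing import List


def zipf_sales(n_titles: int, exponent: float) -> List[float]:
    """Hypothetical sales for ranks 1..n_titles, proportional to 1 / rank**exponent."""
    return [1.0 / rank ** exponent for rank in range(1, n_titles + 1)]


def top_share(sales: List[float], k: int) -> float:
    """Share of total sales going to the first k entries (sales sorted best-first)."""
    return sum(sales[:k]) / sum(sales)


catalogue = zipf_sales(1_000_000, 1.0)  # hypothetical full catalogue, best sellers first
sample = catalogue[:2000]               # a sample that only reaches the head of the curve

print(f"top-200 share within the 2,000-title sample: {top_share(sample, 200):.1%}")
print(f"top-200 share within the full catalogue:     {top_share(catalogue, 200):.1%}")
```

Under these assumptions the top 200 titles look far more dominant inside the truncated sample than in the full catalogue, which is the effect the commenter describes; whether real book sales behave this way is exactly the empirical question.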

The sample was created using Amazon's advanced search to pick all hardcover fiction titles with publication dates in a particular month. The exact query was:

"subject: fiction and pubdate: during 9-2003 and binding: Hardcover"

It is conceivable that some works are released only in paperback, although there isn't much evidence that this is the case for fiction. Once we had the ISBNs, we used them to get sales data from Ingram, a major book distributor. The collection procedure is described in more detail at http://www.dklevine.com/data.htm, which also provides a spreadsheet with the actual data.

Total revenue in our data is $12,832,825.69. In all, 604 books sold no copies, and another couple of hundred sold no more than five. If we assumed that the 604 books with no recorded sales really each sold 3 copies at $20 (about what the data show they sell for), that would add roughly $36K in revenue, more than two orders of magnitude less than the total. Looking at our data, I just don't see a long tail of books that each sell few copies but together account for much revenue. In our data, there seem to be three classes of books:

1. books that sell really well - the top ten account for about 25% of revenue

2. books that sell moderately well - if we look at the books generating $12 million of the $13 million in revenue, the worst-selling of them still sell about 300 copies in our data; since our data cover about 1/6th of the market over two weeks, that's a pretty good number of sales

3. the "long tail" of books that sell few copies - they just don't seem to add much revenue, or else they aren't in our data
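The $36K back-of-the-envelope figure for the 604 zero-sale titles can be checked in a few lines. The 3 copies per title is the hypothetical assumption stated in the comment, and $20 is the approximate price it mentions; the variable names are just for illustration.

```python
total_revenue = 12_832_825.69  # total revenue in the sample, as stated in the comment
zero_sale_titles = 604         # titles recorded as selling no copies
assumed_copies = 3             # hypothetical: pretend each actually sold 3 copies
assumed_price = 20.00          # approximate price, per the comment

extra = zero_sale_titles * assumed_copies * assumed_price  # 604 * 3 * 20 = 36,240
print(f"extra revenue: ${extra:,.0f}")
print(f"as a share of total revenue: {extra / total_revenue:.2%}")
print(f"total revenue is about {total_revenue / extra:.0f} times larger")
```

Even tripling the assumed copies would leave the adjustment at around one percent of total revenue, which is the point the comment is making.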

As you have stated before, most sales of a particular book seem to happen in the first few months. So one could argue that the sales distribution for new books will look very different from the distribution for all books. Could it be that books that sell a lot and books that end up selling little have very different sales patterns over time?

It seems to me that choosing only new books could skew the distribution, but with current information I cannot be sure, of course. In theory, it could explain the differences (whether it does is, of course, an empirical question).

Sorry, it looks like I misreported what we did. I went back and reread our description. We looked at books published in 9-2003 and also in 9-2004. In both cases the sales data run through 11-2004 (not just two weeks, as I said above). So when I said 300 copies above, that would mean (assuming Ingram has about 1/6th of the market) about 1800 annual sales.

One data set covers the first few months of sales; the other covers over a year. But the distribution of sales doesn't look very different between the two: neither two-month nor one-year sales seem to have a long tail.

If by "long tail" we mean books that sell 2-5 thousand (hardback) copies, the data do show a lot of books like that, generating a lot of revenue. But it is important to realize that these aren't amateur productions and that their authors make a living writing them, which doesn't seem to be what Felten has in mind.

It may be relevant that your dataset is limited to hardcover books, to fiction, and to books published recently. I would expect long-tail books to be published as paperbacks.

For me, the prototypical long-tail book is a World War II memoir published last year by a family friend of ours. It's in paperback, published by a press I had never heard of, and there can't be many copies in circulation. But I read it and found it pretty interesting.

Another part of the long tail is older books, which are probably available mostly in paperback.

Again, I'm not saying that your data are wrong, only that I don't think they contradict Anderson's claim.

I agree with the points about paperbacks and fiction: it may be that hardback fiction lacks a long tail that exists elsewhere. So it would be worth gathering similar data on, say, paperback non-fiction. I don't agree about books published a long time ago. From the perspective of transaction costs discouraging publication, the relevant consideration is sales over the lifetime of the book, especially sales right after publication, since these constitute the bulk of sales. (One question is whether, when a book goes into an additional printing, it is listed as a "new publication," but that is scarcely relevant for books in the "long tail," which aren't likely to see a second printing.) That a book published long ago sells only a few copies now isn't relevant to the incentive to publish it at the time the decision was made.

One simple thing we could look at is what fraction of book revenue is generated by hardcover fiction; my guess is that it is fairly large. If it were 50% of the total, then to reach Anderson's one-fourth of total sales, the long tail would have to make up 50% of all other sales.
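The arithmetic behind that last sentence can be written as a one-line formula. This sketch assumes, as the comment does, that the hardcover-fiction segment contributes essentially nothing to the tail, and it treats Anderson's one-fourth figure and the segment's share as comparable fractions of the same total (his figure is about Amazon sales, the segment share would be about revenue, so this is an approximation). The function name is hypothetical.

```python
def required_tail_share_elsewhere(tail_share_overall: float,
                                  no_tail_segment_share: float) -> float:
    """Share of the remaining (non-segment) sales the long tail must supply so that
    it reaches `tail_share_overall` of all sales, assuming the segment has no tail.
    A result above 1.0 means the overall claim cannot hold under that assumption."""
    if not 0.0 <= no_tail_segment_share < 1.0:
        raise ValueError("segment share must be in [0, 1)")
    return tail_share_overall / (1.0 - no_tail_segment_share)


# The comment's example: a 25% tail overall, with hardcover fiction at 50% of the total.
print(required_tail_share_elsewhere(0.25, 0.5))
```

As the segment's share grows, the required tail share in the rest of the market rises quickly, and past 75% it would exceed everything that is left.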

I should say that I'm somewhat skeptical of the claim that paperback non-fiction is going to show a different story. We looked up some of the books in the lower tail of the distribution of hardback fiction - and they look a lot less interesting than a World War II memoir (to put it mildly). This is consistent with the fact that they don't sell any copies.

I take the point that our data doesn't necessarily contradict Anderson's claim since the claim is for a broader category than we look at. But our data certainly doesn't support his claim.

Is it possible that taking a sample from Amazon incorporates a bias against 'long-tail' books? After all, if I write a long-tail book, presumably the number of printed books will be small, and maybe the cost of selling through Amazon is too high.

*If* that is the case, Amazon would give a lower bound for any estimate of the relevance of the long tail. I have one data point to offer, and I quote:

The cost of using Amazon is high. They take 55% of the "official" price (not the sale price but the price you originally determine). That means that even if they discount the book (good for sales), the discount is coming out of their half. But it means you are only getting 45% of your listed price. In addition you pay for shipping books there, and of course for printing them, so the math does not encourage fortune making. Most self-published books are in the "long tail" zone, selling only a few copies per month. I've done better, selling several thousand copies over a couple of years, but still: This is not a way to make money; this is a way to distribute your message.

The author of the post seems to suggest he's doing fine using Amazon, but I wonder whether most long-tail authors choose a different way to get their work out there.
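The economics in the quoted post reduce to a simple per-copy formula: Amazon takes 55% of the list price (not of the discounted sale price), and the author separately pays for printing and for shipping books to Amazon. A minimal sketch, with a hypothetical function name; the printing and shipping costs in the usage line are made-up placeholders, not figures from the post.

```python
def net_per_copy(list_price: float, amazon_cut: float = 0.55,
                 printing_cost: float = 0.0, shipping_cost: float = 0.0) -> float:
    """Author's net per copy: the share of the list price left after Amazon's cut
    (taken on the list price, per the quoted post), minus the author's own per-copy
    printing and shipping costs."""
    return list_price * (1.0 - amazon_cut) - printing_cost - shipping_cost


# Hypothetical example: a $20 list price, $4 per copy to print, $1 per copy to ship.
per_copy = net_per_copy(20.0, printing_cost=4.0, shipping_cost=1.0)
print(f"net per copy: ${per_copy:.2f}")
```

Because any Amazon discount comes out of Amazon's share, the sale price does not appear in the formula at all; only the list price matters to the author.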

Thanks for clarifying your position. I agree with you that your data does not support Anderson's claim.

I agree that Amazon might be biased against "long tail books" (although in our data they have some awfully obscure stuff for sale). But the original Chris Anderson observation in Wired about the long tail is based on data from Amazon. So the question is why his data from Amazon show something different from our data from Amazon. (Our sales data, however, are not from Amazon; they are from a much larger distributor.)

Ah, but that might explain part of the controversy. You would expect traditional bookstores to sell mostly books from the best-selling part of the curve, which would mean that Amazon's share of sales of fringe books is probably larger than its share of book sales overall, because that's the area where traditional bookstores don't compete.

Thus, the bottom titles' share of total market sales would be smaller than their share of Amazon's sales.

This may mean either of two things. Either the long tail is smaller than Amazon's figures would lead us to believe, or we are just starting to tip toward the long tail in book sales, and its market share is going up as people learn to find and buy things on the Internet. I would guess both (that the market share is going up and that overall demand is not as big as Amazon's figures lead us to believe).

After all, Amazon is the bookstore for the fringe desires.
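The argument in this last comment can be made concrete as a weighted average: the tail's share of the whole market is Amazon's tail share weighted by Amazon's share of the market, plus other sellers' tail share weighted by theirs. This is a sketch with a hypothetical function name; in the usage line, the 25% Amazon tail share echoes Anderson's figure, while the 20% Amazon market share and the zero tail share for other stores are made-up assumptions for illustration only.

```python
def market_tail_share(amazon_market_share: float, amazon_tail_share: float,
                      other_tail_share: float = 0.0) -> float:
    """Tail share of all book sales: a weighted average of Amazon's tail share and
    other sellers' tail share, weighted by each channel's share of total sales."""
    for value in (amazon_market_share, amazon_tail_share, other_tail_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all shares must be between 0 and 1")
    return (amazon_market_share * amazon_tail_share
            + (1.0 - amazon_market_share) * other_tail_share)


# Hypothetical: Amazon has 20% of the market, 25% of its sales come from the tail,
# and other stores sell essentially no tail titles.
print(f"{market_tail_share(0.2, 0.25):.1%}")
```

As long as traditional stores sell a smaller tail share than Amazon does, the market-wide figure sits below Amazon's own figure, which is the commenter's point.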
