- Publisher: O'Reilly Media, Inc, USA (May 1, 2015)
- Paperback: 330 pages
- Language: English
- ISBN: 149190142X
- Barcode: 9781491901427
- Product dimensions: 17.8 x 1.8 x 23.3 cm
- Item weight: 599 g
- Brand: O'Reilly Media, Inc, USA
- ASIN: 149190142X
- Amazon Best Sellers Rank: #185,300 in Books
Data Science from Scratch (English) Paperback – May 1, 2015
Joel Grus is Chief Scientist and Head of Analytics at VoloMetrix, an enterprise analytics startup in Seattle. He has been doing data science since before it was called that, at a handful of Seattle startups before this one, and at Bing when one of those startups got bought by Microsoft. He's been using Python for almost 10 years. Joel is active in the Seattle data science community and maintains a blog at joelgrus.com
|5 stars (0%)|
|4 stars (0%)|
|3 stars (0%)|
|2 stars (0%)|
|1 star (0%)|
+1 for relevant topics.
-1 for lack of real life library usage.
-1 for lack of real world data.
^ 2/5 stars.
I don't normally leave reviews. I buy books, read them, and put them away. I decided to make an exception for this one.
Before I go any further I'll give the benefit of the doubt that perhaps I had a misunderstanding of what this book was about. I am a data scientist who works in cyber security. I have been through a good amount of data science boot camps, books, and online training. They all have one thing in common. For being about "data" they all have pretty useless and unrealistic examples. This book is no exception. (I laughed at a part where the author literally says something similar to "real life data is usually very messy" and continues to not use real data).
I will start out with the good parts about this book. It gives you a good crash course in Python and in pretty much every critical data science concept. It is concise and filled with code examples written from scratch with few to no libraries being used (which is also a bad thing, as I will explain shortly). The flow of the book is well designed as well.
Now for the bad.
1. Pure Python is great, but should be kept to a minimum. Sure, it gives you a good understanding of how to implement a concept in pure Python, but that is not the industry standard whatsoever. There should have been a healthy amount of real world implementations to offset the typical college classroom feel of the book. It also tends to add too much filler to the content. It would have been much better if it were "read resource X to see how to implement this in pure Python" and not "read book X to see how to use a real world library to do this."
2. The data in this book is like all the data everyone uses in their examples. Completely useless. Randomly generated numbers, endless usage of the "coin flip" probability examples, typical artificial data that, I promise you, nobody analyzes on a daily basis. The book starts off with you role playing as a new data scientist for a fictional social networking platform for Data Scientists. That was a very promising start, and I was eager to see how this "character" would deal with the data problems they would face... Spoiler: the book barely ever speaks about it. Most examples are riddled with typical Statistics 101 and randomly generated data. Yet again, another disappointment on that end.
Maybe I misunderstood what this book was about. I could be wrong. That being said, I am now afraid to touch another "Data Science" book or online resource because I am sufficiently tired of reading about 300 ways to solve a problem with np.random-generated arrays, then turning to my screen with real world data and literally looking like the Persian Room Cat Guardian.
I am not sure if I recommend this or not. It is a good book in the sense that you learn about what Data Science "contains" but definitely not how it applies to the real world.
At first I was very worried about this book based on the first few chapters for the one reason that the author was cracking jokes throughout the text and I thought if it kept up for the rest of the book I was going to be very upset. But it did not happen and it turns out to have been a very reasonable way to ease into this complicated subject.
The author steps through the toolbox of the data scientist, chapter by chapter, giving useful, insightful, clear pieces of code and textual explanations of each topic. So, for those new to data science it gives just enough to get the basic idea of a concept in terms of code and mathematical explanation, and then moves on to the next topic.
It is often said that in writing, less is better and this book gets things down to their essence. That is one of the great things about the book - that the length of each chapter is about 20 pages (over 25 chapters). So each chapter can be read and the code even exercised in about an hour. Further, the references at the end of each chapter invite the reader to expanded information at the level of one or more entire textbooks or references. Thus the book can be seen as kind of boiling down a 25-volume set of highly technical subject matter into roughly 300 pages.
The topics that were explored the best seem to be the ones on probability, working with data, regression, clustering, and databases (SQL). Some of the small but dense code samples were tough to follow, but that is because of their algorithmic complexity, such as those for logistic regression and MapReduce. Occasionally the author uses a term that is not defined or in the index (such as data munging, which I still haven't looked up to see what it means). There are only a small number of typos, which indicates good editing. While the Python crash course was pretty good, Python is a vast language and there could have been more to that section.
I read this book from cover to cover and logically stepped through all the code (but did not actually run any of it), and I would wholeheartedly recommend this book to anyone wanting to work in the area of data science or its related fields, such as big data engineering or data analysis.