| | | |
|---|
|
Machine Learning in Action Author: Peter Harrington ISBN-10: 1617290181 ISBN-13: 9781617290183 Published: 2012-04-16 Publisher: Manning Publications
|
Book Description:
Summary: Machine Learning in Action is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. You'll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.
About the Book: A machine is said to learn when its performance improves with experience. Learning requires algorithms and programs that capture data and ferret out the interesting or useful patterns. Once the specialized domain of analysts and mathematicians, machine learning is becoming a skill needed by many.
Machine Learning in Action is a clearly written tutorial for developers. It avoids academic language and takes you straight to the techniques you'll use in your day-to-day work. Many (Python) examples present the core algorithms of statistical data processing, data analysis, and data visualization in code you can reuse. You'll understand the concepts and how they fit in with tactical tasks like classification, forecasting, recommendations, and higher-level features like summarization and simplification.
Readers need no prior experience with machine learning or statistical processing. Familiarity with Python is helpful.
What's Inside:
- A no-nonsense introduction
- Examples showing common ML tasks
- Everyday data analysis
- Implementing classic algorithms like Apriori and AdaBoost
Table of Contents:
PART 1 CLASSIFICATION
- Machine learning basics
- Classifying with k-Nearest Neighbors
- Splitting datasets one feature at a time: decision trees
- Classifying with probability theory: naïve Bayes
- Logistic regression
- Support vector machines
- Improving classification with the AdaBoost meta algorithm
PART 2 FORECASTING NUMERIC VALUES WITH REGRESSION
- Predicting numeric values: regression
- Tree-based regression
PART 3 UNSUPERVISED LEARNING
- Grouping unlabeled items using k-means clustering
- Association analysis with the Apriori algorithm
- Efficiently finding frequent itemsets with FP-growth
PART 4 ADDITIONAL TOOLS
- Using principal component analysis to simplify data
- Simplifying data with the singular value decomposition
- Big data and MapReduce
|
|
|
Hadoop: The Definitive Guide Author: Tom White ISBN-10: 1449311520 ISBN-13: 9781449311520 Published: 2012-05-29 Publisher: O'Reilly Media
|
Book Description:
Ready to unlock the power of your data? With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).
- Store large datasets with the Hadoop Distributed File System (HDFS)
- Run distributed computations with MapReduce
- Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Load data from relational databases into HDFS, using Sqoop
- Perform large-scale data processing with the Pig query language
- Analyze datasets with Hive, Hadoop’s data warehousing system
- Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
|
|
|
Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites Author: Matthew A. Russell ISBN-10: 1449388345 ISBN-13: 9781449388348 Published: 2011-02-08 Publisher: O'Reilly Media
|
Book Description:
Want to tap the tremendous amount of valuable social data in Facebook, Twitter, LinkedIn, and Google+? This refreshed edition helps you discover who’s making connections with social media, what they’re talking about, and where they’re located. You’ll learn how to combine social web data, analysis techniques, and visualization to find what you’ve been looking for in the social haystack, as well as useful information you didn’t know existed.
Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started is a programming background and a willingness to learn basic Python tools.
- Get a straightforward synopsis of the social web landscape
- Use adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, LinkedIn, and Google+
- Learn how to employ easy-to-use Python tools to slice and dice the data you collect
- Explore social connections in microformats with the XHTML Friends Network
- Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
- Build interactive visualizations with web technologies based upon HTML5 and JavaScript toolkits
"A rich, compact, useful, practical introduction to a galaxy of tools, techniques, and theories for exploring structured and unstructured data." --Alex Martelli, Senior Staff Engineer, Google
|
|
|
Learning SQL Author: Alan Beaulieu ISBN-10: 0596007272 ISBN-13: 9780596007270 Published: 2005-08-29 Publisher: O'Reilly Media
|
Book Description:
SQL (Structured Query Language) is a standard programming language for generating, manipulating, and retrieving information from a relational database. If you're working with a relational database--whether you're writing applications, performing administrative tasks, or generating reports--you need to know how to interact with your data. Even if you are using a tool that generates SQL for you, such as a reporting tool, there may still be cases where you need to bypass the automatic generation feature and write your own SQL statements.
To help you attain this fundamental SQL knowledge, look to Learning SQL, an introductory guide to SQL, designed primarily for developers just cutting their teeth on the language. Learning SQL moves you quickly through the basics and then on to some of the more commonly used advanced features. Among the topics discussed:
- The history of the computerized database
- SQL Data Statements--those used to create, manipulate, and retrieve data stored in your database; example statements include select, update, insert, and delete
- SQL Schema Statements--those used to create database objects, such as tables, indexes, and constraints
- How data sets can interact with queries
- The importance of subqueries
- Data conversion and manipulation via SQL's built-in functions
- How conditional logic can be used in Data Statements
Best of all, Learning SQL talks to you in a real-world manner, discussing various platform differences that you're likely to encounter and offering a series of chapter exercises that walk you through the learning process. Whenever possible, the book sticks to the features included in the ANSI SQL standards. This means you'll be able to apply what you learn to any of several different databases; the book covers MySQL, Microsoft SQL Server, and Oracle Database, but the features and syntax should apply just as well (perhaps with some tweaking) to IBM DB2, Sybase Adaptive Server, and PostgreSQL.
Put the power and flexibility of SQL to work. With Learning SQL you can master this important skill and know that the SQL statements you write are indeed correct.
|
|
|
Hadoop: The Definitive Guide Author: Tom White ISBN-10: 1449389732 ISBN-13: 9781449389734 Published: 2010-10-12 Publisher: Yahoo Press
|
Book Description:
Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.
This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.
- Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
- Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
- Analyze datasets with Hive, Hadoop’s data warehousing system
- Take advantage of HBase, Hadoop’s database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera
|
|
|
HBase: The Definitive Guide Author: Lars George ISBN-10: 1449396100 ISBN-13: 9781449396107 Published: 2011-09-20 Publisher: O'Reilly Media
|
Book Description:
If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.
- Discover how tight integration with Hadoop makes scalability with HBase easier
- Distribute large datasets across an inexpensive cluster of commodity servers
- Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs
- Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more
- Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs
- Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks
|
|
|
Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart Author: Ian Ayres ISBN-10: 0553384732 ISBN-13: 9780553384734 Published: 2008-08-26 Publisher: Bantam
|
Book Description:
An international sensation, and still the talk of the relevant blogosphere, this Wall Street Journal and New York Times business bestseller examines the “power” in numbers. Today more than ever, number crunching affects your life in ways you might not even imagine. Intuition and experience are no longer enough to make the grade. In order to succeed, even survive, in our data-based world, you need to become statistically literate.
Cutting-edge organizations are already crunching ever-larger databases to find the unseen connections among seemingly unconnected things and to predict human behavior with staggeringly accurate results. From Internet sites like Google and Amazon that use filters to keep track of your tastes and your purchasing history, to insurance companies and government agencies that every day make decisions affecting your life, the brave new world of the super crunchers is happening right now. No one who wants to stay ahead of the curve should make another keystroke without reading Ian Ayres’s engrossing and enlightening book.
|
|
|
R in a Nutshell: A Desktop Quick Reference Author: Joseph Adler ISBN-10: 059680170X ISBN-13: 9780596801700 Published: 2010-01-11 Publisher: O'Reilly Media
|
Book Description:
Why learn R? Because it's rapidly becoming the standard for developing statistical software. R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.
The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.
- Understand the basics of the language, including the nature of R objects
- Learn how to write R functions and build your own packages
- Work with data through visualization, statistical analysis, and other methods
- Explore the wealth of packages contributed by the R community
- Become familiar with the lattice graphics package for high-level data visualization
- Learn about bioinformatics packages provided by Bioconductor
"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."
|
|
|
Mahout in Action Author: Sean Owen ISBN-10: 1935182684 ISBN-13: 9781935182689 Published: 2011-10-14 Publisher: Manning Publications
|
Book Description:
Summary: Mahout in Action is a hands-on introduction to machine learning with Apache Mahout. Following real-world examples, the book presents practical use cases and then illustrates how Mahout can be applied to solve them. Includes a free audio- and video-enhanced ebook.
About the Technology: A computer system that learns and adapts as it collects data can be really powerful. Mahout, Apache's open source machine learning project, captures the core algorithms of recommendation systems, classification, and clustering in ready-to-use, scalable libraries. With Mahout, you can immediately apply to your own projects the machine learning techniques that drive Amazon, Netflix, and others.
About this Book: This book covers machine learning using Apache Mahout. Based on experience with real-world applications, it introduces practical use cases and illustrates how Mahout can be applied to solve them. It places particular focus on issues of scalability and how to apply these techniques against large data sets using the Apache Hadoop framework.
This book is written for developers familiar with Java -- no prior experience with Mahout is assumed.
What's Inside:
- Use group data to make individual recommendations
- Find logical clusters within your data
- Filter and refine with on-the-fly classification
- Free audio and video extras
Table of Contents:
- Meet Apache Mahout
PART 1 RECOMMENDATIONS
- Introducing recommenders
- Representing recommender data
- Making recommendations
- Taking recommenders to production
- Distributing recommendation computations
PART 2 CLUSTERING
- Introduction to clustering
- Representing data
- Clustering algorithms in Mahout
- Evaluating and improving clustering quality
- Taking clustering to production
- Real-world applications of clustering
PART 3 CLASSIFICATION
- Introduction to classification
- Training a classifier
- Evaluating and tuning a classifier
- Deploying a classifier
- Case study: Shop It To Me
|
|
|
Microsoft PowerPivot for Excel 2010: Give Your Data Meaning Author: Marco Russo ISBN-10: 0735640580 ISBN-13: 9780735640580 Published: 2010-10-12 Publisher: Microsoft Press
|
Book Description:
Transform your skills, data, and business with the power user’s guide to PowerPivot for Excel. Led by two business intelligence (BI) experts, you’ll learn how to create and share your own BI solutions using software you already know and love: Microsoft Excel. Discover how to extend your existing skills, using the PowerPivot add-in to quickly turn mass quantities of data into meaningful information and on-the-job results, with no programming required. The book introduces you to PowerPivot functionality, then takes a pragmatic approach to understanding and working with data models, data loading, data manipulation with Data Analysis Expressions (DAX), simple-to-sophisticated calculations, what-if analysis, and PowerPivot patterns. Learn how to create your own “self-service” BI solutions, then share your results effortlessly across your organization using Microsoft SharePoint®.
A Note Regarding the CD or DVD: The print version of this book ships with a CD or DVD. For those customers purchasing one of the digital formats in which this book is available, we are pleased to offer the CD/DVD content as a free download via O'Reilly Media's Digital Distribution services. To download this content, please visit O'Reilly's web site, search for the title of this book to find its catalog page, and click on the link below the cover image (Examples, Companion Content, or Practice Files). Note that while we provide as much of the media content as we are able via free download, we are sometimes limited by licensing restrictions. Please direct any questions or concerns to booktech@oreilly.com.
|
|
|