Big-Ish Data
bigishdata.com
Home
flowingdata.com
Data Science Insights
www.datasciencebowl.com
Data Labs
insightdatalabs.com
A Blog From a Human-engineer-being
erogol.com
Peter Norvig is not only the mastermind behind the Google economy, teacher of a wonderful introduction to Python programming at Udacity, and author of many scientific papers
on applied statistics and modeling, but he also seems to be a true
nerd. Who else would take an xkcd comic strip at its word and work out
the regular expression matching patterns that provide a solution to the
problem posed in the comic strip? I promise that your life will never be
the same after you have gone through this notebook: you’ll start to see
programming problems in almost every Internet meme from now on. Let me
know when you find some interesting solutions!
Notebook on nbviewer
jupyter.org
The days of holecount and 1000+ pages of statistical tables are finally
history. Today, data science and data visualization go together like
Bayesian priors and posteriors. One of the hippest and most powerful
technologies in modern browser-based visualization is the d3.js
framework. If you want to learn about the current state of the art in
combining the beauty of d3.js with the ease and convenience of IPython,
Brian’s Strata talk is the perfect introduction to this topic.
ellisonbg/talk-2014-strata-sc
github.com
GraphLab Graph Analytics Toolkit - Exploring the graph of American Films.
turi-code/tutorials
github.com
This is a Python script that replicates some features of Nate Silver's
538 Election Forecasting Model. It was constructed from reading the
methodology posts on the old site and the new one at the New York Times.
jseabold/538model
github.com
aka "Bayesian
Methods for Hackers": An introduction to Bayesian methods +
probabilistic programming with a computation/understanding-first,
mathematics-second point of view. All in pure Python ;)
CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
github.com
This tutorial will offer an in-depth experience of methods and tools for
the Machine Learning practitioner through a selection of advanced
features of scikit-learn and related projects. This tutorial targets
developers already familiar
Advanced Machine Learning with scikit-learn
www.youtube.com
If you want to automatically extract information from
Twitter streams, Facebook fan pages, Google+ posts, GitHub accounts, and
many more information sources, this is the best resource to start with. It
started out as the code repository for Matthew’s O’Reilly-published book,
but since the 2nd edition it has become an active learning community. The
code comes with a complete setup for a virtual machine (Vagrant-based), which saves you a lot of configuring and version-checking of Python packages. Highly recommended!
Musings about data. Not really “big data”, probably something more resembling small data. Practical results always important.