
Histograms and kernel density estimation KDE 2

You can download this whole post as a Jupyter notebook here

Why histograms

As we all know, histograms are an extremely common way to make sense of discrete data. Whether we mean to or not, when we're using histograms, we're usually doing some form of density estimation. That is, although we only have a few discrete data points, we'd really like to pretend that we have some sort of continuous distribution, and we'd really like to know what that distribution is. For instance, I was recently grading an exam and trying to figure out what the underlying distribution of grades looked like, whether I should curve the exam, and, if so, how I should curve it.

I'll poke at this in an IPython Notebook; if you're doing this in a different environment, you may wish to uncomment the commented-out import lines so that your namespace is properly polluted.

In [1]:
from __future__ import division
%pylab inline

grades = array((93.5,93,60.8,94.5,82,87.5,91.5,99.5,86,93.5,92.5,78,76,69,94.5,89.5,92.8,78,65.5,98,98.5,92.3,95.5,76,91,95,61.4,96,90))
junk = hist(grades)
Populating the interactive namespace from numpy and matplotlib
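
To make the "histograms are density estimation" idea concrete, here is a minimal sketch using plain NumPy rather than the pylab namespace. It assumes 10 bins (matplotlib's default) and uses `density=True`, which rescales the bar heights so that the total bar area is 1, just like a probability density.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

# density=True divides each count by (number of points * bin width),
# so the histogram becomes a piecewise-constant density estimate.
density, edges = np.histogram(grades, bins=10, density=True)
widths = np.diff(edges)

# Area under the bars: should be 1 up to floating-point rounding.
total_area = float(np.sum(density * widths))
print(total_area)
```

The bin count here is an arbitrary choice, which is exactly the problem the next sections poke at.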

Why not histograms?

We can play around with the number of bins, but it's not totally clear what's going on with the left half of the grades.

In [2]:
junk = hist(grades,5)
In [3]:
junk = hist(grades,15)

So, maybe the histogram isn't the perfect tool for the job at hand. In fact, there are quite a few well-known problems with histograms. Shodor has a really nice histogram activity that lets you play around with data interactively. Rather than using Java or JavaScript directly, Jake Vanderplas has a great package called JSAnimation that lets us animate things directly in IPython Notebooks. I'll cheat a bit: since all I really need for this is a single slider, I can use JSAnimation to let us interact with data very similarly to the Shodor pages.

In [4]:
from JSAnimation.IPython_display import display_animation, anim_to_html

Before we start, I'll load in a few data sets. If you're interested, you can rerun this notebook with a different data set to see how it affects things. data_shodor is the "My Data" set from their histogram activity page, data_sat is the average SAT Math data from the same page, data_tarn is from Tarn Duong's fantastic KDE explanation (we'll get there), and simple_data is just a very simple data set.

In [5]:
data_tarn = array((2.1,2.2,2.3,2.25,2.4,2.61,2.62,3.3,3.4,3.41,3.6,3.8))
data_shodor = array((49,49,45,45,41,38,38,38,40,37,37,34,35,36,35,38,38,32,32,32,37,31,32,31,32,30,30,32,30,30,29,28,29,29,29,30,28,27,29,30,28,27,28,27,27,29,29,29,26,27,25,25,25,25,25,25,25,26,26,27))
data_sat = array((490,499,459,575,575,513,382,525,510,542,368,564,509,530,485,521,495,526,474,500,441,750,495,476,456,440,547,479,501,476,457,444,444,467,482,449,464,501,670,740,590,700,590,450,452,468,472,447,520,506,570,474,532,472,585,466,549,736,654,585,574,621,542,616,547,554,514,592,531,550,507,441,551,450,548,589,549,485,480,545,451,448,487,480,540,470,529,445,460,457,560,495,480,430,644,489,506,660,444,551,583,457,440,470,486,413,470,408,440,596,442,544,528,559,505,450,477,557,446,553,370,533,496,513,403,496,543,533,471,404,439,459,393,470,650,512,445,446,474,449,529,538,450,570,499,375,515,445,571,442,492,456,428,536,515,450,537,490,446,410,526,560,560,540,502,642,590,480,557,468,524,445,479))
simple_data = array((0,5,10))
data = grades

Two of the main problems with histograms are that (1) you need to define a bin size and (2) you need to decide where the left edge of the first bin is.

Histogram bin size

Let's look at the effects of bin size on histograms.

Caveat: the code below is certainly not optimized. Ditto for all of the code in this notebook. I wrote it quickly and at the same time I learned what FuncAnimation does. In order to make this read more easily, I've included most of the code at the end. If you're running this interactively, run the cell at the end now!

Let's start with getHistBinNumAni. What does that do? Given a data set, it'll give us an interactive plot. By dragging the slider around, we can make a histogram with anywhere from 1 bin to some max (default: 20) number of bins. No matter how many bins we have, the actual data is shown in blue dots near the bottom. Here's what it looks like for the grades:

In [7]:
ani = getHistBinNumAni(data)
display_animation(ani, default_mode='once')
Out[7]:



So, obviously choosing the number of bins makes a huge difference in how we'd interpret the data.
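
If you can't run the animation, the same effect shows up in the raw counts. This is a small static sketch, not the animation code itself; the helper name `bin_counts` and the particular bin counts tried are my own choices for illustration.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

def bin_counts(data, nbins):
    """Return (counts, edges) for an equal-width histogram over the data range."""
    counts, edges = np.histogram(data, bins=nbins)
    return counts, edges

# Same data, different numbers of bins: compare the shape of the counts.
for nbins in (3, 5, 10, 15):
    counts, edges = bin_counts(grades, nbins)
    print(nbins, counts.tolist())
```

Every row accounts for all 29 grades, but the apparent number of "bumps" changes with the bin count.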

Where do the histogram bins start?

One of the other big problems with histograms, especially for relatively small data sets, is that you have to choose where the left edge of the first bin goes. Do you center the bin around the first group of points? Do you make the left edge match up with the left-most data point? Let's make some plots to see how that can affect things, because it's a bit easier to understand what I'm going on about that way. We'll make a similar animation with getHistBinOffsetAni. As with the previous animation, drag the slider around. This time, we have the same number of bins, but the slider drags around the data relative to the bins (or vice versa, depending on how you think of it).

In [8]:
ani = getHistBinOffsetAni(data)
display_animation(ani, default_mode='once')
Out[8]:


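
Here is a static sketch of the bin-offset effect: the bin width stays fixed and only the position of the left edge moves. The helper `shifted_counts`, the width of 5 points, and the offsets tried are assumptions for illustration; this is not the logic inside getHistBinOffsetAni.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

def shifted_counts(data, width, offset):
    """Histogram with fixed bin width.

    `offset` in [0, 1) slides the left edge of the first bin by that
    fraction of a bin width; the first edge never exceeds data.min().
    """
    start = data.min() - width + offset * width
    # Enough bins that the last edge lies beyond data.max().
    nbins = int(np.ceil((data.max() - start) / width)) + 1
    edges = start + width * np.arange(nbins + 1)
    counts, _ = np.histogram(data, bins=edges)
    return counts, edges

# Same data and same bin width; only the bin positions change.
for offset in (0.0, 0.25, 0.5, 0.75):
    counts, edges = shifted_counts(grades, width=5.0, offset=offset)
    print(offset, counts.tolist())
```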

KDE (Kernel Density Estimation) to the rescue!

Kernel density estimation is my favorite alternative to histograms. Tarn Duong has a fantastic KDE explanation, which is well worth reading. The basic idea is that, if you're looking at our simple dataset (simple_data = array((0,5,10))), you might choose to represent each point as a rectangle:

In [9]:
bar(simple_data,ones_like(simple_data)*0.5,width=0.5,facecolor='grey',alpha=0.5)
junk = ylim(0, 2.0)

Not so interesting so far, but what do we do when the rectangles get wide enough that they start to overlap? Instead of just letting them run over each other like

In [10]:
bar(simple_data,ones_like(simple_data)*0.5,width=6,facecolor='grey',alpha=0.5)
junk = ylim(0, 2.0)

and instead of coloring the overlap regions darker grey, we add the rectangles together. So, since each of the rectangles has height 0.5 in the above example, the dark grey regions should really have height 1.0. This idea is called "kernel density estimation" (KDE), and the rectangle that we're using is called the "kernel". If we wanted to draw a different shape at each point, we'd do so by specifying a different kernel (perhaps a bell curve, or a triangle).
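
The "add the rectangles together" step can be sketched in a few lines. I'm assuming each rectangle is centered on its data point, and the helper name `rect_kde` is made up for this example. With width 6 and height 0.5, the rectangles around 0, 5 and 10 overlap between 2 and 3 and between 7 and 8, and the summed height there is 1.0.

```python
import numpy as np

def rect_kde(data, width, height, x):
    """Sum one rectangle per data point (centered on the point) over the grid x."""
    y = np.zeros_like(x, dtype=float)
    for d in data:
        # The boolean mask is True wherever x falls inside this point's rectangle.
        y += height * (np.abs(x - d) <= width / 2)
    return y

simple_data = np.array([0, 5, 10])

# Evaluate at a few hand-picked positions: inside one rectangle,
# inside an overlap region, and outside everything.
probe = np.array([0.0, 2.5, 5.0, 14.0])
print(rect_kde(simple_data, width=6, height=0.5, x=probe))
```

Swapping the rectangle for another shape only changes the line inside the loop, which is all that "choosing a different kernel" means.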

KDE, rectangular kernel

Now let's try KDE with a rectangular kernel. This time, using getKdeRectAni, you get a slider that controls the width of the kernel.

In [11]:
ani = getKdeRectAni(simple_data)
display_animation(ani, default_mode='once')
Out[11]:



Play with the slider, and note what happens when you make it big enough that the rectangles start to overlap. By tuning the width of the rectangles, we can tune how finely or coarsely we're looking at the data. It's not so powerful with three data points, but check it out with the grades from above:

In [12]:
ani = getKdeRectAni(data)
display_animation(ani, default_mode='once')
Out[12]:



In my view, there's a sweet spot right around 1/8 or 1/9 of the way across the slider where there are three distinct peaks. It looks very much like a trimodal distribution to me. So far, this isn't totally automatic; we have to pick the width of our kernel, but it's obvious that KDE can give us a much better view of the underlying data than histograms!

KDE, Gaussian kernel

As mentioned above, we can use a different kernel. One of the most common kernels to use is a Gaussian. Using getKdeGaussianAni:

Again, the slider controls kernel width.

In [13]:
ani = getKdeGaussianAni(data)
display_animation(ani, default_mode='once')
Out[13]:



This gives us a really nice picture of the data. Play around with the slider and see what you think.
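
For reference, here is a compact, vectorized sketch of a Gaussian KDE. Unlike the animation code at the end of this post, which simply sums the kernels, this version averages them so that the curve integrates to 1; the function name `gaussian_kde_manual` and the choice `sigma=2.0` are assumptions for illustration.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

def gaussian_kde_manual(data, sigma, x):
    """Average of Gaussians with standard deviation `sigma`, one per data point."""
    data = np.asarray(data, dtype=float)
    # z[i, j] = standardized distance from grid point i to data point j.
    z = (x[:, None] - data[None, :]) / sigma
    kernels = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

x = np.linspace(40, 120, 4001)
y = gaussian_kde_manual(grades, sigma=2.0, x=x)

# Crude Riemann-sum check that the estimate behaves like a density.
area = float(y.sum() * (x[1] - x[0]))
print(area)
```

Here `sigma` plays the role of the slider: small values give spiky curves, large values blur everything into one bump.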

Kernel width

Not only does KDE give us a better picture than histograms, but there turn out to be actual answers to the question of "how wide should my kernel be?" You can see, for instance, that making the kernel too narrow doesn't provide much more information than the raw data, while making it too large oversmooths the data, making it mostly look like a single kernel with some bits on the sides.

Daniel Smith has a really nice KDE module that chooses an optimal bandwidth and can be used with SciPy (SciPy does have its own KDE module, but I've found Daniel's to be quite robust).
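
To give a flavor of what an automatic answer looks like, here is a sketch of Silverman's rule of thumb for a Gaussian kernel. To be clear, this is a much simpler rule than an optimal-bandwidth search, and it is not the method used by the module mentioned above; the function name `silverman_bandwidth` is my own.

```python
import numpy as np

grades = np.array([93.5, 93, 60.8, 94.5, 82, 87.5, 91.5, 99.5, 86, 93.5,
                   92.5, 78, 76, 69, 94.5, 89.5, 92.8, 78, 65.5, 98, 98.5,
                   92.3, 95.5, 76, 91, 95, 61.4, 96, 90])

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    std = data.std(ddof=1)                    # sample standard deviation
    q75, q25 = np.percentile(data, [75, 25])
    spread = min(std, (q75 - q25) / 1.34)     # robust estimate of the scale
    return 0.9 * spread * n ** (-0.2)

h = silverman_bandwidth(grades)
print(h)
```

The rule assumes the data are roughly unimodal and bell-shaped, so for something that looks trimodal like the grades it will tend to oversmooth; treat the result as a starting point for the slider rather than a final answer.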

Other data sets

I highly recommend just playing around with other data sets using the above code. I was interested in playing around with income data, so I show how to grab that data from the IRS website below and play around a bit without comment. Enjoy!

Income data

Let's grab the income data from the IRS and make some plots.

In [14]:
import urllib
f = urllib.urlopen("http://www.irs.gov/file_source/pub/irs-soi/09incicsv.csv")
#"State_Code","County_Code","State_Abbrv","County_Name",   "Return_Num","Exmpt_Num","AGI","Wages_Salaries","Dividends","Interest"
irs2009 = loadtxt(f,delimiter=',',skiprows=1,usecols=(4,5,6,7,8,9))
agi2009 = irs2009[:,2]

Now try things like

In [15]:
ani = getHistBinNumAni(agi2009)
display_animation(ani, default_mode='once')
Out[15]:



Whoops, that's hard to make sense of. Let's use logs.

In [16]:
la2009 = log(agi2009)
la2009 = la2009[~isnan(la2009)]  # ~ is the logical NOT for boolean arrays; unary minus on a boolean mask fails in newer NumPy
-c:1: RuntimeWarning: invalid value encountered in log
In [17]:
ani = getHistBinNumAni(la2009)
display_animation(ani, default_mode='once')
Out[17]:


In [18]:
ani = getKdeRectAni(la2009)
display_animation(ani, default_mode='once')
Out[18]:


In [19]:
ani = getKdeGaussianAni(la2009)
display_animation(ani,default_mode='once')
Out[19]:



In order to make this read more easily, I've put the bulk of the code below. You'll have to run it before the previous cells.

In [6]:
#!/usr/bin/env python

from numpy import histogram as nphistogram
#from numpy import array, linspace, zeros, ones, ones_like
#import numpy as np
#import matplotlib.pyplot as plt
#from matplotlib.pyplot import figure, hist, plot, ion, axes, title
from JSAnimation.IPython_display import display_animation, anim_to_html

from matplotlib import animation as animation


def getHistBinNumAni(data,totalframes=None,showpts=True):
    #ion()
    if totalframes is None:
        totalframes = min(len(data)-1,100)
    fig = figure()
    ax = fig.gca()

    n, bins, patches = hist(data, totalframes, normed=1, facecolor='green', alpha=0.0)
    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    def animate(i):
        n, bins = nphistogram(data, i+1, normed=False)
        #print n
        ax.set_ylim(0,1.1*n.max())
        for j in range(len(n)):
            rect,h = patches[j],n[j]
            #print h.max()
            x = bins[j]
            w = bins[j+1] - x
            rect.set_height(h)
            rect.set_x(x)
            rect.set_width(w)
            rect.set_alpha(0.75)
        #fig.canvas.draw()
    
    ani = animation.FuncAnimation(fig, animate, totalframes, repeat=False)
    return ani

def getHistBinOffsetAni(data,nbins=20,showpts=True):
    offsets = linspace(-0.5,0.5,50)
    totalframes = len(offsets)
    fig = figure()
    ax = fig.gca()

    n, _bins, patches = hist(data, nbins, normed=1, facecolor='green', alpha=0.0)
    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    # Obnoxious: find max number in a bin ever
    nmax = 1
    for i in range(totalframes):
        dx = (data.max() - data.min())/nbins
        # Use the same bin edges as animate() so nmax matches what is actually drawn.
        _bins = linspace(data.min() - dx + offsets[i]*dx, data.max()+dx + offsets[i]*dx,nbins)
        n, bins = nphistogram(data, bins=_bins, normed=False)
        nmax = max(nmax,n.max())
                               
    def animate(i):
        dx = (data.max() - data.min())/nbins
        # bins go from min - dx to max + dx, then offset.
        _bins = linspace(data.min() - dx + offsets[i]*dx, data.max()+dx + offsets[i]*dx,nbins)
        n, bins = nphistogram(data, bins = _bins, normed=False)
        ax.set_ylim(0,1.1*nmax)
        #ax.set_xlim(data.min()-dx,data.max()+dx)
        binwidth = bins[1] - bins[0]
        ax.set_xlim(bins[0]-binwidth,bins[-1] + binwidth)

        for j in range(len(n)):
            #continue
            rect,h = patches[j],n[j]
            #print h.max()
            x = bins[j]
            w = bins[j+1] - x
            rect.set_height(h)
            rect.set_x(x)
            rect.set_width(w)
            rect.set_alpha(0.75)
        fig.canvas.draw()    
    ani = animation.FuncAnimation(fig, animate, totalframes, repeat=False)
    return ani
#!/usr/bin/env python

from numpy import sqrt, pi, exp

def getKdeGaussianAni(data,totalframes=100, showpts=True):
    fig = figure()
    
    # Let's say 10000 points for the whole thing
    width = data.max() - data.min()
    left, right = data.min(), data.min() + (width)
    left, right = left - (totalframes/100)*width, right + (totalframes/100)*width
    
    ax = axes(xlim=(left,right),ylim=(-0.1,2))
    line, = ax.plot([], [], lw=2)
    if showpts:
        junk = plot(data,ones_like(data)*0.1,'go')

    
    numpts = 10000
    x = linspace(left,right,numpts)
    
    dx = (right-left)/(numpts-1)
    
    def init():
        line.set_data([], [])
        return line,
    
    def gaussian(x,sigma,mu):
        # Why isn't this defined somewhere?! It must be!
        return (1/sqrt(2*pi*sigma**2)) *  exp(-((x-mu)**2)/(2*sigma**2))
    
    def animate(i):
        y = zeros(10000)
        kernelwidth = .02*width*(i+1)
        kernelpts = int(kernelwidth/dx)
        kernel = gaussian(linspace(-3,3,kernelpts),1,0)
        #kernel = ones(kernelpts)
        for d in data:
            center = d - left
            centerpts = int(center/dx)
            bottom = centerpts - int(kernelpts/2)
            top = centerpts+int(kernelpts/2)
            if top - bottom < kernelpts: top = top + 1
            if top - bottom > kernelpts: top = top - 1
            y[bottom:top] += kernel
        ax.set_xlim(x[where(y>0)[0][0]],x[where(y>0)[0][-1]])
        line.set_data(x,y)
        ax.set_ylim(min(0,y.min()),1.1*y.max())
        #title('ymin %s ymax %s'%(y.min(),y.max()))

    
        #sleep(0.1)
        return line,
    ani = animation.FuncAnimation(fig, animate, init_func=init,
                                  frames=totalframes, repeat=False)
    return ani
#FACTOR ME for rect and gaussian
def getKdeRectAni(data,totalframes=100,showpts=True):
    #ion()
    fig = figure()
    
    # Let's say 10000 points for the whole thing
    width = data.max() - data.min()
    left, right = data.min(), data.min() + (width)
    left, right = left - (totalframes/100)*width, right + (totalframes/100)*width
    
    ax = axes(xlim=(left,right),ylim=(-0.1,2))
    line, = ax.plot([], [], lw=2)
    
    numpts = 10000
    x = linspace(left,right,numpts)
    
    dx = (right-left)/(numpts-1)
    
    def init():
        line.set_data([], [])
        return line,

    if showpts:
        junk = plot(data,0.2*ones_like(data),'bo')
    
    def animate(i):
        y = zeros(10000)
        kernelwidth = .02*width*(i+1)
        kernelpts = int(kernelwidth/dx)
        kernel = ones(kernelpts)
        for d in data:
            center = d - left
            centerpts = int(center/dx)
            bottom = centerpts - int(kernelpts/2)
            top = centerpts+int(kernelpts/2)
            if top - bottom < kernelpts: top = top + 1
            if top - bottom > kernelpts: top = top - 1
            y[bottom:top] += kernel
        line.set_data(x,y)
        ax.set_ylim(0,1.1*y.max())
        ax.set_xlim(x[where(y>0)[0][0]],x[where(y>0)[0][-1]])
    
        #sleep(0.1)
        return line,
    ani = animation.FuncAnimation(fig, animate, init_func=init,
                                  frames=totalframes, repeat=False)
    return ani

And that's it. Cheers!

Comments from old blog

13 Responses to Histograms and Kernel Density Estimation (KDE)

  1. Arindam Paul says (December 14, 2013 at 10:04 pm):

    Excellent !! One of the best explanations of KDE I have ever seen.
    This post has generated enough interest to read your other blogs. great job.

  2. Nils Wagner says (February 19, 2014 at 8:45 am):

    Assume that we have a spatial energy distribution given at discrete points in 3-D, i.e.

    E_i(x_i,y_i,z_i)

    where E_i denotes the energy and x_i,y_i,z_i are the corresponding coordinates.

    Is it possible to extract the local hot spots using scipy ?

    A small example is appreciated.

    Thanks in advance.

  3. domain says (October 12, 2014 at 9:36 pm):

    It's really a cool and helpful piece of info. I'm happy that you simply shared this helpful information with us.
    Please stay us up to date like this. Thanks for sharing.

  4. Andreas says (February 1, 2015 at 9:13 am):

    Thanks for sharing your knowledge and interpretation of kernel density estimation with us. Very enlighting.

  5. gmas says (February 17, 2015 at 8:23 am):

    If I try to run your notebook, I get this name error:

    NameError Traceback (most recent call last)
    in ()
    ----> 1 ani = getHistBinNumAni(data)
    2 display_animation(ani, default_mode='once')

    NameError: name 'getHistBinNumAni' is not defined

      gmas says (February 17, 2015 at 8:25 am):

        Ops.. I have just read the last part that asks to run the code before the other cells! Maybe you can add a note at the begin of the post..

  6. X says (June 23, 2015 at 4:54 pm):

    Is there a way to fit data to an exponential distribution such that it maximizes the entropy H(p_i) = - sum p_i*log(p_i) where p_i is the probability of a given event?

      mglerner says (June 25, 2015 at 4:00 pm):

        I don't know, but I've been wondering about similar things for a while. If I do learn the answer, I'll update.

  7. Pingback: Histogram to PDF (https://www.physicsforums.com/threads/histogram-to-pdf.835833/#post-5247985)

  8. Pingback: Histogram to PDF (https://www.physicsforums.com/threads/histogram-to-pdf.835833/#post-5252532)

  9. Koushik Khan says (October 14, 2015 at 3:09 am):

    Awesome presentation !

  10. Ben says (July 13, 2016 at 7:09 am):

    Fantastic explanation!
    Best KDE description I've found so far!
    Keep up the good work!

  11. Milissa Washam says (December 11, 2016 at 12:20 am):

    Just wanted to say this website is extremely good. I always want to hear new things about this because I’ve the similar blog during my Country with this subject which means this help´s me a lot. I did so a search around the issue and located a large amount of blogs but nothing beats this. Many thanks for sharing so much inside your blog..
