The Poisson Regression with Python from scratch

4 min readFeb 1, 2020

The Poisson regression with Python from scratch to better understand it. A useful Python library called statsmodels which can perform regression analysis in an instant is available. However, I have tried to implement it without using Python library for having a deeper understanding.

In this article, we’ll cover simple explanation of Poisson regression, important probability concepts and how to code it without Python library.

What is Poisson ?

Poisson regression is used to model response variables (Y-values) that are counts. It tells you which explanatory variables have a statistically significant effect on the response variable. Count data counts the number of times a certain phenomenon has occurred within a certain period of time. For example, the number of accidents and the number of emails received per day.

General regression analysis assumes that the objective variable (ie, the residual) follows the normal distribution, whereas Poisson regression assumes the Poisson distribution. It seems bad if the variables following the normal distribution is assumed to the Poisson distribution.

How it is predicted

In the regression analysis, the coefficients are predicted as follows

(1) 𝜃 = 𝑎 + 𝑏𝑥

When the variable 𝑥 comes, the idea is to follow the normal distribution of the mean 𝜃. Applying this to the Poisson distribution, I would like to follow the Poisson distribution with mean 𝑦 when the variable 𝑥 comes. Furthermore, since the mean of the Poisson distribution is non-negative, the following can be considered as follows

(2) 𝜃 = exp(𝑎 + 𝑏𝑥)

If it is an exponential function, 𝑥 will be positive no matter what 𝑥, 𝑎, or b comes. At this time, it can be converted as follows

(3) log𝜃 = 𝑎 + 𝑏𝑥

The transformed left-hand side is called the Link Function(The log-link function). And the right side is called linear predictor. In fact, (1) does not convert anything, however the link function does not convert anything, this relationship holds up, then it is called the identity link function.

How to find the coefficient

Let’s find the coefficients 𝑎 and b in equation (2).

Suppose the data now has explanatory variable = {𝑥1, 𝑥2… 𝑥𝑛}, and the objective variable has 𝑦 = {𝑦1, 𝑦2… 𝑦𝑛}. What we want to do is to predict each parameter by an explanatory variable, such as “assuming Poisson distribution with mean 𝑦1 when 𝑥1 comes in”.

In mathematical terms, we would like to find the coefficients 𝑎 and b that maximize the probability that the explanatory variable will realize the current objective variable. Therefore, it seems better to find the coefficients 𝑎 and b that maximize the likelihood function. Therefore, maximize the following formula.

The meaning of 𝑦𝑖 (𝑦𝑖 | 𝜃𝑖) represents the probability that explanatory variable will occur in the Poisson distribution of the mean 𝜃𝑖, when the explanatory variable 𝑥𝑖 enters, transforms by the coefficients 𝑎 and 𝑏 to find the mean 𝜃𝑖. Remember that 𝜃𝑖 is

𝜃𝑖 = exp (𝑎 + 𝑏𝑥𝑖).

It is easier to convert logarithmic to the equation than calculating it as it is.

When it comes to c𝑜𝑛𝑠𝑡 on the right side, except for parameter 𝜃, it has nothing to do with maximization. This can be omitted when maximizing.

Implemented in Python

I used the data from the link as follows

aviatesk/intro-statistical-modeling

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

github.com

Let’s try the regression analysis to see how the average number of seeds changes with the size of the flower !

Then, define the log likelihood function we would like to maximize this time.

If you have a better idea for coding, please let me know.

Calling data [‘y’] gives you a vector of objective variables, and calling data [‘x’] gives you a vector of explanatory variables. Then, take each one by the subscript and add them in totall, and subtract them at the end.

In regard to this minus, I found that only the scipy optimization function, so I made it minimized with minus. Before optimizing, we need coefficients. For now, let’s define it randomly.

The optimization is now complete! Finally, let’s plot it on a graph !

I got this graph

If you got such a slope, it works well. We can see that the number of seeds has changed (increased) to some extent depending on the size of the flower.