<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><title>Data Analysis - Tags - 灿若星河 | 郝建锋</title><link>https://philohao.com/tags/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/</link><description>Data Analysis - Tags - 灿若星河 | 郝建锋</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>haojianfeng1997@gmail.com (Jianfeng.Hao)</managingEditor><webMaster>haojianfeng1997@gmail.com (Jianfeng.Hao)</webMaster><lastBuildDate>Sat, 23 Jan 2021 15:16:59 +0800</lastBuildDate><atom:link href="https://philohao.com/tags/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/index.xml" rel="self" type="application/rss+xml"/><item><title>Database Notes 05 - SQL &amp; Pandas Side by Side</title><link>https://philohao.com/2021/01/20210123/</link><pubDate>Sat, 23 Jan 2021 15:16:59 +0800</pubDate><dc:creator>Jianfeng.Hao</dc:creator><author>haojianfeng1997@gmail.com (Jianfeng.Hao)</author><guid isPermaLink="true">https://philohao.com/2021/01/20210123/</guid><description>Notes comparing SQL with Pandas, covering grouped statistics, queries, and data-processing approaches.</description><content:encoded><![CDATA[<p>Grouped aggregation in MySQL works on almost the same principle as in Pandas: once you understand how Pandas implements it, the MySQL version follows. The overall process looks like this:</p>
<p><img
        class="lazyload"
        data-src="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        data-srcset="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 1.5x, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 2x"
        data-sizes="auto"
        alt="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        title="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8WZPbThsYdogwSV9SoX7nxGGCFXqlbBYtLtia7iaPuxUvZCgIt70DC0aA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
    /></p>
<p><img
        class="lazyload"
        data-src="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        data-srcset="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 1.5x, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 2x"
        data-sizes="auto"
        alt="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        title="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8cmgPcjwI4lxovPHjckjjoJTuetcCibDDY7N4zhdDUCbUtmHH0oFBjmw/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
    /></p>
<p>Today I will follow MySQL's logical execution order (<strong>FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT</strong>) and use Pandas to walk through each step, then use plain Python to spell out the finer details.</p>
<h2 id="mysql-分组统计的原理">How MySQL grouped aggregation works</h2>
<p>The sample code above is equivalent to:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">SELECT
</span></span><span class="line"><span class="cl">  deal_date,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;A区&#39;, order_id, NULL)) &#39;A区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;B区&#39;, order_id, NULL)) &#39;B区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;C区&#39;, order_id, NULL)) &#39;C区&#39;
</span></span><span class="line"><span class="cl">FROM
</span></span><span class="line"><span class="cl">  order_info
</span></span><span class="line"><span class="cl">GROUP BY deal_date;
</span></span></code></pre></td></tr></table>
</div>
</div><p>MySQL's standard logical execution order is:</p>
<blockquote>
<p>FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT</p>
</blockquote>
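<p>The SQL above only involves FROM → GROUP BY → SELECT, so we can rearrange it into the order in which it is processed (for reading only; this is not valid SQL):</p>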
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">FROM order_info
</span></span><span class="line"><span class="cl">GROUP BY deal_date
</span></span><span class="line"><span class="cl">SELECT
</span></span><span class="line"><span class="cl">  deal_date,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;A区&#39;, order_id, NULL)) &#39;A区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;B区&#39;, order_id, NULL)) &#39;B区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;C区&#39;, order_id, NULL)) &#39;C区&#39;;
</span></span></code></pre></td></tr></table>
</div>
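</div><ol>
<li>FROM</li>
</ol>
<p><code>FROM order_info</code> reads the data of the order_info table.</p>
<ol start="2">
<li>GROUP BY</li>
</ol>
<p><code>GROUP BY deal_date</code> groups the rows by deal_date.</p>
<ol start="3">
<li>SELECT</li>
</ol>
<p>For each group, select the specified fields and aggregate the group's rows with the aggregate functions.</p>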
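<p>The query in this post skips WHERE, HAVING, ORDER BY and LIMIT. As a rough sketch of where those clauses would sit in the same logical order, the snippet below maps each one onto a Pandas operation. The in-memory <code>order_info</code> frame copies the rows used in this post, and the query itself is a hypothetical one invented for illustration.</p>

```python
import pandas as pd

# Hypothetical in-memory copy of the order_info rows used in this post.
order_info = pd.DataFrame({
    "order_id": ["S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"],
    "price": [10, 20, 30, 40, 10, 20, 30, 40],
    "deal_date": ["2019/1/1"] * 3 + ["2019/1/2"] * 3 + ["2019/1/3"] * 2,
    "area": ["A区", "B区", "C区", "A区", "B区", "C区", "A区", "C区"],
})

# Illustrative query (not from the original post):
#   SELECT deal_date, COUNT(order_id) AS n FROM order_info
#   WHERE price >= 30 GROUP BY deal_date
#   HAVING COUNT(order_id) >= 2 ORDER BY n DESC, deal_date LIMIT 1
rows = order_info                                  # FROM: read the table
rows = rows[rows["price"] >= 30]                   # WHERE: filter rows before grouping
grouped = rows.groupby("deal_date")                # GROUP BY: split rows into groups
counts = grouped["order_id"].count().reset_index(name="n")  # aggregate each group
counts = counts[counts["n"] >= 2]                  # HAVING: filter groups after aggregation
counts = counts.sort_values(["n", "deal_date"], ascending=[False, True])  # ORDER BY
top = counts.head(1)                               # LIMIT
print(top)
```

<p>The point of the sketch is only the ordering: the row filter runs before the grouping, and the group filter runs after the aggregation.</p>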
<h2 id="pandas-分组统计的过程">Grouped aggregation in Pandas, step by step</h2>
<h3 id="from">FROM</h3>
<p><code>FROM order_info</code> is essentially just reading the data:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">import pandas as pd
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">data = pd.read_csv(&#34;data.csv&#34;, encoding=&#34;gb18030&#34;)
</span></span><span class="line"><span class="cl">data
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<table>
<thead>
<tr>
<th style="text-align:center"></th>
<th style="text-align:center">order_id</th>
<th style="text-align:center">price</th>
<th style="text-align:center">deal_date</th>
<th style="text-align:center">area</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">0</td>
<td style="text-align:center">S001</td>
<td style="text-align:center">10</td>
<td style="text-align:center">2019/1/1</td>
<td style="text-align:center">A区</td>
</tr>
<tr>
<td style="text-align:center">1</td>
<td style="text-align:center">S002</td>
<td style="text-align:center">20</td>
<td style="text-align:center">2019/1/1</td>
<td style="text-align:center">B区</td>
</tr>
<tr>
<td style="text-align:center">2</td>
<td style="text-align:center">S003</td>
<td style="text-align:center">30</td>
<td style="text-align:center">2019/1/1</td>
<td style="text-align:center">C区</td>
</tr>
<tr>
<td style="text-align:center">3</td>
<td style="text-align:center">S004</td>
<td style="text-align:center">40</td>
<td style="text-align:center">2019/1/2</td>
<td style="text-align:center">A区</td>
</tr>
<tr>
<td style="text-align:center">4</td>
<td style="text-align:center">S005</td>
<td style="text-align:center">10</td>
<td style="text-align:center">2019/1/2</td>
<td style="text-align:center">B区</td>
</tr>
<tr>
<td style="text-align:center">5</td>
<td style="text-align:center">S006</td>
<td style="text-align:center">20</td>
<td style="text-align:center">2019/1/2</td>
<td style="text-align:center">C区</td>
</tr>
<tr>
<td style="text-align:center">6</td>
<td style="text-align:center">S007</td>
<td style="text-align:center">30</td>
<td style="text-align:center">2019/1/3</td>
<td style="text-align:center">A区</td>
</tr>
<tr>
<td style="text-align:center">7</td>
<td style="text-align:center">S008</td>
<td style="text-align:center">40</td>
<td style="text-align:center">2019/1/3</td>
<td style="text-align:center">C区</td>
</tr>
</tbody>
</table>
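<p>Every InnoDB table in MySQL has a clustered (primary-key) index. If the table declares no primary key and has no unique index whose columns are all NOT NULL, InnoDB generates a hidden clustered index on a 6-byte, monotonically increasing row ID. The Index of the Pandas table above (<code>data.index</code>) plays the role of that auto-generated row ID.</p>
<p>Of course, if this MySQL table declares order_id as its primary key:</p>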
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">ALTER TABLE order_info ADD PRIMARY KEY (order_id);
</span></span></code></pre></td></tr></table>
</div>
</div><p>this corresponds to:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">data.set_index(&#34;order_id&#34;)
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<p><img
        class="lazyload"
        data-src="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8IPXHG6u09oiaxiclQf5Bnw7PPh21qmHEUhf2qhlXYj1MtxFdeX2AsYXQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        data-srcset="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8IPXHG6u09oiaxiclQf5Bnw7PPh21qmHEUhf2qhlXYj1MtxFdeX2AsYXQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8IPXHG6u09oiaxiclQf5Bnw7PPh21qmHEUhf2qhlXYj1MtxFdeX2AsYXQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 1.5x, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8IPXHG6u09oiaxiclQf5Bnw7PPh21qmHEUhf2qhlXYj1MtxFdeX2AsYXQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 2x"
        data-sizes="auto"
        alt="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8IPXHG6u09oiaxiclQf5Bnw7PPh21qmHEUhf2qhlXYj1MtxFdeX2AsYXQ/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        title="Image"
    /></p>
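<p>One difference from a real primary key is worth seeing in code: <code>set_index</code> does not enforce uniqueness unless asked to. The sketch below uses a small hypothetical frame with a deliberately repeated <code>order_id</code> and relies on the <code>verify_integrity</code> parameter of <code>DataFrame.set_index</code> to get behaviour closer to a PRIMARY KEY constraint.</p>

```python
import pandas as pd

# Hypothetical rows; S002 is repeated on purpose.
orders = pd.DataFrame({
    "order_id": ["S001", "S002", "S002"],
    "price": [10, 20, 30],
})

# By default, Pandas accepts duplicate index labels without complaint.
loose = orders.set_index("order_id")
print(loose.index.is_unique)

# verify_integrity=True makes set_index raise on duplicates instead.
try:
    orders.set_index("order_id", verify_integrity=True)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

<p>Note that <code>set_index</code> returns a new DataFrame in both cases and leaves <code>orders</code> untouched.</p>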
<h3 id="group-by">GROUP BY</h3>
<p><code>GROUP BY deal_date</code> groups by deal_date, that is:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">df_group = data.groupby(&#34;deal_date&#34;)
</span></span><span class="line"><span class="cl">df_group
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">&lt;pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000000016CE8278&gt;
</span></span></code></pre></td></tr></table>
</div>
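</div><p>In essence, this step collects the list of primary-key ids belonging to each group, which you can inspect through the <code>groups</code> attribute of the <code>DataFrameGroupBy</code> object:</p>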
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">df_group.groups
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">{&#39;2019/1/1&#39;: [0, 1, 2], &#39;2019/1/2&#39;: [3, 4, 5], &#39;2019/1/3&#39;: [6, 7]}
</span></span></code></pre></td></tr></table>
</div>
</div><p>Pandas returns the list of index labels for each group, which corresponds to the list of primary-key ids in MySQL.</p>
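<p>To make the "index labels are the primary-key ids" analogy concrete, here is a small sketch. It rebuilds the post's rows in memory (an assumption, since the original reads them from <code>data.csv</code>) and compares <code>groups</code> before and after <code>order_id</code> becomes the index.</p>

```python
import pandas as pd

# In-memory stand-in for data.csv, using the rows shown in this post.
data = pd.DataFrame({
    "order_id": ["S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"],
    "price": [10, 20, 30, 40, 10, 20, 30, 40],
    "deal_date": ["2019/1/1"] * 3 + ["2019/1/2"] * 3 + ["2019/1/3"] * 2,
    "area": ["A区", "B区", "C区", "A区", "B区", "C区", "A区", "C区"],
})

# With the default index, each group maps to row positions 0..7.
by_row = data.groupby("deal_date").groups

# With order_id as the index, each group maps to order ids instead.
by_key = data.set_index("order_id").groupby("deal_date").groups

print({key: list(labels) for key, labels in by_row.items()})
print({key: list(labels) for key, labels in by_key.items()})
```

<p>Whatever the index holds is what <code>groups</code> hands back, which is why it behaves like a per-group list of primary keys.</p>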
<h3 id="select">SELECT</h3>
<p>Once we have the index list for each group, we can fetch all the rows of each group:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">for deal_date, ids in df_group.groups.items():
</span></span><span class="line"><span class="cl">    print(deal_date)
</span></span><span class="line"><span class="cl">    display(data.loc[ids])
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<p><img
        class="lazyload"
        data-src="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8xCzx6goic3G7vdOntZYBVPBmsE5MSiaxice8Yic9jQ33IoJU43u0WFInTA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        data-srcset="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8xCzx6goic3G7vdOntZYBVPBmsE5MSiaxice8Yic9jQ33IoJU43u0WFInTA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8xCzx6goic3G7vdOntZYBVPBmsE5MSiaxice8Yic9jQ33IoJU43u0WFInTA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 1.5x, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8xCzx6goic3G7vdOntZYBVPBmsE5MSiaxice8Yic9jQ33IoJU43u0WFInTA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 2x"
        data-sizes="auto"
        alt="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8xCzx6goic3G7vdOntZYBVPBmsE5MSiaxice8Yic9jQ33IoJU43u0WFInTA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        title="Image"
    /></p>
<p>Of course, Pandas has a ready-made API for this, so in practice we don't loop over the groups this way; instead we write:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">for deal_date, split in df_group:
</span></span><span class="line"><span class="cl">    print(deal_date)
</span></span><span class="line"><span class="cl">    display(split)
</span></span></code></pre></td></tr></table>
</div>
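</div><p>Under the hood, iterating over the groups this way does what the code above does, and it returns exactly the same result.</p>
<p>For the SELECT step of the MySQL statement:</p>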
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">SELECT
</span></span><span class="line"><span class="cl">  deal_date,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;A区&#39;, order_id, NULL)) &#39;A区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;B区&#39;, order_id, NULL)) &#39;B区&#39;,
</span></span><span class="line"><span class="cl">  COUNT(IF(area= &#39;C区&#39;, order_id, NULL)) &#39;C区&#39;
</span></span></code></pre></td></tr></table>
</div>
</div><p>Because of the preceding grouping, the <code>COUNT()</code> aggregate function is applied to each group separately. Expressed in Pandas:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">for deal_date, split in df_group:
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;A区&#39;, &#39;A区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;B区&#39;, &#39;B区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;C区&#39;, &#39;C区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split = split.set_index(&#39;deal_date&#39;)
</span></span><span class="line"><span class="cl">    split = split[[&#39;A区&#39;, &#39;B区&#39;, &#39;C区&#39;]]
</span></span><span class="line"><span class="cl">    display(split)
</span></span><span class="line"><span class="cl">    display(split.count().to_frame(deal_date).T)
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<p><img
        class="lazyload"
        data-src="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8d5OLZYIWhZbSNFpOGk9oib0nvx0tCmm365ddanTpGelTD3TbbVA5PGA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        data-srcset="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8d5OLZYIWhZbSNFpOGk9oib0nvx0tCmm365ddanTpGelTD3TbbVA5PGA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8d5OLZYIWhZbSNFpOGk9oib0nvx0tCmm365ddanTpGelTD3TbbVA5PGA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 1.5x, https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8d5OLZYIWhZbSNFpOGk9oib0nvx0tCmm365ddanTpGelTD3TbbVA5PGA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1 2x"
        data-sizes="auto"
        alt="https://mmbiz.qpic.cn/mmbiz_png/tXYict40xfLh0Ik9K1kXOEAmHWbyyibhT8d5OLZYIWhZbSNFpOGk9oib0nvx0tCmm365ddanTpGelTD3TbbVA5PGA/640?wx_fmt=png&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1"
        title="Image"
    /></p>
<h3 id="return">Return</h3>
<p>Finally, once MySQL finishes the computation, it merges the result sets of all groups. Expressed in Pandas:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">result = []
</span></span><span class="line"><span class="cl">for deal_date, split in df_group:
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;A区&#39;, &#39;A区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;B区&#39;, &#39;B区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;C区&#39;, &#39;C区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split = split.set_index(&#39;deal_date&#39;)
</span></span><span class="line"><span class="cl">    split = split[[&#39;A区&#39;, &#39;B区&#39;, &#39;C区&#39;]]
</span></span><span class="line"><span class="cl">    result.append(split.count().to_frame(deal_date).T)
</span></span><span class="line"><span class="cl">result = pd.concat(result)
</span></span><span class="line"><span class="cl">result
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<table>
<thead>
<tr>
<th style="text-align:center"></th>
<th style="text-align:center">A区</th>
<th style="text-align:center">B区</th>
<th style="text-align:center">C区</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:center">2019/1/1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">2019/1/2</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td style="text-align:center">2019/1/3</td>
<td style="text-align:center">1</td>
<td style="text-align:center">0</td>
<td style="text-align:center">1</td>
</tr>
</tbody>
</table>
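<p>The explicit loop is useful for seeing the mechanics, but the same table of counts can be produced in one call. A minimal sketch, assuming the post's rows are rebuilt in memory rather than read from <code>data.csv</code>, using <code>pd.crosstab</code>:</p>

```python
import pandas as pd

# In-memory stand-in for data.csv, using the rows shown in this post.
data = pd.DataFrame({
    "order_id": ["S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"],
    "price": [10, 20, 30, 40, 10, 20, 30, 40],
    "deal_date": ["2019/1/1"] * 3 + ["2019/1/2"] * 3 + ["2019/1/3"] * 2,
    "area": ["A区", "B区", "C区", "A区", "B区", "C区", "A区", "C区"],
})

# Count rows for every (deal_date, area) pair; missing pairs become 0.
result = pd.crosstab(data["deal_date"], data["area"])
print(result)
```

<p>This counts rows rather than non-null <code>order_id</code> values, so it matches the SQL only as long as <code>order_id</code> is never NULL, which holds for this data.</p>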
<h2 id="pandas-分组聚合的执行过程">The full Pandas grouped aggregation</h2>
<p>For the complete MySQL statement above, the overall execution flow is equivalent to this Pandas code:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">def group_func(split):
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;A区&#39;, &#39;A区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;B区&#39;, &#39;B区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split.loc[split.area == &#39;C区&#39;, &#39;C区&#39;] = split.order_id
</span></span><span class="line"><span class="cl">    split = split[[&#39;A区&#39;, &#39;B区&#39;, &#39;C区&#39;]]
</span></span><span class="line"><span class="cl">    return split.count()
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">data.groupby(&#39;deal_date&#39;, as_index=False).apply(group_func)
</span></span></code></pre></td></tr></table>
</div>
</div><h2 id="python-演示分组的具体原理">Demonstrating the grouping mechanism in plain Python</h2>
<p>In the demonstration above:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">data.groupby(&#34;deal_date&#34;).groups
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">{&#39;2019/1/1&#39;: [0, 1, 2], &#39;2019/1/2&#39;: [3, 4, 5], &#39;2019/1/3&#39;: [6, 7]}
</span></span></code></pre></td></tr></table>
</div>
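</div><p>We can see that the grouping step in both Pandas and MySQL really just computes the primary-key ids (index ids) belonging to each group. But how exactly is that implemented?</p>
<p>Let me demonstrate it in pure Python.</p>
<p>Both MySQL and Pandas index their rows: a primary-key index in MySQL and the Index in Pandas. The difference is that a Pandas index accepts duplicate labels without raising an error, whereas a MySQL primary key must be unique. A plain <code>INSERT</code> with an existing key fails with a duplicate-key error, and the earlier row is overwritten only if you use <code>REPLACE</code> or <code>INSERT ... ON DUPLICATE KEY UPDATE</code>.</p>
<p>MySQL stores the indexed data on disk, but for convenience I only demonstrate building the index in memory. Also, MySQL's primary-key index is usually a B+ tree; here I use a hash table (a dictionary) to keep the demonstration simple.</p>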
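<p>Swapping the B+ tree for a dictionary does give something up. The sketch below is a simplification, not how InnoDB is implemented: a sorted key list with binary search stands in for the ordered leaf level of a B+ tree, and the keys, prices and <code>range_scan</code> helper are hypothetical. It shows that an ordered index can answer range queries, while a hash index is only good at exact matches.</p>

```python
import bisect

# Hypothetical primary key to price mapping, standing in for a table.
rows = {"S003": 30, "S001": 10, "S004": 40, "S002": 20}

# Hash index: fast exact-match lookup, but the keys carry no useful order.
hash_index = dict(rows)
print(hash_index["S002"])

# Sorted key list: a crude stand-in for the ordered leaves of a B+ tree.
sorted_keys = sorted(rows)


def range_scan(low, high):
    """Return the keys between low and high (inclusive) via binary search."""
    start = bisect.bisect_left(sorted_keys, low)
    stop = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:stop]


print(range_scan("S002", "S003"))
```

<p>For the grouping demo that follows, only exact lookups by row id are needed, so a dictionary is enough.</p>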
<p>First, read the data and build the index:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-gdscript3" data-lang="gdscript3"><span class="line"><span class="cl"><span class="n">import</span> <span class="n">csv</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">data</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl"><span class="n">columns</span> <span class="o">=</span> <span class="n">None</span>
</span></span><span class="line"><span class="cl"><span class="n">with</span> <span class="n">open</span><span class="p">(</span><span class="s2">&#34;data.csv&#34;</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s2">&#34;gb18030&#34;</span><span class="p">)</span> <span class="n">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">f_csv</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">columns</span> <span class="o">=</span> <span class="n">next</span><span class="p">(</span><span class="n">f_csv</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">columns</span> <span class="o">=</span> <span class="n">dict</span><span class="p">(</span><span class="n">zip</span><span class="p">(</span><span class="n">columns</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="n">len</span><span class="p">(</span><span class="n">columns</span><span class="p">))))</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">enumerate</span><span class="p">(</span><span class="n">f_csv</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">row</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="n">columns</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">display</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span><span class="lnt">9
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">{&#39;order_id&#39;: 0, &#39;price&#39;: 1, &#39;deal_date&#39;: 2, &#39;area&#39;: 3}
</span></span><span class="line"><span class="cl">{0: [&#39;S001&#39;, &#39;10&#39;, &#39;2019/1/1&#39;, &#39;A区&#39;],
</span></span><span class="line"><span class="cl"> 1: [&#39;S002&#39;, &#39;20&#39;, &#39;2019/1/1&#39;, &#39;B区&#39;],
</span></span><span class="line"><span class="cl"> 2: [&#39;S003&#39;, &#39;30&#39;, &#39;2019/1/1&#39;, &#39;C区&#39;],
</span></span><span class="line"><span class="cl"> 3: [&#39;S004&#39;, &#39;40&#39;, &#39;2019/1/2&#39;, &#39;A区&#39;],
</span></span><span class="line"><span class="cl"> 4: [&#39;S005&#39;, &#39;10&#39;, &#39;2019/1/2&#39;, &#39;B区&#39;],
</span></span><span class="line"><span class="cl"> 5: [&#39;S006&#39;, &#39;20&#39;, &#39;2019/1/2&#39;, &#39;C区&#39;],
</span></span><span class="line"><span class="cl"> 6: [&#39;S007&#39;, &#39;30&#39;, &#39;2019/1/3&#39;, &#39;A区&#39;],
</span></span><span class="line"><span class="cl"> 7: [&#39;S008&#39;, &#39;40&#39;, &#39;2019/1/3&#39;, &#39;C区&#39;]}
</span></span></code></pre></td></tr></table>
</div>
</div><p>We have now read the data and built both the primary-key index and the column-name metadata of the table.</p>
<p>Next, we implement the grouping:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># Position of the column that holds the grouping key
</span></span><span class="line"><span class="cl">group_num = columns[&#39;deal_date&#39;]
</span></span><span class="line"><span class="cl">id_groups = {}
</span></span><span class="line"><span class="cl">for index, row in data.items():
</span></span><span class="line"><span class="cl">    group_key = row[group_num]
</span></span><span class="line"><span class="cl">    ids = id_groups.setdefault(group_key, [])
</span></span><span class="line"><span class="cl">    ids.append(index)
</span></span><span class="line"><span class="cl">id_groups
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">{&#39;2019/1/1&#39;: [0, 1, 2], &#39;2019/1/2&#39;: [3, 4, 5], &#39;2019/1/3&#39;: [6, 7]}
</span></span></code></pre></td></tr></table>
</div>
</div><p>Finally, perform the aggregation:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">result = {}
</span></span><span class="line"><span class="cl">for deal_date, ids in id_groups.items():
</span></span><span class="line"><span class="cl">    areas = result.setdefault(deal_date, [0, 0, 0])
</span></span><span class="line"><span class="cl">    for index in ids:
</span></span><span class="line"><span class="cl">        area = data[index][columns[&#39;area&#39;]]
</span></span><span class="line"><span class="cl">        if area == &#39;A区&#39;:
</span></span><span class="line"><span class="cl">            areas[0] += 1
</span></span><span class="line"><span class="cl">        elif area == &#39;B区&#39;:
</span></span><span class="line"><span class="cl">            areas[1] += 1
</span></span><span class="line"><span class="cl">        elif area == &#39;C区&#39;:
</span></span><span class="line"><span class="cl">            areas[2] += 1
</span></span><span class="line"><span class="cl">result
</span></span></code></pre></td></tr></table>
</div>
</div><p>Result:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">{&#39;2019/1/1&#39;: [1, 1, 1], &#39;2019/1/2&#39;: [1, 1, 1], &#39;2019/1/3&#39;: [1, 0, 1]}
</span></span></code></pre></td></tr></table>
</div>
</div><p>Use Pandas to display the final result as a table:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">pd.DataFrame.from_dict(result, &#39;index&#39;, columns=[&#34;A区&#34;, &#34;B区&#34;, &#34;C区&#34;])
</span></span></code></pre></td></tr></table>
</div>
</div>]]></content:encoded></item><item><title>stargazer: Producing Statistical Tables in R</title><link>https://philohao.com/2018/10/20181030/</link><pubDate>Tue, 30 Oct 2018 16:20:03 +0800</pubDate><dc:creator>Jianfeng.Hao</dc:creator><author>haojianfeng1997@gmail.com (Jianfeng.Hao)</author><guid isPermaLink="true">https://philohao.com/2018/10/20181030/</guid><description>Notes on the stargazer package: how to output R model results and statistical tables.</description><content:encoded><![CDATA[<center>
    <i>
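        stargazer outputs the results of models built in R in LaTeX, HTML, and ASCII formats, making it easy to produce tables in a standard layout.<br />
        Combined with rmarkdown, it lets you write articles with journal-quality statistical tables.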
    </i>
</center>
<h2 id="简介">简介</h2>
<p>R 包 <code>stargazer</code> 可以将 <strong>数据统计汇总</strong> （格式可以为数据框、向量和矩阵等）和 <strong>统计模型结果</strong> 输出为标准统计表格式的 <code>LATEX</code> 、<code>HTML</code> 和 <code>ASCII</code> 格式的字符文本，<em><strong>将其复制到对应的软件中</strong></em> 即可生成标准的统计表，当然也可以配合 <code>rmarkdown</code> 使用直接渲染输出为表格，更加方便直接。</p>
<h2 id="安装及加载">安装及加载</h2>
<p>可以使用常规方法导入 <code>stargazer</code> 包：</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;stargazer&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">stargazer</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><div class="note info"><p>
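stargazer emits its output as text in the chosen format. For LaTeX output, for example, you can paste the result directly into the online editor <a href="https://www.overleaf.com" target="_blank" rel="noopener noreffer">Overleaf</a> to render the table. The rest of this article shows each result as the rendered table.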
</p></div>
<hr>
<h2 id="数据统计汇总">数据统计汇总</h2>
<h3 id="统计汇总数据">统计汇总数据</h3>
<p>如果要展示数据集的基本描述性分析数据（由 R 函数 <code>summary</code> 得到），可以使用以下命令直接得到：</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">attitude</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/qCOW9qcQ.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/qCOW9qcQ.png, https://static.datartisan.com/upload/attachment/2016/12/qCOW9qcQ.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/qCOW9qcQ.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/qCOW9qcQ.png"
        title="统计汇总数据"
    /></p>
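<p>As a minimal sketch of how this table can be tailored, the call below prints the summary as plain text and keeps only a chosen set of statistics. It assumes the <code>type</code>, <code>digits</code>, and <code>summary.stat</code> arguments behave as described in <code>?stargazer</code>; the particular statistics selected here are only an illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R">library(stargazer)

# attitude ships with base R (package datasets), so nothing needs downloading
data("attitude")

# Plain-text summary limited to a few statistics, rounded to one decimal place
stargazer(attitude, type = "text", digits = 1,
          summary.stat = c("n", "mean", "sd", "min", "max"))
</code></pre></div>
<p>Printing with <code>type = "text"</code> is a convenient way to check a table in the console before switching to LaTeX or HTML output.</p>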
<h3 id="原始数据展示">原始数据展示</h3>
<p>如果想输出某些数据框的特定行的原始内容，需要指定要查看的数据框的一部分，并将设置参数 <code>summary = FALSE</code>, 如下所示：</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">data</span><span class="p">(</span><span class="s">&#34;attitude&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">attitude[1</span><span class="o">:</span><span class="m">4</span><span class="p">,</span><span class="n">]</span><span class="p">,</span> <span class="n">summary</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">rownames</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/O5nnDnBH.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/O5nnDnBH.png, https://static.datartisan.com/upload/attachment/2016/12/O5nnDnBH.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/O5nnDnBH.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/O5nnDnBH.png"
        title="展示数据集"
    /></p>
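<p>As shown, the <code>attitude</code> data set contains several variables, including <code>rating</code> and <code>complaints</code>, and the data is laid out as a <strong>three-line table</strong>.</p>
<h3 id="列联表">Correlation matrix</h3>
<p><code>stargazer</code> can also display the contents of vectors, matrices, and data frames. Here we build the correlation matrix of the variables <code>rating</code>, <code>complaints</code>, and <code>privileges</code> from the <code>attitude</code> data set and display it:</p>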
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="n">correlation.matrix</span> <span class="o">&lt;-</span> <span class="nf">cor</span><span class="p">(</span><span class="n">attitude[</span><span class="p">,</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;rating&#34;</span><span class="p">,</span> <span class="s">&#34;complaints&#34;</span><span class="p">,</span> <span class="s">&#34;privileges&#34;</span><span class="p">)</span><span class="n">]</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">correlation.matrix</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Correlation Matrix&#34;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/cC2cR66K.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/cC2cR66K.png, https://static.datartisan.com/upload/attachment/2016/12/cC2cR66K.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/cC2cR66K.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/cC2cR66K.png"
        title="矩阵展示"
    /></p>
<hr>
<h2 id="统计模型结果">统计模型结果</h2>
<h3 id="回归表">回归表</h3>
<p>在 R 中可以很方便的使用 <code>lm()</code> 和 <code>glm()</code> 函数来构建回归模型，我们同样可以在同一张表中对这些模型进行比较，参数 <code>title</code> 用来设定表的标题，参数 <code>align</code> 使每列中的系数沿小数点对齐：</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="c1">## 构建两个线性回归模型</span>
</span></span><span class="line"><span class="cl"><span class="n">linear.1</span> <span class="o">&lt;-</span> <span class="nf">lm</span><span class="p">(</span><span class="n">rating</span> <span class="o">~</span> <span class="n">complaints</span> <span class="o">+</span> <span class="n">privileges</span> <span class="o">+</span> <span class="n">learning</span> <span class="o">+</span> <span class="n">raises</span> <span class="o">+</span> <span class="n">critical</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">data</span> <span class="o">=</span> <span class="n">attitude</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">linear.2</span> <span class="o">&lt;-</span> <span class="nf">lm</span><span class="p">(</span><span class="n">rating</span> <span class="o">~</span> <span class="n">complaints</span> <span class="o">+</span> <span class="n">privileges</span> <span class="o">+</span> <span class="n">learning</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">attitude</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1">## 构建一个 probit 模型</span>
</span></span><span class="line"><span class="cl"><span class="n">attitude</span><span class="o">$</span><span class="n">high.rating</span> <span class="o">&lt;-</span> <span class="p">(</span><span class="n">attitude</span><span class="o">$</span><span class="n">rating</span> <span class="o">&gt;</span> <span class="m">70</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">probit.model</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">high.rating</span> <span class="o">~</span> <span class="n">learning</span> <span class="o">+</span> <span class="n">critical</span> <span class="o">+</span> <span class="n">advance</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">attitude</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">family</span> <span class="o">=</span> <span class="nf">binomial</span><span class="p">(</span><span class="n">link</span> <span class="o">=</span> <span class="s">&#34;probit&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.2</span><span class="p">,</span> <span class="n">probit.model</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Results&#34;</span><span class="p">,</span> <span class="n">align</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/Cyqs7adZ.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/Cyqs7adZ.png, https://static.datartisan.com/upload/attachment/2016/12/Cyqs7adZ.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/Cyqs7adZ.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/Cyqs7adZ.png"
        title="回归表"
    /></p>
<h4 id="回归表的修饰">回归表的修饰</h4>
<p>为了使表格更加标准，我们还可以通过调整参数进行以下操作：</p>
<ul>
<li>删除表中的空白行：<code>no.space</code></li>
<li>移除不关心的统计量：<code>omit.stat</code></li>
<li>修改因变量和自变量的名称：<code>dep.var.labels</code> 、 <code>covariate.labels</code></li>
</ul>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.2</span><span class="p">,</span> <span class="n">probit.model</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Regression Results&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">align</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">dep.var.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Overall Rating&#34;</span><span class="p">,</span><span class="s">&#34;High Rating&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">covariate.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Handling of Complaints&#34;</span><span class="p">,</span> <span class="s">&#34;No Special Privileges&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="s">&#34;Opportunity to Learn&#34;</span><span class="p">,</span> <span class="s">&#34;Performance-Based Raises&#34;</span><span class="p">,</span> <span class="s">&#34;Too Critical&#34;</span><span class="p">,</span><span class="s">&#34;Advancement&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">omit.stat</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;LL&#34;</span><span class="p">,</span> <span class="s">&#34;ser&#34;</span><span class="p">,</span> <span class="s">&#34;f&#34;</span><span class="p">),</span> <span class="n">no.space</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/xHMLhXyR.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/xHMLhXyR.png, https://static.datartisan.com/upload/attachment/2016/12/xHMLhXyR.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/xHMLhXyR.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/xHMLhXyR.png"
        title="回归表的修饰"
    /></p>
<p>This example changes the original table as follows:</p>
<blockquote>
<ol>
<li>
<p><code>dep.var.labels</code> and <code>covariate.labels</code> rename the dependent variables and covariates to more readable labels;</p>
</li>
<li>
<p><code>omit.stat</code> removes the log likelihood (<code>&quot;LL&quot;</code>), the residual standard error (<code>&quot;ser&quot;</code>), and the F statistic (<code>&quot;f&quot;</code>);</p>
</li>
<li>
<p><code>no.space</code> removes the blank lines from the output table.</p>
</li>
</ol>
</blockquote>
<h4 id="展示置信区间">展示置信区间</h4>
<ul>
<li>设置是否展示置信区间：<code>ci</code></li>
<li>设置置信区间的置信度：<code>ci.level</code></li>
<li>使回归系数与置信区间并排展示：<code>single.row</code></li>
</ul>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.2</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Regression Results&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">dep.var.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Overall Rating&#34;</span><span class="p">,</span> <span class="s">&#34;High Rating&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">covariate.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Handling of Complaints&#34;</span><span class="p">,</span> <span class="s">&#34;No Special Privileges&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="s">&#34;Opportunity to Learn&#34;</span><span class="p">,</span> <span class="s">&#34;Performance-Based Raises&#34;</span><span class="p">,</span> <span class="s">&#34;Too Critical&#34;</span><span class="p">,</span> <span class="s">&#34;Advancement&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">omit.stat</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;LL&#34;</span><span class="p">,</span><span class="s">&#34;ser&#34;</span><span class="p">,</span><span class="s">&#34;f&#34;</span><span class="p">),</span> <span class="n">ci</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">ci.level</span> <span class="o">=</span> <span class="m">0.90</span><span class="p">,</span> <span class="n">single.row</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/rUe11IM4.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/rUe11IM4.png, https://static.datartisan.com/upload/attachment/2016/12/rUe11IM4.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/rUe11IM4.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/rUe11IM4.png"
        title="展示置信区间"
    /></p>
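<p>The intervals in this table can be reproduced approximately by hand. The sketch below assumes that stargazer builds them from a normal approximation (coefficient ± z × standard error); that is an assumption about the package's behavior, so check it against your own output. Only base R is used, and <code>ci.manual</code> is a name introduced here for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"># Refit the first model so that the sketch is self-contained
linear.1 &lt;- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)

est &lt;- coef(linear.1)              # point estimates
se  &lt;- sqrt(diag(vcov(linear.1)))  # default standard errors

# Two-sided 90% interval: 5% in each tail, so the critical value is qnorm(0.95)
z &lt;- qnorm(1 - (1 - 0.90) / 2)
ci.manual &lt;- cbind(lower = est - z * se, upper = est + z * se)
round(ci.manual, 3)

# confint.default() applies the same normal approximation
round(confint.default(linear.1, level = 0.90), 3)
</code></pre></div>
<p>Note that <code>confint()</code> on an <code>lm</code> object uses the t distribution instead, so its intervals are slightly wider for a sample this small.</p>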
<h4 id="其他修饰功能">其他修饰功能</h4>
<blockquote>
<p>控制自变量展示的顺序：<code>order</code>
控制要展示的统计量：<code>keep.stat</code> , <code>keep.stat = &quot;n&quot;</code> 即只展示样本量的大小，并移除其他统计量</p>
</blockquote>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.2</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Regression Results&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">dep.var.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Overall Rating&#34;</span><span class="p">,</span> <span class="s">&#34;High Rating&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">order</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;learning&#34;</span><span class="p">,</span> <span class="s">&#34;privileges&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">keep.stat</span> <span class="o">=</span> <span class="s">&#34;n&#34;</span><span class="p">,</span> <span class="n">ci</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">ci.level</span> <span class="o">=</span> <span class="m">0.90</span><span class="p">,</span> <span class="n">single.row</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/u0K7suc0.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/u0K7suc0.png, https://static.datartisan.com/upload/attachment/2016/12/u0K7suc0.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/u0K7suc0.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/u0K7suc0.png"
        title="其他修饰功能"
    /></p>
<h4 id="控制输出格式">控制输出格式</h4>
<p>可以使用 <code>type</code> 参数控制以 <code>ASCII</code> 、<code>text</code>、<code>html</code>、<code>latex</code> 格式输出，默认为<code>LATEX</code> 格式。</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.2</span><span class="p">,</span> <span class="n">type</span> <span class="o">=</span> <span class="s">&#34;text&#34;</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="s">&#34;Regression Results&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="n">dep.var.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;Overall Rating&#34;</span><span class="p">,</span> <span class="s">&#34;High Rating&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">order</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;learning&#34;</span><span class="p">,</span> <span class="s">&#34;privileges&#34;</span><span class="p">),</span> 
</span></span><span class="line"><span class="cl"><span class="n">keep.stat</span> <span class="o">=</span> <span class="s">&#34;n&#34;</span><span class="p">,</span> <span class="n">ci</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">ci.level</span> <span class="o">=</span> <span class="m">0.90</span><span class="p">,</span> <span class="n">single.row</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span> <span class="n">header</span> <span class="o">=</span> <span class="bp">F</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/gOA9wyCN.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/gOA9wyCN.png, https://static.datartisan.com/upload/attachment/2016/12/gOA9wyCN.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/gOA9wyCN.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/gOA9wyCN.png"
        title="控制输出格式"
    /></p>
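<p>The same table can also be written to a file. The following is a minimal sketch that assumes the <code>out</code> argument documented in <code>?stargazer</code>; the temporary file path <code>out.file</code> is a hypothetical location chosen for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R">library(stargazer)

# Refit the two linear models so that the sketch is self-contained
linear.1 &lt;- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)
linear.2 &lt;- lm(rating ~ complaints + privileges + learning, data = attitude)

# Hypothetical output location; a temporary file avoids cluttering the project
out.file &lt;- tempfile(fileext = ".html")

# type = "html" emits an HTML table; out additionally saves it to out.file
stargazer(linear.1, linear.2, type = "html", title = "Regression Results",
          out = out.file)

# Confirm that the file was written
file.exists(out.file)
</code></pre></div>
<p>The saved HTML file can be opened in a browser, and the rendered table can then be copied into a word processor.</p>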
<h4 id="自定义统计量">自定义统计量</h4>
<p>我们使用 <code>sandwich</code> 包来计算异方差-稳健标准误，并将其与默认计算的标准差一同展示。</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-R" data-lang="R"><span class="line"><span class="cl"><span class="nf">library</span><span class="p">(</span><span class="n">sandwich</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">cov</span> <span class="o">&lt;-</span> <span class="nf">vcovHC</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">type</span> <span class="o">=</span> <span class="s">&#34;HC&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">robust.se</span> <span class="o">&lt;-</span> <span class="nf">sqrt</span><span class="p">(</span><span class="nf">diag</span><span class="p">(</span><span class="n">cov</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nf">stargazer</span><span class="p">(</span><span class="n">linear.1</span><span class="p">,</span> <span class="n">linear.1</span><span class="p">,</span> <span class="n">se</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="kc">NULL</span><span class="p">,</span> <span class="n">robust.se</span><span class="p">),</span>
</span></span><span class="line"><span class="cl"><span class="n">column.labels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;default&#34;</span><span class="p">,</span> <span class="s">&#34;robust&#34;</span><span class="p">))</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p><img
        class="lazyload"
        data-src="https://static.datartisan.com/upload/attachment/2016/12/PC8L8NoB.png"
        data-srcset="https://static.datartisan.com/upload/attachment/2016/12/PC8L8NoB.png, https://static.datartisan.com/upload/attachment/2016/12/PC8L8NoB.png 1.5x, https://static.datartisan.com/upload/attachment/2016/12/PC8L8NoB.png 2x"
        data-sizes="auto"
        alt="https://static.datartisan.com/upload/attachment/2016/12/PC8L8NoB.png"
        title="自定义统计量"
    /></p>
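<p>To make clear what the robust column contains, the sketch below rebuilds the standard errors by hand with the White (HC0) sandwich formula, (X'X)<sup>-1</sup> X' diag(e<sup>2</sup>) X (X'X)<sup>-1</sup>. It assumes that <code>type = "HC"</code> in <code>vcovHC</code> is equivalent to <code>"HC0"</code>, as stated in the sandwich documentation; the names <code>bread</code>, <code>meat</code>, and <code>robust.se.manual</code> are introduced here for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R">library(sandwich)

# Refit the first model so that the sketch is self-contained
linear.1 &lt;- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)

X &lt;- model.matrix(linear.1)   # design matrix, including the intercept
e &lt;- residuals(linear.1)      # OLS residuals

bread &lt;- solve(crossprod(X))  # (X'X)^-1
meat  &lt;- crossprod(X * e)     # X' diag(e^2) X: each row of X scaled by its residual

# HC0 covariance estimate and the resulting robust standard errors
cov.manual &lt;- bread %*% meat %*% bread
robust.se.manual &lt;- sqrt(diag(cov.manual))

# Compare with the sandwich package
all.equal(robust.se.manual, sqrt(diag(vcovHC(linear.1, type = "HC0"))))
</code></pre></div>
<p>Passing <code>NULL</code> in the <code>se</code> list, as in the call above, keeps the model's default standard errors for that column.</p>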
<h3 id="支持的模型">支持的模型</h3>
<p>目前 <code>stargazer</code> 支持以下模型结果的展示：</p>
<blockquote>
<p>aftreg (eha), arima (stats), betareg (betareg), binaryChoice (sampleSelection), bj (rms), brglm (brglm), censReg (censReg), coeftest (lmtest), coxph (survival), coxreg (eha), clm (ordinal), clogit (survival), cph (rms), dynlm (dynlm), ergm (ergm), errorsarlm (spdep), felm (lfe), gam (mgcv), garchFit (fGarch), gee (gee), glm (stats), Glm (rms), glmer (lme4), glmrob (robustbase), gls (nlme), Gls (rms), gmm (gmm), heckit (sampleSelection), hetglm (glmx), hurdle (pscl), ivreg (AER), lagarlm (spdep), lm (stats), lme (nlme), lmer (lme4), lmrob (robustbase), lrm (rms), maBina (erer), mclogit (mclogit), mlogit (mlogit), mnlogit (mnlogit), mlreg (eha), multinom (nnet), nlme (nlme), nlmer (lme4), ols (rms), pgmm (plm), phreg (eha), plm (plm), pmg (plm), polr (MASS), psm (rms), rem.dyad (relevent), rlm (MASS), rq (quantreg), Rq (rms), selection (sampleSelection), svyglm (survey), survreg (survival), tobit (AER), weibreg (eha), zeroinfl (pscl), as well as from the implementation of these in zelig. In addition, stargazer also supports the following zelig models: “relogit”, “cloglog.net”, “gamma.net”, “probit.net” and “logit.net”.</p>
</blockquote>
<h3 id="支持的模板">支持的模板</h3>
<p><code>style</code> 参数可以用来选择统计表的展现形式，你可以通过  <code>?stargazer</code> 查看具体参数的设置来获取具体支持的格式，目前支持的期刊统计图格式有 <code>American Economic Review</code>、 <code>Quarterly Journal of Economics</code>  等。</p>
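<p>As a short sketch of the <code>style</code> parameter, the call below requests the American Economic Review layout. It assumes the value <code>"aer"</code> as listed in <code>?stargazer</code>; <code>"qje"</code> would select the Quarterly Journal of Economics layout in the same way.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R">library(stargazer)

# Refit the two linear models so that the sketch is self-contained
linear.1 &lt;- lm(rating ~ complaints + privileges + learning + raises + critical,
               data = attitude)
linear.2 &lt;- lm(rating ~ complaints + privileges + learning, data = attitude)

# Print the comparison table as text in the American Economic Review style
stargazer(linear.1, linear.2, type = "text", style = "aer")
</code></pre></div>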
<h2 id="结合-rmarkdown-使用">结合 rmarkdown 使用</h2>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-r" data-lang="r"><span class="line"><span class="cl"><span class="n">```{r, results=&#39;asis&#39;}
</span></span></span><span class="line"><span class="cl"><span class="n">stargazer(model, header = F)
</span></span></span><span class="line"><span class="cl"><span class="n">```</span>
</span></span></code></pre></td></tr></table>
</div>
</div><blockquote>
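<p><strong>Notes:</strong></p>
<ul>
<li>Add <code>results='asis'</code> so that the output is rendered as a table instead of being shown as raw LaTeX text;</li>
<li>The <code>align</code> parameter does not work here and should not be used;</li>
<li>Add <code>header=FALSE</code> to suppress the text about the package author.</li>
</ul>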
</blockquote>
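<p>When the same document is knitted to both PDF and HTML, the table markup has to match the output format. The sketch below picks <code>type</code> at knit time; it assumes <code>knitr::is_latex_output()</code> is available in your version of knitr, and <code>model</code> and <code>table.type</code> are names introduced here for illustration. The chunk still needs <code>results='asis'</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-R" data-lang="R">library(stargazer)

# A small example model; any supported model object works here
model &lt;- lm(rating ~ complaints + privileges, data = attitude)

# Emit LaTeX when knitting to PDF and HTML otherwise
table.type &lt;- if (knitr::is_latex_output()) "latex" else "html"

stargazer(model, type = table.type, header = FALSE)
</code></pre></div>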
<h2 id="致谢">致谢</h2>
<blockquote>
<h3 id="参考文章">参考文章</h3>
<ul>
<li><a href="https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf" target="_blank" rel="noopener noreffer">stargazer.pdf</a></li>
<li>Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics
Tables. R package version 5.2.2. <a href="https://CRAN.R-project.org/package=stargazer" target="_blank" rel="noopener noreffer">https://CRAN.R-project.org/package=stargazer</a></li>
</ul>
</blockquote>]]></content:encoded></item></channel></rss>