Why Use Bitwise Operations Instead of Modulo? - 技术博客集

Programming / posted by houtizong 3 years ago 188

When looking up a key in a hash table, you will often find & used in place of %. Let's look at two pieces of code first.

The indexFor method of HashMap in JDK 6:

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

A code fragment from Redis 2.4:

    n.size = realsize;
    n.sizemask = realsize-1;
    // xxx lines omitted here
    while(de) {
        unsigned int h;
        nextde = de->next;
        /* Get the index in the new hash table */
        h = dictHashKey(d, de->key) & d->ht[1].sizemask;
        de->next = d->ht[1].table[h];
        d->ht[1].table[h] = de;
        d->ht[0].used--;
        d->ht[1].used++;
        de = nextde;
    }

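As you can see, the modulo form a % b has been replaced by a & (b-1) in both cases. When the hash table length b is a power of two (an oversight: I left this condition out at first) and a is non-negative, the two are equivalent, because b-1 then has exactly the low bits set and the mask keeps only the remainder. So why use the latter?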
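The equivalence is easy to check by brute force. The following is a minimal sketch, assuming unsigned operands; the names `mod_by_mask` and `masks_agree` are made up for illustration and do not come from the JDK or Redis.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: computes a % b with a mask.
   Only valid when b is a power of two (1, 2, 4, 8, ...). */
static unsigned mod_by_mask(unsigned a, unsigned b) {
    assert(b != 0 && (b & (b - 1)) == 0);  /* b must be a power of two */
    return a & (b - 1);                    /* keep only the low bits */
}

/* Hypothetical check: returns true when a % b and a & (b - 1) agree
   for every a in [0, limit). b must be nonzero. */
static bool masks_agree(unsigned b, unsigned limit) {
    for (unsigned a = 0; a < limit; a++) {
        if (a % b != (a & (b - 1))) {
            return false;
        }
    }
    return true;
}
```

With these definitions, `masks_agree(16, 1000)` holds, while `masks_agree(10, 1000)` fails: 10 is not a power of two, and already at a = 2 we get 2 % 10 = 2 but 2 & 9 = 0.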

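On the other hand, why should the length of a hash table preferably be a power of two? That is outside the scope of this post. Briefly, the reasons usually given are: 1. the distribution is more even; 2. the chance of collisions is lower. The details are left for you to think through. HashMap in the JDK guarantees this property at initialization: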

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

Redis has a similar guarantee:

    /* Our hash table capability is a power of two */
    static unsigned long _dictNextPower(unsigned long size)
    {
        unsigned long i = DICT_HT_INITIAL_SIZE;
        if (size >= LONG_MAX) return LONG_MAX;
        while(1) {
            if (i >= size)
                return i;
            i *= 2;
        }
    }
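Both snippets round a requested size up by repeated doubling. A minimal sketch of the same idea in isolation, together with the usual single-bit test, might look like this; `is_power_of_two` and `next_power_of_two` are hypothetical names, and returning 0 on overflow is an assumption of this sketch, not how the JDK or Redis handle it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when n has exactly one bit set. */
static bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: smallest power of two >= n, found by doubling.
   Returns 0 when no such value fits in size_t (assumption of this sketch). */
static size_t next_power_of_two(size_t n) {
    size_t p = 1;
    while (p < n) {
        if (p > SIZE_MAX / 2) {
            return 0;          /* doubling again would overflow */
        }
        p <<= 1;
    }
    assert(is_power_of_two(p));
    return p;
}
```

For example, `next_power_of_two(17)` yields 32 and `next_power_of_two(16)` yields 16, so a table sized this way can always be indexed with `hash & (size - 1)`.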

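Back to the main topic. As everyone knows, bitwise operations are among the cheapest operations a CPU performs, and that is why & replaces %. Let's look at a program: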

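    int main(int argc, char* argv[])
    {
        int a = 0x111;
        int b = 0x222;   /* not a power of two; used only to compare the generated code */
        int c = 0;
        int d = 0;
        c = a & (b-1);   /* bitwise form */
        d = a % b;       /* modulo form */
        return 0;
    }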

Here is the disassembly:

    13:       c = a & (b-1);
    00401044   mov         eax,dword ptr [ebp-8]
    00401047   sub         eax,1
    0040104A   mov         ecx,dword ptr [ebp-4]
    0040104D   and         ecx,eax
    0040104F   mov         dword ptr [ebp-0Ch],ecx
    14:       d = a % b;
    00401052   mov         eax,dword ptr [ebp-4]
    00401055   cdq
    00401056   idiv        eax,dword ptr [ebp-8]
    00401059   mov         dword ptr [ebp-10h],edx

As you can see, the & operation takes 3 mov + 1 and + 1 sub, while the % operation takes 2 mov + 1 cdq + 1 idiv.

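Consulting the Coding_ASM_-_Intel_Instruction_Set_Codes_and_Cycles reference, we find that the former needs only 5 CPU cycles, while the latter needs at least 26 CPU cycles (note: at least!). The difference in efficiency is obvious. So when you write your own code, you can use the former form too, provided the table length is a power of two.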
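In your own code, the mask is only safe when the size really is a power of two. A minimal sketch of an index helper that checks this and otherwise falls back to %; `bucket_index` is a hypothetical name, and the unsigned 32-bit hash type is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a hash value to a bucket index in [0, size).
   Uses the mask when size is a power of two, otherwise falls back to %. */
static size_t bucket_index(uint32_t hash, size_t size) {
    assert(size > 0);
    if ((size & (size - 1)) == 0) {
        return hash & (size - 1);   /* power of two: the mask keeps the low bits */
    }
    return hash % size;             /* general case: a division is required */
}
```

Tables like the two shown earlier avoid the extra branch altogether: they guarantee a power-of-two size when the table is created, so the lookup path can apply the mask unconditionally.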



