Wednesday, 30 January 2013

Cache Money Hoes


Before we begin, the underlined words are briefly described in the glossary at the end. If there are any terms you don't understand let me know and I'll place them in the glossary as well.

As the wise contemporary philosophers the Wu-Tang Clan once said: 
"Cache Rules Everything Around Me ( C.R.E.A.M ) "
Many people believe this is just typical rappers exhibiting the materialistic nature of modern life. However, they were in fact visionaries describing the importance of memory hierarchy utilisation on processor performance.

Traditionally the time complexity of an algorithm was based on the RAM model.  The model assumes that access to all memory locations takes equivalent constant time.  In reality a cache hit is an order of magnitude faster than a cache miss.

Therefore it should be possible to significantly improve an algorithm by reordering its memory access to be cache friendly. This isn't a particularly novel idea and it has a name: cache aware algorithms.

The problem with cache aware algorithms is that they depend on the cache size. The algorithm must be tuned for every different cache it runs on. This unportable tuning was the inspiration for Harald Prokop's cache oblivious algorithms. They are algorithms that are cache aware but don't require tuning.

To illustrate cache-oblivious and cache-aware algorithms we first need an algorithm we can improve. The key is that we don't change the time complexity at all. That means we still do the same number of operations, so the processor does the same work. The only thing that varies is the order of memory access and therefore cache utilisation.

Algorithms that access memory often are ripe for the picking. I decided to implement the matrix transpose because it is one of the simplest memory-heavy algorithms. The matrix transpose simply takes the rows of the input matrix and makes them the columns of the output matrix, like so:



You don't need to understand matrices at all to understand this algorithm. You only need to understand one definition:
[\mathbf{A}^\mathrm{T}]_{ij} = [\mathbf{A}]_{ji}

This simply states that the entry with jth row and ith column for the input matrix is equal to the entry with ith row and jth column in the output matrix.  

The cache-unaware, or naive, algorithm uses this definition directly, as shown below:
      /* Naive transpose: read in row by row, write out column by column. */
      for (int i = 0; i < N; i++) {
          for (int j = 0; j < N; j++) {
              out[j][i] = in[i][j];
          }
      }

Just a side note: the matrices here are large. In this blog's associated code N=1024, i.e. the matrix is 4MB ( each entry is 4 bytes, so 4x1024x1024 ). Anyway, back to the analysis. 2D arrays in C are stored in row-major order.

This means that an entry's horizontal neighbours ( i.e. row neighbours ) will be on the same cache line, unless the entry itself is at the end of a cache line. Therefore accessing a horizontal neighbour will likely be a cache hit. Conversely, accessing a vertical neighbour will likely be a cache miss ( unless it's a relatively small matrix whose rows are short enough that more than one fits in a cache line ).
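To make the neighbour argument concrete, here is a small sketch that checks whether two entries of an N x N row-major matrix share a cache line. The function name same_cache_line, the 4-byte entry size and the assumption that the matrix starts exactly on a cache line boundary are mine for illustration; the 64-byte line used in the example values is typical but not universal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ENTRY_BYTES 4  /* assumed: each matrix entry is 4 bytes, as in the post */

/* Hypothetical helper: do entries (i1, j1) and (i2, j2) of an n x n
 * row-major matrix fall on the same cache line? Assumes the matrix
 * starts exactly on a cache line boundary. */
bool same_cache_line(size_t i1, size_t j1, size_t i2, size_t j2,
                     size_t n, size_t line_bytes)
{
    assert(line_bytes > 0 && j1 < n && j2 < n);
    size_t byte1 = (i1 * n + j1) * ENTRY_BYTES;  /* byte offset of first entry */
    size_t byte2 = (i2 * n + j2) * ENTRY_BYTES;  /* byte offset of second entry */
    return byte1 / line_bytes == byte2 / line_bytes;
}
```

With n = 1024 and a 64-byte line, entries (0,0) and (0,1) share a line, (0,0) and (1,0) do not, and neither do (0,15) and (0,16) because entry 15 is the last one on its line. With n = 4 a whole row is only 16 bytes, so (0,0) and (1,0) do share a line.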

When the naive algorithm reads the input matrix it will often cache hit, because it traverses in row order ( the inner iterator j walks along a row ). However, writing to the output matrix will likely cache miss due to the column order traversal.

How can this be improved? Well, we can change the order of writing to minimize the cache misses. That is exactly what the cache-aware algorithm does. It splits the matrix into a set of sub-matrices sized to match the cache line, then runs the naive algorithm on each sub-matrix:


Don't let sub-matrices confuse you. We are still essentially following the [\mathbf{A}^\mathrm{T}]_{ij} = [\mathbf{A}]_{ji} rule. The only difference is that we are restricting it to one sub-block of the matrix at a time. Once a block has been transferred with minimal cache misses we move on to the next block. Use the source, Luke, if you want to know more.
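A minimal sketch of the blocked idea follows. It is not the code from the associated source: the name transpose_blocked, the flat row-major layout and the block parameter ( the tuning knob, in entries per side ) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache-aware transpose: walk the n x n row-major matrix
 * one block x block tile at a time and apply the naive rule inside
 * each tile. `block` is the value that has to be tuned per cache. */
void transpose_blocked(const int *in, int *out, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t bi = 0; bi < n; bi += block) {
        for (size_t bj = 0; bj < n; bj += block) {
            /* Clamp the tile so n need not be a multiple of block. */
            size_t i_end = bi + block < n ? bi + block : n;
            size_t j_end = bj + block < n ? bj + block : n;
            for (size_t i = bi; i < i_end; i++) {
                for (size_t j = bj; j < j_end; j++) {
                    out[j * n + i] = in[i * n + j];
                }
            }
        }
    }
}
```

For 4-byte entries and a 64-byte cache line, a block of 16 would make one tile row exactly one line wide, but the right value depends on the target machine, which is the portability problem discussed next.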

The problem with the previous cache aware algorithm is it took the cache line size as a parameter. So we would have to tune it for each different target. Can we eliminate this tuning? Sure we can! The cache oblivious (CO) way.

The CO algorithm basically keeps splitting the matrix into sub-blocks until we reach blocks of size 1. In reality you set the cut-off slightly larger than one to minimize the cost of recursion. Then you apply the transpose on the blocks. So the CO algorithm is basically the cache-aware algorithm, but the tuning is done automatically through divide and conquer.

But wait, isn't setting the block size too small going to cost a cache miss? No, that's the clever bit! If the block is smaller than the cache line then the siblings and eventually the parent block ( the block that encompasses the sub-block ) will still be in cache, as shown below:



The initial block being accessed is the sub-block in the top left consisting of red and blue blocks. Its entries are shown in the cache line above. Once it's been accessed, its sibling block ( the top right block consisting of green and yellow ) is accessed. Since the blocks are smaller than the cache line, the sibling's entries are also in cache.
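The divide and conquer can be sketched as below. Again this is an illustration rather than the code from the associated source: the names transpose_oblivious and transpose_rec, the flat row-major layout and the LEAF cut-off of 16 are assumptions. Note that LEAF only limits recursion overhead; it is not tuned to any particular cache.

```c
#include <assert.h>
#include <stddef.h>

#define LEAF 16  /* assumed recursion cut-off, not a cache parameter */

/* Transpose the sub-block rows [r0, r1) x columns [c0, c1) of the
 * n x n row-major matrix `in` into `out`, halving the longer side
 * until the block is at most LEAF x LEAF. */
static void transpose_rec(const int *in, int *out, size_t n,
                          size_t r0, size_t r1, size_t c0, size_t c1)
{
    size_t rows = r1 - r0;
    size_t cols = c1 - c0;
    if (rows <= LEAF && cols <= LEAF) {
        /* Small enough: apply the naive rule directly. */
        for (size_t i = r0; i < r1; i++)
            for (size_t j = c0; j < c1; j++)
                out[j * n + i] = in[i * n + j];
    } else if (rows >= cols) {
        size_t mid = r0 + rows / 2;  /* split the rows in half */
        transpose_rec(in, out, n, r0, mid, c0, c1);
        transpose_rec(in, out, n, mid, r1, c0, c1);
    } else {
        size_t mid = c0 + cols / 2;  /* split the columns in half */
        transpose_rec(in, out, n, r0, r1, c0, mid);
        transpose_rec(in, out, n, r0, r1, mid, c1);
    }
}

/* Hypothetical entry point: out-of-place transpose of the whole matrix. */
void transpose_oblivious(const int *in, int *out, size_t n)
{
    assert(in != out);
    transpose_rec(in, out, n, 0, n, 0, n);
}
```

Because sibling blocks are visited one after another, whatever block size happens to fit the cache is reached at some level of the recursion without anyone having to name it.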

So let's see if the Wu-Tang hypothesis is correct. The average results of running the transpose on a 1024x1024 matrix 100 times are:

Naive      Cache Aware   Cache Oblivious
5704 ms    763.33 ms     1212 ms

Of course the Wu-Tang Clan was right on the money. The algorithms are doing exactly the same number of operations and the difference is huge. The cache oblivious algorithm is somewhat slower than the cache aware one because it's paying for the recursion. As the size of the matrix grows, the relative cost of recursion will reduce. So we can say that the cache oblivious algorithm is asymptotically equivalent to the cache aware one.

This blog has given a very broad overview. For more information please read Harald Prokop's MIT master's thesis and the attached source code.

For the people that do read the thesis, one thing that had me confused was that the cache complexity was often stated as O( 1 + k / N ) and I couldn't figure out where the 1 came from. It's because the bound has to cover the worst case, and the worst case in terms of cache is when the memory address is misaligned, so you have to pay for an additional cache miss per object access.
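A small sketch of where that extra miss comes from, assuming k is the number of contiguous bytes accessed and N is the cache line size in bytes ( that is my reading of the notation ). The function name lines_touched is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of cache lines covered by `len`
 * contiguous bytes starting at byte offset `start`, for a cache
 * line of `line` bytes. */
size_t lines_touched(size_t start, size_t len, size_t line)
{
    assert(line > 0 && len > 0);
    size_t first = start / line;              /* line holding the first byte */
    size_t last = (start + len - 1) / line;   /* line holding the last byte */
    return last - first + 1;
}
```

An aligned 64-byte object on a 64-byte line touches one line ( start 0, len 64 ), but shift it by a single byte ( start 1 ) and it straddles two. In general a misaligned access can touch one line more than k / N, which is the 1 in the bound.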



So the take home for today is, when your code runs slow don't forget the cache! Until next time keep it real...



Glossary 

Time Complexity - A way of comparing algorithms mathematically. Instead of comparing them by actual time measurements, which are affected by the computer the algorithm is run on, we compare them by the expected number of operations. The number of operations typically depends on the size of the input n, so we say an algorithm's complexity is a function of n.

Cache Hit - When the CPU attempts to access an object that's already in cache.

Cache Miss - When the CPU attempts to access an object that's not in cache so it must be copied from memory to cache ( a line size at a time ).

Row Major - A 2D ( or n dimensional ) array in C is actually stored as one large single array. In row major order the rows are written out one after another. So for example if the array is defined as x[3][5] and you access element x[2][4], its actual offset is 2*5 + 4 = 14, because it has to skip the first two rows ( 5 elements each ) and then move 4 elements along the third row.
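The offset arithmetic can be written out as a one-line function. The name row_major_offset is made up for illustration; C does this calculation for you when you index a 2D array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset ( in elements ) of x[row][col] inside
 * the flat storage of a row-major array with `cols` columns. */
size_t row_major_offset(size_t row, size_t col, size_t cols)
{
    assert(col < cols);
    return row * cols + col;  /* skip `row` full rows, then move `col` along */
}
```

For x[3][5], row_major_offset(2, 4, 5) gives 14. A horizontal neighbour is always 1 element away, while a vertical neighbour is a whole row ( here 5 elements ) away.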

75 comments:

  1. Cool article with good illustration, please keep writing!

  2. thank-you ... I will try to blog about every Sunday project I take on. Next Sun is going to be on linear programming or indexing. I may write an article on JIT from the Sunday before last :)

  3. I believe that cache-oblivious algorithms are asymptotically optimal as the number of cache levels in the hierarchy increases, not asymptotically optimal as the input size increases.

    In "The Cost of Cache-Oblivious Searching" (Proceedings. 44th Annual IEEE Symposium on Foundations of Computer Science, 2003), Bender et al. prove that cache-oblivious search structures take (lg e) or approximately 44% more block transfers than the optimal cache-aware algorithm in the 2-level memory model (the DAM model) and can do no better. As the number of levels in the memory model increases, the performance loss of the cache-oblivious algorithm approaches zero.

    Of course real hardware does not have a 2-level memory hierarchy nor does it have an infinite level memory hierarchy. But I have found that in practice, the 2-level memory model is a good abstraction for application performance. There is usually one "jump" in the memory hierarchy that has a significant performance impact. Either the gap from disk to main memory, or the gap from main memory to highest cache level, depending on the application.

    Replies
    1. Thank you for letting me know. I'll read that on Sat and update the post with the correction.

  4. Awesome article! Interesting note when I ran the code, the cache oblivious was faster than the cache-aware. (compiled with gcc -O3 and ran on an i7)

    naive : 1696 1707 1707
    cache aware : 798 800 803
    cache oblivious: 364 366 364

    Replies
    1. Thanks Danny ... A friend of mine also noticed similar results. Unfortunately I must restrict myself to Sundays for tangent work so I really can't do more research on the cause.

      If you are interested in finding out, before studying the optimised output make sure:

      1) You have set the cache line parameter for the aware algorithm appropriately
      2) Verify that the matrix begins on a page boundary using mmap ( so that it's cache aligned )

      If you do find out, please let me know and I'll update this blog.

      Thanks

    2. Worth noting is that the ratio of performance increase in the three cases are as follows:

      naive: 3.35 times faster
      cache aware: 0.954 times as fast
      cache oblivious: 3.32 times faster

      So that supports the hypothesis that your test machine is around 3.3 times faster than the one used for the article, but you have incorrectly configured the cache size or alignment in the cache-aware case, in one of the ways Iain describes above.

      Incidentally, this is a good anecdotal argument as to why the cache-oblivious algorithm is a better choice in general, because it's harder to misconfigure.

  5. get the memory, mega mega bytes ya'll!

  6. Well-written. Made my day.

    (memory) Word!
