YTSFlix_Go/vendor/github.com/willf/bloom/README.md
2018-11-04 15:58:15 +01:00

70 lines
3.2 KiB
Markdown

Bloom filters
-------------
[![Master Build Status](https://secure.travis-ci.org/willf/bloom.png?branch=master)](https://travis-ci.org/willf/bloom?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/willf/bloom/badge.svg?branch=master)](https://coveralls.io/github/willf/bloom?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/willf/bloom)](https://goreportcard.com/report/github.com/willf/bloom)
[![GoDoc](https://godoc.org/github.com/willf/bloom?status.svg)](http://godoc.org/github.com/willf/bloom)
A Bloom filter is a representation of a set of _n_ items, where the main
requirement is to make membership queries; _i.e._, whether an item is a
member of a set.
A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a [BitSet](https://github.com/willf/bitset); a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.
In this implementation, the hashing functions used is [murmurhash](github.com/spaolacci/murmur3), a non-cryptographic hashing function.
This implementation accepts keys for setting and testing as `[]byte`. Thus, to
add a string item, `"Love"`:
n := uint(1000)
filter := bloom.New(20*n, 5) // load of 20, 5 keys
filter.Add([]byte("Love"))
Similarly, to test if `"Love"` is in bloom:
if filter.Test([]byte("Love"))
For numeric data, I recommend that you look into the encoding/binary library. But, for example, to add a `uint32` to the filter:
i := uint32(100)
n1 := make([]byte, 4)
binary.BigEndian.PutUint32(n1, i)
filter.Add(n1)
Finally, there is a method to estimate the false positive rate of a particular
bloom filter for a set of size _n_:
if filter.EstimateFalsePositiveRate(1000) > 0.001
Given the particular hashing scheme, it's best to be empirical about this. Note
that estimating the FP rate will clear the Bloom filter.
Discussion here: [Bloom filter](https://groups.google.com/d/topic/golang-nuts/6MktecKi1bE/discussion)
Godoc documentation: https://godoc.org/github.com/willf/bloom
## Installation
```bash
go get -u github.com/willf/bloom
```
## Contributing
If you wish to contribute to this project, please branch and issue a pull request against master ("[GitHub Flow](https://guides.github.com/introduction/flow/)")
This project include a Makefile that allows you to test and build the project with simple commands.
To see all available options:
```bash
make help
```
## Running all tests
Before committing the code, please check if it passes all tests using (note: this will install some dependencies):
```bash
make deps
make qa
```