Hash functions
Hash functions can be used for the deterministic pseudo-random shuffling of elements.
Simhash is a hash function, which returns close hash values for close (similar) arguments.
Most hash functions accept any number of arguments of any types.
Hash of NULL is NULL. To get a non-NULL hash of a Nullable column, wrap it in a tuple:
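A minimal sketch of this behavior, using cityHash64 purely as an illustration. Going by the rule above, the first column should be NULL and the second a regular UInt64 value:

```sql
SELECT
    cityHash64(NULL)        AS plain_hash,    -- hash of NULL is NULL
    cityHash64(tuple(NULL)) AS wrapped_hash;  -- the tuple wrapper yields a non-NULL hash
```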
To calculate a hash of the whole contents of a table, use sum(cityHash64(tuple(*))) (or another hash function). tuple ensures that rows with NULL values are not skipped. sum ensures that the order of rows doesn't matter.
halfMD5
Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. Then combines hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order.
The function is relatively slow (5 million short strings per second per processor core). Consider using the sipHash64 function instead.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
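An illustrative call mixing several argument types. The argument values are arbitrary; toTypeName is used only to confirm the UInt64 return type stated above:

```sql
SELECT
    halfMD5(array('e', 'x', 'a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS halfMD5hash,
    toTypeName(halfMD5hash) AS type;  -- expected to report UInt64
```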
MD4
Calculates the MD4 from a string and returns the resulting set of bytes as FixedString(16).
MD5
Calculates the MD5 from a string and returns the resulting set of bytes as FixedString(16). If you do not need MD5 in particular, but you need a decent cryptographic 128-bit hash, use the 'sipHash128' function instead. If you want to get the same result as output by the md5sum utility, use lower(hex(MD5(s))).
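A short sketch of the md5sum-compatible form mentioned above. The input 'abc' is an arbitrary choice; its digest is the standard MD5 test vector:

```sql
-- Lowercase hex encoding of the raw FixedString(16) digest, as md5sum prints it.
SELECT lower(hex(MD5('abc'))) AS md5_hex;  -- 900150983cd24fb0d6963f7d28e17f72
```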
RIPEMD160
Produces RIPEMD-160 hash value.
Syntax
Parameters
input: Input string. String.
Returned value
- A 160-bit RIPEMD-160 hash value of type FixedString(20).
Example
Use the hex function to represent the result as a hex-encoded string.
Query:
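A possible query, with an arbitrary input string; hex turns the raw 20-byte value into a readable string:

```sql
SELECT hex(RIPEMD160('The quick brown fox jumps over the lazy dog')) AS ripemd160_hex;
```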
sipHash64
Produces a 64-bit SipHash hash value.
This is a cryptographic hash function. It works at least three times faster than the MD5 hash function.
The function interprets all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes by the following algorithm:
- The first and second hash values are concatenated into an array, which is then hashed.
- The previously calculated hash value and the hash of the third input parameter are hashed in a similar way.
- This calculation is repeated for all remaining hash values of the original input.
Arguments
The function takes a variable number of input parameters of any of the supported data types.
Returned Value
A UInt64 data type hash value.
Note that the calculated hash values may be equal for the same input values of different argument types. This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Example
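An illustrative call with arguments of several types. The values are arbitrary and no particular result is implied:

```sql
SELECT
    sipHash64(array('e', 'x', 'a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')) AS SipHash,
    toTypeName(SipHash) AS type;  -- expected to report UInt64
```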
sipHash64Keyed
Same as sipHash64 but additionally takes an explicit key argument instead of using a fixed key.
Syntax
Arguments
Same as sipHash64, but the first argument is a tuple of two UInt64 values representing the key.
Returned value
A UInt64 data type hash value.
Example
Query:
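A sketch of a keyed call. The two UInt64 key values below are arbitrary placeholders, not recommended keys; the remaining arguments are hashed as in sipHash64:

```sql
SELECT
    sipHash64Keyed(
        (506097522914230528, 1084818905618843912),  -- key: tuple of two UInt64 values
        array('e', 'x', 'a'), 'mple', 10, toDateTime('2019-06-15 23:00:00')
    ) AS SipHash,
    toTypeName(SipHash) AS type;
```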
sipHash128
Like sipHash64 but produces a 128-bit hash value, i.e. the final xor-folding of the state is done up to 128 bits.
This 128-bit variant differs from the reference implementation and it's weaker. This version exists because, when it was written, there was no official 128-bit extension for SipHash. New projects should probably use sipHash128Reference.
Syntax
Arguments
Same as for sipHash64.
Returned value
A 128-bit SipHash hash value of type FixedString(16).
Example
Query:
Result:
sipHash128Keyed
Same as sipHash128 but additionally takes an explicit key argument instead of using a fixed key.
This 128-bit variant differs from the reference implementation and it's weaker. This version exists because, when it was written, there was no official 128-bit extension for SipHash. New projects should probably use sipHash128ReferenceKeyed.
Syntax
Arguments
Same as sipHash128, but the first argument is a tuple of two UInt64 values representing the key.
Returned value
A 128-bit SipHash hash value of type FixedString(16).
Example
Query:
Result:
sipHash128Reference
Like sipHash128 but implements the 128-bit algorithm from the original authors of SipHash.
Syntax
Arguments
Same as for sipHash128.
Returned value
A 128-bit SipHash hash value of type FixedString(16).
Example
Query:
Result:
sipHash128ReferenceKeyed
Same as sipHash128Reference but additionally takes an explicit key argument instead of using a fixed key.
Syntax
Arguments
Same as sipHash128Reference, but the first argument is a tuple of two UInt64 values representing the key.
Returned value
A 128-bit SipHash hash value of type FixedString(16).
Example
Query:
Result:
cityHash64
Produces a 64-bit CityHash hash value.
This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and an implementation-specific fast non-cryptographic hash function for parameters with other data types. The function uses the CityHash combinator to get the final results.
Note that Google changed the algorithm of CityHash after it was added to ClickHouse. In other words, ClickHouse's cityHash64 and Google's upstream CityHash now produce different results. ClickHouse cityHash64 corresponds to CityHash v1.0.2.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Examples
Call example:
The following example shows how to compute a checksum of the entire table that does not depend on the row order:
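A self-contained sketch of an order-independent checksum. Instead of a real table it uses a generated data set (numbers(1000) with a derived string column; the column names id and s are made up for illustration) read in two different orders. Because sum is commutative, both checksums should match:

```sql
WITH
    (
        SELECT sum(cityHash64(tuple(*)))
        FROM (SELECT number AS id, toString(number) AS s FROM numbers(1000) ORDER BY id ASC)
    ) AS checksum_asc,
    (
        SELECT sum(cityHash64(tuple(*)))
        FROM (SELECT number AS id, toString(number) AS s FROM numbers(1000) ORDER BY id DESC)
    ) AS checksum_desc
SELECT
    checksum_asc,
    checksum_desc,
    checksum_asc = checksum_desc AS same_checksum;  -- row order should not affect the sum
```

For a real table, the inner subquery would simply be replaced by the table name.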
intHash32
Calculates a 32-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers.
Syntax
Arguments
int: Integer to hash. (U)Int*.
Returned value
- 32-bit hash code. UInt32.
Example
Query:
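A possible query; the input 42 is arbitrary, and toTypeName is included only to confirm the UInt32 return type:

```sql
SELECT
    intHash32(42) AS hash32,
    toTypeName(hash32) AS type;  -- expected to report UInt32
```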
Result:
intHash64
Calculates a 64-bit hash code from any type of integer. This is a relatively fast non-cryptographic hash function of average quality for numbers. It works faster than intHash32.
Syntax
Arguments
int: Integer to hash. (U)Int*.
Returned value
- 64-bit hash code. UInt64.
Example
Query:
Result:
SHA1, SHA224, SHA256, SHA512, SHA512_256
Calculates SHA-1, SHA-224, SHA-256, SHA-512, SHA-512-256 hash from a string and returns the resulting set of bytes as FixedString.
Syntax
The function works fairly slowly (SHA-1 processes about 5 million short strings per second per processor core, while SHA-224 and SHA-256 process about 2.2 million).
We recommend using this function only when you need a specific hash function and cannot choose a different one.
Even in these cases, we recommend applying the function offline and pre-calculating values when inserting them into the table, instead of applying it in SELECT queries.
Arguments
s: Input string for SHA hash calculation. String.
Returned value
- SHA hash as a raw (not hex-encoded) FixedString: FixedString(20) for SHA-1, FixedString(28) for SHA-224, FixedString(32) for SHA-256, and FixedString(64) for SHA-512.
Example
Use the hex function to represent the result as a hex-encoded string.
Query:
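A sketch with an arbitrary input. The length columns only illustrate the FixedString sizes listed above:

```sql
SELECT
    hex(SHA1('abc'))      AS sha1_hex,      -- hex-encoded form of the 20 raw bytes
    length(SHA1('abc'))   AS sha1_bytes,    -- FixedString(20), so 20
    length(SHA256('abc')) AS sha256_bytes;  -- FixedString(32), so 32
```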
Result:
BLAKE3
Calculates the BLAKE3 hash of a string and returns the resulting bytes as FixedString.
Syntax
This cryptographic hash function is integrated into ClickHouse using the BLAKE3 Rust library. It is fairly fast, roughly twice as fast as SHA-2, while generating hashes of the same length as SHA-256.
Arguments
- s - input string for BLAKE3 hash calculation. String.
Return value
- BLAKE3 hash as a byte array with type FixedString(32). FixedString.
Example
Use function hex to represent the result as a hex-encoded string.
Query:
Result:
URLHash(url[, N])
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.
- URLHash(s): Calculates a hash from a string without one of the trailing symbols /, ? or # at the end, if present.
- URLHash(s, N): Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /, ? or # at the end, if present.
Levels are the same as in URLHierarchy.
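A sketch of both forms. The URLs are arbitrary examples. Given the normalization described above, the first two hashes should be equal because only a trailing / differs:

```sql
SELECT
    URLHash('https://clickhouse.com/docs/') AS with_slash,
    URLHash('https://clickhouse.com/docs')  AS without_slash,
    with_slash = without_slash              AS same_hash,
    -- Hash only the part of the URL up to level 2 of the hierarchy.
    URLHash('https://clickhouse.com/docs/sql-reference/functions', 2) AS hash_up_to_level_2;
```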
farmFingerprint64
farmHash64
Produces a 64-bit FarmHash or Fingerprint value. farmFingerprint64 is preferred for a stable and portable value.
These functions use the Fingerprint64 and Hash64 methods respectively from all available methods.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
javaHash
Calculates JavaHash from a string, Byte, Short, Integer, Long. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Note that Java only supports hashing signed integers, so to hash unsigned integers you must first cast them to the appropriate signed ClickHouse types.
Syntax
Returned value
An Int32 data type hash value.
Example
Query:
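A sketch covering both kinds of input. The values are arbitrary; the explicit toInt32 cast follows the note above about signed integer types:

```sql
SELECT
    javaHash(toInt32(123))    AS int_hash,     -- integer input, cast to a signed type
    javaHash('Hello, world!') AS string_hash;  -- string input
```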
Result:
Query:
Result:
javaHashUTF16LE
Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.
Syntax
Arguments
stringUtf16le: A string in UTF-16LE encoding.
Returned value
An Int32 data type hash value.
Example
Correct query with UTF-16LE encoded string.
Query:
Result:
hiveHash
Calculates HiveHash from a string.
This is just JavaHash with the sign bit zeroed out. This function is used in Apache Hive for versions before 3.0. This hash function is neither fast nor of good quality. The only reason to use it is when this algorithm is already used in another system and you have to calculate exactly the same result.
Returned value
hiveHash hash value. Int32.
Example
Query:
Result:
metroHash64
Produces a 64-bit MetroHash hash value.
Arguments
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
A UInt64 data type hash value.
Example
jumpConsistentHash
Calculates JumpConsistentHash from a UInt64. Accepts two arguments: a UInt64-type key and the number of buckets. Returns Int32. For more information, see JumpConsistentHash.
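A sketch of bucket assignment with an arbitrary key. With a jump consistent hash, growing the bucket count from 4 to 5 should either leave the key in its bucket or move it to the newly added bucket (index 4):

```sql
SELECT
    jumpConsistentHash(256, 4) AS bucket_with_4_buckets,
    jumpConsistentHash(256, 5) AS bucket_with_5_buckets;
```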
kostikConsistentHash
An O(1) time and space consistent hash algorithm by Konstantin 'kostik' Oblakov. Previously yandexConsistentHash.
Syntax
Alias: yandexConsistentHash (left for backward compatibility).
Parameters
Returned value
- A UInt16 data type hash value.
Implementation details
It is efficient only if n <= 32768.
Example
Query:
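A possible query. The first argument is an arbitrary UInt64 key and the second is the number of buckets, kept well under the 32768 limit noted above:

```sql
SELECT kostikConsistentHash(16045690984833335023, 2) AS bucket;
```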
murmurHash2_32, murmurHash2_64
Produces a MurmurHash2 hash value.
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
- The murmurHash2_32 function returns a hash value of the UInt32 data type.
- The murmurHash2_64 function returns a hash value of the UInt64 data type.
Example
gccMurmurHash
Calculates a 64-bit MurmurHash2 hash value using the same hash seed as gcc. It is portable between Clang and GCC builds.
Syntax
Arguments
par1, ...: A variable number of parameters that can be any of the supported data types.
Returned value
- Calculated hash value. UInt64.
Example
Query:
Result:
kafkaMurmurHash
Calculates a 32-bit MurmurHash2 hash value using the same hash seed as Kafka and with the highest bit cleared, for compatibility with Kafka's Default Partitioner.
Syntax
Arguments
par1, ...: A variable number of parameters that can be any of the supported data types.
Returned value
- Calculated hash value. UInt32.
Example
Query:
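A possible query with arbitrary inputs, one string and one array:

```sql
SELECT
    kafkaMurmurHash('foobar')              AS res1,
    kafkaMurmurHash(array('a', 'b', 'c')) AS res2;
```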
Result:
murmurHash3_32, murmurHash3_64
Produces a MurmurHash3 hash value.
Arguments
Both functions take a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
Returned Value
- The murmurHash3_32 function returns a UInt32 data type hash value.
- The murmurHash3_64 function returns a UInt64 data type hash value.
Example
murmurHash3_128
Produces a 128-bit MurmurHash3 hash value.
Syntax
Arguments
expr: A list of expressions. String.
Returned value
A 128-bit MurmurHash3 hash value. FixedString(16).
Example
Query:
Result:
xxh3
Produces a 64-bit xxh3 hash value.
Syntax
Arguments
expr: A list of expressions of any data type.
Returned value
A 64-bit xxh3 hash value. UInt64.
Example
Query:
Result:
xxHash32, xxHash64
Calculates xxHash from a string. It comes in two flavors, 32 and 64 bits.
Returned value
- Hash value. UInt32/64.
The return type will be UInt32 for xxHash32 and UInt64 for xxHash64.
Example
Query:
Result:
See Also
ngramSimHash
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
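A sketch that pairs ngramSimHash with bitHammingDistance, as suggested above. The strings are arbitrary; the near-duplicate pair would be expected to give a smaller distance than the unrelated pair, though this is a tendency rather than a guarantee:

```sql
SELECT
    ngramSimHash('ClickHouse')    AS h1,
    ngramSimHash('ClickHouse', 3) AS h1_explicit_ngramsize,  -- 3 is the default n-gram size
    ngramSimHash('ClickHouze')    AS h2,
    bitHammingDistance(h1, h2)    AS distance_similar,
    bitHammingDistance(h1, ngramSimHash('completely different text')) AS distance_different;
```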
Result:
ngramSimHashCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
ngramSimHashUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
ngramSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
wordShingleSimHash
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
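A possible query over an arbitrary sentence, first with the default shingle size and then with an explicit size of 2 words:

```sql
SELECT
    wordShingleSimHash('ClickHouse is a column-oriented database management system for online analytical processing of queries.')    AS hash_default,
    wordShingleSimHash('ClickHouse is a column-oriented database management system for online analytical processing of queries.', 2) AS hash_shinglesize_2;
```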
Result:
wordShingleSimHashCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
wordShingleSimHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
wordShingleSimHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash. Is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance. The smaller the Hamming distance between the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
wyHash64
Produces a 64-bit wyHash64 hash value.
Syntax
Arguments
string: String. String.
Returned value
- Hash value. UInt64.
Example
Query:
Result:
ngramMinHash
Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
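A sketch that pairs ngramMinHash with tupleHammingDistance, as suggested above. The strings are arbitrary. tupleHammingDistance counts the tuple positions that differ, so under the rule above a value below 2 would mean at least one of the two hashes matches:

```sql
SELECT
    ngramMinHash('ClickHouse')   AS t1,
    ngramMinHash('ClickHouze')   AS t2,
    tupleHammingDistance(t1, t2) AS differing_components;
```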
Result:
ngramMinHashCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
ngramMinHashUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
ngramMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and calculates hash values for each n-gram. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
ngramMinHashArg
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input. Is case sensitive.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
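A possible query with an arbitrary input string. The second column repeats the call with the default ngramsize and hashnum written out explicitly:

```sql
SELECT
    ngramMinHashArg('ClickHouse')       AS ngrams_default,
    ngramMinHashArg('ClickHouse', 3, 6) AS ngrams_explicit_defaults;
```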
Result:
ngramMinHashArgCaseInsensitive
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
ngramMinHashArgUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input. Is case sensitive.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
ngramMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
Arguments
string: String. String.
ngramsize: The size of an n-gram. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHash
Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case sensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words and calculates hash values for each word shingle. Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash. Returns a tuple with these hashes. Is case insensitive.
Can be used for detection of semi-duplicate strings with tupleHammingDistance. For two strings: if one of the returned hashes is the same for both strings, the strings are considered the same.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashArg
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHash function with the same input. Is case sensitive.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashArgCaseInsensitive
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input. Is case insensitive.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashArgUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input. Is case sensitive.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
wordShingleMinHashArgCaseInsensitiveUTF8
Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input. Is case insensitive.
Syntax
Arguments
string: String. String.
shinglesize: The size of a word shingle. Optional. Possible values: any number from 1 to 25. Default value: 3. UInt8.
hashnum: The number of minimum and maximum hashes used to calculate the result. Optional. Possible values: any number from 1 to 25. Default value: 6. UInt8.
Returned value
Example
Query:
Result:
sqidEncode
Encodes numbers as a Sqid which is a YouTube-like ID string.
The output alphabet is abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.
Do not use this function for hashing - the generated IDs can be decoded back into the original numbers.
Syntax
Alias: sqid
Arguments
- A variable number of UInt8, UInt16, UInt32 or UInt64 numbers.
Returned Value
A sqid String.
Example
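A possible query encoding a handful of arbitrary numbers into a single sqid string:

```sql
SELECT sqidEncode(1, 2, 3, 4, 5) AS sqid;
```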
sqidDecode
Decodes a Sqid back into its original numbers. Returns an empty array in case the input string is not a valid sqid.
Syntax
Arguments
- A sqid. String.
Returned Value
The sqid transformed to numbers. Array(UInt64).
Example
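A round-trip sketch with arbitrary numbers. Since sqids are reversible, decoding the encoded value should return the original numbers as an array, and an invalid input should give an empty array as described above:

```sql
SELECT
    sqidDecode(sqidEncode(1, 2, 3, 4, 5)) AS round_trip,    -- original numbers as Array(UInt64)
    sqidDecode('not a valid sqid!')       AS invalid_input; -- empty array for an invalid sqid
```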
keccak256
Calculates the Keccak-256 hash of a string and returns the resulting bytes as FixedString.
Syntax
This cryptographic hash function is widely used in EVM-based blockchains.
Arguments
- s - input string for Keccak-256 hash calculation. String.
Return value
- Keccak-256 hash as a byte array with type FixedString(32). FixedString.
Example
Use function hex to format the result as a hex-encoded string.
Query:
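A possible query with an arbitrary input; hex converts the 32 raw bytes into a readable string:

```sql
SELECT
    hex(keccak256('hello'))    AS keccak256_hex,
    length(keccak256('hello')) AS digest_bytes;  -- FixedString(32), so 32
```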
Result:
BLAKE3
Introduced in: v22.10
Calculates the BLAKE3 hash of a string and returns the resulting bytes as FixedString. This cryptographic hash function is integrated into ClickHouse using the BLAKE3 Rust library. It is fairly fast, roughly twice as fast as SHA-2, while generating hashes of the same length as SHA-256. It returns the BLAKE3 hash as a byte array of type FixedString(32).
Syntax
Arguments
message: The input string to hash. String.
Returned value
Returns the 32-byte BLAKE3 hash of the input string as a fixed-length string. FixedString(32)
Examples
hash
MD4
Introduced in: v21.11
Calculates the MD4 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the MD4 hash of the given input string as a fixed-length string. FixedString(16)
Examples
Usage example
MD5
Introduced in: v1.1
Calculates the MD5 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the MD5 hash of the given input string as a fixed-length string. FixedString(16)
Examples
Usage example
RIPEMD160
Introduced in: v24.10
Calculates the RIPEMD-160 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the RIPEMD160 hash of the given input string as a fixed-length string. FixedString(20)
Examples
Usage example
SHA1
Introduced in: v1.1
Calculates the SHA1 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA1 hash of the given input string as a fixed-length string. FixedString(20)
Examples
Usage example
SHA224
Introduced in: v1.1
Calculates the SHA224 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA224 hash of the given input string as a fixed-length string. FixedString(28)
Examples
Usage example
SHA256
Introduced in: v1.1
Calculates the SHA256 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA256 hash of the given input string as a fixed-length string. FixedString(32)
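The result is the raw 32-byte digest, not its hex text. As a rough cross-check outside ClickHouse, the sketch below uses Python's standard hashlib to show the digest length and its hex form. The helper name sha256_bytes is hypothetical, and encoding the input as UTF-8 is an assumption.

```python
import hashlib


def sha256_bytes(s: str) -> bytes:
    """Return the raw SHA-256 digest (32 bytes) of a UTF-8 encoded string."""
    return hashlib.sha256(s.encode("utf-8")).digest()


digest = sha256_bytes("")
print(len(digest))   # 32 bytes, matching FixedString(32)
print(digest.hex())  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

The hex form is twice as long (64 characters) because each byte becomes two hex digits.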
Examples
Usage example
SHA384
Introduced in: v1.1
Calculates the SHA384 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA384 hash of the given input string as a fixed-length string. FixedString(48)
Examples
Usage example
SHA512
Introduced in: v1.1
Calculates the SHA512 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA512 hash of the given input string as a fixed-length string. FixedString(64)
Examples
Usage example
SHA512_256
Introduced in: v1.1
Calculates the SHA512_256 hash of the given string.
Syntax
Arguments
s: The input string to hash. String
Returned value
Returns the SHA512_256 hash of the given input string as a fixed-length string. FixedString(32)
Examples
Usage example
URLHash
Introduced in: v1.1
A fast, decent-quality non-cryptographic hash function for a string obtained from a URL using some type of normalization.
This hash function has two modes:
Mode | Description |
---|---|
URLHash(url) | Calculates a hash from a string without one of the trailing symbols /, ? or # at the end, if present. |
URLHash(url, N) | Calculates a hash from a string up to the N level in the URL hierarchy, without one of the trailing symbols /, ? or # at the end, if present. Levels are the same as in URLHierarchy. |
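The normalization step described in the table can be sketched in Python as follows. This only illustrates the removal of a single trailing symbol. The hash that URLHash applies afterwards is not reproduced here, and the helper name strip_trailing_symbol is hypothetical.

```python
def strip_trailing_symbol(url: str) -> str:
    """Drop at most one trailing '/', '?' or '#' from the URL string."""
    if url and url[-1] in "/?#":
        return url[:-1]
    return url


for url in ("https://example.com/docs/",
            "https://example.com/docs?",
            "https://example.com/docs"):
    # All three inputs normalize to the same string, so they would hash equally.
    print(strip_trailing_symbol(url))
```

Only one symbol is removed, so a string ending in "//" keeps one trailing slash.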
Syntax
Arguments
Returned value
Returns the computed hash value of url. UInt64
Examples
Usage example
Hash of url with specified level
cityHash64
Introduced in: v1.1
Produces a 64-bit CityHash hash value.
This is a fast non-cryptographic hash function. It uses the CityHash algorithm for string parameters and an implementation-specific fast non-cryptographic hash function for parameters of other data types. The function uses the CityHash combinator to get the final result.
Google changed the algorithm of CityHash after it was added to ClickHouse. In other words, ClickHouse's cityHash64 and Google's upstream CityHash now produce different results. ClickHouse cityHash64 corresponds to CityHash v1.0.2.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash of the input arguments. UInt64
Examples
Call example
Computing the checksum of the entire table with accuracy up to the row order
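Why summing per-row hashes gives a checksum that ignores row order can be illustrated outside ClickHouse. The sketch below is not the cityHash64 algorithm: it uses BLAKE2b from Python's hashlib as a stand-in 64-bit row hash, and it assumes that the sum wraps around modulo 2**64. The names row_hash and table_checksum are hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1


def row_hash(row: tuple) -> int:
    """Stand-in 64-bit row hash (BLAKE2b, not CityHash)."""
    digest = hashlib.blake2b(repr(row).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def table_checksum(rows) -> int:
    """Add up the per-row hashes modulo 2**64; addition is commutative."""
    total = 0
    for row in rows:
        total = (total + row_hash(row)) & MASK64
    return total


rows = [(1, "a", None), (2, "b", 3.5), (3, None, 0.0)]
print(table_checksum(rows) == table_checksum(list(reversed(rows))))  # True
```

Hashing the whole row as a tuple means rows containing None still contribute to the sum.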
farmFingerprint64
Introduced in: v20.12
Produces a 64-bit FarmHash value using the Fingerprint64 method.
farmFingerprint64 is preferred over farmHash64 for a stable and portable value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash value of the input arguments. UInt64
Examples
Usage example
farmHash64
Introduced in: v1.1
Produces a 64-bit FarmHash using the Hash64 method.
farmFingerprint64 is preferred for a stable and portable value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash value of the input arguments. UInt64
Examples
Usage example
gccMurmurHash
Introduced in: v20.1
Computes the 64-bit MurmurHash2 hash of the input value using the same seed as used by GCC.
It is portable between Clang and GCC builds.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of arguments for which to compute the hash. Any
Returned value
Returns the calculated hash value of the input arguments. UInt64
Examples
Usage example
halfMD5
Introduced in: v1.1
Interprets all the input parameters as strings and calculates the MD5 hash value for each of them. It then combines the hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as UInt64 in big-endian byte order. The function is relatively slow (5 million short strings per second per processor core).
Consider using the sipHash64 function instead.
The function takes a variable number of input parameters. Arguments can be any of the supported data types. For some data types, the calculated hash value may be the same for the same values even if the argument types differ (integers of different sizes, named and unnamed Tuple with the same data, Map and the corresponding Array(Tuple(key, value)) type with the same data).
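The "first 8 bytes, big-endian" step can be sketched with Python's hashlib. This is a minimal illustration for a single string argument only. How several arguments are combined is not reproduced, and the helper name half_md5_of_string and the UTF-8 encoding are assumptions.

```python
import hashlib


def half_md5_of_string(s: str) -> int:
    """First 8 bytes of MD5(s), read as a big-endian unsigned 64-bit integer."""
    digest = hashlib.md5(s.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[:8], "big")


# MD5 of the empty string is d41d8cd98f00b204e9800998ecf8427e.
print(hex(half_md5_of_string("")))  # 0xd41d8cd98f00b204
```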
Syntax
Arguments
arg1[, arg2, ..., argN]: A variable number of arguments for which to compute the hash. Any
Returned value
Returns the computed half MD5 hash of the given input parameters as a UInt64 in big-endian byte order. UInt64
Examples
Usage example
hiveHash
Introduced in: v20.1
Calculates a "HiveHash" from a string.
This is just JavaHash with the sign bit zeroed out.
This function is used in Apache Hive for versions before 3.0.
This hash function is slow. Use it only when this algorithm is already used in another system and you need to calculate the same result.
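The relationship between the two hashes can be sketched in Python. The sketch assumes ASCII input and uses the well-known Java String.hashCode recurrence (h = 31 * h + c in 32-bit arithmetic). The function names java_hash and hive_hash are hypothetical helpers, not the ClickHouse implementation.

```python
def java_hash(data: bytes) -> int:
    """Java String.hashCode over ASCII bytes, returned as a signed 32-bit value."""
    h = 0
    for b in data:
        h = (31 * h + b) & 0xFFFFFFFF          # wrap to 32 bits like a Java int
    return h - (1 << 32) if h >= (1 << 31) else h


def hive_hash(data: bytes) -> int:
    """The same hash with the sign bit cleared, so the result is never negative."""
    return java_hash(data) & 0x7FFFFFFF


print(java_hash(b"Hello, world!"))  # -1880044555
print(hive_hash(b"Hello, world!"))  # 267439093
```

Clearing the sign bit adds 2**31 to a negative result and leaves non-negative results unchanged.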
Syntax
Arguments
arg: Input string to hash. String
Returned value
Returns the computed "hive hash" of the input string. Int32
Examples
Usage example
intHash32
Introduced in: v1.1
Calculates a 32-bit hash of an integer.
The hash function is relatively fast, but it is not a cryptographic hash function.
Syntax
Arguments
arg: Integer to hash. (U)Int*
Returned value
Returns the computed 32-bit hash code of the input integer. UInt32
Examples
Usage example
intHash64
Introduced in: v1.1
Calculates a 64-bit hash of an integer.
The hash function is relatively fast (even faster than intHash32), but it is not a cryptographic hash function.
Syntax
Arguments
int: Integer to hash. (U)Int*
Returned value
64-bit hash code. UInt64
Examples
Usage example
javaHash
Introduced in: v20.1
Calculates JavaHash from the input value, such as a string or a signed integer.
This hash function is slow. Use it only when this algorithm is already in use in another system and you need to calculate the same result.
Java only supports calculating the hash of signed integers, so if you want to calculate a hash of unsigned integers you must cast them to the proper signed ClickHouse types.
Syntax
Arguments
arg: Input value to hash. Any
Returned value
Returns the computed hash of arg. Int32
Examples
Usage example 1
Usage example 2
javaHashUTF16LE
Introduced in: v20.1
Calculates JavaHash from a string, assuming it contains bytes representing a string in UTF-16LE encoding.
Syntax
Arguments
arg: A string in UTF-16LE encoding. String
Returned value
Returns the computed hash of the UTF-16LE encoded string. Int32
Examples
Usage example
jumpConsistentHash
Introduced in: v1.1
Calculates the jump consistent hash for an integer.
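For readers unfamiliar with the algorithm, here is a minimal Python sketch of the published jump consistent hash (Lamping and Veach). The parameter names key and num_buckets are assumptions, and the sketch is not guaranteed to reproduce ClickHouse's exact output for every input type.

```python
MASK64 = (1 << 64) - 1


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket index in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step, wrapped like unsigned C arithmetic.
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


# Growing from 10 to 11 buckets only ever moves a key into the new bucket 10.
moved = [k for k in range(1000)
         if jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11)]
print(all(jump_consistent_hash(k, 11) == 10 for k in moved))  # True
```

That limited movement is what makes the hash useful for resharding: adding a bucket relocates only the keys that land in the new bucket.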
Syntax
Arguments
Returned value
Returns the computed hash value. Int32
Examples
Usage example
kafkaMurmurHash
Introduced in: v23.4
Calculates the 32-bit MurmurHash2 hash of the input value using the same seed as Kafka and with the highest bit cleared, to be compatible with the Default Partitioner.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of parameters for which to compute the hash. Any
Returned value
Returns the calculated hash value of the input arguments. UInt32
Examples
Usage example
keccak256
Introduced in: v25.4
Calculates the Keccak-256 cryptographic hash of the given string. This hash function is widely used in blockchain applications, particularly Ethereum.
Syntax
Arguments
message: The input string to hash. String
Returned value
Returns the 32-byte Keccak-256 hash of the input string as a fixed-length string. FixedString(32)
Examples
Usage example
kostikConsistentHash
Introduced in: v22.6
An O(1) time and space consistent hash algorithm by Konstantin 'Kostik' Oblakov.
Only efficient with n <= 32768.
Syntax
Arguments
Returned value
Returns the computed hash value. UInt16
Examples
Usage example
metroHash64
Introduced in: v1.1
Produces a 64-bit MetroHash hash value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash of the input arguments. UInt64
Examples
Usage example
murmurHash2_32
Introduced in: v18.5
Computes the MurmurHash2 hash of the input value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash value of the input arguments. UInt32
Examples
Usage example
murmurHash2_64
Introduced in: v18.10
Computes the MurmurHash2 hash of the input value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash of the input arguments. UInt64
Examples
Usage example
murmurHash3_128
Introduced in: v18.10
Computes the 128-bit MurmurHash3 hash of the input value.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed 128-bit MurmurHash3 hash value of the input arguments. FixedString(16)
Examples
Usage example
murmurHash3_32
Introduced in: v18.10
Produces a MurmurHash3 hash value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash value of the input arguments. UInt32
Examples
Usage example
murmurHash3_64
Introduced in: v18.10
Computes the MurmurHash3 hash of the input value.
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed hash value of the input arguments. UInt64
Examples
Usage example
ngramMinHash
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols, calculates hash values for each n-gram, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case sensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
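The general idea can be sketched in Python. This is an illustration of the technique only, not the ClickHouse implementation: CRC32 from zlib is a stand-in for the internal n-gram hash, the way the selected hashes are combined into one value is an assumption, and the names ngram_min_hash_sketch and tuple_hamming_distance are hypothetical.

```python
import zlib


def ngrams(text: str, n: int = 3) -> list:
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combine(hashes: list) -> int:
    """Fold a list of 32-bit hashes into a single stand-in hash value."""
    return zlib.crc32(b"".join(h.to_bytes(4, "big") for h in hashes))


def ngram_min_hash_sketch(text: str, ngramsize: int = 3, hashnum: int = 6) -> tuple:
    hashes = sorted(zlib.crc32(g.encode("ascii")) for g in ngrams(text, ngramsize))
    # hashnum smallest hashes feed the "minimum" hash, hashnum largest the "maximum".
    return combine(hashes[:hashnum]), combine(hashes[-hashnum:])


def tuple_hamming_distance(a: tuple, b: tuple) -> int:
    """Number of positions at which two equally sized tuples differ."""
    return sum(x != y for x, y in zip(a, b))


s1 = ngram_min_hash_sketch("ClickHouse is a column-oriented database")
s2 = ngram_min_hash_sketch("ClickHouse is a column-oriented database")
print(tuple_hamming_distance(s1, s2))  # 0 for identical inputs
```

A distance of 0 means both the minimum and the maximum hash agree, 1 means one of them differs, and 2 means both differ.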
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple
Examples
Usage example
ngramMinHashArg
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHash function with the same input.
It is case sensitive.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum n-grams each. Tuple(Tuple(String))
Examples
Usage example
ngramMinHashArgCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitive function with the same input.
It is case insensitive.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum n-grams each. Tuple(Tuple(String))
Examples
Usage example
ngramMinHashArgCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashCaseInsensitiveUTF8 function with the same input.
It is case insensitive.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum n-grams each. Tuple(Tuple(String))
Examples
Usage example
ngramMinHashArgUTF8
Introduced in: v21.1
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-grams with minimum and maximum hashes, calculated by the ngramMinHashUTF8 function with the same input.
It is case sensitive.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum n-grams each. Tuple(Tuple(String))
Examples
Usage example
ngramMinHashCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols, calculates hash values for each n-gram, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case insensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes: the minimum and the maximum. Tuple(UInt64, UInt64)
Examples
Usage example
ngramMinHashCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 string into n-grams of ngramsize symbols, calculates hash values for each n-gram, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case insensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple
Examples
Usage example
ngramMinHashUTF8
Introduced in: v21.1
Splits a UTF-8 string into n-grams of ngramsize symbols, calculates hash values for each n-gram, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case sensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple
Examples
Usage example
ngramSimHash
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
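How a simhash makes similar strings land close together can be sketched in Python. This is the textbook simhash construction, offered only as an illustration: BLAKE2b stands in for the internal n-gram hash, so the values will not match ClickHouse, and the names ngram_sim_hash_sketch and bit_hamming_distance are hypothetical.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash of one n-gram."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def ngram_sim_hash_sketch(text: str, ngramsize: int = 3) -> int:
    data = text.encode("ascii")
    weights = [0] * 64
    for i in range(len(data) - ngramsize + 1):
        h = hash64(data[i:i + ngramsize])
        for bit in range(64):
            # Each n-gram votes +1 or -1 on every output bit.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit in range(64):
        if weights[bit] > 0:
            result |= 1 << bit
    return result


def bit_hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


a = ngram_sim_hash_sketch("ClickHouse is a column-oriented database")
b = ngram_sim_hash_sketch("ClickHouse is a column-oriented database.")
print(bit_hamming_distance(a, a))  # 0
print(bit_hamming_distance(a, b))  # typically small, because most n-grams are shared
```

Because every output bit is a majority vote over all n-grams, changing a few n-grams flips only a few bits.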
Syntax
Arguments
string: String for which to compute the case-sensitive simhash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash of the input string. UInt64
Examples
Usage example
ngramSimHashCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into n-grams of ngramsize symbols and returns the n-gram simhash.
It is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String for which to compute the case-insensitive simhash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash of the input string. UInt64
Examples
Usage example
ngramSimHashCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 string into n-grams of ngramsize symbols and returns the n-gram simhash.
It is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
ngramSimHashUTF8
Introduced in: v21.1
Splits a UTF-8 encoded string into n-grams of ngramsize symbols and returns the n-gram simhash.
It is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
ngramsize: Optional. The size of an n-gram, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
sipHash128
Introduced in: v1.1
Like sipHash64 but produces a 128-bit hash value, i.e. the final xor-folding of the state is done up to 128 bits.
This 128-bit variant differs from the reference implementation and is weaker.
This version exists because, when it was written, there was no official 128-bit extension for SipHash.
New projects are advised to use sipHash128Reference.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns a 128-bit SipHash hash value. FixedString(16)
Examples
Usage example
sipHash128Keyed
Introduced in: v23.2
Same as sipHash128 but additionally takes an explicit key argument instead of using a fixed key.
This 128-bit variant differs from the reference implementation and is weaker.
This version exists because, when it was written, there was no official 128-bit extension for SipHash.
New projects are advised to use sipHash128ReferenceKeyed.
Syntax
Arguments
(k0, k1): A tuple of two UInt64 values representing the key. Tuple(UInt64, UInt64)
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns a 128-bit SipHash hash value. FixedString(16)
Examples
Usage example
sipHash128Reference
Introduced in: v23.2
Like sipHash128 but implements the 128-bit algorithm from the original authors of SipHash.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed 128-bit SipHash hash value of the input arguments. FixedString(16)
Examples
Usage example
sipHash128ReferenceKeyed
Introduced in: v23.2
Same as sipHash128Reference but additionally takes an explicit key argument instead of using a fixed key.
Syntax
Arguments
(k0, k1): A tuple of two values representing the key. Tuple(UInt64, UInt64)
arg1[, arg2, ...]: A variable number of input arguments for which to compute the hash. Any
Returned value
Returns the computed 128-bit SipHash hash value of the input arguments. FixedString(16)
Examples
Usage example
sipHash64
Introduced in: v1.1
Produces a 64-bit SipHash hash value.
This is a cryptographic hash function. It works at least three times faster than the MD5 hash function.
The function interprets all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes using the following algorithm:
- The first and the second hash value are concatenated to an array which is hashed.
- The previously calculated hash value and the hash of the third input parameter are hashed in a similar way.
- This calculation is repeated for all remaining hash values of the original input.
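The combining steps above can be sketched as a left fold in Python. SipHash is not exposed by Python's hashlib, so BLAKE2b with an 8-byte digest is used here as a clearly labeled stand-in. The byte layout of the two-element array is also an assumption, and the names h64 and combine_hashes are hypothetical.

```python
import hashlib


def h64(data: bytes) -> int:
    """Stand-in 64-bit hash (BLAKE2b, not SipHash)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def combine_hashes(values: list) -> int:
    """Hash each value, then repeatedly hash the pair (running hash, next hash)."""
    if not values:
        raise ValueError("at least one value is required")
    hashes = [h64(v) for v in values]
    acc = hashes[0]
    for nxt in hashes[1:]:
        pair = acc.to_bytes(8, "little") + nxt.to_bytes(8, "little")
        acc = h64(pair)
    return acc


print(combine_hashes([b"a", b"b", b"c"]))
```

With a single argument the fold never runs, so the result is just the hash of that argument.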
The calculated hash values may be equal for the same input values of different argument types.
This affects, for example, integer types of different sizes, named and unnamed Tuple with the same data, and Map and the corresponding Array(Tuple(key, value)) type with the same data.
Syntax
Arguments
arg1[, arg2, ...]: A variable number of input arguments. Any
Returned value
Returns a computed hash value of the input arguments. UInt64
Examples
Usage example
sipHash64Keyed
Introduced in: v23.2
Like sipHash64 but additionally takes an explicit key argument instead of using a fixed key.
Syntax
Arguments
(k0, k1): A tuple of two values representing the key. Tuple(UInt64, UInt64)
arg1[, arg2, ...]: A variable number of input arguments. Any
Returned value
Returns the computed hash of the input values. UInt64
Examples
Usage example
wordShingleMinHash
Introduced in: v21.1
Splits an ASCII string into parts (shingles) of shinglesize words, calculates hash values for each word shingle, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case sensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
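What a word shingle is can be shown with a short Python sketch. The tokenization rule used here (runs of ASCII letters and digits count as words) is an assumption and may differ from what ClickHouse does. The helper name word_shingles is hypothetical.

```python
import re


def word_shingles(text: str, shinglesize: int = 3) -> list:
    """Overlapping runs of shinglesize consecutive words."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [" ".join(words[i:i + shinglesize])
            for i in range(len(words) - shinglesize + 1)]


print(word_shingles("ClickHouse is a column-oriented database", 3))
# ['ClickHouse is a', 'is a column', 'a column oriented', 'column oriented database']
```

Each shingle is then hashed, and the smallest and largest hashes feed the two values in the returned tuple.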
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple(UInt64, UInt64)
Examples
Usage example
wordShingleMinHashArg
Introduced in: v1.1
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHash function with the same input.
It is case sensitive.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum word shingles each. Tuple(Tuple(String))
Examples
Usage example
wordShingleMinHashArgCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitive function with the same input.
It is case insensitive.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum word shingles each. Tuple(Tuple(String))
Examples
Usage example
wordShingleMinHashArgCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashCaseInsensitiveUTF8 function with the same input.
It is case insensitive.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum word shingles each. Tuple(Tuple(String))
Examples
Usage example
wordShingleMinHashArgUTF8
Introduced in: v21.1
Splits a UTF-8 string into parts (shingles) of shinglesize words each and returns the shingles with minimum and maximum word hashes, calculated by the wordShingleMinHashUTF8 function with the same input.
It is case sensitive.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two tuples with hashnum word shingles each. Tuple(Tuple(String))
Examples
Usage example
wordShingleMinHashCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into parts (shingles) of shinglesize words, calculates hash values for each word shingle, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case insensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple(UInt64, UInt64)
Examples
Usage example
wordShingleMinHashCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 string into parts (shingles) of shinglesize words, calculates hash values for each word shingle, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case insensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple(UInt64, UInt64)
Examples
Usage example
wordShingleMinHashUTF8
Introduced in: v21.1
Splits a UTF-8 string into parts (shingles) of shinglesize words, calculates hash values for each word shingle, and returns a tuple with these hashes.
Uses hashnum minimum hashes to calculate the minimum hash and hashnum maximum hashes to calculate the maximum hash.
It is case sensitive.
Can be used to detect semi-duplicate strings with tupleHammingDistance.
For two strings, if the returned hashes are the same for both strings, then those strings are the same.
Syntax
Arguments
string: String for which to compute the hash. String
shinglesize: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
hashnum: Optional. The number of minimum and maximum hashes used to calculate the result, any number from 1 to 25. The default value is 6. UInt8
Returned value
Returns a tuple with two hashes — the minimum and the maximum. Tuple(UInt64, UInt64)
Examples
Usage example
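One way to compare two strings is to pass both hash tuples to tupleHammingDistance, which counts the tuple positions that differ. The following is a sketch only; the strings s1 and s2 and the aliases are illustrative assumptions, and no result is shown because the hashes depend on the input:

```sql
WITH
    'ClickHouse is a column-oriented database management system for online analytical processing' AS s1,
    'ClickHouse is a column-oriented database management system for online analytical processing of queries' AS s2
SELECT
    wordShingleMinHashUTF8(s1) AS h1,
    wordShingleMinHashUTF8(s2) AS h2,
    tupleHammingDistance(h1, h2) AS distance;  -- 0, 1 or 2: number of differing tuple elements
```

A distance of 0 means both the minimum and the maximum hash match.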
wordShingleSimHash
Introduced in: v21.1
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash.
It is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string
: String for which to compute the hash. String
shinglesize
: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
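The comparison described above can be sketched by feeding two simhashes to bitHammingDistance. The sentences below are invented for illustration, and the distance is not shown because it depends on the input:

```sql
-- Two ASCII sentences that differ in a single word; shinglesize = 2 is passed explicitly.
SELECT bitHammingDistance(
    wordShingleSimHash('the quick brown fox jumps over the lazy dog', 2),
    wordShingleSimHash('the quick brown fox jumped over the lazy dog', 2)
) AS distance;  -- number of differing bits between the two 64-bit simhashes
```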
wordShingleSimHashCaseInsensitive
Introduced in: v21.1
Splits an ASCII string into parts (shingles) of shinglesize words and returns the word shingle simhash.
It is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string
: String for which to compute the hash. String
shinglesize
: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
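To see what case insensitivity changes, the sketch below compares the same pair of strings with wordShingleSimHash and with wordShingleSimHashCaseInsensitive. The strings and aliases are assumptions made for illustration:

```sql
WITH
    'The quick brown fox jumps over the lazy dog' AS s1,
    'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' AS s2
SELECT
    -- case-sensitive variant: the inputs are hashed as different text
    bitHammingDistance(wordShingleSimHash(s1), wordShingleSimHash(s2)) AS case_sensitive_distance,
    -- case-insensitive variant: expected to be 0 because only letter case differs
    bitHammingDistance(wordShingleSimHashCaseInsensitive(s1), wordShingleSimHashCaseInsensitive(s2)) AS case_insensitive_distance;
```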
wordShingleSimHashCaseInsensitiveUTF8
Introduced in: v21.1
Splits a UTF-8 encoded string into parts (shingles) of shinglesize words and returns the word shingle simhash.
It is case insensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string
: String for which to compute the hash. String
shinglesize
: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
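A minimal call might look as follows. The sample sentence is an arbitrary non-ASCII string chosen for illustration, and the resulting value is omitted because it depends on the input:

```sql
-- UTF-8 input with non-ASCII characters; shingles of 2 words instead of the default 3.
SELECT wordShingleSimHashCaseInsensitiveUTF8('Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich', 2) AS hash;
```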
wordShingleSimHashUTF8
Introduced in: v21.1
Splits a UTF-8 string into parts (shingles) of shinglesize words and returns the word shingle simhash.
It is case sensitive.
Can be used for detection of semi-duplicate strings with bitHammingDistance.
The smaller the Hamming distance of the calculated simhashes of two strings, the more likely these strings are the same.
Syntax
Arguments
string
: String for which to compute the hash. String
shinglesize
: Optional. The size of a word shingle, any number from 1 to 25. The default value is 3. UInt8
Returned value
Returns the computed hash value. UInt64
Examples
Usage example
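As a sketch of near-duplicate detection over stored text, the query below compares every pair of rows and keeps those whose simhashes differ in only a few bits. The table docs(id UInt64, body String) and the threshold of 3 bits are hypothetical and would need tuning for real data; a cross join of this kind is only practical for small tables:

```sql
-- `docs` is a hypothetical table: docs(id UInt64, body String)
SELECT
    a.id AS id_a,
    b.id AS id_b,
    bitHammingDistance(wordShingleSimHashUTF8(a.body), wordShingleSimHashUTF8(b.body)) AS distance
FROM docs AS a, docs AS b
WHERE a.id < b.id        -- compare each pair once
  AND distance <= 3      -- assumed threshold for "near duplicate"
ORDER BY distance;
```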
wyHash64
Introduced in: v22.7
Computes a 64-bit wyHash64 hash value.
Syntax
Arguments
arg
: String argument for which to compute the hash. String
Returned value
Returns the computed 64-bit hash value. UInt64
Examples
Usage example
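A minimal call, with an arbitrary input string chosen for illustration (the resulting value is omitted):

```sql
SELECT wyHash64('ClickHouse') AS hash;  -- returns a UInt64
```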
xxHash32
Introduced in: v20.1
Calculates the 32-bit xxHash of a string.
For the 64-bit version, see xxHash64.
Syntax
Arguments
arg
: Input string to hash. String
Returned value
Returns the computed 32-bit hash of the input string. UInt32
Examples
Usage example
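A minimal call, with an arbitrary input string; toTypeName is included only to confirm the documented return type:

```sql
SELECT
    xxHash32('Hello, world!') AS hash,
    toTypeName(hash) AS type;  -- expected to be UInt32, per the return type above
```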
xxHash64
Introduced in: v20.1
Calculates the 64-bit xxHash of a string.
For the 32-bit version, see xxHash32.
Syntax
Arguments
arg
: Input string to hash. String
Returned value
Returns the computed 64-bit hash of the input string. UInt64
Examples
Usage example
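Besides hashing a single literal, the function can serve as a deterministic pseudo-random sort key. The sketch below shuffles ten generated rows; the use of numbers(10) and toString is an illustrative assumption, not part of the function's definition:

```sql
-- The order is pseudo-random but identical on every run, because the hash is deterministic.
SELECT number
FROM numbers(10)
ORDER BY xxHash64(toString(number));
```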
xxh3
Introduced in: v22.12
Computes an XXH3 64-bit hash value.
Syntax
Arguments
expr
: A list of expressions of any data type. Any
Returned value
Returns the computed 64-bit xxh3 hash value. UInt64
Examples
Usage example
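Because xxh3 accepts a list of expressions of any type, a call can mix argument types. The values below are arbitrary and the results are omitted:

```sql
SELECT
    xxh3('Hello', 'world') AS hash_of_strings,
    xxh3(42, 'answer', [1, 2, 3]) AS hash_of_mixed_types;  -- both results are UInt64
```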