Danh hiệu: Administration
Nhóm: Administrators
Gia nhập: 23-07-2013(UTC) Bài viết: 6,117   Đến từ: Vietnam Cảm ơn: 10 lần Được cảm ơn: 2 lần trong 2 bài viết
|
The fuzzystrmatch module provides several functions to determine similarities and distance between strings. Caution | At present, fuzzystrmatch does not work well with multi-byte encodings (such as UTF-8). |
The Soundex system is a method of matching similar-sounding names by converting them to the same code. It was initially used by the United States Census in 1880, 1900, and 1910. Note that Soundex is not very useful for non-English names. The fuzzystrmatch module provides two functions for working with Soundex codes: soundex(text) returns text difference(text, text) returns int The soundex function converts a string to its Soundex code. The difference function converts two strings to their Soundex codes and then reports the number of matching code positions. Since Soundex codes have four characters, the result ranges from zero to four, with zero being no match and four being an exact match. (Thus, the function is misnamed —similarity would have been a better name.) Here are some usage examples: SELECT soundex('hello world!');SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann'); SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew'); SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret'); CREATE TABLE s (nm text); INSERT INTO s VALUES ('john'); INSERT INTO s VALUES ('joan'); INSERT INTO s VALUES ('wobbly'); INSERT INTO s VALUES ('jack'); SELECT * FROM s WHERE soundex(nm) = soundex('john'); SELECT * FROM s WHERE difference(s.nm, 'john') > 2; This function calculates the Levenshtein distance between two strings: levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int levenshtein(text source, text target) returns int Both source and target can be any non-null string, with a maximum of 255 bytes. The cost parameters specify how much to charge for a character insertion, deletion, or substitution, respectively. You can omit the cost parameters, as in the second version of the function; in that case they all default to 1. Examples: test=# SELECT levenshtein('GUMBO', 'GAMBOL'); levenshtein 2(1 row) test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1); levenshtein 3(1 row) Metaphone, like Soundex, is based on the idea of constructing a representative code for an input string. Two strings are then deemed similar if they have the same codes. This function calculates the metaphone code of an input string: metaphone(text source, int max_output_length) returns text source has to be a non-null string with a maximum of 255 characters. max_output_length sets the maximum length of the output metaphone code; if longer, the output is truncated to this length. Example: test=# SELECT metaphone('GUMBO', 4); metaphone KM(1 row) The Double Metaphone system computes two "sounds like" strings for a given input string — a "primary" and an "alternate". In most cases they are the same, but for non-English names especially they can be a bit different, depending on pronunciation. These functions compute the primary and alternate codes: dmetaphone(text source) returns text dmetaphone_alt(text source) returns text There is no length limit on the input strings. Example: test=# select dmetaphone('gumbo'); dmetaphone KMP(1 row)
This module implements a data type hstore for storing sets of (key,value) pairs within a single PostgreSQL data field. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. In the current implementation, neither the key nor the value string can exceed 65535 bytes in length; an error will be thrown if this limit is exceeded. These maximum lengths may change in future releases. The text representation of an hstore value includes zero or more key => value items, separated by commas. For example: k => v foo => bar, baz => whatever "1-a" => "anything at all" The order of the items is not considered significant (and may not be reproduced on output). Whitespace between items or around the => sign is ignored. Use double quotes if a key or value includes whitespace, comma, = or >. To include a double quote or a backslash in a key or value, precede it with another backslash. (Keep in mind that depending on the setting of standard_conforming_strings, you may need to double backslashes in SQL literal strings.) A value (but not a key) can be a SQL NULL. This is represented as key => NULL The NULL keyword is not case-sensitive. Again, use double quotes if you want the string null to be treated as an ordinary data value. Currently, double quotes are always used to surround key and value strings on output, even when this is not strictly necessary. Table F-5. hstore Operators Operator | Description | Example | Result |
---|
hstore -> text | get value for key (null if not present) | 'a=>x, b=>y'::hstore -> 'a' | x | text => text | make single-item hstore | 'a' => 'b' | "a"=>"b" | hstore || hstore | concatenation | 'a=>b, c=>d'::hstore || 'c=>x, d=>q'::hstore | "a"=>"b", "c"=>"x", "d"=>"q" | hstore ? text | does hstore contain key? | 'a=>1'::hstore ? 'a' | t | hstore @> hstore | does left operand contain right? | 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1' | t | hstore <@ hstore | is left operand contained in right? | 'a=>c'::hstore <@ 'a=>b, b=>1, c=>NULL' | f |
(Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called @ and ~. These names are still available, but are deprecated and will eventually be retired. Notice that the old names are reversed from the convention formerly followed by the core geometric datatypes!) Note: The => operator is deprecated and may be removed in a future release. Use the hstore(text, text) function instead.
Table F-6. hstore Functions Function | Return Type | Description | Example | Result |
---|
hstore(text, text) | hstore | make single-item hstore | hstore('a', 'b') | "a"=>"b" | akeys(hstore) | text[] | get hstore's keys as array | akeys('a=>1,b=>2') | {a,b} | skeys(hstore) | setof text | get hstore's keys as set | skeys('a=>1,b=>2') | ab | avals(hstore) | text[] | get hstore's values as array | avals('a=>1,b=>2') | {1,2} | svals(hstore) | setof text | get hstore's values as set | svals('a=>1,b=>2') | 12 | each(hstore) | setof (key text, value text) | get hstore's keys and values as set | select * from each('a=>1,b=>2') | key | value-----+------- a | 1 b | 2 | exist(hstore,text) | boolean | does hstore contain key? | exist('a=>1','a') | t | defined(hstore,text) | boolean | does hstore contain non-null value for key? | defined('a=>NULL','a') | f | delete(hstore,text) | hstore | delete any item matching key | delete('a=>1,b=>2','b') | "a"=>"1" |
hstore has index support for @> and ? operators. You can use either GiST or GIN index types. For example: CREATE INDEX hidx ON testhstore USING GIST(h);CREATE INDEX hidx ON testhstore USING GIN(h); Add a key, or update an existing key with a new value: UPDATE tab SET h = h || ('c' => '3'); Delete a key: UPDATE tab SET h = delete(h, 'k1'); The hstore type, because of its intrinsic liberality, could contain a lot of different keys. Checking for valid keys is the task of the application. Examples below demonstrate several techniques for checking keys and obtaining statistics. Simple example: SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1'); Using a table: SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore; Online statistics: SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key; key | count -----------+------- line | 883 query | 207 pos | 203 node | 202 space | 197 status | 195 public | 194 title | 190 org | 189 ...................
|