Scalar functions list

boolean functions

ifthenelse function
ifthenelse(condition, x, y)

Returns x if condition is true, else returns y.

Note

The difference with the if construct in most programming languages is that x and y expressions will always be evaluated.

Examples:

>>> sql("select ifthenelse(1>0,'yes','no') as answer")
answer
------
yes

date functions

activityindex function
activityIndex(date, c1, c2) → int

Specialized function that classifies the provided date argument into a 6-point scale (0 to 5)

Examples:

>>> table1('''
... '2009-01-01T01:32:03Z'
... '2010-01-01T00:03:13Z'
... '2010-12-31T00:03:13Z'
... '2011-04-01T00:03:13Z'
... ''')
>>> sql("select activityIndex(a) from table1")
activityIndex(a)
----------------
0
1
3
5
cleantimezone function
cleantimezone(date) → date

Specialized function that removes timezone information from date string

Examples:

>>> table1('''
... '2009-01-01T01:03:13+0100'
... '2009-01-01T01:03:13-0100'
... '2009-01-01T01:03:13+01:00'
... '2009-01-01T01:03:13-01:00'
... '2009-01-01T01:03:13+01'
... '2009-01-01T01:03:13-01'
... ''')
>>> sql("select cleantimezone(a) from table1")
cleantimezone(a)
-------------------
2009-01-01 01:03:13
2009-01-01 01:03:13
2009-01-01 01:03:13
2009-01-01 01:03:13
2009-01-01 01:03:13
2009-01-01 01:03:13
date2iso function
date2iso(sec) → ISO Datetime

Converts an input date to ISO-8601 date format. It tries to autodetect, the input date format.

Examples:

>>> table1('''
... 2007-12-31
... 2010-01-01
... 2010W06
... "18/Jan/2011:11:13:00 +0100"
... ''')
>>> sql("select date2iso(a) from table1")
date2iso(a)
-------------------------
2007-12-31T00:00:00+00:00
2010-01-01T00:00:00+00:00
2010-02-05T00:00:00+00:00
2011-01-18T11:13:00+01:00
datestrf2isoweek function
dateisoweek2week52(sec) → isoweek

Converts an ISOweek (having weeks in range [0,53]) to an ISOweek format which has weeks in range [1,53]. This function is usefull for producing week statistics which do not have incomplete weeks.

Examples:

>>> table1('''
... 2007-12-31
... 2010-01-01
... ''')
>>> sql("select strftime('%YW%W',a) from table1")
strftime('%YW%W',a)
-------------------
2007W53
2010W00
>>> sql("select datestrf2isoweek(strftime('%YW%W',a)) from table1")
datestrf2isoweek(strftime('%YW%W',a))
-------------------------------------
2007W53
2009W53
sectohuman function
sectohuman(sec) → human readable format

Converts a number of seconds to human readable format.

Examples:

>>> table1('''
... 3
... 63
... 10000
... 100000
... 1000000
... ''')
>>> sql("select sectohuman(a) from table1")
sectohuman(a)
------------------------------
3 sec
1 min 3 sec
2 hours 46 min 40 sec
1 day 3 hours 46 min 40 sec
11 days 13 hours 46 min 40 sec

evals functions

pyeval function
pyeval(expression)

Evaluates with Python the expression/s given and returns the result

>>> sql("pyeval '1+1'")
pyeval('1+1')
-------------
2
>>> sql("select var('test')")  
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator VAR: Variable 'test' does not exist
>>> sql("select var('test', pyeval('1+1'))")
var('test', pyeval('1+1'))
--------------------------
2
>>> sql("select var('test')")
var('test')
-----------
2
>>> sql('''pyeval '1+1' '"-"' '3+1' ''')
pyeval('1+1','"-"','3+1')
-------------------------
2-4
>>> sql("var 'testvar' of select 5")
var('testvar',(select 5))
-------------------------
5
>>> sql("pyeval 'testvar+5'")
pyeval('testvar+5')
-------------------
10
>>> sql('''pyeval keywords('lala') ''')
pyeval('keywords(''lala'')')
----------------------------
lala
pyfun function
pyfun(pyfunction, parameters)

Calls a python function and returns the result. If an error occurs, it throws an exception.

>>> sql("select pyfun('math.sqrt', 25)")
pyfun('math.sqrt', 25)
----------------------
5.0
>>> sql("select pyfun('math.log10', 100)")
pyfun('math.log10', 100)
------------------------
2.0
>>> sql("select pyfun('math.log10', -1)") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator PYFUN: math.log10: math domain error
pyfunerrtonul function
pyfunerrtonul(pyfunction, parameters)

Calls a python function and returns the result. If an error occurs it returns null.

>>> sql("select pyfunerrtonul('math.sqrt', -1)")
pyfunerrtonul('math.sqrt', -1)
------------------------------
None
>>> sql("select pyfunerrtonul('math.log10', -1)")
pyfunerrtonul('math.log10', -1)
-------------------------------
None

query function

subst function
subst(str, variables)

Substitutes the special text markers with the variables values.

>>> sql('''subst 'Variable %s has value %s' 'var1' '5' ''')
subst('Variable %s has value %s','var1','5')
--------------------------------------------
Variable var1 has value 5
>>> sql('''select subst('Variable %s has value %d','var2',5) ''')
subst('Variable %s has value %d','var2',5)
------------------------------------------
Variable var2 has value 5
>>> sql('''var 'testvar' 'testvalue' ''')
var('testvar','testvalue')
--------------------------
testvalue
>>> sql('''select subst('Variable %{testvar}1 %{testvar1} has value %s', 5) ''')
subst('Variable testvalue1 %{testvar1} has value %s', 5)
--------------------------------------------------------
Variable testvalue1 %{testvar1} has value %s

fileops functions

fileextension function
fileextension(text) → text

Returns the extension of a given text argument.

Examples:

>>> table1('''
... "http://www.test.com/lalala.gif"
... "http://www.test.com/lalala.GIF"
... ''')
>>> sql("select fileextension(a) from table1")
fileextension(a)
----------------
.gif
.gif
filetext function
filetext(filename) → text

Returns the contents of the file in a single value

Examples:

>>> sql("select filetext('testing/sales.tsv')") 
filetext('testing/sales.tsv')
----------------------------------
Cars        2010-01 Athens  200
Cars        2010-02 Athens  130
Bikes       2010-01 NY      10
Bikes       2010-02 NY      30
Cars        2010-01 NY      100
Cars        2010-02 NY      160
Cars        2010-01 Paris   70
Cars        2010-02 Paris   20
Bikes       2010-01 Paris   100
Bikes       2010-02 Paris   20
Boats       2010-01 Paris   200

formating functions

dateformat function
dateformat(date[, inpformat, outformat])

Returns date formatted with outformat. Default inpformat is %d-%m-%y and default outformat is %Y-%m-%d. To change inpformat just provide as second parameter the input format. To change outformat, both format parameters must be provided. If date is not formatted according to inpformat null value is returned.

Check formatting pattern details

Examples:

>>> sql("select dateformat('28-01-09') as date")
date
----------
2009-01-28

When date doesn’t meet the format null is returned

>>> sql("select dateformat('32-01-09') as date")
date
----
None
>>> sql("select dateformat('Thu, 28 Jun 2001 14:17:15 +0000','%a, %d %b %Y %H:%M:%S +0000') as date")
date
----------
2001-06-28
>>> sql("select dateformat('28-01-09','%d-%m-%y','%x') as date")
date
--------
01/28/09
strjoin function
strjoin(value1, value2, ....[, 'params', formatting options])

Returns a string with value1,value2,…. formatted according to formatting options. Literal parameter ‘params’ must precede formating options. Default formatting is comma separated values.

Examples:

>>> sql("select strjoin('First','Second','Third',100) as joinedvals")
joinedvals
-----------------------
First,Second,Third,100
>>> sql("select strjoin('First','Second','Third',100,'params','delimiter: ') as joinedvals")
joinedvals
-----------------------
First Second Third 100
>>> sql("select strjoin('lola',2,10,'llalallall','params','delimiter:%','quoting:QUOTE_ALL') as joinedvals")
joinedvals
-----------------------------
"lola"%"2"%"10"%"llalallall"
>>> sql("select strjoin('value, with delimiter ',2,10,'new','params','dialect:csv') as joinedvals")
joinedvals
----------------------------------
"value, with delimiter ",2,10,new

strsplit function

strsplit(str[, formatting options]) → [C1,C2,....]

Splits str according to formatting options, default behavior is to split on space. It is a multiset operator that returns one row.

See also

Formatting options:

dialect:

tsv/csv

Formats field as tab/comma separated values with minimal quoting

delimiter:

A string used to separate fields. It defaults to ‘ ‘

doublequote:

t/f

Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True. On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field

escapechar:

A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. It defaults to None, which disables escaping.

quotechar:

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to ‘”’.

quoting:

Controls when quotes should be generated by the writer and recognized by the reader. It can take on any of the QUOTE_* constants and defaults to QUOTE_MINIMAL. Possible values are QUOTE_ALL, QUOTE_NONE, QUOTE_MINIMAL, QUOTE_NONNUMERIC

skipinitialspace:
 

t/f

When True, whitespace immediately following the delimiter is ignored. The default is False

Returned multiset schema:
 
  • C1,C2…
    Text values produced from splitting.

Examples:

>>> sql("select strsplit('First,Second,Third', 'dialect:csv')")
C1    | C2     | C3
----------------------
First | Second | Third
>>> sql("select strsplit('-First-%Second%-Third-','delimiter:%','quotechar:-')")
C1    | C2     | C3
----------------------
First | Second | Third
strsplitv function
strsplitv(str[, formatting options]) → [C1]

Splits in rows str according to formatting options, default behavior is to split on space. It is a multiset operator that returns one column but many rows. Formatting options are like in strsplit() function.

See also

>>> sql("select strsplitv('First,Second,Third','dialect:csv')")
C1
------
First
Second
Third
>>> sql("select strsplitv('First Second Third')")
C1
------
First
Second
Third

htmlops functions

htmladdbreaks function
url(href, linktext)

Returns the a url pointing to href and having the link text linktext.

Examples:

>>> sql("select htmladdbreaks('very-long/string') as brokenhtml")
brokenhtml
--------------------------
very-<wbr>long/<wbr>string
htmldecode function
htmldecode(str)

Returns the html decoded str.

Examples:

>>> sql("select htmldecode('(&quot;die+wunderbaren+jahre&quot;)') as query")
query
-------------------------
("die+wunderbaren+jahre")
>>> sql("select htmldecode(null) as query")
query
-----
None
htmlencode function
htmldecode(str)

Returns the html decoded str.

Examples:

>>> sql("select htmldecode('(&quot;die+wunderbaren+jahre&quot;)') as query")
query
-------------------------
("die+wunderbaren+jahre")
>>> sql("select htmldecode(null) as query")
query
-----
None
htmllink function

Returns the an html link pointing to href and having the link text linktext.

Examples:

>>> sql("select htmllink('http://somewhere.org') as url") 
url
-----------------------------------------------------------------
<a href="http://somewhere.org">http://<wbr>somewhere.<wbr>org</a>
>>> sql("select htmllink('somewhere.org') as url")
url
-----------------------------------------------------
<a href="http://somewhere.org">somewhere.<wbr>org</a>
>>> sql("select htmllink('somewhere.org', 'go somewhere') as url")
url
-----------------------------------------------
<a href="http://somewhere.org">go somewhere</a>
htmlstriptags function
htmlstriptags(str, default_tag_conversion)

Strips the html tags of input. It also converts “<br>” tags to new lines. If a default_tag_conversion is provided then tags that would have been erased are converted to default_tag_conversion.

Examples:

>>> sql("select htmlstriptags('<tag1>asdf<>as< br>df<p class = lala>spaced</sp>paragraph</p>anotherline<tag2> w<sup>3</sup>') as query")
query
-------------------------------------------
asdfas
df spaced paragraph
anotherline w_3
>>> sql("select htmlstriptags('<tag1>asdf<>as< br>df<p class = lala>spaced</sp>paragraph</p>anotherline<tag2> w<sup>3</sup>', '***') as query")
query
----------------------------------------------------
***asdf***as
df spaced paragraph
anotherline*** w_3
>>> sql("select htmlstriptags(null) as query")
query
-----
urldecode function
urldecode(str)

Returns the url decoded str.

Examples:

>>> sql("select urldecode('where%2Ccollid%3Dcolid+and+u%3D%27val%27') as query")
query
------------------------------
where,collid=colid and u='val'
>>> sql("select urldecode(null) as query")
query
-----
None
urlencode function
urlescape(str)

Returns the escaped URL.

Examples:

>>> sql("select urlencode('where, collid=colid') as query")
query
-----------------------
where%2C+collid%3Dcolid
urllocation function
urllocation(str) → str

Returns the location part of provided URL.

Examples:

>>> table1('''
... http://www.test.com/apath/bpath/fname.pdf
... http://www.test.com/search.csv;p=5?q=test#hl=en
... ''')
>>> sql("select urllocation(a) from table1")
urllocation(a)
-----------------------------------------
http://www.test.com/apath/bpath/fname.pdf
http://www.test.com/search.csv
urlquery2jdict function
urlquery2jdict(URL or URL_query_part) → JDICT

Converts the query part of a URL into a JSON associative array.

Examples:

>>> table1('''
... 'url_ver=ver1&url_tim=2011-01-01T00%3A02%3A40Z'
... 'url_tim=2011-01-01T00%3A02%3A40Z&url_ver=ver1'
... http://www.test.com/search.csv;p=5?lang=test&ver=en
... ''')
>>> sql("select urlquery2jdict(a) from table1")
urlquery2jdict(a)
---------------------------------------------------
{"url_tim":"2011-01-01T00:02:40Z","url_ver":"ver1"}
{"url_tim":"2011-01-01T00:02:40Z","url_ver":"ver1"}
{"lang":"test","ver":"en"}
urlsplit function
urlsplit(text1[, text2, ...]) → multiset

Breaks a given URL into multiple fields. The returned table schema is:

scheme:What type the URL is (e.g. http, ftp …)
netloc:Network location of URL (e.g. www.text.com)
path:Path part of URL (e.g. /data/2010/). It always has a slash at the end
filename:Filename part of URL
type:Mime type of URL, or if not a mime type exists, the extension part of filename.
subtype:Mime subtype of URL.
params:All parameters following ‘;’ in URL.
query:All parameters following ‘?’ in URL.
fragment:All parameters following ‘#’ in URL.

Examples:

>>> table1('''
... http://www.test.com/apath/bpath/fname.pdf
... http://www.test.com/search.csv;p=5?q=test#hl=en
... ''')
>>> sql("select urlsplit(a) from table1")
scheme | netloc       | path          | filename   | type        | subtype | params | query  | fragment
-------------------------------------------------------------------------------------------------------
http   | www.test.com | /apath/bpath/ | fname.pdf  | application | pdf     |        |        |
http   | www.test.com | /             | search.csv | csv         |         | p=5    | q=test | hl=en

iptools functions

ip2long function
ip2long(ip) → int

Converts a decimal dotted quad IP string to long integer IP format. It can take either one column of IP strings or 4 columns each having one part of the IP address.

Examples:

>>> sql("select ip2long('123.123.123.123')")
ip2long('123.123.123.123')
--------------------------
2071690107
>>> sql("select ip2long(123,123,123,123)")
ip2long(123,123,123,123)
------------------------
2071690107
ip_prefix function
ip_prefix(ip, class_number) → ip

Returns the subnetwork class of an IP address.

Examples:

>>> sql("ip_prefix '123.34.24.54' ")
ip_prefix('123.34.24.54')
-------------------------
4
>>> sql("ip_prefix '123.34.24.54' '3'")
ip_prefix('123.34.24.54','3')
-----------------------------
123.34.24
>>> sql("ip_prefix '123.34.24.54' '2'")
ip_prefix('123.34.24.54','2')
-----------------------------
123.34
long2ip function
long2ip(int) → ip

Convert longint IP to dotted quad string

Examples:

>>> sql("select long2ip('2071690107')")
long2ip('2071690107')
---------------------
123.123.123.123
>>> sql("select long2ip(2071690107)")
long2ip(2071690107)
-------------------
123.123.123.123

jpacks functions

j2nl function
j2nl(jpack) → text

Converts multiple input jpacks to a newline separated text.

Examples:

>>> sql("select j2nl('[1,2,3]')") 
j2nl('[1,2,3]')
---------------
1
2
3
>>> sql("select j2nl('[1,2,3]','a')") 
j2nl('[1,2,3]','a')
-------------------
1
2
3
a
>>> sql("select j2nl('a', 'b')") 
j2nl('a', 'b')
--------------
a
b
j2s function
j2s(jpack) → space separated string

Converts multiple input jpacks to a space separated string. Newlines are converted to spaces.

Examples:

>>> sql("select j2s('[1,2,3]')") 
j2s('[1,2,3]')
--------------
1 2 3
>>> sql("select j2s('[1,2,3]','a')") 
j2s('[1,2,3]','a')
------------------
1 2 3 a
>>> sql("select j2s('a', 'b')") 
j2s('a', 'b')
-------------
a b
j2t function
j2t(jpack) → tabpack

Converts multiple input jpacks to a tab separated pack (tab separated values). If tab or newline characters are found in the source jpack they are converted to spaces.

Examples:

>>> sql("select j2t('[1,2,3]')") 
j2t('[1,2,3]')
--------------
1        2        3
>>> sql("select j2t('[1,2,3]','a')") 
j2t('[1,2,3]','a')
------------------
1        2        3        a
>>> sql("select j2t('a', 'b')") 
j2t('a', 'b')
-------------
a        b
jchars function
jletters(text) → character jpack

Splits an input text into its composing characters.

Examples:

>>> sql("select jchars('this is a text')")
jchars('this is a text')
---------------------------------------------------------
["t","h","i","s"," ","i","s"," ","a"," ","t","e","x","t"]
>>> sql("select jchars('another', 'text')")
jchars('another', 'text')
---------------------------------------------
["a","n","o","t","h","e","r","t","e","x","t"]
jcombinations function
jcombinations(jpack, r) → multiset

Returns all length r combinations of jpack.

Examples:

>>> sql('''select jcombinations('["t1","t2","t3"]',2)''')
C1 | C2
-------
t1 | t2
t1 | t3
t2 | t3
>>> sql('''select jcombinations('["t1","t2",["t3","t4"]]',2)''')
C1 | C2
----------------
t1 | t2
t1 | ["t3","t4"]
t2 | ["t3","t4"]
>>> sql('''select jcombinations(null,2)''')
>>> sql('''select jcombinations('["t1","t2","t3","t4"]')''')
C1
--
t1
t2
t3
t4
jdict function
jdict(key, value, key1, value1) → jdict

Returns a jdict of the keys and value pairs.

Examples:

>>> sql(''' select jdict('key1', 'val1', 'key2', 'val2') ''') 
jdict('key1', 'val1', 'key2', 'val2')
-------------------------------------
{"key1":"val1","key2":"val2"}
>>> sql(''' select jdict('key', '{"k1":1,"k2":2}') ''') 
jdict('key', '{"k1":1,"k2":2}')
-------------------------------
{"key":{"k1":1,"k2":2}}
>>> sql(''' select jdict('key', '["val1", "val2"]') ''') 
jdict('key', '["val1", "val2"]')
--------------------------------
{"key":["val1","val2"]}
>>> sql(''' select jdict('1') ''') 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator JDICT: At least two arguments required
jdictgroupkey function
jdictgroupkey(list_of_jdicts, groupkey1, groupkey2, ...)

It groups an array of jdicts into a hierarchical structure. The grouping is done first by groupkey1 then by groupkey2 and so on.

If no groupkeys are provided, then the first key of array’s first jdict is used as a groupkey.

Examples:

>>> sql('''select jdictgroupkey('[{"gkey":"v1", "b":1},{"gkey":"v1","b":2},{"gkey":"v2","b":1, "c":2}]') as j''')
j
---------------------------------------------
{"v1":[{"b":1},{"b":2}],"v2":[{"b":1,"c":2}]}
>>> sql('''select jdictgroupkey('[{"gkey":"v1", "b":1},{"gkey":"v1","b":2},{"gkey":"v2","b":1, "c":2}]', "gkey") as j''')
j
---------------------------------------------
{"v1":[{"b":1},{"b":2}],"v2":[{"b":1,"c":2}]}
>>> sql('''select jdictgroupkey('[{"gkey":"v1", "gkey2":"f1", "b":1},{"gkey":"v1", "gkey2":"f2", "b":2},{"gkey":"v1", "gkey2":"f2", "b":1, "c":2}]', "gkey", "gkey2") as j''')
j
--------------------------------------------------------------
{"v1":{"gkey2":{"f1":[{"b":1}],"f2":[{"b":2},{"b":1,"c":2}]}}}
jdictkeys function
jdictkeys(jdict) → jpack

Returns a jpack of the keys of input jdict

Examples:

>>> sql(''' select jdictkeys('{"k1":1,"k2":2}', '{"k1":1,"k3":2}') ''') 
jdictkeys('{"k1":1,"k2":2}', '{"k1":1,"k3":2}')
-----------------------------------------------
["k1","k2","k3"]
>>> sql(''' select jdictkeys('{"k1":1,"k2":2}') ''') 
jdictkeys('{"k1":1,"k2":2}')
----------------------------
["k1","k2"]
>>> sql(''' select jdictkeys('test') ''') 
jdictkeys('test')
-----------------
[]
>>> sql(''' select jdictkeys(1) ''') 
jdictkeys(1)
------------
[]
jdictsplit function
jdictsplit(jdict[, key1, key2, ..]) → columns

If only the first argument (jdict) is provided, it returns a row containing the values of input jdict (sorted by the jdict keys).

If key values are also provided, it returns only the columns of which the keys have been provided.

Examples:

>>> sql(''' select jdictsplit('{"k1":1,"k2":2}') ''') 
k1 | k2
-------
1  | 2
>>> sql(''' select jdictsplit('{"k1":1,"k2":2, "k3":3}', 'k3', 'k1', 'k4') ''') 
k3 | k1 | k4
--------------
3  | 1  | None
jdictsplitv function
jdictsplitv(jdict[, key1, key2, ..]) → columns

If only the first argument (jdict) is provided, it returns rows containing the values of input jdict.

If key values are also provided, it returns only the columns of which the keys have been provided.

Examples:

>>> sql(''' select jdictsplitv('{"k1":1,"k2":2}') ''') 
key | val
---------
k1  | 1
k2  | 2
>>> sql(''' select jdictsplitv('{"k1":1,"k2":2, "k3":3}', 'k3', 'k1', 'k4') ''') 
key | val
---------
k3  | 3
k1  | 1
jdictvals function
jdictvals(jdict[, key1, key2, ..]) → jpack

If only the first argument (jdict) is provided, it returns a jpack of the values of input jdict (sorted by the jdict keys).

If key values are also provided, it returns only the keys that have been provided.

Examples:

>>> sql(''' select jdictvals('{"k1":1,"k2":2}') ''') 
jdictvals('{"k1":1,"k2":2}')
----------------------------
[1,2]
>>> sql(''' select jdictvals('{"k1":1,"k2":2, "k3":3}', 'k3', 'k1', 'k4') ''') 
jdictvals('{"k1":1,"k2":2, "k3":3}', 'k3', 'k1', 'k4')
------------------------------------------------------
[3,1,null]
>>> sql(''' select jdictvals('{"k1":1}') ''') 
jdictvals('{"k1":1}')
---------------------
1
>>> sql(''' select jdictvals('{"k1":1}') ''') 
jdictvals('{"k1":1}')
---------------------
1
>>> sql(''' select jdictvals(1) ''') 
jdictvals(1)
------------
1
jexcept function
jexcept(jpackA, jpackB) → jpack

Returns the items of jpackA except the items that appear on jpackB.

Examples:

>>> sql("select jexcept('[1,2,3]', '[1,2,3]')") 
jexcept('[1,2,3]', '[1,2,3]')
-----------------------------
[]
>>> sql("select jexcept('[1,2,3]', '[1,3]')") 
jexcept('[1,2,3]', '[1,3]')
---------------------------
2
jfilterempty function
jfilterempty(jpacks.) → jpack

Removes from input jpacks all empty elements.

Examples:

>>> sql("select jfilterempty('a', '', '[]')")
jfilterempty('a', '', '[]')
---------------------------
a
>>> sql("select jfilterempty('a','[null]',3)")
jfilterempty('a','[null]',3)
----------------------------
["a",3]
>>> sql("select jfilterempty('[3]', jpack('b', ''))")
jfilterempty('[3]', jpack('b', ''))
-----------------------------------
[3,"b"]
jflatten function
jflattten(jpacks) → jpack

Flattens all nested sub-jpacks.

Examples:

>>> sql(''' select jflatten('1', '[2]') ''') 
jflatten('1', '[2]')
--------------------
["1",2]
>>> sql(''' select jflatten('[["word1", 1], ["word2", 1], [["word3", 2], ["word4", 2]], 3]') ''') 
jflatten('[["word1", 1], ["word2", 1], [["word3", 2], ["word4", 2]], 3]')
-------------------------------------------------------------------------
["word1",1,"word2",1,"word3",2,"word4",2,3]
jfrequentwords function
jfrequentwords(args...) → jpack

Returns the frequent words of a text in a jpack

jintersection function
jintersection(jpackA, jpackB) → jpack

Returns the items of jpackA except the items that appear on jpackB.

Examples:

>>> sql("select jintersection('[1,2,3]', '[1,2,3]')") 
jintersection('[1,2,3]', '[1,2,3]')
-----------------------------------
[1,2,3]
>>> sql("select jintersection('[1,2,3]', '[1,3]', 1)") 
jintersection('[1,2,3]', '[1,3]', 1)
------------------------------------
1
jlen function
jlen(args...) → int

Returns the total length in elements of the input jpacks.

Examples:

>>> sql("select jlen('abc')")
jlen('abc')
-----------
1
>>> sql("select jlen('a','b',3)")
jlen('a','b',3)
---------------
3
>>> sql("select jlen('a', jpack('b',3))")
jlen('a', jpack('b',3))
-----------------------
3
>>> sql("select jlen('[1,2,3]')")
jlen('[1,2,3]')
---------------
3
jlengthiest function
jlengthiest(jpacks.) → jpack

Returns the string with the greatest length contained in the jpacks.

Examples:

>>> sql("select jlengthiest('a', '', '[]')")
jlengthiest('a', '', '[]')
--------------------------
a
>>> sql("select jlengthiest('a','longer',3)")
jlengthiest('a','longer',3)
---------------------------
longer
>>> sql("select jlengthiest('[3]', jpack('b', ''))")
jlengthiest('[3]', jpack('b', ''))
----------------------------------
3
jmerge function
jmerge(jpacks) → jpack

Merges multiple jpacks into one jpack.

Examples:

>>> sql("select jmerge('[1,2,3]', '[1,2,3]', 'a', 3 )") 
jmerge('[1,2,3]', '[1,2,3]', 'a', 3 )
-------------------------------------
[1,2,3,1,2,3,"a",3]
jmergeregexp function
jmergeregexp(jpacks) → jpack

Creates a regular expression that matches all of the jpack’s contents. If the input jpack contains keyword pairs, then jmergeregexp returns a regular expression with named groups.

Examples:

>>> sql(''' select jmergeregexp('["abc", "def"]') ''') 
jmergeregexp('["abc", "def"]')
------------------------------
(?:abc)|(?:def)
>>> sql(''' select jmergeregexp('[["pos", "p1"], ["neg", "n1"], ["pos", "p2"]]') ''') 
jmergeregexp('[["pos", "p1"], ["neg", "n1"], ["pos", "p2"]]')
-------------------------------------------------------------
(?P<neg>n1)|(?P<pos>p1|p2)
>>> sql(''' select jmergeregexp('[]') ''') 
jmergeregexp('[]')
------------------
_^
>>> sql(''' select jmergeregexp('["ab",""]') ''') 
jmergeregexp('["ab",""]')
-------------------------
(?:ab)
jmergeregexpnamed function
jmergeregexpnamed(jpacks) → jpack

Creates a regular expression that matches all of the jpack’s contents with named groups. If the number of named groups in a regular expression is greater than 99, then the output will be a jpack of regular expressions.

Examples:

>>> sql(''' select jmergeregexpnamed('["abc", "def"]') ''') 
jmergeregexpnamed('["abc", "def"]')
-----------------------------------
(abc)|(def)
jngrams function
jngrams(n, text) → jpack

Converts multiple input arguments into a jpack of ngrams.

Examples:

>>> sql("select jngrams(1,'This is a test phrase')")
jngrams(1,'This is a test phrase')
-------------------------------------------
[["This"],["is"],["a"],["test"],["phrase"]]
>>> sql("select jngrams(2,'This is a test phrase')")
jngrams(2,'This is a test phrase')
---------------------------------------------------------
[["This","is"],["is","a"],["a","test"],["test","phrase"]]
jpack function
jpack(args...) → jpack

Converts multiple input arguments into a single string. Jpacks preserve the types of their inputs and are based on JSON encoding. Single values are represented as themselves where possible.

Examples:

>>> sql("select jpack('a')")
jpack('a')
----------
a
>>> sql("select jpack('a','b',3)")
jpack('a','b',3)
----------------
["a","b",3]
>>> sql("select jpack('a', jpack('b',3))")
jpack('a', jpack('b',3))
------------------------
["a",["b",3]]
jpermutations function
jpermutations(jpack, r) → multiset

Returns all length r permutations of jpack.

Examples:

>>> sql('''select jpermutations('["t1","t2","t3"]',2)''')
C1 | C2
-------
t1 | t2
t1 | t3
t2 | t1
t2 | t3
t3 | t1
t3 | t2
>>> sql('''select jpermutations('["t1","t2",["t3","t4"]]',2)''')
C1          | C2
-------------------------
t1          | t2
t1          | ["t3","t4"]
t2          | t1
t2          | ["t3","t4"]
["t3","t4"] | t1
["t3","t4"] | t2
>>> sql('''select jpermutations(null,2)''')
>>> sql('''select jpermutations('["t1","t2","t3","t4"]')''')
C1
--
t1
t2
t3
t4
jrange function
jrange(num) → jrange

Returns a jrange of integer numbers.

Examples:

>>> sql("select jrange(5)")
jrange('a')
-----------------
["0","1","2","3"]
jset function
jset(jpacks) → jpack

Returns a set representation of a jpack, unifying duplicate items.

Examples:

>>> sql("select jset('[1,2,3]', '[1,2,3]', 'b', 'a', 3 )") 
jset('[1,2,3]', '[1,2,3]', 'b', 'a', 3 )
----------------------------------------
[1,2,3,"a","b"]
jsonpath function
jsonpath(JSON, jsonpathexpr1, jsonpathexpr2) → multiset

Uses jsonpath expressions to pick values from inside a JSON input. If the outputs of all JSONpath expressions have the same number of elements in them, it splits the output into multiple rows.

Note

For more on JSONpath see: http://goessner.net/articles/JsonPath/

Examples:

>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2", "value":"v2"}]}', '$.d1') ''')
C1
-------------------------------------------------------
[{"name":"n1","value":"v1"},{"name":"n2","value":"v2"}]
>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2", "value":"v2"}]}', '$.d1[*].name') ''')
C1
--
n1
n2
>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2", "value":"v2"}]}', '$.d1[*].name', '$.d1[*].value') ''')
C1 | C2
-------
n1 | v1
n2 | v2
>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2", "value":"v2"}]}', '$.d1[*].name', '$.d1[*].nonexisting') ''')
C1 | C2
---------
n1 | None
n2 | None
>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2"}]}', '$.d1[*].name', '$.d1[*].value') ''')
C1          | C2
----------------
["n1","n2"] | v1
>>> sql('''select jsonpath('{"d1":[{"name":"n1", "value":"v1"}, {"name":"n2", "value":"v2"}]}', '$.nonexisting') ''')
jsonstrict function
jsonstrict(args...) → json string

Sometimes we wish to process json lists from another application. Jsonstrict function tries to always create json compatible lists. So it always returns json lists.

Examples:

>>> sql("select jsonstrict('a')")
jsonstrict('a')
---------------
["a"]
>>> sql("select jsonstrict('a','b',3)")
jsonstrict('a','b',3)
---------------------
["a","b",3]
>>> sql("select jsonstrict('a', jpack('b',3))")
jsonstrict('a', jpack('b',3))
-----------------------------
["a",["b",3]]
jsort function
jsort(jpacks) → jpack

Sorts the input jpacks.

Examples:

>>> sql("select jsort('[1,2,3]', '[1,2,3]', 'b', 'a', 3 )") 
jsort('[1,2,3]', '[1,2,3]', 'b', 'a', 3 )
-----------------------------------------
[1,1,2,2,3,3,3,"a","b"]
jsplice function
jsplice(jpack, range1_start, range1_end, ...) → jpack

Splices input jpack. If only a single range argument is provided, it returns input jpack’s element in provided position. If defined position index is positive, then it starts counting from the beginning of input jpack. If defined position is negative it starts counting from the end of input jpack.

If more than one range arguments are provided, then the arguments are assumed to be provided in pairs (start, end) that define ranges inside the input jpack that should be joined together in output jpack.

Examples:

>>> sql(''' select jsplice('[1,2,3,4,5]',0) ''') 
jsplice('[1,2,3,4,5]',0)
------------------------
1
>>> sql(''' select jsplice('[1,2,3,4,5]',-1) ''') 
jsplice('[1,2,3,4,5]',-1)
-------------------------
5
>>> sql(''' select jsplice('[1,2,3,4,5]',10) ''') 
jsplice('[1,2,3,4,5]',10)
-------------------------
None
>>> sql(''' select jsplice('[1,2,3,4,5]', 0, 3, 0, 2) ''') 
jsplice('[1,2,3,4,5]', 0, 3, 0, 2)
----------------------------------
[1,2,3,1,2]
>>> sql(''' select jsplice('[1,2,3,4,5]', 2, -1) ''') 
jsplice('[1,2,3,4,5]', 2, -1)
-----------------------------
[3,4]
jsplit function
jsplit(jpacks) → [C1, C2, ...]

Splits horizontally a jpack.

Examples:

>>> sql("select jsplit('[1,2,3]', '[3,4,5]')") 
C1 | C2 | C3 | C4 | C5 | C6
---------------------------
1  | 2  | 3  | 3  | 4  | 5
jsplitv function
jsplitv(jpacks) → [C1]

Splits vertically a jpack.

Examples:

>>> sql("select jsplitv(jmerge('[1,2,3]', '[1,2,3]', 'b', 'a', 3 ))") 
C1
--
1
2
3
1
2
3
b
a
3
jzip function
jzip(args...) → json string

It combines the corresponding elements of input jpacks.

Examples:

>>> sql('''select jzip('["a", "b"]', '[1,2]','[4,5]')''')
jzip('["a", "b"]', '[1,2]','[4,5]')
-----------------------------------
[["a",1,4],["b",2,5]]
jzipdict function
jzipdict(args...) → json string

It combines the correspinding elements of input jpacks into a jdict.

Examples:

>>> sql('''select jzipdict('["a", "b"]', '[1,2]','[4,5]')''')
jzipdict('["a", "b"]', '[1,2]','[4,5]')
---------------------------------------
{"a":[1,4],"b":[2,5]}
nl2j function
nl2j(text) → jpack

Converts a text with newlines to a jpack.

s2j function
s2j(tabpack) → jpack

Converts a space separated pack to a jpack.

Examples:

>>> sql("select s2j('1  2 3 ')") 
s2j('1  2 3 ')
--------------
["1","2","3"]
t2j function
t2j(tabpack) → jpack

Converts a tab separated pack to a jpack.

Examples:

>>> sql("select t2j(j2t('[1,2,3]'))") 
t2j(j2t('[1,2,3]'))
-------------------
["1","2","3"]
>>> sql("select t2j('asdfasdf')") 
t2j('asdfasdf')
---------------
["asdfasdf"]

langtools functions

detectlang function
detectlang(text1, text2, ...) → text

Detects the language of a snippet of text by analysing its statistical properties.

Examples:

>>> sql("detectlang 'ελληνικά'")
detectlang('ελληνικά')
------------------------------
greek-utf
>>> sql("detectlang this is in english")
detectlang('this is in english')
--------------------------------
english
>>> sql("detectlang ceci est en français")
detectlang('ceci est en français')
-----------------------------------
french
>>> sql("detectlang este es el español")
detectlang('este es el español')
---------------------------------
spanish
stem function
stem(text1, text2, ...) → text

Does stemming according to the porter algorithm.

Examples:

>>> sql("stem 'cutting and creating'")
stem('cutting and creating')
----------------------------
cut and creat
>>> sql("stem ceci est en français cutting")
stem('ceci est en français cutting')
-------------------------------------
ceci est en françai cut
stem_en function
stem_en(text1, text2, ...) → text

Detects if the input is in english and only then does the porter stemming else it returns the input arguments concatenated

Examples:

>>> sql("stem_en 'cutting and creating'")
stem_en('cutting and creating')
-------------------------------
cut and creat
>>> sql("stem_en ceci est en français cutting")
stem_en('ceci est en français cutting')
----------------------------------------
ceci est en français cutting

logparse functions

apachelogsplit function
apachelogsplit(apache_log_line) → [ip, ident, authuser, date, request, status, bytes, referrer, useragent]

Breaks a single apache log row into multiple fields.

Examples:

>>> table1('''
... '1.1.1.1 - - [01/Feb/2001:01:02:03 +0001] "HEAD /test.com HTTP/1.1" 200 - "-" "reftest"'
... ''')
>>> sql("select apachelogsplit(a) from table1")
ip      | ident | authuser | date                     | method | uri       | httpver | status | bytes | referrer | useragent
----------------------------------------------------------------------------------------------------------------------------
1.1.1.1 | None  | None     | 2001-02-01T01:02:03+0001 | HEAD   | /test.com | 1.1     | 200    | None  | None     | reftest

mathops functions

farith function
farith(calc) → float or Fraction

Takes as input a mathematical expression in polish notation and computes the result using fractional computation

Examples:

>>> sql("select farith('+',5,7)" )
farith('+',5,7)
---------------
12
>>> sql("select farith('-','*','/',15,'-',7,'+',1,1,3,'+',2,'+',1,1)" )
farith('-','*','/',15,'-',7,'+',1,1,3,'+',2,'+',1,1)
----------------------------------------------------
5
gaussdistribution function
gaussdistribution(mean, sigma) → float

Returns a gaussian distribution. Sigma is the standard deviation of the distribution

Examples:

>>> sql("select gaussdistribution(10,5)") 
gaussdistribution(10,5)
-----------------------
...
randomrange function
randomrange(start, end, step) → int

Returns a random number in the defined range

Examples:

>>> sql("select randomrange(0, 68, 1)") 
randomrange(0, 68, 1)
---------------------
...
>>> sql("select randomrange(0, 68)") 
randomrange(0, 68)
------------------
...
safediv function
safediv(int, int, int) → int

Returns the first argument, when the division of the two subsequent numbers includes zero in denominator (i.e. in third argument)

Examples:

>>> sql("select safeDiv(1,5,0)")
safeDiv(1,5,0)
--------------
1
sqroot function
sqroot(int) → int

Returns the square root of a given argument.

Examples:

>>> table1('''
... 25
... ''')
>>> sql("select sqroot(a) from table1")
sqroot(a)
---------
5.0
tonumber function
tonumber(variable) → int or float

Convert variable, whose type is str or unicode, to int or float, if it is feasible

Examples:

>>> sql("select tonumber('12.3') as val")
val
----
12.3
>>> sql("select tonumber(12.3) as val")
val
----
12.3
>>> sql("select tonumber('not a number') as val")
val
------------
not a number
>>> sql("select tonumber(null) as val")
val
----
None

settings functions

setlog function
setlog(filename)

Sets the log file path/filename for exec operator

setting function
setting(setting_name[, value])

Sets and returns a setting’s value

Examples:

>>> sql("setting 'tracing' 0")
setting('tracing','0')
----------------------
False
>>> sql("setting 'tracing'")
setting('tracing')
------------------
False
toggle function
toggle(setting_name)

Toggles a boolean global setting

Examples:

>>> sql("toggle lala")
toggle('lala')
--------------
lala not found
>>> sql("toggle tracing")
toggle('tracing')
--------------------
tracing is now: True

similarity functions

jaccard function
jaccard(jpack1, jpack2)

Return jaccard similarity value of two jpacks.

Example:

>>> table1('''
... user1   movie1 20
... user1   movie2 30
... user2   movie1 40
... user2   movie3 90
... user2   movie4 90
... user3   movie1 40
... user3   movie3 80
... user4   movie1 70
... user4   movie2 10
... ''')

NOTE that only column b is jgrouped because jaccard operates on packs as sets, not weighted values, So packing also column c would not make any difference.

>>> sql("""select u1.userid,u2.userid, jaccard(u1.pk, u2.pk) as similarity
...     from
...         (select a as userid, jgroup(b)  as pk from table1 group by a) as u1,
...         (select a as userid, jgroup(b) as pk from table1 group by a) as u2
...     where u1.userid<u2.userid""")
userid | userid | similarity
--------------------------------
user1  | user2  | 0.25
user1  | user3  | 0.333333333333
user1  | user4  | 1.0
user2  | user3  | 0.666666666667
user2  | user4  | 0.25
user3  | user4  | 0.333333333333
sorensendice function
sorensendice(jpack1, jpack2)

Return jaccard similarity value of two jpacks.

Example:

>>> table1('''
... user1   movie1 20
... user1   movie2 30
... user2   movie1 40
... user2   movie3 90
... user2   movie4 90
... user3   movie1 40
... user3   movie3 80
... user4   movie1 70
... user4   movie2 10
... ''')

NOTE that only column b is jgrouped because jaccard operates on packs as sets, not weighted values, So packing also column c would not make any difference.

>>> sql("""select u1.userid,u2.userid, sorensendice(u1.pk, u2.pk) as similarity
...     from
...         (select a as userid, jgroup(b)  as pk from table1 group by a) as u1,
...         (select a as userid, jgroup(b) as pk from table1 group by a) as u2
...     where u1.userid<u2.userid""")
userid | userid | similarity
----------------------------
user1  | user2  | 0.4
user1  | user3  | 0.5
user1  | user4  | 1.0
user2  | user3  | 0.8
user2  | user4  | 0.4
user3  | user4  | 0.5

stopwords functions

filterstopwords function
filterstopwords(str) → str

Returns the input text with the stopwords removed. The case of the first letter matters.

Examples:

>>> table1('''
... 'this and wood'         'NO more No words'
... 'No more stop words'    'more free time'
... ''')
>>> sql("select filterstopwords(a,b) from table1")
filterstopwords(a,b)
--------------------
wood NO words
stop words free time

stringdist functions

damlevendist function
damlevendist(str1, str2)

Returns int which is the damerau-levenshtein distance between str1 and str2

Examples:

>>> sql("select damlevendist('HURQBOHP','QKHOZ') ")
damlevendist('HURQBOHP','QKHOZ')
--------------------------------
6
levendist function
levendist(str1, str2)

Returns int which is the levenshtein distance between str1 and str2

Examples:

>>> sql("select levendist('HURQBOHP','QKHOZ') ")
levendist('HURQBOHP','QKHOZ')
-----------------------------
7
quickstrdist function
damlevendist(str1, str2)

Returns int which is a string distance between str1 and str2, based on Python’s difflib library. It is a lot faster than levendist or damlevendist.

Examples:

>>> sql("select quickstrdist('HURQBOHP','QKHOZ') ")
quickstrdist('HURQBOHP','QKHOZ')
--------------------------------
8

termsetops functions

tset function
termsetdiff(termset1, termset2) → termset

Returns the termset that is the difference of sets of termset1 - termset2.

Examples:

>>> table1('''
... 't1 t2 t3' 't2 t3'
... 't3 t2 t1' 't3 t4'
... ''')
>>> sql("select tset(a,b) from table1")
tset(a,b)
-----------
t1 t2 t3
t1 t2 t3 t4
tsetcombinations function
tsetcombinations(termset, r) → termset

Returns all the termset combinations of length r. It is a multiset operator that returns one column but many rows.

See also

>>> sql("select tsetcombinations('t1 t2 t3 t4',2)")
C1
-----
t1 t2
t1 t3
t1 t4
t2 t3
t2 t4
t3 t4
tsetdiff function
termsetdiff(termset1, termset2) → termset

Returns the termset that is the difference of sets of termset1 - termset2.

Examples:

>>> table1('''
... 't1 t2 t3' 't2 t3'
... 't3 t2 t1' 't3 t4'
... ''')
>>> sql("select tsetdiff(a,b) from table1")
tsetdiff(a,b)
-------------
t1
t1 t2

text functions

comprspaces function
comprspaces(text1[, text2, ...]) → text

This function strips (from the beginning and the end) and compresses the spaces in its input.

Examples:

>>> table1('''
... '   an example    with spaces      '    'another    example with spaces         '
... ''')
>>> sql("select comprspaces(a,b) from table1")
comprspaces(a,b)
--------------------------------------------------
an example with spaces another example with spaces
contains function
contains(str1, str2) → bool

Returns true if string str1 contains str2.

Examples:

>>> sql("select contains('test string', 'str') as test  ")
test
----
1
>>> sql("select contains('test string', 'nostr') as test  ")
test
----
0
cqlfields function

This functions returns the keywords inside a single column or aggregated from multiple columns. It plays well with Unicode.

The difference of cqlkeywords to keywords is that cqlkeywords also strips cql syntax like “title all” or “author all”.

>>> table1('''
... '(title all "scrieri") and (creator all "arghezi") and (title all "other")'
... '("maschinenschreiben") and (language all "ger")'
... '("sauer") and ("übungsbuch")'
... ''')
>>> sql("select cqlfields(a) from table1")
cqlfields(a)
-------------------
title creator title
language
cqlkeywords function
cqlkeywords(text1[, text2, ...]) → text

Returns the keywords inside a single column (text1) or aggregated from multiple columns.

The difference of cqlkeywords to keywords is that cqlkeywords also strips cql syntax like “title all” or “author all” and plain cql directives like ‘creator’, ‘title’…

Examples:

>>> table1('''
... first(second)   third+fourth
... πρωτο(δευτερο)  τριτο_τέταρτο
... 'πέμπτο all'      'έκτο title all τεστ'
... 'title all and something' 'other'
... 'title and something' 'other'
... ''')
>>> sql("select cqlkeywords(a,b) from table1")
cqlkeywords(a,b)
---------------------------------------------------
first second third fourth
πρωτο δευτερο τριτο_τέταρτο
έκτο τεστ
something other
and something other
crc32 function
crc32(args) → int

Returns the CRC32 of args. Numbers are converted to text before hashing is performed.

Examples:

>>> sql("select crc32(65)")
crc32(65)
----------
2658551721
>>> sql("select crc32(6,5)")
crc32(6,5)
----------
1565899724
>>> sql("select crc32(5)")
crc32(5)
----------
2226203566
>>> sql("select crc32('5')")
crc32('5')
----------
1201448970

escapechars function

hashmd5 function
hashmd5(args)

Returns an MD5 hash of args. Numbers are converted to text before hashing is performed.

Examples:

>>> sql("select hashmd5(65)")
hashmd5(65)
--------------------------------
fc490ca45c00b1249bbe3554a4fdf6fb
>>> sql("select hashmd5(6,5)")
hashmd5(6,5)
--------------------------------
f0d95c20cde50e3ca03cab53f986b6c3
>>> sql("select hashmd5(5)")
hashmd5(5)
--------------------------------
e4da3b7fbbce2345d7772b0674a318d5
>>> sql("select hashmd5('5')")
hashmd5('5')
--------------------------------
7000aaf68ca7a93da0af3d03850571c2
hashmd5mod function
hashmd5mod(args, divisor) → int

Returns the modulo with divisor number of the MD5 hash of args. Numbers are converted to text before hashing is performed.

Examples:

>>> sql("select hashmd5mod(65, 3)")
hashmd5mod(65, 3)
-----------------
0
>>> sql("select hashmd5mod(6,5, 4)")
hashmd5mod(6,5, 4)
------------------
2
>>> sql("select hashmd5mod(5, 5)")
hashmd5mod(5, 5)
----------------
3
>>> sql("select hashmd5mod('5', 5)")
hashmd5mod('5', 5)
------------------
4
hashmodarchdep function
hashmodarchdep(args, divisor) → int

Returns a hash of the args.

Note

This hash function is architecture dependent (32bit vs 64bit).

Examples:

>>> sql("select hashmodarchdep(65,5)") 
hashmodarchdep(65,5)
--------------------
...
>>> sql("select hashmodarchdep(6,5)") 
hashmodarchdep(6,5)
-------------------
...
>>> sql("select hashmodarchdep(5,5)") 
hashmodarchdep(5,5)
-------------------
...
>>> sql("select hashmodarchdep('5',5)") 
hashmodarchdep('5',5)
---------------------
...
hashmodarchdep2 function
hashmodarchdep2(arg, divisor) → int

Returns a hash of the args.

Note

This hash function is architecture dependent (32bit vs 64bit). It is specialized for 2 parameters

Examples:

>>> sql("select hashmodarchdep2(65,5)") 
hashmodarchdep2(65,5)
---------------------
...
>>> sql("select hashmodarchdep2(6,5)") 
hashmodarchdep2(6,5)
--------------------
...
>>> sql("select hashmodarchdep2(5,5)") 
hashmodarchdep2(5,5)
--------------------
...
>>> sql("select hashmodarchdep2('5',5)") 
hashmodarchdep2('5',5)
----------------------
...
isvalidutf8 function
isvalidutf8(text) → 1/0

Returns 1 if the input text is in valid UTF-8 format, or 0 if not. This function is used to find corrupted UTF-8 strings with a heuristic based on non common characters.

Examples:

>>> table1('''
... test
... δοκιμή!
... sévignÃ
... évezred
... ''')
>>> sql("select isvalidutf8(a) from table1")
isvalidutf8(a)
--------------
1
1
1
1
keywords function
keywords(text1[, text2, ...]) → text

Returns the keywords inside a single column (text1) or aggregated multiple columns.

Examples:

>>> table1('''
... first(second)   third+fourth
... πρωτο(δευτερο)  τριτο+τέταρτο
... 'πέμπτο all'      'qwer.zxcv'
... ''')
>>> sql("select keywords(a,b) from table1")
keywords(a,b)
---------------------------------------------------
first second third fourth
πρωτο δευτερο τριτο τέταρτο
πέμπτο all qwer zxcv
kwnum function
kwnum(text1[, text2, ...]) → int

Returns the number of simple keywords in a string. Its input should be words separated by spaces, as returned by cqlkeywords or keywords.

Examples:

>>> table1('''
... 'word1 word2 word3'
... 'word1 word2'
... 'word'
... ''')
>>> sql("select kwnum(a) from table1")
kwnum(a)
--------
3
2
1
normalizetext function
normalizetext(text1[, text2, ...]) → text

Normalizes a text by replacing all the non-words except s

,.; with ‘_’

Examples:

>>> table1('''
... first(second)   third+fourth
... πρωτο(δευτερο)  τριτο+τέταρτο
... 'πέμπτο all'      'έκτο title all τεστ'
... ''')
>>> sql("select normalizetext(a,b) from table1")
normalizetext(a,b)
----------------------------------------------------
first_second_ third_fourth
πρωτο_δευτερο_ τριτο_τέταρτο
πέμπτο all έκτο title all τεστ
normuni function
normuni(str)

Returns str normalised in the composed unicode normal form without replacing same look characters. For example this ‘À’ character can be encoded with one or two different characters, normuni() returns an one-character encoded version. This function is important to check true strings equality.

Functions sunitouni() and unitosuni() are used in the examples to make it more comprehensive.

Examples:

Note

Returned results in the next two examples should look the same, if not that is a bug at the combined characters rendering of the shell that the specifications was created.

>>> sql("select sunitouni('C\u0327') as test  ")
test
----

>>> sql("select normuni(sunitouni('C\u0327')) as test  ")
test
----
Ç
>>> sql("select unitosuni(normuni(sunitouni('C\u0327'))) as test  ")
test
------
\u00c7
regexpcountuniquematches function
regexpcountuniquematches(pattern, expression)

Returns the number of matches of pattern in expression.

Examples:

>>> sql("regexpcountuniquematches 'start' 'start end start'  ")
regexpcountuniquematches('start','start end start')
---------------------------------------------------
1
>>> sql("regexpcountuniquematches 'start end' 'start end start'  ")
regexpcountuniquematches('start end','start end start')
-------------------------------------------------------
1
regexpcountwithpositions function
regexpcountwithpositions(pattern, expression, start = 0, min = 0.5, multiply = 1)

Returns a score of positioned matches of pattern in expression.

Examples:

>>> sql("regexpcountwithpositions 'start' 'start end start'  ")
regexpcountwithpositions('start','start end start')
---------------------------------------------------
1.75
>>> sql("regexpcountwithpositions 'start' 'start end start'  ")
regexpcountwithpositions('start','start end start')
---------------------------------------------------
1.75
>>> sql("regexpcountwithpositions 'first' 'first second third fourth'")
regexpcountwithpositions('first','first second third fourth')
-------------------------------------------------------------
0.75
>>> sql("regexpcountwithpositions 'fourth' 'first second third fourth'")
regexpcountwithpositions('fourth','first second third fourth')
--------------------------------------------------------------
1.5
>>> sql("regexpcountwithpositions 'fourth' 'first second third fourth' 1")
regexpcountwithpositions('fourth','first second third fourth','1')
------------------------------------------------------------------
0.5
regexpcountwords function
regexpcountwords(pattern, expression)

Returns the number of matches of pattern in expression. If a match includes more than one words then it returns the number of the words.

Examples:

>>> sql("regexpcountwords 'start' 'start end start'  ")
regexpcountwords('start','start end start')
-------------------------------------------
2
>>> sql("regexpcountwords 'start end' 'start end start'  ")
regexpcountwords('start end','start end start')
-----------------------------------------------
2
regexpr function
regexp(pattern, expression[, replacestr])

This function returns a match to the first parenthesis of pattern or replaces the matches of pattern in expression with replacestr. Pattern Syntax is according to python’s re module.

Examples use inversion.

Examples:

>>> table1('''
... 25
... ''')
>>> sql("regexpr 'start\s(\w+)\send' 'start otherword end'  ")
regexpr('start\s(\w+)\send','start otherword end')
--------------------------------------------------
otherword
>>> sql("regexpr '\W+' '@#$%@$#% tobereplaced @#$%@#$%' 'nonword'  ")
regexpr('\W+','@#$%@$#% tobereplaced @#$%@#$%','nonword')
---------------------------------------------------------
nonwordtobereplacednonword
>>> sql("select regexpr('(\w+).*?(\w+)', 'one two three')")
regexpr('(\w+).*?(\w+)', 'one two three')
-----------------------------------------
["one","two"]
regexprfindall function
regexprfindall(pattern, text)

This function returns all matches of pattern in text.

Examples:

>>> sql("select regexprfindall('\w+', 'one')")
regexprfindall('\w+', 'one')
----------------------------
["one"]
>>> sql("select regexprfindall('\w+', 'one two three')")
regexprfindall('\w+', 'one two three')
--------------------------------------
["one","two","three"]
regexprmatches function
regexprmatches(pattern, arg)

This function returns true if the pattern matches arg or false otherwise.

Examples use inversion.

Examples:

>>> sql("regexprmatches '(a)' 'qwer a qwer'  ")
regexprmatches('(a)','qwer a qwer')
-----------------------------------
1
stripchars function
stripchars(str[, stripchars])

Returns str removing leading and trailing whitespace characters or stripchars characters if given. Works like python’s strip function.

Examples:

>>> sql("select stripchars(' initial and final spaces  ') as test  ")
test
------------------------
initial and final spaces
>>> sql("select stripchars(' <initial and final spaces>  ',' <>') as test  ")
test
------------------------
initial and final spaces
>>> sql("select stripchars(null)")
stripchars(null)
----------------
None
sunitouni function
sunitouni(str)

Returns str replacing literal unicode code points to their string representation.

Examples:

>>> sql("select sunitouni('br\u00fbl\u00e9') as test  ")
test
-------
brûlé
>>> sql("select sunitouni('\u that is not a unicode code point') as test  ")
test
-----------------------------------
\u that is not a unicode code point
>>> sql("select sunitouni(null)")
sunitouni(null)
---------------
None
>>> sql("select sunitouni(9)")
sunitouni(9)
------------
9
textreferences function
textreferences(text, maxlen = 5, pattern = (|_)(1|2)d{3, 3}(|_))

Returns the “Reference” section of documents. To find it, it searches for parts of the document that have a high density of pattern matches.

Examples:

>>> table1('''
... eeeeeeeeeeeeee
... gggggggggggggg
... aaaaaaaaaaaaaa
... bbbbbbbbbbbbbb
... aaa_1914_ccccc
... bbb_2014_bbbbb
... dddd_2008_ddddddd
... cccc_2005_ccccc
... ccccc_2014_ccccc
... dddddd_2009_ddddd
... gggggggggggggg
... ''')
>>> sql("select textreferences(group_concat(a,'\n'),1,'(|_)(1|2)\d{3,3}(|_)') as a from table1")
a
--------------------------------------------------------------------------------------------------
aaa_1914_ccccc
bbb_2014_bbbbb
dddd_2008_ddddddd
cccc_2005_ccccc
ccccc_2014_ccccc
dddddd_2009_ddddd

If an inadequate amount of newlines is found, it returns the text as is.

>>> sql("select textreferences(group_concat(a,'.')) from table1")
textreferences(group_concat(a,'.'))
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
eeeeeeeeeeeeee.gggggggggggggg.aaaaaaaaaaaaaa.bbbbbbbbbbbbbb.aaa_1914_ccccc.bbb_2014_bbbbb.dddd_2008_ddddddd.cccc_2005_ccccc.ccccc_2014_ccccc.dddddd_2009_ddddd.gggggggggggggg
>>> sql("select textreferences('')")
textreferences('')
------------------
textwindow function
textwindow(text, previous_word_count = 0, next_word_count = 0, middle_word_count = 1, pattern = None)

Returns a rolling window over the text. The window includes previous_word_count words before the middle word and next_word_count words after the middleword. Optionally you may choose more than one words to be in the middle, and filter your window with a regular expression pattern

If the value of prev_word_count or next_word_count is negative, and a pattern exists then all matches of the pattern are filtered out from prev and next output.

Examples:

>>> sql("select textwindow('This is a test phrase')  ")
middle
------
This
is
a
test
phrase
>>> sql("select textwindow('This is a test phrase',1,1)  ")
prev1 | middle | next1
-----------------------
      | This   | is
This  | is     | a
is    | a      | test
a     | test   | phrase
test  | phrase |
>>> sql("select textwindow('This is a test phrase',1,1,2)  ")
prev1 | middle      | next1
----------------------------
      | This is     | a
This  | is a        | test
is    | a test      | phrase
a     | test phrase |
test  | phrase      |
>>> sql("select textwindow('This is a test phrase  with pdb codes: 1abc 2bcd 3cde 4bde ',-2,1,2,'\d\w{3}' )  ")
prev1 | prev2  | middle    | next1
----------------------------------
pdb   | codes: | 1abc 2bcd | 3cde
pdb   | codes: | 2bcd 3cde | 4bde
pdb   | codes: | 3cde 4bde |
pdb   | codes: | 4bde      |
>>> sql("select textwindow('This is a test phrase (123) for filtering middle with a number',1,1,'\d+')  ")
prev1  | middle | next1
-----------------------
phrase | (123)  | for
textwindow2s function
textwindow2s(text, prev_word_count, middle_word_count, next_word_count, pattern)

Returns a rolling window in the text. The window includes numberofprev words before the middle word and numberofnext words after the middleword. You may filter your window using a pattern.

Examples:

>>> sql("select textwindow2s('This is a test phrase',2,1,1)  ")
prev    | middle | next
-------------------------
        | This   | is
This    | is     | a
This is | a      | test
is a    | test   | phrase
a test  | phrase |
>>> sql("select textwindow2s('This is a test phrase',2,1,1, '\w{4}')  ")
prev   | middle | next
------------------------
       | This   | is
is a   | test   | phrase
a test | phrase |
uniqueterms function
uniqueterms(text1[, text2, ...]) → text

Returns the unique terms of an input string.

Examples:

>>> table1('''
... 'word1 word2 word2'
... 'word1 word2 word1'
... 'word'
... ''')
>>> sql("select uniqueterms(a) from table1")
uniqueterms(a)
--------------
word1 word2
word1 word2
word
unitosuni function
unitosuni(str)

Returns str replacing non-ascii characters with their equivalent unicode code point literal at the u00 format.

Examples:

>>> sql("select unitosuni('brûlé') as test  ")
test
---------------
br\u00fbl\u00e9
>>> sql("select sunitouni(null)")
sunitouni(null)
---------------
None
>>> sql("select unitosuni(9)")
unitosuni(9)
------------
9
utf8clean function
utf8clean(text) → text

Removes control characters from input utf-8 text.

Examples:

>>> table1('''
... test
... δοκιμή!
... sévignÃ
... évezred
... ''')
>>> sql("select utf8clean(a) from table1")
utf8clean(a)
-------------
test
δοκιμή!
sévignÃ
évezred

tzconverter functions

tzconverter function
tzconverter(timestamp, offset)

Returns timestamps converted from UTC to target timezone, indicated by the offset parameter.

Example:

>>> table1('''

… “2010-12-05T00:00:00+00:00” … “2010-12-05T00:01:00+00:00” … “2010-12-05T00:02:00+00:00” … ‘’‘)

… ‘’‘) >>> sql(“select a, tzconverter(a,’-01:00’) from table1 “) a | tzconverter(a,’-01:00’) —————————————————– 2010-12-05T00:00:00+00:00 | 2010-12-04T23:00:00-01:00 2010-12-05T00:01:00+00:00 | 2010-12-04T23:01:00-01:00 2010-12-05T00:02:00+00:00 | 2010-12-04T23:02:00-01:00

… ‘’‘) >>> sql(“select a, tzconverter(a,’-01’) from table1 “) a | tzconverter(a,’-01’) ————————————————– 2010-12-05T00:00:00+00:00 | 2010-12-04T23:00:00-01 2010-12-05T00:01:00+00:00 | 2010-12-04T23:01:00-01 2010-12-05T00:02:00+00:00 | 2010-12-04T23:02:00-01

>>> sql("select a, tzconverter(a,'-0100')  from table1 ")
a                         | tzconverter(a,'-0100')
----------------------------------------------------
2010-12-05T00:00:00+00:00 | 2010-12-04T23:00:00-0100
2010-12-05T00:01:00+00:00 | 2010-12-04T23:01:00-0100
2010-12-05T00:02:00+00:00 | 2010-12-04T23:02:00-0100
>>> sql("select a, tzconverter(a,'+00:30')  from table1 ")
a                         | tzconverter(a,'+00:30')
-----------------------------------------------------
2010-12-05T00:00:00+00:00 | 2010-12-05T00:30:00+00:30
2010-12-05T00:01:00+00:00 | 2010-12-05T00:31:00+00:30
2010-12-05T00:02:00+00:00 | 2010-12-05T00:32:00+00:30

util functions

execprogram function
execprogram(stdin=null, program_name, parameters[, raise_error]) → text or blob

Function execprogram executes a shell command and returns its output. If the value of the first argument is not null, the arguments value will be pushed in program’s Standard Input.

If the program doesn’t return a 0 return code, then a YeSQL error will be raised, containing the contents of the program’s error stream.

If the last argument of execprogram is set to null, then all program errors will be returned as null (see “cat non_existent_file” examples below).

Every one of the program’s parameters must be provided as different arguments of the execprogram call (see “cat -n” example below).

Note

Function execprogram tries by default to convert the program’s output to UTF-8. If the conversion isn’t succesfull, then it returns the output as a binary blob.

Examples:

>>> table1('''
... echo    test
... echo    1
... ''')
>>> sql("select execprogram(null, a, b) from table1")
execprogram(null, a, b)
-----------------------
test
1
>>> sql("select execprogram(null, null, '-l')") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator EXECPROGRAM: Second parameter should be the name of the program to run
>>> sql("select execprogram(null, null, '-l', null)") 
execprogram(null, null, '-l', null)
-----------------------------------
None
>>> sql("select execprogram('test', 'cat')")
execprogram('test', 'cat')
--------------------------
test
>>> sql('''select execprogram('test', 'cat', '-n')''') 
execprogram('test', 'cat', '-n')
--------------------------------
     1        test
>>> sql("select execprogram(null, 'NON_EXISTENT_PROGRAM')") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator EXECPROGRAM: [Errno 2] No such file or directory
>>> sql("select execprogram(null, 'cat', 'non_existent_file')") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator EXECPROGRAM: cat: non_existent_file: No such file or directory
>>> sql("select execprogram(null, 'cat', 'non_existent_file', null)") 
execprogram(null, 'cat', 'non_existent_file', null)
---------------------------------------------------
None
failif function
failif(condition[, messsage])

If condition is true, raises an error. If message is provided, the message is included in raised error.

Examples:

>>> sql("select failif(1=1,'exception') as answer") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator FAILIF: exception
>>> sql("select failif(1=0,'exception') as answer") 
answer
------
0
>>> sql("select failif(1=1) as answer") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator FAILIF: an error was found
gz function
gz(text) → gzip compressed blob

Function gz compresses its input with gzip’s maximum compression level.

Examples:

>>> table1('''
... "qwerqewrqwerqwerqwerqwerqwer"
... "asdfasdfasdfasdfasdfasdfsadf"
... ''')
>>> sql("select length(a), length(gz(a)) from table1")
length(a) | length(gz(a))
-------------------------
28        | 20
28        | 18
sleep function
sleep(seconds)

This function waits for the given number of seconds before returning. The seconds parameters can be fractional (e.g. 0.1 will sleep for 100 milliseconds).

Examples:

>>> sql("select sleep(0.1)")
sleep(0.1)
----------
0.1
ungz function
ungz(blob) → text

Function ungz decompresses gzip blobs. If the input blobs aren’t gzip compressed, then it just returns them as they are.

Examples:

>>> table1('''
... "qwerqwerqwer"
... "asdfasdfasdf"
... ''')
>>> sql("select ungz(gz(a)) from table1")
ungz(gz(a))
------------
qwerqwerqwer
asdfasdfasdf
>>> sql("select ungz('string'), ungz(123)")
ungz('string') | ungz(123)
--------------------------
string         | 123
urlrequest function
urlrequest([null, ]url) → response

This functions connects to the url (via GET HTTP method) and returns the request’s result. If first parameter is null, then in case of errors null will be returned.

Examples:

>>> sql("select urlrequest('http://www.google.com/not_existing')")
Traceback (most recent call last):
...
HTTPError: HTTP Error 404: Not Found
>>> sql("select urlrequest(null, 'http://www.google.com/not_existing') as result")
result
------
None
urlrequestpost function
urlrequestpost(data_jdict, [null, ]url) → response

This functions connects to the url (via POST HTTP method), submits the data_jdict, and returns the request’s result. If second parameter is null, then in case of errors null will be returned.

Examples:

>>> sql('''select urlrequestpost('{"POST_param_name":"data"}', 'http://www.google.com/not_existing')''')
Traceback (most recent call last):
...
HTTPError: HTTP Error 404: Not Found
>>> sql('''select urlrequestpost('["POST_param_name","data"]', null, 'http://www.google.com/not_existing') as result''')
result
------
None
>>> sql("select urlrequestpost(jdict('param1','value1'), null, 'http://www.google.com/not_existing') as result")
result
------
None
>>> sql("select urlrequestpost(jpack('param1','value1'), null, 'http://www.google.com/not_existing') as result")
result
------
None

variables functions

flowname function
flowname([str])

Sets and retrieves, ‘flowname’ variable

Examples:

>>> sql("flowname test flow ")
flowname('test flow')
---------------------
test flow
>>> sql("flowname")
flowname()
----------
test flow
>>> sql("flowname 'arg1' arg2") 
Traceback (most recent call last):
    ...
OperatorError: YeSQL SQLError:
Operator FLOWNAME: Flowname accepts only 1 argument
getvar function
getvar(varname) → value

Returns the value of varname.

Examples:

>>> sql("var 't' 5")
var('t','5')
------------
5
>>> sql("getvar 't'")
getvar('t')
-----------
5
>>> sql("select getvar('t', random())")
getvar('t', random())
---------------------
5
requirevars function
requirevars(varname1[, varname2, ...])

Checks if all variables (varname1,…) exist. If not it throws an exception.

Examples:

>>> sql("var 'cv1' 5")
var('cv1','5')
--------------
5
>>> sql("var 'cv2' 10")
var('cv2','10')
---------------
10
>>> sql("requirevars 'cv1' 'cv2'")
requirevars('cv1','cv2')
------------------------
1
>>> sql("requirevars cv1 cv2")
requirevars('cv1 cv2')
----------------------
1
>>> sql("requirevars 'cv1' 'testvar'") 
Traceback (most recent call last):
...
OperatorError: YeSQL SQLError:
Operator REQUIREVARS: Variable testvar isn't initialized
setexecdb function
setexecdb(str)

Sets the database path/filename for exec operator.

var function
var(varname[, value]) → value

Sets (if both varname and value are given) or returns (if only varname is given) the contents of a variable.

Examples:

>>> sql("var 'v'")  
Traceback (most recent call last):
    ...
OperatorError: YeSQL SQLError:
Operator VAR: Variable 'v' does not exist
>>> sql("var 'v' 5")
var('v','5')
------------
5
>>> sql("var 'v'")
var('v')
--------
5
>>> sql("select var('v')")
var('v')
--------
5