Example Script: Get table names used by data source #11

Merged
merged 25 commits into from
Feb 24, 2021
Changes from 3 commits
25 commits
ca7aaa3
Initial version of script.
susodapop Apr 13, 2020
151fbc9
Adds a check for schema tokens. This rules out CTEs and temp tables.
susodapop Apr 13, 2020
ccccfd7
Cleans up printing code.
susodapop Apr 13, 2020
6a2fa4f
Reorder imports. Switch from .items() to .values() since the dict key
susodapop Apr 13, 2020
fccb4bd
Updates PATTERN to use non-capture groups.
susodapop Apr 13, 2020
820d767
Correct comment pronoun
susodapop Apr 13, 2020
c45555f
Swaps the `--detail` argument from a variable to a Click flag.
susodapop Apr 13, 2020
e51fd7c
Swaps out single-character variable names for more descriptive ones.
susodapop Apr 13, 2020
fea3392
Updates regex to support multiple whitespace characters between keyword
susodapop Apr 13, 2020
4ea52d9
Moves table name extraction to a separate function
susodapop Apr 13, 2020
0a8786c
Initial tests. test_6 currently fails (suggested by Levko)
susodapop Apr 13, 2020
34e6b33
Moves schema check logic out of the extract_tables function.
susodapop Apr 13, 2020
16cb2b4
Adds behavior: strip out identifier prefixes: brackets, double-quotes…
susodapop Apr 13, 2020
5dc273a
Update tests to check for result length and contents, not straight eq…
susodapop Apr 13, 2020
adddfe8
Switches regex to only capture the relevant group.
susodapop Apr 16, 2020
ae68395
Updates tests to check for length of result and equality of contents.
susodapop Apr 16, 2020
a833301
Augments test case for qualified field names.
susodapop Apr 16, 2020
e4bca1e
Adds logic to pre-format the query and split out comma-delimited
susodapop Apr 16, 2020
600af49
Restyle Example Script: Get table names used by data source
restyled-io[bot] Apr 16, 2020
1c8bce5
Reverts to Black formatting.
susodapop Apr 16, 2020
f9c2b3c
Adds two more tests for comma-delimited matches that also use aliases.
susodapop Apr 17, 2020
87ebb82
Applies Black formatting.
susodapop Apr 17, 2020
df0bfb9
Adds ascii-art that draws attention to the tests
susodapop Apr 17, 2020
77b73aa
Adds console_scripts entry point. Had to exchange underscores for dashes
susodapop Apr 17, 2020
9651ced
Sync with master
susodapop Feb 18, 2021
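
The schema-token check described in commit 151fbc9 can be illustrated with a small standalone sketch. It assumes the filter behaves as in the script's list comprehension (a match is kept only if it appears in the data source's table list, or if no table list is available). The query text and the `schema_tables` values are hypothetical and chosen only for illustration.

```python
import re

# Same expression as the updated script: FROM/JOIN, whitespace, then the table token.
PATTERN = re.compile(r"(?:FROM|JOIN)(?:\s+)([^\s\(\)]+)", flags=re.IGNORECASE)


def extract_table_names(str_sql, schema_tables=()):
    """Return FROM/JOIN targets, optionally restricted to known schema tables."""
    return [
        match
        for match in PATTERN.findall(str_sql)
        if match in schema_tables or len(schema_tables) == 0
    ]


# Hypothetical query: "recent" is a CTE, not a table in the schema.
sql = """
WITH recent AS (SELECT id FROM orders WHERE created_at > '2020-01-01')
SELECT c.name FROM recent JOIN customers c ON c.id = recent.id
"""

unfiltered = extract_table_names(sql)
filtered = extract_table_names(sql, schema_tables=["orders", "customers"])

print(unfiltered)  # includes the CTE name
print(filtered)    # CTE name dropped because it is not in the schema list
```

Without a schema list every FROM/JOIN target is reported, including the CTE. With one, only names present in the schema survive.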
99 changes: 85 additions & 14 deletions redash_toolbelt/examples/find-table-names.py
@@ -1,16 +1,9 @@
 import itertools, json, re
 import click
+import pytest
 from redash_toolbelt import Redash
 
 
-# This regex captures three groups:
-#
-# 0. A FROM or JOIN statement
-# 1. The whitespace character between FROM/JOIN and table name
-# 2. The table name
-PATTERN = re.compile(r"(?:FROM|JOIN)(?: )([^\s\(\)]+)", flags=re.IGNORECASE)
-
-
 def find_table_names(url, key, data_source_id):
 
     client = Redash(url, key)
@@ -29,18 +22,31 @@ def find_table_names(url, key, data_source_id):
     ]
 
     tables_by_qry = {
-        query["id"]: [
-            match
-            for match in re.findall(PATTERN, query["query"])
-            if match in schema_tables or len(schema_tables) == 0
-        ]
+        query["id"]: extract_table_names(query["query"], schema_tables)
         for query in queries
-        if re.search(PATTERN, query["query"])
     }
 
     return tables_by_qry
 
 
+def extract_table_names(str_sql, schema_tables=()):
+
+    # This regex matches three parts but captures only the last one:
+    #
+    # 0. A FROM or JOIN keyword (non-capturing)
+    # 1. The whitespace character(s) between FROM/JOIN and table name (non-capturing)
+    # 2. The table name (the only capture group)
+    PATTERN = re.compile(
+        r"(?:FROM|JOIN)(?:\s+)([^\s\(\)]+)", flags=re.IGNORECASE | re.UNICODE
+    )
+
+    return [
+        match
+        for match in re.findall(PATTERN, str_sql)
+        if match in schema_tables or len(schema_tables) == 0
+    ]
+
+
 def print_summary(tables_by_qry):
     """Builds a summary showing table names and count of queries that reference them."""
 
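
The change from `(?: )` to `(?:\s+)` in the hunk above matters for queries that put the table name on its own line or use more than one space. The following standalone sketch compares the two expressions copied from the diff; the names `OLD_PATTERN` and `NEW_PATTERN` and the sample queries are hypothetical and exist only for this comparison.

```python
import re

# Expression removed by this change: exactly one literal space after the keyword.
OLD_PATTERN = re.compile(r"(?:FROM|JOIN)(?: )([^\s\(\)]+)", flags=re.IGNORECASE)

# Expression added by this change: one or more whitespace characters of any kind.
NEW_PATTERN = re.compile(
    r"(?:FROM|JOIN)(?:\s+)([^\s\(\)]+)", flags=re.IGNORECASE | re.UNICODE
)

# Hypothetical query formatted like test_5: table names on their own lines.
multiline_sql = "SELECT field\nFROM\n    table0\nLEFT JOIN\n    table1"

# Hypothetical query with two spaces after each keyword.
double_space_sql = "SELECT field FROM  table0 LEFT JOIN  table1 ON 1 = 1"

old_multiline = OLD_PATTERN.findall(multiline_sql)
new_multiline = NEW_PATTERN.findall(multiline_sql)
old_double = OLD_PATTERN.findall(double_space_sql)
new_double = NEW_PATTERN.findall(double_space_sql)

print(old_multiline, new_multiline)
print(old_double, new_double)
```

The old expression finds nothing in either query because the character after the keyword is a newline or is followed by a second space. The new expression returns both table names in each case.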
@@ -92,3 +98,68 @@ def main(url, key, data_source_id, detail):
 
 if __name__ == "__main__":
     main()
+
+def test_1():
+
+    sql = """
+        SELECT field FROM table0 LEFT JOIN table1 ON table0.field = table1.field
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['table0', 'table1']
+
+def test_2():
+
+    sql = """
+        SELECT field FROM table0 as a LEFT JOIN table1 as b ON a.field = b.field
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['table0', 'table1']
+
+def test_3():
+
+    sql = """
+        SELECT field FROM table0 a LEFT JOIN table1 b ON a.field = b.field
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['table0', 'table1']
+
+def test_4():
+
+    sql = """
+        SELECT field FROM schema.table0 a LEFT JOIN schema.table1 b ON a.field = b.field
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['schema.table0', 'schema.table1']
+
+def test_5():
+
+    sql = """
+        SELECT field
+        FROM
+        table0
+        LEFT JOIN
+        table1
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['table0', 'table1']
+
+def test_6():
+
+    sql = """
+        SELECT field FROM table1,table0
+        WHERE table0.field = table1.field
+    """
+
+    tables = extract_table_names(sql)
+
+    assert tables == ['table1', 'table0']
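
test_6 fails against the expression shown in this diff because `[^\s\(\)]+` treats `table1,table0` as a single token. One possible way to handle comma-delimited table lists is sketched below. This is not the approach the later commits in this pull request take (that code is not shown in this diff); `LIST_PATTERN` and `extract_table_names_with_commas` are hypothetical names, and the sketch deliberately ignores aliases inside a comma list such as `FROM table0 a, table1 b`.

```python
import re

# Assumption: capture the first table token plus any further tokens that are
# separated only by commas (with optional surrounding whitespace).
LIST_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([^\s(),]+(?:\s*,\s*[^\s(),]+)*)", flags=re.IGNORECASE
)


def extract_table_names_with_commas(str_sql, schema_tables=()):
    """Return FROM/JOIN targets, splitting comma-delimited lists into names."""
    tables = []
    for table_list in LIST_PATTERN.findall(str_sql):
        for name in table_list.split(","):
            name = name.strip()
            # Same schema filter as the script: keep everything if no list is given.
            if name in schema_tables or len(schema_tables) == 0:
                tables.append(name)
    return tables


# The query from test_6.
sql = """
    SELECT field FROM table1,table0
    WHERE table0.field = table1.field
"""

tables = extract_table_names_with_commas(sql)
print(tables)
```

Checking the length and the sorted contents, rather than strict list equality, keeps such a test independent of the order in which tables appear in the query.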