Build trees or depend on database indexes to enhance text search?

Should we create our own trees to make searches on strings faster, or we can depend on database engine non-clustered indexes on char or Varchar fields to do so? The following facts will help us in deciding and understanding what is really happening in indexed char columns:

  • You can make an indexed column of Varchar2 in Oracle and SQL Server.
  • You can use LIKE operator with indexed Varchar2 column in Oracle and char column types in SQL Server.
  • You can use % or _ characters with like operator while scanning in indexed Varchar2 column in Oracle and there’re some limitations as well in SQL server.
  • Both SQL server and Oracle should have limited length of the char column in order to index it.
  • Index will not improve the scan if the leading character was _ or %[1].
  • Indexes in database engines like Oracle or SQL Server is based on balanced trees[2].
  • Doubly linked lists are the method in which SQL server indexes are built.
  • Function based indexes in Oracle can help to build the index according to some function, because functions are always conflicting with indexes and indexes cannot work fine with all kinds of functions. In SQL server doesn’t have function based indexes but instead you can make an index on a computed column.

Find (m) words in (n) words using SQL Server database:

Using an open connection to the database:

  • Get a word from (m).
  • Find the word in (n) using select query, where (n) words are stored in an indexed column, DB will use a balanced tree to find the word.
  • Go to next (m) word.

This takes  O(m log n) in time complexity big O notation calculations.

Now finding (m) words in (n) words using a tree that’s created in the program:

  • Get a word from (m).
  • Find the word in a tree of (n) words.
  • Go to next (m) word.

It takes the same but without the database connection, this will make it faster.

____________________________________________

[1] LIKE Operator, oracle docs.

[2] So…is it a Seek or a Scan?