Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary-search-tree: Binary Search Tree Exercise #513

Open
bmulvihill opened this issue Dec 21, 2016 · 12 comments
Open

binary-search-tree: Binary Search Tree Exercise #513

bmulvihill opened this issue Dec 21, 2016 · 12 comments

Comments

@bmulvihill
Copy link
Contributor

Hey I've been working on the BST exercise over at exercism/crystal#72 - and while researching other implementations I noticed a lot aren't fully implementing a binary search tree.

We are only implementing and testing insertion and traversal, but not searching and deletion. A binary search tree is a great data structure to learn and implementing all the operations provide additional challenge to the programmer. Maybe we could structure the tests in approximate descending difficulty - search, traverse, insert, delete.

I also think we should separate tests which are language specific (with some sort of annotation) from the actual BST functionality (i.e. returning an enumerator).

@remcopeereboom
Copy link
Contributor

We are only implementing and testing insertion and traversal, but not searching and deletion. A binary search tree is a great data structure to learn and implementing all the operations provide additional challenge to the programmer. Maybe we could structure the tests in approximate descending difficulty - search, traverse, insert, delete.

Some chaotically written down thoughts:

I like to add those options as suggestions when I nit them, but I kind of feel like search (#include?) should be mandatory for a BST. A nice thing about using #include? (I think) is that you can either include Enumerable, which gives you an easy solution, but isn't very performant ( vs ), or you can define it yourself to be more efficient. This is a nice way to let people decide for themselves which trade-offs they want to make.

It's worth pointing out that it's not necessarily standard to implement delete operations for binary trees, because in general doing so changes the performance behaviour of searching. Unless I'm much mistaken, depending on the order of operations, it's possible to end up with a tree in which searching takes .

There are also different ways of traversing the tree. Should we add those as well? More generally, what would comprise a "complete BST" and what goals would adding each requirement provide? Is the goal to introduce exercists to the data structure, to the ruby language and its features, or something else?

@bmulvihill
Copy link
Contributor Author

bmulvihill commented Jan 26, 2017

@remcopeereboom When thinking about what operations to include on a BST I look back to CS Data Structures course and they usually always go through the main 4 (Traverse, Search, Insert, Delete).

Quick question how would implementing delete change the behavior of search? I've always seen these methods implemented separately of each other.

Regarding the different ways of traversal I don't think there is a "wrong" way, but through nits you can encourage more efficiency. I think the goal for this particular problem (and any algorithm type problem) is not only to practice the language but also learn the algorithm/data structure and its benefits since they are building blocks for a deeper understanding of programming in general.

@remcopeereboom
Copy link
Contributor

remcopeereboom commented Jan 26, 2017

@bmulvihill

Quick question how would implementing delete change the behavior of search? I've always seen these methods implemented separately of each other.

The delete operations can cause the tree to be unbalanced towards one or the other direction. It's possible to end up with a tree that looks like:

/\
  /\
    /\

This transforms the search from logarithmic to linear worst case and average case performance. The problem stems from the fact that while it is trivial to delete nodes that have either zero children or one child, deleting nodes that have two children mean that you now have to select a branch to be the top node. Depending on your bias towards either the predecessor or the successor child node, this can eventually lead to either a left-biased or a right-biased tree respectively. This will lead to a tree that has linear performance for the major operations.

You can mitigate the problem by alternating left and right biasses. Experimentally (after many random insertions or deletions) this leads to O(sqrt n) average case performance, because deletions and insertions take place in random branches (apparently the alternating bias isn't quite enough to keep the tree perfectly balanced, but still good enough to prevent the tree from being linear). I'm pretty sure that nobody has proven this algebraically yet however, so that is still an open problem in computer science!

@bmulvihill
Copy link
Contributor Author

@remcopeereboom
Couldn't this problem also exist in insertion if we are always inserting values to one side of the tree, and wouldn't this be more of a problem of balancing the tree? Which can be solved by more specific types of BST (i.e. Red Black Trees). That would be outside the scope of this exercise though, or it could be a harder version of the same problem.

@remcopeereboom
Copy link
Contributor

Couldn't this problem also exist in insertion if we are always inserting values to one side of the tree, and wouldn't this be more of a problem of balancing the tree?

Absolutely. Worst case performance of a BST is O(n), not O(log n). But generally we don't have worst case performance because we don't insert elements in monotonically increasing or decreasing order. Instead we insert elements at random. The average case or expected performance is O(log n) for a BST with random insertions. But only O(sqrt n) for a tree with random insertions and deletions.

Which can be solved by more specific types of BST (i.e. Red Black Trees).

A self balancing tree like a Red-Black Tree does solve this problem. In fact Red-Black trees are used specifically for this purpose!

@bmulvihill
Copy link
Contributor Author

Good points @remcopeereboom I had to do a little research and refresh my brain a bit. I still think there is value in implementing the delete operation as it requires some thought, and is considered one of the basic operations of a BST (a least from an educational standpoint). Through nits one can encourage a more robust BST, or this exercise could just be a "basic" BST and there could be a different exercise that builds on the basic tree and deals with balancing.

@remcopeereboom
Copy link
Contributor

I still think there is value in implementing the delete operation as it requires some thought, and is considered one of the basic operations of a BST (a least from an educational standpoint).

I can get behind that.

Through nits one can encourage a more robust BST, or this exercise could just be a "basic" BST and there could be a different exercise that builds on the basic tree and deals with balancing.

There's value in both approaches. My concern with the nits is that I think most people won't get one. Perhaps some information in the README on the performance would nudge people in the right direction?

@remcopeereboom
Copy link
Contributor

remcopeereboom commented Feb 25, 2017

I've been working on writing new tests for this problem that fix some of the issues discussed above, will provide a little more challenge, and will hopefully guide people a little more to some Ruby conventions for data structures.

I am kind of stuck on whether to force the tree to add nodes in the order they are given or to allow people to reorder the tree as they see fit as long as they hold to the requirements that all value left are <= to the parent and that all values to the right are > to the parent. Didactically I think fixing the node order is best. It also has the benefit of making the tests easier. But the downside is that there is less flexibility in how to implement things. I would love other people's thoughts on that.

This question also has some implications for what to consider tree identity. Do we care only that the tree contains the same items, or do we care in which subtrees they are in. I am strongly leaning towards the latter, because it seems reasonable that if a == b, that a.left == b.left and a.right == b.right. That said, there are plenty of examples where that kind of transitive behaviour does not hold in Ruby (operations on Float's being a prime example).

The proposed (expanded) API:

class BST
  def self.new
    # Returns an empty tree.
  end

  def size
    # Returns the size of the tree.
  end

  def empty?
    # Returns true if the tree is empty, false otherwise.
  end

  def item
    # Returns the item at the root of the tree.
  end

  def left
    # Returns the left sub-tree.
  end

  def right
    # Returns the right sub-tree.
  end

  def include?(item)
     # Returns true if the item is in the tree, false otherwise.
  end

  def insert(item)
    # Inserts an item into the tree.
  end

  def delete(item)
    # Deletes the item if it is in the tree, does nothing if it is not.
  end

  def each_in_order(&block)
    # Yields the elements in sorted order
    # Returns an enumerator to each_in_order if no block is given.
    # Returns self if a block is given.
  end

  def ==(other)
    # Returns false if other is not a BST.
    # Returns true if item in root equals item in other's root and left == other.left and right == other.right,
    # false othewise.
  end
end

@stale
Copy link

stale bot commented Apr 29, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@kotp kotp added the stale label May 1, 2017
@stale stale bot removed the stale label May 1, 2017
@kotp kotp added stale and removed wontfix labels May 1, 2017
@stale stale bot removed the stale label May 1, 2017
@stale stale bot added the stale label Jun 30, 2017
@stale
Copy link

stale bot commented Jun 30, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale
Copy link

stale bot commented Aug 29, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Insti Insti removed the stale label Sep 1, 2017
@stale
Copy link

stale bot commented Oct 31, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 31, 2017
@Insti Insti added the pinned label Oct 31, 2017
@stale stale bot removed the stale label Oct 31, 2017
@Insti Insti changed the title Binary Search Tree Exercise binary-search-tree: Binary Search Tree Exercise Sep 22, 2018
@kytrinyx kytrinyx removed the pinned label Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants