Code Review

by Brown, Thursday, January 14, 2021, 16:28 (255 days ago)

Cezary Kaliszyk went through the Proofgold code recently and found a number of issues. In some cases fixing the issues led to significant speedup in the code (e.g., making use of tail recursion, chaning garbage collection settings, changing defaults on the database caching, reading int32/hashval byte by byte instead of bit by bit when possible, and other issues I don't recall offhand).

There were some other suggestions he gave that haven't been followed up on yet:

1. There are many places in the code where a try is paired with an arbtrary catch-all wildcard. This is at least bad style since in principle we should know what exceptions are possible at the given try. It is also a likely way to introduce real bugs since an exception intended to be caught somewhere else is caught "early". We should also make sure all exceptions are actually caught in order to avoid the process "silently exiting".

2. It's likely that some of the recursive functions can be rewritten to be tail recursive, making them significantly more efficient.

3. Currently there's no (or almost no?) signal handling, but obviously there should be.

There's more, but I can add them when they come to mind.

As for (1) I noticed there are some exceptions defined in the code that are never raised, so these could be deleted (unless I'm missing something):

blocktree: NoReq
ctre: MissingCTreeElt InsufficientInformation FoundHashval
mathdata: UnknownSigna UnknownTerm NotKnown NonNormalTerm TermLimit BetaLimit

We should probably identify where every exception might be raised and might be caught, and especially which exceptions can escape a function or pass through a function. I don't know a quick way of doing this and there are well over 1000 functions, so doing it by hand seems unrealistic.

As for (2) there are over 300 recursively defined functions, so it will take some time to look through them and determine if they can be made tail recursive. I'm not as motivated as I'd like to be to start going through them by hand.

Hope this helps.

Code Review

by Brown, Friday, January 15, 2021, 10:11 (255 days ago) @ Brown
edited by Brown, Friday, January 15, 2021, 10:15

I gathered some more information and collected it into a form others might find useful.

There are 32 Proofgold modules, though two of them (config and proofgold) are a bit different. config.ml is created by the configure script and has values specific to the node setup. proofgold.ml is the "main" file that starts all the threads and then runs the console or daemon. (Technically there is a 33rd module proofgoldcli, but this is just a command line interface to communicate with the daemon if it's running).

Each module may define types, values and exceptions. Some values are functions and some of these functions are recursive. Generally I think a code review/code documentation should report the following about each item.

Each type should be given a short text description.

Each exception should be given a short text description, indicate which functions raise the exception and indicate which functions catch the exception.

Each value should be given a short text description. If the value is a function, then the following should be reported:

1. Are there bugs or potential bugs?
2. Does the function terminate and is it intended to?
3. What exceptions can be raised directly by the function?
4. What exceptions can be raised indirectly by the function via an internal function call?
5. What exceptions can be caught by the function?
6. Are there any critical sections in the code? What could go wrong if two threads entered the critical section at onece?

If the function is recursive, then the following should be reported:

7. Is the function tail recursive? If not, could it be modified to be tail recursive?

Here is some lisp code collecting the names of the modules, what modules each depend upon and what items are defined in the module:

http://s000.tinyupload.com/?file_id=19399511542592928349

In total there are almost 1600 items, so a full review and documentation will take some time and is more than I can do by myself. The review can be done in 17 phases where the modules with no dependencies are reviewed first (in parallel in principle) and later modules can be done after previous phases are complete. Calling (review-outline) lists the "plan" with 17 phases. Calling (review-outline t) gives the "plan" with the 1600 items included under each module where it is declared. This full outline is available here:

http://s000.tinyupload.com/?file_id=84028890324646711754

I'll post the outline without items in a comment below.

The "short text descriptions" of many items can be taken from the Qeditas documentation:

http://qeditas.org/docs/QeditasTechDoc.pdf

Note however that the Qeditas documentation is about 5 years out of date, so it will be up to the reviewer/documenter to determine if the old description still applies.

Code Review Outline

by Brown, Friday, January 15, 2021, 10:12 (255 days ago) @ Brown

- Phase 0 (3.6%): (config version)
+ config (3.4%)
+ version (.2%)
- Phase 1 (2.7%): (zarithint utils)
+ zarithint (1.8%)
+ utils (.8%)
- Phase 2 (5.6%): (json ser hashaux secp256k1)
+ json (.7%)
+ ser (2.6%)
+ hashaux (.7%)
+ secp256k1 (1.6%)
- Phase 3 (3.7%): (ripemd160 sha256)
+ ripemd160 (1.5%)
+ sha256 (2.2%)
- Phase 4 (4.6%): (hash)
+ hash (4.6%)
- Phase 5 (4.7%): (cryptocurr db htree logic)
+ cryptocurr (2.1%)
+ db (1.7%)
+ htree (.7%)
+ logic (.1%)
- Phase 6 (17.4%): (signat net mathdata setconfig)
+ signat (2.3%)
+ net (5.0%)
+ mathdata (8.4%)
+ setconfig (1.7%)
- Phase 7 (11.2%): (ltcrpc checking inputdraft)
+ ltcrpc (3.9%)
+ checking (6.0%)
+ inputdraft (1.3%)
- Phase 8 (7.0%): (script assets)
+ script (2.7%)
+ assets (4.3%)
- Phase 9 (3.6%): (tx)
+ tx (3.6%)
- Phase 10 (13.3%): (ctre)
+ ctre (13.3%)
- Phase 11 (.7%): (ctregraft)
+ ctregraft (.7%)
- Phase 12 (4.6%): (block)
+ block (4.6%)
- Phase 13 (4.8%): (blocktree)
+ blocktree (4.8%)
- Phase 14 (7.7%): (commands)
+ commands (7.7%)
- Phase 15 (1.3%): (staking)
+ staking (1.3%)
- Phase 16 (3.6%): (proofgold)
+ proofgold (3.6%)

Code Review (Phase 0)

by Brown, Friday, January 15, 2021, 19:12 (254 days ago) @ Brown

It was clear Phase 0 (config and version) would be relatively easy since there are no functions. About half the values in config were already described in the Qeditas documentation and could be ported over. I did find a few values in config that aren't used anywhere in the code (except maybe setconfig to reset the value), so those could be deprecated:

ltcnotifyport, ltcblockcheckpoint, maxburnrate, ordermatcher

lastcheckpoint is also unused, but I'm guessing this is there in case checkpoints start being used in the future.

Short text items (some ported from the Qeditas documentation) in lisp format is here"

http://s000.tinyupload.com/index.php?file_id=33232730362089140902

I was going to include the text descriptions in this message, but there's a 5000 character limit. I could include them in two replies, but I'm not sure I should. People can just download the link.

We should be able to now move on to Phase 1 (zarithint and utils). These shouldn't be too difficult either. zarithint is basically just a wrapper for zarith.

Code Review (Phase 1)

by Brown, Saturday, January 16, 2021, 12:02 (254 days ago) @ Brown

I wrote descriptions for the Phase 1 items, the ones from the modules zarithint and utils, so now we can start at Phase 2. This process is tedious, so even though I made it through the first two phases in two days shouldn't lead anyone to believe the rest will go as quickly. Without help, the rest probably won't happen at all.

I also refined the things I called "values" before into four kinds of values: constants, refs, hash tables and (non-recursive) functions. Constants are the easiest to descrbie. Refs and hash tables record some state that gets updates, so these require more effort. Functions are the most common kind of value and were the ones I was thinking of in my previous post.

Here is the updated modules.lisp file with the refined values:

http://s000.tinyupload.com/index.php?file_id=01310418650879839563

Here is the updated text file listing the phases, modules and items:

http://s000.tinyupload.com/index.php?file_id=09935063691758664210

Here is an updated lisp file with descriptions of items from the modules in Phase 0 (config, version) and Phase1 (zarith, utils).

http://s000.tinyupload.com/index.php?file_id=12350578141615263308

Code Review (Phase 1)

by BlakeKeiller, Sunday, January 17, 2021, 09:26 (253 days ago) @ Brown

I converted your lisp data into sql and made a basic web page so people can see the Proofgold Code 0.2.3 code annotated by your descriptions. Here are the four modules from your Phase 0 and Phase 1:

config: https://proofgold.org/code.php?m=1
version: https://proofgold.org/code.php?m=4
zarithint: https://proofgold.org/code.php?m=2
utils: https://proofgold.org/code.php?m=5

It's also possible to refer to a specific item:

https://proofgold.org/code.php?m=5#116

There are four modules listed in Phase 2:

json: https://proofgold.org/code.php?m=3
ser: https://proofgold.org/code.php?m=6
hashaux: https://proofgold.org/code.php?m=12
secp256k1: https://proofgold.org/code.php?m=7

I'm considering trying to crowdsource the code review, probably via memo.cash, giving out some BCH to people who look at an item, give a description and answer the basic questions (what exceptions are raised? is it tail recursive?). I can copy useful comments/answers/bug reports into sql entries so they'll be visible via the interface.

Of course, people can also join this forum and give comments and I will copy those over to sql entries as well. For people who don't mind giving their identity to github, they can also use the ai4reason github repo. I'll use comments/review/reports from there as long as I somehow end up getting access to them.

Are there any comments on this plan or the interface?

RSS Feed of thread
powered by my little forum