2022-08-25 |
billymg |
cool, yeah, i believe the published source is up-to-date with what i'm running myself, except an extra line i added recently to ignore loopback addresses (which didn't cause any noticeable problems until i started running a trb node on the same box as the crawler) |
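A minimal sketch of what a loopback filter of this kind could look like, using only the Python standard library. The function name `is_crawlable` and the sample peer list are hypothetical; the actual line in the crawler is not shown in the log.

```python
import ipaddress

def is_crawlable(host):
    """Return False for loopback or unparseable addresses (hypothetical helper)."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address at all, so skip it.
        return False
    return not addr.is_loopback

# Sample input: two loopback addresses, one routable address, one junk string.
peers = ["127.0.0.1", "205.134.172.27", "::1", "not-an-ip"]
crawlable = [p for p in peers if is_crawlable(p)]
```

Filtering at the point where addresses enter the queue keeps a node on the same box from being reported back to the crawler as a remote peer.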
2022-08-25 |
billymg |
if you meant the crawler data, no export feature at the moment, but running the crawler yourself is fairly trivial and the source can be grabbed here: http://billymg.com/bitdash-crawler-vtree/ |
2022-08-25 |
billymg |
mats: do you mean the logs or the crawler? if logs, then as asciilifeform mentioned (same url structure on logs.bitdash.io since they're the same proggy), plus the entire db dump linked in the footer |
2022-06-20 |
billymg |
i'm going to rent a server as a stopgap until another dulap becomes available in asciilifeform's rack. i'm going to move the crawler (currently on ec2) and the logger (currently on an rk) to this box. additionally i'll spin up a trb node on it, wainot |
2022-05-06 |
billymg |
might be something worth time series charting on the crawler www now that i think about it (trb distance behind prb) |
2022-05-06 |
billymg |
whaack: i noticed on the crawler a day or so ago all the trb nodes stuck at like 133XXX (i wanna say 133411 or around there) when prb was well into the 134XXX range |
2022-04-27 |
billymg |
yeah the crawler www is on ec2 now, the rk couldn't handle all the work done for the homepage |
2022-04-16 |
billymg |
new site is up and all crawler related stuff has been temporarily moved to ec2. the logger is still on the rk in asciilifeform's rack, only now with more resources to itself |
2022-04-04 |
billymg |
btw, still a WIP but i've got an updated version of the crawler www running here now: http://dev.bitdash.io/ |
2022-04-04 |
billymg |
whaack: for the longest time no other nodes were returning it as a peer (which is currently the only way the crawler can discover a new node). i noticed yesterday that it was finally found and queried to see who had reported it |
2022-04-03 |
billymg |
looks like the crawler finally found whaack's new node |
2022-03-25 |
billymg |
nothing appears to be wrong with the crawler either, its logs continue to print out "found new peer!" almost hourly (which it finds when it pings and gets a response from a previously unknown peer returned to it by the getaddr call to an existing node) |
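The discovery rule described here (a node is only found if some already-known node returns it and it then answers a ping) can be sketched as a simple queue walk. This is an illustration under assumptions, not the crawler's code: `get_peers` and `responds` are hypothetical callables standing in for the getaddr request and the ping.

```python
from collections import deque

def crawl(seeds, get_peers, responds):
    """Walk the network starting from seeds; return the set of known nodes."""
    known = set(seeds)
    tried = set(seeds)          # addresses already pinged, alive or not
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for peer in get_peers(node):
            if peer in tried:
                continue
            tried.add(peer)
            if responds(peer):
                print("found new peer!", peer)
                known.add(peer)
                queue.append(peer)
    return known

# Toy network: "e" is alive and connected, but no node lists it as a peer.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["a"]}
alive = {"a", "b", "d", "e"}
found = crawl(["a"], lambda n: graph.get(n, []), lambda p: p in alive)
```

In the toy network, "e" is never discovered even though it is up, which matches the situation described for a node that no scanned peer returns.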
2022-03-25 |
billymg |
ah, this reminded me. whaack, i queried the crawler's data to answer your question: http://paste.deedbot.org/?id=t1N8 |
2022-03-22 |
billymg |
not the www portion, just the actual crawler part |
2022-03-22 |
billymg |
yeah, don't think i'll need anything fancy. i've started reading SICP and i think when i get far enough along in that i want to re-write my crawler in lisp |
2022-03-11 |
billymg |
whaack: in the case of those two you spot checked just now, the second indeed hasn't returned peers in a while (possibly ever, the crawler doesn't track this) |
2022-03-11 |
billymg |
whaack: asciilifeform's watchglass intentionally does not include the relay byte in order to be compatible with trb. my crawler tries both with/without that byte in order to coax peers out of the node being probed |
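A sketch of the two payload variants, assuming the standard bitcoin version message layout with the relay flag as an optional trailing byte. It builds the payload only (no message header or checksum), and the function names, user agent string, port, and timestamp are hypothetical values chosen for illustration; this is not watchglass's or the crawler's code.

```python
import struct

def net_addr(ip4, port, services=0):
    """Encode an address field: services, IPv4-mapped IPv6 address, port."""
    ip_bytes = bytes(10) + b"\xff\xff" + bytes(int(octet) for octet in ip4.split("."))
    return struct.pack("<Q", services) + ip_bytes + struct.pack(">H", port)

def version_payload(version, peer_ip, peer_port, timestamp, with_relay,
                    user_agent=b"/sketch:0.1/", start_height=0, nonce=0):
    """Build a version payload, optionally ending with the relay byte."""
    assert len(user_agent) < 0xFD  # keeps the length prefix to one byte
    payload = struct.pack("<iQq", version, 0, timestamp)
    payload += net_addr(peer_ip, peer_port)       # receiving node
    payload += net_addr("0.0.0.0", 0)             # sending node (placeholder)
    payload += struct.pack("<Q", nonce)
    payload += bytes([len(user_agent)]) + user_agent
    payload += struct.pack("<i", start_height)
    if with_relay:
        payload += b"\x01"                        # the optional relay byte
    return payload

# Same probe, built both ways, so a caller can try one and then the other.
without_relay = version_payload(70001, "205.134.172.27", 8333, 1646956800, False)
with_relay = version_payload(70001, "205.134.172.27", 8333, 1646956800, True)
```

The two payloads differ only in the final byte, so trying both costs one extra connection attempt per node.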
2022-03-11 |
billymg |
which is currently the only way for the crawler to discover new nodes, until this feature is added |
2022-03-11 |
billymg |
whaack: for some reason no other node scanned by the crawler has returned your new node as one of its peers |
2022-03-09 |
billymg |
yeah dunno, your node is obviously connected to (and returning) plenty of peers. just that none of those peers has included your node in its list yet (at least not in what it returns to the crawler's getaddr requests) |
2022-03-09 |
billymg |
i'm a bit curious why the crawler hasn't picked up your new node by now. all of those peers returned by watchglass are heathen nodes btw, i just looked them up in the db |
2022-03-06 |
* |
billymg just went to press a new patch for the crawler and realized there's a typo in the root directory, will have to regrind the first two |
2022-02-25 |
billymg |
finally getting back to working on the crawler, i've implemented geolocation (ty for the recommendation punkman) and time series data collection, for charting |
2022-02-14 |
billymg |
whaack: makes sense. like i said my crawler was network i/o bound when single-threaded, adding threading allowed it to send out pings and process results from 100s of nodes simultaneously (whatever you set the max_sockets knob to in the crawler's config) |
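A rough sketch of the threading pattern described, using a thread pool whose size plays the role of the max_sockets knob. The `probe` body is a placeholder sleep rather than a real network call, and the names `MAX_SOCKETS`, `probe`, and `probe_all` are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_SOCKETS = 100  # hypothetical stand-in for the max_sockets config value

def probe(node):
    """Placeholder probe: the sleep stands in for waiting on a node's reply."""
    time.sleep(0.05)
    return node, True

def probe_all(nodes, max_sockets=MAX_SOCKETS):
    """Probe many nodes at once, with at most max_sockets in flight."""
    with ThreadPoolExecutor(max_workers=max_sockets) as pool:
        return dict(pool.map(probe, nodes))

results = probe_all([f"10.0.0.{i}" for i in range(50)])
```

Because each thread spends nearly all of its time blocked on the network, the waits overlap and total wall time is roughly one probe's latency per batch of max_sockets nodes.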
2022-02-14 |
billymg |
my crawler uses threading only because the only bottleneck there was network io (waiting for node responses), so a single python process is more than enough |
2022-02-14 |
billymg |
the logotron and crawler both run on flask atop apache so unfortunately i'm already familiar with it |
2022-01-30 |
billymg |
prior to that it was getting some bogus queries through the crawler www, e.g. lookup info where host=drupal.php |
2022-01-25 |
billymg |
asciilifeform: oh, heh, crawler lost its pg connection (doesn't have the auto-reconnect feature yet), probably what freed up the resources for the logger http://bitdash.io/ |
2022-01-20 |
billymg |
no rush on my end, there are still features i'd like to add to the crawler, and some guides i'd like to publish |
2022-01-13 |
billymg |
yeah, i was thinking of adding bot UI to crawler |
2022-01-05 |
billymg |
yeah, it's on the same box as the crawler, so that could have something to do with it |
2021-12-01 |
billymg |
asciilifeform: any word on when this will be ready? i'm working on some updates to the crawler's www and could use the extra horsepower |
2021-09-17 |
billymg |
alright, i appreciate the info, will look more into SQLite and maybe do a test of it in the crawler. in the meantime might see about just adding some reconnect logic to these programs |
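One possible shape for such reconnect logic, sketched without any database driver so it runs on its own. The class name `ReconnectingDB` and its parameters are hypothetical; the idea is only that a failed operation drops the cached connection and retries on a fresh one.

```python
import time

class ReconnectingDB:
    """Run operations against a connection, reconnecting after failures."""

    def __init__(self, connect, retries=3, delay=1.0, errors=(Exception,)):
        self._connect = connect    # zero-argument callable returning a connection
        self._retries = retries    # extra attempts after the first failure
        self._delay = delay        # seconds to wait between attempts
        self._errors = errors      # exception types treated as a lost connection
        self._conn = None

    def run(self, operation):
        """Call operation(conn); on a connection error, reconnect and retry."""
        last_error = None
        for attempt in range(self._retries + 1):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return operation(self._conn)
            except self._errors as exc:
                last_error = exc
                self._conn = None  # force a fresh connection next time
                if attempt < self._retries:
                    time.sleep(self._delay)
        raise last_error
```

With psycopg2, one would presumably pass something like `lambda: psycopg2.connect(dsn)` as `connect` and `(psycopg2.OperationalError, psycopg2.InterfaceError)` as `errors`; `dsn` is a placeholder here.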
2021-09-17 |
billymg |
my setup is fairly small/simple. only two programs writing (logger and crawler) and two reading (their respective wwws) |
2021-09-17 |
billymg |
asciilifeform: so potentially the crawler is at times tying up postgres such that it times out for the logger? |
2021-09-17 |
billymg |
hmm, actually possibly the crawler still has its connection now |
2021-09-17 |
billymg |
the damn thing keeps losing its postgres connection (same thing happens to my crawler too, and they both stop working at the same time until restarted) |
2021-09-08 |
billymg |
caught a bug in my crawler's genesis though, where two of the sql queries use a different index name than the one that gets defined when initializing from bitdash_schema.sql. i'll post a regrind of the genesis soon but if anyone runs into it the fix is to change the two instances of 'ON CONFLICT ON CONSTRAINT unique_host DO UPDATE SET' to 'ON CONFLICT ON CONSTRAINT |
2021-09-08 |
billymg |
http://logs.nosuchlabs.com/log/asciilifeform/2021-09-07#1056800 << this method works, was able to get the crawler running. i installed all the python libs i needed by specifying exact versions, e.g. `pip install -Iv psycopg2==2.8.6`, and at least with my small list of required deps (flask, psycopg2, and requests) all were available |
2021-09-07 |
billymg |
asciilifeform: i plan to write up a complete guide for this build, including the source files and tarballs where not the default, once this is all done (so far have working mp-wp, just need the crawler and logotron now) |
2021-09-04 |
billymg |
cgra: ^ essentially that list, though asciilifeform's watchglass has a configurable knob for 'peershots', my crawler has that set to 5, not sure what alf's watchglass is set to |
2021-08-11 |
billymg |
http://logs.nosuchlabs.com/log/asciilifeform/2021-08-10#1051875 << the 36 number is any TRB node the crawler has encountered since it started running on my server. the homepage now has a perhaps more useful number: trb nodes active in last 48hrs http://bitdash.io/ |
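The difference between "ever encountered" and "active in last 48hrs" comes down to a cutoff on the last successful probe time. A small sketch, assuming a mapping of host to last-seen timestamp; the function name, data shape, and sample hosts are hypothetical and the real site presumably does this in SQL.

```python
from datetime import datetime, timedelta

def active_since(last_seen, now, window=timedelta(hours=48)):
    """Return hosts whose last successful probe falls within the window."""
    cutoff = now - window
    return sorted(host for host, seen in last_seen.items() if seen >= cutoff)

now = datetime(2021, 8, 11, 12, 0)
last_seen = {
    "node-a": datetime(2021, 8, 11, 9, 0),   # 3 hours ago
    "node-b": datetime(2021, 8, 9, 13, 0),   # 47 hours ago
    "node-c": datetime(2021, 7, 1, 0, 0),    # weeks ago
}
active = active_since(last_seen, now)
```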
2021-07-21 |
billymg |
my crawler uses watchglass via an 'import watchglass' statement at the top of the file, but i'm only using a couple methods out of it |
2021-07-20 |
billymg |
asciilifeform: yeah, i swear when i was running my crawler previously, a month or so ago, trb nodes always returned a reasonable number of nodes (double or low triple digit counts) |
2021-07-20 |
billymg |
punkman: fake as in not even the heathen crawlers count them as real or ever having existed |
2021-07-17 |
billymg |
just restarted the crawler with peershots=5, it finishes scanning all nodes in the network in about 20 minutes |
2021-07-17 |
billymg |
signpost: interesting, the crawler results do seem to show that it's capped somewhere at about 2000 (i've never seen higher than 2001) |
2021-07-17 |
billymg |
the crawler www is now browsable |
2021-07-13 |
billymg |
the prb crawler has an api, maybe later at some point i could add in some automated cross referencing |
2021-07-10 |
billymg |
http://logs.nosuchlabs.com/log/asciilifeform/2021-07-10#1044769 << nice, looking at this node helped me identify a bug in my crawler |
2021-07-08 |
billymg |
asciilifeform: the peer lists have been captured (the crawler now stores probe history up to N probes, as set in conf). i'll dump the results somewhere permanent before the cap is reached |
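A sketch of a per-node probe history capped at N entries. It assumes the oldest probe is dropped once the cap is reached, which the log does not confirm; `MAX_PROBES`, `history`, and `record_probe` are hypothetical names, and the real crawler keeps this in postgres rather than in memory.

```python
from collections import defaultdict, deque

MAX_PROBES = 3  # hypothetical stand-in for the N set in conf

# One bounded history per host; appending past the cap discards the oldest.
history = defaultdict(lambda: deque(maxlen=MAX_PROBES))

def record_probe(host, peers):
    """Store the peer list returned by one probe of host."""
    history[host].append(list(peers))

for i in range(5):
    record_probe("205.134.172.27", [f"peer-{i}"])
```

Under that assumption, anything worth keeping has to be dumped elsewhere before it rolls off the end of the history.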
2021-07-08 |
billymg |
asciilifeform: it's consistent across all trb nodes that my crawler has picked up, and all in the last 1-2 hours (first time i've observed it since running this thing) |
2021-07-08 |
billymg |
asciilifeform: is 205.134.172.27 your node? my crawler is showing that sometime in the last hour or so it jumped from around ~40 connected peers (mostly good) to ~1200 peers (mostly fake/spam) |
2021-06-30 |
billymg |
i'm also looking at https://plotly.com/python/ as a potential library for rendering charts/graphs on the crawler www, in case anyone has experience with either, or has other recommendations |
2021-06-29 |
billymg |
asciilifeform: i'm also getting close to making the crawler site live, at least a basic version so that others can take a look and provide feedback. at that point i think i'll need to upgrade from my rk to a bigger rig, especially since i also want to run whaack's block explorer on there |
2021-05-19 |
billymg |
trinque: fwiw my reason for working on the btc network crawler and new www to display obvious centralization of network is to attract more hands |
2021-05-09 |
billymg |
asciilifeform: from there if you added some watchglass methods for getting blocks i could then incorporate those into the crawler (if that is what you meant by trying to analyze block propagation) |
2021-05-09 |
billymg |
as soon as that's up will publish genesis for the crawler portion |
2021-05-09 |
billymg |
my goal in making this crawler is to get more "bitcoiners" running trb nodes, and i suspect some al gore / nate silver stats and infograffix will widen the pool of those who see there is a problem |
2021-05-09 |
billymg |
the ~8500 "actual" nodes number seems to be in line with what heathen trackers report as the total number of nodes on the network (~9k), so i suspect the crawler is nearly complete in mapping out reachable nodes |
2021-05-09 |
billymg |
good morning, asciilifeform. my crawler seems to have hit a spam vein on the network, total unique IPs in the db exploded over the last two days to over 100k (these all come only from what a node returns in a 'getaddr' request). of those, when subsequently interrogated, only 8580 respond with a valid version message |
2021-05-07 |
billymg |
i initially looked at it with the idea of repurposing for my crawler, barfed at 1001 dependencies pulled in, then remembered, "hey, watchglass does this" |
2021-05-07 |
billymg |
the updated version of the crawler has been humming along nicely since last night, it's now up to ~4900 nodes discovered (heathen sites report over 9000) |
2021-05-05 |
billymg |
http://logs.nosuchlabs.com/log/asciilifeform/2021-05-05#1035654 << this wasn't even on my mp-wp todo list but since using postgres for the crawler it's now jumped near the top |
2021-05-04 |
billymg |
from here i'm just going to proceed with making the crawler send the correct version message depending on whether it's trying to reach a trb or prb node. but yes, perhaps could write it so it tries once with '99999' then tries with '70001' and records result |
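The "try once with '99999' then with '70001' and record the result" idea could look roughly like this. `send_version` is a hypothetical callable that returns True when the node answers a version message carrying the given protocol version; the sample hosts are made up.

```python
def probe_with_fallback(node, send_version, versions=(99999, 70001)):
    """Try each protocol version in order; return the first that gets a reply."""
    for version in versions:
        if send_version(node, version):
            return version
    return None  # node answered none of the attempts

# Toy stand-in: each node only answers one particular version number.
answers = {"trb-node": 99999, "prb-node": 70001}
def fake_send(node, version):
    return answers.get(node) == version

recorded = {n: probe_with_fallback(n, fake_send)
            for n in ["trb-node", "prb-node", "dead-node"]}
```

Recording which version succeeded lets later probes of the same node start with the version that worked before.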
2021-05-03 |
billymg |
asciilifeform: ah, interesting. now i'm wondering what the crawler could do to coax a node into sending a 'heathen command' in a reasonable amount of time |
2021-05-03 |
billymg |
asciilifeform: i think whaack was working on a replacement block explorer. this thing i'm building is much simpler, just a network crawler |
2021-05-03 |
billymg |
my reason for doing so was because both bitnodes and coin.dance stopped tracking trb with their crawlers. i used to be able to check from time to time to see how many trb nodes are out there |
2021-05-03 |
billymg |
asciilifeform: i wrote a simple btc network crawler that uses watchglass for node probing and dumps the results into a postgres db. it's been running since yesterday afternoon, here are some stats so far: http://paste.deedbot.org/?id=yTLT |