← 2020-03-03 | 2020-03-05 →
09:48 whaack jfw: hm yup I received that, I will pay more attention to the =freenode tab going forward
~ 1 hour 15 minutes ~
11:03 feedbot http://ossasepia.com/2020/03/04/no-bones-in-thy-skeleton-and-no-theory-in-thy-research/ << Ossa Sepia -- No Bones in Thy Skeleton and No Theory in Thy Research
~ 3 hours 9 minutes ~
14:13 diana_coman jfw: why u no write? It's been a whole week!!1
14:15 diana_coman BingoBoingo: where are you with the scripts? kind of lost track of that part and saw only the drafts.
~ 51 minutes ~
15:06 BingoBoingo diana_coman: Feeding urls from a file to curl using command substitution appears to fit in the hand. I'll clean up the pieces I have and get them in here.
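[A sketch of the "feeding urls from a file to curl using command substitution" approach BingoBoingo mentions; the file names are hypothetical, and the loop form is the safer variant when urls might contain characters the shell would split on:]

```shell
# urls.txt: one url per line (hypothetical file name).
# Command substitution expands the file's contents into curl's argument list:
curl -sL $(cat urls.txt) > pages.html

# A read loop avoids word-splitting surprises and lets each url
# be saved to its own file (slashes and colons mangled to underscores):
while IFS= read -r url; do
    curl -sL "$url" -o "$(echo "$url" | tr '/:' '__').html"
done < urls.txt
```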
15:14 jfw diana_coman: because I'm ffa'ing it apparently, "can't possibly cut elephant into more manageable bites". Published nao.
15:19 diana_coman jfw: ahaha, "eat your elephants in small pieces!"
15:23 diana_coman BingoBoingo: more to the point: what steps do you have working, what did you obtain already with them, what's the next step and where are you with that ?
15:23 BingoBoingo http://paste.deedbot.org/?id=x7wl << The website discovery pieces. I've got a start to a filter for "Things with these file extensions aren't interesting" and a start to a "Does this page have a comment box" tester.
15:24 BingoBoingo Now that gathering works, the next step is cutting out the gathered items that aren't interesting.
15:24 diana_coman BingoBoingo: did you run those on anything? on what? what did you get out of it? where do you run them next?
15:27 diana_coman jfw: so you have in that very article some questions re the signatures thread - why didn't you ask those in #t?
15:30 BingoBoingo diana_coman: I've run them starting from a few different sites. I get at the end a file 'churndomains4' full of website urls.
15:30 BingoBoingo diana_coman: Since running out that many iterations gets very slow, for now I'm testing the filtering on the 'churn3' list of all urls collected from a bunch of discovered sites.
15:30 BingoBoingo This is the most recent 'churn3' I've produced http://paste.deedbot.org/?id=VxhL
15:32 BingoBoingo Here's the most recent (and smaller) churndomains4 http://paste.deedbot.org/?id=bKiT
15:33 diana_coman BingoBoingo: uhm, I don't quite get it - are you after the sites or after all pages of a site? (and even ...images??)
15:35 jfw diana_coman: they were pretty vague in my mind until spelling it all out now. Perhaps even still now, dunno; do they make sense to you?
15:35 diana_coman BingoBoingo: to my mind the initial exploration aims to get literally as many domains as you can reach starting from a given point; so yes, it follows links from there but you don't really need to save other than those that point to *another* domain, do you?
15:36 diana_coman BingoBoingo: what's though the core trouble you are having with this because it seems to me quite obviously going beyond curl/awk/sed/whatever command line ie you just don't see it as clear or specific enough steps at all, can't quite put my finger on it.
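[The discovery stage diana_coman describes — keep only links pointing to *another* domain — could be sketched as below; the seed domain and file names are assumptions, and the href extraction is a rough heuristic, not a proper HTML parser:]

```shell
# Assumes the page was already fetched, e.g. curl -sL "$SEED" > page.html
SEED_DOMAIN="example.com"

# Pull href targets out of the HTML, keep only the domain part,
# then drop anything still on the seed's own domain:
grep -o 'href="http[^"]*"' page.html \
  | sed -e 's/^href="//' -e 's/"$//' \
  | sed -e 's|^https\?://||' -e 's|/.*||' \
  | grep -v "^$SEED_DOMAIN\$" \
  | sort -u > newdomains.txt
```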
15:38 diana_coman jfw: well, your article there is quite highly strung and rather visibly the result of pain-writing; but the way it looks it's quite as you say in footnote 1 - you torture the writing because it's not as definitive as you'd want it to be, huh.
15:41 diana_coman jfw: the thing with questions though is that they are precisely exploratory - it's true that at times you can indeed ask questions to help the other party explore but not *all* questions are like that, lol; at times you literally ask to figure stuff out so yes, necessarily *before* things are clear, lol
15:42 diana_coman jfw: specifically on the questions in footnote iv, the second one assumes the whiteout - it's unclear that is the desired approach to start with so maybe ask *that*? ie how would it work, maybe whiteout or something else/what?
15:44 diana_coman the first one seems quite clear ie the underlying concern is that including signatures in the same place as the vpatch/text requires some clear separation of the roles of those 2 bunches of (ultimately) text; so how is that to be achieved?
15:44 diana_coman jfw: is that what you are asking there?
15:44 jfw so aiming too far even with the questions, hm.
15:45 diana_coman jfw: what do you mean by "too far"?
15:46 jfw trying to cover too much ground and possibly introducing bad assumptions rather than starting with something simpler
15:47 jfw yes, the boundary between sigs and text is the root of it
15:48 BingoBoingo diana_coman: It's not the most elegant approach, but I'll try rearranging and presenting
15:48 BingoBoingo diana_coman: On the first couple rounds I'm after new sites. On the last round I'm after blogposts specifically. The thing I'm chewing on now is cutting the uninteresting stuff out of the file full of urls to images and everything else without stripping it down to the bare domains.
15:48 BingoBoingo diana_coman: As this works now, it curls one site and puts all the urls in a file; the next step produces from that a smaller file of only new site urls; the third step curls the sites, creating a large file of all encountered urls; the fourth step trims it down to sites...
15:48 BingoBoingo diana_coman: So where I want to go is from an "all urls file" to "urls scrubbed of images, .js, .css, etc", from there retrieve urls and screen for comment boxes in the next cut.
15:49 diana_coman jfw: well, you probably have way more practice figuring things out on your own than through discussion, don't you?
15:49 BingoBoingo diana_coman: In between "scrub images etc" and "retrieve urls looking for comment boxes", I'm uncertain if I want to add a "cut the list to 3 or 4" urls per site step.
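[The two cuts BingoBoingo is after — scrubbing uninteresting extensions from the url list, then probing the survivors for a comment box — might look like this; the file names follow the ones in the log ('churn3'), the extension list is an assumption, and the textarea check is a rough heuristic rather than a guarantee:]

```shell
# Step 1: drop urls pointing at images, scripts, styles and the like.
grep -Ev '\.(jpe?g|png|gif|ico|js|css|pdf)(\?.*)?$' churn3 > churn3.pages

# Step 2: fetch each remaining url and look for a comment form;
# a <textarea> is a crude but serviceable tell.
while IFS= read -r url; do
    if curl -sL "$url" | grep -qi '<textarea'; then
        echo "$url" >> commentable.txt
    fi
done < churn3.pages
```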
15:49 jfw diana_coman: yep
15:50 diana_coman jfw: that's pretty much the underlying cause really - in other words simply lack of practice.
15:51 diana_coman and it's quite possibly further coming from the fact that yeah, not much to get from asking questions of the clueless and so on, to the full context; but the solution is still...practice.
15:52 jfw makes sense.
15:53 diana_coman BingoBoingo: it's not about elegant or anything of the sort; but to start with, a program executes a series of steps itself, it doesn't have to be one step one script; the point and my repeated asking for your "steps" is to figure out what you are trying to achieve at one *stage* if you prefer; ie stage 1: discovery of linked domains starting from a given domain; 2. finding all pages with a comment box for a given domain
15:54 diana_coman BingoBoingo: basically you have a big problem to solve; you'll have to cut this into smaller problems so you can solve them; if needed, you cut and cut again (divide and conquer , pretty much)
15:55 diana_coman then once you have one small-enough problem, that you *know* how to solve *manually*, you simply take those manual steps and tell the machine to do them.
15:57 diana_coman http://logs.ossasepia.com/log/ossasepia/2020-03-04#1020036 - heh, now I suspect you've been reading the #e logs of today, lol
15:57 ossabot Logged on 2020-03-04 17:03:48 jfw: trying to cover too much ground and possibly introducing bad assumptions rather than starting with something simpler
15:58 jfw I haven't actually
15:59 diana_coman jfw: you know, one of the good things in academia is that you *have to* ask questions; as in, if you listen to a presentation, whatever it might be, on whatever topic and regardless of how well or badly made, at the end you *have to ask* at least x questions; that's practice, pure and simple and it...works.
16:00 diana_coman looking back at it (as I was initially rubbish at this part), I think initially I simply studied other people's questions to figure out how they managed it, lolz
16:00 diana_coman http://logs.ossasepia.com/log/ossasepia/2020-03-04#1020054 - then even more well done you!
16:00 ossabot Logged on 2020-03-04 17:15:06 jfw: I haven't actually
16:01 BingoBoingo diana_coman: Thank you. I'll get to breaking these problems up some more.
16:01 diana_coman (today's #e log is not directly on question asking but it is on exploring what is pretty much a big unknown and it touches at times on what makes for a better initial exploration precisely on the grounds you gave re possibly introducing bad assumptions if not simple enough)
16:02 diana_coman BingoBoingo: yw; is it clear to you what & how there? because I really don't want that it blocks you even more somehow.
16:03 jfw diana_coman: interesting, I hadn't heard about the mandatory questions. Re #e, perhaps it's that you brought the notion through your feedback, and I attempted to expand.
16:05 diana_coman might be.
16:05 diana_coman jfw: since you have presentations at your Junto meetings for that matter, do you have questions at the end?
16:06 jfw heh, sometimes we have to tamp down on questions popping up throughout so as to get to the end
16:07 diana_coman jfw: ahaha, that's good then; is it *you* asking questions though? :P
16:07 BingoBoingo diana_coman: The whats seem clear. The hows less so, but enough to get moving.
16:07 diana_coman BingoBoingo: alright then.
16:08 jfw diana_coman: sometimes; though hm, possibly less on the more unfamiliar topics.
16:10 jfw mandatory questions afterward sounds like a great addition actually.
16:10 diana_coman jfw: in principle there's nothing wrong with just agreeing to keep questions for the end (as some of them might be answered at times simply at a later point in the presentation) and otherwise set mandatory questions at the end, yeah
16:22 jfw so I wasn't sure what "high strung" meant, my guess was something in the vein of pretentious or stuffy or bombastic (not that those are all that similar), but I'm reading it's more in the vein of nervous or tense, which certainly seems to fit better here. Is that right diana_coman?
16:24 jfw and that'd be another example of where I coulda figured out by asking earlier!
16:27 * jfw afk, food
~ 15 minutes ~
16:42 diana_coman jfw: ah, not at all stuffy/bombastic/pretentious, no; and not nervous either; and note that I use adverbs correctly, it's highly (not "high") strung for a reason! if you think of how you tighten/loosen up strings on a guitar, that's pretty much the analogy there - you kept stretching and tuning and fiddling with it until the result is a highly strung (and generally too tightly but not only that) text/string.
16:43 * diana_coman will be back tomorrow.
~ 1 hour 27 minutes ~
18:11 whaack when I run top on one of my vms, I get "Mem: 3922344k total, 1768028k used, 2154316k free, 143744k buffers" for the line that describes memory usage. When I inspect how much memory an individual process is using on the same vm with the command pmap, I get "total 3245868K" for the last line. Why would pmap report more memory being used by one process than top reports for all processes?
~ 16 minutes ~
18:27 jfw whaack: do you know how virtual memory works?
18:30 whaack jfw: No, I do not
~ 55 minutes ~
19:25 jfw whaack: sorry 'bout the delay, got my attentions diverted. It's worth learning about (what, they didn't have any comp arch class at that MIT?!) but the short version is each process has its own address space, portions of which get mapped to different things such as physical RAM, files, hardware registers and such by the OS and CPU (MMU specifically).
19:27 jfw so what you're looking at with pmap is the total mappings, many of which may be shared with other processes, not actually allocated due to overcommit, and so on.
19:28 jfw The RES line in top or ps listings tends to be the closest approximation of actual usage attributable to the process in my understanding.
19:28 jfw (resident set size)
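[The gap jfw describes can be seen by putting VSZ (total mapped virtual memory, roughly what pmap sums) next to RSS (the resident set, pages actually in physical RAM) for one process; `$$` here is just the current shell's pid, any pid works:]

```shell
# VSZ counts every mapping (shared libraries, file mappings,
# overcommitted allocations); RSS counts only pages resident in RAM,
# so VSZ is normally much larger. Both figures are in kilobytes.
ps -o pid,vsz,rss,comm -p $$
```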
~ 21 minutes ~
19:50 whaack jfw: no worries, thank you. Yes MIT did, but through my fault the material didn't stick with me. I'll read up on the subj more later, I'm about to head out to the airport.
19:50 ossabot Logged on 2019-10-13 10:00:22 whaack: yes it did, but i ~failed that course
19:53 jfw whaack: cool, no need to pile on further tsks then, lol
~ 1 hours 52 minutes ~
21:45 lobbes http://logs.ericbenevides.com/log/ossasepia/2020-02-27#1019473 << I missed this earlier, but archiving should already be occurring in this channel. Currently lobbesbot is set to silently snarf urls-to-parse from all channels it sits in, so this channel ought to be covered
21:45 ericbot Logged on 2020-02-27 13:03:46 diana_coman: lobbes: how does that link-archiving work, can I have it in here too or what does it require?