Chatting While Working on RI Algorithm Implementation
Hyaline @hyaline@hackers.pub
- Subgraph isomorphism problem is a terrible rabbit hole.
- Looking at papers like this where they say "our experiment took 556 days!" and many papers mention "we set a time limit of 10 days per case" and so on.
- I was getting frustrated with the slow search algorithm right before completing the plugin I was developing
- So I carefully revisited papers I had previously given up on reading(...)
- I realized that half of what I built was correct and half was just inefficient... but the performance difference was so huge that it made me angry.
- I decided to adopt and implement RI, which seemed decent among the papers with publicly available code
- After spending a few days creating the integration part, I just ran it successfully.
- It's incredibly fast. I'm excited.
- But of course, the worst cases that I thought "this would probably take months to calculate" still don't finish running, haha ' -';
- By the way, what appears to be the most recent(?) version of that algorithm is this: ArcMatch
and the paper is here
- I had also considered VF3 as a candidate, but it seemed to require some modifications to check monomorphism (non-induced subgraph isomorphism), so I avoided it ' -' I was thinking about switching to ArcMatch once I got more familiar, and above all, ArcMatch seemed better in terms of scalability.
- Up to here was what I wrote on BlueSky, but here's a small update.
- I naively thought that I could just grab the source code and use it without fully understanding the algorithm and code.
- But if I wanted to tell the editor "that node doesn't look right, don't touch it anymore" based on the editor's circumstances or the plugin's design intentions... ahaha.
- I had to understand everything completely.
Actually, this is a core functionality, so I feel very ashamed for trying to use it without understanding, though unconsciously I was probably expecting that I'd end up understanding everything anyway... encapsulation means you can temporarily forget something in your context buffer while working, not that you don't need to understand what you're using - that's a blatant lie! There's no such thing in this field! - I spent 2 days carefully reading the paper and finally understood it.
- But shockingly, the paper didn't cover the parts of the source code that I couldn't understand.
- And the code looks like this:
int psi = -1;
int si = 0;
int ci = -1;
int sip1;
while(si != -1){
//steps++;
if(psi >= si){
matched[solution[si]] = false;
}
ci = -1;
candidatesIT[si]++;
while(candidatesIT[si] < candidatesSize[si]){
//triedcouples++;
ci = candidates[si][candidatesIT[si]];
solution[si] = ci;
psi, si, ci, sip1... and in other places there are things like ii too
- I ended up using the LLM feature in Rider to get code explanations and spent hours struggling to understand. Now it's time to set up debugging to check if my understanding is actually correct. The LLM probably lied to me in some parts.
- It's the first time in my life seeing tree traversal without recursion, based on arrays and pointers to arrays. It's mind-bogglingly confusing, but I'm also enjoying witnessing this kind of black magic.
- A helpful article I found while searching about this: Tree traversal without recursion: the tree as a state machine
- But the most enjoyable part is this aspect. The feeling of consuming in a short time
not quite a day, but stillthe results that many people worked hard on for a long time is so sweet. I wish I could have told this to my younger self who struggled with studying.
It's really nice that Markdown is available here.
I'm procrastinating on starting to create test cases and setting up debugging. Now I should get back to work ' ㅅ'