Finalizing details for the summer: Vijay, Peter, and Josh will contribute when possible, Rob will work on the server, and Ylonka will get some DB work done. Good luck to all members of the team with their summer research and internships!
A large part of Google Blockly's engine runs off of a class or structure of some sort called Blockly. So for instance, Python code is rendered using the call Blockly.Python.workspaceToCode(). But for the life of us, we cannot find where Blockly is defined. Believe me, I've tried. We managed to get C code rendering by running it through the structure for Dart, but a lot more digging needs to be done on the whole "Blockly" thing.
After messing around with it a decent amount, multithreading the crawler doesn't seem to be something that's going to happen this semester. I had the structure of what would need to happen logically laid out, using Python's built-in thread-safe queues. The get_friends function would put a start index for each page that needed to be crawled into a queue, then create an explicitly specified number of threads, each with access to this page queue. A calling function would initiate each thread, which would continuously grab the next page off the queue and call the worker function on that page; the worker would do roughly what the single-threaded get_friends function already does. Each worker would push the list of IDs it got for its page onto a shared results queue, and at the end get_friends would combine all the lists in the results queue into the final list.
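The design described above could be sketched roughly like this. It's a minimal sketch, not the real crawler: the page size, thread count, and crawl_page function are hypothetical stand-ins, since the actual workers would be driving Selenium webdrivers.

```python
import threading
from queue import Queue, Empty

PAGE_SIZE = 20    # assumed friends-per-page; the real value depends on the site
NUM_THREADS = 4   # the explicitly specified thread count described above


def crawl_page(start_index):
    """Stand-in for the Selenium work a real worker would do.

    Returns the list of friend IDs found on the page starting at start_index.
    """
    return [f"id_{start_index + i}" for i in range(PAGE_SIZE)]


def worker(page_queue, results_queue):
    # Each thread keeps grabbing the next page index until the queue is empty.
    while True:
        try:
            start_index = page_queue.get_nowait()
        except Empty:
            return
        # Push this page's list of IDs onto the shared results queue.
        results_queue.put(crawl_page(start_index))
        page_queue.task_done()


def get_friends(total_friends):
    page_queue = Queue()
    results_queue = Queue()

    # Put a start index for every page that needs to be crawled.
    for start in range(0, total_friends, PAGE_SIZE):
        page_queue.put(start)

    threads = [
        threading.Thread(target=worker, args=(page_queue, results_queue))
        for _ in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Combine all per-page lists from the results queue into the final list.
    friends = []
    while not results_queue.empty():
        friends.extend(results_queue.get())
    return friends
```

Because queue.Queue is thread-safe, the workers need no explicit locking; the only coordination is which thread pops which page index.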
However, I ran into a lot of issues with Selenium page loading that made threading very difficult to work with. In addition, the immediate cost of having multiple threads managing unique webdrivers was very noticeable: opening 4 windows, each controlled by a different class object, happened sequentially instead of simultaneously like I thought it would. This could just be the way Selenium handles it, or I could have messed something up; I don't know. Previously, getting Paul's 1,235 friends took about 18 seconds (just under 70 IDs every second), whereas just opening the windows needed for 4 threads to do a quarter of the work each takes about 10 seconds. For these reasons we've decided to keep using our crawler with the single-threaded method it was designed for. I've saved the work I did on the multithreaded approach as a backup so that I can look back on it later and see if I can learn anything from my mistakes.
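As a quick sanity check on those numbers (using only the figures quoted above; real timings obviously vary run to run):

```python
friends = 1235
single_threaded_seconds = 18
window_startup_seconds = 10  # observed cost of opening 4 webdriver windows

# Throughput of the existing single-threaded crawler: just under 70 IDs/sec.
rate = friends / single_threaded_seconds
print(round(rate, 1))

# The startup overhead alone eats more than half of the entire
# single-threaded runtime before any crawling even begins.
overhead_fraction = window_startup_seconds / single_threaded_seconds
print(round(overhead_fraction * 100))
```

With more than half the budget spent just opening windows, the 4-thread version would have to crawl its quarter-shares almost instantly to break even, which helps explain the decision to stay single-threaded.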