Those ever-restarting UV-reionization tasks...
Author | Message |
---|---|
Send message Joined: 4 Nov 16 Posts: 20 Credit: 118,453,585 RAC: 0 |
Today I saw a UV-task in the task list of one of my computers, and although it had a deadline several months ahead I was far too curious not to run it at once! :-) Unfortunately, it behaved the same way as they did a couple of months ago: it runs for about 14 minutes (although it is listed to run in about one minute) and then the same task starts all over again, and again, and again, and again..... I have taken ten screen dumps of the properties for that one task as it restarted (see below), and the task had already restarted several times before that. What makes these tasks restart instead of reporting themselves to the server??? Could anything be done about the very erratic runtime estimate, which says they should take only one minute although in reality they take more than 14 minutes to complete? It would be very interesting to run them, but now I don't even dare to run the normal BH-spin, because I might catch one or two UV-reion. tasks and then be standing there forever without doing any good at all. :-( //Gunnar |
Send message Joined: 16 Apr 17 Posts: 36 Credit: 39,603,949 RAC: 0 |
Until they fix this you could unselect the UV tasks in your Universe@Home preferences. That's what I did. Also, the server says there are 211 in progress. I hope they're not all in this same status. It's equivalent to a power virus at that point. |
Send message Joined: 4 Nov 16 Posts: 20 Credit: 118,453,585 RAC: 0 |
Hi Aaron! Thanks for your advice on unselecting different task types! I've done so now, and allowed new tasks on the computer that is currently running U@H. About those 211 tasks that are out there, I hope that the admins can revoke them, so they won't become eternal show-stoppers for hundreds of CPU cores. If the admins put out a notice when they have fixed the issue with the endless restarting, I'll gladly try it again. That'll have to be sometime when I can supervise the computers and stop the tasks if they misbehave. //Gunnar |
Send message Joined: 16 Apr 17 Posts: 36 Credit: 39,603,949 RAC: 0 |
Hello to you as well Gunnar! I've read other threads on here regarding issues with the BHSpin tasks as well. But, I've crunched over 8,000 of them in the last few months without finding a single one that keeps restarting itself. Although, I do check in with my machines a few times a week, so it wouldn't be a very big waste compared to others that don't babysit their machines like I do. :-) |
Send message Joined: 28 Feb 15 Posts: 253 Credit: 200,562,581 RAC: 0 |
"I've read other threads on here regarding issues with the BHSpin tasks as well. But, I've crunched over 8,000 of them in the last few months without finding a single one that keeps restarting itself." I haven't had an error on BHspin v2 in several months either, though I used to see a long runner about once every two weeks. Maybe updating the Linux version helped? But they were easy to spot and abort if they ran over 24 hours, though if you left your machines entirely unattended they could be a problem, as they could tie up all your cores eventually. Another possibility is that I also run GPU projects on that machine, currently with a GTX 1070. In the past it was Folding, which I know can interact with VirtualBox projects (either way, depending on the project). However, in the past couple of months I have switched to GPUGrid, which does not show any interaction. And it is a dedicated machine that runs only BOINC 24/7. So the simpler your setup, the fewer problems you are likely to have. (Are your numbers correct? I have a total credit of 6,194,456, almost all BHspin v2, and have crunched or have in progress 219 of them, whereas your total credit is about the same, at 5,573,667. Though my numbers may be a little low. I am not sure the status page is correct.) |
Send message Joined: 16 Apr 17 Posts: 36 Credit: 39,603,949 RAC: 0 |
"I haven't had an error on BHspin v2 in several months either, though I used to see a long runner about once every two weeks. Maybe updating the Linux version helped? But they were easy to spot and abort if they ran over 24 hours, though if you left your machines entirely unattended they could be a problem, as they could tie up all your cores eventually." I have seen some tasks take a long time to finish, but I haven't had one yet that didn't finish eventually. I think the longest I've seen a task run is around 50 hours, but that was on a Raspberry Pi 2 core (~900 MHz). "(Are your numbers correct? I have a total credit of 6,194,456, almost all BHspin v2 and have crunched or have in progress 219 of them, whereas your total credit is about the same, at 5,573,667. Though my numbers may be a little low. I am not sure the status page is correct.)" When you view your task list, it only shows recent tasks. After some amount of time, they drop off the list. I know I receive 666.67 credits per task, so 5,600,000/666.67 ≈ 8,400. |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
If you have self-restarting tasks, please abort them and send me a PM with a link to the host where that happens. Sometimes, on Linux machines, some tasks run for a very long time. I don't know the reason why they take so long (sometimes even 60 h), but a wingman on Windows does the same tasks in "normal" time. I think it's something to do with the Linux libraries. Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 4 Feb 15 Posts: 49 Credit: 15,956,546 RAC: 0 |
Have just aborted 3 ultraviolet work units; the system aborted 3 before that. They just keep restarting. They are short runners of less than 2 minutes. I can see from the failed work units that everyone is having the same problem and they have been aborted or failed multiple times. Conan |
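
The credit arithmetic used earlier in the thread (total credit divided by a fixed credit per task) can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_task_count` is hypothetical, and it assumes every task was granted the same 666.67 credits quoted in the thread, which may not hold for all task types.

```python
def estimate_task_count(total_credit: float, credit_per_task: float) -> int:
    """Estimate how many tasks were completed, assuming a fixed credit per task."""
    if credit_per_task <= 0:
        raise ValueError("credit_per_task must be positive")
    # Round to the nearest whole task; the per-task credit is itself rounded.
    return round(total_credit / credit_per_task)


if __name__ == "__main__":
    # Figures quoted in the thread: a rounded total and the exact listed total.
    print(estimate_task_count(5_600_000, 666.67))
    print(estimate_task_count(5_573_667, 666.67))
```

With the rounded total of 5,600,000 this gives the roughly 8,400 tasks mentioned above; the task list on the website shows fewer because older tasks drop off it.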