Bug 4598 - Possible memory/swap corruption during hibernate or resume on HP EliteBook 820 G1
Status: CONFIRMED
Product: Desktop Bugs
Classification: ROSA Desktop
Component: Main Packages
: Fresh
: All Linux
: Normal normal
Assigned To: Eugene Shatokhin
: ROSA Linux Bugs
Reported: 2014-10-30 12:17 MSK by Eugene Shatokhin
Modified: 2014-11-04 23:35 MSK (History)

Description Eugene Shatokhin 2014-10-30 12:17:02 MSK
Hardware and the logs:
http://hw.rosalinux.ru/index.php?probe=fca52d05ee

The problem only happens when I use hibernate/resume on that laptop. It has never shown up on suspend/resume during 2+ months of testing.

Sometimes it caused sudden hangs after resume from hibernate. At other times it caused a strange out-of-memory condition in which the OOM killer kicked in and killed all processes, after which the kernel panicked. All of that was on kernel nrj-laptop-3.14.15.

With kernel nrj-laptop-3.14.22, the problem manifests itself in a similar way (see the journalctl log in the hw probe above). The log contains messages about out-of-memory conditions and corrupted swap, like this:

------------------------------
окт. 29 22:40:57 EliteBook-820 kernel: kded4 invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0 
окт. 29 22:40:57 EliteBook-820 kernel: kded4 cpuset=/ mems_allowed=0 
окт. 29 22:40:57 EliteBook-820 kernel: CPU: 2 PID: 5620 Comm: kded4 Tainted: P           O 3.14.22-nrj-laptop-3rosa #1 
окт. 29 22:40:57 EliteBook-820 kernel: Hardware name: Hewlett-Packard HP EliteBook 820 G1/1991, BIOS L71 Ver. 01.11 04/29/2014 
окт. 29 22:40:57 EliteBook-820 kernel:  0000000000000000 ffff8800a0037c80 ffffffff81657464 0000000000000000 
окт. 29 22:40:57 EliteBook-820 kernel:  ffff8800a0037ce8 ffffffff81654d63 00000000000000d0 ffffffff81c62f80 
окт. 29 22:40:57 EliteBook-820 kernel:  ffff8800a0037cb8 ffffffff8114cf53 ffffea0000c2de80 0000000000000202 
окт. 29 22:40:57 EliteBook-820 kernel: Call Trace: 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff81657464>] dump_stack+0x4d/0x6f 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff81654d63>] dump_header.isra.8+0x9b/0x206 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff8114cf53>] ? __put_single_page+0x23/0x30 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff81141e02>] oom_kill_process+0x202/0x370 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff811425fc>] out_of_memory+0x4cc/0x520 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff811426a8>] pagefault_out_of_memory+0x58/0x70 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff8105810e>] mm_fault_error+0x12e/0x210 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff81661d28>] __do_page_fault+0x4d8/0x5c0 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff811a746f>] ? SYSC_newstat+0x2f/0x40 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff81661e32>] do_page_fault+0x22/0x30 
окт. 29 22:40:57 EliteBook-820 kernel:  [<ffffffff8165ea38>] page_fault+0x28/0x30
<...>
------------------------------
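
To get a quick overview of how often the OOM killer fires after a resume, the journal can be scanned for the "invoked oom-killer" header lines like the one above. This is only a rough sketch: the function names are made up for illustration, and the regular expression assumes the exact message layout shown in this log, which may differ on other kernel versions.

```python
import re
from collections import Counter

# Matches the header line printed when the OOM killer is invoked, e.g.
# "... kernel: kded4 invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0"
# Assumption: the layout is the one seen in the log quoted in this report.
OOM_RE = re.compile(
    r"kernel: (?P<comm>.+?) invoked oom-killer: "
    r"gfp_mask=(?P<gfp_mask>0x[0-9a-fA-F]+), order=(?P<order>\d+)"
)


def find_oom_invocations(lines):
    """Return (comm, gfp_mask, order) for every OOM-killer invocation found."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((
                match.group("comm"),
                int(match.group("gfp_mask"), 16),  # hex string -> int
                int(match.group("order")),
            ))
    return hits


def summarize(lines):
    """Count OOM-killer invocations per triggering process name."""
    return Counter(comm for comm, _, _ in find_oom_invocations(lines))
```

The functions take any iterable of log lines, so they can be fed the saved output of something like `journalctl -k -b -1` (kernel messages from the previous boot).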

By the way, the problem showed up both with broadcom-wl 6.30.223.141 and with 6.30.223.248.

First, everything needs to be re-checked without the 'wl' driver, just to be sure it is not the culprit.
Comment 1 Eugene Shatokhin 2014-11-03 19:55:01 MSK
Checked on kernel 3.14.22 without broadcom-wl: no problems after resume from hibernate.

Installed dkms-broadcom-wl, and the problem appeared again after the next resume: first a series of OOM killer actions, then a kernel crash. This time, the kernel crashed while managing the page tables when do_notify_resume() was executing.

So the 'wl' driver is the suspect now.

The photos of the kernel error are here:
https://www.dropbox.com/sh/ks7pgnrvznt0eu3/AACdWpOKGogi7HlR3nn8w0qha?dl=0
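
When repeating the with/without comparison, it helps to confirm before each hibernate cycle whether the 'wl' module is actually loaded, since a stale module could skew the result. A minimal sketch, assuming the usual /proc/modules layout where the module name is the first whitespace-separated field; the helper names are hypothetical:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def read_loaded_modules(path="/proc/modules"):
    """Read and parse the list of currently loaded kernel modules."""
    with open(path, encoding="ascii", errors="replace") as f:
        return loaded_modules(f.read())


def wl_is_loaded(modules):
    """True if the Broadcom 'wl' module is in the given set of module names."""
    return "wl" in modules
```

Calling `wl_is_loaded(read_loaded_modules())` right before hibernating records which configuration a given resume was tested with.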
Comment 2 Eugene Shatokhin 2014-11-04 23:35:42 MSK
By the way, I checked the RAM on that laptop with Memtest - no errors were reported after 3 hours of testing.