Recently, when performing planned test scenarios with different hardware parts, our QA team identified a kernel panic issue during read operations on the Areca ARC-1883 SAS RAID Adapter. We notified Areca, and thanks to their fast reaction we were able to resolve the problem quickly. Here's an overview.
The problem
During sequential read operations, a kernel panic occurred on Linux. As it turned out, the newer the kernel version, the sooner the system would hang.
Call trace from the dying system:
BUG: unable to handle kernel paging request at ffff8800ffffffc8
IP: [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
PGD 1a86063 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.4/speed
CPU 12
Modules linked in: arcmsr(U) autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tab]
Pid: 3576, comm: dd Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF
RIP: 0010:[<ffffffffa01be89d>] [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
RSP: 0018:ffff88089c483e38 EFLAGS: 00010082
RAX: ffffc90016ea00c8 RBX: ffff8810731885e0 RCX: ffffc90016ea0020
RDX: 0000000000000001 RSI: ffff8800ffffffb0 RDI: ffff8810731885e0
RBP: ffff88089c483e48 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90016ea0030
R13: 0000000000000008 R14: 0000000000000010 R15: 0000000000000001
FS: 00007f7f733e5700(0000) GS:ffff88089c480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8800ffffffc8 CR3: 0000000f2b97c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dd (pid: 3576, threadinfo ffff8809c7172000, task ffff8810712b8ae0)
Stack:
ffff88089c483e48 ffffffff81095628 ffff88089c483eb8 ffffffffa01beeff
<d> 0000000000000005 ffff88089c483e90 ffff88107318acd8 ffffc90016ea0020
<d> ffffc90016ea00c8 ffffc90016ea0030 0000000000000100 ffff88107026dc40
Call Trace:
<IRQ>
[<ffffffff81095628>] ? schedule_work+0x18/0x20
[<ffffffffa01beeff>] arcmsr_interrupt+0x5ff/0x6a0 [arcmsr]
[<ffffffffa01befb1>] arcmsr_do_interrupt+0x11/0x20 [arcmsr]
[<ffffffff810e6eb0>] handle_IRQ_event+0x60/0x170
[<ffffffff8107a93f>] ? __do_softirq+0x11f/0x1e0
[<ffffffff810e980e>] handle_edge_irq+0xde/0x180
[<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
[<ffffffff8100faf9>] handle_irq+0x49/0xa0
[<ffffffff815315fc>] do_IRQ+0x6c/0xf0
[<ffffffff8100b9d3>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff81136bc9>] ? activate_page+0x189/0x1a0
[<ffffffff81136bb9>] ? activate_page+0x179/0x1a0
[<ffffffff81136c21>] mark_page_accessed+0x41/0x50
[<ffffffff811213c3>] generic_file_aio_read+0x2c3/0x700
[<ffffffff811c4841>] blkdev_aio_read+0x51/0x80
[<ffffffff81188e7c>] ? do_sync_read+0xec/0x140
[<ffffffff81188e8a>] do_sync_read+0xfa/0x140
[<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
[<ffffffff812334d6>] ? selinux_file_permission+0x26/0x150
[<ffffffff812335ab>] ? selinux_file_permission+0xfb/0x150
[<ffffffff81226496>] ? security_file_permission+0x16/0x20
[<ffffffff81189775>] vfs_read+0xb5/0x1a0
[<ffffffff811975bd>] ? path_put+0x1d/0x40
[<ffffffff811898b1>] sys_read+0x51/0x90
[<ffffffff810e1e4e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Code: ff ff c6 02 00 48 8d 7a 01 40 b6 5f e9 5f ff ff ff 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 0f 1f 44 00 00 <4c> 8b 46 18 49 39 f
RIP [<ffffffffa01be89d>] arcmsr_drain_donequeue+0xd/0x70 [arcmsr]
RSP <ffff88089c483e38>
CR2: ffff8800ffffffc8
Preliminary tests showed that the same scenario performed on an older kernel runs longer, but ultimately produces the same result: a kernel panic.
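For the record, the failing workload was nothing exotic: the call trace above shows dd issuing plain sequential reads against the block device (blkdev_aio_read). A minimal C equivalent is sketched below; the device path and the 1 MiB read size are placeholders, and the panic itself of course only occurs with the affected driver loaded.
/* Minimal sequential-read loop, roughly what dd does on the RAID volume.
 * /dev/sdX is a placeholder for the Areca-backed block device. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t bs = 1 << 20;           /* 1 MiB per read, an assumption */
    char *buf = malloc(bs);
    ssize_t n;
    int fd = open("/dev/sdX", O_RDONLY); /* hypothetical device path */

    if (fd < 0 || !buf) {
        perror("setup");
        return 1;
    }
    while ((n = read(fd, buf, bs)) > 0)  /* read sequentially until EOF */
        ;
    if (n < 0)
        perror("read");
    close(fd);
    free(buf);
    return 0;
}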
What was the solution?
Our development team analyzed the driver and immediately pointed Areca to the part of the code where the issue occurred. The problem was caused by deriving the wrong Command Control Block (CCB) pointer value in the arcmsr_hbaC_postqueue_isr function.
In the arcmsr_hbaC_postqueue_isr function of the old driver:
flag_ccb = readl(&phbcmu->outbound_queueport_low);
ccb_cdb_phy = (flag_ccb & 0xFFFFFFF0); /* frame must be 32 bytes aligned */
arcmsr_cdb = (struct ARCMSR_CDB *)(acb->vir2phy_offset + ccb_cdb_phy);
ccb = container_of(arcmsr_cdb, struct CommandControlBlock, arcmsr_cdb); /* <- ccb points to the wrong address */
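To make the failure mode easier to picture, here is a small user-space sketch of the same arithmetic. The struct layout, the value read from the register and the vir2phy_offset are all made up; only the sequence of operations mirrors the snippet above. The driver code only tells us that ccb ends up pointing at the wrong address, so the sketch simply feeds in an invalid flag_ccb to show how it propagates into a bogus ccb pointer.
/* Simplified, user-space illustration of how a bad flag_ccb becomes a bogus
 * ccb pointer. All concrete values below are hypothetical. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ARCMSR_CDB { char bytes[32]; };        /* stand-in for the real CDB */
struct CommandControlBlock {
    void *list_head;                          /* bookkeeping fields before the CDB */
    struct ARCMSR_CDB arcmsr_cdb;             /* embedded CDB, 32-byte aligned */
};

int main(void)
{
    /* Assume the outbound queue register returns something that is not a
     * valid CDB bus address. */
    uint32_t flag_ccb = 0xFFFFFFFFu;
    uint64_t ccb_cdb_phy = flag_ccb & 0xFFFFFFF0;          /* -> 0xFFFFFFF0 */

    /* Made-up virtual-to-physical offset, in the same range as the kernel's
     * direct mapping on the failing machine. */
    uint64_t vir2phy_offset = 0xffff880000000000ULL;

    /* arcmsr_cdb = acb->vir2phy_offset + ccb_cdb_phy */
    uint64_t arcmsr_cdb = vir2phy_offset + ccb_cdb_phy;

    /* container_of() walks back from the embedded member to the enclosing
     * struct, i.e. it subtracts offsetof(..., arcmsr_cdb). */
    uint64_t ccb = arcmsr_cdb - offsetof(struct CommandControlBlock, arcmsr_cdb);

    /* With these made-up numbers the result lands in the same neighbourhood
     * as the CR2 address in the oops above; the first field that
     * arcmsr_drain_donequeue() touches through such a pointer is unmapped,
     * hence the "unable to handle kernel paging request". */
    printf("arcmsr_cdb = 0x%016llx\n", (unsigned long long)arcmsr_cdb);
    printf("ccb        = 0x%016llx\n", (unsigned long long)ccb);
    return 0;
}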
Upon Areca’s request, our kernel developers also tested this scenario on different Linux systems (including Ubuntu and CentOS) and provided further information about the test results.
Devices affected
We tested other Areca adapters; however, this behavior only occurred on the Areca ARC-1883 SAS RAID Adapter.
Operating systems affected
All Linux-based systems that use an Areca driver version lower than 1.30.0X.18-140417 are affected.
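If you are not sure which driver version a system is running, modinfo arcmsr (or, while the module is loaded, /sys/module/arcmsr/version) reports it. Below is a small C sketch of the latter check, assuming the module exposes its version string via MODULE_VERSION, as current arcmsr releases do.
/* Print the version of the currently loaded arcmsr module, assuming it
 * exposes /sys/module/arcmsr/version (i.e. it was built with MODULE_VERSION). */
#include <stdio.h>

int main(void)
{
    char version[128];
    FILE *f = fopen("/sys/module/arcmsr/version", "r");

    if (!f) {
        perror("arcmsr not loaded or no version attribute");
        return 1;
    }
    if (fgets(version, sizeof(version), f))
        printf("loaded arcmsr version: %s", version);  /* compare against 1.30.0X.18-140417 */
    fclose(f);
    return 0;
}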
What to do if you want to use Open-E DSS V7 with the Areca ARC-1883 SAS RAID Adapter?
Please contact our support team, which already has a fix for this issue. Additionally, Areca has prepared a driver that fixes the issue: [http://www.areca.com.tw/support/s_linux/linux.htm]
All in all, we have to say that we were impressed with Areca's technical support. They reacted very quickly from the moment our QA team first reported the problem. Then, upon Areca's request, our team further investigated the issue. Once the problem was identified, our technology partner Areca quickly prepared a driver that resolves it. Great job, guys!