Hello kernel gurus.

I have a char driver module that has been working fine under 2.6.13. It drives a PCI card for a Heidenhain rotational encoder. I am now moving to 2.6.22 and it no longer works. In order to get it to build I changed exactly one line.

I changed this:

rc = pci_module_init(&ik220_driver);

to this:

rc = pci_register_driver(&ik220_driver);

Under 2.6.22 it loads fine, but as soon as my app opens /dev/ik220 the kernel crashes without a trace. I added lots of printk statements to see what is going on. The printk statements for the pci_driver .probe and .remove callbacks show up as expected. But as soon as my application does an open the kernel crashes immediately; I do not even see the file_operations .open() entry getting called.

Any clues would be appreciated.

Thanks in advance,

Elwood, Tucson AZ

The Code

On July 5th, 2009 Mitch Frazier says:

Since this is not a standard driver a reference to where it can be downloaded would be useful.

__________________________

Mitch Frazier is an Associate Editor for Linux Journal and the Web Editor for linuxjournal.com.

Hi Mitch, thanks for taking an interest in my little conundrum. I put a copy of the driver at http://www.clearskyinstitute.com/ik220 . The device in question is a PCI card from Heidenhain that interfaces to up to 8 of their absolute encoders. I got the driver originally from them for 2.4 kernels. They have nothing more recent so I hacked on it and got it to work on 2.6.13. Now at 2.6.22 it's not working again.

The probe function works: after insmod, the ik220 module is listed by lsmod and reported correctly by lspci -v as the driver for the device. It also creates /dev/ik220 correctly. But as soon as I try to open("/dev/ik220", O_RDWR) the kernel immediately crashes. I put a printk in the ioctl file_operations entry but it never prints, so it looks like the kernel is panicking before it reaches the driver. I don't need an open file_operations entry, but I added one anyway just to test with a printk and found it never prints either; the kernel crash comes before it is called.
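
To take the application out of the picture, the failing open can be reduced to a few lines of user-space C. This is only a sketch under assumptions: probe_open is a made-up helper name, and the flush-before-open step is a guess at what might help keep the last message when the machine goes down hard, not something taken from the driver or the original application.

```c
#define _XOPEN_SOURCE 700   /* for fsync() under strict compiler modes */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to open a device node read/write, logging each step to stderr.
 * Returns 0 on success, or the errno value from open() on failure.
 * probe_open is a hypothetical helper, not part of the ik220 driver.
 */
static int probe_open(const char *path)
{
    int fd, err;

    assert(path != NULL);

    /*
     * Push the message out before the risky call. fflush() empties the
     * stdio buffer; fsync() asks the kernel to write it to disk when
     * stderr is redirected to a file (it fails harmlessly on a terminal).
     */
    fprintf(stderr, "about to open %s\n", path);
    fflush(stderr);
    (void)fsync(STDERR_FILENO);

    fd = open(path, O_RDWR);
    if (fd < 0) {
        err = errno;
        fprintf(stderr, "open failed: %s\n", strerror(err));
        return err;
    }

    fprintf(stderr, "open succeeded, fd=%d\n", fd);
    close(fd);
    return 0;
}
```

Calling probe_open("/dev/ik220") from a tiny main() with stderr redirected to a file gives a reproducer with no other moving parts. On a path that does not exist it simply returns ENOENT.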

Char Driver

On July 9th, 2009 Mitch Frazier says:

I looked at your code before I finished reading your post... my first suggestion was going to be to add an open() function, but now I see you already tried that. It shouldn't be necessary anyways, the char device code does not require an open() function.

What precise version of the kernel are you using, 2.6.22.what? There were about 15 different revisions of 2.6.22.

Another thought, and I have no real reason to think this might fix it, but it's consistent with other kernel drivers, is to add static const to the file_operations declaration:

  static const struct file_operations ik220_fops = {
  ...
  };

Another thing you might want to do, again for consistency with newer code, is to change the syntax of the structure initializers:

  static struct pci_driver ik220_driver = {
          .name          = DRV_NAME,
          .id_table      = ik220_tbl,
          .probe         = ik220_init_one,
          .remove        = ik220_remove_one,
  };

  static const struct file_operations ik220_fops = {
          .ioctl         = ik220_driver_ioctl,
  };

The "name: value" syntax was long ago deprecated by gcc.
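
The two points above (designated initializers, and an open() entry being optional) can be illustrated outside the kernel. A minimal user-space sketch, assuming a made-up struct demo_fops as a stand-in for the kernel's struct file_operations; demo_ioctl and demo_do_open are likewise hypothetical, and demo_do_open only mimics the idea of skipping a missing open, it is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file_operations. */
struct demo_fops {
    int (*open)(void);
    int (*release)(void);
    long (*ioctl)(unsigned int cmd, unsigned long arg);
};

/* Dummy ioctl handler: echoes the command number back. */
static long demo_ioctl(unsigned int cmd, unsigned long arg)
{
    (void)arg;
    return (long)cmd;
}

/*
 * C99 designated initializer: ".member = value".
 * The obsolete GNU spelling of the same line was "ioctl: demo_ioctl,".
 * Members that are not named are zero-initialized, so open and
 * release are NULL here.
 */
static const struct demo_fops demo_ops = {
    .ioctl = demo_ioctl,
};

/* Call open only when one was supplied; otherwise treat it as success. */
static int demo_do_open(const struct demo_fops *ops)
{
    assert(ops != NULL);
    return ops->open ? ops->open() : 0;
}
```

Because unnamed members default to NULL, leaving .open out of the table is well defined, which fits the earlier remark that a char driver does not need its own open() function.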

Since the code dies before it gets to your open() function (when you have one), the next move is probably to add some debugging code to the kernel proper. A good starting point is the function chrdev_open() in the file fs/char_dev.c. That's where character devices are "opened", it's the function that calls your open() function.

__________________________

Thanks for the thoughts. Alas, the extra statics and .name changes did not help. This is 2.6.22-rtai. We're not using any RTAI stuff, it's just there for possible future work.

I hesitate to open the door to hacking the kernel itself so I'm still fiddling with the driver.

Since it loads but fails on first use, I'm thinking something in ik220_init_board is setting up for later trouble. I gutted the code and added it back slowly. I found that the open would not crash until I added back the call to ioremap_nocache(). I wonder what that could mean??

Char Driver

On July 10th, 2009 Mitch Frazier says:

Like I said I didn't really expect those changes to fix it, I might have hoped, but I didn't believe. :).

I don't blame you not wanting to hack the kernel itself.

Is the call to pci_request_regions() working? The return value is not checked, should return zero.

You might try changing ioremap_nocache to ioremap. Nocache should be the right one to call, since you don't want device registers to be cached, but very few existing drivers seem to use it. Although, again, I don't really expect that to fix it.

Also does it make sense that pci resource #1 is skipped?

   ik220_card[slot].conf_iomem_start = pci_resource_start(pdev, 0);
   ...
   ik220_card[slot].iomem_1_start = pci_resource_start(pdev, 2);
   ...
   ik220_card[slot].iomem_2_start = pci_resource_start(pdev, 3);

   // 0, 2, 3  no 1??
__________________________

pci_request_regions() is confirmed to be returning 0.

Using ioremap: no change (the later open still crashes).

changing pci_resource_start to 0, 1, 2 gives this in syslog:

Jul 10 18:40:12 montsec-ocs kernel: [37134.167618] IK220: 2nd IO-Resource is no IOMemory! Wrong Card?

<Groan>

Stock Kernel

On July 10th, 2009 Mitch Frazier says:

Another thing that might be useful to test if possible is to see how the driver acts using a stock kernel, rather than an RTAI patched kernel.

__________________________

What are the values printed out by the statements:

printk(KERN_INFO "%s: Config-Region start: 0x%lX end: 0x%lX flags: 0x%lX\n", ...);
printk(KERN_INFO "%s: 1st IO-Region start: 0x%lX end: 0x%lX flags: 0x%lX\n", ...);
printk(KERN_INFO "%s: 2nd IO-Region start: 0x%lX end: 0x%lX flags: 0x%lX\n", ...);
printk(KERN_INFO "%s: Config-Region remaped to virtual address 0x%lX\n", ...);
printk(KERN_INFO "%s: 1st IO-Region remaped to virtual address 0x%lX\n", driver_name, ...);
printk(KERN_INFO "%s: 2nd IO-Region remaped to virtual address 0x%lX\n", driver_name, ...);

What's the output from "lspci -vvv" for the card?

__________________________

[oh I see, the > characters looked like html tags]

01:04.0 Bridge: PLX Technology, Inc. PCI - IOBus Bridge (rev 02)
	Subsystem: PLX Technology, Inc. IK220 (Heidenhain)
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at e80a2000 (32-bit, non-prefetchable) [size=128]
	Region 1: I/O ports at d000 [size=128]
	Region 2: Memory at e80a4000 (32-bit, non-prefetchable) [size=32]
	Region 3: Memory at e80a0000 (32-bit, non-prefetchable) [size=32]
	Kernel driver in use: ik220

Jul 11 06:50:22 montsec-ocs kernel: [80922.691198] IK220: Config-Region start: 0xE80A2000 end: 0xE80A207F flags: 0x200
Jul 11 06:50:22 montsec-ocs kernel: [80922.691239] IK220: 1st IO-Region start: 0xE80A4000 end: 0xE80A401F flags: 0x200
Jul 11 06:50:22 montsec-ocs kernel: [80922.691281] IK220: 2nd IO-Region start: 0xE80A0000 end: 0xE80A001F flags: 0x200
Jul 11 06:50:22 montsec-ocs kernel: [80922.691346] IK220: Config-Region remaped to virtual address 0xF8950000
Jul 11 06:50:22 montsec-ocs kernel: [80922.691375] IK220: 1st IO-Region remaped to virtual address 0xF8952000
Jul 11 06:50:22 montsec-ocs kernel: [80922.691403] IK220: 2nd IO-Region remaped to virtual address 0xF896C000
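
The flags and address ranges in these log lines can be cross-checked by hand against the lspci listing. A minimal user-space sketch, assuming the IORESOURCE_IO and IORESOURCE_MEM bit values from the kernel's include/linux/ioport.h; the helper names region_len, region_is_mem, and region_is_io are made up for illustration.

```c
#include <assert.h>

/* Resource type bits as defined in the kernel's include/linux/ioport.h. */
#define IORESOURCE_IO  0x00000100UL
#define IORESOURCE_MEM 0x00000200UL

/* Size in bytes of a region given its inclusive start and end addresses. */
static unsigned long region_len(unsigned long start, unsigned long end)
{
    assert(end >= start);
    return end - start + 1;
}

/* Nonzero if the flags mark a memory-mapped region. */
static int region_is_mem(unsigned long flags)
{
    return (flags & IORESOURCE_MEM) != 0;
}

/* Nonzero if the flags mark an I/O port region. */
static int region_is_io(unsigned long flags)
{
    return (flags & IORESOURCE_IO) != 0;
}
```

Fed the logged values, flags 0x200 decodes as memory for all three regions, and end - start + 1 gives 128, 32, and 32 bytes, matching the sizes lspci reports for regions 0, 2, and 3. Region 1 is listed by lspci as I/O ports, so it would carry the I/O bit instead of the memory bit. That is consistent with the driver using resources 0, 2, and 3, and with the "no IOMemory" message that appeared when the indices were changed to 0, 1, 2.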

[lspci in next comment -- forum chopped it off]

Char Driver

On July 11th, 2009 Mitch Frazier says:

Nothing strange in those values that I can see.

However, I just noticed that the second and third region are mapped with ioremap and not with ioremap_nocache. Have you tried changing those to _nocache? Really doesn't seem like those should be cacheable.

__________________________

That did not help either.

But we'll never know because we're giving up on 2.6.22. We've hatched a method to go back to 2.6.13, which we know works. It's an embedded system which will just sit and do its job, so the age of the kernel is not a big deal as long as we get it working.

Thank you very much for your efforts Mitch.

Elwood

Next Time

On July 13th, 2009 Mitch Frazier says:

It would have been nice to find/know the solution... but it doesn't always work out that way.
