Data importer can't read larger XML files (more than a few MB)

Description

When loading large files of pastperfectxml data I'm getting the following error in the importer.

"Could not import source DHM_Objects_subset.xml: (2016-02-23 14:17:55) Could not read source DHM_Objects_subset.xml (format=pastperfectxml)
(1 error occurred)"

A 3 MB file will be read just fine. But I just carved a 70 MB file down to 7 MB and still get the error. Previously I've been able to load files as large as 20 MB.

Environment

None

Activity

User known
February 23, 2016, 2:34 PM

Are you sure the XML is well formed?

Jonathan Byerley
February 23, 2016, 2:36 PM

Yes. If I carve the same file down to 1.5 MB, it loads fine.

User known
February 23, 2016, 2:46 PM

Is the 3 MB version well formed?
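
If an XML editor won't open a file of that size, a streaming parser can still check it. The following is only a minimal sketch, assuming Python is available; `check_well_formed` is a hypothetical helper name, not part of the importer.

```python
import xml.parsers.expat


def check_well_formed(path):
    """Return None if the file parses as XML, else (line, column, message).

    Hypothetical helper for illustration. Expat reads the file in chunks,
    so the whole document never has to fit in an editor.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as e:
        # lineno is 1-based; offset is the column reported by Expat
        return (e.lineno, e.offset, str(e))
    return None
```

Calling `check_well_formed("DHM_Objects_subset.xml")` would return `None` for a well-formed file, or the position of the first problem the parser hits.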

Jonathan Byerley
February 23, 2016, 2:55 PM

Ah, gotcha. Oxygen won't load the larger files, so I'm using BBEdit, which doesn't validate anything. It turns out the 3 MB version has an "invalid XML character (Unicode: 0x1e) was found in the element content of the document."

<descrip>
Black and white cartoon by Roy Reid, July 8th 1965. Two men at a table drinking beer, one being picked up by a hound dog by the collar of his jacket saying to the other man, "Don't let it bother you ~It's just Joe Zatzman wants me to form a quorum."</descrip>

Not sure what's wrong with the text above, but when I erase it the error goes away and the file loads.
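
Since 0x1e is a control character and usually invisible in an editor, a short script can locate it. This is a rough sketch rather than anything the importer does itself: the function names are made up for illustration, and the character class follows the "Char" production of XML 1.0.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF are allowed.
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(text):
    """Yield (line, column, codepoint) for each disallowed character, 1-based."""
    # split("\n") rather than splitlines(): splitlines() also breaks on
    # control characters such as 0x1e, which would skew the line numbers.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in INVALID_XML_CHARS.finditer(line):
            yield lineno, m.start() + 1, hex(ord(m.group()))


def strip_invalid_chars(text):
    """Return the text with every disallowed character removed."""
    return INVALID_XML_CHARS.sub("", text)
```

Running `find_invalid_chars` over the file contents (decoded with whatever encoding the export actually uses, which I haven't confirmed) should point at the offending spots, and `strip_invalid_chars` gives a cleaned copy to retry in the importer.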

User known
February 23, 2016, 2:57 PM

Send me the files and I'll fix them.

Assignee

User known

Reporter

Jonathan Byerley

Labels

Priority

Blocker