
Bug: duplicate objects during import with relationships between objects

Description

It seems that the import restrictions '... on idno ...' do not work correctly, at least with an XLSX mapping.

I've run several tests without solving my problem: multiple pairs of objects end up sharing the same idno, sometimes with preferred_labels = idno (exact match, uppercase) and sometimes with preferred_labels = ???. (With my original mapping and data file, I imported almost twice as many objects as expected!)

Simple example:
I've attached my Excel test data file and my Excel mapping file. Both are very short, made for these tests, and use the default.xml profile (unchanged).

Look at the object search UI using * as a wildcard. The objects created, in order, are:

Object identifier   Name (mind the lowercase vs. UPPERCASE)
OBJECT_1            Object 1
OBJECT_2            OBJECT_2
OBJECT_3            OBJECT_3
OBJECT_2            Object 2
OBJECT_3            Object 3

From the data file: (Object 1, object1) is created. OK.

From the objectSplitter refinery {"relationshipType": "similar", "objectType": "document", "delimiter": ";"}: (OBJECT_3, OBJECT_3) and (OBJECT_2, OBJECT_2) are created.

From the data file: (Object 2, object2) is created. Not OK: it should overwrite or merge.

From the objectSplitter refinery {"relationshipType": "similar", "objectType": "document", "delimiter": ";"}: (OBJECT_1, object 1) is linked.

From the data file: (Object 3, object3) is created. Not OK: it should overwrite or merge.

No refinery for OBJECT_3.
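For reference, a minimal sketch of what the objectSplitter options above describe: the cell is split on `;`, and each non-empty fragment becomes a related object of type `document`, linked with relationship type `similar`. The helper `split_related` and the sample cell value are hypothetical, and treating each fragment as an idno follows this report rather than Providence's code. The sketch deliberately stops before the find-or-create step, which is the step in question here.

```python
import json

# Refinery options exactly as quoted in this report.
options = json.loads('{"relationshipType": "similar", "objectType": "document", "delimiter": ";"}')


def split_related(cell, opts):
    """Split one cell into (idno, objectType, relationshipType) triples.

    Hypothetical helper for illustration only; not Providence's implementation.
    """
    return [
        (part.strip(), opts["objectType"], opts["relationshipType"])
        for part in cell.split(opts["delimiter"])
        if part.strip()  # ignore empty fragments, e.g. from a trailing ";"
    ]


# Hypothetical cell value for the first data row.
print(split_related("OBJECT_2;OBJECT_3", options))
```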

Analysis:
The first refinery is OK: since preferred_labels cannot be specified, the idno is used for preferred_labels.
The second refinery is OK: it found OBJECT_1.

But I've chosen one of the three possibilities for existingRecordPolicy:

merge_on_idno
merge_on_idno_with_replace
overwrite_on_idno

I can't select an existingRecordPolicy based on preferred_labels, because there is no way to pass that data to the refinery.
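The expectation behind the three idno-based policies can be written down as a small sketch: whatever policy is chosen, a row whose idno already exists must never create a second record. `ObjectRecord` and `import_row` are hypothetical, and the exact behavior given to each policy in the comments is an assumption for illustration, not Providence's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """Hypothetical stand-in for an imported object."""
    idno: str
    values: dict = field(default_factory=dict)  # e.g. {"preferred_labels": "Object 2"}


IDNO_POLICIES = ("merge_on_idno", "merge_on_idno_with_replace", "overwrite_on_idno")


def import_row(store, idno, values, policy):
    """Import one row under an idno-only policy (per-policy semantics are assumed)."""
    if policy not in IDNO_POLICIES:
        raise ValueError(f"unsupported policy: {policy}")
    pos = next((i for i, r in enumerate(store) if r.idno == idno), None)
    if pos is None:
        store.append(ObjectRecord(idno, dict(values)))
    elif policy == "merge_on_idno":
        # Assumed: existing values are kept; only missing fields are added.
        for key, value in values.items():
            store[pos].values.setdefault(key, value)
    elif policy == "merge_on_idno_with_replace":
        # Assumed: mapped fields replace existing values; other fields are kept.
        store[pos].values.update(values)
    else:
        # overwrite_on_idno, assumed: the old record is replaced by one built from this row.
        store[pos] = ObjectRecord(idno, dict(values))


# Whatever the policy, a second row with the same idno must not add a record.
for policy in IDNO_POLICIES:
    store = []
    import_row(store, "OBJECT_2", {"preferred_labels": "OBJECT_2"}, policy)  # as the splitter creates it
    import_row(store, "OBJECT_2", {"preferred_labels": "Object 2"}, policy)  # from the data file
    assert len(store) == 1, policy
    print(policy, store[0].values["preferred_labels"])
```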

My conclusion from this analysis:
if the object (idno, preferred_label) exists, merge or overwrite on idno works;
if the object (idno, preferred_label) does not exist, merge or overwrite on idno does not work.

There is a problem somewhere: a temporary index? Does matching always use both idno and preferred_labels?
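One way to probe the "temporary index" question is a toy model. It assumes, without confirmation from Providence's code, that the existingRecordPolicy lookup only sees records that existed when the import run started, while the objectSplitter searches the live set of records. The rows are reconstructed from the walkthrough above, and `Rec` / `run_import` are hypothetical. Under that assumption, a single run on an empty database yields exactly the five records of the first table, whereas a live lookup on idno yields three.

```python
from dataclasses import dataclass


@dataclass
class Rec:
    idno: str
    label: str


# Rows reconstructed from the walkthrough: (idno, label, idnos in the splitter cell).
ROWS = [
    ("OBJECT_1", "Object 1", ["OBJECT_2", "OBJECT_3"]),
    ("OBJECT_2", "Object 2", ["OBJECT_1"]),
    ("OBJECT_3", "Object 3", []),
]


def run_import(rows, stale_index):
    store = []                 # empty database
    snapshot = list(store)     # what an index built at run start would contain
    for idno, label, related in rows:
        # Hypothesis under test: the policy lookup uses the snapshot,
        # while the splitter always looks in the live store.
        pool = snapshot if stale_index else store
        rec = next((r for r in pool if r.idno == idno), None)
        if rec is None:
            store.append(Rec(idno, label))
        else:
            rec.label = label  # merge on idno
        for rel in related:
            if not any(r.idno == rel for r in store):
                store.append(Rec(rel, rel))  # no label available, so label = idno
    return [(r.idno, r.label) for r in store]


print(run_import(ROWS, stale_index=True))   # five records, as in the first table
print(run_import(ROWS, stale_index=False))  # three records, as expected
```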

Environment

None

Activity

Eric Pierunek
June 26, 2015, 1:29 PM

I've done two more tests with this tiny attached XLSX data file.

I've created two separate mappings instead of a single one importing objects and relationships in the same run.
The first mapping (object mapping) only imports ^1, ^2, ^3 of the data file (lines 1, 2, 3 of the attached mapping file). The second mapping (relationship mapping) only contains lines 1 and 4 of the original mapping previously attached to this issue.

My first run with these two mappings, on an empty database, was: object mapping, then relationship mapping.

The first result in the object search UI, after running only the object mapping, is:
Object identifier   Name
OBJECT_1            Object 1
OBJECT_2            Object 2
OBJECT_3            Object 3

The second result in the object search UI, after then running the relationship mapping, is:
Object identifier   Name
OBJECT_1            Object 1
OBJECT_2            Object 2
OBJECT_3            Object 3

All is OK, with correct relationships.

Then I cleared the database again and ran the mappings in the reverse order.

The first result in the object search UI, after running the relationship mapping first, is:
Object identifier   Name
OBJECT_1            ???
OBJECT_2            OBJECT_2
OBJECT_3            OBJECT_3
OBJECT_2            ???
OBJECT_3            ???

The second result in the object search UI, after then running the object mapping, is:
Object identifier   Name
OBJECT_1            Object 1
OBJECT_2            Object 2
OBJECT_3            Object 3
OBJECT_2            ???
OBJECT_3            ???

This is where the ??? field content I mentioned in the original post comes from.

I'd say this is a bug. If you only check for unique idnos, the right choices for existingRecordPolicy are the idno-only ones, and the refinery in the mapping should then only look for a record by idno, not create unexpected extra records.
But depending on the order in which the mappings are run, extra records with the same idno (e.g. OBJECT_2, labeled OBJECT_2 or ??? in my examples) are created.
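A toy model extended to two separate runs reproduces this order dependence. It rests on an unconfirmed assumption: the idno lookup only sees records that existed before the current run, while the splitter sees the live records. `???` stands for the missing label, and `Rec`, `run_import`, `run` and the row lists are hypothetical reconstructions from this report, not Providence code.

```python
from dataclasses import dataclass


@dataclass
class Rec:
    idno: str
    label: str


# (idno, label or None when not mapped, idnos in the splitter cell)
OBJECT_ROWS = [("OBJECT_1", "Object 1", []), ("OBJECT_2", "Object 2", []), ("OBJECT_3", "Object 3", [])]
REL_ROWS = [("OBJECT_1", None, ["OBJECT_2", "OBJECT_3"]), ("OBJECT_2", None, ["OBJECT_1"]), ("OBJECT_3", None, [])]


def run_import(store, rows):
    snapshot = list(store)  # assumed: the idno lookup only sees records from earlier runs
    for idno, label, related in rows:
        rec = next((r for r in snapshot if r.idno == idno), None)
        if rec is None:
            store.append(Rec(idno, label if label is not None else "???"))
        elif label is not None:
            rec.label = label  # merge on idno; an unmapped label leaves the record alone
        for rel in related:
            if not any(r.idno == rel for r in store):  # the splitter sees the live store
                store.append(Rec(rel, rel))


def run(*mappings):
    """Run the given mappings in order on an empty database."""
    store = []
    for rows in mappings:
        run_import(store, rows)
    return [(r.idno, r.label) for r in store]


print(run(OBJECT_ROWS, REL_ROWS))  # object mapping first
print(run(REL_ROWS, OBJECT_ROWS))  # relationship mapping first
```

With that assumption, object mapping then relationship mapping yields three records, while the reverse order leaves the two extra ??? records seen in the last table.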

With this simple example, it's easy to get a correct import with two mappings, one for objects and one for relationships, which produces a good result.
But in my case I'm merging 6 databases (2 Paradox, 2 MS dbf, 2 MySQL), with about 20 relationships so far and 20,000 records. It is practically impossible to create a single file with fully correct relationships.

Providence must respect my choice: merge or overwrite on idno only, as offered by existingRecordPolicy!

Assignee

User known

Reporter

Eric Pierunek

Components

Affects versions

Priority

Major