- What's the basic-control-file?
- How can I generate the basic-control-file?
- Is it required to modify the generated basic-control-file?
- What are identifying elements or attributes?
- Why are identifying elements/attributes needed?
- How can I define identifying elements or attributes?
- What is a row in xml-data?
- Can I write my own converting-functions?
- Can I configure forbidden elements or attributes?
- How can I preserve or trim white-space?
- comparing xml-data
- How do I execute comparing two xml-files?
- Where do I configure the comparison-rules?
- How can I compare non-identifying elements or attributes?
- How can I exclude an element or attribute from comparison?
- How can I compare elements, which are in different order?
- Which differences between xml-files can be detected?
- If identifying elements/attributes don't match?
- If more than one row has the same identifying elements?
- Consequences, if I declare an element as identifying.
- How can I compare rows, which are in different sequence?
- How can I compare differently formatted data?
- How can I convert element/attribute-values before comparing?
- Child-element as identifying for hierarchically upper row
- Can I stop comparison, if there are x differences detected?
- What are the results of the comparison?
- What is the content of the error-file?
- What is the content of the statistic-file?
- How do I generate html-output?
- How do I generate an xml-output, which contains the differences in the xml-context?
- How do I generate a pdf-output, which contains the differences in the xml-context?
- merging xml-data
- regrouping xml-data
- sorting xml-data
- Can I compare large xml-files?
- Any special memory requirements comparing large files?
- Any special disk-space-requirements?
- Which parameters have great influence on performance?
- Is <xml>cmp suitable for regression-tests?
- Are there any examples delivered with the xmlcmp-software?
- Are xml-namespaces supported?
- Can I use a different sax-parser?
- Can I compare xml-data having no DTD or schema?
- Are there any special software or hardware requirements?
- Reason for "java.lang.UnsupportedClassVersionError"
1.1. What's the basic-control-file?
The basic rules for processing xml-data are defined in the basic-control-file. Examples of such rules:
- Which elements/attributes are identifying.
- Whether the content of an element/attribute should be converted before further processing.
- Whether an element/attribute should be excluded from processing.
All comparison-rules are also defined in the basic-control-file.
For merging, regrouping or sorting xml-data you need an additional merge-control-, toxml-control- or sort-control-file.
1.2. How can I generate the basic-control-file?
The procedure "xmlcmpcreate.sh" generates a basic-control-file for a given xml-file. Example: generating the basic-control-file "cmp.xml" for the xml-file "test.xml":
|$ xmlcmpcreate.sh test.xml > cmp.xml|
1.3. Is it required to modify the generated basic-control-file?
No, generally it is not required.
A comparison works accurately and detects all differences even if the generated basic-control-file has not been modified.
But if you want to influence, for example, the comparison-rules, you do this by modifying the basic-control-file.
For example, you may want to define that
- some elements should not be compared,
- some content should be converted before comparison,
- the sequence of rows should be the same
In these cases you have to adapt the basic-control-file.
1.4. What are identifying elements or attributes?
Elements are identifying, if their content is identifying for the parent-element (also called row).
Attributes are identifying, if their content is identifying for the actual element, to which the attribute belongs.
one "" is identified by the content of the elements "", "" and "".
one "" is identified by the content of the attribute "id".
1.5. Why are identifying elements/attributes needed?
Identifying elements/attributes are needed to decide, which elements (=rows) should be compared or merged. For example: Only rows, which have the same content in the identifying elements and attributes will be compared.
1.6. How can I define identifying elements or attributes?
Elements are marked as identifying by the attribute "ident_text" in the basic-control-file.
The elements ", "" and "" are identifying for element "".
Attributes are marked as identifying with the attribute "ident_att_
The attribute "id" of element "
If you generate the basic-control-file via the shell-procedure "xmlcmpcreate.sh", all elements and attributes are defined as identifying.
Decide which elements and attributes are really "identifying", and adapt the basic-control-file accordingly. Too many identifying marks increase the disk-space needed and may slow down performance if you have hierarchically nested rows and many identifying attributes in each hierarchy-level.
1.7. What is a row in xml-data?
A row is an element, which has child-elements and/or attributes. The content of all detail-elements and attributes belong to the row.
A row has identifying elements and/or attributes.
In the following example there are two person-rows. Each row is identified by the attribute "id".
The following example extends the previous example.
Each person-row has now a child-address-row.
One address-row is identified by attribute "id" of element "" and by the elements "", "" and ""
1.8. Can I write my own converting-functions?
Yes, you can.
Your own converting-functions must be written in Java and must implement the Java interface "de.sofika.test.ConvertInterface".
1.9. Can I configure forbidden elements or attributes?
Yes, you can.
If the shell-variable "isOtherForbidden" has the value "true", an error-message is thrown for every element/attribute in the xml-file that is not mentioned in the basic-control-file.
1.10. How can I preserve or trim white-space?
By default, white-space in elements or attributes is preserved and taken into account during processing.
White-space is trimmed for all elements and attributes if you set the shell-variable "isTrimAll" to "true".
If you only want to trim certain elements or attributes, you can do this by converting their content with the delivered java-class "de.sofika.test.ConvertTrim".
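How the shell-variables reach the scripts is not spelled out in this FAQ. The following is a minimal sketch, assuming the scripts read them from the environment: the wrapper sets "isTrimAll" for a single call only. The function name "compare_trimmed" and the override variable "XMLCMP" are made up for this illustration; only "isTrimAll" and "xmlcmp.sh" come from this FAQ.

```shell
# Hypothetical wrapper: run one comparison with white-space trimming enabled.
# Assumption: xmlcmp.sh picks up "isTrimAll" from the environment.
# XMLCMP is an illustrative override for the script location,
# not an option of <xml>cmp itself.
compare_trimmed() {
    # The assignment prefix applies to this one command only.
    isTrimAll=true "${XMLCMP:-xmlcmp.sh}" "$@"
}

# Usage: compare_trimmed cmp.xml test1a.xml test1b.xml
```

Setting the variable per call keeps other comparisons in the same shell session unaffected.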
2. comparing xml-data
2.1. How do I execute comparing two xml-files?
By executing the shell-procedure "xmlcmp.sh":
|$ xmlcmp.sh cmp1.xml test1a.xml test1b.xml|
2.2. Where do I configure the comparison-rules?
The comparison-rules have to be configured in the basic-control-file.
The basic-control-file is also an xml-file.
2.3. How can I compare non-identifying elements or attributes?
Non-identifying elements or attributes are compared if the attribute "cmp_text" or "cmp_att_" is set to "true".
2.4. How can I exclude an element or attribute from comparison?
Elements or attributes are not compared, if they are not in the basic-control-file, or if attribute cmp_text/cmp_att_ has the value "false".
In the following example the content of "" will not be compared:
2.5. How can I compare elements, which are in different order?
By default, the sequence of elements plays no role in the comparison. With the attribute element_sequence="true" you can define that the sequence of elements is compared as well.
If the elements "", "" and "" are not in the same sequence in both xml-files, an error-message is thrown.
2.6. Which differences between xml-files can be detected?
All differences in elements and attributes can be detected.
Differences can have the following error-messages:
|E0011||Content of element '' is different.|
|E0012||Input[1|2] is missing element ''|
|E0013||Tag '' of input[1|2] is not allowed.|
|E0021||Content of attribute '' is different.|
|E0022||Input[1|2] is missing attribute ''|
|E0023||Attribute '' of input[1|2] is not allowed.|
|E0032||Identity-path '' of input[1|2] is missing in input[1|2].|
|E0034||Detail-elements of identity-path '' have different sequence.|
|E0035||Row-sequence in identity-path '' is different.|
|E0036||Count of rows with same identity-path '' is different.|
|E0037||There are rows with same identity '' but [n] of these rows have different content, for example in element ''|
2.7. If identifying elements/attributes don't match?
If the identifying elements/attributes of a row in file1 do not match any row in file2, an error-message is reported.
In file2 there exists no person with "id" "2" or "3". Therefore the following error-message is thrown for id="2" and for id="3":
E0032 identity-path '/list/person' of input1 is missing in input2
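The matching rule can be imitated with standard tools. This is a rough sketch, assuming the identifying values of each row have already been extracted into a simple list (here just the "id" values). It does not show how <xml>cmp works internally, and the function name is made up.

```shell
# Hypothetical helper: print the identity values that occur in the first
# list but have no partner in the second list.
# Arguments: two whitespace-separated lists of identity values.
missing_in_second() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # comm needs sorted input; sort -u also drops duplicate identities.
    printf '%s\n' "$1" | tr -s ' ' '\n' | sort -u > "$tmp1"
    printf '%s\n' "$2" | tr -s ' ' '\n' | sort -u > "$tmp2"
    # -23 hides lines found only in the second list and lines found in both.
    comm -23 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}

# file1 has persons with id 1, 2 and 3; file2 only has id 1.
missing_in_second "1 2 3" "1"
```

For these inputs the helper prints the ids 2 and 3, which are the rows the missing-identity message is reported for.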
2.8. If more than one row has the same identifying elements?
That is no problem at all.
Identifying elements/attributes do not have to be a unique primary key. However, you should choose the identifying elements/attributes so that they identify exactly one row as often as possible. For performance reasons the ideal situation is that the content of the identifying elements/attributes always identifies exactly one row.
A person is identified by "":
In file1 there are 3 persons with the same name "Fischer" and with 3 different professions.
In file2 there are also 3 persons with the same name "Fischer", but only two of them have the same profession as in file1, and the persons are in a different sequence.
The comparison will throw the following error:
|E0037 there are rows with same identity '/list_person/person' but 1 of these rows have different content, for example in element '/list_person/person/profession'.|
2.9. Consequences, if I declare an element as identifying.
If you define an element or attribute as identifying, this affects the kind of error-message you get when this element or attribute has different values in the two xml-files.
You have an element "" which has the child-elements "" and "".
In both xml-files there is a person with the name "Fischer", but in one file with the residence "New York" and in the other file with the residence "Washington".
If both child-elements are defined as identifying, you will get the following two error-messages:
|E0032 identity-path of input1 is missing in input2.|
|E0032 identity-path of input2 is missing in input1.|
If only the element holding the name is defined as identifying, you will get the following single error-message:
|E0011 content of element '/person/residence' is different.|
2.10. How can I compare rows, which are in different sequence?
By default, the sequence of rows plays no role in the comparison. With the attribute row_sequence="true" you can define that the sequence of rows has to be the same in both files.
2.11. How can I compare differently formatted data?
In the basic-control-file you configure that the differently formatted data is converted into one uniform format before comparison.
2.12. How can I convert element/attribute-values before comparing?
You can convert element/attribute-content via special "convert"-attributes in the basic-control-file:
Element "" is formatted differently in the two files. In file1, for example, it has the value "2005-01-01" and in file2 the value "2005.01.01". The java-converting-class "de.sofika.test.ConvertDate" converts the differently formatted dates into one uniform, comparable format:
The following java-converting-classes are delivered with <xml>cmp:
|de.sofika.test.ConvertSortedStrings()||If an element/attribute contains several strings, they are sorted.|
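The effect of such conversions can be imitated in plain shell. The two functions below are an illustration only: they are not the delivered Java classes, their names are made up, and both the target date format and the use of white-space as string separator are assumptions.

```shell
# Illustration only: imitate the effect of a conversion before comparison.

# Bring dates such as 2005.01.01 or 2005/01/01 into the form 2005-01-01
# (the target format is an assumption made for this sketch).
normalize_date() {
    printf '%s\n' "$1" | sed 's|[./]|-|g'
}

# Sort the white-space-separated strings of a value, so that
# "b c a" and "a b c" become equal.
sort_strings() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sort | paste -s -d ' ' -
}

normalize_date "2005.01.01"   # prints 2005-01-01
sort_strings "b c a"          # prints a b c
```

After conversion, "2005-01-01" and "2005.01.01" yield the same string, so a plain content comparison no longer reports a difference.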
2.13. Child-element as identifying for hierarchically upper row
That is a common and typical problem, when comparing xml-data.
For example: A person is not identified by the person-name and person-firstname alone. The address is also needed to identify a person.
In the following example two persons in different files have the same names but have different addresses and different professions:
The address-element has the attribute ident_master="true". This has the effect that all identifying elements of the address-row are also identifying elements for the person-row.
If the two files are compared, the following correct error-messages are reported:
|E0032 identity-path '/list_person/person' of input1 is missing in input2|
|E0032 identity-path '/list_person/person' of input2 is missing in input1|
Without the attribute ident_master="true" there would be the following misleading error-messages:
|E0032 identity-path '/list_person/person/address' of input1 is missing in input2|
|E0032 identity-path '/list_person/person/address' of input2 is missing in input1|
|E0011 content of element '/list_person/person/profession' is different|
2.14. Can I stop comparison, if there are x differences detected?
Yes, you can.
By default, all differences are detected. Via the shell-variable "isStopIfErrorCount" you can limit the number of detected differences. That may improve performance.
2.15. What are the results of the comparison?
<xml>cmp delivers the following results:
2.16. What is the content of the error-file?
The error-file is also an xml-file.
The error-file contains all error-messages (= differences found).
The errors are grouped by the error-message-text.
Every error-row has a ""-element. This element lists the identifying values of the row in which the difference has been found.
In the " or -tag there are the non-identifying elements of the actual row listed.
2.17. What is the content of the statistic-file?
The statistic-file is also an xml-file.
The statistic-file records all parameters and statistical information of the comparison, for example how many errors were found, how often each error occurred, how much time the comparison took, and so on.
2.18. How do I generate html-output?
If the shell-variable "htmlDir" is set, the error-file is converted to html.
2.19. How do I generate an xml-output, which contains the differences in the xml-context?
If the shell-variable "isMerge" has the value "true", an xml-file is created which contains all data of file 1 and file 2 and all differences.
If the shell-variable "isMergeOnlyDiff" has the value "true", an xml-file is created which contains only the differences between file 1 and file 2.
2.20. How do I generate a pdf-output, which contains the differences in the xml-context?
If the shell-variable "isMergePrintPdf" has the value "true", an xml-file and a pdf-file are created which contain all data of file 1 and file 2 and all differences.
If the shell-variable "isMergeOnlyDiffPrintPdf" has the value "true", an xml-file and a pdf-file are created which contain only the differences between file 1 and file 2.
3. merging xml-data
3.1. How do I execute merging two xml-files?
By executing the shell-procedure "xmlmerge.sh":
|$ xmlmerge.sh cmp1.xml merge1.xml test1a.xml test1b.xml|
3.2. Where do I configure the merging-rules?
The merging-rules have to be configured in the merge-control-file.
The merge-control-file is also an xml-file.
3.3. How can I generate the merge-control-file?
The procedure "xmlmergecreate.sh" generates a merge-control-file for a given basic-control-file. Example: generating the merge-control-file "merge.xml" for the basic-control-file "cmp.xml":
|$ xmlmergecreate.sh cmp.xml > merge.xml|
4. regrouping xml-data
4.1. How do I execute regrouping an xml-file?
By executing the shell-procedure "xmltoxml.sh":
|$ xmltoxml.sh cmp1.xml toxml1.xml test1.xml|
4.2. Where do I configure the regrouping-rules?
The regrouping-rules have to be configured in the toxml-control-file.
The toxml-control-file is also an xml-file.
4.3. How can I generate the regrouping-control-file?
The toxml-control-file has nearly the same structure as a basic-control-file,
so you can create your toxml-control-file by copying your basic-control-file and adapting it to your needs with an editor.
5. sorting xml-data
5.1. How do I execute sorting an xml-file?
By executing the shell-procedure "xmlsort.sh":
|$ xmlsort.sh cmp1.xml sort1.xml test1.xml|
5.2. Where do I configure the sorting-rules?
The sorting-rules have to be configured in the sort-control-file.
The sort-control-file is also an xml-file.
5.3. How can I generate the sort-control-file?
The procedure "xmlsortcreate.sh" generates a sort-control-file for a given basic-control-file. Example: generating the sort-control-file "sort.xml" for the basic-control-file "cmp.xml":
|$ xmlsortcreate.sh cmp.xml > sort.xml|
6.1. Can I compare large xml-files?
Yes, you can.
"<xml>cmp" is designed to compare very large xml-files.
Comparing very large xml-files, for example of one gigabyte, is as easy as comparing xml-files of only a few kilobytes.
You do not have to do anything special.
It is not required to increase the java-heap-size.
The comparison will always work.
You will not see, after a long wait, the frustrating message "java.lang.OutOfMemoryError: Java heap space", as tends to happen with other comparison-tools.
6.2. Any special memory requirements processing large files?
Even with a small java-heap-size of, for example, 100 megabytes you can process large files of, for example, about 500 megabytes with very good performance.
Increasing the java-heap-size may improve performance, but generally only by a few percent.
6.3. Any special disk-space-requirements?
Normally you need three times the combined size of the two files being compared as temporary disk-space.
If you compare two files of about 100 kilobytes each, you will need about 600 kilobytes of temporary disk-space.
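The rule of thumb can be turned into a quick check before starting a comparison. This is a minimal sketch. The function name is made up, and the result is only the rough estimate described above, not a figure computed by <xml>cmp.

```shell
# Rule of thumb: temporary space = 3 x (size of file 1 + size of file 2).
# Sizes are read in bytes with wc -c; the function name is hypothetical.
estimate_tmp_bytes() {
    size1=$(wc -c < "$1") || return 1
    size2=$(wc -c < "$2") || return 1
    echo $(( 3 * (size1 + size2) ))
}

# Usage: estimate_tmp_bytes test1a.xml test1b.xml
```

Comparing the result with the free space reported by "df" for the temporary directory shows whether the rough requirement is met.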
How much disk-space is needed depends on many factors. Important factors are:
- How many elements/attributes are identifying for a row?
- How hierarchically nested are the rows, which have many identifying elements/attributes?
- Is the value of attribute element_sequence="true"?
The statistic-file records how much space has been required.
6.4. Which parameters have great influence on performance?
<xml>cmp works with three java-threads.
So performance will improve if you have more than one processor.
The performance of the disks is decisive, because the files are written several times into temporary files. You can define up to four temporary directories for these files via the shell-variables TMPDIR, TMPDIR1, TMPDIR2 and TMPDIR3. You will gain the best performance if the temporary directories are on different, fast disk-controllers.
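As a sketch of such a setup, assuming the scripts read the four variables from the environment: the directory paths below are placeholders for mount points on different disks, and the function name and the override variable "XMLCMP" are made up for this illustration.

```shell
# Hypothetical wrapper: spread the temporary files over four directories.
# Assumptions: xmlcmp.sh reads TMPDIR, TMPDIR1, TMPDIR2 and TMPDIR3 from the
# environment; the paths are placeholders and must exist on your system.
compare_with_tmpdirs() {
    TMPDIR=/disk0/tmp TMPDIR1=/disk1/tmp TMPDIR2=/disk2/tmp TMPDIR3=/disk3/tmp \
        "${XMLCMP:-xmlcmp.sh}" "$@"
}

# Usage: compare_with_tmpdirs cmp.xml test1a.xml test1b.xml
```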
7.1. Is <xml>cmp suitable for regression-tests?
Yes! <xml>cmp is designed for regression-tests.
In regression-tests you can evaluate the exit-code of the procedure "xmlcmp.sh". An exit-code other than "0" means that differences have been found.
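A regression-test step built on this exit-code might look like the following sketch. The function name, the messages and the override variable "XMLCMP" are made up for this illustration; only "xmlcmp.sh" and the meaning of the exit-code come from this FAQ.

```shell
# Hypothetical regression check based on the exit-code of xmlcmp.sh.
# Arguments: basic-control-file, expected xml-file, actual xml-file.
# XMLCMP is an illustrative override for the script location.
regression_check() {
    if "${XMLCMP:-xmlcmp.sh}" "$1" "$2" "$3"; then
        echo "PASS: $2 and $3 match"
    else
        echo "FAIL: differences found between $2 and $3"
        return 1
    fi
}

# Usage: regression_check cmp.xml expected.xml actual.xml || exit 1
```

Returning a non-zero status from the helper lets a surrounding test-script or build-tool stop at the first failed comparison.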
7.2. Are there any examples delivered with the xmlcmp-software?
Yes, about 100 self-testing examples are delivered with the software.
You can execute all examples with the script examples/execute-examples.sh.
7.3. Are xml-namespaces supported?
Yes, namespaces are supported.
7.4. Can I use a different sax-parser?
Yes, you can.
You can use any JAXP 1.1 compliant parser. Just set the shell-variable "parserClass" to the class name of the parser.
In the distribution of <xml>cmp the Xerces SAX parser is the default parser.
7.5. Can I compare xml-data having no DTD or schema?
Yes, you can.
Generally: Even if there is a DTD, files are not validated against it.
7.6. Are there any special software or hardware requirements?
<xml>cmp needs J2SE 1.5.0 or higher on Linux and J2SE 1.6.0 or higher on Windows/DOS. Check your java-version with "$ java -version".
7.7. Reason for "java.lang.UnsupportedClassVersionError"
You do not have the correct java-runtime-environment.
You need J2SE 1.5.0 or higher. Check your java-version with "$ java -version".