Working with XML - Talend


XML (eXtensible Markup Language) is both human-readable and machine-readable, and it is widely used as a format for electronic data interchange.
Here, we create a new XML input file definition in the Metadata Repository. The sample data is simple but useful, which makes it a good real-world example of XML content.


Create a Sample File

The following shows a few entries from the Wikipedia Abstract Database. These files are large, so you will find it helpful to create your own small sample file, or to use the entries shown below.

[Screenshot: sample entries from the Wikipedia abstract file]
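If you would rather generate a small sample file than copy entries by hand, the following Java sketch writes one. It assumes the layout used by the Wikipedia abstract dumps: a feed root element containing one doc element per article, each with title, url and abstract children. The entries, the class name SampleFileWriter and the default output file name are hypothetical placeholders; replace the entries with a few real ones from the dump.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SampleFileWriter {

    // Hypothetical placeholder entries: {title, url, abstract}.
    static final String[][] ENTRIES = {
        {"Wikipedia: Placeholder One", "https://en.wikipedia.org/wiki/Placeholder_One", "First placeholder abstract."},
        {"Wikipedia: Placeholder Two", "https://en.wikipedia.org/wiki/Placeholder_Two", "Second placeholder abstract."}
    };

    // Escapes characters that may not appear verbatim in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the sample document: <feed> root, one <doc> per entry (assumed layout).
    static String buildSample() {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<feed>\n");
        for (String[] e : ENTRIES) {
            sb.append("  <doc>\n")
              .append("    <title>").append(escape(e[0])).append("</title>\n")
              .append("    <url>").append(escape(e[1])).append("</url>\n")
              .append("    <abstract>").append(escape(e[2])).append("</abstract>\n")
              .append("  </doc>\n");
        }
        return sb.append("</feed>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        // Output path: first argument, or a hypothetical default name.
        Path out = Paths.get(args.length > 0 ? args[0] : "wikipedia-abstract-sample.xml");
        Files.write(out, buildSample().getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

The resulting file is the one you would select with the Browse button later in this tutorial.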

Create File Specification

Open the New XML File dialog: in the Repository, right-click Metadata > File XML to activate the popup menu, then select Create file xml.

[Screenshot: New XML File dialog]

Enter a Name, Purpose & Description

For this tutorial, set Name to WikipediaAbstract and press Next.

[Screenshot: Name set to WikipediaAbstract]

Select Model

Select the specification model: either Input XML or Output XML. For this tutorial, we are creating an Input XML model. Click Next to proceed.

[Screenshot: Input XML model selected]

Select XML File

Click Browse, navigate to your sample XML file and select it. Its structure should now be displayed in the Schema Viewer pane. Click Next to continue.

[Screenshot: Schema Viewer]
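The Schema Viewer essentially presents the element hierarchy of the selected file. As a rough illustration only (not how Talend implements it), this Java sketch uses the JDK's built-in DOM parser to list every distinct element path in a document, in document order. The class name SchemaPaths and the inline sample are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SchemaPaths {

    // Parses the XML and collects each distinct element path, e.g. /feed/doc/title.
    static Set<String> distinctPaths(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Set<String> paths = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        collect(doc.getDocumentElement(), "", paths);
        return paths;
    }

    // Depth-first walk over element children only (text nodes are skipped).
    private static void collect(Element el, String parentPath, Set<String> paths) {
        String path = parentPath + "/" + el.getTagName();
        paths.add(path);
        for (Node child = el.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, path, paths);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><doc><title>T</title><url>U</url><abstract>A</abstract></doc>"
                   + "<doc><title>T2</title><url>U2</url><abstract>A2</abstract></doc></feed>";
        distinctPaths(xml).forEach(System.out::println);
    }
}
```

Repeated doc elements collapse to a single path, which is why one loop expression can later cover all of them.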

Map the Schema

This dialog is where you perform the schema mapping. There are two key parts to this: defining the Xpath loop expression and defining the Fields to extract. You can enter these values manually, or drag them from the Source Schema pane.


[Screenshot: Source Schema pane]
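Conceptually, the mapping is one loop expression plus an ordered set of column-to-XPath pairs. The sketch below models it that way and uses the JDK's javax.xml.xpath API to check that every expression compiles and that the loop expression is absolute. XmlMapping, its methods and the abstract_text column name are hypothetical illustrations, not Talend classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical model of an XML input mapping: one loop expression plus
// an ordered set of output columns, each bound to an XPath expression.
public class XmlMapping {

    final String loopExpression;
    final Map<String, String> columnToXPath = new LinkedHashMap<>();

    XmlMapping(String loopExpression) {
        this.loopExpression = loopExpression;
    }

    // Adds a field (XPath -> column name) and returns this for chaining.
    XmlMapping field(String xpath, String columnName) {
        columnToXPath.put(columnName, xpath);
        return this;
    }

    // Reports a non-absolute loop expression and any expression that fails to compile.
    List<String> problems() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> problems = new ArrayList<>();
        if (!loopExpression.startsWith("/")) {
            problems.add("loop expression is not absolute: " + loopExpression);
        }
        List<String> expressions = new ArrayList<>();
        expressions.add(loopExpression);
        expressions.addAll(columnToXPath.values());
        for (String expr : expressions) {
            try {
                xpath.compile(expr);
            } catch (XPathExpressionException e) {
                problems.add("invalid XPath: " + expr);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        XmlMapping mapping = new XmlMapping("/feed/doc")
                .field("title", "title")
                .field("url", "url")
                .field("abstract", "abstract_text");
        List<String> problems = mapping.problems();
        System.out.println(problems.isEmpty() ? "mapping OK" : String.join("\n", problems));
    }
}
```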

Xpath loop expression

This field lets us specify an absolute XPath expression. Our sample XML file contains a number of doc elements (<doc>...</doc>), and these are the elements we want to loop through: each one produces a row of raw data.
You can drag the doc element from the Source Schema pane to the Absolute XPath expression field.
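To see what the loop expression does, the following Java sketch evaluates an absolute XPath against a parsed document using the JDK's XPath API; each matched node corresponds to one output row. It assumes the /feed/doc layout of the sample file, and LoopExpression.loopNodes is a hypothetical helper.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LoopExpression {

    // Returns the nodes selected by an absolute loop expression; one node = one row.
    static NodeList loopNodes(String xml, String loopExpression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(loopExpression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed>"
                + "<doc><title>Wikipedia: Placeholder One</title></doc>"
                + "<doc><title>Wikipedia: Placeholder Two</title></doc>"
                + "</feed>";
        NodeList docs = loopNodes(xml, "/feed/doc");
        System.out.println("rows: " + docs.getLength()); // prints "rows: 2"
    }
}
```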
Fields to extract

We can now specify the fields to extract. As with the Xpath loop expression, we can drag these elements from the Source Schema pane. The elements of interest are title, url, and abstract. Drag each of them to the Relative or absolute XPath expression column of the Fields to extract grid.
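The field expressions are evaluated relative to each node that the loop expression selects. This Java sketch reproduces that idea with the JDK's XPath API: for every /feed/doc node it evaluates title, url and abstract against that node and stores the text under the output column name. The class, the method and the abstract_text column name are assumptions for illustration; here a missing element yields an empty string, which may differ from Talend's own handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtractFields {

    // For each loop node, evaluates every field XPath with that node as context.
    static List<Map<String, String>> extract(String xml, String loopExpression,
                                             Map<String, String> columnToXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList loop = (NodeList) xpath.evaluate(loopExpression, doc, XPathConstants.NODESET);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            Node context = loop.item(i);
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> f : columnToXPath.entrySet()) {
                // String result; "" when the relative path matches nothing.
                row.put(f.getKey(), xpath.evaluate(f.getValue(), context));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<feed><doc><title>Wikipedia: Placeholder</title>"
                + "<url>https://en.wikipedia.org/wiki/Placeholder</url>"
                + "<abstract>Placeholder abstract.</abstract></doc></feed>";
        Map<String, String> fields = new LinkedHashMap<>(); // column name -> XPath
        fields.put("title", "title");
        fields.put("url", "url");
        fields.put("abstract_text", "abstract"); // hypothetical renamed column
        extract(xml, "/feed/doc", fields).forEach(System.out::println);
    }
}
```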

The Fields to extract grid allows you to specify both the Relative or absolute XPath expression and the (output) Column Name. You’ll see from the following screenshot that the abstract element has been given a different Column Name. This is because abstract is a Java reserved word: Talend generates Java code in which each schema column becomes a field, so every column name must be a legal Java identifier.
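You can check candidate column names against Java's reserved words with javax.lang.model.SourceVersion.isKeyword, which ships with the JDK. In this sketch the _text suffix is a hypothetical naming convention chosen for illustration; it is not a Talend rule, and the screenshot uses its own name for the renamed column.

```java
import javax.lang.model.SourceVersion;

public class ColumnNameCheck {

    // Returns the proposed name unchanged unless it is a Java keyword,
    // in which case a hypothetical "_text" suffix is appended.
    static String safeColumnName(String proposed) {
        if (SourceVersion.isKeyword(proposed)) {
            return proposed + "_text";
        }
        return proposed;
    }

    public static void main(String[] args) {
        String[] elements = {"title", "url", "abstract"};
        for (String e : elements) {
            System.out.println(e + " -> " + safeColumnName(e));
        }
    }
}
```

Only abstract is changed here; title and url are already valid identifiers.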

Now that you’ve completed the mapping, click the Refresh Preview button in the Preview pane. The dialog should now look like the following screenshot, with the Preview pane showing correctly mapped row data. Once the file has been correctly mapped, click Next to proceed.

[Screenshot: Add Metadata file on repository]

Review output Schema

You will now be presented with the definition of the output schema. Talend has made its best effort at defining this schema by sampling the available data. The data types have been correctly mapped; however, you can take this opportunity to increase the column lengths, as shown in the next screenshot, because the sample may not contain the longest values in the full file. When you are happy that your schema is correctly defined, click Finish to complete this operation.

[Screenshot: Review output Schema]
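Because Talend infers lengths from the sample, it can help to compare the observed maximum length of each column with the length you configure. The sketch below computes observed maxima over extracted rows and applies a headroom factor. ColumnLengths, the factor of 2.0 and the sample rows are assumptions, not Talend defaults; note that String.length() counts UTF-16 code units.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnLengths {

    // Longest value seen for each column across the sampled rows.
    static Map<String, Integer> observedMaxLengths(List<Map<String, String>> rows) {
        Map<String, Integer> max = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            for (Map.Entry<String, String> e : row.entrySet()) {
                int len = e.getValue() == null ? 0 : e.getValue().length();
                max.merge(e.getKey(), len, Integer::max);
            }
        }
        return max;
    }

    // Scales the observed length by a headroom factor and rounds up (factor is an assumption).
    static int withHeadroom(int observed, double factor) {
        return (int) Math.ceil(observed * factor);
    }

    public static void main(String[] args) {
        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("title", "Wikipedia: Placeholder One");
        r1.put("abstract_text", "Short placeholder abstract.");
        Map<String, String> r2 = new LinkedHashMap<>();
        r2.put("title", "Wikipedia: Two");
        r2.put("abstract_text", "A somewhat longer placeholder abstract for the second entry.");
        observedMaxLengths(Arrays.asList(r1, r2)).forEach((col, len) ->
                System.out.println(col + ": observed " + len + ", suggested " + withHeadroom(len, 2.0)));
    }
}
```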

Conclusion

You have now created an XML input file definition in the Metadata Repository and can use it within your Jobs to read and process XML data.

Last updated: 03 Apr 2023
About Author

I am Ruchitha, working as a content writer for MindMajix technologies. My writings focus on the latest technical software, tutorials, and innovations. I am also into research about AI and Neuromarketing. I am a media post-graduate from BCU – Birmingham, UK. Before, my writings focused on business articles on digital marketing and social media. You can connect with me on LinkedIn.
