Most Frequently Asked Talend Interview Questions: How is tMap different from tFilterRow in Talend? How will you integrate SVN with Talend?
Talend is next-generation cloud and data integration software, and demand for Talend skills is growing fast. With these opportunities reaching the IT market, now is the best time to prepare for upcoming openings for Talend professionals.
To become a successful Talend professional, you have to crack some tricky interviews. As demand peaks, the competition will be tough, so prepare well to answer the interview questions. Here are some frequently asked Talend interview questions, with answers to help you face the interview with confidence.
Talend Interview Questions
Q1. How will you manage to get files from FTP servers?
We configure the tFTPGet component so that it downloads only the text files in the FTP server's root directory to a local directory. This takes five simple steps.
- Double-click the tFTPGet component to open its ‘Basic Settings’ view.
- Enter the connection details required to access the specified FTP server.
- In the Local directory field, select the local directory to which you want to download the files.
- In the Remote directory field, select the FTP server directory from which you want to download the files.
- Click the [+] button under the Files table to add a line, and in the Filemask column enter *.txt between double quotation marks ("*.txt"). This ensures that only text files are downloaded.
When the job runs, it logs in to the FTP server with your credentials and downloads the required text files.
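To make the filemask logic concrete outside Talend, here is a minimal sketch using Python's standard ftplib and fnmatch modules. It is not the code Talend generates for tFTPGet; the function names and parameters are hypothetical, and the host, credentials, and directories are placeholders you would supply.

```python
import fnmatch
import os
from ftplib import FTP


def select_by_filemask(names, filemask="*.txt"):
    """Keep only the names matching the filemask, like one line of tFTPGet's Files table."""
    return [name for name in names if fnmatch.fnmatch(name, filemask)]


def download_text_files(host, user, password, remote_dir, local_dir, filemask="*.txt"):
    """Log in, list remote_dir, and download every file matching filemask into local_dir."""
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for name in select_by_filemask(ftp.nlst(), filemask):
            local_path = os.path.join(local_dir, name)
            with open(local_path, "wb") as handle:
                # RETR streams the remote file; each received chunk is written locally
                ftp.retrbinary(f"RETR {name}", handle.write)
            downloaded.append(local_path)
    return downloaded
```

A call such as download_text_files("ftp.example.com", "user", "secret", "/", "downloads") (hypothetical values) would return the local paths of the downloaded .txt files. Keeping the mask check in its own function makes it easy to test without a live server.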
Q2. How will you handle a late-arriving dimension using Talend?
The handling approach depends entirely on the project scenario. Overall, four approaches are available.
- Postpone loading the fact records until the dimension records arrive.
The usual way to postpone loading a fact record is to park the fact row in a suspense table.
- Use SCD Type 2 changes: collect the dimension records from a source system that provides retroactive effective dates.
- Use an ‘Unknown’ dimension record as the default: assign the ‘Unknown’ dimension member to the fact record. This is the most widely applied and simplest approach. These facts can also be kept in the suspense table mentioned in the first approach.
- Infer the dimension record: create a new dimension record with a new surrogate key and insert it, then use that same surrogate key for the incoming fact.
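The third and fourth approaches can be contrasted in a small sketch. This is plain Python, not a Talend job; the surrogate key value -1 for ‘Unknown’, the customer_id and customer_sk column names, and the dict-based dimension are all hypothetical assumptions for illustration.

```python
UNKNOWN_KEY = -1  # hypothetical surrogate key reserved for the 'Unknown' dimension member


def assign_dimension_keys(facts, dimension, infer_missing=False):
    """
    Resolve each fact's natural key (hypothetical 'customer_id') to a surrogate key.

    dimension maps natural key -> surrogate key and is updated in place when inferring.
    infer_missing=False: approach 3, point the fact at the 'Unknown' member.
    infer_missing=True:  approach 4, insert an inferred dimension row with a new key.
    """
    next_key = max(dimension.values(), default=0) + 1
    resolved = []
    for fact in facts:
        natural_key = fact["customer_id"]
        if natural_key not in dimension:
            if not infer_missing:
                resolved.append({**fact, "customer_sk": UNKNOWN_KEY})
                continue
            # Inferred member: later facts with the same natural key reuse this key
            dimension[natural_key] = next_key
            next_key += 1
        resolved.append({**fact, "customer_sk": dimension[natural_key]})
    return resolved
```

Note that the fact keeps its natural key in both modes; without it, there would be no way to repair the ‘Unknown’ assignments or enrich the inferred row once the real dimension record arrives.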
Q3. What challenges might you expect with each late-arriving dimension handling approach?
For the first approach, live reporting is not possible. You have to wait to report the fact row until all of its dimension records have been handled properly.
For the second approach, some double work is necessary. The dimension record collected from the system with retroactive effective dates gets a new surrogate key, and the fact rows you loaded previously have to be updated so that they are correctly reassigned to that new surrogate key.
For the third approach, keep in mind that once the dimension data arrives, you have to go back to the fact table and update any fact records assigned to ‘Unknown’.
For the fourth approach, the same follow-up as for the third one has to be completed once the real dimension data arrives.
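The follow-up update for the third approach can be sketched as a single SQL statement, shown here with Python's built-in sqlite3 module. The table names fact_sales and dim_customer, their columns, and the -1 ‘Unknown’ key are hypothetical; the pattern assumes the fact table still stores the natural key.

```python
import sqlite3

UNKNOWN_KEY = -1  # hypothetical surrogate key reserved for the 'Unknown' member


def reassign_unknown_facts(conn):
    """Point facts still on the 'Unknown' member at dimension rows that have since arrived."""
    cur = conn.execute(
        """
        UPDATE fact_sales
        SET customer_sk = (
            SELECT d.customer_sk FROM dim_customer AS d
            WHERE d.customer_id = fact_sales.customer_id
        )
        WHERE customer_sk = ?
          AND EXISTS (
            SELECT 1 FROM dim_customer AS d
            WHERE d.customer_id = fact_sales.customer_id
          )
        """,
        (UNKNOWN_KEY,),
    )
    conn.commit()
    # Number of fact rows that were re-pointed in this run
    return cur.rowcount
```

The EXISTS guard leaves facts whose dimension row has still not arrived on the ‘Unknown’ key, so the statement can be rerun safely on every load.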
Q4. How do you execute more than one sub-job in parallel?
We can use either Talend's multi-thread execution feature or the tParallelize component.
For multi-thread execution, we follow three consecutive steps.
- Prepare the jobs that will read the employee data set in different contexts.
- Prepare a parent job from which the other jobs will be run in parallel.
- Finally, execute all the created jobs.
With the tParallelize component, we set up a separate tMsgBox for each sub-job. All of the sub-jobs are executed simultaneously; before execution, configure each component's dialog box as required.
Connecting and synchronizing all the tMsgBox components with the parent tParallelize component is the most important step and needs to be done carefully.
However, the tParallelize component is available only in the subscription versions of Talend.
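The parent/child pattern can be illustrated with Python's concurrent.futures: the parent starts one child per context and then waits for all of them, which is roughly what synchronizing on tParallelize achieves. This is only an analogy; read_employees and the context names are hypothetical stand-ins for real child jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def read_employees(context):
    """Hypothetical child job: stands in for reading the employee data set in one context."""
    return f"loaded employees for {context}"


def run_parent_job(contexts, max_workers=4):
    """Start one child job per context in parallel, then wait until all have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every child up front; collecting the results blocks
        # until each child completes, which acts as the "wait for all" point
        return list(pool.map(read_employees, contexts))
```

Because map() returns results in submission order, the parent sees the outcome of each context in a predictable order even though the children ran concurrently.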
Q5. Can you give me an idea about tReplicate?
tReplicate duplicates the incoming data flow into multiple identical output flows. For example, suppose we have data on freelancers, including the number of hours each spends per day on a project. With tReplicate we can copy this data into several flows within the same job and process each copy differently, for example sorting by cost per day from highest to lowest, finding the most-paid freelancer, or filtering by hours spent per day. It is a standard component available across all Talend products, and it belongs to the Orchestration family.
It is a non-startable component, which means it always needs an input component and output components. So it can be used only in intermediate steps.
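As a rough analogy in plain Python, the freelancer example looks like this: one input flow is copied into independent outputs, and each copy is processed differently downstream. The replicate function and the sample rows are hypothetical illustrations, not Talend code or real data.

```python
import copy


def replicate(rows, n_outputs=2):
    """Return n_outputs independent copies of the input flow, like tReplicate's identical outputs."""
    return [copy.deepcopy(rows) for _ in range(n_outputs)]


# Hypothetical sample rows for illustration only
freelancers = [
    {"name": "Asha", "hours_per_day": 6, "cost_per_day": 300},
    {"name": "Ben", "hours_per_day": 8, "cost_per_day": 250},
    {"name": "Chen", "hours_per_day": 4, "cost_per_day": 320},
]

by_cost, by_hours = replicate(freelancers, 2)

# Output 1: sort from highest to lowest cost per day
by_cost.sort(key=lambda r: r["cost_per_day"], reverse=True)

# Output 2: keep only freelancers working at least 6 hours per day
long_days = [r["name"] for r in by_hours if r["hours_per_day"] >= 6]
```

Sorting one copy in place leaves the other copy and the original list untouched, which mirrors how each tReplicate output can be transformed independently.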
Q6. How will you optimize Talend’s performance?
In a word, the answer is to remove everything unnecessary, such as unused fields, columns, data, or records, as early as possible in the job. We can do this with filter components such as tFilterRow and tFilterColumns.
When reading from a database, use a SELECT query that retrieves only the rows and columns you need. Effective use of database bulk components can also improve Talend performance.
Allocating enough memory to the job also improves performance considerably. Using the tParallelize component performs better than running sub-jobs in parallel manually.
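The two main ideas, dropping unneeded columns and rows early and loading in batches instead of row by row, can be sketched with Python's sqlite3 module. The target table name and the row layout are hypothetical, and executemany() here only illustrates the batching idea; it is not a Talend bulk component.

```python
import sqlite3


def load_filtered(conn, rows, keep_columns, predicate):
    """
    Drop unneeded rows and columns before loading, then insert everything as one batch.
    Assumes a table named 'target' (hypothetical) whose columns match keep_columns.
    """
    # Filter rows and project columns first, so less data travels to the database
    slim = [tuple(row[c] for c in keep_columns) for row in rows if predicate(row)]
    placeholders = ", ".join("?" for _ in keep_columns)
    with conn:  # one transaction, committed once for the whole batch
        conn.executemany(f"INSERT INTO target VALUES ({placeholders})", slim)
    return len(slim)
```

Committing once per batch rather than once per row is usually where most of the gain comes from in this kind of load.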
Q7. How is tMap different from tFilterRow in Talend?
Both can filter data. However, tMap can apply filters across different sources (main and lookup flows) at the same time, whereas tFilterRow applies several conditions to filter the input rows of a single source.
tMap is more powerful than tFilterRow because it also offers lookups, joins, and so on. It can even apply transformations on both input and output links.
By contrast, tFilterRow works on its input source only: it can apply filters only to that source's data columns. In addition, tFilterRow has a maximum of two output links (Filter and Reject), and only one input link.
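The difference in shape between the two components can be sketched in plain Python. filter_row mimics tFilterRow (one input, a filter output and a reject output), while map_with_lookup mimics a tMap that joins a main flow to a lookup, applies a transformation, and filters the output. Both functions, the join key, and the 1.2 multiplier are hypothetical illustrations, not Talend APIs.

```python
def filter_row(rows, conditions):
    """tFilterRow-like: one input flow split into (filter, reject) outputs; conditions are AND-ed."""
    accepted, rejected = [], []
    for row in rows:
        (accepted if all(cond(row) for cond in conditions) else rejected).append(row)
    return accepted, rejected


def map_with_lookup(main_rows, lookup_rows, key, output_filter):
    """tMap-like: inner-join the main flow to a lookup, add a derived column, filter the output."""
    lookup = {row[key]: row for row in lookup_rows}
    out = []
    for row in main_rows:
        match = lookup.get(row[key])
        if match is None:
            continue  # inner join: drop main rows that have no lookup match
        # Hypothetical transformation: add a tax-inclusive amount
        merged = {**row, **match, "amount_with_tax": round(row["amount"] * 1.2, 2)}
        if output_filter(merged):  # the filter can use columns from both sources
            out.append(merged)
    return out
```

The key point is in map_with_lookup's output filter: it can test a column that came from the lookup (such as a country), which a single-input tFilterRow cannot see.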
Q8. How do you differentiate between the ETL and ELT components of Talend Open Studio?
Both describe data integration processes, where:
- E= Extract
- T= Transform
- L= Load
In ETL, we transform the data immediately after extracting it from the source so that it fits the current project requirements, and only then load it. This suits cases where only small changes are required, especially in projects with few source variations.
In ELT, loading is done before transformation, so the column headings can stay the same as in the raw source and no data conversion or calculation is required up front. Instead, the project requirements are met from the raw data in the target, for example by dropping unnecessary columns and filtering rows. This saves time and optimizes performance at the same time.
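The ordering difference can be shown with Python's sqlite3 module standing in for the target database. In etl() the cleanup happens in Python before loading; in elt() the raw rows are loaded unchanged into a staging table and the same cleanup is done afterwards in SQL inside the database. The table names and sample rows are hypothetical, and this is a conceptual sketch rather than how Talend's ELT components are implemented.

```python
import sqlite3

RAW = [("  alice ", "100"), ("BOB", "250"), ("carol", "")]  # hypothetical raw source rows


def etl(conn, raw_rows):
    """ETL: clean the rows in the integration layer first, then load the finished result."""
    cleaned = [(name.strip().lower(), int(amount)) for name, amount in raw_rows if amount]
    conn.execute("CREATE TABLE customers_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO customers_etl VALUES (?, ?)", cleaned)


def elt(conn, raw_rows):
    """ELT: load the raw rows unchanged, then transform them inside the target with SQL."""
    conn.execute("CREATE TABLE staging_raw (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_raw VALUES (?, ?)", raw_rows)
    conn.execute(
        """
        CREATE TABLE customers_elt AS
        SELECT LOWER(TRIM(name)) AS name, CAST(amount AS INTEGER) AS amount
        FROM staging_raw
        WHERE amount <> ''
        """
    )
```

Both paths end with the same cleaned table; the difference is where the transformation runs. ELT keeps the raw copy in staging_raw, which is what lets later requirements be met from the raw data without re-extracting it.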
Q9. How will you integrate SVN with Talend?
- First, go to the Talend Studio installation directory and open ‘\configuration\config.ini’.
- Add the line ‘svn.update.info.check=true/false’, choosing true or false as required.
- Finally, restart the Talend Studio application.
This is the basic integration of SVN with Talend. The process becomes more complex depending on project requirements; the differences mainly lie in installing external modules.
Q10. In your view, what are the 7 best practices in Talend?
As a Talend practitioner, keep your jobs easily readable and as simple as possible. To achieve that, follow a few practices:
- Rename components according to their expected output.
- Keep unconnected components organized so that they can be found easily when needed later.
- Read the documentation thoroughly before starting the design process.
- Always focus on the reusability of your design and keep it up to date.
- Avoid Camel dependencies where possible.
- Analyze the behavior at every stage before moving on to the next stage.
- Last but not least, use standard HTTP status codes.
These are not the only questions you will face in a Talend interview, and remember that every answer may lead to a follow-up question. The most important thing is to keep enriching your basic knowledge. This will increase your confidence and help you crack the interview.
So, best of luck!